Alveo u200 "Packet Length Mismatch" during DPDK pktgen and testpmd application execution #16
Comments
Hi @attdone, Can you say which version of Vivado you used to build the open-nic-shell, and which version of the QDMA IP was generated? Best regards,
Hi @cneely-amd, Vivado: v2021.2 (64-bit). Below is the CPU configuration of the PowerEdge R520.
Hi @attdone, One first guess is that your machine might not perform well, because the processor architecture appears to be old (over 10 years) and appears to be a single processor with only 6 physical cores. The example machine configurations that @aneesullah and I were discussing in open-nic-dpdk issue 2 (Xilinx/open-nic-dpdk#2 (comment)) had, e.g., 16 or 32 cores on much more recent processor architectures. The pktgen-dpdk command example in the open-nic-dpdk instructions includes a number of parameters, and some of these relate to the number of logical cores and their mapping.
Another suggestion: in that issue 2 thread for open-nic-dpdk, @aneesullah described initial performance of 10 Gbps and found that it improved to the full 100 Gbps on one of her machines after disabling NUMA (making sure NUMA nodes are not enabled in the BIOS). I know, though, that you have a different architecture. One more suggestion: check the width (number of lanes) being provided to the card. For example, run: sudo lspci -d 10ee: -vv … and check that the "Width" for the card is "x16". Also, to get more info on your setup, which versions of Linux, DPDK, pktgen-dpdk, and the AMD-Xilinx QDMA drivers are you using? Best regards,
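The link-width check suggested above can be scripted. The following is a minimal sketch that is not part of the original thread: the helper name `pcie_link_status` is hypothetical, and it assumes the usual `lspci -vv` layout in which the negotiated link is reported on a `LnkSta:` line.

```shell
# Print "<speed> <width>" for every negotiated PCIe link found in
# `lspci -vv` text supplied on stdin (one line per PCI function).
# Hypothetical helper; it only looks at lines containing "LnkSta:".
pcie_link_status() {
    sed -n 's/.*LnkSta:[[:space:]]*Speed \([^ ,]*\).*Width \(x[0-9]*\).*/\1 \2/p'
}

# Intended use on the host (10ee is the Xilinx PCI vendor ID):
#   sudo lspci -d 10ee: -vv | pcie_link_status
```

Comparing the result against the `LnkCap:` line in the same `lspci` output shows whether the slot negotiated fewer lanes than the card supports.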
Hi @cneely-amd, I am using RHEL 8, DPDK v20.11.0, Pktgen-DPDK v21.03.0, and the DMA IP drivers at commit id 7859957 (which has QDMA-DPDK v2020.2.1). I performed the following:
Sharing the CPU Layout,
2) Disabled NUMA in GRUB.
There is still no increase in throughput, and the packet length mismatch still occurs.
3) The PCIe Width value is x8 instead of x16.
Hi @attdone, I also want to check: was your open-nic-shell bitfile meeting timing? Hopefully something like the following, from within Vivado? Best regards,
Hi @cneely-amd,
Hi @attdone, Thanks. I also want to ask whether you are using an unmodified version of the open-nic-shell (i.e., git status and git diff report no changes). The reason I'm asking is that I haven't experienced Packet Length Mismatch with DPDK before, but, thinking about a possible hardware-related explanation, I could imagine something like that occurring if the TUSER_MTY signal were incorrect for the packets leaving box 250 and entering the QDMA subsystem. I have U250s and use Ubuntu in my test setup; however, what you described of your test setup seems reasonable. Also, why are you using a newer version of pktgen-dpdk than your DPDK? In writing up the instructions I had intended matching versions of DPDK and pktgen-dpdk. I don't know whether changing would make a difference; I'm just trying to think about differences on the software side. Best regards,
Hi @cneely-amd,
At first, I conducted tests using Pktgen v20.11.3. Subsequently, I came across two GitHub issues (Xilinx/open-nic-dpdk#2 and Xilinx/open-nic-dpdk#3) in which a newer version of the Pktgen application was used. With both Pktgen v20.11.3 and the newer version (v21.03.0) used in those issues, I consistently encountered the 'Packet Length Mismatch' error. I enabled "RTE_LIBRTE_QDMA_DEBUG_DRIVER" in the dpdk-20.11/config/rte_config.h file to debug the Packet Length Mismatch error.
Hi @attdone, Today I was attempting to quickly create a similar test environment to see whether I could reproduce the issue you were having. I installed RedHat v8.9 on an old computer, and I built dpdk and pktgen for the QDMA. However, I admit that I have very limited development experience with RedHat, and I realize that there are some small gaps in directly translating the Ubuntu steps from the current open-nic-dpdk instructions to RedHat. Right now, when I attempt to load pktgen, it complains about not finding librte_timer.so.21:
[cneely@localhost ~]$ sudo /home/cneely/pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 01:00.0 -a 01:00.1 -d librte_net_qdma.so -l 1-7 -n 4 -a 00:01.0 -- -m [3:4].0 -m [5:6].1
/home/cneely/pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen: error while loading shared libraries: librte_timer.so.21: cannot open shared object file: No such file or directory
There seems to be some basic difference here, because I didn't encounter this sort of issue with Ubuntu. Did you encounter this same issue along the way, and if so, how did you get past it? I'm hoping that you might know this part already. Did you have to add anything to your /etc/ld.so.conf.d/ for loading additional libraries? Best regards,
Hi @cneely-amd, The file contents can be..
Execute ldconfig and recompile the Pktgen application.
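To confirm that the dynamic loader can now find the DPDK libraries, the loader cache can be inspected after running ldconfig. This is a minimal sketch that is not from the thread: the helper name `lib_in_cache` is hypothetical, and it assumes the `ldconfig -p` listing format in which each entry starts with the library's soname followed by a space.

```shell
# Succeed (exit status 0) if the given soname appears in `ldconfig -p`
# output supplied on stdin. Hypothetical helper for illustration.
lib_in_cache() {
    grep -F -q -- "$1 "
}

# Intended use on the host, after adding the directory that holds the DPDK
# libraries to a .conf file under /etc/ld.so.conf.d/ and running ldconfig:
#   ldconfig -p | lib_in_cache librte_timer.so.21 && echo "found"
```

Running `ldd` on the pktgen binary and looking for "not found" entries is another way to list any libraries that still cannot be resolved.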
Hi @attdone, Thank you for the advice, it helped me tremendously. I'm able to run pktgen on this old test machine with RedHat v8.9, and the initial test seems to be working.
[cneely@localhost ~]$ lscpu
This is with PCIe width: x16
This is what I'm getting in terms of send and receive stats:
[cneely@localhost ~]$ uname -a
less /proc/cmdline
@attdone
#!/bin/bash
sudo setpci -s 01:00.0 COMMAND=0x02;
#setup the QDMA registers
#init CMAC0, with serdes loopback (third command)
#init CMAC1, with serdes loopback (third command)
#read the rx_status register: expecting 0x03 on second readback if working
Hi @cneely-amd, When I enable Could you please enable RTE_LIBRTE_QDMA_DEBUG_DRIVER to confirm whether you are receiving Packet Length Mismatch?
Hi @attdone, I'm about to pause for today, and I'll try testing the debug flag tomorrow. One quick suggestion is that if you see the rapid change to 0 after a couple of seconds, try doing a reboot, and run the test again. (I don't know what causes that but I know that rebooting helps.) I should also say that within pktgen, I'm specifically running:
Hi @cneely-amd, As an update, I rebooted and restarted the Pktgen application multiple times, but the transmission still rapidly changes to 0. With the RTE_LIBRTE_QDMA_DEBUG_DRIVER macro enabled, I can see that the transmission drops to 0 when the QDMA driver prints "Packet Length Mismatch" during Pktgen application execution.
Hi @attdone, I tried with enabling My screenshot has one stray PMD message now, but otherwise looks about the same: I had noticed yesterday that in one of your messages you gave a path that uses Vitis (rather than Vivado) to build your open-nic-shell. I wanted to ask whether you installed XRT on the same machine. The reason is that I vaguely remember that XRT can install a driver that is loaded at each boot and can interfere with OpenNIC's drivers. If there is such an XRT driver, you might need to blacklist it so that it doesn't get loaded at boot. On this old test machine, I only installed the Vivado_lab edition, for the sake of loading the bitfile over JTAG. Also, I want to confirm whether you are trying the CMAC serdes loopback, too. Best regards,
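A quick way to see whether XRT's kernel drivers are currently loaded is to filter the `lsmod` listing. This is a hedged sketch that is not from the thread: it assumes the XRT modules carry their usual names, `xocl` and `xclmgmt`, and the helper name `xrt_modules_loaded` is hypothetical.

```shell
# Print the names of any XRT kernel modules present in `lsmod` output
# supplied on stdin. Assumes the XRT drivers are named xocl and xclmgmt.
xrt_modules_loaded() {
    awk '$1 == "xocl" || $1 == "xclmgmt" { print $1 }'
}

# Intended use on the host:
#   lsmod | xrt_modules_loaded
# If a module is listed, it can be kept from loading at boot with a line such
# as "blacklist xocl" in a .conf file under /etc/modprobe.d/ (the file name is
# up to you), followed by a reboot.
```

An empty result suggests that no XRT driver is competing with the QDMA driver for the card.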
Hi @cneely-amd, Regarding serdes loopback: yes, I am enabling those CMACs.
Hi @attdone, I checked with some others and they said that JTAG programming from a second system is typical and is recommended by some. That shouldn't cause any issues. Best regards,
Hi @cneely-amd, The PCIe width is x8 and the speed is 8GT/s on the PowerEdge R520. Does the width have any relation to the "Packet Length Mismatch" error?
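Separately from the mismatch error, the link width bounds the achievable throughput. The arithmetic below is a sketch that is not from the thread: it assumes a PCIe Gen3 link (8 GT/s per lane with 128b/130b encoding) and ignores protocol overhead such as TLP headers, so real throughput is lower still. The helper name `pcie_gen3_gbps` is hypothetical.

```shell
# Raw PCIe Gen3 bandwidth in Gb/s for a given lane count:
# 8 GT/s per lane, scaled by the 128b/130b line encoding.
pcie_gen3_gbps() {
    awk -v lanes="$1" 'BEGIN { printf "%.1f\n", 8 * lanes * 128 / 130 }'
}

pcie_gen3_gbps 8     # prints 63.0
pcie_gen3_gbps 16    # prints 126.0
```

By this estimate an x8 Gen3 link cannot carry 100 Gbps in either direction, so the x8 slot limits the throughput target whether or not it is related to the error.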
Hi @attdone, I would not have expected experimentally changing max_pkt_size to 9600 to work properly, because the QDMA IP is limited to data units of physical page size (4kB) for transmitting and receiving. So OpenNIC shell's top level (https://github.com/Xilinx/open-nic-shell/blob/main/src/open_nic_shell.sv) contains a setting to limit packet sizes to 1518 bytes. I personally haven't encountered the Packet Length Mismatch error before, and I'm not sure how best to advise on it. You could maybe experiment with removing some other PCI devices from your test system (e.g., by using integrated graphics) to free up some lanes and see if it helps, but I don't know whether that could be a cause. Do you have only a single Alveo card, or do you have access to any other Alveo cards that could be tried? You mentioned using a second test system in an earlier message, and I didn't know whether that was with the same card or another card. --Chris
Hi,
Hi @cneely-amd, The other PCIe devices cannot be detached, so is there any other way to resolve the "Packet Length Mismatch" error? Could you share the PCIe capabilities reported on your PC (sudo lspci -s 08:00.0 -vvv)?
Hi Team
I am working on OpenNIC design for au200, and I have diligently followed the steps provided in https://github.com/Xilinx/open-nic-dpdk.
However, I am encountering an issue while executing the pktgen application, specifically a "Packet Length Mismatch" error. This discrepancy is resulting in lower-than-expected Tx/Rx values and packet drops. Furthermore, the RX functionality appears to stop altogether, resulting in a throughput of only 9Gbps, significantly lower than the target of 100Gbps.
Here is the command I am using to run pktgen:
./pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 08:00.0 -a 08:00.1 -d librte_net_qdma.so -l 4-10 -n 4 -a 03:00.0 -a 03:00.0 -- -m [6:7].0 -m [8:9].1
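For readers unfamiliar with pktgen's `-m` option: each mapping has the form `[rx:tx].port`, so `[6:7].0` assigns core 6 to receive and core 7 to transmit on port 0, and the cores named there should fall within the `-l` core list. The helper below is a small illustrative sketch that is not from the thread; the name `describe_mapping` is hypothetical and it handles only this simple single-core form.

```shell
# Describe a pktgen core mapping of the simple form "[rx:tx].port".
# Prints nothing if the argument does not match that form.
describe_mapping() {
    printf '%s\n' "$1" |
        sed -n 's/^\[\([0-9]*\):\([0-9]*\)]\.\([0-9]*\)$/port \3: rx core \1, tx core \2/p'
}

describe_mapping '[6:7].0'   # prints: port 0: rx core 6, tx core 7
describe_mapping '[8:9].1'   # prints: port 1: rx core 8, tx core 9
```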
Error Message
"Timeout on request to dma internal csr register", "Packet length mismatch error" and "Detected Fatal length mismatch"
Error generated,
C2H_STAT_S_AXIS_C2H_ACCEPTED 0xa88 0x110cb 69835
C2H_STAT_S_AXIS_WRB_ACCEPTED 0xa8c 0x10ed0 69328
C2H_STAT_DESC_RSP_PKT_ACCEPTED 0xa90 0x10ed1 69329
C2H_STAT_AXIS_PKG_CMP 0xa94 0x10ed1 69329
C2H_STAT_DBG_DMA_ENG_0 0xb1c 0x48e00304 1222640388
C2H_STAT_DBG_DMA_ENG_1 0xb20 0xe7e40000 -404488192
C2H_STAT_DBG_DMA_ENG_2 0xb24 0x80000000 -2147483648
C2H_STAT_DBG_DMA_ENG_3 0xb28 0x80020813 -2147350509
C2H_STAT_DESC_RSP_DROP_ACCEPTED 0xb10 0x1c8e1 116961
C2H_STAT_DESC_RSP_ERR_ACCEPTED 0xb14 0 0
eqdma_hw_error_process detected Fatal Len mismatch error
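A note on reading the dump above: the last column appears to print each 32-bit register value as a signed integer, which is why registers with the top bit set (for example 0xe7e40000) show up as negative numbers. The helpers below are a minimal sketch, not from the thread, for converting such values back to unsigned or hexadecimal form; the names `to_u32` and `to_hex32` are hypothetical.

```shell
# Convert a (possibly negative) 32-bit value to its unsigned decimal form.
to_u32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# Convert a (possibly negative) 32-bit value to 0x-prefixed hexadecimal.
to_hex32() {
    printf '0x%08x\n' "$(( $1 & 0xFFFFFFFF ))"
}

to_u32 -404488192      # prints 3890479104
to_hex32 -404488192    # prints 0xe7e40000
```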
I have raised this query on GitHub under the link below, but have had no reply yet:
Xilinx/open-nic-dpdk#2 (comment)
Please let me know how to resolve this issue.