Unable to transmit at full 491.52MSPS (or 245.76MSPS!) rate with an USRP x410 #736

Open
Aang23 opened this issue Mar 25, 2024 · 7 comments



Aang23 commented Mar 25, 2024

Hello!

Lately I have been running tests and writing code with a USRP x410 that I currently have access to (via https://twitter.com/SDR_Radio).

Getting it working after flashing the correct image was not too complicated, except that some documentation was lacking (specifically, the MTU configuration is mentioned in the x3xx documentation but is entirely missing on the x4xx side of things).

In receive, after flashing the CG_400 image and doing all the required network configuration, I am able to stream the full 491.52MSPS with no issues at all using SatDump. The benchmark_rate example also shows no drops, overflows, etc. with the cf32 host format and sc16 wire format.

However, when I moved on to transmitting, nothing I wrote was able to stream at the expected full rate of 491.52MSPS without major underruns. This happens even with a simple while loop sending large buffers, with UHD seemingly already saturating the sending thread. The underruns are frequent enough that the transmit (red) LED on the USRP blinks constantly. The provided benchmark_rate example shows the same behavior. (I have also tried modifying various buffer sizes and the network configuration, to no avail.)

RX bench:

[00:00:03.964700876] Setting device timestamp to 0...
[00:00:03.968637931] Testing receive rate 491.520000 Msps on 1 channels
[00:00:13.969628985] Benchmark complete.


Benchmark rate summary:
  Num received samples:     4915270896
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  0
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0

TX bench:

Benchmark rate summary:
  Num received samples:     0
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  4771406160
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   15192
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0
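When comparing many runs like the two above, it can help to pull a single counter out of the "Benchmark rate summary" block instead of reading it by eye. A minimal sketch, assuming the summary format shown in this thread; `bench_counter` is a hypothetical helper, not part of UHD:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of UHD): print the value of one counter from
# benchmark_rate's "Benchmark rate summary" block, read on stdin.
bench_counter() {
    local label="$1"
    awk -v want="$label" '
        {
            # Split at the first colon and compare the trimmed label exactly.
            idx = index($0, ":")
            if (idx == 0) next
            key = substr($0, 1, idx - 1)
            val = substr($0, idx + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == want) print val
        }
    '
}
```

Usage: save a run with `sudo ./benchmark_rate ... 2>&1 | tee bench.log`, then `bench_counter "Num underruns detected" < bench.log`. Timestamped log lines also contain colons, but their prefix never matches a summary label exactly, so they are ignored.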

The x410 is connected via QSFP+ (100 Gb/s) to Mellanox cards in a high-end workstation:

  • Mellanox Technologies MT27800 Family [ConnectX-5]
  • Intel i9-10980XE (36) @ 4.600GHz

I have tested raw streaming over the 100 Gb/s link, and the machine can sustain far more than is required to feed the x410 in this configuration. Given this setup, I would expect this to work.
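For scale: with the sc16 wire format (16-bit I plus 16-bit Q, i.e. 4 bytes per sample), a single channel at 491.52 MSPS carries roughly 15.7 Gb/s of sample payload, and 245.76 MSPS roughly 7.9 Gb/s, both well below 100 Gb/s. A minimal sketch of that arithmetic, assuming sc16 and ignoring packet header overhead (so real traffic is slightly higher):

```bash
#!/usr/bin/env bash
# Payload rate in Gb/s for one channel: rate (samples/s) * bytes/sample * 8 bits.
# Assumes sc16 over the wire (4 bytes per complex sample) unless $2 is given.
# Packet headers (CHDR/UDP/IP/Ethernet) are not included.
wire_gbps() {
    local rate_sps="$1" bytes_per_sample="${2:-4}"
    awk -v r="$rate_sps" -v b="$bytes_per_sample" \
        'BEGIN { printf "%.2f\n", r * b * 8 / 1e9 }'
}

wire_gbps 491.52e6   # CG_400 full rate -> 15.73
wire_gbps 245.76e6   # UC_200 full rate -> 7.86
```

This only bounds the link load; it says nothing about host-side CPU cost per packet, which is where the thread's symptoms (one saturated core) point.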

Aang23 changed the title Unable to transmit as full 471.52MSPS rate with an USRP x410 Unable to transmit at full 471.52MSPS (or 245.76MSPS!) rate with an USRP x410 Mar 25, 2024

Aang23 commented Mar 25, 2024

I ran the same tests with the UC_200 image, attempting to stream at 245.76MSPS on the exact same system. Unfortunately, that does not work either, with the same symptoms as with the CG_400 image.

Benchmark rate summary:
  Num received samples:     0
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  2455190080
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   1734
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0

It does sustain 122.88MSPS somewhat better, but underruns still occur. In the benchmark there are only a few, at random points during the test, but they happen more often with every other UHD transmit example.

Benchmark rate summary:
  Num received samples:     0
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  1228853888
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   5
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0

The same system does, however, keep an x310 happy at 200MSPS.


basti-schr commented Mar 27, 2024

Hey, I tried this and got it to work with all the tricks from the KB guide.

I also have a small script that I run after startup to apply these settings:

#!/bin/bash
size=250000000
sudo sysctl -w net.core.wmem_max=$size
sudo sysctl -w net.core.rmem_max=$size
sudo sysctl -w net.core.wmem_default=$size
sudo sysctl -w net.core.rmem_default=$size

sudo ip link set dev <dev1> mtu 9000
sudo ip link set dev <dev2> mtu 9000

sudo ethtool -G <dev1> tx 4096 rx 4096
sudo ethtool -G <dev2> tx 4096 rx 4096

for ((i=0;i<$(nproc --all);i++)); do sudo cpufreq-set -c $i -r -g performance; done
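Since these settings are lost on reboot and are easy to miss on one interface, a quick check that they actually took effect can save time. A minimal sketch, assuming the same 250000000-byte buffer target and 9000-byte MTU as the script above; `check_tuning`, `PROC_NET` and `SYS_NET` are hypothetical names (the variables exist only so the function can be pointed at fake files):

```bash
#!/usr/bin/env bash
# Hypothetical check that the tuning script above took effect.
# PROC_NET/SYS_NET default to the real kernel paths; override only for testing.
PROC_NET="${PROC_NET:-/proc/sys/net/core}"
SYS_NET="${SYS_NET:-/sys/class/net}"

# Usage: check_tuning <min_buffer_bytes> <mtu> <dev>...
check_tuning() {
    local want_buf="$1" want_mtu="$2"; shift 2
    local status=0 key val dev mtu
    # Socket buffer limits set via sysctl net.core.*
    for key in wmem_max rmem_max wmem_default rmem_default; do
        val=$(cat "$PROC_NET/$key")
        if (( val < want_buf )); then
            echo "LOW $key=$val (want >= $want_buf)"; status=1
        fi
    done
    # MTU of each interface facing the USRP
    for dev in "$@"; do
        mtu=$(cat "$SYS_NET/$dev/mtu")
        if (( mtu < want_mtu )); then
            echo "LOW $dev mtu=$mtu (want $want_mtu)"; status=1
        fi
    done
    return "$status"
}
```

Usage: `check_tuning 250000000 9000 <dev1> <dev2> && echo "tuning OK"`. Ring sizes set with `ethtool -G` can be inspected with `ethtool -g <dev>`.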

What arguments do you use for the benchmark?
You should use the priority flag. For me, the full command looks like this:

$ sudo ./benchmark_rate \
--args "type=x4xx,addr=192.168.10.2,second_addr=192.168.20.2,mgmt_addr=<IPaddr>,master_clock_rate=500e6" \
--priority "high" \
--multi_streamer \
--duration 60 \
--channels "0" \
--rx_rate 500e6 \
--rx_subdev "B:1" \
--tx_rate 500e6 \
--tx_subdev "A:0"

[...]

[00:00:17.641169153] Testing transmit rate 500.000000 Msps on 1 channels
[00:01:17.641902988] Benchmark complete.


Benchmark rate summary:
  Num received samples:     29999545568
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  29999401344
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0


Done!

So at least the benchmark works fine. However, I have not yet achieved such high rates with other applications like GNU Radio.

Cheers, Sebastian!


Aang23 commented Mar 28, 2024

Thanks @basti-schr. Unfortunately, these are all things I have already done (in order to get RX working at these rates), and they do not help on my end.

I don't expect GNU Radio to handle it at all, but the benchmark failing and UHD saturating a single CPU core seem suspicious to me compared with the behavior of other USRPs.


basti-schr commented Mar 28, 2024

Okay, another thing I forgot to mention is to install DPDK. In particular, the last point from this article (setting the RT_RUNTIME_SHARE scheduler feature flag) helped a lot. It made a big difference for me, bringing the underruns down to <5 without DPDK and to 0 with DPDK.
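For reference, scheduler feature flags are toggled by writing the feature name to the debugfs `sched_features` file (prefix `NO_` to disable). A minimal sketch, assuming root and a mounted debugfs; the file is `/sys/kernel/debug/sched_features` on older kernels and `/sys/kernel/debug/sched/features` on newer ones, and on some recent kernels RT_RUNTIME_SHARE may not exist at all, in which case the write fails. `SCHED_FEATURES` is a hypothetical override used only for testing:

```bash
#!/usr/bin/env bash
# Sketch: enable the RT_RUNTIME_SHARE scheduler feature through debugfs.
# Requires root and debugfs. The setting does not persist across reboots.
# SCHED_FEATURES (hypothetical override) is tried first if set.
enable_rt_runtime_share() {
    local f
    local candidates=(/sys/kernel/debug/sched/features /sys/kernel/debug/sched_features)
    if [[ -n "${SCHED_FEATURES:-}" ]]; then
        candidates=("$SCHED_FEATURES" "${candidates[@]}")
    fi
    for f in "${candidates[@]}"; do
        # Writing a feature name enables it; the kernel rejects unknown names,
        # so a failed write moves on to the next candidate.
        if [[ -w "$f" ]] && echo RT_RUNTIME_SHARE > "$f"; then
            echo "enabled via $f"
            return 0
        fi
    done
    echo "could not enable RT_RUNTIME_SHARE (root? debugfs mounted? feature present?)" >&2
    return 1
}
```

Reading the same file back (`cat .../features`) lists each feature, with disabled ones shown as `NO_<name>`.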


Aang23 commented Apr 2, 2024

Thanks for the information again @basti-schr. Clearly this should be a bit more obvious than it is now in the documentation! :-)

I have set up DPDK as instructed (v21.11.4). Testing loopback with the DPDK examples works as expected, but UHD still does not work at all.

/etc/uhd/uhd.conf

;When present in device args, use_dpdk indicates you want DPDK to take over the UDP transports
;The value here represents a config, so you could have another section labeled use_dpdk=myconf
;instead and swap between them
[use_dpdk=1]
;dpdk_mtu is the NIC's MTU setting
;This is separate from MPM's maximum packet size
dpdk_mtu=9000
;dpdk_driver is the -d flag for the DPDK EAL. If DPDK doesn't pick up the driver for your NIC
;automatically, you may need this argument to point it to the folder where it can find the drivers
;Note that DPDK will attempt to load _everything_ in that folder as a driver, so you may want to
;create a separate folder with symlinks to the librte_pmd_* and librte_mempool_* libraries.
;dpdk_driver=/usr/local/lib/x86_64-linux-gnu/dpdk/pmds-21.0/
;dpdk_corelist is the -l flag for the DPDK EAL. See more at the link
; https://doc.dpdk.org/guides-21.11/linux_gsg/build_sample_apps.html#running-a-sample-application
;Note if you use multiple SFP ports in a streaming application simultaneously,
;you can specify multiple cores in the core list (e.g. 0, 1, 2) and then assign
;them each to the separate SFP port/NIC.
dpdk_corelist=0,1
;dpdk_num_mbufs is the total number of packet buffers allocated
;to each direction's packet buffer pool
;This will be multiplied by the number of NICs, but NICs on the same
;CPU socket share a pool. When using Mellanox NICs, this value must be greater
;than the dpdk_num_desc value in the next section.
dpdk_num_mbufs=4096
;dpdk_mbuf_cache_size is the number of buffers to cache for a CPU
;The cache reduces the interaction with the global pool
dpdk_mbuf_cache_size=64

[dpdk_mac=b8:3f:d2:b6:ff:52]
;Using a separate dpdk_lcore value for each SFP connection/MAC entry
;can possibly result in improved streaming performance. E.g. dpdk_lcore = 2.
dpdk_lcore = 1
dpdk_ipv4 = 192.168.20.1/24
dpdk_num_desc=4096
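The comments in this config state that with Mellanox NICs, `dpdk_num_mbufs` must be greater than `dpdk_num_desc`. Here both are 4096, so that stated condition is not met. The thread does not establish whether this is related to the error further down. A minimal sketch of a check for that rule, assuming the uhd.conf INI layout shown here; `check_mbufs_vs_desc` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical check of the rule from the uhd.conf comments:
# with Mellanox NICs, dpdk_num_mbufs must be greater than dpdk_num_desc.
# Comment lines (';' or '#') are skipped; the largest dpdk_num_desc across
# all [dpdk_mac=...] sections is used. Returns 1 if the rule is violated.
check_mbufs_vs_desc() {
    awk -F'=' '
        /^[ \t]*[;#]/ { next }
        NF >= 2 {
            key = $1; val = $2
            gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
            if (key == "dpdk_num_mbufs") mbufs = val + 0
            if (key == "dpdk_num_desc" && val + 0 > desc) desc = val + 0
        }
        END {
            if (mbufs == "" || desc == "") { print "skip: value not set"; exit 0 }
            if (mbufs > desc) print "ok: num_mbufs=" mbufs " > num_desc=" desc
            else { print "WARN: num_mbufs=" mbufs " <= num_desc=" desc; exit 1 }
        }
    ' "$1"
}
```

Usage: `check_mbufs_vs_desc /etc/uhd/uhd.conf`. When `dpdk_num_desc` is left commented out, the sketch reports "skip", since it does not assume any default value.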

/etc/default/grub

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="iommu=pt intel_iommu=on hugepages=2048"

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
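With `hugepages=2048` on the kernel command line, the kernel reserves 2048 pages of the default hugepage size, which is 2 MiB on x86_64, i.e. about 4 GiB. After `update-grub` and a reboot, one way to confirm the pool exists is to summarize `/proc/meminfo`. A minimal sketch; `hugepage_summary` is a hypothetical helper that optionally takes a file in the same format:

```bash
#!/usr/bin/env bash
# Sketch: summarize the hugepage pool from /proc/meminfo (or a file given as $1).
# pool_mib = HugePages_Total * Hugepagesize (kB) / 1024
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            printf "total=%d free=%d size_kb=%d pool_mib=%d\n",
                   total, free, size_kb, total * size_kb / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

A `total=0` result means the boot parameter did not take effect. The EAL messages about missing 1048576 kB pages refer to 1 GiB pages, which this configuration does not reserve.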

Benchmark command:

sudo ./examples/benchmark_rate --tx_rate 491.52e6 --args use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2

This results in an RFNoC error, which I have not been able to find much information about in this context.

[INFO] [UHD] linux; GNU C++ version 11.4.0; Boost_107400; DPDK_21.11; UHD_4.6.0.HEAD-0-g50fa3baa
EAL: Detected CPU lcores: 36
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:04:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:04:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
[00:00:00.000174] Creating the usrp device with: use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=10.10.10.130,type=x4xx,product=x410,serial=32C3D84,name=ni-x4xx-32C3D84,fpga=CG_400,claimed=False,use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2
[INFO] [MPM.PeriphManager] init() called with device args `fpga=CG_400,mgmt_addr=10.10.10.130,name=ni-x4xx-32C3D84,product=x410,use_dpdk=1,clock_source=internal,time_source=internal,initializing=True'.
[ERROR] [RFNOC::GRAPH] Error during initialization of block 0/Radio#0!
[ERROR] [RFNOC::GRAPH] Caught exception while initializing graph: RfnocError: OpTimeout: Control operation timed out waiting for space in command buffer
Error: RuntimeError: Failure to create rfnoc_graph.

Unfortunately, setting RT_RUNTIME_SHARE does not seem to have any effect here.

basti-schr commented

Okay, from this point I can only guess:

  1. I guess you have, but just to make sure: have you installed the NVIDIA MLNX_OFED drivers for DPDK? (./mlnxofedinstall --dpdk)

Maybe that's why the EAL reports IOVA mode 'PA' for you but 'VA' for me.

  2. Are the hugepages working? For me it looks like this:
$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    2048
HugePages_Free:     2000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         4194304 kB

These EAL messages, on the other hand, are normal:

EAL: No free 1048576 kB hugepages reported on node 0
EAL: No available 1048576 kB hugepages reported
  3. In my config I have not set dpdk_num_desc, because it caused other problems for me; I just left it at the default setting.

/etc/uhd/uhd.conf

[use_dpdk=1]
dpdk_mtu=9000
dpdk_driver=/usr/lib/x86_64-linux-gnu/dpdk/pmds-22.0
dpdk_corelist=2,3,4
dpdk_num_mbufs=4095
dpdk_mbuf_cache_size=64
# dpdk_link_timeout=5000

[dpdk_mac=08:c0:eb:97:8c:ee]
dpdk_lcore = 3
dpdk_ipv4 = 192.168.10.1/24
# dpdk_num_desc=4096

[dpdk_mac=08:c0:eb:97:8c:ef]
dpdk_lcore = 4
dpdk_ipv4 = 192.168.20.1/24
# dpdk_num_desc=4096

Also note that I have set dpdk_driver.

I hope this gives you some hints; it also took me a few weeks to figure out the right setup.


Aang23 commented Apr 2, 2024

@basti-schr I ended up figuring it out before I saw your reply. My changes were very similar to yours, and now, apart from heavy drops during the first few seconds while the buffers settle, everything is functional.

Thanks a lot for your hints! However, it's honestly disappointing how poorly this is documented. The high bandwidth is advertised as one of the main features, but it has taken a lot of research to get anywhere near it :-)

It would be good if NI/Ettus could add this in the main x4xx documentation. I'll probably leave this open for this purpose.

michaelld changed the title Unable to transmit at full 471.52MSPS (or 245.76MSPS!) rate with an USRP x410 Unable to transmit at full 491.52MSPS (or 245.76MSPS!) rate with an USRP x410 Apr 11, 2024