# dpdk-ans is slower than regular Linux epoll with 100Gbit/s #16
We could also discuss this on Slack if you could send me an invite. My email is in my profile.

---
> TCP or UDP?

It's about TCP performance; I haven't gotten UDP working yet.

### Test steps

There is a client and a server machine (these can also be the same machine if two interfaces are connected to each other). Most of the steps are the same for both machines; where they differ, I will explain.

```
git clone https://github.com/JelteF/iperf
cd iperf/src
git checkout ans
make
sudo build/iperf3 -s --bind <ip-of-the-interface>
sudo build/iperf3 -c <ip-of-the-server>
```

Do the same test with regular epoll:

```
git clone https://github.com/JelteF/iperf iperf-epoll
cd iperf-epoll
git checkout epoll
./configure
make
src/iperf3 -s --bind <ip-of-the-interface>
src/iperf3 -c <ip-of-the-server>
```

### ANS startup logs

Only port one (2.2.2.1) is actually connected. The --config tuples appear to follow DPDK's (port, rx queue, lcore) convention, matching the per-lcore queue assignments shown in the logs. On the server:

```
root@ps1 /home/jelte/dpdk-ans ans/build/ans -c 0x3 -n3 -- -p=0x1 --config="(0,0,0),(1,0,1)"
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 0 on socket 0
EAL: Detected lcore 5 as core 1 on socket 0
EAL: Detected lcore 6 as core 2 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 8 lcore(s)
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x7f000000 bytes
EAL: Virtual area found at 0x7f2d00400000 (size = 0x7f000000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f2cffe00000 (size = 0x400000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f2cff800000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cff400000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cff000000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cfec00000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cfe800000 (size = 0x200000)
EAL: Requesting 1024 pages of size 2MB from socket 0
EAL: TSC frequency is ~3700010 KHz
EAL: Master lcore 0 is ready (tid=7fa408c0;cpuset=[0])
EAL: lcore 1 is ready (tid=fd9f2700;cpuset=[1])
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1013 librte_pmd_mlx5
PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false, MPS: false)
PMD: librte_pmd_mlx5: 1 port(s) detected
PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:c0:88:aa
EAL: PCI device 0000:02:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1013 librte_pmd_mlx5
PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false, MPS: false)
PMD: librte_pmd_mlx5: 1 port(s) detected
PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:c0:88:ab
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:05:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
param nb 2 ports 2
port id 0
port id 1
Start to Init port
port 0:
port name librte_pmd_mlx5:
max_rx_queues 65535: max_tx_queues:65535
rx_offload_capa 14: tx_offload_capa:15
Creating queues: rx queue number=1 tx queue number=2...
PMD: librte_pmd_mlx5: 0xdebdc0: TX queues number update: 0 -> 2
PMD: librte_pmd_mlx5: 0xdebdc0: RX queues number update: 0 -> 1
MAC Address:E4:1D:2D:C0:88:AA
Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0
lcore id:0, tx queue id:0, socket id:0
Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff
Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0
lcore id:1, tx queue id:1, socket id:0
Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff
port 1:
port name librte_pmd_mlx5:
max_rx_queues 65535: max_tx_queues:65535
rx_offload_capa 14: tx_offload_capa:15
Creating queues: rx queue number=1 tx queue number=2...
PMD: librte_pmd_mlx5: 0xdefe08: TX queues number update: 0 -> 2
PMD: librte_pmd_mlx5: 0xdefe08: RX queues number update: 0 -> 1
MAC Address:E4:1D:2D:C0:88:AB
Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0
lcore id:0, tx queue id:0, socket id:0
Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff
Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0
lcore id:1, tx queue id:1, socket id:0
Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff
Allocated mbuf pool on socket 0, mbuf number: 16384
Initializing rx queues on lcore 0 ...
Default-- rx pthresh:0, rx hthresh:0, rx wthresh:0
port id:0, rx queue id: 0, socket id:0
Conf-- rx pthresh:8, rx hthresh:8, rx wthresh:4
Initializing rx queues on lcore 1 ...
Default-- rx pthresh:0, rx hthresh:0, rx wthresh:0
port id:1, rx queue id: 0, socket id:0
Conf-- rx pthresh:8, rx hthresh:8, rx wthresh:4
core mask: 3, sockets number:1, lcore number:2
start to init ans
USER8: LCORE[0] lcore mask 0x3
USER8: LCORE[0] lcore id 0 is enable
USER8: LCORE[0] lcore id 1 is enable
USER8: LCORE[0] lcore number 2
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER8: LCORE[0] UDP layer init successfully, Use memory:4194304 bytes
USER8: LCORE[0] TCP hash table init successfully, tcp pcb size 448 total size 29360128
USER8: LCORE[0] TCP hash table init successfully, tcp pcb size 448 total size 29360128
USER8: LCORE[0] so shm memory 16777216 bytes, so number 131072, sock shm size 128 bytes
USER8: LCORE[0] Sock init successfully, allocated of 41943040 bytes
add eth0 device
USER8: LCORE[0] Interface eth0 if_capabilities: 0xf
add IP 1020202 on device eth0
add eth1 device
USER8: LCORE[0] Interface eth1 if_capabilities: 0xf
add IP 1020203 on device eth1
Show interface
eth0 HWaddr e4:1d:2d:c0:88:aa
inet addr:2.2.2.1
inet addr:255.255.255.0
eth1 HWaddr e4:1d:2d:c0:88:ab
inet addr:3.2.2.1
inet addr:255.255.255.0
add static route
Destination Gateway Netmask Flags Iface
2.2.2.0 * 255.255.255.0 U C 0
2.2.2.5 * 255.255.255.255 U H L 0
3.2.2.0 * 255.255.255.0 U C 1
3.3.3.0 2.2.2.5 255.255.255.0 U G 0
USER8: LCORE[-1] ANS mgmt thread startup
Checking link status done
Port 0 Link Up - speed 100000 Mbps - full-duplex
Port 1 Link Up - speed 0 Mbps - full-duplex
USER8: main loop on lcore 1
USER8: -- lcoreid=1 portid=1 rxqueueid=0
nb ports 2 hz: 3700010272
USER8: main loop on lcore 0
USER8: -- lcoreid=0 portid=0 rxqueueid=0
nb ports 2 hz: 3700010272
```

### Startup logs of iperf ans

```
jelte@ps1 ~/iperf/src sudo build/iperf3 -s --bind 2.2.2.1
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 0 on socket 0
EAL: Detected lcore 5 as core 1 on socket 0
EAL: Detected lcore 6 as core 2 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 8 lcore(s)
EAL: Setting up physically contiguous memory...
EAL: Analysing 1024 files
EAL: Mapped segment 0 of size 0x7f000000
EAL: Mapped segment 1 of size 0x400000
EAL: Mapped segment 2 of size 0x400000
EAL: Mapped segment 3 of size 0x200000
EAL: Mapped segment 4 of size 0x200000
EAL: Mapped segment 5 of size 0x200000
EAL: Mapped segment 6 of size 0x200000
EAL: memzone_reserve_aligned_thread_unsafe(): memzone <RG_MP_log_history> already exists
RING: Cannot reserve memory
EAL: TSC frequency is ~3700000 KHz
EAL: Master lcore 0 is ready (tid=f7fed8a0;cpuset=[0])
USER8: LCORE[-1] anssock any lcore id 0xffffffff
USER8: LCORE[2] anssock app id: 5380
USER8: LCORE[2] anssock app name: iperf3
USER8: LCORE[2] anssock app bind lcoreId: 0
USER8: LCORE[2] anssock app bind queueId: 0
USER8: LCORE[2] anssock app lcoreId: 2
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
```

---
3/4. I will try to disable TSO in Linux and share my results.

---
I sent you an invite on Slack.

---
Thanks, I accepted it and sent a question there already.

---
I have machines with 8 lcores. I toggled offloading with these commands:

```
# off
ethtool -K eth2 rxvlan off txvlan off gso off tso off rx off tx off sg off rxhash off gro off rx-vlan-filter off lro off
# on
ethtool -K eth2 rxvlan on txvlan on gso on tso on rx on tx on sg on rxhash on gro on rx-vlan-filter on lro on
```

ANS was started using this command (coremask 0x2, i.e. only lcore 1):

```
ans/build/ans -c 0x2 -n1 -- -p=0x1 --config="(0,0,1)"
```

### Tests with offloading off

With all types of offloading disabled I am able to get the following speeds for Linux:
And these speeds for ANS:
### Tests with offloading on

Linux:

```
jelte@ps2 ~/iperf-epoll src/iperf3 -c 2.2.2.3 -A 2,2 -t 10
Connecting to host 2.2.2.3, port 5201
[ 5] local 2.2.2.4 port 38140 connected to 2.2.2.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 5]    0.00-1.00   sec  3.21 GBytes  27.6 Gbits/sec   60    822 KBytes
[ 5]    1.00-2.00   sec  3.21 GBytes  27.5 Gbits/sec   13    932 KBytes
[ 5]    2.00-3.00   sec  3.21 GBytes  27.6 Gbits/sec   16   1.00 MBytes
[ 5]    3.00-4.00   sec  3.21 GBytes  27.6 Gbits/sec   13   1.09 MBytes
[ 5]    4.00-5.00   sec  3.21 GBytes  27.5 Gbits/sec   16   1.16 MBytes
[ 5]    5.00-6.00   sec  3.21 GBytes  27.5 Gbits/sec   21   1.25 MBytes
[ 5]    6.00-7.00   sec  3.21 GBytes  27.5 Gbits/sec   15   1.32 MBytes
[ 5]    7.00-8.00   sec  3.21 GBytes  27.5 Gbits/sec   30    748 KBytes
[ 5]    8.00-9.00   sec  3.21 GBytes  27.5 Gbits/sec   18    846 KBytes
```

ANS:
### Results

As you can see, ANS has about the same throughput in both tests, and in both cases it is lower than Linux. When offloading is enabled the difference is much larger.

---
When jumbo frames are enabled (MTU=9000) the normal Linux kernel performs even better and reaches around ~43 Gbit/s:
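As a rough illustration of why jumbo frames help: at a fixed bit rate the packet rate scales inversely with frame size, so MTU 9000 cuts the per-packet work roughly six-fold:

$$\frac{43\ \text{Gbit/s}}{1500 \times 8\ \text{bit}} \approx 3.6\ \text{Mpps} \qquad \text{vs.} \qquad \frac{43\ \text{Gbit/s}}{9000 \times 8\ \text{bit}} \approx 0.6\ \text{Mpps}$$

---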
Thanks for your detailed testing.

---
It opens two connections: one for command communication and one for the actual data transfer.

---
Is the TCP window scale option enabled in the Linux kernel? If it is enabled, the window is larger and the speed will be faster.

---
Yes, it was enabled. Does ANS not support that?

---
ANS doesn't support window scaling, nor any TSO, at the moment.
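This would be consistent with the single-stream numbers in this thread: without window scaling the TCP window is capped at 65535 bytes, so, assuming an RTT on the order of 0.1 ms between these machines (a guess, not a measured value), the throughput of one connection is bounded by roughly

$$\text{throughput} \le \frac{\text{window}}{\text{RTT}} = \frac{65535 \times 8\ \text{bit}}{10^{-4}\ \text{s}} \approx 5.2\ \text{Gbit/s}$$

which is in the same ballpark as the ~5 Gbit/s per ANS stream mentioned below.

---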
When I let my ANS iperf open multiple concurrent connections to the server it becomes much quicker; the speed maxes out at three connections:
Somehow a single connection even becomes faster than before, going from about 5 Gbit/s to 8 Gbit/s. When I open 4 connections the per-connection speed drops to about 6 Gbit/s, still totalling 24.5 Gbit/s. With more than 4 connections it starts to behave strangely, with large swings in speed.

Also, one question: how do you enable jumbo frames? Is it still the method described in this #9 (comment)?

---
It seems ANS with one lcore can only handle about 24 Gbit/s of TCP data.
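The per-connection numbers above fit that ceiling: $3 \times 8 \approx 4 \times 6 \approx 24\ \text{Gbit/s}$, so the lcore, not the individual stream, looks like the bottleneck.

---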
In ans_main.c it should look like this, right?

```c
int mtu = 9000;
ans_intf_set_mtu((caddr_t)ifname, &mtu);
```

And about the output: the Cwnd column is the congestion window size and the Retr column is the number of retransmits. The method of gathering this data does not seem to work for ANS though.
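On Linux, iperf3 reads those counters from the kernel via the TCP_INFO socket option, which a user-space stack like ANS does not implement; that is presumably why gathering them fails there. A minimal sketch of that mechanism (fd is assumed to be a connected kernel TCP socket):

```c
#include <stdio.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void print_tcp_stats(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
        /* tcpi_snd_cwnd is in segments; multiply by tcpi_snd_mss for bytes. */
        printf("cwnd: %u segments, retransmits: %u\n",
               info.tcpi_snd_cwnd, info.tcpi_total_retrans);
    }
}
```

---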
Yes, you can set the MTU like that, by the way.

---
When setting the MTU with this command:

And with the code in my previous comment, I get a Bus error like this:

---
With all offloading off, an MTU of 1500, and TCP window scaling off, I get this Linux performance:

But when using multiple connections I still get around 8 Gbit/s, like before with a single one.

---
OK, I will test the jumbo frame case. It seems the TCP speed also depends on the cwnd.

---
OK, thank you. Maybe it will be much faster then.

---
Have you had time to test the jumbo frame case?

---
No, I don't have an environment to test it.

---
OK, too bad. Do you have any idea why the per-stream speed increases when multiple streams are used?

---
You may change the macro below in ans_main.h and check if it increases the speed of one stream. If yes, maybe this is the root cause.

---
Oh man, thanks for this. Changing the values there really changes the single-stream performance a lot.
However, I now do get lower performance for multiple streams.
When setting MAX_TX_BURST to 1 I get the instability behaviour again:
---
MAX_TX_BURST controls how many packets ANS buffers before sending them to the NIC. If your NIC is a high-speed NIC it is better to set it to 1: packet latency is lower, so the per-stream TCP speed is higher.
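For illustration, this is the standard DPDK burst-buffering pattern such a macro typically controls. It is a sketch in the style of DPDK's l2fwd example, not ANS's actual code; everything except rte_eth_tx_burst and rte_pktmbuf_free is made up for the example:

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define MAX_TX_BURST 1  /* flush to the NIC after this many queued packets */

struct tx_queue_buf {
    struct rte_mbuf *pkts[MAX_TX_BURST];
    uint16_t count;
};

/* Hand all buffered packets to the NIC; free any it did not accept. */
static void tx_flush(uint16_t port, uint16_t queue, struct tx_queue_buf *buf)
{
    uint16_t sent = rte_eth_tx_burst(port, queue, buf->pkts, buf->count);
    while (sent < buf->count)
        rte_pktmbuf_free(buf->pkts[sent++]);
    buf->count = 0;
}

/* Queue one packet; a smaller MAX_TX_BURST means earlier flushes,
 * i.e. lower latency per packet at the cost of more doorbell writes. */
static void tx_enqueue(uint16_t port, uint16_t queue,
                       struct tx_queue_buf *buf, struct rte_mbuf *pkt)
{
    buf->pkts[buf->count++] = pkt;
    if (buf->count == MAX_TX_BURST)
        tx_flush(port, queue, buf);
}
```

With MAX_TX_BURST set to 1 every packet is flushed immediately, trading batching efficiency for latency, which matches the behaviour described above.

---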
You could tcpdump the packets to analyse the instability behaviour; maybe packets are being lost.

---
Does ANS print any logs?

---
ANS does not print any logs.

---
ANS's error and info logs can be seen on the screen and in syslog.

---
The instability behaviour was related to the TCP timestamp feature and has now been fixed. The fix also resolves the EPOLLOUT issue you reported before.

---
I've been converting iperf3 to use DPDK instead of the Linux networking stack.
However, when running it normally it can get ~40 Gbit/s, and when using the dpdk-ans version it can only achieve ~4 Gbit/s.
The source code which uses regular epoll can be found here: https://github.com/JelteF/iperf/tree/epoll
And the code for the ANS version here: https://github.com/JelteF/iperf/tree/ans
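For context, the difference between the two branches is essentially which socket API the event loop calls into. A rough sketch of the swap (hypothetical: the anssock_* names are patterned after the "anssock" prefix in the startup logs above and may not match ANS's exact API):

```c
/* Sketch only; the anssock_* names are assumptions based on the log
 * prefix above, not verified against ANS's headers. */
#ifdef USE_ANS
#define xsocket      anssock_socket      /* user-space DPDK stack */
#define xepoll_wait  anssock_epoll_wait
#else
#define xsocket      socket              /* kernel stack */
#define xepoll_wait  epoll_wait
#endif
```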
Do you have any suggestions on how to improve the performance?