Poor performance on debian #25

paulhirst opened this issue Dec 28, 2020 · 16 comments

@paulhirst

Hello,

I've got a Debian machine with an AQC107 10GigE card and also some Realtek 1GigE boards.
scp of a large file from this machine to a client has a throughput of about 100 MByte/s on the 1GigE Realtek interfaces (i.e. saturating the gigabit link) but only about 30 MByte/s on the AQC107. The link is up at 10GigE.

The machine is acting as a router. LRO and GRO are off (via ethtool) on all interfaces.

I do notice that during the scp using the atlantic card, ksoftirqd is using 100% of a CPU core. During the scp using a Realtek card, ksoftirqd uses about 1% of a CPU core.

OK, I'm not necessarily expecting scp to saturate the 10GigE connection, but having it run a factor of 3 slower on the 10GigE card than on a 1GigE card is disappointing to say the least, and the ksoftirqd behavior suggests to me that something is off in the way interrupts and the driver are configured.

I've googled around a lot and not found anything helpful. The driver README notes suggest that the driver should be compiled with LRO disabled when used in a router, but they also provide instructions for disabling it with ethtool. Am I correct in understanding that disabling LRO and GRO using ethtool is equivalent, or do I really need to rebuild the driver with it disabled at compile time?

Any other hints or suggestions on getting at least GigE performance out of this would be much appreciated. I'm running the stock Debian kernel and the bundled driver:

# uname -a
Linux ventus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
# ethtool -i enp4s0
driver: atlantic
version: 2.0.3.0-kern
firmware-version: 3.1.71
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
# ethtool -k enp4s0 | fgrep receive-offload
generic-receive-offload: off
large-receive-offload: off

Many thanks for any suggestions or pointers,
Paul

@cail
Member

cail commented Dec 29, 2020

Am I correct in understanding that disabling LRO and GRO using ethtool is equivalent, or do I really need to rebuild the driver with it disabled at compile time?

That's correct; most probably you don't need to disable these at compile time, you can do that through ethtool.
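
For reference, a minimal sketch of the runtime toggle (assuming the interface name enp4s0 from the report above; adjust to your setup):

# disable LRO and GRO at runtime, no driver rebuild needed
ethtool -K enp4s0 lro off gro off
# verify the current state
ethtool -k enp4s0 | grep -E 'generic-receive-offload|large-receive-offload'

Note that ethtool settings do not persist across reboots, so they would have to be reapplied from your network configuration if they turn out to matter.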

Overall, your issue is very strange. From the driver's perspective I can't think of anything that could cause a 10G link to be slower than a 1G link.

First, you may try using the out-of-tree driver from this repo; maybe some fixes are missing in your kernel tree.

I suspect this is something related to packet corruption or retransmits. You may try using iperf TCP to measure the line rate; it'll show you if retransmits are happening.

You may also inspect "netstat -s" diffs taken before and after a transfer to check for suspects.
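
A rough sketch of those checks, assuming iperf3 is installed on both ends and "iperf3 -s" is running on the peer (the peer address is a placeholder):

# placeholder: the machine on the other end of the 10G link, running "iperf3 -s"
PEER=192.168.0.2
# 10-second TCP test; the "Retr" column in the client output reports retransmits
iperf3 -c "$PEER" -t 10
# snapshot protocol counters around a slow transfer and diff them afterwards
netstat -s > before.txt
# ... run the slow scp here ...
netstat -s > after.txt
diff before.txt after.txt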

@paulhirst
Author

Many thanks for the helpful reply, very interesting. I did some checks and found no evidence of retransmits or retries. I also swapped the cables between the good 1G card and the slow 10G card, and the performance of both remained the same. I tried de-rating the 10G switch port to 1G; the Aquantia card renegotiated to 1G but still only got about 30 MByte/s throughput.

So now it gets stranger. I was trying various things, and at some point the problem disappeared. The thing I did that seemed to fix it was to enable GRO on the Aquantia card. I read elsewhere that GRO is safe for routing, but LRO is not. It's odd, though, because disabling GRO (i.e. reverting to how it was before) does not make the problem come back. I'm now getting about 300 MByte/s throughput over ssh, which is more like what I had in mind! And I see no significant ksoftirqd load while it's doing that.

So I'm wondering now what caused it to go into the slow / high-ksoftirqd state. It was in that state for several weeks at least, including one system reboot, so it doesn't seem to be a complete freak occurrence. But I also currently can't reproduce it.

I'll report back if I find anything further on this. I'm happy either way if you want to close this out or leave it open for a short while in case I can figure out what causes this.

Thanks again for the help,
Paul

@cail
Member

cail commented Dec 31, 2020

Let's keep it open; I'm really interested in finding the reason for that severe perf drop. It could be related to the LRO/LSO settings in the driver. That's a tricky offload which in theory may corrupt the TCP stream somewhere.

@robert-sc

Aha, I also noticed a performance drop on my AQC107 (ASUS XG-C100C) adapter, using the 2.4.10 driver on my Ubuntu 18.04 PC. Rolling back to driver version 2.4.3 (from the Marvell website) restored the performance.

I had downloaded the current master from this repository, made the "usual" change in aq_cfg.h (which I had done in 2.4.3 as well):

/* LRO */
-#define AQ_CFG_IS_LRO_DEF 1U
+#define AQ_CFG_IS_LRO_DEF 0U

built and installed... and the performance was noticeably worse.

I didn't bother to investigate further, thinking that's just a "glitch" in the developer preview and the next update would fix it. But as there has been no update since, it's probably worth investigating after all...

@cail
Member

cail commented Jan 5, 2021

Hi Robert, could you provide some numbers? What do you mean by "worse"?
We saw no such degradation; maybe it's something related to your configuration/kernel.

@robert-sc

Hi cail, trying to come up with numbers I remembered why I did not report this before: The symptoms are somewhat "fuzzy". I reverted from 2.4.10 to 2.4.3 when I experienced glitches during video conferencing calls (using Slack or Microsoft Teams). As these are rather vital to my home office work, I did not have time to investigate or write down any notes.

Unfortunately, now my Internet provider has issues, so it is even harder to tell whether any symptoms I am seeing are caused by the driver or my provider :-/ And my provider requires me to use their router while the issue is ongoing, so I cannot use my router with the 2.5G port...

For now, I'll try building the 2.4.7 commit and using that. If I don't see any issues with that, I'll try 2.4.10 again and see if I can isolate an issue. Or if I see issues with 2.4.7, I'll go back to 2.4.3 again. Maybe that'll allow me to tell whether there really is an issue, and if it was introduced between 2.4.3 and 2.4.7 or between 2.4.7 and 2.4.10...

@cail
Member

cail commented Jan 11, 2021

Thanks, please let us know if you find out something.
One important verification step: if you find the degradation again, you may try disabling the various offload features: checksumming, LSO, LRO (via ethtool -K).
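
For example, something along these lines (IFACE is a placeholder; the exact feature names your ethtool build accepts can be listed with 'ethtool -k'):

# placeholder: set IFACE to your interface name
IFACE=enp4s0
# switch the offloads off one at a time, re-running the transfer test after each change
ethtool -K "$IFACE" rx off           # RX checksumming
ethtool -K "$IFACE" tx off           # TX checksumming
ethtool -K "$IFACE" tso off gso off  # segmentation offloads (LSO/TSO/GSO)
ethtool -K "$IFACE" lro off gro off  # receive offloads (LRO/GRO)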

@skiminki

Mine's not a Debian box, but this seems related.

Yesterday, I had a similar issue. I was transferring a large file between two machines, and I also got scp performance of around 30 MBytes/s. I did not see any RX/TX errors with ifconfig nor anything else exceptional in the kernel log. However, I do get this note:

Feb 13 12:48:22 boombox kernel: TCP: lan0: Driver has suspect GRO implementation, TCP performance may be compromised.

To fix the slow mode, all I had to do was:

ethtool -K lan0 rx-vlan-offload off   # this already fixed the perf
ethtool -K lan0 rx-vlan-offload on    # switching this back kept the perf

The slow mode only affected receive perf on this machine. Sending was just fine at around 260 Mbytes/s, which is the CPU-bound limit for scp between my boxes.

I think rx-vlan-offload itself had nothing to do with this; I don't use VLANs at the interface level. I think all that was needed was whatever hw/driver-level reinitialization gets done when toggling this flag.

Two more notes:

  • My NIC is running a bit hot as reported by lm-sensors (PHY and MAC at around 80 °C). Not sure whether that's still within spec, but I thought I'd mention it anyway. I haven't found info on the thermal limits, although I probably should improve ventilation for the NIC.
  • Jumbo frames are enabled (MTU 9000)

Some more info below.

% ethtool -i lan0
driver: atlantic
version: 5.10.100
firmware-version: 3.1.58
expansion-rom-version: 
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

% uname -a # this is a gentoo box with custom-configured vanilla kernel
Linux boombox 5.10.100 #1 SMP Fri Feb 11 16:51:55 EET 2022 x86_64 Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz GenuineIntel GNU/Linux

lspci reports:

03:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02) (1d6a:07b1)

@cail
Member

cail commented Feb 14, 2022

Hi Samy, thanks for the observation.
'ethtool -K' manipulations basically reload the hardware and driver logic, so it could be that something goes wrong either in the hardware or in the driver. The temperature is a bit high, but I don't think it's critical.

A couple of questions/suggestions:

  1. Do you use the in-kernel driver?
  2. How often can you observe this?
  3. Next time you face it, please try to capture a tcpdump trace on the interface while performing "slow" operations.
  4. While in "slow" mode, could you run a UDP-based test to see if it is also slow (e.g. iperf UDP)?
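
For points 3 and 4, something like this should do (lan0 and boombox taken from the report above; the capture is limited to headers to keep the file small):

# capture packet headers while reproducing the slow transfer
tcpdump -i lan0 -s 128 -w slow-mode.pcap

# UDP throughput test towards this machine, with "iperf3 -s" running here
iperf3 -c boombox -u -b 9G -t 10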

@skiminki

  1. Do you use the in-kernel driver?

The driver is the in-tree kernel driver that comes with the upstream kernel. Loaded as a module. The only out-of-tree driver I have is the nvidia.ko GPU driver.

  2. How often can you observe this?

This was the first time I noticed the slow mode. But I've had the 10G setup only for a month or so.

3 & 4

Sure.

@ffries

ffries commented Apr 23, 2022

Hello.

Firstly:
You should be warned that 10Gb/s copper gets very hot, and above a certain temperature the speed drops down to 1Gb/s. This is why using 10Gb SFP+ is always preferable. 10Gb/s copper is a marketing technology to sell 10Gb/s to newbies; in reality, at 10Gb/s only full fiber works at normal temperatures. 10Gb copper is a no-go, no-fix solution.

I was running a full copper solution and I had to ship back all equipment because of this problem.
Now I am very happy with a full-fiber solution.

Secondly:
When using SFTP or SCP, make sure that you are using a cipher with AES-NI acceleration on both client and server. Otherwise the speed is limited to around 80 Mb/s depending on your hardware. Reaching good speeds on 10Gb/s is difficult because of ciphers.
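
A quick way to check this (a sketch; aes128-gcm@openssh.com is just one AES-NI-friendly cipher OpenSSH supports, and the file/host names are placeholders):

grep -m1 -o aes /proc/cpuinfo     # non-empty output means the CPU advertises AES-NI
ssh -Q cipher                     # list the ciphers your OpenSSH build supports
scp -c aes128-gcm@openssh.com bigfile user@host:/tmp/   # force an AES-GCM cipher for the copy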

Run a speed test with iperf3 to measure the real speed of your network.

Hope this helps.

@skiminki

OK, it happened again. This time I was able to make a couple of measurements.

Main observations:

  • it seems that only the reception path is affected
  • TCP, UDP, and ICMP are affected
  • it seems that large frames are more affected
  • this seems to be related to initialization. Either the NIC is slow on machine boot, or it's not. I have never seen a degradation to slow mode once the NIC is fast. (I.e., shouldn't be related to temps.)
  • I've observed the slow mode now maybe 4-5 times. I use this machine daily.
  • The other machine (i.e., the remote box below) also uses the same NIC model. I've never observed a problem with that machine. There's a Unifi Flex XG switch between these two machines.
  • I have jumbo frames enabled (9000 MTU).

Unplugging and re-plugging the cable made the problem go away. Previously, I have also noticed that using ethtool to change settings, or ifdown/ifup, makes the problem go away.

Measurement data below:

IPERF3 server inbound / UDP: (server on this machine)
(client: iperf3 --client boombox --time 10 --udp -b 9G)

Accepted connection from 192.168.3.12, port 42412
[  5] local 192.168.3.10 port 5201 connected to 192.168.3.12 port 44006
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec  0.013 ms  516825/792733 (65%)  
[  5]  10.00-10.00  sec  69.9 KBytes  2.14 Gbits/sec  0.015 ms  8/16 (50%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec  0.015 ms  516833/792749 (65%)  receiver
-----------------------------------------------------------

IPERF3 server inbound / TCP: (server on this machine)
(client: iperf3 --client boombox --time 10)

Accepted connection from 192.168.3.12, port 42402
[  5] local 192.168.3.10 port 5201 connected to 192.168.3.12 port 42404
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   175 KBytes   143 Kbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   175 KBytes   143 Kbits/sec                  receiver

IPERF3 server outbound / UDP (server on another machine):

Accepted connection from 192.168.3.10, port 48536
[  5] local 192.168.3.12 port 5201 connected to 192.168.3.10 port 52811
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  8.24 GBytes  7.08 Gbits/sec  0.004 ms  187/988897 (0.019%)  
[  5]  10.00-10.00  sec   192 KBytes  5.18 Gbits/sec  0.004 ms  0/22 (0%)  

IPERF3 server outbound / TCP (server on another machine):

Accepted connection from 192.168.3.10, port 47222
[  5] local 192.168.3.12 port 5201 connected to 192.168.3.10 port 47228
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  receiver

Inbound pings: (i.e., pinged from another machine)

$ ping boombox
PING boombox (192.168.3.10) 56(84) bytes of data.
--- boombox statistics ---
26 packets transmitted, 24 received, 7,69231% packet loss, time 25575ms

$ ping boombox -s 2000

PING boombox (192.168.3.10) 2000(2028) bytes of data.
--- boombox ping statistics ---
24 packets transmitted, 19 received, 20,8333% packet loss, time 23531ms
rtt min/avg/max/mdev = 0.165/0.245/0.263/0.020 ms

$ ping boombox -s 10000

PING boombox (192.168.3.10) 10000(10028) bytes of data.
--- boombox statistics ---
20 packets transmitted, 8 received, 60% packet loss, time 19444ms
rtt min/avg/max/mdev = 0.261/0.321/0.344/0.023 ms

I also made a wireshark capture during ping -s 10000 on boombox (the affected machine). Nothing unusual there, I just see that some IP packets are missing.

@skiminki

Additional note. I also dual-boot this machine occasionally on Windows. I don't recall ever observing the slow mode there, although that wouldn't mean it doesn't happen.

I also use the other machine almost daily with the same NIC model and no slow mode ever observed.

@adonespitogo

adonespitogo commented Oct 28, 2022

I am experiencing a similar issue. The transfer speed from my Ubuntu machine, which has an Aquantia 10GbE NIC, to my NAS is good at around ~8 Gbits/sec. But the transfer speed from my NAS to my Ubuntu machine is very slow at ~94 Mbits/sec. My Ubuntu machine's onboard NIC is an Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02), if that helps.

EDIT:
I used iperf3 for the benchmark.

@cail
Member

cail commented Nov 2, 2022

Such low performance may be a sign of packet corruption or extensive packet drops.
Please try disabling HW offloads one by one.
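
A rough way to do that, assuming an iperf3 server is running on the NAS and enp4s0 is the local interface (each feature is switched off on its own, measured in the slow NAS-to-Ubuntu direction with -R, then restored; addresses are placeholders):

# placeholders: local interface and the NAS address (an iperf3 server must be running on the NAS)
IFACE=enp4s0
NAS=192.168.1.2
for feat in rx tx sg tso gso gro lro rxvlan txvlan; do
    echo "=== $feat off ==="
    ethtool -K "$IFACE" "$feat" off
    iperf3 -c "$NAS" -t 5 -R    # -R reverses the direction so this machine receives
    ethtool -K "$IFACE" "$feat" on
done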

@cfrigaard

cfrigaard commented Jul 2, 2024

The same issue is seen frequently in an overheated server room: many servers with Aquantia network interfaces had a slowdown on the RX path only, from a normal 400 to 800 MiB/s down to around 100 MiB/s and sometimes as low as 32 MiB/s!

At first, all affected network adapters were aquantia/atlantic

$ ethtool -i aquantia-10Gbps
driver: atlantic
version: 5.15.0-94-generic
firmware-version: 3.1.88
[...]

until the same defect hit a server with only Intel network interfaces/cards! Same issue, same RX slowdown, but now on a server without an Aquantia card or the atlantic driver!

I've made a shell script (included below) that detects the problem by reading a 4 GiB file with dd with caching switched off (iflag=direct).

The test case creates a file with random values (all-zero files can be problematic on filesystems with sparse-file handling), which is then read by dd to get a measure of the network speed.

Since this script runs nightly on my servers (with options -q and -w), a random wait/sleep period is introduced so that multiple servers do not run the test at exactly the same time, and the test is repeated a number of times before it is marked as failed.

Running it without options just tests for the network RX defect...

Now, the defect began this spring, when the server room started to get very hot, but I have not proven in any way that the RX defect is actually related to this temperature issue. Manually running 'sensors' reports Aquantia temperatures of up to 83 °C, but these have not been monitored continuously (a simple logging loop for that is sketched after the sensors output below):

$ sensors
[..]
aquantia-10Gbps-pci-c100
Adapter: PCI adapter
PHY Temperature:  +68.1°C  
MAC Temperature:  +67.5°C 
[..]
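
For continuous monitoring, a minimal logging loop along these lines could be used (chip name taken from the sensors output above; interval and log path are arbitrary):

while true; do
    date
    sensors aquantia-10Gbps-pci-c100
    sleep 60
done >> /var/log/aquantia-temps.log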

But it could also be a general kernel defect or a firmware issue in Ubuntu (running Linux 5.15.0-94-generic, Ubuntu 22.04.3 LTS).

This defect is the 'reverse' of this earlier comment:

[..] this seems to be related to initialization. Either the NIC is slow on machine boot, or it's not. I have never seen a degradation to slow mode once the NIC is fast. (I.e., shouldn't be related to temps.)

since here the servers reset to 'normal' (fast) speed at boot, and the degradation only happens after boot.

I looked into packet errors and drops but found no issues (ping reports '15 packets transmitted, 15 received, 0% packet loss, time 14334ms, rtt min/avg/max/mdev = 0.300/0.342/0.388/0.021 ms'). Disabling all the offload settings did not help either...

#!/bin/bash 
 
set -eaou pipefail

function Echo
{
	local T=$1
	shift
	if [ $QUIET -lt 1 ]; then	
		if [[ $T == 1 ]]; then
			echo -n "    "
		fi
		echo $@ #>/dev/null
	fi
}
   
function TestNetRXDefect()
{
	# Slow RX seen via dd, normal speed is 400 MiB/s to 783 MiB/s
	#    dd if=~xx/Temp/random_4GiB.txt  count=1024 bs=4M iflag=direct oflag=direct | pv >/dev/null 
	#    1024+0 records in32.0MiB/s] [1024+0 records out 4294967296 bytes  4.3 GB, 4.0 GiB) copied, 133.819 s, 32.1 MB/s
	#    pv: 4.00GiB 0:02:50 [24.1MiB/s] [

	local N=$1
	Echo 0 "TESTNETRXDEFECT[$N]:.."

	local RANDFILE=/home/xxx/Temp/random_4GiB.txt
	if [ ! -f $RANDFILE ]; then
		echo "  creating random file '$RANDFILE' of 4GiB (4*1024 blocks of size 1MiB).."
		dd count=1024 bs=4M if=/dev/urandom of=$RANDFILE
	fi	

	local DATE=`date`
	local F=/tmp/.test_defect_netrx_$HOSTNAME
	local DD=`dd if=/home/xxx/Temp/random_4GiB.txt count=1024 bs=4M iflag=direct of=/dev/null 2>&1`
	
	local M=30
	local T=`echo $DD | tail -n 1 | cut -d ',' -f 3 | tr s ' '`
	local R=`echo "$T > $M" | bc -l`
	local S="NA"
	
	Echo 1 "DATE=$DATE"
	Echo 1 "F=$F"
	Echo 1 "T=$T, M=$M, R=$R"
	Echo 1 "DD=$DD"
	
	if [ $R == 0 ]; then
		Echo 1 "RESULT: OK, $T is less than $M.."
		FOUT=$F.txt
		FDEL=$F.err
		S="OK"
	else 
		Echo 1 "RESULT: FAIL, $T is greater than $M.."
		if [ $N -gt 2 ]; then
			FOUT=$F.err
			FDEL=$F.txt
			S="FAILED"			
		else
			local M=`echo $N + 1 | bc`
			Sleep $M
			TestNetRXDefect $M
		fi
	fi
	
	if [ $S != "NA" ]; then
		Echo 0 "TESTNETRXDEFECT: DONE ($S)"
		if [ $WRITEOUTPUT == 1 ]; then
			test ! -f $FDEL || rm $FDEL
			echo "TESTNETRXDEFECT: DATE=$DATE, T=$T, M=$M, R=$R, DD=$DD, DONE" > $FOUT
			#chmod ugo+w $FOUT # only if user needs to access root-owned file..
		fi
	fi		
}

function Sleep()
{
	local M=$1
	local SFACTOR=20
	if [ $M -gt 0 ]; then
		SFACTOR=50
	fi
	
	T=`echo $RANDOM / $SFACTOR | bc`
	if [ $QUIET -eq 0  ]; then
		Echo 0 "[no sleep..T=$T s, SFACTOR=$SFACTOR, M=$M]"
	else		
		#Echo "PRERUN: sleep $T s.."
		Echo 0 "[sleep..T=$T s, SFACTOR=$SFACTOR, M=$M]"
		sleep $T
	fi
}

QUIET=0
WRITEOUTPUT=0
while [[ $# -gt 0 ]]; do
    case $1 in
        "-q")
            QUIET=1
            shift
            ;;
        "-w")
            WRITEOUTPUT=1
            shift
            ;;
        *)
            echo "unknown argument(s) '$@'" && false
            ;;
    esac
done


Sleep 0
TestNetRXDefect 0
