Paced TCP downloads in 1745.6.0 break down #2457
Description
Issue Report
It looks like any application on a Container Linux 1745.6.0 node that does not process data as fast as the source can provide it suffers a collapse in transfer speed, to well below the rate at which it actually processes the data.
Docker image downloads were affected first, but the problem can easily be reproduced with curl.
Reverting to the prior version (1745.5.0) on the same host makes the behaviour disappear.
Going back to 1745.6.0 brings it back.
It also happens across machines (of the same type).
Bug
Container Linux Version
$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.6.0
VERSION_ID=1745.6.0
BUILD_ID=2018-06-08-0926
PRETTY_NAME="Container Linux by CoreOS 1745.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
Environment
Baremetal
Intel E7-4870
2x Cisco VIC ENIC in a bond (active-backup) mtu 9000, 10000baseT/Full
Expected Behavior
As in the prior version (1745.5.0), same machine.
curl --limit-rate 1M -o /dev/null http://<high-speed-low-latency-source>
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
  0  956M    0 8439k    0     0  1023k      0  0:15:57  0:00:08  0:15:49 1020k
The download proceeds at (more or less) the limited speed.
The problem is not limited to curl; Docker daemon downloads, and presumably others, are affected as well.
On 1745.6.0, if the client does not process the data as fast as the network delivers it, the transfer speed collapses.
Actual Behavior
curl --limit-rate 1M -o /dev/null http://<high-speed-low-latency-source>
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
  0  956M    0 6482k    0     0  41779      0  6:40:17  0:02:38  6:37:39   623
The throughput drops far below the speed limit (here to a current speed of 623 bytes/s against a 1M limit).
This also happens with Kubernetes stopped and iptables cleared.
Reproduction Steps
- Run curl with a speed limit lower than the source can deliver. The faster the source, the better.
- Wait for the throughput to drop far below the limit, presumably once some buffer is full (in our case after ~6 MiB).
- Restart the same host with the prior version (1745.5.0).
- Run the same command and observe the expected behaviour.
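Since the problem does not seem to be specific to curl, the "slow consumer" from the steps above can also be sketched in a few lines of Python. This is only an illustration of the pacing logic, not something that was run on the affected nodes: the names `paced_read` and `demo` are hypothetical, and the local demo uses a socket pair rather than a TCP connection, so it will not trigger the bug by itself. To reproduce, the same reader would have to be pointed at a TCP socket connected to a fast source.

```python
import socket
import threading
import time


def paced_read(sock, total_bytes, rate_bytes_per_s, chunk_size=4096):
    """Read total_bytes from sock while keeping the average rate near the limit.

    Returns (bytes_received, elapsed_seconds).
    """
    received = 0
    start = time.monotonic()
    while received < total_bytes:
        data = sock.recv(min(chunk_size, total_bytes - received))
        if not data:  # peer closed the connection early
            break
        received += len(data)
        # Sleep until the elapsed time matches what the target rate allows.
        target_elapsed = received / rate_bytes_per_s
        delay = target_elapsed - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
    return received, time.monotonic() - start


def demo(total_bytes=64 * 1024, rate_bytes_per_s=256 * 1024):
    """Feed the paced reader from a local sender that writes as fast as it can."""
    sender, receiver = socket.socketpair()

    def send_all():
        sender.sendall(b"x" * total_bytes)
        sender.close()

    thread = threading.Thread(target=send_all)
    thread.start()
    try:
        return paced_read(receiver, total_bytes, rate_bytes_per_s)
    finally:
        thread.join()
        receiver.close()


if __name__ == "__main__":
    received, elapsed = demo()
    print(f"received {received} bytes in {elapsed:.2f}s "
          f"({received / elapsed:.0f} B/s)")
```

On a healthy system the reported rate should stay close to the configured limit, as it does with curl on 1745.5.0.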
Other Information
A tcpdump seems to indicate that the client scales its window size down to 384 bytes, at roughly 10 packets per second.
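As a rough sanity check on these numbers, the following sketch does the arithmetic. It assumes, as a simplification, that each of the ~10 packets per second lets the sender deliver at most one 384-byte window, and it takes curl's `956M` to mean 956 MiB; the helper names are made up for illustration.

```python
def window_limited_rate(window_bytes, packets_per_s):
    """Upper bound on throughput if each packet carries at most one window."""
    return window_bytes * packets_per_s


def eta_seconds(total_bytes, rate_bytes_per_s):
    """Time needed to transfer total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s


def fmt_hms(seconds):
    """Format a duration as H:MM:SS, truncating fractional seconds."""
    seconds = int(seconds)
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    total = 956 * 1024 * 1024  # assumption: curl's "956M" is 956 MiB

    # Bound implied by the tcpdump observation: 384 bytes * 10 packets/s.
    print("window-limited bound:", window_limited_rate(384, 10), "B/s")

    # Total time at the average speeds curl reported in the two runs.
    print("ETA on 1745.5.0 (1023 KiB/s):", fmt_hms(eta_seconds(total, 1023 * 1024)))
    print("ETA on 1745.6.0 (41779 B/s): ", fmt_hms(eta_seconds(total, 41779)))
```

Under that assumption the window limits throughput to a few KB/s at best, which is the same order of magnitude as the collapsed current speed curl reports and far below the 1M limit.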
Prior version:
$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.5.0
VERSION_ID=1745.5.0
BUILD_ID=2018-05-31-0701
PRETTY_NAME="Container Linux by CoreOS 1745.5.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"