Slow download, very small packet sizes #1170
Comments
Please use current master. You will have to build it yourself; you can also use the binaries from boot.ipxe.org, but for any debugging you will need to be able to modify and build new versions. Could you please dump the HTTP headers you get from the server? |
Will do.
As it's https I can give you the headers from a curl call to the same URL. Is that what you want? |
You could start with what curl shows you, but we really want what iPXE gets. |
Alright, here's the curl story:
This is a public URL, so if you like you can poke it directly for debugging. How do I get to see the headers that ipxe sees? |
I ran a quick test downloading that URL with both curl and iPXE just now: curl: 54.3s so I am unable to reproduce your problem. Since you have a packet capture: could you please provide the raw .pcapng file? Doesn't need to include the whole download: the first 10 seconds or so should be sufficient to observe the problem. |
Thanks a lot. I'll get a pcap file, though it could take a couple of days as travel is coming up. |
Alright, here's a pcap file (unfortunately it doesn't compress well due to the encryption). Something I noticed while going through it is a high number of duplicate ACKs. I'm not aware of an underlying issue in our network here, as I can use another host attached to the same network and switch and get the download within 10s. That is close to 1 Gbit/s, which is almost identical to the slowest link on the path. |
Ugh. Seems like some corruption is happening here as well. The archive itself is intact. I downloaded it using curl on the neighbouring machine. I'm double checking this on other hardware now to see whether this is specific to that one machine. Edit: actually, I'm going to try a manually compiled current version of ipxe (based on 226531e) first. |
After trying a couple of times I was able to boot one of the initrds we have available and on that machine, using the same link I got the initrd downloaded within 1 minute. So it's not an issue with the machine itself. |
Thanks. The capture file is taken from an interface with some kind of TCP offload enabled, so is not showing the actual packets that went over the wire. For example: packet 115 is shown as being 15994 bytes long, which is longer than an Ethernet jumbo frame. We therefore cannot trust what the capture shows about duplicate ACKs, etc, since we are seeing a resynthesis of a TCP conversation rather than the actual TCP conversation. Could you try disabling the assorted segmentation offload features on the capture interface via |
Ok, so, this is quite fiddly to set up and I only managed to get an excerpt from the middle of the conversation. It might be that this doesn't yet help, but I think I managed to get a better dump now. I used a router in the middle and set its offloading settings to Looking at the dump in Wireshark now only shows packet sizes around 1514, so correct L2 overhead for a 1500 link MTU. I still see messages about reassembled PDUs, though, as well as bursts of retransmissions and duplicate ACKs ... Any ideas? Let me know if you do need the beginning of the conversation instead. |
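For anyone wanting to repeat the packet-size sanity check discussed above, here is a minimal sketch. It assumes plain Ethernet II framing without a VLAN tag and a capture that omits the frame check sequence, so a 1500-byte MTU gives a 1514-byte frame. The function names and the sample list are hypothetical; only the 1514 and 15994 byte values come from this thread.

```python
# Sketch: flag captured frames that are longer than the link MTU allows.
# Assumption: Ethernet II without VLAN tag, FCS not included in the capture.
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def max_frame_len(mtu=1500):
    """Largest frame length a capture should show for the given MTU."""
    return mtu + ETHERNET_HEADER


def oversized_frames(frame_lengths, mtu=1500):
    """Return (packet_number, length) for frames exceeding the limit.

    Packet numbers start at 1, matching how Wireshark numbers packets.
    Oversized frames suggest the capture interface has segmentation or
    receive offload enabled and is showing merged segments.
    """
    limit = max_frame_len(mtu)
    return [(i, n) for i, n in enumerate(frame_lengths, start=1) if n > limit]


if __name__ == "__main__":
    # Hypothetical sample of frame lengths exported from a capture.
    sample = [1514, 1514, 15994, 66]
    print(oversized_frames(sample))
```

An empty result is consistent with a capture of real on-the-wire frames; it does not by itself prove that every offload feature is disabled.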
Great, so we can rule out any problem relating to packet sizes.
I see normal length packets and ACK RTT times (at the point of the wireshark capture) of <1ms from iPXE. TCP SACK is in use and is working as expected. I think you're using undionly.kpxe, which means that we have no direct control over the NIC and no visibility into things like RX buffer exhaustion. Are you able to use |
The cards are I'm using undionly mostly due to (very long-term) historical reasons, from when I tried to get things working reliably around 10+ years ago ... so this choice is likely cargo cult by now. I can try using ipxe.pxe - I'm curious whether this might be a driver issue and would resolve itself by switching to the native driver ... |
Ok, so I chainloaded into ipxe.pxe and had the impression that the kernel loaded faster, but the initrd is still as slow, at 1% in 10 seconds. I canceled the download and here's the data from the interfaces:
The relevant interface is net1 (or potentially net0 which is the same) ... and ... right in this moment I'm noticing that usually we did boot from net0 and not net1. There was a slight firewall misconfiguration that caused the tftp server not to respond from net0 but on net1. Interestingly ... I now chained this to
This shows much much lower error rates ... I'm 95% sure that this isn't a problem on the actual network it's connected to. I can double check that once I booted. Consider me puzzled. |
Ah, I chained this again into the
|
Interesting! In the absence of any information to the contrary, I'm going to assume that this is most likely a configuration issue on the network side. If you are able to test that it really does depend on whether the NIC is using port 0 or port 1 (e.g. by physically swapping cables and observing that the slow/fast behaviour can be reproduced the other way round), then we can investigate further. |
Yes. I'm a bit tight on on-hands resources at the moment, so the first thing I can check is whether this also happens in a regular Linux environment. I'm happy to experiment with swapping the cables in a few days. |
So, within Linux on the same machine, downloading over the two interfaces shows no differences. I'll try with switched cables in a couple of days. |
Are the 2 interfaces connected to identically configured ports? Is there any LACP or other group function enabled on the ports? STP configuration? |
Both are connected to identical switches, no LACP or other functions enabled. The faster network has a bit less traffic on the router (both are 1 switch away from the same router), but both are 1G interfaces that aren't fully utilized either way. |
Hi,
I've been running with an older image for quite a while successfully (undionly.kpxe from around 2020 or 2021) and this only started showing up with newer machines. I've updated to a current version (not exactly sure how old, likely only a few days/weeks).
I've seen that slow downloads are a recurring theme and I've tried doing my homework ...
When downloading over HTTPS I'm getting less than 10 Mbit/s. Looking at this tcpdump I see that the window doesn't seem to increase and only wobbles between 1.5k and 3k bytes. The latency is around 10ms (a WAN link), so this makes initrd downloads extremely non-fun:
The interface stats also don't look too good, but the linked explainers don't help ...
The version that is shown when booting is a bit unspecific (1.0.0+) and I'm not 100% sure whether I might still be accidentally running an old image, but as far as I can tell my tftp server is delivering the correct image file that I've taken from my distro as
nix/store/3fm734b6ci0klbsijc8mi04rryfhfh10-ipxe-unstable-2023-07-19.
Thanks for any help.
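As a rough sanity check on the window and latency figures quoted in this report (about 1.5k to 3k bytes at about 10ms), the sketch below computes the throughput ceiling when at most one window of data is in flight per round trip. This is a simplification: it ignores TLS and header overhead, and it assumes the value seen in the dump really is the limiting window. The function names are hypothetical.

```python
# Sketch: throughput ceiling when one window is in flight per round trip.
# Assumption: throughput <= window / RTT, ignoring protocol overhead.


def window_limited_mbit(window_bytes, rtt_seconds):
    """Upper bound on throughput in Mbit/s for a given window and RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


def required_window_bytes(rate_bit_per_s, rtt_seconds):
    """Bandwidth-delay product: window needed to sustain the given rate."""
    return rate_bit_per_s * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.010  # about 10 ms, as reported for the WAN link
    for window in (1500, 3000):
        print(window, "bytes:", round(window_limited_mbit(window, rtt), 2), "Mbit/s")
    # Window needed to fill a 1 Gbit/s link at the same RTT.
    print(round(required_window_bytes(1e9, rtt)), "bytes")
```

Under these assumptions the ceiling is roughly 1.2 to 2.4 Mbit/s, which is in line with the observed rate of less than 10 Mbit/s, while filling a 1 Gbit/s link at 10ms would need a window of about 1.25 MB.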