[5.0.0] Rootless networking with custom network is broken #22146
Comments
Faking a
...if I enter the target namespace and capture traffic from there, I don't see that segment, though:
Is it from netfilter? It doesn't look like netavark is configuring anything that might lead to that:
|
@maxi0604 Is this IPv4 or IPv6 traffic that is not working? I only have access to IPv4 systems so I cannot test v6. For the custom rootless network case the setup is more complicated, as it involves both pasta and netavark, so it is not easy to tell where things go wrong. You can enter our rootless netns with |
I've explicitly tested both and both show the same hang. If the network was not created with
Yes, that seems to work with v4 and v6.
142.250.74.206:80 is google.com. I think the IPv6 output is unrelated, the network was created without |
I don't think there's any packet getting lost, @maxi0604's output is consistent with mine, here is the RST segment:
I strace'd pasta, and it close()s the "host" socket as it gets this, as expected. |
[Distractedly thinking about this, sorry for the rain of comments] On a second thought, we can't exclude that the window update frame (18 in my first capture, #22146 (comment)) is seen as somewhat strange by the kernel and that warrants a reset. The acknowledgement sequence is increased by one compared to the SYN, ACK segment, but the ACK flag is not set (because we want to update the window) -- that should be legitimate but somewhat unusual. |
Tagging @dgibson in case that rings a bell. |
Confirmed, the kernel doesn't seem to like (anymore?) a segment that just updates the window, without any flag set, and with the acknowledgement sequence matching the previous one. If I force the ACK flag in pasta, here:
then we don't get a reset and wget completes. |
This smells like a kernel issue to me and we should look into that. Probably reasonable workaround meanwhile: if we just completed the three-way handshake, with a connection started from the tap side (container), reset our own value of the window we sent to the container, in order to force an ACK flag on the next segment (including a possible window update, as it happens here):
lightly tested, this seems to work as well. |
I don't think it is a kernel issue. @sbrivio-rh pointed out this kernel commit. It states that RFC 793 requires that packets without an ACK be dropped, and my reading of RFC 793 and its successors concurs. See for example here. I think we should be setting ACK on all non-SYN, non-RST packets. What we do for RST packets is a bit more complicated. I'm currently trying to figure out how to correct this without excessive churn. I've also filed an upstream pasta bug to track it. |
While we fix this in pasta and make updated packages available, I tested this nftables-based workaround:
from the target network namespace (for pasta itself). For some reason The idea is to drop any TCP segment that has none of the SYN, RST, and ACK flags set, before some kernel component (we haven't figured that out yet) resets the connection. @dgibson also points out that RFC 9293 says those segments should be discarded, but not that they should cause a reset. This part looks like a kernel issue to me. |
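To make the matching criterion concrete, here is a minimal Python sketch of the test described above: a segment is a drop candidate when none of the SYN, RST, and ACK flags is set. This is not the nftables rule itself, and `build_header` and `is_drop_candidate` are hypothetical helpers introduced only for illustration. The flag bit values and the 20-byte header layout are the standard ones from the TCP specification.

```python
import struct

# TCP flag bits as laid out in the TCP header (RFC 9293).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def build_header(flags, seq=1, ack=1, window=65280, sport=40000, dport=80):
    """Hypothetical helper: pack a 20-byte TCP header with no options."""
    data_offset = 5 << 4  # header length of five 32-bit words, upper nibble
    # Fields: ports, sequence, acknowledgement, offset, flags,
    # window, checksum (left at 0 here), urgent pointer.
    return struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                       data_offset, flags, window, 0, 0)


def is_drop_candidate(header):
    """Return True if the segment has none of SYN, RST, ACK set."""
    flags = header[13]  # the flags occupy byte 13 of the TCP header
    return (flags & (SYN | RST | ACK)) == 0
```

With these helpers, `is_drop_candidate(build_header(0))` is true for a bare window update with no flags, while a segment carrying ACK, SYN, or RST is left alone.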
I can confirm that on 5.0 it is broken with the default bridged network adapter when running on WSL. Unless a custom DNS server is added, e.g. Cloudflare's 1.1.1.1, DNS requests fail. |
This seems different. In my case, DNS and |
I'm sorry then. I must have misunderstood the reported issue. My apologies! |
@KirilMihaylov, which pasta version do you have installed? There was a DNS-related issue fixed recently, which you might be seeing. |
I have something I hope is a fix, essentially a polished version of Stefano's suggestion. Unfortunately I haven't been able to test it against the specific problem here, because I wasn't able to reproduce. I don't know quite what's different about my setup, but the wget from an alpine container is working fine for me with podman 5.0.0 and existing pasta binaries. |
Ok, tree with the draft fix is here. I believe @sbrivio-rh will be able to make a release, and we can test from there. |
I'm able to reproduce the issue reliably, and your series fixes it for me. Testing and releasing now.
I think it's pretty much a combination of two factors, which might be unlikely or impossible to reproduce on some setups: first off we get a slightly different window value from the socket (65280 instead of 65536 bytes in my case) between three-way handshake and just after it, and we reflect it to the container, hence the problematic packet. Second, we write the HTTP request to the socket, but we don't see it being acknowledged right away (hence no increase of acknowledged sequence and no ACK flag in the problematic packet). |
This should now be fixed in the new version 2024_03_26.4988e2b. As the Arch Linux maintainer just happened to merge a change two hours ago, I guess you'll get an updated package for Arch rather soon. |
I've flagged the package in the Arch repository, thanks for the quick fix everyone |
The update has been released and works, closing this |
Currently we set ACK on flags packets only when the acknowledged byte pointer has advanced, or we hadn't previously set a window. This means in particular that we can send a window update with no ACK flag, which doesn't appear to be correct. RFC 9293 requires a receiver to ignore such a packet [0], and indeed it appears that every non-SYN, non-RST packet should have the ACK flag.

The reason for the existing logic, rather than always forcing an ACK, seems to be to avoid having the packet mistaken as a duplicate ACK which might trigger a fast retransmit. However, earlier tests in the function mean we won't reach here if we don't have either an advance in the ack pointer - which will already set the ACK flag, or a window update - which shouldn't trigger a fast retransmit.

[0] https://www.ietf.org/rfc/rfc9293.html#section-3.10.7.4-2.5.2.1

Link: containers/podman#22146
Link: https://bugs.passt.top/show_bug.cgi?id=84
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
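The change in flag selection described in this commit message can be modelled roughly as follows. This is a sketch in Python, not pasta's actual C code: `flags_before_fix` and `flags_after_fix` are hypothetical names, and leaving SYN and RST packets untouched in the second function is an assumption based only on the "non-SYN, non-RST" wording.

```python
# Standard TCP flag bit values.
SYN, RST, ACK = 0x02, 0x04, 0x10


def flags_before_fix(ack_advanced, window_was_set, flags=0):
    """Old rule as described: ACK only if the acknowledged byte pointer
    advanced, or if no window had been set previously."""
    if ack_advanced or not window_was_set:
        flags |= ACK
    return flags


def flags_after_fix(flags=0):
    """New rule as described: every non-SYN, non-RST packet carries ACK.
    Assumption: SYN and RST packets are returned unchanged here."""
    if flags & (SYN | RST):
        return flags
    return flags | ACK


# The case from this issue: a pure window update (for example 65536 to
# 65280 bytes) with no newly acknowledged data and a window already set.
problematic = flags_before_fix(ack_advanced=False, window_was_set=True)
fixed = flags_after_fix()
```

Under the old rule the window update goes out with no flags at all, which is the segment that triggered the reset. Under the new rule the same update carries ACK.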
Issue Description
When using rootless podman and a network created with podman network create foo, the container doesn't have internet access. The issue is not specific to IPv4-only networks and also occurs with podman network create --ipv6 bar.
Steps to reproduce the issue
podman network create foo
podman run -it --rm --network=foo alpine wget google.com
Describe the results you received
The IP resolves, but the command hangs.
ping (and ping6) work as expected
Describe the results you expected
The command goes through.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response