UDP from Docker/Podman machine on Mac/Windows times out after 90 seconds #654

scandey · 2023-05-07T07:10:55Z

This is primarily a docker/podman or user issue, but I figured I'd include my writeup here anyways. Something in the network stack of both Docker and Podman seems to time out a UDP connection after 90 seconds. In our COSMOS use-case, this shows up when we send commands to our platform computer emulator and then take a break. After returning, the connection has to be reset (closing and reopening the interface) in order to send more commands.

Working backwards, I ruled out issues with the emulator hardware, the COSMOS gem, COSMOS' ecosystem and my laptop itself. At this point I'm pretty sure that the issue occurs at the interface between the Linux machine that runs docker inside Docker Desktop (or Podman machine) and the host. The issue does not occur with Docker installed directly on a pure Linux box.

I've identified this behavior on macOS 13.3.1 on an M2 Max and on Windows 11 on an Intel i7, primarily running Docker Desktop, though I've also tried Podman and OrbStack on the Mac.

The following is from a conversation with @MTI-twalker

I’ve tested regular ncat out of the host machines without any timeout issues. I then proceeded to try nc out of the COSMOS operator container and got the timeout behavior. The next step was to try a generic docker container without COSMOS. Something as basic as using nc in a pure alpine container, like docker run -p 1234:1234/udp -it alpine ash, ends up timing out silently after 90 seconds. So I do think this is a Docker issue.

I did find a few items that sound very similar to what I’m seeing:
https://forums.docker.com/t/udp-stream-timeout/114185
https://stackoverflow.com/questions/58031315/udp-port-forwarding-not-working-with-docker-on-windows-10

Tragically yes, everything that isn’t Linux-based has to use some form of Docker Desktop (or alternative) to create a Linux machine that then runs docker inside it. I’ve now learned far more than I wanted to know about the underlying bits of the Docker ecosystem. I did note that the newest Docker Desktop for Mac (as of a few days ago) uses Google’s gVisor instead of their original vpnKit system. I’m doubtful that the switch to gVisor is related (especially since the Windows version is still using vpnKit) but it is at least one lead. https://www.docker.com/blog/docker-desktop-4-19/

I’ve been searching high and low and I can find lots of people reporting somewhat similar timeout issues, though not the specific 90 seconds that I see.

I tried updating vpnKitMaxPortIdleTime to 0 in ~/Library/Group\ Containers/group.com.docker/settings.json per docker/for-mac#2197, but I’m not sure whether it makes any difference. The default is 300, but I’m definitely getting a 90 second timeout rather than a 300 second timeout.
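For anyone repeating this check, here is a minimal sketch of how the two settings mentioned in this thread (vpnKitMaxPortIdleTime and networkType) could be read back from the file. The function name show_network_settings is hypothetical, and the sketch assumes the JSON is pretty-printed with one key per line; it only reads the file and does not say whether Docker Desktop honors the values.

```sh
#!/bin/sh
# Print the idle-timeout and network-type lines from a Docker Desktop settings file.
# The default path is the macOS location quoted above; pass another path as $1.
# Assumption: the file is pretty-printed JSON with one key per line.
show_network_settings() {
    settings="${1:-$HOME/Library/Group Containers/group.com.docker/settings.json}"
    grep -E '"(vpnKitMaxPortIdleTime|networkType)"' "$settings"
}
```

Calling show_network_settings with no argument inspects the default macOS path; grep exits non-zero if neither key is present.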

https://github.com/docker/for-win/issues/2639 (Perhaps the Windows equivalent of Mac issue 2197?)
https://github.com/moby/vpnkit/issues/587 (Maybe related, though supposedly based on server load)
https://github.com/docker/for-win/issues/8861 (Seems like the opposite issue: too many requests cause problems, not too few)
https://github.com/moby/moby/issues/8795 (Maybe related, requires flushing the “conntrack” table to fix)
https://forums.docker.com/t/udp-stream-timeout/114185 (Unsolved forum post that specifically mentions UDP)
https://stackoverflow.com/questions/68639603/inactive-tcp-sockets-disconnecting-in-docker-for-windows-wsl-2 (Goes back to the config file)

In terms of my own debugging, I’ve found that I can sometimes actually get a “Ncat: Connection refused” if I try to send messages back from the Mac host running an ncat listener to the docker container running nc after 90 seconds of no messages. Reloading the listener does not make a difference; nc on the container has to be restarted.

In the settings.json, switching back to "networkType": "vpnkit" does not seem to make a difference.


scandey · 2023-05-09T06:09:51Z

As expected, using an Ubuntu machine with regular docker seems to work fine.

After trying a few other Docker Desktop replacement systems including OrbStack and Rancher, I ended up with Colima and the containerd backend as provided by nerdctl. Unfortunately that system is still not nearly mature enough (and has its own special UDP issues) so I'm giving up on the Mac development for now till I can work out what is going on.

jmthomas · 2023-05-09T15:10:19Z

Thank you for the incredibly detailed and researched information. I'm a little confused by the "UDP connection timeout" since UDP is connectionless. COSMOS determines that a UDP interface is "connected" simply by whether we allocated a port. Are we saying that the UDP socket stops listening to packets after 90s? So future packet transmissions are not received?

scandey · 2023-05-09T15:58:51Z

The language is kind of tricky, I'm using "timeout" for lack of a better word rather than some actual error related to a closed connection or anything (since as you point out UDP itself is connectionless). The issue is sort of the other way around, a UDP socket stops (apparently) sending packets out of Docker Desktop if it does not send any for 90 seconds. Now that you mention it though, I'm not positive I've tested the other way around (whether a UDP socket stops listening if it gets nothing for 90 seconds). I've been focused on the commanding side and not considered the telemetry side since it just worked (in our HERMES use-case, there are separate UDP ports for command and telemetry).

From the point of the COSMOS containers, everything is working as intended (it works fine on a real Linux machine running docker directly). I haven't yet raised this issue with Podman Machine or Docker Desktop (or OrbStack or....) since I wasn't totally clear about the language to describe the problem.

jmthomas · 2023-05-09T16:44:08Z

Our primary development platform is macOS with Docker Desktop so we'll see if we can reproduce at some point.

scandey · 2023-05-09T18:35:59Z

I put together two quick scripts for showing the issue, hopefully the UDP issue replicates on your systems too. All of the TCP messages work as expected, but the final UDP message just never comes through.

TCP Tester

#!/bin/sh

echo "TCP test: set up a TCP receiver in another terminal with ncat -lk 1234 (requires ncat from nmap-ncat)"
sleep 10
echo "starting sending TCP packets 10, 30, 60 and 90 seconds apart"
docker run --rm --name sender alpine ash -c "{ echo 'tcp 10 seconds'; date; sleep 10; date; } | timeout 11 nc -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'tcp 30 seconds'; date; sleep 30; date; } | timeout 31 nc -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'tcp 60 seconds'; date; sleep 60; date; } | timeout 61 nc -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'tcp 90 seconds'; date; sleep 90; date; } | timeout 91 nc -p 1234 host.docker.internal 1234"
sleep 1

UDP Tester

#!/bin/sh

echo "UDP test: start a UDP receiver in another terminal with ncat -lu 1234 (requires ncat from nmap-ncat)"
sleep 10
echo "starting sending UDP packets 10, 30, 60 and 90 seconds apart"
docker run --rm --name sender alpine ash -c "{ echo 'udp 10 seconds'; date; sleep 10; date; } | timeout 11 nc -u -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'udp 30 seconds'; date; sleep 30; date; } | timeout 31 nc -u -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'udp 60 seconds'; date; sleep 60; date; } | timeout 61 nc -u -p 1234 host.docker.internal 1234"
sleep 1
docker run --rm --name sender alpine ash -c "{ echo 'udp 90 seconds'; date; sleep 90; date; } | timeout 91 nc -u -p 1234 host.docker.internal 1234"
sleep 1

ryanmelt · 2023-05-10T19:18:12Z

The TCP test works fine for me. With the UDP one I actually get nothing, which is surprising, and I'll have to experiment some more.

kdrag0n · 2023-05-14T00:22:26Z

👋 OrbStack developer here. Came across this issue while searching GitHub and thought I'd chime in.

I can't speak for other Docker providers, but in OrbStack's case, there are two issues here:

1. A bug causing UDP packets with the same source port + destination IP & port to get dropped after the timeout. I've found the cause and fixed it for the next version. Thanks for raising the issue!
2. The fact that there's an internal NAT between the host and Docker. This means that there has to be some sort of connection timeout to keep resource usage under control. As a result, the client's source port changes after the timeout and the server thinks there's a new "connection":

17:02:02.061255 IP 127.0.0.1.49478 > 127.0.0.1.1234: UDP, length 29
17:02:04.083464 IP 127.0.0.1.59394 > 127.0.0.1.1234: UDP, length 44
17:02:34.087749 IP 127.0.0.1.58358 > 127.0.0.1.1234: UDP, length 29
17:02:36.105397 IP 127.0.0.1.59200 > 127.0.0.1.1234: UDP, length 44

ncat only seems to accept up to one "connection", so it stops printing packets after the source port changes.
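To make the source-port change easier to spot in a capture, here is a minimal sketch that reads lines in the layout of the trace above and flags each packet whose source port differs from the previous one. It assumes the trace is tcpdump-style output (TIME IP SRC.PORT > DST.PORT: UDP, length N); the function name source_port_changes is hypothetical.

```sh
#!/bin/sh
# Read tcpdump-style lines on stdin and print the timestamp and UDP source port
# of each packet, flagging packets whose source port differs from the previous one.
# Assumed line layout: TIME IP SRC.PORT > DST.PORT: UDP, length N
source_port_changes() {
    awk '$2 == "IP" && $4 == ">" {
        n = split($3, parts, ".")      # the source port is the last dot-separated field
        port = parts[n]
        note = (prev != "" && port != prev) ? "  <- source port changed" : ""
        print $1, port note
        prev = port
    }'
}
```

With the function defined in the current shell, something like sudo tcpdump -l -n -i lo0 udp port 1234 | source_port_changes could feed it live traffic on a Mac host, assuming lo0 is the loopback interface carrying the packets.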

I think it's possible to fix the connection persistence issue, but it'll be challenging. Feel free to open an issue on the OrbStack repo as well.

kdrag0n · 2023-05-16T02:30:57Z

Thought about this some more and figured out a relatively simple solution. Both issues should be fixed in the newly-released OrbStack v0.10.2.

The reproducer above still won't work because the client's source port 1234 conflicts with the ncat server port on the macOS host side. It should work if you change it to -p 1235.
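A minimal sketch of the UDP reproducer with that change applied, so the container's source port (1235) no longer matches the listener port (1234). The helper names probe_script, run_probe and run_all and the SRC_PORT/DST_PORT variables are hypothetical; the in-container command is otherwise the one from the UDP tester earlier in the thread.

```sh
#!/bin/sh
# UDP reproducer sketch with a source port that differs from the listener port.
# SRC_PORT=1235 follows the suggestion above; DST_PORT=1234 matches ncat -lu 1234.
SRC_PORT="${SRC_PORT:-1235}"
DST_PORT="${DST_PORT:-1234}"

# Print the in-container command for one probe:
# send a line, stay idle for $1 seconds, then send another line on the same socket.
probe_script() {
    secs="$1"
    printf '{ echo "udp %s seconds"; date; sleep %s; date; } | timeout %s nc -u -p %s host.docker.internal %s' \
        "$secs" "$secs" "$((secs + 1))" "$SRC_PORT" "$DST_PORT"
}

# Run one probe in a throwaway alpine container.
run_probe() {
    docker run --rm --name sender alpine ash -c "$(probe_script "$1")"
}

# Run the whole series with the same idle times as the original script.
# Nothing is run until this function is called.
run_all() {
    for idle in 10 30 60 90; do
        run_probe "$idle"
        sleep 1
    done
}
```

To use it, start ncat -lu 1234 on the host in another terminal, then call run_all. If the second date line of the 90 second probe arrives at the listener, the idle timeout is not being hit.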

There's still another NAT layer that might cause issues after 2–3 minutes of idle, but I think it's not much different from standard Docker-on-Linux setups. Let me know how it goes with the real COSMOS use case!

jmthomas · 2023-05-16T02:41:47Z

@kdrag0n Thanks so much for finding this issue and finding a fix. @scandey I'll leave this open until you get a chance to try it out.

scandey · 2023-05-19T16:56:01Z

I don't think I fully understand why the source port matching the ncat server port on localhost only works for me for very short sleep times, but that's not actually an issue in production.

I do see that the new OrbStack has fixed the UDP issue if I use an alternative source port! That's very exciting! It also works in COSMOS as expected. Thank you @kdrag0n!! I can develop on COSMOS much much faster now. I've not seen a multi-minute idle timeout from the Docker-on-Linux, so hopefully we've dodged that issue here as well (somehow).

It seems like I should probably go ahead and report some form of this issue to Docker Desktop as well as to Podman, but first I need to clean up my test scripts to use a different source port than the ncat server port.

kdrag0n · 2023-05-19T18:24:02Z

Great to hear, thanks for sharing your results! Also good to know that the other potential source of timeouts isn't actually a problem.

jmthomas added the triage More information is needed label May 9, 2023

jmthomas closed this as completed May 21, 2023

usedhondacivic mentioned this issue Oct 27, 2024

Docker Desktop stops forwarding packets on exposed ports after a short time little-red-rover/lrr-ros#10

Closed
