WITH DOCKER is not correctly releasing network resources (and most likely, the network resource is not isolated to a container) #3495
Comments
Here's a similar reproduction of this error that does not require docker-compose:

```
VERSION 0.7

client:
    FROM ubuntu:latest
    RUN apt update && apt install -y iproute2 curl iputils-ping

test:
    FROM earthly/dind:ubuntu-23.04-docker-24.0.5-1
    ARG CACHE_BUSTER
    RUN echo "running test $CACHE_BUSTER"
    WITH DOCKER --load myclient:latest=+client
        RUN \
            docker network create --subnet 192.168.200.0/24 foo && \
            docker run --network foo --rm --name host1 -d myclient:latest /bin/sh -c 'sleep 999' && \
            docker run --network foo --rm --name host2 -d myclient:latest /bin/sh -c 'sleep 999' && \
            docker exec host1 /bin/sh -c 'ip address && ip route' && \
            docker exec host2 /bin/sh -c 'ip address && ip route' && \
            docker exec host1 /bin/sh -c 'ping -W 1 -c 3 host2' && \
            docker exec host2 /bin/sh -c 'ping -W 1 -c 3 host1'
    END
```
A notable discovery: this does not require high levels of parallelism; it can also be reproduced with a sequential loop:

and it seems to be consistently erroring on the second iteration (even after killing the
work-around: clean up after yourself
Related issue: WITH DOCKER when used with multiple networks under docker-compose struggles with high parallelism
I expanded my test to include a simple Python-based UDP server/client that sends messages in a loop, and was able to start two instances in parallel, both of which started and continued working. I then shut down the first instance but left the second instance running, and it continued to work. At that point I started a third instance (leaving the second instance up and running), and only that third instance failed -- the second instance continued to run. The code:
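The original UDP test code was not captured above. As a stand-in, here is a minimal self-contained sketch of such a server/client pair; the names, port selection, and message format are assumptions, and it runs over localhost rather than the docker network used in the actual test:

```python
# Hypothetical sketch of a UDP server/client pair sending messages in a loop.
# The real test ran the two ends in separate containers on a docker network.
import socket
import threading

HOST = "127.0.0.1"  # assumption: loopback instead of a container network

def serve(sock: socket.socket, n_messages: int) -> None:
    """Echo each datagram back so the client can confirm connectivity."""
    for _ in range(n_messages):
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)

def run_once(n_messages: int = 3) -> list:
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((HOST, 0))           # let the OS pick a free port
    port = server.getsockname()[1]

    t = threading.Thread(target=serve, args=(server, n_messages))
    t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)           # like `ping -W 1`: fail fast if the path is broken
    replies = []
    for i in range(n_messages):
        client.sendto(b"msg-%d" % i, (HOST, port))
        reply, _ = client.recvfrom(1024)
        replies.append(reply)

    t.join()
    server.close()
    client.close()
    return replies

if __name__ == "__main__":
    print(run_once())  # every message should come back if the network path works
```

In the failing scenario described above, the equivalent of `client.recvfrom` would time out on the third instance while the already-running second instance kept echoing normally.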
Here's another reproduction case:
and
The first time this works, and the second time we get 100% packet loss. If however you run
This issue does not occur when using the

Here's a full example:
which can be run multiple times:
Note that the
Note that this work-around requires changing the `internal` value. Be careful not to confuse this with the unrelated `external` setting.
Amazing. Thanks for the find @alexcb.
We have documented that

It's not clear what's needed to support the
Thanks @alexcb
What went wrong?

Consider a `docker-compose.yml`:

and `Earthfile`:

When running many instances of `earthly +test --CACHE_BUSTER=$RANDOM` (e.g. 30 or more), some will fail with:

It's interesting to note that when the `networks` section is removed from the `docker-compose.yml` file (i.e. all services are only on the same default network), this reproduction case no longer occurs.

Update: Parallelism isn't required here; simply running `earthly -P +test --CACHE_BUSTER=$RANDOM && earthly -P +test --CACHE_BUSTER=$RANDOM` will reproduce this problem -- the first run succeeds, and the second consistently fails.

Potential work-around

This problem does not occur when creating an internal network via `docker network create --internal ...` or by setting the `internal: true` option in the docker-compose file. This work-around will not work for cases where you expect a network that's externally accessible.
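As a sketch, the `internal: true` work-around in a compose file might look like the fragment below; the service, image, and network names here are hypothetical, not taken from the original reproduction:

```yaml
# Hypothetical docker-compose.yml fragment applying the work-around.
networks:
  foo:
    internal: true        # no external connectivity; avoids the failure mode above

services:
  host1:
    image: myclient:latest  # assumed image name
    command: sleep 999
    networks:
      - foo
```

The trade-off stated above applies: with `internal: true`, containers on `foo` cannot reach anything outside that network.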