Swarm service, with overlay network, fails to remove all containers #26244
Comments
@urlund Yes, this is a bug: multiple goroutines race to delete the network, only one of them wins, and all the others fail to delete the network and therefore fail to remove their containers. I will fix it. Apart from leaving some stale containers behind, it will not cause any other functional issues. |
Having a similar issue here: 1 swarm node, 1 service, 3 replicas, overlay network, and service rm always leaves one exited task behind, which can be viewed with docker ps -a. Is there anything we could do or test for you? |
@mrjana could you please point out where to resolve the problem? |
@xiaods , |
I don't know if this is useful, but with just one swarm node running just one service, the leftover containers always seem to be half of the replicas. |
@jmzwcn Cool, waiting for your result. |
https://github.com/docker/swarmkit/blob/master/manager/orchestrator/replicated.go#L160 |
Are the logs produced by lines like "log.G(ctx).WithError(err).Errorf("failed to list tasks")" visible somewhere, for example when running docker in verbose mode or via docker logs? I could dump some log traces if you need them. |
@mostolog docker writes logs to stderr, so where the logs end up depends on which init system you use and how it is set up. If you are using systemd, the logs are in journald. |
The log is supposed to be anonymized. I just ran 4 commands:
log.txt attached |
Actually, when we use a built-in network (ingress [overlay] or null [bridge]), this issue goes away. I will investigate further. |
@jmzwcn Waiting for your confirmation. |
@jmzwcn if you want to take care of fixing this issue it should be fixed here: https://github.com/docker/docker/blob/master/daemon/cluster/executor/container/adapter.go#L136. In addition to ignoring |
Great, I think that is exactly right. I will verify and then submit a PR. |
@jmzwcn I followed @mrjana's hints and reproduced the steps from the comments above; the docker logs report:
I can confirm the issue is caused by ErrNoSuchNetwork. Please submit a PR as soon as possible, thanks a lot. |
Yes, I have verified this using a local binary with the fix (which also includes UnknownNetworkError) and found that the issue is gone, as expected. dev binary. I will create a PR soon. |
I uploaded the latest binary (without UnknownNetworkError) here. Could anybody help verify it too? |
@jmzwcn .deb? 😁 |
A dev build takes too long; I am not sure the deb can be completed before I leave the office. 😁 |
1.12.2-rc1 does not fix it either; please try the new testing binary. |
All debs have been uploaded here; please try them based on your contrib. Thanks! |
Just ran the previous 4 commands:
Tested on debian-jessie on a single-node swarm cluster. |
Waiting for the patch to be merged; then this can be closed. |
Output of docker version:
Output of docker info:
Steps to reproduce the issue:
docker network create -d overlay sleeper_nw
docker service create --name sleeper --replicas 20 --network sleeper_nw urlund/sleeper
docker service rm sleeper
Describe the results you received:
Output from docker ps -a:
Node 1:
Node 2:
Node 3:
Describe the results you expected:
I would expect all containers to be removed.
Additional information you deem important (e.g. issue happens only occasionally):
You should not be able to reproduce this without creating/configuring an overlay network.