Docker daemon crashes/logs full of network not found #20694
@vlerenc it seems like the local db (`local-kv.db`) got corrupted.
Yes, it seems so. After removing the file, I could restart the daemon again. You don't know how much you saved my day, thank you very much! What are the consequences of removing the file? Does it impact anything? The second question is why it happened all of a sudden. The VMs were regularly updated, i.e. the daemon was shut down every couple of weeks. Usually, with that many containers, the daemon kills most of the containers, right? Most likely, the VM updater even killed the daemon itself. Docker was (for the past couple of months) on the same 1.9 release, so no change there. Besides the question on the impact of the now-deleted network DB file, how can I make the corrupted file available to you? And thank you again for your prompt help!
If you are using custom bridge networks (other than the default docker0 network), those need to be recreated manually. But it seems that is not the case for you, so you won't see any functionality issues. Other than that, this file holds the resources to be cleaned up if they were not cleaned up properly when the daemon was brought down ungracefully (which seems to be your case).
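For reference, recreating a custom bridge network after such a cleanup is straightforward; the network name (`app-net`), subnet, and container name below are hypothetical placeholders, not values taken from this issue:

```shell
# Recreate a user-defined bridge network that was lost when
# local-kv.db was removed ("app-net" and the subnet are placeholders).
docker network create -d bridge --subnet 172.25.0.0/16 app-net

# Reattach an existing container to the recreated network
# ("my-container" is a placeholder).
docker network connect app-net my-container
```

Both commands are available from Docker 1.9 onward, the release in use in this report.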
It's a file corruption due to some event (could be timing related). The best way to know would be to look at the daemon logs (preferably with debug logging enabled).
You don't know how much you are helping us by reporting the issue so that we can resolve it in the next release. Thank you very much. OSS FTW!
Easiest would be to put it in Dropbox/Box or another file-sharing site and share the link here, or send the link to my email madhu@docker.com. Also, can you please confirm whether you have created a custom docker network?
@thaJeztah this is already on the radar, yes, but we don't have a solid use-case to reproduce it yet. We can close this, but let's dupe it to the other open issue appropriately.
@mavenugo I sent the link to your e-mail address. Thank you for looking into it!
@vlerenc can you please answer this question as well:
@mavenugo No, we haven't.
@vlerenc thanks for sharing the corrupted local-kv.db. I analyzed it and found that there are/were a bunch of containers running in
and the answer is
@vlerenc can you please attach the traceback for
Easiest thing to do is to
@vlerenc never mind, I found the issue and am fixing it now. For confirmation, this is not a file-corruption issue. Rather, it is a scenario of using a global kv-store to try out multi-host networking, followed by an unclean daemon shutdown, and then restarting the daemon with the kv-store down/not configured. Thanks for reporting and sharing all the requested information.
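As a rough illustration of the sequence described above (the Consul address and interface are hypothetical placeholders; the actual kv-store configuration was not shared in this thread):

```shell
# 1. Daemon started with a global kv-store for multi-host networking
#    (Docker 1.9 syntax; the Consul address is a placeholder).
docker daemon --cluster-store=consul://127.0.0.1:8500 \
              --cluster-advertise=eth0:2376

# 2. The daemon is killed ungracefully (e.g. VM teardown), leaving
#    stale network state behind in
#    /var/lib/docker/network/files/local-kv.db.

# 3. The daemon is later restarted WITHOUT the kv-store configured
#    (or with it unreachable), so the stale entries cannot be resolved:
docker daemon
```

This matches the report's environment, where VMs are recreated automatically and the daemon can be hard-killed during that process.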
@mavenugo Glad you found it! As for your question, I am sorry: we are using https://github.com/cloudfoundry-community/cf-containers-broker, which uses https://github.com/swipely/docker-api, and when checking I didn't see any "special network" code in https://github.com/cloudfoundry-community/cf-containers-broker/blob/master/app/models/docker_manager.rb. So I am actually puzzled by your finding. Anyhow, we have hundreds of users and containers, and after deleting the DB and restarting, nobody complained; the existing, now-restarted containers run just fine. They all run on their own, reachable only through the exposed ports by the applications that use them.
BUG REPORT INFORMATION (may be related to #19988)

Output of `docker version`:

Output of `docker info`:

Provide additional environment details (AWS, VirtualBox, physical, etc.):
OpenStack VM, 4 cores, 8 GB mem, Ubuntu 14.04.4 LTS, Linux 2b320682-1875-4548-a1e2-29d15b11b5a8 3.19.0-49-generic #55~14.04.1hf1533043v20160201b1-Ubuntu SMP Mon Feb 1 20:41:00 UT x86_64 x86_64 x86_64 GNU/Linux.
List the steps to reproduce the issue:
Started by script for months without problems, but now failing. It also fails after a Docker update and an OS update. Here is the launch command we use (230+ containers, all started with restart policy = always):
Describe the results you received:
...
Describe the results you expected:
Daemon starts, all containers start.
Provide additional info you think is important:
We hadn't restarted the Docker daemon for two weeks. Maybe there are too many containers deployed now? But we have already run daemons with 300+ containers.
Maybe something went wrong during shutdown/the VM was terminated? The VM gets automatically recreated/reprovisioned and the former volume gets reattached. Could there be some issue with a hard kill?
In any case, the crash when relaunching the daemon doesn't look good and seems like a real issue.
Do you have short-term remedy proposals? Is there a way to force a cleanup? Or some way to start the daemon, but not all the containers (all were started with restart policy = always)?
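One possible workaround, sketched here under the assumption of a newer Docker version: `docker update --restart` only exists from Docker 1.11 onward, so it would not have applied to the 1.9 daemon in this report.

```shell
# Once the daemon is up, clear the restart policy on all containers so
# they are not auto-started on the next daemon restart.
# Requires docker update with --restart support (Docker 1.11+).
docker update --restart=no $(docker ps -aq)
```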
In short, we see tons of:
... and then it crashes in: