docker run occasionally fails with Error: No such container #1319

Closed
AtnNn opened this issue Jul 27, 2013 · 36 comments

@AtnNn commented Jul 27, 2013

$ for i in `seq 10`; do docker run -d base true & done; wait
0efb018be35b
2013/07/26 19:34:30 Error: No such container: a4cad32eddcb
2013/07/26 19:34:30 Error: No such container: 350f31c1121c
2013/07/26 19:34:30 Error: No such container: 0a4371dc7d51
8178fdf5ea16
63b7ae070c19
1b7fe339819c
58d6140aa618
9f894dd1fa58
d98da270e090

$ docker version
Client version: 0.5.0-dev
Server version: 0.5.0-dev
Git commit: c01d17d
Go version: go1.1
@vieux (Contributor) commented Jul 29, 2013

Can't reproduce. Could you start your daemon in debug mode and paste the docker info output, please?

Thanks.

@AtnNn (Author) commented Jul 29, 2013

I think it only happens when I run many docker run commands in parallel.

$ for i in `seq 2`; do docker -D run -d base true & done; wait
2013/07/29 11:12:16 Error: No such container: c2ada9df5af8
3081aa3d7171

Server output:

[debug] api.go:918 Calling POST /containers/create from 127.0.0.1:43979
2013/07/29 11:12:15 POST /v1.3/containers/create
[debug] api.go:918 Calling POST /containers/create from 127.0.0.1:43980
2013/07/29 11:12:15 POST /v1.3/containers/create
[debug] api.go:918 Calling POST /containers/{name:.*}/start from 127.0.0.1:43981
2013/07/29 11:12:16 POST /v1.3/containers/c2ada9df5af8/start
[debug] api.go:64 [error 404] No such container: c2ada9df5af8
[debug] api.go:918 Calling POST /containers/{name:.*}/start from 127.0.0.1:43982
2013/07/29 11:12:16 POST /v1.3/containers/3081aa3d7171/start
[debug] container.go:795 Waiting for process
[debug] container.go:808 Process finished
$ docker -D info
Containers: 5996
Images: 11
Debug mode (server): true
Debug mode (client): true
Fds: 28
Goroutines: 51
LXC Version: 0.7.5
EventsListeners: 264
Kernel Version: 3.8.0-26-generic
WARNING: No swap limit support

@AtnNn (Author) commented Jul 29, 2013

Note that the container was created:

$ docker ps -a | grep c2ada9df5af8
c2ada9df5af8        base:latest            true                   51 minutes ago      Exit 0

@apocas (Contributor) commented Aug 7, 2013

I'm also suffering from this, but with start.
https://gist.github.com/apocas/6170410

This is rare: it happened ~20 times in ~37,000 containers created.

@apocas (Contributor) commented Aug 11, 2013

I found something interesting. If I remove all containers except the problematic one, the next start call will work.

EDIT: You don't need to remove all containers; just do something that manipulates the container pool (create, remove, ... another container) and the problematic container will be found in the next call.

@jpetazzo (Contributor)

Can you try with the latest release of Docker to see if the problem still happens?
Thank you!

@apocas (Contributor) commented Oct 28, 2013

Docker version 0.6.4, build 2f74b1c

Still happening.

@jpetazzo (Contributor)

Thanks a lot. Can you confirm the exact command(s) that you used to reproduce?
I tried with Docker 0.6.4 on my Ubuntu 12.04 VM, and this works fine:

for i in `seq 10`; do docker run -d base true & done; wait

@apocas (Contributor) commented Nov 1, 2013

Statistically that test doesn't have much impact (it happens ~20 times per 30,000 containers). And if you manipulate another container, the previous one that was stuck gets unstuck (it's like a race condition of some sort).

I figured out the best way to reproduce this is to loop: create -> start -> wait -> remove (one container at a time).

I hacked something up very quickly to try to reproduce this consistently, but so far I haven't been able to. It does happen in production, though, which leads me to think it may be related to the container image, since I'm using the base image in my tests.

If you want, I can give you SSH access when this happens; it's going to be a node from nodechecker.com, so nothing critical, and you can mess with it :)

@jpetazzo (Contributor) commented Nov 1, 2013

Hmmm, SSH access won't help a lot (mainly because I might not be the right person to inspect that), unfortunately! :-)

If you happen to find a way to reproduce it, that would be awesome. Also, if it becomes more annoying, don't hesitate to ping again; I'll see with the rest of the team how we could work on that!

@sylvinus

Here is a script that can reproduce these errors with docker 0.6.7 on my Vagrant box (standard Mac OS install path):

#!/usr/bin/env python
import multiprocessing
import os

SLEEP_TIME = 0
PARALLEL_CONTAINERS = 30
TOTAL_CONTAINERS = 1000


def run(i):
    # Each worker runs one container that sleeps, then echoes its index.
    os.system("docker run -t stackbrew/ubuntu:saucy sh -c 'sleep %s ; echo %s'" % (SLEEP_TIME, i))

if __name__ == "__main__":
    # Launch TOTAL_CONTAINERS containers, PARALLEL_CONTAINERS at a time.
    pool = multiprocessing.Pool(PARALLEL_CONTAINERS)
    pool.map(run, range(TOTAL_CONTAINERS))

If you play a bit with the parameters you can see different behaviours. I'm getting 3 kinds of errors for now:

No such container: 233603b5f5197a3003d80963f2b160d62279a9e3fdef68479d310e74b9746128
[error] commands.go:2453 Error resize: Error: bad file descriptor
2013/11/24 18:26:49 Error: create: Conflict, The name /red_sloth is already assigned to 7db618058dfb. You have to delete (or rename) that container to be able to assign /red_sloth to a container again.

Of these, I think the first one is the most important to fix ASAP: it makes Docker quite unreliable when starting lots of containers (which most hosting providers will be doing, I guess).

When I remove the -t option, I stop getting the "bad file descriptor" error but get this one instead, along with the "no such container" ones:

[error] commands.go:2415 Error receiveStdout: Unrecognized input header

Thanks!

@sylvinus

Got this new error today with another image:

Error: start: Cannot start container c55314993fff607288e27bc876e687f0ae96966b447f1d0df8e52d42fe0d87f6: 
iptables failed: iptables -I FORWARD -i docker0 -o docker0 -p tcp -s 172.17.0.122 --sport 6379 -d 172.17.0.130 -j ACCEPT: iptables: Resource temporarily unavailable.

@gravis commented Nov 27, 2013

FYI, this is still present in docker 0.7.0.

@codeaholics (Contributor)

See #2911 for a whole load of other errors that can occur when you run containers in parallel.

@sylvinus

There are fewer errors now, but "No such container" is still happening in 0.7.1; sample output of my script:

235
59
2013/12/10 11:10:30 Unrecognized input header
[error] commands.go:2399 Error receiveStdout: Unrecognized input header
136
2013/12/10 11:10:33 Error: start: No such container: df461ad15ec70994e0806dfbecd6395f41c0e4ef2da39d802f02effcc4ca3265
161
9
109
210
186
60
236

@vieux (Contributor) commented Dec 10, 2013

@sylvinus I'll work on this tomorrow

@sylvinus

Excellent, thanks :)

@codeaholics (Contributor)

@vieux, check out #2911, which documents a whole load of other error conditions that can occur.

@sylvinus

@vieux any news on this? thanks!

@sylvinus commented Jan 1, 2014

ping?

@shykes (Contributor) commented Jan 6, 2014

Tentatively scheduling for 0.8

@sylvinus commented Jan 6, 2014

That's great, thanks!

@jpallen (Contributor) commented Jan 15, 2014

I've hit this issue in production too, and after a bit of digging, I'm pretty sure the cause is a race condition in TruncIndex. (Disclaimer: this is my first foray into Go and I only have a loose idea of the threading model.) When a new container is created, its id is inserted into an instance of TruncIndex, and the following three lines update the index (lines 450-452 of utils/utils.go):

idx.ids[id] = true
idx.bytes = append(idx.bytes, []byte(id+" ")...)
idx.index = suffixarray.New(idx.bytes)

idx.index is overwritten wholesale, so if two containers are created at the same time in two separate threads, it's possible for one update to override the other, leaving one of the container ids missing from idx.index. That container id is then not found when trying to start the container.

Since only idx.index is updated destructively, this also explains why another update causes the container to reappear. The idx.bytes array still contains the container id, and another update will put it into idx.index correctly.

A simple solution to this would be to add a lock around the read/write methods of TruncIndex, and I'm looking into coding a test case before submitting a patch that does that. However, it might be more appropriate to refactor TruncIndex to use a data structure which doesn't have concurrency issues. In the second case I don't have enough Go knowledge to know what might be appropriate. Any thoughts?
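To make the idea concrete, here's a rough sketch of what adding a lock around those methods could look like. The struct fields match the snippet above; the method bodies are hypothetical and only meant to show where the lock would sit, not the actual Docker implementation:

package utils

import (
    "fmt"
    "index/suffixarray"
    "sync"
)

type TruncIndex struct {
    sync.RWMutex // guards all of the fields below
    index *suffixarray.Index
    ids   map[string]bool
    bytes []byte
}

func NewTruncIndex() *TruncIndex {
    return &TruncIndex{ids: make(map[string]bool)}
}

// Add registers a new container id under the write lock, so two concurrent
// inserts can no longer rebuild idx.index from each other's stale idx.bytes.
func (idx *TruncIndex) Add(id string) error {
    idx.Lock()
    defer idx.Unlock()
    if _, exists := idx.ids[id]; exists {
        return fmt.Errorf("id already exists: %s", id)
    }
    idx.ids[id] = true
    idx.bytes = append(idx.bytes, []byte(id+" ")...)
    idx.index = suffixarray.New(idx.bytes)
    return nil
}

// Get resolves an id under the read lock, so lookups never observe a
// half-updated index.
func (idx *TruncIndex) Get(s string) (string, error) {
    idx.RLock()
    defer idx.RUnlock()
    if _, exists := idx.ids[s]; exists {
        return s, nil
    }
    // The existing prefix lookup against idx.index would go here, unchanged
    // except for running under the read lock.
    return "", fmt.Errorf("no such id: %s", s)
}

An RWMutex (rather than a plain Mutex) would keep concurrent lookups from blocking each other, since reads vastly outnumber container creations.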

@crosbymichael (Contributor)

@jpallen I think an RWMutex would be fine on the index.

@crosbymichael (Contributor)

I can no longer reproduce this, and the last PRs fixing the races in these areas have been merged.

@peikk0 commented Mar 20, 2014

Got this error again with docker 0.9.0:

% docker ps -a --no-trunc | head -n2
CONTAINER ID                                                       IMAGE                           COMMAND                                                                                                                                                  CREATED             STATUS              PORTS               NAMES
913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc   precise-with-updates:20140320   /bin/bash                                                                                                                                                47 minutes ago      Exit 0                                  cocky_ritchie
% docker start 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc
Error: Cannot start container 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc: Container 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc not found. Impossible to mount its volumes
2014/03/20 16:08:29 Error: failed to start one or more containers

I was building an image using https://github.com/racker/docker-ubuntu-with-updates and got the error on the flatten task.

@jpetazzo reopened this Mar 20, 2014
@jpetazzo (Contributor)

Does it happen all the time, or randomly...?
Do you have the output of the docker daemon?
If you can reproduce with the docker daemon running in debug mode (-D), the output will probably be very helpful.
Thanks!

@peikk0 commented Mar 20, 2014

It happens all the time with this particular job (other containers may start just fine). Here is the debug output from the daemon, but it doesn't look verbose enough:

[debug] server.go:925 Calling POST /containers/{name:.*}/start
2014/03/20 19:08:37 POST /v1.10/containers/718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9/start
[/var/lib/docker|4d1c77fc] +job start(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] +job allocate_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] -job allocate_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = OK (0)
[/var/lib/docker|4d1c77fc] +job release_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] -job release_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = OK (0)
Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes
[/var/lib/docker|4d1c77fc] -job start(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = ERR (1)
[error] server.go:951 Error: Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes
[error] server.go:86 HTTP Error: statusCode=406 Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes

The daemon is lxc-docker 0.9.0 from the docker.io repositories, running on Ubuntu Server 12.04 with kernel 3.8.0-37-generic, and I got the error with boot2docker too. I reproduce it by running make precise on an unaltered copy of https://github.com/racker/docker-ubuntu-with-updates, or just by re-running docker start <failing container> manually.

@jpetazzo removed this from the 0.8.0 milestone Mar 24, 2014
@jpetazzo (Contributor)

Thanks for the details, very useful.

I cleared the milestone tag since it's still happening on 0.9.

/cc @crosbymichael (not sure who would be the right person...)

@peikk0 commented Apr 9, 2014

The problem disappeared with docker 0.10. :)

@jpetazzo (Contributor) commented Apr 9, 2014

Awesome, thanks Pierre! Closing this.

@jpetazzo closed this as completed Apr 9, 2014
@ntmggr commented Jun 23, 2014

This issue looks similar to this bug: deis/deis#1208

@maratbn commented May 9, 2016

Just for the record, I received this error message earlier this evening on:

Docker version 1.11.1, build 5604cbe on CentOS Linux release 7.2.1511 (Core), kernel 3.10.0-229.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16)), running inside:

VirtualBox 5.0.10 r104061 running on:

Ubuntu 16.04 LTS kernel 4.4.0-22-generic (buildd@lcy01-32) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #39-Ubuntu SMP running as:

guest on libvirt 1.3.1-r1 running on:

Gentoo Linux with kernel 4.1.15-gentoo-r1

The problem did not recur the second time the script setting up the Docker image was run, so a race condition may still be in effect, at least for Docker version 1.11.1.

Perhaps it has a higher likelihood with nested VMs.

(screenshot attached)

@sauravmndl-zz

Getting the same issue with docker 1.12.3.

@pandaycp

I have the following error, similar to the above, but my server already has overlay as the storage driver.
Please tell me how to fix it; this issue is occurring multiple times.

Error response from daemon: Cannot kill container mastercdnEmbms_9184: No such container: mastercdnEmbms_9184

$docker info
Containers: 34
Running: 23
Paused: 0
Stopped: 11
Images: 18
Server Version: 1.11.2
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null
Kernel Version: 4.7.3-coreos-r2
Operating System: CoreOS 1185.3.0 (MoreOS)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.42 GiB
Name: dockerci-cp.novalocal
ID: PEJJ:YY47:R4CD:T7NT:MRBL:LUZO:DRRU:ULFQ:74OE:W2GQ:TKOW:KSGL
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://10.175.250.81:8080
Https Proxy: http://10.175.250.81:8080
No Proxy: cnshdocker.sh.cn.ao.ericsson.se
Registry: https://index.docker.io/v1/

$uname -a
Linux dockerci-cp.novalocal 4.7.3-coreos-r2 #1 SMP Tue Nov 1 01:38:43 UTC 2016 x86_64 Intel Xeon E312xx (Sandy Bridge) GenuineIntel GNU/Linux

@thaJeztah (Member)

@pandaycp docker 1.11.2 reached EOL over a year ago, and there's not a lot of information in your comment to work with (also note that CoreOS is not a supported platform; issues with those packages should be reported to CoreOS, who maintains them).

You're commenting on an issue that's almost 5 years old, and was reported with a completely different code-base and runtime, so any issue you run into with a current version of Docker is likely not related.

Keep in mind that the GitHub issue tracker is not intended as a general support forum, but for reporting bugs and feature requests. For other types of questions, consider using one of the community support channels.

I'm locking the conversation on this issue because of the above, and to prevent it from collecting unrelated issues.

If you arrive on this issue because you're encountering a problem on a current version, and suspect there's a bug at hand, please open a new issue, providing the information that's requested in the issue template.

@moby locked and limited conversation to collaborators Oct 25, 2017