docker run occasionally fails with Error: No such container #1319

Closed
AtnNn opened this issue Jul 27, 2013 · 36 comments

@AtnNn commented Jul 27, 2013

$ for i in `seq 10`; do docker run -d base true & done; wait
0efb018be35b
2013/07/26 19:34:30 Error: No such container: a4cad32eddcb
2013/07/26 19:34:30 Error: No such container: 350f31c1121c
2013/07/26 19:34:30 Error: No such container: 0a4371dc7d51
8178fdf5ea16
63b7ae070c19
1b7fe339819c
58d6140aa618
9f894dd1fa58
d98da270e090

$ docker version
Client version: 0.5.0-dev
Server version: 0.5.0-dev
Git commit: c01d17d
Go version: go1.1
@vieux (Contributor) commented Jul 29, 2013

Can't reproduce. Could you start your daemon in debug mode and paste the docker info output, please?

Thanks.

@AtnNn (Author) commented Jul 29, 2013

I think it only happens when I run many docker run commands in parallel.

$ for i in `seq 2`; do docker -D run -d base true & done; wait
2013/07/29 11:12:16 Error: No such container: c2ada9df5af8
3081aa3d7171

Server output:

[debug] api.go:918 Calling POST /containers/create from 127.0.0.1:43979
2013/07/29 11:12:15 POST /v1.3/containers/create
[debug] api.go:918 Calling POST /containers/create from 127.0.0.1:43980
2013/07/29 11:12:15 POST /v1.3/containers/create
[debug] api.go:918 Calling POST /containers/{name:.*}/start from 127.0.0.1:43981
2013/07/29 11:12:16 POST /v1.3/containers/c2ada9df5af8/start
[debug] api.go:64 [error 404] No such container: c2ada9df5af8
[debug] api.go:918 Calling POST /containers/{name:.*}/start from 127.0.0.1:43982
2013/07/29 11:12:16 POST /v1.3/containers/3081aa3d7171/start
[debug] container.go:795 Waiting for process
[debug] container.go:808 Process finished
$ docker -D info
Containers: 5996
Images: 11
Debug mode (server): true
Debug mode (client): true
Fds: 28
Goroutines: 51
LXC Version: 0.7.5
EventsListeners: 264
Kernel Version: 3.8.0-26-generic
WARNING: No swap limit support

@AtnNn (Author) commented Jul 29, 2013

Note that the container was created:

$ docker ps -a | grep c2ada9df5af8
c2ada9df5af8        base:latest            true                   51 minutes ago      Exit 0

@apocas (Contributor) commented Aug 7, 2013

I'm also suffering from this, but with start.
https://gist.github.com/apocas/6170410

This is rare: it happened ~20 times in ~37,000 containers created.

@apocas (Contributor) commented Aug 11, 2013

I found something interesting. If I remove all containers except the problematic one, the next start call will work.

EDIT: You don't need to remove all containers; just do something that manipulates the container pool (create, remove, ... another container) and the problematic container will be found in the next call.

@jpetazzo (Contributor)

Can you try with the latest release of Docker to see if the problem still happens?
Thank you!

@apocas (Contributor) commented Oct 28, 2013

Docker version 0.6.4, build 2f74b1c

Still happening.

@jpetazzo (Contributor)

Thanks a lot. Can you confirm the exact command(s) that you used to reproduce?
I tried with Docker 0.6.4 on my Ubuntu 12.04 VM, and this works fine:

for i in `seq 10`; do docker run -d base true & done; wait

@apocas (Contributor) commented Nov 1, 2013

Statistically that test doesn't have much impact (it happens ~20 times per 30,000 containers). And if you manipulate another container, the previous one that was stuck gets unstuck (it's like a race condition of some sort).

I figured out the best way to reproduce this is to loop: create -> start -> wait -> remove (one container at a time).

I hacked something up very quickly to try to reproduce this consistently, but so far I haven't been able to. It does happen in production, though, which leads me to think it may be related to the container image, since I'm using the base image in my tests.

If you want, I can give you SSH access when this happens; it's going to be a node from nodechecker.com, so nothing critical, and you can mess with it :)

@jpetazzo (Contributor) commented Nov 1, 2013

Hmmm, SSH access won't help a lot (mainly because I might not be the right person to inspect that), unfortunately! :-)

If you happen to find a way to reproduce it, that would be awesome. Also, if it becomes more annoying, don't hesitate to ping again; I'll see with the rest of the team how we could work on that!

@sylvinus

Here is a script that can reproduce these errors with docker 0.6.7 on my Vagrant box (standard Mac OS install path):

#!/usr/bin/env python
import multiprocessing
import os

SLEEP_TIME = 0
PARALLEL_CONTAINERS = 30
TOTAL_CONTAINERS = 1000


def run(i):
    # Each worker runs one container that sleeps, then echoes its index.
    os.system("docker run -t stackbrew/ubuntu:saucy sh -c 'sleep %s ; echo %s'" % (SLEEP_TIME, i))

if __name__ == "__main__":
    # Launch TOTAL_CONTAINERS containers, PARALLEL_CONTAINERS at a time.
    pool = multiprocessing.Pool(PARALLEL_CONTAINERS)
    pool.map(run, range(TOTAL_CONTAINERS))

If you play a bit with the parameters you can see different behaviours. I'm getting 3 kinds of errors for now:

No such container: 233603b5f5197a3003d80963f2b160d62279a9e3fdef68479d310e74b9746128
[error] commands.go:2453 Error resize: Error: bad file descriptor
2013/11/24 18:26:49 Error: create: Conflict, The name /red_sloth is already assigned to 7db618058dfb. You have to delete (or rename) that container to be able to assign /red_sloth to a container again.

Of these, I think the first one is the most important to fix ASAP: it makes Docker quite unreliable when starting lots of containers (which most hosting providers will be doing, I guess).

When I remove the -t option, I stop getting the "bad file descriptor" error but get this one instead, along with the "no such container" ones:

[error] commands.go:2415 Error receiveStdout: Unrecognized input header

Thanks!

@sylvinus

Got this new error today with another image:

Error: start: Cannot start container c55314993fff607288e27bc876e687f0ae96966b447f1d0df8e52d42fe0d87f6: 
iptables failed: iptables -I FORWARD -i docker0 -o docker0 -p tcp -s 172.17.0.122 --sport 6379 -d 172.17.0.130 -j ACCEPT: iptables: Resource temporarily unavailable.

@gravis commented Nov 27, 2013

FYI, this is still present in docker 0.7.0.

@codeaholics (Contributor)

See #2911 for a whole load of other errors that can occur when you run containers in parallel.

@sylvinus

There are fewer errors now, but "No such container" is still happening in 0.7.1; sample output of my script:

235
59
2013/12/10 11:10:30 Unrecognized input header
[error] commands.go:2399 Error receiveStdout: Unrecognized input header
136
2013/12/10 11:10:33 Error: start: No such container: df461ad15ec70994e0806dfbecd6395f41c0e4ef2da39d802f02effcc4ca3265
161
9
109
210
186
60
236

@vieux (Contributor) commented Dec 10, 2013

@sylvinus I'll work on this tomorrow

@sylvinus

Excellent, thanks :)

@codeaholics (Contributor)

@vieux, check out #2911, which documents a whole load of other error conditions that can occur.

@sylvinus

@vieux any news on this? thanks!

@sylvinus commented Jan 1, 2014

ping?

@shykes (Contributor) commented Jan 6, 2014

Tentatively scheduling for 0.8

@sylvinus commented Jan 6, 2014

That's great, thanks!

@jpallen (Contributor) commented Jan 15, 2014

I've hit this issue in production too, and after a bit of digging, I'm pretty sure the cause is a race condition in TruncIndex. (Disclaimer: this is my first foray into Go and I only have a loose idea of the threading model.) When a new container is created, its id is inserted into an instance of TruncIndex, and the following three lines update the index (lines 450-452 of utils/utils.go):

idx.ids[id] = true
idx.bytes = append(idx.bytes, []byte(id+" ")...)
idx.index = suffixarray.New(idx.bytes)

idx.index is overwritten wholesale, so if two containers are created at the same time in two separate threads, it's possible for one update to override the other, leaving one of the container ids missing from idx.index. That container id is then not found when trying to start the container.

Since only idx.index is updated destructively, this also explains why another update causes the container to reappear. The idx.bytes array still contains the container id, and another update will put it into idx.index correctly.

A simple solution to this would be to add a lock around the read/write methods of TruncIndex, and I'm looking into coding a test case before submitting a patch that does that. However, it might be more appropriate to refactor TruncIndex to use a data structure which doesn't have concurrency issues. In the second case I don't have enough Go knowledge to know what might be appropriate. Any thoughts?
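To make the idea concrete, here's a rough sketch of what adding a lock around those methods could look like. The struct fields match the snippet above; the method bodies are hypothetical and only meant to show where the lock would sit, not the actual Docker implementation:

package utils

import (
    "fmt"
    "index/suffixarray"
    "sync"
)

type TruncIndex struct {
    sync.RWMutex // guards all of the fields below
    index *suffixarray.Index
    ids   map[string]bool
    bytes []byte
}

func NewTruncIndex() *TruncIndex {
    return &TruncIndex{ids: make(map[string]bool)}
}

// Add registers a new container id under the write lock, so two concurrent
// inserts can no longer rebuild idx.index from each other's stale idx.bytes.
func (idx *TruncIndex) Add(id string) error {
    idx.Lock()
    defer idx.Unlock()
    if _, exists := idx.ids[id]; exists {
        return fmt.Errorf("id already exists: %s", id)
    }
    idx.ids[id] = true
    idx.bytes = append(idx.bytes, []byte(id+" ")...)
    idx.index = suffixarray.New(idx.bytes)
    return nil
}

// Get resolves an id under the read lock, so lookups never observe a
// half-updated index.
func (idx *TruncIndex) Get(s string) (string, error) {
    idx.RLock()
    defer idx.RUnlock()
    if _, exists := idx.ids[s]; exists {
        return s, nil
    }
    // The existing prefix lookup against idx.index would go here, unchanged
    // except for running under the read lock.
    return "", fmt.Errorf("no such id: %s", s)
}

An RWMutex (rather than a plain Mutex) would keep concurrent lookups from blocking each other, since reads vastly outnumber container creations.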

@crosbymichael (Contributor)

@jpallen I think an RWMutex would be fine on the index.

@crosbymichael (Contributor)

I can no longer reproduce this, and the last PRs fixing the races in these areas have been merged.

@peikk0 commented Mar 20, 2014

Got this error again with docker 0.9.0:

% docker ps -a --no-trunc | head -n2
CONTAINER ID                                                       IMAGE                           COMMAND                                                                                                                                                  CREATED             STATUS              PORTS               NAMES
913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc   precise-with-updates:20140320   /bin/bash                                                                                                                                                47 minutes ago      Exit 0                                  cocky_ritchie
% docker start 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc
Error: Cannot start container 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc: Container 913d4146e09772c525ff3bbb3215100f24df4699ef7b4ab189400e57fc9d0cbc not found. Impossible to mount its volumes
2014/03/20 16:08:29 Error: failed to start one or more containers

I was building an image using https://github.com/racker/docker-ubuntu-with-updates and got the error on the flatten task.

@jpetazzo reopened this Mar 20, 2014
@jpetazzo (Contributor)

Does it happen all the time, or randomly...?
Do you have the output of the docker daemon?
If you can reproduce with the docker daemon running in debug mode (-D), the output will probably be very helpful.
Thanks!

@peikk0 commented Mar 20, 2014

It happens all the time with this particular job (other containers may start just fine). Here is the debug output from the daemon, but it doesn't look verbose enough:

[debug] server.go:925 Calling POST /containers/{name:.*}/start
2014/03/20 19:08:37 POST /v1.10/containers/718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9/start
[/var/lib/docker|4d1c77fc] +job start(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] +job allocate_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] -job allocate_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = OK (0)
[/var/lib/docker|4d1c77fc] +job release_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9)
[/var/lib/docker|4d1c77fc] -job release_interface(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = OK (0)
Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes
[/var/lib/docker|4d1c77fc] -job start(718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9) = ERR (1)
[error] server.go:951 Error: Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes
[error] server.go:86 HTTP Error: statusCode=406 Cannot start container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9: Container 718e930e9b3655c232a9d56f027565ec5b5ba8fcf0ac8abfac70302979283bb9 not found. Impossible to mount its volumes

The daemon is lxc-docker 0.9.0 from the docker.io repositories, running on Ubuntu Server 12.04 with kernel 3.8.0-37-generic, and I got the error with boot2docker too. I reproduce it by running make precise on an unaltered copy of https://github.com/racker/docker-ubuntu-with-updates, or just by re-running docker start <failing container> manually.

@jpetazzo removed this from the 0.8.0 milestone Mar 24, 2014
@jpetazzo (Contributor)

Thanks for the details, very useful.

I cleared the milestone tag since it's still happening on 0.9.

/cc @crosbymichael (not sure who would be the right person...)

@peikk0 commented Apr 9, 2014

The problem disappeared with docker 0.10. :)

@jpetazzo (Contributor) commented Apr 9, 2014

Awesome, thanks Pierre! Closing this.

@jpetazzo closed this as completed Apr 9, 2014
@ntmggr commented Jun 23, 2014

This issue looks similar to this bug: deis/deis#1208

@maratbn commented May 9, 2016

Just for the record, I received this error message earlier this evening on:

Docker version 1.11.1, build 5604cbe on CentOS Linux release 7.2.1511 (Core), kernel 3.10.0-229.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16)), running inside:

VirtualBox 5.0.10 r104061 running on:

Ubuntu 16.04 LTS kernel 4.4.0-22-generic (buildd@lcy01-32) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #39-Ubuntu SMP running as:

guest on libvirt 1.3.1-r1 running on:

Gentoo Linux with kernel 4.1.15-gentoo-r1

The problem did not recur the second time the script setting up the Docker image was run, so a race condition may still be in effect, at least for Docker version 1.11.1.

Perhaps it has a higher likelihood with nested VMs.

(screenshot attached)

@sauravmndl-zz

Getting the same issue with docker 1.12.3.

@pandaycp

I have the following error, similar to the above, but my server already has overlay as the storage driver.
Please tell me how to fix it; this issue is occurring multiple times.

Error response from daemon: Cannot kill container mastercdnEmbms_9184: No such container: mastercdnEmbms_9184

$docker info
Containers: 34
Running: 23
Paused: 0
Stopped: 11
Images: 18
Server Version: 1.11.2
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null
Kernel Version: 4.7.3-coreos-r2
Operating System: CoreOS 1185.3.0 (MoreOS)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.42 GiB
Name: dockerci-cp.novalocal
ID: PEJJ:YY47:R4CD:T7NT:MRBL:LUZO:DRRU:ULFQ:74OE:W2GQ:TKOW:KSGL
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://10.175.250.81:8080
Https Proxy: http://10.175.250.81:8080
No Proxy: cnshdocker.sh.cn.ao.ericsson.se
Registry: https://index.docker.io/v1/

$uname -a
Linux dockerci-cp.novalocal 4.7.3-coreos-r2 #1 SMP Tue Nov 1 01:38:43 UTC 2016 x86_64 Intel Xeon E312xx (Sandy Bridge) GenuineIntel GNU/Linux

@thaJeztah (Member)

@pandaycp docker 1.11.2 reached EOL over a year ago, and there's not a lot of information in your comment to work with (also note that CoreOS is not a supported platform; issues with those packages should be reported to CoreOS, who maintains them).

You're commenting on an issue that's almost 5 years old, and was reported with a completely different code-base and runtime, so any issue you run into with a current version of Docker is likely not related.

Keep in mind that the GitHub issue tracker is not intended as a general support forum, but for reporting bugs and feature requests. For other types of questions, consider using one of the community support channels.

I'm locking the conversation on this issue because of the above, and to prevent it from collecting unrelated issues.

If you arrive on this issue because you're encountering a problem on a current version, and suspect there's a bug at hand, please open a new issue, providing the information that's requested in the issue template.

@moby locked and limited conversation to collaborators Oct 25, 2017