docker kill leaves directories behind. #197

Closed
simonjohansson opened this Issue Mar 26, 2013 · 21 comments

@simonjohansson

Doing a docker kill UUID has left some directories behind in /var/lib/docker/containers.
Doing an ls reveals:

$ ls /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/
ls: cannot access /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/rootfs: Stale NFS file handle
rootfs

I am running Docker inside a rather slow VirtualBox VM (Ubuntu 12.04, 3.5.0-23-generic). Right now I have 7 of these directories; two of them come from containers where I made big changes (apt-get update), the other five were only "echo hello world" containers.

Relevant IRC-chat

23:11 < DinMamma> Ah, this is interesting, when looking into the containers in /var/lib/docker/containers I get "ls: cannot access 
                  rootfs: Stale NFS file handle"
23:11 < DinMamma> So I wonder if this is an issue with my system rather than docker.
23:11 <@shykes> DinMamma: no, this is a known issue with aufs, which we thought we had neutralized
23:12 <@shykes> basically aufs umount is asynchronous
23:12 <@shykes> it does background cleanup
23:12 <@shykes> if you remove the mountpoint too quickly before aufs is done with cleanup, it gets stuck
23:12 <@shykes> and you get that error message
23:13 < DinMamma> I should say that I am running my tests inside a rather slow virtualbox-vm.
23:13 <@shykes> I'm surprised that you hit this. We have a workaround which includes checking the stat() on the mountpoint in a loop, 
                until its inode changes
23:19 <@shykes> DinMamma: so am I :)
23:19 <@shykes> mmm that could be it
23:20 <@shykes> DinMamma: did one of these containers have a lot of filesystem changes on them?
23:20 <@shykes> like a big apt-get, or something like that?
23:20 < DinMamma> Yep
23:20 < DinMamma> Two of them.
23:20 <@shykes> maybe slow machine + lots of data on the aufs rw layer means -> our workaround timed out, and gave up waiting for aufs
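
For reference, the stat()-in-a-loop workaround shykes mentions above boils down to something like the sketch below (a rough Python illustration of the idea, not Docker's actual code; the function name, timeout and poll interval are made up):

#!/usr/bin/python
# Rough sketch of the workaround described above: after asking aufs to
# unmount, poll stat() on the mountpoint until its inode changes (meaning
# the mount is really gone), and give up after a while on a slow machine.
import os
import time

def wait_for_unmount(mountpoint, timeout=10.0, interval=0.1):
    before = os.stat(mountpoint).st_ino
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.stat(mountpoint).st_ino != before:
            return True   # inode changed: the aufs mount is gone
        time.sleep(interval)
    return False          # timed out; aufs cleanup is still in progress
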
@shykes
Contributor
shykes commented Mar 26, 2013

Just an extra comment: it is normal for 'docker kill' to leave the container directory behind. By default all containers are stored, so you can inspect their filesystem state, commit them into images, restart them, etc.

But of course it is not normal to see "stale NFS handle" errors :)

@vieux
Member
vieux commented Apr 11, 2013

I can't reproduce.

My host is Ubuntu 12.10 and I used the base image as the guest.
Can anybody reproduce?

@hansent
Contributor
hansent commented Apr 15, 2013

Is there a way to manually repair the directory so I can delete the directories without rebooting the host?

@shykes
Contributor
shykes commented Apr 15, 2013

Not that I know of. Note that there is no known side-effect outside the scope of that container.


@shykes
Contributor
shykes commented Apr 23, 2013

As discussed earlier, this is probably due to the asynchronous nature of aufs unmount.

I'm downgrading this to a minor bug, since:

a) it occurs very rarely (1 known occurrence so far),
b) it has no impact on the behavior of Docker or the system,
c) it's very hard to reproduce.

@ricardoamaro

+1 on a fix for this since I just bumped into it:

~# docker rm 5cbb64c3279a
Error: Error destroying container 5cbb64c3279a: stat /var/lib/docker/containers/5cbb64c3279a76acaac4769e4a6c57c39a7fff6027b51d14ecff08040d252d13/rootfs: stale NFS file handle

@creack
Contributor
creack commented Jul 24, 2013

@simonjohansson Since #816, have you seen the error again?

@vieux
Member
vieux commented Jul 30, 2013
@simonjohansson

Hi guys, sorry I didn't see this until now. I have some holiday coming up in the next couple of days, I'll make sure to see if #816 fixed the issue!

@dscape
dscape commented Aug 1, 2013

Just encountered the same issue:

root@dscape:~# docker ps -a | grep 'Exit' |  awk '{print $1}' | xargs docker rm
Error: Error destroying container 38b561af34e1: stat /var/lib/docker/containers/38b561af34e1bb0b3e92d7b1fe734aeabf223d6a5c36757be8925514e28e8b45/rootfs: stale NFS file handle

Error: Error destroying container 112a0c0b9c95: stat /var/lib/docker/containers/112a0c0b9c9546697f20dd7ed21899b789f981eb5195d189b1503ab1893184e4/rootfs: stale NFS file handle

Error: Error destroying container ef13c73b64a9: stat /var/lib/docker/containers/ef13c73b64a991e2b937fbcb1fae412d7b6404dcb67ae105c06ebd5b62926f35/rootfs: stale NFS file handle

Error: Error destroying container e0178615f6d8: stat /var/lib/docker/containers/e0178615f6d8be7ca343c89c398536713542413fa7ac04d172bb268f626a252a/rootfs: stale NFS file handle

Error: Error destroying container 3c8659a041c9: stat /var/lib/docker/containers/3c8659a041c9217e35c056e96da0fe5dc9d5eae43f37874ff372190ed8867277/rootfs: stale NFS file handle

Error: Error destroying container 99dee8e5a486: stat /var/lib/docker/containers/99dee8e5a486b8eeff3855e6750e1dee90ec4c8af022ed9a43304edda411b507/rootfs: stale NFS file handle

Error: Error destroying container b7ac0d3f3f79: stat /var/lib/docker/containers/b7ac0d3f3f79ae35883d09e796332726322e56bdd715e5484210bf84099cc513/rootfs: stale NFS file handle

Error: Error destroying container 7329c9be9795: stat /var/lib/docker/containers/7329c9be97957b187cdb6cbb825ab506e3a8610c01b4055ad5cc64fc58a6e985/rootfs: stale NFS file handle
root@dscape:~# docker version
Client version: 0.4.8
Server version: 0.4.8
Git commit: ??
Go version: go1.1.1
@simonjohansson

I cannot reproduce anymore.

Client version: 0.5.0
Server version: 0.5.0
Git commit: 51f6c4a
Go version: go1.1.1

GG :)

@creack
Contributor
creack commented Aug 7, 2013

@dscape can you try again with docker 0.5.1?

@dtabuenc

I keep seeing this issue over and over when using Docker inside VirtualBox. I usually run docker rm $(docker ps -a | cut -d " " -f 1) to remove all containers, but many of them fail with "stale NFS file handle".

@paulosuzart

Just to add: I tried brute-force removing the directories of such containers. After that, trying to remove them via docker rm still printed the same message.

I managed to remove them after restarting the Docker host.

@ricardoamaro

This seems fixed to me.
Using:

# docker version
Client version: 0.5.3
Server version: 0.5.3
Git commit: 5d25f32
Go version: go1.1.1

Also, make sure you have no bash session running inside the container path.
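
One quick way to check for that is to look for processes whose working directory sits under the container path. Here is a rough sketch (the target path and the output format are just examples):

#!/usr/bin/python
# Rough sketch: list processes whose current working directory is inside a
# container's directory, since e.g. an open bash there keeps the mount busy.
import os

target = '/var/lib/docker/containers'  # or a specific container's directory

for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        cwd = os.readlink('/proc/%s/cwd' % pid)
    except OSError:
        continue  # process already exited, or we lack permission
    if cwd.startswith(target):
        print 'pid %s has cwd %s' % (pid, cwd)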

@dsissitka
Contributor

Was the asynchronous unmount theory ever proven? I wonder if this is the "deleted a container's image while the container is running" bug:

# Pane 1
$ docker run -i -t foo /bin/bash
root@d6d23b36b613:/#

# Pane 2
$ docker rmi foo
Untagged: 1cfaa4fe8724
Deleted: 1cfaa4fe8724
$

# Pane 1
root@d6d23b36b613:/# exit
$ docker rm `docker ps -l -q`
Error: Error destroying container d6d23b36b613: stat /var/lib/docker/containers/d6d23b36b613337b8e8bbc2ee90af11da3c5fab78a07a01a43ba7262359292ca/rootfs: stale NFS file handle

$
@pungoyal

@dsissitka I think that is exactly what it is. It happened to me.

 $ docker version
Go version (client): go1.1.1
Go version (server): go1.1.1
Last stable version: 0.6.3

How can the container be removed now?

@crosbymichael
Member

The original issue is resolved in 0.7 because kill does not do an umount anymore. Containers are unmounted when the daemon is stopped.

@eliasp
Contributor
eliasp commented Nov 30, 2013

In case anyone has a /var/lib/docker/volumes directory full of orphaned volumes, feel free to use the following Python script (make sure to understand what it does before executing it):

#!/usr/bin/python
# Clean up orphaned volumes in /var/lib/docker/volumes, i.e. volumes whose
# owning container no longer exists. Read it carefully before running it.

import json
import os
import shutil
import subprocess
import re

dockerdir = '/var/lib/docker'
volumesdir = os.path.join(dockerdir, 'volumes')

# Full (untruncated) IDs of all containers the daemon still knows about.
containers = dict((line, 1) for line in subprocess.check_output('docker ps -a -q -notrunc', shell=True).splitlines())

# Every volume is a directory named after a 64-character hex ID.
volumes = os.walk(os.path.join(volumesdir, '.')).next()[1]
for volume in volumes:
    if not re.match('[0-9a-f]{64}', volume):
        print volume + ' is not a valid volume identifier, skipping...'
        continue
    # The volume's 'json' metadata records which container it belongs to.
    volume_metadata = json.load(open(os.path.join(volumesdir, volume, 'json')))
    container_id = volume_metadata['container']
    if container_id in containers:
        print 'Container ' + container_id[:12] + ' does still exist, not clearing up volume ' + volume
        continue
    print 'Deleting volume ' + volume + ' (container: ' + container_id[:12] + ')'
    volumepath = os.path.join(volumesdir, volume)
    print 'Volumepath: ' + volumepath
    shutil.rmtree(volumepath)
@mindreframer

Thanks for the script! I fixed the indentation and a small bug:

container_id = volume_metadata['id'] # (not container anymore)

https://gist.github.com/mindreframer/7787702

@eliasp
Contributor
eliasp commented Dec 4, 2013

Thanks! No idea why the indentation was messed up in my post, edited + fixed it.

I used volume_metadata['container'] because I was still on 0.6.6 when I wrote the script, but anyone using 0.7.0 (or later) should use your changes.
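
If you need the script to work on both versions, one way (an untested sketch) is to fall back from one key to the other:

# 0.7.x stores the owning container's ID under 'id', 0.6.x under 'container'
container_id = volume_metadata.get('id') or volume_metadata.get('container')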
