Device-mapper does not release free space from removed images #3182

Closed
AaronFriel opened this Issue Dec 12, 2013 · 198 comments

@AaronFriel

Docker claims, via docker info, to have freed space after an image is deleted, but the data file retains its former size, and the sparse file allocated for the devicemapper storage backend will continue to grow without bound as more extents are allocated.

I am using lxc-docker on Ubuntu 13.10:

Linux ergodev-zed 3.11.0-14-generic #21-Ubuntu SMP Tue Nov 12 17:04:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

This sequence of commands reveals the problem:

Doing a docker pull stackbrew/ubuntu:13.10 increased the space usage reported by docker info. Before:

Containers: 0
Images: 0
Driver: devicemapper
 Pool Name: docker-252:0-131308-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 291.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb
WARNING: No swap limit support

And after docker pull stackbrew/ubuntu:13.10:

Containers: 0
Images: 3
Driver: devicemapper
 Pool Name: docker-252:0-131308-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 413.1 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.8 Mb
 Metadata Space Total: 2048.0 Mb
WARNING: No swap limit support

And after docker rmi 8f71d74c8cfc, it returns:

Containers: 0
Images: 0
Driver: devicemapper
 Pool Name: docker-252:0-131308-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 291.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb
WARNING: No swap limit support

The only problem is, the data file has expanded to 414 MiB (849016 512-byte sectors) per stat. Some of that space is properly reused after an image has been deleted, but the data file never shrinks. And under some mysterious condition (not yet reproducible) I have 291.5 MiB allocated that can't even be reused.

My dmsetup ls looks like this when there are 0 images installed:

# dmsetup ls
docker-252:0-131308-pool        (252:2)
ergodev--zed--vg-root   (252:0)
cryptswap       (252:1)

And a du of the data file shows this:

# du /var/lib/docker/devicemapper/devicemapper/data -h
656M    /var/lib/docker/devicemapper/devicemapper/data

How can I have docker reclaim space, and why doesn't docker automatically do this when images are removed?
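The gap described above between the file's apparent size and the blocks actually allocated can be reproduced on any sparse file, without docker. A minimal, self-contained illustration (GNU coreutils assumed; the loopback data file behaves the same way at 100G scale):

```shell
# Reproduce the apparent-vs-allocated size gap of a sparse file.
f=$(mktemp)
truncate -s 100M "$f"                      # apparent size 100M, ~0 blocks allocated
dd if=/dev/zero of="$f" bs=1M count=2 conv=notrunc status=none  # allocate 2M of it
apparent=$(stat -c %s "$f")                # what ls -l reports (bytes)
actual=$(( $(stat -c %b "$f") * 512 ))     # what du reports (allocated 512-byte blocks)
echo "apparent=$apparent actual=$actual"
rm -f "$f"
```

This is why stat's sector count, not ls -l, is the number to watch for the devicemapper data file.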

@vreon
Contributor
vreon commented Dec 12, 2013

+1, I'm very interested in hearing some discussion on this subject. My strategy so far has been

  • be careful what you build/pull
  • be prepared to blow away your /var/lib/docker 😐

@AaronFriel, which version of Docker are you on? 0.7.1?

@kiorky
Contributor
kiorky commented Dec 12, 2013

/cc @regilero (also linked in #2276)

@alexlarsson
Contributor

Starting from a fresh /var/lib/docker:

# ls -lsh /var/lib/docker/devicemapper/devicemapper/*
292M -rw-------. 1 root root 100G Dec 12 17:29 /var/lib/docker/devicemapper/devicemapper/data
4.0K -rw-------. 1 root root   89 Dec 12 17:29 /var/lib/docker/devicemapper/devicemapper/json
732K -rw-------. 1 root root 2.0G Dec 12 17:31 /var/lib/docker/devicemapper/devicemapper/metadata
# docker info
Containers: 0
Images: 0
Driver: devicemapper
 Pool Name: docker-0:31-15888696-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 291.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb

Then after docker pull busybox it grew a bit:

# ls -lsh /var/lib/docker/devicemapper/devicemapper/*
297M -rw-------. 1 root root 100G Dec 12 17:31 /var/lib/docker/devicemapper/devicemapper/data
4.0K -rw-------. 1 root root  181 Dec 12 17:31 /var/lib/docker/devicemapper/devicemapper/json
756K -rw-------. 1 root root 2.0G Dec 12 17:31 /var/lib/docker/devicemapper/devicemapper/metadata
# docker info
Containers: 0
Images: 1
Driver: devicemapper
 Pool Name: docker-0:31-15888696-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 296.6 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb

docker rmi busybox does not make the file larger, but makes the space free in the devicemapper pool:

# ls -lsh /var/lib/docker/devicemapper/devicemapper/*
298M -rw-------. 1 root root 100G Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/data
4.0K -rw-------. 1 root root   89 Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/json
772K -rw-------. 1 root root 2.0G Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/metadata
# docker info
Containers: 0
Images: 0
Driver: devicemapper
 Pool Name: docker-0:31-15888696-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 291.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb

The loopback file doesn't grow if we download the image again:

# ls -lsh /var/lib/docker/devicemapper/devicemapper/*
298M -rw-------. 1 root root 100G Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/data
4.0K -rw-------. 1 root root  181 Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/json
772K -rw-------. 1 root root 2.0G Dec 12 17:32 /var/lib/docker/devicemapper/devicemapper/metadata
# docker info
Containers: 0
Images: 1
Driver: devicemapper
 Pool Name: docker-0:31-15888696-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 296.6 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 0.7 Mb
 Metadata Space Total: 2048.0 Mb

So, it seems like we fail to re-sparsify the loopback file when the thinp device discards a block.

@alexlarsson
Contributor

However, if I create and then delete a file inside the container fs image, it does reclaim the space in the loopback file.
I.e. I did this in the busybox image:

/ # cd lib
/lib # cat libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 libc.so.6 > a_file.bin
/lib # ls -l a_file.bin
-rw-r--r--    1 root     root      47090160 Dec 12 16:41 a_file.bin

This grew the data file from 299M to 344M. But when I removed a_file.bin (and waited a bit) it got back to 299M.

So, this seems to me like a devicemapper bug: it forwards discards from the thinp device to the underlying device, but it doesn't issue discards when removing thinp devices from the pool.
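The mechanism here can be seen without docker at all: a discard forwarded to a loop device becomes a hole punched in the backing file, which is the same operation fallocate exposes directly. A small sketch (assumes a filesystem that supports hole punching, e.g. ext4, xfs, or tmpfs):

```shell
# What a forwarded discard does to the loopback file: punch a hole,
# which frees blocks without changing the file's apparent size.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=8 status=none
before=$(( $(stat -c %b "$f") * 512 ))     # ~8M allocated
fallocate --punch-hole --offset 0 --length 4M "$f"
after=$(( $(stat -c %b "$f") * 512 ))      # ~4M allocated
size=$(stat -c %s "$f")                    # still 8M apparent
rm -f "$f"
```

When deleting a thinp device skips the discard, this hole-punch never happens, and the backing file keeps the blocks.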

@alexlarsson
Contributor

This seems to be a kernel issue. I was looking at working around it by using BLKDISCARD, but I failed. See this bug for some details: https://bugzilla.redhat.com/show_bug.cgi?id=1043527

@alexlarsson
Contributor

I put my workaround in https://github.com/alexlarsson/docker/tree/blkdiscard, but we're still researching if we can do better than this.

@blalor
blalor commented Dec 29, 2013

Having this problem on CentOS (2.6.32-358.23.2.el6.x86_64) with Docker 0.7.0 as well. It's an old kernel, but the problem's not isolated to Ubuntu.

@skakri
skakri commented Jan 3, 2014

Same issue on Arch GNU/Linux 3.12.6-1-ARCH, Docker version 0.7.2.

@blalor
blalor commented Jan 7, 2014

Still exists on 0.7.0 on CentOS.

@nmz787
nmz787 commented Jan 9, 2014

Still exists in 0.7.2 on Ubuntu 12.04.3 LTS.

A lot of the space is in docker/devicemapper/devicemapper/data and metadata, but also in docker/devicemapper/mnt.

It's neat that I learned you can see the container file systems in docker/devicemapper/mnt/SOME_KIND_OF_ID/rootfs,

but it's not neat that my hard disk is almost completely eaten up and only fixable by rm -r docker.

@logicminds

I am having a similar issue while writing docker support for rspec-system. My test VM (docker host) has an 8GB drive, and after repeatedly creating images without deleting them the drive fills up. But after removing all images and containers the drive is still 100% full. I figured it was an ID-10T error, but I just gave up and destroyed the VM altogether.

@mengzechao

Still exists in 0.7.5 on Ubuntu 13.04.

@unclejack
Contributor

This issue has been fixed by PR #3256 which was recently merged. This fix will be included in a future release.

I'm closing this issue now because the fix has been merged to master.

@unclejack unclejack closed this Jan 21, 2014
@alexlarsson
Contributor

Note: It's not fully fixed until you also run a kernel with http://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=6d03f6ac888f2cfc9c840db0b965436d32f1816d in it. Without that, the docker fix is only partial.

@logicminds

What is the workaround to reclaim the space? I am using RHEL 6.5 and it might be a while before I get the new kernel.


@alexlarsson
Contributor

@logicminds There is no super easy way to recover the space atm. Technically it should be possible to manually re-sparsify the loopback files, but that would require all the unused blocks to be zeroed (or something similar) for easy detection of the sparse areas, which is not done on thinp device removal.
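For what it's worth, the zero-detection half of that manual re-sparsify is something GNU cp can already do: given a file whose unused blocks read back as zeros, copying it with zero-run detection produces a sparse copy. A sketch on a throwaway file (this only illustrates the mechanism; on a real docker data file the unused blocks are not zeroed, which is exactly the missing piece):

```shell
# Re-sparsify a file whose free space reads back as zeros.
f=$(mktemp); g=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=4 status=none  # 4M of allocated zero blocks
alloc_before=$(stat -c %b "$f")
cp --sparse=always "$f" "$g"    # zero runs in the source become holes in the copy
alloc_after=$(stat -c %b "$g")
rm -f "$f" "$g"
```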

@logicminds

@alexlarsson Does this also affect OEL 6.5? OEL 6.5 actually uses the UEK 3.8 Linux kernel, and since I have the option of switching from the 2.6 kernel to the 3.8 kernel, this might be a simple switch for me.

@alexlarsson
Contributor

@logicminds I don't even know if that commit is in the upstream kernel yet. That link is from the device-mapper tree. It's definitely not in 3.8.

@alexlarsson
Contributor

I'm looking at creating a tool like fstrim that can be used to get back the space.

@alexlarsson alexlarsson added a commit to alexlarsson/docker that referenced this issue Feb 20, 2014
@alexlarsson alexlarsson devicemapper: Add trim-pool driver command
This command suspends the pool, extracts all metadata from the metadata pool and
then manually discards all regions not in use on the data device. This will
re-sparsify the underlying loopback file and regain space on the host operating system.

This is required in some cases because the discards we do when deleting images and
containers aren't enough to fully free all space unless you have a very new kernel.
See: docker#3182 (comment)

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
1ec7116
@kontrafiktion

@alexlarsson Issue https://bugzilla.redhat.com/show_bug.cgi?id=1043527 has been closed, officially because of "insufficient data". Does that mean the patch will not make it into the kernel? Is it still needed?

@alexlarsson
Contributor

@vrvolle The patch that enables the workaround docker uses is already upstream. There doesn't seem to be any upstream work to make that workaround unnecessary, though.

@nicolas-van

I still have this problem with docker 0.9 on CentOS 6.5 with the default 2.6.32 kernel.

I'm not sure I understand what you said previously about this commit to device-mapper. Could you confirm that if I migrate my kernel to 3.8 this bug should be solved?

Thanks in advance.

@alexlarsson
Contributor

@nicolas-van No, you need this commit: torvalds/linux@19fa1a6

It is in 3.14, and it may be in various 3.x.y backports.

@Jacq
Jacq commented May 5, 2014

Some time ago I installed docker to build an image and run it within a container. Later I cleared all the images and containers, including the docker application and its main folder.
Now I realize that while df -h reports 4GB free of 24GB used, du / -sh accounts for only 10GB, so roughly 10GB is unaccounted for. That is more or less the size of the temporary images generated with docker; could it be related to this bug? I'm using CentOS 6.5 and docker 0.9.

@Jacq
Jacq commented May 24, 2014

I've removed docker with yum, removed the /dev/mapper/docker* devices with dmsetup, and run rm -Rf /var/lib/docker, and df still reports 10GB used that I cannot find anywhere.

@alexlarsson
Contributor

@Jacq It's possible some file is still kept alive by a process that has an open file descriptor to it. Did you reboot? That would ensure this doesn't happen.
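That situation (df counts space that du cannot find, because a deleted file is still held open) is easy to reproduce. A self-contained sketch, assuming a Linux /proc:

```shell
# Space held by an unlinked-but-open file: invisible to du, counted by df.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=4 status=none
exec 3<"$f"                        # hold an open descriptor to the file
rm -f "$f"                         # unlink: the name is gone, the blocks are not
target=$(readlink /proc/$$/fd/3)   # the link now ends in " (deleted)"
echo "$target"
exec 3<&-                          # closing the last fd is what frees the space
```

lsof can find such files on a live system; a reboot closes every descriptor, which is why it reliably recovers the space.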

@Dieterbe
Dieterbe commented Jun 4, 2014

I'm running Linux 3.14.4-1-ARCH with docker 0.11.1, and I removed all images and containers,
but the files under /var/lib/docker/devicemapper/devicemapper just stuck around, consuming about 1.5GB.

Here's output from after I was fiddling with some mongodb stuff. I guess the data file must be sparsely allocated, because my /var is not even that big.

~/w/e/video_history ❯❯❯ docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-254:3-585-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 948.4 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.0 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.14.4-1-ARCH
WARNING: No swap limit support
~/w/e/video_history ❯❯❯ sudo ls -alh /var/lib/docker/devicemapper/devicemapper/data
-rw------- 1 root root 100G Jun  4 14:35 /var/lib/docker/devicemapper/devicemapper/data
~/w/e/video_history ❯❯❯ sudo  du -shc /var/lib/docker/devicemapper/
1.6G    /var/lib/docker/devicemapper/
1.6G    total
@bolasblack

Excuse me, is this bug fixed now? I encountered it on Gentoo.

@arosboro
arosboro commented Sep 8, 2014

@bolasblack I run gentoo and ran into this issue. Did you figure anything out?

I'm using the latest gentoo-sources, which are 3.14.14 for x86_64. I looked at torvalds/linux@19fa1a6 and that patch is applied to my sources. I have Docker version 1.1.0, build 79812e3.

@alexlarsson Thanks for bringing your attention back to this closed issue after such a long period of time. It seems like it's still causing trouble. Any word on the status of #4202?

@daniellockard

This is still an issue! I think I'm going to switch back to using AUFS for the time being.

@shabbychef

@daniellockard AUFS seems to be unofficially deprecated: #783 and #4704, so good luck with that.

@ccjon
ccjon commented Sep 22, 2014

Yikes... where did that 25GB go? Oh, into that one file. I am running kernel 3.16.2, Fedora 20, 64-bit.

The link to the 'workaround' is broken... what is it, and does my kernel support it? If the fix was committed in 3.14, I suspect Fedora should have it in 3.16, no?

@analytically

+1

@Zolmeister

+1

@bleib1dj
bleib1dj commented Nov 4, 2014

+1

@Crashthatch

+1. This still seems to be happening on Ubuntu Trusty, 3.13.0-36-generic, with Docker version 1.3.1, build 4e9bbfa.

@4thAce
4thAce commented Nov 19, 2014

+1, also on Trusty. Is it worth checking 14.10, which uses 3.16?

@AaronFriel

@SvenDowideit @shykes @vieux @alexlarsson @Zolmeister @creack @crosbymichael @dhrp @jamtur01 @tianon @erikh @LK4D4

Tagging the top committers because this is ridiculous. There needs to be a public conversation about this and an admission from Docker's team that this bug can lead to containers or systems that periodically break and need to be recreated. I already know several people who have had to implement insane devops solutions, like reimaging their Docker hosts every week or two because their build bots have so much churn. I opened this issue close to a year ago, and as near as I can tell there has been no definitive solution, and older kernels that are ostensibly supported are not.

Docker team: please do the research to determine which kernel versions are affected, why, and what patch fixes the issue, and document it. Publish that information along with the kernel versions you support, because right now consumers of Docker are getting bitten by this issue over and over, as evidenced by the fact that I still get emails on this issue every week or two. Seriously, this is a breaking issue, and it has been a pain point since before 1.0.

As I see it, there are several possible options to fix this issue in a satisfactory way that would stop the emails I keep getting for +1s on this issue:

  1. Notify users when Device-Mapper is being used on an unsupported kernel, provide them with detailed instructions for how to reclaim space, and if possible automatically set up a process to do this in Docker. I would advise that this notice also be emitted when using the docker CLI against a host that suffers from this problem, so that when remotely managing hosts from the docker CLI, users are made aware that some hosts may not reclaim space correctly.

  2. Fix the problem (somehow). I don't know enough about kernel development to know what this would entail, but, based on my novice reading, I suggest this:

    a. As device mapper is a kernel module, bring a functional, working version of it into the Docker source tree as something like dm-docker

    b. Make sufficient changes to dm-docker that it can coexist with device mapper.

    c. On affected platforms, install the dm-docker kernel module on installation and default to using dm-docker.

  3. Amend your installation docs and the docker.com site to include a warning on affected kernel versions, and add a runtime check to the packages to verify correct device-mapper operation, reporting any failure to the user.

This should be a blocking issue for the next stable release of Docker, because it's just plain unacceptable to keep punting on it and leaving users in the lurch.

@damovsky

Personally, I see CoreOS as the only stable Linux distro for Docker (until this issue is resolved).

Docker team: I know this issue is not caused by your component, but please help us use your software on other Linux distros as well. It would be good if you could also document this issue as a well-known limitation of Docker, so other people won't waste their time.

Thanks!
Martin

@adamdecaf

+1, something needs to be done.

It'd be nice if there were something more visible about this issue than having to dig into the (closed) GitHub issues list. It took a long time to discover that this was the underlying problem, and earlier visibility would have helped.

@snitm
Contributor
snitm commented Nov 19, 2014

We upstream device-mapper developers (myself and Joe Thornber) had absolutely zero awareness that this issue was still a problem for people. We fixed the issue immediately once we were made aware of it (by @alexlarsson back in Dec 2013) and tagged it for inclusion in all stable kernels at the time, see: http://git.kernel.org/linus/19fa1a6756ed9e9 ("dm thin: fix discard support to a previously shared block")

Joe Thornber was just made aware that @alexlarsson implemented trim-pool in docker go code. When I pointed it out to him he took on implementing a proper standalone 'thin_trim' tool that will get distributed as part of the 'device-mapper-persistent-data' package (at least on Fedora, CentOS, RHEL), see:
jthornber/thin-provisioning-tools@8e92158

SO... all being said, users who are running kernels that don't have upstream commit 19fa1a6756ed9e9 ("dm thin: fix discard support to a previously shared block") need to fix that by running kernels that are better supported. I can easily send a note to stable@vger.kernel.org to backfill the fix to any stable kernels that don't have it though. So please let me know which, if any, stable kernel(s) don't have this fix.

Moving forward we'll want docker to take on periodically running 'thin_trim' against the thin-pool device that docker is using. But we'll cross that bridge once 'thin_trim' is widely available in the distros.

@SvenDowideit
Collaborator

@shabbychef @daniellockard no, AUFS is not deprecated - first up, only one of those issues is closed, and reading on, I'm guessing #783 (comment) is worth reading:

Our initial plan was to phase out aufs completely because devmapper appeared to be the best option in 100% of cases. That turned out not to be true; there are tradeoffs depending on your situation, and so we are continuing to maintain both.
@SvenDowideit
Collaborator

@snitm could you add something to hack/check_config.sh to tell users their kernel doesn't have this patch?

@jessfraz jessfraz reopened this Nov 19, 2014
@snitm
Contributor
snitm commented Nov 19, 2014

@SvenDowideit unfortunately the change in question can't be identified in an arbitrary kernel. For starters, commit 19fa1a6756ed9e9 didn't bump the thin or thin-pool targets' version. But even if it had, that version would vary across the stable kernels (which is why version bumps within a 'stable' commit are bad: they force hand-editing in every kernel the commit gets backported to).

BUT, users that have a thin and thin-pool target version >= 1.12.0 will all have the fix. So kernels >= 3.15. The docker that users are running would also need to include @alexlarsson's docker.git commit 0434a2c ("devmapper: Add blkdiscard option and disable it on raw devices")

@snitm
Contributor
snitm commented Nov 19, 2014

FYI, running 'dmsetup targets' will list the thin and thin-pool target versions (provided the dm-thin-pool kernel module is loaded).
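The check described above can be scripted. A minimal sketch (an assumption-laden illustration, not an official tool: it assumes `dmsetup` from lvm2 is installed and the dm-thin-pool module is loaded, and uses the 1.12.0 threshold from the earlier comment):

```shell
#!/bin/sh
# Sketch: report whether the running kernel's thin-pool target is new enough
# (>= 1.12.0) to contain the discard fix from commit 19fa1a6756ed9e9.

# Print "ok" when dotted version $1 >= $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ] && echo ok
}

# The second column of `dmsetup targets` is the target version, e.g. "v1.14.0".
thin_ver=$(dmsetup targets 2>/dev/null | awk '$1 == "thin-pool" {sub(/^v/, "", $2); print $2}')

if [ -n "$thin_ver" ] && [ "$(version_ge "$thin_ver" 1.12.0)" = ok ]; then
    echo "thin-pool $thin_ver: discard fix should be present"
else
    echo "thin-pool ${thin_ver:-target not loaded}: may lack the discard fix"
fi
```

Note this only checks the target version, which is a proxy: a distro kernel could in principle backport the fix without bumping the version.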

@dcvii
dcvii commented Nov 19, 2014

Thanks for attention on this. We mentioned it to the booth dudes at Re:Invent last week.

@SvenDowideit
Collaborator

@snitm so dmsetup targets output should be added to the check-config output?

@AaronFriel

@snitm

Would it be possible to create an automated test that would create a thin-provisioned device-mapper device, perform some operations on it that would fail to reclaim free space on an unpatched kernel, and report a status code based on that?

@snitm
Contributor
snitm commented Nov 20, 2014

@SvenDowideit you're hoping to catch any new offending kernels before they start making use of DM thin provisioning on top of loopback?

@AaronFriel it seems extremely narrow in scope to be so hung up on this particular fix. There is more to an enterprise-grade deployment of DM thin-provisioning than making sure this fix is in place (an unfortunate reality). The DM thin provisioning target has seen numerous error-handling, performance, and feature improvements since commit 19fa1a675 went upstream, all of which are important when deploying DM thin provisioning in production.

SO taking a step back, I happen to be able to enjoy working for a company that understands layered products need to be developed in concert with all the underlying layers. And I have little interest in supporting DM thin provisioning for N arbitrary kernels being paired with docker.

And I happen to strongly believe that nobody should be using DM thin provisioning with loopback devices as the backing devices. That was a complete hack that got merged into docker without proper restraint or concern for "what does this look like in production?".

I've realized that a proper deployment of docker on DM thin provisioning requires the enterprise-oriented management features that lvm2-based thin-provisioning configuration provides. So with that in mind I've steadily worked on making a new hybrid management solution work, see this PR:
#9006

This work depends on:

  1. lvm2 >= 2.02.112
  2. DM thin-pool target version >= v1.14.0 (aka changes staged in linux-next for Linux 3.19 inclusion)

RHEL7.1 will have these changes. And RHELAH (Atomic Host) will too. Any other distros/products that want to properly deploy docker on DM thin-provisioning should too.

@SvenDowideit
Collaborator

@snitm yeah - I'm trying to reduce the number of users that are hit by a mysterious breakage, which then takes even more time for someone to realise it could be this obscure pain, and then to walk the user through figuring out that a missing mystery patch is the problem.

and so in context of your remaining info - I want to reduce the number of kernels that will cause this horrid surprise :)

@snitm
Contributor
snitm commented Nov 22, 2014

@SvenDowideit OK, I provided the info that docker (or its users) could use to check for DM thinp on loopback compatibility in this comment: #3182 (comment)

@dbabits
dbabits commented Dec 23, 2014

@snitm, @AaronFriel
I'm running into what seems like this issue on AWS beanstalk, Amazon AMI.
(Running out of space)
Linux ip-172-31-63-145 3.14.23-22.44.amzn1.x86_64 #1 SMP Tue Nov 11 23:07:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ sudo docker version
Client version: 1.3.2
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): c78088f/1.3.2
OS/Arch (client): linux/amd64
Server version: 1.3.2
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): c78088f/1.3.2

@reiz
reiz commented Jan 6, 2015

I'm still getting this error with Ubuntu 14.04 and Docker 1.4.1.

@sybrandy

Don't count on the thin-pool and thin versions being >= 1.12.0 to mean you're O.K. We're running CentOS 6.5 VMs with the 2.6.32-431.29.2.el6.x86_64 kernel and dmsetup targets reports that thin-pool and thin are both v1.12.0. However, I'm stuck with a 40GB data file that isn't freeing itself up.

@LinuXY
LinuXY commented Jan 17, 2015

What are the implications of running this on CentOS / RHEL 6.5 with this bug? Are you OK as long as you have >=100G of free space? I assume this eventually fills a disk of any size? Or is it bounded at 100G?

@jperrin
Contributor
jperrin commented Jan 20, 2015

Keep in mind that 6.5 does not have the newest kernel available to centos. I would recommend a simple 'yum update' to 6.6 and a reboot to test with the 2.6.32-504.3.3 kernel.

@ripienaar

Confirming that the very latest kernel and distro updates for CentOS 6 work fine and release space.

@sybrandy

ripienaar: Can you explicitly state which CentOS 6 and kernel version you are using? I just want to make sure that when I pass the info along, I have all of the info I need.

thanks.

@jbiel
Contributor
jbiel commented Jan 20, 2015

As @reiz commented above, this is happening with latest Ubuntu 14.04 + latest docker. dmsetup targets shows thin/thin-pool of v1.9.0 on one of our instances (no active docker containers.) dmsetup targets doesn't show any thin entries on a similar instance with active docker containers. (I'm not really asking for help on this, just adding (hopefully useful) data.)

@ripienaar

Basic OS stuff:

# uname -a
Linux vagrant-centos65 2.6.32-504.3.3.el6.x86_64 #1 SMP Wed Dec 17 01:55:02 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS release 6.6 (Final)

# dmsetup targets
thin-pool        v1.14.0
thin             v1.14.0
mirror           v1.12.0
striped          v1.5.6
linear           v1.1.0
error            v1.2.0

# dmsetup status
docker-8:1-393542-pool: 0 209715200 thin-pool 92343 178/524288 4664/1638400 - rw discard_passdown queue_if_no_space

Usage:

# du -h /var/lib/docker/
2.7G    /var/lib/docker/devicemapper/devicemapper
# docker rmi b39b81afc8ca
# du -h /var/lib/docker/
2.5G    /var/lib/docker/devicemapper/devicemapper

Prior to getting onto that kernel, it would not recover space.

@lexinator

@jperrin mentions this is resolved by the CentOS 2.6.32-504.3.3 kernel. Is it known which kernel commit (or set of commits) resolves this issue?

I am currently using Oracle Enterprise Linux with a 3.8 UEK kernel.

@donrudo
donrudo commented Jan 27, 2015

Is there any workaround for this situation? On AWS it's easy to just re-create the instance, but in a local development environment it's not good to find that 90% of the /var partition is taken up by device-mapper because of Docker; soon there won't be space even for logs or journals.

@ripienaar

@donrudo as above - update to latest centos 6 - the 504.3.3 kernel.

@donrudo
donrudo commented Jan 27, 2015

That's OK for centos but some of our devs are using Fedora and others OpenSuse.

@snitm
Contributor
snitm commented Jan 27, 2015

All DM thin provisioning changes are sent upstream first. So if Centos 6 is fixed it is also fixed upstream. The various distros need to pull fixes back or rebase more aggressively.

@lexinator

@ripienaar I'll ask again: do you know which kernel commit (or set of commits) is required for the fix?

@rhvgoyal
Contributor

The loop driver uses fallocate() to punch a hole in the file when a discard comes in. The Linux man page for fallocate() confirms that this is not supported on ext3.

/*
Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem doesn't support the operation, an error is returned. The operation is supported on at least the following filesystems:
* XFS (since Linux 2.6.38)
* ext4 (since Linux 3.0)
* Btrfs (since Linux 3.7)
* tmpfs (since Linux 3.5)
*/
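This behavior is easy to observe directly with util-linux's `fallocate(1)`, which exposes the same FALLOC_FL_PUNCH_HOLE operation the loop driver uses. A hedged sketch (run it on ext4/XFS/tmpfs; on ext3 the punch step simply fails with an error):

```shell
#!/bin/sh
# Sketch: punch a hole in a file and watch the allocated blocks drop while the
# apparent size stays the same -- the same mechanism that lets the loopback
# data file shrink when the thin pool passes down a discard.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=8 conv=notrunc status=none
before=$(du -k "$f" | cut -f1)                      # blocks actually allocated
fallocate --punch-hole --offset 0 --length $((4*1024*1024)) "$f"
after=$(du -k "$f" | cut -f1)
echo "allocated ${before}K -> ${after}K, apparent size $(stat -c %s "$f") bytes"
rm -f "$f"
```

The apparent size (what plain `ls -l` shows) never changes; only the allocated block count (what `ls -s` or `du` shows) goes down, which is why the data file "never shrinks" by the first measure even on healthy kernels.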

@rhvgoyal
Contributor

@thaJeztah

If somebody wants to run docker on ext3, I guess we should allow that. They just won't get discard support, so the loop file will not shrink when images/containers are deleted.

@thaJeztah
Member

@rhvgoyal how about showing this in docker info? Similar to the Udev Sync Supported output.

@rhvgoyal
Contributor

@thaJeztah

We already have an entry in docker info, "Backing filesystem". I am not sure why it says extfs instead of being precise about ext2/ext3/ext4.

Maybe a warning in the logs at startup would do here. Something similar to the warning about using loop devices for the thin pool.

@rhvgoyal
Contributor

@thaJeztah

And I think we should do it only if a lot of people are impacted by this. If not, then we are just creating more work and code, and it might not be worth it.

@vbatts
Contributor
vbatts commented Jun 16, 2015

The Type field of the syscall.Statfs_t struct returns 0xef53 on ext2, ext3, and ext4 alike, and that is what we are using to detect the filesystem magic.
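Because the statfs magic can't tell the ext variants apart, the mount table is the only reliable place to look from a shell. A sketch of the difference (the path is an example; `findmnt` comes from util-linux):

```shell
#!/bin/sh
# Sketch: statfs(2) reports the same f_type magic (0xef53) for ext2/ext3/ext4,
# so tools built on it can only say "extfs"; /proc/self/mounts records the
# filesystem type the kernel actually mounted.
target=${1:-/}                                 # e.g. pass /var/lib/docker
stat -f -c 'statfs reports: %T' "$target"      # prints "ext2/ext3" for any ext*
findmnt -n -o FSTYPE --target "$target"        # prints the real type, e.g. "ext4"
```

This is why `docker info` can only report the vague "extfs" from a statfs-based check.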

@thaJeztah
Member

The Type field of the syscall.Statfs_t struct returns 0xef53 on ext2, ext3, and ext4 alike, and that is what we are using to detect the filesystem magic.

Bummer. Would have been good to have that information to make it easier to identify reported issues.

I guess, we should just close this then?

@vbatts
Contributor
vbatts commented Jun 16, 2015

Closing, as this is an issue with ext3 being outdated. Please use ext4 or xfs.

@vbatts vbatts closed this Jun 16, 2015
@dbabits
dbabits commented Jun 16, 2015

do you think this could be better documented in the main Docker documentation somehow?

Whether the issue is with the filesystem or not, look at how much confusion this has generated across two recent issues.

Thanks.

@dbabits
dbabits commented Jun 16, 2015

referring to issue #9786

@thaJeztah
Member

@dbabits yes, I think it could be worth mentioning in the docs that ext3 has some issues. No idea yet what an appropriate location would be.

@tomlux
tomlux commented Jul 17, 2015

Have the same/similar problem with XFS.
Was on kernel "3.10.0-123.el7.x86_64", but am now updated to "3.10.0-229.el7.x86_64".

Everything is deleted (containers, images) but the data file still contains 100GB.
Any ideas, help?

[root@docker0101 ~]# ls -alh /data/docker/devicemapper/devicemapper/
total 80G
drwx------ 2 root root 32 Jun 8 16:48 .
drwx------ 5 root root 50 Jun 9 07:16 ..
-rw------- 1 root root 100G Jul 16 21:33 data
-rw------- 1 root root 2.0G Jul 17 09:20 metadata

[root@docker0101 ~]# uname -a
Linux docker0101 3.10.0-229.7.2.el7.x86_64 #1 SMP Tue Jun 23 22:06:11 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@docker0101 ~]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

[root@docker0101 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

[root@docker0101 ~]# docker images -a
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE

[root@docker0101 ~]# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
Pool Name: docker-253:0-268599424-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 85.61 GB
Data Space Total: 107.4 GB
Data Space Available: 40.91 MB
Metadata Space Used: 211.4 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 40.91 MB
Udev Sync Supported: true
Data loop file: /data/docker/devicemapper/devicemapper/data
Metadata loop file: /data/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Kernel Version: 3.10.0-229.7.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 4
Total Memory: 11.58 GiB
Name: docker0101

[root@docker0101 ~]# dmsetup table
vg1-lvol0: 0 167772160 linear 8:16 2048
docker-253:0-268599424-pool: 0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing

[root@docker0101 ~]# dmsetup status
vg1-lvol0: 0 167772160 linear
docker-253:0-268599424-pool: 0 209715200 thin-pool 71359 51606/524288 1306264/1638400 - ro discard_passdown queue_if_no_space

[root@docker0101 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=

[root@docker0101 ~]# rpm -qi docker
Name : docker
Version : 1.6.2
Release : 14.el7.centos
Architecture: x86_64
Install Date: Fri 17 Jul 2015 09:19:54 AM CEST
Group : Unspecified
Size : 33825026
License : ASL 2.0
Signature : RSA/SHA256, Wed 24 Jun 2015 05:43:12 AM CEST, Key ID 24c6a8a7f4a80eb5
Source RPM : docker-1.6.2-14.el7.centos.src.rpm
Build Date : Wed 24 Jun 2015 03:52:32 AM CEST
Build Host : worker1.bsys.centos.org

[root@docker0101 ~]# ls -al /var/lib/docker
lrwxrwxrwx 1 root root 12 Jun 9 08:21 /var/lib/docker -> /data/docker

[root@docker0101 ~]# mount
/dev/sda5 on /var type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/vg1-lvol0 on /data type xfs (rw,relatime,attr2,inode64,noquota)

@jperrin
Contributor
jperrin commented Jul 17, 2015

@tomlux The devicemapper loopback mode you're using is mostly meant as a way to easily toy around with docker. For serious work, loopback will be slower and has some limitations. I'd very highly recommend having a read over http://www.projectatomic.io/docs/docker-storage-recommendation/

You'll get better performance, and won't hit things like this, assuming you've applied all the system updates.

@rhvgoyal
Contributor

@tomlux

Can you add "-s" to ls? That will give you the actual blocks allocated to the data and metadata files. Right now it is showing the apparent size of the files.

The docker info output is intriguing though. It seems to show high usage.

@tomlux
tomlux commented Jul 17, 2015

ocr

@rhvgoyal
Contributor

@tomlux

So the actual size of the data and metadata files looks small. You can use "ls -alsh" so that the sizes are more readable.

So the data file seems to be around 79MB and the metadata file around 202KB.

I think the thin pool's stats about the number of blocks used are somehow wrong. Does a reboot of the machine fix the problem?

@tomlux
tomlux commented Jul 17, 2015

I did a reboot after the kernel update without success.

image

@rhvgoyal
Contributor

OK, so the loop files are big and the pool thinks it has lots of used blocks, so something is using those blocks. Can you give me the output of:

  • docker ps -a
  • docker images
  • ls /var/lib/docker/devicemapper/metadata/ | wc -l
  • shut down docker and run the following:
    thin_dump /var/lib/docker/devicemapper/devicemapper/metadata | grep "device dev_id" | wc -l

This will tell you how many thin devices there are in the pool.

@tomlux
tomlux commented Jul 17, 2015

Hmm,
now we have DEAD containers but no images.
Already did a reboot.

Seems to be something wrong with the devicemapper.
I don't want to spam this issue.
I can also "rm -rf /var/lib/docker" and rebuild my containers. Everything is scripted.

image

@rhvgoyal
Contributor

Do a "dmsetup status".

Looks like the pool is in bad shape and most likely needs to be repaired.

@tomlux
tomlux commented Jul 17, 2015

image

@swachter

Hi,

I seemed to have the same problem under Ubuntu 14.04. However, the cause was unwanted volumes (cf. the blog post http://blog.yohanliyanage.com/2015/05/docker-clean-up-after-yourself/). Running the command

docker run -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/docker:/var/lib/docker --rm martin/docker-cleanup-volumes

released a lot of disk space.

@bkeroackdsc

This isn't fixed in any meaningful way. On a fully up-to-date Ubuntu 14.04.x installation (the latest LTS release) and with the latest version of Docker (installed via $ wget -qO- https://get.docker.com/ | sh), Docker will continuously leak space with no easy way to reclaim. docker stop $(docker ps -q) && docker rm $(docker ps -q -a) && docker rmi $(docker images -q) only releases a small amount of space.

The only way to reclaim all space is with the following hack:

$ sudo service docker stop
$ sudo rm -rf /var/lib/docker
$ sudo service docker start

Which then requires re-pulling any images you might need.

@thaJeztah
Member

@bkeroackdsc could that space also be related to "orphaned" volumes?

@sagiegurari

I agree with @bkeroackdsc that this is not solved.
I asked @rhvgoyal why he wanted to close this case so badly.
In the end, I dropped Docker specifically because of this issue.
It is killing this product.

Orphaned or not, there is no good, easy way to clean up space.
There needs to be a docker CLI option for cleanup, a good status-report CLI option, and also some sort of monitoring of disk space, as this issue happens to too many people on many platforms.

@thaJeztah
Member

@bkeroackdsc @sagiegurari

The reason I'm asking is that orphaned volumes are not related to this issue,
and not related to devicemapper.

Orphaned volumes are not a bug, but a misconception that volumes defined
for a container are automatically deleted when the container is deleted.
This is not the case (and by design), because volumes can contain data
that should persist after a container is deleted.

To delete a volume together with a container, use docker rm -v [mycontainer].
Volume management functions will be added (see #14242 and #8363),
and will allow you to manage "orphaned" volumes.

A growing /var/lib/docker does not have to be an indication
that devicemapper is leaking data, because growth in that directory
can also be the result of (orphaned) volumes that have not been cleaned up
by the user (docker stores all its data in that path).
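To make the distinction concrete, volumes left behind by deleted containers can be reclaimed without touching devicemapper at all. A rough sketch (the container name is a placeholder; `docker volume` needs Docker 1.9 or later):

```shell
#!/bin/sh
# Sketch: reclaim space held by leftover volumes rather than by the
# devicemapper pool itself.
cleanup_volumes() {
    # Placeholder name: remove a container TOGETHER with its volumes.
    # `|| true` because the example name likely doesn't exist on your host.
    docker rm -v my_old_container || true
    # Remove every volume no container references ("dangling"); xargs -r
    # skips the rm entirely when the list is empty.
    docker volume ls -qf dangling=true | xargs -r docker volume rm
}

# Only attempt this where a docker daemon is actually reachable.
if docker info >/dev/null 2>&1; then
    cleanup_volumes
fi
```

If the directory keeps growing after this, the devicemapper data file itself is the remaining suspect.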

@sagiegurari

I really hope those 2 items will give the needed capabilities.
I did understand the second item, but the first (#14242) doesn't explain at the top what it is about, apart from the fact that it's a volume API (meaning I'm not sure what capabilities it provides).

@thaJeztah
Member

@sagiegurari it's part of the requirements to implement volume management (there are some other open PRs/issues). The end goal is to make volumes a first-class citizen in Docker that can be created/deleted/managed separately from the containers that make use of them.

@rxacevedo

@swachter Thanks for posting that workaround, I reclaimed 6GB with the aforementioned image.

@rxacevedo rxacevedo added a commit to seibelsbi/docker-pentaho-ee that referenced this issue Aug 18, 2015
@rxacevedo rxacevedo eagerly rm zips to save space (docker/docker#3182) cfa0a41
@nathwill
Contributor

we fixed the leaky volume issue with a dirty hack, since it was preventing the docker daemon from starting up before the service timed out on our high-churn docker hosts:

PRODUCTION [root@ws-docker01.prod ~]$ cat /etc/systemd/system/docker.service.d/docker-high-churn.conf 
[Service]
ExecStartPre=-/bin/rm -rf /var/lib/docker/containers
ExecStopPost=-/bin/rm -rf /var/lib/docker/volumes

which fixes the issue without flushing the pre-cached images.

@rhvgoyal
Contributor

Can we discuss the issue of leaky volumes in a separate issue? Discussing it here gives the impression that it is a device-mapper issue, while it is not.

@brian-dlee

@tomlux @rhvgoyal

Did you ever come to a conclusion on what was occurring on the CentOS 7 box? My docker host is nearly identical, and I was experiencing the same issue. I followed along to the point at which @rhvgoyal asked to run the thin_dump command. Afterwards, I went to start the docker daemon and it wouldn't start. I've since deleted /var/lib/docker and restarted, but I wanted to know if a resolution was found, as I (as well as others) may run into it again.

[root@Docker_Sandbox_00 devicemapper]# thin_dump /var/lib/docker/devicemapper/devicemapper/metadata | grep "device dev_id" | wc -l
102
[root@Docker_Sandbox_00 devicemapper]# systemctl start docker
Job for docker.service failed. See 'systemctl status docker.service' and 'journalctl -xn' for details.
 [root@Docker_Sandbox_00 devicemapper]# systemctl -l status docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled)
   Active: failed (Result: exit-code) since Tue 2015-10-27 08:24:47 PDT; 37s ago
     Docs: https://docs.docker.com
  Process: 45244 ExecStart=/usr/bin/docker daemon -H fd:// (code=exited, status=1/FAILURE)
 Main PID: 45244 (code=exited, status=1/FAILURE)

Oct 27 08:24:45 Docker_Sandbox_00 systemd[1]: Starting Docker Application Container Engine...
Oct 27 08:24:46 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:46.512617474-07:00" level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
Oct 27 08:24:46 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:46.526637164-07:00" level=info msg="Option DefaultDriver: bridge"
Oct 27 08:24:46 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:46.526719113-07:00" level=info msg="Option DefaultNetwork: bridge"
Oct 27 08:24:46 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:46.589016574-07:00" level=warning msg="Running modprobe bridge nf_nat br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.\n, error: exit status 1"
Oct 27 08:24:46 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:46.625324632-07:00" level=info msg="Firewalld running: true"
Oct 27 08:24:47 Docker_Sandbox_00 docker[45244]: time="2015-10-27T08:24:47.142468904-07:00" level=fatal msg="Error starting daemon: Unable to open the database file: unable to open database file"
Oct 27 08:24:47 Docker_Sandbox_00 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 27 08:24:47 Docker_Sandbox_00 systemd[1]: Failed to start Docker Application Container Engine.
Oct 27 08:24:47 Docker_Sandbox_00 systemd[1]: Unit docker.service entered failed state.
[root@Docker_Sandbox_00 devicemapper]# df -ah
Filesystem                                  Size  Used Avail Use% Mounted on
/dev/vdb1                                    20G   20G   24K 100% /var/lib/docker
[root@Docker_Sandbox_00 devicemapper]# du -sh /var/lib/docker/devicemapper/devicemapper/data
20G     /var/lib/docker/devicemapper/devicemapper/data
@brian-dlee

@tomlux @rhvgoyal

I found the source of my issue. Despite how it appeared, it was unrelated to the issues in this thread. It stemmed from a misunderstanding of how docker works: all the dead containers were still holding on to the disk space they had allocated during their execution. I just had to remove all the container carcasses to free up the disk space. I do remember this coming up in this thread, but I thought it only regarded mounted volumes, not the disk space allocated by the containers.

# Beware this removes ALL containers
docker rm -f $(docker ps -aq) 

@tomlux This may have been your issue as well since your output of docker ps -a showed several Dead containers.

@fgimenez fgimenez added a commit to fgimenez/snappy-jenkins that referenced this issue Jan 19, 2016
@fgimenez fgimenez Using upstream docker over wily
Also symlinked /var/lib/docker to /mnt to prevent exhausting the base disk
because of docker/docker#3182
7cd5f18
@fgimenez fgimenez referenced this issue in ubuntu-core/snappy-jenkins Jan 19, 2016
Merged

Using upstream docker over wily #48

@awolfe-silversky

docker rm is not freeing container's disk space

boot2docker recent in OS X VirtualBox. OS X fully patched.

I'm building a fat (47 GB) container, and it hit a problem indicating I should rebuild it. So I stopped the container and did docker rm. Double-checking using docker-machine ssh 'df -h', I find disk usage is still 47 GB. The VM has 75 GB.

So I'm going to need to kill the docker VM again.

Can we get this done?

@thaJeztah
Member

@awolfe-silversky is disk-space inside the VM returned? If it's outside the VM, this may be unrelated.

@thaJeztah
Member

@awolfe-silversky also; did you remove the image as well? removing just the container may not help much if the image is still there

@SvenDowideit
Collaborator

@awolfe-silversky this issue is about devicemapper - and if you're using docker-machine/boot2docker, then you're much more likely to be running aufs. I also wonder if you've docker rmi'd your big image.

it's worth running docker images and docker info to see if things really are as dire as you make them sound :)

(yes, if you still have the vm, and the image is removed, then we should open a new issue and debug further, as you've found a weird corner case)

@awolfe-silversky

I did not remove the image. It is from an internal docker registry.
I used an awk script to size the images - total 6.9 GB.

docker images -a | awk '(FNR > 1) { imgSpace = imgSpace + $(NF - 1); }
END { print "Image space is " imgSpace; }'
Image space is 6909.01

It's dirty but I know that all the image sizes are in MB.

Here's how I was trying to diagnose usage:

 file:///Andrew-Wolfe-MacBook-Pro.local/Users/awolfe/DataStores
awolfe_10063: docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

 file:///Andrew-Wolfe-MacBook-Pro.local/Users/awolfe/DataStores
awolfe_10064: docker-machine ssh 'awolfe-dbhost' 'df -h'
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     7.0G    123.8M      6.9G   2% /
tmpfs                     3.9G         0      3.9G   0% /dev/shm
/dev/sda1                71.0G     47.2G     20.2G  70% /mnt/sda1
cgroup                    3.9G         0      3.9G   0% /sys/fs/cgroup
none                    464.8G    379.9G     84.8G  82% /Users
/dev/sda1                71.0G     47.2G     20.2G  70% /mnt/sda1/var/lib/docker/aufs
@ror6ax
ror6ax commented Mar 14, 2016

Right now I have box with 0 images.
docker volume ls -qf dangling=true shows nothing.
docker volume ls shows a lot of volumes, which are, by definition, orphaned, since there are no images to own them.
docker volume rm $(docker volume ls) shows lots of such messages:

Error response from daemon: get local: no such volume
Error response from daemon: Conflict: remove 6989acc79fd53d26e3d4668117a7cb6fbd2998e6214d5c4843ee9fceda66fb14: volume is in use - [77e0eddb05f2b53e22cca97aa8bdcd51620c94acd2020b04b779e485c7563c57]

Device mapper directory eats up 30 GiG.
Docker version 1.10.2, build c3959b1
CentOS 7, 3.10.0-327.10.1.el7.x86_64

@ror6ax
ror6ax commented Mar 14, 2016
Data Space Used: 33.33 GB
 Data Space Total: 107.4 GB
 Data Space Available: 915.5 MB
 Metadata Space Used: 247 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 915.5 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)

Also, why does the default installation use a 'strongly discouraged' storage option?
Why wasn't I told so at installation?

@ir-fuel
ir-fuel commented Apr 23, 2016 edited

I have exactly the same problem here on an Amazon Linux EC2 instance.

Linux ip-172-31-25-154 4.4.5-15.26.amzn1.x86_64 #1 SMP Wed Mar 16 17:15:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

On instances where I install new docker images on a regular basis the only solution is to do the following:

service docker stop
yum remove docker -y
rm -rf /var/lib/docker
yum install docker -y
service docker start

I don't really think such a thing is acceptable in a production environment.

some extra info:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       20G   20G     0 100% /
@mercuriete
mercuriete commented May 21, 2016 edited

As this bug has been around for years and it seems it is still not fixed, could you add to the Docker documentation on devicemapper how to safely destroy all Docker information?
I mean, on this page: https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/
put something like "Cleaning up device mapper" and how to do it.

I will try rm -rf /var/lib/docker, but I don't feel comfortable doing that. Can somebody tell me if it is safe?

I am using Gentoo Linux on my daily laptop, and I tried Docker for learning, but it is filling up my disk, and reinstalling the whole system is not an option because it is not a VM and reinstalling Gentoo takes time.

Thank you for your work.

@ir-fuel
ir-fuel commented May 21, 2016

@mercuriete On your dev machine just uninstall docker, delete the directory and reinstall it. Works fine.

@aleks-f
aleks-f commented May 22, 2016

@ir-fuel: I just did that and now I have this:

$ sudo service docker-engine start
Redirecting to /bin/systemctl start  docker-engine.service
Failed to start docker-engine.service: Unit docker-engine.service failed to load: No such file or directory.
$ uname -a
Linux CentOS7 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
@ir-fuel
ir-fuel commented May 22, 2016

I'm using service docker start

@mercuriete

@ir-fuel thanks, it works fine. πŸ‘

@asarkar
asarkar commented Jun 22, 2016

Reinstalling Docker to release disk space is the most ridiculous answer I came across while looking for a solution to this issue. Not only is that a waste of time, it's not even allowed in most environments. It's a good way to get paid if you're an hourly worker, though.

@ir-fuel
ir-fuel commented Jun 23, 2016

I completely agree. It's amazing that a product like Docker just keeps on eating away disk space with nothing you can do about it except for uninstall/reinstall.

@ror6ax
ror6ax commented Jun 23, 2016

Checking in to this issue yet another time to see nothing has changed. +1

@awolfe-silversky

This issue is marked closed. We need a resolution. No workaround, no reconfiguration. What is the real status, and what are the configuration settings that are implicated? Dropping and recreating a production Docker node is not acceptable.

@kpande
kpande commented Jun 23, 2016

The workaround is to avoid using the Docker device-mapper driver, unfortunately.

@ir-fuel
ir-fuel commented Jun 23, 2016

And what is the alternative? How do we go about avoiding this?

@ror6ax
ror6ax commented Jun 23, 2016

If this is not a recommended feature to use - why is it the silent default one? If you do not care about people using devicemapper - I might even be OK with that. But do inform the user about it! Do you realize the amount of headache people have due to this amazing 'workaround' you settled on??

@kpande
kpande commented Jun 23, 2016 edited

No one forced you to use Docker. There is a lot of learning that should be done before you configure your production server. It's a valid question: why did these issues make it through your test environment into prod?

@blalor
blalor commented Jun 23, 2016

There's always rkt…

@ripienaar

the general bitchy unhelpful snark is no doubt why no-one from upstream cares to give you proper answers.

@asarkar
asarkar commented Jun 24, 2016

no one forced you to use Docker.

That's like Oracle telling a Java developer to use PHP due to a JVM bug. It's also not consistent with the elevator pitch here:

Three years ago, Docker made an esoteric Linux kernel technology called containerization simple and accessible to everyone.

I'm sure a lot of people are grateful that Docker took off like it did, and that couldn't have happened without volunteering from the community. However, it shouldn't be this hard to admit that it has its problems too, without implicitly dropping the "I'm an upstream contributor, so shut up and listen" line whenever someone brings up an unlikable point.

@ror6ax
ror6ax commented Jun 24, 2016

Wait. I did report an issue and provided the details of my machine and setup, which I'm not obliged to. None of the dev team responded to mine and others' bug reports in half a year. Now that I've stated this fact, you call my behavior bitchy? Do you even do open source? I'm looking for a Go project to work on, and it will not be Docker, I can tell you that. Is this your goal?

@thaJeztah
Member
thaJeztah commented Jun 24, 2016 edited

First of all, if you still have this problem, please open a new issue;

Wait. I did report an issue

You replied on a 3-year-old, closed issue; following the discussion above, the original issue was resolved. Your issue may be the same, but needs more research to be sure; the errors you're reporting indicate that it may actually be something else.

I really recommend opening a new issue rather than commenting on a closed one.

provided the details of my machine and setup, which I'm not obliged to.

You're not obliged to, but without any information to go on, it's unlikely to be resolved. So, when reporting a bug, please include the information that's asked for in the template:
https://raw.githubusercontent.com/docker/docker/master/.github/ISSUE_TEMPLATE.md

None of devteam responded to my and others bug reports in half a year period.

If you mean "one of the maintainers", please keep in mind that there are almost 24,000 issues and PRs, and fewer than 20 maintainers, many of whom do this besides their daily job. Not every comment will be noticed, especially if it's on a closed issue.

If this is not recommended feature to use - why is it silent and default one?

It's the default if aufs, btrfs, and zfs are not supported; you can find the priority that's used when selecting drivers in daemon/graphdriver/driver_linux.go. It's still above overlay because, unfortunately, there are some remaining issues with that driver that some people may be affected by.

Automatically selecting a graphdriver is just to "get things running"; the best driver for your situation depends on your use-case. Docker cannot make that decision automatically, so this is up to the user to configure.
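If you do want to override the automatic choice, here is a minimal sketch (assuming a kernel recent enough for overlay2; note that switching drivers hides any images stored under the previous driver's directory):

```shell
# stop the daemon, then start it with an explicit graphdriver instead of
# relying on autodetection:
service docker stop
dockerd --storage-driver=overlay2
# (or set { "storage-driver": "overlay2" } in /etc/docker/daemon.json)

# verify which driver is now active:
docker info | grep 'Storage Driver'
```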

If you do not care for people using devicemapper - I might be even ok with this.

Reading back the discussion above, I see that the upstream devicemapper maintainers have looked into this multiple times, trying to assist users reporting these issues, and resolving them. The issue was resolved for those that reported it, or in some cases depended on distros updating their devicemapper versions. I don't think that can be considered "not caring".

Also, why does the default installation use a 'strongly discouraged' storage option?

Running on loop devices is fine for getting Docker running, and is currently the only way to set up devicemapper automatically. For production, and for better overall performance, use direct-lvm, as explained in the devicemapper section of the storage driver user guide.
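For reference, a hedged sketch of the direct-lvm setup described in that guide (the device name /dev/xvdf and the pool sizes are assumptions; adapt them to your host):

```shell
# dedicate a spare block device to an LVM thin pool for Docker
pvcreate /dev/xvdf
vgcreate docker /dev/xvdf
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y --zero n -c 512K \
  --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta

# point the daemon at the pool instead of the loopback files
dockerd --storage-driver=devicemapper \
  --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool \
  --storage-opt dm.use_deferred_removal=true
```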

Why wasn't I told so at installation?

That's out of scope for the installation, really. If you're going to use some software in production, it should be reasonable to assume that you familiarize yourself with that software and know what's needed to set it up for your use case. Some maintainers even argued whether the warning should be output at all. Linux is not a "hand-holding" OS (does your distro show a warning that data loss can occur if you're using RAID-0? If you have ports opened in your firewall?)

@shadowmint

Deeply reluctant as I am to once again resurrect this ancient thread, there is still no meaningful advice in it about how to work around this issue on an existing machine.

This is my best effort at a tl;dr for the entire thread; I hope it helps others who find it.

Issue encountered

Your volume has a significant (and growing) amount of space consumed by /var/lib/docker, and you're using ext3.

Resolution

You're out of luck. Upgrade your file system or see blowing docker away at the bottom.

Issue encountered

Your volume has a significant (and growing) amount of space consumed by /var/lib/docker, and you're not using ext3 (e.g. the system currently uses xfs or ext4).

Resolution

You may be able to reclaim space on your device using standard docker commands.

Read http://blog.yohanliyanage.com/2015/05/docker-clean-up-after-yourself/

Run these commands:

docker volume ls
docker ps
docker images

If you have nothing listed in any of these, see blowing docker away at the bottom.

If you see old stale images, unused containers, etc., you can perform manual cleanup with:

# Delete 'exited' containers
docker rm -v $(docker ps -a -q -f status=exited)

# Delete 'dangling' images
docker rmi $(docker images -f "dangling=true" -q)

# Delete 'dangling' volumes
docker volume rm $(docker volume ls -qf dangling=true)

This should reclaim much of the hidden container space in the devicemapper pool.
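A hedged variant of the same steps, wrapped in a dry-run guard so the exact commands can be reviewed before anything is deleted (nothing runs unless DO_CLEAN=1; on Docker 1.13+ much of this is bundled into `docker system prune`):

```shell
#!/bin/sh
# Print each cleanup command first; only execute it when DO_CLEAN=1.
clean() {
    if [ "${DO_CLEAN:-0}" = "1" ]; then
        sh -c "$1"
    else
        echo "would run: $1"
    fi
}

clean 'docker rm -v $(docker ps -a -q -f status=exited)'
clean 'docker rmi $(docker images -f "dangling=true" -q)'
clean 'docker volume rm $(docker volume ls -qf dangling=true)'
```

Run it once to review the commands, then again with `DO_CLEAN=1` to execute them.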

Blowing docker away

Didn't work? You're out of luck.

Your best bet at this point is:

service docker stop
rm -rf /var/lib/docker
service docker start

This will destroy all your docker images. Make sure to export ones you want to keep before doing this.
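For the export step, a minimal sketch (the image name and archive path are placeholders):

```shell
# before wiping /var/lib/docker, write the images you want to keep to a
# tar archive; re-import them after the reinstall:
docker save -o kept-images.tar myrepo/myimage:latest
# ... wipe /var/lib/docker and reinstall docker ...
docker load -i kept-images.tar
```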

Ultimately, please read https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-direct-lvm-mode-for-production; but I hope this will assist others who find this thread.

If you have problems using the advice above, open a new ticket that specifically addresses the issue you encounter and link to this issue; do not post it here.

@gg7
gg7 commented Nov 4, 2016

rm -rf /var/lib/docker

You can also use nuke-graph-directory.sh.

@openstack-gerrit openstack-gerrit pushed a commit to openstack/fuel-qa that referenced this issue Nov 22, 2016
@avgoor avgoor [8.0] Reboot procedure fixed
Added the workaround to avoid the thin pool exhaustion probably caused
by the issue docker/docker#3182

Change-Id: I2edc74189872754383bf9a1f3215490d46e38b27
3282812
@machard machard referenced this issue in lambci/ecs Jan 17, 2017
Open

Related docker "Thin top" issue #10
