docker rm performance drop since 1.8.2 #16281

Closed
ceecko opened this issue Sep 14, 2015 · 9 comments

@ceecko

ceecko commented Sep 14, 2015

Hi,

I noticed that after upgrading from Docker 1.7.1 to 1.8.2 the performance of the docker rm command has dropped significantly. I managed to reproduce the behavior on two identical AWS machines (m4.large):

# cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)

# uname -a
Linux ip-172-31-20-201.eu-west-1.compute.internal 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda                         202:0    0    8G  0 disk
└─xvda1                      202:1    0    8G  0 part /
loop0                          7:0    0  100G  0 loop
└─docker-202:1-25219536-pool 253:0    0  100G  0 dm
loop1                          7:1    0    2G  0 loop
└─docker-202:1-25219536-pool 253:0    0  100G  0 dm

On v1.7.1 it takes ~1.2s to remove a container.

# docker version
Client version: 1.7.1
Client API version: 1.19
Package Version (client): docker-1.7.1-108.el7.centos.x86_64
Go version (client): go1.4.2
Git commit (client): 3043001/1.7.1
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Package Version (server): docker-1.7.1-108.el7.centos.x86_64
Go version (server): go1.4.2
Git commit (server): 3043001/1.7.1
OS/Arch (server): linux/amd64

# docker -D info
Containers: 0
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-202:1-25219536-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 459.8 MB
 Data Space Total: 107.4 GB
 Data Space Available: 6.934 GB
 Metadata Space Used: 860.2 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-229.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-172-31-21-188.eu-west-1.compute.internal
ID: JSWW:5A52:NGZU:FM4K:E3F4:EGL3:77AR:DCZM:W6FS:ZBMS:SBA3:ZXR6

# DATE1=$(date +"%s%3N"); docker run --rm debian:jessie echo ""; echo "Took $(($(date +"%s%3N") - $DATE1))ms"
Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.

Took 1214ms

On v1.8.2 it takes ~6.6s to remove a container.

# docker version
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:08:45 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:08:45 UTC 2015
 OS/Arch:      linux/amd64

# docker -D info
Containers: 0
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-202:1-58183-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.973 GB
 Data Space Total: 107.4 GB
 Data Space Available: 5.361 GB
 Metadata Space Used: 1.61 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-229.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-172-31-20-201.eu-west-1.compute.internal
ID: W3WU:5JBX:L6FH:NMNF:BGGY:VXOG:37FA:5N4Y:YAEZ:6LWB:6UGF:JU46

# DATE1=$(date +"%s%3N"); docker run --rm debian:jessie echo ""; echo "Took $(($(date +"%s%3N") - $DATE1))ms"

Took 6651ms
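
The measurements above use docker run --rm, which folds the removal into the run. To time the removal step on its own, the same idiom can be split in two (a small sketch; the container name is arbitrary):

# docker run --name rm-timing debian:jessie echo ""
# DATE1=$(date +"%s%3N"); docker rm rm-timing; echo "Took $(($(date +"%s%3N") - $DATE1))ms"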

The logs from v1.8.2 don't show any errors:

Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.319018994Z" level=info msg="POST /v1.20/containers/create"
Sep 14 17:21:49 ip-172-31-20-201 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Sep 14 17:21:49 ip-172-31-20-201 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.474291469Z" level=info msg="POST /v1.20/containers/aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3/attach?stderr=1&stdout=1&stream=1"
Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.474892538Z" level=info msg="POST /v1.20/containers/aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3/start"
Sep 14 17:21:49 ip-172-31-20-201 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Sep 14 17:21:49 ip-172-31-20-201 kernel: device veth52edf8b entered promiscuous mode
Sep 14 17:21:49 ip-172-31-20-201 kernel: IPv6: ADDRCONF(NETDEV_UP): veth52edf8b: link is not ready
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered forwarding state
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered forwarding state
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered disabled state
Sep 14 17:21:49 ip-172-31-20-201 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth52edf8b: link becomes ready
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered forwarding state
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered forwarding state
Sep 14 17:21:49 ip-172-31-20-201 systemd: Starting docker container aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3.
Sep 14 17:21:49 ip-172-31-20-201 systemd: Started docker container aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3.
Sep 14 17:21:49 ip-172-31-20-201 systemd: Stopping docker container aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3.
Sep 14 17:21:49 ip-172-31-20-201 systemd: Stopped docker container aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3.
Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.707192689Z" level=info msg="POST /v1.20/containers/aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3/wait"
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered disabled state
Sep 14 17:21:49 ip-172-31-20-201 kernel: device veth52edf8b left promiscuous mode
Sep 14 17:21:49 ip-172-31-20-201 kernel: docker0: port 1(veth52edf8b) entered disabled state
Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.848508505Z" level=info msg="GET /v1.20/containers/aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3/json"
Sep 14 17:21:49 ip-172-31-20-201 docker: time="2015-09-14T17:21:49.849767196Z" level=info msg="DELETE /v1.20/containers/aa12a680eea65b6d5c90a9dd448735e569435210238966152f378da9e3933cc3?v=1"

Is this expected behavior in v1.8.2 due to some change?

@cpuguy83
Member

Maybe this is due to bumping up the allocated space from 10GB to 100GB?
ping @rhvgoyal

@rhvgoyal
Contributor

Bumping up the allocated space to 100G causes delays in mkfs.xfs on top of loop devices, but I have not heard anything with respect to device deletion.

@ceecko

Can you set up a fresh instance, pass the option "--storage-opt dm.basesize=10G" to make sure a 10G base device is created, and try again?
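
On a systemd-managed host, one way to pass that option is a drop-in overriding ExecStart (a rough sketch assuming the upstream 1.8.2 package with the daemon binary at /usr/bin/docker; any flags from the existing ExecStart need to be carried over, and distro packages may read options from /etc/sysconfig/docker instead):

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/storage.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --storage-opt dm.basesize=10G
EOF
systemctl daemon-reload
systemctl restart docker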

@ceecko
Author

ceecko commented Sep 15, 2015

@rhvgoyal --storage-opt dm.basesize=10G resolves the performance issues. On v1.8.2 the time is now ~1.2s, the same as on v1.7.1.

A side effect of having to wait ~6.6s for the removal of a container is that it's not possible to start any other container until the removal is finished, since the daemon appears to be locked.

If you run these two commands in parallel (the second one does not remove a container), both take ~6.5s to finish:

# docker run --rm debian:jessie echo ""
# docker run debian:jessie echo ""

Running only the second command takes ~0.5s:

# DATE1=$(date +"%s%3N"); docker run debian:jessie echo ""; echo "Took $(($(date +"%s%3N") - $DATE1))ms"
Took 498ms
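
A minimal bash sketch of the parallel case above (assumes the debian:jessie image is already pulled; each subshell prints its own timing):

( time docker run --rm debian:jessie echo "" ) &
( time docker run debian:jessie echo "" ) &
wait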

Is this standard behavior, or is there anything that can be done?

@rhvgoyal
Contributor

OK, I think discarding the blocks is taking more time because the device size is 100G. This discard happens only in the case of loop devices. If you move away from loop devices, I think you will not see this performance penalty.

Anyway, the 100G default base size is not working well: mkfs.xfs takes more time on top of loop devices, discards take more time, and container removal is slow.

I think we should revert the patch, keep the base size at either 10G or 20G, and then implement the ability to specify the container size at run time, so that we can have different-sized containers to meet the needs of people for whom 10G is not sufficient.

@ceecko
Author

ceecko commented Sep 17, 2015

After switching to a thin pool, removing a container takes ~650ms.
During deployment and configuration I hadn't noticed that loop devices are not suitable for production.

Just an idea: maybe it would be good to have a separate section in the docs listing everything to keep in mind for production systems, similar to MongoDB's Production Notes: https://docs.mongodb.org/manual/administration/production-notes
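
For anyone else making the switch, the rough shape of a thin pool on a real block device looks like this (illustrative sketch only; the spare disk /dev/xvdf and the volume group/LV names are examples, not the exact setup used here):

pvcreate /dev/xvdf
vgcreate docker /dev/xvdf
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta

The daemon is then pointed at it with --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool, the same option the loopback warning above mentions.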

@ceecko ceecko closed this as completed Sep 17, 2015
@rhvgoyal
Contributor

@ceecko

When you say "switching to thin-pool", I am assuming you mean using a thin-pool which is on real block devices and not on top of loop devices?

@ceecko
Author

ceecko commented Sep 17, 2015

Yes, exactly

@zrml

zrml commented Sep 17, 2015

Same problem here on CentOS 7.1: device-mapper default storage with an image size of about 2GB (Tutum CentOS + some tar files I copied...).
CPU is at 100%.
Last night I disconnected. Now, trying something new and wanting to remove the container, it's still taking ages.

The other open shell is frozen because the CPU is at 100%; I can see this from the portal.

@jeanpralo

Hey there,

I am having the same problem as @ceecko: deleting is way slower than before and sometimes results in Docker saying that it can't delete the container because the device is busy:

Error response from daemon: Cannot destroy container fb975b899c03: Driver devicemapper failed    to remove root filesystem  fb975b899c032da73611ad3c17cf68d33ebe805e234c51797a4b3284870577ac: Device is Busy

Switching dm.basesize to 10GB seems to fix the issue so far. Maybe it would be worth reverting the default to 10GB instead of 100GB, or even allowing this option to be specified at container creation, as requested in #14678.
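
For reference, one way to set that on an Ubuntu 14.04 host like this one is via /etc/default/docker (a sketch; it assumes the packaging that reads DOCKER_OPTS from that file, and a smaller dm.basesize generally only takes effect on a freshly initialized /var/lib/docker):

echo 'DOCKER_OPTS="--storage-opt dm.basesize=10G"' >> /etc/default/docker
service docker restart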

Docker info:

# docker info

Containers: 5
Images: 57
Storage Driver: devicemapper
Pool Name: docker-8:0-81993-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.521 GB
Data Space Total: 107.4 GB
Data Space Available: 21.84 GB
Metadata Space Used: 3.039 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.144 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.5-x86_64-linode61
Operating System: Ubuntu 14.04.1 LTS
CPUs: 1
Total Memory: 991.2 MiB
Name: ch-d-dm10-2
ID: 6YWZ:ZNXS:IFUC:47ZT:KDBB:QZ4L:CRNQ:UXYM:2MFD:2RMV:N66X:INLV
WARNING: No swap limit support
