
dockerd left open file descriptors to deleted /var/log/journal/*/*.journal files #27343

Closed

theundefined opened this issue Oct 13, 2016 · 19 comments · Fixed by #38859
Labels
area/logging kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.

Comments

@theundefined

theundefined commented Oct 13, 2016

dockerd left open file descriptors to deleted /var/log/journal/*/*.journal files

I'm using dockerd with --log-driver=journald.
After a few days it has left many open file descriptors to deleted files in /var/log/journal:

ls -l /proc/$(cat /var/run/docker.pid)/fd |grep deleted |wc -l
6250

which keep taking up space on disk.

Steps to reproduce the issue:

  1. Install docker and configure it to use the journald log driver (a sample configuration is sketched below).
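
For reference, a minimal sketch of one way to do that (assumes a systemd-based host and that /etc/docker/daemon.json contains no other settings that need preserving):

# write a daemon.json that selects the journald log driver, then restart dockerd
cat > /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "journald"
}
EOF
systemctl restart docker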

Describe the results you received:

ls -l /proc/$(cat /var/run/docker.pid)/fd |grep deleted |wc -l
6250

Describe the results you expected:
no open deleted files.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

docker -v
Docker version 1.12.1, build 23cf638

Output of docker info:

# docker info
Containers: 44
 Running: 41
 Paused: 0
 Stopped: 3
Images: 58
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: xfs
 Dirs: 193
 Dirperm1 Supported: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: docker-name
ID: XXX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
kvm

@suside

suside commented Oct 13, 2016

It's worth noting that the solution proposed by @JohnMorales in #8693 can be used as a workaround: https://gist.github.com/JohnMorales/4a4277510190d187bac5
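
For context, a rough sketch of the general gdb technique (this is an assumption about the approach, not necessarily the exact contents of the gist):

pid=$(cat /var/run/docker.pid)
# hypothetical sketch: ask gdb to close every fd in dockerd that still points at a deleted journal file
for fd in $(ls -l /proc/$pid/fd | grep '/var/log/journal' | grep deleted | awk '{print $9}'); do
  gdb -p "$pid" --batch -ex "call (int)close($fd)"
done

As the following comments note, attaching gdb to a running dockerd carries some risk, so this is best treated as a last resort.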

@theundefined changed the title from "dockerd left open file descriptors do deleted /var/log/journal/*/*.journal files" to "dockerd left open file descriptors to deleted /var/log/journal/*/*.journal files" Oct 13, 2016
@theundefined
Author

Yes, but the workaround is very bad, and using gdb on dockerd is a bit dangerous.

@cpuguy83 added the area/logging and kind/bug labels Oct 13, 2016
@cpuguy83
Member

ping @nalind

@JohnMorales

@theundefined I agree it's not the safest, but it does work. I'm not trying to convince you to use it, but I have been using this workaround for ~a year w/ no repercussions.

@nalind
Contributor

nalind commented Oct 14, 2016

When I try to reproduce the bug here, I'm seeing deleted files in the list of open descriptors between when the files are removed and when the daemon notices that they're gone, but they reliably get closed a fraction of a second later.

What's supposed to happen is that the daemon is woken up when it gets a message on an inotify socket that it's using to watch the directory which contains the journal files, and when it calls sd_journal_process, it processes that notification and closes files which the notification says have been deleted.

When you're seeing this happen on your systems, is the set of open deleted files staying the same, or are they being closed, with other open files possibly being deleted after that?
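
A quick way to check that (a sketch reusing the /proc listing from the issue description; the temporary file paths and the five-minute wait are just placeholders):

ls -l /proc/$(cat /var/run/docker.pid)/fd | grep deleted | awk '{print $9, $11}' | sort > /tmp/deleted-fds.1
sleep 300
ls -l /proc/$(cat /var/run/docker.pid)/fd | grep deleted | awk '{print $9, $11}' | sort > /tmp/deleted-fds.2
diff /tmp/deleted-fds.1 /tmp/deleted-fds.2

If the same fd/path pairs are still listed after the wait, the descriptors are genuinely stuck rather than just lagging briefly behind the deletion.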

@suside

suside commented Oct 17, 2016

Over the weekend the number of deleted-but-not-closed files increased from 111 to 163, but I don't have the exact list. I will report back tomorrow.

@suside

suside commented Oct 27, 2016

OK, so here is a roughly one-week diff: https://gist.github.com/suside/10ecd6e61edfa9062272f94027b0ac3b
and it keeps growing until the gdb close(fd) workaround is applied.

Moreover, when I pressed Ctrl-C during the gdb session, docker crashed here:

docker[2853]: goroutine 31644189 [syscall, locked to thread]:
docker[2853]: github.com/docker/docker/daemon/logger/journald._Cfunc_wait_for_data_or_close(0x7fc494031150, 0x7fc4000007e9, 0x0)
docker[2853]: github.com/docker/docker/daemon/logger/journald/_obj/_cgo_gotypes.go:370 +0x41
docker[2853]: github.com/docker/docker/daemon/logger/journald.(*journald).followJournal.func1(0x7fc494031150, 0x7ea000007e9, 0xc821794ba0, 0xc824964b40, 0xc823ef2ba0, 0x0, 0x0, 0x0, 0x1, 0x100000000000001)
docker[2853]: /usr/src/docker/.gopath/src/github.com/docker/docker/daemon/logger/journald/read.go:248 +0x1f1
docker[2853]: created by github.com/docker/docker/daemon/logger/journald.(*journald).followJournal
docker[2853]: /usr/src/docker/.gopath/src/github.com/docker/docker/daemon/logger/journald/read.go:263 +0x179
Perhaps this may explain something?

@thaJeztah
Member

ping @cpuguy83 @nalind ^^

@mattupstate

I'm running a load-balanced application on two servers, and one of them has far more open deleted files than the other. The main difference between the two is that I've certainly run docker logs ... on one of the servers far more than the other. As such, this server has thousands of open deleted journal files compared to the other machine.

@theundefined
Author

A small and dirty workaround to release disk space without attaching to the running docker process and closing its files:

root@docker1:~# cd /proc/`pidof dockerd`/fd
root@docker1:/proc/975/fd# ls -l |grep var.log.journal |grep deleted.$ |awk '{print $9}' |while read x; do :> $x; done;

@sebglon

sebglon commented May 4, 2017

Is it the same as #30063?

@kshantaramanUFL

Any resolution to this issue on Ubuntu 14.04 running docker?

@rvadim

rvadim commented Aug 3, 2017

Same problem with docker 1.11.2 and the json-file log driver:

docker info
Containers: 8
 Running: 6
 Paused: 0
 Stopped: 2
Images: 5
Server Version: 1.11.2
Storage Driver: devicemapper
 Pool Name: docker-253:1-257267-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.9 GB
 Data Space Total: 107.4 GB
 Data Space Available: 14.5 GB
 Metadata Space Used: 3.535 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.144 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 3.858 GiB
Name: router1
ID: MEGU:5IMZ:MTFG:CYXS:DS7F:PTEU:JT5L:LHUO:DSR4:XNQ2:DLTJ:N4GF
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
lsof -nP | grep json.log | grep deleted
docker     7980                   root   14r      REG              253,1 2048000610     512203 /var/lib/docker/containers/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763-json.log.1 (deleted)
docker     7980  5649             root   14r      REG              253,1 2048000610     512203 /var/lib/docker/containers/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763-json.log.1 (deleted)
docker     7980  5650             root   14r      REG              253,1 2048000610     512203 /var/lib/docker/containers/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763-json.log.1 (deleted)
docker     7980  7983             root   14r      REG              253,1 2048000610     512203 /var/lib/docker/containers/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763/52f047735a8944150b2be76f4573466eb7ec9b2a573bf06abacce2285d063763-json.log.1 (deleted)
lsof -nP | grep json.log | grep deleted | wc -l
34

@cpuguy83
Member

cpuguy83 commented Aug 3, 2017

@rvadim This is a different issue and is fixed in a newer release.

@rvadim

rvadim commented Aug 6, 2017

@cpuguy83 Can you specify which version, please? Or link to issue.

@cpuguy83
Member

cpuguy83 commented Aug 8, 2017

@rvadim

54f11b8 and/or
0ead624

I'd recommend running at least 17.03.2.

@BBQigniter

BBQigniter commented Aug 17, 2017

Not sure if this is related in any way, as we do not use Moby, but we've seen similar problems on some docker nodes in swarm mode with the journald log driver.

Googling for this file descriptor problem does not give many hits. We switched to this log driver because we run a journalbeat container on each node, which then ships the logs to our ELK stack. For that, the journalbeat containers mount /var/log/journal, and somehow docker then seemed to keep its hands on all those deleted journald files.

What we then did was mount the needed folders and partitions with consistency=cached (e.g. --mount type=bind,consistency=cached,source=/var/log/journal,destination=/var/log/journal -- see more at https://docs.docker.com/engine/reference/commandline/service_create/#add-bind-mounts-or-volumes); it seems to prevent those file handle problems.

Forgot to add: we are using docker version 17.06.0-ce.
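
For reference, a sketch of a service created with the mount string quoted above (the service and image names are placeholders, not taken from the thread, and whether consistency=cached actually changes behaviour on Linux hosts is not confirmed here):

# hypothetical global service that bind-mounts the journal directory with the mount string from the comment above
docker service create \
  --name journalbeat \
  --mode global \
  --mount type=bind,consistency=cached,source=/var/log/journal,destination=/var/log/journal \
  journalbeat-image:latest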

@thaJeztah
Member

Pull request for this (but it needs to be carried): #36254

@kolyshkin
Contributor

Proposed fix: #38859
