Hang with df -h #1630

Closed
keimoon opened this Issue Oct 24, 2016 · 17 comments

@keimoon

keimoon commented Oct 24, 2016

Issue Report

Bug

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.3.0
VERSION_ID=1122.3.0
BUILD_ID=2016-10-20-2045
PRETTY_NAME="CoreOS 1122.3.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

AWS EC2, deployed using `kube-aws`

Expected Behavior

df -h returns the list of mounted filesystems

Actual Behavior

df -h hangs indefinitely

Reproduction Steps

Other Information

Kubernetes version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.3+coreos.0", GitCommit:"7819c84f25e8c661321ee80d6b9fa5f4ff06676f", GitTreeState:"clean", BuildDate:"2016-10-17T21:19:17Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes Controller Instance type: t2.small
Kubernetes Worker Instance type: m4.xlarge

We ran into an issue where commands like df -h hang on the worker servers. Running df under strace shows it hanging at:

stat("/proc/sys/fs/binfmt_misc",

Rebooting the worker instances resolves the problem temporarily, but after about a day it occurs again.
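
For reference, a minimal sketch of reproducing that trace on an affected host (the exact strace invocation used is not shown above, so this is illustrative):

# run df under strace; the last line printed before the hang is the blocking stat() call
strace df -h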

@keimoon

keimoon commented Oct 24, 2016

A related error in journalctl:

Failed to open directory /etc/systemd/system/var-lib-rkt-pods-run-2da157fb\x2d892b\x2d4ee0\x2dba88\x2df6a5d586d624-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-rkt-pods-run-3d225a67\x2d27f0\x2d430a\x2d8613\x2ded4365a0dd69-stage1-rootfs-opt-stage2-flannel-rootfs-proc-sys-fs-binfmt_misc.mount.wants: File name too long

@lucab

Member

lucab commented Oct 24, 2016

The timing plus the journal entry are suspicious, and may hint at some rkt pods which are not being garbage-collected quickly enough.

@keimoon can you please manually run rkt gc --grace-period=0 from time to time and see if it helps you get past the 1-day threshold? Also, can you please paste the output of rkt list at the time you see this kind of hang?
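
A minimal sketch of the commands being asked for here (both are standard rkt subcommands; run them as root on the affected worker):

# force garbage collection of exited pods immediately
sudo rkt gc --grace-period=0
# list pods and their state at the time of the hang
sudo rkt list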

@keimoon

keimoon commented Oct 25, 2016

@lucab thanks, I will paste the output when it hangs again.

@pbx0 referenced this issue Oct 25, 2016

Closed

Hang with df -h #739

@PAStheLoD

PAStheLoD commented Dec 15, 2016

Hello,

On 1248.1.0.

Drastic things can happen if someone tries a rkt-ified toolbox, such as pods that cannot be garbage-collected.

/bin/rkt --insecure-options=image run --dns=host --hosts-entry=host --interactive --mount=volume=root,target=/media/root --mount=volume=run,target=/media/root/run --mount=volume=usr,target=/media/root/usr --net=host --stage1-name=coreos.com/rkt/stage1-fly:1.20.0 --uuid-file-save=/tmp/tmp.25NMNKACOZ --volume=root,kind=host,source=/,readOnly=false --volume=run,kind=host,source=/run,readOnly=false --volume=usr,kind=host,source=/usr,readOnly=true docker://fedora:latest --user=root --exec=/usr/bin/bash

Results in:

Dec 14 23:46:30 localhost systemd[1]: Failed to set up mount unit: Invalid argument
Dec 14 23:46:30 localhost systemd[1335]: Failed to open directory /run/user/500/systemd/transient/var-lib-rkt-pods-run-5e7159a6\x2d9dcb\x2d4f84\x2daccb\x2da8f9853bacb7-stage1-rootfs-opt-stage2-fedora-rootfs-media-root-var-lib-rkt-pods-run-b1578be9\x2dbd57\x2d4d19\x2db8dc\x2d5c37853abc1e-stage1-rootfs-opt-stage2-fedora-rootfs-etc-resolv.conf.mount.wants: File name too long

It then takes a few reboots to clean up the resulting mess, as more and more containers' mounts get tangled up below that pod's mounts (it seemed that more and more mounts became accessible from that directory).

@lucab

Member

lucab commented Dec 19, 2016

@PAStheLoD be careful about this:

--mount=volume=root,target=/media/root
--mount=volume=run,target=/media/root/run
--mount=volume=usr,target=/media/root/usr

This is nesting run and usr under an existing root volume. Due to "shared" mount back-propagation, they will re-appear on the host, layered on top of your rootfs. That's an almost pathological case to clean up. I strongly suggest using separate, non-nested targets for each volume and, if possible, not sharing root at all (i.e. only share the rootfs directories you need).
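
For illustration, a sketch of an invocation restructured along those lines, built only from the flags already present in the command above (the /media/* target paths are arbitrary, and the root volume is dropped entirely rather than shared):

/bin/rkt --insecure-options=image run --interactive --net=host \
  --stage1-name=coreos.com/rkt/stage1-fly:1.20.0 \
  --volume=run,kind=host,source=/run,readOnly=false \
  --mount=volume=run,target=/media/run \
  --volume=usr,kind=host,source=/usr,readOnly=true \
  --mount=volume=usr,target=/media/usr \
  docker://fedora:latest --user=root --exec=/usr/bin/bash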

@btalbot

btalbot commented Jun 23, 2017

This seems to be happening again with Container Linux by CoreOS stable (1409.2.0) on EC2 with an EBS root disk and the HVM image.

Access to /proc/sys/fs/binfmt_misc seems to hang.

Rebooting does seem to clear the issue for a very short time, perhaps until something tries to access the binfmt_misc filesystem.

core@ip-172-31-49-176 ~ $ uptime
 01:31:05 up 2 min,  2 users,  load average: 0.04, 0.06, 0.02
core@ip-172-31-49-176 ~ $ time ls -alF /proc/sys/fs/
^Cls: cannot access '/proc/sys/fs/binfmt_misc': Interrupted system call


real	0m3.994s
user	0m0.001s
sys	0m0.001s
core@ip-172-31-49-176 ~ $ stat /proc/sys/fs
  File: '/proc/sys/fs'
  Size: 0         	Blocks: 0          IO Block: 1024   directory
Device: 4h/4d	Inode: 3278        Links: 1
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:sysctl_fs_t:s0
Access: 2017-06-23 01:28:13.628000000 +0000
Modify: 2017-06-23 01:28:13.628000000 +0000
Change: 2017-06-23 01:28:13.628000000 +0000
 Birth: -
core@ip-172-31-49-176 ~ $ stat /proc/sys/fs/binfmt_misc
^C

Logs about binfmt include

Jun 23 01:28:42 ip-172-31-49-176.ec2.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1218 (python)
Jun 23 01:28:42 ip-172-31-49-176.ec2.internal systemd[1]: Mounting Arbitrary Executable File Formats File System...
Jun 23 01:28:42 ip-172-31-49-176.ec2.internal systemd[1]: Mounted Arbitrary Executable File Formats File System.
Jun 23 01:28:42 ip-172-31-49-176.ec2.internal bash[1031]: 2017-06-23 01:28:42,373 INFO exited: go-metro (exit status 0; expected)
Jun 23 01:28:42 ip-172-31-49-176.ec2.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1218 (python)
Jun 23 01:28:42 ip-172-31-49-176.ec2.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?

and mount says

core@ip-172-31-49-176 ~ $ mount | grep binfmt
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11018)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

@denderello

denderello commented Jun 23, 2017

We are facing the same issue with CoreOS 1409.2.0 and were able to pin this down to /proc/sys/fs/binfmt_misc using strace:

sudo su
toolbox
dnf install strace
strace df
[...]
stat("/media/root/proc/sys/fs/binfmt_misc",
[ hangs from here on ]

We are running a similar setup of EC2 instances and EBS volumes to the one @btalbot described.

@lucab

Member

lucab commented Jun 23, 2017

This is typically due to mounts piling up, or similar strange scenarios, as a result of some other container-mounting activity. This is just a symptom, however, and the root cause may vary.

I'd suggest starting from a fresh system, keeping track of which containers/services are being started, stopped, and GC-ed, and then looking at /proc/1/mountinfo at the time of the hang.
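
A minimal sketch of that check, assuming you can read PID 1's mount table on the host:

# total number of mounts seen by PID 1; watch whether it keeps growing over time
wc -l /proc/1/mountinfo
# entries related to the binfmt_misc automount that hangs
grep binfmt_misc /proc/1/mountinfo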

@keimoon

keimoon commented Jun 23, 2017

I haven't seen this error on my system for months. But when it occurred, I could just mount binfmt_misc manually using mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc. This would prevent the hang until the machine was restarted. I am using CoreOS-stable-1353.7.0-hvm (ami-ad593cbb).

@denderello

denderello commented Jun 23, 2017

@lucab We can try to gather this data next week, although it will take a bit of time.

One thing that I did not mention yet: this happens when we update our cluster machines from 1353.7.0 to 1409.2.0. As soon as we roll back to 1353.7.0, everything works fine again and we do not experience any errors with /proc/sys/fs/binfmt_misc. A quick check of the changes between the two versions did not reveal anything obvious to me. Maybe some change between these two versions rings a bell for you?

@idleyoungman

idleyoungman commented Jun 23, 2017

We have a container with a read-only volume mount of the host's root volume (like -v /:/mnt/ROOT:ro) that reliably triggers this on 1409.2.0. Happy to provide any details that help.

Contents of /proc/1/mountinfo at the time of the hang:
https://gist.github.com/idleyoungman/8f01adffa997023d00671e0587409cdd

@idleyoungman

idleyoungman commented Jun 23, 2017

Just a note on our current workaround: we start the Docker container (including the volume mount of the host's root volume) via a systemd unit. Adding proc-sys-fs-binfmt_misc.mount to its Requires= appears to ensure the mount has completed, and subsequent stats of paths in /proc/sys/fs succeed.
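
A sketch of what that unit change might look like (the image name is a placeholder; After= is added alongside Requires= since Requires= alone does not enforce ordering):

[Unit]
Requires=proc-sys-fs-binfmt_misc.mount
After=proc-sys-fs-binfmt_misc.mount

[Service]
ExecStart=/usr/bin/docker run --rm -v /:/mnt/ROOT:ro example/agent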

@euank

Contributor

euank commented Jun 26, 2017

I suspect this issue will be fixed by systemd/systemd#5916.
Notably, the message Automount point already active? is out of order in one of these traces too.

That change should be in the next systemd release.

The easiest workaround in the meantime is probably to just mask that automount (systemctl mask --now proc-sys-fs-binfmt_misc.automount).
This assumes that you have no need of /proc/sys/fs/binfmt_misc.

If you do need binfmt_misc mounted, you can still mask the automount and have whatever service needs it depend on the regular .mount unit.
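
A minimal sketch of both variants of that workaround (these are the stock systemd unit names for the binfmt_misc automount and mount):

# if nothing on the host needs binfmt_misc, mask the automount entirely
sudo systemctl mask --now proc-sys-fs-binfmt_misc.automount
# if a service does need it, mount the filesystem via the regular unit instead
sudo systemctl start proc-sys-fs-binfmt_misc.mount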

@bgilbert

Member

bgilbert commented Jul 6, 2017

This should now be fixed in the beta channel, as well as in the next alpha and stable releases, due shortly.

@bgilbert closed this Jul 6, 2017

@felixbuenemann

felixbuenemann commented Jul 6, 2017

@bgilbert It would be great if you could add a link to the PR or commit that introduced the fix.

@lucab

Member

lucab commented Jul 6, 2017

@felixbuenemann the upstream fix is systemd/systemd#5916 and will land in systemd-234; the backport for Container Linux's systemd-233 is coreos/systemd#82.

@lucab

Member

lucab commented Jan 22, 2018

For reference, this has been assigned CVE-2018-1049: https://bugzilla.redhat.com/show_bug.cgi?id=1534701
