New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instance reboots after 15-90 minutes after dind launch #1153

Closed
d-kononov opened this Issue Mar 4, 2016 · 9 comments

Comments

Projects
None yet
4 participants
@d-kononov

d-kononov commented Mar 4, 2016

Hi

I try to use DinD in coreos.

I turned off update-engine.service, locksmithd.service, set reboot-strategy: off. But coreos reboots after 15-90 minutes after dind launch.

I tried 835.13.0 (ami-7f3a0b15) and 835.11.0 (logs below and configs from this version - ami-23260749) coreos.

Cloud-init:

#cloud-config

write_files:
  - path: /etc/modules-load.d/nf.conf
    content: |
      nf_conntrack
      nf_conntrack_ipv4
  - path: /etc/sysctl.d/nf.conf
    permissions: 0644
    owner: root
    content: |
      net.netfilter.nf_conntrack_tcp_timeout_established = 600
      net.netfilter.nf_conntrack_generic_timeout = 60
      net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 30
      net.netfilter.nf_conntrack_max=524288
  - path: /etc/systemd/system/docker.service.d/increase-ulimit.conf
    owner: core:core
    permissions: 0644
    content: |
      [Service]
      LimitMEMLOCK=infinity
      LimitNOFILE=262144

coreos:
  update:
    reboot-strategy: off
  units:
    - name: systemd-modules-load.service
      command: restart
    - name: systemd-sysctl.service
      command: restart
    - name: update-engine.service
      command: stop
    - name: locksmithd.service
      command: stop
    - name: systemd-journald.service
      drop-ins:
        - name: 10-debug.conf
          content: |
            [Service]
            Environment=SYSTEMD_LOG_LEVEL=debug
      command: restart
....

running services:

find /run/systemd/system -ls
  8416    0 drwxr-xr-x   6 root     root          200 Mar  4 09:33 /run/systemd/system
 20249    0 drwxr-xr-x   2 root     root          160 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d
 20255    4 -rw-r--r--   1 root     root           30 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-DefaultDependencies.conf
 20254    4 -rw-r--r--   1 root     root           30 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-BlockIOAccounting.conf
 20253    4 -rw-r--r--   1 root     root           26 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-CPUAccounting.conf
 20252    4 -rw-r--r--   1 root     root           29 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-MemoryAccounting.conf
 20251    4 -rw-r--r--   1 root     root          101 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-Description.conf
 20250    4 -rw-r--r--   1 root     root           27 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope.d/50-Slice.conf
 20248    4 -rw-r--r--   1 root     root           17 Mar  4 09:33 /run/systemd/system/docker-f69dec53970730ba21b1bc638de47f7791a7dc3fe7e3cda13ac5352db8201122.scope
 18775    0 drwxr-xr-x   2 root     root          160 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d
 18781    4 -rw-r--r--   1 root     root           30 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-DefaultDependencies.conf
 18780    4 -rw-r--r--   1 root     root           30 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-BlockIOAccounting.conf
 18779    4 -rw-r--r--   1 root     root           26 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-CPUAccounting.conf
 18778    4 -rw-r--r--   1 root     root           29 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-MemoryAccounting.conf
 18777    4 -rw-r--r--   1 root     root          101 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-Description.conf
 18776    4 -rw-r--r--   1 root     root           27 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope.d/50-Slice.conf
 18774    4 -rw-r--r--   1 root     root           17 Mar  4 09:32 /run/systemd/system/docker-00cebd1048d790cd79be0af0e1f84d68818cf08e822bbc334f2f9f08e37e9647.scope
 16488    0 lrwxrwxrwx   1 root     root            9 Mar  4 09:29 /run/systemd/system/locksmithd.service -> /dev/null
 15744    4 -rw-r--r--   1 root     root          130 Mar  4 09:28 /run/systemd/system/oem-cloudinit.service
 15742    0 drwxr-xr-x   2 root     root           60 Mar  4 09:28 /run/systemd/system/etcd2.service.d
 15743    4 -rw-r--r--   1 root     root           49 Mar  4 09:28 /run/systemd/system/etcd2.service.d/10-oem.conf
 15740    0 drwxr-xr-x   2 root     root           60 Mar  4 09:28 /run/systemd/system/etcd.service.d
 15741    4 -rw-r--r--   1 root     root           54 Mar  4 09:28 /run/systemd/system/etcd.service.d/10-oem.conf

os release:

core@ip-10-95-171-141 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=835.11.0
VERSION_ID=835.11.0
BUILD_ID=
PRETTY_NAME="CoreOS 835.11.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

update.conf:

core@ip-10-95-171-141 ~ $ cat /etc/coreos/update.conf 
GROUP=stable
REBOOT_STRATEGY=off

journalctl:

....
Mar 04 09:09:41 ip-10-95-171-141 systemd-networkd[715]: vethd106542: Removing non-existent address: fe80::42:acff:fe11:4/64 (valid for ever)
Mar 04 09:09:42 ip-10-95-171-141 kernel: eth0: renamed from vethcd93e3f
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 5(veth3eb204e) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 5(veth3eb204e) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 3(vethd88cdd8) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 3(vethd88cdd8) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 4(veth637db6d) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 4(veth637db6d) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: eth0: renamed from vethdc95df1
Mar 04 09:09:41 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth6ccef94: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth33d3582: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth637db6d: Lost carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth3eb204e: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethd88cdd8: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth637db6d: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 sshd[14363]: Failed password for root from 183.3.202.104 port 55195 ssh2
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 kernel: IPv6: ADDRCONF(NETDEV_UP): vethe2f83f8: link is not ready
Mar 04 09:09:42 ip-10-95-171-141 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe2f83f8: link becomes ready
Mar 04 09:09:42 ip-10-95-171-141 systemd-udevd[14924]: Could not generate persistent MAC address for vethe2f83f8: No such file or directory
Mar 04 09:09:42 ip-10-95-171-141 kernel: eth0: renamed from veth017b732
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 7(vethb91be3d) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethe2f83f8: Configured
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 7(vethb91be3d) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 6(veth3ca5931) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 6(veth3ca5931) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: device veth92b4c50 entered promiscuous mode
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 9(veth92b4c50) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 9(veth92b4c50) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 systemd-udevd[14925]: Could not generate persistent MAC address for veth92b4c50: No such file or directory
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth92b4c50: Configured
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth92b4c50: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethe2f83f8: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethb91be3d: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth3ca5931: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethe2f83f8: Lost carrier
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 9(veth92b4c50) entered disabled state
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: vethe2f83f8: Removing non-existent address: fe80::dc15:22ff:fe2a:66c4/64 (valid for ever)
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth92b4c50: Lost carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth465586c: Lost carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth465586c: Removing non-existent address: fe80::42:acff:fe11:8/64 (valid for ever)
Mar 04 09:09:42 ip-10-95-171-141 sshd[14363]: Failed password for root from 183.3.202.104 port 55195 ssh2
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 8(veth6ccef94) entered disabled state
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth6ccef94: Lost carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 kernel: eth0: renamed from veth465586c
Mar 04 09:09:42 ip-10-95-171-141 docker[1384]: time="2016-03-04T09:09:42.614315549Z" level=error msg="Handler for DELETE /v1.22/containers/vm76_56d93d23010e9c000b00310e returned error: No such container: vm76_56
Mar 04 09:09:42 ip-10-95-171-141 docker[1384]: time="2016-03-04T09:09:42.622402960Z" level=error msg="Handler for DELETE /v1.22/containers/vm73_56d93d23010e9c000b00310b returned error: No such container: vm73_56
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 8(veth6ccef94) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 8(veth6ccef94) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth6ccef94: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 docker[1384]: time="2016-03-04T09:09:42.745931390Z" level=error msg="Handler for DELETE /v1.22/containers/vm50_56d93d23010e9c000b003110 returned error: No such container: vm50_56
Mar 04 09:09:42 ip-10-95-171-141 sshd[14363]: Received disconnect from 183.3.202.104: 11:  [preauth]
Mar 04 09:09:42 ip-10-95-171-141 sshd[14363]: Disconnected from 183.3.202.104 [preauth]
Mar 04 09:09:42 ip-10-95-171-141 kernel: eth0: renamed from vethe2f83f8
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 sshd[14115]: Failed password for root from 183.3.202.104 port 31295 ssh2
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 9(veth92b4c50) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 kernel: docker0: port 9(veth92b4c50) entered forwarding state
Mar 04 09:09:42 ip-10-95-171-141 systemd-networkd[715]: veth92b4c50: Gained carrier
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:42 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
Mar 04 09:09:42 ip-10-95-171-141 docker[1384]: time="2016-03-04T09:09:42.958364339Z" level=error msg="Handler for DELETE /v1.22/containers/vm109_56d93d23010e9c000b003111 returned error: No such container: vm109_
Mar 04 09:09:43 ip-10-95-171-141 systemd[1]: Started OpenSSH per-connection server daemon (183.3.202.104:24687).
Mar 04 09:09:43 ip-10-95-171-141 systemd-timesyncd[685]: Network configuration changed, trying to establish connection.
Mar 04 09:09:43 ip-10-95-171-141 sshd[14115]: Received disconnect from 183.3.202.104: 11:  [preauth]
Mar 04 09:09:43 ip-10-95-171-141 sshd[14115]: Disconnected from 183.3.202.104 [preauth]
Mar 04 09:09:43 ip-10-95-171-141 systemd-timesyncd[685]: Synchronized to time server 208.75.89.4:123 (0.coreos.pool.ntp.org).
-- Reboot --
Mar 04 09:11:34 localhost systemd-journal[98]: Runtime journal (/run/log/journal/) is currently using 8.0M.
                                               Maximum allowed usage is set to 765.1M.
                                               Leaving at least 1.1G free (of currently available 7.4G of space).
                                               Enforced usage limit is thus 765.1M.
Mar 04 09:11:34 localhost systemd-journal[98]: Runtime journal (/run/log/journal/) is currently using 8.0M.
                                               Maximum allowed usage is set to 765.1M.
                                               Leaving at least 1.1G free (of currently available 7.4G of space).
                                               Enforced usage limit is thus 765.1M.
Mar 04 09:11:34 localhost kernel: Initializing cgroup subsys cpuset
Mar 04 09:11:34 localhost kernel: Initializing cgroup subsys cpu
Mar 04 09:11:34 localhost kernel: Initializing cgroup subsys cpuacct
....

mount:

mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=7820152k,nr_inodes=1955038,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
/dev/xvda9 on / type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvda3 on /usr type ext4 (ro,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /boot type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
tmpfs on /tmp type tmpfs (rw,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
xenfs on /proc/xen type xenfs (rw,relatime)
/dev/xvda6 on /usr/share/oem type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered)
/dev/xvda1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
/dev/xvdh on /mnt type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvdh on /var/lib/docker type ext4 (rw,relatime,seclabel,data=ordered)

df -h:

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.5G     0  7.5G   0% /dev
tmpfs           7.5G     0  7.5G   0% /dev/shm
tmpfs           7.5G  652K  7.5G   1% /run
tmpfs           7.5G     0  7.5G   0% /sys/fs/cgroup
/dev/xvda9      5.5G   21M  5.2G   1% /
/dev/xvda3      985M  492M  442M  53% /usr
/dev/xvda1      128M   35M   94M  27% /boot
tmpfs           7.5G     0  7.5G   0% /media
tmpfs           7.5G     0  7.5G   0% /tmp
/dev/xvda6      108M   60K   99M   1% /usr/share/oem
/dev/xvdh        99G   32G   63G  34% /mnt

Any thoughts?

@d-kononov d-kononov changed the title from Instance reboots after 15-90 minutes after launch dind launch to Instance reboots after 15-90 minutes after dind launch Mar 4, 2016

@crawford

This comment has been minimized.

Member

crawford commented Mar 4, 2016

Do you have access to the console logs? The journal hints at a fairly abrupt restart, so the first thing that comes to mind is a kernel panic.

@d-kononov

This comment has been minimized.

d-kononov commented Mar 4, 2016

Here is something interesting from system log:

[ 1436.032361] eth0: renamed from vethf4ef06d
[ 1449.759063] docker0: port 1(veth430921e) entered forwarding state
[ 1450.335068] docker0: port 4(veth1862d41) entered forwarding state
[ 1451.039068] docker0: port 2(veth52284f6) entered forwarding state
[ 1476.170033] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [exe:22368]
[ 1476.170033] Modules linked in: veth xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack nls_ascii nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2 mousedev crc32c_intel jitterentropy_rng hmac drbg aesni_intel xen_netfront xen_blkfront aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ata_piix microcode firmware_class libata i2c_piix4 psmouse scsi_mod i2c_core evdev acpi_cpufreq button sch_fq_codel ip_tables autofs4
[ 1476.170033] CPU: 0 PID: 22368 Comm: exe Not tainted 4.2.2-coreos-r2 #2
[ 1476.170033] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[ 1476.170033] task: ffff8803cbd51cc0 ti: ffff880315014000 task.ti: ffff880315014000
[ 1476.170033] RIP: 0010:[<ffffffff811e1293>]  [<ffffffff811e1293>] mangle_path+0x93/0xc0
[ 1476.170033] RSP: 0018:ffff880315017d08  EFLAGS: 00000246
[ 1476.170033] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88030734ff8c
[ 1476.170033] RDX: 0000000000000066 RSI: 0000000000000066 RDI: ffffffff817c4602
[ 1476.170033] RBP: ffff880315017d38 R08: ffff8803a44ad7b8 R09: 000000000000fff9
[ 1476.170033] R10: 0000000000000000 R11: ffff880338c96c00 R12: ffffffff812c749d
[ 1476.170033] R13: ffff880315017ce8 R14: ffff880307350000 R15: ffff88030734f00d
[ 1476.170033] FS:  00000000026e8880(0063) GS:ffff8803e0400000(0000) knlGS:0000000000000000
[ 1476.170033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1476.170033] CR2: 000000c820515000 CR3: 00000000ac455000 CR4: 00000000001406f0
[ 1476.170033] Stack:
[ 1476.170033]  ffff880315972000 ffff8800acb85c00 ffff88030734f00e ffffffff817c45fe
[ 1476.170033]  ffff880315972000 ffff880316657a40 ffff880315017d68 ffffffff811e17e3
[ 1476.170033]  0000000000000179 ffff8800ac73f0a0 ffff8800acb85c00 ffff880316488f00
[ 1476.170033] Call Trace:
[ 1476.170033]  [<ffffffff811e17e3>] seq_dentry+0x63/0xb0
[ 1476.170033]  [<ffffffff811fc7b5>] show_mountinfo+0xa5/0x280
[ 1476.170033]  [<ffffffff811dbcb7>] m_show+0x17/0x20
[ 1476.170033]  [<ffffffff811e0d8d>] seq_read+0xcd/0x370
[ 1476.170033]  [<ffffffff811bd918>] __vfs_read+0x28/0xd0
[ 1476.170033]  [<ffffffff81258e33>] ? security_file_permission+0xa3/0xc0
[ 1476.170033]  [<ffffffff811bde43>] ? rw_verify_area+0x53/0xf0
[ 1476.170033]  [<ffffffff811bdf6a>] vfs_read+0x8a/0x130
[ 1476.170033]  [<ffffffff811bed66>] SyS_read+0x46/0xa0
[ 1476.170033]  [<ffffffff8152bbae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1476.170033] Code: 89 d6 4d 39 fe 77 2d 49 83 c7 01 41 0f b6 5f ff 84 db 74 31 44 0f be e3 4c 89 ef 44 89 e6 e8 55 28 0e 00 48 85 c0 75 9e 41 88 1e <49> 83 c6 01 4d 39 fe 76 d3 48 83 c4 08 31 c0 5b 41 5c 41 5d 41 
[ 1476.170033] Kernel panic - not syncing: softlockup: hung tasks
[ 1476.170033] CPU: 0 PID: 22368 Comm: exe Tainted: G             L  4.2.2-coreos-r2 #2
[ 1476.170033] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[ 1476.170033]  ffff8803cbd51cc0 ffff8803e0403c58 ffffffff81525d98 0000000000000007
[ 1476.170033]  ffffffff817c4cec ffff8803e0403cd8 ffffffff81524db0 0000000000000000
[ 1476.170033]  0000000000000008 ffff8803e0403ce8 ffff8803e0403c88 0000000000000000
[ 1476.170033] Call Trace:
[ 1476.170033]  <IRQ>  [<ffffffff81525d98>] dump_stack+0x45/0x57
[ 1476.170033]  [<ffffffff81524db0>] panic+0xc1/0x1fe
[ 1476.170033]  [<ffffffff81109652>] watchdog_timer_fn+0x1d2/0x1e0
[ 1476.170033]  [<ffffffff81109480>] ? watchdog+0x50/0x50
[ 1476.170033]  [<ffffffff810cfbb4>] __hrtimer_run_queues+0xe4/0x200
[ 1476.170033]  [<ffffffff810cffe8>] hrtimer_interrupt+0xa8/0x1a0
[ 1476.170033]  [<ffffffff8100a9ee>] xen_timer_interrupt+0x2e/0x140
[ 1476.170033]  [<ffffffff813b274a>] ? add_interrupt_randomness+0x3a/0x1d0
[ 1476.170033]  [<ffffffff810bed94>] handle_irq_event_percpu+0x74/0x180
[ 1476.170033]  [<ffffffff810c240e>] handle_percpu_irq+0x3e/0x60
[ 1476.170033]  [<ffffffff810be49c>] generic_handle_irq+0x2c/0x40
[ 1476.170033]  [<ffffffff8137781d>] evtchn_2l_handle_events+0x24d/0x260
[ 1476.170033]  [<ffffffff8101c579>] ? sched_clock+0x9/0x10
[ 1476.170033]  [<ffffffff81095eeb>] ? sched_clock_local+0x1b/0x90
[ 1476.170033]  [<ffffffff81374fd7>] __xen_evtchn_do_upcall+0x47/0x80
[ 1476.170033]  [<ffffffff81376df4>] xen_evtchn_do_upcall+0x34/0x50
[ 1476.170033]  [<ffffffff8152d7cb>] xen_hvm_callback_vector+0x6b/0x70
[ 1476.170033]  <EOI>  [<ffffffff811e1293>] ? mangle_path+0x93/0xc0
[ 1476.170033]  [<ffffffff811e128b>] ? mangle_path+0x8b/0xc0
[ 1476.170033]  [<ffffffff811e17e3>] seq_dentry+0x63/0xb0
[ 1476.170033]  [<ffffffff811fc7b5>] show_mountinfo+0xa5/0x280
[ 1476.170033]  [<ffffffff811dbcb7>] m_show+0x17/0x20
[ 1476.170033]  [<ffffffff811e0d8d>] seq_read+0xcd/0x370
[ 1476.170033]  [<ffffffff811bd918>] __vfs_read+0x28/0xd0
[ 1476.170033]  [<ffffffff81258e33>] ? security_file_permission+0xa3/0xc0
[ 1476.170033]  [<ffffffff811bde43>] ? rw_verify_area+0x53/0xf0
[ 1476.170033]  [<ffffffff811bdf6a>] vfs_read+0x8a/0x130
[ 1476.170033]  [<ffffffff811bed66>] SyS_read+0x46/0xa0
[ 1476.170033]  [<ffffffff8152bbae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1476.170033] Kernel Offset: disabled
�[H�[J�[1;1Herror: syntax error.

error: Incorrect command.

error: syntax error.

error: syntax error.

error: Incorrect command.

error: syntax error.

error: syntax error.

error: Incorrect command.

error: syntax error.

�[?25l�[m�[H�[J�[1;1H�[2;27HGNU GRUB  version 2.02~beta2


�[m�[4;2H+----------------------------------------------------------------------------+�[5;2H|�[5;79H|�[6;2H|�[6;79H|�[7;2H|�[7;79H|�[8;2H|�[8;79H|�[9;2H|�[9;79H|�[10;2H|�[10;79H|�[11;2H|�[11;79H|�[12;2H|�[12;79H|�[13;2H|�[13;79H|�[14;2H|�[14;79H|�[15;2H|�[15;79H|�[16;2H|�[16;79H|�[17;2H+----------------------------------------------------------------------------+�[m�[18;2H�[19;2H�[m     Use the ^ and v keys to select which entry is highlighted.          

      Press enter to boot the selected OS, `e' to edit the commands       

      before booting or `c' for a command-line.                           �[5;80H �[7m�[5;3H*CoreOS default                                                             �[m�[5;78H�[m�[m�[6;3H CoreOS USR-A                                                               �[m�[6;78H�[m�[m�[7;3H CoreOS USR-B                                                               �[m�[7;78H�[m�[m�[8;3H                                                                            �[m�[8;78H�[m�[m�[9;3H                                                                            �[m�[9;78H�[m�[m�[10;3H                                                                            �[m�[10;78H�[m�[m�[11;3H                                                                            �[m�[11;78H�[m�[m�[12;3H                                                                            �[m�[12;78H�[m�[m�[13;3H                                                                            �[m�[13;78H�[m�[m�[14;3H                                                                            �[m�[14;78H�[m�[m�[15;3H                                                                            �[m�[15;78H�[m�[m�[16;3H                                                                            �[m�[16;78H�[m�[16;80H �[5;78H�[22;1H   The highlighted entry will be executed automatically in 1s.                 �[5;78H�[22;1H   The highlighted entry will be executed automatically in 0s.                 �[5;78H�[?25h�[H�[J�[1;1H�[H�[J�[1;1H  Booting `CoreOS default'


[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.2.2-coreos-r2 (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.2, pie-0.6.3) ) #2 SMP Fri Jan 22 06:06:09 UTC 2016
[    0.000000] Command line: BOOT_IMAGE=/coreos/vmlinuz-a mount.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 coreos.oem.id=ec2 modprobe.blacklist=xen_fbfront net.ifnames=0 systemd.journald.forward_to_console=yes verity.usrhash=a88ba142e407455a32b127e19df211526137e40fcf94d73c8809ea7ccb77723b
[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100

@aulanov

This comment has been minimized.

aulanov commented Apr 14, 2016

The code below can be used to repro this bug. As far as I can tell this is present in all 4.* kernels.

#include <fcntl.h>
#include <linux/sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

const char* source = "/tmp/a";
const char* target = "/tmp/x";

#define CHECK(cmd) { if (cmd) { perror(#cmd); exit(1); } }

void init() {
  mkdir(source, 0777);

  CHECK(unshare(CLONE_NEWNS));
  CHECK(mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL));
}
void do_rm(int in, int out) {
  char buf[4096];

  while(1) {
    read(in, buf, 1);
    CHECK(rmdir(target));
    write(out, "a", 1);
  }
}

void mountread(int out, int in, int size) {
  mkdir(target, 0777);
  CHECK(mount("/", target, NULL, MS_BIND, NULL));

  int f = open("/proc/self/mountinfo", O_RDONLY);
  char buf[4096];

  read(f, buf, size);
  write(out, "a", 1);
  read(in, buf, 1);
  read(f, buf, 4096);
  close(f);
}

void test() {
  int p1[2];
  int p2[2];
  pipe(p1);
  pipe(p2);

  if (fork()) {
    close(p1[1]);
    close(p2[0]);
    do_rm(p1[0], p2[1]);
  } else {
    close(p1[0]);
    close(p2[1]);
    CHECK(unshare(CLONE_NEWNS));

    int i;
    for (i = 1; i < 1024*8; i+=10) {
      mountread(p1[1], p2[0], i);
    }
  }
}

int main(int argc, char* argv[]) {
  init();
  test();
  return 0;
}

0day-ci pushed a commit to 0day-ci/linux that referenced this issue Apr 14, 2016

namespace: update event counter when umounting a deleted dentry
This change fixes a problem originally reported at:
coreos/bugs#1153
You can find code that reproduces it at:
coreos/bugs#1153 (comment)

- m_start() in fs/namespace.c expects that ns->event is incremented each
  time a mount added or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
  counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
  "event" counter. e.g. from __detach_mounts().
- When this happens m_start may reuse a cached mount structure that no
  longer belongs to ns->list (i.e. use after free which usually leads
  to infinite loop).

This change fixes the above problem by incrementing global event counter
before invoking umount_tree().

Signed-off-by: Andrey Ulanov <andreyu@google.com>
@aulanov

This comment has been minimized.

@wcwxyz

This comment has been minimized.

wcwxyz commented May 12, 2016

@aulanov I believe I'm facing the same kernel issue. I didn't find on linux or linux-next. Any update where this patch is going into?

@aulanov

This comment has been minimized.

aulanov commented May 14, 2016

No idea.

@wcwxyz

This comment has been minimized.

wcwxyz commented May 16, 2016

@aulanov Thanks for the fix anyway. Replacing default kernel in coreos seems to be a lot of work comparing to other rpm/deb based distro. I need to rebuild everything and put all of them into an image. That's just not usual as I did. But maybe I'm wrong. Hopely this can make its way to mainline and eventually get into coreos stable. Thanks again.

@aulanov

This comment has been minimized.

aulanov commented Jul 8, 2016

@crawford

This comment has been minimized.

Member

crawford commented Dec 19, 2016

This should be fixed in Stable now.

@crawford crawford closed this Dec 19, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment