Kernel <3.8.0 panics on lxc-start with 1core/low memory VM #407

Closed
creack opened this issue Apr 14, 2013 · 32 comments
@creack
Contributor

creack commented Apr 14, 2013

The issue does not occur on an 8-core machine.

The lxc-start process causes a kernel panic while waiting for the child process to return.

It seems related to the unmount/lock/namespaces path. The output of the panic is difficult to capture.

On the latest version, this happens after about 5 "hello world" runs; on older versions, it takes longer.

I'll push a script that reproduces the issue.

creack added a commit that referenced this issue Apr 14, 2013
@creack
Contributor Author

creack commented Apr 14, 2013

I managed to reproduce the issue on 2ee3db6. I need to update the test script to handle the old docker/dockerd scheme in order to go further.

@shykes
Contributor

shykes commented Apr 14, 2013

Attaching screenshots of the VirtualBox console output. Unfortunately the output is incomplete.

Steps to reproduce:

  1. Run docker in daemon mode
  2. for i in $(seq 100); do docker run base echo hello world; done

The command causing the crash will print the intended output ("hello world"), then crash before returning.

Screenshot 1: visible immediately. (attached image: docker 407 1)

Screenshot 2: appears 2-5 seconds after screenshot 1, then is re-printed every 3-5 seconds. (attached image: docker 407 2)

@ghost ghost assigned creack Apr 15, 2013
@shykes
Contributor

shykes commented Apr 15, 2013

This is a blocker for 0.2.

My best guess is some sort of interaction between aufs and lxc-start - maybe we unmount the rootfs too early for example?

@shykes
Contributor

shykes commented Apr 15, 2013

@creack can you share the exact steps to reproduce with maximum certainty? That way we can all help with debugging, by each trying different revisions.

@creack
Contributor Author

creack commented Apr 15, 2013

I pushed my script in contrib/crashTest.go

You need to update the docker path and then just 'go run crashTest.go'.
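For reference, here is a minimal Go sketch of the kind of loop such a crash-test script performs: repeatedly run a short-lived container and stop on the first failure. This is a hypothetical illustration, not the actual contrib/crashTest.go; the docker binary path and the base image are assumptions to adjust for your setup.

package main

import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    const dockerPath = "/usr/local/bin/docker" // assumption: point this at your docker binary
    for i := 0; ; i++ {
        // launch one short-lived container per iteration, as in the reported reproducer
        cmd := exec.Command(dockerPath, "run", "base", "echo", fmt.Sprintf("hello world %d", i))
        out, err := cmd.CombinedOutput()
        if err != nil {
            log.Fatalf("run %d failed: %v\n%s", i, err, out)
        }
        log.Printf("run %d ok", i)
    }
}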


@shykes
Contributor

shykes commented Apr 15, 2013

Thanks. What is the current range of good / bad revisions that you identified?


@jpetazzo
Contributor

Things to try:

  • reproduce with hardened kernel (s3://get.docker.io/kernels/linux-headers-3.2.40-grsec-dotcloud_42~tethys_amd64.deb)
  • reproduce in such a way that we actually get the full backtrace (e.g. in a Xen VM on our test machines at the office :-))
  • if the problem can be triggered in a Xen VM, extract the backtrace of the kernel (starting point: xenctx)

You mentioned that the problem happened on UP machines but not SMP. If that's indeed the case, try with 1 core but with SMP code anyway (IIRC, kernel option noreplace-smp).

@unclejack
Contributor

Memory use increases after every docker run. It looks like either aufs has a problem or there's some other problem within the kernel.

I've just tried the script posted above with 10000 runs and I was able to get 3.8.7 with aufs3 to start swapping with 3GB of RAM. Memory was never released after running this script; it just kept growing.

@creack
Contributor Author

creack commented Apr 16, 2013

I installed a fresh Ubuntu 13 with kernel 3.8.0 and wasn't able to reproduce (I let the script run for ~1 hour).
However, as @unclejack said, it leaks.

@creack
Contributor Author

creack commented Apr 16, 2013

After a lot of tests, I am pretty sure the leaks are due to #197

@unclejack
Contributor

I've performed a few tests to try to reproduce this on 12.04 with stock kernels.
It neither crashed nor leaked.

docker was downloaded from docker.io to keep things simple

docker version
Version:0.1.4
Git Commit:
uname -rv
3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013
cat /proc/cpuinfo | grep processor
processor       : 0
cat /proc/meminfo | grep Total
MemTotal:         496260 kB
SwapTotal:             0 kB

memory after first test w/ 100 runs & before second test

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 333512  29196  54776    0    0   214    38   41  213  3  2 94  1

memory after the second test w/ 100 runs

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 324360  30968  56548    0    0   182    46   47  283  4  2 93  1

@creack
Contributor Author

creack commented Apr 18, 2013

Do you perform your tests with @shykes' command or with my script (contrib/crashTest.go)?
What version of the kernel and lxc are you using?

@unclejack
Contributor

@creack I was trying the command @shykes posted earlier.

I'll try your crashTest script as well.

lxc is the standard one from Ubuntu 12.04.

@robknight

I'm getting the same issue; I captured the output from VirtualBox here: https://gist.github.com/robknight/5430280 - it's pretty much the same thing reported by @shykes earlier. I'm running Docker inside the standard Vagrant box, with an OS X 10.8 host.

For me this doesn't seem to have much to do with the length of time that the container is running for. I'm trying to build an image using docker-build, and my build succeeds maybe 25% of the time while the other 75% results in the above crash, after which the Vagrant box becomes unresponsive and has to be restarted.

My docker-build changefile only has two lines:
from base:latest
copy dist/dbx.tar /tmp/dbx.tar

The file referenced here definitely exists, and the build does succeed sometimes.

Identical behaviour occurs when using a different base image, e.g. centos.

@barryaustin

Also getting kernel panics running docker 0.1.5, 0.1.6, 0.1.7 on Ubuntu 12.10, Linux 3.5.0-27, bare metal Dell Latitude D830 w/ Intel Core 2 Duo and 4GB RAM.

Reproduced by running the example multiple (<20) times:

docker run base echo hello world

Screen photos (docker 0.1.7):
https://f.cloud.github.com/assets/361379/406625/f4a5a682-aaa8-11e2-8add-2c965f5758b9.jpg
https://f.cloud.github.com/assets/361379/406627/0581c620-aaa9-11e2-9f3d-18f0ec82aae6.jpg

@shykes
Contributor

shykes commented Apr 23, 2013

It seems that for the time being Docker requires Linux 3.8 or newer. This is unfortunate, but it seems earlier versions just can't handle spawning too many short-lived namespaced processes, and we couldn't pinpoint the exact change which caused the bug to strike more frequently...

Docker now issues a warning on Linux kernels <3.8.
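For illustration only, a minimal Go sketch of how such a kernel-version warning could be implemented by reading /proc/sys/kernel/osrelease and comparing against 3.8. This is not Docker's actual code; the parsing is deliberately simplified.

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "3.2.0-40-generic\n"
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read kernel version:", err)
        return
    }
    release := strings.TrimSpace(string(raw))
    parts := strings.SplitN(release, ".", 3)
    major, _ := strconv.Atoi(parts[0])
    minor := 0
    if len(parts) > 1 {
        minorStr := parts[1]
        // keep only the leading digits, in case the minor part carries a suffix
        if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
            minorStr = minorStr[:i]
        }
        minor, _ = strconv.Atoi(minorStr)
    }
    if major < 3 || (major == 3 && minor < 8) {
        fmt.Printf("WARNING: kernel %s is older than 3.8 and may be unstable running docker\n", release)
    }
}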

@shykes shykes closed this as completed Apr 23, 2013
@jpetazzo
Contributor

The screenshot posted by @barryaustin shows that it's exactly the same problem with bare metal. That's very useful, because it rules out many potential side effects caused by virtualization.

Are we sure we want to close this issue? People running Ubuntu in production will very probably run 12.04 LTS rather than 12.10 or 13.04, and 12.04 LTS might never support 3.8.

@shykes
Contributor

shykes commented Apr 23, 2013

I don't mind keeping it open, but that would imply that there's something we can do other than upgrading the kernel. Do you have any suggestions?


@jpetazzo
Contributor

My plan would look like this:

  • reproduce the issue using only lxc-start commands
  • escalate to lxc mailing list
  • reproduce the issue using only basic namespace code (unshare or just clone syscalls); see the sketch after this list
  • escalate to kernel mailing list
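To make the third bullet concrete, here is a hypothetical Go reproducer along those lines: it spawns many short-lived children in fresh PID and mount namespaces via clone flags, which is roughly what lxc-start does per container. It needs root on Linux, and the iteration count is an arbitrary assumption, not a number from this thread.

package main

import (
    "log"
    "os/exec"
    "syscall"
)

func main() {
    for i := 0; i < 10000; i++ {
        cmd := exec.Command("/bin/true")
        // new PID and mount namespaces for every child, torn down as soon as it exits
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        if err := cmd.Run(); err != nil {
            log.Fatalf("iteration %d: %v", i, err)
        }
    }
    log.Println("completed without error")
}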


@shykes
Contributor

shykes commented Apr 23, 2013

I agree this would be great. I re-opened the issue and removed it from 0.2.

Want to lead the charge? Let me know and I'll assign to you.

@lopter
Contributor

lopter commented Apr 24, 2013

Per @creack's request, here is what happens for me on apt-get update && apt-get install:

https://gist.github.com/lopter/5449001#file-dmesg-log

I'm running Docker in daemon mode on Ubuntu 12.04 in Virtualbox:

louis@dotcloud-docker:~$ uname -a
Linux dotcloud-docker 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
louis@dotcloud-docker:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise
louis@dotcloud-docker:~$ 

@unclejack
Contributor

I've just tried the exact same setup as @lopter. It didn't crash at all, not even with CPU limit set to 40%.

However, I was able to make the system leak memory when the memory cgroup didn't get mounted. That seems to happen once every other boot on ubuntu in virtualbox. This didn't seem to break the system even after running the script @shykes posted above with 10000 runs.

I've used the precise64 box. I've updated the system to use the latest kernel (3.2.0-40).
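Since the leak only showed up when the memory cgroup wasn't mounted, here is a small Go sketch (an illustration under assumptions, not Docker's actual detection code) that checks /proc/mounts for a cgroup mount carrying the memory controller:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    f, err := os.Open("/proc/mounts")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        // a matching line looks like: cgroup /sys/fs/cgroup/memory cgroup rw,...,memory 0 0
        fields := strings.Fields(scanner.Text())
        if len(fields) >= 4 && fields[2] == "cgroup" && strings.Contains(fields[3], "memory") {
            fmt.Println("memory cgroup mounted at", fields[1])
            return
        }
    }
    fmt.Println("memory cgroup does not appear to be mounted")
}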

@unclejack
Contributor

It locked up by the time it reached the 1355th run with the script posted by @shykes.

[ 2692.120088] BUG: soft lockup - CPU#0 stuck for 23s! [lxc-start:27038]
[ 2692.122073] Modules linked in: veth aufs xt_addrtype vboxvideo(O) drm vboxsf(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp vesafb ppdev i2c_piix4 psmouse serio_raw vboxguest(O) nfsd parport_pc nfs lockd fscache auth_rpcgss nfs_acl mac_hid sunrpc lp parport ext2
[ 2692.123992] CPU 0
[ 2692.124019] Modules linked in: veth aufs xt_addrtype vboxvideo(O) drm vboxsf(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp vesafb ppdev i2c_piix4 psmouse serio_raw vboxguest(O) nfsd parport_pc nfs lockd fscache auth_rpcgss nfs_acl mac_hid sunrpc lp parport ext2
[ 2692.124053]
[ 2692.124053] Pid: 27038, comm: lxc-start Tainted: G      D    O 3.2.0-40-generic #64-Ubuntu innotek GmbH VirtualBox/VirtualBox
[ 2692.124053] RIP: 0010:[<ffffffff8103ebd5>]  [<ffffffff8103ebd5>] __ticket_spin_lock+0x25/0x30
[ 2692.124053] RSP: 0018:ffff8800157937b8  EFLAGS: 00000297
[ 2692.124053] RAX: 000000000000ca9e RBX: ffffffff8112525c RCX: 0000000100045ada
[ 2692.124053] RDX: 000000000000ca9f RSI: ffffffff8117a8e0 RDI: ffff880017c10950
[ 2692.124053] RBP: ffff8800157937b8 R08: 0000000000000001 R09: 0000000000000000
[ 2692.124053] R10: ffff880014e69410 R11: 0000000000000001 R12: 000000018200017f
[ 2692.124053] R13: ffff880012c02940 R14: 000000000000000c R15: 000000000000000c
[ 2692.124053] FS:  0000000000000000(0000) GS:ffff880017c00000(0000) knlGS:0000000000000000
[ 2692.124053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2692.124053] CR2: ffff880117c00001 CR3: 0000000001c05000 CR4: 00000000000006f0
[ 2692.124053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2692.124053] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2692.124053] Process lxc-start (pid: 27038, threadinfo ffff880015792000, task ffff880016ff1700)
[ 2692.124053] Stack:
[ 2692.124053]  ffff8800157937c8 ffffffff8119712e ffff8800157937e8 ffffffff81198f5d
[ 2692.124053]  ffff880014e69400 0000000000000010 ffff8800157937f8 ffffffff8119904f
[ 2692.124053]  ffff880015793848 ffffffff8117ad83 ffff880014e69410 ffff880016efbf00
[ 2692.124053] Call Trace:
[ 2692.124053]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2692.124053]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2692.124053]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2692.124053]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2692.124053]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2692.124053]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2692.124053]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2692.124053]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2692.124053]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2692.124053]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2692.124053]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2692.124053]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2692.124053]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2692.124053]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2692.124053]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2692.124053]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2692.124053]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2692.124053]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2692.124053]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2692.124053]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2692.124053]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2692.124053]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2692.124053]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2692.124053]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2692.124053]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2692.124053]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2692.124053]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2692.124053]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2692.124053]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2692.124053]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2692.124053]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2692.124053]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2692.124053]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2692.124053]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
[ 2692.124053] Code: 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 0f b7 07 <66> 39 d0 75 f6 5d c3 0f 1f 40 00 8b 17 55 31 c0 48 89 e5 89 d1
[ 2692.124053] Call Trace:
[ 2692.124053]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2692.124053]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2692.124053]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2692.124053]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2692.124053]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2692.124053]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2692.124053]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2692.124053]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2692.124053]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2692.124053]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2692.124053]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2692.124053]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2692.124053]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2692.124053]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2692.124053]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2692.124053]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2692.124053]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2692.124053]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2692.124053]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2692.124053]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2692.124053]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2692.124053]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2692.124053]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2692.124053]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2692.124053]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2692.124053]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2692.124053]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2692.124053]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2692.124053]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2692.124053]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2692.124053]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2692.124053]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2692.124053]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2692.124053]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
vagrant@precise64:~$ dmesg | less
[ 2720.112029]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2720.112029]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2720.112029]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2720.112029]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2720.112029]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2720.112029]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2720.112029]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2720.112029]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2720.112029]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2720.112029]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2720.112029]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2720.112029]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2720.112029]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2720.112029]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2720.112029]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2720.112029]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2720.112029]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2720.112029]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
[ 2720.112029] Code: 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 <0f> b7 07 66 39 d0 75 f6 5d c3 0f 1f 40 00 8b 17 55 31 c0 48 89
[ 2720.112029] Call Trace:
[ 2720.112029]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2720.112029]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2720.112029]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2720.112029]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2720.112029]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2720.112029]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2720.112029]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2720.112029]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2720.112029]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2720.112029]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2720.112029]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2720.112029]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2720.112029]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2720.112029]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2720.112029]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2720.112029]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2720.112029]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2720.112029]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2720.112029]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2720.112029]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2720.112029]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2720.112029]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2720.112029]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2720.112029]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2720.112029]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2720.112029]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2720.112029]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2720.112029]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2720.112029]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2720.112029]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2720.112029]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2720.112029]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2720.112029]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2720.112029]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
(END)

@paulhammond
Contributor

As another data point, I just ran the crashTest.go script on a Debian Wheezy VM running under Virtualbox on a dual-core i7 Macbook Air:

$ docker version
Version: 0.3.2
Git Commit: e289308
Kernel: 3.2.0-4-amd64
WARNING: No memory limit support
WARNING: No swap limit support

$ uname -rv
3.2.0-4-amd64 #1 SMP Debian 3.2.41-2

$ cat /proc/cpuinfo | grep processor
processor   : 0

The script has been running for an hour without crashing, and has done just over 10,000 runs. Memory usage remained constant throughout (with between 4 and 6MB free out of 250MB the whole time).

$ sudo /usr/local/go/bin/go run crashTest.go 
2013/05/23 00:05:54 WARNING: You are running linux kernel version 3.2.0-4-amd64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2013/05/23 00:05:54 WARNING: cgroup mountpoint not found for memory
2013/05/23 00:05:54 Listening for RCLI/tcp on 127.0.0.1:4242
2013/05/23 00:05:54 docker run base echo 3
2013/05/23 00:05:54 docker run base echo 4
...
2013/05/23 01:05:59 docker run base echo 10153
2013/05/23 01:05:59 docker run base echo 10154

I think this means either:

  • I've made a mistake somewhere
  • The bug is a regression introduced after 3.2.0
  • The bug only affects the Ubuntu kernel tree

I hope some progress can be made on this issue. One of the things that I like about Docker is how easy it is to get started; requiring a 3.8 kernel makes that much harder in many environments.

@shykes
Contributor

shykes commented May 23, 2013

Broadening kernel support is going to be a major priority for us in June. I'm also frustrated by the current kernel situation!


@ghost

ghost commented Jun 7, 2013

Hi guys, sorry I wasn't aware of this issue when I did my write up for OStatic. I'll update the post with a link back here. I experienced what looks to be this bug on Ubuntu 12.04 LTS, 64bit.

@mastrolinux

I had something very similar in the past, and it was related to the hardware/kernel pair. Please test on different CPU families if you can, to see if there are correlations.

@shykes
Contributor

shykes commented Jul 18, 2013

I'm closing this since it's not immediately actionable.

@jpetazzo
Contributor

jpetazzo commented Oct 7, 2013

For the record, Josh Poimboeuf found something which might be related to this:

I did some digging. These panics seem to be caused by some race conditions related to removing a container's mounts. I was easily able to recreate with:

for i in $(seq 1 100); do docker run -i -t -d ubuntu bash; done | xargs docker kill

The fixes needed for RHEL 6.4 (based on 2.6.32) are in the following two
upstream kernel commits:

  • "45a68628d37222e655219febce9e91b6484789b2" (fixed in 2.6.39)
  • "17cf22c33e1f1b5e435469c84e43872579497653" (fixed in 3.8)

jpoimboe added a commit to jpoimboe/docker that referenced this issue Dec 2, 2013
Allow the user to set DOCKER_NOWARN_KERNEL_VERSION=1 to disable the
warning for RHEL 6.5 and other distributions that don't exhibit the
panics described in moby#407.
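For illustration, a minimal Go sketch of the kind of opt-out that commit describes (not the actual patch): the kernel warning is skipped when DOCKER_NOWARN_KERNEL_VERSION=1 is set in the daemon's environment. The helper name and its arguments are hypothetical.

package main

import (
    "log"
    "os"
)

// maybeWarnOldKernel is a hypothetical helper: it prints the "<3.8" warning
// unless the user has explicitly opted out via DOCKER_NOWARN_KERNEL_VERSION=1.
func maybeWarnOldKernel(release string, tooOld bool) {
    if os.Getenv("DOCKER_NOWARN_KERNEL_VERSION") == "1" {
        return // opted out, e.g. on a patched RHEL 6.5 kernel
    }
    if tooOld {
        log.Printf("WARNING: you are running kernel %s, which may be unstable running docker; please upgrade to 3.8.0 or newer", release)
    }
}

func main() {
    // example: a 3.2 kernel would normally trigger the warning
    maybeWarnOldKernel("3.2.0-40-generic", true)
}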
@gdm85
Contributor

gdm85 commented May 15, 2014

Can we keep this open? Still reproducible with 0.11 and kernel 3.2.0 (2 processors):

https://gist.github.com/gdm85/9328ae13653e5683adda

@unclejack
Contributor

@gdm85 No, this will stay closed. Kernels older than 3.8 aren't supported. That means technical support isn't provided and you might run into unexpected behavior, even if it seems like it's working.

The only exception is the kernel provided by RHEL6 (2.6.32xxxxxx) which was patched and improved to work properly with Docker.

Support for Docker on kernels older than 3.8 won't happen. Please upgrade your kernel.

@creack creack removed their assignment Jul 24, 2014
shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014
Allow the user to set DOCKER_NOWARN_KERNEL_VERSION=1 to disable the
warning for RHEL 6.5 and other distributions that don't exhibit the
panics described in moby/moby#407.
sbasyal added a commit to sbasyal/docker that referenced this issue Apr 15, 2015
The link to issue 407 was broken. The old link was: moby#407 
The link must be: moby#407
sbasyal added a commit to sbasyal/docker that referenced this issue Apr 15, 2015
The link to issue 407 was broken. The old link was: moby#407
The link must be: moby#407

Signed-off-by: Sabin Basyal <sabin.basyal@gmail.com>
rtyler pushed a commit to rtyler/docker that referenced this issue Feb 23, 2018
@fiveways

fiveways commented Jan 4, 2019

I know this is an old bug, but has anyone tried to reproduce it using the VFS driver instead of aufs?

kolyshkin pushed a commit to kolyshkin/moby that referenced this issue Oct 28, 2019
…ainer_error

[19.03 backport] Propagate GetContainer error from event processor
dperny pushed a commit to dperny/docker that referenced this issue Oct 18, 2021
…9.03-6eeb9ec3d6517883995b327b2e92b12a3cfc3074

[19.03] sync to upstream 19.03 4bed012