Kernel <3.8.0 panics on lxc-start with 1core/low memory VM #407

Closed
creack opened this issue Apr 14, 2013 · 32 comments
@creack
Contributor

creack commented Apr 14, 2013

The issue does not occur on an 8-core machine.

The lxc-start process causes a kernel panic while waiting for the child process to return.

It seems related to the unmount/lock/namespaces path. The output of the panic is difficult to capture.

On the latest version, this happens after about 5 "hello world" runs; on older versions, it takes longer.

I'll push a script that reproduces the issue.

creack added a commit that referenced this issue Apr 14, 2013
@creack
Contributor Author

creack commented Apr 14, 2013

I managed to reproduce the issue on 2ee3db6. I need to update the test script to handle the old docker/dockerd scheme in order to go further.

@shykes
Contributor

shykes commented Apr 14, 2013

Attaching screenshots of the VirtualBox console output. Unfortunately the output is incomplete.

Steps to reproduce:

  1. Run docker in daemon mode
  2. for i in $(seq 100); do docker run base echo hello world; done

The command causing the crash will print the intended output ("hello world"), then crash before returning.

Screenshot 1: visible immediately. (attached image: docker 407 1)

Screenshot 2: appears 2-5 seconds after screenshot 1, then is re-printed every 3-5 seconds. (attached image: docker 407 2)

@ghost ghost assigned creack Apr 15, 2013
@shykes
Contributor

shykes commented Apr 15, 2013

This is a blocker for 0.2.

My best guess is some sort of interaction between aufs and lxc-start - maybe we unmount the rootfs too early for example?

@shykes
Contributor

shykes commented Apr 15, 2013

@creack can you share the exact steps to reproduce with maximum certainty? That way we can all help with debugging, by each trying different revisions.

@creack
Contributor Author

creack commented Apr 15, 2013

I pushed my script in contrib/crashTest.go

You need to update the docker path and then just 'go run crashTest.go'.
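For reference, here is a minimal Go sketch of the kind of loop such a crash-test script performs: repeatedly run a short-lived container and stop on the first failure. This is a hypothetical illustration, not the actual contrib/crashTest.go; the docker binary path and the base image are assumptions to adjust for your setup.

package main

import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    const dockerPath = "/usr/local/bin/docker" // assumption: point this at your docker binary
    for i := 0; ; i++ {
        // launch one short-lived container per iteration, as in the reported reproducer
        cmd := exec.Command(dockerPath, "run", "base", "echo", fmt.Sprintf("hello world %d", i))
        out, err := cmd.CombinedOutput()
        if err != nil {
            log.Fatalf("run %d failed: %v\n%s", i, err, out)
        }
        log.Printf("run %d ok", i)
    }
}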


@shykes
Contributor

shykes commented Apr 15, 2013

Thanks. What is the current range of good / bad revisions that you identified?


@jpetazzo
Contributor

Things to try:

  • reproduce with hardened kernel (s3://get.docker.io/kernels/linux-headers-3.2.40-grsec-dotcloud_42~tethys_amd64.deb)
  • reproduce in such a way that we actually get the full backtrace (e.g. in a Xen VM on our test machines at the office :-))
  • if the problem can be triggered in a Xen VM, extract the backtrace of the kernel (starting point: xenctx)

You mentioned that the problem happened on UP machines but not SMP. If that's indeed the case, try with 1 core but with SMP code anyway (IIRC, kernel option noreplace-smp).

@unclejack
Contributor

Memory use increases after every docker run. It looks like either aufs has a problem or there's some other problem within the kernel.

I've just tried the script posted above with 10000 runs and I was able to get 3.8.7 with aufs3 to start swapping with 3GB of RAM. Memory was never released after running this script; it just kept growing.

@creack
Contributor Author

creack commented Apr 16, 2013

I installed a fresh Ubuntu 13 with kernel 3.8.0 and wasn't able to reproduce (I let the script run for ~1 hour).
However, as @unclejack said, it leaks.

@creack
Contributor Author

creack commented Apr 16, 2013

After a lot of tests, I am pretty sure the leaks are due to #197

@unclejack
Contributor

I've performed a few tests to try to reproduce this on 12.04 with stock kernels.
It neither crashed nor leaked.

docker was downloaded from docker.io to keep things simple

docker version
Version:0.1.4
Git Commit:
uname -rv
3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013
cat /proc/cpuinfo | grep processor
processor       : 0
cat /proc/meminfo | grep Total
MemTotal:         496260 kB
SwapTotal:             0 kB

memory after first test w/ 100 runs & before second test

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 333512  29196  54776    0    0   214    38   41  213  3  2 94  1

memory after the second test w/ 100 runs

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 324360  30968  56548    0    0   182    46   47  283  4  2 93  1

@creack
Contributor Author

creack commented Apr 18, 2013

Do you perform your tests with @shykes' command or with my script (contrib/crashTest.go)?
What version of the kernel and lxc are you using?

@unclejack
Contributor

@creack I was trying the command @shykes posted earlier.

I'll try your crashTest script as well.

lxc is the standard one from Ubuntu 12.04.

@robknight

I'm getting the same issue; I captured the output from VirtualBox here: https://gist.github.com/robknight/5430280 - it's pretty much the same thing reported by @shykes earlier. I'm running Docker inside the standard Vagrant box, with an OS X 10.8 host.

For me this doesn't seem to have much to do with the length of time that the container is running for. I'm trying to build an image using docker-build, and my build succeeds maybe 25% of the time while the other 75% results in the above crash, after which the Vagrant box becomes unresponsive and has to be restarted.

My docker-build changefile only has two lines:
from base:latest
copy dist/dbx.tar /tmp/dbx.tar

The file referenced here definitely exists, and the build does succeed sometimes.

Identical behaviour occurs when using a different base image, e.g. centos.

@barryaustin

Also getting kernel panics running docker 0.1.5, 0.1.6, 0.1.7 on Ubuntu 12.10, Linux 3.5.0-27, bare metal Dell Latitude D830 w/ Intel Core 2 Duo and 4GB RAM.

Reproduced by running the example multiple (<20) times:

docker run base echo hello world

Screen photos (docker 0.1.7):
https://f.cloud.github.com/assets/361379/406625/f4a5a682-aaa8-11e2-8add-2c965f5758b9.jpg
https://f.cloud.github.com/assets/361379/406627/0581c620-aaa9-11e2-9f3d-18f0ec82aae6.jpg

@shykes
Contributor

shykes commented Apr 23, 2013

It seems that for the time being Docker requires Linux 3.8 or newer. This is unfortunate, but it seems earlier versions just can't handle spawning too many short-lived namespaced processes, and we couldn't pinpoint the exact change which caused the bug to strike more frequently...

Docker now issues a warning on Linux kernels <3.8.
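For illustration only, a minimal Go sketch of how such a kernel-version warning could be implemented by reading /proc/sys/kernel/osrelease and comparing against 3.8. This is not Docker's actual code; the parsing is deliberately simplified.

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "3.2.0-40-generic\n"
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read kernel version:", err)
        return
    }
    release := strings.TrimSpace(string(raw))
    parts := strings.SplitN(release, ".", 3)
    major, _ := strconv.Atoi(parts[0])
    minor := 0
    if len(parts) > 1 {
        minorStr := parts[1]
        // keep only the leading digits, in case the minor part carries a suffix
        if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
            minorStr = minorStr[:i]
        }
        minor, _ = strconv.Atoi(minorStr)
    }
    if major < 3 || (major == 3 && minor < 8) {
        fmt.Printf("WARNING: kernel %s is older than 3.8 and may be unstable running docker\n", release)
    }
}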

@shykes shykes closed this as completed Apr 23, 2013
@jpetazzo
Contributor

The screenshot posted by @barryaustin shows that it's exactly the same problem with bare metal. That's very useful, because it rules out many potential side effects caused by virtualization.

Are we sure we want to close this issue? People running Ubuntu in production will very probably run 12.04 LTS rather than 12.10 or 13.04, and 12.04 LTS might never support 3.8.

@shykes
Contributor

shykes commented Apr 23, 2013

I don't mind keeping it open, but that would imply that there's something we can do other than upgrading the kernel. Do you have any suggestions?


@jpetazzo
Contributor

My plan would look like this:

  • reproduce the issue using only lxc-start commands
  • escalate to lxc mailing list
  • reproduce the issue using only basic namespace code (unshare or just clone syscalls); see the sketch after this list
  • escalate to kernel mailing list
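To make the third bullet concrete, here is a hypothetical Go reproducer along those lines: it spawns many short-lived children in fresh PID and mount namespaces via clone flags, which is roughly what lxc-start does per container. It needs root on Linux, and the iteration count is an arbitrary assumption, not a number from this thread.

package main

import (
    "log"
    "os/exec"
    "syscall"
)

func main() {
    for i := 0; i < 10000; i++ {
        cmd := exec.Command("/bin/true")
        // new PID and mount namespaces for every child, torn down as soon as it exits
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        if err := cmd.Run(); err != nil {
            log.Fatalf("iteration %d: %v", i, err)
        }
    }
    log.Println("completed without error")
}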


@shykes
Contributor

shykes commented Apr 23, 2013

I agree this would be great. I re-opened the issue and removed it from 0.2.

Want to lead the charge? Let me know and I'll assign to you.

@lopter
Contributor

lopter commented Apr 24, 2013

Per @creack's request, here is what happens for me on apt-get update && apt-get install:

https://gist.github.com/lopter/5449001#file-dmesg-log

I'm running Docker in daemon mode on Ubuntu 12.04 in Virtualbox:

louis@dotcloud-docker:~$ uname -a
Linux dotcloud-docker 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
louis@dotcloud-docker:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise
louis@dotcloud-docker:~$ 

@unclejack
Contributor

I've just tried the exact same setup as @lopter. It didn't crash at all, not even with CPU limit set to 40%.

However, I was able to make the system leak memory when the memory cgroup didn't get mounted. That seems to happen once every other boot on ubuntu in virtualbox. This didn't seem to break the system even after running the script @shykes posted above with 10000 runs.

I've used the precise64 box. I've updated the system to use the latest kernel (3.2.0-40).
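Since the leak only showed up when the memory cgroup wasn't mounted, here is a small Go sketch (an illustration under assumptions, not Docker's actual detection code) that checks /proc/mounts for a cgroup mount carrying the memory controller:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    f, err := os.Open("/proc/mounts")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        // a matching line looks like: cgroup /sys/fs/cgroup/memory cgroup rw,...,memory 0 0
        fields := strings.Fields(scanner.Text())
        if len(fields) >= 4 && fields[2] == "cgroup" && strings.Contains(fields[3], "memory") {
            fmt.Println("memory cgroup mounted at", fields[1])
            return
        }
    }
    fmt.Println("memory cgroup does not appear to be mounted")
}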

@unclejack
Contributor

It locked up by the time it reached the 1355th run with the script posted by @shykes.

[ 2692.120088] BUG: soft lockup - CPU#0 stuck for 23s! [lxc-start:27038]
[ 2692.122073] Modules linked in: veth aufs xt_addrtype vboxvideo(O) drm vboxsf(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp vesafb ppdev i2c_piix4 psmouse serio_raw vboxguest(O) nfsd parport_pc nfs lockd fscache auth_rpcgss nfs_acl mac_hid sunrpc lp parport ext2
[ 2692.123992] CPU 0
[ 2692.124019] Modules linked in: veth aufs xt_addrtype vboxvideo(O) drm vboxsf(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp vesafb ppdev i2c_piix4 psmouse serio_raw vboxguest(O) nfsd parport_pc nfs lockd fscache auth_rpcgss nfs_acl mac_hid sunrpc lp parport ext2
[ 2692.124053]
[ 2692.124053] Pid: 27038, comm: lxc-start Tainted: G      D    O 3.2.0-40-generic #64-Ubuntu innotek GmbH VirtualBox/VirtualBox
[ 2692.124053] RIP: 0010:[<ffffffff8103ebd5>]  [<ffffffff8103ebd5>] __ticket_spin_lock+0x25/0x30
[ 2692.124053] RSP: 0018:ffff8800157937b8  EFLAGS: 00000297
[ 2692.124053] RAX: 000000000000ca9e RBX: ffffffff8112525c RCX: 0000000100045ada
[ 2692.124053] RDX: 000000000000ca9f RSI: ffffffff8117a8e0 RDI: ffff880017c10950
[ 2692.124053] RBP: ffff8800157937b8 R08: 0000000000000001 R09: 0000000000000000
[ 2692.124053] R10: ffff880014e69410 R11: 0000000000000001 R12: 000000018200017f
[ 2692.124053] R13: ffff880012c02940 R14: 000000000000000c R15: 000000000000000c
[ 2692.124053] FS:  0000000000000000(0000) GS:ffff880017c00000(0000) knlGS:0000000000000000
[ 2692.124053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2692.124053] CR2: ffff880117c00001 CR3: 0000000001c05000 CR4: 00000000000006f0
[ 2692.124053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2692.124053] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2692.124053] Process lxc-start (pid: 27038, threadinfo ffff880015792000, task ffff880016ff1700)
[ 2692.124053] Stack:
[ 2692.124053]  ffff8800157937c8 ffffffff8119712e ffff8800157937e8 ffffffff81198f5d
[ 2692.124053]  ffff880014e69400 0000000000000010 ffff8800157937f8 ffffffff8119904f
[ 2692.124053]  ffff880015793848 ffffffff8117ad83 ffff880014e69410 ffff880016efbf00
[ 2692.124053] Call Trace:
[ 2692.124053]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2692.124053]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2692.124053]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2692.124053]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2692.124053]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2692.124053]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2692.124053]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2692.124053]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2692.124053]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2692.124053]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2692.124053]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2692.124053]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2692.124053]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2692.124053]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2692.124053]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2692.124053]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2692.124053]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2692.124053]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2692.124053]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2692.124053]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2692.124053]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2692.124053]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2692.124053]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2692.124053]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2692.124053]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2692.124053]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2692.124053]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2692.124053]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2692.124053]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2692.124053]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2692.124053]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2692.124053]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2692.124053]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2692.124053]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
[ 2692.124053] Code: 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 0f b7 07 <66> 39 d0 75 f6 5d c3 0f 1f 40 00 8b 17 55 31 c0 48 89 e5 89 d1
[ 2692.124053] Call Trace:
[ 2692.124053]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2692.124053]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2692.124053]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2692.124053]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2692.124053]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2692.124053]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2692.124053]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2692.124053]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2692.124053]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2692.124053]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2692.124053]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2692.124053]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2692.124053]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2692.124053]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2692.124053]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2692.124053]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2692.124053]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2692.124053]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2692.124053]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2692.124053]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2692.124053]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2692.124053]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2692.124053]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2692.124053]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2692.124053]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2692.124053]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2692.124053]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2692.124053]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2692.124053]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2692.124053]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2692.124053]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2692.124053]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2692.124053]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2692.124053]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2692.124053]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2692.124053]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
vagrant@precise64:~$ dmesg | less
[ 2720.112029]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2720.112029]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2720.112029]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2720.112029]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2720.112029]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2720.112029]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2720.112029]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2720.112029]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2720.112029]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2720.112029]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2720.112029]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2720.112029]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2720.112029]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2720.112029]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2720.112029]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2720.112029]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2720.112029]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2720.112029]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
[ 2720.112029] Code: 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 <0f> b7 07 66 39 d0 75 f6 5d c3 0f 1f 40 00 8b 17 55 31 c0 48 89
[ 2720.112029] Call Trace:
[ 2720.112029]  [<ffffffff8119712e>] vfsmount_lock_local_lock+0x1e/0x30
[ 2720.112029]  [<ffffffff81198f5d>] mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8117ad83>] __fput+0x153/0x210
[ 2720.112029]  [<ffffffff8117ae65>] fput+0x25/0x30
[ 2720.112029]  [<ffffffff81065a89>] removed_exe_file_vma+0x39/0x50
[ 2720.112029]  [<ffffffff81143039>] remove_vma+0x89/0x90
[ 2720.112029]  [<ffffffff81145b38>] exit_mmap+0xe8/0x140
[ 2720.112029]  [<ffffffff81065b42>] mmput.part.16+0x42/0x130
[ 2720.112029]  [<ffffffff81065c59>] mmput+0x29/0x30
[ 2720.112029]  [<ffffffff8106c5f3>] exit_mm+0x113/0x130
[ 2720.112029]  [<ffffffff810e5555>] ? taskstats_exit+0x45/0x240
[ 2720.112029]  [<ffffffff8165e785>] ? _raw_spin_lock_irq+0x15/0x20
[ 2720.112029]  [<ffffffff8106c77e>] do_exit+0x16e/0x450
[ 2720.112029]  [<ffffffff8165f620>] oops_end+0xb0/0xf0
[ 2720.112029]  [<ffffffff81644907>] no_context+0x150/0x15d
[ 2720.112029]  [<ffffffff81644adf>] __bad_area_nosemaphore+0x1cb/0x1ea
[ 2720.112029]  [<ffffffff816441e4>] ? pud_offset+0x1a/0x20
[ 2720.112029]  [<ffffffff81644b11>] bad_area_nosemaphore+0x13/0x15
[ 2720.112029]  [<ffffffff81662266>] do_page_fault+0x426/0x520
[ 2720.112029]  [<ffffffff81323730>] ? zlib_inflate+0x1320/0x16d0
[ 2720.112029]  [<ffffffff81318c41>] ? vsnprintf+0x461/0x600
[ 2720.112029]  [<ffffffff8165ebf5>] page_fault+0x25/0x30
[ 2720.112029]  [<ffffffff81198f68>] ? mntput_no_expire+0x28/0xf0
[ 2720.112029]  [<ffffffff81198f5d>] ? mntput_no_expire+0x1d/0xf0
[ 2720.112029]  [<ffffffff8119904f>] mntput+0x1f/0x30
[ 2720.112029]  [<ffffffff8119addc>] kern_unmount+0x2c/0x40
[ 2720.112029]  [<ffffffff811d9ca5>] pid_ns_release_proc+0x15/0x20
[ 2720.112029]  [<ffffffff811de8f9>] proc_flush_task+0x89/0xa0
[ 2720.112029]  [<ffffffff8106b1e3>] release_task+0x33/0x130
[ 2720.112029]  [<ffffffff8131b1cd>] ? __put_user_4+0x1d/0x30
[ 2720.112029]  [<ffffffff8106b77e>] wait_task_zombie+0x49e/0x5f0
[ 2720.112029]  [<ffffffff8106b9d3>] wait_consider_task.part.9+0x103/0x170
[ 2720.112029]  [<ffffffff8106baa5>] wait_consider_task+0x65/0x70
[ 2720.112029]  [<ffffffff8106bbb1>] do_wait+0x101/0x260
[ 2720.112029]  [<ffffffff8106cf00>] sys_wait4+0xa0/0xf0
[ 2720.112029]  [<ffffffff8106a700>] ? wait_task_continued+0x170/0x170
[ 2720.112029]  [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
(END)

@paulhammond
Contributor

As another data point, I just ran the crashTest.go script on a Debian Wheezy VM running under Virtualbox on a dual-core i7 Macbook Air:

$ docker version
Version: 0.3.2
Git Commit: e289308
Kernel: 3.2.0-4-amd64
WARNING: No memory limit support
WARNING: No swap limit support

$ uname -rv
3.2.0-4-amd64 #1 SMP Debian 3.2.41-2

$ cat /proc/cpuinfo | grep processor
processor   : 0

The script has been running for an hour without crashing, and has done just over 10,000 runs. Memory usage remained constant throughout (with between 4 and 6MB free out of 250MB the whole time).

$ sudo /usr/local/go/bin/go run crashTest.go 
2013/05/23 00:05:54 WARNING: You are running linux kernel version 3.2.0-4-amd64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2013/05/23 00:05:54 WARNING: cgroup mountpoint not found for memory
2013/05/23 00:05:54 Listening for RCLI/tcp on 127.0.0.1:4242
2013/05/23 00:05:54 docker run base echo 3
2013/05/23 00:05:54 docker run base echo 4
...
2013/05/23 01:05:59 docker run base echo 10153
2013/05/23 01:05:59 docker run base echo 10154

I think this means either:

  • I've made a mistake somewhere
  • The bug is a regression introduced after 3.2.0
  • The bug only affects the Ubuntu kernel tree

I hope some progress can be made on this issue. One of the things that I like about Docker is how easy it is to get started; requiring a 3.8 kernel makes that much harder in many environments.

@shykes
Contributor

shykes commented May 23, 2013

Broadening kernel support is going to be a major priority for us in June. I'm also frustrated by the current kernel situation!


@ghost

ghost commented Jun 7, 2013

Hi guys, sorry I wasn't aware of this issue when I did my write up for OStatic. I'll update the post with a link back here. I experienced what looks to be this bug on Ubuntu 12.04 LTS, 64bit.

@mastrolinux

I had something very similar in the past, and it was related to the hardware/kernel pair. Please test on different CPU families if you can, to see if there are correlations.

@shykes
Contributor

shykes commented Jul 18, 2013

I'm closing this since it's not immediately actionable.

@jpetazzo
Contributor

jpetazzo commented Oct 7, 2013

For the record, Josh Poimboeuf found something which might be related to this:

I did some digging. These panics seem to be caused by some race conditions related to removing a container's mounts. I was easily able to recreate with:

for i in $(seq 1 100); do docker run -i -t -d ubuntu bash; done | xargs docker kill

The fixes needed for RHEL 6.4 (based on 2.6.32) are in the following two
upstream kernel commits:

  • "45a68628d37222e655219febce9e91b6484789b2" (fixed in 2.6.39)
  • "17cf22c33e1f1b5e435469c84e43872579497653" (fixed in 3.8)

jpoimboe added a commit to jpoimboe/docker that referenced this issue Dec 2, 2013
Allow the user to set DOCKER_NOWARN_KERNEL_VERSION=1 to disable the
warning for RHEL 6.5 and other distributions that don't exhibit the
panics described in moby#407.
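For illustration, a minimal Go sketch of the kind of opt-out that commit describes (not the actual patch): the kernel warning is skipped when DOCKER_NOWARN_KERNEL_VERSION=1 is set in the daemon's environment. The helper name and its arguments are hypothetical.

package main

import (
    "log"
    "os"
)

// maybeWarnOldKernel is a hypothetical helper: it prints the "<3.8" warning
// unless the user has explicitly opted out via DOCKER_NOWARN_KERNEL_VERSION=1.
func maybeWarnOldKernel(release string, tooOld bool) {
    if os.Getenv("DOCKER_NOWARN_KERNEL_VERSION") == "1" {
        return // opted out, e.g. on a patched RHEL 6.5 kernel
    }
    if tooOld {
        log.Printf("WARNING: you are running kernel %s, which may be unstable running docker; please upgrade to 3.8.0 or newer", release)
    }
}

func main() {
    // example: a 3.2 kernel would normally trigger the warning
    maybeWarnOldKernel("3.2.0-40-generic", true)
}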
@gdm85
Contributor

gdm85 commented May 15, 2014

Can we keep this open? Still reproducible with 0.11 and kernel 3.2.0 (2 processors):

https://gist.github.com/gdm85/9328ae13653e5683adda

@unclejack
Contributor

@gdm85 No, this will stay closed. Kernels older than 3.8 aren't supported. That means technical support isn't provided and you might run into unexpected behavior, even if it seems like it's working.

The only exception is the kernel provided by RHEL6 (2.6.32xxxxxx) which was patched and improved to work properly with Docker.

Support for Docker on kernels older than 3.8 won't happen. Please upgrade your kernel.

@creack creack removed their assignment Jul 24, 2014
shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014
Allow the user to set DOCKER_NOWARN_KERNEL_VERSION=1 to disable the
warning for RHEL 6.5 and other distributions that don't exhibit the
panics described in moby/moby#407.
sbasyal added a commit to sbasyal/docker that referenced this issue Apr 15, 2015
The link to issue 407 was broken. The old link was: moby#407 
The link must be: moby#407
sbasyal added a commit to sbasyal/docker that referenced this issue Apr 15, 2015
The link to issue 407 was broken. The old link was: moby#407
The link must be: moby#407

Signed-off-by: Sabin Basyal <sabin.basyal@gmail.com>
rtyler pushed a commit to rtyler/docker that referenced this issue Feb 23, 2018
@fiveways

fiveways commented Jan 4, 2019

I know this is an old bug, but has anyone tried to reproduce it using the VFS driver instead of aufs?

kolyshkin pushed a commit to kolyshkin/moby that referenced this issue Oct 28, 2019
…ainer_error

[19.03 backport] Propagate GetContainer error from event processor
dperny pushed a commit to dperny/docker that referenced this issue Oct 18, 2021
…9.03-6eeb9ec3d6517883995b327b2e92b12a3cfc3074

[19.03] sync to upstream 19.03 4bed012