Ensure Xen free_memory is always guaranteed to be above a fixed threshold #563

Closed
marmarek opened this Issue Mar 8, 2015 · 12 comments

Member

marmarek commented Mar 8, 2015

Reported by joanna on 11 May 2012 15:49 UTC
Previously I had no problems starting this VM, but suddenly, today, it crashes:

[    0.012366] installing Xen timer for CPU 1
[    0.012394] ------------[ cut here ]------------
[    0.012400] kernel BUG at /home/user/qubes-src/kernel/kernel-3.2.7/linux-3.2.7/arch/x86/xen/smp.c:322!
[    0.012409] invalid opcode: 0000 [#1] SMP 
[    0.012416] CPU 0 
[    0.012418] Modules linked in:
[    0.012424] 
[    0.012429] Pid: 1, comm: swapper/0 Not tainted 3.2.7-3.pvops.qubes.x86_64 #1  
[    0.012438] RIP: e030:[<ffffffff8143a229>]  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.012452] RSP: e02b:ffff880031863e10  EFLAGS: 00010282
[    0.012457] RAX: fffffffffffffff4 RBX: ffff8800318c0000 RCX: 0000000000000000
[    0.012463] RDX: ffff8800318c0000 RSI: 0000000000000001 RDI: 0000000000000000
[    0.012470] RBP: ffff880031863e50 R08: 00003ffffffff000 R09: ffff880000000000
[    0.012476] R10: ffff8800318c0000 R11: 0000000000002000 R12: 0000000000000001
[    0.012482] R13: ffff880031f82d30 R14: ffff88003186e0c0 R15: 0000000000039130
[    0.012491] FS:  0000000000000000(0000) GS:ffff880031f5c000(0000) knlGS:0000000000000000
[    0.012498] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.012504] CR2: 0000000000000000 CR3: 0000000001805000 CR4: 0000000000002660
[    0.012510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.012516] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.012523] Process swapper/0 (pid: 1, threadinfo ffff880031862000, task ffff880031860040)
[    0.012529] Stack:
[    0.012532]  ffff88003186e0c0 0000000000031f7b ffffffff81866c80 0000000000000001
[    0.012544]  ffff88003186e0c0 0000000000000001 ffffffff81866c80 0000000000000001
[    0.012554]  ffff880031863e80 ffffffff8143a2e1 ffff880031863e70 0000000000000000
[    0.012564] Call Trace:
[    0.012572]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.012579]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.012585]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.012593]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.012600]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.012607]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.012615]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.012624]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.012632]  [<ffffffff814518b0>] ? gs_change+0x13/0x13
[    0.012637] Code: 74 0d 48 ba ff ff ff ff ff ff ff 3f 48 21 d0 48 c1 e0 0c 31 ff 49 63 f4 48 89 83 90 13 00 00 48 89 da e8 db 70 bc ff 85 c0 74 04 <0f> 0b eb fe 48 89 df e8 db f6 ce ff 31 c0 48 83 c4 18 5b 41 5c 
[    0.012718] RIP  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.012727]  RSP <ffff880031863e10>
[    0.012738] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.012753] Kernel panic - not syncing: Attempted to kill init!
[    0.012759] Pid: 1, comm: swapper/0 Tainted: G      D      3.2.7-3.pvops.qubes.x86_64 #1
[    0.012765] Call Trace:
[    0.012770]  [<ffffffff81444c4a>] panic+0x8c/0x1a2
[    0.012778]  [<ffffffff81059814>] ? enqueue_entity+0x74/0x2f0
[    0.012785]  [<ffffffff8106113d>] forget_original_parent+0x34d/0x360
[    0.012793]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.012801]  [<ffffffff814478b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[    0.012809]  [<ffffffff8104acb3>] ? sched_move_task+0x93/0x150
[    0.012816]  [<ffffffff81061162>] exit_notify+0x12/0x190
[    0.012822]  [<ffffffff81062a3d>] do_exit+0x1ed/0x3e0
[    0.012828]  [<ffffffff814489e6>] oops_end+0xa6/0xf0
[    0.012833]  [<ffffffff81016476>] die+0x56/0x90
[    0.012837]  [<ffffffff81448584>] do_trap+0xc4/0x170
[    0.012841]  [<ffffffff81014440>] do_invalid_op+0x90/0xb0
[    0.012846]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.012853]  [<ffffffff81128ce4>] ? cache_grow.clone.0+0x2b4/0x3b0
[    0.012857]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.012862]  [<ffffffff810052f1>] ? pte_mfn_to_pfn+0x71/0xf0
[    0.012867]  [<ffffffff8145172b>] invalid_op+0x1b/0x20
[    0.012873]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.012880]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.012886]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.012893]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.012899]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.012904]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.012911]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.012918]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.012925]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.012932]  [<ffffffff814518b0>] ? gs_change+0x13/0x13

Migrated-From: https://wiki.qubes-os.org/ticket/563

@marmarek marmarek added this to the Release 1 milestone Mar 8, 2015

Member

marmarek commented Mar 8, 2015

Comment by joanna on 11 May 2012 15:53 UTC
Ok, so this problem seems to have been caused by maxmem being set to 800 -- it looks like one of the recent core releases modified maxmem automatically for VMs that had mem set to some fixed value (which was the case for this VM, which I wanted excluded from dynamic mem balancing -- I assigned it 800MB). After I changed it to the following:
mem = 800MB
maxmem = 4037MB

it starts fine again.

Member

marmarek commented Mar 8, 2015

Comment by joanna on 12 May 2012 12:35 UTC
The maxmem = mem setting, on the other hand, seems necessary for passthrough PCI devices to work fine in this VM -- see #525...

Member

marmarek commented Mar 8, 2015

Comment by marmarek on 12 May 2012 22:44 UTC
This BUG is a reaction to a hypercall failure. What do you have in xl dmesg (or hypervisor.log)?

Relevant lines from kernel source:

321     if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
322         BUG();
323 
Member

marmarek commented Mar 8, 2015

Comment by marmarek on 18 May 2012 20:40 UTC
Perhaps this is connected with the problem that the VM sees a different amount of memory than dom0 sets...
I've recently found some patches in Konrad's git tree that may fix it (http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=2e2fb75475c2fc74c98100f1468c8195fee49f3b - perhaps together with the whole selfballoon branch). Maybe this is the solution? I will try to apply this branch to our kernel.

Member

marmarek commented Mar 8, 2015

Comment by joanna on 21 May 2012 08:13 UTC
But what is singular about this is that it happened suddenly one day, while before it worked fine with mem=maxmem, and today it also works fine with mem=maxmem. Kind of a one-time accident that I'm unable to reproduce... Very strange.

Member

marmarek commented Mar 8, 2015

Comment by joanna on 26 May 2012 11:45 UTC
I cannot reproduce it. I will close it for now.

@marmarek marmarek added the worksforme label Mar 8, 2015

@marmarek marmarek closed this Mar 8, 2015

Member

marmarek commented Mar 8, 2015

Comment by joanna on 22 Jun 2012 12:03 UTC
Ok, just got this again:

[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] SMP alternatives: switching to UP code
[    0.008081] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[    0.008335] installing Xen timer for CPU 1
[    0.008362] ------------[ cut here ]------------
[    0.008368] kernel BUG at /home/user/qubes-src/kernel/kernel-3.2.7/linux-3.2.7/arch/x86/xen/smp.c:322!
[    0.008376] invalid opcode: 0000 [#1] SMP 
[    0.008400] CPU 0 
[    0.008402] Modules linked in:
[    0.008408] 
[    0.008412] Pid: 1, comm: swapper/0 Not tainted 3.2.7-5.pvops.qubes.x86_64 #1  
[    0.008420] RIP: e030:[<ffffffff8143a229>]  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.008433] RSP: e02b:ffff880018063e10  EFLAGS: 00010282
[    0.008437] RAX: fffffffffffffff4 RBX: ffff8800180c0000 RCX: 0000000000000000
[    0.008442] RDX: ffff8800180c0000 RSI: 0000000000000001 RDI: 0000000000000000
[    0.008447] RBP: ffff880018063e50 R08: 00003ffffffff000 R09: ffff880000000000
[    0.008452] R10: ffff8800180c0000 R11: 0000000000002000 R12: 0000000000000001
[    0.008457] R13: ffff880018f82d30 R14: ffff88001806e0c0 R15: 000000000004d0d3
[    0.008467] FS:  0000000000000000(0000) GS:ffff880018f5c000(0000) knlGS:0000000000000000
[    0.008474] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.008479] CR2: 0000000000000000 CR3: 0000000001805000 CR4: 0000000000002660
[    0.008485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.008490] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.008496] Process swapper/0 (pid: 1, threadinfo ffff880018062000, task ffff880018060040)
[    0.008502] Stack:
[    0.008505]  ffff88001806e0c0 0000000000018f7b ffffffff81866c80 0000000000000001
[    0.008515]  ffff88001806e0c0 0000000000000001 ffffffff81866c80 0000000000000001
[    0.008525]  ffff880018063e80 ffffffff8143a2e1 ffff880018063e70 0000000000000000
[    0.008535] Call Trace:
[    0.008542]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.008548]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.008555]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.008562]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.008569]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.008577]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.008585]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.008593]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.008600]  [<ffffffff814518b0>] ? gs_change+0x13/0x13
[    0.008604] Code: 74 0d 48 ba ff ff ff ff ff ff ff 3f 48 21 d0 48 c1 e0 0c 31 ff 49 63 f4 48 89 83 90 13 00 00 48 89 da e8 db 70 bc ff 85 c0 74 04 <0f> 0b eb fe 48 89 df e8 db f6 ce ff 31 c0 48 83 c4 18 5b 41 5c 
[    0.008682] RIP  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.008691]  RSP <ffff880018063e10>
[    0.008701] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.008715] Kernel panic - not syncing: Attempted to kill init!
[    0.008722] Pid: 1, comm: swapper/0 Tainted: G      D      3.2.7-5.pvops.qubes.x86_64 #1
[    0.008728] Call Trace:
[    0.008734]  [<ffffffff81444c4a>] panic+0x8c/0x1a2
[    0.008742]  [<ffffffff81059814>] ? enqueue_entity+0x74/0x2f0
[    0.008750]  [<ffffffff8106113d>] forget_original_parent+0x34d/0x360
[    0.008758]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.008765]  [<ffffffff814478b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[    0.008774]  [<ffffffff8104acb3>] ? sched_move_task+0x93/0x150
[    0.008781]  [<ffffffff81061162>] exit_notify+0x12/0x190
[    0.008787]  [<ffffffff81062a3d>] do_exit+0x1ed/0x3e0
[    0.008794]  [<ffffffff814489e6>] oops_end+0xa6/0xf0
[    0.008801]  [<ffffffff81016476>] die+0x56/0x90
[    0.008807]  [<ffffffff81448584>] do_trap+0xc4/0x170
[    0.008813]  [<ffffffff81014440>] do_invalid_op+0x90/0xb0
[    0.008820]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.008829]  [<ffffffff81128ce4>] ? cache_grow.clone.0+0x2b4/0x3b0
[    0.008836]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.008843]  [<ffffffff810052f1>] ? pte_mfn_to_pfn+0x71/0xf0
[    0.008851]  [<ffffffff8145172b>] invalid_op+0x1b/0x20
[    0.008857]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.008864]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.008870]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.008876]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.008882]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.008888]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.008895]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.008901]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.008909]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.008916]  [<ffffffff814518b0>] ? gs_change+0x13/0x13

Nothing in xl dmesg or in Dom0's dmesg. Again, I really changed NOTHING -- it started crashing my VMs suddenly, and keeps occurring no matter what VM I'm starting...

xen-4.1.2-13
dom0 kernel: 3.2.7-6

Member

marmarek commented Mar 8, 2015

Comment by joanna on 22 Jun 2012 18:27 UTC
As discussed here:

http://lists.xen.org/archives/html/xen-devel/2012-06/msg01314.html

... this was really an out of memory condition. As further discussed in the thread, it would be nice to have a generic way to handle such out of memory conditions in Qubes/Xen. Renaming the ticket accordingly...
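For reference, the hypervisor's free memory can be checked from dom0 before starting a VM. The snippet below is only an illustrative sketch (not part of Qubes): it shells out to xl info and reads the free_memory field, which xl reports in MiB.

import subprocess

def xen_free_memory_mib():
    """Return Xen's free memory in MiB, as printed by `xl info` in dom0."""
    output = subprocess.check_output(['xl', 'info'], universal_newlines=True)
    for line in output.splitlines():
        if line.startswith('free_memory'):
            # the line looks like: "free_memory            : 123"
            return int(line.split(':', 1)[1].strip())
    raise RuntimeError('free_memory field not found in xl info output')

if __name__ == '__main__':
    print('Xen free memory: %d MiB' % xen_free_memory_mib())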

@marmarek marmarek added P: major and removed P: critical labels Mar 8, 2015

@marmarek marmarek changed the title from Strange kernel bug upon VM start to Handle runtime Xen out of memory conditions in a user friendly way Mar 8, 2015

@marmarek marmarek added enhancement and removed bug labels Mar 8, 2015

Member

marmarek commented Mar 8, 2015

Comment by marmarek on 22 Jun 2012 22:37 UTC
So what do you propose? I don't see any solution for handling such errors in the above thread, only some hints on how to mitigate it, and all of it looks like much more work than it is worth (like porting part of the XenServer toolstack or digging through the full xen-unstable commit history).

Perhaps we should just increase the Xen free memory threshold in qmemman (currently 50MB) and/or investigate why xen_free_mem=0 happened.
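As a rough illustration of the "investigate why xen_free_mem=0 happened" idea, a small watcher in dom0 could poll xl info and log every time free memory dips below the qmemman threshold. This is only a sketch, not qmemman code; the 50 MiB value mirrors the threshold mentioned above and the one-second polling interval is an arbitrary choice.

import subprocess
import time

THRESHOLD_MIB = 50  # the qmemman free-memory threshold discussed above

def xen_free_memory_mib():
    output = subprocess.check_output(['xl', 'info'], universal_newlines=True)
    for line in output.splitlines():
        if line.startswith('free_memory'):
            return int(line.split(':', 1)[1].strip())
    raise RuntimeError('free_memory field not found in xl info output')

while True:
    free = xen_free_memory_mib()
    if free < THRESHOLD_MIB:
        print('%s Xen free memory low: %d MiB (threshold %d MiB)'
              % (time.strftime('%Y-%m-%d %H:%M:%S'), free, THRESHOLD_MIB))
    time.sleep(1)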

Member

marmarek commented Mar 8, 2015

Comment by joanna on 28 Jun 2012 09:32 UTC
After some discussion we concluded that the best we can do for now (i.e. before switching to Xen 4.2, which is still far on the horizon) is to ensure that Xen free_memory is always guaranteed to stay above some threshold. Specifically, we should change the qmemman logic so that it doesn't give any memory to VMs if doing so would break the Xen free_memory condition even for a moment (in other words, we should not count on memory that could still be recovered from a VM).
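In qmemman terms (qmemman is a Python daemon), the rule above boils down to never granting a VM more memory than is available in Xen's free pool right now, minus a fixed reserve, and never counting on memory that other VMs might still return. The sketch below is illustrative only; the function and constant names are made up for this example, and the 50 MiB figure is the threshold mentioned earlier in the thread, not necessarily the final value.

XEN_FREE_MEMORY_FLOOR_MIB = 50  # fixed reserve that must always stay free in Xen

def allowed_grant_mib(xen_free_now_mib, requested_mib):
    """How much of a VM's request may be satisfied immediately so that Xen's
    free memory never drops below the floor, even transiently. Memory that
    still has to be reclaimed from other VMs is deliberately not counted."""
    immediately_available = xen_free_now_mib - XEN_FREE_MEMORY_FLOOR_MIB
    if immediately_available <= 0:
        return 0  # already at or below the floor: hand out nothing
    return min(requested_mib, immediately_available)

# Example: with 120 MiB free and a 200 MiB request, only 70 MiB is granted now;
# the remaining 130 MiB must wait until other VMs actually release memory.
assert allowed_grant_mib(120, 200) == 70
assert allowed_grant_mib(40, 200) == 0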

@marmarek marmarek changed the title from Handle runtime Xen out of memory conditions in a user friendly way to Ensure Xen free_memory is always guaranteed to be above a fixed threshold Mar 8, 2015

Member

marmarek commented Mar 8, 2015

Modified by joanna on 28 Jun 2012 09:32 UTC

@marmarek marmarek added the C: core label Mar 8, 2015
