Ensure Xen free_memory is always guaranteed to be above a fixed threshold #563
Comments
marmarek
added this to the Release 1 milestone
Mar 8, 2015
marmarek
added
bug
C: other
P: major
labels
Mar 8, 2015
marmarek
Mar 8, 2015
Member
Comment by joanna on 11 May 2012 15:53 UTC
Ok, so this problem seemed to be caused by maxmem being set to 800. It seems that one of the recent core packages automatically modified maxmem for VMs that had mem set to some fixed value (which was the case for this VM: I wanted it excluded from dynamic memory balancing, so I assigned it 800MB). After I changed it to the following:
mem = 800MB
maxmem = 4037MB
it starts fine again.
marmarek
Mar 8, 2015
Member
Comment by joanna on 12 May 2012 12:35 UTC
The maxmem = mem setting, on the other hand, seems necessary for passthrough PCI devices to work fine in this VM -- see #525...
marmarek
Mar 8, 2015
Member
Comment by marmarek on 12 May 2012 22:44 UTC
This BUG is a reaction to a failed hypercall. What do you have in xl dmesg (or hypervisor.log)?
Relevant lines from kernel source:
321 if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
322 BUG();
323
marmarek
Mar 8, 2015
Member
Comment by marmarek on 18 May 2012 20:40 UTC
Perhaps this is connected with the problem that the VM sees a different amount of memory than dom0 sets...
I recently found some patches in Konrad's git tree that may fix it (http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=2e2fb75475c2fc74c98100f1468c8195fee49f3b - perhaps together with the whole selfballoon branch). Maybe this is the solution? I will try to apply this branch to our kernel.
marmarek
Mar 8, 2015
Member
Comment by joanna on 21 May 2012 08:13 UTC
But what is singular about this is that it happened suddenly one day, while before it worked fine with mem=maxmem, and today it also works fine with mem=maxmem. Kind of a one-time accident that I'm unable to reproduce... Very strange.
marmarek
Mar 8, 2015
Member
Comment by joanna on 26 May 2012 11:45 UTC
I cannot reproduce it. I will close it for now.
marmarek
added
the
worksforme
label
Mar 8, 2015
marmarek
closed this
Mar 8, 2015
marmarek
Mar 8, 2015
Member
Comment by joanna on 22 Jun 2012 12:03 UTC
Ok, just got this again:
[ 0.004000] CPU: Physical Processor ID: 0
[ 0.004000] CPU: Processor Core ID: 0
[ 0.004000] SMP alternatives: switching to UP code
[ 0.008081] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[ 0.008335] installing Xen timer for CPU 1
[ 0.008362] ------------[ cut here ]------------
[ 0.008368] kernel BUG at /home/user/qubes-src/kernel/kernel-3.2.7/linux-3.2.7/arch/x86/xen/smp.c:322!
[ 0.008376] invalid opcode: 0000 [#1] SMP
[ 0.008400] CPU 0
[ 0.008402] Modules linked in:
[ 0.008408]
[ 0.008412] Pid: 1, comm: swapper/0 Not tainted 3.2.7-5.pvops.qubes.x86_64 #1
[ 0.008420] RIP: e030:[<ffffffff8143a229>] [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[ 0.008433] RSP: e02b:ffff880018063e10 EFLAGS: 00010282
[ 0.008437] RAX: fffffffffffffff4 RBX: ffff8800180c0000 RCX: 0000000000000000
[ 0.008442] RDX: ffff8800180c0000 RSI: 0000000000000001 RDI: 0000000000000000
[ 0.008447] RBP: ffff880018063e50 R08: 00003ffffffff000 R09: ffff880000000000
[ 0.008452] R10: ffff8800180c0000 R11: 0000000000002000 R12: 0000000000000001
[ 0.008457] R13: ffff880018f82d30 R14: ffff88001806e0c0 R15: 000000000004d0d3
[ 0.008467] FS: 0000000000000000(0000) GS:ffff880018f5c000(0000) knlGS:0000000000000000
[ 0.008474] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.008479] CR2: 0000000000000000 CR3: 0000000001805000 CR4: 0000000000002660
[ 0.008485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.008490] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.008496] Process swapper/0 (pid: 1, threadinfo ffff880018062000, task ffff880018060040)
[ 0.008502] Stack:
[ 0.008505] ffff88001806e0c0 0000000000018f7b ffffffff81866c80 0000000000000001
[ 0.008515] ffff88001806e0c0 0000000000000001 ffffffff81866c80 0000000000000001
[ 0.008525] ffff880018063e80 ffffffff8143a2e1 ffff880018063e70 0000000000000000
[ 0.008535] Call Trace:
[ 0.008542] [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[ 0.008548] [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[ 0.008555] [<ffffffff81440bbf>] cpu_up+0x75/0x85
[ 0.008562] [<ffffffff818998f1>] smp_init+0x46/0x9e
[ 0.008569] [<ffffffff8188263c>] kernel_init+0x89/0x142
[ 0.008577] [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[ 0.008585] [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[ 0.008593] [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[ 0.008600] [<ffffffff814518b0>] ? gs_change+0x13/0x13
[ 0.008604] Code: 74 0d 48 ba ff ff ff ff ff ff ff 3f 48 21 d0 48 c1 e0 0c 31 ff 49 63 f4 48 89 83 90 13 00 00 48 89 da e8 db 70 bc ff 85 c0 74 04 <0f> 0b eb fe 48 89 df e8 db f6 ce ff 31 c0 48 83 c4 18 5b 41 5c
[ 0.008682] RIP [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[ 0.008691] RSP <ffff880018063e10>
[ 0.008701] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.008715] Kernel panic - not syncing: Attempted to kill init!
[ 0.008722] Pid: 1, comm: swapper/0 Tainted: G D 3.2.7-5.pvops.qubes.x86_64 #1
[ 0.008728] Call Trace:
[ 0.008734] [<ffffffff81444c4a>] panic+0x8c/0x1a2
[ 0.008742] [<ffffffff81059814>] ? enqueue_entity+0x74/0x2f0
[ 0.008750] [<ffffffff8106113d>] forget_original_parent+0x34d/0x360
[ 0.008758] [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 0.008765] [<ffffffff814478b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 0.008774] [<ffffffff8104acb3>] ? sched_move_task+0x93/0x150
[ 0.008781] [<ffffffff81061162>] exit_notify+0x12/0x190
[ 0.008787] [<ffffffff81062a3d>] do_exit+0x1ed/0x3e0
[ 0.008794] [<ffffffff814489e6>] oops_end+0xa6/0xf0
[ 0.008801] [<ffffffff81016476>] die+0x56/0x90
[ 0.008807] [<ffffffff81448584>] do_trap+0xc4/0x170
[ 0.008813] [<ffffffff81014440>] do_invalid_op+0x90/0xb0
[ 0.008820] [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[ 0.008829] [<ffffffff81128ce4>] ? cache_grow.clone.0+0x2b4/0x3b0
[ 0.008836] [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 0.008843] [<ffffffff810052f1>] ? pte_mfn_to_pfn+0x71/0xf0
[ 0.008851] [<ffffffff8145172b>] invalid_op+0x1b/0x20
[ 0.008857] [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[ 0.008864] [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[ 0.008870] [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[ 0.008876] [<ffffffff81440bbf>] cpu_up+0x75/0x85
[ 0.008882] [<ffffffff818998f1>] smp_init+0x46/0x9e
[ 0.008888] [<ffffffff8188263c>] kernel_init+0x89/0x142
[ 0.008895] [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[ 0.008901] [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[ 0.008909] [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[ 0.008916] [<ffffffff814518b0>] ? gs_change+0x13/0x13
Nothing in xl dmesg or in Dom0's dmesg. Again, I really changed NOTHING -- it started crashing my VMs suddenly, and keeps occurring no matter what VM I'm starting...
xen-4.1.2-13
dom0 kernel: 3.2.7-6
marmarek
added
P: critical
and removed
C: other
P: major
worksforme
labels
Mar 8, 2015
marmarek
reopened this
Mar 8, 2015
marmarek
Mar 8, 2015
Member
Comment by joanna on 22 Jun 2012 18:27 UTC
As discussed here:
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01314.html
... this was really an out of memory condition. As further discussed in the thread, it would be nice to have a generic way to handle such out of memory conditions in Qubes/Xen. Renaming the ticket accordingly...
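The register dump in the trace above appears consistent with an out of memory condition. A minimal sketch, assuming RAX still holds the return value of the failed HYPERVISOR_vcpu_op call at the moment BUG() fires (the helper name `to_signed64` is made up for this illustration):

```python
import errno

def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value

# RAX as printed in the oops above.
rax = 0xfffffffffffffff4

ret = to_signed64(rax)
# Kernel-style return codes are negated errno values.
print(ret, errno.errorcode.get(-ret))  # -12 ENOMEM
```

If that assumption holds, the hypercall returned -ENOMEM, which matches the conclusion reached on xen-devel.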
marmarek
added
P: major
and removed
P: critical
labels
Mar 8, 2015
marmarek
changed the title from
Strange kernel bug upon VM start
to
Handle runtime Xen out of memory conditions in a user friendly way
Mar 8, 2015
marmarek
added
enhancement
and removed
bug
labels
Mar 8, 2015
marmarek
Mar 8, 2015
Member
Comment by marmarek on 22 Jun 2012 22:37 UTC
So what do you propose? I don't see any solution for handling such errors in the above thread, only some hints on how to try to mitigate them, and all of those look like much more work than they are worth (like porting part of the XenServer toolstack or digging through the full xen-unstable commit history).
Perhaps we should just increase the Xen free memory threshold in qmemman (currently 50MB) and/or investigate why xen_free_mem=0 happened.
marmarek
Mar 8, 2015
Member
Comment by joanna on 28 Jun 2012 09:32 UTC
After some discussion we concluded that the best we can do for now (i.e. before switching to Xen 4.2, which is far on the horizon) is to ensure that Xen free_memory is always guaranteed to be above some threshold. Specifically, we should change the qmemman logic so that it doesn't give any memory to VMs if this would break the Xen free_memory condition even for a moment (in other words, we should not count on memory being recovered from a VM).
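The rule described here could be sketched roughly as follows. This is only an illustration, not the actual qmemman code: the function `safe_grants`, its parameters, and the VM names are hypothetical, and the 50MB default is simply the threshold mentioned earlier in this ticket.

```python
MB = 1024 * 1024

def safe_grants(xen_free, wanted, threshold=50 * MB):
    """Decide how much extra memory each VM may receive right now.

    xen_free  -- free memory Xen reports at this moment, in bytes
    wanted    -- dict of VM name -> extra bytes the VM would like
    threshold -- amount that must stay free in Xen at all times

    Only memory that is already free is handed out; memory that might
    be reclaimed from other VMs later is deliberately not counted.
    """
    budget = max(0, xen_free - threshold)
    grants = {}
    for name, extra in wanted.items():
        # Never grant more than the budget; ignore negative requests.
        give = min(max(extra, 0), budget)
        grants[name] = give
        budget -= give
    return grants

grants = safe_grants(200 * MB, {"work": 100 * MB, "personal": 100 * MB})
print({name: size // MB for name, size in grants.items()})
```

With 200MB free and a 50MB threshold, the first VM gets its full 100MB and the second only 50MB, so Xen free memory never drops below the threshold at any step.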
marmarek
changed the title from
Handle runtime Xen out of memory conditions in a user friendly way
to
Ensure Xen free_memory is always guaranteed to be above a fixed threshold
Mar 8, 2015
Modified by joanna on 28 Jun 2012 09:32 UTC
marmarek
added
the
C: core
label
Mar 8, 2015
marmarek
Mar 8, 2015
Member
Comment by marmarek on 4 Jul 2012 23:50 UTC
http://git.qubes-os.org/gitweb/?p=marmarek/core.git;a=commit;h=b4070a99a3b792c0251865dd824c566e32b14623
marmarek commented Mar 8, 2015
Reported by joanna on 11 May 2012 15:49 UTC
Previously I had no problems starting this VM, but suddenly, today, it crashes:
Migrated-From: https://wiki.qubes-os.org/ticket/563