VM kernel bug on heavy adjustment #661

Closed
marmarek opened this Issue Mar 8, 2015 · 1 comment

marmarek commented Mar 8, 2015

Reported by marmarek on 7 Oct 2012 23:48 UTC
http://groups.google.com/group/qubes-devel/browse_thread/thread/399f43286b90e43a

I've seen this behavior a number of times before in both regular AppVMs and
DispVMs, but I only just captured the log. The bug typically, though not
exclusively, manifests while browsing the web with Chrome. More 'activity'
seems to make it more likely to appear.

kernel message:

[ 5753.922726] init_memory_mapping: 0000000188000000-0000000198000000
[ 5753.935198] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1061826
[ 5753.935518] ------------[ cut here ]------------
[ 5753.935524] kernel BUG at /home/user/qubes-src/kernel/kernel-3.4.12/linux-3.4.12/arch/x86/xen/p2m.c:460!
[ 5753.935529] invalid opcode: 0000 [#1] SMP
[ 5753.935532] CPU 0
[ 5753.935542] Modules linked in: bnep bluetooth rfkill lockd sunrpc ipt_REJECT xt_state xt_tcpudp iptable_filter ipt_MASQUERADE ip6table_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6_tables ip_tables x_tables coretemp hwmon crc32c_intel ghash_clmulni_intel xen_netfront microcode pcspkr u2mfn(O) xen_blkback xen_evtchn uinput autofs4 ext4 jbd2 crc16 scsi_mod aesni_intel cryptd aes_x86_64 aes_generic dm_snapshot xen_blkfront [last unloaded: scsi_wait_scan]
[ 5753.935579] 
[ 5753.935582] Pid: 1928, comm: kworker/0:2 Tainted: G           O 3.4.12-1.pvops.qubes.x86_64 #1  
[ 5753.935587] RIP: e030:[<ffffffff8100ae7f>]  [<ffffffff8100ae7f>] alloc_p2m+0x25f/0x270
[ 5753.935596] RSP: e02b:ffff88007858dce0  EFLAGS: 00010202
[ 5753.935599] RAX: 000000000004c611 RBX: 0000000000197fff RCX: 0000000000000000
[ 5753.935603] RDX: 3fffffffffffffff RSI: 0000000000000001 RDI: 0000000000000002
[ 5753.935606] RBP: ffff88007858dd20 R08: 0000000000000000 R09: 00000000000b5d4d
[ 5753.935609] R10: 0000000000000001 R11: dead000000200200 R12: 0000000000000006
[ 5753.935613] R13: ffffffff81a02000 R14: ffff880003a41000 R15: ffffffff81a04030
[ 5753.935619] FS:  00007fb901137840(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[ 5753.935623] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5753.935626] CR2: 00007fb90038fe60 CR3: 0000000013462000 CR4: 0000000000002660
[ 5753.935630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5753.935633] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5753.935637] Process kworker/0:2 (pid: 1928, threadinfo ffff88007858c000, task ffff88000360c780)
[ 5753.935641] Stack:
[ 5753.935643]  ffffffff8100118a 000000000000000c 0000000000000246 0000000000197fff
[ 5753.935648]  000000000041df7d ffffea000593ffe8 0000160000000000 0000000000000200
[ 5753.935653]  ffff88007858dd40 ffffffff8100aec3 ffffea000593ffc8 0000000000000000
[ 5753.935658] Call Trace:
[ 5753.935662]  [<ffffffff8100118a>] ? hypercall_page+0x18a/0x1000
[ 5753.935668]  [<ffffffff8100aec3>] set_phys_to_machine+0x33/0x50
[ 5753.935673]  [<ffffffff812b6dff>] increase_reservation+0x19f/0x2b0
[ 5753.935678]  [<ffffffff812b7260>] ? decrease_reservation+0x350/0x350
[ 5753.935682]  [<ffffffff812b7363>] balloon_process+0x103/0x220
[ 5753.935687]  [<ffffffff810719c1>] ? blocking_notifier_call_chain+0x11/0x20
[ 5753.935692]  [<ffffffff81064775>] process_one_work+0x125/0x470
[ 5753.935696]  [<ffffffff81066d17>] worker_thread+0x177/0x420
[ 5753.935699]  [<ffffffff81066ba0>] ? manage_workers+0x120/0x120
[ 5753.935703]  [<ffffffff8106ba96>] kthread+0x96/0xa0
[ 5753.935708]  [<ffffffff814703a4>] kernel_thread_helper+0x4/0x10
[ 5753.935712]  [<ffffffff814675b8>] ? retint_restore_args+0x5/0x6
[ 5753.935717]  [<ffffffff814703a0>] ? gs_change+0x13/0x13
[ 5753.935719] Code: 48 39 c2 74 16 31 f6 4c 89 ef e8 2d e8 0e 00 e9 45 fe ff ff 31 c0 e9 76 fe ff ff 48 8b 05 42 6f 90 00 4e 89 2c e0 e9 2e fe ff ff <0f> 0b eb fe 0f 0b eb fe 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
[ 5753.935748] RIP  [<ffffffff8100ae7f>] alloc_p2m+0x25f/0x270
[ 5753.935753]  RSP <ffff88007858dce0>
[ 5753.935760] ---[ end trace c38db828949e8622 ]---
[ 5753.935795] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 5753.935800] IP: [<ffffffff8106b53b>] kthread_data+0xb/0x20
[ 5753.935805] PGD 180d067 PUD 180e067 PMD 0
[ 5753.935809] Oops: 0000 [#2] SMP
[ 5753.935812] CPU 0
[ 5753.935813] Modules linked in: bnep bluetooth rfkill lockd sunrpc ipt_REJECT xt_state xt_tcpudp iptable_filter ipt_MASQUERADE ip6table_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6_tables ip_tables x_tables coretemp hwmon crc32c_intel ghash_clmulni_intel xen_netfront microcode pcspkr u2mfn(O) xen_blkback xen_evtchn uinput autofs4 ext4 jbd2 crc16 scsi_mod aesni_intel cryptd aes_x86_64 aes_generic dm_snapshot xen_blkfront [last unloaded: scsi_wait_scan]
[ 5753.935846] 
[ 5753.935848] Pid: 1928, comm: kworker/0:2 Tainted: G      D    O 3.4.12-1.pvops.qubes.x86_64 #1  
[ 5753.935853] RIP: e030:[<ffffffff8106b53b>]  [<ffffffff8106b53b>] kthread_data+0xb/0x20
[ 5753.935858] RSP: e02b:ffff88007858d9c8  EFLAGS: 00010096
[ 5753.935861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5753.935864] RDX: ffffffff81982980 RSI: 0000000000000000 RDI: ffff88000360c780
[ 5753.935868] RBP: ffff88007858d9c8 R08: 0000000000989680 R09: 0000000000000000
[ 5753.935871] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 5753.935874] R13: ffff88000360cb60 R14: 0000000000000001 R15: 0000000000000006
[ 5753.935879] FS:  00007fb901137840(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[ 5753.935883] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5753.935886] CR2: fffffffffffffff8 CR3: 0000000013462000 CR4: 0000000000002660
[ 5753.935889] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5753.935893] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5753.935896] Process kworker/0:2 (pid: 1928, threadinfo ffff88007858c000, task ffff88000360c780)
[ 5753.935900] Stack:
[ 5753.935901]  ffff88007858d9e8 ffffffff810643a0 ffff88007858d9e8 ffff880018c13440
[ 5753.935906]  ffff88007858da78 ffffffff81465f2c ffff88007858dfd8 0000000000013440
[ 5753.935911]  ffff88007858c010 0000000000013440 0000000000013440 0000000000013440
[ 5753.935915] Call Trace:
[ 5753.935919]  [<ffffffff810643a0>] wq_worker_sleeping+0x10/0xa0
[ 5753.935923]  [<ffffffff81465f2c>] __schedule+0x55c/0x6e0
[ 5753.935926]  [<ffffffff81466394>] schedule+0x24/0x70
[ 5753.935930]  [<ffffffff8104f17d>] do_exit+0x27d/0x420
[ 5753.935934]  [<ffffffff81468367>] oops_end+0xa7/0xf0
[ 5753.935938]  [<ffffffff81016566>] die+0x56/0x90
[ 5753.935941]  [<ffffffff81467ed4>] do_trap+0xc4/0x170
[ 5753.935945]  [<ffffffff81014480>] do_invalid_op+0x90/0xb0
[ 5753.935948]  [<ffffffff8100ae7f>] ? alloc_p2m+0x25f/0x270
[ 5753.935952]  [<ffffffff8147021b>] invalid_op+0x1b/0x20
[ 5753.935956]  [<ffffffff8100ae7f>] ? alloc_p2m+0x25f/0x270
[ 5753.935960]  [<ffffffff8100118a>] ? hypercall_page+0x18a/0x1000
[ 5753.935964]  [<ffffffff8100aec3>] set_phys_to_machine+0x33/0x50
[ 5753.935969]  [<ffffffff812b6dff>] increase_reservation+0x19f/0x2b0
[ 5753.935973]  [<ffffffff812b7260>] ? decrease_reservation+0x350/0x350
[ 5753.935976]  [<ffffffff812b7363>] balloon_process+0x103/0x220
[ 5753.935980]  [<ffffffff810719c1>] ? blocking_notifier_call_chain+0x11/0x20
[ 5753.935984]  [<ffffffff81064775>] process_one_work+0x125/0x470
[ 5753.935988]  [<ffffffff81066d17>] worker_thread+0x177/0x420
[ 5753.935991]  [<ffffffff81066ba0>] ? manage_workers+0x120/0x120
[ 5753.935995]  [<ffffffff8106ba96>] kthread+0x96/0xa0
[ 5753.935998]  [<ffffffff814703a4>] kernel_thread_helper+0x4/0x10
[ 5753.936002]  [<ffffffff814675b8>] ? retint_restore_args+0x5/0x6
[ 5753.936006]  [<ffffffff814703a0>] ? gs_change+0x13/0x13
[ 5753.936008] Code: 55 65 48 8b 04 25 80 c6 00 00 48 8b 80 88 03 00 00 48 89 e5 8b 40 f0 c9 c3 0f 1f 80 00 00 00 00 48 8b 87 88 03 00 00 55 48 89 e5 <48> 8b 40 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 5753.936037] RIP  [<ffffffff8106b53b>] kthread_data+0xb/0x20
[ 5753.936041]  RSP <ffff88007858d9c8>
[ 5753.936043] CR2: fffffffffffffff8
[ 5753.936045] ---[ end trace c38db828949e8623 ]---
[ 5753.936047] Fixing recursive fault but reboot is needed!

This looks like a xen-balloon or p2m bug. If it has already been reported to xen-devel (and fixed), we need to include the patch in our kernel. If not, it needs to be reported there.
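To make the path into the failing function easier to see, here is a minimal Python sketch that pulls the reliable frames out of a call trace. It assumes the standard dmesg layout `[<address>] symbol+offset/size`, where a leading `?` marks a frame the kernel could not confirm. The names `FRAME_RE`, `reliable_frames` and `SAMPLE` are made up for this illustration, and `SAMPLE` is just the first five trace entries from the log above.

```python
import re

# Matches a call-trace entry such as
# "[ 5753.935668]  [<ffffffff8100aec3>] set_phys_to_machine+0x33/0x50".
# A leading "?" marks an address found on the stack that the kernel could
# not confirm as part of the actual call chain.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return the symbols of the frames not marked with '?', innermost first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append(match.group("symbol"))
    return frames


# First five call-trace entries of the first oops (hypothetical variable name).
SAMPLE = """\
[ 5753.935662]  [<ffffffff8100118a>] ? hypercall_page+0x18a/0x1000
[ 5753.935668]  [<ffffffff8100aec3>] set_phys_to_machine+0x33/0x50
[ 5753.935673]  [<ffffffff812b6dff>] increase_reservation+0x19f/0x2b0
[ 5753.935678]  [<ffffffff812b7260>] ? decrease_reservation+0x350/0x350
[ 5753.935682]  [<ffffffff812b7363>] balloon_process+0x103/0x220
"""

if __name__ == "__main__":
    # Prints the chain from the innermost caller outwards.
    print(" <- ".join(reliable_frames(SAMPLE)))
```

For the sample this yields set_phys_to_machine, increase_reservation and balloon_process. Together with the RIP in alloc_p2m, that is consistent with the balloon driver hitting the BUG while growing the reservation.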

The VM was assigned around 4090-4098 MB of RAM, so this may be related to the bug described in InstallationGuide#KnownIssues.

PS: Assigning to component "kernel-dom0" because we now use the same kernel in dom0 and the VM.

Migrated-From: https://wiki.qubes-os.org/ticket/661

@marmarek marmarek self-assigned this Mar 8, 2015

@marmarek marmarek added this to the Release 1 (fixes) milestone Mar 8, 2015

marmarek commented Mar 8, 2015

Modified by joanna on 8 Feb 2013 13:04 UTC

@marmarek marmarek added the worksforme label Mar 8, 2015

@marmarek marmarek closed this Mar 8, 2015
