DispVM crashes when reaching about 4GB RAM #733

Closed
marmarek opened this Issue Mar 8, 2015 · 6 comments

marmarek commented Mar 8, 2015

Reported by marmarek on 10 Jun 2013 02:09 UTC
Kernel message (almost the same is also seen on 3.2.x, 3.4.18, 3.7.6):

[   37.311963] init_memory_mapping: [mem 0x100000000-0x10fffffff]
[   37.347864] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1039682
[   37.348235] ------------[ cut here ]------------
[   37.348243] kernel BUG at /home/user/qubes-src/kernel/kernel-3.7.6/linux-3.7.6/arch/x86/xen/p2m.c:545!
[   37.348251] invalid opcode: 0000 [#1] SMP 
[   37.348258] Modules linked in: lockd sunrpc ipt_MASQUERADE ip6table_filter ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables xen_netfront coretemp hwmon microcode pcspkr u2mfn(O) dummy_hcd udc_core xen_usbfront usbcore usb_common xen_blkback uinput binfmt_misc autofs4 crc32c_intel ghash_clmulni_intel ext4 crc16 jbd2 aesni_intel aes_x86_64 ablk_helper cryptd xts lrw gf128mul dm_snapshot xen_blkfront
[   37.348323] CPU 0 
[   37.348329] Pid: 4, comm: kworker/0:0 Tainted: G           O 3.7.6-2.pvops.qubes.x86_64 #1  
[   37.348337] RIP: e030:[<ffffffff8100b065>]  [<ffffffff8100b065>] set_phys_to_machine+0x255/0x270
[   37.348352] RSP: e02b:ffff880018067cf8  EFLAGS: 00010287
[   37.348358] RAX: 000000000016ef5b RBX: 000000000010ffff RCX: 0000000000000000
[   37.348364] RDX: ffffffff81c0f020 RSI: 000000000000000d RDI: 0000000000001c0d
[   37.348370] RBP: ffff880018067d48 R08: ffffffff81c06020 R09: ffffffff81a670b8
[   37.348376] R10: 000000000000003c R11: 0000000000000000 R12: 0000000000212e05
[   37.348381] R13: ffffffff81c0d000 R14: 000000000000007f R15: 0000000000000004
[   37.348392] FS:  00007f4f642a0840(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[   37.348400] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   37.348405] CR2: 00007f4f634fde80 CR3: 0000000013abe000 CR4: 0000000000002660
[   37.348412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   37.348418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   37.348425] Process kworker/0:0 (pid: 4, threadinfo ffff880018066000, task ffff880018064100)
[   37.348431] Stack:
[   37.348434]  ffff880018067d08 ffffffff81c06020 ffff88000cc25000 ffffffff81c0f020
[   37.348443]  0000000000000000 0000000000000000 0000160000000000 ffffea0003b7ffc8
[   37.348452]  000000000010ffff 6db6db6db6db6db7 ffff880018067dc8 ffffffff812c0e44
[   37.348461] Call Trace:
[   37.348469]  [<ffffffff812c0e44>] balloon_process+0x204/0x440
[   37.348477]  [<ffffffff81068789>] process_one_work+0x139/0x4c0
[   37.348483]  [<ffffffff812c0c40>] ? decrease_reservation+0x330/0x330
[   37.348490]  [<ffffffff8106b0bd>] worker_thread+0x15d/0x470
[   37.348498]  [<ffffffff8106af60>] ? schedule_delayed_work+0x20/0x20
[   37.348505]  [<ffffffff8106fa0b>] kthread+0xbb/0xc0
[   37.348511]  [<ffffffff8106f950>] ? kthread_create_on_node+0x120/0x120
[   37.348519]  [<ffffffff81493ffc>] ret_from_fork+0x7c/0xb0
[   37.348526]  [<ffffffff8106f950>] ? kthread_create_on_node+0x120/0x120
[   37.348531] Code: d2 fb ff ff 48 8b 4d b8 48 89 c6 48 8b 55 c8 48 89 c8 3e 48 0f b1 32 48 39 c1 74 11 31 f6 4c 89 ef e8 00 7f 0f 00 e9 70 fe ff ff <0f> 0b 48 8b 05 2a 7d b1 00 4e 89 2c f8 e9 5e fe ff ff 66 0f 1f 
[   37.348584] RIP  [<ffffffff8100b065>] set_phys_to_machine+0x255/0x270
[   37.348593]  RSP <ffff880018067cf8>
[   37.348601] ---[ end trace 931fee6e9eb22ad9 ]---
[   37.348642] BUG: unable to handle kernel paging request at ffffffffffffffd8
[   37.348650] IP: [<ffffffff8106fdeb>] kthread_data+0xb/0x20
[   37.348658] PGD 1a0d067 PUD 1a0e067 PMD 0 
[   37.348665] Oops: 0000 [#2] SMP 
[   37.348670] Modules linked in: lockd sunrpc ipt_MASQUERADE ip6table_filter ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables xen_netfront coretemp hwmon microcode pcspkr u2mfn(O) dummy_hcd udc_core xen_usbfront usbcore usb_common xen_blkback uinput binfmt_misc autofs4 crc32c_intel ghash_clmulni_intel ext4 crc16 jbd2 aesni_intel aes_x86_64 ablk_helper cryptd xts lrw gf128mul dm_snapshot xen_blkfront
[   37.348729] CPU 0 
[   37.348733] Pid: 4, comm: kworker/0:0 Tainted: G      D    O 3.7.6-2.pvops.qubes.x86_64 #1  
[   37.348741] RIP: e030:[<ffffffff8106fdeb>]  [<ffffffff8106fdeb>] kthread_data+0xb/0x20
[   37.348750] RSP: e02b:ffff8800180679a8  EFLAGS: 00010096
[   37.348755] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   37.348760] RDX: ffffffff81b937e0 RSI: 0000000000000000 RDI: ffff880018064100
[   37.348766] RBP: ffff8800180679a8 R08: 0000000000989680 R09: ffffffff81b937e0
[   37.348772] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800180644e0
[   37.348778] R13: 0000000000000000 R14: ffff8800180640f0 R15: ffff880018064100
[   37.348786] FS:  00007f4f642a0840(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[   37.348794] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   37.348799] CR2: ffffffffffffffd8 CR3: 0000000013abe000 CR4: 0000000000002660
[   37.348805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   37.348811] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   37.348817] Process kworker/0:0 (pid: 4, threadinfo ffff880018066000, task ffff880018064100)
[   37.348824] Stack:
[   37.348827]  ffff8800180679c8 ffffffff8106b550 ffff8800180679c8 ffff880018c139c0
[   37.348835]  ffff880018067a38 ffffffff8148aed8 ffff880018064100 ffff880018067fd8
[   37.348844]  ffff880018067fd8 ffff880018067fd8 0000000000000008 ffff880018064100
[   37.348853] Call Trace:
[   37.348858]  [<ffffffff8106b550>] wq_worker_sleeping+0x10/0xc0
[   37.348865]  [<ffffffff8148aed8>] __schedule+0x558/0x790
[   37.348871]  [<ffffffff8148b3f4>] schedule+0x24/0x70
[   37.348878]  [<ffffffff81052dff>] do_exit+0x5af/0x8d0
[   37.348885]  [<ffffffff814834bb>] ? printk+0x48/0x4a
[   37.348891]  [<ffffffff8100a432>] ? check_events+0x12/0x20
[   37.348898]  [<ffffffff8148d639>] oops_end+0x99/0xe0
[   37.348904]  [<ffffffff81016693>] die+0x53/0x80
[   37.348910]  [<ffffffff8148cf36>] do_trap+0x66/0x160
[   37.348918]  [<ffffffff8149005d>] ? __atomic_notifier_call_chain+0xd/0x10
[   37.348925]  [<ffffffff81013f17>] do_invalid_op+0x97/0xa0
[   37.348932]  [<ffffffff8100b065>] ? set_phys_to_machine+0x255/0x270
[   37.348939]  [<ffffffff811b006d>] ? sysfs_add_file+0xd/0x10
[   37.348945]  [<ffffffff811b0131>] ? sysfs_create_file+0x21/0x30
[   37.348954]  [<ffffffff81234bcc>] ? kobject_add_internal+0x15c/0x270
[   37.348960]  [<ffffffff8149509e>] invalid_op+0x1e/0x30
[   37.348964]  [<ffffffff8100b065>] ? set_phys_to_machine+0x255/0x270
[   37.348969]  [<ffffffff8100aebb>] ? set_phys_to_machine+0xab/0x270
[   37.348973]  [<ffffffff812c0e44>] balloon_process+0x204/0x440
[   37.348978]  [<ffffffff81068789>] process_one_work+0x139/0x4c0
[   37.348982]  [<ffffffff812c0c40>] ? decrease_reservation+0x330/0x330
[   37.348986]  [<ffffffff8106b0bd>] worker_thread+0x15d/0x470
[   37.348990]  [<ffffffff8106af60>] ? schedule_delayed_work+0x20/0x20
[   37.348994]  [<ffffffff8106fa0b>] kthread+0xbb/0xc0
[   37.348998]  [<ffffffff8106f950>] ? kthread_create_on_node+0x120/0x120
[   37.349002]  [<ffffffff81493ffc>] ret_from_fork+0x7c/0xb0
[   37.349006]  [<ffffffff8106f950>] ? kthread_create_on_node+0x120/0x120
[   37.349009] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 87 88 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[   37.349041] RIP  [<ffffffff8106fdeb>] kthread_data+0xb/0x20
[   37.349045]  RSP <ffff8800180679a8>
[   37.349047] CR2: ffffffffffffffd8
[   37.349050] ---[ end trace 931fee6e9eb22ada ]---
[   37.349053] Fixing recursive fault but reboot is needed!
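
For reference, the "about 4GB" in the title lines up with the log: the BUG in set_phys_to_machine fires from the balloon worker right as init_memory_mapping starts covering the region at 0x100000000, which is exactly the 4 GiB boundary. A minimal sketch of the arithmetic (plain Python; the values are copied from the log above, and reading RBX as the pfn being mapped is only an assumption):

# Quick arithmetic on values from the oops above (illustration only).
GIB = 1 << 30
PAGE = 4096                                   # assuming 4 KiB pages
start, end = 0x100000000, 0x10fffffff         # init_memory_mapping range
print(start / GIB, (end + 1) / GIB)           # 4.0 4.25 -- the guest is crossing 4 GiB
# If RBX (0x10ffff) is the pfn being mapped -- an assumption -- it sits just
# below the top of that same region:
print(hex(0x10ffff * PAGE))                   # 0x10ffff000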

Migrated-From: https://wiki.qubes-os.org/ticket/733

@marmarek marmarek added this to the Release 2 Beta 3 milestone Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by joanna on 1 Aug 2013 12:29 UTC
I've been using 3.9.2 in the VMs for some time and have never seen this. In any case, this is really NOTOURBUG. Closing for now.

@marmarek marmarek added the worksforme label Mar 8, 2015

@marmarek marmarek closed this Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by marmarek on 2 Aug 2013 02:21 UTC
This bug affects DispVM only and still happens on 3.9.2.

Even if we don't want to invest our time in investigating this Linux kernel problem (if that is really what it is), IMHO it is still worth tracking this issue with the appropriate ticket state. Perhaps someone will want to debug it and send patches (or better: send them upstream), or work on this issue with the Xen developers.

@marmarek marmarek modified the milestones: Release 2, Release 2 Beta 3 Mar 8, 2015

@marmarek marmarek removed the worksforme label Mar 8, 2015

@marmarek marmarek reopened this Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by joanna on 9 Aug 2013 12:06 UTC
So, is it a dup of #732?

marmarek commented Mar 8, 2015

Modified by joanna on 9 Aug 2013 12:19 UTC

@marmarek marmarek changed the title from VM crashes when reaching about 4GB RAM to DispVM crashes when reaching about 4GB RAM Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by marmarek on 9 Aug 2013 21:40 UTC
There seem to be two separate issues involved in the DispVM problem:
#732 is the Xen hypervisor/toolstack one; #733 (this one) is the Linux kernel one.

marmarek commented Mar 8, 2015

Comment by marmarek on 14 Mar 2014 14:47 UTC
Now that maxmem for DispVM is finally working, this bug has a workaround implemented as part of #732. Perhaps it is even caused by the same Xen bug described in #732.
In any case, it isn't our bug, and we have a working workaround.
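
In practice the workaround amounts to capping the DispVM's maxmem below the point where the crash shows up and regenerating the DispVM savefile. A minimal sketch of doing that by hand from dom0 (the template names and the 3000 MB cap are placeholders, and whether the DVM template is the right place to set the limit is an assumption; qvm-prefs and qvm-create-default-dvm are the stock R2 tools):

# Hypothetical illustration of the maxmem workaround, run in dom0.
# Names and the limit are placeholders, not values from this ticket.
import subprocess

DVM_TEMPLATE = "fedora-18-x64-dvm"   # assumed DVM template name
TEMPLATE = "fedora-18-x64"           # assumed base template name

# Cap maxmem below the ~4 GB range where the BUG is triggered.
subprocess.check_call(["qvm-prefs", "-s", DVM_TEMPLATE, "maxmem", "3000"])

# Regenerate the DispVM savefile so the new limit takes effect.
subprocess.check_call(["qvm-create-default-dvm", TEMPLATE])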

@marmarek marmarek added the wontfix label Mar 8, 2015

@marmarek marmarek closed this Mar 8, 2015
