kernel NULL pointer dereference in xenbus_thread, Linux 6.1.57 on openQA #8638

Open
marmarek opened this issue Oct 22, 2023 · 9 comments
Labels
  • affects-4.1 - This issue affects Qubes OS 4.1.
  • affects-4.2 - This issue affects Qubes OS 4.2.
  • C: kernel
  • needs diagnosis - Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed.
  • P: default - Priority: default. Default priority for new issues, to be replaced given sufficient information.
  • T: bug - Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
  • waiting for upstream - This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed.

Comments

@marmarek
Member

Observation

The openQA test in scenario qubesos-4.1-release-upgrade-x86_64-install_default@64bit fails in release_upgrade.

kernel NULL pointer dereference, stack trace
[  876.712812] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  876.715099] #PF: supervisor read access in kernel mode
[  876.717222] #PF: error_code(0x0000) - not-present page
[  876.718919] PGD 101f9f067 P4D 101f9f067 PUD 103eae067 PMD 0 
[  876.721633] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  876.723184] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.57-1.qubes.fc37.x86_64 #1
[  876.725629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[  876.729399] RIP: e030:__wake_up_common+0x4c/0x180
[  876.731221] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[  876.737539] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
[  876.740443] RAX: 0000000000000000 RBX: ffff888066582f98 RCX: 0000000000000000
[  876.742913] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff888066582f90
[  876.745239] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
[  876.748484] R10: 0000000000007ff0 R11: ffff888100ad9000 R12: ffffc900400f7e68
[  876.750837] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  876.753734] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
[  876.756569] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  876.758422] CR2: 0000000000000000 CR3: 0000000104ac4000 CR4: 0000000000040660
[  876.760599] Call Trace:
[  876.761359]  <TASK>
[  876.762025]  ? show_trace_log_lvl+0x1d3/0x2ef
[  876.763390]  ? show_trace_log_lvl+0x1d3/0x2ef
[  876.764731]  ? show_trace_log_lvl+0x1d3/0x2ef
[  876.766061]  ? __wake_up_common_lock+0x82/0xd0
[  876.767465]  ? __die_body.cold+0x8/0xd
[  876.769374]  ? page_fault_oops+0x163/0x1a0
[  876.770706]  ? exc_page_fault+0x70/0x170
[  876.771922]  ? asm_exc_page_fault+0x22/0x30
[  876.773235]  ? __wake_up_common+0x4c/0x180
[  876.774502]  __wake_up_common_lock+0x82/0xd0
[  876.775835]  ? process_writes+0x240/0x240
[  876.777251]  process_msg+0x18e/0x2f0
[  876.778364]  xenbus_thread+0x165/0x1c0
[  876.779520]  ? cpuusage_read+0x10/0x10
[  876.780694]  kthread+0xe9/0x110
[  876.781680]  ? kthread_complete_and_exit+0x20/0x20
[  876.783168]  ret_from_fork+0x22/0x30
[  876.784287]  </TASK>
[  876.784974] Modules linked in: joydev snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm ppdev intel_rapl_msr intel_rapl_common snd_timer e1000e snd pcspkr parport_pc soundcore parport i2c_piix4 fuse loop xenfs dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt crct10dif_pclmul xhci_pci crc32_pclmul crc32c_intel xhci_pci_renesas polyval_clmulni polyval_generic xhci_hcd ghash_clmulni_intel sha512_ssse3 virtio_console virtio_scsi serio_raw bochs drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi floppy qemu_fw_cfg xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
[  876.806036] CR2: 0000000000000000
[  876.807126] ---[ end trace 0000000000000000 ]---
[  876.808589] RIP: e030:__wake_up_common+0x4c/0x180
[  876.810069] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[  876.815813] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
[  876.817375] RAX: 0000000000000000 RBX: ffff888066582f98 RCX: 0000000000000000
[  876.819549] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff888066582f90
[  876.821725] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
[  876.823885] R10: 0000000000007ff0 R11: ffff888100ad9000 R12: ffffc900400f7e68
[  876.826063] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  876.828252] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
[  876.830647] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  876.832502] CR2: 0000000000000000 CR3: 0000000104ac4000 CR4: 0000000000040660
[  876.834667] Kernel panic - not syncing: Fatal exception
[  876.836257] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
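
As a reading aid for the register dump above, here is a minimal sketch of the address arithmetic. It assumes the standard x86-64 decoding of the `Code:` bytes, in which the marked instruction `<49> 8b 40 18` is `mov 0x18(%r8),%rax`. Under that assumption the faulting address is R08 + 0x18, truncated to 64 bits, and it should equal CR2. The helper name `effective_address` is hypothetical.

```python
MASK64 = (1 << 64) - 1  # 64-bit registers wrap modulo 2**64


def effective_address(base: int, disp: int) -> int:
    """Address of a base+displacement memory operand, truncated to 64 bits."""
    return (base + disp) & MASK64


# Values copied from the register dump in this report.
r08 = 0xFFFFFFFFFFFFFFE8
cr2 = 0x0000000000000000

# Assumed decoding of the marked bytes: 49 8b 40 18 -> mov 0x18(%r8),%rax
fault = effective_address(r08, 0x18)
print(hex(fault), fault == cr2)
```

If the decoding assumption holds, the preceding bytes `4c 8d 40 e8` (`lea -0x18(%rax),%r8`) derive R08 from a pointer read at `0x8(%rbx)`, so an R08 of -0x18 would mean that pointer was NULL. This is an inference from the dump, not a confirmed diagnosis.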

Test suite description

The test covers the release upgrade from R4.1 to R4.2, but the crash looks unrelated to that specific workload.

Reproducible

Fails since (at least) Build 2023102123-4.1 (current job)

A similar crash was also observed on real hardware, in a domU running an older kernel: 6.1.43.

Expected result

Last good: 2023101101-4.1 (or more recent)

Further details

Always latest result in this scenario: latest

@marmarek added the labels T: bug, C: kernel, needs diagnosis, affects-4.1, and affects-4.2 on Oct 22, 2023.

@marmarek
Member Author

Reported upstream at https://lore.kernel.org/xen-devel/ZO0WrR5J0xuwDIxW@mail-itl/

@andrewdavidwong added the labels waiting for upstream and P: default on Oct 22, 2023.

@marmarek
Member Author

marmarek commented Nov 20, 2023

A similar crash was also observed on real hardware, in a domU running an older kernel: 6.1.43.

Specific crash message:

[173643.279852] BUG: kernel NULL pointer dereference, address: 0000000000000000
[173643.279867] #PF: supervisor read access in kernel mode
[173643.279874] #PF: error_code(0x0000) - not-present page
[173643.279881] PGD 0 P4D 0
[173643.279886] Oops: 0000 [#1] PREEMPT SMP NOPTI
[173643.279893] CPU: 1 PID: 144 Comm: xenbus Tainted: G        W          6.1.43-1.qubes.12.fc37.x86_64 #1
[173643.279905] RIP: 0010:__wake_up_common+0x5b/0x1b0
[173643.279915] Code: 85 0a 01 00 00 4d 85 e4 74 0b 41 f6 04 24 04 0f 85 a3 00 00 00 48 8b 43 40 4c 8d 40 e8 48 83 c3 40 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 4>
[173643.279934] RSP: 0018:ffffc90000dc3e10 EFLAGS: 00010082
[173643.279941] RAX: 0000000000000000 RBX: ffff8883562fd6d0 RCX: 0000000000000000
[173643.279951] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8883562fd690
[173643.279961] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000dc3e68
[173643.279969] R10: ffffffff81175127 R11: ffffc9000003d000 R12: ffffc90000dc3e68
[173643.279979] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[173643.279990] FS:  0000000000000000(0000) GS:ffff8883dbe40000(0000) knlGS:0000000000000000
[173643.280000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[173643.280007] CR2: 0000000000000000 CR3: 000000002fed6006 CR4: 0000000000770ee0
[173643.280018] PKRU: 55555554
[173643.280022] Call Trace:
[173643.280027]  <TASK>
[173643.280032]  ? show_trace_log_lvl+0x1d3/0x2ef
[173643.280041]  ? show_trace_log_lvl+0x1d3/0x2ef
[173643.280049]  ? show_trace_log_lvl+0x1d3/0x2ef
[173643.280057]  ? __wake_up_common_lock+0x82/0xd0
[173643.280064]  ? __die_body.cold+0x8/0xd
[173643.280070]  ? page_fault_oops+0x163/0x1a0
[173643.280078]  ? exc_page_fault+0x7e/0x200
[173643.280085]  ? asm_exc_page_fault+0x22/0x30
[173643.280094]  ? __wake_up_common_lock+0x67/0xd0
[173643.280101]  ? __wake_up_common+0x5b/0x1b0
[173643.280107]  __wake_up_common_lock+0x82/0xd0
[173643.280114]  ? process_writes+0x260/0x260
[173643.280121]  process_msg+0x199/0x300
[173643.280153]  xenbus_thread+0x165/0x1c0
[173643.280162]  ? cpuusage_read+0x10/0x10
[173643.280170]  kthread+0xe9/0x110
[173643.280177]  ? kthread_complete_and_exit+0x20/0x20
[173643.280185]  ret_from_fork+0x22/0x30
[173643.280194]  </TASK>
[173643.280198] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nvme_fabrics nvme_core nvme_common nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nft_masq >
[173643.280284] CR2: 0000000000000000
[173643.280290] ---[ end trace 0000000000000000 ]--- 
[173643.280297] RIP: 0010:__wake_up_common+0x5b/0x1b0
[173643.280304] Code: 85 0a 01 00 00 4d 85 e4 74 0b 41 f6 04 24 04 0f 85 a3 00 00 00 48 8b 43 40 4c 8d 40 e8 48 83 c3 40 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 4>
[173643.280324] RSP: 0018:ffffc90000dc3e10 EFLAGS: 00010082
[173643.280331] RAX: 0000000000000000 RBX: ffff8883562fd6d0 RCX: 0000000000000000 
[173643.280340] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8883562fd690 
[173643.280349] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000dc3e68 
[173643.280359] R10: ffffffff81175127 R11: ffffc9000003d000 R12: ffffc90000dc3e68 
[173643.280368] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
[173643.280378] FS:  0000000000000000(0000) GS:ffff8883dbe40000(0000) knlGS:0000000000000000
[173643.280386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[173643.280394] CR2: 0000000000000000 CR3: 000000002fed6006 CR4: 0000000000770ee0 
[173643.280403] PKRU: 55555554
[173643.280407] Kernel panic - not syncing: Fatal exception
[173643.280475] Kernel Offset: disabled

@marmarek
Member Author

marmarek commented Mar 25, 2024

This happens on 6.1.75 too.

crash message

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 131 Comm: xenbus Not tainted 6.1.75-1.qubes.fc37.x86_64 #1
RIP: 0010:__wake_up_common+0x4c/0x180
Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 

RSP: 0018:ffffc90000d4fe10 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff88811b77a018 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88811b77a010
RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000d4fe68
R10: 0000000000000003 R11: ffffc9000003d000 R12: ffffc90000d4fe68
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880f5ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002c10006 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? show_trace_log_lvl+0x1d3/0x2ef
 ? show_trace_log_lvl+0x1d3/0x2ef
 ? show_trace_log_lvl+0x1d3/0x2ef
 ? __wake_up_common_lock+0x82/0xd0
 ? __die_body.cold+0x8/0xd
 ? page_fault_oops+0x163/0x1a0
 ? exc_page_fault+0x70/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? __wake_up_common+0x4c/0x180
 __wake_up_common_lock+0x82/0xd0
 ? process_writes+0x240/0x240
 process_msg+0x18e/0x2f0
 xenbus_thread+0x165/0x1c0
 ? cpuusage_read+0x10/0x10
 kthread+0xe9/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nvme_fabrics nvme_core nvme_common nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xenfs nfnetlink ipmi_devintf ipmi_msghandler binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic xen_netfront snd_pcm snd_timer ghash_clmulni_intel sha512_ssse3 snd soundcore sha256_ssse3 sha1_ssse3 pcspkr xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn parport_pc ppdev lp parport loop fuse ip_tables overlay xen_blkfront
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:__wake_up_common+0x4c/0x180
Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
RSP: 0018:ffffc90000d4fe10 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff88811b77a018 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88811b77a010
RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000d4fe68
R10: 0000000000000003 R11: ffffc9000003d000 R12: ffffc90000d4fe68
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880f5ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002c10006 CR4: 0000000000770ee0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled

@scallyob

I've begun to see instability in my qubes, with random crashes. It probably started after I implemented this fix to my Xen configuration. As far as I've noticed, it only affects Whonix-based qubes.

Dom0 kernel is 6.6.29-1

Here is the log:

[2024-05-27 13:41:13] [20937.502042] #PF: supervisor read access in kernel mode
[2024-05-27 13:41:13] [20937.503416] #PF: error_code(0x0000) - not-present page
[2024-05-27 13:41:13] [20937.504752] PGD 0 P4D 0
[2024-05-27 13:41:13] [20937.505434] Oops: 0000 [#1] PREEMPT SMP NOPTI
[2024-05-27 13:41:13] [20937.506609] CPU: 0 PID: 56 Comm: xenbus Not tainted 6.6.29-1.qubes.fc37.x86_64 #1
[2024-05-27 13:41:13] [20937.508455] RIP: 0010:__wake_up_common+0x4c/0x180
[2024-05-27 13:41:13] [20937.509960] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[2024-05-27 13:41:13] [20937.514811] RSP: 0018:ffffc90000dabdf0 EFLAGS: 00010082
[2024-05-27 13:41:13] [20937.515678] RAX: 0000000000000000 RBX: ffff88802e9f9b98 RCX: 0000000000000000
[2024-05-27 13:41:13] [20937.517510] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88802e9f9b90
[2024-05-27 13:41:13] [20937.519184] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000dabe48
[2024-05-27 13:41:13] [20937.520675] R10: ffff88800d3d6ea8 R11: ffffc9000002d000 R12: ffffc90000dabe48
[2024-05-27 13:41:13] [20937.521637] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[2024-05-27 13:41:13] [20937.522753] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[2024-05-27 13:41:13] [20937.523543] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2024-05-27 13:41:13] [20937.524155] CR2: 0000000000000000 CR3: 0000000006b7a000 CR4: 00000000000406f0
[2024-05-27 13:41:13] [20937.524767] Call Trace:
[2024-05-27 13:41:13] [20937.524946]  <TASK>
[2024-05-27 13:41:13] [20937.525127]  ? __die+0x23/0x70
[2024-05-27 13:41:13] [20937.525397]  ? page_fault_oops+0x98/0x190
[2024-05-27 13:41:13] [20937.525669]  ? exc_page_fault+0x77/0x170
[2024-05-27 13:41:13] [20937.525940]  ? asm_exc_page_fault+0x26/0x30
[2024-05-27 13:41:13] [20937.526215]  ? __wake_up_common+0x4c/0x180
[2024-05-27 13:41:13] [20937.526484]  __wake_up_common_lock+0x82/0xd0
[2024-05-27 13:41:13] [20937.526839]  ? __pfx_xenbus_thread+0x10/0x10
[2024-05-27 13:41:13] [20937.527196]  process_msg+0x18e/0x2f0
[2024-05-27 13:41:13] [20937.527464]  xenbus_thread+0x4a/0x1e0
[2024-05-27 13:41:13] [20937.527732]  ? __pfx_autoremove_wake_function+0x10/0x10
[2024-05-27 13:41:13] [20937.528090]  kthread+0xe8/0x120
[2024-05-27 13:41:13] [20937.528360]  ? __pfx_kthread+0x10/0x10
[2024-05-27 13:41:13] [20937.528629]  ret_from_fork+0x34/0x50
[2024-05-27 13:41:13] [20937.528904]  ? __pfx_kthread+0x10/0x10
[2024-05-27 13:41:13] [20937.529173]  ret_from_fork_asm+0x1b/0x30
[2024-05-27 13:41:13] [20937.529443]  </TASK>
[2024-05-27 13:41:13] [20937.529622] Modules linked in: nf_conntrack_netlink nft_flow_offload nf_flow_table_inet nf_flow_table xen_netback dummy ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xenfs xt_multiport xt_nat xt_owner xt_REDIRECT nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic binfmt_misc ghash_clmulni_intel nf_tables sha512_ssse3 sha256_ssse3 nfnetlink xen_netfront sha1_ssse3 xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse loop ip_tables overlay xen_blkfront
[2024-05-27 13:41:13] [20937.533019] CR2: 0000000000000000
[2024-05-27 13:41:13] [20937.533290] ---[ end trace 0000000000000000 ]---
[2024-05-27 13:41:13] [20937.533642] RIP: 0010:__wake_up_common+0x4c/0x180
[2024-05-27 13:41:13] [20937.533998] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[2024-05-27 13:41:13] [20937.704565] RSP: 0018:ffffc90000dabdf0 EFLAGS: 00010082
[2024-05-27 13:41:13] [20937.705264] RAX: 0000000000000000 RBX: ffff88802e9f9b98 RCX: 0000000000000000
[2024-05-27 13:41:13] [20937.706294] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88802e9f9b90
[2024-05-27 13:41:13] [20937.707311] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000dabe48
[2024-05-27 13:41:13] [20937.708988] R10: ffff88800d3d6ea8 R11: ffffc9000002d000 R12: ffffc90000dabe48
[2024-05-27 13:41:13] [20937.709805] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[2024-05-27 13:41:13] [20937.710685] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[2024-05-27 13:41:13] [20937.712434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2024-05-27 13:41:13] [20937.713674] CR2: 0000000000000000 CR3: 0000000006b7a000 CR4: 00000000000406f0
[2024-05-27 13:41:13] [20937.714301] Kernel panic - not syncing: Fatal exception
[2024-05-27 13:41:13] [20937.718046] Kernel Offset: disabled

@marmarek
Member Author

marmarek commented Jun 4, 2024

Another backtrace, this time from 6.6.25 (internal ref: a3):

Details

[2024-06-04 18:49:51] [3694722.123261] BUG: kernel NULL pointer dereference, address: 0000000000000000
[2024-06-04 18:49:51] [3694722.123278] #PF: supervisor read access in kernel mode
[2024-06-04 18:49:51] [3694722.123286] #PF: error_code(0x0000) - not-present page
[2024-06-04 18:49:51] [3694722.123293] PGD 0 P4D 0 
[2024-06-04 18:49:51] [3694722.123299] Oops: 0000 [#1] PREEMPT SMP NOPTI
[2024-06-04 18:49:51] [3694722.123308] CPU: 1 PID: 151 Comm: xenbus Not tainted 6.6.25-1.qubes.fc37.x86_64 #1
[2024-06-04 18:49:51] [3694722.123319] RIP: 0010:__wake_up_common+0x4c/0x180
[2024-06-04 18:49:51] [3694722.123331] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[2024-06-04 18:49:51] [3694722.123353] RSP: 0018:ffffc90000df7df0 EFLAGS: 00010086
[2024-06-04 18:49:51] [3694722.123361] RAX: 0000000000000000 RBX: ffff88828ca09918 RCX: 0000000000000000
[2024-06-04 18:49:51] [3694722.123370] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88828ca09910
[2024-06-04 18:49:51] [3694722.123380] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc90000df7e48
[2024-06-04 18:49:51] [3694722.123389] R10: ffff8880053c9cd0 R11: ffffc9000003d000 R12: ffffc90000df7e48
[2024-06-04 18:49:51] [3694722.123398] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[2024-06-04 18:49:51] [3694722.123409] FS:  0000000000000000(0000) GS:ffff8880f5a40000(0000) knlGS:0000000000000000
[2024-06-04 18:49:51] [3694722.123419] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2024-06-04 18:49:51] [3694722.123428] CR2: 0000000000000000 CR3: 000000010fc48002 CR4: 0000000000770ee0
[2024-06-04 18:49:51] [3694722.123439] PKRU: 55555554
[2024-06-04 18:49:51] [3694722.123445] Call Trace:
[2024-06-04 18:49:51] [3694722.123451]  <TASK>
[2024-06-04 18:49:51] [3694722.123457]  ? __die+0x23/0x70
[2024-06-04 18:49:51] [3694722.123466]  ? page_fault_oops+0x98/0x190
[2024-06-04 18:49:51] [3694722.123475]  ? exc_page_fault+0x77/0x170
[2024-06-04 18:49:51] [3694722.123484]  ? asm_exc_page_fault+0x26/0x30
[2024-06-04 18:49:51] [3694722.123495]  ? __wake_up_common+0x4c/0x180
[2024-06-04 18:49:51] [3694722.123505]  __wake_up_common_lock+0x82/0xd0
[2024-06-04 18:49:51] [3694722.123515]  ? __pfx_xenbus_thread+0x10/0x10
[2024-06-04 18:49:51] [3694722.123524]  process_msg+0x18e/0x2f0
[2024-06-04 18:49:51] [3694722.123531]  xenbus_thread+0x181/0x1e0
[2024-06-04 18:49:51] [3694722.123537]  ? __pfx_autoremove_wake_function+0x10/0x10
[2024-06-04 18:49:51] [3694722.123546]  kthread+0xe8/0x120
[2024-06-04 18:49:51] [3694722.123554]  ? __pfx_kthread+0x10/0x10
[2024-06-04 18:49:51] [3694722.123562]  ret_from_fork+0x34/0x50
[2024-06-04 18:49:51] [3694722.123570]  ? __pfx_kthread+0x10/0x10
[2024-06-04 18:49:51] [3694722.123577]  ret_from_fork_asm+0x1b/0x30
[2024-06-04 18:49:51] [3694722.123586]  </TASK>
[2024-06-04 18:49:51] [3694722.123590] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nvme_fabrics nvme_core nvme_common nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink xenfs ipmi_devintf ipmi_msghandler binfmt_misc intel_rapl_msr intel_rapl_common snd_pcm crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic snd_timer ghash_clmulni_intel snd sha512_ssse3 sha256_ssse3 soundcore sha1_ssse3 xen_netfront pcspkr xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn parport_pc ppdev lp parport fuse loop ip_tables overlay xen_blkfront
[2024-06-04 18:49:51] [3694722.123701] CR2: 0000000000000000
[2024-06-04 18:49:51] [3694722.123707] ---[ end trace 0000000000000000 ]---

@marmarek
Member Author

marmarek commented Jun 4, 2024

@scallyob How often do you get it? Can you narrow down when it is most likely to happen (which operations, applications, etc.)? I get it about once a month, which is too infrequent for any kind of debugging...
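
One way to put numbers on the frequency question is to scan saved console logs for the oops signature. The following is a minimal sketch, assuming logs in the format pasted in this thread: an optional `[YYYY-MM-DD HH:MM:SS]` prefix and an optional literal `^M` at the end of each line. The function name `find_xenbus_oopses` is hypothetical, and matching on the `Comm: xenbus` line is only a proxy for this specific crash.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Optional wall-clock prefix, then the "CPU: ... Comm: xenbus ... <kernel> #N" line.
OOPS_RE = re.compile(
    r"^(?:\[(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] )?"
    r".*\bComm: xenbus\b.*?(?P<kernel>\d+\.\d+\.\d+\S*) #\d+"
)


def find_xenbus_oopses(lines: Iterable[str]) -> List[Tuple[Optional[str], str]]:
    """Return (timestamp or None, kernel version) for each matching line."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("^M"):  # literal caret-M left by some log captures
            line = line[:-2]
        match = OOPS_RE.match(line)
        if match:
            hits.append((match.group("when"), match.group("kernel")))
    return hits


# Sample lines copied from the logs in this thread.
sample = [
    "[2024-05-27 13:41:13] [20937.506609] CPU: 0 PID: 56 Comm: xenbus Not tainted 6.6.29-1.qubes.fc37.x86_64 #1^M",
    "[173643.279893] CPU: 1 PID: 144 Comm: xenbus Tainted: G        W          6.1.43-1.qubes.12.fc37.x86_64 #1",
    "[2024-05-27 13:41:13] [20937.524946]  <TASK>^M",
]
for when, kernel in find_xenbus_oopses(sample):
    print(when, kernel)
```

The function accepts any iterable of lines, so an open log file can be passed directly. Where the guest console logs live depends on the setup; in dom0 they are commonly under /var/log/xen/console/, but treat that path as an assumption and adjust it as needed.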

@scallyob

scallyob commented Jun 5, 2024

@marmarek The last two were 4 days apart, in different qubes. Those are the only two I captured logs for, and both were Whonix gateways. The other crashes were all Whonix gateways as well, except one that was a Whonix workstation. The gateways just run in the background: I'm not actively doing anything with them when they crash, but they usually have services actively running through them.

@scallyob

scallyob commented Jun 5, 2024

Just got another one, this time in my main sys-whonix. I was using Tor Browser in anon-whonix and changing settings with qvm-prefs on another Whonix workstation when I saw it go down. The same errors are in the logs. It now seems to happen about every 4 days, which is a significant problem for me.

One thing I'm trying: I've had "Include in memory balancing" checked on qubes in the past. I turned it off for the other qubes that had crashed, but it was still checked for sys-whonix. I'll report back on whether there are any patterns there.

@scallyob

scallyob commented Jun 18, 2024

First crash in 2 weeks

  • whonix workstation
  • was not included in memory balancing
  • memory only set to 400MB
  • vcpus only set to 1
  • I was not actively using my computer but it was running a server when it crashed
  • same error message except kernel 6.6.31

I have since increased it to 2 GB RAM and 2 vCPUs.
