gpu related crashes with kernel >= 6.9.7 #309

@oliverbestmann

Since updating from 6.9.5 to 6.9.6 (and also with 6.9.9), I get random GPU/DRM-related crashes after a few minutes of usage.

Jul 15 10:20:18 m1pro kernel: ------------[ cut here ]------------
Jul 15 10:20:18 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 10:20:18 m1pro kernel: WARNING: CPU: 0 PID: 15794 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: sp : ffff800090397440
Jul 15 10:20:18 m1pro kernel: x29: ffff800090397440 x28: 0000000000000030 x27: ffff000014ad5000
Jul 15 10:20:18 m1pro kernel: x26: ffff80007a55d948 x25: 0000000000000000 x24: ffff000139b5dc00
Jul 15 10:20:18 m1pro kernel: x23: ffff800090397888 x22: ffff000139b5cb38 x21: ffff0005be57f5d8
Jul 15 10:20:18 m1pro kernel: x20: ffff00013bfb1c08 x19: ffff00013bfb1c08 x18: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 10:20:18 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 10:20:18 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 21401009 (nzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 10:20:18 m1pro kernel: sp : ffff800090395d40
Jul 15 10:20:18 m1pro kernel: x29: ffff800090395d50 x28: 00000000ffffffa0 x27: ffff000639ee3280
Jul 15 10:20:18 m1pro kernel: x26: ffffffa00000c984 x25: 0000000000212a9c x24: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x23: 736120492079616d x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 10:20:18 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 10:20:18 m1pro kernel: x8 : c98580007a45d9c4 x7 : 0000000000000cc0 x6 : 0000000000000318
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000064ce340
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000200 x1 : 736120492079616d x0 : ffff000001f2cb00
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 10:20:18 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 10:20:18 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw6vertex17RunVertexG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1u_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtN>
Jul 15 10:20:18 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x1ba8/0x1dd0 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP

Going back to 6.9.5 makes the system stable again.
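
A possible aid for triage, offered as an observation rather than a diagnosis: several values in the dumps above look like printable ASCII when read as little-endian bytes. x14 and x15 in the WARNING dump appear to hold fragments of the warning text. x23 in the first Oops also decodes to text, and it differs from the reported fault address only in the top byte and an offset of 0x200. That might be consistent with allocator memory having been overwritten by string data, but the log alone does not prove it. A minimal Python sketch to check the decoding (the helper name `decode_register` is made up for illustration):

```python
def decode_register(value: int) -> str:
    """Interpret a 64-bit value as 8 little-endian bytes and render
    printable ASCII; every other byte is shown as '.'."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Values copied verbatim from the register dumps in the log above.
registers = {
    "x14 (WARNING dump)": 0x6465656378652074,
    "x15 (WARNING dump)": 0x6572632065687420,
    "x23 (first Oops)": 0x736120492079616D,
    "fault address": 0x006120492079636D,
}

for name, value in registers.items():
    print(f"{name}: {value:016x} -> {decode_register(value)!r}")
```

Only the decoding step is shown; where such text would come from, and whether it is related to the `drm_sched_can_queue` warning, is left open.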
