Fails to resume from suspend with Radeon RX 560 card #5459

Closed
talex5 opened this issue Nov 12, 2019 · 7 comments
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: power management hardware support P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@talex5

talex5 commented Nov 12, 2019

Qubes OS version

Qubes release 4.0 (R4.0) (a recent install of 4.0.2-rc2, with latest dom0 updates)

Affected component(s) or functionality

Resume from suspend.

Brief summary

Since installing a Radeon RX 560 graphics card, the machine does not resume from suspend. Before that, it was working fine. Logs show a warning in switch_mm_irqs_off.

To Reproduce

Steps to reproduce the behavior:
1. Choose Suspend from the menu.
2. Try to resume, using either the power button or the keyboard.

Expected behavior

Lock screen appears.

Actual behavior

Screens light up as if waking from sleep, but no picture appears. Pressing CapsLock does not toggle the keyboard LED.

Additional context

The logs (for kernel Linux version 4.19.81-1.pvops.qubes.x86_64 (user@build-fedora4)) show:

Nov 12 12:35:32 dom0 52qubes-pause-vms[7253]: 0
Nov 12 12:35:32 dom0 systemd[1]: Started Qubes suspend hooks.
Nov 12 12:35:32 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 12:35:32 dom0 systemd[1]: Reached target Sleep.
Nov 12 12:35:32 dom0 kernel: audit: type=1130 audit(1573562132.231:167): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 12:35:32 dom0 systemd[1]: Starting Suspend...
Nov 12 12:35:32 dom0 systemd-sleep[7265]: Suspending system...
Nov 12 12:35:32 dom0 kernel: PM: suspend entry (deep)
Nov 12 12:35:32 dom0 kernel: PM: Syncing filesystems ... done.
Nov 12 14:06:55 dom0 kernel: Freezing user space processes ... (elapsed 0.000 seconds) done.
Nov 12 14:06:55 dom0 kernel: OOM killer disabled.
Nov 12 14:06:55 dom0 kernel: Freezing remaining freezable tasks ... (elapsed 0.097 seconds) done.
Nov 12 14:06:55 dom0 kernel: Suspending console(s) (use no_console_suspend to debug)
Nov 12 14:06:55 dom0 kernel: PM: suspend devices took 0.701 seconds
Nov 12 14:06:55 dom0 kernel: ACPI: Preparing to enter system sleep state S3
Nov 12 14:06:55 dom0 kernel: PM: Saving platform NVS memory
Nov 12 14:06:55 dom0 kernel: Disabling non-boot CPUs ...
Nov 12 14:06:55 dom0 kernel: IRQ 195: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: IRQ 198: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: IRQ 199: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: IRQ 200: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: IRQ 201: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: IRQ 222: no longer affine to CPU1
Nov 12 14:06:55 dom0 kernel: WARNING: CPU: 1 PID: 0 at /home/user/rpmbuild/BUILD/kernel-4.19.81/linux-4.19.81/arch/x86/mm/tlb.c:303 switch_mm_irqs_off+0x1f9/0x630
Nov 12 14:06:55 dom0 kernel: Modules linked in: loop ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp iTCO_wdt iTCO_vendor_support coretemp wmi_bmof intel_wmi_thunderbolt mxm_wmi intel_rapl_perf btusb pcspkr iwlwifi btrtl btbcm snd_hda_codec_realtek btintel e1000e i2c_i801 snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel bluetooth snd_hda_codec cfg80211 snd_hda_core mei_me ecdh_generic mei rfkill snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore intel_pch_thermal wmi pinctrl_cannonlake video pinctrl_intel xen_acpi_processor xenfs dm_thin_pool dm_persistent_data libcrc32c dm_bio_prison dm_crypt amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel amdgpu chash i2c_algo_bit gpu_sched ghash_clmulni_intel drm_kms_helper nvme ttm nvme_core
Nov 12 14:06:55 dom0 kernel:  xhci_pci drm xhci_hcd xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
Nov 12 14:06:55 dom0 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.81-1.pvops.qubes.x86_64 #1
Nov 12 14:06:55 dom0 kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO WIFI/Z390 AORUS PRO WIFI-CF, BIOS F10 06/05/2019
Nov 12 14:06:55 dom0 kernel: RIP: e030:switch_mm_irqs_off+0x1f9/0x630
Nov 12 14:06:55 dom0 kernel: Code: 48 01 ca 0f 82 b4 03 00 00 48 c7 c1 00 00 00 80 48 2b 0d 6a ac 0f 01 48 01 ca 48 0b 15 b0 2b 1c 01 48 39 d0 0f 84 9b fe ff ff <0f> 0b 48 8b 05 12 d9 2e 01 f6 c4 20 0f 85 94 03 00 00 e8 40 40 fa
Nov 12 14:06:55 dom0 kernel: RSP: e02b:ffffc9000170fe78 EFLAGS: 00010087
Nov 12 14:06:55 dom0 kernel: RAX: 000000000220a000 RBX: ffffffff82279bc0 RCX: 0000777f80000000
Nov 12 14:06:55 dom0 kernel: RDX: 000000017da44000 RSI: ffffffff82279bc0 RDI: ffff8881b65ae600
Nov 12 14:06:55 dom0 kernel: RBP: ffff8881b65ae600 R08: 0000000000000000 R09: ffff8881bd800490
Nov 12 14:06:55 dom0 kernel: R10: ffffc9000170fee8 R11: 0000000000000040 R12: ffff8881bd7f0000
Nov 12 14:06:55 dom0 kernel: R13: ffff8881bd7f8000 R14: 0000000000000001 R15: 0000000000000000
Nov 12 14:06:55 dom0 kernel: FS:  0000000000000000(0000) GS:ffff8881c2040000(0000) knlGS:0000000000000000
Nov 12 14:06:55 dom0 kernel: CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
Nov 12 14:06:55 dom0 kernel: CR2: ffff8000002000f8 CR3: 000000000220a000 CR4: 0000000000042660
Nov 12 14:06:55 dom0 kernel: Call Trace:
Nov 12 14:06:55 dom0 kernel:  switch_mm+0x1c/0x30
Nov 12 14:06:55 dom0 kernel:  idle_task_exit+0x45/0x70
Nov 12 14:06:55 dom0 kernel:  play_dead_common+0xa/0x20
Nov 12 14:06:55 dom0 kernel:  xen_pv_play_dead+0xa/0x60
Nov 12 14:06:55 dom0 kernel:  do_idle+0x198/0x260
Nov 12 14:06:55 dom0 kernel:  cpu_startup_entry+0x6f/0x80
Nov 12 14:06:55 dom0 kernel: ---[ end trace bafd9d9c0a94a80f ]---
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 1 is now offline
Nov 12 14:06:55 dom0 kernel: IRQ 216: no longer affine to CPU2
Nov 12 14:06:55 dom0 kernel: IRQ 218: no longer affine to CPU2
Nov 12 14:06:55 dom0 kernel: IRQ 220: no longer affine to CPU2
Nov 12 14:06:55 dom0 kernel: IRQ 223: no longer affine to CPU2
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 2 is now offline
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 3 is now offline
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 4 is now offline
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 5 is now offline
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 6 is now offline
Nov 12 14:06:55 dom0 kernel: smpboot: CPU 7 is now offline
Nov 12 14:06:55 dom0 kernel: ACPI: Low-level resume complete
Nov 12 14:06:55 dom0 kernel: PM: Restoring platform NVS memory
Nov 12 14:06:55 dom0 kernel: xen_acpi_processor: Uploading Xen processor PM info
Nov 12 14:06:55 dom0 kernel: Enabling non-boot CPUs ...
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 1
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu1 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 1 spinlock event irq 133
Nov 12 14:06:55 dom0 kernel: CPU1 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 2
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu2 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 2 spinlock event irq 140
Nov 12 14:06:55 dom0 kernel: CPU2 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 3
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu3 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 3 spinlock event irq 147
Nov 12 14:06:55 dom0 kernel: CPU3 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 4
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu4 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 4 spinlock event irq 154
Nov 12 14:06:55 dom0 kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
Nov 12 14:06:55 dom0 kernel: CPU4 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 5
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu5 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 5 spinlock event irq 161
Nov 12 14:06:55 dom0 kernel: CPU5 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 6
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu6 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 6 spinlock event irq 168
Nov 12 14:06:55 dom0 kernel: CPU6 is up
Nov 12 14:06:55 dom0 kernel: installing Xen timer for CPU 7
Nov 12 14:06:55 dom0 kernel:  cache: parent cpu7 should not be sleeping
Nov 12 14:06:55 dom0 kernel: cpu 7 spinlock event irq 175
Nov 12 14:06:55 dom0 kernel: CPU7 is up
Nov 12 14:06:55 dom0 kernel: ACPI: Waking up from system sleep state S3
Nov 12 14:06:55 dom0 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Nov 12 14:06:55 dom0 kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Nov 12 14:06:55 dom0 kernel: ata2: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: ata4: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: ata5: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: ata6: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: ata1: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: ata3: SATA link down (SStatus 4 SControl 300)
Nov 12 14:06:55 dom0 kernel: [drm] UVD and UVD ENC initialized successfully.
Nov 12 14:06:55 dom0 kernel: [drm] VCE initialized successfully.
Nov 12 14:06:55 dom0 kernel: PM: resume devices took 9.701 seconds
Nov 12 14:06:55 dom0 kernel: acpi LNXPOWER:04: Turning OFF
Nov 12 14:06:55 dom0 kernel: OOM killer enabled.
Nov 12 14:06:55 dom0 kernel: Restarting tasks ... done.
Nov 12 14:06:55 dom0 systemd[1]: Time has been changed
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: The canary thread is apparently starving. Taking action.
Nov 12 14:06:55 dom0 systemd[4580]: Time has been changed
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Demoting known real-time threads.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Successfully demoted thread 5464 of process 4761 (/usr/bin/pulseaudio).
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Successfully demoted thread 4767 of process 4761 (/usr/bin/pulseaudio).
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Successfully demoted thread 4766 of process 4761 (/usr/bin/pulseaudio).
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Successfully demoted thread 4761 of process 4761 (/usr/bin/pulseaudio).
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Demoted 4 threads.
Nov 12 14:06:55 dom0 qmemman.daemon.algo[3265]: balance_when_enough_memory(xen_free_memory=28643408115, total_mem_pref=2085557862.4, total_available_memory=30915732108.6)
Nov 12 14:06:55 dom0 qmemman.systemstate[3265]: stat: dom '0' act=4294967296 pref=1886328422.4 last_target=4294967296
Nov 12 14:06:55 dom0 qmemman.systemstate[3265]: stat: dom '3' act=62914560 pref=199229440 last_target=62914560
Nov 12 14:06:55 dom0 qmemman.systemstate[3265]: stat: xenfree=28695836915 memset_reqs=[('0', 4294967296), ('3', 62914560)]
Nov 12 14:06:55 dom0 qmemman.systemstate[3265]: mem-set domain 0 to 4294967296
Nov 12 14:06:55 dom0 qmemman.systemstate[3265]: mem-set domain 3 to 62914560
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Supervising 0 threads of 0 processes of 0 users.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Recovering from system lockup, not allowing further RT threads.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Supervising 0 threads of 0 processes of 0 users.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Recovering from system lockup, not allowing further RT threads.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Supervising 0 threads of 0 processes of 0 users.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Recovering from system lockup, not allowing further RT threads.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Supervising 0 threads of 0 processes of 0 users.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Recovering from system lockup, not allowing further RT threads.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Supervising 0 threads of 0 processes of 0 users.
Nov 12 14:06:55 dom0 rtkit-daemon[4762]: Recovering from system lockup, not allowing further RT threads.
Nov 12 14:06:55 dom0 systemd-sleep[7265]: System resumed.
Nov 12 14:06:55 dom0 kernel: PM: suspend exit
Nov 12 14:06:55 dom0 systemd[1]: Started Suspend.
Nov 12 14:06:55 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:06:55 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:06:55 dom0 systemd[1]: sleep.target: Unit not needed anymore. Stopping.
Nov 12 14:06:55 dom0 kernel: audit: type=1130 audit(1573567615.805:168): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:06:55 dom0 kernel: audit: type=1131 audit(1573567615.805:169): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:06:55 dom0 systemd[1]: Stopped target Sleep.
Nov 12 14:06:55 dom0 systemd[1]: qubes-suspend.service: Unit not needed anymore. Stopping.
Nov 12 14:06:55 dom0 systemd[1]: Stopping Qubes suspend hooks...
Nov 12 14:06:55 dom0 systemd[1]: Reached target Suspend.
Nov 12 14:06:55 dom0 systemd-logind[3292]: Operation 'sleep' finished.
Nov 12 14:06:55 dom0 systemd[1]: suspend.target: Unit is bound to inactive unit systemd-suspend.service. Stopping, too.
Nov 12 14:06:55 dom0 systemd[1]: Stopped target Suspend.
Nov 12 14:06:56 dom0 52qubes-pause-vms[7329]: 0
Nov 12 14:06:56 dom0 systemd[1]: Stopped Qubes suspend hooks.
Nov 12 14:06:56 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:06:56 dom0 kernel: audit: type=1131 audit(1573567616.536:170): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 12 14:07:08 dom0 systemd-logind[3292]: Power key pressed.
-- Reboot --
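
For anyone comparing journals across kernels, the warning and its call trace can be pulled out of a saved log with a few lines of Python. This is only a rough sketch: the name `extract_warnings` and the regular expressions are assumptions based on the line format shown above, not part of any Qubes tool, and other kernels may format these lines differently.

```python
import re

# Matches e.g. "kernel: WARNING: CPU: 1 PID: 0 at /path/tlb.c:303 switch_mm_irqs_off+0x1f9/0x630"
WARNING_RE = re.compile(
    r"kernel: WARNING: .* at (?P<path>\S+):(?P<line>\d+) (?P<symbol>\S+)"
)
# Matches a call-trace frame such as "kernel:  switch_mm+0x1c/0x30"
FRAME_RE = re.compile(
    r"kernel:\s+(?:\? )?(?P<frame>[\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)"
)


def extract_warnings(journal_text):
    """Return one dict per kernel WARNING, with the frames of its call trace."""
    warnings = []
    current = None      # the warning currently being collected, if any
    in_trace = False    # True once "Call Trace:" has been seen for it
    for line in journal_text.splitlines():
        m = WARNING_RE.search(line)
        if m:
            current = {
                "file": m["path"],
                "line": int(m["line"]),
                "symbol": m["symbol"],
                "frames": [],
            }
            warnings.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "kernel: Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            f = FRAME_RE.search(line)
            if f:
                current["frames"].append(f["frame"])
            else:
                # First non-frame line ends the trace for this warning.
                in_trace = False
                current = None
    return warnings


# A few lines copied from the journal above, used as sample input.
SAMPLE = """\
Nov 12 14:06:55 dom0 kernel: WARNING: CPU: 1 PID: 0 at /home/user/rpmbuild/BUILD/kernel-4.19.81/linux-4.19.81/arch/x86/mm/tlb.c:303 switch_mm_irqs_off+0x1f9/0x630
Nov 12 14:06:55 dom0 kernel: Call Trace:
Nov 12 14:06:55 dom0 kernel:  switch_mm+0x1c/0x30
Nov 12 14:06:55 dom0 kernel:  idle_task_exit+0x45/0x70
Nov 12 14:06:55 dom0 kernel: ---[ end trace bafd9d9c0a94a80f ]---
"""

if __name__ == "__main__":
    for w in extract_warnings(SAMPLE):
        print(w["file"], w["line"], w["symbol"])
        for frame in w["frames"]:
            print("   ", frame)
```

A call trace that is not preceded by a `WARNING:` line is deliberately ignored, so the input should include the lines leading up to the trace.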

A comment before the warning at https://elixir.bootlin.com/linux/v4.19.81/source/arch/x86/mm/tlb.c#L303 says:

 	 * Verify that CR3 is what we think it is.  This will catch
	 * hypothetical buggy code that directly switches to swapper_pg_dir
	 * without going through leave_mm() / switch_mm_irqs_off() or that
	 * does something like write_cr3(read_cr3_pa()).

A slightly earlier kernel (Linux version 4.19.80-1.pvops.qubes.x86_64 (user@build-fedora4)) failed with:

Nov 11 13:24:47 dom0 kernel: Call Trace:
Nov 11 13:24:47 dom0 kernel:  amdgpu_dm_atomic_commit_tail+0x81d/0xee0 [amdgpu]
Nov 11 13:24:47 dom0 kernel:  commit_tail+0x3d/0x70 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_atomic_helper_commit+0xb4/0x120 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  restore_fbdev_mode_atomic+0x173/0x1d0 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_fb_helper_hotplug_event.part.33+0x90/0xb0 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x70/0x90 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
Nov 11 13:24:47 dom0 kernel:  fb_set_var+0x216/0x410
Nov 11 13:24:47 dom0 kernel:  ? __update_load_avg_cfs_rq+0x127/0x250
Nov 11 13:24:47 dom0 kernel:  ? __update_load_avg_cfs_rq+0x127/0x250
Nov 11 13:24:47 dom0 kernel:  fbcon_blank+0x2f1/0x330
Nov 11 13:24:47 dom0 kernel:  do_unblank_screen+0xd2/0x1c0
Nov 11 13:24:47 dom0 kernel:  complete_change_console+0x54/0xd0
Nov 11 13:24:47 dom0 kernel:  vt_ioctl+0x68d/0x11b0
Nov 11 13:24:47 dom0 kernel:  tty_ioctl+0xec/0x8c0
Nov 11 13:24:47 dom0 kernel:  do_vfs_ioctl+0xa2/0x640
Nov 11 13:24:47 dom0 kernel:  ? syscall_trace_enter+0x1ae/0x2c0
Nov 11 13:24:47 dom0 kernel:  ksys_ioctl+0x70/0x80
Nov 11 13:24:47 dom0 kernel:  __x64_sys_ioctl+0x16/0x20
Nov 11 13:24:47 dom0 kernel:  do_syscall_64+0x5b/0x190
Nov 11 13:24:47 dom0 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Solutions you've tried

None. I should probably try a normal Linux distro on this machine and see if that's affected too, but filing this now in case someone else has the same problem.

Relevant documentation you've consulted

https://groups.google.com/d/msg/qubes-users/mdC3B-y0Fzk/Ssuy3F1eAgAJ says that

Most, if not all, Radeon GPUs based on Polaris 11 (Radeon RX 460/560 / Pro 450/455/460/560) attempting to install Qubes 4.0.2-rc1 ends up with the error "X startup failed, aborting installation".

However, I didn't have the card when I installed, so maybe that's why this didn't affect me. The integrated graphics worked fine, but unfortunately this motherboard only has one display port (and only at 30 Hz) so it's not very usable without a graphics card.

Related, non-duplicate issues

none

HCL report

Qubes release 4.0 (R4.0)

Brand:		Gigabyte Technology Co., Ltd.
Model:		Z390 AORUS PRO WIFI
BIOS:		F10

Xen:		4.8.5-11.fc25
Kernel:		4.19.81-1

RAM:		32701 Mb

CPU:
  Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
Chipset:
  Intel Corporation Device [8086:3e30] (rev 0d)
VGA:
  Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 560] [1002:67ff] (rev ff) (prog-if 00 [VGA controller])

Net:
  Intel Corporation Device a370 (rev 10)
  Intel Corporation Ethernet Connection (7) I219-V (rev 10)

SCSI:


HVM:		Active
I/O MMU:	Active
HAP/SLAT:	Yes
TPM:		Device not found
Remapping:	yes
@talex5 talex5 added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Nov 12, 2019
@andrewdavidwong andrewdavidwong added this to the Release 4.0 updates milestone Nov 13, 2019
@zby

zby commented Dec 11, 2019

I have a similar problem - but with different hardware:


Qubes release 4.0 (R4.0)

Brand:		HP
Model:		HP Z4 G4 Workstation
BIOS:		P61 v02.40

Xen:		4.8.5-10.fc25
Kernel:		4.19.82-1

RAM:		65263 Mb

CPU:
  Intel(R) Xeon(R) W-2125 CPU @ 4.00GHz
Chipset:
  Intel Corporation Sky Lake-E DMI3 Registers [8086:2020] (rev 04)
VGA:
  Intel Corporation Device [8086:a2a4]
  NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1) (prog-if 00 [VGA controller])

Net:
  Intel Corporation Ethernet Connection (2) I219-LM
  Intel Corporation I210 Gigabit Network Connection (rev 03)
  Intel Corporation Wireless 8260 (rev 3a)

SCSI:
  WDC WD10EZEX-60W Rev: 1A01
  DVDRW  GUD1N     Rev: LD04

HVM:		Active
I/O MMU:	Active
HAP/SLAT:	Unknown ("xl dmesg" incomplete)
TPM:		Device not found
Remapping:	yes


@talex5
Author

talex5 commented Jul 30, 2020

I tried this with a Radeon PRO WX 2100, with the same results. This bug is still present with kernel-latest-5.6.16-1 and kernel-latest-5.7.10-1 (from qubes-dom0-current-testing).

With 5.6.16-1:

  • Passing kernel boot option rd.qubes.hide_pci=01:00.0,01:00.1 prevents the problem (but the card can't be used).
  • Having the PCI devices assigned to a VM prevents the problem, even if the amdgpu module is loaded in dom0 (the device is assigned to pciback rather than amdgpu in this case). The card still can't be used from dom0 in this case, of course. Attempting to use it from the AppVM crashed the host with an earlier kernel; I haven't tried it since.
  • If the card is managed by the amdgpu module in dom0 (so that /dev/dri/card1 appears) but not used by xorg, then the machine resumes from suspend, but it takes several seconds longer to resume and it writes a kernel stack-trace to the journal.
  • Setting amdgpu.dc=0 doesn't seem to make any difference.
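
To tell which of the cases above a given boot falls into, it can help to check what the kernel command line says and which driver is actually bound to each function of the card. The following is a minimal sketch, assuming the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>/driver`) and PCI domain `0000`; the function names `hidden_devices` and `bound_driver` are made up for illustration.

```python
import os


def hidden_devices(cmdline):
    """Return the PCI addresses listed in rd.qubes.hide_pci=..., or []."""
    for token in cmdline.split():
        if token.startswith("rd.qubes.hide_pci="):
            return [d for d in token.split("=", 1)[1].split(",") if d]
    return []


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    if bdf.count(":") == 1:
        bdf = "0000:" + bdf  # assumption: the device lives in PCI domain 0000
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None
    # The "driver" entry is a symlink whose last component is the driver name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("hidden:", hidden_devices(f.read()))
    # Addresses taken from the boot option quoted above.
    for dev in ("01:00.0", "01:00.1"):
        print(dev, "->", bound_driver(dev))
```

Run in dom0, this would be expected to report `amdgpu` or `pciback` for the two functions depending on the configuration, but the sketch has not been checked against a Qubes installation.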

After booting Qubes with Linux 5.7.10-1 (Linux version 5.7.10-1.qubes.x86_64 (user@build-fedora4) (gcc version 6.4.1 20170727 (Red Hat 6.4.1-1) (GCC), GNU ld version 2.26.1-1.fc25) #1 SMP Sun Jul 26 01:06:51 UTC 2020), I immediately suspended and then resumed the machine. Some highlights from the dom0 journal:

Jul 30 09:51:32 dom0 systemd[1]: Starting Suspend...
Jul 30 09:51:50 dom0 kernel: WARNING: CPU: 1 PID: 0 at /home/user/rpmbuild/BUILD/kernel-latest-5.7.10/linux-5.7.10/arch/x86/mm/tlb.c:309 switch_mm_irqs_off+0x21b/0x6b0
Jul 30 09:51:50 dom0 kernel: Call Trace:
Jul 30 09:51:50 dom0 kernel:  switch_mm+0x1c/0x30
Jul 30 09:51:50 dom0 kernel:  play_dead_common+0xa/0x20
Jul 30 09:51:50 dom0 kernel:  xen_pv_play_dead+0xa/0x60
Jul 30 09:51:50 dom0 kernel:  do_idle+0x1b6/0x290
Jul 30 09:51:50 dom0 kernel:  cpu_startup_entry+0x19/0x20
Jul 30 09:51:50 dom0 kernel:  cpu_bringup_and_idle+0x7a/0xa0
Jul 30 09:51:50 dom0 kernel:  asm_cpu_bringup_and_idle+0x5/0x1000
Jul 30 09:51:50 dom0 kernel: ACPI: Waking up from system sleep state S3
Jul 30 09:51:50 dom0 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Jul 30 09:51:50 dom0 kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Jul 30 09:51:50 dom0 kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read dpcd data failed.
Jul 30 09:51:50 dom0 kernel: [drm] VCE initialized successfully.
Jul 30 09:51:50 dom0 kernel: [drm] Fence fallback timer expired on ring gfx
Jul 30 09:51:50 dom0 kernel: PM: resume devices took 8.148 seconds
Jul 30 09:51:50 dom0 kernel: PM: suspend exit
Jul 30 09:51:50 dom0 rtkit-daemon[5550]: The canary thread is apparently starving. Taking action.
Jul 30 09:51:50 dom0 systemd-coredump[6413]: Process 6352 (systemd-sleep) of user 0 dumped core.
                                             Stack trace of thread 6352:
                                             #0  0x000078543d5713f9 _IO_ferror (libc.so.6)
                                             #1  0x000078543de1af3c fflush_and_check (libsystemd-shared-231.so)
                                             #2  0x000061151fff33c8 main (systemd-sleep)
                                             #3  0x000078543d51a431 __libc_start_main (libc.so.6)
                                             #4  0x000061151fff366a _start (systemd-sleep)

Suggestions on how to debug this welcome...
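
One small aid when trying different kernels or boot options is to record how long the device suspend and resume phases took, since the slow cases above show up as a resume of several seconds. A minimal sketch, assuming the `PM: ... devices took N seconds` wording seen in both journals; the name `device_pm_times` is hypothetical.

```python
import re

# Matches "PM: suspend devices took 0.701 seconds" and the "resume" equivalent.
PM_RE = re.compile(r"PM: (suspend|resume) devices took ([0-9.]+) seconds")


def device_pm_times(journal_text):
    """Return (phase, seconds) tuples in the order they appear in the log."""
    return [(m.group(1), float(m.group(2))) for m in PM_RE.finditer(journal_text)]


if __name__ == "__main__":
    # Two lines copied from the first journal in this issue.
    sample = (
        "Nov 12 14:06:55 dom0 kernel: PM: suspend devices took 0.701 seconds\n"
        "Nov 12 14:06:55 dom0 kernel: PM: resume devices took 9.701 seconds\n"
    )
    for phase, seconds in device_pm_times(sample):
        print(f"{phase}: {seconds:.3f} s")
```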

@andrewdavidwong andrewdavidwong added hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Jul 30, 2020
@pabpas

pabpas commented Oct 11, 2020

I solved a similar issue with a Radeon RX 480 by using less aggressive ASPM (PCIe power saving) settings in the BIOS. In my case, setting everything to L1 is enough to resume from suspend without problems, but you could also try disabling it altogether.
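
Besides the BIOS setting, Linux exposes its own ASPM policy in `/sys/module/pcie_aspm/parameters/policy`, where the active choice is shown in square brackets. The sketch below only reads that file; it says nothing about the firmware settings described above, and whether the file reflects what the hardware is really doing under Xen is an open assumption. The function names are made up for illustration.

```python
import re


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry, e.g. 'default', or None."""
    m = re.search(r"\[(\w+)\]", policy_text)
    return m.group(1) if m else None


def read_aspm_policy(path="/sys/module/pcie_aspm/parameters/policy"):
    """Read the kernel's ASPM policy; None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return active_aspm_policy(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("kernel ASPM policy:", read_aspm_policy())
```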

@talex5
Author

talex5 commented Oct 12, 2020

Interesting! I checked my settings and found ASPM was already disabled. I tried enabling it (along with the PEG ASPM setting that only appears once ASPM is enabled), but it didn't help. The computer still crashes when resuming, unless I pass rd.qubes.hide_pci=01:00.0,01:00.1 to disable the graphics card completely.

@andrewdavidwong
Member

Is this still a problem in 4.1?

@talex5
Author

talex5 commented Apr 8, 2023

In the end, I switched from Xen to KVM (https://roscidus.com/blog/blog/2021/03/07/qubes-lite-with-kvm-and-wayland/), which fixed the problem for me.

@andrewdavidwong
Member

Closing. If anyone else is still affected by this issue, please leave a comment, and we'll be happy to reopen this. Thank you.

@andrewdavidwong andrewdavidwong closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 9, 2023
@andrewdavidwong andrewdavidwong removed the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Apr 9, 2023
@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023