Computer doesn't recover from suspend state #3705

Open
lead4good opened this Issue Mar 16, 2018 · 19 comments

lead4good commented Mar 16, 2018

Qubes OS version:

R4.0 rc5

Affected component(s):

Resume after suspend


Steps to reproduce the behavior:

About one in five times I put my laptop into suspend, it fails to recover: the screen stays black and the fans spin up to maximum.

Expected behavior:

Resume after suspend should work without sys-usb started.

Actual behavior:

Computer fails to recover with screen staying black and fans spinning up to max.

General notes:

  • Xiaomi Notebook Pro 15
  • Kaby Lake R i7-8550U
  • USB 2.0, 3.0 and 3.1

A few times I was able to get the computer to recover by, e.g., plugging in the power cord or holding down the power button for a few seconds. Below you can see the syslog from the last time it happened. Be aware that I had tlp activated, along with a script that modifies Intel P-states and turbo mode depending on whether the charger is plugged in. The problem persists without them; however, I have not yet managed to recover the machine without tlp and the script enabled. I don't think these things are related, though, since the CPU soft lockup happens much earlier in the wakeup process than the point where the scripts run.
Once the computer had successfully recovered from such a lockup, I wasn't able to reproduce the problem until the next reboot.

Syslog after recovery


Related issues:

#3689

toserk Mar 19, 2018

I have the same issue on an HP ProBook 450 G5 (i7-8550U). I noticed that I get a bunch of ACPI (Method not supported…) errors on Qubes boot. So I installed Windows and the official HP drivers to check whether this is a hardware problem, and I hit the ACPI.sys problem described in this post:
https://h30434.www3.hp.com/t5/Business-Notebooks/ProBook-450-with-high-CPU-usage/td-p/6520063
That error behaves very similarly, except for the system freeze. It occurs with some probability after suspend/resume, and the fan runs at max speed when it occurs.
Maybe this information will help determine what the problem is.

lead4good Mar 21, 2018

@toserk can you dump your dmesg here so I can compare it with mine? Do you experience problems similar to those in #3689?

marmarek Mar 21, 2018

Member

This might be a BIOS bug. Are there any BIOS updates available for this machine?

toserk Mar 21, 2018

I have the latest available versions of the BIOS, the USB 3.1 controller firmware, and the Intel Management Engine firmware.
This is the dmesg dump covering suspend/resume cycles up to the system freeze:
dmesg1.txt

It may be a coincidence, but every time I tried to check #3689 I got a freeze on the first suspend/resume.

lead4good Mar 26, 2018

Comparing with my dmesg, I cannot find similar ACPI errors in my log:
dmesg.txt

However, there is one line which is similar:

ACPI BIOS Warning (bug): Incorrect checksum in table [FACP] - 0x21, should be 0x57 (20170728/tbprint-211)

Could a wrong value in the Fixed ACPI Description Table be responsible for this behavior? I've extracted the FACP table and disassembled it:
facp.dsl.txt
Inside there are indeed values for the "sleep status register" and "sleep control register". I might be able to fix the FACP bug, but I might also try some different BIOS versions for this laptop and see whether the behavior stays the same.
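
Editor's sketch, not part of the original comment: an ACPI table is valid when all of its bytes, including the checksum byte at offset 9 of the header, sum to zero modulo 256. In the warning quoted above, the first value is the checksum stored in the table and the second is the value that would make the sum come out to zero. The helper names below are hypothetical, and the functions only read a dumped table file; they do not touch firmware.

```shell
#!/bin/sh
# Sum every byte of a dumped ACPI table, modulo 256.
# A table with a correct checksum yields 0x00.
acpi_table_byte_sum() {
    # od prints each byte as an unsigned decimal; awk adds them up.
    od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) sum += $i }
        END { printf "0x%02x\n", sum % 256 }'
}

# Compute the checksum byte that would make the table sum to zero.
# The stored checksum lives at byte offset 9 of the table header.
acpi_expected_checksum() {
    stored="$(od -An -v -tu1 -j9 -N1 "$1" | tr -d ' \n')"
    sum="$(od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) s += $i }
        END { print s % 256 }')"
    printf '0x%02x\n' $(( (stored - sum + 256) % 256 ))
}
```

On Linux the raw table can usually be dumped with something like `sudo cat /sys/firmware/acpi/tables/FACP > facp.dat` (the output file name is arbitrary); `acpi_expected_checksum facp.dat` should then print the "should be" value from the kernel warning, assuming the kernel saw the same bytes.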

mirrorway Mar 27, 2018

My corebooted Thinkpad recently stopped resuming from suspend. Instead, it would restart.

I was able to work around this by reverting the recent microcode update: dnf remove microcode_ctl, and then removing ucode=scan from the Xen command line. I confirmed the microcode was reverted by running cat /proc/cpuinfo | grep microcode before and after.

More people might have this issue the next time they restart their laptops, when they load the new microcode...
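
Editor's sketch, not part of the original comment: the before/after check described above can be made repeatable by saving the revision list before the change and comparing it after the reboot. The function names and the saved-file convention are assumptions made for illustration.

```shell
#!/bin/sh
# Print the distinct microcode revisions found in a cpuinfo-style file.
# Defaults to the real /proc/cpuinfo; another path can be passed for testing.
microcode_revisions() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # Lines look like "microcode : 0x80"; keep the value after the colon.
    awk -F': *' '/^microcode/ { print $2 }' "$cpuinfo_file" | sort -u
}

# Compare a previously saved revision list with the current one.
# Prints "changed" or "unchanged".
microcode_changed() {
    saved_file="$1"
    cpuinfo_file="${2:-/proc/cpuinfo}"
    current="$(microcode_revisions "$cpuinfo_file")"
    if [ "$current" = "$(cat "$saved_file")" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

A possible use: run `microcode_revisions > ~/ucode.before` before removing the package, reboot, then run `microcode_changed ~/ucode.before`.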

lead4good Mar 29, 2018

Sorry, I can't reproduce that. I've run it with the stock Kaby Lake R microcode 0x70 and with the updated microcode 0x80. The problem persists.

marmarek Mar 29, 2018

Member

@lead4good which xen-hypervisor package version do you have? 4.8.3-4 has problems with suspend; try downgrading to 4.8.3-3.

lead4good Mar 29, 2018

@marmarek
I've got 4.8.3-3 installed (R4.0 rc5 without any testing repo updates).

evilaliv3 May 17, 2018

I can report this same issue on a Thinkpad T480 with a freshly installed Qubes 4.

I managed to get suspend to work by removing the USB3 controller from sys-usb.
Obviously, with the USB3 controller removed I lose the ability to attach USB devices, so this is just a short-term fix that lets suspend/resume work correctly; a proper fix still needs to be identified.

lead4good May 21, 2018

@toserk does your laptop have discrete graphics? An nvidia mx150 by chance?

toserk May 22, 2018

@lead4good
No, only Intel igpu (HD 620)

maertsen May 22, 2018

I can confirm the observations made by @evilaliv3 (Thinkpad T480, fresh installation; removing the USB3 controller fixes the issue).

As a random guess, I experimented with unloading xhci_pci within sys-usb, but this does not make any difference.

Update: it seems #3689 has more details concerning the T480, I will take my comments there. Sorry for the noise.

evilaliv3 Jun 2, 2018

I finally managed to get this working on my laptop (Thinkpad T480).

It required configuring the USB3 controllers to behave as USB2 controllers.

Details on how to achieve this are described in: https://www.systutorials.com/241533/how-to-force-a-usb-3-0-port-to-work-in-usb-2-0-mode-in-linux/

Specifically, on the Thinkpad T480 this is achievable by issuing:
setpci -H1 -d 8086:7020 d0.l=0
setpci -H1 -d 8086:9d2f d0.l=0

The commands should be executed inside the sys-usb domain.
You could first test them, and if the fix works you may add them to /rw/config/rc.local so that they are executed automatically at every boot of the domain.

cc @maertsen @Scinawa
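
Editor's sketch, not part of the original comment: the two setpci calls above could be wrapped in a small fragment for sys-usb's rc.local so that they can be previewed before being run for real. The `run` helper, the `DRY_RUN` switch and the `USB3_DEVICES` variable are assumptions introduced for illustration; the device IDs are the ones quoted above for the T480 and will differ on other machines.

```shell
#!/bin/sh
# Vendor:device IDs to switch to USB2 mode (values quoted in the comment above).
USB3_DEVICES="${USB3_DEVICES:-8086:7020 8086:9d2f}"

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Zero register 0xd0 on each listed controller, exactly as in the
# original commands.
force_usb2_mode() {
    for dev in $USB3_DEVICES; do
        run setpci -H1 -d "$dev" d0.l=0 || return 1
    done
}
```

Calling `DRY_RUN=1; force_usb2_mode` prints the commands; calling `force_usb2_mode` as root inside sys-usb with `DRY_RUN` unset runs them.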

lead4good Jun 4, 2018

As of 4.8.3-8 the problem persists on my hardware.

maertsen Jun 10, 2018

I've just tried the workaround suggested by @evilaliv3, with xen-hypervisor at both 4.8.3-3 and 4.8.3-7. In both cases, the system freezes after wakeup, though I do get to type some characters into the xscreensaver password prompt. After the freeze, the fan speeds up.

@evilaliv3, can you state your version of xen-hypervisor and other packages you deem relevant? I see mention of fixes in 4.8.3-8 in #3689, which may or may not be related. I'm waiting for 4.8.3-8 to land in qubes-dom0-current, though can test if required.

evilaliv3 Jun 10, 2018

I'm on 4.8.3-7.

As I wrote, I can still confirm that the fix above resolved the situation for me; the issue has not happened since.

maertsen Jun 15, 2018

I have just retested with 4.8.3-8. The issue remains.

The workaround of shutting down sys-usb prior to suspend also still works.
The USB 2 downgrade suggested by @evilaliv3 does not work for me.

I would be interested in pointers on how to debug this issue further.
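
Editor's sketch, not part of the original comment: the sys-usb workaround mentioned above could be scripted in dom0 roughly as follows. Restarting the qube after resume is an assumption added here, as are the `run` helper, the `DRY_RUN` switch and the `USB_QUBE` variable. Whether `systemctl suspend` blocks until the machine has resumed may depend on the setup, so the restart step may need a different trigger.

```shell
#!/bin/sh
# Name of the USB qube; sys-usb is the name used throughout this thread.
USB_QUBE="${USB_QUBE:-sys-usb}"

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Shut the USB qube down, suspend, then start the qube again.
# Suspend is skipped if the qube could not be shut down.
suspend_without_sys_usb() {
    run qvm-shutdown --wait "$USB_QUBE" || return 1
    run systemctl suspend || return 1
    # Assumption: execution continues here only after resume.
    run qvm-start "$USB_QUBE"
}
```

Setting `DRY_RUN=1` before calling `suspend_without_sys_usb` only prints the three commands, which is a safe way to check the sequence first.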

maertsen Jun 15, 2018

@evilaliv3 I just noticed that the setpci command does not appear to have any effect for 8086:9d2f as verified by lspci -xxxx. It remains at value 02 for d0. Is that different for you?
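
Editor's sketch, not part of the original comment: instead of scanning the lspci -xxxx dump, register d0 can be read back directly with setpci, which prints a register's value when no assignment is given. The function names are hypothetical, and the sketch assumes the vendor:device ID matches exactly one device.

```shell
#!/bin/sh
# Succeeds (exit status 0) only when the value consists solely of '0' digits.
is_zero_hex() {
    case "$1" in
        ''|*[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Read register 0xd0 of the given vendor:device ID and report whether a
# previous "d0.l=0" write is visible.
report_d0() {
    dev="$1"
    value="$(setpci -d "$dev" d0.l)"
    if is_zero_hex "$value"; then
        echo "$dev d0=$value (write took effect)"
    else
        echo "$dev d0=$value (still non-zero)"
    fi
}
```

For example, `report_d0 8086:9d2f` could be run inside sys-usb right after the setpci write.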
