Heavy system instability after suspend - cause possibly identified #3359
Comments
Aekez
Nov 30, 2017
I did a full test yesterday with 20 suspends, and there was no issue.
Today it happened again, even though I had removed the module blacklist fix. The only difference is that this time I carried the laptop around in my bag.
Apparently blacklisting these modules does not cause the issue, but it certainly aggravates it.
Now I only get it sometimes (1 out of 20 times so far) instead of every time.
It also seems a little different now: it only happens when I move the laptop in my bag, not when I test it on the table.
Could a screen sleep sensor be causing the conflict? It's a laptop/tablet hybrid with a suspend sensor next to the camera, which might explain why it now seems to happen only when the laptop is moved. I can't say for sure, but at the very least it doesn't look like a loose connection in the hardware.
- Removing the blacklist, as explained in the original post, mostly fixes it.
- Suspending without closing the lid and without moving the laptop seems to keep it stable too, but it's too early to tell whether this is a factor.
andrewdavidwong added the bug and C: core labels Dec 1, 2017
andrewdavidwong added this to the Release 4.0 milestone Dec 1, 2017
danjjeff referenced this issue Dec 13, 2017: Input/Output Errors and PCI devices unavailable after suspend #3049 (Closed)
Aekez
Jan 17, 2018
Over the last 10 days (suspending multiple times daily), Qubes 4 RC-2, fully updated (current-testing), has been stable on this one system with respect to all of the issues above.
I'm not entirely sure which day or update fixed it, but it was probably earlier than the last 10 days. I can only speak for my own system setup, but as of now, the above is no longer an issue.
I'm also able to use the Wi-Fi driver fix "Automatically reloading drivers on suspend/resume" @ https://www.qubes-os.org/doc/wireless-troubleshooting/ again, without issues.
I'm not sure what happened with the battery issue during suspend either, but it's pretty good now. The laptop can last a whole weekend in suspend mode. All in all, everything just seems to work smoothly now on this laptop. For reference, this laptop/tablet is an Asus T300 Chi.
andrewdavidwong
Jan 18, 2018
Member
Closing this for now as it appears that the issue no longer affects any Qubes users. If you believe this is a mistake, or if anyone is still affected by this issue, please leave a comment, and we'll be happy to reopen this. Thank you.
Aekez commented Nov 30, 2017
Qubes OS version:
Qubes 4 RC-2
Affected TemplateVMs:
sys-net
Steps to reproduce the behavior:
This issue is triggered by following the Qubes guide on reloading driver modules via a blacklist, i.e. the fix for regaining internet after suspend / hibernation. Specifically, it is the last section of the guide, which automates this action by blacklisting the drivers on suspend/hibernation, found here:
https://www.qubes-os.org/doc/wireless-troubleshooting/
Doing this manually is not a problem; it's the automatic process driven by the blacklist configuration file that is the root of the problem. With the driver module blacklisting removed, the system works flawlessly again, except, of course, for the driver issue the blacklist originally solved.
The driver modules on this particular machine are iwlwifi and iwlmvm, the same as the ones in the guide.
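For anyone trying to reproduce this, the automatic setup amounts to listing the modules in a blacklist file inside sys-net. Below is a minimal sketch of that step. The path `/rw/config/suspend-module-blacklist` is my recollection of what the linked guide uses and should be checked against the guide itself; the helper name `add_to_blacklist` is hypothetical and only there for illustration.

```shell
#!/bin/bash
# Sketch: add module names to a suspend blacklist file, one per line,
# skipping names that are already listed.
# add_to_blacklist is a hypothetical helper, not part of Qubes.

add_to_blacklist() {
    local file="$1"; shift
    local m
    touch "$file"
    for m in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: quiet
        grep -qxF -- "$m" "$file" || echo "$m" >> "$file"
    done
}
```

Run as root inside sys-net, this could be called as `add_to_blacklist /rw/config/suspend-module-blacklist iwlmvm iwlwifi` (path assumed, see above). Emptying the file again undoes the automatic setup.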
Unfortunately I have not found time to try this in the new Qubes 4 RC-3 release. If it is not resolved by then, I'll update this thread once I get around to trying Qubes 4 RC-3.
I can reproduce this issue 100% of the time simply by putting these drivers in the automatic blacklist and then suspending or hibernating.
The system is otherwise completely stable. The issue occurs only with the above steps, and only after a hibernate or suspend.
Expected behavior:
Actual behavior:
The interface freezes, or the graphical server appears to collapse and quickly becomes unstable, after returning from suspend or hibernation while using the automatic driver module blacklist solution from the Qubes guide. It appears to be a purely graphical collapse; everything else seems to still work, it's just not visible on the screen. For example, I can still run qubes-dom0-update, so sys-net and sys-firewall are still working, but all VM graphics are either frozen or gone (depending on the scenario), including network widgets and other menu icons originating from within any VM. I assume the other AppVMs still work too, though not visibly on the screen. The exception is when the whole chain of user processes collapses and kicks me back to the login screen, or when a kernel panic causes a reboot.
How quickly it happens varies, but it's usually soon after returning from suspend or hibernation. Often only the graphical process I was using keeps working while the rest of the interface freezes; nothing responds except the graphical "action" I was performing at the moment of the freeze. See below for examples of the most common types of issues. The less common issues are not freezes, but appear to share the same root cause.
Most common symptoms after wake
Less common symptoms, which still happen frequently enough (1-2 times a day with some 5-6 suspends)
It's 100% reproducible on this particular machine: if a common symptom does not occur, then a less common one does. It's always one of the two types of scenario.
General notes:
This issue appears to have started some 10 days before the Qubes 4 RC-3 release. I believe it may have come with some of the Python code that fixed some of the VM issues back then. This was in the testing repository.
The wireless driver guide, linked above, used to work flawlessly in Qubes 3.2 and in Qubes 4 RC-2 as well, until the above update came around.
Undoing the automatic driver blacklist fixes everything above, but leaves the user without internet after suspend / hibernation.
I found a usable workaround by creating a bash script in sys-net that runs rmmod and modprobe on the drivers in question, which I trigger by binding "qvm-run sys-net 'bash wifi-resume' " to a key with the dom0 xfce4 keyboard tools. Resetting the drivers manually in sys-net via the keybinding causes no issues; it's entirely the automatic blacklisting that causes the issue.
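A minimal sketch of what such a wifi-resume script could look like is shown here. The module names are the ones from this machine; the `DRY_RUN` switch and the `apply` argument are hypothetical additions for illustration, and it assumes sudo works without a password inside sys-net.

```shell
#!/bin/bash
# wifi-resume (sketch): unload and reload the Wi-Fi driver modules in sys-net.
# iwlmvm depends on iwlwifi, so it is removed first and loaded last.

MODULES_UNLOAD="iwlmvm iwlwifi"
MODULES_LOAD="iwlwifi iwlmvm"

# Run a command as root, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

reload_wifi() {
    local m
    for m in $MODULES_UNLOAD; do run rmmod "$m"; done
    for m in $MODULES_LOAD; do run modprobe "$m"; done
}

# Only touch the modules when explicitly asked to (hypothetical argument).
if [ "${1:-}" = "apply" ]; then
    reload_wifi
fi
```

With this particular sketch, the dom0 keybinding command would become `qvm-run sys-net 'bash wifi-resume apply'`. Setting `DRY_RUN=1` prints the rmmod/modprobe commands instead of running them, which is handy for checking the order.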
Another workaround I found is to make sure sys-net is not running during suspend / hibernate. Any other VM can be running; as long as sys-net is not, the above issue does not happen.
I'll be happy to provide whatever information, logs or otherwise, would help solve the issue. I'm content with my manual workaround and am not seeking a solution as such, but I'll be glad to help with whatever I can.
Related issues:
None that I can think of; I have never seen this issue before. It seems unique, and possibly related to the Python code update in the testing repository, specifically the updates from some 10 days before the Qubes 4 RC-3 release.