-
-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMD CPU: Waking from suspend doesn't work without external keyboard #80
Comments
Thanks for the update! I'll include these in the README, as well as the For the lid events, I think we just need to find the correct ID for the AMD version: Can you provide me the output of |
Updated the README, keeping this issue open so we can discuss the lid. |
Full dmidecode output - looks like the SKU to DMI match on is wifi_fw.sh - I should probably rewrite this to drop the 'good' fw image somewhere in |
so actually - I don't think that the gpelid stuff is needed for this, since as it is I see the lid in e: similar issues when suspending in a text-only session, so it's not graphics-related, at least. Also notably, the fn/capslock lights still work in the low-power suspended state |
Having a usb keyboard plugged in (hello, yubikey) ((or probably, any USB device)) causes the suspend to fail into a broken state, not for it to succeed once. The xhci device doesn't enter a low-power state quickly enough on the first time; the next time it succeeds. Dunno why. Quick question before I poke at this more: does resuming on the intel version only work via opening the laptop lid? Or can you just As it is, there appears to just be no way to resume the 15" SL3 right now.
|
So on Intel I'm able to resume by opening the lid or by pressing the power button (though, I had to remap it from shutting down the computer first). I think for the keyboard itself to wake the device, I would have to mark them as a wakeup source somehow |
\cc @qzed since he has a bit more insight on how the system is set up Was trying to transfer this issue over to linux-surface but I can't transfer from personal -> org sadly |
That should only work via the lid at the moment. Since the keyboard is connected via the EC, you could try to enable wakeup via the EC by running Not sure what's going wrong with the xhci device.
As far as I can tell, the AMD variant uses ACPI GPIO interrupt handling for that instead of GPE interrupts (compare SL3 with e.g. SB2 where |
I unfortunately tried a blanket I'm going to experiment over the next few days - I saw one report where suspending/etc on fedora worked fine, so it might be a weird combination kernel+distro stuff, somehow. My userspace stuff is pretty trivial to migrate to a new install so I'll try a fresh arch install (and a fedora install if that fails). The fact that I see nothing listed in /proc/acpi/wakeup is sorta concerning; I feel like I should at least be seeing the lid or power button (since those should just work out of the box?) or /something/ not dependent on the EC. |
The If you want to try something out, you can enable the wakeup on the SSH device (should be To suspend the EC only (without the whole system), you'll need to run the
|
Cool! I watched So, there's definitely wake requests being issued, and the device is definitely marked as wakeup-enabled in sysfs at the time I invoke a system suspend - so I'm not really sure about why the irq just isn't waking the kernel properly. It's been ages since I've had the (mis)fortune of poking at irqs so nothing really jumps out at me currently. There's no obvious issues with the enable_irq_wake setup (though maybe it's worth testing with that force-enabled?). |
I think the enable_irq_wake setup should be correct. At least it wakes my SB2 and I don't see any reason why it shouldn't on the SL3. Have you tried playing around with rtcwake (e.g. |
so, I actually tried to do an rtc timer -> suspend a couple days ago, but the rtc timer isn't wakeup capable so I can't set rtcwake. (I assumed that this was maybe reasonable for a laptop, and neglected to mention that). Resume via power button also doesn't work, either. I think that freeze/mem actually do the same suspend, since I've poked through most of these debugging steps; suspending in textmode && the mainline kernel have the same problems, so it's not like there's something broken being done in the module, there's just..... some arcane piece missing that's preventing the laptop from ever resuming, as far as I can tell. I did notice once or twice that a sleep would fail with cfg80211 or xhci stuff not unloading in time and it would send the system into a broken sleep - but I'm fairly certain that most of the time the suspends are working and it's just failing to resume.
|
Alright, so e: after unloading ath10k_pci & its deps, every level of pm_test succeeds. However, once it's set to |
Resuming by pressing a key on a USB keyboard worked. I'm honestly dumbfounded that this works but not the lid/power button/keyboard (earlier I thought that a usb keyboard was causing suspend issues, but it looks like ath10k was the actual culprit?). So. That's something; I can at least suspend and wake now, so I can grab suspend logs more easily now. |
Oh, I kind of though that pretty much every device should support that. Interesting.
I somehow thought they'd be a bit different but after looking at the docs it seems you are right, sorry.
Yeah, that sounds like two different issues.
Okay, so that's definitely weird. Can you wake the device via the power button if you have the USB keyboard connected (that could then be an issue with some XHCI driver not suspending correctly if nothing is attached, but I kind-of doubt that)? Also the power button does work while powered on, right? I'm just wondering if the power button uses a different driver. The MSHW0040-based power-button driver that is used on the other devices does also use GPIO interrupts (same as the lid), so maybe something isn't working correctly in some GPIO core driver and GPIO interrupts can't wake the devices. That'd at least be a connection. Based on the DSDT it should use |
It doesn't (as far as I can tell only a USB device works (on either the A port or the C port; same keyboard using different cables), but weirdly, as soon as the laptop resumes, it will poweroff, as if I pressed the power button as soon as the device woke - so maybe whatever handles the power button / lid aren't being suspended properly? Dunno what's wrong with the keyboard, though.
Yeah; I haven't configured anything for it so normally pressing it will cause the laptop to just immediately shutdown.
Correct; unloading this prevented the power button from doing anything (and then once I reloaded it and pressed the power button, it shut off immediately as normal. Nothing weird with buffered presses like when it was simply suspended). |
So that all kind-of fits together: Power-button, lid, and EC-wake are all GPIO interrupts. And it kind-of looks like those are not handled properly when the device is suspended. There's probably some (maybe hardware-)logic that keeps track of interrupts (something set-interrupt/clear-interrupt), which would explain the interrupt being detected after resume. So they are detected but don't wake the device. To be honest, I have absolutely no idea how to debug this. But looks like an issue in the GPIO driver stack. |
Hm, that should lie mostly in pinctrl-amd (which iirc - I needed that for SAM to correctly work in my initrd). Possibly it's just something with dependency load order during suspend, but I'll take a poke through that and see what I can come up with. |
I think it's probably a bit more than that, looks like an upstream bug. I'll try to have a peek at it.
Good luck! |
Here's a quick dump from the (or in a massive collapsible here)
So normally, only the following pins are wakeup enabled in s0: 2, 5, 7, 11, 22, 23, 44, 52, 58, 59, 61, 172 I grabbed the gpio states with the lid open & closed and got the following diff: --- /tmp/gpio_lid_open 2020-01-01 15:42:01.157652893 -0800
+++ /tmp/gpio_lid_close 2020-01-01 15:42:14.970985511 -0800
@@ -46,3 +46,3 @@
pin22 Edge trigger| Active on both| interrupt is enabled| interrupt is unmasked| enable wakeup in S0i3 state| enable wakeup in S3 state|
- enable wakeup in S4/S5 state| input is low| 4k pull-up| pull-up is enabled| Pull-down is disabled| output is disabled| 0x14fc60
+ enable wakeup in S4/S5 state| input is high| 4k pull-up| pull-up is enabled| Pull-down is disabled| output is disabled| 0x15fc60
pin23 Edge trigger| Active on both| interrupt is enabled| interrupt is unmasked| enable wakeup in S0i3 state| enable wakeup in S3 state|
@@ -148,5 +148,5 @@
pin74 Edge trigger| Active high| interrupt is enabled| interrupt is masked| disable wakeup in S0i3 state| disable wakeup in S3 state|
- disable wakeup in S4/S5 state| input is low| pull-up is disabled| Pull-down is disabled| output is disabled| 0x40800
+ disable wakeup in S4/S5 state| input is high| pull-up is disabled| Pull-down is disabled| output is disabled| 0x50800
pin75 Edge trigger| Active high| interrupt is enabled| interrupt is masked| disable wakeup in S0i3 state| disable wakeup in S3 state|
- disable wakeup in S4/S5 state| input is low| pull-up is disabled| Pull-down is disabled| output is disabled| 0x40800
+ disable wakeup in S4/S5 state| input is high| pull-up is disabled| Pull-down is disabled| output is disabled| 0x50800
pin76 Edge trigger| Active high| interrupt is enabled| interrupt is masked| disable wakeup in S0i3 state| disable wakeup in S3 state|
@@ -184,3 +184,3 @@
pin92 interrupt is disabled| interrupt is masked| disable wakeup in S0i3 state| disable wakeup in S3 state|
- disable wakeup in S4/S5 state| input is low| pull-up is disabled| Pull-down is disabled| output is disabled| 0x40000
+ disable wakeup in S4/S5 state| input is high| pull-up is disabled| Pull-down is disabled| output is disabled| 0x50000
pin93 interrupt is disabled| interrupt is masked| disable wakeup in S0i3 state| disable wakeup in S3 state| It's not entirely yet clear to me how to easily figure out which pins map to what / how the module (un)masks/enables interrupts on them, but it should be possible to jankily force the lid into working, at least, I think. (iirc they come from the acpi tables?) |
AFAIK the ACPI pin number is the pin offset of the chip plus the pin number local to the chip. But I don't really know more than that. Never looked into that stuff before (pinctrl drivers and IRQ implementations, that is). |
filed an upstream bug with hopefully enough information. As best I can tell, the interrupts/wakeups for gpio pins comes from the acpi tables; I'm not sure why pins 74/75 are masked - my guess is that whatever underlying bug there is might just be causing the interrupts to be masked during suspend so that the irqs never fire. |
I haven't really been able to figure out too much over the past few days, but I did notice a couple things: One curious thing I've noticed recently is that although pinctrl_amd is loaded, I never actually see anything from it in dmesg - i.e. no "amd gpio driver loaded", but the info from the gpio debugfs is definitely what's printed by the pinctrl amd module. Also one other curious line:
So, I'm thinking this may actually just be an acpi bug - something that microsoft works around in their drivers (I poked through the SL3A windows driver/firmware package and they do have a specific GPIO controller driver for it - so maybe rather than fix the acpi tables, they just have a driver override?). In which case, I may just try to patch the acpi tables and load a manual dsdt override to handle this. Really hoping this isn't the case, and I really hope that Microsoft would at least just fix their own device's acpi table instead of doing this, but who knows? Will do some more debugging into the acpi stuff now, I guess, since gpio was more or less a dead end. Also, I did verify that I have the latest firmware for everything (pulled the windows drivers/firmware msi, extracted everything, checked versions), so it's also not something that's been fixed already. |
I'd hope so, too...
Maybe debug output isn't enabled yet when that loads? Tbh. I haven't used the dbg prints often as I'm too lazy to always re-enable that and just use dev_info etc. while testing, so not sure about that.
HPET is High Precison Event Timer, so I'd assume that it should normally have an IRQ. No idea if that's connected, but might also be a symptom of the root cause. Seems at least non-standard. Or some weird edge-case in the standard that only MS uses. If you find a fix via ACPI let me know. I'm sure we can somehow figure out how to transfer that in kernel code. |
Also if no one responds to your bug, you could try and ask on the linux-gpio mailing list directly. According to |
Maybe, though there's also nothing when ignore_loglevel is set, so I'd expect it to print. I don't see a path for it to load, not print that, and not unload. There's definitely something incredibly weird going on with gpio/acpi somewhere in there.
Hopefully it doesn't come down to patching the acpi table. If that is the case, then SL3-A users will have to just add an additional initrd image alongside the amd ucode. Not the worst, but it will be uefi-version specific and it's definitely a pain.
Will do all of these later today; I'll try to consolidate all the info a bit better when I do so. I vaguely know someone on the surface firmware team at msft so I also poked them about this, since having this fixed properly in a firmware update would be great. At the least, they might be able to at least confirm it's a bug, even if it's not fixed ever. |
Right, setting that as boot parameter should normally do the trick. I suspect that it's could be somehow due to the GPIO driver loading at a very early stage, but tbh. I don't know and I'm just guessing.
Yeah, this is not a solution that I'd consider acceptable in the long term. I'm sure we can figure something out on how to bring a fix into the kernel as some kind of quirk once we've got a better idea of why this is happening and how a fix would look like as ACPI patch. |
Some good news: I can patch the acpi table to have it always make a notify () request in the LID0 handler, and the laptop will now always wake immediately upon a suspend! If I understand the SL3-I table correctly, it has an additional function it calls (ULID or something) that has a bunch of logic and then, at the end, makes a notify() call. I just replaced that with a blind notify(). Hopefully it's possible to replicate the logic in the SL3-I table and have a patched acpi table do this, or to otherwise properly mark the relevant GPIOs in a driver. (It's also nice that I can definitely somewhat-easily patch my acpi table and poke at this). The big downside is that, as I understand it, the "correct" solution is usually to just load a patched ACPI table and wait for a firmware update. I sorta doubt we'll see a firmware update fixing this. I'm going to hold off on soliciting more responses to the kernel bug for now, since I think there's two bugs we're looking at: the gpio bug with the chip not initializing properly (or something to that effect), and the firmware bug with a very janky acpi table. The Gpio bug is probably not directly caused by the acpi shenanigans, but I wouldn't write it off yet. I'm going to poke through the surface drivers package on windows and see if there's a patched acpi table I can extract and use; hopefully they just patch the acpi table, since that would be an easier starting point. |
Can you upload a diff? If I understand you correctly that solution will cause a loop in ACPI: Speaking of not-ready: That can actually be an issue. If the GPIO pin triggers and |
Correct on the former - the acpi table is just, frankly, broken on the SL3's November 2019 firmware. I still haven't been able to sufficiently patch the table sufficiently for the wakes to properly be handled. I did just look and see that there was a firmware update put out on Jan 30th. I'll check tonight and see if that fixes the issue, since that would be incredibly ideal. |
No diffs to the acpi tables. The firmware was built on 2019-12-10 (or at least, that's the timestamp on its 'release' in the fwupd package). I don't think Microsoft will fix this at a firmware level in any reasonable amount of time, if ever, and I unfortunately don't have the time right now to disassemble windows drivers to figure out exactly how they're handling this, since my stabs in the dark with patching the acpi tables haven't gone anywhere productive. |
First off, thank you @qzed @PandorasFox for all the work here. This doesn't help much, but I was unable to wake with external KB attached. I'm not sure if there was anything else I needed to do for that to work but. But AMD SL3 on this update https://www.microsoft.com/en-us/download/details.aspx?id=100428 edit; adding screenshot in case page is updated by microsoft |
I didn't try the March 4th firmware since that was after I gave my SL3 to a windows-using friend of mine, but it's odd that you can't wake via an external keyboard. You might want to go try some of the sleep debugging steps I linked in this comment above and see if you can narrow it down, I guess. |
Thanks! I'll take a look when I get some time this coming week hopefully. Everything is working great except suspend. |
It could be that you have to enable keyboard/USB wake-ups (see e.g. https://askubuntu.com/questions/848698/wake-up-from-suspend-using-wireless-usb-keyboard-or-mouse-for-any-linux-distro) as they might be disabled by default. |
@qzed sorry about the delay. I was able to wake with KB attached (almost). panel backlight comes on, usb has power, kb lights up, but it doesn't make it any further, screen stays black (just powered on). tried to switch to a tty but no go. i'd be able to debug but i really have no idea what im doing when it comes to this stuff. if theres anything i can do tho more than happy to help |
I'm curious if there's been progress toward a fix for this -- right now it's the only issue preventing me from having a perfectly-working Linux experience on my SL3 AMD. |
None of us own the AMD model sadly. You might be able to get it working if you mess with the DSDT table and write an override. |
Hi @archseer, sorry to resurrect this thread but I just got an AMD Surface Laptop 3 and would love to see this issue resolved. I don't have a ton of kernel experience, but I have a dump of the the decompiled DSDT with the latest firmware and a diff from it against the DSDT in linux-surface/acpidumps. If someone could point me in the right direction I'd love to help contribute a fix for this! |
I've been testing suspend / wake on my Surface Laptop 3 (AMD). On Arch Linux kernel 5.16.16 I am unable to wake, but it's not 100% clear yet if it's from an issue with the AMDGPU driver or related to the above mentioned issues. I can say that I've successfully gone in and out of sleep using just the lid close/open and via commandline / power button on kernel 5.17. |
I've been running the 5.16 linux-surface kernel for Arch on my machine and have been hibernating to S4 instead of suspending to S3, because AMDGPU fails on S3 with visual corruption on the screen on wakeup. But 5.17 potentially solving this would make sense, I do remember reading that some changes to AMDGPU were added by Valve and AMD for the Steam Deck. Could make sense if those benefit the AMD APU in the Surface Laptops. |
Which surface laptop do you have? On SL3 (AMD) I don't have an option for S3 (just s2idle) |
has this been fixed yet? |
Would love to see a fix for this. It's the only reason I keep windows as a dual boot option on my SL3 AMD. That, and maybe the touchscreen... |
@qzed @PandorasFox i have working amd sl3 i can provide. are either of you in the US? |
@k3mist Thank you for the offer, but unfortunately I live in Germany. |
@KyleGospo im ok with that if @qzed is interested |
@k3mist I could pitch in on shipping as well! |
Alright, I thought a bit about this, but I don't think that (at the moment at least) I'll have the time to properly look into all the issues on this device and do this justice. But thanks to all of you for the offers. Maybe someone else is interested in giving it a try? |
I would love to see linux properly working on this device, unfortunately i dont know anything about kernel debugging/development so the only thing i can offer is testing. Just sending this message to show interest. Thank you all for the hard work you do! |
Not sure when it happened, but this pretty much isn't an issue for me any more. Not sure when it happened but I'm pretty happy about it. All that's left is the touch screen for the SL3 AMD |
Absolutely agree with @TannerNelson16. The Surface Laptop 3 experience has only improved as Valve and AMD have continued to upstream their AMD APU-related code, with stability really taking a step up sometime around the release of kernel 6.2. I don't have touchscreen on my SL3 running Fedora Silverblue, although I'm not complaining. The high GPU memory usage with increased system uptime is a little concerning, with occasional screen tearing and lock up with high uptime. But a restart always resolves it. Sleep/wake is still a problem, but with Gnome configured not to sleep on power, it's not too bad. The lid switch is problematic, although if I recall correctly that is related to the hardware mechanism triggering shutdowns and suspends the kernel isn't aware and not handling. AMD Surface Laptop 3 running Linux 6.7.9-1.surface.fc39.x86_64 on Fedora Linux 39.20240311.0 (Silverblue). |
Generally the same overall, with a couple minor differences:
/usr/lib/firmware/ath10k/QCA6174L/hw(2.1|3.0)/board(-2).bin
. I'll have to check the exact 3-4 files again later.The second bit is the more annoying bit for right now; I'm not really sure what needs doing to fix that.
The text was updated successfully, but these errors were encountered: