Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

applespi: Re-enable SPI on resume #2

Merged
merged 1 commit into from Oct 2, 2016
Merged

Conversation

cgutman
Copy link
Contributor

@cgutman cgutman commented Sep 11, 2016

The firmware calls the SIEN ACPI method on boot to enable the
keyboard and trackpad. However, after resuming from sleep, the SPI
interface is disabled and we must call SIEN(1) to enable it.

Signed-off-by: Cameron Gutman aicommander@gmail.com

@cb22
Copy link
Owner

cb22 commented Sep 12, 2016

Hi Cameron - thanks for the pull request! Quick question, what hardware / configuration have you been testing on? The reason I ask is that I've been struggling a bit with suspend / resume in general.

I'm on the MacBook9,1 and booting off the internal NVMe drive. Suspending goes OK, but the laptop doesn't resume properly. After much digging, it turns out that the _PS0 method for the NVMe drive is returning 1 - see line 7116 of https://bugzilla.kernel.org/attachment.cgi?id=228521 and the kernel informs that the device did not change to D0 from D3. I've been meaning to ask @l1k about this, but unfortunately I haven't had much spare time to work on this recently.

Of course, if you're running from an external USB drive, this doesn't really apply; but perhaps there's a simple cmdline flag I'm missing.

@cgutman
Copy link
Contributor Author

cgutman commented Sep 12, 2016

I've been running from a USB disk, so I haven't hit the NVMe resume issue. I'm on 4.8-rc5 where I did hit a deadlock on suspend in brcmfmac which someone already found and fixed here: https://patchwork.kernel.org/patch/9280681/

I still do see the message complaining about a device not transitioning from D3 to D0 on resume, so I don't think it's been fixed in mainline yet. Since I wasn't using the NVMe drive, I never noticed it dropping off the bus after resume.

Edit: My machine is a MacBook9,1

@l1k
Copy link
Contributor

l1k commented Sep 12, 2016

@cgutman: You're calling SIEN on ->probe, not sure why this is needed, but I could imagine this makes sense e.g. if the driver was not loaded on boot (blacklisted), the machine suspended and the driver subsequently loaded manually.

However on the MBP12,1, if the controller was switched to USB mode by EFI and the USB driver has already loaded, things will break because you're taking the interface away from it. You need to test presence of UIST, if it exists call it and if the result is 1, bail out. And only afterwards call SIEN.

Please store the handle to the SIEN method in struct applespi_data so that the namespace doesn't have to be searched for it on every resume. (See l1k/linux@d65f187 for an example.)

The 50 ms delay is quite long, is this really necessary? The macOS driver doesn't seem to delay at all AFAICS, but maybe I've missed something.

With that addressed,
Reviewed-by: Lukas Wunner <lukas@wunner.de>

@l1k
Copy link
Contributor

l1k commented Sep 12, 2016

@cb22: That's not the _PS0 method of the NVMe drive but of the root port it's attached to. It checks the Data Link Active bit in the PCIe Link Status register and bails out if the bit doesn't revert to 1. What does lspci say before and after resume for the root port and the NVMe drive below it?

lspci -vvvv -s 00:1c.0 # RP01
lspci -vvvv -s 01:00.0 # SSD0, not sure if the bus number is correct

@l1k
Copy link
Contributor

l1k commented Sep 12, 2016

The behaviour of _PS0 changes depending on a byte named NVME in the GNVS region. The byte should be at address 0x8AF9C918 + offset 0xB2 + 42 bytes = 0x8AF9C9F4 (decimal 2331625972). Which value do you get if you type:
dd iflag=skip_bytes,count_bytes skip=2331625972 count=1 if=/dev/mem 2>/dev/null | hexdump

As a shot in the dark, EFI configures the machine in a special BootCamp mode for Windows unless the OS identifies as macOS using a proprietary EFI protocol before ExitBootServices is called. Perhaps the SSD is treated differently on suspend/resume depending on macOS vs. BootCamp mode. You could try one of the methods described at https://github.com/0xbb/gpu-switch#macbook-pro-113-and-115-notes to enable macOS mode and see if this fixes suspend/resume for the SSD.

@cb22
Copy link
Owner

cb22 commented Sep 12, 2016

The behaviour of _PS0 changes depending on a byte named NVME in the GNVS region. The byte should be at address 0x8AF9C918 + offset 0xB2 + 42 bytes = 0x8AF9C9F4 (decimal 2331625972). Which value do you get if you type:

I did that, and got:

[root@localhost ~]#  dd iflag=skip_bytes,count_bytes skip=2331625972 count=1 if=/dev/mem 2>/dev/null | hexdump
0000000 0001                                   
0000001

Which makes sense to me - considering the return 1 branch happens when NVME == One.

As a shot in the dark, EFI configures the machine in a special BootCamp mode for Windows unless the OS identifies as macOS using a proprietary EFI protocol before ExitBootServices is called. Perhaps the SSD is treated differently on suspend/resume depending on macOS vs. BootCamp mode. You could try one of the methods described at https://github.com/0xbb/gpu-switch#macbook-pro-113-and-115-notes to enable macOS mode and see if this fixes suspend/resume for the SSD.

Gave that a shot, unfortunately no difference. I tried both with the EFI application method, and by patching the kernel - just to make sure it was indeed being called.

@cb22: That's not the _PS0 method of the NVMe drive but of the root port it's attached to. It checks the Data Link Active bit in the PCIe Link Status register and bails out if the bit doesn't revert to 1. What does lspci say before and after resume for the root port and the NVMe drive below it?

Ah, yes indeed. Here are the lspcis before suspend:

[root@localhost ~]# lspci -vvvv -s 00:1c.0
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: c1700000-c17fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <1us, L1 <16us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x2, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 7270
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v0] #00
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [200 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
        Capabilities: [220 v1] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

[root@localhost ~]# lspci -vvvv -s 01:00.0 # SSD0, not sure if the bus number is correct
01:00.0 Mass storage controller: Apple Inc. Device 2003 (rev 11) (prog-if 02)
        Subsystem: Apple Inc. Device 2003
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at c1700000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <32us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <8us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [168 v1] #19
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=25us PortTPowerOnTime=10us

and after resume:

[root@localhost ~]# lspci -vvvv -s 00:1c.0
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: c1700000-c17fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <16us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train+ SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 7270
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v0] #00
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [200 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
        Capabilities: [220 v1] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

[root@localhost ~]# lspci -vvvv -s 01:00.0 # SSD0, not sure if the bus number is correct
01:00.0 Mass storage controller: Apple Inc. Device 2003 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
```.

@l1k
Copy link
Contributor

l1k commented Sep 12, 2016

This looks like the SSD isn't powered on after resume, PCIe link is down and fails to link-train. I'll have to do some research on this first, stand by.

@l1k
Copy link
Contributor

l1k commented Sep 13, 2016

One odd thing in the lspci output is that the Link Capabilities Register of the Root Port lists support for ASPM L0s after resume but not before. But it seems more likely to me that the SSD wasn't powered up.

The DSDT for the MacBook8,1 contains this for _PS3:

                        Store (0x00, GD38) /* \GD38 */
                        Sleep (0x01)
                        Store (0x00, GD33) /* \GD33 */
                        Sleep (0x0A)

And this for _PS0:

                        Store (0x01, GD33) /* \GD33 */
                        Sleep (0x0A)
                        Store (0x01, GD38) /* \GD38 */
                        Sleep (0x01)

GPIO pin 38 resets the SSD, GPIO pin 33 shuts down several power rails for suspend or brings them up for resume.

On the MacBook9,1, the GPIO pins are apparently no longer mapped into SystemIO space but into SystemMemory space, hence it looks a little different there but likely does the same thing. _PS3:

                            SGDO (0x02070001)
                            SGOV (0x02070001, Zero)
                            Sleep (0x0A)
                            SGDO (0x02040016)
                            SGOV (0x02040016, Zero)
                            Sleep (One)

_PS0:

                        SGDI (0x02040016)
                        Sleep (One)
                        SGDI (0x02070001)
                        Sleep (0x0B)

Notably, if NVME is set to a different value than 1, these GPIO pins are not toggled. You could try this with:

echo -n 2 | dd oflag=seek_bytes seek=2331625972 of=/dev/mem

Perhaps Apple used this for debugging, I don't know. Please do not hold me liable if this damages your hardware or erases your data. (It's probably safe.) Power consumption may of course be higher during sleep if the SSD wasn't powered down.

Are you sure that _PS0 is executed at all after resume? You may want to add some debug messages to drivers/pci/pci-acpi.c, specifically in acpi_pci_power_manageable() and acpi_pci_set_power_state(), to see if the PCI core calls this before trying to put the device in D0.

@cb22
Copy link
Owner

cb22 commented Sep 13, 2016

Notably, if NVME is set to a different value than 1, these GPIO pins are not toggled. You could try this with:

Hmm, so that's strange, toggling that seems to have no effect.

Are you sure that _PS0 is executed at all after resume

I thought it might be something strange in the PCI resume code, so I've also tested by just using acpi_call and calling _PS0 and _PS3 directly. What's weird is that changing the byte seems to have no effect:

[root@localhost acpi]# echo -e -n \\x00 | dd oflag=seek_bytes seek=2331625972 of=/dev/mem
0+1 records in
0+1 records out
1 byte copied, 0.000109894 s, 9.1 kB/s

[root@localhost acpi]#  dd iflag=skip_bytes,count_bytes skip=2331625972 count=1 if=/dev/mem 2>/dev/null | hexdump
0000000 0000                                   
0000001

[root@localhost acpi]# echo \\_SB_.PCI0.RP01._PS3 > call; cat call
0x0called

[root@localhost acpi]# echo \\_SB_.PCI0.RP01._PS0 > call; cat call
0x1called

So _PS0 is still returning 1, which can only happen in the NVME == One branch. (the call also takes a while to run, which I'm presuming is due to the while loop waiting for the LACT status to change.

Also interesting, calling \_SB_.PCI0.RP01._PS0 without calling \_SB_.PCI0.RP01._PS3 returns immediately with 0. But if I do the byte value change, \_SB_.PCI0.RP01._PS0 returns immediately with 1

I'll add some debug stores to the DSDT and see if I can figure out what's going on

@l1k
Copy link
Contributor

l1k commented Sep 14, 2016

Another idea would be to look at the macOS and Windows drivers, perhaps they do something special to resume the SSD. The DSDT sets device properties for the SSD called deep-idle and nvme-LPSR-during-S3-S4. Notably these are only set if NVME is set to 1, i.e. if the GPIO pins for reset and powerdown are toggled.

If you egrep -i '(deep-idle|nvme-LPSR-during-S3-S4)' /System/Library/Extensions, do you find anything? I can't find matches on my macOS installation but I have a different machine and macOS release.

The firmware calls the SIEN ACPI method on boot to enable the
keyboard and trackpad. However, after resuming from sleep, the SPI
interface is disabled and we must call SIEN(1) to enable it.

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
@cgutman
Copy link
Contributor Author

cgutman commented Sep 16, 2016

@l1k

You're calling SIEN on ->probe, not sure why this is needed, but I could imagine this makes sense e.g. if the driver was not loaded on boot (blacklisted), the machine suspended and the driver subsequently loaded manually.

Yep, exactly why I added it there.

However on the MBP12,1, if the controller was switched to USB mode by EFI and the USB driver has already loaded, things will break because you're taking the interface away from it. You need to test presence of UIST, if it exists call it and if the result is 1, bail out. And only afterwards call SIEN.

Good catch, thanks. I've changed it to just return -ENODEV in probe() if the USB interface is enabled. I'm not sure of the real pros and cons (maybe power consumption?) of driving it via USB vs. SPI but for now, the USB HID implementation is more feature complete.

Please store the handle to the SIEN method in struct applespi_data so that the namespace doesn't have to be searched for it on every resume. (See l1k/linux@d65f187 for an example.)

Done.

The 50 ms delay is quite long, is this really necessary? The macOS driver doesn't seem to delay at all AFAICS, but maybe I've missed something.

I did find the delay was necessary. I added a call to SIST to skip the SIEN call and the delay if SPI is already enabled. What I saw without it (or a smaller delay), the trackpad would not reliably be put into multitouch mode successfully via applespi_init(). Keyboard and mouse clicks would still work, but pointer movement would not. This would happen randomly coming out of sleep or stressing the probe path with "rmmod applespi && insmod ./applespi.ko" over and over (before I added the SIST check). By my rough count, I'd say it failed 10-20% of the time the system resumed or the module loaded without the delay in place.

I didn't dig too deeply, but maybe the response from the device in applespi_init() may tell us that we need to retry or something.

@cgutman cgutman mentioned this pull request Sep 16, 2016
@l1k
Copy link
Contributor

l1k commented Sep 17, 2016

Thanks @cgutman, it's good to see this driver move forward and gain features. Who knows, maybe the upcoming MacBook Pros and Airs will move to SPI as well, it's important to be prepared and have this in as much a mainline-able form as possible.

Reviewed the changes again, the only thing that I spotted is that it might make sense to disable the SPI interface in the ->suspend hook. That way it would be off if the user invokes suspend-to-idle (which powers down devices but doesn't ask the platform to go to S3), but it would only be of value of it actually saves power. On the other hand, maybe it's necessary to keep the SPI interface enabled to be able to wake up from suspend-to-idle. So I'm not sure about it. Another thing to keep in mind is that UNIVERSAL_DEV_PM_OPS also wires up the runtime PM hooks. So if e.g. the user closes the lid but otherwise leaves the machine running, we could turn off the keyboard in ->runtime_suspend. Again, that only makes sense if it saves power. It's just something that crossed my mind while reading your code.

@cb22
Copy link
Owner

cb22 commented Sep 20, 2016

I didn't dig too deeply, but maybe the response from the device in applespi_init() may tell us that we need to retry or something.

I'm sure it's something like that; the macOS driver does other things like CRC checking and validating the responses. I did record the correct response to the modeswitch command, so that can be used to make the init process much more robust. There are also a ton of other commands that I haven't fully decoded that do things like get the version, name, etc from the touchpad.

I'm happy to merge this in as-is, and we can revisit the msleep once the init code checks for success and retries on failure.

So if e.g. the user closes the lid but otherwise leaves the machine running, we could turn off the keyboard in ->runtime_suspend. Again, that only makes sense if it saves power

Perhaps we should try and profile it; I know that the touchpad still responds to clicks when suspended normally in macOS, so I think even with the SPI interface disabled the touchpad is still running - at least in some form.

@cb22
Copy link
Owner

cb22 commented Sep 20, 2016

Another idea would be to look at the macOS and Windows drivers, perhaps they do something special to resume the SSD. The DSDT sets device properties for the SSD called deep-idle and nvme-LPSR-during-S3-S4. Notably these are only set if NVME is set to 1, i.e. if the GPIO pins for reset and powerdown are toggled.

I've tried replacing the _PS3 and _PS0 methods for RP01 with a single line that just returns 0x02; calling this with acpi_call yields the correct value, indicating that the modified DSDT is being used, but on suspend / resume the exact same behavior as before seems to happen. Is the modified DSDT table perhaps ignored when suspending / resuming?

If you egrep -i '(deep-idle|nvme-LPSR-during-S3-S4)' /System/Library/Extensions, do you find anything? I can't find matches on my macOS installation but I have a different machine and macOS release.

There are indeed matches. I've been reversing the macOS driver, and so far from what I can see it basically just shuts down the NVMe side of things before calling the _PS3 method. This led me to try unload the driver - and this is actually broken. Doing a simple rmmod nvme (with no suspend / resume involved) takes a while and leads to a message about shutdown failing. Perhaps the NVMe drive can only be woken up if it has been shutdown properly?

@cgutman
Copy link
Contributor Author

cgutman commented Sep 21, 2016

I'm fine merging it as is here.

As an aside, we should probably come up with some better way of communicating to work through these issues with the new MacBooks. I don't care whether it's via some mailing list, (slightly) abusing this repo's bug tracker, or creating a separate repo to file issues against. PRs can be a bit difficult to find after the fact, especially if you don't know what you're looking for.

@l1k
Copy link
Contributor

l1k commented Sep 21, 2016

I've tried replacing the _PS3 and _PS0 methods for RP01 with a single line that just returns 0x02; calling this with acpi_call yields the correct value, indicating that the modified DSDT is being used, but on suspend / resume the exact same behavior as before seems to happen. Is the modified DSDT table perhaps ignored when suspending / resuming?

Don't think so, but you'd have to ask the experts on linux-acpi@ to be sure.

When _PS3 or _PS0 is called from acpi_dev_pm_explicit_set(), the buffer for the return value is set to NULL, i.e., it's ignored. So it's irrelevant for the kernel which value is returned from _PS0. This is in accordance with the ACPI spec which says that _PS0's return value is "None".

The problem is that _PS0 does not or cannot complete all steps because it returns early. And it does so because the PCIe link to the SSD doesn't come up. Either the SSD isn't powered up at all or it's somehow stuck or not initialized.

There are indeed matches. I've been reversing the macOS driver, and so far from what I can see it basically just shuts down the NVMe side of things before calling the _PS3 method. This led me to try unload the driver - and this is actually broken. Doing a simple rmmod nvme (with no suspend / resume involved) takes a while and leads to a message about shutdown failing. Perhaps the NVMe drive can only be woken up if it has been shutdown properly?

Does the SSD driver call _PS3 directly, or via IOPCIDevice::setPCIPowerState() or AppleACPIPlatformExpert::setDevicePowerState()?

On Linux, the device hierarchy is walked bottom-up during the suspend phase and top-down during the resume phase. If _PS0 and _PS3 is present, the PCI core should automatically regard the device as platform_pci_power_manageable() and ask ACPI to invoke them via platform_pci_set_power_state(). I'd expect macOS to work mostly the same but perhaps it doesn't and we have an ordering problem somewhere.

With Thunderbolt, we did have an issue with the Cactus Ridge (2012) generation of controllers which have to be transitioned into a suspend state ("Go2Sx") by calling a number of custom ACPI methods in a specific order, this is in drivers/pci/quirks.c:quirk_apple_poweroff_thunderbolt(). If this is omitted, the controller apparently is gone after resume. This was specific to only one generation of these (Intel-designed) controllers, but it shows that such issues do exist. The SSD seems to be a custom design by Apple dubbed S1-X.

@cb22
Copy link
Owner

cb22 commented Oct 2, 2016

I'm fine merging it as is here.

Great, merging.

As an aside, we should probably come up with some better way of communicating to work through these issues with the new MacBooks. I don't care whether it's via some mailing list, (slightly) abusing this repo's bug tracker, or creating a separate repo to file issues against. PRs can be a bit difficult to find after the fact, especially if you don't know what you're looking for.

I agree, I've been (ab)using these PRs a bit to discuss matters and ask questions that really have nothing to do with the trackpad / keyboard. Perhaps there's already a suitable mailing list? Otherwise I'm open to suggestions.

@cb22
Copy link
Owner

cb22 commented Oct 2, 2016

When _PS3 or _PS0 is called from acpi_dev_pm_explicit_set(), the buffer for the return value is set to NULL, i.e., it's ignored. So it's irrelevant for the kernel which value is returned from _PS0. This is in accordance with the ACPI spec which says that _PS0's return value is "None".

Oh, sure. I was just returning a value to verify manually (via acpi_call) that my custom DSDT was being called and not the original one.

Does the SSD driver call _PS3 directly, or via IOPCIDevice::setPCIPowerState() or AppleACPIPlatformExpert::setDevicePowerState()?

I believe it is called directly. I'm quite new when it comes to disassembly, so I could be mistaken.

With Thunderbolt, we did have an issue with the Cactus Ridge (2012) generation of controllers which have to be transitioned into a suspend state ("Go2Sx") by calling a number of custom ACPI methods in a specific order, this is in drivers/pci/quirks.c:quirk_apple_poweroff_thunderbolt(). If this is omitted, the controller apparently is gone after resume. This was specific to only one generation of these (Intel-designed) controllers, but it shows that such issues do exist. The SSD seems to be a custom design by Apple dubbed S1-X.

One thing I did notice, after skimming through the NVMe spec and toying with the NVMe driver, was that attempting to shutdown the controller by removing queues and setting the appropriate register (which is what the Linux driver does on suspend / removal of device; page 182 of the spec) actually leads to the controller indicating a fatal status condition (page 217 / page 45). Interestingly, the controller won't work after this when trying to reload the driver (ie, just doing a simple rmmod nvme && modprobe nvme fails.)

@cb22 cb22 merged commit 6641d58 into cb22:master Oct 2, 2016
@l1k
Copy link
Contributor

l1k commented Oct 5, 2016

@cb22: Interesting find, thanks! The SSD is supposed to be reset by the nvme_resume() hook through nvme_reset_work(), but that doesn't work because the PCI device isn't present, it's as if the drive wasn't powered up. Your theory that something is wrong with the shutdown procedure is quite plausible. The proper mailing list to discuss this would be linux-nvme@lists.infradead.org, please cc: me as I'd be interested to learn what's going wrong there. However if this issue is caused by an Apple-specific quirk, the folks on that list probably won't be able to help and the only solution I see is to disassemble the macOS or Windows driver and see what it's doing when the system is going to sleep. Unfortunately there's no NVMe driver on my older OS X release, otherwise I'd try to find that out myself. If you don't know which driver is used, either run kextstat or alternatively ioreg -l (search for SSD0, then look at the CFBundleIdentifier of the nodes below it).

@cb22
Copy link
Owner

cb22 commented Oct 9, 2016

@l1k I think I've found something worth mentioning; it seems like the macOS driver executes a vendor specific NVMe admin command with the opcode 0xCC in the helpfully titled BuildCommandPrepareForShutdown (I managed to figure this out by looking at a known NVMe command such as identify, opcode 0x06, called in BuildCommandIdentify). This in turn is called from HandlePrepareForLPSR.

According to the NVMe spec, the 0xC0 - 0xFF range is reserved for vendor specific commands. (see page 54 of the spec (v1.0 spec as that's what the controller reported in ioreg -l))

The chain goes something like this: ShutdownNVMe disables the submission and completion queues, then calls SendShutdownNotification. SendShutdownNotification I haven't managed to figure out yet (think there's some vtable magic going on). It then polls the CSTS status and returns it. After that, ToggleLPSRGPIO is called (which calls the _PS3 or _PS0 method)

My guess is whatever SendShutdownNotification does, it ends up calling HandlePrepareForLPSR. I'm going to try and see if calling this opcode before toggling the LPSR GPIO will have any effect.

@l1k
Copy link
Contributor

l1k commented Oct 10, 2016

Nice job, looks like you're fairly close, @cb22.

Most of the kernel extensions are written in C++, or rather the subset of C++ supported by the runtime in the xnu kernel. The way to decipher these vtable calls is as follows: The SendShutdownNotification method is part of some class. You need to look at the constructor of that class, it will contain a call to OSObject::OSObject() and then a load instruction containing the vtable address of the class. Now you know where the vtable for these objects is. The actual vtable starts at offset 0x10 from that address. When an object method is called, you add the offset given in the call to the vtable address + 0x10.

@kelnos
Copy link

kelnos commented Aug 27, 2018

Sorry to add noise and resurrect this thread, but I'm wondering if anyone is still working on overall suspend/resume (I assume the NVMe issue is still the blocker)? I have a 2016 MacBookPro13,1, and it would be great if suspend would work. I'd be happy to pitch in if it's possible to help on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants