bbswitch is broken with kernel 4.8 pcie port power management #140

Open
nathanielwarner opened this Issue Oct 9, 2016 · 99 comments

@nathanielwarner

I just upgraded to kernel 4.8, and bbswitch 0.8-1 no longer works properly. When I try to run something with primusrun, it fails with an error along the lines of "bumblebee could not enable discrete graphics card", and I get this in dmesg:

bbswitch: enabling discrete graphics
pci 0000:01:00.0: Refused to change power state, currently in D3
pci 0000:01:00.0: Refused to change power state, currently in D3

When I use the kernel command line option pcie_port_pm=off primusrun works again, and I get this in dmesg upon using primusrun:

bbswitch: enabling discrete graphics
nvidia-nvlink: Nvlink Core is being initialized, major device number 242
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  370.28  Thu Sep  1 19:45:04 PDT 2016
vgaarb: this pci device is not a vga device
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  370.28  Thu Sep  1 19:18:48 PDT 2016
nvidia-modeset: Allocated GPU:0 (GPU-33c835cf-d564-600a-037b-c7ecb9188d7c) @ PCI:0000:01:00.0
nvidia-modeset: Freed GPU:0 (GPU-33c835cf-d564-600a-037b-c7ecb9188d7c) @ PCI:0000:01:00.0
vgaarb: this pci device is not a vga device
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
nvidia-modeset: Unloading
nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
bbswitch: disabling discrete graphics
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
pci 0000:01:00.0: Refused to change power state, currently in D0

Is anyone else seeing this lack of support for the default kernel 4.8 configuration? I'm running Manjaro with kernel 4.8.1-1, NVIDIA driver 370.28, and bbswitch 0.8-1.

Lekensteyn Oct 10, 2016

Member

bbswitch has indeed not been updated for the new PM method in kernel 4.8. If you have a newer machine (>= 2015), you might experience issues if you enabled runtime PM for devices.

Do you happen to have udev rules or other "laptop mode tools" that enable power saving features (i.e. by writing auto to the power/control node in sysfs)? It is my current belief that your problem cannot occur unless you enable such power saving methods.

As a workaround you can boot with the pcie_port_pm=off kernel option (or disable runtime PM for the NVIDIA PCI device or its parent PCIe port).
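
To see what runtime PM is currently set to, something like the following Python sketch may help. It is not part of bbswitch. It assumes the usual sysfs layout, where each PCI device directory has a power/control file containing auto or on, and where the parent PCIe port is the parent directory of the device's resolved sysfs path. The address 0000:01:00.0 is the one from the dmesg output above; the function names are made up for illustration.

```python
import os


def runtime_pm_control(device_dir):
    """Return the content of power/control ('auto' or 'on'), or None if unreadable."""
    path = os.path.join(device_dir, "power", "control")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def device_and_parent(pci_address, sysfs_root="/sys"):
    """Return the resolved sysfs directories of a PCI device and its parent port."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address)
    device_dir = os.path.realpath(link)  # follow the symlink into /sys/devices/...
    return device_dir, os.path.dirname(device_dir)


if __name__ == "__main__":
    # Address taken from the logs in this thread; adjust for your machine.
    dev, parent = device_and_parent("0000:01:00.0")
    for label, directory in (("GPU", dev), ("parent port", parent)):
        print(label, directory, "power/control =", runtime_pm_control(directory))
```

A value of auto means runtime PM is enabled for that device; writing on to the same file (as root) disables it.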

nathanielwarner Oct 10, 2016

I am using TLP and Powertop, but bbswitch still doesn't work with those disabled. The strange thing is that bbswitch seems to think the NVIDIA card is stuck in D0 power state on startup, but then is unable to start it upon invocation of primusrun, and reports that the card is stuck in D3.
On startup, I get these messages:

[    8.164115] bbswitch: version 0.8
[    8.164123] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[    8.164132] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[    8.164148] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[    8.164326] bbswitch: detected an Optimus _DSM function
[    8.164338] bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF
[    8.164341] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[    8.164941] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[    8.183647] nvidia-modeset: Unloading
[    8.200285] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    8.200287] Bluetooth: BNEP filters: protocol multicast
[    8.200293] Bluetooth: BNEP socket layer initialized
[    8.200384] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[    8.221637] bbswitch: disabling discrete graphics
[    8.221655] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[    8.236787] pci 0000:01:00.0: Refused to change power state, currently in D0

And on invocation of primusrun, I get this:

[  225.420138] bbswitch: enabling discrete graphics
[  225.496646] pci 0000:01:00.0: Refused to change power state, currently in D3
[  225.573232] pci 0000:01:00.0: Refused to change power state, currently in D3

Is it possible that the kernel-based port power management is able to control the power state, but bbswitch is not? This would make sense because on pre-4.8 kernel versions there is no kernel-based PCIe port power management, and the card seems to always be stuck in D0 (see dmesg output in my first post), and bbswitch has no problems this way. It is when the card is successfully put into D3 that bbswitch is unable to use it.

Lekensteyn Oct 10, 2016

Member

Have you rebooted after disabling TLP? The PCIe port power management introduced with 4.8 cannot be combined with bbswitch in one boot; that case is not supported (it may or may not work, no guarantees).

Have you tried the kernel option which I mentioned above? What are your laptop and GPU, by the way?

If you see the messages "bbswitch: enabling discrete graphics" followed by "Refused to change power state, currently in D3" (or similarly, "disabling" and "currently in D0"), then it is an indication that something went wrong...
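
As a rough illustration of that check, here is a minimal Python sketch that scans dmesg lines for a bbswitch request followed by a matching "Refused to change power state" message. The message strings are copied from the logs in this thread; the function name and the return format are only assumptions for the example.

```python
import re

REFUSED = re.compile(r"Refused to change power state, currently in (D\d)")


def find_failed_switches(lines):
    """Return (action, state) pairs where a bbswitch request was followed by a refusal."""
    failures = []
    pending = None  # last bbswitch action seen: "enabling" or "disabling"
    for line in lines:
        if "bbswitch: enabling discrete graphics" in line:
            pending = "enabling"
        elif "bbswitch: disabling discrete graphics" in line:
            pending = "disabling"
        else:
            match = REFUSED.search(line)
            if match and pending:
                state = match.group(1)
                # Enabling should leave D3; disabling should leave D0.
                if (pending, state) in (("enabling", "D3"), ("disabling", "D0")):
                    failures.append((pending, state))
                    pending = None  # report each request only once
    return failures
```

Feeding it the lines of a saved dmesg dump gives an empty list when no such pair is present.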

nathanielwarner Oct 10, 2016

Yes, I rebooted after disabling tlp and powertop. As I said above, when I boot with the kernel option you mention, primusrun works again, but bbswitch reports that it cannot change the GPU power state out of D0 when primusrun stops running. That happened with earlier kernels as well. And yes, something is clearly going wrong. Maybe this has actually been a problem all along and is only showing itself now because the kernel is taking the dGPU out of D0.
My laptop is a Dell XPS 15 9550; the GPUs are Intel HD 530 + GeForce GTX 960M.

rockorequin Oct 12, 2016

Odd, I have exactly the same laptop and I'm using tlp and powertop, but I don't get this problem until after a suspend/resume cycle. Or is this issue fixed in bbswitch 0.8.4ubuntu1?

nathanielwarner Oct 12, 2016

You're sure you have the exact same model, and that you're running kernel 4.8? It's possible you have a different BIOS than me (in case you don't know, Dell has been rapidly pushing out BIOS updates to try to fix an alarming number of issues; many of the updates have made things worse, so I'm currently on an older BIOS). It's also possible that the issue is fixed in bbswitch 0.8.4ubuntu1. Maybe I'll try Ubuntu and see if that works better.

rockorequin Oct 12, 2016

Yes, it's the same model, with the same GPUs. It has the 4K screen and I'm running the 1.2.0 BIOS (I tried 1.2.14, but it has a screen flickering problem which makes it unusable.) It's possible I disabled tlp and powertop power management and forgot, of course.

nathanielwarner Oct 12, 2016

I'm in the exact same situation as you, on 1.2.0 (Seriously, Dell needs to get their act together!)
Since disabling tlp and powertop didn't solve it for me, my only guesses are that the version of bbswitch you have is newer than mine, or that you are running a different kernel version (pre 4.8).

rockorequin Oct 12, 2016

I'm running the mainline 4.8 kernel also (with the patch from https://bugs.freedesktop.org/show_bug.cgi?id=97596 to avoid a weird flickering artefact that occurs on Skylake architecture with 4.8 if you have a second monitor attached).

nathanielwarner Oct 12, 2016

Ok, I'll probably try Ubuntu with the mainline kernel at some point to see if that fixes the issue. Until then, should this issue be closed?

Lekensteyn Oct 17, 2016

Member

@rockorequin Perhaps you are using nouveau instead of bbswitch? Personally I am back to nouveau since my new laptop requires it for an external monitor.

nathanielwarner Oct 18, 2016

I actually did try it with Ubuntu 16.10 with Kernel 4.8, and it is fixed. There must be something internal to Manjaro that is screwing it up.

ArchangeGabriel Oct 21, 2016

Member

I’m reopening this because, even if it seems to work (i.e. it reports OFF), on my setup the power consumption and temperature correspond to a card that is ON. Adding pcie_port_pm=off to the kernel parameters solves it.

When using nouveau, temperature and power consumption also correspond to an ON card.

Lekensteyn Oct 21, 2016

Member

The result of combining the DSM method (as used by bbswitch) with the new power resources method (as used since Linux 4.8 and nouveau) in a single boot is not known (I would call it undefined behavior). Forcing pcie_port_pm=off basically reverts to the DSM method.

How do you observe that the video card is off with nouveau? You have to check your dmesg for the last messages related to nouveau. If you see "DRM: resuming kernel object tree" with no "suspending console" as a follow-up, then you know something is keeping the device busy.
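
A minimal Python sketch of that dmesg check, for illustration only. It looks at nouveau lines and reports whether the most recent of the two marker messages was the suspend one or the resume one. The marker strings are the ones quoted in this comment; the function name and return values are hypothetical.

```python
def nouveau_last_transition(lines):
    """Return 'suspend', 'resume', or None, based on the last nouveau marker message."""
    state = None
    for line in lines:
        if "nouveau" not in line:
            continue
        if "DRM: suspending console" in line:
            state = "suspend"  # first message of a runtime suspend sequence
        elif "DRM: resuming kernel object tree" in line:
            state = "resume"   # the device was woken up again
    return state
```

A result of "resume" would suggest that something woke the card and it has not been suspended since.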

ArchangeGabriel Oct 22, 2016

Member

I do get those lines at the end:

kernel: nouveau 0000:01:00.0: DRM: suspending console...
kernel: nouveau 0000:01:00.0: DRM: suspending display...
kernel: nouveau 0000:01:00.0: DRM: evicting buffers...
kernel: nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
kernel: nouveau 0000:01:00.0: DRM: suspending client object trees...
kernel: nouveau 0000:01:00.0: DRM: suspending kernel object tree...

That being said, I probably need to do some more investigation (power consumption with bbswitch vs nouveau vs nothing, each with and without pcie_port_pm=off) to properly determine what works and what does not, and then start reporting bugs against kernel/nouveau.

ArchangeGabriel Oct 22, 2016

Member

Also, @Lekensteyn, I grabbed this at some point. If I remember correctly, it was while running a boot without pcie_port_pm=off on my newer machine and trying to echo OFF to bbswitch after seeing the temperature increase:

[13827.423220] bbswitch: disabling discrete graphics
[13827.423230] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[13827.424013] ------------[ cut here ]------------
[13827.424017] WARNING: CPU: 3 PID: 2343 at drivers/pci/pci.c:1616 pci_disable_device+0xa8/0xd0
[13827.424018] pci 0000:01:00.0: disabling already-disabled device
[13827.424019] Modules linked in:
[13827.424019]  bbswitch(O) mousedev snd_hda_codec_conexant snd_hda_codec_generic hid_generic arc4 msr iTCO_wdt i2c_designware_platform hp_wmi iTCO_vendor_support i2c_designware_core mxm_wmi joydev sparse_keymap nls_iso8859_1 $
[13827.424049]  evdev hp_wireless ac mac_hid tpm_tis acpi_pad tpm_tis_core tpm sch_fq_codel ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_p$
[13827.424077] CPU: 3 PID: 2343 Comm: tee Tainted: G        W  O    4.8.2-1-ARCH #1
[13827.424078] Hardware name: HP HP ZBook Studio G3/80D4, BIOS N82 Ver. 01.07 04/27/2016
[13827.424079]  0000000000000286 00000000be2b784c ffff8808181bbcf8 ffffffff812fe280
[13827.424081]  ffff8808181bbd48 0000000000000000 ffff8808181bbd38 ffffffff8107c85b
[13827.424083]  0000065000000000 ffff88089b30c000 ffff88089b2fefa0 00007ffd831c7e40
[13827.424086] Call Trace:
[13827.424090]  [<ffffffff812fe280>] dump_stack+0x63/0x83
[13827.424092]  [<ffffffff8107c85b>] __warn+0xcb/0xf0
[13827.424093]  [<ffffffff8107c8df>] warn_slowpath_fmt+0x5f/0x80
[13827.424094]  [<ffffffff8134bf6b>] ? __pci_set_master+0x3b/0xf0
[13827.424096]  [<ffffffff8134ee98>] pci_disable_device+0xa8/0xd0
[13827.424098]  [<ffffffffa06a548d>] bbswitch_off+0xad/0x240 [bbswitch]
[13827.424100]  [<ffffffffa06a5870>] bbswitch_proc_write+0xb0/0xc7 [bbswitch]
[13827.424102]  [<ffffffff81276f82>] proc_reg_write+0x42/0x70
[13827.424104]  [<ffffffff812087b7>] __vfs_write+0x37/0x140
[13827.424107]  [<ffffffff810c7b87>] ? percpu_down_read+0x17/0x50
[13827.424108]  [<ffffffff81209586>] vfs_write+0xb6/0x1a0
[13827.424109]  [<ffffffff8120aa05>] SyS_write+0x55/0xc0
[13827.424111]  [<ffffffff815f7cf2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[13827.424112] ---[ end trace 4f6318674a3d9756 ]---

Will try to reproduce, but I think this is likely caused by bbswitch/pcie_port_pm interaction on 4.8 for newer systems.

Lekensteyn Oct 22, 2016

Member

For your last issue, if you have some udev rule enabling runtime PM for devices (e.g. "laptop mode tools"), then it will indeed upset bbswitch under the new behavior (4.8 without pcie_port_pm=off on newer laptops).

nathanielwarner Oct 22, 2016

I should point out that if you're using nouveau (rather than nvidia), you actually don't need bumblebee or bbswitch: you can just set DRI_PRIME=1 before the app you want to run with the discrete GPU. See https://wiki.archlinux.org/index.php/PRIME

ArchangeGabriel Oct 22, 2016

Member

@nathanielwarner If you’re telling that to me, I assure you that I know. ;) But that’s not really related to the current issue.

@Lekensteyn OK, I’ve got tlp installed (and running) on the same system, I’ll also try with or without it to see what it gives. So that’s one more factor to try. Should have time tomorrow to look at all that. :)

On a side note, do you still intend to update bbswitch for supporting this new method any time soon? Maybe we should release Bumblebee 4.0 without waiting much further and add a release note about bbswitch state (interaction with 4.8 pcie_port_pm, open/known issues). ;)

Lekensteyn Oct 22, 2016

Member

"intend to update bbswitch for supporting this new method"
Yes.

"any time soon?"
No (time constraints). nouveau seems to work, so I have not really prioritized it here.

I was hoping to get this fixed before Bumblebee 4, but it seems things are really stalling, so maybe it is better to release it since it at least improves the nvidia driver situation. Release note with known issues should be ok :)

bluca Oct 22, 2016

Member

+1 for a new release, Debian 9 deadlines are approaching fast :-)

ArchangeGabriel Oct 23, 2016

Member

OK, I’ll go through all open issues soon (help appreciated) and will try to release by the end of the week. Stay tuned. If there is any need for discussion, Bumblebee-Project/Bumblebee#319 is the place to go now. ;)

GreatBigWhiteWorld Oct 31, 2016

I see that in your bumblebee 4 issue, you are delaying its release for another few weeks. So I need to get this to work even temporarily.

If I understand correctly, I need to add "pcie_port_pm=off" to the GRUB configuration as a kernel parameter, and the drawback is that I am constantly running on the NVIDIA card, right?

Thanks in advance.

Lekensteyn Oct 31, 2016

Member

@GreatBigWhiteWorld pcie_port_pm=off is a workaround that allows you to use bbswitch with kernel 4.8 and newer. If you use older kernels, you do not need that option.

If you use nouveau (and not bbswitch nor the nvidia proprietary driver), then you do not have to do anything.
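
To confirm that the option is actually active after a reboot, a small Python sketch like this could parse the kernel command line. It assumes /proc/cmdline is readable and does not handle quoted parameters containing spaces; returning the last occurrence of a repeated option is a choice made for this example, and the function name is hypothetical.

```python
def kernel_option(cmdline, name):
    """Return the value of name=value, '' for a bare flag, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""  # keep the last occurrence
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("pcie_port_pm =", kernel_option(f.read(), "pcie_port_pm"))
    except OSError:
        print("could not read /proc/cmdline")
```

If this prints off, the workaround is in effect for the current boot; None means the option was not passed.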

GreatBigWhiteWorld Nov 1, 2016

Thanks. Yes, I am running a 4.8 kernel and bbswitch is always off at the moment. I guess I need this option.

ademcal Nov 13, 2016

I have the same problem, and it was solved with the parameter pcie_port_pm=off. My laptop is a Dell 7559 running openSUSE Tumbleweed.

The dmesg output is below:

[    7.982013] ------------[ cut here ]------------
[    7.982017] WARNING: CPU: 6 PID: 1550 at ../drivers/pci/pci.c:1616 pci_disable_device+0xa1/0xd0
[    7.982018] pci 0000:02:00.0: disabling already-disabled device
[    7.982019] Modules linked in:
[    7.982019]  af_packet nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit bnep nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_generic videodev usbhid btusb btrtl snd_hda_codec_hdmi dell_led arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec hid_multitouch kvm_intel snd_hda_core kvm snd_hwdep irqbypass iwlmvm snd_pcm iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul mac80211 snd_seq crc32c_intel ghash_clmulni_intel i2c_designware_platform snd_seq_device snd_timer i2c_designware_core aesni_intel idma64 virt_dma iwlwifi dell_wmi aes_x86_64 sparse_keymap lrw dell_smbios glue_helper dcdbas dell_smm_hwmon ablk_helper cryptd rtsx_pci_ms hci_uart
[    7.982039]  snd ip6t_REJECT nf_reject_ipv6 btbcm memstick pcspkr i2c_i801 mei_me cfg80211 i2c_smbus mei intel_lpss_pci int3403_thermal btqca xt_tcpudp soundcore joydev btintel nf_conntrack_ipv6 battery pinctrl_sunrisepoint bluetooth nf_defrag_ipv6 ac pinctrl_intel intel_lpss_acpi intel_lpss fan processor_thermal_device int3402_thermal int340x_thermal_zone dell_rbtn shpchp int3400_thermal intel_soc_dts_iosf acpi_als acpi_thermal_rel kfifo_buf tpm_tis fjes thermal tpm_tis_core industrialio rfkill ip6table_raw acpi_pad tpm ipt_REJECT nf_reject_ipv4 iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables rtsx_pci_sdmmc mmc_core mxm_wmi i915 serio_raw xhci_pci
[    7.982058]  rtsx_pci mfd_core xhci_hcd i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt usbcore fb_sys_fops usb_common drm wmi video i2c_hid button coretemp msr sg bbswitch(O) efivarfs [last unloaded: nvidia]
[    7.982067] CPU: 6 PID: 1550 Comm: bumblebeed Tainted: P     U     O    4.8.6-2-default #1
[    7.982067] Hardware name: Dell Inc. Inspiron 7559/0H0CC0, BIOS 1.2.0 09/22/2016
[    7.982068]  0000000000000000 ffffffffb03a4272 ffff99f4daab3da8 0000000000000000
[    7.982070]  ffffffffb007de2e ffff99f501704000 ffff99f4daab3df8 ffff99f4daab3f28
[    7.982072]  00000000017d7270 0000000000000000 0000000000000028 ffffffffb007de9f
[    7.982074] Call Trace:
[    7.982082]  [<ffffffffb002eefe>] dump_trace+0x5e/0x310
[    7.982085]  [<ffffffffb002f2cb>] show_stack_log_lvl+0x11b/0x1a0
[    7.982087]  [<ffffffffb0030001>] show_stack+0x21/0x40
[    7.982090]  [<ffffffffb03a4272>] dump_stack+0x5c/0x7a
[    7.982093]  [<ffffffffb007de2e>] __warn+0xbe/0xe0
[    7.982096]  [<ffffffffb007de9f>] warn_slowpath_fmt+0x4f/0x60
[    7.982098]  [<ffffffffb03eb551>] pci_disable_device+0xa1/0xd0
[    7.982101]  [<ffffffffc036e409>] bbswitch_off+0x89/0x230 [bbswitch]
[    7.982104]  [<ffffffffc036e7c3>] bbswitch_proc_write+0x93/0xaa [bbswitch]
[    7.982108]  [<ffffffffb02854dd>] proc_reg_write+0x3d/0x60
[    7.982111]  [<ffffffffb02187c3>] __vfs_write+0x23/0x140
[    7.982114]  [<ffffffffb0219080>] vfs_write+0xb0/0x190
[    7.982115]  [<ffffffffb021a302>] SyS_write+0x42/0x90
[    7.982118]  [<ffffffffb06d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[    7.983563] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xa8

[    7.983564] Leftover inexact backtrace:

[    7.983566] ---[ end trace 8e83878053cc2799 ]---

ssbb Nov 13, 2016

I have a Dell XPS 15 9550 with a 960M too. cat /proc/acpi/bbswitch tells me that the GPU is off, but my laptop is noisy all the time. I think it only happens with the 4.8 kernel, since it did not happen before.

I added pcie_port_pm=off as a kernel parameter, but it does not seem to help:

[  193.771954] bbswitch: enabling discrete graphics
[  199.161884] bbswitch: disabling discrete graphics
[  199.161893] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  262.993580] bbswitch: enabling discrete graphics
[  263.317141] nvidia: module license 'NVIDIA' taints kernel.
[  263.317143] Disabling lock debugging due to kernel taint
[  263.324303] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  263.324323] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.10  Fri Oct 14 10:30:06 PDT 2016 (using threaded interrupts)
[  263.899187] vgaarb: this pci device is not a vga device
[  263.907317] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907458] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907543] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907620] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907696] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907811] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907888] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.937265] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  264.134610] vgaarb: this pci device is not a vga device
[  264.417771] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.10  Fri Oct 14 10:05:55 PDT 2016
[  267.564436] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  267.570652] nvidia-modeset: Unloading
[  267.583848] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[  267.611886] bbswitch: disabling discrete graphics
[  267.611895] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  267.627364] pci 0000:01:00.0: Refused to change power state, currently in D0
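
The "Refused to change power state" line is the symptom most reports in this thread share. Below is a small sketch, assuming the dmesg output has been captured as text, that pulls the PCI address and the reported power state out of such lines. The function name and the regular expression are illustrative and not part of any tool mentioned here.

```python
import re

# Matches lines such as:
#   pci 0000:01:00.0: Refused to change power state, currently in D0
REFUSED = re.compile(
    r"pci (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Refused to change power state, currently in (?P<state>D\d\w*)"
)


def find_refused_transitions(log_text: str) -> list[tuple[str, str]]:
    """Return (pci_address, current_state) for every refused power transition."""
    return [(m["device"], m["state"]) for m in REFUSED.finditer(log_text)]
```

For the log above this would report the device 0000:01:00.0 as still being in D0 after bbswitch tried to disable it.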

Lekensteyn Nov 13, 2016

Member

@ssbb Are you sure that your fan issue is new with 4.8? Do you actually need the nvidia GPU? If not, remove pcie_port_pm=off and use nouveau instead.

ssbb Nov 13, 2016

@Lekensteyn I'm really not sure. I just reinstalled the system and this started happening. I did not have this issue with the old system on the 4.7 kernel.

I use nvidia for gaming, and that's why I am on bbswitch with bumblebee :)

UPD: about the fans - the bottom-left corner of my laptop is pretty hot. The nvidia chip is located there, and that's how I found this issue in the first place.

ademcal Nov 14, 2016

I have noticed the faster fan problem too. I never had a fan problem with kernel 4.7, so I think it is the same issue as @ssbb's.

DistantThunder Jan 29, 2017

Arch Linux (ZEN kernel)
Kernel: Linux 4.9.6-1-zen x86_64 GNU/Linux
bbswitch: 0.8-61
bumblebee: 3.2.12
primus: 20151110-6
NVIDIA 375.26 for GTX 960m dGPU on MSI PX60 6QE laptop (Core i7-6700HQ).

Using the "bbswitch-dkms" package, I confirm bbswitch is working again, seemingly thanks to changes made in Kernel 4.9.x mainline.

My laptop power led has a built-in indicator allowing me to see easily if the NVIDIA dGPU is powered on. Up until now with bbswitch disabled on kernel 4.8.x, it was always powered on post-boot.
Now, I can see it being powered off right after the boot session and staying so until I run something with optirun.

As soon as the dGPU-ran program exits, the led changes state and I can see the dGPU has been powered off.

Kudos to the kernel guys and thanks to the bumblebee project!

tomdee Jan 31, 2017

I'm still seeing it not working on Arch - I have the same versions of all the deps you list above. I'm running on a Precision 5510 (Quadro m1000m).

optirun glxspheres64
[  662.547297] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

[  662.547387] [ERROR]Aborting because fallback start is disabled.

and

[  662.571882] bbswitch: enabling discrete graphics
[  662.571926] pci 0000:01:00.0: Refused to change power state, currently in D3

atsuya Feb 3, 2017

The same here on Mi Notebook Air 13.3.

lrafa Sep 13, 2017

@qdel
Not really, the important file is bbswitcher.cpp

And you are turning it on/off by writing "ON" or "OFF" into /proc/acpi/bbswitch, which doesn't work here.

Thanks for sharing anyway! (and yeah, I'm also a fan of Qt :P)
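
For readers unfamiliar with that interface, here is a minimal sketch of the write described above. It assumes the control file is /proc/acpi/bbswitch as quoted in this comment; `set_discrete_gpu` and `dry_run` are hypothetical helper names. The dry run writes to a temporary file, so nothing here touches real hardware.

```python
import tempfile
from pathlib import Path

BBSWITCH_PATH = Path("/proc/acpi/bbswitch")  # path quoted in the comment above


def set_discrete_gpu(state: str, path: Path = BBSWITCH_PATH) -> None:
    """Write 'ON' or 'OFF' to the bbswitch control file (needs root on a real system)."""
    if state not in ("ON", "OFF"):
        raise ValueError(f"state must be 'ON' or 'OFF', got {state!r}")
    Path(path).write_text(state + "\n")


def dry_run(state: str) -> str:
    """Exercise set_discrete_gpu against a temporary file instead of /proc."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "bbswitch"
        set_discrete_gpu(state, target)
        return target.read_text()
```

On an affected machine the write itself may succeed while the card still refuses the power state change, which is the failure discussed throughout this issue.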

qdel Sep 13, 2017

@lrafa
On my computer, the problem is the 'speed' of the process. If I run primusrun glxgears and press the Escape key to quit glxgears, there is a 100% chance that I get:
pci 0000:01:00.0: Refused to change power state, currently in D0

It was also the case for my suspend problem. Using the pm handler, everything was too fast and I lost the card. Using my scripts / programs, which are slower, it works. It has worked so far, at least.
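
The scripts mentioned here are not shown in this thread, so the following is only a guess at the general idea: slow the sequence down by pausing between attempts. `retry_with_delay` is a hypothetical helper, and whether a delay helps at all is this one observation on one machine.

```python
import time


def retry_with_delay(action, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call action() until it returns True, pausing delay_s seconds between tries.

    Returns the number of the attempt that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # give the hardware time to settle before retrying
    return None
```

Here `action` would be any callable that tries the power switch and reports success, for example one that writes to /proc/acpi/bbswitch and then reads the state back.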

Fincer Sep 30, 2017

pcie_port_pm=off is not working for me. Kernel 4.13. Arch Linux x86_64.

P.S. Some power-management-related commits for kernel version 4.8: Linux 4.8 - ACPI, EFI, cpufreq, thermal, Power Management

senepa Oct 11, 2017

tee /proc/acpi/bbswitch <<<ON with or without pci_port_pm=off is not working for me either. Kernel 4.13. Fedora Linux x86_64

Lekensteyn Oct 11, 2017

Member

@Fincer @senepa laptop model is relevant. Have you tried using nouveau instead of bbswitch? (you won't be able to use Bumblebee then, but at least you'll save power and have working external monitors.)

zx2c4 Oct 11, 2017

@Lekensteyn any plans for updating bbswitch?

archenroot Oct 11, 2017

@qdel - I also do best stuff while programming with bottle of vodka 💃

Lekensteyn Oct 11, 2017

Member

@zx2c4 Probably not until at least the end of this year. Progress has also stalled since I was trying to solve Bumblebee-Project/Bumblebee#764 at the same time (without luck). Does nouveau not work for you?

eddynetweb Dec 30, 2017

Having the exact same issue as @qdel - the graphics card won't start after entering a suspended state.

TungstenOxide Jan 31, 2018

pcie_port_pm=off does not fix it for me either. I always freeze before I see the login screen. It gets to Started User Manager for UID 42 and that's it. Fedora 27 with Kernel 4.14.11
XPS 15 9560
Core i7-7700HQ GTX 1050

chenxiaolong Jan 31, 2018

@TungstenOxide For the XPS 9560, you'll need to boot with modprobe.blacklist=nouveau. The nouveau driver causes the system to hang.

For bbswitch to work, you'll also need a kernel that supports CONFIG_ACPI_REV_OVERRIDE_POSSIBLE and boot with acpi_rev_override=5. Fedora's kernel is not compiled with this option, so you'll need a custom kernel. I've made a custom kernel for Fedora 27 here: https://copr.fedorainfracloud.org/coprs/chenxiaolong/kernel-acpi-rev-override/
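
Both conditions can be checked from text that is usually easy to obtain. The sketch below assumes the kernel configuration is available as plain text (on many distributions under /boot/config-<release>, which is an assumption, not something stated in this thread) together with the kernel command line. The function names are hypothetical.

```python
def config_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to 'y' in kernel .config-style text."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


def rev_override_ready(config_text: str, cmdline: str) -> bool:
    """True if the kernel was built with the option and the boot parameter is present."""
    built_in = config_option_enabled(config_text, "CONFIG_ACPI_REV_OVERRIDE_POSSIBLE")
    # Accept any value after '=', since only the parameter name is checked here.
    requested = any(
        token.partition("=")[0] == "acpi_rev_override" for token in cmdline.split()
    )
    return built_in and requested
```

A result of False only says that one of the two prerequisites is missing. It does not say whether the override will actually help on a given BIOS version.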

TungstenOxide Jan 31, 2018

@chenxiaolong I am booting with nouveau blacklisted. Are you sure that Fedora doesn't have that support?

chenxiaolong Jan 31, 2018

How are you blacklisting nouveau? The rd.blacklist option doesn't seem to work properly anymore.

TungstenOxide Jan 31, 2018

I'm pretty sure that I'm using modprobe

Lekensteyn Feb 1, 2018

Member

@TungstenOxide Do you have the latest BIOS version (1.6 or 1.7 IIRC)? The rev workaround did not work with certain BIOS versions (1.5?) for the XPS 9560.

TungstenOxide Feb 3, 2018

@Lekensteyn No good. I'm on 1.6.2 already.

Lekensteyn Feb 7, 2018

Member

@joshu256 Looks like you solved your problem already in Bumblebee-Project/Bumblebee#946, nvidia-smi triggers loading the nvidia module.

joshu256 Feb 7, 2018

@Lekensteyn Yeah sorry for wasting your time, I'll delete the comment

real-or-random Jun 3, 2018

I've been using pcie_port_pm=off for a long time and it worked. Since I've upgraded from 4.16.12 to 4.16.13, my tray icon indicates that the NVIDIA card is enabled after boot, which should not be the case. There's nothing in the bumblebee logs but when I try to restart bumblebee I get

[  490.555643] bbswitch: disabling discrete graphics
[  490.556229] ------------[ cut here ]------------
[  490.556234] pci 0000:02:00.0: disabling already-disabled device
[  490.556266] WARNING: CPU: 2 PID: 539 at drivers/pci/pci.c:1646 pci_disable_device+0x8a/0xa0
[  490.556268] Modules linked in: uinput cmac rfcomm fuse snd_hda_codec_hdmi snd_hda_codec_realtek bnep snd_hda_codec_generic qcserial usb_wwan xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack cdc_mbim cdc_wdm usbserial cdc_ncm btusb usbnet mii btrtl btbcm btintel bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic media ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter joydev mousedev bbswitch(O) snd_soc_skl
[  490.556400]  arc4 snd_soc_skl_ipc i915 snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi snd_soc_core iTCO_wdt iTCO_vendor_support mei_wdt snd_compress ac97_bus snd_pcm_dmaengine iwlmvm wmi_bmof intel_rapl x86_pkg_temp_thermal intel_powerclamp intel_wmi_thunderbolt mac80211 kvm_intel i2c_algo_bit snd_hda_intel nls_iso8859_1 nls_cp437 vfat drm_kms_helper snd_hda_codec fat iwlwifi kvm snd_hda_core drm e1000e snd_hwdep irqbypass cfg80211 intel_cstate snd_pcm intel_uncore intel_rapl_perf i2c_i801 psmouse ptp intel_gtt pps_core input_leds pcspkr thinkpad_acpi snd_timer agpgart syscopyarea sysfillrect sysimgblt ucsi_acpi fb_sys_fops mei_me typec_ucsi nvram rfkill mei intel_pch_thermal typec wmi snd shpchp soundcore led_class i2c_hid evdev ac battery hid rtc_cmos mac_hid vboxnetflt(O) vboxnetadp(O)
[  490.556541]  vboxpci(O) vboxdrv(O) coretemp msr overlay sg crypto_user acpi_call(O) ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg sd_mod uas usb_storage scsi_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc serio_raw atkbd libps2 aesni_intel xhci_pci aes_x86_64 crypto_simd glue_helper xhci_hcd cryptd usbcore usb_common i8042 serio dm_mod
[  490.556613] CPU: 2 PID: 539 Comm: bumblebeed Tainted: G           O     4.16.13-1-ARCH #1
[  490.556616] Hardware name: LENOVO 20H9001EGE/20H9001EGE, BIOS N1VET37W (1.27 ) 11/16/2017
[  490.556622] RIP: 0010:pci_disable_device+0x8a/0xa0
[  490.556625] RSP: 0018:ffffb0fc01f53dd0 EFLAGS: 00010286
[  490.556630] RAX: 0000000000000000 RBX: ffff8f1d2caf3000 RCX: 0000000000000001
[  490.556633] RDX: 0000000080000001 RSI: 0000000000000092 RDI: 00000000ffffffff
[  490.556636] RBP: ffff8f1d2ca7d720 R08: 000001557f96467b R09: 00000000000007e5
[  490.556639] R10: ffffffff905dc720 R11: 0000000000000000 R12: 0000561c612d5ac0
[  490.556642] R13: ffffb0fc01f53f00 R14: 0000561c612d5ac0 R15: 0000000000000000
[  490.556646] FS:  00007f3a858a1040(0000) GS:ffff8f1d3f500000(0000) knlGS:0000000000000000
[  490.556649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  490.556652] CR2: 00007fc10c98b490 CR3: 000000045d946006 CR4: 00000000003606e0
[  490.556655] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  490.556658] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  490.556660] Call Trace:
[  490.556675]  bbswitch_off.cold.4+0xc1/0x1d1 [bbswitch]
[  490.556683]  ? bbswitch_proc_write+0xaf/0xd0 [bbswitch]
[  490.556689]  ? proc_reg_write+0x3c/0x60
[  490.556694]  ? __vfs_write+0x36/0x170
[  490.556701]  ? vfs_write+0xa9/0x190
[  490.556706]  ? SyS_write+0x4f/0xb0
[  490.556714]  ? do_syscall_64+0x74/0x190
[  490.556720]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  490.556725] Code: 01 48 85 ed 75 07 48 8b ab b0 00 00 00 48 8d bb a0 00 00 00 e8 a8 03 13 00 48 89 ea 48 c7 c7 50 65 e9 8f 48 89 c6 e8 e0 3f cc ff <0f> 0b eb 8f 48 89 df e8 ea fe ff ff 80 a3 c1 07 00 00 f7 5b 5d 
[  490.556817] ---[ end trace bb8ef13124112189 ]---
[  490.577225] thinkpad_acpi: EC reports that Thermal Table has changed

Zeben Jun 4, 2018

@real-or-random I can confirm your issue; I hit the same thing today. I have a Dell Vostro 5459 laptop with a discrete Nvidia GPU. When I connect the AC adapter, the discrete GPU powers on, the laptop produces a lot of heat, and the fan spins endlessly. The sensors utility shows a current GPU temperature of +50...+60°C, which it should not show when the GPU is disabled (+1°C). Rolling back from 4.16.13 to 4.16.12 fixed it. I'm waiting for Linux 4.17 in the hope that the issue is solved there.

Zeben Jun 18, 2018

Bump. The same problem is present in Linux 4.17.2. Sadly, I need to roll back to 4.16.3 to make bbswitch work correctly. :/

[14754.883889] ------------[ cut here ]------------
[14754.883891] pci 0000:01:00.0: disabling already-disabled device
[14754.883902] WARNING: CPU: 3 PID: 469 at drivers/pci/pci.c:1650 pci_disable_device+0x8a/0xa0
[14754.883903] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc overlay uas usb_storage ccm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media cmac rfcomm bnep btusb btrtl btbcm btintel bluetooth ecdh_generic fuse arc4 iwlmvm bbswitch(O) mac80211 joydev mousedev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iwlwifi kvm_intel snd_soc_skl iTCO_wdt iTCO_vendor_support hid_multitouch hid_generic snd_soc_skl_ipc dell_wmi wmi_bmof sparse_keymap mxm_wmi snd_hda_ext_core cfg80211 kvm nls_iso8859_1 snd_soc_sst_dsp nls_cp437 snd_soc_sst_ipc
[14754.883946]  vfat fat dell_laptop snd_soc_acpi dell_smbios irqbypass dell_wmi_descriptor dcdbas crct10dif_pclmul crc32_pclmul snd_soc_core ghash_clmulni_intel snd_hda_codec_hdmi pcbc snd_compress dell_smm_hwmon ac97_bus snd_pcm_dmaengine snd_hda_codec_conexant snd_hda_codec_generic aesni_intel r8169 aes_x86_64 crypto_simd cryptd mii glue_helper intel_cstate snd_hda_intel snd_hda_codec snd_hda_core intel_uncore snd_hwdep input_leds intel_rapl_perf psmouse led_class pcspkr idma64 snd_pcm snd_timer snd i2c_i801 mei_me intel_lpss_pci mei shpchp i2c_hid processor_thermal_device soundcore intel_lpss intel_soc_dts_iosf intel_pch_thermal hid int3402_thermal dell_rbtn int3400_thermal battery rtc_cmos ac wmi int340x_thermal_zone acpi_thermal_rel evdev rfkill mac_hid sg crypto_user ip_tables x_tables ext4 crc32c_generic
[14754.883986]  crc16 mbcache jbd2 fscrypto sd_mod serio_raw atkbd libps2 ahci xhci_pci libahci xhci_hcd libata crc32c_intel usbcore scsi_mod usb_common i8042 serio i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
[14754.884005] CPU: 3 PID: 469 Comm: bumblebeed Tainted: G           O      4.17.2-1-ARCH #1
[14754.884006] Hardware name: Dell Inc. Vostro 14-5459/080W31, BIOS 1.1.1 09/22/2017
[14754.884009] RIP: 0010:pci_disable_device+0x8a/0xa0
[14754.884010] RSP: 0018:ffffbbe6426ffdd8 EFLAGS: 00010286
[14754.884012] RAX: 0000000000000000 RBX: ffffa344b9ab0000 RCX: 0000000000000001
[14754.884013] RDX: 0000000080000001 RSI: 0000000000000082 RDI: 00000000ffffffff
[14754.884014] RBP: ffffa344b9a34630 R08: 00002037dd036bf4 R09: 00000000000004ab
[14754.884015] R10: ffffffffa85ef6e0 R11: 0000000000000000 R12: 000055985a2d9ac0
[14754.884016] R13: ffffbbe6426fff08 R14: 000055985a2d9ac0 R15: 0000000000000000
[14754.884017] FS:  00007fd236f31040(0000) GS:ffffa344c3d80000(0000) knlGS:0000000000000000
[14754.884018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14754.884019] CR2: 00007fff7a920b98 CR3: 000000026cd76005 CR4: 00000000003606e0
[14754.884020] Call Trace:
[14754.884028]  bbswitch_off.cold.4+0xc1/0x1d1 [bbswitch]
[14754.884030]  bbswitch_proc_write+0xaf/0xd0 [bbswitch]
[14754.884034]  proc_reg_write+0x3c/0x60
[14754.884036]  __vfs_write+0x36/0x170
[14754.884039]  vfs_write+0xa9/0x190
[14754.884041]  ksys_write+0x4f/0xb0
[14754.884045]  do_syscall_64+0x5b/0x170
[14754.884047]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[14754.884049] RIP: 0033:0x7fd23622e9d4
[14754.884050] RSP: 002b:00007fff7a923448 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[14754.884052] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007fd23622e9d4
[14754.884053] RDX: 0000000000000028 RSI: 000055985a2d9ac0 RDI: 0000000000000004
[14754.884054] RBP: 000055985a2d9ac0 R08: 00007fd236f31040 R09: 000000000000002c
[14754.884055] R10: 00000000000001b6 R11: 0000000000000246 R12: 000055985a2d6600
[14754.884056] R13: 0000000000000028 R14: 00007fd2364f75c0 R15: 0000000000000028
[14754.884058] Code: 01 48 85 ed 75 07 48 8b ab b0 00 00 00 48 8d bb a0 00 00 00 e8 48 19 13 00 48 89 ea 48 c7 c7 00 8a ea a7 48 89 c6 e8 70 32 cb ff <0f> 0b eb 8f 48 89 df e8 ea fe ff ff 80 a3 c1 07 00 00 f7 5b 5d 
[14754.884090] ---[ end trace 48b439d307758e63 ]---
[14754.943731] pci 0000:01:00.0: Refused to change power state, currently in D0

Lekensteyn Jun 23, 2018

Member

Thanks @real-or-random for narrowing down the version range. I suspect that torvalds/linux@abf92f8 is causing this issue. bbswitch does some ugly things which might interfere with that commit.

You could try the pm-rework branch (possibly without the last vga_switcheroo patch) and see if it improves the situation for you. It is architecturally quite different.

liskin Jun 23, 2018

But what's the point of bbswitch with torvalds/linux@abf92f8? At least on ThinkPad T25 it seems that simply disabling bbswitch and enabling runtime-pm in laptop-mode-tools is enough for the dGPU to power off when the nvidia module is unloaded and power back on when it gets loaded.

Without torvalds/linux@abf92f8, this doesn't work since as soon as runtime-pm is enabled for the card, the PCI state is lost and there's no way to enable it without a reboot, and this is true for both nvidia and nouveau. But with the patch, everything seems to work just fine.
(Except bumblebeed which insists on having a PMMethod to unload the module, so I now need to rmmod it manually. I guess that's worth a separate bug report.)

Now I just need to check whether dropping bbswitch and using port pm fixes the battery drain during system suspend. :-)
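
A minimal sketch of what "enabling runtime-pm" amounts to at the sysfs level, assuming the tool does little more than write `auto` to the device's `power/control` attribute. The helper name `set_runtime_pm` and its optional sysfs-root argument are made up for illustration, and the PCI address is only an example value:

```shell
#!/bin/sh
# Hypothetical helper: switch runtime PM for one PCI device on or off.
#   $1  PCI address, e.g. 0000:01:00.0 (example value, check `lspci -D`)
#   $2  "auto" (allow runtime suspend) or "on" (keep the device powered)
#   $3  sysfs root, defaults to /sys/bus/pci/devices (overridable for testing)
set_runtime_pm() {
    dev="$1"
    mode="$2"
    root="${3:-/sys/bus/pci/devices}"

    # The kernel accepts only these two values for power/control.
    case "$mode" in
        auto|on) ;;
        *) echo "mode must be 'auto' or 'on'" >&2; return 2 ;;
    esac

    ctl="$root/$dev/power/control"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (wrong address, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$mode" > "$ctl"
}

# Example (as root): set_runtime_pm 0000:01:00.0 auto
```

Whether the card actually powers down afterwards still depends on the kernel, the firmware, and on no driver holding the device active.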

Zeben Jul 7, 2018

@liskin Can you describe in more detail how to set up power saving on Linux properly?
Right now I have the issue described below.
I use Arch Linux with the latest updates.
I usually play games on the dGPU using the proprietary Nvidia driver + bbswitch + bumblebee. I also use laptop-mode-tools. I haven't edited any config files, so they're all in their default state.
When the AC adapter is connected to my laptop, everything works fine. When I disconnect AC, bumblebee stops working with these messages:

[  205.460191] [INFO]Response: No - error: Could not load GPU driver
[  205.460214] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
[  205.460222] [DEBUG]Socket closed.

... and dmesg response:

[   71.717617] bbswitch: enabling discrete graphics
[   75.875371] pci 0000:01:00.0: enabling device (0000 -> 0003)
[   75.982305] ipmi message handler version 39.2
[   75.983821] ipmi device interface
[   76.100767] nvidia: module license 'NVIDIA' taints kernel.
[   76.100768] Disabling lock debugging due to kernel taint
[   76.116347] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[   76.216157] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:1346) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
**[   76.266331] nvidia: probe of 0000:01:00.0 failed with error -1**
[   76.266363] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   76.266364] NVRM: None of the NVIDIA graphics adapters were initialized!

After this the dGPU stays powered on: acpi reports half the remaining battery time. To disable the dGPU I need to restart the bumblebeed service, but I still can't use the dGPU while the AC adapter is unplugged.
On Linux 4.16.12 this worked without problems, with pcie_port_pm=off.

Second: when I watch videos in VLC, my dGPU powers on and stays on even after VLC is closed. When I try to restart bumblebeed, I get a stack trace in dmesg:

...
[  829.838962] CPU: 0 PID: 18843 Comm: bumblebeed Tainted: P           O      4.17.4-1-ARCH #1
[  829.838962] Hardware name: Dell Inc. Vostro 14-5459/080W31, BIOS 1.1.4 05/14/2018
[  829.838965] RIP: 0010:pci_disable_device+0x8a/0xa0
[  829.838966] RSP: 0018:ffffb274812d7dd8 EFLAGS: 00010286
[  829.838967] RAX: 0000000000000000 RBX: ffffa048f9aa7000 RCX: 0000000000000001
[  829.838968] RDX: 0000000080000001 RSI: 0000000000000082 RDI: 00000000ffffffff
[  829.838969] RBP: ffffa048f9a2a390 R08: 000001d20c68992c R09: 0000000000000390
[  829.838970] R10: ffffffff895ef6e0 R11: 0000000000000000 R12: 0000559091bb8ac0
[  829.838970] R13: ffffb274812d7f08 R14: 0000559091bb8ac0 R15: 0000000000000000
[  829.838972] FS:  00007fe4675b3040(0000) GS:ffffa04903c00000(0000) knlGS:0000000000000000
[  829.838973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  829.838973] CR2: 00007ffdb1dc7dd0 CR3: 000000018414a001 CR4: 00000000003606f0
[  829.838974] Call Trace:
[  829.838992]  bbswitch_off.cold.4+0xc1/0x1d1 [bbswitch]
[  829.838994]  bbswitch_proc_write+0xaf/0xd0 [bbswitch]
[  829.839002]  proc_reg_write+0x3c/0x60
[  829.839013]  __vfs_write+0x36/0x170
[  829.839018]  vfs_write+0xa9/0x190
[  829.839020]  ksys_write+0x4f/0xb0
[  829.839022]  do_syscall_64+0x5b/0x170
[  829.839025]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

...

My assumptions are as follows:
1. Since Linux > 4.16.12, power management has had some major changes, which interfere with or break the standard bbswitch behaviour.
2. My laptop-mode-tools config also interferes with the new kernel and its new PM mechanisms.
Please help me solve this issue. The one thing I do know is that staying on Linux < 4.16.13 isn't a good idea, but as a temporary solution I can try linux-lts.

liskin Jul 8, 2018

@Zeben You need to disable/drop bbswitch. If you still get "fallen off the bus and is not responding to commands" afterwards, that means torvalds/linux@abf92f8 isn't working with your hardware and you might need to revert to pcie_port_pm=off and/or blacklist the device in laptop-mode-tools' runtime-pm.
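
When juggling these options it helps to confirm what the running kernel was actually booted with. A small sketch, assuming the parameter is spelled `pcie_port_pm` as in the issue title; `pcie_port_pm_setting` is a hypothetical helper name and the optional file argument exists only so it can be tried on a sample file:

```shell
#!/bin/sh
# Hypothetical helper: print the value given to pcie_port_pm= on the kernel
# command line, or "unset" if the parameter is absent.
#   $1  command-line file, defaults to /proc/cmdline
pcie_port_pm_setting() {
    file="${1:-/proc/cmdline}"
    # Split the command line into one word per line, keep the value of the
    # last pcie_port_pm= occurrence.
    value=$(tr ' ' '\n' < "$file" | sed -n 's/^pcie_port_pm=//p' | tail -n 1)
    echo "${value:-unset}"
}

# Example: pcie_port_pm_setting
```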

Zeben Jul 8, 2018

@liskin Hello. Thank you very much for the fast response.
I've been experimenting with different setups and ended up in a state of "one thing solved, another thing broken":

  1. I've removed pcie_port_pm=off from the kernel command-line options;
  2. I've enabled runtime-pm in LMT via lmt_gui;
  3. I've uninstalled the bbswitch package.

Results:

  1. optirun and the application it launches work without any problems, both after plugging in and after unplugging AC;
  2. the nvidia kernel modules are not unloaded after I exit the application that optirun launched;
  3. My dGPU stays powered on. Always. Even when unused. The battery discharges 2x faster and the fan spins more often.
    3.1. I tested this via the acpi command and its estimated battery time.
    With bbswitch installed the dGPU is disabled properly, but... then all the symptoms from my previous message return.

I'm really confused. :(

liskin Jul 8, 2018

No need to be confused, this is expected, and I already mentioned it:

(Except bumblebeed which insists on having a PMMethod to unload the module, so I now need to rmmod it manually. I guess that's worth a separate bug report.)

Try rmmod nvidia-modeset && rmmod nvidia and see if power usage is okay. The next optirun should load these modules again and power up the card.
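
The manual unload can be wrapped in a small helper. This is only a sketch: `unload_if_loaded` is a hypothetical name, the module list is just the two modules named above, and other setups may have additional nvidia modules loaded that would have to be removed first.

```shell
#!/bin/sh
# Hypothetical helper: unload the given modules, in the order given,
# skipping any that are not currently loaded.
#   $1      file listing loaded modules (normally /proc/modules)
#   $2...   module names, dependants first (nvidia_modeset before nvidia)
# RMMOD can be overridden (e.g. RMMOD=echo) for a dry run.
unload_if_loaded() {
    list="$1"
    shift
    for mod in "$@"; do
        # Each line of /proc/modules starts with "<name> <size> ...".
        if grep -q "^$mod " "$list"; then
            ${RMMOD:-rmmod} "$mod" || return 1
        fi
    done
}

# Example (as root): unload_if_loaded /proc/modules nvidia_modeset nvidia
```

Running it with RMMOD=echo first prints what would be unloaded without touching the system.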

Zeben Jul 8, 2018

@liskin Noted. I tried to rmmod the modules, but it seems the dGPU stays powered on, even after a laptop restart. I'm not sure how to tell exactly whether the dGPU is powered or not, though; I'm only guessing from the output of the acpi and powertop commands.
Here are the results after restarting the laptop, with all nvidia-related modules unloaded:

skovo@devliner ~ % lsmod | grep nv
skovo@devliner ~ % acpi
Battery 0: Discharging, 75%, 02:43:06 remaining

Powertop reports:

The battery reports a discharge rate of 12.2 W

But here are the results right after some "hacky tricks": I reinstalled the bbswitch module and restarted bumblebeed:

skovo@devliner ~ % acpi
Battery 0: Discharging, 70%, 04:09:52 remaining

... and Powertop:

The battery reports a discharge rate of 5.58 W

liskin Jul 8, 2018

@Zeben Well, that's strange; in my case rmmod and laptop-mode-tools are enough to power the dGPU down, which on this ThinkPad means going from 7 watts to 5 watts. And I never restart.

Also, the runtime_status of the PCI device shows suspended:

$ cat /sys/bus/pci/devices/0000\:02\:00.0/power/runtime_status 
suspended
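
Since the PCI address differs between machines (02:00.0 here), the same check can be done by vendor ID instead. A rough sketch, assuming NVIDIA's PCI vendor ID 0x10de as seen in the NVRM log earlier in the thread; `nvidia_runtime_status` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "<address> <runtime_status>" for every PCI
# device whose vendor ID is NVIDIA's (0x10de).
#   $1  sysfs root, defaults to /sys/bus/pci/devices (overridable for testing)
nvidia_runtime_status() {
    root="${1:-/sys/bus/pci/devices}"
    for dir in "$root"/*; do
        # Skip anything without a readable vendor file or with another vendor.
        [ -r "$dir/vendor" ] || continue
        [ "$(cat "$dir/vendor")" = "0x10de" ] || continue
        if [ -r "$dir/power/runtime_status" ]; then
            status=$(cat "$dir/power/runtime_status")
        else
            status="unknown"
        fi
        echo "${dir##*/} $status"
    done
}

# Example: nvidia_runtime_status
```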

Zeben Jul 8, 2018

@liskin Thank you again for the command; now I can see exactly whether the dGPU is powered or not.
Here are the results after a laptop restart:

$ lspci | grep GeForce
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 930M] (rev a2)
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active

No nvidia modules are loaded.

And... after installing bbswitch again and restarting bumblebeed.service:

$ dmesg | tail
[  372.669685] bbswitch: loading out-of-tree module taints kernel.
[  372.669940] bbswitch: version 0.8
[  372.669946] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[  372.669952] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.RP01.PEGP
[  372.669962] ACPI Warning: \_SB.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180313/nsarguments-66)
[  372.670105] bbswitch: detected an Optimus _DSM function
[  372.792842] pci 0000:01:00.0: enabling device (0006 -> 0007)
[  372.793172] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[  372.800773] bbswitch: disabling discrete graphics
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
suspended

liskin Jul 8, 2018

Hm, but laptop-mode-tools/tlp should really power it off by itself. bbswitch and runtime pm don't interact cleanly (I've heard about memory corruption).

Zeben Jul 8, 2018

@liskin After a dozen experiments I've made a little progress on this.
I've completely uninstalled laptop-mode-tools and installed the tlp package, which I didn't know about before.
I've read parts of the package's documentation, updated Linux back to 4.17.4, set pcie_port_pm=on, uninstalled bbswitch, and ran some tests. Results:

  1. tlp doesn't enable runtime PM for my dGPU:
>> Bad           Enable SATA link power management for host0                                                            
   Bad           Enable SATA link power management for host1
   Bad           VM writeback timeout
   Bad           Runtime PM for PCI Device NVIDIA Corporation GM108M [GeForce 930M]
...
...

... but if I enable it manually via powertop, I get amazing power-saving results!

$ acpi
Battery 0: Discharging, 84%, **07:12:10** remaining
  2. optirun works as expected, just as you said. To power off the dGPU, I need to remove the nvidia_modeset and nvidia modules. I haven't found any other issues.
$ optirun -b primus glxgears -info 
GL_RENDERER   = GeForce 930M/PCIe/SSE2
GL_VERSION    = 4.6.0 NVIDIA 396.24
GL_VENDOR     = NVIDIA Corporation
...
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active

# closed the optirun-related application
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active

# removed modules
$ sudo rmmod nvidia_modeset
$ sudo rmmod nvidia
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
suspended

So, the remaining tasks are:

  1. "teach" bumblebee to unload the needed modules properly
  2. "teach" tlp to enable runtime PM for the dGPU (this one is not solved yet).
  3. The dGPU powers on when I plug in the AC adapter, and the cat command gives me misleading results:
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
suspended

$ sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +51.0°C  

dell_smm-virtual-0
Adapter: Virtual device
Processor Fan: 3041 RPM
fan2:               N/A
CPU:            +41.0°C  
GPU:            +53.0°C

GPU: +53.0°C
... and the fan is spinning.
:/

Zeben Jul 8, 2018

Bump:

dGPU is powering on when I plug AC adapter; cat command gives me wrong results:

The issue was solved by changing RUNTIME_PM_ON_AC from on to auto in the tlp configuration, which enables runtime PM for all PCI devices even when the AC adapter is plugged in. But I'm not sure that's right...
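
For reference, the change described here might look like the following in the tlp configuration file. This is only a sketch: the file location is an assumption (it depends on the tlp version and distribution), and the battery line is shown just for comparison.

```shell
# tlp configuration fragment (shell-style assignments).
# Assumed location: /etc/default/tlp on older tlp releases; yours may differ.

# Runtime PM policy for PCI devices while on AC power.
# "on" keeps devices powered, "auto" lets them suspend when idle.
RUNTIME_PM_ON_AC=auto

# Runtime PM policy while on battery, shown for comparison.
RUNTIME_PM_ON_BAT=auto
```

After editing the file, tlp presumably has to re-read its settings (for example by restarting it) before the new policy applies.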

liskin Jul 8, 2018

@Zeben The whole point of disabling PM on AC is that you get rid of those (possibly) tens or hundreds of milliseconds of waiting for devices to power up. Try running lspci on battery: it's not instantaneous but takes almost a second. Try plugging in headphones: there's a somewhat annoying click when the sound card powers down. On the other hand, with PM enabled, you'll hear your fan a lot less often. It's your decision to make. :-)

Zeben Jul 8, 2018

So, after more experiments I've come to some conclusions.

Everything works without problems in three configurations.

  1. Installed packages: linux 4.17.4, bbswitch, bumblebee, tlp, tlp-ui, nvidia.
    Blacklisted the Nvidia card (01:00.0) in the RUNTIME_PM_BLACKLIST variable.
    Using pcie_port_pm=on in the kernel command-line options.
    Plugging/unplugging the AC adapter works.
    Enabling/disabling dGPU power works.
    No errors.

  2. Installed packages: linux 4.17.4, bumblebee, tlp, tlp-ui, nvidia.
    bbswitch removed.
    Using pcie_port_pm=on in the kernel command-line options.
    Tip-1: to let the dGPU power off, the nvidia_modeset and nvidia kernel modules must be unloaded manually.
    Tip-2: when the AC adapter is plugged in or unplugged, the dGPU stays powered on. Find the dGPU's vendor/device ID via lspci -nn, add the device to the udev rules, and always set it to auto instead of on.
    I guess this will be the default configuration in future versions of Linux distributions, after some fixes.

  3. Legacy configuration.
    Installed packages: linux 4.16.12, bumblebee, bbswitch-dkms, nvidia-dkms, laptop-mode-tools.
    Using pcie_port_pm=off in the kernel command-line options.
    No additional changes needed.

Many thanks to @liskin for the suggestions and tips. Maybe our conversation will help those who have the same issues. I'm waiting for a complete, out-of-the-box implementation of dynamic switchable graphics, without bbswitch.
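
Tip-2 in the second configuration could be sketched roughly as below. The helper name make_runtime_pm_rule is hypothetical, and the device ID in the usage note is a placeholder to be replaced with the value that lspci -nn prints for your card. The rule uses standard udev match keys and the sysfs power/control attribute. Whether tlp later overrides the setting on AC events is not covered here.

```shell
# Hypothetical helper: print a udev rule that sets the runtime PM policy
# of one PCI device to "auto" when the device is added.
# $1: vendor ID (hex, no 0x prefix), $2: device ID (hex, no 0x prefix),
# both as shown in the [vvvv:dddd] field of `lspci -nn`.
make_runtime_pm_rule() {
    vendor="$1"
    device="$2"
    # sysfs exposes these IDs with a 0x prefix, so the rule matches that form.
    printf 'ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x%s", ATTR{device}=="0x%s", ATTR{power/control}="auto"\n' \
        "$vendor" "$device"
}
```

The output could then be saved as a rules file, for example with make_runtime_pm_rule 10de abcd | sudo tee /etc/udev/rules.d/80-dgpu-pm.rules, where 10de is the NVIDIA vendor ID and both abcd and the file name are placeholders.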

real-or-random Jul 9, 2018

Hm, for me removing pcie_port_pm=off does not help. Without that, I cannot load the nvidia driver.

However, the problem with 4.16.13 went away in a later kernel version (actually already some weeks ago, I just forgot to report it here). So for me, pcie_port_pm=off is still the way to go...

Zeben Jul 9, 2018

@real-or-random I've combined two approaches to make switchable graphics work: runtime PM for all devices (by keeping pcie_port_pm=on or removing the option completely) and blacklisting the dGPU in tlp. As a result, bbswitch doesn't interfere with the kernel's new runtime PM, and the bbswitch-related tracebacks in dmesg are gone. Now bbswitch fully controls the dGPU, and runtime PM leaves the device alone.

real-or-random Jul 10, 2018

I tried that but it didn't work. But I'm not convinced that the blacklisting in tlp worked, because powertop still showed runtime PM as enabled for the NVIDIA card. Is that the right place to check? (Where can I check manually?)
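
One way to check by hand is to read the device's power attributes in sysfs, next to the runtime_status file used earlier in this thread. The sketch below assumes the dGPU sits at 0000:01:00.0 as in the outputs above. The function name show_runtime_pm and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical.

```shell
# Print the runtime PM policy and current state of one PCI device.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root, defaults to /sys (override is only meant for testing)
show_runtime_pm() {
    dev="$1"
    root="${2:-/sys}"
    dir="$root/bus/pci/devices/$dev/power"
    # control: "auto" means runtime PM may suspend the device,
    # "on" means the device is kept powered.
    printf 'control=%s\n' "$(cat "$dir/control")"
    # runtime_status: e.g. "active" or "suspended".
    printf 'runtime_status=%s\n' "$(cat "$dir/runtime_status")"
}
```

Running show_runtime_pm 0000:01:00.0 prints both values. If the tlp blacklisting took effect, control would presumably read on rather than auto, but that expectation is an assumption and not something confirmed in this thread.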
