nvidia card refuses to change power state #177

Open
lumag opened this issue Aug 15, 2018 · 13 comments

Comments

@lumag

lumag commented Aug 15, 2018

Hello,

After one of the updates, I noticed the following error in dmesg on my Dell Inspiron 5558.
Relevant parts of the kernel log:

[    0.000000] Linux version 4.18.0-rc4-amd64 (debian-kernel@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-25)) #1 SMP Debian 4.18~rc4-1~exp1 (2018-07-12)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.18.0-rc4-amd64 root=/dev/mapper/rhovanion--vg-root ro pcie_port_pm=off pcie_aspm=force bbswitch.dyndbg quiet
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
--
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 3075436
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.18.0-rc4-amd64 root=/dev/mapper/rhovanion--vg-root ro pcie_port_pm=off pcie_aspm=force bbswitch.dyndbg quiet
[    0.000000] PCIe ASPM is forcibly enabled
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
--
[   75.261628] usbcore: registered new interface driver rndis_host
[   75.361158] rndis_host 2-3:1.0 enxac5043ccbff8: renamed from eth1
[   75.722701] bbswitch: loading out-of-tree module taints kernel.
[   75.723660] bbswitch: version 0.8
[   75.723669] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[   75.723680] bbswitch: Found discrete VGA device 0000:08:00.0: \_SB_.PCI0.RP05.PEGP
[   75.723696] ACPI Warning: \_SB.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180531/nsarguments-66)
[   75.723878] bbswitch: detected an Optimus _DSM function
[   75.723901] pci 0000:08:00.0: enabling device (0006 -> 0007)
[   75.724117] bbswitch: Succesfully loaded. Discrete card 0000:08:00.0 is on
[   75.725930] bbswitch: disabling discrete graphics
[   75.726497] bbswitch: Result of Optimus _DSM call: 11000059
[   76.149216] usb 2-6: New USB device found, idVendor=0cf3, idProduct=e005, bcdDevice= 0.02
[   76.149219] usb 2-6: New USB device strings: Mfr=0, Product=0, SerialNumber=0
--
[  139.157964] PKCS#7 signature not signed with a trusted key
[  149.270434] PKCS#7 signature not signed with a trusted key
[  152.936326] bbswitch: enabling discrete graphics
[  156.492705] nvidia: module license 'NVIDIA' taints kernel.
[  156.492707] Disabling lock debugging due to kernel taint
--
[  156.521864] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.77  Tue Jul 10 18:28:52 PDT 2018 (using threaded interrupts)
[  170.590012] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[  170.669996] bbswitch: disabling discrete graphics
[  170.670633] bbswitch: Result of Optimus _DSM call: 11000059
[  170.687699] pci 0000:08:00.0: Refused to change power state, currently in D0

Do you need any additional information, such as the DSDT?
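
For anyone comparing logs across machines, here is a minimal sketch that condenses the "Refused to change power state" messages into one line per device and state. The function name `refused_power_summary` is made up for illustration, and the pattern is based only on the message format quoted in this thread.

```sh
# Summarize "Refused to change power state" messages read from stdin.
# Prints one line per (device, state) pair: "<count> <pci-address> <state>".
# The message format is assumed to match the log lines quoted in this thread.
refused_power_summary() {
    sed -n 's/.*pci \([0-9a-f:.]*\): Refused to change power state, currently in \(D[0-9a-z]*\).*/\1 \2/p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | refused_power_summary
```

It reads plain text from stdin, so it should also work on syslog files as long as they contain the same kernel message.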

@Lekensteyn
Member

Do you observe any issues other than the message? Like battery drain or increased heat?

When was your system produced? You can check with: cd /sys/class/dmi/id && grep . bios_*
If it is a recent one, the current "stable" bbswitch might not be the best solution.
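
The BIOS check above can be wrapped in a small helper, sketched here. The function name `bios_info` and its optional directory argument are hypothetical and exist only so the function can be pointed at a different directory; by default it reads the same sysfs path as the command above.

```sh
# Print "name:value" for each bios_* attribute in a DMI sysfs directory.
# The directory argument is a hypothetical convenience; it defaults to the
# real path used in the comment above.
bios_info() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for f in "$dmi_dir"/bios_*; do
        # Skip unreadable entries (and the unexpanded pattern if nothing matches).
        [ -r "$f" ] || continue
        printf '%s:%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage:
#   bios_info
```

The output has the same "file:value" shape as `grep . bios_*`, which keeps it easy to paste into a report.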

@glunardi

o/ Similar issue with Linux 4.18 and bbswitch here.

> When was your system produced? cd /sys/class/dmi/id && grep . bios_*

bios_date:07/24/2018
bios_vendor:Dell Inc.
bios_version:1.11.0

> If it is a recent one, the current "stable" bbswitch might not be the best solution.

This system is a Dell XPS 15 9560 from last year.

powertop appears unreliable on 4.18 as well, so I cannot confirm whether the power drain is much higher, but it seems to be the case. What is certain is that the NVIDIA GPU, as a PCI device, no longer powers down with 4.18 when using bbswitch.

Any ideas?

@Lekensteyn
Member

Specifically for the XPS 9560, you can try booting with the acpi_rev_override=1 kernel option to work around the hang issue. This makes it possible to transition to a lower power state without breaking all kinds of stuff.

In the long term, I tried again this weekend to find a proper fix (using my brother's XPS 9560 as a test target), but unfortunately there has not been much progress.
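
After rebooting, it can be worth confirming that the option actually reached the kernel. This is a minimal sketch; `has_kernel_param` is a hypothetical helper name, and the optional second argument (a file to read instead of /proc/cmdline) is only there so the function can be tried on a saved command line.

```sh
# Succeed if the given parameter appears as a whole word on the kernel command line.
# The second argument is a hypothetical convenience; it defaults to /proc/cmdline.
# Note: parameters containing spaces (quoted values) are not handled by this sketch.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each space-separated word on its own line, then match the whole line literally.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Usage:
#   has_kernel_param acpi_rev_override=1 && echo "override active"
```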

@glunardi

Will try right away and report back. Thank you so much for working on this.

If we can help with any testing, do not hesitate to ask.

@glitzflitz

glitzflitz commented Sep 4, 2018

Can you turn off the GPU by manually removing the nvidia modules? Run:
sudo rmmod nvidia_drm && sudo rmmod nvidia_modeset && sudo rmmod nvidia && sudo tee /proc/acpi/bbswitch <<<OFF && cat /proc/acpi/bbswitch
If that gives the error, try booting with the kernel parameter acpi_osi=Linux set in GRUB, or acpi_osi='!Windows 2013' if the first one doesn't work.
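
The unload sequence above stops at the first rmmod that fails, for example when one of the modules is not loaded. As a rough sketch, the helper below prints the same steps but only for modules that are actually listed as loaded. The name `gpu_off_plan` and the optional file argument are hypothetical; nothing is executed, the commands are only printed.

```sh
# Print (do not run) the commands needed to unload the nvidia modules and
# switch the card off, in the order given in the comment above.
# Modules not listed in the /proc/modules-style file are skipped.
gpu_off_plan() {
    modules_file="${1:-/proc/modules}"
    for mod in nvidia_drm nvidia_modeset nvidia; do
        # Each line in /proc/modules starts with the module name followed by a space.
        if grep -q "^$mod " "$modules_file" 2>/dev/null; then
            echo "rmmod $mod"
        fi
    done
    echo "echo OFF | tee /proc/acpi/bbswitch"
    echo "cat /proc/acpi/bbswitch"
}

# Usage: review the plan first, then run it as root if it looks right:
#   gpu_off_plan
#   gpu_off_plan | sudo sh -e
```

Printing the plan instead of running it keeps the helper safe to try on a machine where the module state is unknown.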

@lumag
Author

lumag commented Sep 5, 2018

Yes, that still gives an error. Strangely enough, with the latest NVIDIA driver (390.77) and Linux 4.17.17 I observe even less stable behaviour: often optirun won't work at all, with the following messages in dmesg:

Sep  5 13:36:02 rhovanion kernel: [ 8515.376844] NVRM: RmInitAdapter failed! (0x25:0x40:1101)
Sep  5 13:36:02 rhovanion kernel: [ 8515.575817] NVRM: rm_init_adapter failed for device bearing minor number 0

@glitzflitz

Currently on my Arch machine I can't reach the graphical target (it hard-locks the kernel). I have to start X manually from the multi-user target. lspci also hard-locks the kernel. I tested another kernel without bbswitch, and that works fine.

@connormurray7

I am not sure if it is related, but I am getting the same PCI error message when trying to wake from suspend. I think it might also be related to bbswitch not turning off my graphics card after resuming from suspend. Keeping the discrete graphics card on after suspend is bad for battery life, so I am trying to figure out why that is happening.

This is a Dell XPS 15 9570 running Manjaro with kernel 4.18.

[    3.061683] EXT4-fs (nvme0n1p8): mounted filesystem with ordered data mode. Opts: (null)
[    3.074294] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    3.102245] bbswitch: loading out-of-tree module taints kernel.
[    3.102457] bbswitch: version 0.8
[    3.102462] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[    3.102468] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[    3.102478] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180531/nsarguments-66)
[    3.102631] bbswitch: detected an Optimus _DSM function
[    3.102645] pci 0000:01:00.0: enabling device (0006 -> 0007)
[    3.102742] bbswitch: disabling discrete graphics
[    3.110770] random: crng init done
[    3.110771] random: 7 urandom warning(s) missed due to ratelimiting
[    3.139718] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    3.139719] Bluetooth: BNEP filters: protocol multicast
[    3.139722] Bluetooth: BNEP socket layer initialized
[    3.253572] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input12
[    3.253713] ACPI: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
[    3.253734] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0b/LNXVIDEO:01/input/input13
[    3.253911] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    3.257772] fbcon: inteldrmfb (fb0) is primary device
[    3.407347] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is off
[    3.409019] iTCO_vendor_support: vendor-support=0
[    3.411236] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[    3.411257] iTCO_wdt iTCO_wdt: can't request region for resource [mem 0x00c5fffc-0x00c5ffff]
[    3.411260] iTCO_wdt: probe of iTCO_wdt failed with error -16
--
[   14.925196] ath: Country alpha2 being used: US
[   14.925197] ath: Regpair used: 0x3a
[   14.925199] ath: regdomain 0x8348 dynamically updated by country element
[   14.962084] IPv6: ADDRCONF(NETDEV_CHANGE): wlp59s0: link becomes ready
[   15.003692] wlp59s0: Limiting TX power to 27 (30 - 3) dBm ...
[  109.075891] wlp59s0: deauthenticating from ...
[  109.097480] IPv6: ADDRCONF(NETDEV_UP): wlp59s0: link is not ready
[  109.578895] ACPI: button: The lid device is not compliant to SW_LID.
[  109.749798] PM: suspend entry (s2idle)
[  109.749801] PM: Syncing filesystems ... done.
[  109.756030] bbswitch: enabling discrete graphics
[  109.804199] pci 0000:01:00.0: Refused to change power state, currently in D3
[  109.864793] pci 0000:01:00.0: Refused to change power state, currently in D3
[  109.864836] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  109.866242] OOM killer disabled.
[  109.866242] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  109.867331] Suspending console(s) (use no_console_suspend to debug)
[  111.513526] pci 0000:01:00.0: Refused to change power state, currently in D3
[  116.808926] pci 0000:01:00.0: Refused to change power state, currently in D3
[  119.014290] pci 0000:01:00.0: Refused to change power state, currently in D3

@zaro

zaro commented May 11, 2019

Same here on my Xiaomi Notebook Pro with Fedora 30.

$ lspci |grep -E "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
01:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX150] (rev ff)

Everything installs correctly but when I try to run something with optirun I get:

$ optirun nvidia-settings
[  145.725551] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

[  145.725590] [ERROR]Aborting because fallback start is disabled.

And in the syslog:

May 11 12:50:44 mi kernel: pci 0000:01:00.0: Refused to change power state, currently in D3
May 11 12:50:44 mi kernel: bbswitch: enabling discrete graphics
May 11 12:50:44 mi kernel: pci 0000:01:00.0: Refused to change power state, currently in D3
May 11 12:50:44 mi bumblebeed[957]: [  489.061622] [ERROR]Could not enable discrete graphics card

@ukos-git

Same here on a Dell Precision 5530 laptop with an NVIDIA Quadro P1000, running Debian (kernel 4.19.0-5-amd64).

Package versions:
$ apt-cache policy bumblebee-nvidia primus libgl1-nvidia-glx
bumblebee-nvidia:
  Installed: 3.2.1-20
  Candidate: 3.2.1-20
  Version table:
 *** 3.2.1-20 500
        500 http://deb.debian.org/debian buster/contrib amd64 Packages
        100 /var/lib/dpkg/status
primus:
  Installed: 0~20150328-7
  Candidate: 0~20150328-7
  Version table:
 *** 0~20150328-7 500
        500 http://deb.debian.org/debian buster/main amd64 Packages
        100 /var/lib/dpkg/status
libgl1-nvidia-glx:
  Installed: 418.74-1
  Candidate: 418.74-1
  Version table:
 *** 418.74-1 500
        500 http://deb.debian.org/debian buster/non-free amd64 Packages
        100 /var/lib/dpkg/status

@WhyKickAmooCow

WhyKickAmooCow commented May 31, 2019

Similar to @zaro.

Fedora 30 on an XPS 9570 with a GTX 1050 Ti.

Both VGA devices are present.

$ lspci |grep -E "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev ff)

And it refuses to enable the discrete GPU:

$ optirun glxgears
[23566.444916] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

[23566.445020] [ERROR]Aborting because fallback start is disabled.

and in dmesg:

[23694.694974] bbswitch: enabling discrete graphics
[23694.695016] pci 0000:01:00.0: Refused to change power state, currently in D3

It even refuses to be enabled manually.

# tee /proc/acpi/bbswitch <<<ON
ON
# cat /proc/acpi/bbswitch
0000:01:00.0 OFF

And this after bbswitch initially turned the card off at startup:

$ dmesg | grep bbswitch
[    6.939186] bbswitch: loading out-of-tree module taints kernel.
[    6.939237] bbswitch: module verification failed: signature and/or required key missing - tainting kernel
[    6.939486] bbswitch: version 0.8
[    6.939491] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[    6.939499] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[    6.939586] bbswitch: detected an Optimus _DSM function
[    6.939678] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[    6.945567] bbswitch: disabling discrete graphics
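
The manual check above (write ON, then read /proc/acpi/bbswitch back) can be turned into a small verification helper. This is only a sketch: the function names `bbswitch_state` and `bbswitch_expect` are made up, and the "<pci-address> <state>" format is assumed from the output quoted in this comment.

```sh
# Print the power state (ON or OFF) reported in a bbswitch status file.
# Assumes the "<pci-address> <state>" format shown in the output above.
bbswitch_state() {
    status_file="${1:-/proc/acpi/bbswitch}"
    awk '{ print $2; exit }' "$status_file"
}

# Compare the reported state with the one that was requested.
# Returns non-zero when the card did not end up in the wanted state.
bbswitch_expect() {
    wanted="$1"
    actual=$(bbswitch_state "$2")
    if [ "$actual" = "$wanted" ]; then
        echo "ok: card is $actual"
    else
        echo "mismatch: wanted $wanted, card reports $actual"
        return 1
    fi
}

# Usage (the write needs root):
#   echo ON | sudo tee /proc/acpi/bbswitch
#   bbswitch_expect ON
```

On the machine described here, the second command would be expected to report a mismatch, since the card stays OFF after ON is written.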

@ukos-git

ukos-git commented Jun 1, 2019 via email

@WhyKickAmooCow

Thanks for the explanation.

I have found that the nvidia-xrun utility works almost flawlessly. After following the instructions in the copr repo and setting nvidia-drm.modeset=0 in /etc/default/grub, I can power the GPU on as needed and turn it off when I want power savings. Apparently the performance is also significantly better than with bumblebee, although some applications do not run very well by themselves and work much better inside a WM like openbox.

It might be worth a try as it is the best solution I have found so far.
