Laptop freezes when starting X11 and discrete graphics are OFF #764

Open
jgkamat opened this issue May 6, 2016 · 375 comments

Comments

@jgkamat

jgkamat commented May 6, 2016

[edit by @Lekensteyn]
This issue affects newer laptops (from about 2015-2016) with Skylake and GTX 9xxM/10xx cards.
A workaround exists for some laptops, see #764 (comment)
[/edit]


I'm having a weird issue, and I'm not sure what kind of debug information is necessary, but let me know what to give and I'll supply anything you need.

When I start my graphics (lxdm), I get a freeze (keyboard stops working, no response on monitor at all, even log files stop working), but I can work around this by enabling the graphics card before starting graphics.

System (installed with bumblebee-nvidia in debian testing repos):

Debian Testing
GTX 965M
Nvidia Proprietary Driver: 352.79 
Laptop: SAGER NP7258

Optirun --version:

optirun (Bumblebee) 3.2.1
Copyright (C) 2011 The Bumblebee Project
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

My laptop does not seem to work without Optimus: the Intel drivers work fine, but trying to run without the Intel drivers (NVIDIA only) results in a frozen screen. Using the workaround works perfectly for me, however.

Steps to Reproduce:

  1. systemctl start bumblebeed
  2. systemctl start lxdm
  3. Freeze occurs

Workaround:

  1. systemctl start bumblebeed
  2. echo "ON" >/proc/acpi/bbswitch
  3. systemctl start lxdm
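
The workaround above can be wrapped in a small helper so the card is switched on before the display manager starts. This is only a minimal sketch, not something shipped by Bumblebee: the function name `power_on_gpu` and its optional path argument are made up for illustration; only `/proc/acpi/bbswitch` and the `ON` keyword come from the steps above.

```shell
#!/bin/sh
# Sketch: power the discrete GPU on through bbswitch before starting X.
# power_on_gpu is a hypothetical helper; the optional argument exists only
# so the function can be pointed at a scratch file when testing.
power_on_gpu() {
    switch_file="${1:-/proc/acpi/bbswitch}"
    if [ ! -e "$switch_file" ]; then
        echo "bbswitch interface not found: $switch_file" >&2
        return 1
    fi
    # Same as step 2 of the workaround: echo "ON" >/proc/acpi/bbswitch
    echo ON > "$switch_file"
}
```

Run as root, this could be used as `power_on_gpu && systemctl start lxdm`.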

Unfortunately, the X11 log files don't seem to survive after my system freezes (they show everything completed successfully, probably from the previous successful boot). If you know any way of retrieving them I'd be happy to supply them though! (When the system freezes, even my shell history file gets corrupted).

I did have to make some changes to my config files to get things to work in my situation though, I'll post anything I remember changing below. Let me know if you need any more information, I am happy to supply it! Without bumblebee, my laptop would be unusable 👍

bumblebee.conf

# Configuration file for Bumblebee. Values should **not** be put between quotes

## Server options. Any change made in this section will need a server restart
# to take effect.
[bumblebeed]
# The secondary Xorg server DISPLAY number
VirtualDisplay=:8
# Should the unused Xorg server be kept running? Set this to true if waiting
# for X to be ready is too long and don't need power management at all.
KeepUnusedXServer=false
# The name of the Bumbleblee server group name (GID name)
ServerGroup=bumblebee
# Card power state at exit. Set to false if the card shoud be ON when Bumblebee
# server exits.
TurnCardOffAtExit=false
# The default behavior of '-f' option on optirun. If set to "true", '-f' will
# be ignored.
NoEcoModeOverride=false
# The Driver used by Bumblebee server. If this value is not set (or empty),
# auto-detection is performed. The available drivers are nvidia and nouveau
# (See also the driver-specific sections below)
Driver=nvidia
# Directory with a dummy config file to pass as a -configdir to secondary X
XorgConfDir=/etc/bumblebee/xorg.conf.d

## Client options. Will take effect on the next optirun executed.
[optirun]
# Acceleration/ rendering bridge, possible values are auto, virtualgl and
# primus.
Bridge=auto
# The method used for VirtualGL to transport frames between X servers.
# Possible values are proxy, jpeg, rgb, xv and yuv.
VGLTransport=proxy
# List of paths which are searched for the primus libGL.so.1 when using
# the primus bridge
PrimusLibraryPath=/usr/lib/x86_64-linux-gnu/primus:/usr/lib/i386-linux-gnu/primus:/usr/lib/primus:/usr/lib32/primus
# Should the program run under optirun even if Bumblebee server or nvidia card
# is not available?
AllowFallbackToIGC=false


# Driver-specific settings are grouped under [driver-NAME]. The sections are
# parsed if the Driver setting in [bumblebeed] is set to NAME (or if auto-
# detection resolves to NAME).
# PMMethod: method to use for saving power by disabling the nvidia card, valid
# values are: auto - automatically detect which PM method to use
#         bbswitch - new in BB 3, recommended if available
#       switcheroo - vga_switcheroo method, use at your own risk
#             none - disable PM completely
# https://github.com/Bumblebee-Project/Bumblebee/wiki/Comparison-of-PM-methods

## Section with nvidia driver specific options, only parsed if Driver=nvidia
[driver-nvidia]
# Module name to load, defaults to Driver if empty or unset
KernelDriver=nvidia-current
PMMethod=bbswitch
# colon-separated path to the nvidia libraries
LibraryPath=/usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/nvidia
# comma-separated path of the directory containing nvidia_drv.so and the
# default Xorg modules path
XorgModulePath=/usr/lib/nvidia,/usr/lib/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia

## Section with nouveau driver specific options, only parsed if Driver=nouveau
[driver-nouveau]
KernelDriver=nouveau
PMMethod=auto
XorgConfFile=/etc/bumblebee/xorg.conf.nouveau

xorg.conf.nvidia

Section "ServerLayout"
Identifier  "Layout0"
Option      "AutoAddDevices" "false"
Option      "AutoAddGPU" "false"
EndSection

Section "Device"
Identifier  "DiscreteNvidiaj"
Driver      "nvidia"
VendorName  "NVIDIA Corporation"

#   If the X server does not automatically detect your VGA device,
#   you can manually set it here.
#   To get the BusID prop, run `lspci | egrep 'VGA|3D'` and input the data
#   as you see in the commented example.
#   This Setting may be needed in some platforms with more than one
#   nvidia card, which may confuse the proprietary driver (e.g.,
#   trying to take ownership of the wrong device). Also needed on Ubuntu 13.04.
BusID "PCI:01:00:0"

#   Setting ProbeAllGpus to false prevents the new proprietary driver
#   instance spawned to try to control the integrated graphics card,
#   which is already being managed outside bumblebee.
#   This option doesn't hurt and it is required on platforms running
#   more than one nvidia graphics card with the proprietary driver.
#   (E.g. Macbook Pro pre-2010 with nVidia 9400M + 9600M GT).
#   If this option is not set, the new Xorg may blacken the screen and
#   render it unusable (unless you have some way to run killall Xorg).
Option "ProbeAllGpus" "false"

Option "NoLogo" "true"
Option "UseEDID" "false"
Option "UseDisplayDevice" "none"
EndSection

# Section "Screen"
#     Identifier "Default Screen"
#   Device "DiscreteNvidia"
# EndSection

@bluca
Member

bluca commented May 6, 2016

If you run:

sudo update-glx --config glx

What is the selected config? It should be /usr/lib/nvidia/bumblebee.

Does the same problem happen if you choose /usr/lib/mesa-diverted instead?

Finally, do you have another DE to try (Gnome would be best) to help narrow it down?

@jgkamat
Author

jgkamat commented May 6, 2016

I've been using /usr/lib/nvidia/bumblebee so far. I tried out mesa-diverted and I get the same result. I've tried this with starting lxdm, manually running startx to start xfce, and sddm (kde), and all have the same behavior. If you think gdm would help I'll try that out but I would rather not install all of gnome.

@bluca
Member

bluca commented May 6, 2016

/usr/lib/nvidia/bumblebee is the right one (default) when having bumblebee, I wanted to see if removing all traces of nvidia from the path helped.

It is really strange that X is affected by bumblebee when not running through it. Can you get to another TTY when the screen is frozen?

Don't bother with GDM for now if it's a hassle, was just trying to narrow it down. I'll install xfce on my sid partition and see what happens.

@jgkamat
Author

jgkamat commented May 6, 2016

I think this is an issue specific to my hardware setup (as discrete graphics cannot be forced on, optimus must be used). When I say 'the screen is frozen', the TTY I am in (I'm manually starting a display manager) stops responding (the cursor stops blinking). I can't switch to another TTY. Even the keyboard caps lock/numlock lights no longer change when I press them, and the SysRq keys no longer work either. The system has to be force powered off.

@jgkamat
Author

jgkamat commented May 6, 2016

I just double checked, but ssh sessions freeze too when this occurs.

@bluca
Member

bluca commented May 6, 2016

A kernel hard-lock then, that's a pain. Have you tried nouveau?

@karolherbst

maybe nouveau is already loaded and causes the hang because something doesn't work and Xorg freezes due to a messed up modesetting DDX?

@bluca
Member

bluca commented May 6, 2016

With the bumblebee-nvidia package nouveau is blacklisted, so it can't be loaded.

@karolherbst

and I hope nvidia is also blacklisted, but Xorg freezes and that usually happens for a bad reason.

My guess is: X loads the nvidia DDX, which autoloads the nvidia kernel driver.

@bluca
Member

bluca commented May 7, 2016

Yes, all the kernel modules are blacklisted. And the nvidia libraries are out of the path (hence my question earlier about update-alternatives).

@karolherbst

I dealt with so many users where something was messed up, that I wouldn't rely on anything here. And that nvidia gets loaded also explains why turning the GPU off helps.

In fact, the nvidia libraries don't need to be in the path for that, because the nvidia DDX alone is enough and it uses different paths.

Anyhow, without logs it will be painful to debug this.

@jgkamat
Author

jgkamat commented May 7, 2016

I've tried w/ nouveau and I still see the same issue (but with the workaround (which worked under nouveau) I started to see some weird behavior like some CPU cores sticking at 100%). Also when running optirun I got some permission denied errors with nouveau. I'm not sure if this will help though.

Just to clarify, simply turning the discrete video card ON with bbswitch before starting X11 fixes my issue (but it is a hassle to deal with every time). I'm not sure if there are any ways for me to get logs with this situation, but if there are let me know. When I run startx, the screen freezes before any errors come up, so I'm not sure if there is much I can do.

bumblebee blacklists all the nvidia/nouveau modules by default, and I have nvidia set in bumblebee.conf, so I think nouveau isn't conflicting? If there is any way to test this I would be happy to do so!

@karolherbst

well you don't use bumblebee with nouveau, and that support should be removed in bumblebee

@karolherbst

@jgkamat what really would help would be the dmesg output. Maybe you can do "dmesg -w" through ssh while you start X and see if you get enough useful output this way.

@bluca
Member

bluca commented May 7, 2016

If dmesg can write it, so will journalctl. If you haven't, enable persistent journal (create /var/log/journal) and then after the freeze reboot and check the previous boot journal with journalctl -b -1
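
A sketch of the steps described above, assuming systemd-journald with its default `Storage=auto` setting, under which the journal is kept on disk once `/var/log/journal` exists. The helper name `enable_persistent_journal` and its optional directory argument are illustrative only.

```shell
#!/bin/sh
# Sketch: make the systemd journal persistent so messages from a boot that
# ended in a freeze can be read after the reboot.
# enable_persistent_journal is a hypothetical helper name; the directory
# argument defaults to the real location and is overridable for testing.
enable_persistent_journal() {
    journal_dir="${1:-/var/log/journal}"
    mkdir -p "$journal_dir"
}

# After creating the directory (as root), restart journald or reboot:
#   systemctl restart systemd-journald
# After the next freeze and reboot, read the previous boot's kernel messages:
#   journalctl -k -b -1
```

Whether anything useful reaches the disk before a hard lock is not guaranteed; this only removes one reason for the log to be missing.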

@karolherbst

karolherbst commented May 7, 2016

@bluca His machine crashes completely. On a crash, error logs usually can't be written anymore because the kernel has stopped doing anything. dmesg -w could help us because it displays messages immediately (even before they get written to disk), but if the network dies too fast he wouldn't get those either and would need to set up netconsole, although this also requires a working network.
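
For reference, a rough sketch of how the netconsole module parameter is put together. The format in the comment follows the kernel's netconsole documentation; the helper name `netconsole_param` and every address in the usage note are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: build the netconsole module parameter.
# Format (kernel netconsole documentation):
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# netconsole_param is a hypothetical helper name.
netconsole_param() {
    # $1 src-port  $2 src-ip  $3 device  $4 tgt-port  $5 tgt-ip  $6 tgt-mac
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

On the freezing laptop this could be loaded with something like `modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 aa:bb:cc:dd:ee:ff)"`, with a UDP listener on the target machine. A wired interface is probably the safer choice for the device.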

@jgkamat maybe you have something inside pstore (/sys/fs/pstore)

check here for pstore information:

https://lwn.net/Articles/434821/
https://www.kernel.org/doc/Documentation/ABI/testing/pstore
https://www.kernel.org/doc/Documentation/ramoops.txt
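
A quick way to check the pstore directory mentioned above, as a sketch. The helper name `list_pstore` is made up, and the directory argument exists only so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: report whether pstore holds any records from a previous crash.
# list_pstore is a hypothetical helper; the directory argument defaults to
# the real mount point and is overridable for testing.
list_pstore() {
    pstore_dir="${1:-/sys/fs/pstore}"
    if [ ! -d "$pstore_dir" ]; then
        echo "pstore not available"
        return 1
    fi
    entries=$(ls -A "$pstore_dir")
    if [ -z "$entries" ]; then
        echo "pstore is empty"
    else
        echo "$entries"
    fi
}
```

Reading the real directory may require root. An empty result after a freeze suggests the kernel never got as far as writing a record.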

@jgkamat
Author

jgkamat commented May 7, 2016

I tried setting up a netconsole (and dmesg -w over ssh) and that doesn't seem to give me any logs before the freeze either. I don't have anything currently inside pstore as far as I can tell. I'm starting to think that this is some sort of race condition where bumblebee tries to turn on the nvidia driver before X starts, but X manages to start before the nvidia card comes online, leading to a lockout (or maybe my hardware can't deal with xorg starting without the nvidia card being on). Running modprobe nvidia before X also makes X start properly, as it also forces the nvidia card on.

@karolherbst

@jgkamat could you add a xorg.conf file in /etc/X11 with this content and start X while the gpu is off? https://gist.github.com/karolherbst/1f1bdd1a3822df74097f

and check if your nvidia card also has the 01:00.0 address in lspci. If this works, that means something is loaded which makes your kernel unhappy.

@jgkamat
Author

jgkamat commented May 7, 2016

Unfortunately, I'm still seeing the same issue with this config. Just to be sure, I created a new xorg.conf file (as the docs say that none should be present) with that config. My Nvidia card is on that bus. Here's the output of lspci, if that helps:

00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA Controller [AHCI mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #3 (rev f1)
00:1c.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #4 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation GM206M [GeForce GTX 965M] (rev a1)
02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 5287 (rev 01)
03:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

Should that file have gone in /etc/bumblebee/xorg.conf.d instead?

@Lekensteyn
Member

I have a Clevo P650RA/P651RA (and also access to a Clevo P670RA/P671RA) which both have GTX 965M cards as well. This issue could be related to Bumblebee-Project/bbswitch#115

In my case an infinite loop would occur in ACPI. See Bumblebee-Project/bbswitch#115 (comment) for more details if you are interested.

@jgkamat
Author

jgkamat commented May 11, 2016

I'm not seeing any issues with suspend to the best of my knowledge (the video card is off before/after a sleep, according to bbswitch, and that works fine for me). These issues could be related though.

I'm honestly pretty stoked at how well this performs (with this workaround in place), but I'm worried that a slight change could break it more. I'm happy to provide any more information if that would help!

EDIT: My laptop is a CLEVO N155RF (sager just rebrands them?)

@jkehler

jkehler commented May 20, 2016

I've been having the exact same issue with my MSI GE62. If I start X11 with the 960M turned off, it hard locks. But if I turn it on first and then start X11, it works fine.

I should also note that with Gnome GDM will start fine with the 960M turned off. But once I enter my password to log in to Gnome then it will do a hard lock. I presume this is because GDM is using Wayland?

@Warpgamer

Warpgamer commented May 20, 2016

@jkehler : I'm having the exact same behavior with the same model, except I have a 970M
Created a script that executes after GDM login that starts bumblebee. However, when manually stopping bumblebee service, half of the time it'll totally freeze the system, like it does when GDM attempts to login with discrete card off.

@jkehler

jkehler commented May 20, 2016

Actually I had just realized I had never actually tried starting Gnome with Wayland instead of X11 to see if it hard freezes. I just tried it now and when using Wayland it worked fine with the 960M turned off. So it definitely appears to just be an issue with X11.

@jgkamat
Author

jgkamat commented May 20, 2016

I've had a couple of random freezes too. Most of the time, they are triggered by some 'low level' operations, or things involving the graphics card (e.g. starting steam, modprobes, even lspci once). This is usually accompanied by some audio garbling for some reason (before hard faulting). If I enable the discrete graphics card via bbswitch then I never have this issue, however.

This is my xorg version, if that helps. I've never tried out wayland, and I don't have the time to test this right now, but if I ever do, I'll post an update here. Isn't wayland supposed to eliminate the need for bumblebee? I'm still fuzzy on that topic though...

X.Org X Server 1.18.3
Release Date: 2016-04-04
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.16.0-4-amd64 x86_64 Debian
Current Operating System: Linux laythe 4.5.0-2-amd64 #1 SMP Debian 4.5.3-2 (2016-05-08) x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.5.0-2-amd64 root=UUID=50a03efa-01f3-4e94-92a9-d4ad458845f0 ro acpi_enforce_resources=lax
Build Date: 05 April 2016  07:00:43AM
xorg-server 2:1.18.3-1 (http://www.debian.org/support) 
Current version of pixman: 0.33.6
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.

@Lekensteyn
Member

I think X is not much of an issue, but a trigger.

Can you switch to a TTY (Ctrl-Alt-F2), log in and try to power off/on the card manually using bbswitch? Repeat this twice to see if it makes a difference.

sudo tee /proc/acpi/bbswitch <<<OFF
sudo tee /proc/acpi/bbswitch <<<ON
sudo tee /proc/acpi/bbswitch <<<OFF
sudo tee /proc/acpi/bbswitch <<<ON

If that still does not hang, try this (exact output does not matter, only whether it hangs or not):

sudo lspci -vvvs 00:01:0
sudo lspci -vvvs 01:00:0

My guess is that trying to access some PCI configuration registers too fast results in failure. Why exactly this happens is something I have been trying for a week to figure out on a Clevo P651RA/GTX965M. Current key words: PCIe link training failure.
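
Since a hard lock leaves no logs behind, one way to see which of the commands above triggered it is to record each step on disk before running it. The following is only a sketch of that idea; `run_logged` is a hypothetical helper, not part of bbswitch or Bumblebee, and a freeze can still lose data despite the `sync` calls.

```shell
#!/bin/sh
# Sketch: record each step to a log file and flush it to disk before running
# the step, so the last line of the log names the command that hung.
# run_logged is a hypothetical helper, not part of bbswitch or Bumblebee.
run_logged() {
    log_file="$1"
    shift
    echo "START: $*" >> "$log_file"
    sync                      # flush to disk before the risky step
    status=0
    "$@" || status=$?
    echo "DONE ($status): $*" >> "$log_file"
    sync
    return "$status"
}
```

As root, a step could then be run as `run_logged /root/gpu-test.log sh -c 'echo OFF > /proc/acpi/bbswitch'`. After a forced power-off, a `START:` line with no matching `DONE` line points at the step that hung.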

@Warpgamer

Hello @Lekensteyn
switching gpus manually causes no issue.
Both commands below do not hang either, though second one produces no output at all.

However, I've found that if I disable the discrete card at boot with bbswitch, the system won't boot properly: on loading gnome, it freezes. Visual artifacts may appear in the console at the moment of the freeze, and nothing but the power button responds. All this while being on the integrated intel card.

Warp

@jkehler

jkehler commented Jun 11, 2016

@Lekensteyn I finally got around to trying what you had suggested above. Switching to a TTY and repeatedly turning the GPU on and off did not result in any sort of hard lock for me.

But when I ran your second set of commands the first one outputted the following.

01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel modules: nouveau, nvidia_drm, nvidia

The 2nd command didn't output anything. But then I ran the first command a 2nd time and it resulted in a hard-lock for me.
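
The `rev ff` and `Unknown header type 7f` markers in output like the above are what lspci prints when reads from a device's configuration space return all ones, which is what happens while the card is powered off. A small sketch for spotting them in saved output; `config_space_unreadable` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect lspci output that suggests the device's PCI configuration
# space is unreadable (reads return all ones while the card is powered off).
# config_space_unreadable is a hypothetical helper; it reads lspci output on
# standard input and succeeds if either marker from the output above appears.
config_space_unreadable() {
    grep -q -e '(rev ff)' -e 'Unknown header type 7f'
}
```

Given that querying the device itself led to a hard lock in this case, it seems safer to run this on output that was already captured, for example `config_space_unreadable < saved-lspci.txt && echo "config space unreadable"` (the file name is a placeholder).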

@withersky

withersky commented Oct 12, 2019

How do I do this? I'm new to all this. Even when I type everything correctly, everything hangs.

P.S. I'm Russian, by the way. I have no idea why it freezes at the login screen. I'd like to get this working.

@Vlad1mir-D

Vlad1mir-D commented Oct 13, 2019

I'm Russian, by the way. I have no idea

Pretty obvious.

@grzegorzk

grzegorzk commented Nov 14, 2019

Same issue here - Lenovo ThinkPad P53 with NVIDIA Quadro RTX 3000.

I tried both acpi_osi="!Windows 2015" as well as acpi_osi=! acpi_osi="!Windows 2009". System is freezing at startup of kde.

@karolherbst

anybody up for trying this patch? https://lists.freedesktop.org/archives/dri-devel/2019-October/240521.html

In case it doesn't help, I'd like to see your "lspci -nn" and "lspci -tv" output

@grzegorzk

Hi @karolherbst , I'll check if your patch already made it to archlinux, meantime this is the output of commands you asked for:

# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec4] (rev 0d)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 0d)
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Mobile) [8086:3e9b] (rev 02)
00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 0d)
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [8086:1911]
00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 [8086:a368] (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 [8086:a369] (rev 10)
00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
00:1b.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 [8086:a340] (rev f0)
00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 [8086:a338] (rev f0)
00:1c.5 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 [8086:a33d] (rev f0)
00:1c.7 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 [8086:a33f] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 [8086:a330] (rev f0)
00:1e.0 Communication controller [0780]: Intel Corporation Device [8086:a328] (rev 10)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:a30e] (rev 10)
00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-LM [8086:15bb] (rev 10)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106GLM [Quadro RTX 3000 Mobile / Max-Q] [10de:1f36] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller [10de:1adb] (rev a1)
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
04:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
05:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
05:01.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
05:02.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
05:04.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
06:00.0 System peripheral [0880]: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] [8086:15eb] (rev 06)
2c:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
2d:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] [8086:15ef] (rev 06)
2e:02.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] [8086:15ef] (rev 06)
2e:04.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] [8086:15ef] (rev 06)
2f:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018] [8086:15f0] (rev 06)
52:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
55:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]--+-00.0  NVIDIA Corporation TU106GLM [Quadro RTX 3000 Mobile / Max-Q]
           |            +-00.1  NVIDIA Corporation TU106 High Definition Audio Controller
           |            +-00.2  NVIDIA Corporation TU106 USB 3.1 Host Controller
           |            \-00.3  NVIDIA Corporation TU106 USB Type-C Port Policy Controller
           +-02.0  Intel Corporation UHD Graphics 630 (Mobile)
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-08.0  Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
           +-12.0  Intel Corporation Cannon Lake PCH Thermal Controller
           +-14.0  Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
           +-14.2  Intel Corporation Cannon Lake PCH Shared SRAM
           +-15.0  Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0
           +-15.1  Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1
           +-16.0  Intel Corporation Cannon Lake PCH HECI Controller
           +-1b.0-[02]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
           +-1c.0-[04-51]----00.0-[05-51]--+-00.0-[06]----00.0  Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018]
           |                               +-01.0-[07-2b]--
           |                               +-02.0-[2c]----00.0  Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018]
           |                               \-04.0-[2d-51]----00.0-[2e-51]--+-02.0-[2f]----00.0  Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018]
           |                                                               \-04.0-[30-51]--
           +-1c.5-[52]----00.0  Intel Corporation Wi-Fi 6 AX200
           +-1c.7-[54]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader
           +-1d.0-[55]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
           +-1e.0  Intel Corporation Device a328
           +-1f.0  Intel Corporation Device a30e
           +-1f.3  Intel Corporation Cannon Lake PCH cAVS
           +-1f.4  Intel Corporation Cannon Lake PCH SMBus Controller
           +-1f.5  Intel Corporation Cannon Lake PCH SPI Controller
           \-1f.6  Intel Corporation Ethernet Connection (7) I219-LM

@karolherbst

karolherbst commented Nov 18, 2019

Hi @karolherbst , I'll check if your patch already made it to archlinux, meantime this is the output of commands you asked for:

yeah.. that sounds like a system which is affected according to my current theory. What laptop is that?

@grzegorzk

grzegorzk commented Nov 19, 2019

What laptop is that?

I'm running a Lenovo ThinkPad P53. I'm slowly starting to regret this choice...

@Askannz

Askannz commented Nov 19, 2019

@karolherbst I have a laptop with Kaby Lake + Pascal so I'd be up to test that patch. But what bug does it fix exactly ? Do you have reproduction steps ?

@karolherbst

@karolherbst I have a laptop with Kaby Lake + Pascal so I'd be up to test that patch. But what bug does it fix exactly ? Do you have reproduction steps ?

it fixes D3cold with the Intel 0x1901 pcie bridge controller, so that the GPU can be powered on again.
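
To check whether a machine has the bridge mentioned above, the vendor:device ID can be looked up in `lspci -nn` output. A minimal sketch; `has_1901_bridge` is a hypothetical helper name, and the presence of the bridge alone does not prove the machine is affected.

```shell
#!/bin/sh
# Sketch: check `lspci -nn` output (on standard input) for the Intel PCIe
# controller with vendor:device ID 8086:1901 named in the comment above.
# has_1901_bridge is a hypothetical helper name.
has_1901_bridge() {
    grep -q '\[8086:1901\]'
}
```

For example: `lspci -nn | has_1901_bridge && echo "bridge 8086:1901 present"`.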

@NekoiNemo

Seems to be the same issue here: laptop freezes on powering down for suspend and waking up from it. Sometimes it unfreezes after 3-4 minutes, but most of the time it's permanent.

MSI gl65 9sdk
CPU: 9750H (coffee lake)
GPU: 1660 Ti (turing)
driver: 440xx

@blackmennewstyle

Welcome to a reality that has sadly been a nightmare for many for a while now. Intel + NVIDIA is really not fun sometimes when using GNU/Linux.
The problem is that these companies, worth billions of dollars, are not really willing to improve things, even though GNU/Linux enthusiasts don't borrow their hardware; they have to buy it. Still, these companies give them very little attention and effort as a real market.

@ArtyomFR

@karolherbst I've read your patch and it sounds like my system is affected by the D3 error.
My laptop is a CLEVO P960RN.
The question is how we could apply your patch on a distribution like Arch. Can we do it manually or can we only wait?

@karolherbst

@karolherbst I've read your patch and it sounds like my system is affected by the D3 error.
My laptop is a CLEVO P960RN.
The question is how we could apply your patch on a distribution like Arch. Can we do it manually or can we only wait?

I wouldn't include this fix unless someone can be sure it doesn't break anything, there are some upstream discussions going on. And maybe we get something merged for 5.5.

In the end it's up to you to make a decision on that though.

@ArtyomFR

@karolherbst So it's part of the kernel, which means we have to build a custom one to try the patch?
What do you fear about this patch? Can it cause irreversible physical damage?
If not, I'm willing to try it. I can deploy Arch on an SD card without touching my actual system.

@Leo1003

Leo1003 commented Nov 27, 2019

@karolherbst
I read the comments on the mailing list and want to provide some information (I don't know how to post to the mailing list).
As mentioned there, there are duplicate _OFF and _ON ACPI methods in two different SSDT tables. I guess that Windows executes both of them, while Linux reports an ACPI error when parsing the second method.
Therefore, I made an ACPI override on my laptop that contains a merged version of the _ON and _OFF methods.
The result is that my laptop never freezes anymore! (I am currently not using the dirty hack I provided in this thread.)

However, I don't have another laptop to test if this works on others.

@ArtyomFR

@Leo1003 Do you think your fix can be easily reproduced on other laptops?
If it doesn't need the kernel patch, it could be a simpler fix to apply.
Do you think you could write some instructions for others to follow?

@notthebee

notthebee commented Nov 27, 2019

My XPS 15 9570 freezes completely after starting X.Org (even in a LiveCD environment). Tried all the different distros, Ubuntu, Fedora, Arch, even GParted LiveCD, all exhibit the same behaviour. Would this patch still be relevant in my case since my laptop is not Skylake-based? (it's 8th gen)
acpi_osi=! acpi_osi="Windows 2009" helps, but it breaks the touchscreen and makes the touchpad less responsive.
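
For anyone unsure whether the parameters actually reached the kernel, here is a minimal Python sketch that extracts every acpi_osi value from a kernel command line. The function name `acpi_osi_values` is made up for illustration; it assumes you pass it the text of /proc/cmdline. As far as I understand the kernel documentation, `acpi_osi=!` clears the built-in OSI strings and the second parameter adds back only the one named.

```python
import shlex


def acpi_osi_values(cmdline):
    """Return the value of every acpi_osi= parameter, in order of appearance.

    shlex handles both quoting styles a bootloader may produce:
    acpi_osi="Windows 2009" and "acpi_osi=Windows 2009".
    """
    values = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep and key == "acpi_osi":
            values.append(value)
    return values


# Sample command line (hypothetical apart from the two acpi_osi parameters).
sample = 'BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw acpi_osi=! acpi_osi="Windows 2009"'
print(acpi_osi_values(sample))  # ['!', 'Windows 2009']
```

On a running system you would call it as `acpi_osi_values(open("/proc/cmdline").read())`; an empty list means the workaround is not active.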

@Leo1003

Leo1003 commented Nov 27, 2019

@ArtyomFR

I don't have time to write a detailed tutorial right now.
If someone knows how to modify the ACPI code, here is a simple example.

HP Pavilion Gaming Laptop 15-cx0xxx
OS: Arch Linux 5.3.13 x86_64
CPU: Intel i7-8750H (Coffee Lake)
Graphics: NVIDIA GTX 1050 Mobile

1st duplicate path

\_SB.PCI0.PEG0.PEGP._OFF
\_SB.PCI0.PEG0.PEGP._ON
Sample AML codes


2nd duplicate path

\_SB.PCI0.PEG0.PEGP._OFF
\_SB.PCI0.PEG0.PEGP._ON
Sample AML codes

P.S. Is this some kind of OEM patch?


The two ACPI methods with the same path are located in two different SSDT tables. Try to merge them into one ACPI method.
Merged AML codes example

The AML code is different on each laptop, so make sure you use the code extracted from your own laptop. It may also change after a BIOS update, so be sure to check it again!
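
To help locate such duplicates, here is a rough Python sketch that scans disassembled tables for _ON/_OFF definitions that show up in more than one table. It assumes you have already dumped and disassembled the tables to .dsl text yourself (for example with `acpidump -b` and `iasl -d` from the ACPICA tools). The function name and the file names in the example are hypothetical. It only compares method names and does not resolve the enclosing Scope, so every hit still has to be checked by hand.

```python
import re
from collections import defaultdict

# Matches "Method (_OFF, ..." or "Method (\_SB.PCI0.PEG0.PEGP._OFF, ...".
# "External (..., MethodObj)" lines are not matched.
METHOD_RE = re.compile(r"\bMethod\s*\(\s*([\\^]*[A-Za-z0-9_.]+)")


def methods_defined_in_multiple_tables(tables, names=("_ON", "_OFF")):
    """Map each watched method name to the tables defining it, if more than one.

    `tables` maps a table file name to its disassembled DSL text.
    """
    seen = defaultdict(set)
    for table_name, dsl_text in tables.items():
        for match in METHOD_RE.finditer(dsl_text):
            # Keep only the last path segment, without root/parent prefixes.
            method = match.group(1).split(".")[-1].lstrip("\\^")
            if method in names:
                seen[method].add(table_name)
    return {m: sorted(t) for m, t in seen.items() if len(t) > 1}
```

A result such as `{"_OFF": ["ssdt3.dsl", "ssdt7.dsl"]}` would only be a candidate; compare the Scope of both definitions before merging anything.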


Update [11/28 03:11 GMT]: Correct the path of the duplicate methods

[off-topic]: Power management of the card is ineffective because of a Linux kernel 5.x bug that keeps the NVIDIA audio controller powered on. I am very frustrated...

@notthebee

Will try to inject this SSDT in OpenCore and boot to Linux

@x-qq

x-qq commented Nov 30, 2019

I have had problems getting Bumblebee to work on an Asus GL753VE: the system fan goes out of control and requires a full system poweroff after bbswitch powers down the GPU (same problem as @cdbrendel).

Now after many failures and research I have a solution with which I am able to get bumblebee & primusrun to work as usual/as expected on this hardware using Debian Unstable amd64, so I will document it here in case anyone needs it.

Solving the issue consisted of understanding and overcoming 4 sub-problems:

  1. bbswitch no longer works with newer kernels
  2. whatever bbswitch does ends up causing some sort of undefined behavior in Asus ACPI/BIOS
  3. primus bridge no longer works with modern nvidia drivers (at least since 418)
  4. primus bridge does not seem to work with glvnd nvidia libs

So to get bumblebee to work:

  1. all use of bbswitch has to be disabled. This is done in bumblebee.conf by setting all instances of PMMethod to none and using AlwaysUnloadKernelDriver=true
  2. primus needs older nvidia drivers. 390xx branch seems to work.
  3. nonglvnd nvidia libs should be installed.

Additionally to prevent conflicts the nouveau xorg driver was uninstalled and the nouveau kernel module blacklisted.
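
As a sanity check for step 1, here is a small Python sketch that reads a bumblebee.conf and reports where it deviates from the settings described above (PMMethod=none in every [driver-*] section, AlwaysUnloadKernelDriver=true for the configured driver). The function name `bumblebee_conf_problems` is made up; it assumes the file parses as plain INI, which holds for the config shown in this comment.

```python
import configparser


def bumblebee_conf_problems(conf_text):
    """Return a list of human-readable deviations from the bbswitch-free setup."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    problems = []

    # Every driver section should have power management switched off.
    for section in parser.sections():
        if not section.startswith("driver-"):
            continue
        pm = parser.get(section, "PMMethod", fallback="unset").strip().lower()
        if pm != "none":
            problems.append("[%s] PMMethod is %s, expected none" % (section, pm))

    # The section of the configured driver should unload its kernel module.
    driver = parser.get("bumblebeed", "Driver", fallback="").strip()
    if driver:
        section = "driver-" + driver
        unload = parser.get(section, "AlwaysUnloadKernelDriver", fallback="unset")
        if unload.strip().lower() != "true":
            problems.append(
                "[%s] AlwaysUnloadKernelDriver is %s, expected true" % (section, unload)
            )
    return problems
```

Feeding it the text of /etc/bumblebee/bumblebee.conf should return an empty list for a config like the one in this comment. If Driver is left empty (auto-detection), the second check is skipped.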

My full working configs:

/etc/X11/xorg.conf.d/20-intel-gpu.conf

Section "Device"
  Identifier  "Intel HD 630"
  Driver      "intel"

  Option      "AccelMethod" "SNA"
  Option      "TearFree"    "true"
EndSection

/etc/bumblebee/bumblebee.conf (the kernel driver name may be debian-specific, find /lib/ -iname '*nvidia*' helps)

# Configuration file for Bumblebee. Values should **not** be put between quotes
 
## Server options. Any change made in this section will need a server restart
# to take effect.
[bumblebeed]
# The secondary Xorg server DISPLAY number
VirtualDisplay=:8
# Should the unused Xorg server be kept running? Set this to true if waiting
# for X to be ready is too long and don't need power management at all.
KeepUnusedXServer=false
# The name of the Bumbleblee server group name (GID name)
ServerGroup=bumblebee
# Card power state at exit. Set to false if the card shoud be ON when Bumblebee
# server exits.
TurnCardOffAtExit=false
# The default behavior of '-f' option on optirun. If set to "true", '-f' will
# be ignored.
NoEcoModeOverride=false
# The Driver used by Bumblebee server. If this value is not set (or empty),
# auto-detection is performed. The available drivers are nvidia and nouveau
# (See also the driver-specific sections below)
Driver=nvidia
# Directory with a dummy config file to pass as a -configdir to secondary X
XorgConfDir=/etc/bumblebee/xorg.conf.d
# Xorg binary to run
XorgBinary=/usr/lib/xorg/Xorg

## Client options. Will take effect on the next optirun executed.
[optirun]
# Acceleration/ rendering bridge, possible values are auto, virtualgl and
# primus.
Bridge=primus
# The method used for VirtualGL to transport frames between X servers.
# Possible values are proxy, jpeg, rgb, xv and yuv.
VGLTransport=proxy
# List of paths which are searched for the primus libGL.so.1 when using
# the primus bridge
PrimusLibraryPath=/usr/lib/x86_64-linux-gnu/primus:/usr/lib/i386-linux-gnu/primus:/usr/lib/primus:/usr/lib32/primus
# Should the program run under optirun even if Bumblebee server or nvidia card
# is not available?
AllowFallbackToIGC=false


# Driver-specific settings are grouped under [driver-NAME]. The sections are
# parsed if the Driver setting in [bumblebeed] is set to NAME (or if auto-
# detection resolves to NAME).
# PMMethod: method to use for saving power by disabling the nvidia card, valid
# values are: auto - automatically detect which PM method to use
#         bbswitch - new in BB 3, recommended if available
#       switcheroo - vga_switcheroo method, use at your own risk
#             none - disable PM completely
# https://github.com/Bumblebee-Project/Bumblebee/wiki/Comparison-of-PM-methods

## Section with nvidia driver specific options, only parsed if Driver=nvidia
[driver-nvidia]
# Module name to load, defaults to Driver if empty or unset
KernelDriver=nvidia-legacy-390xx
PMMethod=none
# colon-separated path to the nvidia libraries
LibraryPath=/usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/nvidia
# comma-separated path of the directory containing nvidia_drv.so and the
# default Xorg modules path
XorgModulePath=/usr/lib/nvidia/nvidia,/usr/lib/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia
# If set to true, will always unload the kernel module(s) even with
# PMMethod=none - useful for newer Optimus models on which the kernel power
# management works out of the box to power the card on/off without bbswitch.
AlwaysUnloadKernelDriver=true

## Section with nouveau driver specific options, only parsed if Driver=nouveau
[driver-nouveau]
KernelDriver=nouveau
PMMethod=none
XorgConfFile=/etc/bumblebee/xorg.conf.nouveau

update-alternatives --config glx

There are 3 choices for the alternative glx (providing /usr/lib/glx).

  Selection    Path                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia             100       auto mode
* 1            /usr/lib/mesa-diverted      5         manual mode
  2            /usr/lib/nvidia             100       manual mode
  3            /usr/lib/nvidia/bumblebee   95        manual mode

relevant packages:

ii  bumblebee                                                        3.2.1-20
ii  glx-alternative-nvidia                                           1.1.0
ii  libegl-nvidia-legacy-390xx0:amd64                                390.132-1
ii  libegl-nvidia-legacy-390xx0:i386                                 390.132-1
ii  libegl1-nvidia-legacy-390xx:amd64                                390.132-1
ii  libegl1-nvidia-legacy-390xx:i386                                 390.132-1
ii  libgl1-nvidia-legacy-390xx-glx:amd64                             390.132-1
ii  libgl1-nvidia-legacy-390xx-glx:i386                              390.132-1
ii  libgles-nvidia-legacy-390xx1:amd64                               390.132-1
ii  libgles-nvidia-legacy-390xx1:i386                                390.132-1
ii  libgles-nvidia-legacy-390xx2:amd64                               390.132-1
ii  libgles-nvidia-legacy-390xx2:i386                                390.132-1
ii  libglx-nvidia-legacy-390xx0:amd64                                390.132-1
ii  libglx-nvidia-legacy-390xx0:i386                                 390.132-1
ii  libnvidia-eglcore:i386                                           430.64-1
ii  libnvidia-legacy-390xx-cfg1:amd64                                390.132-1
ii  libnvidia-legacy-390xx-cfg1:i386                                 390.132-1
ii  libnvidia-legacy-390xx-eglcore:amd64                             390.132-1
ii  libnvidia-legacy-390xx-eglcore:i386                              390.132-1
ii  libnvidia-legacy-390xx-glcore:amd64                              390.132-1
ii  libnvidia-legacy-390xx-glcore:i386                               390.132-1
ii  libnvidia-legacy-390xx-ml1:amd64                                 390.132-1
ii  linux-image-amd64                                                5.3.9-3
ii  nvidia-installer-cleanup                                         20151021+10
rc  nvidia-kernel-4.2.0-1-amd64                                      340.96+1+2+4.2.6-1
rc  nvidia-kernel-4.3.0-1-amd64                                      352.79+1+1+4.3.3-7
ii  nvidia-kernel-common                                             20151021+10
ii  nvidia-legacy-390xx-alternative                                  390.132-1
ii  nvidia-legacy-390xx-driver-libs-nonglvnd:amd64                   390.132-1
ii  nvidia-legacy-390xx-driver-libs-nonglvnd:i386                    390.132-1
ii  nvidia-legacy-390xx-driver-libs-nonglvnd-i386:i386               390.132-1
ii  nvidia-legacy-390xx-kernel-dkms                                  390.132-1
ii  nvidia-legacy-390xx-kernel-support                               390.132-1
ii  nvidia-legacy-390xx-nonglvnd-vulkan-icd:amd64                    390.132-1
ii  nvidia-legacy-390xx-nonglvnd-vulkan-icd:i386                     390.132-1
ii  nvidia-legacy-390xx-vdpau-driver:amd64                           390.132-1
ii  nvidia-legacy-check                                              430.64-1
ii  nvidia-modprobe                                                  430.50-1
ii  nvidia-nonglvnd-vulkan-common                                    430.64-1
ii  nvidia-persistenced                                              430.64-1
ii  nvidia-settings-legacy-390xx                                     390.116-1
ii  nvidia-support                                                   20151021+10
ii  primus                                                           0~20150328-9
ii  primus-libs:amd64                                                0~20150328-9
ii  primus-libs:i386                                                 0~20150328-9
ii  primus-libs-ia32:i386                                            0~20150328-9
ii  xserver-xorg-video-nvidia-legacy-390xx                           390.132-1

Result:

% glxinfo | grep vendor 
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: Intel Open Source Technology Center

% primusrun glxinfo | grep vendor 
/usr/bin/primusrun: line 41: warning: command substitution: ignored null byte in input
server glx vendor string: NVIDIA Corporation
client glx vendor string: primus
OpenGL vendor string: NVIDIA Corporation

% 

No kernel boot options required, no problems with touchpad/freezing/etc. And most importantly no uncontrollable fan problems. I am going to stick with this setup until xorg with patches to make nvidia's official offloading work is released ( as described in https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/primerenderoffload.html ).
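
If you want to script the check from the Result section, a minimal Python sketch could pull the OpenGL vendor out of captured glxinfo output. The function name `opengl_vendor` is hypothetical, and the sketch assumes the output format shown above.

```python
def opengl_vendor(glxinfo_output):
    """Return the 'OpenGL vendor string' value, or None if it is absent."""
    for line in glxinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "OpenGL vendor string":
            return value.strip()
    return None
```

With the two outputs above, plain glxinfo should yield the Intel vendor string and primusrun glxinfo should yield "NVIDIA Corporation"; the bash warning line from primusrun is simply ignored.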

UPDATE 2020-05-14
Xorg 1.20.6+ now supports bumblebee-less per-process GPU switching.
The mechanism is described in https://download.nvidia.com/XFree86/Linux-x86_64/440.82/README/primerenderoffload.html . This setup seems to work.
However, this mechanism depends on 'modesetting' xorg video driver, which does not have TearFree support like 'intel' driver has, and this can result in screen tearing / jitter effect under some window managers/desktop envs.
More details at https://gitlab.freedesktop.org/xorg/xserver/issues/244
An attempt to address this issue https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/24

@kmare

kmare commented Nov 30, 2019

@x-qq Thank you for sharing your experience and solution (it doesn't affect me as I have a different setup, but it should help others). Just wanted to add that the nvidia's xorg patches are actually in the already released xorg 1.20.6 version. This is the current version for Fedora 31 (before that, Fedora actually added these patches in the default build, so it was working anyway). I have no clue about Debian though.

@ArtyomFR

@karolherbst I've applied your patch to a custom LTS kernel according to the official documentation and booted into it.
This is a fresh dual-boot install, and I only tried booting with the card OFF and turning it ON with bbswitch. The system boots normally with bbswitch loaded and the nvidia card OFF when I add acpi_osi=! acpi_osi='Windows 2009' to the startup parameters.
But the card refuses to power on with echo "ON" >/proc/acpi/bbswitch.
Here is the dmesg output:

[  279.018859] bbswitch: enabling discrete graphics
[  309.123995] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (20180810/psparse-514)
[  309.124285] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP._ON, AE_AML_LOOP_TIMEOUT (20180810/psparse-514)
[  309.124585] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP._PS0, AE_AML_LOOP_TIMEOUT (20180810/psparse-514)
[  309.124799] video LNXVIDEO:00: Failed to change power state to D0
[  309.124803] pci 0000:01:00.0: Refused to change power state, currently in D3
[  309.226065] pci 0000:01:00.0: Refused to change power state, currently in D3
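
For comparing logs between laptops, here is a short Python sketch that pulls the failing ACPI method paths and error codes out of dmesg text. The regular expression is written against the message format shown above and may need adjusting for other kernel versions; the function name is made up.

```python
import re

# Written against lines such as:
# ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (...)
ACPI_ERROR_RE = re.compile(
    r"ACPI Error: Method parse/execution failed (?P<method>[^\s,]+), (?P<code>AE_\w+)"
)


def failed_acpi_methods(dmesg_text):
    """Return (method_path, error_code) tuples in the order they were logged."""
    return [(m["method"], m["code"]) for m in ACPI_ERROR_RE.finditer(dmesg_text)]
```

Applied to the log above, this should list \_SB.PCI0.PGON, \_SB.PCI0.PEG0.PEGP._ON and \_SB.PCI0.PEG0.PEGP._PS0, each with AE_AML_LOOP_TIMEOUT.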

lspci -nn:

00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec4] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Mobile) [8086:3e9b]
00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 07)
00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 [8086:a368] (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 [8086:a369] (rev 10)
00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
00:17.0 SATA controller [0106]: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller [8086:a353] (rev 10)
00:1b.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 [8086:a32c] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 [8086:a330] (rev f0)
00:1d.5 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #14 [8086:a335] (rev f0)
00:1d.6 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #15 [8086:a336] (rev f0)
00:1d.7 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #16 [8086:a337] (rev f0)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:a30d] (rev 10)
00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104M [GeForce RTX 2080 Mobile] [10de:1e90] (rev ff)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f8] (rev ff)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev ff)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev ff)
07:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black 2018/PC SN720 NVMe SSD [15b7:5002]
08:00.0 Network controller [0280]: Intel Corporation Wireless-AC 9260 [8086:2526] (rev 29)
09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
0a:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
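
A side note on reading this listing: all NVIDIA functions report "rev ff". As far as I know, that usually means the PCI config space reads back as all ones, which would fit a card that is powered off or stuck in D3, but treat that as an interpretation rather than something confirmed here. Here is a Python sketch for spotting such devices in `lspci -nn` output (the function name is hypothetical):

```python
import re

# Slot, then anything, then the trailing [vendor:device] pair and optional revision.
LSPCI_RE = re.compile(
    r"^(?P<slot>\S+) .*\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?: \(rev (?P<rev>[0-9a-f]{2})\))?\s*$"
)


def devices_with_rev_ff(lspci_nn_output, vendor="10de"):
    """Return the slots of devices from `vendor` whose revision reads as ff.

    10de is the NVIDIA vendor ID seen in the listing above.
    """
    slots = []
    for line in lspci_nn_output.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m and m["vendor"] == vendor and m["rev"] == "ff":
            slots.append(m["slot"])
    return slots
```

Run against the listing above, it should return the four 01:00.x functions.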

lspci -tv:

-[0000:00]-+-00.0  Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01-05]--+-00.0  NVIDIA Corporation TU104M [GeForce RTX 2080 Mobile]
           |               +-00.1  NVIDIA Corporation Device 10f8
           |               +-00.2  NVIDIA Corporation TU104 USB 3.1 Host Controller
           |               \-00.3  NVIDIA Corporation TU104 USB Type-C UCSI Controller
           +-02.0  Intel Corporation UHD Graphics 630 (Mobile)
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-12.0  Intel Corporation Cannon Lake PCH Thermal Controller
           +-14.0  Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
           +-14.2  Intel Corporation Cannon Lake PCH Shared SRAM
           +-15.0  Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0
           +-15.1  Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1
           +-16.0  Intel Corporation Cannon Lake PCH HECI Controller
           +-17.0  Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller
           +-1b.0-[06]--
           +-1d.0-[07]----00.0  Sandisk Corp WD Black 2018/PC SN720 NVMe SSD
           +-1d.5-[08]----00.0  Intel Corporation Wireless-AC 9260
           +-1d.6-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1d.7-[0a]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader
           +-1f.0  Intel Corporation Device a30d
           +-1f.3  Intel Corporation Cannon Lake PCH cAVS
           +-1f.4  Intel Corporation Cannon Lake PCH SMBus Controller
           \-1f.5  Intel Corporation Cannon Lake PCH SPI Controller

Many things could have gone wrong:
1 - Maybe I've installed the patch incorrectly; how can I check?
2 - Maybe the patch doesn't support my laptop; can I do anything about it?
3 - Maybe the problem is elsewhere?

I can give you any other output if needed.

@gavingc

gavingc commented Feb 26, 2020

Hi All,

Looks like we can add:
HP Spectre X360 (gem cut) i7-8750H UHD Graphics 630 (Mobile) + Nvidia GP107M [GeForce GTX 1050 Ti Mobile]
to the list with no acpi_osi workaround available.

cd /sys/class/dmi/id && grep . bios_*
bios_date:11/27/2019
bios_vendor:AMI
bios_version:F.40

I have tried every combination in #764 (comment) and it looks like any kind of PCI scan like lspci or even logging out of KDE or an SDDM shutdown (after first login) results in a hard freeze when the card is off.

I'm able to boot Debian Buster, load and unload all kernel modules, turn the Nvidia GPU off/on, and start and stop all processes successfully with this workaround:
#1036

But the card must be ON to shutdown or run lspci (more than once).

Interestingly, the default bumblebee-nvidia install in Debian Buster pulls in nvidia-persistenced, so bumblebee is not able to unload the drivers or turn the card off, regardless of any bumblebee config. It turns out that this is not such a bad setup, since nvidia-persistenced drops the card to its lowest power mode (after a short time), and at a tty the laptop averages around 7.7 W compared to 6.5 W with everything unloaded and the card off. Plus, this setup is safer since the card is on and anything triggering a PCI scan will not result in a freeze on this laptop. I'll write more on this elsewhere when I've done some more testing.

@robertjk

robertjk commented Mar 4, 2020

Adding acpi_osi=! acpi_osi="Windows 2009" to kernel parameters helped me to boot Hybrid Graphics on ThinkPad P52 (with NVIDIA Quadro P2000 GPU).

@rico-chet

Adding acpi_osi=! acpi_osi="Windows 2009" to kernel parameters helped me to boot Hybrid Graphics on HP ZBook 15 G5 (with NVIDIA Quadro P2000 Mobile GPU).

@rhysperry111

Adding acpi_osi=! acpi_osi="Windows 2009" to kernel parameters helped me to boot Hybrid Graphics on HP Pavilion 15-cx0598na (with NVIDIA 1050TI).

It has created another problem though: I can no longer turn off the card. There are no errors in dmesg.

@Nek-12

Nek-12 commented Apr 11, 2020

After installing optimus-manager, my ASUS x560-ud laptop with an Nvidia GTX 1050 froze on every shutdown and after getting past the login screen.
The following optimus-manager config was used:

[optimus]
switching=nouveau
pci_power_control=yes
pci_remove=no
pci_reset=no

None of the solutions posted above helped.
I did not try switching to the bbswitch and acpi_call modes (because, according to the link, they use the same options). If someone tries them, please post the results.
However, setting the parameter:
switching = none
solved my problem. I don't know yet how to test for sure, but I suspect my Nvidia card is not being suspended, so there is no battery gain. Also, the powertop utility reports some kind of 100% value for my Nvidia device.
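
One way to test this is to read the kernel's runtime power management state for the card from sysfs. Here is a minimal Python sketch with a made-up function name; the device address 0000:01:00.0 used in the usage note is an assumption (it is the address seen in other logs in this thread), so check yours with lspci first.

```python
from pathlib import Path


def runtime_pm_status(device_dir):
    """Read power/runtime_status below a PCI device directory in sysfs.

    Returns the kernel's status word (for example "active" or "suspended"),
    or "unknown" if the file cannot be read.
    """
    status_file = Path(device_dir) / "power" / "runtime_status"
    try:
        return status_file.read_text().strip()
    except OSError:
        return "unknown"
```

Calling `runtime_pm_status("/sys/bus/pci/devices/0000:01:00.0")` while nothing uses the card should report "suspended" if the card is really being powered down, and "active" otherwise.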

P.S. In Windows, a while ago after buying this device, I noticed that the battery report utility issued warnings about the graphics card not supporting so-called "Link state power management".
P.P.S I am sorry for my English.

Update 02 Aug 2020:
After a complete reinstall I managed to make optimus-manager disable the video card. Using the config specified above and setting pcie_aspm=force as a Linux kernel parameter (none of the parameters mentioned above were used) solved the issue. Powertop now reports 0% energy consumption for the video adapter. It seems that the firmware is buggy on my particular laptop model. I haven't tried options other than nouveau.

@nearwood

nearwood commented Oct 11, 2020

I have a ThinkPad P52 with the NVIDIA Quadro P2000 GPU.

I tried acpi_osi=! acpi_osi="Windows 2009" as the other P52 person mentioned. No dice. But my issue is different.

Power control of the dGPU works fine. I can use nvidia-xrun. But when using xfce's display settings it locks up on any change (rearranging displays, disabling displays, mirror, etc.). Otherwise it works fine.

This happens with hybrid mode, or dedicated mode in the BIOS.

EDIT: After further investigation, this doesn't look like it's a bumblebee issue. More like Nvidia driver.

@dav2017

dav2017 commented Sep 22, 2023

On Lenovo ThinkPad P1 I had no problems switching to nVidia until recently. I solved it with: acpi_osi="!Windows 2015"
