bbswitch is broken with kernel 4.8 pcie port power management #140
bbswitch has indeed not been updated for the new PM method in kernel 4.8. If you have a newer machine (>= 2015), you might experience issues if you enabled runtime PM for devices. Do you happen to have udev rules or other "laptop mode tools" that enable power saving features (i.e. by writing As a workaround you can boot with the |
I am using TLP and Powertop, but bbswitch still doesn't work with those disabled. The strange thing is that bbswitch seems to think the NVIDIA card is stuck in D0 power state on startup, but then is unable to start it upon invocation of primusrun, and reports that the card is stuck in D3.
And on invocation of primusrun, I get this:
Is it possible that the kernel-based port power management is able to control the power state, but bbswitch is not? This would make sense because on pre-4.8 kernel versions there is no kernel-based PCIe port power management, and the card seems to always be stuck in D0 (see dmesg output in my first post), and bbswitch has no problems this way. It is when the card is successfully put into D3 that bbswitch is unable to use it. |
Have you rebooted after disabling TLP? The PCIe port management introduced with 4.8 cannot be combined with bbswitch in one boot; that case is not supported (it may or may not work, no guarantees). Have you tried the kernel option which I mentioned above? What is your laptop and GPU, btw? If you see the messages "bbswitch: enabling discrete graphics" followed by "Refused to change power state, currently in D3" (or similarly, "disabling" and "currently in D0"), then it is an indication that something went wrong... |
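The two symptoms quoted above can be picked out of a kernel log with a small filter. This is only a minimal sketch: the function name `find_pm_failures` is hypothetical, and the patterns cover just the message fragments quoted in this comment, not everything bbswitch or the PCI core may print.

```shell
# find_pm_failures: hypothetical helper for this sketch.
# Reads a kernel log on stdin and prints only the lines that match the
# message fragments quoted in the comment above.
find_pm_failures() {
    grep -E 'bbswitch: (enabling|disabling) discrete graphics|Refused to change power state, currently in D[0-3]'
}
```

Typical use would be `dmesg | find_pm_failures`. An "enabling" line followed by a "currently in D3" line is the failure pattern described above.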
Yes, I rebooted after disabling tlp and powertop. As I said above, when I boot with the kernel option you mention, primusrun works again, but bbswitch reports that it cannot change the gpu power state out of D0 when primusrun stops running. But that happened with earlier kernels as well. And yes, something is clearly going wrong. But maybe this has actually been a problem all along, and is only now showing itself because the kernel is putting the dGPU out of D0. |
Odd, I have exactly the same laptop and I'm using tlp and powertop, but I don't get this problem until after a suspend/resume cycle. Or is this issue fixed in bbswitch 0.8.4ubuntu1? |
You're sure you have the exact same model, and that you're running Kernel 4.8? It's possible you have a different BIOS than me (If you don't know, Dell has been rapidly pushing out BIOS updates to try to fix an alarming number of issues. Many of the updates have made things worse, so I'm currently on an older BIOS.) It's also possible that the issue is fixed in bbswitch 0.8.4ubuntu1. Maybe I'll try Ubuntu and see if that works better. |
Yes, it's the same model, with the same GPUs. It has the 4K screen and I'm running the 1.2.0 BIOS (I tried 1.2.14, but it has a screen flickering problem which makes it unusable.) It's possible I disabled tlp and powertop power management and forgot, of course. |
I'm in the exact same situation as you, on 1.2.0 (Seriously, Dell needs to get their act together!) |
I'm running the mainline 4.8 kernel also (with the patch from https://bugs.freedesktop.org/show_bug.cgi?id=97596 to avoid a weird flickering artefact that occurs on Skylake architecture with 4.8 if you have a second monitor attached). |
Ok, I'll probably try Ubuntu with the mainline kernel at some point to see if that fixes the issue. Until then, should this issue be closed? |
@rockorequin Perhaps you are using nouveau instead of bbswitch? Personally I am back to nouveau since my new laptop requires it for an external monitor. |
I actually did try it with Ubuntu 16.10 with Kernel 4.8, and it is fixed. There must be something internal to Manjaro that is screwing it up. |
I’m reopening this, because even if it seems to work (i.e. it reports OFF), the power consumption and temperature correspond to the case of an ON card on my setup. Adding When using nouveau, temperature and power consumption also correspond to an ON card. |
The result of combining the DSM method (as used by bbswitch) with the new power resources method (as used since Linux 4.8 and nouveau) in a single boot is not known (I would call it undefined behavior). Forcing How do you observe that the video card is off with nouveau? You have to check your |
I do get those lines at the end:
That being said, I probably need to do some more investigations (power consumption with |
Also, @Lekensteyn, grabbed this at some point, if I remember correctly it was while running a boot without
Will try to reproduce, but I think this is likely caused by bbswitch/pcie_port_pm interaction on 4.8 for newer systems. |
For your last issue, if you have some udev rule enabling runtime PM for devices (e.g. "laptop mode tools") then indeed it will upset bbswitch on the new behavior (4.8 without |
I should point out that if you're using nouveau (rather than nvidia), you actually don't need bumblebee or bbswitch- you can just use DRI_PRIME=1 before the app you want to run with the discrete gpu. See https://wiki.archlinux.org/index.php/PRIME |
@nathanielwarner If you’re telling that to me, I assure you that I know. ;) But that’s not really related to the current issue. @Lekensteyn OK, I’ve got tlp installed (and running) on the same system, I’ll also try with or without it to see what it gives. So that’s one more factor to try. Should have time tomorrow to look at all that. :) On a side note, do you still intend to update bbswitch for supporting this new method any time soon? Maybe we should release Bumblebee 4.0 without waiting much further and add a release note about bbswitch state (interaction with 4.8 pcie_port_pm, open/known issues). ;) |
I was hoping to get this fixed before Bumblebee 4, but it seems things are really stalling, so maybe it is better to release it since it at least improves the nvidia driver situation. Release note with known issues should be ok :) |
+1 for a new release, debian 9 deadlines are approaching fast :-) |
OK, I’ll go through all open issues soon (help appreciated) and will try to release by the end of the week. Stay tuned. If there is any need for discussion, Bumblebee-Project/Bumblebee#319 is the place to go now. ;) |
I see that in your bumblebee 4 issue, you are delaying its release for another few weeks. So I need to get this to work even temporarily. If I understand right, I need to add "pcie_port_pm=off" in the grub configuration as kernel parameter, and the drawback is that I am constantly running on nvidia card right? Thanks in advance. |
@GreatBigWhiteWorld If you use nouveau (and not bbswitch nor the nvidia proprietary driver), then you do not have to do anything. |
Thanks. Yes I am running 4.8 kernel and bbswitch is always off at the moment. I guess I need this option. |
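Whether the running kernel was actually booted with the workaround can be checked from the kernel command line. The sketch below assumes nothing beyond `/proc/cmdline`; the function name `has_kernel_param` is hypothetical. How the parameter gets onto the command line depends on the distribution (on GRUB-based systems it is commonly appended to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` before regenerating the GRUB configuration).

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole token on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the override exists only for testing.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each whitespace-separated token on its own line, then match exactly.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, `has_kernel_param pcie_port_pm=off && echo "PCIe port PM is disabled for this boot"`.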
I have the same problem, and it was solved with the parameter pcie_port_pm=off. The laptop is a Dell 7559 running openSUSE Tumbleweed. The dmesg output is below:
|
I have a Dell XPS 15 9550 with a 960M too. I added
|
Bump:
The issue was solved via changing |
@Zeben The whole point of disabling pm on AC is that you get rid of those (possibly) tens/hundreds of milliseconds waits for devices to power up. Try doing lspci on battery: it's not instantaneous but takes almost a second. Try plugging in headphones: there's a somewhat annoying click when the soundcard powers down. On the other hand, with pm enabled, you'll hear your fan a lot less often. It's your decision to make. :-) |
So, after more experiments I've got some conclusions. All works without problems with three types of configurations.
Many thanks to @liskin for the suggestions and tips. Maybe our conversation will be helpful for those who have the same issues. Waiting for a complete implementation of dynamic switchable graphics, out of the box, without |
Hm, for me removing However, the problem with 4.16.13 went away in a later kernel version (actually already some weeks ago, I just forgot to report it here). So for me, |
@real-or-random I've combined two technologies to make using switchable graphics possible: runtime PM for all devices (by keeping |
I tried that but it didn't work. But I'm not convinced that the blacklisting in tlp worked, because powertop still showed PM as enabled for the NVIDIA card. Is that the right place to check? (Where can I check manually?) |
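For a manual check, the kernel exposes each PCI device's runtime PM policy and state in sysfs under `power/control` and `power/runtime_status`. The sketch below reads both; the function name `pci_runtime_pm` is hypothetical, and the device address has to be looked up first (for example with `lspci -D`).

```shell
# pci_runtime_pm ADDRESS [SYSFS_ROOT]
# Prints the runtime PM policy ("auto" = kernel may suspend the device,
# "on" = runtime PM disabled) and the current runtime status of one PCI device.
# SYSFS_ROOT defaults to /sys; the override exists only for testing.
pci_runtime_pm() {
    addr=$1
    sysfs_root=${2:-/sys}
    dev="$sysfs_root/bus/pci/devices/$addr/power"
    [ -d "$dev" ] || { echo "no such device: $addr" >&2; return 1; }
    printf 'control=%s runtime_status=%s\n' \
        "$(cat "$dev/control")" "$(cat "$dev/runtime_status")"
}
```

Running `pci_runtime_pm 0000:01:00.0` (the address used elsewhere in this thread; yours may differ) shows whether a tool such as tlp left the card at `auto` and whether the card is currently `active` or `suspended`.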
@liskin I have a Dell XPS 15 9570 and have reached the same point as you. I'm using runtime PM without bbswitch, because bbswitch (both the normal and the pm-rework branch) results in a dGPU you cannot power back on. Optirun and primusrun both work and load the nvidia module, but unloading it afterwards does not work. I tried normal bumblebee and bumblebee-git with the development branch with libkmod2. So I have to remove it with modprobe -r. You said something about making a bug report about it, but I can't find anything. Did I miss it, or were you still planning to make it? I guess we need a new PMMethod=modules_only that only unloads the modules? You seem to know what the issue is; I'm rather new to laptops with nvidia / bumblebee. @Lekensteyn We talked on IRC briefly about your pm-rework branch. I thought it was working since, compared to the normal bbswitch, it turned the dGPU on / off. However, I could not load the nvidia module due to:
This probably has to do with the torvalds/linux@abf92f8 commit you already pointed out before. As you suggested, I switched to using runtime-pm (where Bumblebee is not unloading the nvidia module, as I wrote at the start of this post). But I figured I would follow up with you on my attempt to use pm-rework. |
Which distro are you using? I gave up and used Arch and that's working pretty well. |
I have Arch on the 9570. It seems to work differently than your 9560, since I followed the archwiki 9560 page at first, but it simply did not work. (See my/our struggle in Dell XPS 15 9570 - bbswitch not working, Nvdia card won't power off/on). However, with runtime-pm, turning the dGPU on with bumblebee works; this happens just by loading the nvidia module. Unloading the nvidia module lets the dGPU go into suspend. I'm using TLP right now for that. However, bumblebee doesn't unload the modules after it is used, so I have to do that manually. I figure I can make a wrapper script that calls a wrapped "modprobe -r nvidia" script, which I can then allow to be used with sudo without a password. That seems to me a good compromise between security and usability. But I figure I'm not the only one with this problem, and it could be better implemented in Bumblebee. Now, I know people think Bumblebee is "feature complete", but I guess this actually is a case where a new PMMethod based on just unloading the nvidia modules could be handy. |
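A wrapper of the kind described above might look like the following. This is a sketch under assumptions, not the script the poster ended up with: `GPU_MODULES` and `MODPROBE` are hypothetical knobs, the module list assumes the usual companion modules of the proprietary driver, and the sudo rule that would allow running it without a password is not shown.

```shell
# Modules to remove, dependents first. Assumed list; adjust to what
# `lsmod | grep nvidia` shows on your system.
GPU_MODULES=${GPU_MODULES:-"nvidia_drm nvidia_modeset nvidia_uvm nvidia"}
# Command used to remove one module; overridable for a dry run (MODPROBE=echo).
MODPROBE=${MODPROBE:-"modprobe -r"}

# unload_gpu_modules [PROC_MODULES_FILE]
# Removes each listed module that is currently loaded, stopping at the
# first failure (for example when a module is still in use).
unload_gpu_modules() {
    proc_modules=${1:-/proc/modules}
    for mod in $GPU_MODULES; do
        # /proc/modules lists one loaded module per line, name first.
        if grep -q "^$mod " "$proc_modules"; then
            $MODPROBE "$mod" || return 1
        fi
    done
}
```

With runtime PM left at `auto` for the card, a successful unload is what allows the kernel to suspend the dGPU, as described in the comment above.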
Bumblebee is not feature complete. There are still improvements to be made w.r.t. modules and PM handling, as well as Vulkan/VDPAU support missing (but being worked on). |
I was mostly referring to the sentiment in the bumblebee issue "Is this project dead?". Some called it dead, some called it complete. But yeah, I have been following the Vulkan discussion, since I use DXVK myself. Now, you say "but being worked on", was that just referring to Vulkan/VDPAU? Or is there work under way for PM handling just by unloading modules after use? |
I was referring to Vulkan/VDPAU. Actually, reworking PM/modules should be easier, but requires some time available. |
@IngeniousDox You didn't miss anything, I'm just too busy/lazy. :-) |
I have a Xiaomi Mi Book 13.3 (2017) and I'm having the same issue with |
Hi, I'm having this issue as well on Arch Linux. I'm using a Dell G5 15 5587 laptop, with Nvidia GTX 1050 Ti 4GB. I'm using kernel version 4.17.11-arch1 . I have installed bumblebee, bbswitch(and bbswitch-dkms), intel drivers, nvidia drivers, and primus. When trying to run primus I get this error:
At exactly the same time that this happens, I see these lines in the debug logging of bumblebeed:
Looking further in dmesg, I see these lines:
I have tlp installed, and I have added the nvidia drivers as well as the pci id to its blacklist. I have also added the kernel parameter pcie_port_pm=off and the results are the same as above. There are no nouveau drivers installed, so there is no conflict there. Additionally, when trying to manually enable the card using this command:
and then cat-ing the file, I receive this output
Currently this is the only thread that I've found where people are actively working on this issue. I'd like to hear what you guys think the issue is and help come up with a solution to this problem. |
@KonnorTimmons1297 Can you try the bbswitch-less approach as well? I have no idea why bbswitch doesn't work for you as you seem to have done all the right steps, but spending more time making it work is a bit pointless these days. :-) |
@liskin Alright, are you referring to using the nouveau drivers and using the PRIME method to switch between intel graphics and the dedicated graphics card? I thought that nouveau driver was causing problems with an optimus system. If this is what you are talking about, then I just want to clarify that I have the right idea on what to do next. I need to uninstall the nvidia driver, bbswitch, bumblebee, and primus. Then install the nouveau drivers and xrandr(or something like that)? |
@KonnorTimmons1297 No, I'm not referring to that. I'm referring to the fact modern kernels on modern systems will power the card off as soon as you unload the nvidia module. Just drop bbswitch and give it a try. And read the comments above you if you run into problems. (Which you will.) |
@liskin I have opened an issue with a Feature Request. I think I covered everything, but if you could look at it and see if I missed something, that would be nice. @KonnorTimmons1297 This is the Arch thread where we were hunting for solutions as well: https://bbs.archlinux.org/viewtopic.php?pid=1800742. In the end, we ended up using Bumblebee with PMMethod=none and something like tlp to put the pci bus + gpu into runtime suspend when it isn't used. There are some issues still, which is why I opened a feature request. And there are perhaps some other kinks you need to be aware of, but they are discussed in that forum thread. |
Alright, this stuff is still kind of new to me. I'm not entirely sure how to do this, however I do understand what it means to load a kernel module. Looking at the ArchWiki I get the sense that this is what needs to be done, correct me if I'm wrong. Uninstall bbswitch, run this command |
@IngeniousDox Thanks! |
Alright, I uninstalled bbswitch and tried manually loading the nvidia kernel module. After that, I was able to successfully run I'd like to completely disable the dGPU while it is not in use so that I extend my battery life. I realize that that's what bbswitch originally intended to do, but because of this power management bug, it is unable to automate the process of turning the card on and off. I have tried removing the kernel module by running
Is there a way that I can manually enable and disable the kernel module, as well as cut the power to the card itself? I have tlp installed, and I thought that if I removed the card and the driver from its blacklist it would take care of that for me, but that isn't the case, because the card is still 'active' according to /sys/module/nvidia/drivers/pci:nvidia/0000:01:00.0/power/runtime_status. Do you guys know what can be done to shut the card down at all? |
You absolutely need to unload the module. If it says it's in use, it's in use, and bbswitch wouldn't have helped you either. Something is keeping the card not just powered on, but in use. Try |
Oh, and it could also be that another module is using it. In my case, I need to |
Even if you unload the other modules in the correct order, it could be that the nvidia module is kept busy by X. For sddm I had to make a special xorg.conf that didn't use modesetting (see the Arch forum thread). I used xf86-video-intel for the intel gpu, and the dummy driver for the nvidia gpu. That allowed me to load / unload the modules without issue. However, it seems that doesn't work for gdm/gnome-shell. michelesr outlined his method for that in the Arch forum thread. Anyways, like liskin said, you can use |
I ran `lsof /dev/nvidiactl` and found that the module was being used by Xorg, which explains why I am unable to unload it. I was able to 'disable' the card, according to bbswitch, by using the command After adding the xorg.conf and rebooting the battery drain seems to have stabilized and is not draining as fast. |
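Where lsof is not installed, a similar check can be approximated by walking the file descriptors listed under /proc. This is a rough sketch, not a replacement for lsof: the function name `nvidia_holders` is hypothetical, it only looks at open descriptors that point at `/dev/nvidia*` nodes, and it needs root to see other users' processes.

```shell
# nvidia_holders [PROC_ROOT]
# Prints "PID TARGET" for every open file descriptor that points at a
# /dev/nvidia* device node. PROC_ROOT defaults to /proc; the override
# exists only for testing.
nvidia_holders() {
    proc_root=${1:-/proc}
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /dev/nvidia*)
                pid=${fd#"$proc_root"/}      # strip the leading PROC_ROOT/
                echo "${pid%%/*} $target"    # keep only the PID component
                ;;
        esac
    done
}
```

Any PID it prints (Xorg in the case above) is keeping the device open, and the nvidia module cannot be removed until that process lets go of it.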
I have the same laptop as you with a 1060. Running Manjaro with 4.20 kernel and I still have bbswitch not working. I have to go into bumblebee.conf and set PMMethod to none, then reboot, to get the dGPU working. Otherwise I run into the same D3 stuck state issue. |
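Before and after editing the file, it can help to confirm which PMMethod Bumblebee will actually read. The sketch below assumes the stock layout, with the file at `/etc/bumblebee/bumblebee.conf` and per-driver sections named like `[driver-nvidia]`; the function name `bumblebee_pm_method` is hypothetical, and the parsing is deliberately simple (no handling of spaces around `=`).

```shell
# bumblebee_pm_method DRIVER [CONF_FILE]
# Prints the PMMethod value found in the [driver-DRIVER] section.
bumblebee_pm_method() {
    driver=$1
    conf=${2:-/etc/bumblebee/bumblebee.conf}
    awk -F= -v section="[driver-$driver]" '
        /^\[/ { in_section = ($0 == section) }            # track the current section
        in_section && $1 == "PMMethod" { print $2; exit } # first match wins
    ' "$conf"
}
```

For example, `bumblebee_pm_method nvidia` should print `none` once the change described above has been made.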
Same here with an alienware m15 (rtx 2060... so no nouveau anytime in the next few years). Disabling PM in bbswitch allows optirun to work... Otherwise, I get the same D3 stuck state issue. |
I just upgraded to kernel 4.8, and bbswitch 0.8-1 no longer works properly. When I try to run something with primusrun, it fails with "bumblebee could not enable discrete graphics card" or something, and I get this in dmesg:
When I use the kernel command line option
pcie_port_pm=off
primusrun works again, and I get this in dmesg upon using primusrun: Is the lack of support for the default configuration of kernel 4.8 an issue that anyone else is having? I'm running Manjaro with Kernel 4.8.1-1, Nvidia driver 370.28, bbswitch 0.8-1.