[QUESTION] Will the removal of the nvidia devices from the PCIe bus work for others? #157

Closed
klmcwhirter opened this issue Mar 7, 2024 · 2 comments
Labels
question Further information is requested

Comments

@klmcwhirter
Contributor

klmcwhirter commented Mar 7, 2024

Does the approach described below work for most people (different NVIDIA hardware, Linux distros, etc.)?

I came across a technique that removes the NVIDIA devices from the PCIe bus so that the NVIDIA hardware can be powered off, with the goal of improving battery life.

I have put together a POC and am working with @bayasdev on potentially including this in envycontrol upon switch to integrated mode.

The approach I have found has these benefits.

  • Easily turns off the dGPU by placing rules in a single file: /etc/tmpfiles.d/nvidia_no_gpu.conf
  • Does not need the NVIDIA drivers installed. Again, my use case assumes only integrated or hybrid mode is needed (think HDMI connection).
  • The dGPU can easily be brought back at runtime, in user space, without a reboot!
    • To turn the dGPU off again afterwards, simply reboot to reapply the rules. In theory they can be reapplied without a reboot, but when I tried that my system locked up hard, so rebooting is more reliable.
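
As a rough illustration of the "brought back at runtime" point: the post does not say which mechanism is used, so the sketch below assumes the usual kernel interface, a PCI bus rescan triggered by writing 1 to /sys/bus/pci/rescan. The function name `rescan_pci_bus` is hypothetical and is not taken from the POC.

```python
from pathlib import Path

# Standard sysfs attribute for re-enumerating the PCI bus.
# Assumption: this is how the removed dGPU is restored; the post does not say.
PCI_RESCAN = Path("/sys/bus/pci/rescan")


def rescan_pci_bus(rescan_path: Path = PCI_RESCAN) -> None:
    """Ask the kernel to rescan the PCI bus so removed devices reappear.

    Writing to the real sysfs file requires root privileges.
    """
    rescan_path.write_text("1")
```

Run as root, `rescan_pci_bus()` should make the removed devices show up in lspci again. The path is a parameter only so the function can be exercised against a scratch file.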

Prerequisites

I reference the blog post behind the first URL from the POC README above.

  1. JS mentions that the absolute bare minimum Linux kernel version is 5.12, but recommends 5.14 or 5.15. I suspect this has to do with the version of the intel/nouveau drivers and the tmpfiles.d support, although he doesn't say specifically why 5.12 is the minimum.

He just says at the end of that paragraph that:

I'm personally using Fedora, but everything listed here should work with any distribution running a modern kernel and systemd.

  2. Obviously, the bus IDs will need to be tweaked for your machine. I have tried to generalize that process at https://github.com/klmcwhirter/nvidia-more-battery/blob/master/nvidia_more_battery/services/tmpfiles.py#L69-L75, but I only have the one system to test with.

I should also mention that I have an Acer Nitro 5 with an RTX 3050 Ti.

nitro5-neofetch.png

$ lspci | grep NVIDIA
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Ti Mobile] (rev a1)
0000:01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
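
For anyone adapting this to their own machine, here is a minimal sketch of how the bus IDs could be pulled out of lspci output like the above and mapped to sysfs device directories. It is not the code from the POC; the names `nvidia_pci_addresses` and `sysfs_device_path` are hypothetical, and it assumes the output includes the PCI domain prefix (as `lspci -D` prints).

```python
import os
import re

# Full PCI address (domain:bus:device.function) at the start of an lspci line.
PCI_ADDR = re.compile(r"^([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s")


def nvidia_pci_addresses(lspci_output: str) -> list[str]:
    """Return the PCI addresses of all lines that mention NVIDIA."""
    addrs = []
    for line in lspci_output.splitlines():
        match = PCI_ADDR.match(line)
        if match and "NVIDIA" in line:
            addrs.append(match.group(1))
    return addrs


def sysfs_device_path(addr: str, sys_root: str = "/sys") -> str:
    """Resolve the /sys/bus/pci/devices/<addr> symlink to the real device directory."""
    return os.path.realpath(os.path.join(sys_root, "bus/pci/devices", addr))
```

On the Nitro 5 above, `sysfs_device_path("0000:01:00.0")` would be expected to resolve to the /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 directory used in the config file; other machines will have a different parent bridge.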

My /etc/tmpfiles.d/nvidia_no_gpu.conf file.

$ cat /etc/tmpfiles.d/nvidia_no_gpu.conf 
d /run/no-nvidia 0755 1000 1000
f /run/no-nvidia/in-effect 0444 1000 1000 - 1
w /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/remove - - - - 1
w /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/remove - - - - 1
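
To make the file format explicit: in tmpfiles.d, a `d` line creates a directory, an `f` line creates a file with the given content, and a `w` line writes its argument to an existing file (here the sysfs `remove` attribute). The sketch below renders the same file from a list of device directories. The function name `render_tmpfiles_conf` is hypothetical, and the default uid/gid of 1000 simply mirrors my file.

```python
def render_tmpfiles_conf(device_paths: list[str], uid: int = 1000, gid: int = 1000) -> str:
    """Build the tmpfiles.d rules that remove the given PCI devices at boot."""
    lines = [
        # Marker directory and file recording that the rules were applied.
        f"d /run/no-nvidia 0755 {uid} {gid}",
        f"f /run/no-nvidia/in-effect 0444 {uid} {gid} - 1",
    ]
    # Writing 1 to <device>/remove detaches that device from the PCI bus.
    lines += [f"w {path}/remove - - - - 1" for path in device_paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_tmpfiles_conf([
        "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0",
        "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1",
    ]), end="")
```

Called with the two device paths from my machine, this reproduces the file shown above; the result would still have to be written to /etc/tmpfiles.d/ as root.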

I'll be happy to answer any questions you may have about using the POC code over in the https://github.com/klmcwhirter/nvidia-more-battery repo.

Please comment with your feedback to help us decide whether this can or even should be included in envycontrol directly, or simply added to the documentation as another option.

Thanks for the help.

@Miaua

Miaua commented May 4, 2024

I'm using EnvyControl with integrated GPU mode enabled, but the lspci command can still wake the disabled devices up.

I watched my devices in powertop before and after running lspci.
After I run "sudo lspci -m -k", the Host bridge Device 14e8 entry gets replaced with PCI bridge Device 14ed.
The power draw nearly triples and stays stuck at that level.
Lenovo Legion Slim 5 16" Gen 8, 7840HS, 780M, RTX 4060, Fedora 40 KDE.

Before I run "sudo lspci -m -k":
The battery reports a discharge rate of 7.25 W
The energy consumed was 152 J

          Usage     Device name
          2,2%        CPU misc
          2,2%        CPU core

        00:00.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 14e8" -p00 "Lenovo" "Device 3802"

        PCI Device: Advanced Micro Devices, Inc. [AMD] Device 14ed

After I run "sudo lspci -m -k":
The battery reports a discharge rate of 20.5 W
The energy consumed was 419 J

          Usage     Device name
         11,4%        CPU misc
         11,4%        CPU core

        00:01.1 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 14ed" -p00 "Advanced Micro Devices, Inc. [AMD]" "Device 1453"

        00:02.3 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 14ed" -p00 "Advanced Micro Devices, Inc. [AMD]" "Device 1453"

@klmcwhirter
Contributor Author

This approach only seems to work with a specific set of hardware / software and is not a general solution.
