nvidia-powerd.service randomly jumping to 100% CPU usage #432

Closed
ghost opened this issue Dec 29, 2022 · 29 comments
Labels
bug Something isn't working NV-Triaged An NVBug has been created for dev to investigate


@ghost

ghost commented Dec 29, 2022

NVIDIA Open GPU Kernel Modules Version

525.60.11

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Pop!_OS 22.04 LTS

Kernel Release

Linux pop-os 6.0.12-76060006-generic #202212290932167165296522.04~452ea9d SMP PREEMPT_DYNAMIC Wed D x86_64 x86_64 x86_64 GNU/Linux

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3050 Laptop GPU (UUID: GPU-b2f720a1-2b4e-9b17-4383-76f9361248a2)

Describe the bug

The nvidia-powerd process uses 100% CPU.

This only happens in NVIDIA Optimus mode (meaning everything is rendered on the NVIDIA GPU).
It does not happen in hybrid mode (NVIDIA on demand), but in that mode I get severe screen tearing, so it is not usable.

To Reproduce

It happens randomly after I quit a game in Lutris, or while or after closing a YouTube video in Firefox.
The only way to stop it is to stop the service.

The bug always appears after roughly 30 minutes to 2 hours of watching YouTube in Firefox.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

Screenshot from 2022-12-27 18-08-07

@ghost ghost added the bug Something isn't working label Dec 29, 2022
@Matthew-Beckett

I am also seeing this on the following platform:

Version: 525.60.11
OS: Fedora release 36 (Thirty Six)
Kernel: Linux fedora 6.0.15-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 21 18:46:09 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
GPU: GPU 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU (UUID: GPU-8f10cc8e-a57c-0e12-9f84-3ae222d31f53)

@atauln

atauln commented Jan 2, 2023

I am also seeing this:

Version: 525.60.11
OS: Arch Linux
Kernel: Linux ROGPad 6.0.12-arch1-g14-2 #1 SMP PREEMPT_DYNAMIC Sat, 17 Dec 2022 13:52:21 +0000 x86_64 GNU/Linux
GPU: GPU 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU (UUID: GPU-a17019e1-c96d-234e-52ff-813939d31896)

@weter11

weter11 commented Jan 2, 2023

This problem also affects the fully proprietary driver and is not limited to 525.xx drivers: it also affects 520.xx and 510.xx, on both Intel and AMD processors, and it has existed since NVIDIA introduced nvidia-powerd.service in February 2022. It can happen at any time, but it shows up much faster on JavaScript-heavy websites; maybe some buffer overflow. The bug affects only one CPU thread (core). For now, as a temporary solution, I created a script that restarts the service:

#!/bin/bash
sudo systemctl stop nvidia-powerd.service
sudo systemctl start nvidia-powerd.service

After the restart, the service works as usual until the next overflow. On-demand mode also has a problem: performance is worse (8-10%), and I don't want to see 140 W power consumption with performance on par with 115 W, which can be reached without powerd.service at the default 115 W for my GPU (3070 mobile edition).
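Since the runaway state pins a single core, the restart can be limited to moments when the daemon is actually spinning. The following is a minimal sketch, not something posted in this thread: the function names, the 90% threshold, and the 5-second sampling interval are all assumptions. It samples the daemon's CPU ticks twice from /proc/<pid>/stat and restarts the service only if usage over the interval is at or above the threshold.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): restart nvidia-powerd only when it is spinning.

# Sum user + system CPU ticks from one /proc/<pid>/stat line (fields 14 and 15).
cpu_ticks() {
    printf '%s\n' "$1" | awk '{ print $14 + $15 }'
}

# Percent of one core used between two tick samples taken `seconds` apart.
cpu_percent() {
    local before="$1" after="$2" seconds="$3" hz="$4"
    echo $(( (after - before) * 100 / (seconds * hz) ))
}

# Sample the daemon for `interval` seconds; restart it if usage >= threshold.
# Both defaults (90 percent, 5 seconds) are arbitrary assumptions.
restart_powerd_if_spinning() {
    local threshold="${1:-90}" interval="${2:-5}" pid hz before after
    pid="$(pgrep -x nvidia-powerd | head -n 1)"
    [ -n "$pid" ] || return 0          # daemon not running, nothing to do
    hz="$(getconf CLK_TCK)"            # kernel clock ticks per second
    before="$(cpu_ticks "$(cat "/proc/$pid/stat")")"
    sleep "$interval"
    after="$(cpu_ticks "$(cat "/proc/$pid/stat")")"
    if [ "$(cpu_percent "$before" "$after" "$interval" "$hz")" -ge "$threshold" ]; then
        sudo systemctl restart nvidia-powerd.service
    fi
}
```

Calling `restart_powerd_if_spinning` periodically (for example from a timer) would replace the manual stop/start. Sampling /proc twice is used here instead of `ps -o %cpu`, because `ps` reports an average over the process lifetime rather than current usage.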

@Matthew-Beckett

Matthew-Beckett commented Jan 2, 2023


I'm specifically encountering this playing RuneScape and am also having the same power limiting issues when using powerd.

I have found a workaround though, it's called Radeon. /s

In all seriousness though, my next GPU will be Matrox before it's Nvidia.

@atauln

atauln commented Jan 2, 2023

Instead of creating a script, it is useful to note that nvidia-powerd is responsible for NVIDIA Dynamic Boost, which is only used when plugged in to AC power. It's likely far simpler to just unplug and replug your device, and is what I typically do.

Also, this issue has caused moments where I didn't notice that one core was at 100% nonstop, which made my CPU overheat and force a shutdown of the PC. I have taken the risk of running sudo systemctl mask nvidia-powerd, in the hope that the service is not critical to anything else. I will report back if that causes issues down the line, but it has been very helpful for now.

@jrouquie

This thread
https://forums.developer.nvidia.com/t/510-39-01-beta-driver-nvidia-powerd-causing-high-system-load/200839
considers nvidia-powerd "currently probably only for vendor-testing so it shouldn’t be installed" and "not meant for general distribution"
and suggests disabling it:

sudo systemctl stop nvidia-powerd
sudo systemctl disable nvidia-powerd

@atauln

atauln commented Jan 12, 2023

Unfortunately disabling it does not work, as it re-enables itself via a dependent systemd process. You have to:
sudo systemctl mask nvidia-powerd
This will link it to /dev/null, preventing it from running.
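The difference from disable can be checked before masking, so the command is only run when needed. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and it relies on `systemctl is-enabled` printing `masked` (or `masked-runtime`) for a masked unit.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a `systemctl is-enabled` state string
# means the unit is masked.
state_is_masked() {
    case "$1" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}

# Mask nvidia-powerd (and stop it right away) unless it is already masked.
mask_powerd() {
    local unit="nvidia-powerd.service" state
    state="$(systemctl is-enabled "$unit" 2>/dev/null)"
    if ! state_is_masked "$state"; then
        # --now stops the running unit in addition to masking it
        sudo systemctl mask --now "$unit"
    fi
}
```

Running `sudo systemctl unmask nvidia-powerd.service` reverses this later.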

@initramfs

@jrouquie The documentation at https://download.nvidia.com/XFree86/Linux-x86_64/525.78.01/README/dynamicboost.html makes no mention of the nvidia-powerd service being for vendor-only testing. If anything, it gives pretty precise software and hardware requirements for its usage, suggesting that it's a service meant for general availability (provided, of course, that the driver documentation is to be trusted).

Of course, at the time the forum posts were written (Jan 2022 to May 2022), the service could have been in beta.

@xkill

xkill commented Jan 16, 2023

Same problem here.

Ubuntu 22.04, tried with the normal kernel, the OEM kernel, and a custom kernel. It does not seem to be a kernel problem.
Using nvidia package: nvidia-compute-utils-525 (525.60.13-0ubuntu1)

@iovcho

iovcho commented Jan 18, 2023

I have the same problem:

OS: Fedora 36
Installed nvidia packages:
nvidia-gpu-firmware-20221109-144.fc36.noarch
kmod-nvidia-6.0.9-200.fc36.x86_64-520.56.06-1.fc36.x86_64
xorg-x11-drv-nvidia-cuda-libs-525.60.11-1.fc36.x86_64
xorg-x11-drv-nvidia-kmodsrc-525.60.11-1.fc36.x86_64
xorg-x11-drv-nvidia-libs-525.60.11-1.fc36.i686
xorg-x11-drv-nvidia-libs-525.60.11-1.fc36.x86_64
akmod-nvidia-525.60.11-1.fc36.x86_64
xorg-x11-drv-nvidia-power-525.60.11-1.fc36.x86_64
xorg-x11-drv-nvidia-525.60.11-1.fc36.x86_64
nvidia-settings-525.60.11-1.fc36.x86_64
kmod-nvidia-6.0.12-200.fc36.x86_64-525.60.11-1.fc36.x86_64
kmod-nvidia-6.0.15-200.fc36.x86_64-525.60.11-1.fc36.x86_64

Kernel:
root@Vivobook /# uname -a
Linux Vivobook 6.0.15-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 21 18:46:09 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

VideoCards:
root@Vivobook # lspci |grep -E "VGA|3D"
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] (rev 01)
0000:01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)

@retrixe

retrixe commented Jan 19, 2023

Without nvidia-powerd, I consistently find graphics corruption when resuming from suspend on Fedora 37 + GNOME Wayland. Afaik, the docs make no mention of nvidia-powerd being used in suspend logic, but this does seem to be the case (suspend works reliably when the daemon is enabled), so this bug is particularly annoying.

The bug happens randomly with me after running something, e.g. a game, on the NVIDIA GPU for some indeterminate amount of time.

@aaronp24
Member

Do you have vidmem preservation enabled? Corruption after resume is likely a symptom of having that disabled. I'm not sure why having nvidia-powerd enabled would fix that, though.

@kyrbrbik

I've made a quick workaround until this bug gets fixed.

@retrixe

retrixe commented Jan 21, 2023

Do you have vidmem preservation enabled? Corruption after resume is likely a symptom of having that disabled. I'm not sure why having nvidia-powerd enabled would fix that, though.

I just retested, and it seems nvidia-powerd is having no effect on the corruption issue. However, now corruption after suspend seems to happen consistently. I have both NVreg_TemporaryFilePath=/var/tmp and NVreg_PreserveVideoMemoryAllocations=1 set, nvidia-suspend/resume/hibernate are enabled as well and seem to be executing without any error, and the issue occurs with both s2idle and deep sleep. For reference, I've been testing with the Steam app running on my NVIDIA GeForce GTX 1650 GPU in my laptop, and with the proprietary kernel driver.
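To double-check that a module parameter such as NVreg_PreserveVideoMemoryAllocations actually took effect, one option is to read it back from the loaded driver. The sketch below is hedged: `param_equals` is a hypothetical helper, and it assumes /proc/driver/nvidia/params lists parameters as `Name: value` lines without the `NVreg_` prefix.

```shell
#!/bin/bash
# Hypothetical helper: read "Name: value" lines on stdin and succeed only if
# the named parameter is present with the expected value.
param_equals() {
    awk -v key="$1:" -v want="$2" '
        $1 == key { found = ($2 == want) }
        END { exit !found }
    '
}

# Example use against the running driver (the path is an assumption about
# where the driver exposes its parameters):
#   param_equals PreserveVideoMemoryAllocations 1 < /proc/driver/nvidia/params \
#       && echo "vidmem preservation is on"
```

If the check fails even though the option is set in a modprobe configuration file, the option may not have reached the module at load time (for instance, if the initramfs was not regenerated).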

@gustawdaniel

I also had this issue.

logs from systemctl

Feb 07 15:28:47 fedora systemd[1]: Starting nvidia-powerd.service - nvidia-powerd service...
Feb 07 15:28:47 fedora /usr/bin/nvidia-powerd[1056]: nvidia-powerd version:1.0(build 1)
Feb 07 15:28:49 fedora systemd[1]: Started nvidia-powerd.service - nvidia-powerd service.
Feb 07 15:28:49 fedora /usr/bin/nvidia-powerd[1056]: Dbus Connection is established
Feb 07 17:30:56 fedora systemd[1]: Stopping nvidia-powerd.service - nvidia-powerd service...
Feb 07 17:30:56 fedora /usr/bin/nvidia-powerd[1056]: Could not set max CPU limit: current CPU limit (Khz): 4600000
Feb 07 17:30:56 fedora /usr/bin/nvidia-powerd[1056]: Quit successfully
Feb 07 17:30:56 fedora systemd[1]: nvidia-powerd.service: Deactivated successfully.
Feb 07 17:30:56 fedora systemd[1]: Stopped nvidia-powerd.service - nvidia-powerd service.
Feb 07 17:30:56 fedora systemd[1]: nvidia-powerd.service: Consumed 1h 7min 18.759s CPU time.

hardware

lspci |grep -E "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)

packages

dnf list installed | grep nvidia
akmod-nvidia.x86_64                                  3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
kmod-nvidia-6.1.6-200.fc37.x86_64.x86_64             3:525.85.05-1.fc37                  @@commandline                                           
kmod-nvidia-6.1.7-603.inttf.fc37.x86_64.x86_64       3:525.78.01-1.fc37                  @@commandline                                           
kmod-nvidia-6.1.7-603.inttf.fc37.x86_64.x86_64       3:525.85.05-1.fc37                  @@commandline                                           
kmod-nvidia-6.1.8-603.inttf.fc37.x86_64.x86_64       3:525.85.05-1.fc37                  @@commandline                                           
nvidia-gpu-firmware.noarch                           20230117-146.fc37                   @updates                                                
nvidia-persistenced.x86_64                           3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
nvidia-settings.x86_64                               3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia.x86_64                           3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-cuda.x86_64                      3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-cuda-libs.i686                   3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-cuda-libs.x86_64                 3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-kmodsrc.x86_64                   3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-libs.i686                        3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-libs.x86_64                      3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver                        
xorg-x11-drv-nvidia-power.x86_64                     3:525.85.05-1.fc37                  @rpmfusion-nonfree-nvidia-driver   

kernel

uname -a
Linux fedora 6.1.8-603.inttf.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan 26 16:25:51 EET 2023 x86_64 x86_64 x86_64 GNU/Linux

The worst was when it happened in hibernation mode while my laptop was in my backpack, and I lost it for a few hours because of overheating.

@1475015695

same issue

@amrit1711 amrit1711 self-assigned this Feb 23, 2023
@amrit1711 amrit1711 added the NV-Triaged An NVBug has been created for dev to investigate label Feb 23, 2023
@amrit1711
Collaborator

We have filed bug 3934310 internally for tracking purposes.
We have a local repro, and the issue has been root-caused.
I will update here once a driver with the fix is released publicly.

@o-lenczyk

I am also affected by this issue

@ahmetsaglam

ahmetsaglam commented Feb 28, 2023

Same issue here. Once unplugged, it stops. It started happening after I (think I) removed CUDA 12.0 and installed 11.7 with the ZED camera SDK. It looks like I have two versions of CUDA. I hope this helps solve the bug.

Here is the output from nvidia-smi:

Tue Feb 28 08:18:39 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8    20W / 125W |    772MiB / 16384MiB |     24%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1852      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      3170      G   /usr/lib/xorg/Xorg                350MiB |
|    0   N/A  N/A      3525      G   /usr/bin/gnome-shell              110MiB |
|    0   N/A  N/A      4015      G   /usr/lib/firefox/firefox          175MiB |
|    0   N/A  N/A     86505      G   ...veSuggestionsOnlyOnDemand       19MiB |
+-----------------------------------------------------------------------------+

Here is the output from cat /usr/local/cuda/version.json:
{
  "cuda" : { "name" : "CUDA SDK", "version" : "11.7.1" },
  "cuda_cccl" : { "name" : "CUDA C++ Core Compute Libraries", "version" : "11.7.91" },
  "cuda_cudart" : { "name" : "CUDA Runtime (cudart)", "version" : "11.7.99" },
  "cuda_cuobjdump" : { "name" : "cuobjdump", "version" : "11.7.91" },
  "cuda_cupti" : { "name" : "CUPTI", "version" : "11.7.101" },
  "cuda_cuxxfilt" : { "name" : "CUDA cu++ filt", "version" : "11.7.91" },
  "cuda_demo_suite" : { "name" : "CUDA Demo Suite", "version" : "11.7.91" },
  "cuda_gdb" : { "name" : "CUDA GDB", "version" : "11.7.91" },
  "cuda_memcheck" : { "name" : "CUDA Memcheck", "version" : "11.7.91" },
  "cuda_nsight" : { "name" : "Nsight Eclipse Plugins", "version" : "11.7.91" },
  "cuda_nvcc" : { "name" : "CUDA NVCC", "version" : "11.7.99" },
  "cuda_nvdisasm" : { "name" : "CUDA nvdisasm", "version" : "11.7.91" },
  "cuda_nvml_dev" : { "name" : "CUDA NVML Headers", "version" : "11.7.91" },
  "cuda_nvprof" : { "name" : "CUDA nvprof", "version" : "11.7.101" },
  "cuda_nvprune" : { "name" : "CUDA nvprune", "version" : "11.7.91" },
  "cuda_nvrtc" : { "name" : "CUDA NVRTC", "version" : "11.7.99" },
  "cuda_nvtx" : { "name" : "CUDA NVTX", "version" : "11.7.91" },
  "cuda_nvvp" : { "name" : "CUDA NVVP", "version" : "11.7.101" },
  "cuda_sanitizer_api" : { "name" : "CUDA Compute Sanitizer API", "version" : "11.7.91" },
  "fabricmanager" : { "name" : "Fabric Manager", "version" : "515.65.01" },
  "libcublas" : { "name" : "CUDA cuBLAS", "version" : "11.10.3.66" },
  "libcufft" : { "name" : "CUDA cuFFT", "version" : "10.7.2.91" },
  "libcufile" : { "name" : "GPUDirect Storage (cufile)", "version" : "1.3.1.18" },
  "libcurand" : { "name" : "CUDA cuRAND", "version" : "10.2.10.91" },
  "libcusolver" : { "name" : "CUDA cuSOLVER", "version" : "11.4.0.1" },
  "libcusparse" : { "name" : "CUDA cuSPARSE", "version" : "11.7.4.91" },
  "libnpp" : { "name" : "CUDA NPP", "version" : "11.7.4.75" },
  "libnvidia_nscq" : { "name" : "NvSwitch Library", "version" : "515.65.01" },
  "libnvjpeg" : { "name" : "CUDA nvJPEG", "version" : "11.8.0.2" },
  "nsight_compute" : { "name" : "Nsight Compute", "version" : "2022.2.1.3" },
  "nsight_systems" : { "name" : "Nsight Systems", "version" : "2022.1.3.3" },
  "nvidia_driver" : { "name" : "NVIDIA Linux Driver", "version" : "515.65.01" },
  "nvidia_fs" : { "name" : "NVIDIA file-system", "version" : "2.12.8" }
}

@KK01101011

KK01101011 commented Feb 28, 2023

I have the same issue as well. I constantly need to run the command below just to be able to use my computer, which runs up-to-date Fedora Linux. If I don't, even the Power Off button does not show up.
sudo systemctl stop nvidia-powerd.service && sudo systemctl start nvidia-powerd.service

@gustawdaniel

@KK01101011 as a temporary solution you can mask this service:

sudo systemctl mask nvidia-powerd.service

I did it and my computer still works

➜ sudo systemctl status nvidia-powerd.service
[sudo] password for daniel: 
○ nvidia-powerd.service
     Loaded: masked (Reason: Unit nvidia-powerd.service is masked.)
     Active: inactive (dead)

@KK01101011


Thank you very much.

@ashleyx

ashleyx commented Mar 2, 2023

In case anyone needed a report of a longer time span. I've had the nvidia-powerd service masked for 5 weeks now and there has been no impact on gaming or gpu compute performance.

@atauln

atauln commented Mar 2, 2023

I've had mine masked for two months, and there has been no effect on my system's performance or stability.

@straurob

I have the same issue as well. I need to use the below commands constantly to do even use my computer which is on up-to-date Fedora Linux. If I don't use it even the Power Off button does not show up. sudo systemctl stop nvidia-powerd.service && sudo systemctl start nvidia-powerd.service

Experiencing the same behavior since Fedora 36, then Fedora 37 and now on Fedora 38 Beta.

@oliveiraethales

I've had mine masked for two months, and there has been no effect on my system's performance or stability.

Doing the same over here on Fedora 37 on a Lenovo Legion i5 (RTX 3060) and all is well for now.

@amrit1711
Collaborator

The fix is incorporated in driver 530.41.03; please help verify it and share test results.
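For anyone who masked the service earlier in this thread, the following sketch unmasks it only when the installed driver is at least 530.41.03. The helper names are hypothetical, and the sketch assumes GNU `sort -V` is available and that `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version on its first line. Whether any older driver branch later received the same fix is not stated here, so the comparison simply treats 530.41.03 as the minimum.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Unmask and start nvidia-powerd only if the installed driver carries the fix.
unmask_powerd_if_fixed() {
    local fixed="530.41.03" current
    current="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$current" "$fixed"; then
        sudo systemctl unmask nvidia-powerd.service
        sudo systemctl start nvidia-powerd.service
    else
        echo "driver '$current' predates $fixed; leaving nvidia-powerd masked" >&2
    fi
}
```

After unmasking, watching the daemon's CPU usage for a few hours of normal use is the simplest way to confirm the fix on a given machine.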

@retrixe

retrixe commented Apr 4, 2023

I've been using the latest 530 driver for more than a week with nvidia-powerd enabled and haven't experienced this issue. On older versions, it would definitely have happened several times in this timespan.

@ghost

ghost commented Jun 3, 2023

I haven't had this bug since; this issue should be marked as fixed and closed.
