
NVIDIA card permanently active, never suspended in V560TNE #999

Closed
filipleple opened this issue Aug 13, 2024 · 16 comments
Labels
bug Something isn't working firmware kernel Issues caused by kernel (outdated version, driver issue etc) needs review novacustom_v56_mtl NovaCustom V56 Series

Comments

@filipleple
Member

Component

Dasharo firmware

Device

NovaCustom V56 14th Gen

Dasharo version

v0.9.1-rc2

Dasharo Tools Suite version

No response

Test case ID

NVI002.001

Brief summary

NVIDIA card permanently active, never suspended in V560TNE

How reproducible

100%

How to reproduce

Run the NVI002.001 test

Expected behavior

It should pass

Actual behavior

The card is never suspended. No additional GUI apps that could utilize HW acceleration are running.

------------------------------------------------------------------------------
NVI002.001 NVIDIA Graphics power management (Ubuntu) :: Check whet... ....
Checking if mesa-utils is installed...

Package mesa-utils is installed
.
Checking if pciutils is installed...

Package pciutils is installed
NVI002.001 NVIDIA Graphics power management (Ubuntu) :: Check whet... | FAIL |
'active' does not contain 'suspended'
------------------------------------------------------------------------------
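The failing assertion compares the runtime PM status that the kernel exposes in sysfs with the string `suspended`. Below is a minimal Python sketch of an equivalent check. It assumes the dGPU is `card1`, as in the sysfs path quoted later in this thread. The helper names are hypothetical, and this is not the Dasharo test suite's code. It also reads `power/control`, because the kernel only runtime-suspends a device when that attribute is `auto`:

```python
from pathlib import Path

# Assumption: the NVIDIA dGPU is DRM card1, as in the path used in this
# thread; the card index is not guaranteed to be stable across machines.
DEFAULT_DEVICE = Path("/sys/class/drm/card1/device")


def read_runtime_pm(device: Path = DEFAULT_DEVICE) -> dict[str, str]:
    """Read the runtime PM attributes of a device from sysfs.

    power/runtime_status is e.g. "active" or "suspended".
    power/control is "auto" (runtime PM allowed) or "on" (kept powered).
    """
    power = device / "power"
    return {
        name: (power / name).read_text().strip()
        for name in ("runtime_status", "control")
    }


def is_suspended(device: Path = DEFAULT_DEVICE) -> bool:
    """Equivalent of the test's check: status must contain 'suspended'."""
    return "suspended" in read_runtime_pm(device)["runtime_status"]
```

Calling `read_runtime_pm()` with no argument inspects `card1`. If the dGPU enumerates differently, pass another `/sys/class/drm/cardN/device` path.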

Screenshots

No response

Additional context

No response

Solutions you've tried

No response

@mkopec
Member

mkopec commented Aug 19, 2024

should be fixed by Dasharo/coreboot@66abce3

@philipandag

philipandag commented Aug 20, 2024

Issue still exists in v0.9.1-rc3

scripts/run.sh dasharo-compatibility/nvidia.robot -- -t "NVI002*"
dasharo-compatibility/nvidia.robot -- -t NVI002*
robot -L TRACE -l logs/novacustom-v560tne/2024_08_20_15_07_00/dasharo-compatibility/nvidia.robot__log.html -r logs/novacustom-v560tne/2024_08_20_15_07_00/dasharo-compatibility/nvidia.robot__report.html -o logs/novacustom-v560tne/2024_08_20_15_07_00/dasharo-compatibility/nvidia.robot__out.xml -b logs/novacustom-v560tne/2024_08_20_15_07_00/dasharo-compatibility/nvidia.robot__debug.log -v rte_ip:127.0.0.1 -v config:novacustom-v560tne -v device_ip:192.168.4.168 -v fw_file:novacustom_v560tnx.rom -t NVI002* dasharo-compatibility/nvidia.robot
==============================================================================
Nvidia                                                                        
==============================================================================
NVI002.001 NVIDIA Graphics power management (Ubuntu) :: Check whet... ....
Checking if mesa-utils is installed...

Package mesa-utils is installed
.
Checking if pciutils is installed...

Package pciutils is installed
NVI002.001 NVIDIA Graphics power management (Ubuntu) :: Check whet... | FAIL |
'active' does not contain 'suspended'
------------------------------------------------------------------------------
Nvidia                                                                | FAIL |
1 test, 0 passed, 1 failed
==============================================================================

@mkopec
Member

mkopec commented Aug 20, 2024

is the nvidia driver loaded? Please check lsmod | grep nvidia

@philipandag

@mkopec

ubuntu@3mdeb:~$ lsmod | grep -i nvidia
nvidia_uvm           5021696  0
nvidia_drm            122880  2
nvidia_modeset       1507328  2 nvidia_drm
nvidia               8781824  32 nvidia_uvm,nvidia_modeset
ecc                    45056  2 ecdh_generic,nvidia
video                  73728  3 xe,i915,nvidia_modeset
ubuntu@3mdeb:~$ 
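`lsmod` reads its data from `/proc/modules`, so the same check can be scripted by matching the module name exactly. `grep -i nvidia` also matches lines such as `ecc ... ecdh_generic,nvidia`, where nvidia only appears in the "Used by" column. This sketch looks at the first column only. It is a minimal sketch, and the helper names are hypothetical:

```python
from pathlib import Path


def loaded_modules(text: str) -> set[str]:
    """Collect module names from /proc/modules (or lsmod) text.

    The first whitespace-separated field of each line is the module name;
    the "Module Size Used by" header printed by lsmod is skipped.
    """
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def nvidia_loaded(text: str) -> bool:
    """True if the core 'nvidia' kernel module itself is listed."""
    return "nvidia" in loaded_modules(text)


def nvidia_loaded_now(proc_modules: Path = Path("/proc/modules")) -> bool:
    """Check the running kernel using the same file lsmod reads."""
    return nvidia_loaded(proc_modules.read_text())
```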

@mkopec
Member

mkopec commented Aug 22, 2024

in an x11 session, Xorg keeps the gpu powered on. fixed by switching to GNOME Wayland session:

cat /proc/driver/nvidia/gpus/0000\:01\:00.0/power
Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Off

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Supported
 Status:                    Disabled
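When the NVIDIA kernel module is loaded, it reports its own view of runtime D3 in `/proc/driver/nvidia/gpus/<PCI address>/power`. The `/proc/driver/nvidia` directory is created by that module. This fits the "No such file or directory" errors reported in this thread whenever `lsmod` showed no nvidia modules. Below is a minimal sketch that flattens the `Key: Value` layout shown above into a dict. It assumes that indented lines belong to the preceding heading. `parse_nvidia_power` is a hypothetical name:

```python
def parse_nvidia_power(text: str) -> dict[str, str]:
    """Parse /proc/driver/nvidia/gpus/<bdf>/power into a flat dict.

    Top-level "Key: Value" lines map directly; indented lines are
    prefixed with the most recent heading (a top-level key with no
    value), e.g. "S0ix Power Management/Status".
    """
    result: dict[str, str] = {}
    section = ""
    for raw in text.splitlines():
        if not raw.strip() or ":" not in raw:
            continue
        key, _, value = raw.partition(":")
        key, value = key.strip(), value.strip()
        indented = raw[0].isspace()
        if not value and not indented:
            section = key  # heading such as "GPU Hardware Support"
            continue
        if indented and section:
            key = f"{section}/{key}"
        else:
            section = ""  # a top-level key ends the current section
        result[key] = value
    return result
```

For the output above, passing `Path("/proc/driver/nvidia/gpus/0000:01:00.0/power").read_text()` returns `"Video Memory": "Off"` and `"Runtime D3 status": "Enabled (fine-grained)"`.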

mkopec closed this as completed on Aug 22, 2024
@philipandag

philipandag commented Aug 27, 2024

Quoting @mkopec: "in an x11 session, Xorg keeps the gpu powered on. fixed by switching to GNOME Wayland session:"

Xorg was not used in my case; Wayland was used from the beginning. Switching to Xorg and back to Wayland does not change the output of cat /sys/class/drm/card1/device/power/runtime_status, which is used in the test documentation and the automated tests.

Also, running cat /proc/driver/nvidia/gpus/0000\:01\:00.0/power results in cat: '/proc/driver/nvidia/gpus/0000:01:00.0/power': No such file or directory

lsmod | grep -i nvidia returns nothing, suggesting the driver is not even loaded, but lspci | grep -i nvidia detects the card. I tried reinstalling the driver (550-open) together with dkms and rebooting, which did not make any difference.

philipandag reopened this on Aug 27, 2024
@philipandag

I think the NVIDIA GPU is not working at all on the V560TNE with Ubuntu 24.04 and kernel 6.9, which is installed on our device. I tried installing the -open variant of the drivers in versions 535, 550 and 600, using both apt and Ubuntu's Software & Updates app, and I rebooted after every install. It did not help: running lsmod | grep -i nvidia always yielded no results. I checked which GPU Chromium uses by going to chrome://gpu. It used the iGPU, even though hardware acceleration was allowed in the settings. The GPU works fine on Windows, so maybe I am doing something wrong.

@philipandag

V540TND, v0.9.1-rc5, kernel 6.9: same as #999 (comment), on both Wayland and Xorg sessions (switched using the cogwheel in the lower-right corner of the login screen):

  • When I first ran lsmod | grep -i nvidia, I got results similar to those in NVIDIA card permanently active, never suspended in V560TNE #999 (comment), but after a reboot it returns nothing.
  • nvidia-smi was also working fine, but after a reboot it returns: NVIDIA_SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest driver is installed and running
  • cat /proc/driver/nvidia/gpus/0000\:01\:00.0/power fails; /proc/driver/nvidia/ does not exist.

Before and after this unfortunate reboot it was true that:

  • cat /sys/class/drm/card1/device/power/runtime_status reports active permanently
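A single read of `runtime_status` only captures one instant. To support "permanently active" with numbers, you can sample the kernel's cumulative runtime PM counters twice: `power/runtime_active_time` and `power/runtime_suspended_time`, both in milliseconds. If the suspended counter does not grow over the interval, the device did not runtime-suspend in that window. Below is a minimal sketch, assuming the same `card1` device directory. The helper names are hypothetical:

```python
import time
from pathlib import Path

COUNTERS = ("runtime_active_time", "runtime_suspended_time")


def read_counters(device: Path) -> dict[str, int]:
    """Read the cumulative runtime PM counters (milliseconds) from sysfs."""
    power = device / "power"
    return {name: int((power / name).read_text()) for name in COUNTERS}


def counter_deltas(device: Path, interval_s: float = 30.0) -> dict[str, int]:
    """Sample the counters twice and return how much each one grew."""
    before = read_counters(device)
    time.sleep(interval_s)
    after = read_counters(device)
    return {name: after[name] - before[name] for name in COUNTERS}
```

For example, run `counter_deltas(Path("/sys/class/drm/card1/device"))` on an idle system. A `runtime_suspended_time` delta of 0 would match the behaviour reported here.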

Logs after the reboot:
cbmem-nvidia.log
dmesg-nvidia.log

dkms status:

ubuntu@3mdeb:~$ dkms status
acpi-call/1.2.2, 6.8.0-35-generic, x86_64: installed
acpi-call/1.2.2, 6.8.0-39-generic, x86_64: installed
nvidia/535.183.01: added
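In this output, acpi-call is `installed` for two 6.8 kernels. The nvidia module is only `added`, meaning its source is registered with DKMS but it has not been built for any kernel. Below is a minimal sketch that parses `dkms status` lines of the form `module/version, kernel, arch: state`, as shown here. Older DKMS releases print a different layout (`module, version, kernel, arch: state`), so treat the format as an assumption. The helper names are hypothetical:

```python
def dkms_entries(text: str) -> list[dict[str, str]]:
    """Parse `dkms status` output in the layout shown in this thread.

    'nvidia/535.183.01: added'  (no kernel/arch fields)
    'acpi-call/1.2.2, 6.8.0-35-generic, x86_64: installed'
    """
    entries = []
    for line in text.splitlines():
        head, sep, state = line.partition(": ")
        if not sep:
            continue  # skip blank or unrecognised lines
        fields = [f.strip() for f in head.split(",")]
        module, _, version = fields[0].partition("/")
        entries.append({
            "module": module,
            "version": version,
            "kernel": fields[1] if len(fields) > 1 else "",
            "state": state.strip(),
        })
    return entries


def built_for(text: str, module: str, kernel: str) -> bool:
    """True if DKMS reports `module` as installed for `kernel`."""
    return any(
        e["module"] == module and e["kernel"] == kernel
        and e["state"].startswith("installed")
        for e in dkms_entries(text)
    )
```

Feed it the output of `subprocess.run(["dkms", "status"], capture_output=True, text=True).stdout`. Compare against `platform.release()`, which gives the same string as `uname -r`.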

apt-cache policy nvidia-driver-530:

nvidia-driver-530:
  Installed: (none)
  Candidate: 535.183.01-0ubuntu0.24.04.1
  Version table:
     535.183.01-0ubuntu0.24.04.1 500
        500 http://pl.archive.ubuntu.com/ubuntu noble-updates/restricted amd64 Packages
        500 http://security.ubuntu.com/ubuntu noble-security/restricted amd64 Packages
     535.171.04-0ubuntu2 500
        500 http://pl.archive.ubuntu.com/ubuntu noble/restricted amd64 Packages

Looking at the outputs, I suspect this is related to the update to kernel 6.9. Maybe the command in the Dasharo docs is wrong? I have just been copy-pasting it.

The output of sudo apt install nvidia-driver-550-open contains:

(...)
Module build for kernel 6.9.0-060900-generic was skipped since the kernel headers for this kernel do not seem to be installed.
(...)

@mkopec
Member

mkopec commented Sep 11, 2024

You also need to install the kernel headers if you're installing a new kernel; they come in a separate package. Without them, the nvidia kernel driver will not be built and won't be available.

@philipandag

When running the command from docs.dasharo

sudo apt install ./linux-headers-6.9.0-060900_6.9.0-060900.202405122134_all.deb     ./linux-image-unsigned-6.9.0-060900-generic_6.9.0-060900.202405122134_amd64.deb     ./linux-modules-6.9.0-060900-generic_6.9.0-060900.202405122134_amd64.deb 

The output contains:

(...)
linux-headers-6.9.0-060900 is already the newest version (6.9.0-060900.202405122134).
linux-image-unsigned-6.9.0-060900-generic is already the newest version (6.9.0-060900.202405122134).
linux-modules-6.9.0-060900-generic is already the newest version (6.9.0-060900.202405122134).
(...)

suggesting the headers are installed.
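The apt message only says that the listed `.deb` files are installed. DKMS decides whether headers exist by looking for the kernel build tree, by default `/lib/modules/<kernel>/build`. That is most likely what the earlier "kernel headers for this kernel do not seem to be installed" message refers to. One possible explanation, not confirmed in this thread: Ubuntu's mainline kernel builds usually ship headers as two packages. One is an architecture-independent `linux-headers-<version>_..._all.deb`; the other is a flavour-specific `linux-headers-<version>-generic_..._amd64.deb`. The command above installs only the `_all` one. Below is a minimal sketch to check the build tree for a given kernel. The helper names are hypothetical:

```python
import platform
from pathlib import Path
from typing import Optional

MODULES_ROOT = Path("/lib/modules")


def headers_dir(kernel: Optional[str] = None,
                modules_root: Path = MODULES_ROOT) -> Path:
    """Default DKMS build-tree location for a kernel (running one if None)."""
    return modules_root / (kernel or platform.release()) / "build"


def headers_available(kernel: Optional[str] = None,
                      modules_root: Path = MODULES_ROOT) -> bool:
    """True if <modules_root>/<kernel>/build exists and has a Makefile.

    `build` is normally a symlink into /usr/src/linux-headers-<kernel>;
    Path.exists() follows symlinks, so a dangling link counts as missing.
    """
    return (headers_dir(kernel, modules_root) / "Makefile").exists()
```

If `headers_available("6.9.0-060900-generic")` returns `False`, that would be consistent with the skipped DKMS build.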

@wessel-novacustom

FWIW: Suspend seems to work fine with Pop!_OS, while the NVIDIA graphics card is working.

mkopec added the kernel label on Oct 2, 2024
@mkopec
Member

mkopec commented Oct 3, 2024

Works fine out of the box on Ubuntu 24.10

mkopec closed this as completed on Oct 3, 2024
@SebastianCzapla

This is still failing for me on rc6 with the 6.11 kernel.
Driver version 560.35.03, the newest available in Ubuntu 24.10.

/sys/class/drm/card1/device/power/runtime_status shows active in all scenarios:

  • NVIDIA as the primary GPU: nvidia-smi crashes and shows no info, and nvidia-smi -r cannot reset the card.
  • Intel as the primary GPU: the nvidia driver does not load and cannot be loaded with modprobe.

Tested on both Xorg and Wayland sessions.

nvidia.robot_log.zip

@mkopec
Member

mkopec commented Oct 8, 2024

@SebastianCzapla, is the driver actually loaded at all? If nvidia-smi crashes, something is wrong with the driver.

Run lsmod | grep nvidia to check. Maybe also attach the dmesg log here.

@SebastianCzapla

[image]
dmesg-log-nvidia.zip

@SebastianCzapla

SebastianCzapla commented Oct 8, 2024

nvidia-smi crashes due to OOM on Ubuntu 24.10, both with the 560.35.03 drivers and with other driver versions. For some reason, it calls mmap() with two absurdly large sizes: 48 GiB and 4 GiB.

I found a workaround for that crash: block access to the NVIDIA persistence daemon socket with
sudo chmod o-w /var/run/nvidia-persistenced/socket
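`chmod o-w` clears the write bit for users outside the socket's owner and group. On Linux, connecting to a Unix stream socket requires write permission on the socket file. Unprivileged processes, such as nvidia-smi run as a normal user, therefore can no longer reach the persistence daemon. Below is a minimal sketch that checks whether the workaround is in effect, using the path from the comment above. The helper name is hypothetical:

```python
import os
import stat
from pathlib import Path

# Path taken from the workaround described in this thread.
SOCKET = Path("/var/run/nvidia-persistenced/socket")


def others_can_write(path: Path = SOCKET) -> bool:
    """True if the 'other' write bit (S_IWOTH) is set on `path`.

    After `chmod o-w` this returns False, so processes outside the
    socket's owner and group can no longer connect to it.
    """
    return bool(os.stat(path).st_mode & stat.S_IWOTH)
```

On Ubuntu, `/var/run` points to the tmpfs `/run`, and the daemon creates the socket at startup. The permission change will therefore most likely need to be reapplied after a reboot.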

OOM still occurs when running nvidia-smi as root.

After that, nvidia-smi shows no processes running, while the card state remains active, even with the desktop manager disabled and only a TTY running.

This does not affect Windows at all; power management there works as intended.

6 participants