Closed
Description
Problem Description
Booting with runtime PM enabled causes the devices to fail to appear, because the SMU fails to resume on MI100 devices. dmesg:
[ 33.711163] [drm] PCIE GART of 512M enabled.
[ 33.716881] [drm] PTB located at 0x00000087FEF00000
[ 33.723056] amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
[ 33.778774] amdgpu 0000:03:00.0: amdgpu: reserve 0x400000 from 0x87fe800000 for PSP TMR
[ 33.850011] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 33.858894] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 33.865392] amdgpu 0000:03:00.0: amdgpu: SMC is not ready
[ 33.871308] amdgpu 0000:03:00.0: amdgpu: SMC engine is not correctly up!
[ 33.878965] amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -5
[ 33.886517] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-5).
[ 37.878379] [drm] PCIE GART of 512M enabled.
[ 37.883438] [drm] PTB located at 0x00000087FEF00000
[ 37.889037] amdgpu 0000:83:00.0: amdgpu: PSP is resuming...
[ 37.945322] amdgpu 0000:83:00.0: amdgpu: reserve 0x400000 from 0x87fe800000 for PSP TMR
[ 38.016581] amdgpu 0000:83:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 38.024903] amdgpu 0000:83:00.0: amdgpu: SMU is resuming...
[ 38.030956] amdgpu 0000:83:00.0: amdgpu: SMC is not ready
[ 38.036682] amdgpu 0000:83:00.0: amdgpu: SMC engine is not correctly up!
[ 38.044093] amdgpu 0000:83:00.0: amdgpu: resume of IP block <smu> failed -5
[ 38.051452] amdgpu 0000:83:00.0: amdgpu: amdgpu_device_ip_resume failed (-5).
[ 38.416529] amdgpu 0000:c3:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
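For anyone triaging a similar log, the failing devices can be pulled out of the dmesg text mechanically. The following is a minimal sketch, assuming the message format shown above stays the same; the function name `find_resume_failures` and the regular expression are illustrative and not part of amdgpu or ROCm tooling.

```python
import re

# Matches lines such as:
# "[ 33.878965] amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -5"
# The pattern is an assumption based on the log lines quoted in this report.
RESUME_FAILURE = re.compile(
    r"amdgpu (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"amdgpu: resume of IP block <(?P<block>[^>]+)> failed (?P<errno>-?\d+)"
)


def find_resume_failures(log_text):
    """Return (pci_address, ip_block, error_code) for each failed IP block resume."""
    failures = []
    for line in log_text.splitlines():
        match = RESUME_FAILURE.search(line)
        if match:
            failures.append((match["bdf"], match["block"], int(match["errno"])))
    return failures


# Two lines copied from the dmesg output above.
sample_log = (
    "[ 33.878965] amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -5\n"
    "[ 38.044093] amdgpu 0000:83:00.0: amdgpu: resume of IP block <smu> failed -5\n"
)

for bdf, block, err in find_resume_failures(sample_log):
    print(f"{bdf}: IP block {block} failed to resume (error {err})")
```

On the log above this reports the smu block failing on 0000:03:00.0 and 0000:83:00.0; error -5 corresponds to EIO.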
rocm-smi:
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK  Fan  Perf     PwrCap       VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
==========================================================================================================================
0       3     0x738c,   4106   N/A     N/A    N/A, N/A, 0         None  None  0%   unknown  Unsupported  0%     0%
Operating System
ubuntu 24.04
CPU
Epyc 7552
GPU
MI100
ROCm Version
ROCm 6.3.1
ROCm Component
No response
Steps to Reproduce
Add amdgpu.runpm=1 to the kernel command line and reboot.
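To confirm that the setting actually took effect before comparing results, something like the sketch below may help. It assumes the standard Linux procfs and sysfs layout (`/proc/cmdline`, `/sys/module/amdgpu/parameters/runpm`, and the per-device `power/runtime_status` file); the helper names are hypothetical, and the two PCI addresses are the ones from the dmesg output in this report.

```python
from pathlib import Path


def runpm_from_cmdline(cmdline):
    """Return the last amdgpu.runpm value on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.runpm" and sep:
            value = val
    return value


def read_text_or_none(path):
    """Read a small procfs/sysfs file, returning None when it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report(pci_addresses=("0000:03:00.0", "0000:83:00.0")):
    """Print the runtime PM settings visible on this machine."""
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("cmdline amdgpu.runpm:", runpm_from_cmdline(cmdline))
    print("module runpm parameter:",
          read_text_or_none("/sys/module/amdgpu/parameters/runpm"))
    for bdf in pci_addresses:
        status = read_text_or_none(f"/sys/bus/pci/devices/{bdf}/power/runtime_status")
        print(f"{bdf} runtime_status:", status)
```

Calling `report()` on the affected machine prints the value from the command line, the value the loaded module reports, and the runtime PM state of each GPU; any entry that cannot be read is shown as None.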
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
/opt/rocm/bin/rocminfo --support
ROCk module is loaded
hsa api call failure at: /usr/src/debug/rocminfo/rocminfo-rocm-6.2.4/rocminfo.cc:1306
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
Additional Information
No response
LunNova