rocm-smi 4.1 returns error #1446

Closed
perestoronin opened this issue Apr 9, 2021 · 6 comments

Comments

@perestoronin

perestoronin commented Apr 9, 2021

python3 /opt/rocm/bin/rocm_smi.py
Failed to get "domain" properity from properties files for kfd node 1.
rsmi_init() failed
Exception caught: rsmi_init.
ERROR:root:ROCm SMI returned 8 (the expected value is 0)

However, rocminfo works perfectly.

@ROCmSupport

Hi @perestoronin
Thanks for reaching out.
In my case, rocm-smi shows the information properly.
Could you please share the following information:
ASIC, OS, kernel, ROCm version, and dmesg output.

@perestoronin
Author

perestoronin commented Apr 9, 2021

Linux 5.4.110-gentoo-rt54 #1 SMP PREEMPT_RT Thu x86_64 AMD Phenom(tm) II X6 1100T

rocminfo https://gist.github.com/raw/37969bd15b49c281c4d3535e9791876a

python /opt/rocm-4.1.0/bin/rocm_smi.py
Failed to get "domain" properity from properties files for kfd node 1.
rsmi_init() failed
Exception caught: rsmi_init.
ERROR:root:ROCm SMI returned 8 (the expected value is 0)

dmesg https://gist.github.com/raw/800e51601e1aa6ccc0326abb44736890

cat /sys/class/kfd/kfd/topology/nodes/1/properties https://gist.github.com/raw/eaa36a7bcac19abc5711ecf8cb692e0a

@ROCmSupport

Thanks for the additional information, @perestoronin.
I looked at the device properties and found that the "domain" property is missing, which is why rocm-smi throws the error.

In my case the properties file contains "domain 0", so I do not see the issue.
I am working on gathering more information.
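
For anyone who wants to run the same check on their own machine, here is a minimal sketch that looks for the "domain" entry in a KFD node's properties file. It assumes the file consists of whitespace-separated "name value" lines (as in the "domain 0" example above); the function names are hypothetical and not part of rocm-smi.

```python
from pathlib import Path

# Directory taken from the path quoted in this thread.
KFD_NODES_DIR = Path("/sys/class/kfd/kfd/topology/nodes")


def parse_kfd_properties(text):
    """Parse 'name value' lines into a dict (assumed file layout)."""
    props = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def has_domain_property(text):
    """Return True if the properties text contains a 'domain' entry."""
    return "domain" in parse_kfd_properties(text)


def node_has_domain(node_index):
    """Read the properties file of one KFD node and check for 'domain'."""
    path = KFD_NODES_DIR / str(node_index) / "properties"
    return has_domain_property(path.read_text())
```

Calling `node_has_domain(1)` on the affected system would be expected to return `False`, matching the "kfd node 1" message in the error output.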

@perestoronin
Author

perestoronin commented Apr 9, 2021

I investigated and found that the "domain" property was introduced in kernel version 5.8.

@ROCmSupport Which kernel versions are compatible with ROCm 4.1.1? Is the information in the ROCm docs about the required kernel version for the amdgpu driver out of date?

@perestoronin
Author

/opt/rocm/bin # python rocm_smi.py
GPU  Temp   AvgPwr  SCLK    MCLK    Fan    Perf  PwrCap  VRAM%  GPU%
0    39.0c  7.0W    852Mhz  167Mhz  14.9%  auto  220.0W  1%     6%

/opt/rocm/bin # uname -a
Linux 5.10.28-gentoo-rt36 #1 SMP PREEMPT_RT x86_64 AMD Phenom(tm) II X6 1100T

After updating the kernel from the outdated 5.4 to 5.10, which is current for rocm-4.1.1, the issue is resolved.
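
The kernel comparison from this thread can be sketched as a quick pre-flight check. The 5.8 threshold is only what was reported earlier in this issue, not a documented ROCm requirement, and the helper names are hypothetical.

```python
import platform
import re

# Threshold reported in this thread; treat it as an assumption.
DOMAIN_MIN_KERNEL = (5, 8)


def kernel_version_tuple(release):
    """Extract (major, minor) from a release string such as '5.4.110-gentoo-rt54'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_exposes_domain(release=None):
    """Return True if the kernel is at least the assumed minimum version."""
    if release is None:
        release = platform.release()  # release of the running kernel
    return kernel_version_tuple(release) >= DOMAIN_MIN_KERNEL
```

With the two kernels mentioned here, `kernel_exposes_domain("5.4.110-gentoo-rt54")` is false and `kernel_exposes_domain("5.10.28-gentoo-rt36")` is true.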

@ROCmSupport

Thanks for upgrading the kernel and closing this issue.
Thank you.
