rocm_agent_enumerator couldn't recognise the GPU #223
With an RX470 (custom BIOS) and an RX Vega 56 installed, KFD doesn't detect the RX470. If I take the Vega out of the system, my RX470 works fine by itself.
$ /opt/rocm/bin/rocm_agent_enumerator -t GPU
$ lspci | grep VGA
Here is dmesg with the RX470 load failure and the Vega56 load success:
[ 12.206462] amdgpu 0000:26:00.0: amdgpu_init failed
Here it is with just the RX470:
$ /opt/rocm/bin/rocm_agent_enumerator -t GPU
$ lspci | grep VGA
dmesg:
[ 3.073351] fbcon: amdgpudrmfb (fb0) is primary device
Thoughts?
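One way to tell which card failed is to pull the PCI address out of the amdgpu failure line in dmesg and look it up with lspci. This is a minimal sketch, assuming the failure message keeps the `amdgpu <domain:bus:dev.fn>: ... failed` shape shown above; the helper name `failed_amdgpu_devices` is hypothetical.

```shell
# Print the PCI address of every amdgpu device whose initialization failed,
# reading kernel log lines on stdin.
# Hypothetical helper; assumes lines shaped like:
#   [ 12.206462] amdgpu 0000:26:00.0: amdgpu_init failed
failed_amdgpu_devices() {
    sed -n 's/.*amdgpu \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): .*failed.*/\1/p'
}

# Usage on a live system (reading dmesg may require root on some distros):
#   dmesg | failed_amdgpu_devices | while read -r addr; do
#       lspci -s "$addr"    # show which card sits at that address
#   done
```

Matching on the PCI address rather than on the card name keeps the check working when two AMD GPUs of different generations are installed, as in this report.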
amdgpu driver initialization fails. Can you provide the full dmesg output?
Also, do we have two different issues reported on this thread? @prasanth09 has a single Fiji NANO card. @rhlug has a combination of Vega10 and Polaris10.
@fxkamd it's possible they are different issues. I'll have to set things back up to generate that full dmesg.
@rhlug, can you update to ROCm 1.6.4, which was released last night, and let us know if you continue to see the issue? We want to track this down.
@gstoner I will get this set up and tested today.
Update 1: At first, I added my RX470 in the x8 slot. On bootup, the kernel got stuck in exit_to_usermode_loop and never finished booting. So I moved the RX470 beside my RX Vega in the second x16 slot. Now the system boots and both cards are detected.
Update 2: I need to test the RX470 in the x16 slot and an RX570 in the x8 (or x1) slot to rule out the Vega as the cause of the failed boots. I also need to determine whether this is all specific to the MSI X370 A4 with a Ryzen 1700.
I updated to the latest ROCm and the issue is fixed. I am closing this issue now.
That's fine. I'll spin off a new BZ (bug report) once I have the time and all the details sorted.
This discussion seems a better fit than #281 for the "GPU not recognised" problem.
With the second GPU in slot 3, /opt/rocm/bin/rocm_agent_enumerator gives:
But all I did in this case was update ROCm. I think I was using ROCm 1.5, and upgrading to 1.6.4 resolved the issue. Now, with 1.7, I end up with a "No device" error.
Possibly ROCm 1.7 reintroduced the problem with PCIe x8? The list of VGA devices from
At port 4, that one would have had a link width of x8. Could somebody from the ROCm team please test with a GPU in an x8 port to see whether this is indeed a problem?
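To confirm what link width a card actually negotiated, `lspci -vv` reports it on the `LnkSta:` line (reading those capability registers usually needs root). The sketch below uses a hypothetical `link_width` helper to extract that value; on newer kernels the same information should also be readable from sysfs as `/sys/bus/pci/devices/<address>/current_link_width`.

```shell
# Extract the negotiated PCIe link width (e.g. "x8") from `lspci -vv` output on stdin.
# Hypothetical helper; assumes a line such as:
#   LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+
# "LnkSta:" is matched with its colon so the "LnkCap:" and "LnkSta2:" lines are ignored.
link_width() {
    sed -n 's/.*LnkSta:.*Width \(x[0-9]*\).*/\1/p' | head -n 1
}

# Usage (01:00.0 is the Fiji card from this thread; substitute your own address):
#   sudo lspci -vv -s 01:00.0 | link_width
```

Comparing this value against the slot's advertised width (the `LnkCap:` line) shows whether a card dropped to x8 or narrower.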
What Linux kernel version are you on? It might be the issue.
Hello @gstoner, can you point me to the right Linux kernel version, i.e., the output of
Step one: start with a clean Ubuntu 16.04 release. We have it running on the stock 4.4 kernel and on the 4.10 kernel. You have to follow these instructions exactly: https://rocm.github.io/ROCmInstall.html. If you're on an Intel-based system you can use 4.4; only Ryzen systems need the updated 4.10 kernel in 16.04.3. Soon they will release 4.13, but you want to wait on that one until the newer 1.7.1 update we will be releasing.
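The kernel guidance above boils down to a version comparison against `uname -r`. A minimal sketch, assuming GNU coreutils `sort -V` is available (it is on stock Ubuntu 16.04); `kernel_at_least` is a hypothetical helper, and the 4.10 threshold is simply the Ryzen requirement quoted above.

```shell
# Succeed if kernel release $1 is at least version $2.
# Hypothetical helper; relies on GNU `sort -V` for version ordering.
kernel_at_least() {
    # Keep only the leading numeric part, e.g. "4.11.0-kfd-compute-..." -> "4.11.0".
    have=$(printf '%s\n' "$1" | sed 's/^\([0-9.]*\).*/\1/')
    # If the required version sorts first (or ties), the running kernel is new enough.
    lowest=$(printf '%s\n%s\n' "$have" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage on a Ryzen system:
#   if kernel_at_least "$(uname -r)" 4.10; then
#       echo "kernel meets the 4.10 requirement"
#   else
#       echo "kernel is older than 4.10"
#   fi
```

Stripping the distro suffix first avoids comparing strings like `-generic` or `-kfd-compute-...`, which `sort -V` would otherwise treat as part of the version.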
Hi,
I am trying to run /opt/rocm/bin/rocm_agent_enumerator -t GPU and I get the following output:
gfx000
No GPU is listed. I have tried updating the ROCm driver, but it made no difference.
command: uname -a
output: Linux prasanth 4.11.0-kfd-compute-rocm-rel-1.6-148 #1 SMP Wed Aug 23 12:00:35 CDT 2017 x86_64 x86_64 x86_64 GNU/Linux
command: /opt/rocm/hcc/bin/hcc --version
output:
HCC clang version 6.0.0 (based on HCC 1.0.17373-bd1f35c-c639ce0-e4adac0 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
command: lspci -v | grep -i amd
output:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ca) (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Radeon R9 FURY X / NANO
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aae8
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device aae8
Any input?
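For reference, `gfx000` is generally the placeholder target that the enumerator prints for a node with no GPU target (the CPU/host agent), so output consisting only of `gfx000` means no GPU agent was found. A minimal sketch of a check that ignores that entry; `gpu_targets` is a hypothetical helper, and it assumes one target name per line, as in the output above.

```shell
# Print GPU target names from rocm_agent_enumerator output on stdin,
# skipping gfx000 (assumed here to be the CPU/host agent, not a GPU).
# Exits non-zero when no GPU target remains.
gpu_targets() {
    grep -E '^gfx[0-9a-f]+$' | grep -v '^gfx000$'
}

# Usage:
#   if /opt/rocm/bin/rocm_agent_enumerator -t GPU | gpu_targets >/dev/null; then
#       echo "GPU agent found"
#   else
#       echo "no GPU agent reported"
#   fi
```

Relying on the exit status of the final `grep` keeps the check usable directly in an `if`, without counting lines.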