ROCm-1.3 problem #46
Comments
I uninstalled all the packages and installed again and now it works. |
Actually, not quite.
|
Has anyone been able to duplicate this problem? |
On Ubuntu 16.04 you will bump into this issue, and it's a bug in Ubuntu: |
I just installed 14.04.5 and ROCm and I still see both this problem and the linker problem. It does not look like the problems I'm experiencing are related to the Ubuntu version.
|
What are your system details? |
My setup is an i7-6700 on an ASUS Z170M-E D3 with 2 Fury Nanos. BIOS 2001. |
Hi Brian, From your description you have 2 cards, both Fury Nanos. What slots are these installed in? From what I understand there is 1 x16 slot (the grey slot) and 2 x4 slots (the black slots) on an ASUS Z170M-E motherboard. |
The board is mATX and only has 2 x16 connectors. One is x16 electrically (the one closest to the CPU) and the other (farthest from the CPU) is x4 electrically. |
I was looking at another board, but this is good information. From your output it appears that the sample is hanging before it even finds a device. Could you remove the card in the x4 slot and see if you can reproduce the error? If it fails, please send the stack trace of the sample (build the sample in debug by adding -g to the HIPCC_FLAGS line). |
I have both Fijis on a water block with a plastic header. So, it won't be easy to remove just 1. When I go home, I'll get the stack trace with both cards in the machine. Hopefully, that will give you some clues. Thanks for your help. |
I built it with -g and ran it with gdb.
Is this what you meant? |
This error indicates that the HSA runtime couldn't be initialized correctly. This seems to be different than a hang; if this happens the application would abort. In other words, if vector_copy is working, this error shouldn't occur. Can you post the output of the vector_copy sample? Also, I would like to see the output of the following command 'lspci -vvvv -d 0x1002:0x7300'. |
I forgot to mention that I'm using the UEFI BIOS for the Fiji: https://community.amd.com/community/gaming/blog/2016/04/05/radeon-r9-fury-nano-uefi-firmware . I don't know whether that matters or not. I'll run lspci & vector_copy and post when I get home. |
lspci output.
|
vector_copy output
|
I ran bit_extract again and this time, it hung. I hit ctrl-c and got the following output.
|
@jedwards-AMD |
Hi Brian, Sorry for the late response. Several of our team members were out last week for the Thanksgiving holidays. We are still looking into the situation and we think we have a solution. I will update you soon. |
@jedwards-AMD Thanks! |
Unfortunately the issue is most likely with the capabilities of the x4 PCIe slot the second card is inserted in. To verify this is the case I need you to attach the output of the following commands to the post: |
pciedumphex.txt: https://gist.github.com/briansp2020/40489677ca4dd47f49de9a151616e233 . ROCm 1.2 worked, so I'm not sure why 1.3 is having this issue if it is a hardware problem. Also, if I don't provide any parameter, shouldn't the runtime pick the first GPU, the one in the x16 slot? |
It appears that your second device is on a PCI Bridge (00:1c.4) that doesn't support atomic operations. This is probably the cause of the hang for the HIP sample (bit_extract). It is possible that the HIP sample or the HCC runtime changed to include both devices; this would explain the different behavior between the 1.2 and 1.3 releases. I will follow up on that, but removing the card from the 00:1c.4 PCI bridge should fix the issue for now. |
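For anyone who wants to check their own bridges for the atomic-operations capability described above, here is a minimal Python sketch. It assumes a pciutils build whose `lspci -vvv` output decodes an `AtomicOpsCap:` line under `DevCap2` (older builds may not print it at all), and the function name `atomic_ops_caps` and the `SAMPLE` text are made up for illustration; the sample is not Brian's actual output.

```python
import re

# Hypothetical excerpt in the style of `lspci -vvv`; NOT taken from this thread.
SAMPLE = """\
00:1c.4 PCI bridge: Example root port
\t\tDevCap2: Completion Timeout: Range ABC, TimeoutDis+
\t\t\t AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
01:00.0 VGA compatible controller: Example GPU
\t\tDevCap2: Completion Timeout: Not Supported, TimeoutDis-
\t\t\t AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
"""

# Device header lines start in column 0 with an address such as "00:1c.4"
# (optionally prefixed by a 4-digit PCI domain).
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def atomic_ops_caps(lspci_text):
    """Map each PCI address to its AtomicOpsCap flags (None if not reported)."""
    caps = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            caps[current] = None
            continue
        found = re.search(r"AtomicOpsCap:\s*(.*)", line)
        if found and current is not None:
            # Flags look like "Routing+" or "64bit-"; "+" means supported.
            caps[current] = {
                name: sign == "+"
                for name, sign in re.findall(r"(\w+)([+-])", found.group(1))
            }
    return caps


if __name__ == "__main__":
    for address, flags in atomic_ops_caps(SAMPLE).items():
        print(address, flags)
```

To run it against a real machine, replace `SAMPLE` with the saved text of `lspci -vvv` (typically run as root so the capability registers are readable). A bridge that reports `Routing-`, or no `AtomicOpsCap` line at all, would be a candidate for the kind of problem discussed here, though this is only a rough check.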
Maybe it is something similar, but I have a problem with an A8-7600 + R390 setup. |
This configuration is not supported by ROCm. The A8-7600 doesn't support PCIe atomic operations and I doubt your motherboard has an x16 PCIe 3.0 port for the video card. Further, the AMD Radeon R9 390 is not on the list of ROCm supported video cards. |
Hi,
I just installed ROCm-1.3 and am having problems. Even simple examples (vector_copy) no longer work. dmesg shows
My setup is an i7-6700 on an ASUS Z170M-E D3 with 2 Fury Nanos. I did not have any issue when I was using ROCm-1.2.
I upgraded from a ROCm-1.2 setup. Maybe that caused some issue? When I tried
It said the packages were held back. So, I did
and it upgraded. Afterward, I did
to remove packages that apt-get said were no longer needed.
Any ideas?