[ROCm 5.0/OpenCL] Immediate crash #1673

Closed
FilipVaverka opened this issue Feb 11, 2022 · 9 comments

Comments

@FilipVaverka

Upgrading to ROCm 5.0 seems to cause every OpenCL application (including the clinfo distributed with ROCm) to crash with a segmentation fault immediately on startup; see the stack trace below.

I'm running the latest openSUSE Tumbleweed (ROCm 5.0 packages from the AMD repository) with kernel

5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022 (1af4009) x86_64 x86_64 x86_64 GNU/Linux

on

Lenovo Legion 5 (AMD Ryzen 7 4800H with Radeon Graphics)

Stack trace from gdb /opt/rocm/opencl/bin/clinfo:

Thread 1 "clinfo" received signal SIGSEGV, Segmentation fault.
0x00007fffefbe1c08 in rocr::image::ImageRuntime::GetImageInfoMaxDimension(hsa_agent_s, hsa_agent_info_t, void*) () from /opt/rocm-5.0.0/opencl/bin/../lib/../../lib/libhsa-runtime64.so.1
(gdb) bt
#0 0x00007fffefbe1c08 in rocr::image::ImageRuntime::GetImageInfoMaxDimension(hsa_agent_s, hsa_agent_info_t, void*) ()
from /opt/rocm-5.0.0/opencl/bin/../lib/../../lib/libhsa-runtime64.so.1
#1 0x00007fffefbe0d71 in rocr::image::hsa_amd_image_get_info_max_dim(hsa_agent_s, hsa_agent_info_t, void*) ()
from /opt/rocm-5.0.0/opencl/bin/../lib/../../lib/libhsa-runtime64.so.1
#2 0x00007fffefb22cb1 in rocr::AMD::GpuAgent::GetInfo(hsa_agent_info_t, void*) const () from /opt/rocm-5.0.0/opencl/bin/../lib/../../lib/libhsa-runtime64.so.1
#3 0x00007fffefb47d5f in rocr::HSA::hsa_agent_get_info(hsa_agent_s, hsa_agent_info_t, void*) ()
from /opt/rocm-5.0.0/opencl/bin/../lib/../../lib/libhsa-runtime64.so.1
#4 0x00007ffff7594fe7 in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#5 0x00007ffff7596a08 in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#6 0x00007ffff75972c1 in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#7 0x00007ffff7560f0e in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#8 0x00007ffff758ab46 in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#9 0x00007ffff7549ab5 in ?? ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#10 0x00007ffff78ea948 in __pthread_once_slow (once_control=0x7ffff78318c0,
init_routine=0x7ffff7c4b6c0 <std::__once_proxy()>) at pthread_once.c:117
#11 0x00007ffff7549c49 in clIcdGetPlatformIDsKHR ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libamdocl64.so
#12 0x00007ffff7dbae6d in khrIcdVendorAdd ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libOpenCL.so.1
#13 0x00007ffff7dbcce6 in khrIcdOsVendorsEnumerate ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libOpenCL.so.1
#14 0x00007ffff78ea948 in __pthread_once_slow (
once_control=0x7ffff7fbf0e8 ,
init_routine=0x7ffff7dbcb00 )
at pthread_once.c:117
#15 0x00007ffff7dbb3f1 in clGetPlatformIDs ()
from /opt/rocm-5.0.0/opencl/bin/../lib/libOpenCL.so.1
#16 0x000000000040c0bc in cl::Platform::get(std::vector<cl::Platform, std::allocator<cl::Platform> >*) ()
#17 0x00000000004022a0 in main ()
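The backtrace shows the segfault happening inside `clGetPlatformIDs` (frame #15), while the ICD loader is still enumerating platforms, before any device or kernel work. A minimal sketch of a standalone probe, assuming the ICD loader can be loaded as `libOpenCL.so.1` (the name in the trace) through Python's `ctypes`: it calls only `clGetPlatformIDs` in a child interpreter, so a crash shows up as a negative return code instead of killing the caller. `PROBE` and `probe_platforms` are hypothetical names, not part of ROCm.

```python
import signal
import subprocess
import sys

# Source run in a child interpreter: load the OpenCL ICD loader and ask only
# for the number of platforms (the same entry point as frame #15 above).
PROBE = r"""
import ctypes, sys
try:
    lib = ctypes.CDLL("libOpenCL.so.1")
except OSError:
    print("no-loader")
    sys.exit(0)
# cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
lib.clGetPlatformIDs.restype = ctypes.c_int32
lib.clGetPlatformIDs.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                                 ctypes.POINTER(ctypes.c_uint32)]
count = ctypes.c_uint32(0)
err = lib.clGetPlatformIDs(0, None, ctypes.byref(count))
print(f"err={err} platforms={count.value}")
"""


def probe_platforms(env=None):
    """Run the probe in a separate process and classify the outcome."""
    proc = subprocess.run([sys.executable, "-c", PROBE],
                          capture_output=True, text=True, env=env)
    # A process killed by a signal reports the negated signal number.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    if proc.returncode != 0:
        return f"exit {proc.returncode}"
    return proc.stdout.strip()
```

On a setup affected by this issue, `probe_platforms()` would be expected to return `"segfault"`; comparing it with `probe_platforms(env=dict(os.environ, HSA_OVERRIDE_GFX_VERSION="9.0.2"))` checks the workaround without risking the calling process.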

@langyuxf

There is a workaround for this issue: override the GFX version the runtime uses for the GPU.

$ HSA_OVERRIDE_GFX_VERSION=9.0.2 /opt/rocm/opencl/bin/clinfo
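The override can also be scoped to a single child process instead of a whole shell session. A minimal Python sketch, assuming only that the ROCm runtime reads `HSA_OVERRIDE_GFX_VERSION` from its environment at startup, as the command above relies on; the value `9.0.2` makes the runtime treat the GPU as `gfx902`. `override_env` and `run_with_override` are hypothetical helper names.

```python
import os
import subprocess


def override_env(version, base=None):
    """Return a copy of `base` (default: os.environ) with the GFX override set.

    The original mapping is never modified, so the override stays local to
    whichever process receives the returned dict.
    """
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


def run_with_override(cmd, version="9.0.2"):
    """Run `cmd` with the override applied to that process only."""
    return subprocess.run(cmd, env=override_env(version),
                          capture_output=True, text=True)
```

For example, `run_with_override(["/opt/rocm/opencl/bin/clinfo"])` mirrors the shell command above while leaving the caller's own environment untouched.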

@fabiscafe

@xfyucg Thanks for this. Do you know whether this is a known problem and whether a fix is coming soon? ROCm 5.0 is the first time I've been able to use OpenCL on AMD hardware on Linux. That's a huge thing for me. :)

@langyuxf

@fabiscafe Only Renoir and Cezanne series APUs have this issue in ROCm 5.0. It should be fixed in a future release.

@ROCmSupport

Hi @FilipVaverka,
Thanks for reaching out.
Could you please share your GPU details so we can better understand the issue? Thank you.

@FilipVaverka

Here is rocminfo.txt.

The GPU is gfx90c, the Renoir iGPU integrated in the AMD Ryzen 7 4800H.
The workaround suggested by @xfyucg is working for me.
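rocminfo lists each agent with a `Name:` field, and for GPU agents that field holds the gfx target (here `gfx90c`). A minimal sketch for checking a saved rocminfo dump from a script, assuming GPU agents appear as lines of the form `Name: gfx90c`; `AFFECTED` contains only the target reported in this thread, and `gfx_targets` and `affected_targets` are hypothetical names.

```python
import re

# Only the target reported in this thread; extend as needed.
AFFECTED = {"gfx90c"}

# Matches agent lines such as "  Name:   gfx90c", but not "Marketing Name:"
# and not ISA lines like "Name: amdgcn-amd-amdhsa--gfx90c:xnack-".
_NAME_RE = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)\s*$", re.MULTILINE)


def gfx_targets(rocminfo_text):
    """Return the gfx target names listed in rocminfo output, in order."""
    return _NAME_RE.findall(rocminfo_text)


def affected_targets(rocminfo_text):
    """Return the listed targets that are in AFFECTED, sorted."""
    return sorted(set(gfx_targets(rocminfo_text)) & AFFECTED)
```

Feeding it the contents of a dump like the attached rocminfo.txt, e.g. `affected_targets(open("rocminfo.txt").read())`, would return `["gfx90c"]` on the machine described here.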

@dagrim

dagrim commented Feb 23, 2022

Hello,
Does anyone know what is missing in ROCm to support the Renoir APU? That is, which components would I need to rebuild with additional gfx support to enable HSA / TensorFlow / PyTorch on those GPUs?
Thanks !

@ROCmSupport

Thanks @FilipVaverka
Integrated GPUs are not officially supported by ROCm, but yes, you can use the workarounds. Thank you.

@Mushoz

Mushoz commented Aug 4, 2022

People who have gotten their iGPU to work with ROCm: are you able to use TensorFlow at all? I am trying the workarounds presented here, but tensorflow-rocm refuses to run on my 4800U's integrated Vega GPU.

@langyuxf

langyuxf commented Aug 5, 2022

In reply to the TensorFlow question above, see #1743.


6 participants