AMDGPU.jl on rolling release distros (Arch): Libraries unavailable #478

Open
leios opened this issue Aug 30, 2023 · 3 comments

leios commented Aug 30, 2023

Hello, just a quick report and a temporary workaround. I was getting these warnings on Arch:

julia> using AMDGPU
┌ Warning: HSA runtime is unavailable, compilation and runtime functionality will be disabled.
│ Reason: Could not find `libhsa-runtime64` v1 library
└ @ AMDGPU ~/.julia/packages/AMDGPU/FXTo5/src/AMDGPU.jl:181
┌ Warning: LLD is unavailable, compilation functionality will be disabled.
│ Reason: unknown
└ @ AMDGPU ~/.julia/packages/AMDGPU/FXTo5/src/AMDGPU.jl:193
┌ Warning: Device libraries are unavailable, device intrinsics will be disabled.
│ Reason: unknown
└ @ AMDGPU ~/.julia/packages/AMDGPU/FXTo5/src/AMDGPU.jl:205
┌ Warning: HIP library is unavailable, HIP integration will be disabled.
│ Reason: unknown
└ @ AMDGPU ~/.julia/packages/AMDGPU/FXTo5/src/AMDGPU.jl:221

I think this is because the JLLs are still being built for Julia 1.9 and the latest AMDGPU package, so AMDGPU.jl falls back to my system installations of these libraries, which are newer than the versions AMDGPU.jl supports. The quick fix is to downgrade the system packages to 5.4.3.

On Arch, this can be done with the downgrade package like so:

sudo downgrade rocsparse
sudo downgrade rocblas
sudo downgrade rocsolver
sudo downgrade rocfft
sudo downgrade rocrand
sudo downgrade miopen-hip
sudo downgrade hip-runtime-amd 
sudo downgrade rocm-device-libs
sudo downgrade hsa-rocr
sudo downgrade hsakmt-roct

There is definitely a better script, but this one works. It opens a UI for each library asking which version to downgrade to; choose 5.4.3 each time. It's up to you whether to keep the settings so that pacman -Syu doesn't upgrade the packages again by default.
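
If you do want to pin the packages by hand, pacman reads a space-separated IgnorePkg entry from the [options] section of /etc/pacman.conf. The sketch below only prints such a line for the packages listed above; the helper name ignorepkg_line is made up for illustration, and nothing is written to disk.

```shell
# Packages from the downgrade list above.
ROCM_PACKAGES="rocsparse rocblas rocsolver rocfft rocrand miopen-hip hip-runtime-amd rocm-device-libs hsa-rocr hsakmt-roct"

# Hypothetical helper: print an IgnorePkg line for the [options] section
# of /etc/pacman.conf. It does not modify any file.
ignorepkg_line() {
    echo "IgnorePkg = $ROCM_PACKAGES"
}

ignorepkg_line
```

Remember to remove the entry again once newer ROCm versions are supported, otherwise pacman -Syu will keep skipping these packages.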

I decided to create an issue instead of updating the docs because:

  1. I figure people will just google the error message
  2. I think once the JLLs are built, we won't need to do this anymore.

Also: if anyone has a better downgrade script, feel free to post it.
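
One possible shorter version, as a sketch: as far as I know, downgrade accepts several package names in a single invocation, so the ten calls can probably be collapsed into one. The function name build_downgrade_cmd is made up for illustration, and the command is only printed, not run, so it can be reviewed first.

```shell
# Packages from the list above, in the same order.
ROCM_PACKAGES="rocsparse rocblas rocsolver rocfft rocrand miopen-hip hip-runtime-amd rocm-device-libs hsa-rocr hsakmt-roct"

# Hypothetical helper: print a single downgrade command covering all packages.
# Assumes downgrade takes multiple package names as arguments.
build_downgrade_cmd() {
    echo "sudo downgrade $ROCM_PACKAGES"
}

# Print the command for review instead of executing it.
build_downgrade_cmd
```

Copy the printed line into a terminal to run it. The version prompt still appears once per package, so 5.4.3 has to be selected each time.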

@originalsouth

If the JLLs are too old, this issue probably needs to be addressed in Yggdrasil?

ffrancesco94 commented Oct 3, 2023

Hi,
Is it safe to downgrade those packages? I'm on Manjaro and the ROCm version is currently 5.6.1. When I run using AMDGPU, I get the following error:

[ Info: Precompiling AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e]
julia: /usr/src/debug/hip-runtime-amd/clr-rocm-5.6.1/rocclr/os/os_posix.cpp:310: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion 'Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.

[4501] signal (6.-6): Aborted
in expression starting at /home/fra/.julia/packages/AMDGPU/bQD5E/src/AMDGPU.jl:61
unknown function (ip: 0x7f4a9168e83c)
raise at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f4a916263db)
__assert_fail at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f49e64ed8f4)
unknown function (ip: 0x7f49e64fb3e7)
unknown function (ip: 0x7f49e62c6421)
unknown function (ip: 0x7f4a9198e0fd)
unknown function (ip: 0x7f4a9198e1eb)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x7f4a91994ad5)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x7f4a91994e4b)
unknown function (ip: 0x7f4a916889eb)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x7f4a9198a602)
unknown function (ip: 0x7f4a916884f6)
dlopen at /usr/bin/../lib/libc.so.6 (unknown line)
ijl_load_dynamic_library at /usr/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7f4a77914556)
dlopen at ./libdl.jl:116 [inlined]
find_library at ./libdl.jl:206
find_library at ./libdl.jl:214 [inlined]
find_library at ./libdl.jl:214 [inlined]
find_rocm_library at /home/fra/.julia/packages/AMDGPU/bQD5E/src/discovery_utils.jl:64
#find_system_library!#9 at /home/fra/.julia/packages/AMDGPU/bQD5E/src/rocm_discovery.jl:52
find_system_library! at /home/fra/.julia/packages/AMDGPU/bQD5E/src/rocm_discovery.jl:49 [inlined]
macro expansion at /home/fra/.julia/packages/AMDGPU/bQD5E/src/rocm_discovery.jl:170 [inlined]
#10 at ./task.jl:514
unknown function (ip: 0x7f4a888c9161)
unknown function (ip: 0x7f4a90c6b17e)
Allocations: 1398669 (Pool: 1397552; Big: 1117); GC: 2
ERROR: Failed to precompile AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e] to "/home/fra/.julia/compiled/v1.9/AMDGPU/jl_OmQaOz".

Do you think this is related to the ROCm version and, if so, can I safely downgrade? Thanks!

Collaborator

pxl-th commented Oct 3, 2023

The latest supported ROCm version is 5.4 for now.
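
A quick way to compare an installed version against that limit, as a rough sketch: the function name rocm_version_supported is hypothetical, it only compares the major and minor components, and the version strings are passed in by hand rather than queried from the system.

```shell
# Hypothetical helper: succeed (exit status 0) if version $1 (e.g. "5.4.3")
# is at or below the major.minor limit $2 (e.g. "5.4").
rocm_version_supported() {
    ver_major=${1%%.*}          # "5.6.1" -> "5"
    ver_rest=${1#*.}            # "5.6.1" -> "6.1"
    ver_minor=${ver_rest%%.*}   # "6.1"   -> "6"
    max_major=${2%%.*}
    max_rest=${2#*.}
    max_minor=${max_rest%%.*}

    if [ "$ver_major" -lt "$max_major" ]; then
        return 0
    fi
    if [ "$ver_major" -eq "$max_major" ] && [ "$ver_minor" -le "$max_minor" ]; then
        return 0
    fi
    return 1
}

# The two versions mentioned in this thread, checked against 5.4.
for v in 5.4.3 5.6.1; do
    if rocm_version_supported "$v" 5.4; then
        echo "ROCm $v: within the supported range"
    else
        echo "ROCm $v: newer than the supported 5.4"
    fi
done
```

On Arch or Manjaro the installed version can be read with pacman -Q hip-runtime-amd, which prints the package name followed by its version.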
