Sync meeting on EESSI ROCm support (2026 03 16)
Caspar van Leeuwen edited this page Apr 13, 2026 · 1 revision
attendees:
- Caspar & Aayush (SURF)
- Jan (JSC)
- Kenneth (HPC-UGent)
notes:
- updates Jan:
- PR for working ROCm-LLVM (ROCm 7.2) open: https://github.com/easybuilders/easybuild-easyconfigs/pull/25467
- no bootstrapping, pulling in rocm-runtime which provides rebuilding LLVM
- no tests yet
- copying what AMD does with TheRock
- can't use TheRock directly, because they still rely on `.git` dir being present (https://github.com/ROCm/TheRock/issues/3245)
- HIP: same approach as before, except for minor cleanup in easyconfig
- problem found: HIP depends on libglvnd
- provided by OpenGL module in EB (as a bundle component) in 2025b generation, which includes Mesa, which depends on LLVM
- results in accidental linking to LLVM rather than ROCm-LLVM
- working on patch to avoid LLVM in `$LIBRARY_PATH` to dance around this, but still seeing problems with the tests (`--rpath` being used instead of `-rpath`)
- would require a lot of testing, removing `$LIBRARY_PATH` could have big impact
- should be OK, things should still be findable with CMake or `llvm-config`
- should libglvnd be separate (as it was before)?
- the real problem is that we're in trouble as soon as we pick up LLVM as an (indirect) dependency, for example via Mesa
- should Mesa be moved to GCC instead of GCCcore (along with OpenGL, Qt6, ...)?
- EB's compiler wrappers pick up on LLVM in case of ROCm-LLVM and LLVM both being loaded
- ROCm-LLVM should declare a conflict with LLVM
- solutions could be:
- statically link LLVM (for Mesa)
- or move things higher up the toolchain hierarchy (GCCcore -> GCC)
- not listing LLVM in `$LIBRARY_PATH` + fix compiler wrappers issue + ...
- avoid name clash in binaries/libraries: use `rocm-llvm-*` instead of `llvm-*`, `librocmllvm*` instead of `libllvm*`
- => propose upstream (Arch, Fedora also suffering from this, what is Ubuntu doing?)
- how is this problem avoided for ROCm-LLVM itself?
- their LLVM stuff is in a deeper subdir, they know where their stuff is
- but providing standard LLVM through `$LIBRARY_PATH` causes trouble?
- building OpenBLAS with ROCm-LLVM leads to weird problem because of replacing `-r` with `-rpath`
- see https://github.com/OpenMathLib/OpenBLAS/issues/5664
- tests not passing after applying fix from https://github.com/OpenMathLib/OpenBLAS/pull/5666
- OpenBLAS (0.3.31) tests do pass with LLVM 21.1.8
- fails with ROCm 7.2 + OpenBLAS 0.3.30 => could open issue upstream with OpenBLAS on this
- Jan wants to check first with LLVM 22.x, could be LLVM bug
- TheRock bundles OpenBLAS in some way
- see also ...
- looks like Laura Promberger (ex-CVMFS) is working on this, could reach out to set up a call
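To make the `$LIBRARY_PATH` idea from the HIP discussion above concrete, here is a minimal sketch of dropping all entries under one installation prefix from a path-style variable. It is not the patch Jan is working on; the function name `drop_prefix_entries` and all install paths and version numbers in the example are hypothetical.

```python
import os


def drop_prefix_entries(path_value, prefix):
    """Return path_value without the entries located under prefix."""
    prefix = os.path.normpath(prefix)
    kept = []
    for entry in path_value.split(os.pathsep):
        norm = os.path.normpath(entry)
        # Drop the prefix itself and anything below it; keep everything else
        if norm == prefix or norm.startswith(prefix + os.sep):
            continue
        kept.append(entry)
    return os.pathsep.join(kept)


# Hypothetical install prefixes, for illustration only
example = os.pathsep.join([
    "/apps/LLVM/20.1/lib",
    "/apps/ROCm-LLVM/7.2/lib",
    "/apps/Mesa/25.1/lib",
])
filtered = drop_prefix_entries(example, "/apps/LLVM/20.1")
print(filtered)
```

Matching on a whole prefix rather than on the substring `LLVM` keeps the ROCm-LLVM entry in place, which is the point of the exercise: only the standard LLVM should disappear from the linker search path.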
- progress Aayush
- working on moving Jan's easyconfigs to use generic `ROCmComponent` easyblock + `rocm-compilers` toolchain
- https://github.com/easybuilders/easybuild-easyblocks/pull/3861
- https://github.com/easybuilders/easybuild-framework/pull/5099
- use of `amdclang` eventually leads to trouble because `clang` is expected to be found in the same location as RPATH compiler wrapper for `amdclang`
- hipTensor doesn't run on Snellius, but does run where it was built (SURF ETP), probably because of ROCm installed in system (in `/opt/rocm`)
- known issue for Jan: because of missing ROCm component (`composable_kernel`)
- rocBLAS
- trouble with Tensile to verify device, on both ETP & Snellius, so not related to having an AMD GPU or not
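The `amdclang` wrapper issue noted above can be illustrated with a small diagnostic: given a directory holding RPATH compiler wrappers, report which expected sibling commands are absent. This is only a sketch of the symptom, not how EasyBuild sets up its wrappers; the helper name `missing_siblings` and the simulated wrapper directory are assumptions made for the example.

```python
import os
import tempfile


def missing_siblings(wrapper_dir, expected=("clang",)):
    """List the expected command names that are absent from wrapper_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(wrapper_dir, name))]


with tempfile.TemporaryDirectory() as wrapper_dir:
    # Simulate a wrapper directory that only holds a wrapper for amdclang
    with open(os.path.join(wrapper_dir, "amdclang"), "w") as handle:
        handle.write("#!/bin/sh\n")
    result = missing_siblings(wrapper_dir)

print(result)
```

A non-empty result points at the situation described in the notes: `clang` is looked up next to the wrapper for `amdclang` and is not there.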