Sync meeting on EESSI ROCm support (2026 04 13)
Caspar van Leeuwen edited this page Apr 13, 2026
Next meeting: Monday 3 May, 10-11 CEST
- Merged PRs:
  - easybuild-framework#5099 -- add support for ROCm-based toolchains (rocm-compilers, rompi, rfbf, rfoss)
  - easybuild-easyblocks#3861 -- add generic ROCmComponent easyblock to build & install ROCm components
- easybuild-easyconfigs#25576 -- Add ROCm-6.4.1 easyconfigs
  - Build failures with ROCgdb
    - Need `bison`/`flex` as build dependencies.
    - Make sure to filter `EBROOTBINUTILS/lib64` from `LIBRARY_PATH` in `preconfigopts` and `prebuildopts`, see comment
    - Split out ROCgdb to separate PR to investigate further
    - Did not happen with EESSI due to filtered `binutils`
  - Address review comments
  - Generally almost ready to merge
    - Try another build on `jsc-zen3` => make sure to use reduced parallelism with full memory to avoid running out of memory
    - Jan Andre: can spin up another build on his AMD nodes.
    - Caspar: can spin up a new build on ETP / Snellius
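The `LIBRARY_PATH` filtering mentioned for ROCgdb could look roughly like the sketch below. This is an illustration only, not the content of the referenced review comment: the helper `filter_library_path` and the variable `local_filter_binutils_lib64` are hypothetical names, and the shell one-liner is an untested assumption about how the filtering might be prepended to the configure and build commands.

```python
def filter_library_path(library_path: str, unwanted_dir: str) -> str:
    """Return library_path with every entry equal to unwanted_dir removed.

    Mirrors the shell pipeline below: exact string match per entry,
    all other entries (including empty ones) are kept in order.
    """
    kept = [entry for entry in library_path.split(":") if entry != unwanted_dir]
    return ":".join(kept)


# Hypothetical easyconfig fragment (assumption, not taken from the PR):
# a shell prefix that rewrites LIBRARY_PATH before configure and build.
# $EBROOTBINUTILS is expanded by the shell at build time.
local_filter_binutils_lib64 = (
    'export LIBRARY_PATH=$(echo "$LIBRARY_PATH" | tr ":" "\\n" '
    '| grep -vxF "$EBROOTBINUTILS/lib64" | paste -sd: -) && '
)
preconfigopts = local_filter_binutils_lib64
prebuildopts = local_filter_binutils_lib64
```

The Python helper only demonstrates the intended effect of the shell prefix; in an actual easyconfig only the `preconfigopts`/`prebuildopts` strings would be needed.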
- GROMACS as a first test target
  - Manual build first, then EC PR
  - EC PR should build fine, matches the options required for manual build
  - ADH test https://hpc.nih.gov/apps/gromacs/
  - Should use EasyBlock eventually
- Building OpenMPI & OpenBLAS
  - Fork PR: https://github.com/zerefwayne/easybuild-easyconfigs/pull/1/changes
  - OpenBLAS:
    - Faced (linking) issues?
      - OpenBLAS definitely needs #5666 to work correctly.
        - General patch for our OpenBLAS versions?
    - Faced failing LAPACK tests
      - Also seen with ROCm-LLVM v7.2.0 and OpenBLAS v0.3.30
        - Ask upstream for help, maybe also Bart?
  - OpenMPI:
    - probably need to rebuild libtool (EasyBuild v5.3.0) for LLVM: see easybuild-easyconfigs@5c53b38
    - should probably build with AMD GPU support (for GPU memory buffers)
      - See: https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-gpu-aware-mpi-readme/
      - UCX-ROCm? Probably do something similar to UCX-ROCm-1.11.2-GCCcore-11.2.0-ROCm-4.5.0.eb
  - TODO: make this an upstream PR.
- Still working on TheRock
  - Last time:
    - couldn't build TheRock with EB at all if `.git` is not kept. It is used for Manifest files
      - See TheRock#3245
  - Now:
    - there is a fix (TheRock#3978), but the last release doesn't include it.
      - Installed from a development branch
      - Fetching takes around 1-1.5h (2nd time around, it comes from the EB cache), mainly because it needs to tar everything up, and it's huge
  - TheRock has two build modes:
    - Provide all deps yourself (typically what EB would use)
      - Tried this, but still some libs missing in upstream EB (e.g. `libbacktrace`)
      - One component: rocm-kpack expects to find `zstd`
        - We provide it, but they expect it's built with `cmake`.
        - We build with makefiles, which does not provide `.cmake` files
        - Upstream issue on `zstd` to provide `cmake` files, no progress on this.
        - See rocm-kpack#12
        - Switching `zstd` to CMake would introduce a circular dependency, ugly to work around
    - Bundled: tries to build everything internally, in a way that shouldn't interfere with anything on the system (e.g. pre/postfix things to library names so they don't collide). We're not sure how well that works.
      - After the `zstd` issues, tried this approach as well.
      - Build takes very long. Makes it hard to debug issues.
      - The LLVM we just built picks up the wrong GCC
        - We need to provide a compiler config file, but we'd need to provide it during the build.
        - Only seeing this because the GCC provided on Rocky 9 doesn't have support for some Fortran feature.
  - TheRock currently doesn't build with GCC 15. Might be an issue for `2026` toolchains
    - Several components are not ready for the C23 standard, but the build system doesn't set the old standard
- Updated EasyConfigs for HIP/hipBLAS on NVIDIA GPUs https://github.com/easybuilders/easybuild-easyconfigs/pull/25263
  - First, should be followed up with more components, and see what works on NVIDIA GPUs
- ROCmValidationSuite: tests if your GPUs work, does some stress testing via some of the ROCm (math) components
- Autodetection, multiple options:
  - If `amd-smi` is available: `amd-smi static --asic | grep TARGET_GRAPHICS_VERSION`. Could add `--json`.
  - Similar things could probably be done with `rocm-smi`.
  - Without any tools, probably `cat /sys/class/kfd/kfd/topology/nodes/*/properties | grep gfx_target_version` would work. Probably use what LLVM uses to convert this to a gfx capability: https://github.com/llvm/llvm-project/blob/6e738e187055bbd33b6c3d203b6b55904dfcb624/clang/tools/offload-arch/AMDGPUArchByKFD.cpp
  - Plan: implement the last option, then have the other two as fallbacks.
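The tool-free option could be sketched as below. This is a minimal illustration, not the planned implementation: the function names are hypothetical, and the decoding (major * 10000 + minor * 100 + stepping, with the stepping printed in hex) is an assumption based on a reading of the linked LLVM source and should be checked against it. CPU-only KFD nodes are assumed to report `gfx_target_version 0` and are skipped.

```python
import re
from pathlib import Path

# Location of the KFD topology nodes, as used in the `cat` command above.
KFD_NODES = Path("/sys/class/kfd/kfd/topology/nodes")


def gfx_name(target_version: int):
    """Convert a gfx_target_version integer to a gfx name, or None.

    Assumed encoding: major * 10000 + minor * 100 + stepping.
    """
    if target_version <= 0:
        return None  # assumed to be a CPU-only node
    major = target_version // 10000
    minor = (target_version // 100) % 100
    stepping = target_version % 100
    return f"gfx{major}{minor}{stepping:x}"


def parse_target_version(properties_text: str):
    """Extract the gfx_target_version value from a 'properties' file body."""
    match = re.search(r"^gfx_target_version\s+(\d+)\s*$", properties_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def detect_gfx_targets(nodes_dir: Path = KFD_NODES):
    """Return the gfx names of all GPU nodes found under nodes_dir."""
    targets = []
    for props in sorted(nodes_dir.glob("*/properties")):
        try:
            version = parse_target_version(props.read_text())
        except OSError:
            continue  # unreadable node, skip it
        name = gfx_name(version) if version is not None else None
        if name:
            targets.append(name)
    return targets


if __name__ == "__main__":
    print(detect_gfx_targets())
```

Under the assumed encoding, `gfx_name(90010)` gives `gfx90a` and `gfx_name(90402)` gives `gfx942`. On a system without the KFD driver loaded, `detect_gfx_targets()` returns an empty list, at which point the `amd-smi` or `rocm-smi` fallbacks would apply.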
- Jan Andre: asked Sebastian if we can install EESSI on one of the AMD GPU nodes. Would also help for testing.
- Caspar: can have a look into configuring the SURF build bot for AMD. Or even his local one.
  - For full support, we'd even need to reconfigure the AWS build bot as well. Then we can cross-compile everything.
  - This would allow us to build/ingest something (ROCm-LLVM), even before exposing it
  - Long term, we might look into native builds. For now, probably just cross-compile.