
Sync meeting on EESSI ROCm support (2026 04 13)

Caspar van Leeuwen edited this page Apr 13, 2026 · 1 revision

2026-04-13 AMD sync meeting

Next meeting: Monday 3 May, 10-11 CEST

Update EasyBuild

Update Aayush

  • easybuild-easyconfigs#25576 -- Add ROCm-6.4.1 easyconfigs

    • Build failures with ROCgdb
      • Needs bison/flex as builddependencies.
      • Make sure to filter EBROOTBINUTILS/lib64 from LIBRARY_PATH in preconfigopts and prebuildopts, see comment
      • Split out ROCgdb into a separate PR to investigate further
      • Did not happen with EESSI, due to filtered binutils
    • Address review comments
      • ROCmValidationSuite for multi-GPU archs: comment
      • RCCL needs abs_path_compilers for proper rpath handling: comment
      • HIP checksums are missing, picks up from next component?: comment
        • Likely a new(?) bug in the Bundle easyblock
    • Generally almost ready to merge
      • Try another build on jsc-zen3; make sure to use reduced parallelism with full memory, so the build does not run out of memory
      • Jan Andre: can spin up another build on his AMD nodes.
      • Caspar: can spin up a new build on ETP / Snellius
  • GROMACS as a first test target

  • Building OpenMPI & OpenBLAS
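
The LIBRARY_PATH filtering noted for ROCgdb above could look roughly like the following easyconfig-style Python fragment. This is a minimal sketch, not the content of the PR: the helper `strip_path_entries` and the shell one-liner in `FILTER_CMD` are assumptions made for illustration; only `preconfigopts` and `prebuildopts` are standard easyconfig parameters, and `EBROOTBINUTILS` is the variable name taken from the notes.

```python
import os


def strip_path_entries(value, unwanted):
    """Drop every entry equal to `unwanted` from a colon-separated path list."""
    unwanted = os.path.normpath(unwanted)
    kept = [p for p in value.split(":") if p and os.path.normpath(p) != unwanted]
    return ":".join(kept)


# Hypothetical shell equivalent, meant to be prepended to the configure and
# build commands. It rewrites LIBRARY_PATH without the binutils lib64 entry.
FILTER_CMD = (
    'export LIBRARY_PATH=$(echo "$LIBRARY_PATH" | tr ":" "\\n" '
    '| grep -vx "$EBROOTBINUTILS/lib64" | paste -sd: -) && '
)

# In an easyconfig, these two parameters would carry the filter (assumption:
# the actual PR may do this differently).
preconfigopts = FILTER_CMD
prebuildopts = FILTER_CMD

if __name__ == "__main__":
    example = "/opt/binutils/lib64:/opt/zlib/lib:/opt/binutils/lib"
    print(strip_path_entries(example, "/opt/binutils/lib64"))
```

The Python helper only demonstrates the intended filtering logic; in a real easyconfig just the two string parameters would be needed.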

Update Jan Andre

  • Still working on TheRock
    • Last time:
      • Couldn't build TheRock with EB at all if .git is not kept; it is used for the manifest files
      • See TheRock#3245
    • Now:
      • There is a fix (TheRock#3978), but the last release doesn't include it.
        • Installed from a development branch
        • Fetching takes around 1-1.5h (2nd time around, it comes from the EB cache), mainly because it needs to tar everything up, and it's huge
    • TheRock has two build modes:
      • Provide all deps yourself (typically what EB would use)
        • Tried this, but still some libs missing in upstream EB (e.g. libbacktrace)
        • One component, rocm-kpack, expects to find zstd
          • We provide it, but they expect it to be built with CMake.
          • We build with makefiles, which do not install .cmake files
          • There is an upstream issue on zstd to provide CMake files, but no progress on it.
          • See rocm-kpack#12
          • Switching zstd to CMake would introduce a circular dependency, which is ugly to work around
      • Bundles: try to build everything internally, in a way that shouldn't interfere with anything on the system (e.g. adding prefixes/suffixes to library names so they don't collide). We're not sure how well that works.
        • After zstd issues, tried this approach as well.
        • Build takes very long. Makes it hard to debug issues.
        • The LLVM we just built picks up the wrong GCC
          • We need to provide a compiler config file, but we'd need to provide it during the build.
          • Only seeing this because the GCC provided on Rocky 9 doesn't support some Fortran feature.
      • TheRock currently doesn't build with GCC 15. Might be an issue for 2026 toolchains
        • Several components are not ready for the C23 standard, but the build system doesn't set the old standard
    • Updated EasyConfigs for HIP/hipBLAS on NVIDIA GPUs https://github.com/easybuilders/easybuild-easyconfigs/pull/25263
      • First step; should be followed up with more components, to see what works on NVIDIA GPUs
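
For the zstd issue with rocm-kpack noted above, one quick check is whether a given zstd installation ships CMake package config files at all. The sketch below is only an illustration: the helper `find_cmake_package_config` is hypothetical, and it assumes the zstd module sets `EBROOTZSTD`, as EasyBuild-generated modules normally do. It relies on the fact that CMake's config-mode `find_package` looks for `<Name>Config.cmake` or `<lowercase-name>-config.cmake`.

```python
import os


def find_cmake_package_config(prefix, package):
    """Return paths of CMake package config files for `package` below `prefix`."""
    # File names that CMake's config-mode find_package() searches for.
    wanted = {f"{package}Config.cmake", f"{package.lower()}-config.cmake"}
    hits = []
    for root, _dirs, files in os.walk(prefix):
        hits.extend(os.path.join(root, name) for name in files if name in wanted)
    return sorted(hits)


if __name__ == "__main__":
    # Assumption: a zstd module is loaded and exports EBROOTZSTD.
    prefix = os.environ.get("EBROOTZSTD")
    if prefix is None:
        print("EBROOTZSTD is not set; load a zstd module first")
    else:
        configs = find_cmake_package_config(prefix, "zstd")
        print(configs or "no CMake package config found")
```

An empty result for a makefile-based zstd build would be consistent with the behaviour described in the notes.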

General discussion

  • ROCmValidationSuite: tests if your GPUs work, does some stress testing via some of the ROCm (math) components

  • Autodetection, multiple options:

  • Jan Andre: asked Sebastian if we can install EESSI on one of the AMD GPU nodes. Would also help for testing.

  • Caspar: can have a look into configuring the SURF build bot for AMD. Or even his local one.

    • For full support, we'd even need to reconfigure the AWS build bot as well. Then we can cross-compile everything.
    • This would allow us to build/ingest something (ROCm-LLVM), even before exposing it
    • Long term, we might look into native builds. For now, probably just cross-compile.
