Workaround elimination with latest amdflang drop and Removing array reshapes #1433
Merged
Member
it has merge conflicts already...
Contributor, Author
It's a very minor conflict in CMake from a change made yesterday; I will resolve it.
added 10 commits May 12, 2026 15:42
sbryngelson previously approved these changes May 12, 2026
…elocation error flang-23/LLD defaults to building PIE executables. SILO and LAPACK static libraries on Frontier are compiled without -fPIC, so their 32-bit absolute relocations (R_X86_64_32) are rejected by LLD when linking a PIE binary. Add -no-pie to post_process link options for LLVMFlang to allow non-PIC system libraries. simulation is unaffected (no SILO/LAPACK dependency).
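The link-option change described in this commit message could look roughly like the CMake fragment below. This is a minimal sketch, not the code from the PR. It assumes that `post_process` is an existing executable target and that the compiler is detected through CMake's `LLVMFlang` compiler ID. The `if (TARGET ...)` guard is added only for illustration.

```cmake
# Sketch only: assumes a `post_process` executable target was defined earlier
# with add_executable(); the guard below is illustrative, not from the PR.
if (CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang" AND TARGET post_process)
    # Link a non-PIE executable so that static archives compiled without
    # -fPIC (SILO and LAPACK in the commit message) can keep their
    # 32-bit absolute relocations (R_X86_64_32), which LLD rejects in PIE mode.
    target_link_options(post_process PRIVATE -no-pie)
endif()
```

Scoping the option to one target with `PRIVATE` keeps other targets, such as `simulation`, on the toolchain's default PIE behavior.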
sbryngelson previously approved these changes May 13, 2026
Codecov Report

@@            Coverage Diff             @@
##           master    #1433      +/-   ##
==========================================
+ Coverage   64.95%   65.00%   +0.04%
==========================================
  Files          72       72
  Lines       18879    18810      -69
  Branches     1571     1553      -18
==========================================
- Hits        12263    12227      -36
+ Misses       5640     5615      -25
+ Partials      976      968       -8
sbryngelson previously approved these changes May 13, 2026
… limit
AMD flang case-opt compilation takes close to the 2h hackathon wall limit, leaving no time for the run step. Split into two sequential hackathon GPU jobs:
1. Pre-Build: compiles all benchmarks via --dry-run (build only, no execution)
2. Run: skips build (binaries cached), runs and validates benchmarks
Also preserve dependency dirs in prebuild for non-Phoenix clusters (deps are already built by the Fetch Dependencies step, so only clean staging dirs).
sbryngelson added a commit that referenced this pull request May 13, 2026
- m_thinc.fpp: take master's extended Fypp for-loop tuple (STENCIL_VAR, COORDS, X_BND/Y_BND/Z_BND), update CC_PRI x_cc/y_cc/z_cc -> x%cc/y%cc/z%cc
- m_rhs.fpp: take master's drop of 'dummy' workaround condition, keep bc%y%beg naming
- m_riemann_solvers.fpp: take master's unified Re_avg_rsx_vf indexing (j,k,l) for all cylindrical faces, update y_cb/y_cc -> y%cb/y%cc
Summary
This PR updates MFC's AMD GPU (OpenMP target offload) support to use the latest therock/AFAR flang-23 drop and eliminates two performance-limiting workarounds:

- New AMD flang drop (therock-23.1.0 / AFAR): Updates the `famd` build target to use the therock-23 compiler drop. Adds a dedicated `frontier_amd` CI workflow and Docker container. Updates `toolchain/bootstrap/modules.sh` with the new `OLCF_AFAR_ROOT` path and library environment variables.
- Remove `dummy` variable workaround: The `dummy` variable was a workaround for an amdflang bug where GPU kernels using a loop-index variable (`id`) directly inside `GPU_PARALLEL_LOOP` caused incorrect code generation. The new flang-23 drop fixes this natively, so the workaround is removed. Replaced with a module-level `iglob` variable updated via `GPU_UPDATE(device=...)` before each kernel.
- Eliminate array reshapes in WENO/Riemann/viscous/surface-tension/THINC: Replaces temporary array reshape operations (which required GPU memory copies) with scalar extraction using the `${SF('')}$` Fypp macro pattern. This reduces GPU memory traffic and improves performance for case-optimized builds.
- Fix `post_process` PIE relocation error with LLVMFlang: flang-23/LLD defaults to building PIE executables. Static libraries (SILO, LAPACK) built without `-fPIC` produce `R_X86_64_32` relocations that LLD rejects in PIE mode. Added `-no-pie` to `post_process` link options for LLVMFlang. `simulation` is unaffected (no SILO/LAPACK dependency).
- CMake: use direct `find_library` for HIP/hipfort with LLVMFlang: Replaces `find_package(hipfort COMPONENTS hip CONFIG REQUIRED)` with direct `find_library` calls using `$ENV{OLCF_AFAR_ROOT}/lib`, matching how `CRAY_HIPFORT_LIB` is handled. Avoids CMake config package dependency for the therock drop layout.
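The `find_library` item above can be illustrated with the following CMake fragment. It is a hedged sketch rather than the PR's actual change. The library names (`amdhip64`, `hipfort-amdgcn`), the result variables (`HIP_RUNTIME_LIB`, `HIPFORT_LIB`), and the choice to link them into `simulation` are assumptions for illustration. Only `$ENV{OLCF_AFAR_ROOT}/lib` and the `LLVMFlang` condition come from the description.

```cmake
# Sketch only: library and variable names below are assumptions.
if (CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang" AND TARGET simulation)
    # Search only the compiler drop's lib directory instead of relying on
    # a hipfort CMake config package.
    find_library(HIP_RUNTIME_LIB
        NAMES amdhip64                    # assumed HIP runtime library name
        PATHS "$ENV{OLCF_AFAR_ROOT}/lib"
        NO_DEFAULT_PATH
        REQUIRED)

    find_library(HIPFORT_LIB
        NAMES hipfort-amdgcn              # assumed hipfort library name
        PATHS "$ENV{OLCF_AFAR_ROOT}/lib"
        NO_DEFAULT_PATH
        REQUIRED)

    # Link the located libraries by full path.
    target_link_libraries(simulation PRIVATE ${HIPFORT_LIB} ${HIP_RUNTIME_LIB})
endif()
```

`NO_DEFAULT_PATH` restricts the search to the given directory, so a different ROCm installation on the system is not picked up by accident. `REQUIRED` (available since CMake 3.18) stops configuration with an error if either library is missing.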