Anchored Dual-pass HLLD for Hypoelasticity (+ HLLC and interface-consistent HLL)#1414
Anchored Dual-pass HLLD for Hypoelasticity (+ HLLC and interface-consistent HLL)#1414ChrisZYJ wants to merge 104 commits into
Conversation
…2 (1-fluid is valid)
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1414 +/- ##
==========================================
+ Coverage 60.43% 60.64% +0.20%
==========================================
Files 83 84 +1
Lines 19871 20705 +834
Branches 2956 3064 +108
==========================================
+ Hits 12010 12556 +546
- Misses 5860 6054 +194
- Partials 2001 2095 +94 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
sbryngelson
left a comment
There was a problem hiding this comment.
High-level code-quality pass
This PR is large (~12k additions), but most of that is auto-generated tests/*/golden-metadata.txt fixtures (~40 new test dirs) — the hand-written src/ diff is closer to ~2,850 lines. Overall the new code is well-documented for its complexity (good WHY-comments, sensible naming, the dev-notes markdown files are legitimate cross-file architecture docs, not compensating for opaque code). The points below are meant to help trim the remaining LOC/complexity without touching the numerics, plus one real correctness gap.
1. Biggest lever: s_hypo_hlld_riemann_solver doesn't use the codebase's own decomposition tool
src/simulation/m_riemann_solver_hypo_hlld.fpp:21-965 is a single ~945-line subroutine with an ~800-line GPU-parallel loop body (147 privatized scalars) — over contributing.md's own soft guideline (≤500 lines/subroutine, ≤1000/file; this file is 1090). MFC already has a working, GPU-offload-safe pattern for factoring per-cell math out of parallel loops: $:GPU_ROUTINE(parallelism='[seq]') helper subroutines (used for s_compute_speed_of_sound, s_compute_pressure, etc.), and this file already calls s_compute_speed_of_sound that way (lines 301-304) — but never applies the same pattern to its own star-state algebra (~lines 556-630) or output-permutation logic (~lines 822-880). Extracting those into [seq]-parallel helpers would roughly halve the loop body with no offload-capability cost.
(Context: this isn't a new anti-pattern — s_hllc_riemann_solver is already ~1970 lines pre-existing — so this PR extends existing tech debt rather than introducing a new category of it. But it's the single biggest opportunity to cut LOC here.)
2. ~150 lines of duplicated small physics formulas — safe, mechanical extraction
Confirmed identical across 3-4 files:
- Hypoelastic energy-correction floor/guard formula:
m_riemann_solver_hll.fpp:347-360,m_riemann_solver_hllc.fpp:308-320and:1416-1428,m_riemann_solver_hypo_hlld.fpp:284-294 - Wave-speed estimate (Rodriguez et al. 2019 formula):
m_riemann_solver_hll.fpp:396-403,m_riemann_solver_hllc.fpp:378-389and:1489-1500,m_riemann_solver_hypo_hlld.fpp:306-309 s_finalize_riemann_solver_hatR(m_riemann_solver_hypo_hlld.fpp:1030-1088) duplicates the existing generic finalizer inm_riemann_state.fpp:1034-1183almost verbatim, instead of using the fypp-templating approach this same file already uses for the analogouss_finalize_nc_iface_vel${SUFFIX}$(lines 974-1020).
Pulling these into shared helper functions/a template would save ~100-150 lines with no design risk.
3. Real gap, not just style: mhd .and. hypoelasticity .and. riemann_solver==4 is never rejected
m_checker.fpp only prohibits HLLD (riemann_solver==4) when neither mhd nor hypoelasticity is set — never when both are set simultaneously. In that combination, hypo_nc_dual_pass becomes true and m_riemann_solvers.fpp unconditionally routes to s_hypo_hlld_riemann_solver, which has no reference to mhd/Bx/By/Bz anywhere — magnetic-field physics is silently dropped with no error, and the existing MHD HLLD path is simply never reached. Suggest adding @:PROHIBIT(riemann_solver == 4 .and. mhd .and. hypoelasticity, "HLLD does not support combined MHD and hypoelasticity") alongside the existing checks in s_check_inputs_hypo_branch.
Lower priority / follow-up material
m_rhs.fppthreads the newadv_src_modethrough 6 separate subroutines (s_initialize_rhs_module,s_compute_rhs,s_compute_directional_rhs,s_compute_advection_source_term,s_add_directional_advection_source_terms,s_finalize_rhs_module) rather than resolving the mode once and passing it down — some shotgun surgery, though consistent with that file's existing (non-templated) style.m_riemann_solvers.fpp's own comment (lines 42-49) states hypoelasticity now enters the Riemann layer in "three distinct code shapes" (HLL/HLLC inline branches vs. HLLD's separate fused-dual-pass module). A future 4th hypoelastic solver variant would face the same scattered integration choice — worth unifying behind one seam before that happens, not a blocker for this PR.- The three mutually-exclusive hypo-mode booleans derived in
m_global_parameters.fpp(hypo_nc_finite_diff/hypo_nc_interface/hypo_nc_dual_pass) could be one enum, matching theadv_src_modepattern already used elsewhere in the same file — inconsistent style for an equivalent problem, not a bug.
None of the above is a blocker — the code is correct and well-tested per the PR description. These are opportunities to reduce the diff's LOC/maintenance footprint using patterns already established elsewhere in this codebase, plus one validation gap worth closing before merge.
Generated with the help of Claude Code.
|
And thanks @sbryngelson for the suggestions! I addressed the following cleanup items:
Also fixed the spelling CI failure: |
|
@sbryngelson If this looks good after the CI rerun, would it be possible to merge this PR soon? I’m happy to continue improving the code in follow-up PRs after this lands. The branch has already gone through several upstream re-merges and CI/review cleanup rounds, so merging the core implementation now would make future cleanup work smaller and easier to review. Thank you! |
|
I'm no longer merging messy/bloated/WIP code on the promise that it will be fixed later. I've been burned on this far too many times (perhaps not by you), and it only ratchets up, not down. Please clean your code of smells, refactor into nice helpers, and minimize SLOC while maintaining speed if you want the PR merged. Thanks. |
|
Okay. I'll refactor the solver into helpers according to your reviews, and minimize SLOC as you requested. Will post the changes here soon. |
…(identical codegen)
…e (identical codegen)
…hint on f_hlld_wave_zone
|
I've done the refactor. Here is what changed:
This removes 95 lines across the touched files. What is left in the main subroutine is dominated by the per-cell star-state algebra, which appears only once, so there is no duplication left to remove. Extracting it behind an interface would take ~50 scalar arguments or regrouping the privatized locals into derived types, and either would change register allocation on the GPU kernel. Since the goal was fewer lines at unchanged speed, that block stays inline, and the helper extractions went where the interfaces are narrow. I've taken care to ensure that most changes do not affect the generated code, and that they have minimal impact on performance. Also merged in the current master. |
|
One note on CI. The last two runs each failed a single Frontier job at the Fetch Dependencies step, before anything gets built or tested (gpu-omp 1/2, then gpu-acc 2/2). Both failures show the same error on the same runner, frontier-5: Maybe a corrupted uv cache entry on that runner's node? Could you help me re-run the one failed job? All the tests should pass. |
Self-hosted Frontier/Frontier-AMD matrix legs (acc/omp/cpu x shards) run their "Fetch Dependencies" step directly on the same login node as the same OS user, all pointed at the same UV_CACHE_DIR (introduced in #1385 to dodge NFS file-lock errors on ~/.cache/uv). uv's own cache lock guards individual entries, but concurrent installs from separate uv processes can still race while one extracts/prunes the shared archive-v0 store, leaving a corrupted entry behind (e.g. a missing dist-info METADATA file) that fails every subsequent install until the cache is cleared by hand -- as happened on PR #1414's Frontier gpu-acc [2/2] job. Serialize the actual `uv pip install` call with flock so only one process touches a given cache dir at a time, while keeping the cache itself shared and warm across runs.
|
Addresses Copilot review on #1630, plus a live recurrence caught on this PR's own CI run (job 85133634699, "Frontier (AMD) cpu [1/2]"): - UV_LOCK_DIR's guard only checked -w, so a writable non-directory TMPDIR would pass and then get used as a directory prefix, breaking flock. Add a -d check alongside -w, per Copilot's suggestion. - That same CI run hit a *new* corruption symptom ("The wheel is invalid: Missing .dist-info directory" for pandas) on the same physical login node (login05) as the original incident on #1414, even with the new lock in place. Root cause: a cache entry corrupted before the lock existed (or by any other cause) just fails forever until someone manually clears it -- which is exactly what had happened here; login05's ~1.2GiB cache had never actually been cleared (an earlier `uv cache clean` in this investigation was run from a different login node's session and never touched login05). Since self-hosted runners are spread across login nodes we can't all individually SSH into and inspect every time this happens, make the script self-heal instead: on install failure, clear the uv cache and retry once before giving up.
Description
Adds:
Key Design Choices
Separate HLLD Riemann Solvers
At a glance it might be tempting to combine HLLD MHD with dual-pass hypoelasticity HLLD, but keeping them separate makes the code cleaner and much easier to maintain because:
Riemann Source Terms
For the non-conservative terms, unlike the usual governing equations that only need div U i.e. du/dx, dv/dy, dw/dz (alpha div U, K div U, etc.), Hypoelasticity has cross terms like du/dy, so we must also pass those Riemann-consistent traces from Riemann solver to the rhs. (The old Hypoelasticity code with the HLL Riemann solver uses finite difference for non-conservative rhs, which provides enough stability given that HLL smears the interface immediately, so there wasn't a need to pass the du/dy traces before this PR. But that does not work for HLLC/HLLD for Hypoelasticity.)
Also grouped/named the condition branches (with lots of comments within the code):
adv_src_alpha_ifaceflux_src_n(dir)%vf(j_adv)= per-fluidnc_iface_vel_n(dir)%vf(1)adv_src_vel_ifaceflux_src_n(dir)%vf(adv\%beg)= sharedflux_src_nslot (alreadyadv_src_noneThe derivations, meanings, and usage of the Riemann source variables are not straightforward. I've added some hopefully very helpful notes in
misc/dev_notesfor future developers (or AI agents; directing them to my notes should help them make fewer mistakes with the source terms) in terms of the understanding and derivations for the HLL/HLLC non-conservative fluxes, and their variable mapping for Riemann solvers and RHS.Backwards Compatibiilty
Type of change
Testing
All tests passed locally on CPU and Nvidia GPU, and on Frontier.
Smooth Eigenmode Convergence
Checklist
AI code reviews
Reviews are not triggered automatically. To request a review, comment on the PR:
@coderabbitai review— incremental review (new changes only)@coderabbitai full review— full review from scratch/review— Qodo review/improve— Qodo code suggestions@claude full review— Claude full review (also triggers on PR open/reopen/ready)claude-full-review— Claude full review via label