Anchored Dual-pass HLLD for Hypoelasticity (+ HLLC and interface-consistent HLL) #1414

Open
ChrisZYJ wants to merge 104 commits into MFlowCode:master from ChrisZYJ:hypo_hlld

@ChrisZYJ ChrisZYJ commented May 9, 2026

Contributor

Description

Adds:

  1. Hypoelasticity: Anchored Dual-pass HLLD
  2. Hypoelasticity: HLLC
  3. Hypoelasticity: HLL option (interface-consistent)
  4. HLL option (alpha div U) so non-conservative treatment aligns with HLLC

Key Design Choices

Separate HLLD Riemann Solvers

At first glance it might be tempting to combine the MHD HLLD solver with the dual-pass hypoelasticity HLLD solver, but keeping them separate makes the code cleaner and much easier to maintain because:

  1. Unlike HLL or HLLC, HLLD is a family of solvers rather than a single scheme: its formulas and states depend on the eigenstructure of the governing equations, so the inner-state equations are completely different for MHD and hypoelasticity.
  2. The hypoelasticity HLLD uses a newly developed anchored dual-pass form that differs from any conventional HLLD Riemann solver. The anchored form is necessary for the non-conservative hypoelasticity terms, which MHD does not have.
  3. MHD and hypoelasticity cover completely different physical regimes with different governing equations, so any changes or new physical models added in the future will not apply to both modules at once.

Riemann Source Terms

The usual non-conservative terms (alpha div U, K div U, etc.) only need div U, i.e. du/dx, dv/dy, dw/dz. Hypoelasticity also has cross terms such as du/dy, so the Riemann solver must pass those Riemann-consistent traces to the RHS as well. (The old hypoelasticity code with the HLL Riemann solver uses finite differences for the non-conservative RHS, which is stable enough because HLL smears the interface immediately, so there was no need to pass the du/dy traces before this PR. That approach does not work for hypoelasticity with HLLC or HLLD.)
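As an illustration of why the extra traces matter, here is a minimal Python sketch (not MFC code; the function name, array layout, and sample values are all hypothetical). It differences face traces of the x-velocity in both directions: the x-face difference gives the du/dx that div U already needs, and the y-face difference gives the cross term du/dy.

```python
def face_difference(face_vals, spacing):
    """Cell-centered derivative from face traces (hypothetical layout).

    face_vals[i] is the trace on the lower face of cell i and
    face_vals[i + 1] is the trace on its upper face.
    """
    return [(face_vals[i + 1] - face_vals[i]) / spacing
            for i in range(len(face_vals) - 1)]


# Sample field u = x + 0.5 * y, chosen only for illustration.
u_on_x_faces = [0.0, 1.0, 2.0, 3.0]  # traces of u on x-faces along the row y = 0
u_on_y_faces = [0.0, 0.5, 1.0]       # traces of u on y-faces along the column x = 0

du_dx = face_difference(u_on_x_faces, spacing=1.0)  # enters div U
du_dy = face_difference(u_on_y_faces, spacing=1.0)  # cross term used by hypoelasticity
```

The point of the sketch is only that du/dy needs traces of u on the y-faces, which a solver that exports just the normal velocity per direction does not provide.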

Also grouped/named the condition branches (with lots of comments within the code):

| Branch | Face quantity read | RHS formula per $\alpha_k$ | K*div(u) velocity source |
|---|---|---|---|
| adv_src_alpha_iface | flux_src_n(dir)%vf(j_adv) = per-fluid $\Psi_{\alpha_k}$ | $u_\text{cell} \cdot \Delta\Psi_\alpha / \Delta x$ | nc_iface_vel_n(dir)%vf(1) |
| adv_src_vel_iface | flux_src_n(dir)%vf(adv%beg) = shared $\Psi_u$ | $\alpha_k \cdot \Delta\Psi_u / \Delta x$ | Same flux_src_n slot (already $\Psi_u$) |
| adv_src_none | Skipped (HLLD handles internally) | | |
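The branch table can be read as a small dispatch on the mode name. The following Python sketch is an assumption-laden paraphrase, not the Fortran in m_rhs.fpp: the function name, argument names, and the choice to return zero for the skipped branch are hypothetical, while the two formulas are the ones listed per branch.

```python
def adv_source(mode, alpha_k, u_cell, psi_lo, psi_hi, dx):
    """Per-cell advection source for one fluid (illustrative only).

    psi_lo / psi_hi are the face quantities on the lower and upper face
    of the cell in the sweep direction.
    """
    d_psi = psi_hi - psi_lo
    if mode == "adv_src_alpha_iface":
        # Face quantity is the per-fluid Psi_alpha_k: u_cell * dPsi_alpha / dx
        return u_cell * d_psi / dx
    if mode == "adv_src_vel_iface":
        # Face quantity is the shared Psi_u: alpha_k * dPsi_u / dx
        return alpha_k * d_psi / dx
    if mode == "adv_src_none":
        # Skipped here; modeled as no RHS contribution (assumption).
        return 0.0
    raise ValueError(f"unknown advection source mode: {mode}")
```

For example, with psi_lo = 0.25, psi_hi = 0.75 and dx = 0.5 the face difference is 1.0, so the first branch returns u_cell and the second returns alpha_k.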

The derivations, meanings, and usage of the Riemann source variables are not straightforward. I've added notes in misc/dev_notes for future developers (or AI agents; directing them to the notes should help them make fewer mistakes with the source terms). The notes cover the derivations of the HLL/HLLC non-conservative fluxes and the variable mapping between the Riemann solvers and the RHS.

Backwards Compatibility

  • All default behaviors preserved exactly (newly added features as options)
    • The only exception is the removal of an incorrect ad-hoc fluids-limit guard that affects only Hypoelasticity HLL
  • All existing uses of the Riemann and RHS source terms are preserved. To keep the scope of this PR limited, no refactoring is done (any refactoring would touch most of the existing HLLC functionality)

Type of change

  • New feature

Testing

  • All tests passed locally on CPU and Nvidia GPU, and on Frontier.

  • Smooth Eigenmode Convergence (image)
  • Weak Solution Comparison (Rodriguez & Johnsen (2019) §5.3(b)) (image)
  • Weak Scaling on Frontier (image)

Checklist

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions
  • @claude full review — Claude full review (also triggers on PR open/reopen/ready)
  • Add label claude-full-review — Claude full review via label

@codecov

codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 68.18966% with 369 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.64%. Comparing base (804c3bf) to head (3a3beab).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/simulation/m_riemann_solver_hypo_hlld.fpp | 60.97% | 129 Missing and 31 partials ⚠️ |
| src/simulation/m_rhs.fpp | 69.78% | 56 Missing and 41 partials ⚠️ |
| src/simulation/m_riemann_solver_hllc.fpp | 69.18% | 28 Missing and 25 partials ⚠️ |
| src/simulation/m_riemann_solver_hll.fpp | 48.48% | 9 Missing and 8 partials ⚠️ |
| src/simulation/m_riemann_solvers.fpp | 10.52% | 8 Missing and 9 partials ⚠️ |
| src/simulation/m_hypoelastic.fpp | 90.81% | 3 Missing and 6 partials ⚠️ |
| src/simulation/m_riemann_solver_lf.fpp | 0.00% | 5 Missing ⚠️ |
| src/simulation/m_global_parameters.fpp | 84.00% | 0 Missing and 4 partials ⚠️ |
| src/simulation/m_riemann_state.fpp | 95.16% | 0 Missing and 3 partials ⚠️ |
| src/common/m_variables_conversion.fpp | 66.66% | 0 Missing and 2 partials ⚠️ |
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1414      +/-   ##
==========================================
+ Coverage   60.43%   60.64%   +0.20%     
==========================================
  Files          83       84       +1     
  Lines       19871    20705     +834     
  Branches     2956     3064     +108     
==========================================
+ Hits        12010    12556     +546     
- Misses       5860     6054     +194     
- Partials     2001     2095      +94     

sbryngelson
sbryngelson previously approved these changes Jun 27, 2026

@sbryngelson sbryngelson left a comment

Member

High-level code-quality pass

This PR is large (~12k additions), but most of that is auto-generated tests/*/golden-metadata.txt fixtures (~40 new test dirs) — the hand-written src/ diff is closer to ~2,850 lines. Overall the new code is well-documented for its complexity (good WHY-comments, sensible naming, the dev-notes markdown files are legitimate cross-file architecture docs, not compensating for opaque code). The points below are meant to help trim the remaining LOC/complexity without touching the numerics, plus one real correctness gap.

1. Biggest lever: s_hypo_hlld_riemann_solver doesn't use the codebase's own decomposition tool

src/simulation/m_riemann_solver_hypo_hlld.fpp:21-965 is a single ~945-line subroutine with an ~800-line GPU-parallel loop body (147 privatized scalars) — over contributing.md's own soft guideline (≤500 lines/subroutine, ≤1000/file; this file is 1090). MFC already has a working, GPU-offload-safe pattern for factoring per-cell math out of parallel loops: $:GPU_ROUTINE(parallelism='[seq]') helper subroutines (used for s_compute_speed_of_sound, s_compute_pressure, etc.), and this file already calls s_compute_speed_of_sound that way (lines 301-304) — but never applies the same pattern to its own star-state algebra (~lines 556-630) or output-permutation logic (~lines 822-880). Extracting those into [seq]-parallel helpers would roughly halve the loop body with no offload-capability cost.

(Context: this isn't a new anti-pattern — s_hllc_riemann_solver is already ~1970 lines pre-existing — so this PR extends existing tech debt rather than introducing a new category of it. But it's the single biggest opportunity to cut LOC here.)

2. ~150 lines of duplicated small physics formulas — safe, mechanical extraction

Confirmed identical across 3-4 files:

  • Hypoelastic energy-correction floor/guard formula: m_riemann_solver_hll.fpp:347-360, m_riemann_solver_hllc.fpp:308-320 and :1416-1428, m_riemann_solver_hypo_hlld.fpp:284-294
  • Wave-speed estimate (Rodriguez et al. 2019 formula): m_riemann_solver_hll.fpp:396-403, m_riemann_solver_hllc.fpp:378-389 and :1489-1500, m_riemann_solver_hypo_hlld.fpp:306-309
  • s_finalize_riemann_solver_hatR (m_riemann_solver_hypo_hlld.fpp:1030-1088) duplicates the existing generic finalizer in m_riemann_state.fpp:1034-1183 almost verbatim, instead of using the fypp-templating approach this same file already uses for the analogous s_finalize_nc_iface_vel${SUFFIX}$ (lines 974-1020).

Pulling these into shared helper functions/a template would save ~100-150 lines with no design risk.

3. Real gap, not just style: mhd .and. hypoelasticity .and. riemann_solver==4 is never rejected

m_checker.fpp only prohibits HLLD (riemann_solver==4) when neither mhd nor hypoelasticity is set — never when both are set simultaneously. In that combination, hypo_nc_dual_pass becomes true and m_riemann_solvers.fpp unconditionally routes to s_hypo_hlld_riemann_solver, which has no reference to mhd/Bx/By/Bz anywhere — magnetic-field physics is silently dropped with no error, and the existing MHD HLLD path is simply never reached. Suggest adding @:PROHIBIT(riemann_solver == 4 .and. mhd .and. hypoelasticity, "HLLD does not support combined MHD and hypoelasticity") alongside the existing checks in s_check_inputs_hypo_branch.
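The gap can be stated as a truth table over the three inputs. Below is a small Python sketch of the intended validation logic; it is not the actual m_checker.fpp or the toolchain validator, the function name is hypothetical, and the first error message is an assumed wording (the second is the one suggested above).

```python
HLLD = 4  # riemann_solver value for HLLD, as referenced in this review


def check_hlld_inputs(riemann_solver, mhd, hypoelasticity):
    """Return a list of error messages for an HLLD configuration."""
    errors = []
    if riemann_solver != HLLD:
        return errors
    if not (mhd or hypoelasticity):
        # Existing rule: HLLD needs one of the two physics models.
        errors.append("HLLD requires mhd or hypoelasticity")
    if mhd and hypoelasticity:
        # Missing rule: both at once routes to the hypoelastic solver
        # and would silently drop the magnetic field.
        errors.append("HLLD does not support combined MHD and hypoelasticity")
    return errors
```

With the second rule in place, only the two single-physics combinations return an empty list for riemann_solver == 4.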

Lower priority / follow-up material

  • m_rhs.fpp threads the new adv_src_mode through 6 separate subroutines (s_initialize_rhs_module, s_compute_rhs, s_compute_directional_rhs, s_compute_advection_source_term, s_add_directional_advection_source_terms, s_finalize_rhs_module) rather than resolving the mode once and passing it down — some shotgun surgery, though consistent with that file's existing (non-templated) style.
  • m_riemann_solvers.fpp's own comment (lines 42-49) states hypoelasticity now enters the Riemann layer in "three distinct code shapes" (HLL/HLLC inline branches vs. HLLD's separate fused-dual-pass module). A future 4th hypoelastic solver variant would face the same scattered integration choice — worth unifying behind one seam before that happens, not a blocker for this PR.
  • The three mutually-exclusive hypo-mode booleans derived in m_global_parameters.fpp (hypo_nc_finite_diff/hypo_nc_interface/hypo_nc_dual_pass) could be one enum, matching the adv_src_mode pattern already used elsewhere in the same file — inconsistent style for an equivalent problem, not a bug.
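For the last point, the boolean-to-enum change could look roughly like this in Python. The enum name, member values, and helper are hypothetical and only mirror the three flag names above; returning None when no flag is set is an assumption for the case where hypoelasticity is off.

```python
from enum import IntEnum


class HypoNcMode(IntEnum):
    """One value replaces three mutually exclusive booleans (values assumed)."""
    FINITE_DIFF = 1
    INTERFACE = 2
    DUAL_PASS = 3


def mode_from_flags(finite_diff, interface, dual_pass):
    """Collapse the three hypo_nc_* flags into a single mode."""
    flags = {
        HypoNcMode.FINITE_DIFF: finite_diff,
        HypoNcMode.INTERFACE: interface,
        HypoNcMode.DUAL_PASS: dual_pass,
    }
    active = [mode for mode, is_set in flags.items() if is_set]
    if len(active) > 1:
        raise ValueError("hypo_nc_* flags are mutually exclusive")
    return active[0] if active else None
```

A single enum makes the invalid "two flags set" state unrepresentable after this conversion, which is the practical benefit over three independent booleans.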

None of the above is a blocker — the code is correct and well-tested per the PR description. These are opportunities to reduce the diff's LOC/maintenance footprint using patterns already established elsewhere in this codebase, plus one validation gap worth closing before merge.


Generated with the help of Claude Code.

@ChrisZYJ

ChrisZYJ commented Jul 2, 2026

Contributor Author

And thanks @sbryngelson for the suggestions! I addressed the following cleanup items:

  • Added checker + Python validator guard for mhd + hypoelasticity + HLLD.
  • Deduped repeated hypo elastic-energy and HLL/HLLC wave-speed formulas via Fypp inline macros.
  • Fixed an HLLC 2D-axisym elastic-energy bug: HLLC now doubles only shear_indices, matching m_variables_conversion.
  • Kept the intrusive s_hypo_hlld_riemann_solver helper extraction as follow-up scope. The solve kernel has been optimized for Frontier, and prior profiling showed it is register/occupancy-sensitive, so moving large blocks behind helpers could be done separately with performance profiling.

Also fixed the spelling CI failure: docs/documentation/case.md now uses patch(es) instead of patche(s). I think this was exposed by a newer typos version on CI.

@ChrisZYJ

ChrisZYJ commented Jul 2, 2026

Contributor Author

@sbryngelson If this looks good after the CI rerun, would it be possible to merge this PR soon? I’m happy to continue improving the code in follow-up PRs after this lands. The branch has already gone through several upstream re-merges and CI/review cleanup rounds, so merging the core implementation now would make future cleanup work smaller and easier to review. Thank you!

@sbryngelson

Member

I'm no longer merging messy/bloated/WIP code on the promise that it will be fixed later. I've been burned on this far too many times (perhaps not by you), and it only ratchets up, not down.

Please clean your code of smells, refactor into nice helpers, and minimize SLOC while maintaining speed if you want the PR merged. Thanks.

@ChrisZYJ

ChrisZYJ commented Jul 3, 2026

Contributor Author

Okay. I'll refactor the solver into helpers according to your reviews, and minimize SLOC as you requested. Will post the changes here soon.

@ChrisZYJ

ChrisZYJ commented Jul 4, 2026

Contributor Author

I've done the refactor. Here is what changed:

  • Factored the 25 repeated per-component wave-fan flux folds into small Fypp component tables driving one shared template.
  • Factored the 12 outer-wave stress star-state updates into one formula family with a table over component, sign, and elastic coefficient.
  • Used a single definition for the per-component HLL flux formula in both the degenerate-fan fallback and the ADC blend.
  • Factored s_finalize_riemann_solver_hatR and the generic finalizer into one template in m_riemann_state.fpp, so they will not diverge in the future.
  • Used a single [seq] helper f_hlld_wave_zone for the five-wave fan-zone selection, now shared by the flux fold, the NC face-velocity export, and the axisymmetric face-state pick.
  • Changed the three mutually exclusive hypo_nc_* booleans into one hypo_nc_mode enum, following the same pattern as adv_src_mode.
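To make the fan-zone helper concrete, here is a rough Python analogue of a five-wave zone selection. It is a sketch under assumptions, not the f_hlld_wave_zone in this PR: the signature, the zone numbering (0 for left of all waves, 5 for right of all waves), and the tie-breaking rule when a wave speed equals the sampled ray are all hypothetical.

```python
from bisect import bisect_right


def hlld_wave_zone(wave_speeds, xi=0.0):
    """Index of the fan zone containing the ray x/t = xi.

    wave_speeds holds the five wave speeds from leftmost to rightmost.
    Five waves split the fan into six zones, numbered 0 to 5.
    """
    if len(wave_speeds) != 5:
        raise ValueError("expected five wave speeds")
    if any(a > b for a, b in zip(wave_speeds, wave_speeds[1:])):
        raise ValueError("wave speeds must be non-decreasing")
    # Count the waves at or to the left of the ray; that count is the zone.
    return bisect_right(wave_speeds, xi)
```

Because the flux fold, the NC face-velocity export, and the axisymmetric face-state pick all need the same zone index, computing it in one place keeps the three consumers from disagreeing at degenerate wave speeds.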

This removes 95 lines across the touched files. m_riemann_solver_hypo_hlld.fpp drops from 1084 to 968 lines, the main subroutine from 938 to 859, and the GPU loop body from ~820 to ~690.

What is left in the main subroutine is dominated by the per-cell star-state algebra, which appears only once, so there is no duplication left to remove. Extracting it behind an interface would take ~50 scalar arguments or regrouping the privatized locals into derived types, and either would change register allocation on the GPU kernel. Since the goal was fewer lines at unchanged speed, that block stays inline, and the helper extractions went where the interfaces are narrow.

I've taken care to ensure that most changes do not affect the generated code, and that they have minimal impact on performance. Also merged in the current master.

@ChrisZYJ

ChrisZYJ commented Jul 4, 2026

Contributor Author

One note on CI. The last two runs each failed a single Frontier job at the Fetch Dependencies step, before anything gets built or tested (gpu-omp 1/2, then gpu-acc 2/2). Both failures show the same error on the same runner, frontier-5:

error: Failed to install: pandas-3.0.3-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  Caused by: failed to open file `/tmp/uv-cache-sbryngelson/archive-v0/h07_8qgFw9kMCAsSlH4jZ/pandas-3.0.3.dist-info/METADATA`: No such file or directory

Maybe a corrupted uv cache entry on that runner's node? Could you help me re-run the one failed job? All the tests should pass.

sbryngelson added a commit that referenced this pull request Jul 4, 2026
Self-hosted Frontier/Frontier-AMD matrix legs (acc/omp/cpu x shards) run
their "Fetch Dependencies" step directly on the same login node as the
same OS user, all pointed at the same UV_CACHE_DIR (introduced in #1385
to dodge NFS file-lock errors on ~/.cache/uv). uv's own cache lock
guards individual entries, but concurrent installs from separate uv
processes can still race while one extracts/prunes the shared
archive-v0 store, leaving a corrupted entry behind (e.g. a missing
dist-info METADATA file) that fails every subsequent install until the
cache is cleared by hand -- as happened on PR #1414's Frontier gpu-acc
[2/2] job.

Serialize the actual `uv pip install` call with flock so only one
process touches a given cache dir at a time, while keeping the cache
itself shared and warm across runs.
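
The serialization idea in this commit message can be sketched in Python with an advisory file lock. This is not the CI script (which wraps `uv pip install` with the flock command); the function name and the lock path are hypothetical, and the sketch is Unix-only because it relies on the fcntl module.

```python
import fcntl
import subprocess


def run_serialized(cmd, lock_path):
    """Run cmd while holding an exclusive advisory lock on lock_path.

    Other processes calling this with the same lock_path block until the
    lock is released, so only one install touches the cache at a time.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.run(cmd, check=False).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Hypothetical usage (requirements file and lock path are placeholders):
# run_serialized(["uv", "pip", "install", "-r", "requirements.txt"],
#                "/tmp/uv-install.lock")
```

The cache directory itself stays shared; only the install step is made mutually exclusive.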
@sbryngelson

Member

One note on CI. The last two runs each failed a single Frontier job at the Fetch Dependencies step, before anything gets built or tested (gpu-omp 1/2, then gpu-acc 2/2). Both failures show the same error on the same runner, frontier-5:

error: Failed to install: pandas-3.0.3-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  Caused by: failed to open file `/tmp/uv-cache-sbryngelson/archive-v0/h07_8qgFw9kMCAsSlH4jZ/pandas-3.0.3.dist-info/METADATA`: No such file or directory

Maybe a corrupted uv cache entry on that runner's node? Could you help me re-run the one failed job? All the tests should pass.

#1630

sbryngelson added a commit that referenced this pull request Jul 4, 2026
Addresses Copilot review on #1630, plus a live recurrence caught on
this PR's own CI run (job 85133634699, "Frontier (AMD) cpu [1/2]"):

- UV_LOCK_DIR's guard only checked -w, so a writable non-directory
  TMPDIR would pass and then get used as a directory prefix, breaking
  flock. Add a -d check alongside -w, per Copilot's suggestion.

- That same CI run hit a *new* corruption symptom ("The wheel is
  invalid: Missing .dist-info directory" for pandas) on the same
  physical login node (login05) as the original incident on #1414,
  even with the new lock in place. Root cause: a cache entry corrupted
  before the lock existed (or by any other cause) just fails forever
  until someone manually clears it -- which is exactly what had
  happened here; login05's ~1.2GiB cache had never actually been
  cleared (an earlier `uv cache clean` in this investigation was run
  from a different login node's session and never touched login05).

  Since self-hosted runners are spread across login nodes we can't all
  individually SSH into and inspect every time this happens, make the
  script self-heal instead: on install failure, clear the uv cache and
  retry once before giving up.
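
The two fixes in this commit message, the directory guard and the clear-and-retry-once behavior, can be sketched in Python as follows. This is an illustration rather than the CI script: both function names are hypothetical, and the install and cache-clearing steps are passed in as callables so the control flow can be shown without invoking uv.

```python
import os


def usable_lock_dir(path):
    """True only if path is an existing directory we can write to.

    Checking writability alone would accept a writable regular file,
    which then breaks when used as a directory prefix.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK)


def install_with_self_heal(install, clear_cache):
    """Try install(); on failure clear the cache and retry exactly once.

    install() returns True on success. In the real workflow it would run
    the uv install and clear_cache() would run `uv cache clean`.
    """
    if install():
        return True
    clear_cache()
    return install()
```

Retrying only once bounds the cost of a genuinely broken install while still recovering from a stale corrupted cache entry without manual intervention on the login node.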
Labels

enhancement New feature or request
