Anchored Dual-pass HLLD for Hypoelasticity (+ HLLC and interface-consistent HLL) by ChrisZYJ · Pull Request #1414 · MFlowCode/MFC

ChrisZYJ · 2026-05-09T01:26:46Z

Description

Adds:

Hypoelasticity: Anchored Dual-pass HLLD
Hypoelasticity: HLLC
Hypoelasticity: HLL option (interface-consistent)
HLL option (alpha div U) so non-conservative treatment aligns with HLLC

Key Design Choices

Separate HLLD Riemann Solvers

At first glance it might be tempting to combine HLLD MHD with dual-pass hypoelasticity HLLD, but keeping them separate makes the code cleaner and much easier to maintain because:

Unlike HLL or HLLC, HLLD is a family of solvers whose formulas and intermediate states depend on the eigenstructure of the governing equations, so the inner-state equations are completely different for MHD and Hypoelasticity.
HLLD for Hypoelasticity uses a newly developed dual-pass anchored form, which makes it different from any conventional HLLD Riemann solver. The anchored form is necessary for the non-conservative hypoelasticity terms, which MHD does not have.
MHD and Hypoelasticity deal with completely different physical regimes with different governing equations, so any changes or new physical models added in the future will not apply to both modules at once.

Riemann Source Terms

The usual non-conservative terms (alpha div U, K div U, etc.) only need div U, i.e. du/dx, dv/dy, dw/dz. Hypoelasticity also has cross terms like du/dy, so those Riemann-consistent traces must also be passed from the Riemann solver to the RHS. (The old Hypoelasticity code with the HLL Riemann solver uses finite differences for the non-conservative RHS. That provides enough stability because HLL smears the interface immediately, so there was no need to pass the du/dy traces before this PR. It does not work for Hypoelasticity with HLLC or HLLD.)
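
For orientation, the cross derivatives can be seen in the textbook deviatoric strain rate that drives a generic hypoelastic stress update. This is the standard definition, written here as an illustration and not as a transcription of the equations implemented in MFC:

```latex
\dot{e}_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)
             - \frac{1}{3}\,\delta_{ij}\,\frac{\partial u_k}{\partial x_k},
\qquad
\dot{e}_{xy} = \frac{1}{2}\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)
```

The diagonal components involve only du/dx, dv/dy, dw/dz, while the shear components need du/dy, dv/dx, and so on, which is why face traces of the cross derivatives are required.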

Also grouped/named the condition branches (with lots of comments within the code):

| Branch | Face quantity read | RHS formula per $\alpha_k$ | K*div(u) velocity source |
| --- | --- | --- | --- |
| `adv_src_alpha_iface` | `flux_src_n(dir)%vf(j_adv)` = per-fluid $\Psi_{\alpha_k}$ | $u_\text{cell} \cdot \Delta\Psi_\alpha / \Delta x$ | `nc_iface_vel_n(dir)%vf(1)` |
| `adv_src_vel_iface` | `flux_src_n(dir)%vf(adv%beg)` = shared $\Psi_u$ | $\alpha_k \cdot \Delta\Psi_u / \Delta x$ | Same `flux_src_n` slot (already $\Psi_u$) |
| `adv_src_none` | - | Skipped (HLLD handles internally) | - |
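
The per-fluid RHS formulas in the table can be sketched in a few lines. This is a minimal Python illustration, not the Fortran in m_rhs.fpp: the function names and argument names are hypothetical, the face values are assumed to be the low and high faces of one cell in one direction, and returning zero for `adv_src_none` is only a stand-in for "skipped".

```python
def rhs_alpha_iface(u_cell, psi_alpha_lo, psi_alpha_hi, dx):
    """adv_src_alpha_iface row: u_cell * dPsi_alpha / dx for one fluid k."""
    return u_cell * (psi_alpha_hi - psi_alpha_lo) / dx


def rhs_vel_iface(alpha_k, psi_u_lo, psi_u_hi, dx):
    """adv_src_vel_iface row: alpha_k * dPsi_u / dx, with Psi_u shared by all fluids."""
    return alpha_k * (psi_u_hi - psi_u_lo) / dx


def advection_source(mode, cell_factor, psi_lo, psi_hi, dx):
    """Dispatch on the branch name; cell_factor is u_cell or alpha_k depending on mode."""
    if mode == "adv_src_none":
        # Assumption: the RHS adds nothing because the solver handles the term itself.
        return 0.0
    if mode == "adv_src_alpha_iface":
        return rhs_alpha_iface(cell_factor, psi_lo, psi_hi, dx)
    if mode == "adv_src_vel_iface":
        return rhs_vel_iface(cell_factor, psi_lo, psi_hi, dx)
    raise ValueError(f"unknown advection source mode: {mode}")
```

Both active branches have the same shape (a cell-centered factor times a face difference); they differ in which quantity is carried on the face and which is taken from the cell.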

The derivations, meanings, and usage of the Riemann source variables are not straightforward. I've added notes in misc/dev_notes for future developers (or AI agents; directing them to the notes should help them make fewer mistakes with the source terms). The notes cover the derivation of the HLL/HLLC non-conservative fluxes and the variable mapping between the Riemann solvers and the RHS.

Backwards Compatibility

All default behaviors are preserved exactly (new features are added as options).
- The only exception is the removal of an incorrect ad-hoc fluids-limit guard that affects only Hypoelasticity HLL.
All existing usage of the Riemann and RHS source terms is preserved. No refactor is done, to keep the scope of this PR limited (any refactoring would touch most of the existing HLLC functionality).

Type of change

New feature

Testing

All tests passed locally on CPU and Nvidia GPU, and on Frontier.
Smooth Eigenmode Convergence

Weak Solution Comparison (Rodriguez & Johnsen (2019) §5.3(b))

Weak Scaling on Frontier

Checklist

I added or updated tests for new behavior
I updated documentation if user-facing behavior changed
GPU results match CPU results
Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

@coderabbitai review — incremental review (new changes only)
@coderabbitai full review — full review from scratch
/review — Qodo review
/improve — Qodo code suggestions
@claude full review — Claude full review (also triggers on PR open/reopen/ready)
Add label claude-full-review — Claude full review via label

codecov · 2026-06-25T07:19:31Z

Codecov Report

❌ Patch coverage is 68.18966% with 369 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.64%. Comparing base (804c3bf) to head (3a3beab).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/simulation/m_riemann_solver_hypo_hlld.fpp | 60.97% | 129 Missing and 31 partials ⚠️ |
| src/simulation/m_rhs.fpp | 69.78% | 56 Missing and 41 partials ⚠️ |
| src/simulation/m_riemann_solver_hllc.fpp | 69.18% | 28 Missing and 25 partials ⚠️ |
| src/simulation/m_riemann_solver_hll.fpp | 48.48% | 9 Missing and 8 partials ⚠️ |
| src/simulation/m_riemann_solvers.fpp | 10.52% | 8 Missing and 9 partials ⚠️ |
| src/simulation/m_hypoelastic.fpp | 90.81% | 3 Missing and 6 partials ⚠️ |
| src/simulation/m_riemann_solver_lf.fpp | 0.00% | 5 Missing ⚠️ |
| src/simulation/m_global_parameters.fpp | 84.00% | 0 Missing and 4 partials ⚠️ |
| src/simulation/m_riemann_state.fpp | 95.16% | 0 Missing and 3 partials ⚠️ |
| src/common/m_variables_conversion.fpp | 66.66% | 0 Missing and 2 partials ⚠️ |

... and 2 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1414      +/-   ##
==========================================
+ Coverage   60.43%   60.64%   +0.20%     
==========================================
  Files          83       84       +1     
  Lines       19871    20705     +834     
  Branches     2956     3064     +108     
==========================================
+ Hits        12010    12556     +546     
- Misses       5860     6054     +194     
- Partials     2001     2095      +94

sbryngelson

High-level code-quality pass

This PR is large (~12k additions), but most of that is auto-generated tests/*/golden-metadata.txt fixtures (~40 new test dirs) — the hand-written src/ diff is closer to ~2,850 lines. Overall the new code is well-documented for its complexity (good WHY-comments, sensible naming, the dev-notes markdown files are legitimate cross-file architecture docs, not compensating for opaque code). The points below are meant to help trim the remaining LOC/complexity without touching the numerics, plus one real correctness gap.

1. Biggest lever: `s_hypo_hlld_riemann_solver` doesn't use the codebase's own decomposition tool

src/simulation/m_riemann_solver_hypo_hlld.fpp:21-965 is a single ~945-line subroutine with an ~800-line GPU-parallel loop body (147 privatized scalars) — over contributing.md's own soft guideline (≤500 lines/subroutine, ≤1000/file; this file is 1090). MFC already has a working, GPU-offload-safe pattern for factoring per-cell math out of parallel loops: $:GPU_ROUTINE(parallelism='[seq]') helper subroutines (used for s_compute_speed_of_sound, s_compute_pressure, etc.), and this file already calls s_compute_speed_of_sound that way (lines 301-304) — but never applies the same pattern to its own star-state algebra (~lines 556-630) or output-permutation logic (~lines 822-880). Extracting those into [seq]-parallel helpers would roughly halve the loop body with no offload-capability cost.

(Context: this isn't a new anti-pattern — s_hllc_riemann_solver is already ~1970 lines pre-existing — so this PR extends existing tech debt rather than introducing a new category of it. But it's the single biggest opportunity to cut LOC here.)

2. ~150 lines of duplicated small physics formulas — safe, mechanical extraction

Confirmed identical across 3-4 files:

Hypoelastic energy-correction floor/guard formula: m_riemann_solver_hll.fpp:347-360, m_riemann_solver_hllc.fpp:308-320 and :1416-1428, m_riemann_solver_hypo_hlld.fpp:284-294
Wave-speed estimate (Rodriguez et al. 2019 formula): m_riemann_solver_hll.fpp:396-403, m_riemann_solver_hllc.fpp:378-389 and :1489-1500, m_riemann_solver_hypo_hlld.fpp:306-309
s_finalize_riemann_solver_hatR (m_riemann_solver_hypo_hlld.fpp:1030-1088) duplicates the existing generic finalizer in m_riemann_state.fpp:1034-1183 almost verbatim, instead of using the fypp-templating approach this same file already uses for the analogous s_finalize_nc_iface_vel${SUFFIX}$ (lines 974-1020).

Pulling these into shared helper functions/a template would save ~100-150 lines with no design risk.

3. Real gap, not just style: `mhd .and. hypoelasticity .and. riemann_solver==4` is never rejected

m_checker.fpp only prohibits HLLD (riemann_solver==4) when neither mhd nor hypoelasticity is set — never when both are set simultaneously. In that combination, hypo_nc_dual_pass becomes true and m_riemann_solvers.fpp unconditionally routes to s_hypo_hlld_riemann_solver, which has no reference to mhd/Bx/By/Bz anywhere — magnetic-field physics is silently dropped with no error, and the existing MHD HLLD path is simply never reached. Suggest adding @:PROHIBIT(riemann_solver == 4 .and. mhd .and. hypoelasticity, "HLLD does not support combined MHD and hypoelasticity") alongside the existing checks in s_check_inputs_hypo_branch.
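
The validation gap can be illustrated with a small input check. This is a hedged Python sketch of the logic only, not MFC's actual checker or case validator: the function name is hypothetical, and the first error message is invented for illustration (the second is the message suggested above).

```python
def check_hlld_inputs(riemann_solver, mhd, hypoelasticity):
    """Return a list of error strings for HLLD-related input combinations."""
    errors = []
    is_hlld = riemann_solver == 4
    # Existing rule described in the review: HLLD needs one of the two models.
    if is_hlld and not (mhd or hypoelasticity):
        errors.append("HLLD requires mhd or hypoelasticity")  # hypothetical wording
    # Missing rule: both models at once would silently drop the magnetic field.
    if is_hlld and mhd and hypoelasticity:
        errors.append("HLLD does not support combined MHD and hypoelasticity")
    return errors
```

With only the first rule in place, `check_hlld_inputs(4, True, True)` would return an empty list, which is the silent-acceptance case described above.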

Lower priority / follow-up material

m_rhs.fpp threads the new adv_src_mode through 6 separate subroutines (s_initialize_rhs_module, s_compute_rhs, s_compute_directional_rhs, s_compute_advection_source_term, s_add_directional_advection_source_terms, s_finalize_rhs_module) rather than resolving the mode once and passing it down — some shotgun surgery, though consistent with that file's existing (non-templated) style.
m_riemann_solvers.fpp's own comment (lines 42-49) states hypoelasticity now enters the Riemann layer in "three distinct code shapes" (HLL/HLLC inline branches vs. HLLD's separate fused-dual-pass module). A future 4th hypoelastic solver variant would face the same scattered integration choice — worth unifying behind one seam before that happens, not a blocker for this PR.
The three mutually-exclusive hypo-mode booleans derived in m_global_parameters.fpp (hypo_nc_finite_diff/hypo_nc_interface/hypo_nc_dual_pass) could be one enum, matching the adv_src_mode pattern already used elsewhere in the same file — inconsistent style for an equivalent problem, not a bug.
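
To make the enum suggestion concrete, here is a minimal Python sketch of collapsing three mutually exclusive flags into one mode value. The class name, member names, integer values, and the helper are all hypothetical; the real code is Fortran and may encode the modes differently.

```python
from enum import Enum


class HypoNcMode(Enum):
    FINITE_DIFF = 1  # stands in for hypo_nc_finite_diff
    INTERFACE = 2    # stands in for hypo_nc_interface
    DUAL_PASS = 3    # stands in for hypo_nc_dual_pass


def resolve_hypo_nc_mode(finite_diff, interface, dual_pass):
    """Map three booleans to one mode; None means no hypoelastic treatment is active."""
    flags = {
        HypoNcMode.FINITE_DIFF: finite_diff,
        HypoNcMode.INTERFACE: interface,
        HypoNcMode.DUAL_PASS: dual_pass,
    }
    active = [mode for mode, is_on in flags.items() if is_on]
    if len(active) > 1:
        # Three booleans can represent this invalid state; a single enum value cannot.
        raise ValueError("hypo non-conservative modes are mutually exclusive")
    return active[0] if active else None
```

The design point is that a single mode variable makes the invalid "two flags set" state unrepresentable after resolution, so downstream code can branch on one value.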

None of the above is a blocker — the code is correct and well-tested per the PR description. These are opportunities to reduce the diff's LOC/maintenance footprint using patterns already established elsewhere in this codebase, plus one validation gap worth closing before merge.

Generated with the help of Claude Code.

ChrisZYJ · 2026-07-02T07:28:02Z

And thanks @sbryngelson for the suggestions! I addressed the following cleanup items:

Added checker + Python validator guard for mhd + hypoelasticity + HLLD.
Deduped repeated hypo elastic-energy and HLL/HLLC wave-speed formulas via Fypp inline macros.
Fixed an HLLC 2D-axisym elastic-energy bug: HLLC now doubles only shear_indices, matching m_variables_conversion.
Kept the intrusive s_hypo_hlld_riemann_solver helper extraction as follow-up scope. The solve kernel has been optimized for Frontier, and prior profiling showed it is register/occupancy-sensitive, so moving large blocks behind helpers could be done separately with performance profiling.

Also fixed the spelling CI failure: docs/documentation/case.md now uses patch(es) instead of patche(s). I think this was exposed by a newer typos version on CI.

ChrisZYJ · 2026-07-02T07:56:22Z

@sbryngelson If this looks good after the CI rerun, would it be possible to merge this PR soon? I’m happy to continue improving the code in follow-up PRs after this lands. The branch has already gone through several upstream re-merges and CI/review cleanup rounds, so merging the core implementation now would make future cleanup work smaller and easier to review. Thank you!

sbryngelson · 2026-07-02T13:55:36Z

I'm no longer merging messy/bloated/WIP code on the promise that it will be fixed later. I've been burned on this far too many times (perhaps not by you), and it only ratchets up, not down.

Please clean your code of smells, refactor into nice helpers, and minimize SLOC while maintaining speed if you want the PR merged. Thanks.

ChrisZYJ · 2026-07-03T01:47:05Z

Okay. I'll refactor the solver into helpers according to your reviews, and minimize SLOC as you requested. Will post the changes here soon.

ChrisZYJ · 2026-07-04T02:28:15Z

I've done the refactor. Here is what changed:

Factored the 25 repeated per-component wave-fan flux folds into small Fypp component tables driving one shared template.
Factored the 12 outer-wave stress star-state updates into one formula family with a table over component, sign, and elastic coefficient.
Used a single definition for the per-component HLL flux formula in both the degenerate-fan fallback and the ADC blend.
Factored s_finalize_riemann_solver_hatR and the generic finalizer into one template in m_riemann_state.fpp, so they will not diverge in the future.
Used a single [seq] helper f_hlld_wave_zone for the five-wave fan-zone selection, now shared by the flux fold, the NC face-velocity export, and the axisymmetric face-state pick.
Changed the three mutually exclusive hypo_nc_* booleans into one hypo_nc_mode enum, following the same pattern as adv_src_mode.
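
As an illustration of what a fan-zone selection helper does, the sketch below picks the zone of a five-wave fan that contains the ray x/t = 0. It is a generic Python sketch, not the code of f_hlld_wave_zone: the function name, the zone numbering (0 for the left state through 5 for the right state), and the tie convention are assumptions.

```python
from bisect import bisect_left


def wave_zone(speeds, xi=0.0):
    """Return the index of the fan zone containing the ray x/t = xi.

    speeds must hold five wave speeds in nondecreasing order. Zone 0 lies to
    the left of every wave and zone 5 to the right of every wave.
    """
    if len(speeds) != 5:
        raise ValueError("expected exactly five wave speeds")
    if any(a > b for a, b in zip(speeds, speeds[1:])):
        raise ValueError("wave speeds must be nondecreasing")
    # Count the waves that are strictly slower than the ray. A wave moving
    # exactly at xi is treated as not yet crossed (assumed tie convention).
    return bisect_left(speeds, xi)
```

Computing the zone once and reusing it in several places is what keeps the flux, the exported face velocity, and the face state consistent with each other.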

This removes 95 lines across the touched files. m_riemann_solver_hypo_hlld.fpp drops from 1084 to 968 lines, the main subroutine from 938 to 859, and the GPU loop body from ~820 to ~690.

What is left in the main subroutine is dominated by the per-cell star-state algebra, which appears only once, so there is no duplication left to remove. Extracting it behind an interface would take ~50 scalar arguments or regrouping the privatized locals into derived types, and either would change register allocation on the GPU kernel. Since the goal was fewer lines at unchanged speed, that block stays inline, and the helper extractions went where the interfaces are narrow.

I've taken care to ensure that most changes do not affect the generated code, and that they have minimal impact on performance. Also merged in the current master.

ChrisZYJ · 2026-07-04T02:30:36Z

One note on CI. The last two runs each failed a single Frontier job at the Fetch Dependencies step, before anything gets built or tested (gpu-omp 1/2, then gpu-acc 2/2). Both failures show the same error on the same runner, frontier-5:

error: Failed to install: pandas-3.0.3-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  Caused by: failed to open file `/tmp/uv-cache-sbryngelson/archive-v0/h07_8qgFw9kMCAsSlH4jZ/pandas-3.0.3.dist-info/METADATA`: No such file or directory

Maybe a corrupted uv cache entry on that runner's node? Could you help me re-run the one failed job? All the tests should pass.

Self-hosted Frontier/Frontier-AMD matrix legs (acc/omp/cpu x shards) run their "Fetch Dependencies" step directly on the same login node as the same OS user, all pointed at the same UV_CACHE_DIR (introduced in #1385 to dodge NFS file-lock errors on ~/.cache/uv). uv's own cache lock guards individual entries, but concurrent installs from separate uv processes can still race while one extracts/prunes the shared archive-v0 store, leaving a corrupted entry behind (e.g. a missing dist-info METADATA file) that fails every subsequent install until the cache is cleared by hand -- as happened on PR #1414's Frontier gpu-acc [2/2] job. Serialize the actual `uv pip install` call with flock so only one process touches a given cache dir at a time, while keeping the cache itself shared and warm across runs.
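
The serialization idea can be sketched as follows. This is a Python illustration of holding an exclusive file lock around a command, assuming a POSIX system; the actual CI change wraps `uv pip install` with the `flock` utility in a shell script, and the function name and lock path here are hypothetical.

```python
import fcntl
import subprocess


def run_locked(cmd, lock_path):
    """Run cmd while holding an exclusive flock on lock_path; return its exit code."""
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        # Blocks until every other process holding the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Only the install step is serialized; the cache directory itself stays shared, so later jobs still benefit from a warm cache.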

sbryngelson · 2026-07-04T12:51:48Z

> One note on CI. The last two runs each failed a single Frontier job at the Fetch Dependencies step, before anything gets built or tested (gpu-omp 1/2, then gpu-acc 2/2). Both failures show the same error on the same runner, frontier-5:
>
> error: Failed to install: pandas-3.0.3-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
>   Caused by: failed to open file `/tmp/uv-cache-sbryngelson/archive-v0/h07_8qgFw9kMCAsSlH4jZ/pandas-3.0.3.dist-info/METADATA`: No such file or directory
>
> Maybe a corrupted uv cache entry on that runner's node? Could you help me re-run the one failed job? All the tests should pass.

#1630

Addresses Copilot review on #1630, plus a live recurrence caught on this PR's own CI run (job 85133634699, "Frontier (AMD) cpu [1/2]"):
- UV_LOCK_DIR's guard only checked -w, so a writable non-directory TMPDIR would pass and then get used as a directory prefix, breaking flock. Add a -d check alongside -w, per Copilot's suggestion.
- That same CI run hit a *new* corruption symptom ("The wheel is invalid: Missing .dist-info directory" for pandas) on the same physical login node (login05) as the original incident on #1414, even with the new lock in place. Root cause: a cache entry corrupted before the lock existed (or by any other cause) fails forever until someone manually clears it, which is exactly what had happened here. login05's ~1.2GiB cache had never actually been cleared (an earlier `uv cache clean` in this investigation was run from a different login node's session and never touched login05). Since self-hosted runners are spread across login nodes that we cannot individually SSH into and inspect every time this happens, make the script self-heal instead: on install failure, clear the uv cache and retry once before giving up.
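
The two fixes described above (the directory check and the clear-and-retry-once behavior) can be sketched generically. This Python sketch is illustrative only: the real script is shell, the function names are hypothetical, and `install` and `clear_cache` are caller-supplied callables standing in for the install command and for `uv cache clean`.

```python
import os


def usable_lock_dir(path):
    """Mirror the -d and -w shell tests: path must be a directory and writable."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


def install_with_self_heal(install, clear_cache):
    """Call install(); on failure clear the cache and retry exactly once.

    install must return True on success and False on failure.
    """
    if install():
        return True
    # A corrupted cache entry fails every time, so wipe the cache before retrying.
    clear_cache()
    return install()
```

Retrying only once keeps a genuinely broken install (for example, a bad requirement) from looping, while still recovering from a stale corrupted cache entry without manual intervention.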

ChrisZYJ added 30 commits October 13, 2025 15:00

hypo HLLC works

b212c7e

HLLC ADC works

d18cf7e

HLLD works

bb1c88f

HLLD ADC works

bae9d1b

fix critical HLLD index bug

0004a17

complete refactor and backwards compatibility

b921abf

HLLD KdivU; HLL u_interface NC

cb2b1fb

fix non-hypo HLLC bug

aaad026

2D Axisym works

8359c71

2DA oracle -> main; fix OpenACC

1a3b845

3D works

fd8f89e

fix 3D

f30889d

fix 3D; HLLC+ADC for all

03d68a5

fix 2DA HLLD

ea63315

in progress: major bug fix; HLL method 2

1d4a08d

in progress: major bug fix

556a035

in progress: major bug fix

9828375

in progress: major bug fix

9731acf

major bug fix likely done

9ca2ce8

format

4107872

tests

606970e

fix HLLM2 2DA hypo noKdivu; golden HLLM1 hypo; golden HLLM2 nonhypo

6248e68

hll_alpha_interface to hll_u_interface

db5f9d4

merge upstream; all conflicts resolved

839caac

fix compile; pass old tests

969bcfa

formatting and checker

ffd97d7

new hypo use riemann instead with golden generated using pre-merge; pass all tests

e1a199b

formatting

7525593

gpu fix

838f06a

fix GPU compile

74fcc3d

Copilot started reviewing on behalf of sbryngelson June 24, 2026 23:52

fix: HLL/HLLC hypoelasticity guard should reject num_fluids>2, not !=2 (1-fluid is valid)

0a3a7a9

ChrisZYJ and others added 3 commits June 25, 2026 23:33

Fix amdflang compile bug

11f8de6

fix Cray-OMP segfault - ACC-only for nc_iface_vel GPU_DECLARE

4aa5745

Merge branch 'master' into hypo_hlld

7a28684

sbryngelson previously approved these changes Jun 27, 2026

sbryngelson reviewed Jul 1, 2026

Checker & elastic energy macro

6cfcb5e

ChrisZYJ dismissed sbryngelson’s stale review via 6cfcb5e July 2, 2026 06:28

Fix spelling (new CI version for old doc typo)

4cc544c

ChrisZYJ added 8 commits July 2, 2026 19:28

HLLD kernel dedup: table-driven wave-fan fold and stress star states (identical codegen)

e46ad07

Co-generate hatR flux finalizer with the generic via one Fypp template (identical codegen)

d8e410a

Extract f_hlld_wave_zone seq helper; unify the three wave-fan upwind selections

02c4a09

Consolidate hypo_nc_* booleans into hypo_nc_mode enum (matches adv_src_mode pattern)

dea0ca5

Update dev notes for hypo_nc_mode enum

ff3320d

Review nits: rejoin wrapped comments, module end indent, cray_inline hint on f_hlld_wave_zone

5560776

Merge branch 'master' into hypo_hlld

6737464

Trigger CI rerun (Frontier gpu-omp [1/2] runner failed at dependency fetch)

dff2045

sbryngelson mentioned this pull request Jul 4, 2026

ci: serialize uv installs against the shared node-local cache #1630

Merged

3 tasks

Merge branch 'master' into hypo_hlld

3a3beab

ChrisZYJ wants to merge 104 commits into MFlowCode:master from ChrisZYJ:hypo_hlld

3 participants
