ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output#178
ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output#178
Conversation
…(1.6x speedup on Solovev) Implements the dual Riccati matrix S = U₁·U₂⁻¹ as a faster alternative to the standard Euler-Lagrange ODE integration. Enable with `use_riccati = true` in jpec.toml. Integration strategy: uses `sing_der!` (same ODE RHS as standard) with periodic Riccati renormalization S = U₁·U₂⁻¹, U₂ = I in the callback when column norms exceed ucrit. This is mathematically equivalent to the explicit Riccati ODE (dS/dψ = B + A·S - S·D - S·C·S) but numerically stable: the explicit Riccati ODE has quadratic blowup for explicit solvers when K̄·S >> Q, while sing_der! + renorm tracks the bounded ratio S = U₁/U₂. The Riccati crossing (`riccati_cross_ideal_singular_surf!`) skips Gaussian reduction (which can produce NaN/Inf when S is near-zero near the axis) and uses `ipert_res` directly. Benchmarks on Solovev example (N=8, 1 singular surface): Standard ODE: 83.7 ms, 157 steps Riccati ODE: 51.4 ms, 121 steps (1.63x speedup, 0.006% energy difference) See: Glasser (2018) Phys. Plasmas 25, 032507 — Eq. 19 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Part 1: Δ' output (tearing stability parameter)
- Add `delta_prime::Vector{ComplexF64}` to `SingType`
- Add `compute_delta_prime_from_ca!` in EulerLagrange.jl, called at end of
`eulerlagrange_integration` (standard path only — see normalization note below)
- Write `singular/delta_prime` as (msing × n_modes) ComplexF64 to HDF5 output in JPEC.jl
- Riccati path does NOT compute delta_prime: ca_l is accumulated in (S,I) normalization
which is inconsistent with the Δ' formula (standard (U1,U2) normalization required)
## Part 2: Parallel Fundamental Matrix (FM) integration
- Add `ChunkPropagator` struct (two N×N×2 blocks for identity-block ICs) in Structs
- Add `use_parallel::Bool = false` control flag in ForceFreeStatesControl
- Add `integrate_propagator_chunk!` — integrates each chunk from IC=(I,0) and IC=(0,I)
independently using BS5 solver, no callback; suitable for Threads.@threads
- Add `apply_propagator!` — in-place 2×2 block matrix multiply on odet.u
- Add `balance_integration_chunks` — sub-divides chunks using ode_itime_cost for
load-balanced parallel work; target = max(2*msing+3, 4*nthreads)
- Add `ode_itime_cost` — log-divergent cost model from STRIDE (Glasser 2018)
- Add `parallel_eulerlagrange_integration` — parallel phase with Threads.@threads,
serial assembly calling renormalize_riccati_inplace! before each crossing (needed
because apply_propagator! gives general (U1,U2) state but riccati crossing expects
(S,I) form); uses ipert_res-direct zeroing to correctly identify the resonant column
- Dispatch from eulerlagrange_integration: use_parallel → use_riccati → standard
## Tests (29 total: 11 Riccati + 18 Parallel FM)
- runtests_riccati.jl: update Δ' test — only standard path populates delta_prime
- runtests_parallel_integration.jl (new): ChunkPropagator identity/linearity,
balance_integration_chunks count/coverage/crossings, ode_itime_cost additivity,
parallel FM energy match (rtol=2%, Solovev)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rallel tests to suite Δ' is now computed inline in riccati_cross_ideal_singular_surf! using the diagonal formula on the bounded (U₁, U₂) state (max ≤ ucrit, no GR permutation). This gives physically correct values: 57.3 and -4.03 for the two Solovev singular surfaces. The standard path does not populate delta_prime — Gaussian Reduction inflates the resonant column's asymptotic coefficients, making ca_l non-physical regardless of when it is computed. A comment in cross_ideal_singular_surf! explains the limitation. Also adds runtests_riccati.jl and runtests_parallel_integration.jl to the default test suite (runtests.jl). Both were previously excluded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… lacks Δ' The comment in cross_ideal_singular_surf! previously said the issue was GR "normalization inflation." The real reason is more subtle: Δ' is a complex, normalization-convention-dependent quantity. The Riccati renormalization (U₂→I) continuously phases solution columns into a specific gauge where the diagonal formula (ca_r - ca_l)/denom gives physically meaningful values. The standard path's solution columns grow from the axis with an arbitrary complex phase; dividing by the outer asymptotic coefficient normalizes the magnitude but not the complex phase, producing a value in a different convention that does not match what SingularCoupling.jl expects. Also reverts the failed attempt to compute Δ' in cross_ideal_singular_surf! via perm_col + A_outer normalization, which produced -0.10-0.54i vs the Riccati 57.3+58.3i (same physical quantity, incompatible conventions). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…large-N documentation 1. Add SingType.delta_prime_col (N × n_res_modes Matrix) storing the full column (ca_r[:, ipert_res, 2] - ca_l[:, ipert_res, 2]) / (4π²·psio) at each crossing. The diagonal element matches delta_prime[i] exactly. Off-diagonal elements give intra-surface coupling of all N modes to each resonant mode through the singular layer asymptotic expansion. Only populated for Riccati/parallel FM paths. 2. Add singular/m, singular/n, singular/delta_prime_col HDF5 outputs so downstream users can access the full off-diagonal Δ' without needing to index ca_left/ca_right. 3. Document the known numerical limitation of the parallel FM path for large N: FM propagators become ill-conditioned for N ≳ 20 without QR orthogonalization, causing ~10% energy error for DIIID (N=26) with no wall-clock speedup over Riccati. Deferred fix: bidirectional integration or continuous QR (noted in docstring/tests). 4. Update outer-plasma Riccati re-integration (already committed) docstring to match. Tests: 50/50 Riccati+parallel, 84/84 EulerLagrange all pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trix Implements the STRIDE global boundary value problem for computing the full 2·msing × 2·msing inter-surface tearing stability matrix. Each entry gives the U₂[ipert_res] response amplitude at one surface boundary when driving with unit amplitude at another, encoding cross-surface coupling. Changes: - Riccati.jl: add assemble_fm_matrix (chunk FM product) and compute_delta_prime_matrix! (BVP assembly + solve via STRIDE formulation from Glasser 2018 Phys. Plasmas 25, 032501 Sec. III.B); call from parallel_eulerlagrange_integration - ForceFreeStatesStructs.jl: add delta_prime_matrix field to ForceFreeStatesInternal with docstring - JPEC.jl: write delta_prime_matrix to singular/delta_prime_matrix in HDF5 - test/runtests_parallel_integration.jl: add delta_prime_matrix regression test (shape, finiteness, non-zero diagonal); 30 tests total (was 23) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… for large-N accuracy
The all-forward parallel FM path had ~10% energy error for large-N problems
(DIIID N=26, n=1) because the chunk immediately before each rational surface
crossing integrates into exponentially growing solution territory, producing
an ill-conditioned FM propagator.
Fix: integrate the crossing chunk *backward* (psi_end → psi_start). Solutions
that grow exponentially forward decay backward, yielding a well-conditioned
backward FM Φ_bwd. The accurate forward propagation is recovered as Φ_bwd⁻¹
via a stable LU solve in apply_propagator_inverse!.
The same backward FM is used directly in the Δ' BVP (compute_delta_prime_matrix!)
as Phi_L[j], splitting each ill-conditioned inter-surface FM product into
well-conditioned Phi_R (forward chunks) and Phi_L (backward crossing chunk).
Changes:
- IntegrationChunk: add direction::Int=1 field (+1 forward, -1 backward)
- chunk_el_integration_bounds: add bidirectional=false kwarg; crossing chunks
get direction=-1 when true
- balance_integration_chunks: left sub-chunk always direction=1; right inherits
chunk.direction so the near-singularity chunk stays backward after splitting
- integrate_propagator_chunk!: reverses tspan for direction=-1 chunks
- apply_propagator_inverse!: new function, LU solve Φ_bwd·x = u_old
- Serial assembly: branches on chunk.direction (inverse vs forward apply)
- parallel_eulerlagrange_integration: passes bidirectional=true
- compute_delta_prime_matrix!: BVP now uses Phi_R·x_right - Phi_L·x_left = 0
at each junction instead of ill-conditioned monolithic Phi_segs product
- assemble_fm_matrix: safe for empty idx_range (uses propagators[1] for N)
Results (et[1] stability eigenvalue):
Solovev N=8: 0.006% error (was already fine)
DIIID N=26: 0.236% error (was ~10.5% — 44× accuracy improvement)
Tests: 31/31 pass in runtests_parallel_integration.jl (+1 DIIID accuracy test)
18/18 pass in runtests_riccati.jl (unchanged)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude review this |
|
Claude finished @logan-nc's task in 3m 55s —— View job Code ReviewThis is a well-designed PR implementing three complementary integration paths. The theory is clearly traced to Glasser (2016/2018), the docstrings are excellent, and the bidirectional fix for large-N conditioning is elegant. Below are issues found, ordered by severity. Bug:
|
Adds benchmarks/benchmark_threads.jl to measure wall-clock time and accuracy of the standard, Riccati, and parallel FM integration paths across varying thread counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes from code review of PR #178: - assemble_fm_matrix: add explicit isempty guard before the propagator loop so an empty idx_range (e.g. i_crossings[1]==1) returns the identity matrix without relying on the loop falling through silently. - compute_delta_prime_matrix!: add @Assert at function entry that all singular surfaces have exactly one resonant mode, so multi-resonance surfaces fail loudly instead of silently using only sp.m[1]/sp.n[1]. - runtests_parallel_integration.jl: remove stale comment that described large-N FM ill-conditioning as an open problem with ~10% energy error; bidirectional integration (now the default for use_parallel=true) has resolved this. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er! and compute_delta_prime_from_ca! Two developer benchmark scripts for verifying the two dead-code reference implementations flagged in the Claude code review of PR #178: benchmarks/benchmark_riccati_der.jl Verifies riccati_der! correctly evaluates Glasser 2018 Eq. 19: dS/dψ = w†·F̄⁻¹·w - S·Ḡ·S, w = Q - K̄·S Uses Hermitian test states (physical constraint: the EL system preserves S†=S from the axis) and compares riccati_der! against manual evaluation of the same formula using the ffit splines directly. Observed error: ~1e-17 (machine epsilon). No TOML flags needed. benchmarks/benchmark_delta_prime_methods.jl Verifies compute_delta_prime_from_ca! gives bit-for-bit identical Δ' values to the inline computation in riccati_cross_ideal_singular_surf!. Both apply the same diagonal formula to the same ca_l/ca_r arrays, so the result must be exactly zero difference. Observed difference: 0.0 (exact). No TOML flags needed. Neither script requires new TOML flags: they call internal functions directly without going through ForceFreeStatesControl. Developer-only knobs belong in scripts, not in user-facing config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tring The previous "O(Δψ)" phrasing in the Integration Strategy section read as a global accuracy statement, suggesting the Riccati path is only first-order accurate. This is wrong: the method integrates the linear EL ODE with Tsit5 (5th-order) and recovers S = U₁·U₂⁻¹ by exact renormalization, achieving the full ODE solver reltol. Rewrite the section in three clearly labelled parts: - Why riccati_der! (quadratic ODE) is avoided: relative error control is unfaithful when |S| is large, not a step-size problem, not fixable by adaptation without an implicit solver. - What the implementation actually does: sing_der! (linear ODE, exact RHS), Tsit5 (5th-order), exact renormalization, same global accuracy as standard. - Local consistency analysis: the O(Δψ) expansion is retained but now labelled explicitly as a consistency check, not an accuracy claim. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…setup + new unit tests
Two changes in one pass:
Shared setup (performance):
equil (Grad-Shafranov) and ffit (metric matrices) are now built once and
shared across all integration-dependent testsets via a make_solovev_intr
helper for cheap fresh intr construction. Previously, setup_equilibrium +
make_metric + make_matrix ran 4 times and riccati_eulerlagrange_integration
ran 3 times. Now each runs once, cutting total test time significantly.
New unit tests (dead code coverage):
"riccati_der! formula — Glasser 2018 Eq. 19": verifies riccati_der!
correctly evaluates dS/dψ = w†F̄⁻¹w − SGS at several ψ points using
Hermitian test states (physical constraint). Agrees with manual formula
evaluation to machine precision (~1e-17). No extra integration needed.
"compute_delta_prime_from_ca! matches inline Δ'": verifies the standalone
Δ' formula gives bit-for-bit identical results to the inline computation
in riccati_cross_ideal_singular_surf!. Reuses the shared odet_ric.
Total: 23 tests (was 18), runtime ~51s (was ~80s+ with redundant setup).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd 3 unit tests - Delete unused parallel_threads field from ForceFreeStatesControl: the field was silently ignored (Threads.@threads uses JULIA_NUM_THREADS at startup, not a runtime field). Removes false impression that thread count can be set from jpec.toml. - Add apply_propagator_inverse! round-trip unit test: verifies Φ⁻¹·Φ = I algebraically, complementing the existing apply_propagator! identity and linearity tests. - Add chunk_el_integration_bounds direction field test: verifies bidirectional=true sets direction=-1 on crossing chunks and direction=+1 on non-crossing chunks, and that balance_integration_chunks preserves direction correctly (right sub-chunk inherits, left sub-chunk always +1). Catches direction propagation regressions. - Add delta_prime_matrix DIIID regression test: verifies the STRIDE BVP Δ' matrix is finite and non-zero for the large-N case (N≈26, multiple rational surfaces), where ill-conditioned (non-bidirectional) FM propagators would produce NaN/Inf entries. 56/56 parallel integration tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f/riccati One conflict resolved in src/GeneralizedPerturbedEquilibrium.jl (was src/JPEC.jl): write_outputs_to_HDF5 vacuum section. Resolution: - Keep new Δ' HDF5 outputs from perf/riccati (singular/m, singular/n, delta_prime, delta_prime_col, delta_prime_matrix) - Adopt develop's vacuum output format: vac_data variable name, plasma_pts/wall_pts fields (3D Cartesian), y_plasma/y_wall entries, always-write pattern with empty arrays All other files (EulerLagrange.jl, ForceFreeStatesStructs.jl, runtests.jl) auto-merged cleanly. Default HDF5 filename updated to gpec.h5. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…um references post-rename Update test files and benchmarks to use the new package name and config filename (gpec.toml) following the GPEC rename merged from develop: - test/runtests_riccati.jl - test/runtests_parallel_integration.jl - benchmarks/benchmark_threads.jl - benchmarks/benchmark_riccati_der.jl - benchmarks/benchmark_delta_prime_methods.jl 23/23 riccati tests and 56/56 parallel integration tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Post-merge update — after develop (GPEC rename) mergeMerged Test results (post-merge)
Benchmark results (post-merge, 1 thread)
Benchmark results (post-merge, 4 threads)
Note: 1-thread Solovev timings have high variance (sub-100ms runs) and are not meaningful for speedup comparisons. DIIID (1.2s+) is the reliable benchmark. At 4 threads, Riccati achieves 1.29× and parallel achieves 1.30× on DIIID. Energy agreement is unchanged: <0.13% for both paths vs standard. |
…page Creates docs/src/stability.md covering the ForceFreeStates module: - Newcomb/DCON ideal MHD stability criterion with paper citations (Glasser 2016 Phys. Plasmas 23 112506, 2018a 032507, 2018b 032501) - Standard, Riccati, and parallel FM integration methods - Bidirectional integration strategy for large-N accuracy - Δ' tearing parameter: per-surface (delta_prime/delta_prime_col) and inter-surface matrix (delta_prime_matrix / STRIDE BVP) - Configuration reference, API autodocs block, example usage Adds page to docs/make.jl navigation and cross-links from equilibrium.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kdown links)
1. Add Random stdlib to Project.toml [deps] and [compat] — required by
runtests_riccati.jl but missing from declared dependencies, causing
CI failure with "Package Random not found in current path".
2. Fix docstring markdown in Riccati.jl and ForceFreeStatesStructs.jl:
- Wrap bare [array_notation] (link text) immediately followed by
(description) (parsed as URL) in code fences to prevent Documenter
from treating them as broken local links.
- Affected: assemble_fm_matrix BVP unknowns block, Phi_L/Phi_R
equations, and VacuumData plasma_pts/wall_pts field descriptions.
These were surfaced by the new @autodocs block in stability.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt logging Three targeted fixes from pre-merge code review: 1. Threads.@threads :static — since Julia 1.7, the default :dynamic scheduler can migrate tasks between OS threads mid-execution, making Threads.threadid() unreliable for indexing into odet_proxies. Using :static guarantees a 1:1 task-to-thread mapping for the parallel FM integration phase. 2. outer_chunk psi_end guard — the outer-plasma re-integration in parallel_eulerlagrange_integration now uses psilim*(1-eps) to match the guard applied by chunk_el_integration_bounds, avoiding a potential boundary evaluation at exactly psilim. 3. Replace println with @info/@warn — all verbose-mode output in Riccati.jl now uses Julia logging macros, consistent with EulerLagrange.jl. This enables log-level filtering and suppression in tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pre-merge cleanup (commits 1515591..725a527)Three rounds of fixes applied after a code review pass: CI fixes (1515591)
Documentation (142a79c)
Pre-merge code review fixes (725a527)Three issues flagged as must-fix or high-priority:
All 23 Riccati tests and 56 parallel FM tests pass locally after these changes. |
Summary
Implements two new integration paths for the Euler-Lagrange ODE and adds full Δ' (tearing stability parameter) computation to the Riccati and parallel FM paths.
Dual Riccati integration (
use_riccati = true): Sequential reformulation as a matrix Riccati ODE S = U₁·U₂⁻¹ with periodic renormalization to maintain bounded (U₁, U₂). Validated on Solovev (0.006% energy error) and DIIID n=1 (0.12% error). Faster than standard for large N.Parallel FM integration (
use_parallel = true): Each integration chunk is solved independently from identity ICs usingThreads.@threads, then assembled serially. Uses bidirectional integration — crossing chunks (near rational surfaces) are integrated backward to keep FM propagators well-conditioned. The well-conditioned backward FM inverse is applied via LU solve during serial assembly. Fixes a ~10% energy error for DIIID N=26 that existed with the all-forward approach.Per-surface Δ': Both paths compute
intr.sing[s].delta_prime(scalar) anddelta_prime_col(full N-vector of off-diagonal coupling). Written tosingular/delta_primeandsingular/delta_prime_colin HDF5.Inter-surface Δ' matrix (parallel FM only):
intr.delta_prime_matrix(2·msing × 2·msing) via the STRIDE global BVP [Glasser 2018 Phys. Plasmas 25, 032501]. Written tosingular/delta_prime_matrixin HDF5.Accuracy
The ~0.12% gap on DIIID is algorithmic (different crossing convention: Riccati-style vs Gaussian Reduction), not ODE tolerance. Both paths are physically correct; 0.12% is well within physical uncertainty.
Performance (4 threads)
Tests
runtests_riccati.jlruntests_parallel_integration.jlRiccati tests include unit tests for:
renormalize_riccati_inplace!,renormalize_riccati!, Riccati end state U₂≈I, Δ' regression values, delta_prime_col shape/diagonal consistency,riccati_der!formula (Glasser 2018 Eq. 19), andcompute_delta_prime_from_ca!bit-identical consistency.Parallel tests include:
apply_propagator!identity and linearity,apply_propagator_inverse!round-trip, chunk balance target count,directionfield propagation throughbidirectional=true, Solovev energy accuracy, DIIID energy accuracy (key bidirectional fix regression),ode_itime_costadditivity, and Δ' matrix regression for both Solovev and DIIID.Files changed
src/ForceFreeStates/ForceFreeStatesStructs.jlIntegrationChunk.direction,SingType.delta_prime_col,ForceFreeStatesInternal.delta_prime_matrix; removed deadparallel_threadsfieldsrc/ForceFreeStates/EulerLagrange.jlchunk_el_integration_bounds(bidirectional=false),balance_integration_chunksdirection propagationsrc/ForceFreeStates/Riccati.jlintegrate_propagator_chunk!bidirectional tspan;apply_propagator_inverse!; bidirectional serial assembly;compute_delta_prime_matrix!with Phi_L/Phi_R BVP;assemble_fm_matrixempty-range guard; single-resonance assertionsrc/GeneralizedPerturbedEquilibrium.jlsingular/delta_prime,delta_prime_col,delta_prime_matrixtest/runtests_riccati.jlriccati_der!formula test,compute_delta_prime_from_ca!testtest/runtests_parallel_integration.jlapply_propagator_inverse!, direction field, DIIID Δ' matrixbenchmarks/benchmark_threads.jlbenchmarks/benchmark_riccati_der.jlriccati_der!sanity checkbenchmarks/benchmark_delta_prime_methods.jlcompute_delta_prime_from_ca!sanity checkCode review responses
Fixed:
assemble_fm_matrixempty-range guard · single-resonance assertion incompute_delta_prime_matrix!· stale comment in parallel tests · deadparallel_threadsfield removedDeferred:
apply_propagator!per-call allocations · test boilerplate inruntests_parallel_integration.jl🤖 Generated with Claude Code