Known Fixed Bugs

🐛 Known Fixed Bugs

GRANITE v0.6.8 | ← Initial Data | Parameter Reference →

Authoritative record of all confirmed fixed bugs. Never re-introduce any item on this list.
Cross-referenced with CHANGELOG.md for full technical details.

⚠️ Standing Order

Every PR that touches any of the files listed in this table must explicitly verify that the corresponding fix is still in place. Include in your PR description: "Verified fix [ID] is present: [how you verified it]."

Complete Bug Registry

C1 — EOS Sound Speed: Hardcoded γ=1

Field	Value
ID	C1
Severity	CRITICAL — Physics wrong for all non-trivial EOS
File	`src/grmhd/grmhd.cpp` — `maxWavespeeds()`
Phase Fixed	Phase 4D
Commit	Phase 4D session

What was wrong:
The fast magnetosonic wavespeed computation used cs² = p / (ρh) — the ideal gas formula for Γ=1 — regardless of the actual EOS. For IdealGasEOS(Γ=5/3), the correct value is cs² = Γ·p/(ρh), giving a wavespeed underestimate of factor √(5/3) ≈ 1.29. For stiff nuclear EOS (Γ≈2.5), the underestimate reached √2.5 ≈ 1.58.

Consequence: Underestimated wavespeeds → HLLE stencil narrower than the true light cone → subcritical scheme → potential CFL violation in production simulations.

The fix:

// BEFORE (wrong):
const Real cs2 = press / (rho * h + 1.0e-30);  // Hardcoded Γ=1

// AFTER (correct):
const Real cs_raw = eos.soundSpeed(rho, eps, press);  // Exact from EOS
const Real cs = std::clamp(cs_raw, 0.0, 1.0 - 1.0e-8);
const Real cs2 = cs * cs;

Verification test: GRMHDGRTest.HLLEFluxScalesWithLapse — passes when wavespeeds are correct.

C3 — HLLE Flat-Metric Hard-Code

Field	Value
ID	C3
Severity	CRITICAL — All previous simulations used SR not GR fluxes
File	`src/grmhd/grmhd.cpp` — `computeHLLEFlux()`
Phase Fixed	Phase 4D

What was wrong:
computeHLLEFlux() internally constructed a dummy flat Minkowski metric:

Metric3 g; g.alpha=1; g.betax=0; g.betay=0; g.betaz=0; g.sqrtg=1; // FLAT!

This meant the HLLE solver computed SR (special-relativistic) fluxes regardless of the actual spacetime curvature. Near a neutron star surface (α≈0.8, √γ≈1.2), the flux magnitude was wrong by α·√γ ≈ 0.96 — a ~4% systematic error.

Additionally, the wavespeed formula was wrong: λ± = α·v_pm - α (subtracted lapse instead of shift).

The fix: Promoted Metric3 to public GRMetric3 struct. Changed computeHLLEFlux() signature to accept const GRMetric3& g. Corrected wavespeed formula:

// AFTER (correct GR formula):
const Real beta_dir = (dir==0)?g.betax:(dir==1)?g.betay:g.betaz;
return {g.alpha * vL - beta_dir, g.alpha * vR - beta_dir};

Architectural consequence: GRMetric3 is now the mandatory, public-only interface between CCZ4 and GRMHD. This architectural rule must be maintained.

H1 — KO Dissipation: 22 Separate OpenMP Regions

Field	Value
ID	H1
Severity	HIGH — ~30% of compute time wasted; potential race conditions
File	`src/spacetime/ccz4.cpp` — `applyDissipation()`
Phase Fixed	Phase 4D

What was wrong:

for (int var = 0; var < 22; ++var) {         // Outer loop: 22 vars
    #pragma omp parallel for collapse(2)      // ← 22 separate thread spawns!
    for (int k ...) for (int j ...) for (int i ...)
        rhs.data(var, i, j, k) += ko_term;
}

This spawned and joined the OpenMP thread pool 22 times per RHS call. Each cycle: ~50 μs overhead × 22 × 10⁶ RHS calls ≈ 1,100 seconds wasted per long run (~30% of total compute time).

The fix: Single parallel region, variable loop innermost:

#pragma omp parallel for collapse(2) schedule(static)  // ONE thread spawn
for (int k ...) for (int j ...) {
    Real diss[22] = {};  // Stack-allocated, L1 cache resident
    for (int i ...) {
        for (int var = 0; var < 22; ++var)
            rhs.data(var, i, j, k) += ko_compute(var, i, j, k);
    }
}

Verification test: CCZ4FlatTest.KODissipationIsZeroForFlatSpacetime — confirms zero dissipation on constant data with the new loop structure.

H2 — GridBlock: vector-of-vectors Allocation

Field	Value
ID	H2
Severity	HIGH — Cache thrashing, no SIMD, 22 heap allocations per block
File	`include/granite/core/grid.hpp`, `src/core/grid.cpp`
Phase Fixed	Phase 5

What was wrong:

std::vector<std::vector<Real>> data_;  // 22 separate heap allocations!
// Each var's data at a different memory address
// Cache miss on every var access in KO dissipation accumulator

The fix:

std::vector<Real> data_;  // SINGLE flat allocation
// Indexed as: data_[var * stride + flat(i,j,k)]
// All vars of any given cell are stride_ words apart in memory
// All cells of any given var are 8 bytes apart (SIMD-friendly)

22 → 1 heap allocations per GridBlock. All data() accessor calls produce identical results — zero behavioral change to callers.

Verification tests: GridBlockFlatLayoutTest (4 tests): FlatBufferContiguity, RawDataEquivalentToAccessor, FlatBufferSpansAllVars, MultiVarWriteReadBack.

H3 — RHS Zero-Out Loop Order

Field	Value
ID	H3
Severity	HIGH — Cache thrashing in the most frequently executed code path
File	`src/grmhd/grmhd.cpp` — `GRMHDEvolution::computeRHS()`
Phase Fixed	Phase 5

What was wrong:

// WRONG — var outermost causes stride_ jumps between writes:
for (int var = 0; var < 9; ++var)
    for (int k ...) for (int j ...) for (int i ...)
        rhs.data(var, i, j, k) = 0.0;
// Consecutive writes: stride_ words apart → TLB pressure on large grids

The fix:

// CORRECT — spatial outermost, var innermost = sequential writes:
for (int k ...) for (int j ...) for (int i ...)
    for (int var = 0; var < 9; ++var)
        rhs.data(var, i, j, k) = 0.0;
// Consecutive writes: 8 bytes apart (one double) → L1 cache hits

Rule: Spatial loops must always be outer; variable loop must always be inner. This must not be changed.

TOV — TOV Solver RSUN_CGS Conversion

Field	Value
ID	TOV
Severity	CRITICAL — TOV solver produced nonsensical results
File	`src/initial_data/initial_data.cpp`
Phase Fixed	Phase 4A/4C

What was wrong:

const Real r_cgs = r_km * RSUN_CGS;  // RSUN_CGS ≈ 6.957e10 cm  ← WRONG!
// Used solar radius instead of km→cm conversion factor
// Off by factor: 6.957e10 / 1e5 = ~700,000
// Step sizes were astronomically large; solver diverged on first step

The fix:

const Real r_cgs = r_km * 1.0e5;  // 1 km = 1e5 cm ← CORRECT

Verification: PolytropeTest.TOVSolver — M ≈ 1.4 M☉, R ≈ 10 km for standard polytrope.

KO-σ — Over-Dissipation at ko_sigma = 0.35

Field	Value
ID	KO-σ
Severity	HIGH — Destroys trumpet gauge profile silently
File	`benchmarks/B2_eq/params.yaml`
Phase Fixed	v0.6.5 (reverted B2_eq config)

What happened:
The B2_eq benchmark YAML was configured with ko_sigma: 0.35 (from Phase 7 experimentation with aggressive gauge damping). This value over-dissipates the trumpet gauge profile: the high-frequency noise is suppressed but so is the physically important steep gradient that forms the trumpet geometry. Secondary gauge collapse occurs without NaN events, making the run appear stable while the physics is unphysical.

The fix: ko_sigma: 0.35 → ko_sigma: 0.1 in all benchmark YAML files.

Rule: ko_sigma > 0.15 is forbidden in all production configurations. Default and enforced value: 0.1.

Sommerfeld+BL — Incompatible BC + Initial Data

Field	Value
ID	Sommerfeld+BL
Severity	HIGH — 8× higher constraint violations from step 1
File	Configuration / user error
Phase Fixed	Phase 6 (documented; user responsibility)

Root cause: Sommerfeld BCs assume a spherical 1/r outgoing wave profile. Brill-Lindquist initial data has a conformal factor ψ = 1 + Σ(m/2r) which does NOT match this profile. The BC formula therefore introduces artificial metric gradients from t=0 that feed into the Ricci tensor.

Evidence (Phase 6 experiment):

Copy BC at t=5.625M: ‖H‖₂ = 0.276
Sommerfeld BC at t=5.625M: ‖H‖₂ = 2.257 (8.2× worse)

Rule: Use boundary.type: copy with initial_data.type: brill_lindquist. This is non-negotiable.

Domain — Boundary Gauge Reflections on Small Domains

Field	Value
ID	Domain
Severity	HIGH — Constraint explosion at predictable time
File	`benchmarks/*/params.yaml` — domain.size parameter
Phase Fixed	Phase 6 (documented) / Phase 7 (domain expansion to ±48M)

Physics:
Gauge waves in 1+log slicing propagate at speed c_gauge = √2 ≈ 1.41 M/M. A wave emitted at t=0 from r=0 reaches the outer boundary at t = domain_size / √2. Upon reflection, constraint violations grow exponentially to NaN in O(1) output window.

Minimum safe domain sizes:

Single puncture (t=500M): ±48M (t_reflection ≈ 33.9M; stable to t=500M confirmed)
BBH inspiral (t=500M): ±200M (t_reflection ≈ 141M; validated in B2_eq benchmark)

alpha_center — Diagnostic Bug: AMR Level 0 Sampling

Field	Value
ID	alpha_center
Severity	MEDIUM — Misleading diagnostic output
File	`src/core/evolution_loop.cpp` — `alpha_center` diagnostic
Status	Known but NOT YET FIXED — v0.7 target

What is wrong:
The alpha_center diagnostic reads the lapse from AMR level 0 (the coarsest grid level) at the grid point nearest to the domain center. At 64³ base resolution with dx=6.25M, this cell center is at approximately r ≈ 0.65·dx ≈ 4M from the origin — not at the puncture.

Consequence: The displayed α_center ≈ 0.97 in B2_eq runs is the lapse at r≈4M from the puncture, where the spacetime is nearly flat. The actual puncture lapse (r→0) should be ~0.03 in a resolved trumpet.

Fix required: Sample α from the finest AMR level at the tracking sphere center.

Known Fixed Bugs

🐛 Known Fixed Bugs

⚠️ Standing Order

Complete Bug Registry

C1 — EOS Sound Speed: Hardcoded γ=1

C3 — HLLE Flat-Metric Hard-Code

H1 — KO Dissipation: 22 Separate OpenMP Regions

H2 — GridBlock: vector-of-vectors Allocation

H3 — RHS Zero-Out Loop Order

TOV — TOV Solver RSUN_CGS Conversion

KO-σ — Over-Dissipation at ko_sigma = 0.35

Sommerfeld+BL — Incompatible BC + Initial Data

Domain — Boundary Gauge Reflections on Small Domains

alpha_center — Diagnostic Bug: AMR Level 0 Sampling

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

🌌 GRANITE Wiki

🚀 Getting Started

🔬 Physics & Science

🖥️ Running Simulations

👨‍💻 Development

📚 Reference

Clone this wiki locally