Skip to content

Architecture Overview

LiranOG edited this page May 9, 2026 · 15 revisions

🏗️ Architecture Overview

GRANITE v0.6.8 | ← AMR Design | Benchmarks & Validation →

This page documents every layer of the GRANITE engine — from the top-level data flow through each subsystem's internal design, memory layout, and coupling interfaces. Implementation details reference the source directly.


Table of Contents

  1. High-Level System Diagram
  2. Data Flow: From YAML to Waveform
  3. Core Data Structure: GridBlock
  4. The CCZ4 ↔ GRMHD Interface: GRMetric3
  5. CCZ4 Variable Layout — 22 Variables
  6. GRMHD Variable Layout — 9 Variables
  7. Subsystem Coupling Map
  8. The RK3 Time Loop — Step by Step
  9. Memory Architecture
  10. Namespace Structure

1. High-Level System Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                        GRANITE ENGINE                               │
│                                                                     │
│  params.yaml ──► Parameter Parser ──► Initial Data Generator        │
│                                            │                        │
│                               ┌────────────▼────────────┐           │
│                               │   AMR Grid Hierarchy    │           │
│                               │  (Berger-Oliger, ≤12 L) │           │
│                               └────────────┬────────────┘           │
│                                            │                        │
│              ┌─────────────────────────────▼──────────────────┐     │
│              │              SSP-RK3 Time Loop                 │     │
│              │                                                │     │
│              │  ┌──────────────────────────────────────────┐ │     │
│              │  │  RK3 Stage k (k = 1, 2, 3)               │ │     │
│              │  │                                          │ │     │
│              │  │  1. Ghost zone exchange (MPI_Isend/Irecv)│ │     │
│              │  │  2. Apply outer BCs (Sommerfeld / Copy)  │ │     │
│              │  │  3. CCZ4 RHS (4th-order FD + KO diss.)  │ │     │
│              │  │  4. GRMHD RHS (MP5/HLLD/CT)             │ │     │
│              │  │  5. M1 Radiation RHS  [built, not wired] │ │     │
│              │  │  6. SSP-RK3 combine                      │ │     │
│              │  │  7. Apply floors (χ, α)                  │ │     │
│              │  │  8. Enforce algebraic constraints        │ │     │
│              │  └──────────────────────────────────────────┘ │     │
│              │                         │                      │     │
│              │  ◄── AMR regrid (every N steps) [partial]     │     │
│              │  ◄── Ghost zone prolongation (coarse → fine)  │     │
│              │  ◄── Ghost zone restriction  (fine → coarse)  │     │
│              │  ◄── Diagnostics: ‖H‖₂, α_center, NaN scan   │     │
│              │  ◄── Checkpoint write (HDF5, every N steps)   │     │
│              │  ◄── GW extraction (Ψ₄) [infrastructure ready]│     │
│              └────────────────────────────────────────────────┘     │
│                                            │                        │
│                          Python post-processing                     │
│          (granite_analysis: sim_tracker CLI, gw, plotting)          │
└─────────────────────────────────────────────────────────────────────┘

2. Data Flow: From YAML to Waveform

Step 1: Input
  └─ params.yaml parsed by yaml-cpp
  └─ All parameters validated and stored in SimParams struct

Step 2: Initial Data
  └─ ID type selected: Brill-Lindquist / Bowen-York / Two-Punctures / TOV
  └─ Conformal factor computed on base grid
  └─ CCZ4 variables initialized (χ, γ̃_ij, K, Ã_ij, Γ̃^i, α, β^i, Θ)
  └─ GRMHD variables initialized (D, S_i, τ, B^i, Y_e)

Step 3: AMR Initialization
  └─ AMRHierarchy constructed with max_levels
  └─ Tracking spheres registered for each BH puncture
  └─ Initial refinement cascade to min_level around punctures
  └─ GridBlock patches created at each level
  └─ Level timesteps set: dt_ℓ = dt_0 / 2^ℓ

Step 4: Time Loop (t = 0 → t_final)
  └─ For each RK3 stage:
      └─ MPI ghost zone exchange (non-blocking Isend/Irecv)
      └─ Outer boundary conditions applied
      └─ CCZ4 RHS computed (OpenMP parallel, field-major loops)
      └─ GRMHD RHS computed (MP5 reconstruction → HLLD → CT)
      └─ Stage update: U^(k+1) = α_k U^n + β_k (U^(k) + dt L(U^(k)))
      └─ Floors applied: χ ∈ [1e-4, 1.5], α ≥ 1e-6
      └─ Algebraic constraints enforced: det(γ̃)=1, tr(Ã)=0
  └─ Every diagnostic_every steps:
      └─ Compute ‖H‖₂, ‖M^i‖₂ over entire domain
      └─ Scan for NaN events
      └─ Write telemetry line (parsed by `granite_analysis.cli.sim_tracker`)
  └─ Every checkpoint_every steps:
      └─ Write HDF5 checkpoint (parallel MPI-IO)

Step 5: Post-Processing
  └─ granite_analysis: load HDF5, plot ‖H‖₂, extract Ψ₄
  └─ gw.py: strain recovery, spherical harmonic decomposition

3. Core Data Structure: GridBlock

GridBlock is the fundamental compute unit — a single uniform Cartesian patch covering a rectangular region of the simulation domain, including ghost zones.

3.1 Memory Layout

// Field-major, flat contiguous allocation
// data_[var * stride + flat(i,j,k)]
//
// stride = (nx + 2*nghost) * (ny + 2*nghost) * (nz + 2*nghost)
// flat(i,j,k) = (i + nghost) * ny_stride
//             + (j + nghost) * nz_stride
//             + (k + nghost)
//
// i ∈ [-nghost, nx+nghost-1]  (negative = ghost)
// j ∈ [-nghost, ny+nghost-1]
// k ∈ [-nghost, nz+nghost-1]

class GridBlock {
    std::vector<Real> data_;   // SINGLE flat allocation — never vector<vector<>>
    int nx_, ny_, nz_;         // Interior cell counts
    int nghost_ = 4;           // 4 ghost zones for 4th-order FD
    int nvar_;                 // Number of evolved variables
    Real dx_, dy_, dz_;        // Cell spacing
    Real x0_, y0_, z0_;        // Physical origin (ghost-inclusive)
};

3.2 Why Field-Major?

// FIELD-MAJOR (our layout):
// data_[var=0, all cells], data_[var=1, all cells], ...
//
// RHS inner loop: for(var...) rhs(var,i,j,k) += ...
// → consecutive var writes are 8 bytes apart (adjacent in memory)
// → L1 cache hit rate: ~99%

// CELL-MAJOR (wrong for our use case):
// data_[i,j,k, all vars], ...
//
// RHS inner loop: for(var...) rhs(var,i,j,k) += ...
// → each var write jumps by stride_ words
// → TLB pressure for large grids

3.3 Access Pattern

// Canonical accessor — use this everywhere
inline Real& GridBlock::at(int var, int i, int j, int k) {
    return data_[var * stride_
                 + (i + nghost_) * ny_stride_
                 + (j + nghost_) * nz_stride_
                 + (k + nghost_)];
}
// i,j,k are interior-relative: 0 → nx-1 are valid interior indices
// i = -1 is the first ghost layer toward -x
// i = -nghost is the outermost ghost layer

3.4 Ghost Zone Layout (2D cross-section)

nghost=4:

[g][g][g][g] | [i][i][i]...[i] | [g][g][g][g]
 ← ghosts →     ← interior →     ← ghosts →
     4               nx              4

Ghost zones filled by:
  - MPI exchange: same-level neighbors on different ranks
  - Prolongation: from parent AMR level (coarser)
  - Boundary conditions: outermost ghosts (Sommerfeld / copy)

3.5 Key Design Decisions (Immutable)

Decision Rationale Reference
Field-major layout RHS loop vectorization: all cells of one variable are contiguous Fix H2
Single flat vector<Real> No pointer-chasing; enables SIMD; 1 malloc per block Fix H2
nghost = 4 always 4th-order FD requires 2 ghost layers; 4 provides safety margin for reconstruction Architecture
Spatial outer, var inner in RHS Matches field-major memory order; eliminates cache thrashing Fix H3

4. The CCZ4 ↔ GRMHD Interface: GRMetric3

GRMetric3 is the only permitted interface between the spacetime evolution module and the GRMHD matter module. Direct access to CCZ4 internal variables from GRMHD code is forbidden.

// include/granite/grmhd/grmhd.hpp
namespace granite::grmhd {

struct GRMetric3 {
    // Conformal decomposition: g_ij = χ^{-1} * γ̃_ij
    Real chi;                // Conformal factor (χ)
    Real gamma_tilde[3][3];  // Conformal spatial metric (γ̃_ij)
    Real K;                  // Trace of extrinsic curvature
    Real A_tilde[3][3];      // Traceless conformal extrinsic curvature (Ã_ij)

    // Gauge quantities
    Real alpha;              // Lapse (α)
    Real beta[3];            // Shift vector (β^i)
    Real Gamma_tilde[3];     // Contracted conformal Christoffel (Γ̃^i)

    // Derived (computed on demand from above)
    Real gamma[3][3];        // Physical spatial metric (γ_ij = γ̃_ij / χ)
    Real gamma_inv[3][3];    // Inverse physical metric (γ^ij)
    Real sqrt_gamma;         // √(det γ)
    Real K_ij[3][3];         // Full extrinsic curvature

    // Factory: flat Minkowski spacetime (for tests)
    static GRMetric3 flat() {
        GRMetric3 g{};
        g.chi = 1.0; g.alpha = 1.0; g.sqrt_gamma = 1.0;
        g.gamma_tilde[0][0] = g.gamma_tilde[1][1] = g.gamma_tilde[2][2] = 1.0;
        g.gamma[0][0] = g.gamma[1][1] = g.gamma[2][2] = 1.0;
        g.gamma_inv[0][0] = g.gamma_inv[1][1] = g.gamma_inv[2][2] = 1.0;
        return g;
    }
};

} // namespace granite::grmhd

Coupling rule: At each RK3 stage, after the CCZ4 RHS is computed, readMetric() extracts a GRMetric3 from the current CCZ4 GridBlock and passes it to computeHLLEFlux() / computeHLLDFlux(). The GRMHD solver never reads raw CCZ4 variables directly.


5. CCZ4 Variable Layout — 22 Variables

Index Symbol Name Physical Meaning
0 χ chi Conformal factor — encodes the volume element of the physical metric
1 γ̃_xx gtxx Conformal metric xx-component
2 γ̃_xy gtxy Conformal metric xy-component (symmetric)
3 γ̃_xz gtxz Conformal metric xz-component
4 γ̃_yy gtyy Conformal metric yy-component
5 γ̃_yz gtyz Conformal metric yz-component
6 γ̃_zz gtzz Conformal metric zz-component
7 K K Trace of extrinsic curvature — expansion of spatial slices
8 Ã_xx Atxx Traceless conformal extrinsic curvature xx
9 Ã_xy Atxy Traceless conformal extrinsic curvature xy
10 Ã_xz Atxz Traceless conformal extrinsic curvature xz
11 Ã_yy Atyy Traceless conformal extrinsic curvature yy
12 Ã_yz Atyz Traceless conformal extrinsic curvature yz
13 Ã_zz Atzz Traceless conformal extrinsic curvature zz
14 Γ̃^x Ghatx Contracted conformal Christoffel — x-component
15 Γ̃^y Ghaty Contracted conformal Christoffel — y-component
16 Γ̃^z Ghatz Contracted conformal Christoffel — z-component
17 α alpha Lapse — rate of proper time vs. coordinate time
18 β^x betax Shift vector — x-component
19 β^y betay Shift vector — y-component
20 β^z betaz Shift vector — z-component
21 Θ Theta CCZ4 constraint variable — actively damped

Algebraic constraints (enforced after each full RK3 step):

  • det(γ̃) = 1 — conformal metric normalization
  • tr(Ã) = 0 — traceless condition on conformal extrinsic curvature

6. GRMHD Variable Layout — 9 Variables

Index Symbol Name Physical Meaning
0 D dens Conserved baryon number density: D = ρ W √γ
1 S_x momx Conserved x-momentum density
2 S_y momy Conserved y-momentum density
3 S_z momz Conserved z-momentum density
4 τ ener Conserved energy density (excluding rest mass)
5 B^x Bx Magnetic field x-component (staggered on x-faces for CT)
6 B^y By Magnetic field y-component (staggered on y-faces for CT)
7 B^z Bz Magnetic field z-component (staggered on z-faces for CT)
8 D·Y_e DYe Conserved electron fraction (advected passively: D·Y_e)

Primitive variables (recovered from conserved at each RK3 stage via Newton-Raphson C2P):

Symbol Name Description
ρ rho Rest-mass density
v^i vx, vy, vz 3-velocity
ε eps Specific internal energy
P press Thermal pressure (from EOS)
T temp Temperature (from EOS inversion)
Y_e Ye Electron fraction
W Lorentz factor: W = 1/√(1 - v²)

7. Subsystem Coupling Map

┌─────────────────────────────────────────────────────────────────┐
│                     Subsystem Interfaces                        │
│                                                                 │
│  CCZ4Solver ──────────────► GRMetric3 ──────────────► GRMHD    │
│     (spacetime)           (read-only copy             (matter)  │
│                            of metric at cell)                   │
│                                                                 │
│  GRMHDEvolution ──────────► StressEnergyTensor3 ──► CCZ4       │
│     (matter)               (T_μν source terms       (spacetime) │
│                             in CCZ4 RHS)                        │
│                                                                 │
│  M1Radiation ─────────────► Source terms ──────────► GRMHD     │
│     (radiation)            [built, NOT wired         (matter)   │
│                             in v0.6.8]                          │
│                                                                 │
│  AMRHierarchy ─────────────► GridBlock* ──────────── All        │
│     (grid mgmt)            (provides patches         subsystems │
│                             to all subsystems)                   │
│                                                                 │
│  ConstraintMonitor ────────► DiagnosticOutput ─────► Dashboard  │
│     (observables)          (‖H‖₂, ‖M^i‖₂, α_center   (Python) │
│                             NaN scan, GW extraction)            │
└─────────────────────────────────────────────────────────────────┘

Interface rules:

From To Interface Rule
CCZ4 GRMHD GRMetric3 ONLY interface — no direct CCZ4 var access from GRMHD
GRMHD CCZ4 StressEnergyTensor3 Source terms in ∂_t K and ∂_t Ã_ij equations
M1 GRMHD Source terms Built but not called in v0.6.8 RK3 loop
AMR All GridBlock hierarchy All physics operates on GridBlock* patches
Diagnostics Output Telemetry lines Parsed by granite_analysis.cli.sim_tracker via regex

8. The RK3 Time Loop — Step by Step

The Shu-Osher SSP-RK3 scheme evolves all variables simultaneously:

U^(1) = U^n + Δt · L(U^n)
U^(2) = (3/4) U^n + (1/4) [U^(1) + Δt · L(U^(1))]
U^(n+1) = (1/3) U^n + (2/3) [U^(2) + Δt · L(U^(2))]

For each stage k, L(U^(k)) is computed as follows:

Stage k computation (src/core/evolution_loop.cpp):

1. syncGhostZones(level)
   ├── MPI_Isend ghost data to ±x, ±y, ±z neighbors (non-blocking)
   ├── MPI_Irecv ghost data from ±x, ±y, ±z neighbors (non-blocking)
   ├── Prolongate ghost cells from parent AMR level
   └── MPI_Waitall (overlap with local computation)

2. applyOuterBC(level)
   ├── Sommerfeld (Two-Punctures data): f_ghost = f∞ + (f_int - f∞)·r_int/r_ghost
   └── Copy (BL data): f_ghost = f_interior

3. computeCCZ4RHS(state, rhs)
   ├── 4th-order centered FD derivatives (d1, d2)
   ├── Chi-weighted selective advection (d1up near-field, d1 near puncture)
   ├── Ricci tensor components
   ├── Γ̃^i evolution (T7 term centered always)
   ├── KO dissipation (single OpenMP region, all 22 vars)
   └── Stress-energy coupling (T_μν from GRMHD primitives)

4. computeGRMHDRHS(state, rhs, metric)
   ├── readMetric() → GRMetric3 at each cell
   ├── MP5/PPM/PLM reconstruction → left/right interface states
   ├── HLLE or HLLD Riemann solver (with actual GRMetric3)
   ├── Flux differencing: RHS += (F_{i+1/2} - F_{i-1/2}) / dx
   ├── Constrained Transport: CT update for B^i
   └── conservedToPrimitive (Newton-Raphson C2P)

5. applyFloors(state)
   ├── chi: clamp to [1e-4, 1.5]  (NaN-safe: uses !isfinite check)
   └── alpha: clamp to [1e-6, ∞)

6. SSP-RK3 combine (stage-dependent coefficients)

[After all 3 stages complete:]
7. applyAlgebraicConstraints(state)
   ├── Enforce det(γ̃) = 1
   └── Enforce tr(Ã) = 0

9. Memory Architecture

9.1 Per-Simulation Memory Estimate

Memory per GridBlock = nvar × (nx+8) × (ny+8) × (nz+8) × 8 bytes
where nghost=4, so 8 cells of padding per dimension

CCZ4:  22 vars
GRMHD:  9 vars
RHS buffers: 31 vars (copies for each RK3 stage)
Total vars: ~93 per GridBlock (with 3 RK3 stage buffers)

For 128³ base grid, 4 AMR levels:
Level 0: 128³ = 2,097,152 cells × 93 vars × 8 bytes ≈ 1.56 GB
Level 1: 64³ ≈ 0.20 GB
Level 2: 32³ ≈ 0.025 GB
Level 3: 16³ ≈ 0.003 GB
Total: ~1.8 GB (fits in 16 GB desktop RAM with headroom)

9.2 Scaling Table

Configuration AMR Levels RAM Estimate
64³ (development) 4 ~0.5 GB
128³ (desktop) 4 ~2 GB
256³ (workstation) 6 ~16 GB
512³ (cluster) 8 ~128 GB
B5_star production 12 ~2 TB

10. Namespace Structure

granite::                       # Top-level namespace
├── GridBlock                   # core/grid.hpp
├── Real                        # core/types.hpp (= double)
├── NUM_SPACETIME_VARS = 22     # core/types.hpp
├── NUM_HYDRO_VARS = 9          # core/types.hpp
├── spacetime::                 # Spacetime subsystem
│   ├── CCZ4Solver              # CCZ4 evolution equations
│   ├── GaugeDriver             # 1+log + Gamma-driver
│   └── ConstraintMonitor       # ‖H‖₂, ‖M^i‖₂
├── grmhd::                     # GRMHD subsystem
│   ├── GRMetric3               # CCZ4 ↔ GRMHD interface
│   ├── GRMHDEvolution          # Valencia formulation
│   ├── EquationOfState         # EOS base class
│   ├── IdealGasEOS             # Γ-law EOS
│   └── TabulatedEOS            # Nuclear EOS with NR inversion
├── amr::                       # AMR subsystem
│   └── AMRHierarchy            # Berger-Oliger grid hierarchy
├── initial_data::              # Initial data generators
│   ├── BrillLindquistID        # Conformal factor sum
│   ├── BowenYorkID             # Extrinsic curvature
│   ├── TwoPuncturesBBH         # Spectral BBH initial data
│   └── TOVSolver               # Neutron star initial data
├── radiation::                 # Radiation transport
│   ├── M1Closure               # Moment equations (grey)
│   └── NeutrinoLeakage         # Leakage scheme
└── postprocess::               # Observables
    ├── Psi4Extractor           # GW extraction
    └── ApparentHorizonFinder   # AH finder

See also: Physics Formulations | Parameter Reference | Developer Guide

Clone this wiki locally