HPC Deployment
LiranOG edited this page May 9, 2026 · 8 revisions
GRANITE v0.6.8 | ← Gravitational Wave Extraction | Initial Data →
# Always run before any simulation
python3 scripts/health_check.py

Verifies: Release build flags (-O3 -march=native), OMP thread count, available RAM, HDF5 parallel mode.
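Two of these checks (thread count and available RAM) can be illustrated with a minimal sketch. This is not the contents of scripts/health_check.py: the function names `parse_mem_available` and `check_threads` are hypothetical, and the memory check assumes a Linux-style /proc/meminfo.

```python
import os


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   12345678 kB"
            return int(line.split()[1]) * 1024
    return None


def check_threads(env, logical_cpus):
    """Return a list of warnings about the OMP_NUM_THREADS setting (hypothetical check)."""
    value = env.get("OMP_NUM_THREADS")
    if value is None:
        return ["OMP_NUM_THREADS is not set"]
    # OMP_NUM_THREADS may be a comma-separated list for nested parallelism;
    # only the outermost level is inspected here.
    first = value.split(",")[0].strip()
    try:
        threads = int(first)
    except ValueError:
        threads = 0
    if threads < 1:
        return [f"OMP_NUM_THREADS={value!r} is not a positive integer"]
    if logical_cpus is not None and threads > logical_cpus:
        return [f"OMP_NUM_THREADS={threads} exceeds {logical_cpus} logical CPUs"]
    return []


if __name__ == "__main__":
    for message in check_threads(os.environ, os.cpu_count()):
        print("WARNING:", message)
    try:
        with open("/proc/meminfo") as handle:
            available = parse_mem_available(handle.read())
    except OSError:
        available = None  # Not Linux, or /proc is unavailable
    if available is not None:
        print(f"Available RAM: {available / 2**30:.1f} GiB")
```

Keeping the checks as pure functions that take the environment and the meminfo text as arguments makes them easy to test without touching the host system.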
| Configuration | AMR Levels | RAM (estimated) |
|---|---|---|
| 64³, 4 levels (desktop) | 4 | ~0.5 GB |
| 128³, 4 levels (desktop) | 4 | ~4-6 GB |
| 256³, 6 levels (workstation) | 6 | ~16 GB |
| 512³, 8 levels (cluster) | 8 | ~128 GB |
| B5_star, 12 levels | 12 | ~2 TB |
Formula:
RAM ≈ nvar_total × nx³ × AMR_factor × 8 bytes
nvar_total = (22 (CCZ4) + 9 (GRMHD)) × 3 (RK3 buffers) = 31 × 3 = 93
AMR_factor ≈ 1.14 in total over all levels (geometric series: Σ (1/8)^ℓ = 8/7)
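The formula can be evaluated directly, as in the sketch below. The function names are hypothetical, and the code assumes that level ℓ contributes (1/8)^ℓ of the base-grid storage, as the geometric series in the formula suggests. The formula counts field storage only, so its results for the smaller grids come out below the table's figures, which may include overheads not modelled here.

```python
def amr_factor(levels):
    """Partial geometric sum of (1/8)**l for l = 0 .. levels-1 (tends to 8/7)."""
    return sum((1.0 / 8.0) ** level for level in range(levels))


def estimate_ram_bytes(nx, levels, n_evolved=31, n_buffers=3, bytes_per_value=8):
    """Estimate field storage in bytes for an nx^3 base grid.

    n_evolved = 22 (CCZ4) + 9 (GRMHD); n_buffers = 3 (RK3 buffers);
    bytes_per_value = 8 for double precision.
    """
    nvar_total = n_evolved * n_buffers  # 93 with the defaults
    return nvar_total * nx**3 * amr_factor(levels) * bytes_per_value


if __name__ == "__main__":
    for nx, levels in [(64, 4), (128, 4), (256, 6), (512, 8)]:
        gigabytes = estimate_ram_bytes(nx, levels) / 1e9
        print(f"{nx}^3, {levels} levels: {gigabytes:.2f} GB")
```

Using the partial sum rather than the limiting value 8/7 makes almost no difference beyond three or four levels, which is why a single factor of about 1.14 is a reasonable shorthand.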
# Recommended: use all physical cores (not hyperthreads)
# nproc --all counts logical CPUs, so count unique (core, socket) pairs instead
export OMP_NUM_THREADS=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l) # Linux
export OMP_PROC_BIND=close # Bind threads to nearby cores
export OMP_PLACES=cores # Core-level granularity

For NUMA systems (multi-socket nodes):
numactl --interleave=all python3 scripts/run_granite_hpc.py ...

python3 scripts/run_granite_hpc.py \
build/bin/granite_main \
benchmarks/B2_eq/params.yaml \
--omp-threads 32 \
--mpi-ranks 128 \
--disable-numa-bind \
--amr-telemetry-file /scratch/$USER/amr_B2eq.jsonl

SLURM auto-generation (produces jobs/submit_granite.sbatch):
python3 scripts/run_granite_hpc.py \
build/bin/granite_main \
benchmarks/B2_eq/params.yaml \
--slurm \
--mpi-ranks 128 \
--omp-threads 8

#!/bin/bash
#SBATCH --job-name=granite_B2eq
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
#SBATCH --time=24:00:00
#SBATCH --partition=compute
module load gcc/11 openmpi/4.1 hdf5/1.12-parallel
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=close
srun python3 scripts/run_granite_hpc.py \
build/bin/granite_main \
benchmarks/B2_eq/params.yaml \
--omp-threads $SLURM_CPUS_PER_TASK \
--mpi-ranks $SLURM_NTASKS

io:
  hdf5_stripe_count: 16 # Match to number of storage targets
  hdf5_stripe_size: 4194304 # 4 MB (optimal for large field arrays)
  collective_io: true # MPI-IO collective mode

Set Lustre striping on the output directory:
lfs setstripe -c 16 -S 4M /scratch/$USER/granite_output/

docker build -f containers/Dockerfile -t granite:v0.6.8 .
docker run --rm -it -v $(pwd)/output:/output granite:v0.6.8 \
build/bin/granite_main benchmarks/B2_eq/params.yaml

singularity build granite.sif containers/granite.def
singularity run --bind /scratch/$USER:/output granite.sif \
build/bin/granite_main benchmarks/B2_eq/params.yaml

| Phase | Hardware | Configuration | Projected Throughput |
|---|---|---|---|
| v0.6.8 (current) | i5-8400, GTX 1050 Ti | 64³ CPU | 0.084 M/s |
| v0.7 GPU kernels | vast.ai H100 SXM | 256³ GPU | ~50 M/s (projected) |
| v0.8 production | Cluster H100 × 8 | 512³ GPU | ~400 M/s (projected) |
| v1.0 B5_star | Tier-0 cluster | 12 AMR levels | Exascale-class |
Note: The GTX 1050 Ti (development desktop) is NOT viable for FP64 GPU compute. GPU production runs target H100 SXM instances via vast.ai after GPU kernel porting is complete in v0.7.
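To put the projections in proportion, the sketch below computes the speedup of each projected figure over the current one. The page does not define the unit "M/s"; the `seconds_per_step` helper assumes it means millions of grid-cell updates per second, which is an assumption and not something stated here. All function names are hypothetical.

```python
CURRENT_MPS = 0.084  # v0.6.8 figure from the table (64³ CPU)
PROJECTED_MPS = {
    "v0.7 (256³ GPU)": 50.0,   # projected
    "v0.8 (512³ GPU)": 400.0,  # projected
}


def speedup(projected_mps, current_mps=CURRENT_MPS):
    """Ratio of a projected throughput to the current one (unit-independent)."""
    return projected_mps / current_mps


def seconds_per_step(nx, throughput_mps):
    """Time to update every cell of an nx^3 grid once.

    ASSUMPTION: throughput is in millions of cell updates per second.
    """
    return nx**3 / (throughput_mps * 1e6)


if __name__ == "__main__":
    for phase, mps in PROJECTED_MPS.items():
        print(f"{phase}: {speedup(mps):.0f}x the current throughput")
    print(f"64³ at current rate: {seconds_per_step(64, CURRENT_MPS):.2f} s per step")
```

The speedup ratios hold whatever the unit turns out to be; the per-step time is only meaningful if the unit assumption is correct.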
See also: Benchmarks & Validation | Developer Guide