HPC Deployment

GRANITE v0.6.8

1. Pre-Flight Checklist

# Always run before any simulation
python3 scripts/health_check.py

The script verifies the Release build flags (-O3 -march=native), the OMP thread count, the available RAM, and HDF5 parallel mode.
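To illustrate the kind of checks listed above, here is a minimal sketch of two of them (thread count and RAM). It is not the contents of scripts/health_check.py; the function names `check_omp_threads`, `available_ram_bytes`, and `check_ram` are hypothetical, and the RAM query assumes a Linux host.

```python
import os


def check_omp_threads(env, logical_cpus):
    """Return (ok, message) for the OMP_NUM_THREADS value found in env.

    Nested lists such as "4,2" are reported as invalid in this sketch.
    """
    raw = env.get("OMP_NUM_THREADS")
    if raw is None:
        return False, "OMP_NUM_THREADS is not set"
    try:
        threads = int(raw)
    except ValueError:
        return False, f"OMP_NUM_THREADS={raw!r} is not an integer"
    if threads < 1 or threads > logical_cpus:
        return False, f"OMP_NUM_THREADS={threads} is outside 1..{logical_cpus}"
    return True, f"OMP_NUM_THREADS={threads}"


def available_ram_bytes():
    """Currently available physical memory in bytes (Linux sysconf names)."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def check_ram(required_bytes, available_bytes):
    """Return (ok, message) comparing a RAM requirement with what is free."""
    ok = available_bytes >= required_bytes
    verdict = "enough" if ok else "NOT enough"
    return ok, (
        f"{verdict} RAM: need {required_bytes / 1e9:.1f} GB, "
        f"have {available_bytes / 1e9:.1f} GB"
    )
```

A caller would pass `os.environ` and `os.cpu_count()` to `check_omp_threads`, and `available_ram_bytes()` to `check_ram`, then abort the launch if either check returns `False`.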

2. Memory Requirements

Configuration	AMR Levels	RAM (estimated)
64³, 4 levels (desktop)	4	~0.5 GB
128³, 4 levels (desktop)	4	~4-6 GB
256³, 6 levels (workstation)	6	~16 GB
512³, 8 levels (cluster)	8	~128 GB
B5_star, 12 levels	12	~2 TB

Formula:

RAM ≈ nvar_total × nx³ × AMR_factor × 8 bytes
nvar_total = (22 (CCZ4) + 9 (GRMHD)) × 3 (RK3 buffers) = 31 × 3 = 93
AMR_factor ≈ 8/7 ≈ 1.14 summed over all levels (geometric series: Σ_ℓ (1/8)^ℓ)
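The formula can be evaluated directly. The sketch below assumes the variable counts stated above (31 evolved variables, 3 RK3 buffers, 8-byte doubles) and sums the geometric series over a finite number of levels; the function name `estimate_ram_bytes` is hypothetical. The raw formula gives about 0.22 GB for 64³ and about 14 GB for 256³, somewhat below the table's figures, so treat the result as a lower bound.

```python
def estimate_ram_bytes(nx, amr_levels, n_evolved=31, rk_buffers=3, bytes_per_value=8):
    """Estimate RAM from: nvar_total * nx^3 * AMR_factor * 8 bytes.

    AMR_factor is the partial geometric sum of (1/8)^level over the
    requested number of levels; it approaches 8/7 as levels increase.
    """
    nvar_total = n_evolved * rk_buffers  # 31 * 3 = 93
    amr_factor = sum((1.0 / 8.0) ** level for level in range(amr_levels))
    return nvar_total * nx**3 * amr_factor * bytes_per_value


if __name__ == "__main__":
    # Grid sizes and level counts taken from the table above.
    for nx, levels in [(64, 4), (128, 4), (256, 6), (512, 8)]:
        gb = estimate_ram_bytes(nx, levels) / 1e9
        print(f"{nx}^3, {levels} levels: {gb:.2f} GB")
```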

3. OpenMP Configuration

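# Recommended: one thread per physical core (not per hyperthread).
# nproc --all counts logical CPUs, so count unique (core, socket) pairs instead.
export OMP_NUM_THREADS=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)    # Linux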
export OMP_PROC_BIND=close               # Bind threads to nearby cores
export OMP_PLACES=cores                  # Core-level granularity
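Counting physical cores rather than logical CPUs can also be done from Python, for example inside a launcher script. This is a minimal sketch, assuming a Linux host with `lscpu` from util-linux; the function names are hypothetical and not part of GRANITE.

```python
import subprocess


def count_physical_cores(lscpu_text):
    """Count unique (socket, core) pairs in `lscpu -p=Core,Socket` output.

    Hyperthreads share a core ID within a socket, so they collapse
    into a single entry.
    """
    cores = set()
    for line in lscpu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        core, socket = line.split(",")[:2]
        cores.add((socket, core))
    return len(cores)


def physical_cores_from_system():
    """Run lscpu and return the number of physical cores on this host."""
    result = subprocess.run(
        ["lscpu", "-p=Core,Socket"],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_physical_cores(result.stdout)
```

Separating the parser from the subprocess call keeps the counting logic testable on machines without `lscpu`.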

For NUMA systems (multi-socket nodes):

numactl --interleave=all python3 scripts/run_granite_hpc.py ...

4. HPC Launch Command

python3 scripts/run_granite_hpc.py \
    build/bin/granite_main \
    benchmarks/B2_eq/params.yaml \
    --omp-threads 32 \
    --mpi-ranks 128 \
    --disable-numa-bind \
    --amr-telemetry-file /scratch/$USER/amr_B2eq.jsonl

SLURM auto-generation (produces jobs/submit_granite.sbatch):

python3 scripts/run_granite_hpc.py \
    build/bin/granite_main \
    benchmarks/B2_eq/params.yaml \
    --slurm \
    --mpi-ranks 128 \
    --omp-threads 8
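For readers curious how a rank and thread count can be turned into a batch script, here is a minimal sketch of such a generator. It is not the implementation inside run_granite_hpc.py: the function `render_sbatch`, its `ranks_per_node` parameter, and the default time limit and partition are assumptions chosen to mirror the template in section 5.

```python
import math


def render_sbatch(job_name, mpi_ranks, omp_threads, ranks_per_node, command,
                  time_limit="24:00:00", partition="compute"):
    """Return the text of an sbatch script for a hybrid MPI+OpenMP job."""
    if mpi_ranks % ranks_per_node != 0:
        raise ValueError("mpi_ranks must be a multiple of ranks_per_node")
    nodes = math.ceil(mpi_ranks / ranks_per_node)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ranks_per_node}",
        f"#SBATCH --cpus-per-task={omp_threads}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --partition={partition}",
        "",
        # One OpenMP thread per CPU that SLURM assigns to each rank.
        "export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK",
        "export OMP_PROC_BIND=close",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"
```

With 128 ranks, 8 threads per rank, and an assumed 8 ranks per node, this sketch requests 16 nodes of 64 cores each.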

5. SLURM Job Template

#!/bin/bash
#SBATCH --job-name=granite_B2eq
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
#SBATCH --time=24:00:00
#SBATCH --partition=compute

module load gcc/11 openmpi/4.1 hdf5/1.12-parallel

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=close

srun python3 scripts/run_granite_hpc.py \
    build/bin/granite_main \
    benchmarks/B2_eq/params.yaml \
    --omp-threads $SLURM_CPUS_PER_TASK \
    --mpi-ranks $SLURM_NTASKS
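It helps to check what the directives above add up to before submitting. The sketch below does the arithmetic for the template's values (4 nodes, 8 tasks per node, 8 CPUs per task, 4 GB per CPU); the function name `job_footprint` is hypothetical.

```python
def job_footprint(nodes, ntasks_per_node, cpus_per_task, mem_per_cpu_gb):
    """Summarize the resources implied by a set of #SBATCH directives."""
    cores_per_node = ntasks_per_node * cpus_per_task
    total_cores = nodes * cores_per_node
    return {
        "mpi_ranks": nodes * ntasks_per_node,
        "cores_per_node": cores_per_node,
        "total_cores": total_cores,
        "mem_per_node_gb": cores_per_node * mem_per_cpu_gb,
        "total_mem_gb": total_cores * mem_per_cpu_gb,
    }


if __name__ == "__main__":
    # Values from the template: 32 ranks, 256 cores, 1024 GB in total.
    for key, value in job_footprint(4, 8, 8, 4).items():
        print(f"{key}: {value}")
```

Comparing `total_mem_gb` with the estimates in section 2 shows whether a given configuration fits in the allocation.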

6. Lustre / Parallel Filesystem I/O Tuning

io:
  hdf5_stripe_count:  16       # Match to number of storage targets
  hdf5_stripe_size:   4194304  # 4 MB (optimal for large field arrays)
  collective_io:      true     # MPI-IO collective mode

Set Lustre striping on the output directory:

lfs setstripe -c 16 -S 4M /scratch/$USER/granite_output/
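The YAML settings and the `lfs setstripe` call should agree, so one option is to derive the command from the same values. This is a minimal sketch under that assumption; the helper names are hypothetical, and the `io_config` dict stands in for the parsed `io:` block. The 64 KiB check reflects Lustre's requirement that stripe sizes be a multiple of 64 KiB.

```python
def format_stripe_size(size_bytes):
    """Render a byte count with the largest exact suffix, e.g. 4194304 -> '4M'."""
    if size_bytes <= 0 or size_bytes % 65536 != 0:
        raise ValueError("stripe size must be a positive multiple of 64 KiB")
    for suffix, unit in (("G", 1024**3), ("M", 1024**2), ("K", 1024)):
        if size_bytes % unit == 0:
            return f"{size_bytes // unit}{suffix}"
    return str(size_bytes)


def lfs_setstripe_argv(io_config, output_dir):
    """Build the argv list for `lfs setstripe` from the io settings."""
    return [
        "lfs", "setstripe",
        "-c", str(io_config["hdf5_stripe_count"]),
        "-S", format_stripe_size(io_config["hdf5_stripe_size"]),
        output_dir,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a host where the Lustre client tools are installed.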

7. Container Deployment

Docker

docker build -f containers/Dockerfile -t granite:v0.6.8 .
docker run --rm -it -v "$(pwd)/output":/output granite:v0.6.8 \
    build/bin/granite_main benchmarks/B2_eq/params.yaml

Singularity/Apptainer (HPC clusters)

singularity build granite.sif containers/granite.def
singularity run --bind /scratch/$USER:/output granite.sif \
    build/bin/granite_main benchmarks/B2_eq/params.yaml

8. GPU Roadmap (Post-v0.7)

Phase	Hardware	Configuration	Projected Throughput
v0.6.8 (current)	i5-8400, GTX 1050 Ti	64³ CPU	0.084 M/s
v0.7 GPU kernels	vast.ai H100 SXM	256³ GPU	~50 M/s (projected)
v0.8 production	Cluster H100 × 8	512³ GPU	~400 M/s (projected)
v1.0 B5_star	Tier-0 cluster	12 AMR levels	Exascale-class

Note: The GTX 1050 Ti (development desktop) is NOT viable for FP64 GPU compute. GPU production runs target H100 SXM instances via vast.ai after GPU kernel porting is complete in v0.7.
