> **Reference:**
> Celeritas project: [https://github.com/celeritas-project/celeritas](https://github.com/celeritas-project/celeritas)
> Geometry & macros: [https://github.com/celeritas-project/atlas-tilecal-integration](https://github.com/celeritas-project/atlas-tilecal-integration)
> Official releases: [https://github.com/celeritas-project/celeritas/releases](https://github.com/celeritas-project/celeritas/releases)
>
> **Hardware:** Linux/WSL2 + CUDA 11/12 GPU (compute 6.0+). CPU-only fallback included.

# GPU-Accelerated Geant4 Simulation of the ATLAS Tile Calorimeter

This self-contained notebook guides you through building and running a **CUDA-enabled Geant4 + DD4hep + Celeritas** stack, loading a realistic slice of the ATLAS Tile Calorimeter, and benchmarking CPU vs GPU performance. It is designed as a ~40-minute hands-on exercise for HEP newcomers and advanced users alike.

**Learning objectives**

* Install Celeritas and Geant4 with Spack, plus lightweight Python dependencies inside the notebook (Conda optional)
* Fetch the ATLAS TileCal test-beam geometry (GDML) and a steering macro
* Compile a minimal C++ application off-loading electromagnetic transport to the GPU
* Time simulations on CPU and GPU and visualise energy-deposit spectra
* Complete two short exercises with provided solution templates


## 0  Environment bootstrap (optional)
If you do **not** already have a Python ≥3.11 Conda environment, run the commented commands below in a terminal to create one named `tilegpu` and start JupyterLab.

```bash
# conda create -n tilegpu -c conda-forge python=3.11 jupyterlab
# conda activate tilegpu
# jupyter lab
```


## Official Celeritas Installation (Spack)

To install Celeritas and its dependencies, use Spack as follows:

```bash
# Install Spack
git clone --depth=2 https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh

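# Set up CUDA (if you have a GPU): register the existing CUDA toolkit with Spack
spack external find cuda

# Set default configuration. Replace cuda_arch=80 with your GPU's compute capability
# without the dot (e.g. 70 for V100, 86 for RTX 30xx); for a CPU-only build use "cxxstd=17 ~cuda"
spack config add packages:all:variants:"cxxstd=17 +cuda cuda_arch=80"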

# Install Celeritas
spack install celeritas
spack load celeritas
```

See the [Celeritas documentation](https://github.com/celeritas-project/celeritas) for more details and integration steps.


In [None]:
# Install lightweight Python deps (≈1 min on first run)
# (use cupy-cuda11x instead of cupy-cuda12x on a CUDA 11 system)
%pip install --quiet cupy-cuda12x uproot awkward matplotlib nbformat
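The hardware note at the top asks for a GPU with compute capability 6.0 or newer. CuPy, installed above, can report what the kernel actually sees; the same major/minor pair gives the `cuda_arch` value for the Spack configuration (e.g. 8.0 → `cuda_arch=80`). A minimal sketch, where `list_cuda_devices` is a hypothetical helper name:

```python
def list_cuda_devices():
    """Return [(name, (major, minor)), ...] for visible CUDA GPUs; [] if CuPy or a GPU is missing."""
    try:
        import cupy
        count = cupy.cuda.runtime.getDeviceCount()
    except Exception:  # CuPy not installed, no driver, or no device
        return []
    devices = []
    for i in range(count):
        props = cupy.cuda.runtime.getDeviceProperties(i)
        name = props["name"]
        name = name.decode() if isinstance(name, bytes) else name
        devices.append((name, (props["major"], props["minor"])))
    return devices

devices = list_cuda_devices()
for name, (major, minor) in devices:
    status = "OK" if (major, minor) >= (6, 0) else "below compute 6.0"
    print(f"{name}: compute capability {major}.{minor} ({status})")
if not devices:
    print("No CUDA GPU visible: only the CPU path will run.")
```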


### Why Celeritas + DD4hep?
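* **Celeritas** offloads Geant4 electromagnetic (e±/γ) transport to NVIDIA GPUs, which can give large speed-ups for EM-dominated workloads such as calorimeter showers; the actual gain depends on the GPU, the geometry and the fraction of shower energy carried by e±/γ.
* **DD4hep** supplies experiment-quality detector descriptions; the TileCal GDML used here is taken from the public [atlas-tilecal-integration](https://github.com/celeritas-project/atlas-tilecal-integration) repository of the Celeritas project.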


**Note:**

Celeritas and Geant4 should be installed and loaded using Spack as described above. Do not use pre-built bundles. After installation, you can use the libraries and executables provided by Spack.

See the [Celeritas documentation](https://github.com/celeritas-project/celeritas) for integration details.
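`spack load` only changes the environment of the shell it runs in, so the Jupyter kernel sees Spack, Geant4 and CUDA only if JupyterLab was started from that shell. A quick check from inside the notebook, as a minimal sketch (`tool_version` is a hypothetical helper; `nvcc` is only expected on GPU machines):

```python
import shutil
import subprocess

def tool_version(tool, flag="--version"):
    """Return the last line printed by `tool flag`, or None if `tool` is not on PATH."""
    if shutil.which(tool) is None:
        return None
    res = subprocess.run([tool, flag], capture_output=True, text=True)
    lines = (res.stdout or res.stderr).strip().splitlines()
    return lines[-1] if lines else ""

# Tools this notebook relies on; nvcc only matters for GPU builds
for tool in ("spack", "geant4-config", "nvcc"):
    version = tool_version(tool)
    print(f"{tool:14s}: {'NOT FOUND' if version is None else version}")
```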

## 1  Prepare ATLAS TileCal Geometry and Macro
The next cell downloads a small Tile Calorimeter GDML plus a steering macro that shoots **10 GeV electrons** along the beam axis.


In [None]:
import urllib.request, pathlib, os

files = {
    "TileTB_2B1EB_nobeamline.gdml": "https://raw.githubusercontent.com/celeritas-project/atlas-tilecal-integration/main/TileTB_2B1EB_nobeamline.gdml",
    "TBrun_elec.mac": "https://raw.githubusercontent.com/celeritas-project/atlas-tilecal-integration/main/TBrun_elec.mac"
}
for fname, url in files.items():
    if not pathlib.Path(fname).exists():
        print(f"Downloading {fname} …")
        urllib.request.urlretrieve(url, fname)
    else:
        print(f"{fname} already present")
print("Geometry & macro ready.")
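A quick look at the downloaded file can catch a truncated download or an HTML error page saved under the GDML name before any C++ is compiled. A minimal sketch with the standard-library XML parser, relying only on the standard GDML layout (`<materials>`, `<structure>`, and `<setup>` with a `<world ref=...>` child); `summarize_gdml` is a hypothetical helper:

```python
import pathlib
import xml.etree.ElementTree as ET

def summarize_gdml(path):
    """Count materials and logical volumes in a GDML file and report the world volume."""
    root = ET.parse(path).getroot()
    world = root.find("setup/world")
    return {
        "world": None if world is None else world.get("ref"),
        "volumes": len(root.findall("structure/volume")),
        "materials": len(root.findall("materials/material")),
    }

gdml = pathlib.Path("TileTB_2B1EB_nobeamline.gdml")
if gdml.exists():
    print(summarize_gdml(gdml))
```

A parse error here usually means the download did not return the GDML file itself.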


## 2  Minimal Geant4 + Celeritas Off-load Application
Below we write **<60 lines** of C++ that
1. Builds a multithreaded Geant4 run manager
2. Loads the TileCal GDML geometry
3. Registers *Celeritas* to take over e±/γ tracking on the GPU
4. Executes the macro to shoot particles

> **Tip:** Compiling inside Jupyter uses the system compiler via shell commands starting with `!`.


In [None]:
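%%writefile tile_gpu.cc
// Sketch assuming the Celeritas 0.6 accel API (TrackingManagerIntegration); check your release's examples.
#include <FTFP_BERT.hh>
#include <G4GDMLParser.hh>
#include <G4ParticleGun.hh>
#include <G4RunManagerFactory.hh>
#include <G4UImanager.hh>
#include <G4UserRunAction.hh>
#include <G4VUserActionInitialization.hh>
#include <G4VUserDetectorConstruction.hh>
#include <G4VUserPrimaryGeneratorAction.hh>
#include <accel/AlongStepFactory.hh>
#include <accel/TrackingManagerConstructor.hh>
#include <accel/TrackingManagerIntegration.hh>
using TMI = celeritas::TrackingManagerIntegration;
struct TileGeometry : G4VUserDetectorConstruction {  // TileCal world read with Geant4's own GDML parser
    G4GDMLParser parser;
    G4VPhysicalVolume* Construct() override {
        parser.Read("TileTB_2B1EB_nobeamline.gdml", /* validate = */ false);
        return parser.GetWorldVolume();
    }
};
struct Gun : G4VUserPrimaryGeneratorAction {  // configured by the /gun/... macro commands
    G4ParticleGun gun{1};
    void GeneratePrimaries(G4Event* e) override { gun.GeneratePrimaryVertex(e); }
};
struct RunHooks : G4UserRunAction {  // Celeritas must see the start and end of every run
    void BeginOfRunAction(G4Run const* r) override { TMI::Instance().BeginOfRunAction(r); }
    void EndOfRunAction(G4Run const* r) override { TMI::Instance().EndOfRunAction(r); }
};
struct Actions : G4VUserActionInitialization {
    void BuildForMaster() const override { SetUserAction(new RunHooks); }
    void Build() const override { SetUserAction(new Gun); SetUserAction(new RunHooks); }
};
int main() {
    auto* rm = G4RunManagerFactory::CreateRunManager(G4RunManagerType::MT);
    rm->SetUserInitialization(new TileGeometry);
    auto* phys = new FTFP_BERT;
    phys->RegisterPhysics(new celeritas::TrackingManagerConstructor(&TMI::Instance()));  // e-/e+/gamma -> Celeritas
    rm->SetUserInitialization(phys);
    rm->SetUserInitialization(new Actions);
    celeritas::SetupOptions opts;  // illustrative capacities: tune for your GPU memory
    opts.max_num_tracks = 1024 * 16;
    opts.initializer_capacity = 1024 * 128;
    opts.make_along_step = celeritas::UniformAlongStepFactory();  // no magnetic field
    TMI::Instance().SetOptions(std::move(opts));
    rm->Initialize();
    G4UImanager::GetUIpointer()->ApplyCommand("/control/execute TBrun_elec.mac");
    delete rm;
}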


In [None]:
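# Compile (assumes `spack load celeritas` ran in the shell that launched Jupyter).
# Library names below are an assumption: check $CELER_DIR/lib or use CMake's find_package(Celeritas).
!rm -f tile_gpu
!CELER_DIR=$(spack location -i celeritas) && g++ tile_gpu.cc -std=c++17 -O2 $(geant4-config --cflags) -I$CELER_DIR/include -L$CELER_DIR/lib -Wl,-rpath,$CELER_DIR/lib -laccel -lceleritas -lgeocel -lcorecel $(geant4-config --libs) -o tile_gpu
import pathlib
print("Compilation finished → executable ./tile_gpu" if pathlib.Path("tile_gpu").exists() else "Compilation failed: see the compiler output above")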


## 3  Benchmark CPU vs GPU
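We run the application twice: first with `CELER_DISABLE_DEVICE=1`, which keeps Celeritas but runs its transport on the CPU, then with the device re-enabled so that Celeritas uses the GPU. For a pure-Geant4 baseline without any offloading, Celeritas also reads `CELER_DISABLE`, which turns offloading off entirely.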


In [None]:
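import os, re, subprocess, time

def run_tile(use_gpu: bool = True):
    """Run ./tile_gpu once; return (wall time [s], events processed or None)."""
    env = os.environ.copy()
    if use_gpu:
        # Unset rather than "0": depending on the version, any non-empty value may disable the device
        env.pop("CELER_DISABLE_DEVICE", None)
    else:
        env["CELER_DISABLE_DEVICE"] = "1"  # Celeritas still runs, but on the CPU
    start = time.perf_counter()
    output = subprocess.check_output(["./tile_gpu"], text=True, env=env)
    dt = time.perf_counter() - start
    # Geant4 prints "Number of events processed : N" when /run/verbose >= 1; keep the largest value
    counts = [int(n) for n in re.findall(r"Number of events processed\s*:\s*(\d+)", output)]
    return dt, (max(counts) if counts else None)

cpu_t, nevt = run_tile(False)
gpu_t, _ = run_tile(True)
print(f"Simulated {nevt if nevt is not None else '?'} events\n"
      f"CPU time : {cpu_t:.2f} s\n"
      f"GPU time : {gpu_t:.2f} s\n"
      f"Speed-up : ×{cpu_t / gpu_t:.1f}")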
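A single wall-clock measurement also includes fixed start-up costs (Geant4 initialisation, geometry loading, and Celeritas/GPU set-up), which can hide the transport speed-up in short runs. Repeating each mode and comparing medians and event throughput is more robust; a minimal sketch with a hypothetical `summarize_timings` helper (the repeat count is arbitrary, and `n_events` must be an integer event count):

```python
import statistics

def summarize_timings(cpu_times, gpu_times, n_events):
    """Median wall times [s], throughputs [events/s] and speed-up from repeated runs."""
    cpu = statistics.median(cpu_times)
    gpu = statistics.median(gpu_times)
    return {
        "cpu_s": cpu,
        "gpu_s": gpu,
        "cpu_events_per_s": n_events / cpu,
        "gpu_events_per_s": n_events / gpu,
        "speedup": cpu / gpu,
    }

# Example with the run_tile helper above (3 repeats per mode):
# cpu_runs = [run_tile(False)[0] for _ in range(3)]
# gpu_runs = [run_tile(True)[0] for _ in range(3)]
# print(summarize_timings(cpu_runs, gpu_runs, nevt))
```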


## 4  Visualise Energy Deposits
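If the application writes ROOT output, we open it with `uproot` and build a log-scale histogram. The minimal application above has no scorer, so you would first need to add one (for example an ntuple filled through Geant4's `G4AnalysisManager`); the tree and branch names below (`TileCellHits/Edep`) are assumptions about that output.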


In [None]:
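import pathlib
import awkward as ak
import matplotlib.pyplot as plt
import uproot

root_files = sorted(pathlib.Path('.').glob('*.root'))
if not root_files:
    raise FileNotFoundError("No *.root file found: add ROOT output to the application first")
root_file = root_files[0]
print('Reading', root_file)
with uproot.open(root_file) as file:
    edep = file["TileCellHits/Edep"].array()  # assumed tree/branch names

plt.figure(figsize=(6, 4))
# Flatten so that jagged (hits-per-event) arrays can be histogrammed
plt.hist(ak.to_numpy(ak.flatten(edep, axis=None)), bins=120, log=True, histtype='step')
plt.xlabel('Energy deposit per hit [MeV]')
plt.ylabel('Counts / bin')
plt.title('10 GeV e⁻ energy deposits (GPU offload)')
plt.tight_layout()
plt.show()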
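The per-hit spectrum mixes all events together. If the branch is jagged, with one list of hits per event (an assumption about the file layout), summing within each event gives the visible energy per shower, and its relative spread σ/E is a first look at the calorimeter response. A minimal sketch; `response_summary` is a hypothetical helper, and the per-event sums use `ak.sum(..., axis=1)`:

```python
import numpy as np

def response_summary(event_totals):
    """Mean, standard deviation and relative width sigma/mean of per-event energy sums."""
    e = np.asarray(event_totals, dtype=float)
    mean = e.mean()
    sigma = e.std(ddof=1)  # sample standard deviation
    return mean, sigma, sigma / mean

# Usage with the jagged `edep` array read above (one list of hits per event assumed):
# totals = ak.to_numpy(ak.sum(edep, axis=1))
# mean, sigma, rel = response_summary(totals)
# print(f"<E_vis> = {mean:.1f} MeV, sigma/E = {rel:.3f}")
```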


## 5  Exercises (≈15 min)

### Exercise 1 – Hadron shower
Modify **`TBrun_elec.mac`** to shoot **30 GeV π⁺** instead of electrons and repeat the timing study. Expect a smaller GPU speed-up: Celeritas offloads only e±/γ, so the hadronic part of the cascade is still tracked by Geant4 on the CPU.
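The edit can be scripted. This minimal sketch assumes the macro selects its beam with the standard `G4ParticleGun` commands `/gun/particle` and `/gun/energy` (if it uses `/gps/...` instead, the patterns need adapting); `set_gun` is a hypothetical helper, and `pi+` is the Geant4 name for π⁺:

```python
import re

def set_gun(macro_text, particle, energy_gev):
    """Replace the particle and energy set by /gun/particle and /gun/energy lines."""
    out, n_particle = re.subn(r"(?m)^([ \t]*)/gun/particle\s+\S+",
                              rf"\g<1>/gun/particle {particle}", macro_text)
    out, n_energy = re.subn(r"(?m)^([ \t]*)/gun/energy\s+.*$",
                            rf"\g<1>/gun/energy {energy_gev} GeV", out)
    if n_particle == 0 or n_energy == 0:
        raise ValueError("macro has no /gun/particle or /gun/energy line; adapt the patterns")
    return out

# Usage (keep a copy of the electron macro first):
# src = pathlib.Path("TBrun_elec.mac")
# shutil.copy(src, "TBrun_elec.mac.bak")
# src.write_text(set_gun(src.read_text(), "pi+", 30))
```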

### Exercise 2 – Absorber thickness scan
1. Use the (mock) macro command `/dd/geometry/scaleAbsorber X cm` where *X*=1–5 cm.
2. Loop over thicknesses, recording (a) total visible energy and (b) GPU speed-up.
3. Plot both on dual y-axes.
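Steps 1 and 3 can be sketched as two small helpers. `with_absorber_scale` inserts the mock command before the first `/run/beamOn` (that placement is an assumption, since the command itself is a mock), and `plot_scan` draws the two quantities on a shared x-axis with `twinx`. Both names are hypothetical:

```python
import re

def with_absorber_scale(macro_text, thickness_cm):
    """Insert the mock /dd/geometry/scaleAbsorber command before the first /run/beamOn."""
    cmd = f"/dd/geometry/scaleAbsorber {thickness_cm} cm\n"
    m = re.search(r"(?m)^[ \t]*/run/beamOn", macro_text)
    if m is None:  # no beamOn line: append at the end
        return macro_text.rstrip("\n") + "\n" + cmd
    return macro_text[:m.start()] + cmd + macro_text[m.start():]

def plot_scan(thickness_cm, visible_energy, speedup):
    """Visible energy (left axis) and GPU speed-up (right axis) versus absorber thickness."""
    import matplotlib.pyplot as plt
    fig, ax1 = plt.subplots(figsize=(6, 4))
    ax1.plot(thickness_cm, visible_energy, "o-", color="tab:blue")
    ax1.set_xlabel("Absorber thickness [cm]")
    ax1.set_ylabel("Total visible energy [MeV]", color="tab:blue")
    ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(thickness_cm, speedup, "s--", color="tab:red")
    ax2.set_ylabel("GPU speed-up", color="tab:red")
    fig.tight_layout()
    return fig
```

For step 2, loop over `X` from 1 to 5, build the macro with `with_absorber_scale`, time one CPU and one GPU run, and collect the visible energy and speed-up for `plot_scan`.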


In [None]:
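# —— Solution scaffolding ——
import os, pathlib, shutil, subprocess, time

MACRO = pathlib.Path("TBrun_elec.mac")  # tile_gpu always executes this macro

def run_with_macro(macro_text, use_gpu=True):
    """Run ./tile_gpu with `macro_text` swapped in; return (wall time [s], stdout)."""
    backup = MACRO.with_name(MACRO.name + ".orig")
    shutil.copy(MACRO, backup)
    try:
        MACRO.write_text(macro_text)
        env = os.environ.copy()
        if use_gpu:
            env.pop("CELER_DISABLE_DEVICE", None)
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
        start = time.perf_counter()
        output = subprocess.check_output(["./tile_gpu"], env=env, text=True)
        return time.perf_counter() - start, output
    finally:
        shutil.move(backup, MACRO)  # always restore the original macro

# Students fill in details based on the helpers above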


## 6  Clean-up
Run the following to delete the compiled binary, generated source, temporary macro and ROOT output when finished (the downloaded GDML and macro are kept).


In [None]:
!rm -f *.root tile_gpu tile_gpu.cc tmp.mac
