This part of the code corresponds to the RANODE section in the paper "Generator Based Inference (GBI)" by Chi Lung Cheng, Ranit Das, Runze Li, Radha Mastandrea, Vinicius Mikuni, Benjamin Nachman, David Shih and Gup Singh.
The environment requirements for RANODE-based inference are listed in environment.yml. The environment can be created by running:
conda env create -f environment.yml --prefix /path/gbi_ranode_env
To set up the remaining environment variables, run:
source setup.sh
During the first execution, the user will be prompted to enter the input and output directories. The input directory should contain the files listed in the Dataset section.
For M1/M2/M3 Macs, use the Mac-specific environment file:
# Navigate to the parent GBI directory
cd /path/to/GBI
# Create conda environment using environment_mac.yml
conda env create -f ranode/environment_mac.yml -n gbi_ranode
# Or create it locally in the project
conda env create -f ranode/environment_mac.yml --prefix ./.conda-envs/gbi_ranode
# Install paws-sbi package
conda activate gbi_ranode # or: conda activate ./.conda-envs/gbi_ranode
pip install git+https://github.com/hep-lbdl/paws-sbi.git
# Create data and output directories
mkdir -p data output
# Configure data/output paths
echo 'export OUTPUT_DIR="/path/to/GBI/output"' > ranode/.config
echo 'export DATA_DIR="/path/to/GBI/data"' >> ranode/.config
Convenience Activation Script: Use the provided activate_gbi.sh script that sets up all paths automatically:
source activate_gbi.sh
cd ranode
This script will:
- Activate the conda environment
- Set up PYTHONPATH to include the ranode directory
- Configure LAW_HOME and LAW_CONFIG_FILE
- Load data/output directory paths
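After sourcing the script, it can help to confirm that the variables listed above actually reached the current shell. The following is a minimal sketch, not part of the repository: the function name check_gbi_env is hypothetical, and it assumes the script exports DATA_DIR and OUTPUT_DIR under the same names used in ranode/.config.

```shell
# Hypothetical helper: report which of the variables named above are
# unset or empty in the current shell. Uses eval for indirect lookup so
# that it behaves the same in bash and zsh.
check_gbi_env() {
    missing=0
    for name in DATA_DIR OUTPUT_DIR LAW_HOME LAW_CONFIG_FILE PYTHONPATH; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=$((missing + 1))
        else
            echo "ok: $name=$value"
        fi
    done
    # The return code is the number of missing variables (0 means all set).
    return "$missing"
}

check_gbi_env || echo "Some variables are not set; try sourcing activate_gbi.sh again."
```

A non-zero return code means at least one variable is missing, so the helper can also guard a longer script before any law task is started.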
The following datasets are required and should be placed in your DATA_DIR (default: data/ directory):
- Simulated QCD background from official LHCO dataset: https://zenodo.org/records/4536377/files/events_anomalydetection_v2.features.h5
- Extra simulated QCD background: https://zenodo.org/records/8370758/files/events_anomalydetection_qcd_extra_inneronly_features.h5
- Extended parametric W->X(qq)Y(qq) signal: https://zenodo.org/records/15384386/files/events_anomalydetection_extended_Z_XY_qq_parametric.h5
- Signal ensembles with train/val/test splitting: lumi_matched_train_val_test_split_signal_features_W_qq.h5
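Once the files are in place, a quick existence check can catch a missing or empty download before a long workflow starts. This is a sketch under assumptions: the function name check_datasets is hypothetical, the file names are taken from the list above, and the directory defaults to DATA_DIR (or data/ if that variable is unset).

```shell
# Hypothetical helper: verify that each required file exists and is
# non-empty in the given directory (default: $DATA_DIR, else ./data).
check_datasets() {
    dir="${1:-${DATA_DIR:-data}}"
    rc=0
    for f in \
        events_anomalydetection_v2.features.h5 \
        events_anomalydetection_qcd_extra_inneronly_features.h5 \
        events_anomalydetection_extended_Z_XY_qq_parametric.h5 \
        lumi_matched_train_val_test_split_signal_features_W_qq.h5
    do
        if [ -s "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "MISSING: $f"
            rc=1
        fi
    done
    return "$rc"
}

check_datasets "${DATA_DIR:-data}" || echo "One or more datasets are missing."
```

The check only tests for presence and non-zero size; it does not validate the HDF5 contents or detect a truncated download.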
You can download all datasets using curl:
cd data/
# Official LHCO QCD background (71 MB)
curl -L -O "https://zenodo.org/records/4536377/files/events_anomalydetection_v2.features.h5"
# Extra QCD background (37 MB)
curl -L -O "https://zenodo.org/records/8370758/files/events_anomalydetection_qcd_extra_inneronly_features.h5"
# Extended parametric signal (11 GB - this will take a while)
curl -L -O "https://zenodo.org/records/15384386/files/events_anomalydetection_extended_Z_XY_qq_parametric.h5"
Note: The extended parametric signal dataset is 11 GB and may take significant time to download.
This project is built with the "Luigi Analysis Workflow (LAW)". First, set up the law task list by running:
For Linux:
conda activate /path/gbi_ranode_env
source setup.sh
law index
For macOS:
source activate_gbi.sh
cd ranode
law index
After this, different tasks can be run with law using commands like:
law run taskname --version output_postfix --flags XXX
To get the likelihood scan plot at one signal injection strength, run:
law run FittingScanResults --version test_0 --ensemble 1 --mx 100 --my 500 --s-ratio-index 11 --workers 3
where:
- --ensemble sets the dataset ensemble used in this scan
- --mx and --my specify the masses of the signal model
- --s-ratio-index is the index of the true signal injection strength
- --workers specifies the number of threads used
To get the likelihood scan at different signal strengths, using 10 ensembles to smooth the performance, run:
law run ScanOverTrueMuEnsembleAvg --version test_0 --mx 100 --my 500 --num-ensemble 10 --workers 3
To plot the jet mass learned and generated by the model, run:
law run SignalGenerationPlot --version test_0 --mx 100 --my 500 --num-ensemble 10 --num-generated-sigs 1000000 --workers 3
Apple M-series chips (M1/M2/M3) support GPU acceleration via Metal Performance Shaders (MPS). This provides significant speedup over CPU-only execution.
To check if MPS is available:
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
Basic likelihood scanning with GPU:
law run FittingScanResults \
--version mps_test_0 \
--ensemble 1 \
--mx 100 \
--my 500 \
--s-ratio-index 11 \
--workers 1 \
--FittingScanResults-device mps \
--BkgTemplateTraining-device mps \
--BkgTemplateChecking-device mps \
--PerfectBkgTemplateTraining-device mps \
--RNodeTemplate-device mps \
--PredictBkgProb-device mps \
--ScanRANODE-device mps \
--SampleModelBinSR-device mps \
--PredictBkgProbGen-device mps
Scanning over multiple signal strengths with GPU:
law run ScanOverTrueMuEnsembleAvg \
--version mps_test_0 \
--mx 100 \
--my 500 \
--num-ensemble 10 \
--workers 1 \
--ScanOverTrueMuEnsembleAvg-device mps \
--BkgTemplateTraining-device mps \
--BkgTemplateChecking-device mps \
--PerfectBkgTemplateTraining-device mps \
--RNodeTemplate-device mps \
--PredictBkgProb-device mps \
--ScanRANODE-device mps \
--SampleModelBinSR-device mps \
--PredictBkgProbGen-device mps
Signal generation plot with GPU:
law run SignalGenerationPlot \
--version mps_test_0 \
--mx 100 \
--my 500 \
--num-ensemble 10 \
--num-generated-sigs 1000000 \
--workers 1 \
--SignalGenerationPlot-device mps \
--BkgTemplateTraining-device mps \
--RNodeTemplate-device mps \
--PredictBkgProb-device mps
Note for macOS users:
- Use --workers 1 to avoid multiprocessing issues with MPS
- Results will be saved to the OUTPUT_DIR specified in ranode/.config
- GPU acceleration significantly speeds up model training compared to CPU
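The MPS availability check can also be used to choose the device value automatically. The sketch below is one possible approach, not something the repository provides: the DEVICE variable name is an assumption, and the command falls back to cpu whenever python or torch is unavailable or MPS is not reported.

```shell
# Pick "mps" when PyTorch reports MPS support, otherwise "cpu".
# If python or torch cannot be loaded, the python command fails,
# its error output is discarded, and the fallback "cpu" is used.
DEVICE="$(python -c "import torch; print('mps' if torch.backends.mps.is_available() else 'cpu')" 2>/dev/null || echo cpu)"
echo "Using device: $DEVICE"
```

The resulting value can then be passed to the device flags, for example --BkgTemplateTraining-device "$DEVICE".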
For systems without GPU support or for testing, you can run on CPU by setting all device flags to cpu:
law run FittingScanResults \
--version cpu_test_0 \
--ensemble 1 \
--mx 100 \
--my 500 \
--s-ratio-index 11 \
--workers 1 \
--FittingScanResults-device cpu \
--BkgTemplateTraining-device cpu \
--BkgTemplateChecking-device cpu \
--PerfectBkgTemplateTraining-device cpu \
--RNodeTemplate-device cpu \
--PredictBkgProb-device cpu \
--ScanRANODE-device cpu \
--SampleModelBinSR-device cpu \
--PredictBkgProbGen-device cpu
Performance Note: CPU-only execution will be significantly slower than GPU execution, especially for model training tasks.
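Because the same device value is repeated for every task, the flag list can be generated rather than typed out. This is a hedged sketch: device_flags is a hypothetical helper, and its task list is copied from the FittingScanResults examples above, so it would need adjusting for other top-level tasks such as ScanOverTrueMuEnsembleAvg or SignalGenerationPlot.

```shell
# Hypothetical helper: print one "--<Task>-device <device>" flag for each
# task name that appears in the FittingScanResults examples.
device_flags() {
    device="$1"
    for task in FittingScanResults BkgTemplateTraining BkgTemplateChecking \
                PerfectBkgTemplateTraining RNodeTemplate PredictBkgProb \
                ScanRANODE SampleModelBinSR PredictBkgProbGen; do
        printf '%s-device %s ' "--$task" "$device"
    done
}

# Show the generated flags for a CPU run.
device_flags cpu
echo

# Intended use (left unquoted so the shell splits the output into words):
#   law run FittingScanResults --version cpu_test_0 --ensemble 1 \
#       --mx 100 --my 500 --s-ratio-index 11 --workers 1 $(device_flags cpu)
```

Keeping the task names in one place reduces the chance of setting most tasks to one device while leaving a single task on another.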