HH4b

Search for two boosted (high transverse momentum) Higgs bosons (H) decaying to four beauty quarks (b).

Setting up package

Creating a virtual environment

First, create a virtual environment (micromamba is recommended):

# Download the micromamba setup script (change if needed for your machine https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html)
# Install: (the micromamba directory can end up taking O(1-10GB) so make sure the directory you're using allows that quota)
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
# You may need to restart your shell
micromamba create -n hh4b python=3.10 -c conda-forge
micromamba activate hh4b

Installing package

Remember to install this in your mamba environment.

# Clone the repository
git clone https://github.com/LPC-HH/HH4b.git
cd HH4b
# Perform an editable installation
pip install -e .
# for committing to the repository
pip install pre-commit
pre-commit install
# install requirements
pip3 install -r requirements.txt

Troubleshooting

  • If your default python in your environment is not Python 3, make sure to use pip3 and python3 commands instead.

  • You may also need to upgrade pip to perform the editable installation:

python3 -m pip install --upgrade pip
python3 -m pip install -e .

Running coffea processors

Setup

For submitting to condor, all you need is python >= 3.7.

For running locally, follow the virtual environment setup instructions above, then activate the environment:

micromamba activate hh4b

Running locally

To test locally first (recommended), run e.g.:

mkdir outfiles
python -W ignore src/run.py --starti 0 --endi 1 --year 2022 --processor skimmer --samples QCD --subsamples "QCD_PT-470to600"
python -W ignore src/run.py --processor skimmer --year 2022EE --nano-version v12_private --samples HH --subsamples GluGlutoHHto4B_kl-1p00_kt-1p00_c2-0p00_TuneCP5_13p6TeV --starti 0 --endi 1
python -W ignore src/run.py --year 2022 --processor trigger_boosted --samples Muon --subsamples Run2022C --nano-version v11_private --starti 0 --endi 1

Parquet and pickle files will be saved. Pickles are in the format {'nevents': int, 'cutflow': Dict[str, int]}.
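
As a minimal sketch of how one of these pickles could be inspected, the snippet below writes a stand-in file with the format described above and reads it back. The file name `out_0.pkl`, the cut names, and the event counts are all made up for illustration; only the `nevents` and `cutflow` keys come from this README.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for one skimmer output; the cut names and numbers are made up.
example = {"nevents": 1000, "cutflow": {"all": 1000, "trigger": 400, "two_fatjets": 120}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "out_0.pkl"  # hypothetical file name
    with path.open("wb") as f:
        pickle.dump(example, f)

    # Read the pickle back, as one would for a real output file.
    with path.open("rb") as f:
        loaded = pickle.load(f)

nevents = loaded["nevents"]
# Fraction of processed events surviving each cut.
efficiencies = {cut: count / nevents for cut, count in loaded["cutflow"].items()}
for cut, eff in efficiencies.items():
    print(f"{cut:>12}: {loaded['cutflow'][cut]:>6d} ({eff:.1%})")
```

For a real output, replace the temporary file with the path to the pickle produced by `src/run.py`.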

Or on specific file(s):

FILE=/eos/uscms/store/user/rkansal/Hbb/nano/Run3Winter23NanoAOD/QCD_PT-15to7000_TuneCP5_13p6TeV_pythia8/02c29a77-3e0e-40e0-90a1-0562f54144e9.root
python -W ignore src/run.py --processor skimmer --year 2023 --files $FILE --files-name QCD

Jobs

The script src/condor/submit.py manually splits up the files into condor jobs:

On a full dataset (e.g. with TAG=23Jul13):

python src/condor/submit.py --processor skimmer --tag $TAG --files-per-job 20 --submit
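
To illustrate what splitting by `--files-per-job` amounts to, here is a rough sketch of chunking a file list into jobs. The helper `split_into_jobs` and the placeholder file names are hypothetical; this is not the code in `src/condor/submit.py`, which may differ in its details.

```python
from typing import Dict, List


def split_into_jobs(files: List[str], files_per_job: int) -> Dict[int, List[str]]:
    """Group a list of input files into consecutive chunks, one chunk per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return {
        job: files[start : start + files_per_job]
        for job, start in enumerate(range(0, len(files), files_per_job))
    }


# 45 placeholder files at 20 files per job gives jobs of 20, 20 and 5 files.
files = [f"file_{i}.root" for i in range(45)]
jobs = split_into_jobs(files, files_per_job=20)
print({job: len(chunk) for job, chunk in jobs.items()})
```

A smaller `files_per_job` gives more, shorter jobs; the last job takes whatever files remain.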

On a specific sample:

python src/condor/submit.py --processor skimmer --tag $TAG --nano-version v11_private --samples HH --subsamples GluGlutoHHto4B_kl-1p00_kt-1p00_c2-0p00_TuneCP5_13p6TeV_TSG

Over many samples, using a yaml file:

nohup python src/condor/submit_from_yaml.py --tag $TAG --processor skimmer --save-systematics --submit --yaml src/condor/submit_configs/${YAML}.yaml &> tmp/submitout.txt &

To submit (if not using the --submit flag):

nohup bash -c 'for i in condor/'"${TAG}"'/*.jdl; do condor_submit $i; done' &> tmp/submitout.txt &

Dask

Log in with ssh tunneling:

ssh -L 8787:localhost:8787 cmslpc-sl7.fnal.gov

Run the ./shell script set up via lpcjobqueue:

./shell coffeateam/coffea-dask:0.7.21-fastjet-3.4.0.1-g6238ea8

Renew your grid certificate:

voms-proxy-init --rfc --voms cms -valid 192:00

Run the job submission script:

python -u -W ignore src/run.py --year 2022EE --yaml src/condor/submit_configs/skimmer_23_10_02.yaml --processor skimmer --nano-version v11 --region signal --save-array --executor dask > dask.out 2>&1

Postprocessing

Condor Scripts

Check jobs

Check that all jobs completed by going through the output files (the --submit flag is optional):

for year in 2022 2022EE 2023 2023BPix; do python src/condor/check_jobs.py --tag $TAG --processor trigger --year $year; done  # optionally add --submit

e.g.

python src/condor/check_jobs.py --year 2018 --tag Oct9 --processor matching --check-running --user cmantill --submit-missing
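
The idea of checking completion from the output files can be sketched as follows. This assumes, purely for illustration, that job `i` writes a file named `out_i.pkl`; the helper `find_missing_jobs` and that naming pattern are hypothetical and not taken from `src/condor/check_jobs.py`.

```python
import tempfile
from pathlib import Path
from typing import List


def find_missing_jobs(outdir: Path, njobs: int, pattern: str = "out_{}.pkl") -> List[int]:
    """Return the job indices whose expected output file is absent."""
    return [i for i in range(njobs) if not (outdir / pattern.format(i)).exists()]


with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp)
    # Pretend jobs 0, 1 and 3 finished out of 5 submitted.
    for i in (0, 1, 3):
        (outdir / f"out_{i}.pkl").touch()
    missing = find_missing_jobs(outdir, njobs=5)

print(missing)
```

The returned indices are the jobs that would need to be resubmitted.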

Combine pickles

Combine all output pickles into one:

for year in 2016APV 2016 2017 2018; do python src/condor/combine_pickles.py --tag $TAG --processor trigger --r --year $year; done
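
As a rough sketch of what combining means for pickles in the `{'nevents': int, 'cutflow': Dict[str, int]}` format described under "Running locally", the event counts and per-cut counts can simply be summed. The helper `combine_outputs` and the example numbers are hypothetical; the actual `combine_pickles.py` may organize its output differently.

```python
from collections import Counter
from typing import Iterable


def combine_outputs(outputs: Iterable[dict]) -> dict:
    """Sum 'nevents' and the per-cut 'cutflow' counts over several job outputs."""
    nevents = 0
    cutflow = Counter()
    for out in outputs:
        nevents += out["nevents"]
        cutflow.update(out["cutflow"])  # Counter.update adds counts key by key
    return {"nevents": nevents, "cutflow": dict(cutflow)}


# Two made-up job outputs; in practice each would come from pickle.load.
job_outputs = [
    {"nevents": 1000, "cutflow": {"all": 1000, "trigger": 400}},
    {"nevents": 500, "cutflow": {"all": 500, "trigger": 150}},
]
combined = combine_outputs(job_outputs)
print(combined)
```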

Combine

CMSSW + Combine Quickstart

cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
# float regex PR was merged so we should be able to switch to the main branch now:
git clone -b v9.2.0 https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b v2.0.0 https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
# Important: this scram has to be run from src dir
scramv1 b clean; scramv1 b -j 4

I also add the combine folder to my PATH in my .bashrc for convenience:

export PATH="$PATH:/uscms_data/d1/rkansal/hh4b/HH4b/src/HH4b/combine"

Create Datacards

After activating the CMSSW environment from above, you need to install rhalphalib and this repo:

# rhalphalib
git clone https://github.com/nsmith-/rhalphalib
cd rhalphalib
pip3 install -e . --user  # editable installation
cd ..
# this repo
git clone https://github.com/LPC-HH/HH4b.git
cd HH4b
pip3 install -e . --user  # TODO: check editable installation

Then, the command is:

python3 postprocessing/CreateDatacard.py --templates-dir templates/$TAG --model-name $TAG

e.g.

python3 postprocessing/CreateDatacard.py --templates-dir postprocessing/templates/Apr18 --year 2022-2023  --model-name run3-bdt-apr18

Run fits and diagnostics locally

Everything is run via the script below, which has several options (see the script):

run_blinded_hh4b.sh --workspace --bfit --limits --passbin 0

It will automatically include the VBF category if the directory has a passvbf.txt card.

Postfit plots

e.g.

python3 postprocessing/PlotFits.py --fit-file cards/run3-bdt-apr18/FitShapes.root --plots-dir ../../plots/PostFit/run3-bdt-apr18 --signal-scale 10

F-tests locally

This takes 5-10 minutes for 100 toys, and far longer for many more than 100.

# automatically make workspaces and do the background-only fit for orders 0 - 3
run_ftest_hh4b.sh --cardstag run3-bdt-apr2 --templatestag Apr2 --year 2022-2023 # -dl for saving shapes and limits
# run f-test for desired order
run_ftest_hh4b.sh --cardstag run3-bdt-apr2 --goftoys --ffits --numtoys 100 --seed 444 --order 0

Moving datasets (WIP)

rucio list-dataset-replicas cms:
rucio add-rule cms:/DATASET 1 T1_US_FNAL_Disk --activity "User AutoApprove" --lifetime [# of seconds] --ask-approval --comment ''
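
Since `--lifetime` is given in seconds, a small conversion can help. The snippet below is only an arithmetic aid; the 30-day value is an illustrative choice, not a recommendation from this README.

```python
from datetime import timedelta

days = 30  # illustrative choice, not a recommendation from this README
lifetime_seconds = int(timedelta(days=days).total_seconds())
print(lifetime_seconds)
```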