(Note: remember to replace <USERNAME> with your username when copying the commands below.)
First, log into an LPC node:
kinit <USERNAME>@FNAL.GOV
ssh <USERNAME>@cmslpc-el9.fnal.gov
The CMSSW version used runs on slc7, so you'll need to set up the correct OS environment using singularity (see the LPC documentation for more details). First, clone lpc-scripts in your home area:
cd ~
git clone https://github.com/FNALLPC/lpc-scripts
Then edit your .bashrc, adding the following line:
source ~/lpc-scripts/call_host.sh
Now log out and log back in for the changes to take effect. To start the container, use the following command:
cmssw-el7 -p --bind `readlink $HOME` --bind `readlink -f ${HOME}/nobackup/` --bind /uscms_data --bind /cvmfs -- /bin/bash
Install CMSSW_11_3_4 in your nobackup area:
cd ~/nobackup
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
First, log into KISTI:
ssh -p 4280 <USERNAME>@ui20.sdfarm.kr
The CMSSW version used runs on slc7. You'll need to set up the correct OS environment using singularity. On KISTI, this can be done with:
setup_el7
Install CMSSW_11_3_4 on KISTI:
source /cvmfs/cms.cern.ch/cmsset_default.sh
export SCRAM_ARCH=slc7_amd64_gcc900 # this is set automatically on LPC but not on KISTI
cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
First, log into lxplus:
ssh -Y <USERNAME>@lxplus.cern.ch
The CMSSW version used runs on slc7. You'll need to set up the correct OS environment using singularity. On lxplus, this can be done by creating a script, called start_el7.sh, that looks like this:
#!/bin/bash
export APPTAINER_BINDPATH=/afs,/cvmfs,/cvmfs/grid.cern.ch/etc/grid-security:/etc/grid-security,/cvmfs/grid.cern.ch/etc/grid-security/vomses:/etc/vomses,/eos,/etc/pki/ca-trust,/etc/tnsnames.ora,/run/user,/tmp,/var/run/user,/etc/sysconfig,/etc:/orig/etc
schedd=`myschedd show -j | jq .currentschedd | tr -d '"'`
apptainer -s exec /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-cat/cmssw-lxplus/cmssw-el7-lxplus:latest/ sh -c "source /app/setupCondor.sh && export _condor_SCHEDD_HOST=$schedd && export _condor_SCHEDD_NAME=$schedd && export _condor_CREDD_HOST=$schedd && /bin/bash "
Make the script executable and run it:
chmod +x start_el7.sh
./start_el7.sh
When doing that, you may get an error like this:
2024/10/21 23:12:38 [ERROR] - HTTP code: 404: Requested user (mcremone) is not known in pool share
If that is the case, simply bump to a better schedd by doing:
myschedd bump
Install CMSSW_11_3_4 in your home directory:
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
The rest of the setup should be the same regardless of which cluster you are working on.
Install combine (see the detailed instructions):
cd $CMSSW_BASE/src
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v9.1.0 # current recommended tag (Jan 2024)
scramv1 b clean; scramv1 b # always make a clean build
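Once the build finishes, you can quickly check that the combine executable is available in your environment:
combine --help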
Also install CombineHarvester:
cd $CMSSW_BASE/src
git clone https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
cd CombineHarvester
git checkout v2.1.0
scram b
Fork this repo on GitHub and clone it into your CMSSW_11_3_4/src directory:
cd $CMSSW_BASE/src
git clone https://github.com/<USERNAME>/decaf.git -b UL
cd decaf
Then, set up the proper dependencies:
source setup.sh
This script installs the necessary packages as user packages. (Note: pip may print errors the first time you run setup.sh, but it still installs everything correctly; no errors show up when running setup.sh a second time.) This is a one-time setup.
When you log in after completing all the one-time installations, you only need to set up the environments. Consider using aliases, scripts, etc. to make your life easier (see the example below).
If you haven't done so already, consider adding this line to your ~/.bashrc or ~/.bash_profile rather than sourcing it every time:
source /cvmfs/cms.cern.ch/cmsset_default.sh
Singularity on LPC:
cmssw-el7 -p --bind `readlink $HOME` --bind `readlink -f ${HOME}/nobackup/` --bind /uscms_data --bind /cvmfs -- /bin/bash
Singularity on KISTI:
setup_el7
Singularity on LXPLUS:
./start_el7.sh
Then, go to where you installed CMSSW and do:
cd CMSSW_11_3_4/src
cmsenv
cd decaf
source env.sh
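As an example of such a helper, here is a minimal sketch for LPC, assuming CMSSW_11_3_4 was installed in ~/nobackup; the script name and paths are illustrative, so adapt them for KISTI or lxplus:
# ~/decaf_env.sh -- illustrative helper, to be sourced *inside* the el7 container on LPC
source /cvmfs/cms.cern.ch/cmsset_default.sh
cd ~/nobackup/CMSSW_11_3_4/src
eval $(scramv1 runtime -sh)   # equivalent to cmsenv
cd decaf
source env.sh
After starting the container with one of the commands above, a single source ~/decaf_env.sh then sets everything up.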
By running this script you will also initialize your grid certificate (note: setup.sh also runs env.sh). This requires you to save your grid certificate password in $HOME/private/$USER.txt. Alternatively, you can change this location, or you can comment this step out and initialize the certificate manually every time.
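A minimal way to create that file with restrictive permissions, assuming env.sh expects it to contain only the password (the placeholder below stands for your actual certificate password):
mkdir -p $HOME/private
echo '<YOUR_GRID_CERT_PASSWORD>' > $HOME/private/$USER.txt   # placeholder: your actual password
chmod 700 $HOME/private
chmod 600 $HOME/private/$USER.txt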
The list of input files for the analyzer can be generated as a JSON file using the macros/list.py script. This script will run over the datasets listed in data/process.py, find the list of files for each dataset, “pack” them into small groups for condor jobs, and output the list of groups as a JSON file in metadata/.
The options for this script are:
- -d or --dataset: Select a specific dataset to pack. By default, it will run over all datasets in process.py. Usage: -d <dataset_name>
- -y or --year: Data year. Options are 2016pre, 2016post, 2017, and 2018. Usage: -y <year>
- -m or --metadata: Name of the metadata output file. Output will be saved in metadata/<NAME>.json. Usage: -m <name>
- -p or --pack: Size of the file groups. The smaller the number, the more condor jobs will run; the larger the number, the longer each condor job will take. We tend to pick 32, but the choice is mostly arbitrary. Usage: -p <size>
- -s or --special: Size of the file groups for special datasets, to use a different size for a specific dataset with respect to the one set with --pack. The syntax is -s <DATASET>:<NUMBER>. Usage: -s <dataset>:<number>
- -c or --custom: Boolean that selects public central NanoAODs (if False) or private custom NanoAODs (if True). Default is False. Usage: -c (no argument needed)
- -t or --transfer: When using public central NanoAODs it is advisable to transfer the files to the cluster you are running from. For example, when running the code at lxplus, pass T2_CH_CERN to transfer files there. The proper xrootd redirector is automatically used. Default is T1_US_FNAL_Disk. Usage: -t <cluster>
As an example, to generate the JSON file for all 2017 public data/MC NanoAODs at KISTI:
python3 macros/list.py -y 2017 -m 2017 -p 32 -t T2_KR_KISTI
As a reminder, this script assumes that you are in the decaf/analysis directory when running. The output above will be saved in metadata/2017.json.
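If you need a different group size for one specific dataset while keeping 32 for the rest, the -s option described above can be added to the same command; the dataset name below is purely illustrative:
python3 macros/list.py -y 2017 -m 2017 -p 32 -s QCD:16 -t T2_KR_KISTI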
If generating a JSON for public NanoAODs, rucio has to be set up in order to initiate and auto-approve data transfers:
source /cvmfs/cms.cern.ch/rucio/setup-py3.sh
export RUCIO_ACCOUNT=<YOUR_CERN_USERNAME>
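Once rucio is set up, you can optionally check on the transfer rules created for your account with the standard rucio CLI, for example:
rucio list-rules --account $RUCIO_ACCOUNT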
When the JSONs are produced, remember to reset the standard decaf environment by sourcing env.sh:
cd ..
source env.sh
cd analysis
If using the --custom option, the script can take several hours to run, so it's best to use a process manager such as nohup or tmux so the program doesn't die if your connection drops. For example:
nohup python3 macros/list.py -y 2017 -m 2017 -p 32 -c &
The & at the end of the command runs it in the background, and the standard output and error are saved in nohup.out.
The nohup command is useful and recommended for running most scripts, but you may also use tools like tmux or screen, as in the example below.
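For example, a typical tmux workflow looks like this (the session name is arbitrary):
tmux new -s listgen                               # start a named session
python3 macros/list.py -y 2017 -m 2017 -p 32 -c   # run the long job inside it
# detach with Ctrl-b then d; the job keeps running on the server
tmux attach -t listgen                            # reattach later to check on it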
MC b-tagging efficiencies are needed by most of the analyses to compute the b-tag event weight, once such efficiencies are corrected with the POG-provided b-tag SFs. To compute them, we first need to run the common module in utils:
python3 utils/common.py
This will generate a series of auxiliary functions and information, like the AK4 b-tagging working points, and save it in a .coffea file in the data folder. The AK4 b-tagging working points are essential to measure the MC efficiencies and are used by the btageff processor in the processors folder. To generate the processor file:
python3 processors/btageff.py -y 2018 -m 2018 -n 2018
The options for this script are:
- -y or --year: Data year. Options are 2016pre, 2016post, 2017, and 2018. Usage: -y <year>
- -m or --metadata: Metadata file to be used as input. Usage: -m <metadata_file>
- -n or --name: Name of the output processor file. In this case, it will generate a file called btageff2018.processor stored in the data folder. Usage: -n <output_name>
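If you need a processor for every data-taking period, a simple shell loop works, assuming you have already produced a metadata JSON with a matching name for each year:
for YEAR in 2016pre 2016post 2017 2018; do
    python3 processors/btageff.py -y $YEAR -m $YEAR -n $YEAR
done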
To run the processor:
python3 run.py -p btageff2018 -m 2018 -d QCD
With this command you will run the btageff2018 processor over the QCD MC datasets as defined by the 2018 metadata file. You will see a printout like:
Processing: QCD_Pt_1400to1800_TuneCP5_13TeV_pythia8____4_ Preprocessing 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/32 [ 0:01:28 < 0:00:00 | ? file/s ] Merging (local) 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 31/31 [ 0:00:23 < 0:00:00 | ? merges/s ]
This means an output file with the histograms defined in the btag processor file has been generated. In this case a folder called btageff2018 has been created inside the hists folder. Inside this folder you can see a file called QCD_Pt_1400to1800_TuneCP5_13TeV_pythia8____4_.futures, which stores the histograms. To take advantage of the parallelism offered by the HTCondor job scheduler, the run_condor.py script can be used:
python3 run_condor.py -p btageff2018 -m 2018 -d QCD -c kisti -t -x
The options for this script are the same as for run.py, with the addition of:
- -c or --cluster: Specifies which cluster you are using. Currently supports lpc, kisti, or lxplus. Usage: -c <cluster_name>
- -t or --tar: Tars the local python environment and the local CMSSW folder. Usage: -t (no argument needed)
- -x or --copy: Copies these two tarballs to your EOS area. For example, to run the same setup for a different year you won't need to tar and copy again; you can simply do python3 run_condor.py -p btageff2017 -m 2017 -d QCD -c kisti. Usage: -x (no argument needed)
You can check the status of your HTCondor jobs by doing:
condor_q <YOUR_USERNAME>
Note that in order to use the condor scripts, you need to replace the certificate name in the scripts with the name of your own certificate.
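One way to locate where the certificate name appears is a quick recursive search over the condor scripts; the patterns below are only a guess, so adjust them to your certificate file's actual name:
grep -rniE 'x509|proxy|usercert' run_condor.py reduce_condor.py merge_condor.py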
After obtaining all the histograms, a first step of data reduction is needed. This step is achieved by running the reduce.py script:
python3 reduce.py -f hists/btageff2018
The options of this script are:
- -f or --folder: Specifies the folder to be processed. Usage: -f <folder_name>
- -d or --dataset: Specifies the dataset(s) to be processed. If not provided, defaults to None. Usage: -d <dataset_name>
- -e or --exclude: Specifies the dataset(s) to be excluded from processing. Defaults to None. Usage: -e <dataset_name>
- -v or --variable: Specifies the variable(s) to be processed. If not provided, defaults to None. Usage: -v <variable_name>
All the datasets produced in the previous step will be reduced, and a separate file will be produced for each variable of each reduced dataset. For example, the command above will produce the following reduced files:
hists/btageff2018/deepcsv--QCD_Pt_1000to1400_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_120to170_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_1400to1800_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_15to30_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_170to300_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_1800to2400_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_2400to3200_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_300to470_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_30to50_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_3200toInf_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_470to600_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_50to80_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_600to800_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_800to1000_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepcsv--QCD_Pt_80to120_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_1000to1400_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_120to170_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_1400to1800_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_15to30_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_170to300_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_1800to2400_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_2400to3200_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_300to470_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_30to50_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_3200toInf_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_470to600_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_50to80_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_600to800_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_800to1000_TuneCP5_13TeV_pythia8.reduced
hists/btageff2018/deepflav--QCD_Pt_80to120_TuneCP5_13TeV_pythia8.reduced
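For example, to reduce only the deepflav histograms for a single dataset, you can combine the -v and -d options; the dataset string below is taken from the listing above and may need adjusting to whatever -d actually expects:
python3 reduce.py -f hists/btageff2018 -v deepflav -d QCD_Pt_170to300_TuneCP5_13TeV_pythia8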
This step can be run in HTCondor by using the reduce_condor.py script, which has the same options as reduce.py, with the addition of the same --cluster, --tar, and --copy options described above for run_condor.py.
A second step of data reduction is needed to merge all the .reduced files corresponding to a single variable. This is achieved by using the merge.py script:
python3 merge.py -f hists/btageff2018
The options of this script are:
- -f or --folder: Specifies the folder to be processed. Usage: -f <folder_name>
- -v or --variable: Specifies the variable(s) to be processed. If not provided, defaults to None, meaning all variables are processed. Usage: -v <variable_name>
- -e or --exclude: Specifies the variable(s) to be excluded from processing. Defaults to None. Usage: -e <variable_name>
- -p or --postprocess: If specified, performs postprocessing on the merged files. Usage: -p (no argument needed)
This command will produce the following files:
hists/btageff2018/deepcsv.merged
hists/btageff2018/deepflav.merged
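To merge only one variable at a time, you can use the -v option described above, for example for deepcsv:
python3 merge.py -f hists/btageff2018 -v deepcsv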
The same script can be used to merge the files corresponding to each single variable into a single file, using the -p or --postprocess option:
python3 merge.py -f hists/btageff2018 -p
This step can also be run in HTCondor by using the merge_condor.py script, which has the same options as merge.py, with the addition of the same --cluster, --tar, and --copy options described above for run_condor.py.
This README is a work in progress