SUEPPhysics/SUEPSkimmer

SUEPSkimmer

Getting started

SUEPSkimmer skims SUEPNano files produced by kraken at the MIT cluster. The default HLT path used for skimming is HLT_TripleMu_5_3_3*.

To get started:

  1. Set up a CMSSW release area (release 13_0_4 is known to work).
  2. Clone this repository into the src directory of that release area.
  3. Execute the following:
cd SUEPSkimmer
chmod +x compile.sh
./compile.sh
  4. The scripts in data can be used to create lists of files. See README.md in data for more information.

  5. The script condorSubmitter.sh can submit jobs to Condor. To just prepare but not submit:

    ./condorSubmitter.sh -d data/datasets.dat -p gfal

    To prepare and submit:

    ./condorSubmitter.sh -d data/datasets.dat -p gfal -s

Preparing the files for skimming

  1. The datasets for skimming are defined in data/datasets.dat, with one dataset path per line:
/store/user/paus/nanosu/A01/QCD_Pt_1000to1400_TuneCP5_13TeV_pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1+MINIAODSIM
/store/user/paus/nanosu/A01/QCD_Pt_120to170_TuneCP5_13TeV_pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2+MINIAODSIM
/store/user/paus/nanosu/A01/QCD_Pt_1400to1800_TuneCP5_13TeV_pythia8+RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1+MINIAODSIM
... (and so on)
  2. Use data/dumpFilenames.sh to create a list of files for each dataset:
cd data
./dumpFilenames.sh -d datasets.dat -p gfal
  3. The output is written to data/filenames/<dataset name>.txt, one file per dataset. Each file lists all available input files for that dataset.
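
As a quick sanity check after running dumpFilenames.sh, the sketch below counts the files listed for each dataset. It assumes that the dataset name is the last path component of each line in datasets.dat, which this README does not state explicitly. The helper names dataset_name and count_listed_files are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helpers; not part of SUEPSkimmer.

# Assumption: the dataset name is the last path component of a datasets.dat line.
dataset_name() {
    basename "$1"
}

# Print "<dataset name> <number of listed files>" for every dataset,
# or "<dataset name> missing" if no file list exists for it.
# $1: dataset list (e.g. datasets.dat), $2: directory holding the .txt lists.
count_listed_files() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines.
        [ -z "$line" ] && continue
        name=$(dataset_name "$line")
        list="$2/$name.txt"
        if [ -f "$list" ]; then
            # grep -c . counts the non-empty lines of the list.
            echo "$name $(grep -c . "$list")"
        else
            echo "$name missing"
        fi
    done < "$1"
}
```

From the data directory, this could be called as count_listed_files datasets.dat filenames.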

Submitting jobs to Condor

The script condorSubmitter.sh submits jobs to Condor. It takes three arguments: the file that lists the datasets (-d), the transfer protocol to use (-p, either gfal or xrootd), and an optional flag (-s) that submits the jobs. Without -s, the jobs are only prepared and not submitted. For example, to prepare and submit:

./condorSubmitter.sh -d data/datasets.dat -p gfal -s
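
Before submitting, it can help to check the arguments first. The following is a minimal sketch of such a check. The function name preflight is hypothetical and not part of this repository; the only facts taken from this README are that the dataset list is a file and that the protocol is either gfal or xrootd.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of SUEPSkimmer.
# $1: dataset list, $2: transfer protocol.
preflight() {
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$1" ]; then
        echo "dataset list '$1' is missing or empty" >&2
        return 1
    fi
    case "$2" in
        gfal|xrootd) ;;
        *)
            echo "unknown protocol '$2' (expected gfal or xrootd)" >&2
            return 1
            ;;
    esac
    return 0
}
```

A possible use is preflight data/datasets.dat gfal && ./condorSubmitter.sh -d data/datasets.dat -p gfal -s, so that nothing is submitted when the check fails.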

Resubmitting missing/new files

When new files appear or some files are missing, the script data/diff.sh compares the input files on the MIT cluster with the output files in the LPC EOS. The resulting diff output can be used to resubmit jobs for the missing or new files. The script takes the same arguments as data/dumpFilenames.sh and, as in this example, is run from the data directory:

./diff.sh -d datasets.dat -p gfal
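
The idea behind such a comparison can be sketched with standard tools. This is not the actual implementation of data/diff.sh; it only illustrates how entries present in one list but absent from another can be found. The function name missing_files is hypothetical, and the sketch assumes both lists hold one file name per line and that input and output files share the same name.

```shell
#!/bin/sh
# Hypothetical helper; not the actual implementation of data/diff.sh.
# Print the entries of $1 (input list) that do not appear in $2 (output list).
# Assumption: one file name per line, identical names on both sides.
missing_files() {
    in_sorted=$(mktemp)
    out_sorted=$(mktemp)
    # comm requires sorted input; -u also drops duplicate entries.
    sort -u "$1" > "$in_sorted"
    sort -u "$2" > "$out_sorted"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$in_sorted" "$out_sorted"
    rm -f "$in_sorted" "$out_sorted"
}
```

The printed names are the candidates for resubmission.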

Merging the output

Merging fuses the output files into 1 GB blocks (or a single smaller block if the total size is less than 1 GB). The script merger.sh merges the files in the output directory, but it should be run through the wrapper script runMerger.sh. The wrapper takes two arguments: the file that lists the datasets (-d) and the input path in the LPC EOS (-i). For example:

./runMerger.sh -d data/datasets.dat -i /store/user/lpcsuep
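
To illustrate how files might be grouped into blocks of a bounded size, here is a minimal greedy sketch. It is not the algorithm used by merger.sh, which this README does not describe. The function name group_into_blocks and the input format (one "size name" pair per line) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch; not the actual algorithm used by merger.sh.
# Reads "<size in bytes> <file name>" lines from stdin and prints
# "<block index> <file name>". A new block starts whenever adding the
# next file would push a non-empty block above the limit ($1, in bytes).
group_into_blocks() {
    awk -v limit="$1" '
        BEGIN { block = 0; total = 0 }
        {
            if (total > 0 && total + $1 > limit) {
                block++
                total = 0
            }
            total += $1
            print block, $2
        }
    '
}
```

A file larger than the limit ends up alone in its own block. For 1 GB blocks the limit would be 1000000000, assuming decimal gigabytes.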

About

A skimmer for the SUEPNano files.
