CMS generator cards for 2015 files #97

Closed
katilp opened this issue May 19, 2021 · 3 comments

katilp commented May 19, 2021

(edited 11.10.2021)

The generator "gridpacks" are stored in /cvmfs/cms.cern.ch/phys_generator/gridpacks/
However, note that not all the 2015 LHE cards are stored there yet.

Take an example dataset from https://github.com/cernopendata/data-curation/blob/master/cms-YYYY-simulated-datasets/inputs/CMS-2015-mc-datasets.txt

Find the generator cards "by-hand" with:

Case no LHE:

Three options:

Through "fragments" stored in McM

Advantage: gives the relevant information directly

  1. search the dataset in McM (request -> output dataset) (example query MiniAODSIM)
  2. find the parent name and redo 1.
  3. if GEN-SIM, find the generator parameters in "Name of fragment"

As the metadata script reads the full dictionary, we should already have this information.

Config files

Advantage: already available as config for GEN-SIM step

Disadvantage: shows the full config file, not only the cards

The steps 1,2 as above, then

From edmProvDump

Advantage: get the information directly from the file

Disadvantage: must be run in a CMSSW release area, and the output formatting is not ideal for display

  1. Find a single root file name $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
    edmProvDump -f "generator SIM" root://eospublic.cern.ch/$file | grep -A9999 "generator SIM"

Case LHE:

Case no gridpack

  • MINIAODSIM has mcdb_id > 1 in McM
  • in use before gridpack was adopted
  1. Example McM query for a MINIAODSIM
  2. Take "Mcdb id", in the dictionary: "mcdb_id": 15839
  3. Find the lhe file in /eos/cms/store/lhe/$mcdb_id
  4. Extract the file and read the header with:
    xz -d -c /eos/cms/store/lhe/$mcdb_id/* > lhe.lhe
    awk '/<header>/,/<\/header>/' lhe.lhe > lhe_header
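
The two commands above can be wrapped in a small helper. This is only a sketch: the function name `lhe_header` is hypothetical, and it assumes that a stored file is xz-compressed when its name ends in `.xz` and plain text otherwise.

```shell
# Print the <header>...</header> block of one LHE file.
# Assumption: names ending in .xz are xz-compressed, anything else is plain text.
lhe_header() {
    case "$1" in
        *.xz) xz -d -c "$1" ;;
        *)    cat "$1" ;;
    esac | awk '/<header>/,/<\/header>/'
}
```

For example, `lhe_header /eos/cms/store/lhe/$mcdb_id/some_file.lhe > lhe_header` would write the header of one file without the intermediate `lhe.lhe` copy (`some_file.lhe` is a placeholder name).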

Case gridpack

  • mcdb_id = 0
  • Find the gridpack address, two options
From McM dictionary
From edmProvDump
  1. Find a single root file name $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
edmProvDump -f "externalLHEProducer LHE" root://eospublic.cern.ch/$file | grep gridpacks > line
gp=$(sed "s/'/ /g" line | awk '{print $6}')
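
Picking the sixth field with awk depends on the exact layout of the edmProvDump line. A possible alternative, sketched here, is to match the gridpack path itself. The function name `gridpack_path` is hypothetical, and the sketch assumes that the path starts with the cvmfs gridpack prefix and contains only letters, digits and the characters `_ . / + -`.

```shell
# Print the first gridpack path found in the text read from stdin.
# Assumption: the path contains only letters, digits and _ . / + -
gridpack_path() {
    grep -oE '/cvmfs/cms\.cern\.ch/phys_generator/gridpacks/[A-Za-z0-9_./+-]+' | head -n 1
}
```

It can be fed from the command above, e.g. `gp=$(edmProvDump -f "externalLHEProducer LHE" root://eospublic.cern.ch/$file | gridpack_path)`.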
Extract the cards once the $gp address is known:
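# Pick the card locations according to the generator named in $file.
# --wildcards is the GNU tar option that enables the * patterns below;
# --no-anchored lets *.input match regardless of a leading ./ in the archive.
if [[ $file == *"madgraph"* ]]; then
      tar -xf "$gp" ./process/madevent/Cards/run_card.dat
      tar -xf "$gp" --wildcards './process/madevent/Cards/proc_card*.dat'
      tar -xf "$gp" ./process/madevent/Cards/param_card.dat
      mv ./process/madevent/Cards/*.dat "$dir"/
elif [[ $file == *"powheg"* ]]; then
      tar -xf "$gp" --wildcards --no-anchored '*.input'
      mv *.input "$dir"/
elif [[ $file == *"amcatnlo"* ]]; then
      tar -xf "$gp" process/Cards/run_card.dat
      tar -xf "$gp" --wildcards 'process/Cards/proc_card*.dat'
      tar -xf "$gp" process/Cards/param_card.dat
      mv process/Cards/*.dat "$dir"/
fi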

katilp commented Oct 25, 2021

@OsamaMomani as we discussed, this could be done separately from the other script, and in this case, the pseudocode is

input: list of datasets

  • find the datasets which have the LHE step (the ones without it should already have generator parameters)
  • if mcdb_id > 0:
    • get the generator parameters from the "LHE header" and name the output file according to the recid, with
      xz -d -c /eos/cms/store/lhe/$mcdb_id/* > lhe.lhe
      awk '/<header>/,/<\/header>/' lhe.lhe > ${recid}_lhe_header

    or similar
    • write the header to a generator parameter store
  • else:
    • get the gridpack address (it is a string starting with /cvmfs/cms.cern.ch/phys_generator/gridpacks/) from the fragment or from the dictionary
    • write the whole gridpack in a gridpack store (add the recid to the name, or have them as folders)
    • get the generator parameters from the gridpack tar file with the code snippet at the end of the previous comment ($file can be the dataset name)
    • save the generator parameter files to files starting with recid and record them in the generator parameter store
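
The last step of the pseudocode could look roughly like the sketch below. The function name `store_cards` and its three arguments are hypothetical, and the sketch assumes that the cards have already been extracted into a directory as `.dat` or `.input` files.

```shell
# Copy extracted card files into a store, prefixing each name with the recid.
# $1: recid, $2: directory with extracted .dat/.input files, $3: store directory
store_cards() {
    mkdir -p "$3" || return 1
    for f in "$2"/*.dat "$2"/*.input; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        cp "$f" "$3/${1}_$(basename "$f")"
    done
}
```

Calling `store_cards "$recid" "$dir" "$store"` after the extraction would leave files such as `${recid}_run_card.dat` in the store, where `$store` stands for whatever location is chosen for the generator parameter store.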


katilp commented Nov 21, 2021

Exceptions/additions:

mcdb_id, but no <header>

eg: /GluGluToContinToZZTo2e2nu_13TeV_MCFM701_pythia8/RunIIWinter15pLHE-MCRUN2_71_V1-v1/LHE -> mcdb_id = 14301

$ ls /eos/cms/store/lhe/15401/
gg_ZZ_2El2Nu_13TeV_NNPDF30_lo_as_0130_MCFM70.lhe
-bash-4.2$ head /eos/cms/store/lhe/15401/gg_ZZ_2El2Nu_13TeV_NNPDF30_lo_as_0130_MCFM70.lhe
<LesHouchesEvents version="1.0">
<!--
file generated with MCFM version 7.0
Input file input.DAT contained:
#  Cross-section is:             14.9316     +/-            0.136621E-01)

 #  Contribution from parton sub-processes:
#         GG     |        0.0000        0.00%
#         GQ     |        0.0000        0.00%
#         GQB    |        0.0000        0.00%
  • take what is in the comment, i.e. between <!-- .... -->
  • keyword for identification "generators": ["MCFM701"]
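
A fallback along these lines is sketched below. The function name `lhe_info` is hypothetical; the sketch assumes that the file is already uncompressed and that, when no `<header>` block exists, the first `<!-- ... -->` comment holds the generator information.

```shell
# Print the <header> block of an LHE file if it has one,
# otherwise the first <!-- ... --> comment.
lhe_info() {
    if grep -q '<header>' "$1"; then
        awk '/<header>/,/<\/header>/' "$1"
    else
        # Print the comment lines and stop reading once the comment closes.
        awk '
            /<!--/,/-->/ { print }
            /-->/ { exit }
        ' "$1"
    fi
}
```

Note that `grep -q` reads the whole file when there is no header, which may be slow for large LHE files.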

swp, swo files

/GluGluWWTo2L2Nu_MCFM_13TeV/RunIIWinter15pLHE-MCRUN2_71_V1-v1/LHE

$ ls /eos/cms/store/lhe/15275
ggWWbx_lord_NNPDF30_proc127_ll_500kevents.lhe  ggWWbx_l.swo  ggWWbx_l.swp

to be investigated

no files in ./process/madevent/Cards for madgraph

/BulkGravTohhTohtatahbb_narrow_M-1000_13TeV-madgraph/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.2.2/exo_diboson/Spin-2/BulkGraviton_hh_htatahbb/narrow/v3/BulkGraviton_hh_htatahbb_narrow_M1000_tarball.tar.xz

./process/madevent/Cards/run_card.dat: Not found in archive

  • Remove ./ and use tar -xf $gp process/madevent/Cards/run_card.dat etc

no powheg.input for powheg

/GluGluHToBB_M125_13TeV_powheg_pythia8/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/powheg/V2/gg_H_quark-mass-effects_NNPDF30_13TeV_M125/v2/gg_H_quark-mass-effects_NNPDF30_13TeV_M125_tarball.tar.gz

$ tar -tf $gp | grep powheg.input
./powheg.input
  • Take ./powheg.input instead of powheg.input
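
Since the member names are stored with a leading `./` in some gridpacks and without it in others, one option is to try both forms. A minimal sketch follows; the function name `extract_member` is hypothetical.

```shell
# Extract one member from a tarball, trying the name first without and
# then with a leading "./".
# $1: tarball, $2: member name (with or without the leading "./")
extract_member() {
    tar -xf "$1" "${2#./}" 2>/dev/null || tar -xf "$1" "./${2#./}" 2>/dev/null
}
```

With this, `extract_member "$gp" process/madevent/Cards/run_card.dat` and `extract_member "$gp" powheg.input` would cover both cases above; the function returns a non-zero status if neither form is found.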

corrupt compress?/timeout?

/RSGravToWW_width0p2_M-3000_13TeV-madgraph/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.3.3/exo_diboson/Spin-2/RSGraviton_WW_inclu/wide/v1/RSGraviton_WW_inclu_width0.2_M3000_tarball.tar.xz

The files are there and get extracted, but tar fails on exit. If I kill the process before it fails:

$ tar -xf $gp ./process/madevent/Cards/run_card.dat
^C
$ ls process/madevent/Cards/run_card.dat
process/madevent/Cards/run_card.dat
$ head process/madevent/Cards/run_card.dat
#*********************************************************************
#                       MadGraph5_aMC@NLO                            *
#                                                                    *
#                     run_card.dat MadEvent                          *
#                                                                    *
#  This file is used to set the parameters of the run.               *
#                                                                    *
#  Some notation/conventions:                                        *
#                                                                    *
#   Lines starting with a '# ' are info or comments                  *

Multiple lhe files

e.g.

$ ls /eos/cms/store/lhe/15453/
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_0.lhe   Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_2.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_6.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_10.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_3.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_7.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_11.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_4.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_8.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_1.lhe   Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_5.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_9.lhe
  • take ..._0.lhe
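
The file choice could be sketched as below. The function name `pick_lhe_file` is hypothetical. It is an assumption of this sketch that the `.swp` and `.swo` files seen earlier can simply be ignored, and that a single file is taken as it is when no `_0` file exists.

```shell
# Print the LHE file to read from one directory of the LHE store.
# Preference: a name ending in _0.lhe (or _0.lhe.xz), then any .lhe or .lhe.xz.
# Swap files (.swp, .swo) never match these patterns and are ignored.
pick_lhe_file() {
    for f in "$1"/*_0.lhe "$1"/*_0.lhe.xz "$1"/*.lhe "$1"/*.lhe.xz; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```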

OsamaMomani commented

folder structure

lhe_generators
│
├── mcdb
│   ├── {mcdb_id}_header (files)
│   └── .....
└── gridpacks
    ├── {recid} (folders)
    │   └── (one or more .dat|.input files)
    └── .....

~3478 mcdb files (tens of MB)
~3636 gridpacks folders (few hundreds of MB)

in opendata portal

  • check whether the record has an LHE step, then whether it uses mcdb or gridpacks
    • if mcdb, provide a link to /eos/..../lhe_generators/mcdb/{mcdb_id}_header
    • if gridpacks, provide links to the files inside /eos/..../lhe_generators/gridpacks/{recid}
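
The lookup could be expressed as in the sketch below. The function name `generator_param_location` is hypothetical, the base directory is passed in by the caller, and the sketch assumes that a missing or zero mcdb_id means a gridpack-based dataset.

```shell
# Print the location of the generator parameters for one record.
# $1: base directory of the lhe_generators store (supplied by the caller)
# $2: recid, $3: mcdb_id (0 or empty for gridpack-based datasets)
generator_param_location() {
    if [ "${3:-0}" -gt 0 ]; then
        printf '%s/mcdb/%s_header\n' "$1" "$3"
    else
        printf '%s/gridpacks/%s\n' "$1" "$2"
    fi
}
```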

@katilp katilp closed this as completed Dec 9, 2021
CMS-2015-Open-Data-Release automation moved this from In progress to Done Dec 9, 2021