CMS: prepare data records for Run2 Hbb and QCD MC for ML studies #2525

Open
2 tasks done
katilp opened this issue Mar 21, 2019 · 5 comments

@katilp
Member

katilp commented Mar 21, 2019

Prepare data records for the Run2 samples used in ML file production.

from #2448
signal MC:

/BulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-1000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-1400_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-1600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-1800_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-2500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-4000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/BulkGravTohhTohbbhbb_narrow_M-4500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

QCD background:
/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_470to600_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_600to800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_800to1000_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_1000to1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_1400to1800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_1800to2400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_2400to3200_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3/MINIAODSIM

and from #2447
/QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

Release: CMSSW_8_0_21
Global Tag: 80X_mcRun2_asymptotic_2016_TrancheIV_v6
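
The dataset names listed above follow the usual three-part `/primary/processed/TIER` layout. As a minimal sketch for anyone scripting over this list, the snippet below splits a name into those parts and flags `_ext` extension samples. The helper names (`DatasetName`, `parse_dataset_name`, `is_extension`) are hypothetical and not part of any portal or CMS tooling.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    primary: str    # physics process, e.g. QCD_Pt_300to470_...
    processed: str  # campaign, pileup scenario, global tag, version
    tier: str       # data tier, e.g. MINIAODSIM


def parse_dataset_name(name: str) -> DatasetName:
    """Split '/primary/processed/TIER' into its three components."""
    parts = name.strip().split("/")
    # A well-formed name starts with '/', so the first element is empty.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"unexpected dataset name: {name!r}")
    return DatasetName(*parts[1:])


def is_extension(ds: DatasetName) -> bool:
    """True if the processed-dataset string carries an '_ext' tag."""
    return "_ext" in ds.processed


example = parse_dataset_name(
    "/QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8/"
    "RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3/"
    "MINIAODSIM"
)
print(example.primary, example.tier)
```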

The files are in https://eospublichttp.cern.ch/eos/opendata/cms/MonteCarlo2016

These are standard samples (i.e. they are part of a normal production with no modifications in the standard workflow)

These are the first MiniAODSIM-format samples on the portal. They are mentioned in the updated http://opendata-dev.web.cern.ch/docs/about-cms
aboutminiaod

We may want to link to the following page for further information on this format and its usage:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016

To do:

  • prepare records
  • extract metadata (CMSDAS), NB one further production step -> MiniAODSIM
@katilp
Member Author

katilp commented Mar 26, 2019

Create cms-simulated-datasets-Run2-datascience.json (similar to other cms-simulated-dataset...)

mantasavas added a commit to mantasavas/opendata.cern.ch that referenced this issue Mar 27, 2019
mantasavas added a commit to mantasavas/opendata.cern.ch that referenced this issue Mar 28, 2019
tiborsimko pushed a commit to mantasavas/opendata.cern.ch that referenced this issue Mar 28, 2019
@ghost ghost removed the Status: ready for work label Apr 1, 2019
@katilp
Member Author

katilp commented Apr 3, 2019

Add to the abstract-description in the last paragraph:

The contents of MINIAODSIM in these datasets may differ from the final legacy format used in CMS Run2 simulated datasets.

@katilp katilp reopened this Apr 3, 2019
@katilp
Member Author

katilp commented Apr 13, 2019

To do:

  • Check why LHE step is missing for Hbb samples e.g. http://opendata-dev.web.cern.ch/record/12007
    • update: the extraction script gets the steps correctly and reads them into the cache, but they were not yet propagated to record building
  • Change the order of steps in provenance
  • Monte Carlo Production Overview -> Monte Carlo production overview

@tiborsimko
Member

Nice test case:

  • record ID 12009
  • script finds Summer15 parent:
$ cat cache/run2-datascience/mcm-store/dict/@BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph@RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1@AODSIM.json | jq '.results.input_dataset'
"/BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer15wmLHEGS-MCRUN2_71_V1_ext1-v1/GEN-SIM"
  • but does not go there:
$ ls -l cache/run2-datascience/mcm-store/dict/ | grep -c Summer15      
0

To be checked while rerunning...
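
A minimal sketch of how this check could be automated while rerunning: follow the `results.input_dataset` links through the cache directory and report every parent that is named but has no cache file of its own. It assumes the file naming seen in the listing above (each `/` in the dataset name replaced by `@`, plus a `.json` suffix) and that `results` is a JSON object; the function names are hypothetical and are not taken from the actual extraction script.

```python
import json
from pathlib import Path
from typing import List, Optional


def cache_file(cache_dir: Path, dataset: str) -> Path:
    """Map '/a/b/TIER' to '<cache_dir>/@a@b@TIER.json' (assumed naming)."""
    return cache_dir / (dataset.replace("/", "@") + ".json")


def input_dataset(cache_dir: Path, dataset: str) -> Optional[str]:
    """Return the cached parent of `dataset`, or None if unknown."""
    path = cache_file(cache_dir, dataset)
    if not path.is_file():
        return None
    with path.open() as handle:
        data = json.load(handle)
    return (data.get("results") or {}).get("input_dataset") or None


def missing_parents(cache_dir: Path, dataset: str) -> List[str]:
    """Walk the parent chain and list parents that lack a cache file."""
    missing: List[str] = []
    seen = set()
    current: Optional[str] = dataset
    while current and current not in seen:
        seen.add(current)  # guard against accidental cycles
        parent = input_dataset(cache_dir, current)
        if parent and not cache_file(cache_dir, parent).is_file():
            missing.append(parent)
        current = parent
    return missing
```

For the record above, calling `missing_parents(Path("cache/run2-datascience/mcm-store/dict"), ...)` with the AODSIM dataset name would be expected to report the Summer15 GEN-SIM parent, given that no Summer15 file is present in the cache.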

@tiborsimko tiborsimko assigned tiborsimko and unassigned mantasavas Jun 6, 2019
@tiborsimko tiborsimko moved this from To do to In progress in CMS-Q4-Updates Jun 6, 2019
@tiborsimko tiborsimko moved this from In progress to Reviewer approved in CMS-Q4-Updates Jun 6, 2019
@tiborsimko tiborsimko moved this from Reviewer approved to Review in progress in CMS-Q4-Updates Jun 21, 2019
@katilp
Member Author

katilp commented Jul 15, 2019

BulkGravTohhTohbbhbb* datasets:

  • Case A: 12000, 12001, 12007, 12009: LHE step missing entirely, sim config and production script in SIM step missing
  • Case B: 12002-12006, 12008, 12010, 12011 : LHE step present (but no configs), production script in SIM step missing

QCD* datasets:

  • Case C: 12012-12019, 12021: production script in SIM step missing
  • Case D: 12020: production script in SIM step missing, step HLT RECO with no name and no links

General:

  • generators appear in HLT RECO step, would make more sense in LHE SIM steps
  • generator parameter fragments not shown
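
To re-check records case by case after a rebuild, the case assignments above can be turned into a lookup table. This is only a sketch: the record IDs and ranges are copied from the lists above, while `expand_ids`, `CASES` and `case_by_record` are hypothetical names introduced for illustration.

```python
from typing import Dict, List


def expand_ids(spec: str) -> List[int]:
    """Expand '12002-12006, 12008' into [12002, ..., 12006, 12008]."""
    ids: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            first, last = (int(part) for part in token.split("-", 1))
            ids.extend(range(first, last + 1))  # ranges are inclusive
        else:
            ids.append(int(token))
    return ids


# Record IDs per case, as listed in the comment above.
CASES = {
    "A": "12000,12001,12007,12009",
    "B": "12002-12006, 12008, 12010, 12011",
    "C": "12012-12019, 12021",
    "D": "12020",
}


def case_by_record(cases: Dict[str, str]) -> Dict[int, str]:
    """Build {record_id: case}, refusing IDs assigned to two cases."""
    lookup: Dict[int, str] = {}
    for case, spec in cases.items():
        for record_id in expand_ids(spec):
            if record_id in lookup:
                raise ValueError(f"record {record_id} listed in two cases")
            lookup[record_id] = case
    return lookup


lookup = case_by_record(CASES)
# Every record from 12000 to 12021 should belong to exactly one case.
assert sorted(lookup) == list(range(12000, 12022))
```

The final assertion doubles as a consistency check that the four cases cover the records 12000 to 12021 without gaps or overlaps.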

Case A, e.g. 12000

Case B, e.g. 12002

Case C, e.g. 12012

Case D, 12020

@tiborsimko tiborsimko removed this from the CMS-Q4-Updates milestone Jul 17, 2019
@tiborsimko tiborsimko removed this from Review in progress in CMS-Q4-Updates Nov 26, 2020