make a production-like workflow for HI 2018 #24587

Closed
slava77 opened this issue Sep 18, 2018 · 24 comments

@slava77
Contributor

slava77 commented Sep 18, 2018

HI data taking for 2018 is supposed to write AOD and run without miniAOD outputs.
For some time I was using -s RAW2DIGI,L1Reco,RECO,EI,DQM:@standardDQM with output directed to AOD[SIM]. This stopped working in 10_3_0_pre3 due to some updates from #24380.

I think that we need a relval setup to test the desired production configuration.

@mandrenguyen please comment [or redirect] on the needed processing steps in production.
Should we be running miniAOD, even if it's not needed?

@cmsbuild
Contributor

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mandrenguyen
Contributor

@slava77 No, we shouldn't run miniAOD for the PbPb run. Indeed, we just noticed today that things do not work when running without miniAOD. Do you understand why this is happening? @stepobr this is going to require a bugfix. @cfmcginn this is the issue you discovered today.

@slava77
Contributor Author

slava77 commented Sep 18, 2018

assign pdmv

@cmsbuild
Contributor

New categories assigned: pdmv

@prebello,@pgunnell,@zhenhu you have been requested to review this Pull request/Issue and eventually sign? Thanks

@stepobr
Contributor

stepobr commented Sep 18, 2018

@slava77 , @mandrenguyen this is due to the absence of pfNoPileUpJME, exactly as @slava77 pointed out. Should I update the existing pull request or create a new one?

@mandrenguyen
Contributor

@stepobr The original PR is already merged, so you'll need a new one. At some point we discussed setting a threshold on ak8 jets such that they don't use up any CPU (since we don't use them anyway). Perhaps that can be done at the same time. But the priority is to get something that runs.

@cfmcginn
Contributor

Hi - we will discuss and finalize how we address this in tomorrow's meeting, but just for record keeping here: my inclination is to not re-add this collection as it is only used for jet collections useless in HI that we keep only to not potentially break things in reco down the line.

Unless this collection is already a dummy, we should either feed a filtered particleFlow with a pt cut of 9999. to cut timing, or feed particleFlow directly and raise inputEtMin to 9999. on the FastJetProducer itself. The latter would reduce the size completely and cut the timing for each algo by ~3. If it is a dummy, it would operate like the former.

This is only from what I understand - if I am overlooking something please let me know.

@slava77
Contributor Author

slava77 commented Sep 18, 2018

it looks like this missing pfNoPileUpJME is the only thing broken in the "production" setup using -s RAW2DIGI,L1Reco,RECO,EI,DQM:@standardDQM

@fabiocos
Contributor

@slava77 @mandrenguyen is there a follow-up of this issue?

@slava77
Contributor Author

slava77 commented Oct 23, 2018

To avoid possible future problems like the ones we had in the T0 replay recently, it would still be nice to resolve this issue and make a matrix workflow.

It looks like a slightly more complete set for the DQM step would be DQM:@common+@standardDQM+@ExtraHLT;
this is what was used in #24958.

@fabiocos
Contributor

@prebello @zhenhu could you please provide such an update to PyReleaseValidation asap?

@prebello
Contributor

Hi @fabiocos, do you mean adding the DQM:@common+@standardDQM+@ExtraHLT sequence to the HI relvals?

@zhenhu
Contributor

zhenhu commented Oct 24, 2018

Hi @fabiocos, we can add a workflow with a RAW data file as input and run it through
-s RAW2DIGI,L1Reco,RECO,EI,DQM:@common+@standardDQM+@ExtraHLT,
using 2018_pp_on_AA as era. Does it look good to you?
Do you have any suggested run number we could use?

@fabiocos
Contributor

@zhenhu @prebello I have discussed with @icali, it would be good to have a realistic chain that combines all the ingredients:

  • the use of the remapper introduced in "Raw data remapper" (#24819) on input data in HI-like format, as proposed in "make a T0-like relval for 2018 HI" (#24619) (Ivan will provide the run numbers of tests collected in these last days)

  • the use of the Run2_2018_pp_on_AA era

  • the use of a realistic sequence like the one suggested by Slava

@icali
Contributor

icali commented Oct 24, 2018

This morning we collected run 325174, which has the (quasi) final L1+HLT menu. It includes datasets in both the full and the reduced format.
Running the standard HI sequence (so --repack) should also run the mapper automatically.

@zhenhu
Contributor

zhenhu commented Oct 24, 2018

I tried to add two new workflows with HIMinimumBias0 or HIMinimumBiasReducedFormat0 as input,
and run the standard HI sequence using the Run2_2018_pp_on_AA era.
My code is committed here: e0e3a60

The workflows run without crashes, but give me some warning messages, such as:

%MSG-w L1TStage2uGTTiming:  L1TStage2uGTTiming:l1tStage2uGTTiming@beginRun  24-Oct-2018 23:33:49 CEST Run: 324401
Algo "L1_SingleEG15" not found in the trigger menu L1Menu_Collisions2018_v2_0_0. Could not retrieve algo bit number.
%MSG-e L1T:  L1TRawToDigi:gtStage2Digis 24-Oct-2018 23:34:02 CEST  Run: 324401 Event: 14508
Cannot unpack: no FEDRawDataCollection found
%MSG-e L1TGlobalProduce:  L1TGlobalProducer:valGtStage2Digis  24-Oct-2018 23:34:02 CEST Run: 324401 Event: 15664
Could not find valid algo block. Setting prescale column to 1
%MSG-w JetMonitor:  JetMonitor:AK8PFJet200_Prommonitoring  24-Oct-2018 23:34:02 CEST Run: 324401 Event: 28182
Jet handle not valid 
%MSG-w DiJetMonitor:  DiJetMonitor:DiPFjetAve320_Prommonitoring  24-Oct-2018 23:34:02 CEST Run: 324401 Event: 28911
DiJet handle not valid 
%MSG-w SiStripRawToDigi:  SiStripRawToDigiModule:siStripDigis  24-Oct-2018 23:34:10 CEST Run: 324401 Event: 28182
NULL pointer to FEDRawData for FED: id 434
%MSG-w Invalid Data:  HcalRawToDigi:hcalDigis 24-Oct-2018 23:34:12 CEST  Run: 324401 Event: 28911
The default QIE10 Collection has 3 samples per digi, while the current data has 6!  This data cannot be included with the default collection.
%MSG-e EcalLaserDbService:  EcalRecHitProducer:ecalRecHit  24-Oct-2018 23:34:14 CEST Run: 324401 Event: 28182
Interpolated Laser correction <0 for detid 872420483
%MSG
----- Begin Fatal Exception 24-Oct-2018 23:34:16 CEST-----------------------
An exception of category 'Conditions mismatch' occurred while
   [0] Processing  Event run: 324401 lumi: 2 event: 14508 stream: 3
   [1] Running path 'eventinterpretaion_step'
   [2] Prefetching for module PFCandidateFwdPtrCollectionStringFilter/'pfAllMuonsEI'
   [3] Prefetching for module TPPFCandidatesOnPFCandidates/'pfNoPileUpEI'
   [4] Prefetching for module PFPileUp/'pfPileUpEI'
   [5] Prefetching for module PFCandidateFwdPtrProducer/'particleFlowPtrs'
   [6] Prefetching for module PFLinker/'particleFlow'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [9] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [10] Prefetching for module GsfTrackProducer/'electronGsfTracks'
   [11] Prefetching for module CkfTrackCandidateMaker/'electronCkfTrackCandidates'
   [12] Prefetching for module ElectronSeedProducer/'ecalDrivenElectronSeeds'
   [13] Prefetching for module CaloTowersCreator/'towerMaker'
   [14] Prefetching for module HBHEIsolatedNoiseReflagger/'hbhereco'
   [15] Calling method for module HBHEPhase1Reconstructor/'hbheprereco'
Exception Message:
Requested conditions of type HcalGains for cell (0x45484001) (HE 16,1,4) got conditions for cell (0x0) 
----- End Fatal Exception -------------------------------------------------
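
For triage, the MessageLogger headers in an excerpt like the one above can be tallied by severity and category. The following is a minimal Python sketch, assuming every message starts with a `%MSG-<severity> <category>:` header as in the lines shown; `tally_messages` and the regular expression are illustrative helpers, not part of CMSSW.

```python
import re
from collections import Counter

# Header lines look like: %MSG-w Category:  Module:label  timestamp ...
# (format inferred from the excerpt above; treat it as an assumption)
HEADER = re.compile(r"^%MSG-(?P<sev>[a-z])\s+(?P<cat>[^:]+):\s+(?P<module>\S+)")

def tally_messages(log_text):
    """Count message headers by (severity, category)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[(match.group("sev"), match.group("cat").strip())] += 1
    return counts

sample = """\
%MSG-w L1TStage2uGTTiming:  L1TStage2uGTTiming:l1tStage2uGTTiming@beginRun  24-Oct-2018 23:33:49 CEST Run: 324401
Algo "L1_SingleEG15" not found in the trigger menu L1Menu_Collisions2018_v2_0_0. Could not retrieve algo bit number.
%MSG-e L1T:  L1TRawToDigi:gtStage2Digis 24-Oct-2018 23:34:02 CEST  Run: 324401 Event: 14508
Cannot unpack: no FEDRawDataCollection found
%MSG-w Invalid Data:  HcalRawToDigi:hcalDigis 24-Oct-2018 23:34:12 CEST  Run: 324401 Event: 28911
The default QIE10 Collection has 3 samples per digi, while the current data has 6!
%MSG
"""

counts = tally_messages(sample)
for (severity, category), n in sorted(counts.items()):
    print(severity, category, n)
```

Sorting the tally puts the `e` (error) entries ahead of the `w` (warning) entries, which helps separate messages that are probably harmless for HI data from the ones that precede the fatal exception.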

The above errors can be reproduced by running the two new workflows:

runTheMatrix.py --what standard -l 140.56,140.57 -m 7500

The main cmsDriver used is:

step2 --conditions auto:run2_data -s RAW2DIGI,L1Reco,RECO,EI,DQM --process reRECO -n 30 --data --era Run2_2018_pp_on_AA --eventcontent AOD,DQM --runUnscheduled --scenario HeavyIons --datatier AOD,DQMIO --repacked --io RECOHID18.io --python RECOHID18.py --no_exec --filein filelist:step1_dasquery.log --lumiToProcess step1_lumiRanges.log --fileout file:step2.root --nThreads 4
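
Relval steps are written as dictionaries of cmsDriver options (for example '--scenario':'HeavyIons'), which are then flattened into a command line like the one above. The following is a minimal Python sketch of that flattening, not the actual PyReleaseValidation code: `build_cmsdriver` is a hypothetical helper, the convention that an empty string marks a bare flag is an assumption, and only a subset of the options is shown.

```python
import shlex

def build_cmsdriver(step_name, options):
    """Flatten an option dict into a cmsDriver.py command line.

    An empty-string value marks a bare flag such as --data
    (a convention assumed for this sketch).
    """
    parts = ["cmsDriver.py", step_name]
    for flag, value in options.items():
        parts.append(flag)
        if value != "":
            parts.append(str(value))
    return " ".join(shlex.quote(part) for part in parts)

# Subset of the options from the command quoted above.
reco_hi_2018 = {
    "--conditions": "auto:run2_data",
    "-s": "RAW2DIGI,L1Reco,RECO,EI,DQM",
    "--era": "Run2_2018_pp_on_AA",
    "--scenario": "HeavyIons",
    "--eventcontent": "AOD,DQM",
    "--datatier": "AOD,DQMIO",
    "--data": "",
    "--repacked": "",
    "-n": 30,
}

command = build_cmsdriver("step2", reco_hi_2018)
print(command)
```

Keeping the options in a dictionary makes it easy to derive a variant workflow by overriding or dropping a single key.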

Do you have any suggestions on how to proceed?

@icali
Contributor

icali commented Oct 24, 2018

I saw the same error using the standard GT provided by the cmsDriver command. Changing the GT to 103X_dataRun2_Prompt_v2 solved the issue. However, I'm not sure which GT would be the best one to use.

@zhenhu
Contributor

zhenhu commented Oct 24, 2018

Hi @icali, the new GT does not work for me. To reproduce my error, you can first add the two new wfs as in e0e3a60.
Then run the following command to produce the RECOHID18.py:

runTheMatrix.py --what standard -l 140.56 -t 4 -m 7500 --command=' --conditions=103X_dataRun2_Prompt_v2' -b 'HIN_data' --noCaf --wm init

@zhenhu
Contributor

zhenhu commented Oct 25, 2018

Hi all,

By comparing @icali 's config with mine, we finally found the source of the errors.

In Ivan's config, he used:

process.reconstruction_step = cms.Path(process.reconstruction)

while I was using:

process.reconstruction_step = cms.Path(process.reconstructionHeavyIons)

The difference arises because I have '--scenario':'HeavyIons' in my cmsDriver options, whereas Ivan used the default scenario, which is pp.
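
This kind of config comparison can be sketched with Python's standard `difflib`. The snippet is only an illustration: `config_diff` is a hypothetical helper, and the two strings stand in for full generated configs, with everything except the differing line replaced by a placeholder comment.

```python
import difflib

def config_diff(text_a, text_b, name_a="a.py", name_b="b.py"):
    """Return only the removed ('-') and added ('+') lines between two configs."""
    diff = list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="", n=0))
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

hi_cfg = """\
# (shared lines omitted)
process.reconstruction_step = cms.Path(process.reconstructionHeavyIons)
"""

pp_cfg = """\
# (shared lines omitted)
process.reconstruction_step = cms.Path(process.reconstruction)
"""

changes = config_diff(hi_cfg, pp_cfg, "hi_scenario.py", "pp_scenario.py")
for line in changes:
    print(line)
```

On real configs the same call would also list every other line that differs, so it works best on two dumps produced from nearly identical cmsDriver commands.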

So, to make my reco step work, I committed a new version that is exactly the same as Ivan's config:
23e154f
Now the reco step works fine.

But the EI step still has errors. Any suggestions on this?

----- Begin Fatal Exception 25-Oct-2018 01:57:55 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 325174 lumi: 1 event: 25598 stream: 3
   [1] Running path 'AODoutput_step'
   [2] Prefetching for module PoolOutputModule/'AODoutput'
   [3] Prefetching for module JetTagProducer/'pfCombinedInclusiveSecondaryVertexV2BJetTagsEI'
   [4] Prefetching for module CandIPProducer/'pfImpactParameterTagInfosEI'
   [5] Prefetching for module FastjetJetProducer/'pfJetsEI'
   [6] Prefetching for module TPPFCandidatesOnPFCandidates/'pfNoElectronJME'
   [7] Prefetching for module TPPFCandidatesOnPFCandidates/'pfNoMuonJME'
   [8] Prefetching for module TPPFCandidatesOnPFCandidates/'pfNoPileUpJMEEI'
   [9] Calling method for module PFPileUp/'pfPileUpJMEEI'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: std::vector<reco::Vertex>
Looking for module label: goodOfflinePrimaryVertices
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

@slava77
Contributor Author

slava77 commented Oct 25, 2018 via email

@fabiocos
Contributor

#24958 has been merged (as well as its 10_3_X backport), so you should no longer see the issue in the next IB.

@zhenhu
Contributor

zhenhu commented Oct 25, 2018

Two pull requests, #25005 and #25006 (backport), have been created with the new workflows 140.56 and 140.57.
Local tests successful.

@fabiocos
Contributor

@slava77 I think that #25005 is addressing this issue and to some extent #24619

@slava77
Contributor Author

slava77 commented Nov 9, 2018

> @slava77 I think that #25005 is addressing this issue and to some extent #24619

Looking at 104X IB 2018-11-07:

  • 140.56
    • input from: /HIMinimumBias0/Tier0_REPLAY_vocms015-v214/RAW with run [325174]
    • step2 highlights: -s RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiPixelCalZeroBias,SKIM:PbPbEMu+PbPbZEE+PbPbZMM,EI,DQM:@common+@standardDQM+@ExtraHLT --era Run2_2018_pp_on_AA --eventcontent AOD,DQM
  • 140.57
    • input from: /HIMinimumBiasReducedFormat0/Tier0_REPLAY_vocms015-v214/RAW with run [325174]
    • step2 highlights: -s RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiPixelCalZeroBias,SKIM:PbPbEMu+PbPbZEE+PbPbZMM,EI,DQM:@common+@standardDQM+@ExtraHLT --era Run2_2018_pp_on_AA --eventcontent AOD,DQM
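
To check that the step2 string of a matrix workflow covers the sequence requested earlier in this thread, the `-s` value can be split into steps and sub-sequences. This is a minimal Python sketch under a simplifying assumption: steps are separated by commas, a step's sub-sequences follow a colon, and sub-sequences are joined with `+`. `parse_steps` is a hypothetical helper, not a cmsDriver function.

```python
def parse_steps(step_string):
    """Split a cmsDriver -s value into {step: [sub-sequences]}."""
    steps = {}
    for item in step_string.split(","):
        name, _, detail = item.partition(":")
        steps[name] = detail.split("+") if detail else []
    return steps

# Sequence requested in the discussion above.
requested = "RAW2DIGI,L1Reco,RECO,EI,DQM:@common+@standardDQM+@ExtraHLT"

# step2 of workflows 140.56 / 140.57 as listed above.
in_matrix = ("RAW2DIGI,L1Reco,RECO,"
             "ALCA:SiStripCalZeroBias+SiPixelCalZeroBias,"
             "SKIM:PbPbEMu+PbPbZEE+PbPbZMM,"
             "EI,DQM:@common+@standardDQM+@ExtraHLT")

req, got = parse_steps(requested), parse_steps(in_matrix)

# A requested step is missing if it is absent or lacks a requested sub-sequence.
missing = [s for s in req if s not in got or set(req[s]) - set(got[s])]
extra = [s for s in got if s not in req]

print("missing:", missing)
print("extra:", extra)
```

With the two strings above this prints `missing: []` and `extra: ['ALCA', 'SKIM']`: the matrix workflows contain every requested step and add the ALCA and SKIM steps on top.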

I agree. This satisfies the original request. I'm closing this issue.

@slava77 slava77 closed this as completed Nov 9, 2018