Phase2 workflow fixes #14165

Closed
wants to merge 7 commits into from

kpedro88
Contributor

This is the promised "workflow fix" PR for Phase2. This is just meant to get some things nominally working before 8_1_0_pre3. Further reorganizations will come later.

Major changes:

  1. local reco workflows now use full local reco instead of just tracker local reco
  2. 2023 workflows now use the Phase2 era
  3. rename 2023dev -> 2023tilted

Various bug fixes/customizations/patches:

  1. turn off useLumiInfoRunHeader in ecalMultiFitUncalibRecHit to fix ECAL bunchSpacingProducer exception
  2. turn off all "recover" options in ecalRecHit to avoid use of EcalRecHitWorkerRecover, which has non-optional use of EcalEndcapGeometryRecord that causes an exception in Phase2 geometries (a better fix for this should come later)
  3. additional ZDC customization to handle lack of DIGI2RAW/RAW2DIGI for HCAL
  4. temporary fix from @jshlee to prevent ME0 crash

Known warnings:

  1. Expected, since HEback isn't set up in the hexagonal geometry yet:
%MSG-e HGCDigitizer:  MixingModule:mix 20-Apr-2016 02:57:06 CDT Run: 1 Event: 1
 @ accumulate : can't find HGCHitsHEback collection of g4SimHits
  2. CSC people are looking into this one. The actual file path is /cvmfs/cms-ib.cern.ch/2016-17/slc6_amd64_gcc530/cms/CSCTrackFinderEmulation/1.2/data/L1Trigger/CSCTrackFinder/data/core_2014_05_15/comp_dphi_8.dat, with a slash between data and L1Trigger, but the message omits that slash:
Cannot read file: /cvmfs/cms-ib.cern.ch/2016-17/slc6_amd64_gcc530/cms/CSCTrackFinderEmulation/1.2/dataL1Trigger/CSCTrackFinder/data/core_2014_05_15/comp_dphi_8.dat, addr: 0
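
The missing slash in the second warning is the kind of result that plain string concatenation of a base directory and a relative file name produces. A minimal Python sketch of that failure mode follows, together with a join that avoids it. The split of the path into `base` and `relative` is an assumption made for illustration; this is not the CSCTrackFinder or tool-file code.

```python
import posixpath

# Hypothetical split of the path quoted in the warning (illustration only).
base = "/cvmfs/cms-ib.cern.ch/2016-17/slc6_amd64_gcc530/cms/CSCTrackFinderEmulation/1.2/data"
relative = "L1Trigger/CSCTrackFinder/data/core_2014_05_15/comp_dphi_8.dat"

# Plain concatenation drops the separator and yields ".../dataL1Trigger/...".
broken = base + relative

# posixpath.join inserts exactly one "/" between the two parts.
fixed = posixpath.join(base, relative)

print(broken)
print(fixed)
```

With this split, `broken` reproduces the path in the error message and `fixed` reproduces the actual file path quoted above.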

I tested this using the normal and tilted FourMuPt1_200 upgrade workflows:

runTheMatrix.py -w upgrade -l 11000.0 >& matrix_11000.log &
runTheMatrix.py -w upgrade -l 10600.0 >& matrix_10600.log &

10 events run to completion for steps 1 and 2 with no crashes or exceptions.
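
For anyone scripting the same test, the two command lines above can be generated from the workflow IDs. The sketch below only builds the argument lists and log names; the helper names `matrix_command` and `log_name` are hypothetical, and only the `-w upgrade -l <id>` options and the `matrix_<id>.log` naming are taken from the commands above.

```python
import shlex


def matrix_command(workflow_id, what="upgrade"):
    """Build the runTheMatrix.py argument list for one workflow ID."""
    return ["runTheMatrix.py", "-w", what, "-l", str(workflow_id)]


def log_name(workflow_id):
    """Log file name following the matrix_<integer part>.log pattern used above."""
    return "matrix_%d.log" % int(float(workflow_id))


# The two workflows mentioned above: 11000.0 and 10600.0.
for wf in ("11000.0", "10600.0"):
    command = " ".join(shlex.quote(arg) for arg in matrix_command(wf))
    print(command, ">", log_name(wf))
```

The argument list can be passed to `subprocess.Popen` with `stdout` set to the opened log file and `stderr=subprocess.STDOUT` to get the same combined redirection as `>&`.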

attn: @boudoul, @lgray, @bsunanda, @ianna, @calabria

@davidlange6 - should we add upgrade workflows into the PR/IB standard tests in this PR, or wait?

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for CMSSW_8_1_X.

It involves the following packages:

Configuration/Geometry
Configuration/PyReleaseValidation
Configuration/StandardSequences
Geometry/CMSCommonData
RecoLocalCalo/Configuration
RecoLocalCalo/EcalRecProducers
RecoLocalMuon/GEMSegment
SLHCUpgradeSimulations/Configuration
SLHCUpgradeSimulations/Geometry

@civanch, @Dr15Jones, @cvuosalo, @ianna, @mdhildreth, @fabozzi, @cmsbuild, @srimanob, @franzoni, @slava77, @hengne, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @makortel, @GiacomoSguazzoni, @rovere, @VinInn, @Martin-Grunewald, @bellan, @jhgoh, @cerati, @argiro, @dgulhan this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@kpedro88
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12517/console

@@ -63,3 +63,6 @@
chi2ThreshEE_ = cms.double(50.0),
)
)

from Configuration.StandardSequences.Eras import eras
eras.phase2_common.toModify( ecalMultiFitUncalibRecHit.algoPSet, useLumiInfoRunHeader = cms.bool(False) )
Contributor

why not?

Contributor Author

The underlying problem is that bunchSpacingProducer is separate from localreco:
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/Configuration/StandardSequences/python/Reconstruction_cff.py#L101

Without it, there is an exception when running ecalMultiFitUncalibRecHit. The above customization seemed like the easiest way to prevent the exception. I'm open to other solutions.

Contributor

add it to the sequence you are running (at the start).
It's already in the localreco in HI and cosmics.
There is no harm from listing the same module several times in different sequences
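
The point about repeated modules can be pictured with a toy model in which a module that appears in several sequences is still scheduled only once. This is a plain-Python illustration, not the framework's scheduler; the function name `modules_to_run` and the shortened sequence contents are hypothetical, and `globalreco` stands in as a placeholder entry.

```python
def modules_to_run(*sequences):
    """Flatten sequences in order, keeping only the first occurrence of each module."""
    seen = set()
    ordered = []
    for sequence in sequences:
        for module in sequence:
            if module not in seen:
                seen.add(module)
                ordered.append(module)
    return ordered


# Hypothetical, shortened sequences that both list bunchSpacingProducer.
localreco = ["bunchSpacingProducer", "ecalMultiFitUncalibRecHit"]
reconstruction = ["bunchSpacingProducer", "globalreco"]

print(modules_to_run(localreco, reconstruction))
```

In this model, adding `bunchSpacingProducer` at the start of a second sequence changes nothing for workflows that already run it elsewhere.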

@cmsbuild
Contributor

-1

Tested at: e26834b

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-14165/12517/summary.html

I found the following errors while testing this PR

Failed tests: RelVals AddOn

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
140.53 step2

runTheMatrix-results/140.53_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI/step2_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI.log
1000.0 step2
runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log
1001.0 step2
runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4.log
1003.0 step2
runTheMatrix-results/1003.0_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM/step2_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM.log
  • AddOn:

I found errors in the following addon tests:

cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PRef_DATA.root --fileout file:RelVal_Raw_PRef_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:15:21 2016-date Wed Apr 20 14:08:39 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --data --scenario=HeavyIons -n 10 --conditions auto:run2_data_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016,Run2_HI --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_HIon_DATA.root --fileout file:RelVal_Raw_HIon_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:17:15 2016-date Wed Apr 20 14:08:40 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:PIon,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_PIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PIon_DATA.root --fileout file:RelVal_Raw_PIon_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:15:46 2016-date Wed Apr 20 14:09:53 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run1_data_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_DATA.root --fileout file:RelVal_Raw_Fake_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:20:59 2016-date Wed Apr 20 14:18:06 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:GRun,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_GRun --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_GRun_DATA.root --fileout file:RelVal_Raw_GRun_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:25:52 2016-date Wed Apr 20 14:19:30 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:Fake1,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_Fake1 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --processName=HLTRECO --filein file:RelVal_Raw_Fake1_DATA.root --fileout file:RelVal_Raw_Fake1_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 20 14:24:40 2016-date Wed Apr 20 14:21:04 2016 s - exit: 16640

@boudoul
Contributor

boudoul commented Apr 20, 2016

It looks like the errors are not related to this PR (other PRs are suffering from the same errors).

@kpedro88
Contributor Author

@boudoul - yes, I've seen them in other PRs also. It appears to be a problem with jet corrections:

An exception of category 'NoRecord' occurred while
   [0] Processing run: 165121 lumi: 62 event: 23609118
   [1] Running path 'reconstruction_step'
   [2] Calling event method for module L1FastjetCorrectorProducer/'ak4PFL1FastjetCorrector'
Exception Message:
No "JetCorrectionsRecord" record found in the EventSetup for synchronization value
Run: 165121 LuminosityBlock: 62 Event: 0 Time: 5607255048184066530
 Please add an ESSource or ESProducer that delivers such a record.

@hengne
Contributor

hengne commented Apr 20, 2016

@boudoul @kpedro88 it is a problem with xrootd access to the input files. My PR has also been suffering from this since yesterday evening: #14098

@cmsbuild
Contributor

Pull request #14165 was updated. @civanch, @Dr15Jones, @cvuosalo, @ianna, @mdhildreth, @fabozzi, @cmsbuild, @srimanob, @franzoni, @slava77, @hengne, @davidlange6 can you please check and sign again.

@kpedro88
Contributor Author

@slava77 latest commit addresses your recommendation about bunchSpacingProducer

@@ -194,4 +196,5 @@
#
reconstruction_standard_candle = cms.Sequence(localreco*globalreco*vertexreco*recoJetAssociations*btagging*electronSequence*photonSequence)


from Configuration.StandardSequences.Eras import eras
eras.phase2_common.toReplaceWith( localreco, _phase2_localreco )
Contributor

maybe easier to add globally, no eras

Contributor Author

is that allowed? I don't want to screw up other workflows

Contributor

should be OK, we'll see in the tests anyways.

Contributor Author

submitted as a separate commit so it can be reverted back to the Era-dependent version if necessary
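
The trade-off in this thread (Era-dependent versus global) can be illustrated with a toy model. It assumes that the change in question prepends `bunchSpacingProducer` to `localreco`, which the diff above does not show explicitly; the function `build_localreco` and its arguments are hypothetical and do not use the CMSSW configuration API.

```python
def build_localreco(base, phase2_active, era_dependent=True):
    """Return the module list for a toy localreco sequence."""
    extended = ["bunchSpacingProducer"] + list(base)
    if era_dependent:
        # Era-dependent variant: only Phase2 workflows see the change.
        return extended if phase2_active else list(base)
    # Global variant: every workflow sees the change.
    return extended


# Hypothetical one-module stand-in for the existing sequence.
base = ["ecalMultiFitUncalibRecHit"]

print(build_localreco(base, phase2_active=False))
print(build_localreco(base, phase2_active=False, era_dependent=False))
```

The global variant is simpler but touches every workflow, which is why the tests are the check on whether it is safe.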

@kpedro88
Contributor Author

Update on known warning 2: it's unrelated to Phase2, caused by a change in the external. I submitted an issue against the external: cms-externals/CSCTrackFinderEmulation#3

@cmsbuild
Contributor

Pull request #14165 was updated. @civanch, @Dr15Jones, @cvuosalo, @ianna, @mdhildreth, @fabozzi, @cmsbuild, @srimanob, @franzoni, @slava77, @hengne, @davidlange6 can you please check and sign again.

@smuzaffar
Contributor

@kpedro88, the cms-externals/CSCTrackFinderEmulation tool file has been updated to have the correct path. The next IB (23h00) should have the fix.

@kpedro88
Contributor Author

@smuzaffar great, thanks

@kpedro88
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12530/console

@cmsbuild
Contributor

@cmsbuild
Contributor

@kpedro88
Contributor Author

The comparison looks good to me.

@davidlange6 @slava77 @civanch @ianna @hengne - can you sign ASAP so we can make 810pre3?

@davidlange6
Contributor

all - I will merge this tonight in order to build pre3, unless issues are raised in the meantime

davidlange6 added a commit that referenced this pull request Apr 21, 2016
Add phase ii scenarios to runTheMatrix on top of #14098 #14165
@kpedro88
Contributor Author

@davidlange6 did you mean to merge this?

@kpedro88
Contributor Author

Ah I see, it was included in #14181

7 participants