Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modules prohibiting concurrent lumis in RelVal workflows #25090

Closed
gartung opened this issue Nov 1, 2018 · 143 comments · Fixed by #29784
Closed

Modules prohibiting concurrent lumis in RelVal workflows #25090

gartung opened this issue Nov 1, 2018 · 143 comments · Fixed by #29784

Comments

@gartung
Copy link
Member

gartung commented Nov 1, 2018

I took the expanded python output from all of the gen-sim step of runTheMatrix jobs. From that I got a list of module names. Attached in the list of modules and their base class. There are two legacy modules, one EDAnalyzer and one EDFilter.
EDAnalyzer MEtoMEComparitor
EDFilter CosmicGenFilterHelix

gen-sim-modules.txt

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2018

A new Issue was created by @gartung Patrick Gartung.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@gartung gartung changed the title Legacy modulew in GEN-SIM relval workflows Legacy modules in relval workflows Nov 1, 2018
@Dr15Jones
Copy link
Contributor

See #25093

@Dr15Jones
Copy link
Contributor

assign core

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2018

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@gartung
Copy link
Member Author

gartung commented Nov 5, 2018

Legacy modules in the ALCA workflows

AlCaDiJetsProducer
AlCaECALRecHitReducer
AlCaEcalHcalReadoutsProducer
AlCaElectronTracksReducer
AlCaGammaJetProducer
AlCaHOCalibProducer
AlignmentMuonSelectorModule
BSCTrigger
CSCTFCandidateProducer
CSCTFTrackProducer
CentralityProducer
DQMHistNormalizer
DTCalibMuonSelection
DetStatus
EcalBarrelSimHitsValidation
EcalEndcapSimHitsValidation
EcalPreshowerSimHitsValidation
EcalSimHitsValidation
ElectronMatchedCandidateProducer
EventContentAnalyzer
EventWithHistoryEDFilter
FilterOutScraping
HLTHighLevel
HLTTauMCProducer
HLTTauRefCombiner
HcalCalibTypeFilter
L1Comparator
L1GctEmulator
L1RCTProducer
L1ScalersClient
L1TCaloRCTToUpgradeConverter
L1TStage1Layer2Producer
LHECOMWeightProducer
LSNumberFilter
LaserAlignmentT0Producer
MEtoMEComparitor
MuonAlignment
ParticleTowerProducer
PrescalerFHN
QualityTester
RPCTechnicalTrigger
RPCTrigger
ShallowEventDataProducer
ShallowGainCalibration
ShallowTracksProducer
SiStripBFieldFilter
SiStripDCSFilter
TauDQMFileLoader

@gartung
Copy link
Member Author

gartung commented Nov 5, 2018

Legacy modules in DIGI2RAW workflows (can include DIGI, L1, HLT)

BSCTrigger
CSCTFCandidateProducer
CSCTFTrackProducer
CastorDigiToRaw
CentralityProducer
DQMHistNormalizer
EcalBarrelSimHitsValidation
EcalEndcapSimHitsValidation
EcalPreshowerSimHitsValidation
EcalSimHitsValidation
EventContentAnalyzer
GctDigiToRaw
GenHIEventProducer
HLTDisplacedmumumuVtxProducer
HLTTauMCProducer
HLTTauRefCombiner
L1GTDigiToRaw
L1GTEvmDigiToRaw
L1GctEmulator
L1RCTProducer
LHECOMWeightProducer
MEtoMEComparitor
ParticleTowerProducer
QualityTester
RPCTechnicalTrigger
RPCTrigger
TauDQMFileLoader

@gartung
Copy link
Member Author

gartung commented Nov 5, 2018

Legacy modules in RECO workflows (including DQM in many cases)

BSCTrigger
CSCTFCandidateProducer
CSCTFTrackProducer
CentralityProducer
DQMHistNormalizer
EcalBarrelSimHitsValidation
EcalEndcapSimHitsValidation
EcalPreshowerSimHitsValidation
EcalSimHitsValidation
EventContentAnalyzer
FilterOutScraping
HLTTauMCProducer
HLTTauRefCombiner
L1Comparator
L1GctEmulator
L1RCTProducer
L1ScalersClient
L1TCaloRCTToUpgradeConverter
L1TStage1Layer2Producer
LHECOMWeightProducer
MEtoMEComparitor
ParticleTowerProducer
QualityTester
RPCTechnicalTrigger
RPCTrigger
TauDQMFileLoader

@gartung
Copy link
Member Author

gartung commented Nov 5, 2018

Legacy modules in HARVESTING workflows

AlcaBeamMonitorClient
BSCTrigger
CSCTFCandidateProducer
CSCTFTrackProducer
ConversionPostprocessing
DQMDaqInfo
DQMDcsInfoClient
DQMFEDIntegrityClient
DQMHistNormalizer
DQMMessageLoggerClient
DQMOfflineHLTEventInfoClient
DQMReferenceHistogramRootFileEventSetupAnalyzer
ESDaqInfoTask
ESDataCertificationTask
ESDcsInfoTask
EgHLTOfflineSummaryClient
HLTInclusiveVBFClient
HLTTauRelvalQTester
L1Comparator
L1GctEmulator
L1RCTProducer
L1ScalersClient
L1TCaloRCTToUpgradeConverter
L1TStage1Layer2Producer
ParticleTowerProducer
PhotonPostprocessing
QualityTester
RPCTechnicalTrigger
RPCTrigger
SiStripCertificationInfo
SiStripDaqInfo
SiStripDcsInfo
SiStripOfflineDQM
TauDQMFileLoader

@gartung
Copy link
Member Author

gartung commented Nov 5, 2018

@Dr15Jones these are correct now. 'grep -f legacy.modules.list' treated the module names as regex not exact match.

@prebello
Copy link
Contributor

prebello commented Nov 6, 2018

@gartung may I kindly ask if you intend to change/update (or have changed) something in relvals outputs based on this issue? What is the purpose of it?
Thank you.

@Dr15Jones
Copy link
Contributor

assign simulation, reconstruction, dqm

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 6, 2018

New categories assigned: dqm,reconstruction,simulation

@mdhildreth,@jfernan2,@slava77,@andrius-k,@schneiml,@perrotta,@kmaeshima,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Copy link
Contributor

@prebello I asked Patrick to gather the list of legacy modules. These modules are causing CPU efficiency problems, in particular they prohibit the framework from being able to processing multiple LuminosityBlocks concurrently.

It would be best if the different L2 areas could 'poke' the responsible persons to get the various modules changed to be one, stream or global modules.

With that in mind, the core group will be spending effort in at least changing ones in the RECO workflow.

@Dr15Jones
Copy link
Contributor

I made a small modification to the framework so that if concurrent LuminosityBlocks are configured for a job, the framework will print out a list of modules which require synchronization on LuminosityBlock boundaries. With that in place I used workflow 1.0 to find modules:

GEN+SIM:

  Pythia8GeneratorFilter generator
  GenXSecAnalyzer genXSecAnalyzer
  MEtoEDMConverter MEtoEDMConverter
  PoolOutputModule RAWSIMoutput

DIGI:

  L1RCTProducer simRctDigis
  L1GctEmulator simGctDigis
  CSCTFTrackProducer simCsctfTrackDigis
  CSCTFCandidateProducer simCsctfDigis
  RPCTrigger simRpcTriggerDigis
  BSCTrigger simBscDigis
  RPCTechnicalTrigger simRpcTechTrigDigis
  GctDigiToRaw gctDigiToRaw
  L1GTDigiToRaw l1GtPack
  L1GTEvmDigiToRaw l1GtEvmPack
  CastorDigiToRaw castorRawData
  HLTrigReport hltTrigReport
  MEtoEDMConverter MEtoEDMConverter
  PoolOutputModule RAWSIMoutput

RECO:

  MEtoEDMConverter MEtoEDMConverter
  ConditionDumperInEdm conditionsInEdm
  PoolOutputModule AODSIMoutput
  PoolOutputModule MINIAODSIMoutput

@Dr15Jones
Copy link
Contributor

If I use workflow 137.8 and run step3 which does

RAW2DIGI,L1Reco,RECO,SKIM:ZElectron+SinglePhotonJetPlusHOFilter+EXOMONOPOLE,EI,PAT,ALCA:SiStripCalZeroBias+SiStripCalMinBias+SiStripCalSmallBiasScan+TkAlMinBias+EcalESAlign,DQM:@standardDQM+@ExtraHLT+@miniAODDQM+@L1TEgamma 

Then the following module types cause LuminosityBlock synchronization

AlcaBeamMonitor
B2GDQM
BPHMonitor
BPhysicsOniaDQM
BTVHLTOfflineSource
BTagPerformanceAnalyzerOnData
CSCMonitorModule
CSCOfflineMonitor
CTPPSCommonDQMSource
CTPPSDiamondDQMSource
CTPPSPixelDQMSource
CaloTowersAnalyzer
CastorMonitorModule
DQMDcsInfo
DQMEventInfo
DQMMessageLogger
DQMPFCandidateAnalyzer
DQMRootOutputModule
DTChamberEfficiency
DTDCSByLumiTask
DTDataIntegrityTask
DTResolutionAnalysisTask
DTRunConditionVar
DTSegmentAnalysisTask
DTSegmentsTask
DTTriggerEfficiencyTask
DetStatus
DiDispStaMuonMonitor
DiJetMonitor
DiMuonHistograms
DigiPhase1Task
DigiTask
ESFEDIntegrityTask
ESIntegrityTask
ESOccupancyTask
ESRawDataTask
ESRecoSummary
ESTrendTask
EcalDQMonitorTask
EcalFEDMonitor
EcalPileUpDepMonitor
EcalZmassTask
EfficiencyAnalyzer
EgHLTOfflineSource
ElectronAnalyzer
ElectronGeneralAnalyzer
ElectronTagProbeAnalyzer
EventWithHistoryEDFilter
ExoticaDQM
FSQDQM
GeneralHLTOffline
HLTElePhoTagAndProbeOfflineSource
HLTEleTagAndProbeOfflineSource
HLTHighLevel
HLTInclusiveVBFSource
HLTMuEleTagAndProbeOfflineSource
HLTMuPhoTagAndProbeOfflineSource
HLTMuonOfflineAnalyzer
HLTObjectsMonitor
HLTTauDQMOfflineSource
HTMonitor
HcalNoiseRates
HcalRecHitsAnalyzer
HigPhotonJetHLTOfflineSource
HiggsDQM
JetAnalyzer
JetMETHLTOfflineSource
JetMonitor
L1TEGammaOffline
L1TMP7ZeroSupp
L1TObjectsTiming
L1TStage2BMTF
L1TStage2CaloLayer1
L1TStage2CaloLayer2
L1TStage2EMTF
L1TStage2MuonComp
L1TStage2OMTF
L1TStage2RegionalMuonCandComp
L1TStage2uGMT
L1TStage2uGMTMuon
L1TStage2uGTCaloLayer2Comp
L1TStage2uGTTiming
L1TdeStage2CaloLayer2
L1TdeStage2EMTF
L1TdeStage2uGT
LepHTMonitor
LogMessageMonitor
METAnalyzer
METMonitor
MonitorTrackResiduals
MuonEnergyDepositAnalyzer
MuonIdDQM
MuonIsolationDQM
MuonKinVsEtaAnalyzer
MuonMiniAOD
MuonMonitor
MuonPFAnalyzer
MuonRecoAnalyzer
MuonRecoOneHLT
MuonSeedsAnalyzer
NoBPTXMonitor
ObjMonitor
PFCandidateDQMAnalyzer
PFJetDQMAnalyzer
PFMETDQMAnalyzer
PFMuonDQMAnalyzer
PackedCandidateTrackValidator
PhotonAnalyzer
PhotonMonitor
PiZeroAnalyzer
PoolOutputModule
PrimaryVertexMonitor
PrimaryVertexResolution
QcdPhotonsDQM
RPCDcsInfo
RPCEfficiency
RPCFEDIntegrity
RPCMonitorDigi
RPCRecHitProbability
RawTask
RazorMonitor
RecHitTask
RecoSusyDQM
SUSYDQMAnalyzer
SegmentTrackAnalyzer
SiPixelPhase1Clusters
SiPixelPhase1DeadFEDChannels
SiPixelPhase1Digis
SiPixelPhase1RawData
SiPixelPhase1RecHits
SiPixelPhase1TrackClusters
SiPixelPhase1TrackEfficiency
SiPixelPhase1TrackResiduals
SiStripDCSFilter
SiStripFEDCheckPlugin
SiStripFEDMonitorPlugin
SiStripMonitorCluster
SiStripMonitorDigi
SiStripMonitorTrack
SinglePhotonJetPlusHOFilter
SingleTopTChannelLeptonDQM
SingleTopTChannelLeptonDQM_miniAOD
TPTask
TagAndProbeBtagTriggerMonitor
Tau3MuMonitor
TauTagValidation
TkAlCaRecoMonitor
TopDiLeptonHLTOfflineDQM
TopDiLeptonOfflineDQM
TopMonitor
TopSingleLeptonDQM
TopSingleLeptonDQM_miniAOD
TopSingleLeptonHLTOfflineDQM
TotemDAQTriggerDQMSource
TotemRPDQMHarvester
TotemRPDQMSource
TotemTimingDQMSource
TrackEfficiencyMonitor
TrackSplittingMonitor
TrackingMonitor
TrackingRecoMaterialAnalyser
TriggerMatchMonitor
V0Monitor
ZToMuMuGammaAnalyzer
dEdxAnalyzer
dEdxHitAnalyzer

(Note, I dropped showing the module labels because of the huge number of different instances of the same type. If I kept the module label list the list would have been 789 entries long.)

@Dr15Jones
Copy link
Contributor

I used the Tracer service to determine what parts of workflow 1.0 need MEtoEDMConverter and ConditionDumperInEdm. Turns out no module consumes the output of MEtoEDMConverter while the PoolOutputModule does get the data from ConditionDumperInEdm.

Even though no module wants data from MEtoEDMConverter, the framework is required to still call that module on Run and LuminosityBlock boundaries (since modules can have side-effects from those transitions). Given it is not doing anything useful, MEtoEDMConverter should be removed from RECO workflows.

@schneiml
Copy link
Contributor

Slightly off topic, but sort of related: we should figure out which workflows use MEtoEDM/EDMtoME and for which reasons. The main consumer (to my knowledge) is Alca, but I am always surprised to hear how many WF load the module.

Your observation that it gets loaded but not really used seems very plausible to me.

@Dr15Jones
Copy link
Contributor

The PoolOutputModule in the list of problematic modules was a false positive. I have now corrected the algorithm used to identify such modules.

@Dr15Jones
Copy link
Contributor

@schneiml using

edmConfigTree 'MEtoEDMConverter' step3_RAW2DIGI_L1Reco_RECO_RECOSIM_EI_PAT.py

I traced the inclusion of MEtoEDMConverter in the RECO sequence to

     Configuration.StandardSequences.EndOfProcess_cff
   |  |     from DQMServices.Components.MEtoEDMConverter_cfi import *
   |  |     endOfProcess=cms.Sequence(MEtoEDMConverter)
   |  |     endOfProcess_withComparison = cms.Sequence(MEtoEDMConverter+MEtoMEComparitor)
   |    DQMServices.Components.MEtoEDMConverter_cfi
   |  |       MEtoEDMConverter = cms.EDProducer("MEtoEDMConverter",
   |  |           Name = cms.untracked.string('MEtoEDMConverter'),
   |    DQMServices.Components.MEtoMEComparitor_cfi
   |                                            MEtoEDMLabel = cms.string('MEtoEDMConverter'),
   |                                            lumiInstance = cms.string('MEtoEDMConverterLumi'),
   |                                            runInstance = cms.string('MEtoEDMConverterRun'),

where the cmsDriver created configuration file contains the line

process.load('Configuration.StandardSequences.EndOfProcess_cff')

@Dr15Jones
Copy link
Contributor

@schneiml there is already an open issue on this problem #23403

@schneiml
Copy link
Contributor

Oh, that sounds like a fairly simple (and bad) situation. Thanks for digging out that old issue!

And edmConfigTree sounds like another very useful tool I have never heard about. Could be handy now and then.

@slava77
Copy link
Contributor

slava77 commented May 24, 2019

@gartung @Dr15Jones
is this issue still active?

Perhaps a new list can be made, since the old list is 6 months old now.

@jfernan2
Copy link
Contributor

+1

@missirol
Copy link
Contributor

+hlt

As far I can see, there is no outstanding item for HLT here (if not, let me know, and I'll undo the signature).

Thanks to @Dr15Jones, @Martin-Grunewald and @makortel for making the updates in #27961 #28175 #28445 #30987 (and maybe other PRs I missed).

@makortel
Copy link
Contributor

@gartung Could you close this issue? Thanks!

@francescobrivio
Copy link
Contributor

@makortel following the discussion in #37071 concerning the DQM/BeamMonitor migration, there are a few cases where DQMOneLumiEDAnalyzer<> is still used (see for example https://github.com/cms-sw/cmssw/blob/master/DQM/BeamMonitor/plugins/BeamSpotProblemMonitor.h#L26), and this blocks concurrent lumis AFAIU.
Since I'm about to migrate the other plugins in that directory, I'd like to know which is the preferred way to handle DQM cases where lumi (o run) transitions are needed:

Maybe you could make a list of the "allowed" (preferred) solutions and start phasing out the ones that should be deprecated? (Sorry if this list already exists and I missed it!)

Thanks for the help!

@makortel
Copy link
Contributor

makortel commented Mar 7, 2022

  • is DQMOneLumiEDAnalyzer<> allowed?

According to documentation

DQMOneLumiEDAnalyzer based on edm::one::EDProducer. Used when begin/end lumi transitions are needed. Blocks concurrent lumisections.

that would be "no".

  • or should I use DQMOneEDAnalyzer with LuminosityBlockCache?

I believe that would be the preferred way. At least all uses of LuminosityBlockCache in DQM are in conjunction with DQMOneEDAnalyzer. I took a look of several of those, and found that many of them are not actually safe wrt. concurrent lumis; the logic is either clearly wrong, or potentially wrong (e.g. looks wrong but code is complicated so it is not easy to figure it out). E.g. the two such modules in DQM/BeamMonitor, AlcaBeamMonitor and OnlineBeamMonitor, have clear logic errors towards concurrent lumis and IOVs. (I'll follow up these in a separate issue)

Maybe you could make a list of the "allowed" (preferred) solutions and start phasing out the ones that should be deprecated?

If your suggestion is specifically about DQM modules, it would be up to @cms-sw/dqm-l2. Based on the documentation I'd avoid DQMOneLumiEDAnalyzer and edm::EDAnalyzer base classes.

If your question was more general, only edm::one module type is affected, and the base class types to be avoided are those with BeginLuminosityBlockProducer, EndLuminosityBlockProducer, or WatchLuminosityBlock extension without LuminosityBlockCache (same for ...Run...). This information is technically in the documentation, but buried in the details, which is something we could improve.

@cecilecaillol
Copy link
Contributor

+l1

@civanch
Copy link
Contributor

civanch commented Jun 9, 2022

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2022

This issue is fully signed and ready to be closed.

@qliphy qliphy closed this as completed Jun 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment