
Problem in TkAlCaRecoMonitor:ALCARECOTkAlMuonIsolatedTkAlDQM #23439

Closed
fabiocos opened this issue Jun 4, 2018 · 24 comments

Comments

@fabiocos
Contributor

fabiocos commented Jun 4, 2018

An error message has been reported by @prebello while testing the cmsDriver sequence for the reprocessing of the 2018A data affected by the T0 deletion problem, see https://hypernews.cern.ch/HyperNews/CMS/get/prep-ops/5378/1/1.html

The error message is

%MSG-e Alignment: TkAlCaRecoMonitor:ALCARECOTkAlMuonIsolatedTkAlDQM 02-Jun-2018 18:39:25 CEST Run: 315252 Event: 377936
invalid trackcollection encountered!
%MSG

It is apparently triggered at https://cmssdt.cern.ch/lxr/source/DQMOffline/Alignment/src/TkAlCaRecoMonitor.cc#0190

I can reproduce the problem in 10_1_5, 10_1_6 and the current 10_2_X head. The issue is not seen in the recent 2018 RelVal tests because the corresponding module does not seem to run there: the ALCA sequence used in workflow 136.855 differs from the one tested for the reprocessing.

@cmsbuild
Contributor

cmsbuild commented Jun 4, 2018

A new Issue was created by @fabiocos Fabio Cossutti.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fabiocos
Contributor Author

fabiocos commented Jun 4, 2018

assign dqm,alca,pdmv

@cmsbuild
Contributor

cmsbuild commented Jun 4, 2018

New categories assigned: dqm,pdmv,alca

@jfernan2,@lpernie,@prebello,@vazzolini,@franzoni,@fabozzi,@vanbesien,@GurpreetSinghChahal,@arunhep,@kmaeshima,@dmitrijus,@cerminar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@fabiocos
Contributor Author

fabiocos commented Jun 6, 2018

The issue appears to be present in T0 files as well (for instance in run 317382, picked at random), and it seems related to the cout message

ERROR: No beam sport found!

from

https://cmssdt.cern.ch/lxr/source/DQMOffline/Muon/src/MuonRecoAnalyzer.cc#0461

which has been converted to a LogInfo in 10_2_X.

@fabozzi
Contributor

fabozzi commented Jun 6, 2018

Apart from the typo in the message, is this message really relevant? If not, the authors of this module should find a way to reassure people looking at the logs that the reconstruction is not broken, or simply suppress the message.

@fabiocos
Contributor Author

fabiocos commented Jun 6, 2018

The "ERROR: No beam sport found!" message comes from a buggy check in 10_1_X that has been fixed in 10_2_X; a backport is certainly useful.

@fabiocos
Contributor Author

fabiocos commented Jun 6, 2018

Concerning the original problem, the DQM module that triggers it is in the path

```python
process.pathALCARECOTkAlMuonIsolated = cms.Path(
    process.ALCARECOTkAlMuonIsolatedHLT
    + process.ALCARECOTkAlMuonIsolatedDCSFilter
    + process.ALCARECOTkAlMuonIsolatedGoodMuons
    + process.ALCARECOTkAlMuonIsolatedRelCombIsoMuons
    + process.ALCARECOTkAlMuonIsolated,
    cms.Task(process.ALCARECOTkAlMuonIsolatedTkAlDQM,
             process.ALCARECOTkAlMuonIsolatedTrackingDQM))
```

defined in https://cmssdt.cern.ch/lxr/source/DQMOffline/Alignment/python/ALCARECOTkAlDQM_cff.py, where the input producer for ALCARECOTkAlMuonIsolatedTkAlDQM is ALCARECOTkAlMuonIsolated.

For the events showing the error, that input module of the path (ALCARECOTkAlMuonIsolated) is simply not executed, as can be seen in the Tracer output (run single-threaded for simplicity):

++++++++ finished: prefetching before processing event for module: stream = 0 label = 'AlcaHBHEMuonFilter' id = 80
++++++++ starting: processing event for module: stream = 0 label = 'AlcaHBHEMuonFilter' id = 80
++++++++ finished: processing event for module: stream = 0 label = 'AlcaHBHEMuonFilter' id = 80
++++++++ starting: processing event for module: stream = 0 label = 'pathALCARECOHcalCalHBHEMuonFilter' id = 6
++++++++ finished: processing event for module: stream = 0 label = 'pathALCARECOHcalCalHBHEMuonFilter' id = 6
++++++ finished: processing path 'pathALCARECOHcalCalHBHEMuonFilter' : stream = 0
++++++++ finished: prefetching before processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedGoodMuons' id = 101
++++++++ starting: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedGoodMuons' id = 101
++++++++ finished: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedGoodMuons' id = 101
++++++++ starting: prefetching before processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedRelCombIsoMuons' id = 102
++++++++ finished: prefetching before processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedRelCombIsoMuons' id = 102
++++++++ starting: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedRelCombIsoMuons' id = 102
++++++++ finished: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedRelCombIsoMuons' id = 102
++++++++ starting: processing event for module: stream = 0 label = 'pathALCARECOTkAlMuonIsolated' id = 14
++++++++ finished: processing event for module: stream = 0 label = 'pathALCARECOTkAlMuonIsolated' id = 14
++++++ finished: processing path 'pathALCARECOTkAlMuonIsolated' : stream = 0
++++++++ finished: prefetching before processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTkAlDQM' id = 170
++++++++ starting: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTkAlDQM' id = 170
%MSG-e Alignment: TkAlCaRecoMonitor:ALCARECOTkAlMuonIsolatedTkAlDQM 06-Jun-2018 17:27:53 CEST Run: 317382 Event: 61924130
invalid trackcollection encountered!
%MSG
++++++++ finished: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTkAlDQM' id = 170
++++++++ finished: prefetching before processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTrackingDQM' id = 171
++++++++ starting: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTrackingDQM' id = 171
++++++++ finished: processing event for module: stream = 0 label = 'ALCARECOTkAlMuonIsolatedTrackingDQM' id = 171

So if this is expected, either the LogError should be downgraded to a LogInfo, or the Path should be refactored so that the DQM module runs only when its input has actually been produced.

@fabiocos
Contributor Author

fabiocos commented Jun 6, 2018

@fabozzi

The 10_1_X code

```cpp
if(doMVA) {
  pvIndex = getPv(iTrack.index(), &(*vertex)); //HFDumpUtitilies
  if (pvIndex > -1) {
    refPoint = vertex->at(pvIndex).position();
  } else {
    if (&(*beamSpot)==NULL) {
      refPoint = beamSpot->position();
    } else {
      cout << "ERROR: No beam sport found!" << endl;
    }
  }
}
```

becomes, in 10_2_X,

```cpp
if(doMVA) {
  pvIndex = getPv(iTrack.index(), &(*vertex)); //HFDumpUtitilies
  if (pvIndex > -1) {
    refPoint = vertex->at(pvIndex).position();
  } else {
    if(beamSpot.isValid()) {
      refPoint = beamSpot->position();
    } else {
      edm::LogInfo("MuonRecoAnalyzer") << "ERROR: No beam sport found!" << endl;
    }
  }
}
```

and the problem disappears.

@fabiocos
Contributor Author

fabiocos commented Jun 6, 2018

@arunhep @lpernie @jfernan2 @dmitrijus could you please comment?

@dmitrijus
Contributor

This is fine, I guess (I am not the author of the module).

From what I can see, backporting #23015 should do the trick.

@dmitrijus
Contributor

@parbol

@lpernie
Contributor

lpernie commented Jun 6, 2018

I agree with @dmitrijus; it seems that a backport would solve the problem.

@fabiocos
Contributor Author

fabiocos commented Jun 7, 2018

@dmitrijus @lpernie that backport will solve one problem, the "beam spot not found" message. What about the other one, which is flooding the LogError output with error messages? Who is going to look into it? Is anybody looking at the histograms produced by this DQM module? (I would guess not, since the module is not executed.)

@parbol
Contributor

parbol commented Jun 7, 2018

Hi,

I am the author of MuonRecoAnalyzer.cc. Yes, there was a problem with that cout, which was fixed as Fabio commented. I think the backport should work.

For the other module:

https://cmssdt.cern.ch/lxr/source/DQMOffline/Alignment/src/TkAlCaRecoMonitor.cc#0190

which was triggering the initial problem, I don't know anything about it; I'm not the author or a developer of that module.

cheers,

Pablo

@fabiocos
Contributor Author

fabiocos commented Jun 7, 2018

@parbol the backport is here: #23519

@dmitrijus @lpernie the other issue is the flooding of the LogError output (and of the skim); who is taking care of that?

@lpernie
Contributor

lpernie commented Jun 7, 2018

Hi @fabiocos, for the other module I think it is the responsibility of the DQM core or offline DQM-Tk developers to propose a change.

@jfernan2
Contributor

jfernan2 commented Jun 7, 2018

I have pinged DQM-Tk, since the decision depends on them, not on DQM. We (DQM) could blindly silence the message, but we are not sure about the future consequences.
@boudoul any hint on whom we should contact directly, instead of the full DQM-Tk crew as I did?
Thanks in advance

@prebello
Contributor

prebello commented Jul 27, 2018

Dear all

the same issue appears in 10_1_9 with the sequence

RECO -s RAW2DIGI,L1Reco,RECO,ALCA:TkAlMuonIsolated --runUnscheduled --nThreads 8 --data --era Run2_2018 --scenario pp --conditions 101X_dataRun2_Prompt_PixelCond_forTracker_v1 --eventcontent ALCARECO --datatier ALCARECO --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run2_2018 --filein /store/data/Run2018C/SingleMuon/RAW/v1/000/319/348/00000/D4C88FCF-0E83-E811-A6B0-FA163EDE417A.root -n 100 --python_filename=recoskim_Run2018C_SingleMuon.py --no_exec

Do you have any news about this issue? It should be fixed in 101X, as pointed out above.

FYI @fabiocos @fabozzi

@fabiocos
Contributor Author

@prebello as 10_1_X is now used only by HLT, and Tier0 and online DQM have both moved to 10_2_X, I would say that this problem is now less relevant.

@mmusich
Contributor

mmusich commented Sep 11, 2018

@fabiocos I think that this issue is actually release-independent and due to the fact that @prebello runs the reconstruction sequence in unscheduled mode. EDFilters in the AlCa sequences behave badly if they are not put into tasks. It looks pretty much related to issue #24461

@prebello
Contributor

Indeed, @fabiocos and @mmusich: in the latest reprocessing request from AlCa, @tocheng asked to remove --runUnscheduled.

@davidlange6
Contributor

davidlange6 commented Sep 11, 2018 via email

@fioriNTU
Contributor

fioriNTU commented Mar 1, 2019

+1

@prebello
Contributor

prebello commented Mar 4, 2019

+1
