
Fixing issue of PS cluster matched to >1 EE cluster in DBG_X IBs #36655

Merged
merged 3 commits into from Jan 22, 2022


swagata87
Contributor

@swagata87 swagata87 commented Jan 9, 2022

PR description:

Currently, several workflows in the debug (DBG_X) IBs fail with the error message "Found a PS cluster matched to more than one EE cluster!". This was reported in #34097 and #35524. This PR aims to fix it.

It looks like the problem was here:

auto found_pscluster = std::find_if(new_sc.preshowerClustersBegin(),
                                    new_sc.preshowerClustersEnd(),
                                    [&i_ps](const auto& i) { return i.key() == i_ps->first; });

In particular, i.key() seems to return a garbage value. To bypass the issue, std::find_if is replaced by std::find, so that we no longer need to access .key() of the CaloClusterPtr. This is similar to what was done here:

auto found_pscluster =
    std::find(new_sc.preshowerClustersBegin(), new_sc.preshowerClustersEnd(), reco::CaloClusterPtr(psclus));

This small change seems to solve the problem.
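To illustrate the pattern, here is a minimal, self-contained sketch of a duplicate check of the kind that emits this error message, with the lookup done through std::find and the pointer's own operator== instead of a lambda that reads .key(). ClusterPtr, SuperClusterSketch and addPreshowerClusterOnce are hypothetical stand-ins, not the CMSSW types (reco::CaloClusterPtr, reco::SuperCluster) or the actual code in RecoEcal/EgammaClusterAlgos; the equality semantics (collection plus index) are an assumption modelled loosely on edm::Ptr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for reco::CaloClusterPtr: identifies a cluster by the
// collection (product) it belongs to plus its index (key) in that collection.
struct ClusterPtr {
  unsigned productId;
  std::size_t key;
  // Assumption: equality compares both collection and index, similar in spirit to edm::Ptr.
  friend bool operator==(const ClusterPtr& a, const ClusterPtr& b) {
    return a.productId == b.productId && a.key == b.key;
  }
};

// Hypothetical stand-in for the preshower-cluster list held by a supercluster.
struct SuperClusterSketch {
  std::vector<ClusterPtr> preshowerClusters;
};

// Attaches a PS cluster to the supercluster and throws if it is already attached.
// The lookup uses std::find (i.e. ClusterPtr's operator==) rather than a
// std::find_if lambda that reads .key() and compares it with a stored index.
void addPreshowerClusterOnce(SuperClusterSketch& sc, const ClusterPtr& ps) {
  auto found = std::find(sc.preshowerClusters.begin(), sc.preshowerClusters.end(), ps);
  if (found != sc.preshowerClusters.end()) {
    throw std::runtime_error("Found a PS cluster matched to more than one EE cluster!");
  }
  sc.preshowerClusters.push_back(ps);
  assert(sc.preshowerClusters.back() == ps);  // debug-only postcondition
}
```

Attaching {1, 3}, {1, 7} and {2, 3} succeeds and leaves three entries; attaching {1, 3} a second time throws and leaves the list unchanged. Comparing the whole pointer is also stricter than comparing bare indices, since two clusters from different collections can share the same index.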

PR validation:

Tested in CMSSW_12_3_DBG_X_2022-01-06-2300 using workflows 117.0 and 11630.0.

This PR is not a backport.
Backport not needed.

PS: After the first round of tests, another crash was found in the DBG build (only in WF 25202.0):

%MSG-e BasicSingleTrajectoryState:  GsfTrackProducer:lowPtGsfEleGsfTracks  09-Jan-2022 17:26:01 UTC Run: 1 Event: 1
asking for componenets to a SingleTrajectoryState
%MSG
cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/TrackingTools/TrajectoryState/src/BasicTrajectoryState.cc:248: virtual const Components& BasicSingleTrajectoryState::components() const: Assertion `false' failed.

I took the opportunity to fix that as well.
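The PR does not spell out how this second crash was fixed. The assertion itself says that components() was requested from a single (non-mixture) trajectory state. As a hedged illustration only, the sketch below models one common guard against this class of failure: check whether a state is a mixture before asking for its components. StateSketch, MixtureState, SingleState and numberOfComponents are hypothetical names; the real BasicTrajectoryState and TrajectoryStateOnSurface interfaces may expose a different query.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of two kinds of trajectory state:
// a Gaussian mixture (as used by the GSF) and a single Gaussian state.
class StateSketch {
public:
  virtual ~StateSketch() = default;
  virtual bool isMixture() const = 0;
  virtual const std::vector<double>& componentWeights() const = 0;
};

class MixtureState : public StateSketch {
public:
  explicit MixtureState(std::vector<double> weights) : weights_(std::move(weights)) {}
  bool isMixture() const override { return true; }
  const std::vector<double>& componentWeights() const override { return weights_; }

private:
  std::vector<double> weights_;
};

class SingleState : public StateSketch {
public:
  bool isMixture() const override { return false; }
  // Mirrors the failing assertion: asking a single state for its components is
  // treated as a programming error (in builds where assert is active).
  const std::vector<double>& componentWeights() const override {
    assert(false && "asking for components of a single state");
    static const std::vector<double> none;
    return none;
  }
};

// Caller-side guard: only request components from a mixture and treat a
// single state as exactly one component.
std::size_t numberOfComponents(const StateSketch& state) {
  return state.isMixture() ? state.componentWeights().size() : 1;
}
```

With this guard a three-component mixture reports 3 and a single state reports 1, without ever calling the asserting method. Note that optimized builds usually define NDEBUG, so an assertion like this only fires in debug builds, which is consistent with the crash showing up only in the DBG IB.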

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36655/27680

  • This PR adds an extra 16KB to the repository

Code check has found code style and quality issues, which could be resolved by applying the following patch(es)

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36655/27681

  • This PR adds an extra 16KB to the repository

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2022

A new Pull Request was created by @swagata87 (Swagata Mukherjee) for master.

It involves the following packages:

  • RecoEcal/EgammaClusterAlgos (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @rchatter, @jainshilpi, @argiro, @sobhatta, @thomreis, @afiqaize, @simonepigazzini, @wrtabb, @lgray, @varuns23, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release managers for this.

cms-bot commands are listed here

@smuzaffar
Contributor

please test for CMSSW_12_3_DBG_X

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21579/summary.html
COMMIT: 989e183
CMSSW: CMSSW_12_3_X_2022-01-08-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36655/21579/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 3 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3461659
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3461631
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2022

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21578/summary.html
COMMIT: 989e183
CMSSW: CMSSW_12_3_DBG_X_2022-01-06-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36655/21578/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

The relvals timed out after 4 hours.

  • 25202.0: 25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25.log

@jpata
Contributor

jpata commented Jan 10, 2022

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21578/runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25.log

%MSG-e BasicSingleTrajectoryState:  GsfTrackProducer:lowPtGsfEleGsfTracks  09-Jan-2022 17:26:01 UTC Run: 1 Event: 1
asking for componenets to a SingleTrajectoryState
%MSG
cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/TrackingTools/TrajectoryState/src/BasicTrajectoryState.cc:248: virtual const Components& BasicSingleTrajectoryState::components() const: Assertion `false' failed.


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

...
#9  0x00002b07b4be2818 in BasicSingleTrajectoryState::components (this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/TrackingTools/TrajectoryState/src/BasicTrajectoryState.cc:248
#10 0x00002b081c908ac2 in TrajectoryStateOnSurface::components (this=<synthetic pointer>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/TrackingTools/TrajectoryState/interface/TrajectoryStateOnSurface.h:85
#11 operator() (tsos=<synthetic pointer>..., __closure=<synthetic pointer>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/RecoTracker/TrackProducer/src/TrackProducerAlgorithm.cc:288
#12 TrackProducerAlgorithm<reco::GsfTrack>::buildTrack (this=this@entry=0x2b086fd43098, theFitter=<optimized out>, thePropagator=thePropagator@entry=0x2b086f695d80, algoResults=..., hits=..., theTSOS=..., seed=..., ndof=ndof@entry=0, bs=..., seedRef=..., qualityMask=<optimized out>, nLoops=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/RecoTracker/TrackProducer/src/TrackProducerAlgorithm.cc:302
#13 0x00002b081c795ce8 in TrackProducerAlgorithm<reco::GsfTrack>::runWithCandidate (this=this@entry=0x2b086fd43098, theG=0x2b089e83b200, theMF=0x2b08969d5f30, theTCCollection=..., theFitter=theFitter@entry=0x2b08bd4e6590, thePropagator=thePropagator@entry=0x2b086f695d80, builder=0x2b08f00a6dc0, bs=..., algoResults=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/unique_ptr.h:173
#14 0x00002b081c78d638 in GsfTrackProducer::produce (this=0x2b086fd42c00, theEvent=..., setup=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/FWCore/Framework/interface/ESHandle.h:65
#15 0x00002b07a6b56468 in edm::stream::EDProducerAdaptorBase::doEvent (this=0x2b08733ef2c0, info=..., act=0x2b07acfe5010, mcc=mcc@entry=0x2b086fd55968) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/FWCore/Framework/src/stream/EDProducerAdaptorBase.cc:61
...
Module: GsfTrackProducer:lowPtGsfEleGsfTracks (crashed)

As far as I can tell, the crash is related to this PR. @swagata87, please check.

@jpata
Contributor

jpata commented Jan 10, 2022

To clarify, the debug build segfaulted in 25202.0. Is this supposed to happen?

@swagata87
Contributor Author

To clarify, the debug build segfaulted in 25202.0. Is this supposed to happen?

Hi Joosep,

From a quick look, it seems to me that this crash is due to a different problem. I will look into it further.

@smuzaffar
Contributor

please test for CMSSW_12_3_DBG_X
Running explicitly for DBG IBs; I have updated the Jenkins job to increase the timeout for DBG IBs.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21721/summary.html
COMMIT: 69d6a01
CMSSW: CMSSW_12_3_X_2022-01-13-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36655/21721/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3461659
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3461631
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

By the way, step1 of a few workflows [a] is taking too long in DBG IBs. Although I have increased the timeout for DBG PR tests to 9 hours, at this rate they might fail again. Can someone check what takes so long in the DBG relvals?

> grep 'Run: 1 Event:' 23234.0_*/step1*.log
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 14:20:37 CET  Run: 1 Event: 1
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:09:49 CET  Run: 1 Event: 2
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:30:29 CET  Run: 1 Event: 3
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 16:14:13 CET  Run: 1 Event: 4
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 17:14:34 CET  Run: 1 Event: 5
> grep 'Run: 1 Event:' 28234.0_*/step1*.log
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 14:33:27 CET  Run: 1 Event: 1
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:26:33 CET  Run: 1 Event: 2
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:47:18 CET  Run: 1 Event: 3
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 16:36:06 CET  Run: 1 Event: 4
> grep 'Run: 1 Event:' 35034.0_*/step1*.log
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 14:33:46 CET  Run: 1 Event: 1
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:24:48 CET  Run: 1 Event: 2
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 15:46:34 CET  Run: 1 Event: 3
%MSG-e ExcessiveTime:  OscarMTProducer:g4SimHits 14-Jan-2022 16:35:06 CET  Run: 1 Event: 4
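For a rough sense of scale, the gaps between consecutive timestamps above can be computed directly. Assuming one ExcessiveTime message per event (an assumption: the log does not state when the message is emitted relative to the end of the event), the gaps for 23234.0 are 2952, 1240, 2624 and 3621 s, and 2 h 53 min 57 s (10437 s) separate the messages for events 1 and 5. The helper names toSeconds and gapsInSeconds are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Converts an "HH:MM:SS" timestamp, as printed in the MessageLogger lines above,
// to seconds since midnight. Assumes well-formed input and no midnight crossing.
int toSeconds(const std::string& hms) {
  assert(hms.size() == 8 && hms[2] == ':' && hms[5] == ':');
  const int h = std::stoi(hms.substr(0, 2));
  const int m = std::stoi(hms.substr(3, 2));
  const int s = std::stoi(hms.substr(6, 2));
  return h * 3600 + m * 60 + s;
}

// Returns the gap in seconds between each pair of consecutive timestamps.
std::vector<int> gapsInSeconds(const std::vector<std::string>& stamps) {
  std::vector<int> gaps;
  for (std::size_t i = 1; i < stamps.size(); ++i) {
    gaps.push_back(toSeconds(stamps[i]) - toSeconds(stamps[i - 1]));
  }
  return gaps;
}
```

Applied to the 28234.0 timestamps, the same helper gives 3186, 1245 and 2928 s, i.e. again roughly 20 to 55 minutes between consecutive events in step1.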

@cmsbuild
Contributor

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21722/summary.html
COMMIT: 69d6a01
CMSSW: CMSSW_12_3_DBG_X_2022-01-13-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36655/21722/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

The relvals timed out after 4 hours.

@clacaputo
Contributor


@smuzaffar I've checked by running locally

runTheMatrix.py -j 16 -s -l 101.0,10224.0,1306.0,250202.181,25202.0,9.0 --command ' --customise Validation/Performance/TimeMemoryJobReport.customiseWithTimeMemoryJobReport' --job-reports

and inspecting the step1*.log files. All the ExcessiveTime warnings come from OscarMTProducer (g4SimHits).

@smuzaffar
Contributor

@jpata, @clacaputo, @slava77, can you please review and sign it? It should fix the DBG relvals.

@clacaputo
Contributor

Due to the timeout issue, I ran the tests locally in CMSSW_12_3_DBG_X_2022-01-06-2300 using the command

runTheMatrix.py -j 16 -s -l 101.0,10224.0,1306.0,250202.181,25202.0,9.0 --command ' --customise Validation/Performance/TimeMemoryJobReport.customiseWithTimeMemoryJobReport' --job-reports

both without and with the PR applied.

With the PR applied, no workflow fails any more with the message "Found a PS cluster matched to more than one EE cluster!", so it seems to me that the PR fixes that issue.

As for the failure virtual const Components& BasicSingleTrajectoryState::components() const: Assertion 'false' failed [1], I can't reproduce it.

[1]

%MSG-e BasicSingleTrajectoryState:  GsfTrackProducer:lowPtGsfEleGsfTracks  09-Jan-2022 17:26:01 UTC Run: 1 Event: 1
asking for componenets to a SingleTrajectoryState
%MSG
cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/4cd3ceaede4e18eb9ab43f14fe009004/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_DBG_X_2022-01-06-2300/src/TrackingTools/TrajectoryState/src/BasicTrajectoryState.cc:248: virtual const Components& BasicSingleTrajectoryState::components() const: Assertion `false' failed.

@clacaputo
Copy link
Contributor

test parameters:

  • enable_test = none

@clacaputo
Contributor

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-867ebb/21758/summary.html
COMMIT: 69d6a01
CMSSW: CMSSW_12_3_X_2022-01-16-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36655/21758/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3464734
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3464712
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@clacaputo
Contributor

Hi @smuzaffar, although the tests passed (#36655 (comment)), I can still see one failed check in the checks summary. The failed check is the one related to the test on the DBG IB; for some reason, it is still in the summary.

Is there a way to reset it?

@smuzaffar
Contributor

The failed check is part of an optional test (cms/36655/DBG/slc7_amd64_gcc10/optional), so please ignore its result.

@clacaputo
Copy link
Contributor

+reconstruction

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1
