New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HE- energy reconstruction - phase1 #15499
Conversation
@cmsbuild please test |
The tests are being triggered in jenkins. |
A new Pull Request was created by @mariadalfonso for CMSSW_8_1_X. It involves the following packages: CalibCalorimetry/HcalAlgos @ghellwig, @cvuosalo, @cerminar, @cmsbuild, @franzoni, @slava77, @mmusich, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are list here #13028 |
-1 Tested at: a111b58 You can see the results of the tests here: I found follow errors while testing this PR Failed tests: RelVals AddOn
When I ran the RelVals I found an error in the following worklfows: runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log101.0 step1 runTheMatrix-results/101.0_SingleElectronE120EHCAL+SingleElectronE120EHCAL/step1_SingleElectronE120EHCAL+SingleElectronE120EHCAL.log1306.0 step3 runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log135.4 step1 runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log1000.0 step2 runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log1001.0 step2 runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5.log1330.0 step3 runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15.log9.0 step3 runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log1003.0 step2 runTheMatrix-results/1003.0_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM/step2_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM.log136.731 step3 runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT+HARVESTDR2.log25.0 step3 runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log10021.0 step3 runTheMatrix-results/10021.0_TenMuE_0_200+TenMuE_0_200_pythia8_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017/step3_TenMuE_0_200+TenMuE_0_200_pythia8_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017.log50202.0 step3 runTheMatrix-results/50202.0_TTbar_13+TTbar_13+DIGIUP15_PU50+RECOUP15_PU50+HARVESTUP15_PU50/step3_TTbar_13+TTbar_13+DIGIUP15_PU50+RECOUP15_PU50+HARVESTUP15_PU50.log10424.0 step3 runTheMatrix-results/10424.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2023D1_GenSimFull+DigiFull_2023D1+RecoFullGlobal_2023D1+HARVESTFullGlobal_2023D1/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2023D1_GenSimFull+DigiFull_2023D1+RecoFullGlobal_2023D1+HARVESTFullGlobal_2023D1.log10024.0 step3 runTheMatrix-results/10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017.log25202.0 step3 runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25.log
I found errors in the following addon tests: cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Wed Aug 17 18:38:33 2016-date Wed Aug 17 18:36:50 2016 s - exit: 18688 |
in, e.g. wflow 25.0 (run1 ttbar)
Errors in jenkins tests are related to this PR. |
@cmsbuild please test |
The tests are being triggered in jenkins. |
+1 Phase 1 HE energy reco, including changes to common code used for Run 2. The code changes are satisfactory. Jenkins tests against baseline CMSSW_8_1_X_2016-08-29-1100 show numerous small differences, most amounting to no more than jitter. Extended tests that examined the differences in more detail, along with timing tests, are described above, and they did not find any issues of concern. |
This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar |
+1 |
I believe this pull request is causing the following crashes in the IBs These crashes happen in both the threaded and non-threaded IBs. |
On 8/31/16 6:22 AM, Chris Jones wrote:
OK, I see a crash in the regular 81X as well
|
CMSSW_8_1_NONTHREADED_X is identical to CMSSW_8_1_X except the IB RelVals are just run with 1 threads. This is intended to help identify when a crash is from thread-unsafe code or just general unsafe code. |
@mariadalfonso @igv4321 |
I tried running 100 events from workflow 43.0 in CMSSW_8_1_X_2016-08-31-1100 and did not reproduce a crash. It must be some very rare memory issue... |
Actually, https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc530/CMSSW_8_1_X_2016-08-31-1100/pyRelValMatrixLogs/run/43.0_ZpMM_2250_8TeV+ZpMM_2250_8TeVINPUT+DIGI+RECO+HARVEST/step3_ZpMM_2250_8TeV+ZpMM_2250_8TeVINPUT+DIGI+RECO+HARVEST.log doesn't show a crash either. I guess I can try in the earlier IB... |
If it's random memory/uninitialized condition or related, you may want to try with
|
Thanks for the tip, but I still couldn't reproduce the crash with that. I'm just going to run valgrind and see if points to anything useful. |
@Dr15Jones thanks, I was able to replicate the crash in 140.53 (data is more reliable than MC for this sort of thing). Debug symbols point the finger at: @mariadalfonso, it looks like the problem is that |
I wanted to make a note that PulseShapeFitOOTPileupCorrection is not const thread safe because it has a member data
and from the |
@slava77 @kpedro88 just want to let you know that I will fix asap the https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X_2016-08-31-1100/RecoLocalCalo/HcalRecAlgos/src/PulseShapeFitOOTPileupCorrection.cc#L345 |
This PR implement the phase1 M2/M0 method and add some general cleanup in some code common to the run2 version.
Things to note:
Slides w/ descriptions/tests include up to commit 6bc3caa https://indico.cern.ch/event/563410/contributions/2277004/attachments/1324249/1988640/PR15499.pdf
To be considered for the 80X backport (pure sw changes):
a3fb0c6: from Slava this improve the CPU timing of 20% on ttbar events
4359dac: from Carl this protect for potential division by zero
3551190: from Slava/Igor this improve the memory