HE- energy reconstruction - phase1 #15499

Merged: 25 commits into cms-sw:CMSSW_8_1_X on Aug 30, 2016

mariadalfonso commented Aug 17, 2016

This PR implements the phase1 M2/M0 method and adds some general cleanup to code shared with the run2 version.

Things to note:

  1. removed unused inputTags/enum
  2. re-arranged the PSet to handle the M2 parameters for the SiPM
  3. fixed M0 energy, from Salvat (see commit d67ef34)
  4. replaced heap-allocated std::vector & Output with float* to reduce memory use, at Slava/Igor's request (see commit 3551190)
  5. various double -> float changes to reduce CPU time, from Slava (see commit a3fb0c6)

Slides with descriptions and tests cover changes up to commit 6bc3caa: https://indico.cern.ch/event/563410/contributions/2277004/attachments/1324249/1988640/PR15499.pdf

To be considered for the 80X backport (pure sw changes):
a3fb0c6: from Slava, this improves the CPU timing by 20% on ttbar events
4359dac: from Carl, this protects against a potential division by zero
3551190: from Slava/Igor, this improves memory usage

slava77 commented Aug 17, 2016

@cmsbuild please test

cmsbuild commented Aug 17, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14572/console

@cmsbuild

A new Pull Request was created by @mariadalfonso for CMSSW_8_1_X.

It involves the following packages:

CalibCalorimetry/HcalAlgos
RecoLocalCalo/HcalRecAlgos
RecoLocalCalo/HcalRecProducers

@ghellwig, @cvuosalo, @cerminar, @cmsbuild, @franzoni, @slava77, @mmusich, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @tocheng, @argiro this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@cmsbuild

-1

Tested at: a111b58

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15499/14572/summary.html

I found the following errors while testing this PR

Failed tests: RelVals AddOn

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
5.1 step1
runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log
101.0 step1
runTheMatrix-results/101.0_SingleElectronE120EHCAL+SingleElectronE120EHCAL/step1_SingleElectronE120EHCAL+SingleElectronE120EHCAL.log
1306.0 step3
runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log
135.4 step1
runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log
1000.0 step2
runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log
1001.0 step2
runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5.log
1330.0 step3
runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15.log
9.0 step3
runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log
1003.0 step2
runTheMatrix-results/1003.0_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM/step2_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM.log
136.731 step3
runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT+HARVESTDR2.log
25.0 step3
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log
10021.0 step3
runTheMatrix-results/10021.0_TenMuE_0_200+TenMuE_0_200_pythia8_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017/step3_TenMuE_0_200+TenMuE_0_200_pythia8_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017.log
50202.0 step3
runTheMatrix-results/50202.0_TTbar_13+TTbar_13+DIGIUP15_PU50+RECOUP15_PU50+HARVESTUP15_PU50/step3_TTbar_13+TTbar_13+DIGIUP15_PU50+RECOUP15_PU50+HARVESTUP15_PU50.log
10424.0 step3
runTheMatrix-results/10424.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2023D1_GenSimFull+DigiFull_2023D1+RecoFullGlobal_2023D1+HARVESTFullGlobal_2023D1/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2023D1_GenSimFull+DigiFull_2023D1+RecoFullGlobal_2023D1+HARVESTFullGlobal_2023D1.log
10024.0 step3
runTheMatrix-results/10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017.log
25202.0 step3
runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25.log
  • AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Wed Aug 17 18:38:33 2016-date Wed Aug 17 18:36:50 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PRef_DATA.root --fileout file:RelVal_Raw_PRef_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:56:10 2016-date Wed Aug 17 18:36:56 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --data --scenario=HeavyIons -n 10 --conditions auto:run2_data_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016,Run2_HI --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_HIon_DATA.root --fileout file:RelVal_Raw_HIon_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:49:00 2016-date Wed Aug 17 18:37:00 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:GRun,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_GRun --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_GRun_MC.root --fileout file:RelVal_Raw_GRun_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:54:30 2016-date Wed Aug 17 18:37:03 2016 s - exit: 18688
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval25ns,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns --magField 38T_PostLS1 : FAILED - time: date Wed Aug 17 18:38:47 2016-date Wed Aug 17 18:37:04 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PRef_MC.root --fileout file:RelVal_Raw_PRef_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:52:56 2016-date Wed Aug 17 18:37:08 2016 s - exit: 18688
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval2016,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_2016 --magField 38T_PostLS1 : FAILED - time: date Wed Aug 17 18:52:21 2016-date Wed Aug 17 18:37:11 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:PIon,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_PIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PIon_DATA.root --fileout file:RelVal_Raw_PIon_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:48:06 2016-date Wed Aug 17 18:38:03 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run1_mc_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_MC.root --fileout file:RelVal_Raw_Fake_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:51:04 2016-date Wed Aug 17 18:38:35 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --mc --scenario=HeavyIons -n 10 --conditions auto:run2_mc_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016,Run2_HI --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_HIon_MC.root --fileout file:RelVal_Raw_HIon_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:55:06 2016-date Wed Aug 17 18:38:57 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:Fake1,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_Fake1 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --processName=HLTRECO --filein file:RelVal_Raw_Fake1_MC.root --fileout file:RelVal_Raw_Fake1_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:57:10 2016-date Wed Aug 17 18:48:09 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run1_data_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_DATA.root --fileout file:RelVal_Raw_Fake_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:53:17 2016-date Wed Aug 17 18:49:24 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:GRun,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_GRun --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_GRun_DATA.root --fileout file:RelVal_Raw_GRun_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:57:42 2016-date Wed Aug 17 18:51:07 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:Fake1,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_Fake1 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_25ns --processName=HLTRECO --filein file:RelVal_Raw_Fake1_DATA.root --fileout file:RelVal_Raw_Fake1_DATA_HLT_RECO.root : FAILED - time: date Wed Aug 17 18:55:25 2016-date Wed Aug 17 18:52:28 2016 s - exit: 18688
cmsDriver.py RelVal -s HLT:PIon,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run2_mc_PIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --magField 38T_PostLS1 --processName=HLTRECO --filein file:RelVal_Raw_PIon_MC.root --fileout file:RelVal_Raw_PIon_MC_HLT_RECO.root : FAILED - time: date Wed Aug 17 19:01:37 2016-date Wed Aug 17 18:53:00 2016 s - exit: 18688

slava77 commented Aug 17, 2016

in, e.g. wflow 25.0 (run1 ttbar)

----- Begin Fatal Exception 17-Aug-2016 18:08:48 CEST-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=HcalHitReconstructor label='hbheprereco'
Exception Message:
MissingParameter: Parameter 'chargeMax' not found.
----- End Fatal Exception -------------------------------------------------

Errors in jenkins tests are related to this PR.
Please test your PR before submission. A broken PR will be on a rather low priority for review.
http://cms-sw.github.io/PRWorkflow requires
runTheMatrix.py -l limited -i all
to pass (or at least not introduce any new failures if some are present in the baseline).

@cmsbuild

Pull request #15499 was updated. @ghellwig, @cvuosalo, @cerminar, @cmsbuild, @franzoni, @slava77, @mmusich, @davidlange6 can you please check and sign again.

@cvuosalo

@cmsbuild please test

cmsbuild commented Aug 17, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14577/console

slava77 commented Aug 17, 2016

In 136.731 (singlePhoton 2016B workflow):
all_oldvsnew_runsingleph2016bwf136p731c_hbherechitssorted_hbhereco__rereco_obj_obj_time
Is this expected?

@kpedro88

@cvuosalo @slava77 please sign...

@cvuosalo

+1

For #15499 b0bfac1

Phase 1 HE energy reco, including changes to common code used for Run 2.

The code changes are satisfactory. Jenkins tests against baseline CMSSW_8_1_X_2016-08-29-1100 show numerous small differences, most amounting to no more than jitter. Extended tests that examined the differences in more detail, along with timing tests, are described above, and they did not find any issues of concern.

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

@davidlange6

+1

cmsbuild merged commit 4e9418b into cms-sw:CMSSW_8_1_X on Aug 30, 2016

slava77 commented Aug 31, 2016

On 8/31/16 6:22 AM, Chris Jones wrote:

I believe this pull request is causing the following crashes in the IBs
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc530/CMSSW_8_1_NONTHREADED_X_2016-08-30-2300/pyRelValMatrixLogs/run/134.709_RunSinglePh2015B+RunSinglePh2015B+HLTDR2_50ns+RECODR2_50nsreHLT+HARVESTDR2/step3_RunSinglePh2015B+RunSinglePh2015B+HLTDR2_50ns+RECODR2_50nsreHLT+HARVESTDR2.log
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc530/CMSSW_8_1_NONTHREADED_X_2016-08-30-2300/pyRelValMatrixLogs/run/140.53_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI/step2_RunHI2011+RunHI2011+RECOHID11+HARVESTDHI.log

These crashes happen in both the threaded and non-threaded IBs.

What's in CMSSW_8_1_NONTHREADED_X, and how is it different from CMSSW_8_1_X?
I think this is the first time I have seen this branch.



@Dr15Jones

CMSSW_8_1_NONTHREADED_X is identical to CMSSW_8_1_X except that the IB RelVals are run with just 1 thread. This is intended to help identify whether a crash comes from thread-unsafe code or just generally unsafe code.

slava77 commented Aug 31, 2016

@mariadalfonso @igv4321
please let me know if you started looking at the issue already

@kpedro88

I tried running 100 events from workflow 43.0 in CMSSW_8_1_X_2016-08-31-1100 and did not reproduce a crash. It must be some very rare memory issue...

slava77 commented Aug 31, 2016

If it's random memory/uninitialized condition or related, you may want to try with

export MALLOC_CHECK_=3
export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
cmsRunGlibC a.py

@kpedro88

Thanks for the tip, but I still couldn't reproduce the crash with that. I'm just going to run valgrind and see if it points to anything useful.

@kpedro88

@Dr15Jones thanks, I was able to replicate the crash in 140.53 (data is more reliable than MC for this sort of thing).

Debug symbols point the finger at:
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X_2016-08-31-1100/RecoLocalCalo/HcalRecAlgos/src/PulseShapeFitOOTPileupCorrection.cc#L345

@mariadalfonso, it looks like the problem is that std::vector<float> fitParsVec; may not be filled in some cases: there's an if, an else if, but no else. Can you address this?
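
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the CMSSW code: the FitMode enum, the function names, and the parameter values are all hypothetical, and the actual branch conditions in PulseShapeFitOOTPileupCorrection.cc are not shown in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the conditions that select a fit branch.
enum class FitMode { OnePulse, ThreePulse, Unsupported };

// Problem pattern: an if and an else if, but no else.
// For any mode not covered, the vector is returned empty, and a caller
// that indexes it (e.g. fitParsVec[0]) reads out of bounds.
std::vector<float> fitParsUnchecked(FitMode mode) {
  std::vector<float> fitParsVec;
  if (mode == FitMode::OnePulse) {
    fitParsVec = {1.0f, 0.0f, 0.0f};
  } else if (mode == FitMode::ThreePulse) {
    fitParsVec = {1.0f, 0.5f, 0.25f};
  }
  return fitParsVec;  // empty for FitMode::Unsupported
}

// Guarded variant: a final else fills defaults, so the result
// always has the size the caller expects.
std::vector<float> fitParsGuarded(FitMode mode) {
  std::vector<float> fitParsVec;
  if (mode == FitMode::OnePulse) {
    fitParsVec = {1.0f, 0.0f, 0.0f};
  } else if (mode == FitMode::ThreePulse) {
    fitParsVec = {1.0f, 0.5f, 0.25f};
  } else {
    fitParsVec.assign(3, 0.0f);  // assumed default: all parameters zero
  }
  assert(fitParsVec.size() == 3);
  return fitParsVec;
}
```

The final else in fitParsGuarded is one possible fix, assuming a zero-filled result is acceptable downstream; checking the vector size before indexing would be another.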

@Dr15Jones

I wanted to make a note that PulseShapeFitOOTPileupCorrection is not const thread safe because it has a data member

 std::auto_ptr<FitterFuncs::PulseShapeFunctor> psfPtr_

and from the const member functions of PulseShapeFitOOTPileupCorrection it calls non-const functions of FitterFuncs::PulseShapeFunctor.
Because this is being used in a stream module and each module has its own copy of PulseShapeFitOOTPileupCorrection, there won't be a threading problem in this case. However, if PulseShapeFitOOTPileupCorrection were used in a global module or put into the EventSetup or the Event, then we would have a problem.
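
The const-correctness gap described above can be shown with a small sketch. The class names Functor and Correction are hypothetical stand-ins for FitterFuncs::PulseShapeFunctor and PulseShapeFitOOTPileupCorrection, and std::unique_ptr is used in place of std::auto_ptr because std::auto_ptr was removed in C++17; the constness behaviour illustrated is the same for both.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the functor held by pointer.
class Functor {
public:
  void setPedestal(float p) { pedestal_ = p; }  // non-const: mutates state
  float pedestal() const { return pedestal_; }

private:
  float pedestal_ = 0.0f;
};

// Hypothetical stand-in for the class that owns the functor.
class Correction {
public:
  Correction() : psfPtr_(new Functor()) {}

  // Declared const, yet it changes the pointee. const applies to the
  // smart pointer member itself, not to the object it points to, so the
  // compiler accepts the non-const call below.
  float apply(float pedestal) const {
    assert(psfPtr_ != nullptr);
    psfPtr_->setPedestal(pedestal);
    return psfPtr_->pedestal();
  }

private:
  std::unique_ptr<Functor> psfPtr_;
};
```

Two threads calling apply on one shared Correction would race on pedestal_. With one instance per stream, as in the stream-module case above, no instance is shared and the race does not arise.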

Martin-Grunewald added a commit to cms-tsg-storm/cmssw that referenced this pull request Sep 1, 2016
Martin-Grunewald added a commit to cms-tsg-storm/cmssw that referenced this pull request Sep 18, 2016