Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HEEP V7.0 in 90X #16913

Merged
merged 18 commits into from Jan 6, 2017
Merged

HEEP V7.0 in 90X #16913

merged 18 commits into from Jan 6, 2017

Conversation

Sam-Harper
Copy link
Contributor

Dear All,

It was decided by E/gamma to submit the HEEP V7.0 separately. So here is the pull request.

It adds the needed VID classes to calculate the cut, adds a function to EcalClusterTools too calculate the number of saturated crystals in the 5x5 region of the detector. It also adds a new class to make isolation from pat::PackedCandidates and a producer to make the isolation and nrSat crystal value maps to be picked up by the VID cuts.

This is not yet run in any default sequence (and nor is it intended to be just yet) and the goal of this is for it to get in 80X to ensure that analysers can use it easily. So this is a strict addition of things and does not change anything already running.

It has a weak dependence on #16777 in that it will crash when running on AOD without it. But its not run in standard sequences, only by analysers who will know to ensure that they have that PR. It will work fine for the miniAOD.

Best,
Sam

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2016

A new Pull Request was created by @Sam-Harper for CMSSW_9_0_X.

It involves the following packages:

PhysicsTools/SelectorUtils
RecoEcal/EgammaCoreTools
RecoEgamma/EgammaIsolationAlgos
RecoEgamma/ElectronIdentification

@cmsbuild, @cvuosalo, @slava77, @monttj, @davidlange6 can you please review it and eventually sign? Thanks.
@argiro, @rafaellopesdesa, @lgray this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@cvuosalo
Copy link
Contributor

cvuosalo commented Dec 8, 2016

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/16865/console Started: 2016/12/08 18:52

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2016

-1

Tested at: 933ff5b

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-16913/16865/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test runtestRecoEgammaElectronIdentification had ERRORS

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2016

Comparison job queued.

@Sam-Harper
Copy link
Contributor Author

Ah damn, sorry about this. The issue is that when re-making the tracker isolation for HEEP V70, it needs the IdealMagnetic field to make the PackedCandidates (specifically for the vertex info), ie this sequence needs to be in the config file:
process.load('Configuration.StandardSequences.MagneticField_cff')

Now we can fix this in two ways, if in any CMS workflow process.load('Configuration.StandardSequences.MagneticField_cff') will be already available we can simply fix the unit test by adding this into the unit tests's config.

The other way, we dont actually need this module to run for HEEP V6.0 (the value map producer is keyed off the word "HEEP" in the ID to run) that is currently in the release. Mainly as what is currently in the release is broken and should be ignored. So I could just tell it not to run for HEEPV6.0. I prefer this solution and will implement it as the default option if I dont hear anything.

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2016

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-16913/16865/summary.html

The workflows 1003.0, 1001.0, 1000.0, 140.53, 136.731, 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

@Dr15Jones
Copy link
Contributor

@Dr15Jones if there is a cyclic product dependency shouldn't the framework now stop the job or not even start running it?
The framework is meant to catch the problem and not run the job. Unfortunately, there is a known weakness in the algorithm I'm using. I have ideas on how to make it complete, but haven't had the time. Is there an example configuration someone can point me to which I could use for future testing?

@cvuosalo
Copy link
Contributor

@Dr15Jones: The version of this PR corresponding to the Dec. 8 commit runs indefinitely in the RECO step of a standard workflow like 25202.0 with the change described here: #16913 (comment)

@cvuosalo
Copy link
Contributor

+1

For #16913 f11ac4a

Adding capability to run new HEEP V7.0. By default, the old version is run as before. There should be no change in monitored quantities.

The code changes are satisfactory, and Jenkins tests against baseline CMSSW_9_0_X_2016-12-12-1100 show no significant differences, as expected. A test of workflow 25202.0_TTbar_13 for 200 events with the new HEEP V7.0 shows no significant differences and no change in the size of Reco event content. CPU time increases by the slightest amount, and memory usage increases marginally.

  delta/mean delta/orJob     original                   new       module name
  ---------- ------------     --------                  ----       ------------
       added      +0.00%         0.00 ms/ev ->         0.68 ms/ev heepIDVarValueMaps
  ---------- ------------     --------                  ----       ------------

Total time per event with DQM and Validation is about 20 s.

Memory:

Baseline
Max VSIZ 5029.42 on evt 193 ; max RSS 3479.64 on evt 137

PR
Max VSIZ 5090.72 on evt 199 ; max RSS 4003.46 on evt 174

@davidlange6
Copy link
Contributor

Hi @cvuosalo this 500MB increase in RSS is real? and if so, for what application?

@slava77
Copy link
Contributor

slava77 commented Dec 22, 2016

uhm, a marginal increase of 500MB.
With a few marginal increases like this our computing costs can double.

@cvuosalo
Copy link
Contributor

@davidlange6: This memory measurement was for step 3 (RAW2DIGI_L1Reco_RECO_EI_PAT_VALIDATION_DQM) of workflow 25202.0 with HEEP V7.0 manually enabled. This result was from a single test, so there might be fluctuations with repeated tests.

Please note that this PR does not enable HEEP V7.0 in standard workflows. It only allows analysts to run it separately. The memory usage issues with HEEP V7.0 could be wrung out now before it is allowed in to CMSSW at all, or they could be addressed later at the time HEEP V7.0 would be proposed to be enabled in standard workflows.

@Sam-Harper
Copy link
Contributor Author

HI all. I'm completely baffed by it taking 500mb of extra memory, it just calculates a few floats. How do I reproduce these studies for my own testing.

I had a look and I dont see a memory leak (and I've been running this offline a lot so I would have noticed). I also dont see me copying large collections. The only think I can think of is when the pat::PackedCandidate makes the pseudo track.

@Dr15Jones
Copy link
Contributor

@cvuosalo how many threads were you running for the memory test? If it was more than one then that is another source of 'randomness' in the results.

@cvuosalo
Copy link
Contributor

@Dr15Jones: I used the default number of threads, which should be one. I ran it in unscheduled mode, which is the default for runTheMatrix.py

@cvuosalo
Copy link
Contributor

@Sam-Harper: Instructions for time and memory measurements are here:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/RecoIntegration#pp_MC_Benchmark_setup_for_timing

More instructions are here:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/RecoIntegration#Compare_timing

The memory usage measurement is explained in the "Using TimeEvent output" section.

@Dr15Jones
Copy link
Contributor

#17102 now catches the circular data dependency and throws an exception which explains the problem.

@Sam-Harper
Copy link
Contributor Author

Dear All,

I followed all the instructions in https://twiki.cern.ch/twiki/bin/viewauth/CMS/RecoIntegration#pp_MC_Benchmark_setup_for_timing and re-ran it for 200 events on file /store/relval/CMSSW_9_0_0_pre2/RelValTTbar_13/GEN-SIM-DIGI-RAW-HLTDEBUG/PU25ns_90X_mcRun2_asymptotic_v0-v1/10000/043953B3-A7C2-E611-A0DA-0CC47A4D75EC.root of /RelValTTbar_13/CMSSW_9_0_0_pre2-PU25ns_90X_mcRun2_asymptotic_v0-v1/GEN-SIM-DIGI-RAW-HLTDEBUG

I ran it in release CMSSW_9_0_X_2016-12-29-1100. I ran it 3 ways, the base release only, the base release + this PR and the base release + this PR + HEEP V7.0 enabled

I can see no different in the memory used, its within the precision of the method.

base: MAX VSIZE: 5167
base + PR : MAX VSIZE: 5169
base + PR + HEEP V7.0 enabled: MAX VSIZE: 5177

I measured the base release memory 6 times, VSIZE = 5177, 5171,5180,5175,5167,5171 so I consider the HEEP V7.0 marginal and within the bounds of uncertainty.

The log files are here for your own scrutiny:
www.cern.ch/sharper/cms/memtestForHEEPV70/baseRelease.log
www.cern.ch/sharper/cms/memtestForHEEPV70/heepV70PresentButNotEnabled.log
www.cern.ch/sharper/cms/memtestForHEEPV70/heepV70Enabled.log

@davidlange6
Copy link
Contributor

+1

@davidlange6
Copy link
Contributor

thanks @Sam-Harper

@Sam-Harper
Copy link
Contributor Author

and thank you @davidlange6 and everybody else for scrutinising this. The absolute last thing I would have wanted to do is cause problems for the reco.

Now ultimately I would like this in 80X to make things easier for users. I recall the procedure is to wait to get this into a pre release and then back port it to 80X? Thanks.

@davidlange6 davidlange6 merged commit c803208 into cms-sw:CMSSW_9_0_X Jan 6, 2017
@davidlange6
Copy link
Contributor

Right - we'll organize an 80x release discussion once people are back...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants