HEEP V7.0 in 90X #16913
A new Pull Request was created by @Sam-Harper for CMSSW_9_0_X. It involves the following packages: PhysicsTools/SelectorUtils @cmsbuild, @cvuosalo, @slava77, @monttj, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here #13028 |
@cmsbuild please test |
The tests are being triggered in jenkins. |
-1 Tested at: 933ff5b You can see the results of the tests here: I found the following errors while testing this PR Failed tests: UnitTests
I found errors in the following unit tests: ---> test runtestRecoEgammaElectronIdentification had ERRORS |
Comparison job queued. |
Ah damn, sorry about this. The issue is that when re-making the tracker isolation for HEEP V7.0, it needs the IdealMagnetic field to make the PackedCandidates (specifically for the vertex info), i.e. this sequence needs to be in the config file: Now we can fix this in two ways. If process.load('Configuration.StandardSequences.MagneticField_cff') is already available in every CMS workflow, we can simply fix the unit test by adding it to the unit test's config. The other way: we don't actually need this module to run for the HEEP V6.0 that is currently in the release (the value map producer is keyed off the word "HEEP" in the ID to run), mainly because what is currently in the release is broken and should be ignored. So I could just tell it not to run for HEEP V6.0. I prefer this solution and will implement it as the default option if I don't hear anything. |
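The second option above (run the value map producer only when the ID name contains "HEEP", but skip HEEP V6.0) could be sketched roughly as follows. This is a minimal, plain-Python illustration and not the CMSSW code: the function `needs_heep_value_maps`, the `skip_versions` parameter, and the ID name strings are assumptions made for the example.

```python
def needs_heep_value_maps(id_name, skip_versions=("HEEPV60",)):
    """Decide whether the HEEP value map producer should run for an ID.

    The producer is keyed off the word "HEEP" in the ID name, but is
    skipped for any version listed in skip_versions (hypothetical knob).
    """
    if "HEEP" not in id_name:
        return False
    return not any(version in id_name for version in skip_versions)


# ID names below are assumed for illustration only.
id_names = [
    "cutBasedElectronID-example-tight",
    "heepElectronID-HEEPV60",
    "heepElectronID-HEEPV70",
]

for name in id_names:
    print(name, needs_heep_value_maps(name))
```

With this kind of check, a config that only uses HEEP V6.0 would never schedule the producer, so the unit test would not need the magnetic field sequence.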
Comparison is ready The workflows 1003.0, 1001.0, 1000.0, 140.53, 136.731, 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons |
Comparison job queued. |
@Dr15Jones: The version of this PR corresponding to the Dec. 8 commit runs indefinitely in the RECO step of a standard workflow like 25202.0 with the change described here: #16913 (comment) |
+1 Adding capability to run new HEEP V7.0. By default, the old version is run as before. There should be no change in monitored quantities. The code changes are satisfactory, and Jenkins tests against baseline CMSSW_9_0_X_2016-12-12-1100 show no significant differences, as expected. A test of workflow 25202.0_TTbar_13 for 200 events with the new HEEP V7.0 shows no significant differences and no change in the size of Reco event content. CPU time increases by the slightest amount, and memory usage increases marginally.
Total time per event with DQM and Validation is about 20 s. Memory:
|
Hi @cvuosalo, is this 500 MB increase in RSS real? And if so, for what application? |
uhm, a marginal increase of 500MB. |
@davidlange6: This memory measurement was for step 3 (RAW2DIGI_L1Reco_RECO_EI_PAT_VALIDATION_DQM) of workflow 25202.0 with HEEP V7.0 manually enabled. This result was from a single test, so there might be fluctuations with repeated tests. Please note that this PR does not enable HEEP V7.0 in standard workflows. It only allows analysts to run it separately. The memory usage issues with HEEP V7.0 could be wrung out now before it is allowed in to CMSSW at all, or they could be addressed later at the time HEEP V7.0 would be proposed to be enabled in standard workflows. |
Hi all. I'm completely baffled by it taking 500 MB of extra memory; it just calculates a few floats. How do I reproduce these studies for my own testing? I had a look and I don't see a memory leak (and I've been running this offline a lot, so I would have noticed). I also don't see myself copying large collections. The only thing I can think of is when the pat::PackedCandidate makes the pseudo track. |
@cvuosalo how many threads were you running for the memory test? If it was more than one then that is another source of 'randomness' in the results. |
@Dr15Jones: I used the default number of threads, which should be one. I ran it in unscheduled mode, which is the default for |
@Sam-Harper: Instructions for time and memory measurements are here: https://twiki.cern.ch/twiki/bin/viewauth/CMS/RecoIntegration#pp_MC_Benchmark_setup_for_timing More instructions are here: The memory usage measurement is explained in the "Using TimeEvent output" section. |
#17102 now catches the circular data dependency and throws an exception which explains the problem. |
Dear All, I followed all the instructions in https://twiki.cern.ch/twiki/bin/viewauth/CMS/RecoIntegration#pp_MC_Benchmark_setup_for_timing and re-ran it for 200 events on file /store/relval/CMSSW_9_0_0_pre2/RelValTTbar_13/GEN-SIM-DIGI-RAW-HLTDEBUG/PU25ns_90X_mcRun2_asymptotic_v0-v1/10000/043953B3-A7C2-E611-A0DA-0CC47A4D75EC.root of /RelValTTbar_13/CMSSW_9_0_0_pre2-PU25ns_90X_mcRun2_asymptotic_v0-v1/GEN-SIM-DIGI-RAW-HLTDEBUG. I ran it in release CMSSW_9_0_X_2016-12-29-1100. I ran it three ways: the base release only, the base release + this PR, and the base release + this PR + HEEP V7.0 enabled. I can see no difference in the memory used; it's within the precision of the method. base: MAX VSIZE: 5167 I measured the base release memory 6 times, VSIZE = 5177, 5171, 5180, 5175, 5167, 5171, so I consider the HEEP V7.0 difference marginal and within the bounds of uncertainty. The log files are here for your own scrutiny: |
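To make the "precision of the method" concrete, the run-to-run spread of the six base-release VSIZE values quoted above can be computed directly. This is only a small sketch using the numbers from the comment; the variable names are illustrative, and the units are whatever the timing job reports for VSIZE (presumably MB, which is an assumption).

```python
import statistics

# Six repeated MAX VSIZE measurements of the base release, as quoted above.
base_vsize = [5177, 5171, 5180, 5175, 5167, 5171]

mean_vsize = statistics.mean(base_vsize)
full_range = max(base_vsize) - min(base_vsize)
sample_stdev = statistics.stdev(base_vsize)  # sample standard deviation (n - 1)

print("mean  :", mean_vsize)
print("range :", full_range)
print("stdev :", round(sample_stdev, 2))
```

The mean is 5173.5 with a full range of 13 between the lowest and highest run, so a difference of a few units between configurations cannot be distinguished from run-to-run fluctuation, and nothing close to a 500 MB shift is visible.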
+1 |
thanks @Sam-Harper |
and thank you @davidlange6 and everybody else for scrutinising this. The absolute last thing I would have wanted to do is cause problems for the reco. Now ultimately I would like this in 80X to make things easier for users. I recall the procedure is to wait to get this into a pre-release and then backport it to 80X? Thanks. |
Right - we'll organize an 80x release discussion once people are back... |
Dear All,
It was decided by E/gamma to submit the HEEP V7.0 separately. So here is the pull request.
It adds the needed VID classes to calculate the cut, and adds a function to EcalClusterTools to calculate the number of saturated crystals in the 5x5 region of the detector. It also adds a new class to make isolation from pat::PackedCandidates and a producer to make the isolation and nrSat crystal value maps to be picked up by the VID cuts.
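As a rough picture of what counting saturated crystals in a 5x5 region means, here is a toy Python sketch. It is not the EcalClusterTools implementation: the function name `count_saturated_5x5`, the boolean grid standing in for per-crystal saturation flags, and the row/column indexing are all assumptions made for illustration.

```python
def count_saturated_5x5(saturated, seed_row, seed_col):
    """Count saturated crystals in the 5x5 window centred on the seed.

    `saturated` is a toy 2D grid of booleans (True = saturated crystal).
    Positions that fall outside the grid are simply ignored.
    """
    count = 0
    for d_row in range(-2, 3):
        for d_col in range(-2, 3):
            row, col = seed_row + d_row, seed_col + d_col
            if 0 <= row < len(saturated) and 0 <= col < len(saturated[row]):
                count += int(saturated[row][col])
    return count


# Toy 7x7 grid: two saturated crystals near the seed, one far away.
grid = [[False] * 7 for _ in range(7)]
grid[3][3] = True  # the seed crystal itself
grid[3][4] = True  # a neighbour inside the 5x5 window
grid[0][0] = True  # outside the 5x5 window around (3, 3)

print(count_saturated_5x5(grid, 3, 3))
```

In the PR the resulting count is what gets stored in the nrSat value map for the VID cut to read.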
This is not yet run in any default sequence (nor is it intended to be just yet); the goal is to get it into 80X so that analysers can use it easily. So this is a strict addition and does not change anything already running.
It has a weak dependence on #16777 in that it will crash when running on AOD without it. But it's not run in standard sequences, only by analysers who will know to ensure that they have that PR. It will work fine for the miniAOD.
Best,
Sam