GBRForest implementation added in RecoJets/JetProducers PileupJetIdAl… #15907

romeof · 2016-09-19T15:02:46Z

GBRForest implementation has been added to the PileupJetIdAlgo class of the package RecoJets/JetProducers. It is used to book the reader for the MVA pileup jet ID discriminator.
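For readers new to GBRForest: in broad terms, it stores a trained boosted decision tree ensemble in flat arrays, so evaluation reduces to walking each tree and summing leaf responses. The sketch below is a simplified, hypothetical layout in that spirit. The type names FlatTree and FlatForest, the leaf encoding (child index <= 0 means leaf), and the tanh mapping to a classifier output are assumptions for illustration, not the actual GBRForest classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat tree: internal nodes are numbered from 0 (the root).
// A child index > 0 points to another internal node; a child index <= 0
// encodes a leaf whose response is stored at responses[-index].
struct FlatTree {
  std::vector<int> cutIndices;    // input variable tested at each internal node
  std::vector<float> cutVals;     // cut value at each internal node
  std::vector<int> leftIndices;   // child taken when x[var] <= cut
  std::vector<int> rightIndices;  // child taken when x[var] > cut
  std::vector<double> responses;  // leaf responses

  double response(const float* x) const {
    if (cutIndices.empty()) return responses.at(0);  // tree with a single leaf
    int node = 0;
    do {
      const std::size_t n = static_cast<std::size_t>(node);
      assert(n < cutIndices.size());
      node = (x[cutIndices[n]] > cutVals[n]) ? rightIndices[n] : leftIndices[n];
    } while (node > 0);
    return responses.at(static_cast<std::size_t>(-node));
  }
};

// Hypothetical forest: an initial offset plus the sum of all tree responses.
struct FlatForest {
  double initialResponse = 0.0;
  std::vector<FlatTree> trees;

  double response(const float* x) const {
    double sum = initialResponse;
    for (const FlatTree& t : trees) sum += t.response(x);
    return sum;
  }

  // Assumed mapping of the raw sum to (-1, 1): tanh(s) == 2 / (1 + exp(-2 s)) - 1.
  double classifier(const float* x) const { return std::tanh(response(x)); }
};
```

For example, a single-split tree with leaf responses -0.3 and 0.7 and an initial response of 0.1 returns -0.2 or 0.8 depending on which side of the cut the input falls.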

…go class

slava77 · 2016-09-19T15:13:20Z

@cmsbuild please test

cmsbuild · 2016-09-19T15:15:18Z

The tests are being triggered in jenkins.

cmsbuild · 2016-09-19T15:17:09Z

A new Pull Request was created by @romeof for CMSSW_8_1_X.

It involves the following packages:

RecoJets/JetProducers

@cvuosalo, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@TaiSakuma, @jdolen, @ahinzmann, @rappoccio, @yslai, @nhanvtran, @gkasieczka, @schoef, @mariadalfonso this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

ahinzmann · 2016-09-19T15:21:24Z

Adding @lgray since he knows the EGamma GBRForest that we started from.

slava77 · 2016-09-19T21:21:44Z

@cmsbuild please test

cmsbuild · 2016-09-19T21:22:13Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/15274/console

cmsbuild · 2016-09-19T22:41:50Z

+1
Tested at: f82365c
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15907/15274/summary.html

lgray · 2016-09-19T23:07:40Z

RecoJets/JetProducers/src/PileupJetIdAlgo.cc

+                }
+                tmpTMVAReader.AddSpectator( *it, variables_[ tmvaNames_[*it] ].first );
+            }
+            reco::details::loadTMVAWeights(&tmpTMVAReader,  tmvaMethod_.c_str(), tmvaEtaWeights_.at(v).c_str() );


You need to wrap this call in a std::unique_ptr<TMVA::IMethod>, otherwise it will leak memory.

like in this line (cmssw/CommonTools/Utils/src/TMVAEvaluator.cc, line 39 in 0397259):

mIMethod = std::unique_ptr<TMVA::IMethod>( reco::details::loadTMVAWeights(mReader.get(), mMethod.c_str(), weightFile.c_str()) );
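To make the leak concrete, here is a minimal standalone sketch with hypothetical stand-ins: Method plays the role of TMVA::IMethod, and loadWeights plays the role of a loader that, like reco::details::loadTMVAWeights in the line above, returns a raw pointer the caller must own. Discarding that pointer leaks the object; wrapping it in std::unique_ptr releases it automatically.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an interface such as TMVA::IMethod.
// The static counter tracks how many instances are still alive.
struct Method {
  static int live;
  explicit Method(std::string n) : name(std::move(n)) { ++live; }
  virtual ~Method() { --live; }
  std::string name;
};
int Method::live = 0;

// Hypothetical loader: returns an owning raw pointer that the caller must delete.
Method* loadWeights(const std::string& name) { return new Method(name); }

// Leaky pattern: the returned pointer is dropped, so the object is never freed.
void leakyLoad(const std::string& name) { loadWeights(name); }

// Fixed pattern: ownership is taken immediately, so the object is destroyed
// when the unique_ptr is destroyed or reset.
std::unique_ptr<Method> ownedLoad(const std::string& name) {
  return std::unique_ptr<Method>(loadWeights(name));
}
```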

Hi Lindsey,
I have implemented the std::unique_ptr<TMVA::IMethod> syntax. You can see it at
https://github.com/romeof/cmssw/blob/PileupJetID_GBRForest/RecoJets/JetProducers/src/PileupJetIdAlgo.cc

Is this ok?

lgray · 2016-09-19T23:08:56Z

RecoJets/JetProducers/src/PileupJetIdAlgo.cc

+                tmpTMVAReader.AddSpectator( *it, variables_[ tmvaNames_[*it] ].first );
+            }
+            reco::details::loadTMVAWeights(&tmpTMVAReader,  tmvaMethod_.c_str(), tmvaEtaWeights_.at(v).c_str() );
+            etaReader_.push_back(std::unique_ptr<const GBRForest> ( new GBRForest( dynamic_cast<TMVA::MethodBDT*>( tmpTMVAReader.FindMVA(tmvaMethod_.c_str()) ) ) ) );


Likewise, here and below it would be a good idea (if possible) to move all of this into a cache that is common across threads like I do for the EGM code. This would reduce the number of times you have to read in the configuration file and thus improve CMSSW startup time in multithreaded mode.
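The quoted diff builds one GBRForest per eta bin from the method returned by FindMVA, via dynamic_cast<TMVA::MethodBDT*>. A dynamic_cast to a pointer type yields nullptr if the booked method is not actually a BDT, so a check before use is cheap insurance. The sketch below shows that check together with a per-|eta|-bin lookup; all type names and the bin edges are hypothetical stand-ins, not the PileupJetIdAlgo code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for a polymorphic method hierarchy (like TMVA's)
// and for a forest type constructed from a BDT method.
struct MethodBase { virtual ~MethodBase() = default; };
struct MethodBDT : MethodBase { double offset = 0.0; };
struct MethodOther : MethodBase {};

struct Forest {
  explicit Forest(const MethodBDT& bdt) : offset(bdt.offset) {}
  double offset;
};

// dynamic_cast to a pointer type returns nullptr when the object is not a
// MethodBDT, so check before dereferencing.
std::unique_ptr<const Forest> makeForest(MethodBase* method) {
  MethodBDT* bdt = dynamic_cast<MethodBDT*>(method);
  if (bdt == nullptr) throw std::runtime_error("booked method is not a BDT");
  return std::unique_ptr<const Forest>(new Forest(*bdt));
}

// One forest per |eta| bin; upperEdges[i] is the exclusive upper edge of bin i
// (hypothetical binning, chosen by the caller).
struct EtaBinnedForests {
  std::vector<double> upperEdges;
  std::vector<std::unique_ptr<const Forest>> forests;

  const Forest& select(double eta) const {
    assert(upperEdges.size() == forests.size());
    const auto it = std::upper_bound(upperEdges.begin(), upperEdges.end(), std::abs(eta));
    if (it == upperEdges.end()) throw std::out_of_range("|eta| outside the last bin");
    return *forests[static_cast<std::size_t>(it - upperEdges.begin())];
  }
};
```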

@lgray could you point us to the relevant line in the EGM code that makes this cache?

here is an example of a cache: https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/RecoParticleFlow/PFTracking/interface/ConvBremHeavyObjectCache.h
here is an example of using it: https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/RecoParticleFlow/PFTracking/interface/PFElecTkProducer.h#L48-L61

Here's the relevant TWiki: https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_GlobalCacheT

Hi Lindsey,
do you have another example that may be clearer (to me)?
So far, I have used as a reference
https://github.com/cms-sw/cmssw/blob/CMSSW_7_6_X/RecoEgamma/ElectronIdentification/plugins/ElectronMVAEstimatorRun2Spring15NonTrig.cc#L151
but the new code seems more complicated (to me).
I would be glad if there were an implementation similar to what you mention in a file related to
https://github.com/cms-sw/cmssw/blob/CMSSW_7_6_X/RecoEgamma/ElectronIdentification/plugins/ElectronMVAEstimatorRun2Spring15NonTrig.cc#L151
If so, I can try to implement it.

Hey, so what I am thinking about is more along the lines of this:
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/RecoEgamma/EgammaTools/interface/MVAValueMapProducer.h

Which uses:
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/RecoEgamma/EgammaTools/interface/MVAObjectCache.h

as a global cache. That global cache then stores pointers to the MVAs, which derive from the base class here:
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/RecoEgamma/EgammaTools/interface/AnyMVAEstimatorRun2Base.h
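As a rough, framework-free illustration of that structure, the sketch below keeps estimators that derive from a common abstract base in a name-keyed cache and exposes them only through const references, so a single instance can be read from several threads. The class names are hypothetical and only mimic the roles of MVAObjectCache and AnyMVAEstimatorRun2Base; the linear estimator is a placeholder for a real BDT.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface; evaluation is const so one instance can be shared.
class MVAEstimatorBase {
public:
  virtual ~MVAEstimatorBase() = default;
  virtual float mvaValue(const std::vector<float>& vars) const = 0;
};

// Hypothetical placeholder estimator: a weighted sum instead of a real BDT.
class LinearEstimator : public MVAEstimatorBase {
public:
  explicit LinearEstimator(std::vector<float> weights) : weights_(std::move(weights)) {}
  float mvaValue(const std::vector<float>& vars) const override {
    float sum = 0.f;
    for (std::size_t i = 0; i < weights_.size() && i < vars.size(); ++i) sum += weights_[i] * vars[i];
    return sum;
  }
private:
  std::vector<float> weights_;
};

// Hypothetical cache: filled once at start-up, then only read through const access.
class MVACacheSketch {
public:
  void add(const std::string& name, std::unique_ptr<const MVAEstimatorBase> mva) {
    mvas_[name] = std::move(mva);
  }
  const MVAEstimatorBase& get(const std::string& name) const {
    const auto it = mvas_.find(name);
    if (it == mvas_.end()) throw std::out_of_range("unknown MVA: " + name);
    return *it->second;
  }
private:
  std::map<std::string, std::unique_ptr<const MVAEstimatorBase>> mvas_;
};
```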

@romeof: What is being done about using a global cache for this PR?

I have to study this syntax first. It is completely new to me and I am not a programming guru :-( This may take some time; I am not sure how much yet.

@lgray: To help me understand, could you comment further on what you mean by "This would reduce the number of times you have to read in the configuration file"? Does it assume that there are different MVA configurations, or is it about the same MVA configuration being used for different events? By "thread", do you mean a call to an MVA configuration? Thanks, and sorry for the naive questions.

Hi, since the producer that holds this algorithm is an edm::stream module, it gets replicated for each thread (by thread I mean a thread on a processor) in CMSSW. Therefore, if you run with four threads, you have to build this algorithm four times. As presently implemented, that means reading the MVA file and creating four instances of the MVA, which can blow up memory usage and start-up time.

Using the global cache solves this problem permanently by only ever needing to build the MVAs once and sharing the MVAs among the threads.
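The difference can be sketched without the framework by counting how often an expensive object is built. HeavyMVA, PerStreamProducer, SharedCache and CachedProducer are hypothetical; in CMSSW the shared object would instead be created once through the edm::GlobalCache mechanism described in the TWiki linked above, and each stream module would only hold a const pointer to it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical MVA whose constructor stands in for reading weight files.
struct HeavyMVA {
  static int builds;  // how many times the expensive set-up ran
  HeavyMVA() { ++builds; }
  float evaluate(float x) const { return 2.f * x; }  // const, read-only evaluation
};
int HeavyMVA::builds = 0;

// Per-stream ownership: each stream replica constructs its own MVA.
struct PerStreamProducer {
  HeavyMVA mva;
};

// Global-cache style: the MVA lives in one shared object built once,
// and every stream replica keeps only a const pointer to it.
struct SharedCache {
  HeavyMVA mva;
};

class CachedProducer {
public:
  explicit CachedProducer(const SharedCache* cache) : cache_(cache) {}
  float produce(float x) const { return cache_->mva.evaluate(x); }
private:
  const SharedCache* cache_;
};
```

With four streams, the per-stream version builds the MVA four times while the cached version builds it once. Keeping evaluation const is a precondition for sharing one instance across threads, though the underlying MVA code must also avoid hidden mutable state.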

cmsbuild · 2016-09-19T23:47:26Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15907/15274/summary.html

cvuosalo · 2016-09-22T19:18:15Z

RecoJets/JetProducers/interface/PileupJetIdAlgo.h

@@ -37,6 +37,8 @@ class PileupJetIdAlgo {
 					       float jec, const reco::Vertex *, const reco::VertexCollection &, double rho);

 	void set(const PileupJetIdentifier &);
+        std::unique_ptr<const GBRForest> getMVA(std::vector<std::string>, const std::string &);


The first parameter should actually be a const reference (I forgot to say so in my original comment) to avoid copying overhead:

std::unique_ptr<const GBRForest> getMVA(const std::vector<std::string> &, const std::string &);

Same for getMVAvars.
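The cost being avoided is easy to see with a type that counts its copies: passing a std::vector by value copies every element on each call, while a const reference copies nothing. CountedName and the two functions below are hypothetical, standing in for the variable-name vector passed to getMVA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type that counts how many times it is copied.
struct CountedName {
  static int copies;
  std::string value;
  CountedName(const char* v) : value(v) {}
  CountedName(const CountedName& other) : value(other.value) { ++copies; }
  CountedName& operator=(const CountedName& other) {
    value = other.value;
    ++copies;
    return *this;
  }
};
int CountedName::copies = 0;

// By value: the vector and each of its elements are copied on every call.
std::size_t sizeByValue(std::vector<CountedName> vars, const std::string& /*weightFile*/) {
  return vars.size();
}

// By const reference (the suggested getMVA/getMVAvars style): nothing is copied.
std::size_t sizeByConstRef(const std::vector<CountedName>& vars, const std::string& /*weightFile*/) {
  return vars.size();
}
```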

Hi cvuosalo,
thanks for your latest suggestions, which I have implemented in the code.
If it is OK now, I will try to take a look at the global cache implementation.

cmsbuild · 2016-09-27T08:18:16Z

Pull request #15907 was updated. @cmsbuild, @cvuosalo, @slava77, @davidlange6 can you please check and sign again.

cvuosalo · 2016-09-27T17:18:13Z

@cmsbuild please test

cmsbuild · 2016-09-27T17:18:29Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/15406/console

cmsbuild · 2016-09-27T18:36:45Z

+1
Tested at: c280496
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15907/15406/summary.html

cmsbuild · 2016-09-27T19:59:29Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15907/15406/summary.html

cvuosalo · 2016-09-27T20:59:23Z

+1

For #15907 c280496

Changing from TMVA to GBRForest in PileupJetIdAlgo. There should be no change in monitored quantities.

The code changes are satisfactory, and Jenkins tests against baseline CMSSW_8_1_X_2016-09-27-1100 show no significant differences, as expected. A test of workflow 136.731_RunSinglePh2016B with 200 events against baseline CMSSW_8_1_X_2016-09-18-0000 also shows no significant differences. Memory usage shows no significant change, for either one or four threads:

One thread (MB):
Baseline: max VSIZ 3078.3 on evt 187; max RSS 2676.09 on evt 187
PR: max VSIZ 3065.55 on evt 187; max RSS 2610.54 on evt 109

Four threads (MB):
Baseline: max VSIZ 5257.57 on evt 194; max RSS 4390.66 on evt 143
PR: max VSIZ 5335.07 on evt 191; max RSS 4264.74 on evt 143

cmsbuild · 2016-09-27T20:59:39Z

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

ahinzmann · 2016-09-27T21:06:10Z

@cvuosalo Did you happen to also look at the processing time with multiple threads? It would be interesting to know if this change can indeed speed up MiniAOD production.

lgray · 2016-09-27T21:19:45Z

You'll need to look at the time spent before the first event. That's the only place where reading in multiple MVAs will cause a slowdown.
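A minimal way to follow this advice outside the framework is to time the set-up phase and the event loop separately. The function names below are hypothetical; in a real cmsRun job, the framework's own timing reports would be the more reliable source.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Returns the wall-clock time spent in a callable, in milliseconds.
double elapsedMs(const std::function<void()>& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct PhaseTimes {
  double startupMs;    // e.g. building the MVAs for every stream
  double eventLoopMs;  // per-event processing
};

// Time set-up and event processing separately, since sharing MVAs
// should mainly change the former.
PhaseTimes timePhases(const std::function<void()>& setup, const std::function<void()>& eventLoop) {
  PhaseTimes t;
  t.startupMs = elapsedMs(setup);
  t.eventLoopMs = elapsedMs(eventLoop);
  return t;
}
```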


slava77 · 2016-09-27T21:35:21Z


> Memory usage shows no significant change, for either one or four threads:

IIRC, we should see a decrease from using GBR; this is one of the reasons to switch to it.

Both the 1-thread and 4-thread tests show a decrease in RSS: down by 65 MB with 1 thread and by 126 MB with 4 threads.
igprof -mp may give more convincing evidence.
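The RSS reductions quoted here follow directly from the peak values in the one- and four-thread measurements. The small sketch below redoes the subtraction (units taken as MB, as the 65 MB and 126 MB figures imply); the struct and function names are illustrative only. It also shows that peak VSIZ rose by about 77.5 MB in the four-thread run, so the decrease is specific to RSS.

```cpp
#include <cassert>
#include <cmath>

// Peak memory of one job, in MB.
struct PeakMemory {
  double vsiz;
  double rss;
};

// Reduction from baseline to PR; positive means the PR uses less memory.
PeakMemory reduction(const PeakMemory& baseline, const PeakMemory& pr) {
  return {baseline.vsiz - pr.vsiz, baseline.rss - pr.rss};
}

// Values copied from the one-thread and four-thread measurements.
const PeakMemory kBaseline1{3078.3, 2676.09};
const PeakMemory kPR1{3065.55, 2610.54};
const PeakMemory kBaseline4{5257.57, 4390.66};
const PeakMemory kPR4{5335.07, 4264.74};
```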


cvuosalo · 2016-09-27T22:39:50Z

@ahinzmann: My measurements don't show any performance improvement, but I'm not sure of the correct technique to specifically measure the improvement provided by using GBRForest.

davidlange6 · 2016-09-28T07:02:44Z

+1

cvuosalo · 2016-10-04T15:53:32Z

igprof memory profiling shows some improvement. Checking MEM_LIVE between baseline CMSSW_8_1_X_2016-09-18-0000 and this PR shows that the MVA-reading memory usage for PileupJetIdAlgo goes from rank 426 and 1.41% of total memory down to rank 1395 and 0.21%.

GBRForest implementation added in RecoJets/JetProducers PileupJetIdAl…

f82365c

…go class

cmsbuild added this to the Next CMSSW_8_1_X milestone Sep 19, 2016

cmsbuild added comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Sep 19, 2016

cmsbuild added tests-started and removed tests-pending labels Sep 19, 2016

cmsbuild added tests-pending tests-started and removed tests-started tests-pending labels Sep 19, 2016

cmsbuild added tests-approved and removed tests-started labels Sep 19, 2016

lgray suggested changes Sep 19, 2016

cmsbuild added comparison-available and removed comparison-pending labels Sep 19, 2016

std::unique_ptr<TMVA::IMethod> implemented

45ba52e

cmsbuild added comparison-pending and removed comparison-available tests-approved labels Sep 21, 2016

cvuosalo reviewed Sep 22, 2016

Improved MVA calcultion fuction

c280496

cmsbuild added tests-started and removed tests-pending labels Sep 27, 2016

cmsbuild added tests-approved and removed tests-started labels Sep 27, 2016

cmsbuild added comparison-available and removed comparison-pending labels Sep 27, 2016

cmsbuild added fully-signed reconstruction-approved and removed pending-signatures reconstruction-pending labels Sep 27, 2016

This was referenced Sep 27, 2016

Reduce threshold for tracks in vertexers used for timing studies #15989

Merged

Spurious differences appearing for 20024.0 in Jenkins DQM results, starting around CMSSW_8_1_X_2016-09-26-1500 #16004

Closed

cmsbuild added orp-approved and removed orp-pending labels Sep 28, 2016

cmsbuild merged commit 96be59a into cms-sw:CMSSW_8_1_X Sep 28, 2016
