Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure VID Python/FWLite code and avoid generating dictionaries for Reco*/*Identification packages #35953

Merged
merged 4 commits into from Jan 25, 2022
Merged

Restructure VID Python/FWLite code and avoid generating dictionaries for Reco*/*Identification packages #35953

merged 4 commits into from Jan 25, 2022

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Nov 2, 2021

PR description:

I took issue reported in #35754 as an opportunity to simplify the VID code a bit on the Python side.

The key class for the Python/FWLite support of VID is the VersionedSelector<> in PhysicsTools/SeletorUtils. The dictionaries for the different template instances of this class were split over the Electron/Muon/Photon identification packages. Now, they are all in the PhysicsTools/SeletorUtils library.

Then, there were dictionaries in the 3 identification packages for almost trivial classes used in VIDSelectorBase.py. However, these classes are tiny template classes that are only used in said python scipt, so I don't think we need dictionaries for them. The classes are small enough such that including them via gInterpreter.Declare is very fast.

Finally, there was a dictionary produced for the ElectronMVAEstimatorRun2 class, which I suggest to also not do anymore. There is only one use case for this: the class is used in the ElectronIdentification/python/FWLite.py utility file. But again, declaring the ElectronMVAEstimatorRun2 is very quick (after avoiding some includes), so it's not necessary to generate dictionaries in CMSSW just for this one usecase (which is outdated anyway because FWLite usage is going down since NanoAOD I guess).

More information can also be found in the commit messages.

PR validation:

  • CMSSW compiles
  • test_eleid_fwlite.py runs
  • it is still possible to instantiate the Python classes VIDElectronSelector, VIDPhotonSelector, VIDMuonSelector

if this PR is a backport please specify the original PR and why you need to backport that PR:

No backport intended.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35953/26361

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

A new Pull Request was created by @guitargeek (Jonas Rembser) for master.

It involves the following packages:

  • RecoEgamma/EgammaTools (reconstruction)
  • RecoEgamma/ElectronIdentification (reconstruction)
  • RecoEgamma/PhotonIdentification (reconstruction)

@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @lgray, @sobhatta, @afiqaize, @wrtabb, @varuns23, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35953/26362

  • This PR adds an extra 20KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

Pull request #35953 was updated. @jpata, @cmsbuild, @slava77 can you please check and sign again.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35953/26367

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 2, 2021

Pull request #35953 was updated. @smuzaffar, @Dr15Jones, @makortel, @cmsbuild, @slava77, @jpata can you please check and sign again.

@guitargeek guitargeek changed the title Support interactive dictionary generation for ElectronMVAEstimatorRun2 Restructure VID Python/FWLite code and avoid generation dictionaries for Reco*/*Identification packages Nov 2, 2021
@guitargeek
Copy link
Contributor Author

Sorry, I should have made the PR a draft before. I'm done with the force pushes now

Comment on lines 23 to 35
namespace PhysicsTools_SelectorUtils {
struct dictionary {
pat::strbitset strbitset;
edm::Wrapper<pat::strbitset> wstrbitset;
std::vector<pat::strbitset> vstrbitset;
edm::Wrapper<std::vector<pat::strbitset> > wvstrbitset;

VersionedSelector<edm::Ptr<reco::Photon> > vPhotonSelector;
VersionedSelector<edm::Ptr<pat::Photon> > vPatPhotonSelector;
VersionedSelector<edm::Ptr<reco::Muon> > vMuonSelector;
VersionedSelector<edm::Ptr<reco::GsfElectron> > vGsfElectronSelector;
VersionedSelector<edm::Ptr<pat::Electron> > vPatElectronSelector;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is not needed anymore (for a long time).

Suggested change
namespace PhysicsTools_SelectorUtils {
struct dictionary {
pat::strbitset strbitset;
edm::Wrapper<pat::strbitset> wstrbitset;
std::vector<pat::strbitset> vstrbitset;
edm::Wrapper<std::vector<pat::strbitset> > wvstrbitset;
VersionedSelector<edm::Ptr<reco::Photon> > vPhotonSelector;
VersionedSelector<edm::Ptr<pat::Photon> > vPatPhotonSelector;
VersionedSelector<edm::Ptr<reco::Muon> > vMuonSelector;
VersionedSelector<edm::Ptr<reco::GsfElectron> > vGsfElectronSelector;
VersionedSelector<edm::Ptr<pat::Electron> > vPatElectronSelector;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the suggestion!

@@ -9,4 +9,5 @@
<class name="MuonVPlusJetsIDSelectionFunctor"/>
<class name="PVSelector"/>
<class name="RunLumiSelector"/>
<class pattern="VersionedSelector<*>"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classes_def.xml should list all types explicitly (we had not so long ago a campaign to remove all patterns).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! I fixed this

namespace edmplugin {
template <class T>
class PluginFactory;
}
typedef edmplugin::PluginFactory<AnyMVAEstimatorRun2Base*(const edm::ParameterSet&)> AnyMVAEstimatorRun2Factory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this sounds more of a reason to move the plugin factory definition out from this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't see why that's better, but it's also fine for me :) I have now put it in a separate file called AnyMVAEstimatorRun2Factory.h. Are you fine with that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't see why that's better, but it's also fine for me :)

The users of AnyMVAEstimatorRun2Base do not need to know about the factory (only the code creating the object needs the factory). This separation is actually a common pattern in CMSSW (I'd even say vast majority of plugin factories are defined that way, but I didn't do a thorough analysis). The split also avoids the need to explicitly forward-declare types from the framework (code rule 4.7).

I have now put it in a separate file called AnyMVAEstimatorRun2Factory.h. Are you fine with that?

That's good, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it!

@@ -47,13 +71,13 @@ class ElectronMVAEstimatorRun2 : public AnyMVAEstimatorRun2Base {
std::vector<int> nVariables_;

// Data members
std::vector<std::unique_ptr<const GBRForest>> gbrForests_;
std::vector<const GBRForest*> gbrForests_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the smart pointer removed? (was not clear to me in the commit message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you can't use a unique_ptr<T> if T is only forward declared. But that was an over-optimization anyway, I included now GBRForest.h again, and generating the dictionaries for the ElectronMVAEstimatorRun2 is still very fast.

@guitargeek
Copy link
Contributor Author

Hi @makortel, thanks for the review! I have addressed your comments, let me know if I should follow up on anything.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35953/26396

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49ad7a/21869/summary.html
COMMIT: d4a53bb
CMSSW: CMSSW_12_3_X_2022-01-20-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35953/21869/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

  • 4.764.76_ZMuSkim2012D+ZMuSkim2012D+HLTDSKIM2+RECODR1reHLT2+HARVESTDR1reHLT/step2_ZMuSkim2012D+ZMuSkim2012D+HLTDSKIM2+RECODR1reHLT2+HARVESTDR1reHLT.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3464860
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3464833
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 42 files compared)
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Jan 21, 2022

All the unit tests pass, and there are no relevant reco differences. 🚀

@makortel
Copy link
Contributor

+core

Thanks @guitargeek!

@jpata
Copy link
Contributor

jpata commented Jan 25, 2022

+reconstruction

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

@perrotta
Copy link
Contributor

merge

@cmsbuild cmsbuild merged commit f88f508 into cms-sw:master Jan 25, 2022
@guitargeek guitargeek deleted the ElectronMVAEstimatorRun2_fwlite branch January 25, 2022 09:44
@qliphy
Copy link
Contributor

qliphy commented Jan 26, 2022

@guitargeek There appear in IB (and also PR tests) some warnings about duplicated dictionaries or dictionaries defined in the wrong place. Would you please have a look?:

https://cmssdt.cern.ch/SDT/cgi-bin/showDupDict.py/slc7_amd64_gcc10/www/tue/12.3-tue-11/CMSSW_12_3_X_2022-01-25-1100/testLogs/dupDict-lostDefs.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49ad7a/21869/dupDict/lostDefs.txt

Searching for 'classes_def.xml' in '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_3_X_2022-01-20-1100/src'.

./PhysicsTools/SelectorUtils/src/classes_def.xml
EgammaCandidates : VersionedSelector< edm::Ptr< reco::Photon > >
MuonReco : VersionedSelector< edm::Ptr< reco::Muon > >
EgammaCandidates : VersionedSelector< edm::Ptr< reco::GsfElectron > >

@slava77
Copy link
Contributor

slava77 commented Jan 26, 2022

@guitargeek There appear in IB (and also PR tests) some warnings about duplicated dictionaries or dictionaries defined in the wrong place. Would you please have a look?:

CMSSW_12_3_X_2022-01-25-1100 appears to be a patch release. IIUC, we need a full build for the duplicate dictionaries to go away.

@makortel
Copy link
Contributor

CMSSW_12_3_X_2022-01-25-1700 is a full build, and the complaint of dictionaries in wrong package is still there
https://cmssdt.cern.ch/SDT/cgi-bin/showDupDict.py/slc7_amd64_gcc10/www/tue/12.3-tue-17/CMSSW_12_3_X_2022-01-25-1700/testLogs/dupDict-lostDefs.log

I think in this case we should update the checker script instead

{'EgammaCandidates' : ['reco::GsfElectron.*','reco::Photon.*']},

{'MuonReco' : ['reco::Muon(Ref|)(Vector|)']},

In general all dictionaries of DataFormat classes should be declared in the package that defines the class, but in this case PhysicsTools/SelectorUtils (that defines VersionedSelector<T> already depends on DataFormats/EgammaCandidates and DataFormats/MuonReco, and therefore it is the correct place to define these three dicrionaries.

@smuzaffar

@smuzaffar
Copy link
Contributor

@makortel , I will update the dict rule defined here https://github.com/cms-sw/cmssw/blob/master/Utilities/ReleaseScripts/scripts/duplicateReflexLibrarySearch.py#L32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants