Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FillDescriptions for all 3 GsfElectronProducers #25574

Merged
merged 3 commits into from Jan 11, 2019
Merged

FillDescriptions for all 3 GsfElectronProducers #25574

merged 3 commits into from Jan 11, 2019

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Dec 31, 2018

  • Adding fillDescriptions() to GsfElectronBaseProducer with defaults
  • This way, the Python config files of the 3 inheriting producers don't need to repeat same parameter values
  • This is part of an effort to unify the GsfElectronProducers to one single module

Original, old blah blah:

Hi all,

between Christmas and new years I was thinking a bit about how some of our producers/algorithms are configured (in particular Egamma and PF producers). I noticed that configuration is often very verbose, repeating basically the same manual looping over configuration variables at several places, for example:

  • Python config files
  • fillDescriptions functions
  • "configuration struct" declarations in algorithm classes
  • reading these configuration structs from parameter sets

Additionally, there are repetitions from copy-pasted producers and their configs. Today I was trying to get rid of this redundancy at the example of the GsfElectronBaseProducer/GsfElectronAlgo combo, but didn't really conclude (ending up with some ugly preprocessor macros...). In any case, the first step would be to carry all the configuration over to C++ via the fillDescriptions functionality, which I propose in this PR, reducing some of the redundancy from the copy-pasted config files on the way.

Note that having all configuration defaults in the C++ code makes it easier to validate and therefor refactor things in the future, which I think eventually has to be done to reunite the upcoming low-pt electron reconstruction with the current reconstruction.

Local matrix tests pass and I hope I didn't do any mistakes shuffling config values around which cause differences!

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @guitargeek (Jonas Rembser) for master.

It involves the following packages:

RecoEgamma/EgammaElectronProducers

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@jainshilpi, @Sam-Harper, @varuns23, @rovere, @lgray this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Dec 31, 2018

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 31, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32387/console Started: 2018/12/31 04:36

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25574/32387/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3153717
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3153500
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

desc.add<edm::InputTag>("gsfElectronCoresTag",edm::InputTag("gsfElectronCores")) ;
desc.add<edm::InputTag>("ecalDrivenGsfElectronsTag",edm::InputTag("ecalDrivenGsfElectrons")) ;
desc.add<edm::InputTag>("pfMvaTag",edm::InputTag("pfElectronTranslator:pf")) ;
desc.add<edm::InputTag>("gsfElectronCoresTag", edm::InputTag("gedGsfElectronCores"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was "gsfElectronCores"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not compare this to the old commented-out fillDescriptions, but to the Python_cfi file:
https://github.com/cms-sw/cmssw/pull/25574/files#diff-6b9d40a5dde30255880f879f68bf9348L14

desc.add<bool>("checkHcalStatus", true);

// output collections
desc.add<std::string>("outputEGMPFValueMap", "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only in gedGsfElectronTmp, but probably even useless, as it is never accessed in the code (if I am not wrong)

Copy link
Contributor

@perrotta perrotta Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify is "outputEGMPFValueMap" is really needed, and remove if not

desc.add<std::vector<int> >("recHitSeverityToBeExcludedBarrel") ;
desc.add<std::vector<int> >("recHitSeverityToBeExcludedEndcaps") ;
//desc.add<int>("severityLevelCut",4) ;
desc.add<std::vector<std::string>>("recHitFlagsToBeExcludedBarrel", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these "RecHits...ToBeExcluded" were previously left undefined in this GsfElectronBaseProducer::fillDescription() and taken from centralized configurations in the actual cfi

    recHitFlagsToBeExcludedBarrel = cleanedHybridSuperClusters.RecHitFlagToBeExcluded,
    recHitFlagsToBeExcludedEndcaps = multi5x5BasicClustersCleaned.RecHitFlagToBeExcluded,
    recHitSeverityToBeExcludedBarrel = cleanedHybridSuperClusters.RecHitSeverityToBeExcluded,
    recHitSeverityToBeExcludedEndcaps = cleanedHybridSuperClusters.RecHitSeverityToBeExcluded,

I think that the same should be also done here (i.e. define them as above inside the actual cff files), in order to retain the possibility to modify them only once and centrally: do you agree?

{
edm::ParameterSetDescription psd0;
{
edm::ParameterSetDescription psd1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These isolation algo configs repeat themselves identically here inside every trkIsol03Cfg \ trkIsol04Cfg: it is easier to just fill the endcaps and barrel cuts with the same psd1.

Moreover, in the previous config they were taken from the central RecoEgamma/​EgammaIsolationAlgos/​python/​electronTrackIsolations_cfi.py: as before, I think the same should be retained in the actual cff config here, so that it will be still possibile to modify all them centrally in one single place.

desc.add<std::string>("crackCorrectionFunction", "EcalClusterCrackCorrection");

desc.add<bool>("ecalWeightsFromDB", true);
desc.add<std::vector<std::string>>("ecalRefinedRegressionWeightFiles", {}); // if not from DB. Otherwise, keep empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

desc.add<bool>("ecalWeightsFromDB", true);
desc.add<std::vector<std::string>>("ecalRefinedRegressionWeightFiles", {}); // if not from DB. Otherwise, keep empty
desc.add<bool>("combinationWeightsFromDB", true);
desc.add<std::vector<std::string>>("combinationRegressionWeightFile", {}); // if not from DB. Otherwise, keep empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -211,8 +297,8 @@ GsfElectronBaseProducer::GsfElectronBaseProducer( const edm::ParameterSet& cfg,
strategyCfg_.addPflowElectrons = cfg.getParameter<bool>("addPflowElectrons") ;
strategyCfg_.ctfTracksCheck = cfg.getParameter<bool>("ctfTracksCheck");
strategyCfg_.gedElectronMode = cfg.getParameter<bool>("gedElectronMode");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also get defined in the base fillDescription...
(etc...)

@perrotta
Copy link
Contributor

perrotta commented Jan 7, 2019

In general: you removed from GsfElectronBaseProducer::fillDescription() the definition of all parameters which are still used in the derived classes but with a different face value in them. This is not correct. In the future one could decide a different tune, and modify differently in the different actual gsf producer configs some parameter which is now defined in the base fillDescription. What would you do in that case? Will yuo modify the c++ code once again and and move the definition of that parameter from the base class to the single derived ones?

I think that only parameters which are specific of any derived class should be defined in the specialized fillDescription's, while anything is in common to all them, even if configured at run time with diferent face values in the different specializations, should be defined in the Base fillDescription, and only later modified in the actual python configs

@guitargeek
Copy link
Contributor Author

guitargeek commented Jan 7, 2019

Hi Andrea, thanks for taking the time to review and discuss! I agree with what you said and that it should be in principal be done as you describe in the second paragraph of your general comment.

However, I suspect that the most sustainable solution would involve rethinking this inheritance pattern of the GsfElectronProducers. What we have right now is 4 different producer classes (if you include the base class) which differ only slightly (order of 10 lines) in their produce method and getEvent or something like that, plus some different helper functions which should be free functions anyway. The underlying GsfElectronAlgo is always the same. I feel the combination of the Frameworks configuration system with configuration via inheritance from a base class is a bit awkward and should probably be replaced by only one plugin "GsfElectronProducer" that is configured via the framework all the way. Then, the fillDescriptions method provides the default values and we have config files for the 3 different configurations.

How do you feel about that?

@perrotta
Copy link
Contributor

perrotta commented Jan 7, 2019

@guitargeek , if I understand correctly your proposal you want to have eventually one single producer and the specializations will be configured via python config. That would be great, and the core of my comment goes exactly in that direction: you should centralize since now all possible common parameters in the base fillDescription, and specialize all residual differences in the dedicated python configs.
As it is done now in this pull request is rather confusing, as the decision about what to put in the base fillDescription seems to be taken only based on the "fortuitous" circumstance that such a parameter has currently the same numerical value in the three actual producers. I think that centralizing those definitions as much as possible will also eventually ease the further transition you have in mind, while providing a cleaner structure till then.

@guitargeek
Copy link
Contributor Author

Very well! Then let's go in this PR to the point where there is one single fillDescriptions for the defaults and 3 python configurations. The class unification will then follow eventually in a different PR I suppose.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25574/7923

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

Pull request #25574 was updated. @perrotta, @cmsbuild, @slava77 can you please check and sign again.

@perrotta
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 10, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32517/console Started: 2019/01/10 15:56

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25574/32517/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3153717
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3153512
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@perrotta
Copy link
Contributor

+1

  • Configuration parameters are defined for all gsf electron producers in the fillDescriptions() method of the base GsfElectronBaseProducer: specific values of those parameters are then set for all actual producers in the dedicated python configs
  • Configurations obtained as such identically reproduce the same previous configurations
  • Jenkins tests pass and show no differences, as expected

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants