Storing Generator Filter Information in NanoAOD #34169

Merged
merged 5 commits into from Dec 11, 2021


mzarucki
Contributor

PR description:

  • new nanoAOD module SimpleFlatTableProducerBaseLumi (edm::one::EDProducer) that stores the generator filter information in a flat table
  • the information is taken from the GenFilterInfo header in the LuminosityBlocks tree in miniAOD
  • the module is configured to store the following generator filter information: numEventsTotal, numEventsPassed, filterEfficiency, filterEfficiencyError
  • the genFilterTable is added to the globalTablesMC sequence in globals_cff.py
  • the nanoaod::FlatTable is written into the LuminosityBlocks tree as new branches via the added LumiOutputBranches type
  • the motivation (xsec/luminosity normalisation) and implementation were presented at the X-POG meeting (16.06.2021): Storing Generator Filter Information in NanoAOD
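
For the normalisation use case mentioned above, a reader of the output has to combine the per-LuminosityBlock counters before forming an efficiency. The following is a minimal sketch of that arithmetic, not code from this PR. The struct `LumiFilterCounts` and the helper functions are hypothetical names; only the field names `numEventsTotal` and `numEventsPassed` come from the description. It assumes unweighted counts and a plain binomial uncertainty, which may differ from how `GenFilterInfo` itself computes `filterEfficiencyError`.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical record of the two counters stored per LuminosityBlock.
struct LumiFilterCounts {
  std::int64_t numEventsTotal;
  std::int64_t numEventsPassed;
};

// Counters are additive, so two LuminosityBlocks are combined by summing them
// (averaging the per-block efficiencies would weight the blocks incorrectly).
constexpr LumiFilterCounts combine(LumiFilterCounts a, LumiFilterCounts b) {
  return LumiFilterCounts{a.numEventsTotal + b.numEventsTotal,
                          a.numEventsPassed + b.numEventsPassed};
}

// Unweighted filter efficiency; defined as 0 when no events were generated.
constexpr double filterEfficiency(LumiFilterCounts c) {
  return c.numEventsTotal > 0
             ? static_cast<double>(c.numEventsPassed) / static_cast<double>(c.numEventsTotal)
             : 0.0;
}

// Binomial variance eff * (1 - eff) / N (assumption: unweighted events).
constexpr double filterEfficiencyVariance(LumiFilterCounts c) {
  return c.numEventsTotal > 0
             ? filterEfficiency(c) * (1.0 - filterEfficiency(c)) /
                   static_cast<double>(c.numEventsTotal)
             : 0.0;
}

// Binomial uncertainty on the efficiency.
inline double filterEfficiencyError(LumiFilterCounts c) {
  return std::sqrt(filterEfficiencyVariance(c));
}
```

With two blocks of 600 and 400 generated events, of which 150 and 100 pass, `filterEfficiency(combine(a, b))` evaluates 250/1000.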

PR validation:

The code has been tested in CMSSW_12_0_0_pre1, running a standard nanoAOD workflow on a miniAOD signal MC sample with a generator filter (SMS-T2tt_dM-10to80_genHT-160_genMET-80_mWMin-0p1), over several thousand events (several LS). The output root file correctly has the requested generator filter information stored.

Following rebasing on the latest master branch, the code passes the nanoAOD unit tests that run over data (runTheMatrix.py -l 136.8523) in the latest IB CMSSW_12_0_X_2021-06-17-1100. The MC equivalent (runTheMatrix.py -l 1325.81) and scram b runtests fail because of a DAS error: the input file cannot be accessed. I believe it would still be important to inspect the output of the nanoAOD MC unit tests.

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34169/23376

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34169/23377

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@mzarucki
Contributor Author

As discussed in yesterday's X-POG meeting, there might still be some open questions/comments:

  • Using unsigned ints (uint8) in the configuration for the numEventsPassed and numEventsTotal (as in the GenFilterInfo header) gave nonsensical answers (passed > total), so a normal int was used
    • Is a precision of 6 for the ints sufficient?
  • In order to avoid conflict with b0c0009, std::vector<LumiOutputBranches> m_lumiTables2 had to be added that uses nanoaod::FlatTables
  • Is there a simple way to make the code more efficient in terms of re-using existing modules? Some blocks of code were duplicated and modified to handle LuminosityBlocks and write to the Lumi tree

I leave the above considerations to the experts.

Cheers,
Mateusz
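
The thread does not say what caused the "passed > total" values. One plausible mechanism, assuming "uint8" denotes an 8-bit storage type, is that each count is reduced modulo 256 independently, so the ordering of the two numbers is not preserved. The sketch below only illustrates that effect; `toUint8` and `looksInconsistent` are hypothetical helper names and are not part of NanoAOD.

```cpp
#include <cstdint>

// Conversion to an 8-bit unsigned type keeps only the lowest 8 bits,
// i.e. the value modulo 256.
constexpr std::uint8_t toUint8(std::uint64_t count) {
  return static_cast<std::uint8_t>(count);
}

// True when the truncated "passed" counter exceeds the truncated "total"
// counter, which is impossible for the untruncated values.
constexpr bool looksInconsistent(std::uint64_t total, std::uint64_t passed) {
  return toUint8(passed) > toUint8(total);
}
```

For example, a total of 266 events is stored as 10 while 200 passed events are stored as 200, so the stored pair appears to have more passed than generated events. A wider integer type avoids this as long as the counts fit in it.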

@@ -271,3 +272,134 @@ class FirstObjectSimpleFlatTableProducer : public SimpleFlatTableProducerBase<T,
return out;
}
};

template <typename T, typename TProd>
class SimpleFlatTableProducerBaseLumi : public edm::one::EDProducer<edm::EndLuminosityBlockProducer> {
Contributor

Why does this module (template) need to be edm::one? Could it be edm::stream?

If it needs to stay edm::one for some reason, please also use the edm::LuminosityBlockCache extension to tell the framework that the module can process events from multiple LuminosityBlocks concurrently.

Contributor Author

Dear @makortel,

At the time of development I tried to use edm::stream (as in SimpleFlatTableProducerBase), guided by the FWMultithreadedFrameworkStreamModuleInterface Twiki. However, this resulted in a series of compilation errors that I was not able to resolve. Using edm::one solved the issues.

If you have any suggestions on how to make edm::stream work, I would be grateful.

Best,
Mateusz

Contributor

What kind of compilation errors did you get?

Contributor Author

@mzarucki mzarucki Jun 24, 2021

The first one is:

/afs/cern.ch/work/m/mzarucki/nanoAOD/CMSSW_12_0_0_pre3/src/PhysicsTools/NanoAOD/interface/SimpleFlatTableProducer.h:315:8: error: 'void SimpleFlatTableProducerBaseLumi<T, TProd>::endLuminosityBlockProduce(edm::LuminosityBlock&, const edm::EventSetup&) [with T = GenFilterInfo; TProd = GenFilterInfo]' marked 'final', but is not virtual

which can be solved by just removing the final specifier for the endLuminosityBlockProduce method.

A following compilation error is:

/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0_pre3/src/FWCore/Framework/interface/stream/callAbilities.h:455:43: error: 'globalEndLuminosityBlockProduce' is not a member of 'LumiSingletonSimpleFlatTableProducer<GenFilterInfo>'
  455 |     T::globalEndLuminosityBlockProduce(Lumi, iES, iRC);

Contributor

My bad, after further thought stream module type actually wouldn't help. I was thinking we could use the "streams" to work around the thread-unsafety of StringObjectFunction and StringCutObjectSelector, but the LuminosityBlock transitions of stream modules are global (i.e. can occur concurrently).

So the best way to support "processing Events from multiple LuminosityBlocks concurrently" would be to use the edm::LuminosityBlockCache extension. Since you don't really need the cache, you could declare it as e.g. edm::LuminosityBlockCache<int> and in globalBeginLuminosityBlock() return nullptr.

Contributor Author

Dear @makortel,

So is edm::one fine to use for this module?

As suggested, I have added the edm::LuminosityBlockCache extension as follows:

class SimpleFlatTableProducerBaseLumi : public edm::one::EDProducer<edm::EndLuminosityBlockProducer, edm::LuminosityBlockCache<int>> {

I have also added the globalBeginLuminosityBlock method which returns nullptr.

Unfortunately, I get the following compilation errors:

 from /afs/cern.ch/work/m/mzarucki/nanoAOD/CMSSW_12_0_0_pre3/src/PhysicsTools/NanoAOD/plugins/SimpleFlatTableProducerPlugins.cc:1:
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/include/c++/9.3.0/bits/unique_ptr.h: In instantiation of 'typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = LumiSingletonSimpleFlatTableProducer<GenFilterInfo>; _Args = {const edm::ParameterSet&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<LumiSingletonSimpleFlatTableProducer<GenFilterInfo>, std::default_delete<LumiSingletonSimpleFlatTableProducer<GenFilterInfo> > >]':
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0_pre3/src/FWCore/Framework/src/MakeModuleHelper.h:39:40:   required from 'static std::unique_ptr<_Tp> edm::MakeModuleHelper<Base>::makeModule(const edm::ParameterSet&) [with T = LumiSingletonSimpleFlatTableProducer<GenFilterInfo>; Base = edm::one::EDProducerBase]'
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0_pre3/src/FWCore/Framework/src/WorkerMaker.h:83:107:   required from 'std::shared_ptr<edm::maker::ModuleHolder> edm::WorkerMaker<T>::makeModule(const edm::ParameterSet&) const [with T = LumiSingletonSimpleFlatTableProducer<GenFilterInfo>]'
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0_pre3/src/FWCore/Framework/src/WorkerMaker.h:77:40:   required from here
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/include/c++/9.3.0/bits/unique_ptr.h:857:30: error: invalid new-expression of abstract class type 'LumiSingletonSimpleFlatTableProducer<GenFilterInfo>'
  857 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is there something obvious that I might be missing here?

Best regards,
Mateusz

Contributor

The use of LuminosityBlockCache also requires overriding void SimpleFlatTableProducerBaseLumi::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) override {}.
(Sorry for not mentioning it earlier.)
https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkOneModuleInterface#edm_LuminosityBlockCacheT
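
The "invalid new-expression of abstract class type" error quoted earlier is what C++ reports when a class inherits pure virtual functions and does not override all of them. The sketch below reproduces that situation with stand-in types only. Everything in the `mock` namespace is hypothetical and heavily simplified: it is not the CMSSW interface, and the real methods take `edm::LuminosityBlock` and `edm::EventSetup` arguments as shown in the comment above.

```cpp
#include <memory>
#include <type_traits>

namespace mock {
  // Stand-in for the interface that a cache extension adds to a module.
  // Both methods are pure virtual, so a module must override both.
  struct LumiCacheInterface {
    virtual ~LumiCacheInterface() = default;
    virtual std::shared_ptr<int> globalBeginLuminosityBlock() const = 0;
    virtual void globalEndLuminosityBlock() = 0;
  };

  // Overrides only the "begin" method: still abstract, cannot be instantiated.
  struct IncompleteProducer : LumiCacheInterface {
    std::shared_ptr<int> globalBeginLuminosityBlock() const override { return nullptr; }
  };

  // Overrides both methods: concrete, can be created by a factory.
  struct CompleteProducer : LumiCacheInterface {
    std::shared_ptr<int> globalBeginLuminosityBlock() const override { return nullptr; }
    void globalEndLuminosityBlock() override {}
  };
}  // namespace mock
```

Calling `std::make_unique<mock::IncompleteProducer>()` would fail to compile with the same kind of message, while `std::make_unique<mock::CompleteProducer>()` compiles. The empty `globalEndLuminosityBlock` body mirrors the suggestion that the cache itself is not needed.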

Contributor Author

Thanks for the clarification!

I have made the discussed updates to support processing events from multiple LuminosityBlocks concurrently in a4a7f44

I would be grateful if you could have a look whether this is what you were expecting.

Cheers,
Mateusz

Contributor

> I would be grateful if you could have a look whether this is what you were expecting.

Yes, it looks good now. Thanks!

@mariadalfonso
Contributor

In addition to makortel's comments, please run the following

You can also run scram build code-format to apply code format directly

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34169/23395

@cmsbuild
Contributor

A new Pull Request was created by @mzarucki (Mateusz Zarucki) for master.

It involves the following packages:

PhysicsTools/NanoAOD

@cmsbuild, @mariadalfonso, @gouskos, @fgolf can you please review it and eventually sign? Thanks.
@gpetruc, @swertz this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@mariadalfonso
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-84e32b/16108/summary.html
COMMIT: 68d7a15
CMSSW: CMSSW_12_0_X_2021-06-18-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/34169/16108/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2785631
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2785596
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: found differences in 7 / 37 workflows

@mzarucki
Contributor Author

mzarucki commented Dec 9, 2021

Dear @kdlong, @mariadalfonso,

I have added the genFilterTable into the NanoGen sequence in this commit: 27fcb91

I ran the workflows: runTheMatrix.py -l 546,547,548 --ibeos with no failed tests.

Best regards,
Mateusz

@cmsbuild
Contributor

cmsbuild commented Dec 9, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34169/27267

@cmsbuild
Contributor

cmsbuild commented Dec 9, 2021

Pull request #34169 was updated. @cmsbuild, @mariadalfonso, @gouskos, @fgolf can you please check and sign again.

@mariadalfonso
Contributor

please test workflow 546,547,548

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-84e32b/21129/summary.html
COMMIT: 27fcb91
CMSSW: CMSSW_12_3_X_2021-12-09-1100/slc7_amd64_gcc900
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34169/21129/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-84e32b/546.0_DYToLL_M-50_13TeV_pythia8+DYToLL_M-50_13TeV_pythia8+NANOGENFromGen
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-84e32b/547.0_DYToll01234Jets_5f_LO_MLM_Madgraph_LHE_13TeV+DYToll01234Jets_5f_LO_MLM_Madgraph_LHE_13TeV+Hadronizer_TuneCP5_13TeV_MLM_5f_max4j_LHE_pythia8+NANOGENFromGen
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-84e32b/548.0_TTbar_Pow_LHE_13TeV+TTbar_Pow_LHE_13TeV+Hadronizer_TuneCP5_13TeV_powhegEmissionVeto2p_pythia8+NANOGENFromGen

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 3250704
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3250682
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Contributor

+xpog

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

  • After a long digestion, the code got finally blessed and approved by xpog.
  • There are however several unanswered relevant remarks, that should be addressed. Normally "it was as such in the original code from which I derived my development" cannot be accepted as an excuse to avoid code and performance improvements and fixes. However, given the longish story of this PR, we can accept to merge it as such and allow it stably in the release.
  • A GitHub issue with the remaining points to be addressed was opened: Remaining points to be fixed in SimpleFlatTableProducer (PhysicsTools/NanoAOD) #36461. Either @mzarucki or any other developer designated by @cms-sw/xpog-l2 is expected to take care of them and implement them at the earliest opportunity.
