Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering genVisTaus breaks gen-level matching #118

Closed
ktht opened this issue Mar 13, 2018 · 14 comments
Closed

Filtering genVisTaus breaks gen-level matching #118

ktht opened this issue Mar 13, 2018 · 14 comments

Comments

@ktht
Copy link

ktht commented Mar 13, 2018

GenVisTaus are filtered before they're written to the Ntuple, but the matching is done w.r.t unfiltered genVisTau collection:

genVisTauTable = cms.EDProducer("SimpleCandidateFlatTableProducer",
src = cms.InputTag("genVisTaus"),
cut = cms.string("pt > 10."),

tausMCMatchHadTauForTable = cms.EDProducer("MCMatcher", # cut on deltaR, deltaPt/Pt; pick best by deltaR
src = tauTable.src, # final reco collection
matched = cms.InputTag("genVisTaus"), # generator level hadronic tau decays

tauMCTable = cms.EDProducer("CandMCMatchTableProducer",
src = tauTable.src,
mcMap = cms.InputTag("tausMCMatchLepTauForTable"),
mcMapVisTau = cms.InputTag("tausMCMatchHadTauForTable"),

Example: event 1:5883:9975004 in /store/mc/RunIIFall17MiniAOD/ttHJetToNonbb_M125_TuneCP5_13TeV_amcatnloFXFX_madspin_pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/D4FE989C-2619-E811-9F24-002590D9D822.root

Unfiltered collection:

#  0 GenVisTau    pt =   +5.969   eta =     -0.402   phi = -3.141   mass =   +1.047
#  1 GenVisTau    pt =   +3.867   eta =     -0.013   phi = -3.086   mass =   +0.140
#  2 GenVisTau    pt =  +55.625   eta =     +0.736   phi = +0.005   mass =   +0.725
#  3 GenVisTau    pt =  +30.875   eta =     +1.945   phi = -1.828   mass =   +1.441

Filtered collection (that is written to the Ntuple):

#  0 GenVisTau    pt =  +55.625   eta =     +0.736   phi = +0.005   mass =   +0.725
#  1 GenVisTau    pt =  +30.875   eta =     +1.945   phi = -1.828   mass =   +1.441

Reco taus:

#  0       Tau    pt =  +82.110   eta =     -0.358   phi = +1.373   mass =   +0.140   genPartIdx = 26   genPartFlav = 1 (prompt electron)   jetIdx = 2
#  1       Tau    pt =  +65.288   eta =     +0.736   phi = +0.002   mass =   +0.797   genPartIdx = 2   genPartFlav = 5 (hadronic tau decay)   jetIdx = 3
#  2       Tau    pt =  +22.015   eta =     +1.954   phi = -1.804   mass =   +0.328   genPartIdx = 3   genPartFlav = 5 (hadronic tau decay)   jetIdx = 6

In practice, if genPartFlav equals to 5 (hadronic tau decay case), then the generator-level match should be found in GenVisTau collection instead of GenPart collection; but in the above example genPartIdx goes out of bounds in the GenVisTau collection for two reco taus.

The easiest fix is to just remove the pT cut on the genVisTaus when they're written to the Ntuple:

cut = cms.string("pt > 10."),

@ktht
Copy link
Author

ktht commented Mar 13, 2018

Apparently, the same issue applies to gen-level jets as well (e.g. 9th reco jet in event 1:5883:9974904 of the same file):

cut = cms.string("pt > 10"),

@arizzi
Copy link

arizzi commented Mar 15, 2018

@ktht this is by design.
Because the objects are pt sorted the indices are not messed up (i.e. still pointing to the right object). People should explicitelly check that the matchingIdx is < nGenJets or nVisTau.
It still makes sense to store the index even if the object is dropped because it contains information about the reco object being a fake or not.

You can think about it the same as when you check isAvailable for an edm::Ref, the fact that the Ref isNonnull doesn't guarantee that the pointed object is really stored.

@ktht
Copy link
Author

ktht commented Mar 16, 2018

I agree, but clearly the GenVisTau collection is not pT sorted before their indices are recorded, as you can see from my original post where the pT cut is removed (they are printed in the same order as they are stored in the Ntuple). In the above example, Tau_genPartIdx stores indices 2 and 3, but the GenVisTau collection contains 2 entries.

If you still insist on the pT cut in GenVisTau and GenJet, why not use the same collection that is written to the Ntuple to also perform the gen-matching? Something along the lines of:

finalGenVisTaus = cms.EDFilter("GenParticleSelector",
  src = cms.InputTag("genVisTaus"),
  cut = cms.string("pt > 10"),
)
# replace genVisTaus with finalGenVisTaus in subsequent code + update tauMC sequence

@arizzi
Copy link

arizzi commented Mar 16, 2018

the bug is the GenVisTau should pt sorted then (before they are taken for matching).

For GenJet it is the case.

@ktht
Copy link
Author

ktht commented Mar 17, 2018

Ah ok, I agree with this conclusion. The following solution worked for me, but I dunno where to submit the PR:

diff --git a/PhysicsTools/HepMCCandAlgos/plugins/GenVisTauProducer.cc b/PhysicsTools/HepMCCandAlgos/plugins/GenVisTauProducer.cc
index 2da65fa1344..469344710d5 100644
--- a/PhysicsTools/HepMCCandAlgos/plugins/GenVisTauProducer.cc
+++ b/PhysicsTools/HepMCCandAlgos/plugins/GenVisTauProducer.cc
@@ -11,6 +11,7 @@
 #include "PhysicsTools/JetMCUtils/interface/JetMCTag.h"
 #include "DataFormats/TauReco/interface/PFTau.h"
 #include "DataFormats/Math/interface/deltaR.h"
+#include "CommonTools/Utils/interface/PtComparator.h"
 
 #include <vector>
 #include <iostream>
@@ -21,6 +22,7 @@ class GenVisTauProducer : public edm::global::EDProducer<>
   GenVisTauProducer(const edm::ParameterSet& params) 
     : src_(consumes<reco::GenJetCollection>(params.getParameter<edm::InputTag>("src")))
     , srcGenParticles_(consumes<reco::GenParticleCollection>(params.getParameter<edm::InputTag>("srcGenParticles")))
+    , pTComparator_()
   {
     produces<reco::GenParticleCollection>();
   }
@@ -77,6 +79,7 @@ class GenVisTauProducer : public edm::global::EDProducer<>
       genVisTaus->push_back(genVisTau);
     }
 
+    std::sort(genVisTaus->begin(), genVisTaus->end(), pTComparator_);
     evt.put(std::move(genVisTaus));
   }
 
@@ -91,6 +94,7 @@ class GenVisTauProducer : public edm::global::EDProducer<>
  private:
   const edm::EDGetTokenT<reco::GenJetCollection> src_;
   const edm::EDGetTokenT<reco::GenParticleCollection> srcGenParticles_;
+  const GreaterByPt<reco::GenParticle> pTComparator_;
 };
 
 #include "FWCore/Framework/interface/MakerMacros.h"

@arizzi
Copy link

arizzi commented Mar 17, 2018

please submit here and on master for cmssw

@arizzi
Copy link

arizzi commented Mar 17, 2018

btw can't we do with some sorter in between, i.e. without modifying the original producer?

@ktht
Copy link
Author

ktht commented Mar 17, 2018

The closest thing I could find is this:

typedef ObjectSelector<
SortCollectionSelector<
reco::CandidateCollection,
GreaterByPt<reco::Candidate>
>
> LargestPtCandSelector;

But it's for reco::Candidate/Collection; we could write a similar plugin to e.g. PhysicsTools/NanoAOD/plugins/LargestPtGenParticleSelector.cc:

typedef ObjectSelector<
          SortCollectionSelector<
            reco::GenParticleCollection,
            GreaterByPt<reco::GenParticle>
          >
        > LargestPtGenParticleSelector;

The plugin not only sorts by pT, but also filters out first maxNumber particles in the pt-sorted collection. We could avoid this by setting it to some large value:

genVisTausPtSorted = cms.EDFilter("LargestPtGenParticleSelector",
  src = cms.InputTag("genVisTaus"),
  maxNumber = cms.uint32(100),
)

It's kind of a hack, imo.

@gpetruc
Copy link

gpetruc commented Mar 19, 2018

I prefer the version that modifies the GenVisTauProducer to do the sorting: a lot of other EDProducers do sort their outputs by pT already.

I would suggest:

  • PR to cms-sw/master from whatever recent 10_1_X IB.
  • PR to cms-nanoAOD/master from 94X

@fabiocos @slava77 @perrotta : do you want the 94X PR also to cms-sw/CMSSW_9_4_AN_X since the change is not in a NanoAOD package? (even if it's a minor change and the module is used only in nanoAOD production)

@slava77
Copy link

slava77 commented Mar 19, 2018 via email

@fabiocos
Copy link

fabiocos commented Mar 20, 2018 via email

@arizzi
Copy link

arizzi commented Mar 20, 2018

@ktht can you please make the PRs then to master and 94X main branch

(@slava77 concerning code quality is three lines here: #118 (comment))

@slava77
Copy link

slava77 commented Mar 20, 2018 via email

@gpetruc
Copy link

gpetruc commented Mar 21, 2018

done in #135

@gpetruc gpetruc closed this as completed Mar 21, 2018
arizzi pushed a commit that referenced this issue Mar 23, 2018
peruzzim pushed a commit that referenced this issue Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants