Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix jets with crazy momenta in miniAOD #27374

Closed
perrotta opened this issue Jun 27, 2019 · 22 comments
Closed

Fix jets with crazy momenta in miniAOD #27374

perrotta opened this issue Jun 27, 2019 · 22 comments

Comments

@perrotta
Copy link
Contributor

BoostedDoubleSVProd is experiencing some crash in nanoAOD productions from miniAOD, see https://its.cern.ch/jira/browse/CMSCOMPPR-6445

The origin of the crash was identified as due to subjets components that have nan's in their momenta values. A debug of one such case revealed that those subjets with nan's were originating from jets having crazy momenta (|p|=82 TeV in the example pinpointed in #27238).

The fix proposed in #27238 is able to rescue BoostedDoubleSVProd from this issue, but it doesn't solve the original bug, This will have to be identified and fixed when the jets are produced.

A recipe to reproduce could be to run on the following event

process.source.eventsToProcess = cms.untracked.VEventRange('303832:918:1081148408-303832:918:1081148408')
process.source.fileNames = filesRelValTTbarPileUpMINIAODSIM
process.source.fileNames = cms.untracked.vstring( '/store/data/Run2017E/DoubleEG/MINIAOD/31Mar2018-v1/00000/5A8BC921-F437-E811-81ED-1866DAEB3370.root')

a nanoAOD or any other job which accesses subjets from miniAOD.

Another possible job that hits it is obtained by customizing and run RecoBTag/TensorFlow/test/test_deep_doublex_cfg.py, as explained in #27238 (comment)

This is put to the attention of the jet reco contacts: please @rappoccio @knash @gkasieczka have a look and report here about your findings

@perrotta
Copy link
Contributor Author

assign reconstruction

@cmsbuild
Copy link
Contributor

New categories assigned: reconstruction

@slava77,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

A new Issue was created by @perrotta .

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jun 28, 2019

in which jet collection are these jets and does the problem come from miniAOD or from AOD?

@andrzejnovak
Copy link
Contributor

in which jet collection are these jets and does the problem come from miniAOD or from AOD?

https://github.com/cms-sw/cmssw/blob/master/RecoBTag/TensorFlow/test/test_deep_doublex_cfg.py#L48

@perrotta
Copy link
Contributor Author

Curiously enough, in the same miniAOD input file there are also a few CaloJets with about 80/90 TeV energy:
CaloJets

@slava77
Copy link
Contributor

slava77 commented Jul 4, 2019

in which jet collection are these jets and does the problem come from miniAOD or from AOD?

https://github.com/cms-sw/cmssw/blob/master/RecoBTag/TensorFlow/test/test_deep_doublex_cfg.py#L48

So, it's slimmedJetsAK8. (IIRC, this is based on PUPPI).
The question is if this comes from AOD or does this appear in PUPPI in miniAOD ?

@rappoccio
Copy link
Contributor

I'm sorry for the delay here, I've been traveling. This is coming from AOD (and is present in CaloJets). If I look at root://cmsxrootd.fnal.gov//store/data/Run2017E/DoubleEG/AOD/17Nov2017-v1/40000/74C22A1F-95D3-E711-AB20-02163E019CB1.root I see 30 TeV CaloJets:

root [2] Events->Scan("recoCaloJets_ak4CaloJets__RECO.obj.m_state.p4Polar_.fCoordinates.fPt", "recoCaloJets_ak4CaloJets__RECO.obj.m_state.p4Polar_.fCoordinates.fPt > 1000");
***********************************
*    Row   * Instance * recoCaloJ *
***********************************
*       49 *        0 * 1117.0339 *
*       49 *        1 * 1061.7541 *
*      184 *        0 * 1076.6167 *
*      184 *        1 * 1022.2794 *
*      525 *        0 * 1086.7362 *
*      525 *        1 * 1024.4000 *
*     2948 *        0 * 1057.2645 *
*     3277 *        0 * 1518.5771 *
*     3277 *        1 * 1231.1153 *
*     4734 *        0 * 1017.6969 *
*     5070 *        0 * 1028.7641 *
*     5360 *        0 * 30736.918 *
*     6188 *        0 * 1008.8494 *
*     7427 *        0 * 1072.2818 *
*     7692 *        0 * 1325.9370 *
*     7692 *        1 * 1220.7102 *
*     7871 *        0 * 1308.4901 *
*     7871 *        1 * 1132.8198 *
*     9282 *        0 * 1005.0418 *
*    12626 *        0 * 1077.2320 *
*    12626 *        1 * 1035.2044 *
***********************************

This gets propagated to 80 TeV jets in PF:

root [3] Events->Scan("recoPFJets_pfJetsEI__RECO.obj.m_state.p4Polar_.fCoordinates.fPt", "recoPFJets_pfJetsEI__RECO.obj.m_state.p4Polar_.fCoordinates.fPt > 1000");
***********************************
*    Row   * Instance * recoPFJet *
***********************************
*       49 *        0 * 1108.9173 *
*       49 *        1 * 1106.2629 *
*      184 *        0 * 1075.6630 *
*      184 *        1 * 1039.2093 *
*      525 *        0 * 1107.8487 *
*      525 *        1 * 1092.2025 *
*     2948 *        0 * 1050.2926 *
*     3277 *        0 * 1506.9429 *
*     3277 *        1 * 1334.4985 *
*     4734 *        0 * 1146.8552 *
*     5066 *        0 * 1060.8508 *
*     5070 *        0 * 1041.5122 *
*     5360 *        0 * 81992.640 *
*     6188 *        0 * 1046.7060 *
*     6188 *        1 * 1006.3564 *
*     7427 *        0 * 1121.1865 *
*     7692 *        0 * 1312.9611 *
*     7692 *        1 * 1235.4965 *
*     7871 *        0 * 1311.7025 *
*     7871 *        1 * 1177.8438 *
*     9282 *        0 * 1022.7382 *
*    10833 *        0 * 1062.5575 *
*    12626 *        0 * 1068.4943 *
*    12626 *        1 * 1045.9259 *
***********************************

@rappoccio
Copy link
Contributor

@perrotta @slava77 we're probably going to need HCAL experts to take a look at this.

@slava77
Copy link
Contributor

slava77 commented Jul 29, 2019

@rappoccio
do you happen to know how this 80TeV becomes a NaN downstream?

@rappoccio
Copy link
Contributor

rappoccio commented Jul 29, 2019

In MiniAOD the PFJets (both CHS and Puppi) correctly handle this as "80 TeV", and so do the soft dropped subjets.

It seems like the packing of the PFCandidates is the culprit, since the PF candidates at AOD level are fine but then in MINIAOD I get:

root [9] Events->Scan("patPackedCandidates_packedPFCandidates__PAT.obj.pt()", "patPackedCandidates_packedPFCandidates__PAT.obj.pt() > 100");
***********************************
*    Row   * Instance * patPacked *
***********************************
*        0 *     1614 *       inf *
***********************************
==> 1 selected entry

The "raw" packing is okay:

 Events->Scan("patPackedCandidates_packedPFCandidates__PAT.obj.packedPt_");
*        0 *     1614 *     31744 *

But the "pt()" function is throwing infinity.

@rappoccio
Copy link
Contributor

@gpetruc Any ideas on how to fix this? I think this is the culprit line:

https://github.com/cms-sw/cmssw/blob/master/DataFormats/PatCandidates/src/PackedCandidate.cc#L14

packedPt_ = MiniFloatConverter::float32to16(p4_.load()->Pt());

@slava77
Copy link
Contributor

slava77 commented Jul 29, 2019

we could probably truncate this to some large number.
MiniFloatConverter::max() returns 65504.f. A truncation seems practical.

@perrotta
Copy link
Contributor Author

@rappoccio : is there any progress with solving this issue?

@rappoccio
Copy link
Contributor

I wasn't sure who was responsible, sorry. I can do it if there are no others ;)

@rappoccio
Copy link
Contributor

Actually the truncation is already written, right?

https://cmssdt.cern.ch/lxr/source/DataFormats/Math/interface/libminifloat.h#0022

So we can change float32to16 to float32to16crop?

@perrotta
Copy link
Contributor Author

Thank you Sal!

Actually the truncation is already written, right?

https://cmssdt.cern.ch/lxr/source/DataFormats/Math/interface/libminifloat.h#0022

So we can change float32to16 to float32to16crop?

One should truncate only if limits are hit.

What about something like:

  float unpackedPt = std::min(p4_.load()->Pt(),MiniFloatConverter::max());
  packedPt_ = MiniFloatConverter::float32to16(unpackedPt);

@rappoccio
Copy link
Contributor

Sure, that also works.

rappoccio added a commit to rappoccio/cmssw that referenced this issue Sep 13, 2019
In response to cms-sw#27374, truncating pt of PackedCandidate in case the 32 bit representation is beyond the 16-bit max (represented by the "minifloat" as inf, as per that standard requirement [here](ftp://ftp.fox-toolkit.org/pub/fasthalffloatconversion.pdf)).  This was giving problems downstream when the 16-bit "infinity" was widened to 32 bits and became inaccurate.
@rappoccio
Copy link
Contributor

PR #27988

@slava77
Copy link
Contributor

slava77 commented Dec 3, 2019

@perrotta
it looks like it's time to close this

@perrotta
Copy link
Contributor Author

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants