ParticleNetAK4 jet tagger #31570

Merged
merged 2 commits into from Sep 29, 2020

hqucms
Contributor

@hqucms hqucms commented Sep 24, 2020

PR description:

This PR adds the ParticleNet tagger for AK4 jets. ParticleNetAK4 is a multi-class tagger for

  • jet flavour tagging (b, c vs light jets)
  • quark/gluon tagging
  • pileup jet identification

The current version is trained on standard AK4 CHS jets using UL18 MC. The tagger is based on the ParticleNet graph neural network architecture, which is also used in CMSSW for boosted AK8 jet tagging.
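
As an illustration of how a single multi-class tagger can serve all three use cases, binary discriminants are commonly formed as ratios of class probabilities. The sketch below is only that: the class names (`b`, `c`, `uds`, `g`, `pu`) and the example scores are hypothetical, and the actual output nodes and discriminator definitions are those in this PR's configuration, which are not reproduced here.

```python
def binary_discriminant(probs, signal, background):
    """Return sum(signal) / (sum(signal) + sum(background)) from class probabilities."""
    num = sum(probs[k] for k in signal)
    den = num + sum(probs[k] for k in background)
    # Guard against jets where all involved classes have zero probability.
    return num / den if den > 0 else -1.0


# Hypothetical per-jet softmax output; class names are illustrative only.
example_probs = {"b": 0.70, "c": 0.10, "uds": 0.10, "g": 0.05, "pu": 0.05}

b_vs_light = binary_discriminant(example_probs, ["b"], ["uds", "g"])
q_vs_g = binary_discriminant(example_probs, ["uds"], ["g"])
not_pileup = 1.0 - example_probs["pu"]

print(f"b vs light: {b_vs_light:.3f}, q vs g: {q_vs_g:.3f}, not PU: {not_pileup:.3f}")
```

Normalizing by only the classes involved keeps each discriminant in [0, 1] regardless of how much probability the network assigns to unrelated classes.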

The new ParticleNetAK4 tagger shows significant performance improvements:

  • large improvements in b-/c-tagging compared to DeepJet, particularly at high pT
  • large improvements in q/g-tagging compared to the quark-gluon likelihood, and similar or better performance than the DeepJet q/g discriminant
  • similar or better PU rejection than the pileup jet ID

More details and comparisons can be found in the presentations in the BTV [1, 2] and the JME [1, 2] groups.

Requires:

cms-data/RecoBTag-Combined#35

PR validation:

The implementation in this PR has been verified against the training framework and shows consistent results.

[Timing]
Evaluated by running 1k ttbar events using RecoBTag/ONNXRuntime/test/test_particle_net_ak4_cfg.py.

TimeReport   0.000065     0.000065     0.000065  pfParticleNetAK4DiscriminatorsJetTags
TimeReport   0.036854     0.036854     0.036854  pfParticleNetAK4JetTags
TimeReport   0.002803     0.002803     0.002803  pfParticleNetAK4TagInfos

For comparison, the corresponding timing for DeepJet:

TimeReport   0.009825     0.009825     0.009825  pfDeepFlavourJetTags
TimeReport   0.000764     0.000764     0.000764  pfDeepFlavourTagInfos
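
To compare the two taggers as a whole, the per-module numbers above can be summed per tagger. The snippet below is a small sketch that parses the quoted `TimeReport` lines; it assumes the first numeric column is the per-event time in seconds (all three columns are identical here, so the choice does not matter for these numbers). The helper names are made up for this example.

```python
REPORT = """\
TimeReport   0.000065     0.000065     0.000065  pfParticleNetAK4DiscriminatorsJetTags
TimeReport   0.036854     0.036854     0.036854  pfParticleNetAK4JetTags
TimeReport   0.002803     0.002803     0.002803  pfParticleNetAK4TagInfos
TimeReport   0.009825     0.009825     0.009825  pfDeepFlavourJetTags
TimeReport   0.000764     0.000764     0.000764  pfDeepFlavourTagInfos
"""


def module_times(report):
    """Map module label -> first timing column of each TimeReport line."""
    times = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "TimeReport":
            times[fields[4]] = float(fields[1])
    return times


def total_for(times, prefix):
    """Sum the timings of all modules whose label starts with prefix."""
    return sum(t for label, t in times.items() if label.startswith(prefix))


times = module_times(REPORT)
particle_net = total_for(times, "pfParticleNetAK4")
deep_jet = total_for(times, "pfDeepFlavour")
print(f"ParticleNetAK4: {particle_net:.6f}  DeepJet: {deep_jet:.6f}  "
      f"ratio: {particle_net / deep_jet:.2f}")
```

With the numbers quoted above this gives about 0.0397 for ParticleNetAK4 versus 0.0106 for DeepJet, a factor of roughly 3.75 in this test.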

[Timing for 1325.518]: #31570 (comment)
Timing for UL reMINIAOD workflow 1325.518 increases by ~4% in total event time (0.453879 s/ev -> 0.471795 s/ev); in the per-module report, ParticleNetAK4 adds up to about 28 ms, or ~6%.

[Memory]

Model init (1.8MB) + execution (3.5MB):

FYI @camclean @alefisico

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31570/18600

  • This PR adds an extra 52KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Contributor

A new Pull Request was created by @hqucms (Huilin Qu) for master.

It involves the following packages:

PhysicsTools/PatAlgos
RecoBTag/Configuration
RecoBTag/FeatureTools
RecoBTag/ONNXRuntime

@perrotta, @jpata, @cmsbuild, @santocch, @slava77 can you please review it and eventually sign? Thanks.
@jdamgov, @rappoccio, @gouskos, @jdolen, @ahinzmann, @smoortga, @riga, @schoef, @emilbols, @mariadalfonso, @JyothsnaKomaragiri, @nhanvtran, @gkasieczka, @clelange, @hatakeyamak, @ferencek, @gpetruc, @andrzejnovak, @peruzzim, @seemasharmafnal, @mmarionncern this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@santocch

please test

@cmsbuild
Contributor

cmsbuild commented Sep 24, 2020

The tests are being triggered in jenkins.

@jpata
Contributor

jpata commented Sep 24, 2020

@cmsbuild please abort

(we need to include the data PR, we were just discussing this PR in the reco chat)

@cmsbuild
Contributor

Jenkins tests are aborted.

@jpata
Contributor

jpata commented Sep 24, 2020

test parameters:

@jpata
Contributor

jpata commented Sep 24, 2020

@cmsbuild please test

@cmsbuild
Contributor

cmsbuild commented Sep 24, 2020

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-data/RecoBTag-Combined#35

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5e9ed/9551/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 158 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2539438
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2539409
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@jpata
Contributor

jpata commented Sep 28, 2020

Reco differences are only in the jet ID variables pairDiscriVector 19-24, as expected.

Timing on reMINIAOD phase2 ttbar workflow 1325.518 (1000 local events) goes up by ~4% (0.453879 s/ev -> 0.471795 s/ev). This is fine, but it is not negligible, so it should be noted.

TimeReports are attached here:
report_orig.txt
report_new.txt

@slava77
Contributor

slava77 commented Sep 28, 2020

Reco differences are only in the jet ID variables pairDiscriVector 19-24, as expected.

Timing on reMINIAOD phase2 ttbar workflow 1325.518 (1000 local events) goes up by ~4% (0.453879 s/ev -> 0.471795 s/ev). This is fine, but it is not negligible, so it should be noted.

TimeReports are attached here:
report_orig.txt
report_new.txt

Looking at the report, ParticleNetAK4 adds up to about 28 ms, which would be about 6% (it would be nice to see it in the summary quoted above).
@hqucms is there some expectation that this can be improved in some not so distant future?
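
The two percentages quoted in this thread appear to measure different things: ~4% is the change in total time per event, while ~6% is the summed ParticleNetAK4 module time relative to the event time. The arithmetic can be checked as below, using only numbers from the comments; taking the original event time as the denominator for the 28 ms figure is an assumption, and the thread does not explain why the two estimates differ.

```python
t_orig = 0.453879  # s/ev, before this PR (from the comment above)
t_new = 0.471795   # s/ev, with this PR (from the comment above)
module_time = 0.028  # s, approximate ParticleNetAK4 total read off the report

# Relative change in total event time.
total_increase = (t_new - t_orig) / t_orig

# ParticleNetAK4 module time as a fraction of the original event time.
module_fraction = module_time / t_orig

print(f"total event time: +{100 * total_increase:.1f}%")
print(f"ParticleNetAK4 modules: {100 * module_fraction:.1f}% of original event time")
```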

@hqucms
Contributor Author

hqucms commented Sep 28, 2020

@hqucms is there some expectation that this can be improved in some not so distant future?

@slava77 It will be difficult to improve (unless we enable AVX for ONNXRuntime).

@slava77
Contributor

slava77 commented Sep 28, 2020

@hqucms is there some expectation that this can be improved in some not so distant future?

@slava77 It will be difficult to improve (unless we enable AVX for ONNXRuntime).

what about using smaller precision, float16?
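
For a feel of what float16 would mean for a tagger score, the sketch below rounds a value to IEEE 754 half precision using Python's standard `struct` module (format code `e`). It only illustrates the storage precision of a single number, not inference speed or the accuracy of a network run in float16; the example score is made up.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


score = 0.87654321  # hypothetical discriminator value
half = to_float16(score)
rel_err = abs(half - score) / score

# float16 has an 11-bit significand, so the relative rounding error is at most 2**-11.
print(f"float64: {score!r}  float16: {half!r}  relative error: {rel_err:.2e}")
```

For values in [0.5, 1) the spacing between neighbouring float16 numbers is 2**-11 (about 4.9e-4), so a score stored in half precision cannot resolve differences much smaller than that.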

@jpata
Contributor

jpata commented Sep 28, 2020

I don't see it as a blocker for this PR today, but it would be good to see an effort, for all DNNs entering production, to improve the networks by trimming their size as well as by doing inference with reduced precision (as Slava suggests).

@hqucms
Contributor Author

hqucms commented Sep 29, 2020

@slava77 @jpata
One thing I tried was quantizing this model (i.e., using int8 for Conv/Gemm operations) with ONNXRuntime's quantization tools, but I did not see any improvement in speed (at least so far).
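
For readers unfamiliar with int8 quantization, the basic arithmetic can be sketched in plain Python. This is not ONNXRuntime's quantization tool and not what was run for this PR; it is a simplified symmetric per-tensor scheme with made-up weights, shown only to make the idea concrete.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate floating-point weights."""
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale, max_err)
```

Each weight is reproduced to within half a quantization step (`scale / 2`); small weights such as 0.003 collapse to zero, which is one reason quantized models need to be validated against the original.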

For now, the easiest way to speed up the inference is to enable the dynamic architecture feature of ONNXRuntime. We would then get a ~1.5-2x speed-up for free on all models using ONNXRuntime whenever AVX/AVX2/AVX512 is available, while still being able to run on older machines with only SSE. The price to pay is precision-level numerical differences in the results due to the use of different instructions.
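
One common reason results change with the instruction set is that vectorized kernels accumulate sums in a different order than scalar code, and floating-point addition is not associative. The toy example below emulates float32 accumulation in plain Python and compares a sequential sum with a two-lane sum. It is not ONNXRuntime code, the input is deliberately extreme to make the effect visible, and whether summation order is the dominant effect for this model is an assumption; in practice the differences are typically in the last few bits.

```python
import struct


def f32(x):
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_sequential(values):
    """Scalar-style accumulation: one running float32 sum."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def sum_lanes(values, lanes):
    """SIMD-style accumulation: one partial float32 sum per lane, combined at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] = f32(partial[i % lanes] + v)
    acc = 0.0
    for p in partial:
        acc = f32(acc + p)
    return acc


values = [1e8, 1.0, -1e8, 1.0]
print(sum_sequential(values), sum_lanes(values, lanes=2))
```

Here the sequential sum loses the first `1.0` when it is added to `1e8` (the float32 spacing there is 8), while the two-lane version cancels the large terms first and keeps both small ones.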

For future developments, one can try, e.g., applying some preselection on the jet constituent particles or doing a systematic network architecture search to reduce the inference time, but all of these take a substantial amount of time/resources and go beyond the scope of this PR.
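
As a rough picture of what a constituent preselection could look like, the sketch below keeps only constituents above a pT threshold and caps their number. Everything in it is hypothetical: the function name, the dictionary layout, and the threshold and cap values are placeholders, not part of this PR or of any planned implementation.

```python
def preselect_constituents(constituents, min_pt=1.0, max_n=50):
    """Keep at most max_n constituents with pt >= min_pt, highest pt first."""
    kept = [c for c in constituents if c["pt"] >= min_pt]
    kept.sort(key=lambda c: c["pt"], reverse=True)
    return kept[:max_n]


# Hypothetical jet with four constituents (pt in GeV).
jet = [{"pt": 0.4}, {"pt": 12.0}, {"pt": 3.5}, {"pt": 1.0}]
selected = preselect_constituents(jet, min_pt=1.0, max_n=2)
print([c["pt"] for c in selected])
```

Fewer input particles means less work per jet, but any such cut changes the network inputs and would require retraining and revalidation.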

@jpata
Contributor

jpata commented Sep 29, 2020

+reconstruction

  • adds ParticleNet tagger for AK4 jets
  • reco tests show differences in jet ID variables, as expected
  • runtime cost of this DNN is not negligible and should be addressed

@jpata
Contributor

jpata commented Sep 29, 2020

@hqucms can you clarify if a backport of this is planned?

@hqucms
Contributor Author

hqucms commented Sep 29, 2020

@jpata Yes, we plan to backport this to 106X for UL.

@andrzejnovak
Contributor

@hqucms what is the degree of numerical differences from using AVX? Would it be below what was discussed here in #28469 ?

@silviodonato
Contributor

urgent
to be merged in 11_2_0_pre7

@hqucms
Contributor Author

hqucms commented Sep 29, 2020

@silviodonato
Just a small note -- This PR needs to be merged after cms-data/RecoBTag-Combined#35.

@hqucms
Contributor Author

hqucms commented Sep 29, 2020

@hqucms what is the degree of numerical differences from using AVX? Would it be below what was discussed here in #28469 ?

@andrzejnovak I think rounding to 1e-4 should cover the difference in most cases, but there can still be some exceptions.
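
One possible kind of exception is easy to illustrate: two values that differ by far less than 1e-4 can still round to different results if they sit on opposite sides of a rounding boundary. The sketch below contrasts comparing rounded values with comparing against an absolute tolerance; the function names and example scores are made up, and this is not necessarily the kind of exception meant above.

```python
def agree_after_rounding(a, b, decimals=4):
    """True if both values round to the same number of decimals."""
    return round(a, decimals) == round(b, decimals)


def agree_within_tolerance(a, b, tol=1e-4):
    """True if the absolute difference does not exceed tol."""
    return abs(a - b) <= tol


# Typical case: a tiny difference well inside one rounding bin.
print(agree_after_rounding(0.731204, 0.731213))

# Boundary case: a difference of 2e-6 that straddles the 0.73125 boundary.
print(agree_after_rounding(0.731249, 0.731251))
print(agree_within_tolerance(0.731249, 0.731251))
```

A tolerance-based comparison avoids the boundary effect, at the cost of no longer producing identical stored values.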

@silviodonato
Contributor

merge

@cmsbuild cmsbuild merged commit b8ac8cf into cms-sw:master Sep 29, 2020
@santocch

santocch commented Oct 4, 2020

+1

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.
