HLT XGBoost Photon MVA and Diphoton combination filter #44473

smorovic · 2024-03-19T23:01:54Z

PR description:

This PR uses XGBoost C API to implement calculation of the score of Photons for HLT.
Combination filter is provided with the aim of implementing diphoton paths.
Total of 9 HLT variables are used as input.

For binary model files, NTree Limit parameter must be provided (it is not saved in the model file).

Note: at this time model training files are not provided, until they are finalized.
At that point we will request: cms-data repository for model files, define default parameters (cuts) and add unit test which ensures XGBoost (v1.7.5 currently used in CMSSW) calculates scores correctly with given model files and parameters.

Initially this is opened as Draft PR. Backport to 14_0_X is planned.

PR validation:

Initially passes HLT integration tests (caveat: all events rejected at the combination filter due to low statistics, but no errors)

Sample HLT snippet used for this:

process.hltPreDiphotonMVA = cms.EDFilter( "HLTPrescaler",
     offset = cms.uint32( 0 ),
     L1GtReadoutRecordTag = cms.InputTag( "hltGtStage2Digis" )
)

process.hltDiEG14p25EtEta2p55UnseededFilter = cms.EDFilter( "HLT1Photon",
     saveTags = cms.bool( True ),
     inputTag = cms.InputTag( "hltEgammaCandidatesUnseeded" ),
     triggerType = cms.int32( 92 ),
     MinE = cms.double( -1.0 ),
     MinPt = cms.double( 14.25 ),
     MinMass = cms.double( -1.0 ),
     MaxMass = cms.double( -1.0 ),
     MinEta = cms.double( -1.0 ),
     MaxEta = cms.double( 2.55 ),
     MinN = cms.int32( 2 )
)

process.PhotonXGBoostProducer = cms.EDProducer("PhotonXGBoostProducer",
     candTag = cms.InputTag( "hltEgammaCandidatesUnseeded" ),
     inputTagR9 = cms.InputTag("hltEgammaR9IDUnseeded", "r95x5"),
     inputTagHoE = cms.InputTag("hltEgammaHoverEUnseeded"),
     inputTagSigmaiEtaiEta = cms.InputTag("hltEgammaClusterShapeUnseeded", "sigmaIEtaIEta5x5NoiseCleaned"),
     inputTagE2x2 = cms.InputTag("hltEgammaClusterShapeUnseeded", "e2x2"),
     inputTagIso = cms.InputTag("hltEgammaEcalPFClusterIsoUnseeded"),
     mvaFileXgbB = cms.FileInPath("RecoEgamma/PhotonIdentification/data/barrel.bin"), #copy by hand
     mvaFileXgbE = cms.FileInPath("RecoEgamma/PhotonIdentification/data/barrel.bin"), #copy by hand
     mvaNTreeLimitB = cms.uint32(168),
     mvaNTreeLimitE = cms.uint32(158),
     mvaThresholdEt = cms.double(14.25)
)

process.HLTEgammaDoubleXGBoostCombFilter = cms.EDFilter("HLTEgammaDoubleXGBoostCombFilter",
     saveTags = cms.bool( True ),
     candTag = cms.InputTag( "hltEgammaCandidatesUnseeded" ),
     mvaPhotonTag = cms.InputTag( "PhotonXGBoostProducer" ),
     highMassCut = cms.double(95),
     leadCutHighMass1 = cms.vdouble(0.98,0.95),
     subCutHighMass1 = cms.vdouble(0.00,0.04),
     leadCutHighMass2 = cms.vdouble(0.85,0.85),
     subCutHighMass2 = cms.vdouble(0.04,0.08),
     leadCutHighMass3 = cms.vdouble(0.30,0.50),
     subCutHighMass3 = cms.vdouble(0.15,0.20),
     lowMassCut = cms.double(60),
     leadCutLowMass1 = cms.vdouble(0.98,0.90),
     subCutLowMass1 = cms.vdouble(0.04,0.05),
     leadCutLowMass2 = cms.vdouble(0.90,0.80),
     subCutLowMass2 = cms.vdouble(0.10,0.10),
     leadCutLowMass3 = cms.vdouble(0.60,0.60),
     subCutLowMass3 = cms.vdouble(0.30,0.30),

process.HLTDiphotonMvaTestSequence = cms.Sequence( process.HLTDoFullUnpackingEgammaEcalSequence + process.HLTPFClusteringForEgamma + process.hltEgammaCandidates + process.hltEGL1SingleAndDoubleEGOrFilter + process.hltEG30L1SingleAndDoubleEGOrEtFilter + process.HLTPFClusteringForEgammaUnseeded + process.hltEgammaCandidatesUnseeded + process.hltDiEG14p25EtEta2p55UnseededFilter + process.hltEgammaR9IDUnseeded + process.HLTDoLocalHcalSequence + process.HLTFastJetForEgamma + process.hltEgammaHoverEUnseeded + process.hltEgammaClusterShapeUnseeded + process.hltEgammaEcalPFClusterIsoUnseeded )

process.HLT_Diphoton_MVA = cms.Path( process.HLTBeginSequence + process.hltL1sSingleAndDoubleEGor + process.HLTDiphotonMvaTestSequence  + process.PhotonXGBoostProducer + process.HLTEgammaDoubleXGBoostCombFilter + process.HLTEndSequence )

process.schedule.insert( process.schedule.index( process.HLT_Diphoton30_22_R9Id_OR_IsoCaloId_AND_HE_R9Id_Mass95_v20 ) + 1, process.HLT_Diphoton_MVA )

getattr(process.datasets, 'EGamma0').append('HLT_Diphoton_MVA')
getattr(process.datasets, 'EGamma1').append('HLT_Diphoton_MVA')
getattr(process.datasets, 'OnlineMonitor').append('HLT_Diphoton_MVA')

process.hltDatasetEGamma.triggerConditions.append('HLT_Diphoton_MVA')
process.hltDatasetOnlineMonitor.triggerConditions.append('HLT_Diphoton_MVA')

cmsbuild · 2024-03-19T23:02:22Z

cms-bot internal usage

cmsbuild · 2024-03-19T23:10:11Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39558

This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39558/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39558/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

cmsbuild · 2024-03-19T23:21:24Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39559

This PR adds an extra 24KB to repository

cmsbuild · 2024-03-19T23:21:44Z

A new Pull Request was created by @smorovic for master.

It involves the following packages:

HLTrigger/Egamma (hlt)
RecoEgamma/EgammaHLTProducers (hlt)
RecoEgamma/PhotonIdentification (reconstruction)

@mmusich, @mandrenguyen, @cmsbuild, @Martin-Grunewald, @jfernan2 can you please review it and eventually sign? Thanks.
@silviodonato, @HuguesBrun, @afiqaize, @a-kapoor, @missirol, @cericeci, @ram1123, @Sam-Harper, @valsdav, @Fedespring, @jainshilpi, @varuns23, @sobhatta, @Prasant1993, @sameasy, @Martin-Grunewald, @lgray this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

HLTrigger/Egamma/plugins/HLTEgammaDoubleXGBoostCombFilter.cc

RecoEgamma/PhotonIdentification/plugins/PhotonXGBoostProducer.cc

HLTrigger/Egamma/plugins/HLTEgammaDoubleXGBoostCombFilter.cc

RecoEgamma/PhotonIdentification/plugins/PhotonXGBoostProducer.cc

mmusich · 2024-03-20T08:05:57Z

type hlt-integration

cmsbuild · 2024-03-20T13:29:49Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39583

This PR adds an extra 24KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File RecoEgamma/EgammaHLTProducers/plugins/EgammaHLTClusterShapeProducer.cc modified in PR(s): Update of the Phase-2 GMT Emulator #44319

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39583/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39583/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

cmsbuild · 2024-03-20T13:44:23Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39589

This PR adds an extra 64KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File RecoEgamma/EgammaHLTProducers/plugins/EgammaHLTClusterShapeProducer.cc modified in PR(s): Update of the Phase-2 GMT Emulator #44319

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39589/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44473/39589/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

mmusich · 2024-03-21T11:48:57Z

+hlt

PR according to description and follow-up review.
newly introduced unit test RecoEgammaPhotonIdentificationTest passes when tested with XGBoost Photon HLT model files cms-data/RecoEgamma-PhotonIdentification#14
proposed code is not yet used in a HLT menu, but will be used once it is available (proponent tested at HLT XGBoost Photon MVA and Diphoton combination filter #44473 (comment)).
this PR must be merged together with XGBoost Photon HLT model files cms-data/RecoEgamma-PhotonIdentification#14

mmusich · 2024-03-21T11:49:23Z

@smorovic is the plan to backport to 14.0.X as well?

smorovic · 2024-03-21T11:55:24Z

@smorovic is the plan to backport to 14.0.X as well?

yes, it will be needed for (potentially) being added to HLT menu in the upcoming months.

jfernan2 · 2024-03-21T17:32:55Z

+1

cmsbuild · 2024-03-21T17:33:20Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)
Notice This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-data/RecoEgamma-PhotonIdentification#14

antoniovilela · 2024-03-22T09:45:03Z

+1

perrotta · 2024-03-23T08:02:08Z

@smorovic did you notice that the newly added unit test keeps failing in the IBs with the aarch64 and pc64le architectures, both in master and also in the 14_0_X IBs after the backport?
The error looks the same as the one get when this PR is run without the corresponding cmsdist update, see e.g. #44497 (comment)
@cms-sw/core-l2 maybe have some clue about it...

mmusich · 2024-03-23T08:52:16Z

The error looks the same as the one get when this PR is run without the corresponding cmsdist update,

Probably that IB build didn't pick up (yet) the cmsdist update.

mmusich · 2024-03-23T08:57:27Z

keeps failing in the IBs with the aarch64 and pc64le architectures,

The test in aarch log looks like the inference is returning values different from the expected ones.

The error looks the same as the one get when this PR is run without the corresponding cmsdist update

It does not look like that at all.

smorovic · 2024-03-23T11:59:10Z

So, it looks like values on other architectures are not reproducible (in 4 out of 10 cases).
Since this is target for AMD64 only (HLT), we could just add #if defined(__x86_64__) around the unit test check.
Would that be fine?

mmusich · 2024-03-23T15:04:10Z

Would that be fine?

Well, not in the long term. I imagine you are planning to use this for HLT online, but the moment your trigger will enter the offline menu in cmssw, it will be tested by TSG unit tests and relvas that are run on all architectures, and we cannot let trigger irreproducibilities cripple in. So a longer term solution is needed to support this on other archs.

smorovic · 2024-03-23T18:51:40Z

Would that be fine?

Well, not in the long term. I imagine you are planning to use this for HLT online, but the moment your trigger will enter the offline menu in cmssw, it will be tested by TSG unit tests and relvas that are run on all architectures, and we cannot let trigger irreproducibilities cripple in. So a longer term solution is needed to support this on other archs.

OK, I see.
We could then completely remove this unit test for now, so that IBs / release tests won't ne broken.
And this problem will have to be fixed. I will try debugging the library on lxplus9-arm, probably starting with compiler options and optimisations for the library.

I also had a look at:
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_ppc64le_gcc12/CMSSW_14_1_X_2024-03-22-2300/unitTestLogs/RecoEgamma/PhotonIdentification#/
And it is interesting that on ARM and PPC discrepancies are identical...

so maybe figuring it out between ARM and AMD64 will be sufficient (and at the moment I don't know of a PPC machine that I could use).

mmusich · 2024-03-25T07:47:28Z

We could then completely remove this unit test for now, so that IBs / release tests won't ne broken.

@smorovic, on second thought, I think it might be OK if we go ahead with your proposal:

add #if defined(__x86_64__) around the unit test check.

provided we're reminded of the observation that ARM and PPC give different results than AMD64 in a github issue. If the trigger results are different, but at least reproducible (i.e. each time the inference is run we get the same results within the same architecture) it might be OK given as you said we don't run the HLT online on those archs.

smorovic · 2024-03-25T10:25:24Z

@mmusich
I just tested it on ARM machine and this test always fails in the same way, with same values produced.
I will push new PRs with __x86_64__ ifdef, but independently I will try to understand where this numerical differences come from.

makortel · 2024-03-25T15:41:37Z

I'd suggest to open a GitHub issue in any case (to have a better record that this kind of difference between x86 and ARM was observed, if nothing else).

mmusich · 2024-03-25T15:43:07Z

I'd suggest to open a GitHub issue in any case

Indeed, this is what I trying to suggest with

provided we're reminded of the observation that ARM and PPC give different results than AMD64 in a github issue

at #44473 (comment)

smorovic · 2024-03-25T22:59:32Z

As mentioned in #44542,
it seems discrepancy comes from abs being inconsistent between architectures in case of CMSSW.
I'm changing to std::abs in master and 14_1_X in #44531 and corresponding backport.

cmsbuild added this to the CMSSW_14_1_X milestone Mar 19, 2024

cmsbuild added reconstruction-pending hlt-pending pending-signatures tests-pending orp-pending code-checks-pending labels Mar 19, 2024

cmsbuild added code-checks-rejected code-checks-pending and removed code-checks-pending code-checks-rejected labels Mar 19, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Mar 19, 2024

Martin-Grunewald reviewed Mar 20, 2024

View reviewed changes

HLTrigger/Egamma/plugins/HLTEgammaDoubleXGBoostCombFilter.cc Outdated Show resolved Hide resolved

Martin-Grunewald reviewed Mar 20, 2024

View reviewed changes

RecoEgamma/PhotonIdentification/plugins/PhotonXGBoostProducer.cc Show resolved Hide resolved

mmusich reviewed Mar 20, 2024

View reviewed changes

cmsbuild added hlt-integration code-checks-pending and removed code-checks-approved labels Mar 20, 2024

cmsbuild added code-checks-rejected code-checks-pending and removed code-checks-pending code-checks-rejected labels Mar 20, 2024

smorovic marked this pull request as ready for review March 21, 2024 11:10

cmsbuild added hlt-approved and removed hlt-pending labels Mar 21, 2024

smorovic mentioned this pull request Mar 21, 2024

HLT XGBoost Photon MVA and Diphoton combination filter (backport for 14_0_X) #44500

Merged

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels Mar 21, 2024

cmsbuild added orp-approved and removed orp-pending labels Mar 22, 2024

cmsbuild merged commit 6569682 into cms-sw:master Mar 22, 2024
12 checks passed

cmsbuild mentioned this pull request Mar 22, 2024

Update 2024 geometry and GEM alignment conditions in 2024 MC GTs #44497

Merged

smorovic mentioned this pull request Mar 25, 2024

XGBoost different inference results on AMD64, ARM and PPC #44542

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HLT XGBoost Photon MVA and Diphoton combination filter #44473

HLT XGBoost Photon MVA and Diphoton combination filter #44473

smorovic commented Mar 19, 2024

cmsbuild commented Mar 19, 2024 •

edited

cmsbuild commented Mar 19, 2024

cmsbuild commented Mar 19, 2024

cmsbuild commented Mar 19, 2024

mmusich commented Mar 20, 2024

cmsbuild commented Mar 20, 2024

cmsbuild commented Mar 20, 2024

mmusich commented Mar 21, 2024

mmusich commented Mar 21, 2024

smorovic commented Mar 21, 2024

jfernan2 commented Mar 21, 2024

cmsbuild commented Mar 21, 2024

antoniovilela commented Mar 22, 2024

perrotta commented Mar 23, 2024

mmusich commented Mar 23, 2024

mmusich commented Mar 23, 2024 •

edited

smorovic commented Mar 23, 2024

mmusich commented Mar 23, 2024 •

edited

smorovic commented Mar 23, 2024

mmusich commented Mar 25, 2024 •

edited

smorovic commented Mar 25, 2024

makortel commented Mar 25, 2024

mmusich commented Mar 25, 2024

smorovic commented Mar 25, 2024 •

edited

HLT XGBoost Photon MVA and Diphoton combination filter #44473

HLT XGBoost Photon MVA and Diphoton combination filter #44473

Conversation

smorovic commented Mar 19, 2024

PR description:

PR validation:

cmsbuild commented Mar 19, 2024 • edited

cmsbuild commented Mar 19, 2024

cmsbuild commented Mar 19, 2024

cmsbuild commented Mar 19, 2024

mmusich commented Mar 20, 2024

cmsbuild commented Mar 20, 2024

cmsbuild commented Mar 20, 2024

mmusich commented Mar 21, 2024

mmusich commented Mar 21, 2024

smorovic commented Mar 21, 2024

jfernan2 commented Mar 21, 2024

cmsbuild commented Mar 21, 2024

antoniovilela commented Mar 22, 2024

perrotta commented Mar 23, 2024

mmusich commented Mar 23, 2024

mmusich commented Mar 23, 2024 • edited

smorovic commented Mar 23, 2024

mmusich commented Mar 23, 2024 • edited

smorovic commented Mar 23, 2024

mmusich commented Mar 25, 2024 • edited

smorovic commented Mar 25, 2024

makortel commented Mar 25, 2024

mmusich commented Mar 25, 2024

smorovic commented Mar 25, 2024 • edited

cmsbuild commented Mar 19, 2024 •

edited

mmusich commented Mar 23, 2024 •

edited

mmusich commented Mar 23, 2024 •

edited

mmusich commented Mar 25, 2024 •

edited

smorovic commented Mar 25, 2024 •

edited