Fix features order #29799

rovere · 2020-05-11T15:40:14Z

PR description:

In recent Upgrade HLT TDR meetings, strange distributions of PFCandidates produced from TICL have been reported (JetMET, BTV).
After some debugging, this was traced back to the way we pass features to the ML model for PID and energy regression. This PR reshuffles them into the correct order. Local tests show no remaining dependence on phi.

PR validation:

Besides checking that the strange phi behaviour is gone:

runTheMatrix.py -l limited

cmsbuild · 2020-05-11T15:40:40Z

The code-checks are being triggered in jenkins.

cmsbuild · 2020-05-11T15:47:10Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29799/15284

This PR adds an extra 28KB to repository

cmsbuild · 2020-05-11T15:47:34Z

A new Pull Request was created by @rovere (Marco Rovere) for master.

It involves the following packages:

RecoHGCal/TICL

@perrotta, @cmsbuild, @kpedro88, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @riga, @apsallid, @sobhatta, @lecriste, @hatakeyamak, @clelange this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

rovere · 2020-05-11T15:50:12Z

please test

cmsbuild · 2020-05-11T15:50:34Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/6227/console Started: 2020/05/11 17:53

cmsbuild · 2020-05-11T17:07:28Z

+1
Tested at: eda2df4
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bd1abe/6227/summary.html
CMSSW: CMSSW_11_1_X_2020-05-11-1100
SCRAM_ARCH: slc7_amd64_gcc820

cmsbuild · 2020-05-11T17:07:31Z

Comparison job queued.

cmsbuild · 2020-05-11T18:39:53Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bd1abe/6227/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 318 differences found in the comparisons
DQMHistoTests: Total files compared: 34
DQMHistoTests: Total histograms compared: 2697527
DQMHistoTests: Total failures: 1116
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 2696092
DQMHistoTests: Total skipped: 319
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
Checked 147 log files, 16 edm output root files, 34 DQM output files

kpedro88 · 2020-05-11T18:56:30Z

Comparison differences are limited to Phase 2 workflows, as expected.

@rovere I notice a few potentially surprising changes. Please confirm if these are expected:

overall slightly fewer objects reconstructed:
many fewer EM objects:
corresponding increase in HAD objects:
narrower distribution of MIP regressed energy:

rovere · 2020-05-12T09:14:45Z

Ciao @kpedro88, thanks for checking. The problem is that the comparison does not make much sense in this case: the ML inference was wrong before and, as a consequence, all the PID and energy regression results it produced were unreliable.
Put another way: if POGs report mismatches or poor performance related to PID and energy regression, we will follow them up and investigate. We are also in the process of retraining the model using the most up-to-date TICL/trackster reconstruction, which was not available at the time of the original training included in CMSSW.

fwyzard · 2020-05-12T12:57:25Z

RecoHGCal/TICL/plugins/PatternRecognitionbyCA.cc

@@ -333,9 +333,9 @@ void PatternRecognitionbyCA::energyRegressionAndID(const std::vector<reco::CaloC
        float *features = &input.tensor<float, 4>()(i, j, seenClusters[j], 0);

        // fill features


This "interface" is extremely error prone (as this bug just demonstrated).
Whoever approved this code for CMSSW clearly was not doing their job.

Looks like the original sin is #27917.

What are the plans to fix the use of an anonymous, unordered list for passing parameters to TensorFlow or other ML engines?

kpedro88 · 2020-05-12

@fwyzard this is a good point. Maybe the framework should add a data structure like edm::featureMap that enforces a schema and has interfaces to output in various ML formats.

(Though, this would only prevent one of the two problems addressed by this PR. No framework feature can specify whether or not energy should be divided by vertex multiplicity...)

fwyzard · 2020-05-12

> (Though, this would only prevent one of the two problems addressed by this PR. No framework feature can specify whether or not energy should be divided by vertex multiplicity...)

:-)

kpedro88 · 2020-05-12

Opened #29818 to follow this
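
For illustration only, the schema idea discussed in this thread could be sketched as below. This is not the CMSSW code and not the edm::featureMap proposed above, which does not exist in this PR. The names `Feature`, `FeatureRow`, and `makeRow`, as well as the three features and their order, are hypothetical. The sketch shows one way to address features by name, so that the order of the assignments cannot silently change the layout handed to the model.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical schema: the names and their order are illustrative assumptions,
// not the layout expected by the TICL model.
enum class Feature : std::size_t { Energy = 0, AbsEta, Phi, Count };

constexpr std::size_t kNumFeatures = static_cast<std::size_t>(Feature::Count);

// One row of model input, addressed by feature name instead of by position.
class FeatureRow {
public:
  float& operator[](Feature f) { return values_[static_cast<std::size_t>(f)]; }
  float operator[](Feature f) const { return values_[static_cast<std::size_t>(f)]; }

  // Copy the row into a flat buffer, e.g. a slice of an input tensor.
  // The layout is fixed by the enum, not by the order of assignments.
  void copyTo(float* out) const {
    assert(out != nullptr);
    for (std::size_t i = 0; i < kNumFeatures; ++i) {
      out[i] = values_[i];
    }
  }

private:
  std::array<float, kNumFeatures> values_{};  // zero-initialised
};

// Hypothetical helper: build a row from per-cluster quantities.
inline FeatureRow makeRow(float energy, float eta, float phi) {
  FeatureRow row;
  row[Feature::Phi] = phi;  // deliberately filled out of order
  row[Feature::Energy] = energy;
  row[Feature::AbsEta] = std::abs(eta);
  return row;
}
```

As noted above, a structure like this would only guard against misordered features. Whether a value such as the energy should be scaled before being filled is a modelling decision that no schema can check.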

kpedro88 · 2020-05-12T14:12:55Z

@rovere I take the point that the "old" results are just wrong. However, can you remark if the "new" results seem correct?

rovere · 2020-05-12T16:17:47Z

@kpedro88 I can confirm that using single-pion and single-electron guns into HGCAL the network is able to assign the proper ID to the reconstructed Tracksters. How this will scale on more complex events with PU is something we still have to verify.
The regressed energy is a little off right now, but that is expected: the reconstruction has changed since the model was last trained. Users have been warned not to use it for the time being. A retraining of the model is needed and will come in the coming weeks.

kpedro88 · 2020-05-12T22:17:59Z

+upgrade

perrotta · 2020-05-13T09:20:53Z

+1

Correct order of TensorFlow input features is restored
Jenkins tests pass and show differences where expected
A retraining of the model will be needed, but this is independent from this pull request

cmsbuild · 2020-05-13T09:21:39Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2)

silviodonato · 2020-05-13T20:04:04Z

+1

rovere added 3 commits May 11, 2020 13:29

Fix order of features in input to inference

80bda12

Fix energy scale with fractions at inference

da02f90

Run code-format

eda2df4

cmsbuild added this to the CMSSW_11_1_X milestone May 11, 2020

cmsbuild added code-checks-pending comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending upgrade-pending labels May 11, 2020

cmsbuild added code-checks-approved and removed code-checks-pending labels May 11, 2020

cmsbuild added tests-started and removed tests-pending labels May 11, 2020

cmsbuild added tests-approved and removed tests-started labels May 11, 2020

cmsbuild added comparison-available and removed comparison-pending labels May 11, 2020

fwyzard reviewed May 12, 2020

kpedro88 mentioned this pull request May 12, 2020

Add framework class to enforce schemas for ML feature lists #29818

Open

cmsbuild added upgrade-approved and removed upgrade-pending labels May 12, 2020

cmsbuild added fully-signed reconstruction-approved and removed pending-signatures reconstruction-pending labels May 13, 2020

cmsbuild added orp-approved and removed orp-pending labels May 13, 2020

cmsbuild merged commit e1b8483 into cms-sw:master May 13, 2020
