Sort pixel tracks in the SoA converter #38065

silviodonato · 2022-05-24T16:25:56Z

PR description:

This PR helps reduce the CPU-GPU differences. In a test on 10k events of 900 GeV collisions, the number of hltAK4PFJets with a CPU-GPU difference in pt greater than 0.35 GeV dropped from 5 to 2.

Moreover, having the pixel tracks sorted by pT helps the CPU-GPU comparison.
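
As a rough illustration of the idea (not the code changed by this PR), sorting a structure-of-arrays track collection by pT can be sketched as computing one permutation of indices and applying it to every column. The function name, column names, and values below are hypothetical.

```python
# Minimal sketch: sort a structure-of-arrays (SoA) track collection by
# descending pT. Column names and values are hypothetical, for illustration.

def sort_soa_by_pt(soa):
    """Return a new SoA with every column reordered by descending pT.

    The permutation is returned as well, so that an index into the
    original SoA can still be translated to a position in the output.
    """
    n = len(soa["pt"])
    # sorted() is stable: tracks with exactly equal pT keep their
    # original relative order.
    order = sorted(range(n), key=lambda i: -soa["pt"][i])
    sorted_soa = {name: [col[i] for i in order] for name, col in soa.items()}
    return sorted_soa, order


tracks = {
    "pt": [0.9, 2.4, 1.6, 2.4],
    "eta": [0.1, -1.2, 0.7, 2.0],
    "quality": [3, 4, 4, 3],
}
sorted_tracks, order = sort_soa_by_pt(tracks)
print(order)                # [1, 3, 2, 0]
print(sorted_tracks["pt"])  # [2.4, 2.4, 1.6, 0.9]
```

Apart from exact ties in pT, the output order no longer depends on the order in which the tracks were produced, which is what makes an element-by-element CPU-GPU comparison easier.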

PR validation:

I checked on 10k events that this PR does not change the CPU reconstruction, while it changes 3 jets reconstructed on GPU, reducing their differences with respect to the CPU.

root [2] Events->Scan("EventAuxiliary.event():EventAuxiliary.run():recoPFJets_hltAK4PFJets__CPU3.obj.pt():recoPFJets_hltAK4PFJets__GPU3.obj.pt():recoPFJets_hltAK4PFJets__CPU4.obj.pt():recoPFJets_hltAK4PFJets__GPU4.obj.pt()","abs(recoPFJets_hltAK4PFJets__CPU4.obj.pt()-recoPFJets_hltAK4PFJets__GPU4.obj.pt())>0.35 || abs(recoPFJets_hltAK4PFJets__CPU3.obj.pt()-recoPFJets_hltAK4PFJets__GPU3.obj.pt())>0.35")
***********************************************************************************************
*    Row   * Instance * EventAuxi * EventAuxi * recoPFJet * recoPFJet * recoPFJet * recoPFJet *
***********************************************************************************************
*     2787 *        1 * 249032616 *    346512 * 1.9907082 * 1.6183525 * 1.9907082 * 1.6183525 *
*     4010 *        2 * 251712216 *    346512 * 2.0095655 * 2.0095655 * 2.0095655 * 2.6015811 *
*     4010 *        3 * 251712216 *    346512 * 1.6040698 * 1.6040675 * 1.6040698 * 2.0095655 *
*     7289 *        2 * 260018739 *    346512 * 1.7360442 * 2.2783870 * 1.7360442 * 2.2783870 *
*     7962 *        1 * 261754240 *    346512 * 0.8782627 * 0.8782627 * 0.8782627 * 1.2651753 *
***********************************************************************************************

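CPU3 and GPU3 are after the PR; CPU4 and GPU4 are before the PR. Following the Scan expression, the four jet-pt columns are, in order, CPU3, GPU3, CPU4, and GPU4.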
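
The counts quoted in the description can be cross-checked directly from the rows of the Scan output. The short sketch below assumes the column order given by the Scan expression (CPU3, GPU3, CPU4, GPU4); the variable names are only for illustration.

```python
# Jet pt values (GeV) copied from the Scan output, one tuple per row:
# (CPU after PR, GPU after PR, CPU before PR, GPU before PR)
rows = [
    (1.9907082, 1.6183525, 1.9907082, 1.6183525),
    (2.0095655, 2.0095655, 2.0095655, 2.6015811),
    (1.6040698, 1.6040675, 1.6040698, 2.0095655),
    (1.7360442, 2.2783870, 1.7360442, 2.2783870),
    (0.8782627, 0.8782627, 0.8782627, 1.2651753),
]

THRESHOLD = 0.35  # GeV, same cut as in the Scan selection

# Jets whose CPU-GPU pt difference exceeds the threshold, after and before.
n_after = sum(abs(cpu3 - gpu3) > THRESHOLD for cpu3, gpu3, _, _ in rows)
n_before = sum(abs(cpu4 - gpu4) > THRESHOLD for _, _, cpu4, gpu4 in rows)

# The CPU values should be identical before and after the PR.
cpu_unchanged = all(cpu3 == cpu4 for cpu3, _, cpu4, _ in rows)
# Number of listed jets whose GPU pt changed with the PR.
n_gpu_changed = sum(gpu3 != gpu4 for _, gpu3, _, gpu4 in rows)

print(n_before, n_after, cpu_unchanged, n_gpu_changed)  # 5 2 True 3
```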

Backport:

I will soon open the backport PRs so that this change can be used in the HLT at P5:
#38067 #38066

cc @cms-sw/hlt-l2 @fwyzard @VinInn

cmsbuild · 2022-05-24T16:34:47Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38065/30150

This PR adds an extra 12KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38065/30150/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38065/30150/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

mmusich · 2022-05-24T19:06:15Z

@silviodonato, can you please apply the code-checks patch so that the integration tests can be started?

cmsbuild · 2022-05-24T19:25:16Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38065/30154

This PR adds an extra 12KB to repository

cmsbuild · 2022-05-24T19:25:34Z

A new Pull Request was created by @silviodonato (Silvio Donato) for master.

It involves the following packages:

RecoPixelVertexing/PixelTrackFitting (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

mmusich · 2022-05-24T19:41:48Z

enable gpu

mmusich · 2022-05-24T19:42:00Z

test parameters:

enable_tests = gpu
workflows_gpu = 11634.503, 11634.506, 11634.583, 11634.587
workflow = 11634.501
relvals_opt= -w standard,highstats,pileup,generator,extendedgen,production,upgrade,cleanedupgrade,ged

mmusich · 2022-05-24T19:42:10Z

please test

mmusich · 2022-05-24T19:49:34Z

type tracking

cmsbuild · 2022-05-25T00:13:34Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f3216b/24962/summary.html
COMMIT: a130c46
CMSSW: CMSSW_12_5_X_2022-05-24-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38065/24962/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f3216b/24962/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f3216b/24962/git-merge-result

GPU Comparison Summary

Summary:

You potentially added 20878 lines to the logs
Reco comparison results: 44 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 76363
DQMHistoTests: Total failures: 260
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 76103
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 18 edm output root files, 7 DQM output files
TriggerResults: found differences in 6 / 6 workflows

Comparison Summary

Summary:

You potentially added 365465 lines to the logs
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3666987
DQMHistoTests: Total failures: 1216
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3665749
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
Checked 212 log files, 48 edm output root files, 51 DQM output files
TriggerResults: found differences in 3 / 50 workflows

mmusich · 2022-05-25T06:32:29Z

@silviodonato

GPU: You potentially added 20878 lines to the logs
CPU: You potentially added 365465 lines to the logs

as you can see from the logs, this PR adds a flood of:

%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 179 does not exists on CPU 65535
%MSG
%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 180 does not exists on CPU 65535
%MSG
%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 181 does not exists on CPU 65535
%MSG
%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 182 does not exists on CPU 65535
%MSG
%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 183 does not exists on CPU 65535
%MSG
%MSG-w PixelVertexProducer:  PixelVertexProducerFromSoA:hltPixelVertices  24-May-2022 22:40:09 CEST Run: 1 Event: 1
oops track 184 does not exists on CPU 65535
%MSG

both on CPU and GPU workflows:

cmsbuild · 2022-05-31T17:30:23Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f3216b/25107/summary.html
COMMIT: 62af9d4
CMSSW: CMSSW_12_5_X_2022-05-31-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38065/25107/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 76363
DQMHistoTests: Total failures: 58
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 76305
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 18 edm output root files, 7 DQM output files
TriggerResults: found differences in 6 / 6 workflows

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3664317
DQMHistoTests: Total failures: 1110
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 3663184
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.004 KiB( 50 files compared)
DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
Checked 212 log files, 48 edm output root files, 51 DQM output files
TriggerResults: found differences in 3 / 50 workflows
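
For reference, the DQMHistoTests totals in the two bot reports (commits a130c46 and 62af9d4) can be checked for internal consistency and turned into failure fractions. This is only a bookkeeping sketch on the numbers quoted in the reports; the dictionary layout and the label "standard" for the non-GPU comparison are choices made here, and the fractions say nothing about the cause of the failures.

```python
# DQMHistoTests totals copied from the two bot reports.
# Key: (comparison, commit)
reports = {
    ("GPU", "a130c46"): dict(total=76363, failures=260, nulls=0,
                             successes=76103, skipped=0),
    ("GPU", "62af9d4"): dict(total=76363, failures=58, nulls=0,
                             successes=76305, skipped=0),
    ("standard", "a130c46"): dict(total=3666987, failures=1216, nulls=0,
                                  successes=3665749, skipped=22),
    ("standard", "62af9d4"): dict(total=3664317, failures=1110, nulls=1,
                                  successes=3663184, skipped=22),
}

def is_consistent(r):
    """Every compared histogram falls in exactly one outcome category."""
    return r["failures"] + r["nulls"] + r["successes"] + r["skipped"] == r["total"]

def failure_fraction(r):
    return r["failures"] / r["total"]

for (comparison, commit), r in reports.items():
    print(f"{comparison:8s} {commit}: consistent={is_consistent(r)}, "
          f"failure fraction={failure_fraction(r):.5%}")
```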

missirol · 2022-06-02T14:40:12Z

I'm trying to understand the differences in the HLT outputs from the PR tests (e.g. wfs 11634.*).
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_5_X_2022-05-31-1100+f3216b/50618/dqm-histo-comparison-summary.html

There are several (generally small) differences in both CPU and GPU wfs, and I understand this is to be expected. The differences are mainly coming from HLT (full) tracks and HLT PFlow-related quantities.
The wf which customises the HLT step to use MkFit at HLT (i.e. 11634.7) shows fewer differences compared to its standard counterpart (11634.0). Correct? Expected?
The DD4hep wf (11634.911) shows more differences wrt 11634.0, while the DDD wf (11634.914) shows no differences at all. I don't know what could explain this. Am I missing something?

Feedback from tracking experts is likely needed (and appreciated).

mmusich · 2022-06-02T15:11:29Z

@missirol

> There are several (generally small) differences in both CPU and GPU wfs, and I understand this is to be expected. The differences are mainly coming from HLT (full) tracks and HLT PFlow-related quantities.

I agree this is expected.

> The wf which customises the HLT step to use MkFit at HLT (i.e. 11634.7) shows fewer differences compared to its standard counterpart (11634.0). Correct? Expected?

The mkFit customization uses the same seeds as the default HLT workflow (customizeHLTIter0ToMkFit.py#L44), i.e. the Patatrack pixel tracks, but mkFit also performs an internal sorting and cleaning of the seeds, so part of what is implemented here is probably overridden downstream anyway (@mmasciov).

> The DD4hep wf (11634.911) shows more differences wrt 11634.0, while the DDD wf (11634.914) shows no differences at all. I don't know what could explain this. Am I missing something?

As far as I understand, the standard wf 11634.0 IS using DD4hep (but reading the simulation / reconstruction geometry from the database), while 11634.911 is in principle using the same geometry, but fetching it from XML. Looking at the past few months of cmssw integration history, the 11634.911 workflow is not very reproducible (see e.g. #35109), so I am not entirely surprised about that. As for why the DDD wf doesn't show any change, I have no clue (tagging also @cms-sw/geometry-l2).

missirol · 2022-06-02T15:18:49Z

Thanks for the info, @mmusich .

cvuosalo · 2022-06-02T19:54:49Z

@missirol @mmusich Wf 11634.911 is DD4hep using geometry from XML files. 11634.0 is DD4hep with DB geometry. 11634.914 is DDD with DB geometry. The other 11634 wfs are DD4hep with DB geometry.
I cannot explain the pattern of differences between these workflows due to the change in sort order of the pixel tracks. There are tiny numerical differences (~1e-10) among the Tracker reco geometries from DD4hep XML, DD4hep DB, and DDD DB. I am not sure how they would interact with the sorting of pixel tracks.
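
One generic way in which differences of order 1e-10 can become visible after a sort is when two tracks have nearly equal pT: a tiny shift can swap their order, and anything downstream that depends on track order then sees a different sequence. The sketch below only illustrates this mechanism with made-up numbers and a hypothetical helper; it is not a claim about what actually happens in these workflows.

```python
def order_by_pt(pts):
    """Indices of pts sorted by descending pT (stable for exact ties)."""
    return sorted(range(len(pts)), key=lambda i: -pts[i])

# Hypothetical pT values (GeV); tracks 0 and 1 are nearly degenerate.
pts_a = [1.5000000000, 1.5000000001, 0.8]
# Same tracks, with a shift of order 1e-10 applied to track 0.
pts_b = [1.5000000000 + 2e-10, 1.5000000001, 0.8]

print(order_by_pt(pts_a))  # [1, 0, 2]
print(order_by_pt(pts_b))  # [0, 1, 2]
```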

jpata · 2022-06-08T09:20:46Z

Folks, how do you want to take this forward, and with what timescale?

As far as I understand, minor differences in outputs can be expected given a different sorting, but the exact pattern of differences with respect to different geometry implementations is not understood. From the reco side, this is not a blocker.

missirol · 2022-06-08T14:38:11Z

+hlt

Sort pixel tracks in the SoA converter #38065 (comment) was not addressed, but it refers to something that was not changed by this PR.
The differences in the outputs are not fully understood by experts, and I guess that simply requires more time to be figured out.
From where I stand, I would simply accept that this PR reduces the CPU-GPU differences at HLT.
The goal is to have the backport (Sort pixel tracks in the SoA converter (12_4_X) #38066) integrated in 12_4_0.

jpata · 2022-06-09T10:39:15Z

+reconstruction

no relevant differences in reco (some in HLT)
addresses CPU-GPU differences at HLT

qliphy · 2022-06-10T00:05:16Z

@cms-sw/heterogeneous-l2 Do you have any further comment? Or can you sign this PR?

fwyzard · 2022-06-10T05:05:26Z

+1

cmsbuild · 2022-06-10T05:05:50Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

qliphy · 2022-06-10T05:37:00Z

+1

cmsbuild added this to the CMSSW_12_5_X milestone May 24, 2022

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending labels May 24, 2022

This was referenced May 24, 2022

Sort pixel tracks in the SoA converter (12_4_X) #38066

Merged

Sort pixel tracks in the SoA converter (12_3_X) #38067

Merged

cmsbuild added code-checks-rejected and removed code-checks-pending labels May 24, 2022

cmsbuild added code-checks-pending and removed code-checks-rejected labels May 24, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels May 24, 2022

cmsbuild added tests-started and removed tests-pending labels May 24, 2022

cmsbuild added the tracking label May 24, 2022

cmsbuild added tests-approved and removed tests-started labels May 25, 2022

cmsbuild removed the tests-approved label May 25, 2022

cmsbuild added tests-started and removed tests-pending labels May 31, 2022

cmsbuild added tests-approved and removed tests-started labels May 31, 2022

cmsbuild added hlt-approved and removed hlt-pending labels Jun 8, 2022

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Jun 9, 2022

cmsbuild added fully-signed heterogeneous-approved and removed pending-signatures heterogeneous-pending labels Jun 10, 2022

cmsbuild added orp-approved and removed orp-pending labels Jun 10, 2022

cmsbuild merged commit 6700f76 into cms-sw:master Jun 10, 2022

This was referenced Jun 10, 2022

Adding SONIC ParticleNet Producer to CMSSW #37964

Merged

Updated root to tip of branch v6-26-00-patches cms-sw/cmsdist#7921

Merged

silviodonato mentioned this pull request Jul 31, 2023

sort by pt only good-quality tracks #42428

Merged
