Stabilize Fishbone #37398

Merged: 9 commits into cms-sw:master on Apr 3, 2022

@VinInn VinInn commented Mar 29, 2022

The fishbone has been made deterministic and order independent.
The main change is (of course) removing the break in the combinatorial double loop.
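
To illustrate why the `break` matters, here is a toy sketch in Python. It is not the Fishbone kernel (that code is not shown in this thread); `Cell`, `kill_duplicates` and `stop_at_first_match` are hypothetical names. With an early exit from the inner loop, the set of rejected cells depends on the order in which cells are visited, and in a parallel setting that order can change from run to run. Visiting every pair removes the dependence.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical stand-in for a cell: a name, a group of mutually aligned cells, a quality score.
Cell = namedtuple("Cell", ["name", "group", "score"])

def kill_duplicates(cells, stop_at_first_match):
    """Return the names of cells rejected by a pairwise comparison."""
    killed = set()
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            a, b = cells[i], cells[j]
            if a.group != b.group:
                continue
            # Reject the lower-scoring cell of the pair.
            killed.add(a.name if a.score < b.score else b.name)
            if stop_at_first_match:
                break  # early exit: later partners of cells[i] are never examined
    return killed

cells = [Cell("a", 0, 3.0), Cell("b", 0, 2.0), Cell("c", 0, 1.0)]

# Collect the distinct outcomes over every possible visiting order.
with_break = {frozenset(kill_duplicates(list(p), True)) for p in permutations(cells)}
without_break = {frozenset(kill_duplicates(list(p), False)) for p in permutations(cells)}

print("distinct outcomes with break:   ", len(with_break))
print("distinct outcomes without break:", len(without_break))
```

In this toy, the full double loop always keeps only the best cell of the group, whatever the input order.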

results:

Timing:

Negligible effect. "break" is the code before this PR; "no break" is this PR.

break
T4:
   846.7 ±   0.7 ev/s (4900 events)
   831.0 ±   1.5 ev/s (4900 events)
   816.4 ±   1.3 ev/s (4900 events)
   817.1 ±   1.9 ev/s (4900 events)
 --------------------
   827.8 ±  14.3 ev/s
A10:
  1126.9 ±   7.9 ev/s (4900 events)
  1100.6 ±   7.7 ev/s (4900 events)
  1126.8 ±   7.6 ev/s (4900 events)
  1103.4 ±   6.2 ev/s (4900 events)
 --------------------
  1114.4 ±  14.4 ev/s


no break
T4:
   822.0 ±   2.0 ev/s (4900 events)
   827.6 ±   1.7 ev/s (4900 events)
   822.3 ±   1.6 ev/s (4900 events)
   827.1 ±   2.1 ev/s (4900 events)
 --------------------
   824.7 ±   3.0 ev/s
A10:
  1187.3 ±   2.7 ev/s (4900 events)
  1121.4 ±   7.8 ev/s (4900 events)
  1132.4 ±   2.2 ev/s (4900 events)
  1096.5 ±   7.1 ev/s (4900 events)
 --------------------
  1134.4 ±  38.3 ev/s
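
As a cross-check on the numbers above, the summary lines are consistent with the mean and sample standard deviation of the four runs. This is an inference from the values, not something verified against the benchmark tool. The sketch below recomputes them and expresses the shift between the two variants in units of the larger spread.

```python
from statistics import mean, stdev

# Throughput in ev/s, copied from the four runs per configuration above.
runs = {
    ("break", "T4"): [846.7, 831.0, 816.4, 817.1],
    ("break", "A10"): [1126.9, 1100.6, 1126.8, 1103.4],
    ("no break", "T4"): [822.0, 827.6, 822.3, 827.1],
    ("no break", "A10"): [1187.3, 1121.4, 1132.4, 1096.5],
}

# Mean and sample standard deviation (n - 1 in the denominator) per configuration.
summary = {key: (mean(values), stdev(values)) for key, values in runs.items()}
for (variant, gpu), (m, s) in summary.items():
    print(f"{variant:8s} {gpu:3s} {m:7.1f} ± {s:4.1f} ev/s")

# Shift of the mean when the break is removed, relative to the larger spread.
shift_in_spreads = {}
for gpu in ("T4", "A10"):
    m0, s0 = summary[("break", gpu)]
    m1, s1 = summary[("no break", gpu)]
    shift_in_spreads[gpu] = abs(m1 - m0) / max(s0, s1)
    print(f"{gpu}: shift {m1 - m0:+.1f} ev/s = {shift_in_spreads[gpu]:.2f} spreads")
```

On both GPUs the shift stays below one run-to-run spread, which supports reading the effect as negligible.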

MTV

Slight increase in duplicates (as more fishbone cells are created) for loose tracks; no effect on HP (high-purity) tracks.
http://innocent.home.cern.ch/innocent/RelVal/gpuMTVstableFB/

detailed comparison of reproducibility

counters on 1000 TTBAR events

break
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 1000 11389676 88651895 3908050 1368682 857640 2662036 4878812 3195116 434118 793740 6774466 13927606
Counters Norm 1000 ||  11389.7|  88651.9|  3908.1|  2662.0|  1368.7|  857.6|  4878.8|  3195.1|  0.005|  0.009|  0.076|  0.157||
--
Counters Raw 1000 11389676 88651895 3908121 1368668 857630 2662045 4878809 3195114 434125 793727 6774466 13927643
Counters Norm 1000 ||  11389.7|  88651.9|  3908.1|  2662.0|  1368.7|  857.6|  4878.8|  3195.1|  0.005|  0.009|  0.076|  0.157||


no break
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 1000 11389676 88651895 3906666 1368589 857445 2661798 4878546 3194513 436707 794913 6774466 13928968
Counters Norm 1000 ||  11389.7|  88651.9|  3906.7|  2661.8|  1368.6|  857.4|  4878.5|  3194.5|  0.005|  0.009|  0.076|  0.157||
--
Counters Raw 1000 11389676 88651895 3906666 1368607 857464 2661798 4878546 3194513 436707 794913 6774466 13928968
Counters Norm 1000 ||  11389.7|  88651.9|  3906.7|  2661.8|  1368.6|  857.5|  4878.5|  3194.5|  0.005|  0.009|  0.076|  0.157||
--
Counters Raw 1000 11389676 88651895 3906666 1368608 857449 2661798 4878546 3194513 436707 794913 6774466 13928968
Counters Norm 1000 ||  11389.7|  88651.9|  3906.7|  2661.8|  1368.6|  857.4|  4878.5|  3194.5|  0.005|  0.009|  0.076|  0.157||
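
One way to read the raw counters above is to list which columns change between repeated runs. The Python sketch below does that for the "Counters Raw" rows, copied verbatim. It reports 0-based positions rather than counter names, because the mapping between the Raw columns and the header is not verified here. `varying_positions` is a hypothetical helper.

```python
# "Counters Raw" rows copied from the 1000-event runs above (without the label).
raw = {
    "break": [
        "1000 11389676 88651895 3908050 1368682 857640 2662036 4878812 3195116 434118 793740 6774466 13927606",
        "1000 11389676 88651895 3908121 1368668 857630 2662045 4878809 3195114 434125 793727 6774466 13927643",
    ],
    "no break": [
        "1000 11389676 88651895 3906666 1368589 857445 2661798 4878546 3194513 436707 794913 6774466 13928968",
        "1000 11389676 88651895 3906666 1368607 857464 2661798 4878546 3194513 436707 794913 6774466 13928968",
        "1000 11389676 88651895 3906666 1368608 857449 2661798 4878546 3194513 436707 794913 6774466 13928968",
    ],
}

def varying_positions(rows):
    """Return the 0-based column positions whose value differs between rows."""
    table = [row.split() for row in rows]
    return [i for i, column in enumerate(zip(*table)) if len(set(column)) > 1]

varying = {label: varying_positions(rows) for label, rows in raw.items()}
for label, positions in varying.items():
    print(f"{label:8s} {len(positions)} varying columns at positions {positions}")
```

With the break, most counters fluctuate between runs; without it, only two columns still move.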

dump of 10 events

cat doDumpTK
setenv CUDA_VISIBLE_DEVICES 1
cmsRun gpuDebug.py > & bta.log
cmsRun gpuDebug.py > & btb.log
cmsRun gpuDebug.py > & btc.log

grep TK bta.log | cut -d ' ' -f 3-100 | sort -g -r > bta.txt
grep TK btb.log | cut -d ' ' -f 3-100 | sort -g -r > btb.txt
grep TK btc.log | cut -d ' ' -f 3-100 | sort -g -r > btc.txt
tail -n 4 bt*.log
wc bt*.txt
diff bta.txt btb.txt | egrep "<|>" | wc
diff bta.txt btc.txt | egrep "<|>" | wc
diff btc.txt btb.txt | egrep "<|>" | wc
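
For readers who prefer Python to the shell pipeline, the comparison step of the script above can be approximated as follows. This is a sketch, not a replacement for the script: `track_lines` and `count_differing_lines` are hypothetical helpers, the sample log lines are invented (the real dump format is not shown in this thread), and the count corresponds only to the first column (lines) of the `wc` output.

```python
from collections import Counter

def track_lines(log_text, tag="TK"):
    """Mimic `grep TK | cut -d ' ' -f 3-100`: keep tagged lines, drop the first two fields."""
    return [
        " ".join(line.split(" ")[2:100])
        for line in log_text.splitlines()
        if tag in line
    ]

def count_differing_lines(lines_a, lines_b):
    """Count lines found in only one of the two dumps (repeated lines are counted)."""
    a, b = Counter(lines_a), Counter(lines_b)
    return sum((a - b).values()) + sum((b - a).values())

# Invented log excerpts, only to exercise the functions.
log_a = "evt 1 TK 12.5 0.3 4\nevt 1 TK 7.1 1.2 3\nsome other line\n"
log_b = "evt 1 TK 12.5 0.3 4\nevt 1 TK 7.0 1.2 3\nsome other line\n"

n_diff = count_differing_lines(track_lines(log_a), track_lines(log_b))
print(n_diff)
```

Because the comparison is done on multisets, no sorting is needed here. A track that changes between two runs contributes two lines, one on each side, as it does in the `diff` output.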
no break
[innocent@patatrack02 ttbar2021]$ source doDumpTK
==> ta.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46187 32977 16805 9177 54165 36414 4848 8821 80227 141874
Counters Norm 10 ||  12171.5|  102041.3|  4618.7|  3297.7|  1680.5|  917.7|  5416.5|  3641.4|  0.005|  0.009|  0.079|  0.139||

==> tb.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46187 32977 16806 9178 54165 36414 4848 8821 80227 141874
Counters Norm 10 ||  12171.5|  102041.3|  4618.7|  3297.7|  1680.6|  917.8|  5416.5|  3641.4|  0.005|  0.009|  0.079|  0.139||

==> tc.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46187 32977 16805 9177 54165 36414 4848 8821 80227 141874
Counters Norm 10 ||  12171.5|  102041.3|  4618.7|  3297.7|  1680.5|  917.7|  5416.5|  3641.4|  0.005|  0.009|  0.079|  0.139||
  16807  285717 1975148 ta.txt
  16808  285734 1975272 tb.txt
  16807  285717 1975169 tc.txt
  50422  857168 5925589 total
     71     700    4875
     46     454    3179
     54     513    3563

break
[innocent@patatrack02 ttbar2021]$ source doDumpTK
==> bta.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46199 32978 16813 9178 54169 36415 4813 8806 80227 141851
Counters Norm 10 ||  12171.5|  102041.3|  4619.9|  3297.8|  1681.3|  917.8|  5416.9|  3641.5|  0.005|  0.009|  0.079|  0.139||

==> btb.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46205 32979 16808 9180 54169 36421 4810 8805 80227 141848
Counters Norm 10 ||  12171.5|  102041.3|  4620.5|  3297.9|  1680.8|  918.0|  5416.9|  3642.1|  0.005|  0.009|  0.079|  0.139||

==> btc.log <==
dropped waiting message count 0
||Counters | nEvents | nHits | nCells | nTuples | nFitTacks  |  nLooseTracks  |  nGoodTracks | nUsedHits | nDupHits | nFishCells | nKilledCells | nUsedCells | nZeroTrackCells ||
Counters Raw 10 121715 1020413 46202 32978 16807 9181 54170 36420 4807 8806 80227 141849
Counters Norm 10 ||  12171.5|  102041.3|  4620.2|  3297.8|  1680.7|  918.1|  5417.0|  3642.0|  0.005|  0.009|  0.079|  0.139||
  16815  285853 1976068 bta.txt
  16810  285768 1975451 btb.txt
  16809  285751 1975331 btc.txt
  50434  857372 5926850 total
    204    1955   13446
    251    2393   16459
    248    2373   16396
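
To put the two sets of `diff | wc` results side by side, the short Python sketch below averages the differing-line counts and relates them to the size of the dumps. All numbers are copied from the output above; only the arithmetic is added.

```python
# First column (line count) of the three `diff ... | wc` results per variant.
differing = {"no break": [71, 46, 54], "break": [204, 251, 248]}
# Line counts of the three dumps compared in each case (the `wc` rows for the .txt files).
dump_lines = {"no break": [16807, 16808, 16807], "break": [16815, 16810, 16809]}

mean_diff = {}
for label in differing:
    mean_diff[label] = sum(differing[label]) / len(differing[label])
    mean_size = sum(dump_lines[label]) / len(dump_lines[label])
    share = 100 * mean_diff[label] / mean_size
    print(f"{label:8s} {mean_diff[label]:6.1f} differing lines on average, {share:.2f}% of a dump")

ratio = mean_diff["break"] / mean_diff["no break"]
print(f"break / no break: {ratio:.1f}")
```

Removing the break reduces the number of differing lines by roughly a factor of four in this 10-event sample. Each differing track can contribute two lines, one per side of the diff.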

In particular, with this PR (no break), comparing 10 events at track level shows NO difference in HP quadruplets; only one HP triplet differs.
The rest of the differences are loose triplets (and a couple of loose quadruplets).
REMINDER:
Loose tracks are in the collection ONLY to be used for seeding, or by algorithms that perform some sort of pre-cleaning
(often a simple association to trimmed vertices is enough).

Making the various ambiguity solvers deterministic and order-independent will be much harder and more costly.

@VinInn

VinInn commented Mar 29, 2022

@cmsbuild , please test

@VinInn

VinInn commented Mar 29, 2022

enable gpu

@VinInn

VinInn commented Mar 29, 2022

@fwyzard @silviodonato

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37398/29058

@cmsbuild

A new Pull Request was created by @VinInn (Vincenzo Innocente) for master.

It involves the following packages:

  • RecoPixelVertexing/PixelTriplets (reconstruction)

@jpata, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard

fwyzard commented Mar 29, 2022

+1

@fwyzard

fwyzard commented Mar 29, 2022

thanks @VinInn

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e71f31/23495/summary.html
COMMIT: d65010f
CMSSW: CMSSW_12_4_X_2022-03-29-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37398/23495/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 3479
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 16395
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3585896
  • DQMHistoTests: Total failures: 3840
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3582033
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: found differences in 1 / 47 workflows

@silviodonato

thanks @VinInn

@VinInn

VinInn commented Apr 2, 2022

Not sure who is waiting for what. I understood this was considered urgent by HLT.

@clacaputo

> not sure who is waiting what. I understood this was considered urgent by HLT.

Hello @VinInn, we are just following the incoming PRs in chronological order. If a PR is urgent, please tag it as urgent so its priority can be adjusted accordingly.

@clacaputo

+reconstruction

  • Fishbone has been made deterministic and order-independent
  • minor reco differences in 11634.506, in line with the PR content

@cmsbuild

cmsbuild commented Apr 2, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy

qliphy commented Apr 3, 2022

+1

@cmsbuild cmsbuild merged commit 6f69477 into cms-sw:master Apr 3, 2022