Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecoTracker/MkFitCore: factorize loops to allow gcc -msse3 to vectorize loops #37868

Merged
merged 5 commits into from
May 30, 2022

Conversation

gartung
Copy link
Member

@gartung gartung commented May 9, 2022

See discussion in trackreco#93

Split large loops in PropagationMPlex into smaller loops, converting loop temporary floats into function temporary arrays of floats. This allows gcc -msse3 to convert loops with floating point operations into simd instructions.

Under EL8 use system provided mvec which implements vectorized trig functions. Under sl7, Intel's svml library must be linked to get the same performance.

Below is quoted the data from running the stand alone mkFit executable. This shows the improvements from this PR and the improvements from using AVX and AVX2 instructions sets as well as the improvements from using libmvec on el8.

No changes on sl7 gcc -msse3

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 29.63152
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 29.52981
Total event loop time 46.68341 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

No changes on sl7 gcc -mavx

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 31.67810
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 31.56959
Total event loop time 48.90405 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

No changes on sl7 gcc -mavx2 -mfma

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 29.00773
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 28.90690
Total event loop time 45.81278 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on sl7 gcc -msse3 -ffastmath -mveclibabi=svml -lsvml

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.65558
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.58022
Total event loop time 37.01423 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on sl7 gcc -mavx -ffastmath -mveclibabi=svml -lsvml

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 18.64836
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 18.58693
Total event loop time 33.58694 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on sl7 gcc -mavx2 -mfma

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 24.14937
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 24.06449
Total event loop time 40.26953 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

With changes on sl7 gcc -mavx2 -mfma -ffastmath

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.65153
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.58067
Total event loop time 36.93315 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

With changes on sl7 gcc -mavx2 -mfma -ffastmath -mveclibabi=svml -lsvml

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.75612
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.69385
Total event loop time 32.37217 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on el8 gcc -mavx2 -mfma -ffastmath

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.16427
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.10306
Total event loop time 31.63744 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

With changes on el8 gcc -mavx -ffastmath

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.10759
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 17.04684
Total event loop time 31.47563 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on el8 gcc -msse3 -ffastmath

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.45296
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 21.37855
Total event loop time 36.65969 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0
================================================================

With changes on el8 gcc -msse3 no fast-math no attribute(math-errno)

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 23.91457
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 23.83136
Total event loop time 39.58513 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

No changes on slc7 icc -mAVX2 which adds -lsvml

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 14.38005
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 14.33141
Total event loop time 28.65396 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

With changes on slc7 icc -mAVX2 which adds -lsvml

================================================================
=== TOTAL for 100 events
================================================================
Total Matriplex fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 15.08185
Total event > 1 fit = 0.00000  --- Build  BHMX = 0.00000  STDMX = 0.00000  CEMX = 0.00000  MIMI = 15.03019
Total event loop time 29.54984 simtracks 787323 seedtracks 2933807 builtcands 0 maxhits 4651 on lay 0

@cmsbuild
Copy link
Contributor

cmsbuild commented May 9, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37868/29816

ERROR: Build errors found during clang-tidy run.

@gartung gartung force-pushed the gartung-mkFit-sse3-vectorize branch 7 times, most recently from a0fab6a to 4fe14c0 Compare May 9, 2022 16:55
@cmsbuild
Copy link
Contributor

cmsbuild commented May 9, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37868/29831

ERROR: Build errors found during clang-tidy run.

RecoTracker/MkFitCore/src/PropagationMPlex.icc:125:14: error: variable-sized object may not be initialized [clang-diagnostic-error]
  float dDdx[nmax-nmin]={0.};
             ^
RecoTracker/MkFitCore/src/PropagationMPlex.icc:126:14: error: variable-sized object may not be initialized [clang-diagnostic-error]
  float dDdy[nmax-nmin]={0.};
             ^
RecoTracker/MkFitCore/src/PropagationMPlex.icc:127:16: error: variable-sized object may not be initialized [clang-diagnostic-error]
  float dDdipt[nmax-nmin]={0.};
               ^
RecoTracker/MkFitCore/src/PropagationMPlex.icc:128:16: error: variable-sized object may not be initialized [clang-diagnostic-error]
  float dDdphi[nmax-nmin]={0.};
               ^
Suppressed 377 warnings (376 in non-user code, 1 with check filters).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

cmsbuild commented May 9, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37868/29833

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@gartung
Copy link
Member Author

gartung commented May 25, 2022

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5bd708/24964/summary.html
COMMIT: 3f9b1e1
CMSSW: CMSSW_12_5_X_2022-05-24-2300/el8_amd64_gcc10
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37868/24964/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 57 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3650985
  • DQMHistoTests: Total failures: 1108
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3649855
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor

+core

@clacaputo
Copy link
Contributor

+reconstruction

  • re-sign

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

  • Technical PR
  • The differences, very small indeed, had been evaluated as being at the numerical level by the reviewer

@cmsbuild cmsbuild merged commit cfaa87f into cms-sw:master May 30, 2022
@mmusich
Copy link
Contributor

mmusich commented May 30, 2022

+1
Technical PR
The differences, very small indeed, had been evaluated as being at the numerical level by the reviewer.

Does this qualify for a backport on the production release for high intensity pp collisions?

@perrotta
Copy link
Contributor

+1
Technical PR
The differences, very small indeed, had been evaluated as being at the numerical level by the reviewer.

Does this qualify for a backport on the production release for high intensity pp collisions?

At the first order I would say yes.
Then. if there is a slighlty larger stat multitrack validation with it, that could help removing all doubts

@gartung
Copy link
Member Author

gartung commented Jun 6, 2022

please backport

@VinInn
Copy link
Contributor

VinInn commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet