This repository was archived by the owner on Dec 9, 2024. It is now read-only.

@VourMa VourMa commented Jun 30, 2023

Equivalent to PR #298 but to the master branch. This PR has been tested on the V100 of lnx7188.

Timing
This PR (a5de4d4):

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      4.5      3.1      1.0      3.1      3.4      1.2      1.8      1.4      3.4      23.0      17.2+/-  3.0      26.9   explicit_cache[s=1]
   avg      7.1      5.0      1.5      5.5      4.8      1.5      3.6      2.4      6.0      37.4      28.8+/-  5.9      22.4   explicit_cache[s=2]
   avg     14.1      8.2      3.2     11.2      9.2      2.1      7.9      4.7     12.1      72.6      56.4+/-  9.9      20.1   explicit_cache[s=4]
   avg     22.8     10.2      4.5     15.0     12.5      2.5     11.8      7.3     16.9     103.4      78.1+/- 13.4      19.2   explicit_cache[s=6]
   avg     32.8     12.5      5.5     19.1     17.4      3.1     16.0      9.8     22.2     138.5     102.5+/- 20.1      19.2   explicit_cache[s=8]

Master (b498de8):

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      4.4      3.0      3.5      3.1      3.4      1.2      1.7      1.3      3.4      25.1      19.5+/-  3.2      27.3   explicit_cache[s=1]
   avg      7.2      5.2      4.3      5.9      5.3      1.5      3.7      2.3      6.7      42.1      33.4+/-  6.2      25.0   explicit_cache[s=2]
   avg     15.3      8.7      6.4     11.9     10.0      2.2      8.1      4.8     13.0      80.4      62.9+/- 10.4      22.0   explicit_cache[s=4]
   avg     24.6     11.9      8.1     17.1     15.0      2.9     13.4      7.5     18.0     118.6      91.1+/- 15.7      21.5   explicit_cache[s=6]
   avg     36.0     15.0     10.8     24.2     21.3      3.5     18.2     10.5     24.3     163.8     124.3+/- 20.0      22.1   explicit_cache[s=8]
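
To compare the two tables above column by column, a small sketch like the following can be used. The numbers are copied from the "LS" and "Event" columns; the variable and function names are only illustrative and are not part of the PR.

```python
# Per-stream-count averages copied from the two timing tables above.
streams = [1, 2, 4, 6, 8]

ls_pr = [1.0, 1.5, 3.2, 4.5, 5.5]        # "LS" column, this PR (a5de4d4)
ls_master = [3.5, 4.3, 6.4, 8.1, 10.8]   # "LS" column, master (b498de8)

event_pr = [23.0, 37.4, 72.6, 103.4, 138.5]       # "Event" column, this PR
event_master = [25.1, 42.1, 80.4, 118.6, 163.8]   # "Event" column, master


def fractional_reduction(before, after):
    """Return 1 - after/before for each pair of timings."""
    return [1.0 - a / b for b, a in zip(before, after)]


ls_reduction = fractional_reduction(ls_master, ls_pr)
event_reduction = fractional_reduction(event_master, event_pr)

for s, ls, ev in zip(streams, ls_reduction, event_reduction):
    print(f"s={s}: LS reduced by {ls:.0%}, Event reduced by {ev:.0%}")
```

With these inputs the LS reduction is about 71% for s=1 and 50% for s=4.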

The LS timing (also confirmed by the profiler timing) is cut by up to about 2/3: from 3.5 to 1.0 for s=1 and from 10.8 to 5.5 for s=8. Part of this large reduction probably comes from the lower register usage, which raises the theoretical occupancy by 20%; the achieved occupancy also goes up by 12.75%.
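
As a rough illustration of why fewer registers per thread can raise the theoretical occupancy, here is a simplified sketch. It assumes the V100 limits of 65536 32-bit registers and 64 resident warps per SM, and it ignores register allocation granularity, block size and shared-memory limits. The register counts in the loop are made up for illustration; the actual counts for the LS kernel are not quoted in this PR.

```python
# Simplified, register-limited occupancy model (assumption: V100-like SM).
REGS_PER_SM = 65536      # 32-bit registers per SM
MAX_WARPS_PER_SM = 64    # maximum resident warps per SM
WARP_SIZE = 32           # threads per warp


def theoretical_occupancy(regs_per_thread):
    """Fraction of the maximum resident warps allowed by register usage alone."""
    regs_per_warp = regs_per_thread * WARP_SIZE
    warps_by_regs = REGS_PER_SM // regs_per_warp
    return min(MAX_WARPS_PER_SM, warps_by_regs) / MAX_WARPS_PER_SM


# Hypothetical register counts, not measured values from this PR.
for regs in (64, 48, 32):
    print(f"{regs} registers/thread -> occupancy {theoretical_occupancy(regs):.2f}")
```

The profiler's occupancy report remains the reference; this model only shows the direction of the effect.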

Profiler reports
This PR (a5de4d4, in blue) compared with master (b498de8, in green); the comparison is shown in parentheses:
image

image

image

Validation plots

@VourMa VourMa requested a review from slava77 June 30, 2023 16:53

slava77 commented Jun 30, 2023

Validation plots

is there a "before" link? I find it hard to evaluate the validation plots

@VourMa VourMa changed the title from "Ls optimization early stop simplify module gap size and is tighter tilted modules" to "LS optimization: early stop, const arrays in moduleGapSize, simpler logic in isTighterTiltedModules (master version)" Jun 30, 2023

VourMa commented Jun 30, 2023

is there a "before" link? I find it hard to evaluate the validation plots

Not as of now, since the PR is purely technical and my only worry was that we didn't get anything. I can produce it though.

slava77 commented Jun 30, 2023

my only worry was that we didn't get anything.

OK, fair enough.

@slava77 slava77 merged commit 0b3ff54 into master Jun 30, 2023
@ariostas ariostas deleted the LSOptimization_earlyStop_simplifyModuleGapSizeAndIsTighterTiltedModules branch May 8, 2024 21:06