This repository was archived by the owner on Dec 9, 2024. It is now read-only.

Add T5 DNN #291

Merged
VourMa merged 8 commits into SegmentLinking:master from jkguiang:t5-dnn
May 26, 2023

Conversation

@jkguiang (Contributor) commented May 23, 2023

Summary

Added a hard-coded T5 DNN to the T5-building algorithm in place of the chi-squared cuts (see these slides for more info). The DNN gives a significant reduction in the T5 fake rate (particularly in the barrel) with only a slight change to the efficiency; compare the efficiency plots on slide 8 (original master) and slide 9 (this PR) in the aforementioned slides. Finally, the current implementation of the DNN does significantly affect the overall runtime of the T5-building step (see the "Timing" section below).

Timing

This PR

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      1.6      2.0      1.4      3.3      6.1      1.4      1.3      1.0      2.1      20.3      17.3+/-  3.9      21.9   explicit_cache[s=1]
   avg      4.7      3.7      2.0      5.4      8.1      1.5      3.0      1.9      3.6      33.8      27.7+/-  6.4      19.5   explicit_cache[s=2]
   avg      5.5      5.4      3.3     10.5     12.8      1.7      7.0      3.3      6.3      55.8      48.5+/- 10.2      15.1   explicit_cache[s=4]
   avg     11.6      7.3      4.7     14.9     17.8      2.2     11.0      5.4      9.6      84.5      70.7+/- 15.8      15.1   explicit_cache[s=6]
   avg     17.1      8.7      5.9     20.0     24.0      2.8     15.1      7.5     12.9     114.1      94.2+/- 18.8      15.1   explicit_cache[s=8]

Original master

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      1.6      2.0      1.5      3.4      3.4      1.4      1.4      1.0      2.2      17.9      14.9+/-  3.5      19.1   explicit_cache[s=1]
   avg      3.4      3.1      1.8      4.9      4.6      1.5      2.7      1.7      3.5      27.2      22.3+/-  5.9      16.2   explicit_cache[s=2]
   avg      5.8      5.1      3.0      8.9      8.3      1.8      5.8      3.0      6.3      47.9      40.4+/-  7.8      12.9   explicit_cache[s=4]
   avg     10.8      6.3      3.9     12.2     11.6      2.1      8.7      4.7      9.1      69.5      56.5+/- 12.2      12.6   explicit_cache[s=6]
   avg     15.2      8.3      5.3     16.6     16.6      2.3     12.7      6.1     12.1      95.1      77.6+/- 11.1      12.3   explicit_cache[s=8]

@VourMa (Contributor) commented May 23, 2023

Quick comment by just scrolling through the changes: could we add the hardcoded arrays in SDL/Constants.cu? I feel like it is much more appropriate to include them there rather than injecting them in the kernel.

Comment thread SDL/Quintuplet.cu Outdated

// (0): Linear(in_features=38, out_features=32, bias=True) => x = x*W_T + b
float bias_0[32] = {
-0.453155100345611572265625000000,-3.414395570755004882812500000000, 0.461477220058441162109375000000,-0.154097318649291992187500000000,-0.470577239990234375000000000000, 2.228851079940795898437500000000,-0.163815096020698547363281250000,-2.296737670898437500000000000000,-0.098085217177867889404296875000, 2.589857339859008789062500000000, 2.105398178100585937500000000000,-0.525056004524230957031250000000, 1.790417790412902832031250000000,-2.162935256958007812500000000000, 0.012643844820559024810791015625,-1.595005750656127929687500000000, 0.222240477800369262695312500000,-3.097779273986816406250000000000,-0.748212277889251708984375000000, 1.027869462966918945312500000000,-0.504925668239593505859375000000, 0.079979300498962402343750000000,-0.680445253849029541015625000000, 1.331984400749206542968750000000, 2.027832508087158203125000000000,-0.063643321394920349121093750000,-0.304876327514648437500000000000,-0.020376743748784065246582031250, 2.847964286804199218750000000000,-1.884062528610229492187500000000, 2.168398141860961914062500000000,-2.673130989074707031250000000000
Contributor:

For the longer term, we should see how to load these from a file.
For this PR, I'd suggest formatting the values in float precision to have more compact code.

For the hackathon, I'm curious how much of this can be sped up by using half-precision and/or tensor operations
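The longer-term suggestion above — loading the weights from a file instead of compiling them in — could look roughly like this on the host side. The file name and the one-value-per-line layout are assumptions for illustration, not part of the PR:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch of the "load from a file" idea: read a flat list
// of whitespace-separated float weights at startup, then copy them to
// the device. Layout and file name are assumptions.
std::vector<float> loadWeights(const std::string& path) {
    std::vector<float> w;
    std::ifstream in(path);
    for (float v; in >> v;) w.push_back(v);
    return w;
}
```

A runtime load like this would also let the network be retrained without recompiling the kernels.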

@GNiendorf (Member):

> Quick comment by just scrolling through the changes: could we add the hardcoded arrays in SDL/Constants.cu? I feel like it is much more appropriate to include them there rather than injecting them in the kernel.

Could we keep them in their own separate (new) file maybe? Would make my life a lot easier when rebasing, and would probably be cleaner <3

@jkguiang (Contributor, Author):

Just saw Gavin's comment, I will add another .cu file to contain the hard-coded weights and biases.
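Splitting the arrays into their own translation unit might look like the sketch below; the namespace and identifier names are assumptions modeled on the SDL/NeuralNetworkWeights.cu file this thread mentions, and the four values are just the first entries of the quoted bias_0 array shortened to float precision, as suggested above:

```cpp
// Sketch of the agreed refactor: define the weights in one translation
// unit and expose them through a declaration that kernels can include.
// All names here are hypothetical.
namespace t5dnn {
// Declaration (would live in a shared header):
extern const float kBias0[4];
// Definition (would live in the new NeuralNetworkWeights.cu):
const float kBias0[4] = {-0.4531551f, -3.4143956f, 0.4614772f, -0.1540973f};
}  // namespace t5dnn
```

Keeping the definitions in a single new file also achieves Gavin's goal: rebases touch one self-contained file instead of the kernel bodies.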

Comment thread SDL/NeuralNetworkWeights.cu Outdated
Comment thread SDL/Quintuplet.cu Outdated
Comment thread SDL/Quintuplet.cu Outdated
Comment thread SDL/Quintuplet.cu Outdated
mdsInGPU.anchorZ[fifthMDIndex], // outer (hit 5) t3_4_z
sqrtf(x5*x5 + y5*y5), // outer (hit 5) t3_4_r
float(modulesInGPU.layers[lowerModuleIndex5] + 6*is_endcap5), // outer (hit 5) t3_4_layer
log10((innerRadius+outerRadius)*3.8f*1.602f/(2*100*5.39f)), // t5_pt
Contributor:

What are those numbers? If they are constants, could they be added in the proper file and included here? If they are some temporary, magic numbers, could they be added in this file with a descriptive name (and possibly a descriptive comment), so that one can follow the calculations?
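For context, one plausible reading of the quoted t5_pt expression — my interpretation, not confirmed in the thread — is pT = k·B·R: 3.8 is the CMS solenoid field in tesla, (innerRadius+outerRadius)/(2·100) is the average circle radius converted from cm to m, and 1.602/5.39 ≈ 0.297 approximates the usual ~0.3 GeV/(T·m) momentum conversion factor. A sketch with those constants named:

```cpp
#include <cmath>

// Hedged reading of the quoted t5_pt expression (an interpretation, not
// confirmed by the PR): log10 of pT = k * B * R, with R the average of
// the inner/outer radii converted from cm to m.
float t5LogPt(float innerRadiusCm, float outerRadiusCm) {
    const float kBField = 3.8f;            // magnetic field [T]
    const float kConv = 1.602f / 5.39f;    // ~0.3 GeV/(T*m) conversion
    float avgRadiusM = (innerRadiusCm + outerRadiusCm) / (2.0f * 100.0f);
    return std::log10(kConv * kBField * avgRadiusM);  // log10(pT [GeV])
}
```

This is algebraically identical to the quoted one-liner, so naming the constants would answer the reviewer's question without changing behavior.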

Comment thread SDL/Quintuplet.cu
Comment thread code/core/write_sdl_ntuple.cc Outdated
Comment thread SDL/Quintuplet.cu
@VourMa (Contributor) left a review comment:

Thanks for the updates! PR looks good to me, so I am approving.

I saw that the validation is included in the slides, but may I ask you to add a summary comment about the performance in the PR description, so that one doesn't have to go through the full set of slides to understand the changes?

Same goes for the timing, which could also be improved in the slides by removing the "explicit"-only results (which we don't usually look at), mentioning which timing corresponds to this PR and which to master, and explaining which numbers are used in the ΔT5 column. Thanks!

@VourMa VourMa merged commit 3c0ba9a into SegmentLinking:master May 26, 2023
@GNiendorf (Member) commented May 26, 2023

Is the SDL::passChiSquaredConstraint function no longer used now that this PR is merged in? If so, should we remove it from the code? Along those lines, is there any other code that can be removed (e.g. variables that were only used by the previous cuts and no longer need to be calculated)?

@slava77 (Contributor) commented May 26, 2023

> Is the SDL::passChiSquaredConstraint function no longer used now that this PR is merged in? If so, should we remove it from the code? Along those lines, is there any other code that can be removed from this (maybe some variables that no longer need to be calculated that were used for the previous code)?

There was also a point about getting rid of all the regression computation, since it's not really used (or not supposed to be).

@VourMa (Contributor) commented May 29, 2023

Timing on lnx7188 (V100) - DNN timing is not really worse here...

Before DNN:

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      4.4      3.2      4.1      3.2      3.3      1.2      1.9      1.4      3.6      26.1      20.5+/-  3.3      29.5   explicit_cache[s=1]
   avg      7.5      5.3      4.7      6.0      5.1      1.5      3.7      2.4      7.1      43.3      34.3+/-  6.4      25.8   explicit_cache[s=2]
   avg     16.3      8.6      5.9     12.8     10.0      2.1      8.3      4.9     13.5      82.4      63.9+/-  9.9      22.0   explicit_cache[s=4]
   avg     24.7     11.5      7.9     17.8     14.5      2.7     12.4      7.2     18.2     116.9      89.6+/- 17.7      22.0   explicit_cache[s=6]
   avg     37.0     15.2     10.2     22.0     20.3      3.5     18.1     10.3     24.8     161.3     120.8+/- 21.8      22.2   explicit_cache[s=8]

After DNN:

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      4.4      3.1      3.4      3.0      3.5      1.2      1.8      1.4      3.5      25.4      19.9+/-  3.5      29.2   explicit_cache[s=1]
   avg      7.0      4.8      4.0      5.6      4.9      1.4      3.4      2.1      6.1      39.3      30.9+/-  6.8      26.1   explicit_cache[s=2]
   avg     15.8      9.1      7.8     14.0     11.8      2.1      8.5      4.9     13.0      87.0      69.0+/- 11.0      23.5   explicit_cache[s=4]
   avg     24.7     11.8      8.0     17.5     14.1      2.7     13.5      7.6     17.5     117.5      90.1+/- 19.4      22.4   explicit_cache[s=6]
   avg     37.0     14.9      9.7     23.2     20.5      3.7     18.0     10.6     23.9     161.6     120.8+/- 22.0      21.9   explicit_cache[s=8]

@VourMa (Contributor) commented May 29, 2023

DNN performance within CMSSW:
LST w/o DNN -> Blue
LST w/ DNN -> Red

PU200 sample: https://uaf-10.t2.ucsd.edu/~evourlio/SDL/DNNv1InCMSSW_LSTvsLSTDNNOnly/
Cube5 sample: https://uaf-10.t2.ucsd.edu/~evourlio/SDL/DNNv1InCMSSW_cube5_LSTvsLSTDNNOnly/

@slava77 (Contributor) commented May 30, 2023

> Timing on lnx7188 (V100) - DNN timing is not really worse here...

could it be that the code updates during the review are also responsible?

@jkguiang (Contributor, Author):

I re-ran sdl_timing PU200 today on cgpu-1 with a fresh clone of the current master branch:

Total Timing Summary
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Event      Short           Rate
   avg      1.7      2.0      1.5      3.4      3.7      1.4      1.4      1.0      2.1      18.1      15.1+/-  3.5      19.7   explicit_cache[s=1]
   avg      2.6      3.5      1.8      4.9      5.0      1.5      2.8      1.6      3.5      27.2      23.1+/-  4.9      14.3   explicit_cache[s=2]
   avg      4.9      5.2      3.1      9.3      8.9      1.7      6.4      3.2      6.1      48.8      42.1+/-  7.6      12.7   explicit_cache[s=4]
   avg      9.7      6.7      4.3     12.8     12.8      2.2      9.6      4.8      9.2      72.1      60.1+/- 12.0      12.7   explicit_cache[s=6]
   avg     16.9      9.1      6.3     18.5     18.2      2.6     13.3      6.7     12.1     103.6      84.1+/- 16.6      13.7   explicit_cache[s=8]

These seem more consistent with what Manos sees on the Cornell machines. I'm not sure what is different from when I ran the timing last time, but I'm not complaining 😄

@VourMa (Contributor) commented May 30, 2023

I don't see any smoking gun in the updates for the decrease in the timing. @jkguiang could it be that you measured the timing for the master and the DNN-version at very different times?
