Add T5 DNN #291
Conversation
Quick comment from just scrolling through the changes: could we add the hardcoded arrays in a separate file?
```cuda
// (0): Linear(in_features=38, out_features=32, bias=True) => x = x*W_T + b
float bias_0[32] = {
    -0.453155100345611572265625000000, -3.414395570755004882812500000000,
     0.461477220058441162109375000000, -0.154097318649291992187500000000,
    -0.470577239990234375000000000000,  2.228851079940795898437500000000,
    -0.163815096020698547363281250000, -2.296737670898437500000000000000,
    -0.098085217177867889404296875000,  2.589857339859008789062500000000,
     2.105398178100585937500000000000, -0.525056004524230957031250000000,
     1.790417790412902832031250000000, -2.162935256958007812500000000000,
     0.012643844820559024810791015625, -1.595005750656127929687500000000,
     0.222240477800369262695312500000, -3.097779273986816406250000000000,
    -0.748212277889251708984375000000,  1.027869462966918945312500000000,
    -0.504925668239593505859375000000,  0.079979300498962402343750000000,
    -0.680445253849029541015625000000,  1.331984400749206542968750000000,
     2.027832508087158203125000000000, -0.063643321394920349121093750000,
    -0.304876327514648437500000000000, -0.020376743748784065246582031250,
     2.847964286804199218750000000000, -1.884062528610229492187500000000,
     2.168398141860961914062500000000, -2.673130989074707031250000000000
```
For the longer term, we should see how to load these from a file.
For this PR, I'd suggest formatting the values at float precision to make the code more compact.
For the hackathon, I'm curious how much of this can be sped up by using half precision and/or tensor operations.
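To illustrate the half-precision idea, here is a minimal sketch (nothing here is from the PR itself; the function names and layout are hypothetical, and the accumulation is deliberately kept in float to limit precision loss):

```cuda
#include <cuda_fp16.h>

// Hypothetical sketch: store layer parameters as __half, accumulate in float.
__device__ float denseNodeHalf(const float* x, const __half* w_row,
                               __half b, int n) {
    float acc = __half2float(b);
    for (int i = 0; i < n; ++i)
        acc += x[i] * __half2float(w_row[i]);
    return acc;
}

// One-time conversion of the existing float arrays (assumed layout).
void convertToHalf(const float* src, __half* dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = __float2half(src[i]);
}
```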
Could we keep them in their own separate (new) file, maybe? That would make my life a lot easier when rebasing, and would probably be cleaner <3
Just saw Gavin's comment; I will add another .cu file to contain the hard-coded weights and biases.
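For reference, a minimal sketch of what such a file could look like (the file name, namespace, and qualifiers are my assumptions, not the actual code), with the values formatted at float precision as suggested above:

```cuda
// T5DNNWeights.cu (hypothetical name): hard-coded T5 DNN parameters,
// kept separate from the kernel code to ease rebasing.
namespace T5DNN {
    // (0): Linear(in_features=38, out_features=32, bias=True) => x = x*W_T + b
    __device__ const float bias_0[32] = {
        -0.4531551f, -3.4143956f,  0.4614772f, -0.1540973f,
        -0.4705772f,  2.2288511f, -0.1638151f, -2.2967377f,
        // ... remaining 24 values at the same precision
    };
}
```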
```cuda
mdsInGPU.anchorZ[fifthMDIndex],                                // outer (hit 5) t3_4_z
sqrtf(x5*x5 + y5*y5),                                          // outer (hit 5) t3_4_r
float(modulesInGPU.layers[lowerModuleIndex5] + 6*is_endcap5),  // outer (hit 5) t3_4_layer
log10((innerRadius+outerRadius)*3.8f*1.602f/(2*100*5.39f)),    // t5_pt
```
What are those numbers? If they are constants, could they be added to the proper file and included here? If they are temporary magic numbers, could they be given descriptive names in this file (and possibly a descriptive comment), so that one can follow the calculations?
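To illustrate, a possible naming sketch. The names and the physical interpretations below are my assumptions and would need to be confirmed by the authors; the arithmetic is equivalent to the original expression:

```cuda
constexpr float kMagneticField = 3.8f;           // solenoid B field [T] (assumed)
constexpr float kPtConversion  = 1.602f / 5.39f; // ~0.297, ~ e*c factor in GeV/(T*m) (assumed)
constexpr float kCmToM         = 0.01f;          // radii appear to be in cm (assumed)

// pt ~ 0.3 * B[T] * R[m], with R the average of the inner/outer circle-fit radii
float avgRadius    = 0.5f * (innerRadius + outerRadius);            // [cm]
float t5_pt        = avgRadius * kCmToM * kMagneticField * kPtConversion;
float t5_log10_pt  = log10f(t5_pt);                                 // DNN input feature
```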
VourMa left a comment:
Thanks for the updates! PR looks good to me, so I am approving.
I saw that the validation is included in the slides, but may I ask you to add a summary comment about the performance in the PR description, so that one doesn't have to go through the full set of slides to understand the changes?
Same goes for the timing, which could also be improved in the slides by removing the "explicit"-only results (which we don't usually look at), mentioning which timing corresponds to this PR and which to the master, and explaining which numbers are used in the ΔT5 column. Thanks!
Is the SDL::passChiSquaredConstraint function no longer used now that this PR is merged in? If so, should we remove it from the code? Along those lines, is there any other code that can be removed (e.g., variables that were only calculated for the previous cuts and no longer need to be)?
There was also a point about getting rid of all the regression computation, since it's not really used (or not supposed to be).
Timing on lnx7188 (V100) - DNN timing is not really worse here...
Before DNN:
After DNN:
DNN performance within CMSSW, PU200 sample: https://uaf-10.t2.ucsd.edu/~evourlio/SDL/DNNv1InCMSSW_LSTvsLSTDNNOnly/
Could it be that the code updates during the review are also responsible?
I re-ran the timing. These seem more consistent with what Manos sees on the Cornell machines. I'm not sure what is different from when I ran the timing last time, but I'm not complaining 😄
I don't see any smoking gun in the updates for the decrease in the timing. @jkguiang, could it be that you measured the timing for the master and the DNN version at very different times?
Summary
Added a hard-coded T5 DNN to the T5-building algorithm in place of the chi-squared cuts (see these slides for more info). The DNN gives a significant reduction in the T5 fake rate (particularly in the barrel) with only a slight change in efficiency, as can be seen by comparing the efficiency plots on slide 8 (original master) and slide 9 (this PR) of the aforementioned slides. Finally, the current implementation of the DNN does significantly affect the overall runtime of the T5-building step (see the "Timing" section below).
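For context, the hard-coded DNN amounts to a plain dense forward pass per layer, x = x*W_T + b, as noted in the comments alongside the weights. A minimal sketch of the per-layer pattern (illustrative names only; the activation shown is an assumption, not taken from the PR):

```cuda
// Illustrative only: one fully connected layer of the hard-coded DNN,
// matching the Linear(in_features=38, out_features=32) shape quoted above.
__device__ void denseLayerRelu(const float x[38], float y[32],
                               const float w[32][38], const float b[32]) {
    for (int j = 0; j < 32; ++j) {
        float acc = b[j];
        for (int i = 0; i < 38; ++i)
            acc += x[i] * w[j][i];
        y[j] = fmaxf(acc, 0.f);  // ReLU (assumed activation)
    }
}
```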
Timing
This PR
Original master