Updated/refactored T5 DNN #304
Conversation
I notice a major slowdown in TCs. Is this understood/expected?
Could you add a command-line toggle for the DNN, like we do with the caching allocator? For example, "sdl_make_tracklooper -mcd", where -d toggles the DNN, or something like that.
Somehow, by changing the SDL Makefile, the TC timing increase is fixed:

This PR (measured July 14th, 2023 around 7:30am)
Pre-DNN (measured July 14th, 2023 around 7:40am)

I have also merged NeuralNetwork.cu with NeuralNetwork.cuh, as requested in the last LST meeting. As for adding a toggle for the DNN in the SDL CLI, I have not worked on that yet. Do we need it? I do not know how trivial it is to add.
For the toggle, it should only take a couple more lines of code. See how it's done for the other toggles we have: https://github.com/SegmentLinking/TrackLooper/blob/master/bin/sdl_make_tracklooper
…h include back to Quintuplet.cuh
@VourMa or @slava77 Can I get your take on whether you think a toggle would be beneficial here? My concern is that after this gets pushed, the only way someone would know how to turn the DNN off properly would be to go to this PR and find the two relevant flags that need to be turned on when the DNN is toggled off. I think it is more likely that people will simply turn off the DNN flag without turning on the -DUSE_RPHICHI2 flag, although I'm not sure how big of a difference that flag makes. Am I missing something here @jkguiang? A toggle would handle this automatically, avoid potential confusion in the future, and make things less complicated for new users. And it should take fewer than 15 lines of code, since we already make use of other toggles, right?
VourMa left a comment
The PR looks good and the comments are generally minor. There are two open issues I see, beyond the current code changes:
- There has been a request for a command-line toggle for the usage of the DNN within the code. Even though it is not mandatory, it could be of use, since dealing with more than one compilation flag for a single operation quickly gets complicated and forgotten (hence my comment about the cleanup of the makefile, which needs a separate PR to be brought up to date).
Adding a toggle should not be that hard: first, one creates a separate make target, as here, with the proper extra compilation flag enabled. Then, this make target is turned on using the command-line argument, like this.
- In my opinion, the increase in timing is still there, and it was not at all clear to me what happened in the meanwhile such that 3 ms went down to ~0.5 ms. Could you please comment on the investigations you did to understand this? I feel it is ok to take a sub-ms hit to the timing, but we should know it, and we should know why.
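The make-target approach suggested above might look roughly like this (a hypothetical sketch only: the target names, variable names, and compile rule are illustrative, not the actual SDL Makefile):

```makefile
# Hypothetical sketch of a DNN-enabling make target; all names are made up.

# Flags that must travel together when the DNN is on:
T5DNN_FLAGS := -DUSE_RZCHI2 -DUSE_T5_DNN

# Target-specific variable: only builds invoked through `tracklooper_dnn`
# get the extra defines appended to the compile line.
tracklooper_dnn: CXXFLAGS += $(T5DNN_FLAGS)
tracklooper_dnn: tracklooper

tracklooper:
	$(CXX) $(CXXFLAGS) -o bin/doAnalysis src/*.cc
```

The command-line side of sdl_make_tracklooper would then select `tracklooper_dnn` instead of `tracklooper` when the toggle is passed, so the two flags can never be set independently by hand.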
Is the point to be sure that both are not used? (a "toggle")
I think the correct behavior is that if the DNN is being used, -DUSE_RZCHI2 and -DUSE_T5_DNN should be turned on, and if the DNN is not being used then -DUSE_RZCHI2 and -DUSE_RPHICHI2 should be turned on. At least to keep it consistent with what cuts were being done before the DNN. A toggle would do this automatically by just including a -d (for example) or not when running sdl_make_tracklooper. So like sdl_make_tracklooper -mcd and there's no need to change the makefile yourself.
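The behavior described above could be sketched in the build script like this (a minimal sketch: the option letters besides -d and the variable names are assumptions, not the actual sdl_make_tracklooper code):

```shell
#!/bin/bash
# Sketch of a -d toggle that picks a consistent flag combination automatically;
# not the real sdl_make_tracklooper logic.
USE_DNN=false
while getopts "mcd" OPT; do
  case $OPT in
    m|c) ;;               # existing toggles, handled elsewhere (illustrative)
    d)   USE_DNN=true ;;  # new: enable the T5 DNN
  esac
done

if [ "$USE_DNN" = true ]; then
  # DNN on: keep the r-z chi2 cut, replace the r-phi chi2 cut with the DNN.
  EXTRA_FLAGS="-DUSE_RZCHI2 -DUSE_T5_DNN"
else
  # DNN off: restore the pre-DNN cut combination.
  EXTRA_FLAGS="-DUSE_RZCHI2 -DUSE_RPHICHI2"
fi
echo "building with: $EXTRA_FLAGS"
```

Because the script owns the flag pairing, a user can never enable the DNN without the r-z chi-squared cut, or disable it without restoring the r-phi chi-squared cut.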
If we add a make target as above, I believe we will just need to be careful about cases where we want more than one make target to be used, as I am guessing we will want the DNN-toggling make targets to be usable alongside any of the make targets already defined. In any case, I agree adding a flag to …
You're right. Then this is probably a better example?
For this, I made only a brief mention of what I had done:
Expanding on this, I had originally put the flags in the Makefile as follows:

With the Makefile configured as above, I got the strangely long runtimes. Then, I moved the flags:

This somehow fixed the issue I was having. I have no idea why this changed the runtime.
In the "slow" configuration: I am not even sure you included the flags in the compilation. You were changing the line below:

Line 58 in c58c36e

If what I am saying makes sense, I am not even sure what you were running: no DNN, no RZCHI2 cut for the T5. I don't know what that does, or how the validations came out fine.

In the "fast" configuration: you are now correctly including the line below:

Line 58 in c58c36e

Could you please fix the above issue and double-check that we are loading the flags in all cases where we want them loaded?
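The generic failure mode being described here, independent of the exact SDL Makefile, is that -D defines only take effect if they end up in the variable that the compile rule actually expands (a hypothetical illustration; the variable and rule names are made up):

```makefile
# Hypothetical illustration of the pitfall; not the real SDL Makefile.

# "Slow"/broken placement: the defines live in a variable that no rule
# references, so nvcc never sees them and the #ifdef'd code is silently
# compiled out — the build still succeeds and runs.
T5_DEFINES := -DUSE_RZCHI2 -DUSE_T5_DNN

# Working placement: append to the variable the compile rule below expands.
CXXFLAGS += -DUSE_RZCHI2 -DUSE_T5_DNN

%.o: %.cu
	nvcc $(CXXFLAGS) -c $< -o $@
```

A quick sanity check is to inspect the actual compile command (e.g. run make with verbose output) and confirm the -D flags appear on the nvcc line.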
I believe I have addressed the comments made so far.
Finally, per the last comment:
This concerned me as well, however I had verified several times that the flags were indeed being used. I know for a fact now from (3) that the flags are being toggled, and the plots do not change. We also know empirically that the flags were being used before, because the fake rate changed significantly with respect to the baseline. Nevertheless, I do not understand how this was possible.
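One lightweight way to make this kind of verification routine (a sketch, reusing the flag names from this PR; no such helper exists in the code) is to have the binary report its compile-time configuration at startup, so a log line settles "were the flags actually included?" immediately:

```cpp
#include <string>

// Sketch: report which compile-time toggles were active in this build.
// Printing this string once at startup makes the configuration auditable.
std::string buildConfig() {
    std::string s;
#ifdef USE_RZCHI2
    s += "USE_RZCHI2:on ";
#else
    s += "USE_RZCHI2:off ";
#endif
#ifdef USE_T5_DNN
    s += "USE_T5_DNN:on ";
#else
    s += "USE_T5_DNN:off ";
#endif
#ifdef USE_RPHICHI2
    s += "USE_RPHICHI2:on";
#else
    s += "USE_RPHICHI2:off";
#endif
    return s;
}
```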
I think that during today's meeting we agreed that my explanation above makes sense from a technical point of view as to why there was a timing increase, and your solution with the toggle covers the issue I had mentioned. I noticed two comments not addressed:
Let me know if you plan to address them. Thanks!
I have moved the constants per (1). For (2), I have not used the profiler (as I have been focused on ML development), so I do not know exactly how to make the comparison. If it is a huge worry, I can just remove the extra variable declarations. Otherwise, I will not have time to look into how to use the profiler until later this week at the earliest.

I have run the profiler (per the command on the wiki) and put the files here if someone else has time to look: http://uaf-10.t2.ucsd.edu/~jguiang/dump/PR304.nsys-rep
I'd like to see ncu outputs to see code line details. |
I ran the following command on … and put the output here:
I don't see any register usage from the is_endcap variables. @slava77
👍 |
VourMa left a comment
Thank you all for following up on this. I am merging the PR.
```cpp
mdsInGPU.anchorEta[mdIndex3], // outer T3 anchor hit 4 eta (t3_0_eta)
mdsInGPU.anchorPhi[mdIndex3], // outer T3 anchor hit 4 phi (t3_0_phi)
mdsInGPU.anchorZ[mdIndex3],   // outer T3 anchor hit 3 eta (t3_0_z)
sqrtf(x3*x3 + y3*y3),         // outer T3 anchor hit 3 r (t3_0_r)
float(modulesInGPU.layers[lowerModuleIndex3] + 6*is_endcap3), // outer T3 anchor hit 3 layer (t3_0_layer)
```
@jkguiang
is this intentionally identical to L85-90?
We could have saved a bit on the number of weights and matrix operations, and avoided having the network learn identities.
Please check, and perhaps keep a TODO somewhere to clean this up.
It was not a design decision, but it is something we could indeed clean up.

Summary
Made the following updates to the existing T5 DNN (#291):
- ifdef toggles for the chi-squared and DNN cuts:
  - -DUSE_RZCHI2 toggles the r-z chi-squared cut
  - -DUSE_T5_DNN toggles the T5 DNN cuts
  - -DUSE_RPHICHI2 toggles the r-phi chi-squared cuts
  - -DUSE_RZCHI2 and -DUSE_T5_DNN are toggled (see SDL/Makefile)
- T5DNN::runInference

Timing
This PR (measured July 11th, 2023 around 9:30am PDT)
Pre-DNN (measured July 11th, 2023 around 9:40am PDT)
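The toggle structure summarized above can be sketched as follows (illustrative only: the function name, signature, and thresholds are invented for the example, not the actual Quintuplet selection code):

```cpp
// Illustrative sketch of how the compile-time toggles gate the T5 cuts;
// function name and threshold values are made up for the example.
bool passT5Cuts(float rzChiSquared, float rphiChiSquared, float dnnScore) {
#ifdef USE_RZCHI2
    if (rzChiSquared > 5.0f) return false;      // r-z chi-squared cut
#endif
#ifdef USE_RPHICHI2
    if (rphiChiSquared > 10.0f) return false;   // pre-DNN r-phi chi-squared cut
#endif
#ifdef USE_T5_DNN
    if (dnnScore < 0.5f) return false;          // DNN working point
#endif
    return true;                                // all enabled cuts passed
}
```

With this structure, the default build (-DUSE_RZCHI2 -DUSE_T5_DNN) compiles out the r-phi chi-squared branch entirely, and a no-DNN build (-DUSE_RZCHI2 -DUSE_RPHICHI2) reproduces the pre-DNN selection.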