feat: add transport kernel counters and summary#565
feat: add transport kernel counters and summary#565SeverinDiederichs merged 4 commits intoapt-sim:masterfrom
Conversation
|
Can one of the admins verify this patch? |
|
@SeverinDiederichs This is another small tweak that I had been using to get more info from the DD4hep plugin. As someone just starting with this it was not always easy to determine why I got certain errors (in the original case: ran out of memory on my crappy GPU). |
SeverinDiederichs
left a comment
There was a problem hiding this comment.
Hi @wdconinc, thank you for the addition, I do think this can be quite helpful for users to navigate!
I left a few comments, please have a look
|
Rebased, addressed reviewer feedback, and applied clang-format. |
There was a problem hiding this comment.
Pull request overview
Adds host-side transport-loop counters to help debug/tune AdePT GPU transport execution and prints a summary (plus heuristic “actions”) at shutdown when verbosity/debug level is enabled.
Changes:
- Introduces a
TransportLoopCountersstruct to accumulate transport-loop stop/stall/flush/swap reasons. - Increments counters at key decision points in
TransportLoop(leak extraction triggers, hit-buffer swaps, drained events). - Prints a transport-loop summary and some tuning hints at the end of the transport loop.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/AdePT/core/AsyncAdePTTransportStruct.cuh | Adds TransportLoopCounters struct to hold host-side debug counters. |
| include/AdePT/core/AsyncAdePTTransport.cuh | Wires counter increments into TransportLoop and prints a shutdown summary with suggested tuning actions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Otherwise, the current leak queue usage is above the threshold. Transport needs to stop until the | ||
| // transfer of these leaks can start | ||
| printf("WARNING: Leak extraction blocked. Transport will stop until current extraction ends\n"); | ||
| ++counters.leakExtractionBlocked; |
There was a problem hiding this comment.
leakExtractionBlocked is incremented inside the leak-extraction steering do { ... } while (leakExtractionNeeded); loop. When extraction is already in progress and the leak queue remains above threshold, this loop can spin multiple times before extractState returns to Idle, so the counter can overcount dramatically (counting spin iterations rather than distinct stall events). Consider incrementing only once per stall episode (e.g., on transition into the blocked condition) or at most once per outer transport iteration.
There was a problem hiding this comment.
This is true, the message would stay the same, if the leak extraction is blocked, the leak slots need to be increased.
Note that we are also trying to get rid of all of this mechanism, see #540
| if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy; | ||
| if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k; | ||
| if (nextStepMightFail) ++counters.hitBufferSwapByPressure; | ||
| if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush; |
There was a problem hiding this comment.
The hit-buffer swap reason counters are not mutually exclusive: a single swap can satisfy multiple conditions (e.g., occupancy >= half capacity and >= 10000, or occupancy and event flush), so the per-reason counters can sum to more than hitBufferSwaps and may trigger misleading heuristics later. If the intent is a breakdown of primary reasons, consider choosing a priority order (else-if) or tracking an explicit “dominant reason” per swap.
| if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy; | |
| if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k; | |
| if (nextStepMightFail) ++counters.hitBufferSwapByPressure; | |
| if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush; | |
| if (swapByEventFlush) { | |
| ++counters.hitBufferSwapByEventFlush; | |
| } else if (nextStepMightFail) { | |
| ++counters.hitBufferSwapByPressure; | |
| } else if (swapByOccupancy10k) { | |
| ++counters.hitBufferSwapByOccupancy10k; | |
| } else if (swapByOccupancyHalf) { | |
| ++counters.hitBufferSwapByOccupancy; | |
| } |
There was a problem hiding this comment.
I don't think it is good to sort by priority here, as the swapByOccupancy10k is hit almost all the time.
In case there are many hits produced, we found it is most performant to try to get them out of the GPU as quickly as possible, so the CPU can process them as quickly as possible, to maximize the GPUHit throughput.
Thus, the 10k threshold is hit very easily by design. The other counters are then important to know how much pressure is on the system (e.g., nextStepMightFail), because only if that one is hit, the Hit slots must be increased.
| " or increase HitBufferThreshold (safe up to 1 - 1/CPUCapacityFactor)\n"; | ||
| if (counters.hitBufferSwapByOccupancy > 0 && counters.hitBufferSwapByOccupancy > counters.hitBufferSwapByEventFlush) | ||
| std::cerr | ||
| << " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n"; |
There was a problem hiding this comment.
The action hint mentions HitBufferFlushThreshold, but the exposed runtime parameter/command appears to be /adept/setHitBufferThreshold (HitBufferThreshold). Using a name that doesn’t exist makes the hint hard to act on; suggest updating this message to the actual knob name used by the configuration/messenger.
| << " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n"; | |
| << " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferThreshold\n"; |
There was a problem hiding this comment.
This is an inconsistency in the name of the variable and the exposed /adept/ UI command. Since DD4hep uses the C++ interface, I used that name here.
There was a problem hiding this comment.
The name is also historic and not that accurate anymore. It should be renamed to something like "CircularBufferFlushThreshold". I will do a cleanup of this when addressing #567 and then I can do a follow up in DD4hep as well.
SeverinDiederichs
left a comment
There was a problem hiding this comment.
Thank you for the addition!
| // Otherwise, the current leak queue usage is above the threshold. Transport needs to stop until the | ||
| // transfer of these leaks can start | ||
| printf("WARNING: Leak extraction blocked. Transport will stop until current extraction ends\n"); | ||
| ++counters.leakExtractionBlocked; |
There was a problem hiding this comment.
This is true, the message would stay the same, if the leak extraction is blocked, the leak slots need to be increased.
Note that we are also trying to get rid of all of this mechanism, see #540
| if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy; | ||
| if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k; | ||
| if (nextStepMightFail) ++counters.hitBufferSwapByPressure; | ||
| if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush; |
There was a problem hiding this comment.
I don't think it is good to sort by priority here, as the swapByOccupancy10k is hit almost all the time.
In case there are many hits produced, we found it is most performant to try to get them out of the GPU as quickly as possible, so the CPU can process them as quickly as possible, to maximize the GPUHit throughput.
Thus, the 10k threshold is hit very easily by design. The other counters are then important to know how much pressure is on the system (e.g., nextStepMightFail), because only if that one is hit, the Hit slots must be increased.
| " or increase HitBufferThreshold (safe up to 1 - 1/CPUCapacityFactor)\n"; | ||
| if (counters.hitBufferSwapByOccupancy > 0 && counters.hitBufferSwapByOccupancy > counters.hitBufferSwapByEventFlush) | ||
| std::cerr | ||
| << " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n"; |
There was a problem hiding this comment.
The name is also historic and not that accurate anymore. It should be renamed to something like "CircularBufferFlushThreshold". I will do a cleanup of this when addressing #567 and then I can do a follow up in DD4hep as well.
|
/run-test |
As part of an effort to determine optimal running configs, I added some kernel exit state counters that are printed at the end. It's purely for debugging/tuning, but maybe it is useful. I was looking to make it return some heuristic suggestions.
It was NOT verified that this PR
But this is unlikely to change anything.