Skip to content

feat: add transport kernel counters and summary#565

Merged
SeverinDiederichs merged 4 commits intoapt-sim:masterfrom
wdconinc:counters
Apr 17, 2026
Merged

feat: add transport kernel counters and summary#565
SeverinDiederichs merged 4 commits intoapt-sim:masterfrom
wdconinc:counters

Conversation

@wdconinc
Copy link
Copy Markdown
Contributor

As part of an effort to determine optimal running configs, I added some kernel exit state counters that are printed at the end. It's purely for debugging/tuning, but maybe it is useful. I was looking to make it return some heuristic suggestions.

It was NOT verified that this PR

  • [] Changes physics results
  • [] Does not change physics results

But this is unlikely to change anything.

@phsft-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@wdconinc
Copy link
Copy Markdown
Contributor Author

@SeverinDiederichs This is another small tweak that I had been using to get more info from the DD4hep plugin. As someone just starting with this it was not always easy to determine why I got certain errors (in the original case: ran out of memory on my crappy GPU).

Copy link
Copy Markdown
Collaborator

@SeverinDiederichs SeverinDiederichs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wdconinc, thank you for the addition, I do think this can be quite helpful for users to navigate!
I left a few comments, please have a look

Comment thread include/AdePT/core/AsyncAdePTTransport.cuh Outdated
Comment thread include/AdePT/core/AsyncAdePTTransport.cuh Outdated
@wdconinc
Copy link
Copy Markdown
Contributor Author

Rebased, addressed reviewer feedback, and applied clang-format.

@wdconinc wdconinc marked this pull request as ready for review April 16, 2026 17:33
Copilot AI review requested due to automatic review settings April 16, 2026 17:33
@wdconinc wdconinc changed the title feat: add transport kernel counters [WIP] feat: add transport kernel counters and summary Apr 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds host-side transport-loop counters to help debug/tune AdePT GPU transport execution and prints a summary (plus heuristic “actions”) at shutdown when verbosity/debug level is enabled.

Changes:

  • Introduces a TransportLoopCounters struct to accumulate transport-loop stop/stall/flush/swap reasons.
  • Increments counters at key decision points in TransportLoop (leak extraction triggers, hit-buffer swaps, drained events).
  • Prints a transport-loop summary and some tuning hints at the end of the transport loop.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
include/AdePT/core/AsyncAdePTTransportStruct.cuh Adds TransportLoopCounters struct to hold host-side debug counters.
include/AdePT/core/AsyncAdePTTransport.cuh Wires counter increments into TransportLoop and prints a shutdown summary with suggested tuning actions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1371 to +1374
// Otherwise, the current leak queue usage is above the threshold. Transport needs to stop until the
// transfer of these leaks can start
printf("WARNING: Leak extraction blocked. Transport will stop until current extraction ends\n");
++counters.leakExtractionBlocked;
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leakExtractionBlocked is incremented inside the leak-extraction steering do { ... } while (leakExtractionNeeded); loop. When extraction is already in progress and the leak queue remains above threshold, this loop can spin multiple times before extractState returns to Idle, so the counter can overcount dramatically (counting spin iterations rather than distinct stall events). Consider incrementing only once per stall episode (e.g., on transition into the blocked condition) or at most once per outer transport iteration.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, the message would stay the same, if the leak extraction is blocked, the leak slots need to be increased.

Note that we are also trying to get rid of all of this mechanism, see #540

Comment on lines +1685 to +1688
if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy;
if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k;
if (nextStepMightFail) ++counters.hitBufferSwapByPressure;
if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush;
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hit-buffer swap reason counters are not mutually exclusive: a single swap can satisfy multiple conditions (e.g., occupancy >= half capacity and >= 10000, or occupancy and event flush), so the per-reason counters can sum to more than hitBufferSwaps and may trigger misleading heuristics later. If the intent is a breakdown of primary reasons, consider choosing a priority order (else-if) or tracking an explicit “dominant reason” per swap.

Suggested change
if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy;
if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k;
if (nextStepMightFail) ++counters.hitBufferSwapByPressure;
if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush;
if (swapByEventFlush) {
++counters.hitBufferSwapByEventFlush;
} else if (nextStepMightFail) {
++counters.hitBufferSwapByPressure;
} else if (swapByOccupancy10k) {
++counters.hitBufferSwapByOccupancy10k;
} else if (swapByOccupancyHalf) {
++counters.hitBufferSwapByOccupancy;
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is good to sort by priority here, as the swapByOccupancy10k is hit almost all the time.

In case there are many hits produced, we found it is most performant to try to get them out of the GPU as quickly as possible, so the CPU can process them as quickly as possible, to maximize the GPUHit throughput.

Thus, the 10k threshold is hit very easily by design. The other counters are then important to know how much pressure is on the system (e.g., nextStepMightFail), because only if that one is hit, the Hit slots must be increased.

" or increase HitBufferThreshold (safe up to 1 - 1/CPUCapacityFactor)\n";
if (counters.hitBufferSwapByOccupancy > 0 && counters.hitBufferSwapByOccupancy > counters.hitBufferSwapByEventFlush)
std::cerr
<< " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n";
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action hint mentions HitBufferFlushThreshold, but the exposed runtime parameter/command appears to be /adept/setHitBufferThreshold (HitBufferThreshold). Using a name that doesn’t exist makes the hint hard to act on; suggest updating this message to the actual knob name used by the configuration/messenger.

Suggested change
<< " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n";
<< " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferThreshold\n";

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an inconsistency in the name of the variable and the exposed /adept/ UI command. Since DD4hep uses the C++ interface, I used that name here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is also historic and not that accurate anymore. It should be renamed to something like "CircularBufferFlushThreshold". I will do a cleanup of this when addressing #567 and then I can do a follow up in DD4hep as well.

Copy link
Copy Markdown
Collaborator

@SeverinDiederichs SeverinDiederichs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the addition!

Comment on lines 1371 to +1374
// Otherwise, the current leak queue usage is above the threshold. Transport needs to stop until the
// transfer of these leaks can start
printf("WARNING: Leak extraction blocked. Transport will stop until current extraction ends\n");
++counters.leakExtractionBlocked;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, the message would stay the same, if the leak extraction is blocked, the leak slots need to be increased.

Note that we are also trying to get rid of all of this mechanism, see #540

Comment on lines +1685 to +1688
if (swapByOccupancyHalf) ++counters.hitBufferSwapByOccupancy;
if (swapByOccupancy10k) ++counters.hitBufferSwapByOccupancy10k;
if (nextStepMightFail) ++counters.hitBufferSwapByPressure;
if (swapByEventFlush) ++counters.hitBufferSwapByEventFlush;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is good to sort by priority here, as the swapByOccupancy10k is hit almost all the time.

In case there are many hits produced, we found it is most performant to try to get them out of the GPU as quickly as possible, so the CPU can process them as quickly as possible, to maximize the GPUHit throughput.

Thus, the 10k threshold is hit very easily by design. The other counters are then important to know how much pressure is on the system (e.g., nextStepMightFail), because only if that one is hit, the Hit slots must be increased.

" or increase HitBufferThreshold (safe up to 1 - 1/CPUCapacityFactor)\n";
if (counters.hitBufferSwapByOccupancy > 0 && counters.hitBufferSwapByOccupancy > counters.hitBufferSwapByEventFlush)
std::cerr
<< " ACTION: frequent occupancy-driven swaps -> increase MillionsOfHitSlots or HitBufferFlushThreshold\n";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is also historic and not that accurate anymore. It should be renamed to something like "CircularBufferFlushThreshold". I will do a cleanup of this when addressing #567 and then I can do a follow up in DD4hep as well.

@SeverinDiederichs
Copy link
Copy Markdown
Collaborator

/run-test

@SeverinDiederichs SeverinDiederichs merged commit 45175e1 into apt-sim:master Apr 17, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants