
@abhishekrb19
Contributor

@abhishekrb19 abhishekrb19 commented Oct 18, 2025

Clarify the compactTask/availableSlot/count metric description.

The formula to compute available compact task slots in a run is: availableCompactionTaskSlots = Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots). It is often unclear why compactTask/availableSlot/count is lower than expected. This happens because the value is capped using maxNumConcurrentSubTasks, and the current metric description is slightly misleading on this point.
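
A minimal sketch of the formula above, for illustration only. It assumes that each running compact task is charged a fixed worst-case number of slots (`slotsPerRunningTask`, derived from `maxNumConcurrentSubTasks`) rather than the number of sub-tasks it is actually running. The class, method, and parameter names are hypothetical and are not the compact duty's actual code.

```java
/** Illustrative sketch only; names are hypothetical, not Druid's actual classes. */
final class CompactionSlotEstimate
{
  private CompactionSlotEstimate() {}

  /**
   * Estimated busy slots: every running compact task is charged the same
   * worst-case number of slots, regardless of its current phase.
   */
  static int estimateBusySlots(int runningCompactTasks, int slotsPerRunningTask)
  {
    return runningCompactTasks * slotsPerRunningTask;
  }

  /** availableCompactionTaskSlots = Math.max(0, capacity - busy). */
  static int availableSlots(int compactionTaskCapacity, int busyCompactionTaskSlots)
  {
    return Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots);
  }
}
```

Under these assumptions, a capacity of 10 with 2 running tasks charged 4 slots each leaves 2 available slots, even if those tasks are currently running fewer sub-tasks.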

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.

This adds the compactTask/busySlot/count metric directly, rather than indirectly deducing it through the
formula for available task slots and maximum task slots and taking their minimum.

The formula to compute available compact task slots in a run is:
availableCompactionTaskSlots = Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots);

It is often unclear why compactTask/availableSlot/count is lower than expected.
It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks,
regardless of the phase a current compact supervisor is in. This is likely the safest and most conservative thing to do.

This metric should help operators better plan for compaction task slots in an MM-based setup.
@kfaraz
Contributor

kfaraz commented Oct 18, 2025

It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks, regardless of the phase a current compact supervisor is in.

@abhishekrb19, I am not sure how the busyCompactionTaskSlots being emitted in this PR will avoid this problem,
since the value we are emitting is effectively just maxSlots - availableSlots. Could you please clarify this?

To determine the actual count of tasks, we could count the sub-tasks for each currently running compact task,
but that can run into other issues. I feel it is better to launch fewer compact tasks (due to a smaller availableSlot count) than over-assign compact tasks causing them to potentially hog up slots reserved for ingestion.
So I feel the current approach of capping at maxNumConcurrentSubTasks while computing availableSlot count is reasonable.

What do you think?

Edit: Although, it might be useful to emit the "actual" busy slot count which would be computed using the sub-task counts as mentioned above. Is that what you mean to do here?
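
To make the distinction concrete, here is a minimal sketch contrasting the estimated busy slot count with the "actual" count computed from sub-task counts. It assumes a hypothetical `RunningCompactTask` snapshot holding a task's `maxNumConcurrentSubTasks` and the number of sub-tasks it is running right now. The slot taken by the supervisor task itself is ignored for simplicity, and none of these names come from the Druid codebase.

```java
import java.util.List;

/** Illustrative sketch only; names are hypothetical, not Druid's actual classes. */
final class BusySlotComparison
{
  /** Hypothetical snapshot of one running compact task. */
  static final class RunningCompactTask
  {
    final int maxNumConcurrentSubTasks;
    final int liveSubTasks;

    RunningCompactTask(int maxNumConcurrentSubTasks, int liveSubTasks)
    {
      this.maxNumConcurrentSubTasks = maxNumConcurrentSubTasks;
      this.liveSubTasks = liveSubTasks;
    }
  }

  private BusySlotComparison() {}

  /** Conservative estimate: charge each task its configured maximum. */
  static int estimatedBusySlots(List<RunningCompactTask> tasks)
  {
    return tasks.stream().mapToInt(t -> t.maxNumConcurrentSubTasks).sum();
  }

  /** "Actual" count: sum the sub-tasks each task is running right now. */
  static int actualBusySlots(List<RunningCompactTask> tasks)
  {
    return tasks.stream().mapToInt(t -> t.liveSubTasks).sum();
  }
}
```

For two tasks configured with a maximum of 4 sub-tasks each, currently running 1 and 0 sub-tasks, the estimate is 8 while the actual count is 1. The estimate launches fewer compact tasks, which is the conservative behavior described above.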

@abhishekrb19
Contributor Author

@kfaraz thanks for taking a look!

It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks, regardless of the phase a current compact supervisor is in.
@abhishekrb19, I am not sure how the busyCompactionTaskSlots being emitted in this PR will avoid this problem,
since the value we are emitting is effectively just maxSlots - availableSlots.

Yeah, I was just noting my observation on how this metric is calculated. The docs for the compactTask/availableSlot/count metric tripped me up a bit: “This is the max number of task slots minus any currently running compaction tasks,” rather than mentioning that it’s an estimated number of currently running compaction tasks.

While at it, I was thinking that the busy slot estimate could simply be emitted directly: consistently higher utilization in the busy slots metric (and a corresponding drop in available slots) would indicate the need to tune the compaction task slots, which is what we ended up doing for some clusters.

Although, it might be useful to emit the "actual" busy slot count which would be computed using the sub-task counts as mentioned above. Is that what you mean to do here?

I wasn't necessarily thinking of emitting the "actual" busy slot count because it doesn't influence the auto-compaction algorithm currently; also I think this can be determined using task/*/count?

Let me know if that makes sense.

@kfaraz
Contributor

kfaraz commented Oct 27, 2025

The docs for the compactTask/availableSlot/count metric tripped me up a bit: “This is the max number of task slots minus any currently running compaction tasks,” rather than mentioning that it’s an estimated number of currently running compaction tasks.

Fair point, @abhishekrb19, it makes sense to update the docs.

also I think this can be determined using task/*/count

Yes, that's true, we can get the running count for different task types/datasources using the task/running/count metric.

a consistent higher utilization of the busy slots metric (and a corresponding drop in available slots) would indicate the need to tune the compaction task slots

I am not against emitting the busy slot count per se.
Just didn't seem to add any new info since we are already emitting the max slot count and the available slot count.
Please let me know if I am missing some use case.
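
As a rough illustration of this point, the busy slot count and a utilization ratio can be derived from the two values that are already emitted. This is a minimal sketch, assuming the max slot count and the available slot count have been read from the emitted metrics; the class and method names are hypothetical.

```java
/** Illustrative sketch only; names are hypothetical, not Druid's actual classes. */
final class SlotUtilization
{
  private SlotUtilization() {}

  /** Busy slots implied by the two emitted values: max minus available. */
  static int derivedBusySlots(int maxSlots, int availableSlots)
  {
    return maxSlots - availableSlots;
  }

  /** Fraction of compaction slots in use; 0.0 when no slots are configured. */
  static double utilization(int maxSlots, int availableSlots)
  {
    if (maxSlots <= 0) {
      return 0.0;
    }
    return (double) derivedBusySlots(maxSlots, availableSlots) / maxSlots;
  }
}
```

With a max of 10 slots and 2 available, the derived busy count is 8 and utilization is 0.8. Because the available count is clamped at zero, the derived busy count never exceeds the max, even when the underlying estimate does.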

@abhishekrb19 abhishekrb19 changed the title from "Add compactTask/busySlot/count metric to compact duty." to "Clarify compactTask/availableSlot/count metric description" on Oct 31, 2025
@abhishekrb19 abhishekrb19 force-pushed the busy_compact_metric branch 2 times, most recently from bc6c64b to e43c20b on October 31, 2025 23:34
@abhishekrb19
Contributor Author

@kfaraz, I went ahead and just made the doc change to clarify and reverted the busySlot metric.

Contributor

@kfaraz kfaraz left a comment


Thanks for the doc fix, @abhishekrb19!

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
@abhishekrb19 abhishekrb19 merged commit a3a2a5e into master Nov 4, 2025
6 checks passed
@abhishekrb19 abhishekrb19 deleted the busy_compact_metric branch November 4, 2025 01:47
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026