Clarify `compactTask/availableSlot/count` metric description #18663

abhishekrb19 · 2025-10-18T02:14:45Z

Clarify the compactTask/availableSlot/count metric description.

The formula to compute available compact task slots in a run is: availableCompactionTaskSlots = Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots). It's often confusing why compactTask/availableSlot/count is lower than expected. This happens because the busy slot estimate for each running compact task is capped by maxNumConcurrentSubTasks rather than counted from the sub-tasks actually running, and the current metric description can be slightly misleading.

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.

This adds the compactTask/busySlot/count metric directly, rather than requiring operators to deduce it from the available task slots and the maximum task slots. The formula to compute available compact task slots in a run is: availableCompactionTaskSlots = Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots). It's often confusing why compactTask/availableSlot/count is lower than expected. It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks, regardless of the phase a current compact supervisor is in. This is likely the safest and most conservative thing to do. This metric should help operators better plan for compaction task slots in a MM-based setup.
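
As a rough illustration of the formula above, the following Java sketch shows how an estimated busy slot count lowers the available slot count. The class and method names are hypothetical. Charging each running compact task exactly its maxNumConcurrentSubTasks value is an assumption made for illustration; the actual duty may account for slots differently.

```java
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from the Druid codebase.
final class CompactionSlotSketch
{
  // Estimate busy slots by charging each running compact task its configured
  // maxNumConcurrentSubTasks, regardless of how many sub-tasks are actually running.
  static int estimateBusySlots(List<Integer> maxNumConcurrentSubTasksPerRunningTask)
  {
    int busy = 0;
    for (int cap : maxNumConcurrentSubTasksPerRunningTask) {
      // Assumption: a running task always occupies at least one slot.
      busy += Math.max(1, cap);
    }
    return busy;
  }

  // The formula quoted in the PR description.
  static int availableSlots(int compactionTaskCapacity, int busyCompactionTaskSlots)
  {
    return Math.max(0, compactionTaskCapacity - busyCompactionTaskSlots);
  }

  public static void main(String[] args)
  {
    // Two running compact tasks, each configured with maxNumConcurrentSubTasks = 4.
    int busy = estimateBusySlots(List.of(4, 4));
    // Capacity of 10 slots minus 8 estimated busy slots.
    System.out.println(availableSlots(10, busy)); // prints 2
  }
}
```

With these numbers, the available count is 2 even if the two tasks are currently running far fewer than 8 sub-tasks between them, which is the situation the description calls confusing.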

kfaraz · 2025-10-18T06:42:54Z

> It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks, regardless of the phase a current compact supervisor is in.

@abhishekrb19, I am not sure how the busyCompactionTaskSlots being emitted in this PR will avoid this problem,
since the value we are emitting is effectively just maxSlots - availableSlots. Could you please clarify this?

To determine the actual count of tasks, we could count the sub-tasks for each currently running compact task,
but that can run into other issues. I feel it is better to launch fewer compact tasks (due to a smaller availableSlot count) than to over-assign compact tasks, causing them to potentially hog slots reserved for ingestion.
So I feel the current approach of capping at maxNumConcurrentSubTasks while computing availableSlot count is reasonable.

What do you think?

Edit: Although, it might be useful to emit the "actual" busy slot count which would be computed using the sub-task counts as mentioned above. Is that what you mean to do here?
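
To make the trade-off in this comment concrete, here is a minimal Java sketch that contrasts the capped estimate with an "actual" count based on running sub-tasks. The RunningCompactTask type, its fields, and the numbers are hypothetical, and counting only sub-tasks (ignoring any slot held by a supervisor task) is a simplifying assumption.

```java
import java.util.List;

// Hypothetical sketch; none of these types exist in Druid under these names.
final class BusySlotComparisonSketch
{
  // Illustrative view of one running compact task.
  static final class RunningCompactTask
  {
    final int maxNumConcurrentSubTasks;
    final int runningSubTasks;

    RunningCompactTask(int maxNumConcurrentSubTasks, int runningSubTasks)
    {
      this.maxNumConcurrentSubTasks = maxNumConcurrentSubTasks;
      this.runningSubTasks = runningSubTasks;
    }
  }

  // Conservative estimate: charge every task its configured cap.
  static int estimatedBusySlots(List<RunningCompactTask> tasks)
  {
    return tasks.stream().mapToInt(t -> t.maxNumConcurrentSubTasks).sum();
  }

  // "Actual" count: sum the sub-tasks that are running right now.
  static int actualBusySlots(List<RunningCompactTask> tasks)
  {
    return tasks.stream().mapToInt(t -> t.runningSubTasks).sum();
  }

  static int availableSlots(int capacity, int busy)
  {
    return Math.max(0, capacity - busy);
  }

  public static void main(String[] args)
  {
    List<RunningCompactTask> tasks = List.of(
        new RunningCompactTask(4, 1), // in a phase that needs only one sub-task
        new RunningCompactTask(4, 4)  // fully using its cap
    );
    System.out.println(availableSlots(10, estimatedBusySlots(tasks))); // prints 2
    System.out.println(availableSlots(10, actualBusySlots(tasks)));    // prints 5
  }
}
```

In this example the capped estimate leaves 2 slots for new compact tasks while the sub-task count would leave 5. The first task could later grow back to 4 sub-tasks, which is why the smaller number is the safer basis for scheduling.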

abhishekrb19 · 2025-10-22T02:58:20Z

@kfaraz thanks for taking a look!

> > It turns out that the compact duty using the native engine just caps it using maxNumConcurrentSubTasks, regardless of the phase a current compact supervisor is in.
>
> @abhishekrb19, I am not sure how the busyCompactionTaskSlots being emitted in this PR will avoid this problem,
> since the value we are emitting is effectively just maxSlots - availableSlots.

Yeah, I was just noting my observation on how this metric is calculated. The docs for the compactTask/availableSlot/count metric tripped me up a bit: “This is the max number of task slots minus any currently running compaction tasks,” rather than mentioning that it’s an estimated number of currently running compaction tasks.

While at it, I was thinking that the busy slot estimate could just be emitted directly given the condition - a consistent higher utilization of the busy slots metric (and a corresponding drop in available slots) would indicate the need to tune the compaction task slots, which is what we ended up doing for some clusters.

> Although, it might be useful to emit the "actual" busy slot count which would be computed using the sub-task counts as mentioned above. Is that what you mean to do here?

I wasn't necessarily thinking of emitting the "actual" busy slot count because it doesn't currently influence the auto-compaction algorithm. Also, I think this can be determined using task/*/count?

Let me know if that makes sense.

kfaraz · 2025-10-27T03:17:59Z

> The docs for the compactTask/availableSlot/count metric tripped me up a bit: “This is the max number of task slots minus any currently running compaction tasks,” rather than mentioning that it’s an estimated number of currently running compaction tasks.

Fair point, @abhishekrb19, it makes sense to update the docs.

> also I think this can be determined using task/*/count

Yes, that's true, we can get the running count for different task types/datasources using the task/running/count metric.

> a consistent higher utilization of the busy slots metric (and a corresponding drop in available slots) would indicate the need to tune the compaction task slots

I am not against emitting the busy slot count per se.
It just didn't seem to add any new info since we are already emitting the max slot count and the available slot count.
Please let me know if I am missing some use case.
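
The point about redundancy can be sketched as a one-line derivation: given the two values already emitted, the estimated busy count is their difference. The class and method names below are hypothetical. This assumes the "max slot count" is the same capacity used in the available-slot formula.

```java
// Hypothetical sketch; names are illustrative only.
final class DerivedBusySlotSketch
{
  // Derive the estimated busy slot count from the two emitted values.
  static int derivedBusySlots(int maxSlotCount, int availableSlotCount)
  {
    return maxSlotCount - availableSlotCount;
  }

  public static void main(String[] args)
  {
    System.out.println(derivedBusySlots(10, 2)); // prints 8
  }
}
```

Because the available count is clamped at zero, the derived value can never exceed the max slot count, even if the internal busy estimate does.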

abhishekrb19 · 2025-10-31T23:37:12Z

@kfaraz, I went ahead and just made the doc change to clarify and reverted the busySlot metric.

kfaraz

Thanks for the doc fix, @abhishekrb19!

docs/operations/metrics.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

docs/operations/metrics.md

github-actions bot added the Area - Documentation label Oct 18, 2025

abhishekrb19 added 2 commits October 21, 2025 19:33

Merge branch 'master' into busy_compact_metric

9ecfbc1

Clarify doc

dcbaff9

abhishekrb19 added 2 commits October 31, 2025 16:22

Revert busySlot metric changes.

db48b4c

Merge branch 'master' into busy_compact_metric

0b10e4c

abhishekrb19 changed the title ~~Add compactTask/busySlot/count metric to compact duty.~~ Clarify compactTask/availableSlot/count metric description Oct 31, 2025

abhishekrb19 force-pushed the busy_compact_metric branch 2 times, most recently from bc6c64b to e43c20b Compare October 31, 2025 23:34

Condense doc

4321642

abhishekrb19 force-pushed the busy_compact_metric branch from e43c20b to 4321642 Compare October 31, 2025 23:35

kfaraz approved these changes Nov 2, 2025

Update docs/operations/metrics.md

b8204f5

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

abhishekrb19 commented Nov 4, 2025

/s/auto-compaction/auto compaction for consistency

8837052

abhishekrb19 merged commit a3a2a5e into master Nov 4, 2025
6 checks passed

abhishekrb19 deleted the busy_compact_metric branch November 4, 2025 01:47

kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
