Harmonize metrics for lag and cost based autoscalers. #19097

Merged
gianm merged 4 commits into apache:master from gianm:ss-scale-metrics
Mar 6, 2026

@gianm gianm commented Mar 5, 2026

This patch adjusts the autoscalers to emit metrics with supervisorId, dataSource, and stream. Previously, the lag-based autoscaler emitted only dataSource (which was actually the supervisor id) and stream, and the cost-based autoscaler emitted only supervisorId and stream.
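As a rough illustration of the harmonized dimension set described above, the sketch below builds one map carrying all three dimensions. The class `AutoscalerMetricDimensions`, its `dimensions` helper, and the sample values are hypothetical and are not Druid's emitter API; only the three dimension names come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the code from the patch.
class AutoscalerMetricDimensions {
    // Builds the dimension set that both autoscalers are described as emitting.
    static Map<String, String> dimensions(String supervisorId, String dataSource, String stream) {
        Map<String, String> dims = new LinkedHashMap<>();
        dims.put("supervisorId", supervisorId); // identifies the supervisor
        dims.put("dataSource", dataSource);     // the real datasource, no longer the supervisor id
        dims.put("stream", stream);             // the stream being ingested
        return dims;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        System.out.println(dimensions("clicks_supervisor", "clicks", "clicks-topic"));
    }
}
```

Keeping `supervisorId` and `dataSource` as separate keys is the point of the change: the two values can differ, so neither should stand in for the other.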
@jtuglu1 jtuglu1 self-requested a review March 6, 2026 07:47
@jtuglu1 jtuglu1 left a comment

LGTM – few nits. Let's also make sure to update the markdown docs with the dimensions.

  );
  log.info(
-     "LagBasedAutoScaler will collect lag every [%d] millis and will keep up to [%d] data points for the last [%d] millis for dataSource [%s]",
+     "LagBasedAutoScaler will collect lag every [%d] millis and will keep up to [%d] data points for the last [%d] millis for supervisor [%s]",
nit: supervisor[%d]


- log.debug("Calculated beyondProportion is [%s] and withinProportion is [%s] for dataSource [%s].", beyondProportion,
-     withinProportion, dataSource
+ log.debug("Calculated beyondProportion is [%s] and withinProportion is [%s] for supervisor [%s].", beyondProportion,
same

- log.debug("CurrentActiveTaskCount reached task count Max limit, skipping scale out action for dataSource [%s].",
-     dataSource
+ log.debug(
+     "CurrentActiveTaskCount reached task count Max limit, skipping scale out action for supervisor [%s].",
same nit

- log.debug("CurrentActiveTaskCount reached task count Min limit, skipping scale in action for dataSource[%s].",
-     dataSource
+ log.debug(
+     "CurrentActiveTaskCount reached task count Min limit, skipping scale in action for supervisor[%s].",
maybe log the min limit here since might be different from current spec's config

Comment on lines +193 to +196
+ CollectionUtils.getOnlyElement(
+     spec.getDataSources(),
+     xs -> DruidException.defensive("Expected one dataSource, got[%s]", xs)
+ ),
I think we can skip passing the dataSource here and just extract it from the spec inside the LagBasedAutoScaler.
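For readers unfamiliar with the pattern in the snippet under review, here is a minimal, self-contained sketch of extracting the single element of a collection and failing defensively otherwise. `OnlyElement.getOnlyElement` is a hypothetical stand-in written for this example; it is not Druid's `CollectionUtils` implementation, and `IllegalStateException` stands in for `DruidException.defensive`.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in; not Druid's CollectionUtils.
class OnlyElement {
    // Returns the sole element of xs, or throws the exception built by exceptionFn
    // when xs does not contain exactly one element.
    static <T> T getOnlyElement(
        Collection<T> xs,
        Function<Collection<T>, ? extends RuntimeException> exceptionFn
    ) {
        if (xs.size() != 1) {
            throw exceptionFn.apply(xs);
        }
        return xs.iterator().next();
    }

    public static void main(String[] args) {
        // Sample datasource name is made up for the example.
        String dataSource = getOnlyElement(
            List.of("clicks"),
            xs -> new IllegalStateException("Expected one dataSource, got" + xs)
        );
        System.out.println(dataSource);
    }
}
```

Under the reviewer's suggestion, a call like this would presumably live inside the autoscaler, which already holds the spec, rather than at the call site that constructs it.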

@gianm gianm merged commit 2cf5da7 into apache:master Mar 6, 2026
37 checks passed
@gianm gianm deleted the ss-scale-metrics branch March 6, 2026 20:18
@github-actions github-actions bot added this to the 37.0.0 milestone Mar 6, 2026