
feat: rebalance cost-based autoscaler for best throughput #19646

Merged
Fly-Style merged 6 commits into apache:master from Fly-Style:cba-expose-more-configs
Jul 2, 2026

@Fly-Style Fly-Style commented Jul 1, 2026

Contributor

This PR retunes the cost-based autoscaler scoring for better throughput-oriented decisions and exposes idealIdleRatio as a configurable knob for the U-shaped idle-cost function.
It also updates autoscaler logging to show the effective idle ratio and max observed processing rate used during optimal task-count calculation.
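
As a rough illustration of what a U-shaped idle cost around `idealIdleRatio` could look like, here is a minimal sketch. It is not the code from this PR: the quadratic shape, the class and method names, and the `UNDER_PROVISIONING_PENALTY` value are assumptions made for illustration. Only the name `OVER_PROVISIONING_PENALTY` and the value 2 that the review thread settles on come from the conversation.

```java
public final class IdleCostSketch {
    // Value the review thread settles on; how the real code applies it is not shown here.
    static final double OVER_PROVISIONING_PENALTY = 2.0;
    // Hypothetical placeholder: the actual under-provisioning constant is not quoted in this thread.
    static final double UNDER_PROVISIONING_PENALTY = 1.0;

    /**
     * Assumed quadratic U-shape: zero cost at the ideal idle ratio, growing on both sides.
     * Idle above the ideal is treated as over-provisioning, idle below it as under-provisioning.
     */
    static double idleCost(double observedIdleRatio, double idealIdleRatio) {
        double deviation = observedIdleRatio - idealIdleRatio;
        double penalty = deviation > 0 ? OVER_PROVISIONING_PENALTY : UNDER_PROVISIONING_PENALTY;
        return penalty * deviation * deviation;
    }

    public static void main(String[] args) {
        double idealIdleRatio = 0.3; // illustrative value, not a documented default
        for (double idle : new double[] {0.0, 0.1, 0.3, 0.5, 0.7}) {
            System.out.printf("idle=%.1f cost=%.3f%n", idle, idleCost(idle, idealIdleRatio));
        }
    }
}
```

With asymmetric penalties like these, the same distance from the ideal ratio costs more on the over-provisioned side, which is the kind of steepness the constants discussed in the review control.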

Details

Picture 1

image
  • Small supervisor case: 48 partitions, starting from 6 tasks.
  • Shows that the new cost function scales up gradually as lag becomes more important, while higher observed idle ratios keep the recommendation conservative.
  • Useful sanity check for avoiding aggressive jumps on smaller supervisors.

Picture 2

image
  • Medium supervisor case: 125 partitions, plot title shows startTaskCount=25.
  • Shows that low-idle/high-pressure cases recommend materially more tasks, while moderate or high idle ratios stay well below the partition count.
  • Useful for checking that the scoring does not blindly chase one-task-per-partition when idle capacity is already available.

Picture 3

image
  • Large supervisor case: 500 partitions, starting at the maximum task count.
  • Shows the strongest separation between low-idle and high-idle scenarios: low idle can keep the recommendation near max capacity, while higher idle ratios strongly favor scale-down.
  • Useful for validating that the idle-cost side still prevents waste at large partition counts.

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.

@Fly-Style Fly-Style changed the title Rebalance cost-based autoscaler for best troughput feat: rebalance cost-based autoscaler for best throughput Jul 1, 2026
@Fly-Style Fly-Style force-pushed the cba-expose-more-configs branch from f5be5c2 to 8a3e6dd Compare July 1, 2026 09:46
@Fly-Style Fly-Style marked this pull request as ready for review July 1, 2026 10:16
@Fly-Style Fly-Style self-assigned this Jul 1, 2026
@Fly-Style Fly-Style requested a review from kfaraz July 1, 2026 10:37

@kfaraz kfaraz left a comment

Contributor

In conclusion, does the auto-scaler now prefer under-provisioning by default?

 * Controls the steepness of the U-shape on the over-provisioning side.
 */
-static final double OVER_PROVISIONING_PENALTY = 1.0;
+static final double OVER_PROVISIONING_PENALTY = 2.5;

Contributor

Why change this to 2.5 instead of 2.0?

Contributor Author

I tried 2.0, but 2.5 looked better in simulation.

Contributor

I feel a 2x penalty on over-provisioning is already enough. Changing these constants very frequently will make it difficult to gather enough data to refine the constants empirically.

Contributor

Let's use 2 for now and see how it does in real clusters. This PR is already interchanging the under-provisioning penalty and the over-provisioning penalty, so we will have enough to validate anyway.

Contributor Author

Okay, let's use 2, but I can assure you 2.5 looks even better in simulation. We'll see 😺

 * do not return infinitely large lag recovery times, at the expense of underestimating the lag cost.
 */
-static final double MIN_PROCESSING_RATE = 1_000;
+static final double MIN_PROCESSING_RATE = 5_000;

Contributor

Why this change?
Technically, 1000 was also an arbitrary number but 5000 definitely seems to be on the higher side, especially when dealing with bulkier records with large (say JSON) column values. I would rather the user choose whether they want to prefer lag recovery or throughput by tweaking the weights.

Contributor Author

I decided to tweak it too and realized that 1000 is too permissive about scaling up during minimal lag. 5000 implies stricter behaviour in that critical state where we have not received metrics yet.

Contributor

If we have not received metrics yet, auto-scaling would be skipped since CostBasedAutoScaler.validateMetricsForScaling would return an error.

I have personally seen tasks in prod clusters maxing out at 5000 records/sec when dealing with large records.
So, using a large MIN_PROCESSING_RATE would cause us to always under-estimate lagRecoveryTime, irrespective of the actual avgProcessingRate.
As such, let's keep a low value of MIN_PROCESSING_RATE (maybe even as low as 100), since it is meant to be a safe-side measure that kicks in only when avgProcessingRate is very low.

The penalty for scale-up is already driven by the optimal task idleness, and can be controlled using the weights.
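
To make the effect of the floor concrete, here is a minimal sketch assuming the lag recovery time is computed as lag divided by `Math.max(avgProcessingRate, MIN_PROCESSING_RATE)`. The `Math.max(...)` floor is mentioned in this thread, but the exact formula, the class name, and the method name below are hypothetical, and the real calculation may also account for task count.

```java
public final class LagRecoverySketch {
    /**
     * Assumed form of the calculation: the floor keeps the result finite when the
     * observed rate is near zero, at the cost of underestimating recovery time
     * whenever the real rate is below the floor.
     */
    static double lagRecoveryTimeSeconds(long lag, double avgProcessingRate, double minProcessingRate) {
        double effectiveRate = Math.max(avgProcessingRate, minProcessingRate);
        return lag / effectiveRate;
    }

    public static void main(String[] args) {
        long lag = 1_000_000L;           // illustrative backlog, in records
        double avgProcessingRate = 2_000; // illustrative rate for bulky records, records/sec

        // Floor below the real rate: the real rate is used, giving 500.0 seconds.
        System.out.println(lagRecoveryTimeSeconds(lag, avgProcessingRate, 1_000));
        // Floor above the real rate: the floor wins, giving 200.0 seconds (an underestimate).
        System.out.println(lagRecoveryTimeSeconds(lag, avgProcessingRate, 5_000));
        // Near-zero observed rate: a low floor of 100 still keeps the result finite (10000.0).
        System.out.println(lagRecoveryTimeSeconds(lag, 0.0, 100));
    }
}
```

Under these assumptions, a floor of 5000 reports 200 seconds for a backlog that would actually take 500 seconds to drain at 2000 records/sec, which matches the concern that a large floor hides real lag pressure regardless of the actual `avgProcessingRate`.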

Contributor Author

P.S. reverting

Contributor

Thanks!

Contributor

In a future PR, I think we can remove the MIN_PROCESSING_RATE altogether and maybe use the window maxProcessingRate, but I haven't fully thought it through yet. It might have some unforeseen side effects.

Contributor Author

@kfaraz I forgot about Math.max(...) and then I reconsidered my approach 😁

@FrankChen021 FrankChen021 left a comment

Member

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Found 1 issue.

Reviewed 6 of 6 changed files.


This is an automated review by Codex GPT-5.5

@Fly-Style Fly-Style requested a review from kfaraz July 1, 2026 12:37

@kfaraz kfaraz left a comment

Contributor

Left a non-blocking comment.

@Fly-Style Fly-Style merged commit a2105da into apache:master Jul 2, 2026
38 checks passed
@Fly-Style Fly-Style deleted the cba-expose-more-configs branch July 2, 2026 10:32
@github-actions github-actions Bot added this to the 38.0.0 milestone Jul 2, 2026

3 participants