-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-30593][autoscaler] Improve restart time tracking #735
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nices fixes. LGTM. Just a minor suggestion.
if (ctx.getJobStatus() == JobStatus.RUNNING) { | ||
if (scalingTracking.setEndTimeIfTrackedAndParallelismMatches( | ||
if (scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches( | ||
now, jobTopology, scalingHistory)) { | ||
stateStore.storeScalingTracking(ctx, scalingTracking); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the RUNNING job state check. This block can be reduced to:
if (scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches(
now, jobTopology, scalingHistory)) {
stateStore.storeScalingTracking(ctx, scalingTracking);
}
The reason is that this method only gets called when the job is in running state (see line 99). Enforcing a RUNNING state has always been a precondition for executing the autoscaling logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
This needs a rebase. I'll run the tests afterwards. |
if (ctx.getJobStatus() == JobStatus.RUNNING) { | ||
if (scalingTracking.setEndTimeIfTrackedAndParallelismMatches( | ||
if (scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches( | ||
now, jobTopology, scalingHistory)) { | ||
stateStore.storeScalingTracking(ctx, scalingTracking); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please extract this logic into a method to keep the flow simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be part of stateStore.storeScalingTracking(ctx, scalingTracking);
even
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to recommend this as well. But these are only three lines of code (after removing the unneeded RUNNING condition) and the resulting method signature would be quite big. I don't think we need to block the PR on this refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be part of stateStore.storeScalingTracking(ctx, scalingTracking); even
I removed the redundant RUNNING check, as Max recommended, so it looks more straightforward now. Pushing this call down into the storeScalingTracking
would make it harder to reason, since it is key that runRescaleLogic
is only executed when the job is in the RUNNING state and hence the transition is considered complete. It also does not seem right to bundle the logic specific to this concrete situation into KubernetesAutoScalerStateStore
which acts more as a simple persistence layer. Hope this is fine by you.
…tion to record ScalingTracking endTime This change is required because without it, the restart duration won't be recorded until the metrics are considered to be fully collected.
dcd9b08
to
d73dd18
Compare
@mxm I addressed the comments and rebased, could you please kick off the tests again? |
Sure, they should be running now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alex! Looks good. Are you ok with merging @gyfora?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
This PR contains the following improvements to the restart tracking logic: