
Conversation

@clarax (Contributor) commented Oct 18, 2023

What is the purpose of the change

This implements the scaling report comparison algorithm for event suppression using hashcodes of the parallelism maps. Originally I used the full string of the report message, which contains metrics that fluctuate with the current load. As long as the parallelism map doesn't change, we don't need to generate new events within the defined interval.

Brief change log

  • Save the hashcode of the parallelism map to the metadata labels
  • Use the hashcode to compare two scaling reports for advice (a minimal sketch follows this list)
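
A minimal sketch of the comparison idea, assuming hypothetical names (the label key PARALLELISM_HASH_LABEL and the helper parallelismChanged are illustrative, not the actual constants in this PR):

import java.util.Map;

class ParallelismHashSketch {

    // Hypothetical label key; the real constant name in the operator may differ.
    static final String PARALLELISM_HASH_LABEL = "autoscaler/parallelism-hash";

    // Returns true when the proposed parallelism map differs from the one recorded
    // on the previous event's labels, i.e. a new scaling event should be emitted.
    static boolean parallelismChanged(
            Map<String, Integer> proposedParallelism, Map<String, String> previousEventLabels) {
        // Map#hashCode is defined over the entry set, so it is independent of iteration
        // order; hashing only the parallelism map means load-dependent metrics in the
        // report message text cannot defeat deduplication.
        var hash = Integer.toString(proposedParallelism.hashCode());
        return !hash.equals(previousEventLabels.get(PARALLELISM_HASH_LABEL));
    }
}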

Verifying this change

  • Updated and added unit tests.
  • Verified in integration test env.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): No
  • The public API, i.e., are there any changes to the CustomResourceDescriptors: No
  • Core observer or reconciler logic that is regularly executed: No

Documentation

  • Does this pull request introduce a new feature? No
  • If yes, how is the feature documented? N/A

@clarax clarax closed this Oct 18, 2023
@gyfora (Contributor) commented Oct 19, 2023

Sorry @clarax, I think I gave confusing feedback and didn't express myself clearly. I was thinking of something like this:

interface AutoScalerEventHandler {

    void handleGenericEvent(Ctx, Type, Reason, Message, Key, Interval);

    default void handleScalingEvent(Ctx, Map<JobVertexId, ScalingSummary> summaries, boolean scalingEnabled, Interval) {
        // Provide a default implementation without proper deduplication
        handleGenericEvent(..., toMessage(summaries), interval);
    }
}

Then for Kubernetes we have the custom label logic encapsulated:

class KubernetesAutoScalerEventHandler {

    ...

    @Override
    void handleScalingEvent(Ctx, Map<JobVertexId, ScalingSummary> summaries, boolean scalingEnabled, Interval) {
        var labels = createLabels(summaries);
        // getOldEvent, compare labels; whatever we need to do here does not leak to ScalingExecutor
    }
}

This way the ScalingExecutor simply calls:

eventHandler.handleScalingEvent(...)

And we could remove the predicate/label logic that really does not belong there, as it is very Kubernetes specific, and move it to the KubernetesAutoScalerEventHandler implementation.

What do you think?

@clarax (Contributor, Author) commented Oct 20, 2023

Refactored as @gyfora suggested.

@gyfora (Contributor) left a comment

I think it looks really good now; we have a much nicer separation of responsibilities in the different components. Please verify the latest version manually (locally or something) if you can, @clarax.

@1996fanrui What do you think about the interface change?

@1996fanrui 1996fanrui self-requested a review October 20, 2023 07:51
@gyfora (Contributor) left a comment

Another thing we have been discussing with @1996fanrui:

The interval config should probably be renamed from scaling.report.interval to scaling.event.interval; this way we can use it generally in the future for autoscaler-triggered events.

We should also make sure that the simple handleEvent method respects the interval if specified, and we should probably use the interval for ineffective scaling events as well. I know that some of these changes are not directly related to this PR, but it may be better to clean them up so we leave things in a good state afterwards.
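
A rough sketch of what "respects the interval" could look like on the generic event path; the in-memory timestamp store and method names below are illustrative only, and the real handler may instead read the timestamp from the existing Kubernetes event:

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EventIntervalSketch {

    // Hypothetical in-memory store of the last emission time per event key.
    private final Map<String, Instant> lastEmitted = new ConcurrentHashMap<>();
    private final Clock clock = Clock.systemUTC();

    // Returns true if no interval is configured, no event with this key was emitted yet,
    // or the configured interval has elapsed since the last emission.
    boolean shouldEmit(String eventKey, Duration interval) {
        if (interval == null) {
            return true;
        }
        var now = clock.instant();
        var last = lastEmitted.get(eventKey);
        if (last == null || Duration.between(last, now).compareTo(interval) >= 0) {
            lastEmitted.put(eventKey, now);
            return true;
        }
        return false;
    }
}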

@clarax (Contributor, Author) commented Oct 24, 2023

Another thing we have been discussing with @1996fanrui:

The interval config should probably be renamed from scaling.report.interval to scaling.event.interval; this way we can use it generally in the future for autoscaler-triggered events.

We should also make sure that the simple handleEvent method respects the interval if specified, and we should probably use the interval for ineffective scaling events as well. I know that some of these changes are not directly related to this PR, but it may be better to clean them up so we leave things in a good state afterwards.

Resolved all requested changes.

@1996fanrui (Member)
Hi, the CI fails; please run mvn clean install -DskipTests -Pgenerate-docs again, thanks.

https://github.com/apache/flink-kubernetes-operator/actions/runs/6621898931/job/17987047236?pr=685#step:5:19405

@gyfora (Contributor) left a comment

The PR looks great @clarax , thank you!

I found 2 very minor things (see comments); otherwise it is good to go.

@gyfora gyfora merged commit faaff56 into apache:main Oct 24, 2023