[FLINK-31476] AdaptiveScheduler respects minimum parallelism #22883
Conversation
I think this behavior may cause some serious problems if the new resources cannot be acquired. The scheduler should wait until the resources are acquired before taking any action, so that it does not cause downtime. I am thinking about the case where you simply want to increase the parallelism to a given new (larger) value. In that case you would set the min and the max to the same new value, and you would expect the scaling to happen only once the resources are available. It's not enough in these cases to simply raise the max, because that may lead to multiple incremental restarts as the resources become available, which can cause even more downtime. What do you think @zentol? cc @mxm
I may have misremembered what your requirements were. It shouldn't be difficult to solve this such that we don't cancel the job immediately.
We were thinking of using this in the context of autoscaling, or in general of scaling jobs up or down to a target parallelism. Theoretically there could be circumstances where, if the job requirements state a minimum that is higher than the current parallelism, we may want to cancel the job (eventually). But from a purely practical perspective, I think most people would rather have the job running with the current (smaller) parallelism than not have it at all in cases where the resources cannot be acquired. This is a rather common case in cloud environments due to various resource quotas, changing circumstances, etc. We could also make this configurable with a cancel timeout, where a negative/infinite timeout would mean that we actually wait forever for the resources without cancelling.
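For illustration only, here is a minimal sketch of how such a timeout could be expressed as a configuration option. The option name, default, and semantics below are hypothetical and are not part of this PR or of any existing Flink setting.

import java.time.Duration;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/**
 * Hypothetical sketch: how long to keep running below the requested minimum parallelism
 * before cancelling the job. A negative value would mean "wait forever, never cancel".
 */
public final class MinParallelismTimeoutOptions {

    // Illustrative name and default only; no such option exists in Flink.
    public static final ConfigOption<Duration> MIN_PARALLELISM_CANCEL_TIMEOUT =
            ConfigOptions.key("jobmanager.adaptive-scheduler.min-parallelism-cancel-timeout")
                    .durationType()
                    .defaultValue(Duration.ofSeconds(-1)) // negative: keep waiting indefinitely
                    .withDescription(
                            "Illustrative only: time to wait for the resources required by the "
                                    + "minimum parallelism before cancelling the job. "
                                    + "A negative value means wait indefinitely.");

    private MinParallelismTimeoutOptions() {}
}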
private static Map<SlotSharingGroupId, Integer> getMaxParallelismForSlotSharingGroups(
        Iterable<JobInformation.VertexInformation> vertices) {
    return getPerSlotSharingGroups(
            vertices, JobInformation.VertexInformation::getParallelism, Math::max);
Isn't this supposed to be getMaxParallelism here?
Nope, but the confusion is understandable. "Max parallelism" is an overloaded term, referring both to the maximum parallelism a job can ever run at (== the number of key groups) and to the upper-bound parallelism that the job may currently run at.
Outside of validation purposes, the actual max parallelism isn't relevant for scaling.
We should rename things to explicitly refer to lower/upper parallelism bound.
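To make that distinction concrete, here is a minimal, self-contained sketch. VertexParallelismBounds is an illustrative class, not one of the scheduler's actual types: the key-group count only caps what is valid, while scaling decisions operate on the lower/upper parallelism bounds.

/** Illustrative only: separates the two notions that are both called "max parallelism". */
final class VertexParallelismBounds {

    private final int numberOfKeyGroups; // "max parallelism" in the state/key-group sense
    private final int lowerBound;        // minimum parallelism the vertex may run at
    private final int upperBound;        // maximum parallelism the vertex may run at

    VertexParallelismBounds(int numberOfKeyGroups, int lowerBound, int upperBound) {
        if (lowerBound < 1 || lowerBound > upperBound) {
            throw new IllegalArgumentException("lowerBound must be within [1, upperBound]");
        }
        // The key-group count matters only for validation: the vertex can never be scaled
        // beyond it, but it does not drive scaling decisions itself.
        if (upperBound > numberOfKeyGroups) {
            throw new IllegalArgumentException(
                    "upperBound must not exceed the number of key groups");
        }
        this.numberOfKeyGroups = numberOfKeyGroups;
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    int getNumberOfKeyGroups() { return numberOfKeyGroups; }
    int getLowerBound() { return lowerBound; }
    int getUpperBound() { return upperBound; }
}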
gyfora left a comment
I think one test case may still be missing if I understand correctly. Otherwise the PR looks good to me!
                        newJobResourceRequirements);
                assertThat(scheduler.getState()).isSameAs(originalState);
            },
            singleThreadMainThreadExecutor))
I think we may be missing a test case verifying that the job actually scales up once the resources for the new, higher min parallelism become available. Maybe that could be added here.
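To spell out the scenario such a test would cover, here is a self-contained toy; it is not the actual AdaptiveSchedulerTest harness, and RescaleOnNewMinimumDemo and decideParallelism are purely illustrative names. The expected behavior: the job keeps running at the old parallelism until enough slots for the new minimum are available, and only then rescales.

/** Toy model of the missing scenario; not the real scheduler or its test harness. */
final class RescaleOnNewMinimumDemo {

    /** Stay at the current parallelism until enough slots for the new minimum exist. */
    static int decideParallelism(int currentParallelism, int requestedMin, int availableSlots) {
        return availableSlots >= requestedMin ? requestedMin : currentParallelism;
    }

    public static void main(String[] args) {
        int current = 2;
        int newMin = 4;

        // Resources not yet available: the job keeps running at the old parallelism.
        if (decideParallelism(current, newMin, 2) != 2) {
            throw new AssertionError("should keep the current parallelism");
        }

        // Resources arrive: the scheduler is expected to restart at the new, higher minimum.
        if (decideParallelism(current, newMin, 4) != 4) {
            throw new AssertionError("should scale up to the new minimum");
        }

        System.out.println("scales up once resources for the new minimum become available");
    }
}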
…/adaptive/allocator/SlotSharingSlotAllocator.java
                                - vertexInformation.getMinParallelism()),
                (metaInfo1, metaInfo2) ->
                        new SlotSharingGroupMetaInfo(
                                Math.min(metaInfo1.getMinLowerBound(), metaInfo2.minLowerBound),
Hi Schepler! Could you explain why it is Math.min(metaInfo1.getMinLowerBound(), metaInfo2.minLowerBound) here rather than Math.max?
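For readers following the thread, here is a self-contained sketch of the reduce pattern being asked about. SlotSharingGroupBoundsDemo and GroupBounds are illustrative names, not the actual SlotSharingSlotAllocator types, and the snippet only mirrors the merge shape visible in the excerpt; it does not answer which bound callers should rely on.

/** Simplified sketch of folding per-vertex lower bounds into per-group metadata. */
final class SlotSharingGroupBoundsDemo {

    // Keeps both the smallest and the largest lower bound of the vertices in one
    // slot sharing group; which of the two a caller should use is the question above.
    static final class GroupBounds {
        final int minLowerBound;
        final int maxLowerBound;

        GroupBounds(int minLowerBound, int maxLowerBound) {
            this.minLowerBound = minLowerBound;
            this.maxLowerBound = maxLowerBound;
        }

        GroupBounds merge(GroupBounds other) {
            return new GroupBounds(
                    Math.min(minLowerBound, other.minLowerBound),
                    Math.max(maxLowerBound, other.maxLowerBound));
        }
    }

    public static void main(String[] args) {
        // Two vertices in the same slot sharing group with minimum parallelisms 2 and 5.
        GroupBounds merged = new GroupBounds(2, 2).merge(new GroupBounds(5, 5));
        System.out.println(
                "minLowerBound=" + merged.minLowerBound
                        + ", maxLowerBound=" + merged.maxLowerBound); // prints 2 and 5
    }
}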
This PR is based on #22795 for ease of testing.
The Adaptive Scheduler now supports a minimum parallelism per vertex.
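As a closing usage illustration of the feature this PR contributes to, here is a sketch assuming the FLIP-291-style JobResourceRequirements builder; the class, builder, and setParallelismForJobVertex names reflect my understanding of that API and should be verified against the Flink release in use.

import org.apache.flink.runtime.jobgraph.JobResourceRequirements;
import org.apache.flink.runtime.jobgraph.JobVertexID;

/**
 * Sketch: declaring a per-vertex lower (minimum) and upper parallelism bound.
 * Assumes the FLIP-291-style JobResourceRequirements builder; verify the exact API.
 */
final class ResourceRequirementsExample {

    static JobResourceRequirements buildRequirements(JobVertexID vertexId) {
        // Lower bound 2, upper bound 8: the adaptive scheduler should not run this
        // vertex below 2 and may scale it up to at most 8 as slots become available.
        return JobResourceRequirements.newBuilder()
                .setParallelismForJobVertex(vertexId, 2, 8)
                .build();
    }
}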