[FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler #21981

rkhachatryan · 2023-02-21T08:41:18Z

What is the purpose of the change

Adjust slot assignment by Adaptive Scheduler
to try to re-use previous allocations
so that TMs can use Local Recovery.

Contributed mostly by @dmvk.
The main differences from the original contribution:

The previous ExecutionGraph is passed from the previous state explicitly (currently, the WaitingForResources state, which triggers the computation, doesn't have the graph)
In SlotAssigner, the split into two methods is removed, mostly for consistency (the two methods largely duplicated each other). That results in a higher asymptotic complexity of StateLocalitySlotAssigner (O(mnlog*mnlog) vs O(mnlog))
DoP is computed according to FLINK-30895
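
The idea of re-using previous allocations can be illustrated with a small, self-contained sketch. It is not the code of StateLocalitySlotAssigner: the class StickyAssignmentSketch, the method assign, and the use of plain strings for groups and allocation IDs are all assumptions made for illustration. Each group is first kept on its previous slot if that slot is still free; the remaining groups then take whatever slots are left.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Flink's StateLocalitySlotAssigner.
final class StickyAssignmentSketch {

    /**
     * Assigns each group to a free slot, preferring the slot it used before.
     *
     * @param groups groups to place, in a stable order
     * @param freeSlots allocation IDs of the currently free slots
     * @param previous group -> allocation ID used in the previous execution
     * @return group -> allocation ID; groups that got no slot are omitted
     */
    static Map<String, String> assign(
            List<String> groups, Set<String> freeSlots, Map<String, String> previous) {
        Set<String> remaining = new LinkedHashSet<>(freeSlots);
        Map<String, String> result = new LinkedHashMap<>();
        List<String> unplaced = new ArrayList<>();

        // Pass 1: keep a group on its previous slot if that slot is still free.
        for (String group : groups) {
            String prior = previous.get(group);
            if (prior != null && remaining.remove(prior)) {
                result.put(group, prior);
            } else {
                unplaced.add(group);
            }
        }

        // Pass 2: place the remaining groups on any leftover slot.
        Iterator<String> leftover = remaining.iterator();
        for (String group : unplaced) {
            if (!leftover.hasNext()) {
                break; // not enough slots for every group
            }
            result.put(group, leftover.next());
        }
        return result;
    }
}
```

A group whose previous slot has disappeared is treated like a new group, so in this sketch local state is only re-used where the old allocation is still available.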

Brief change log

Support LocalRecovery by AdaptiveScheduler
Add previous ExecutionGraph to WaitingForResources AdaptiveScheduler state
Make LocalRecoveryITCase fail when allocations don't match

Verifying this change

Adjusted LocalRecoveryITCase
Added SlotSharingSlotAllocatorTest.testStickyAllocation
Added StateLocalitySlotAssignerTest

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot · 2023-02-21T08:48:00Z

CI report:

ff1b080 UNKNOWN
672c8f5 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

...ain/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

zentol

I'm not a fan of passing around execution graphs, and would rather see a dedicated structure for our purposes that lives in the AdaptiveScheduler.

This would avoid some edge-cases, like local recovery breaking down unnecessarily when CreatingWithExecutionGraph failed.

...src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/DefaultSlotAssigner.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/StateTransitions.java

...ain/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

...ntime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotAllocator.java

rkhachatryan · 2023-02-23T23:53:30Z

Thanks for the feedback, @zentol and @dmvk.
I've updated the PR, would you mind taking another look?

I've significantly restructured the code after the offline discussions.
A good place to start is probably the final versions of the SlotAllocator and SlotAssigner interfaces.

dmvk

Thanks for the update @rkhachatryan, this is starting to look really good! I've left a few questions, PTAL

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/JobSchedulingPlan.java

...ain/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

.../src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateSizeEstimates.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

...ime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/AllocationsInfo.java

...src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/DefaultSlotAssigner.java

.../src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateSizeEstimates.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/WaitingForResources.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

dmvk

Thanks for the update @rkhachatryan; this looks great! I've added a few more minor comments, PTAL.

My biggest concern is whether the integration test correctly stresses the AdaptiveScheduler code path.

Please prepare the PR for merging (fixing the commit history + moving StateSizeEstimates out of the PR as discussed offline).

flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpointStore.java

...e/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/VertexParallelism.java

flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/CreatedTest.java

...me/src/test/java/org/apache/flink/runtime/scheduler/adaptive/CreatingExecutionGraphTest.java

...runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveSchedulerTest.java

...ntime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotAllocator.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

flink-tests/src/test/java/org/apache/flink/test/recovery/LocalRecoveryITCase.java

rkhachatryan · 2023-03-01T08:28:10Z

Thanks a lot for the thorough review @dmvk!

I've cleaned up the commit history and I think all concerns are now resolved, PTAL.

dmvk

Thanks for updating the PR @rkhachatryan. Great stuff!

🎉 💪

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/StateTransitions.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/JobAllocationsInformation.java

...untime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/CreatingExecutionGraph.java

…uler and SlotAssigner

Slot assignments are computed and consumed by SlotAllocator. This is expressed implicitly by extending VertexParallelism. This change tries to make that clear, while still allowing slots to be assigned to something other than Slot Sharing Groups. It does so by:

1. Introducing JobSchedulingPlan, computed and consumed by SlotAllocator. It couples VertexParallelism with slot assignments.
2. Introducing a determineParallelismAndCalculateAssignment method in addition to determineParallelism, specifically for assignments.
3. Pushing the polymorphism of state assignments from VertexParallelism into the JobSchedulingPlan (slot assignment target).
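
The coupling described in this commit message can be pictured with a rough sketch. It assumes a simplified shape and is not the actual JobSchedulingPlan class: SchedulingPlanSketch, Plan, Assignment, and the string-based vertex and allocation IDs are hypothetical names chosen for illustration. The generic parameter T stands in for the "slot assignment target", so the plan does not have to know whether slots go to slot sharing groups or to something else.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and shapes are illustrative, not Flink's classes.
final class SchedulingPlanSketch {

    /** One slot paired with whatever is placed into it (the "target"). */
    static final class Assignment<T> {
        final String allocationId;
        final T target;

        Assignment(String allocationId, T target) {
            this.allocationId = allocationId;
            this.target = target;
        }
    }

    /** Parallelism per vertex together with the slot assignments realizing it. */
    static final class Plan<T> {
        private final Map<String, Integer> parallelismPerVertex;
        private final List<Assignment<T>> assignments;

        Plan(Map<String, Integer> parallelismPerVertex, List<Assignment<T>> assignments) {
            // Defensive copies keep the plan immutable after construction.
            this.parallelismPerVertex =
                    Collections.unmodifiableMap(new HashMap<>(parallelismPerVertex));
            this.assignments = Collections.unmodifiableList(new ArrayList<>(assignments));
        }

        int parallelismOf(String vertexId) {
            Integer parallelism = parallelismPerVertex.get(vertexId);
            if (parallelism == null) {
                throw new IllegalArgumentException("Unknown vertex: " + vertexId);
            }
            return parallelism;
        }

        List<Assignment<T>> assignments() {
            return assignments;
        }
    }
}
```

Keeping both pieces in one object means a caller that only needs parallelism can ignore the assignments, while the allocator that produced the plan can later consume them without recomputation.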

rkhachatryan · 2023-03-02T21:41:28Z

Merged as d718342..e38a670.
Thanks a lot for the initial prototype and for the thorough reviews, @dmvk and @zentol!

rkhachatryan requested review from zentol and dmvk February 21, 2023 08:41

rkhachatryan commented Feb 21, 2023

...ain/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java

dmvk reviewed Feb 21, 2023

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

dmvk reviewed Feb 21, 2023

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

zentol reviewed Feb 21, 2023

flinkbot added the component=Runtime/Coordination label Feb 21, 2023

zentol self-assigned this Feb 21, 2023

zentol reviewed Feb 21, 2023

...ntime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotAllocator.java

dmvk reviewed Feb 24, 2023

zentol reviewed Feb 24, 2023

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

dmvk reviewed Feb 24, 2023

...in/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java

dmvk reviewed Feb 24, 2023

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

dmvk reviewed Feb 24, 2023

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/WaitingForResources.java

dmvk reviewed Feb 24, 2023

flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java

dmvk reviewed Feb 28, 2023

[hotfix][tests] Make LocalRecoveryITCase fail when allocations don't …

4bf04c3

…match

Currently, a wrong allocation fails the task, causing a restart, which eventually allows the allocation to be fixed by picking the right TM. This prevents the test from failing and hides the wrong allocation.

rkhachatryan force-pushed the f21450 branch from b1c08e7 to 624e80b Compare February 28, 2023 22:22

rkhachatryan mentioned this pull request Feb 28, 2023

[FLINK-31261][runtime] Make AdaptiveScheduler aware of local state size #22046

Open

rkhachatryan requested a review from dmvk March 1, 2023 08:28

dmvk approved these changes Mar 2, 2023

dmvk reviewed Mar 2, 2023

...untime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/CreatingExecutionGraph.java

rkhachatryan and others added 3 commits March 2, 2023 14:30

[FLINK-21450][runtime] Add previous ExecutionGraph to WaitingForResou…

ac46a9a

…rces AdaptiveScheduler state

The previous ExecutionGraph will be used in a subsequent commit to allocate workloads more optimally by taking previous allocations into account.

[FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler

672c8f5

rkhachatryan force-pushed the f21450 branch from 06dede2 to 672c8f5 Compare March 2, 2023 14:37

rkhachatryan merged commit e38a670 into apache:master Mar 2, 2023

rkhachatryan deleted the f21450 branch March 2, 2023 21:39
