[FLINK-21326][runtime] Optimize building topology when initializing ExecutionGraph #14868

Thesharing · 2021-02-04T09:33:37Z

What is the purpose of the change

This PR optimizes how the topology is built when initializing the ExecutionGraph.
The main idea is to put all the vertices that consume the same result partitions into one group (a ConsumerVertexGroup), and all the result partitions that have the same consumer vertices into one group (a ConsumedPartitionGroup).
This reduces the complexity of building the topology in the ExecutionGraph from O(N^2) to O(N).
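To make the grouping idea concrete, here is a minimal sketch of an ALL_TO_ALL connection. It is not Flink code: plain `int` ids stand in for `IntermediateResultPartitionID` and `ExecutionVertexID`, and the class names `EdgePerPairTopology` and `GroupedTopology` are hypothetical. The first model creates one edge object per (partition, consumer) pair. The second builds each group once and lets every vertex or partition hold only a reference to it.

```java
import java.util.ArrayList;
import java.util.List;

/** Edge-per-pair model: one edge object for every (partition, consumer) pair, i.e. O(N * M). */
final class EdgePerPairTopology {
    static List<int[]> buildEdges(int numPartitions, int numConsumers) {
        List<int[]> edges = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            for (int c = 0; c < numConsumers; c++) {
                edges.add(new int[] {p, c});
            }
        }
        return edges;
    }
}

/** Grouped model: each group is built once and shared by reference, i.e. O(N + M). */
final class GroupedTopology {
    final List<Integer> consumedPartitionGroup = new ArrayList<>();
    final List<Integer> consumerVertexGroup = new ArrayList<>();
    final List<List<Integer>> consumedPartitionsByVertex = new ArrayList<>();
    final List<List<Integer>> consumersByPartition = new ArrayList<>();

    GroupedTopology(int numPartitions, int numConsumers) {
        for (int p = 0; p < numPartitions; p++) {
            consumedPartitionGroup.add(p);
        }
        for (int c = 0; c < numConsumers; c++) {
            consumerVertexGroup.add(c);
            // every consumer vertex points at the same ConsumedPartitionGroup instance
            consumedPartitionsByVertex.add(consumedPartitionGroup);
        }
        for (int p = 0; p < numPartitions; p++) {
            // every partition points at the same ConsumerVertexGroup instance
            consumersByPartition.add(consumerVertexGroup);
        }
    }
}
```

With 300 partitions and 300 consumers, the first model allocates 90,000 edge objects. The second stores two groups of 300 ids plus 600 references.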

For more details please check FLINK-21326.

Brief change log

Introduced EdgeManager, ConsumerVertexGroup and ConsumedPartitionGroup to store the topology in ExecutionGraph
Introduced optimizations on the procedure of building topology when initializing ExecutionGraph
Removed ExecutionEdge and fixed related tests

Verifying this change

Since these optimizations do not change the original logic of building the topology in the ExecutionGraph, we believe this change is already covered by existing tests, such as ExecutionGraphConstructionTest, ExecutionGraphRescalingTest and PointwisePatternTest.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot · 2021-02-04T09:35:48Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 85af2bb (Thu Feb 04 09:35:47 UTC 2021)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

flinkbot · 2021-02-04T10:11:32Z

CI report:

91c2d95 Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run travis re-run the last Travis build
@flinkbot run azure re-run the last Azure build

zhuzhurk

Thanks for opening this PR @Thesharing
The change generally looks good to me. I have a few minor comments.

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumerVertexGroup.java

...ntime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/DistributionPattern.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

Thesharing

Thanks for reviewing and providing these great suggestions. I've resolved them in the fix-up commits.

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/DistributionPattern.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java

flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/PointwisePatternTest.java

zhuzhurk

Thanks for addressing all the comments @Thesharing
The change looks good to me.
@tillrohrmann do you want to take another look?

tillrohrmann · 2021-02-26T09:35:28Z

I'll try to give it a pass by Monday. If I don't manage to do it, then go ahead and merge it.

tillrohrmann

Thanks for creating this PR @Thesharing. The changes go in a good direction. I had a couple of comments. Please take a look.

tillrohrmann · 2021-02-27T15:14:23Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

+        // sanity check
+        checkState(consumedPartitions.size() == inputNumber);


Does this mean that we have to add the consumed partitions in increasing order? If this is the contract, then we might want to add a JavaDoc explaining it more explicitly.

Alternatively, we could change the API so that all of a vertex's consumed partition groups (a Collection<ConsumedPartitionGroup>) have to be added at once when adding its ExecutionVertexID.

Yes, this ordering constraint is unnecessary; previously there was no restriction on the order. I'd prefer to remove inputNumber from the parameters, since EdgeManagerBuildUtil currently adds the ConsumedPartitionGroups one by one, per JobEdge.
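The following is a minimal sketch of the API shape discussed here. It drops the `inputNumber` parameter and the order check, and one group is appended per JobEdge. The sketch makes simplifying assumptions. `EdgeManagerSketch` and its method names are hypothetical and not Flink's actual API. String ids stand in for `ExecutionVertexID`, and a `List<String>` stands in for a `ConsumedPartitionGroup`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified EdgeManager without an inputNumber parameter or ordering check. */
final class EdgeManagerSketch {
    private final Map<String, List<List<String>>> consumedPartitionGroupsByVertex =
            new HashMap<>();

    /** Called once per JobEdge; groups are appended in whatever order they arrive. */
    void addConsumedPartitionGroup(String vertexId, List<String> consumedPartitionGroup) {
        consumedPartitionGroupsByVertex
                .computeIfAbsent(vertexId, ignored -> new ArrayList<>())
                .add(consumedPartitionGroup);
    }

    /** Read-only view of all groups registered so far for the given vertex. */
    List<List<String>> getConsumedPartitionGroups(String vertexId) {
        return Collections.unmodifiableList(
                consumedPartitionGroupsByVertex.getOrDefault(vertexId, Collections.emptyList()));
    }
}
```

Because nothing depends on the insertion index, callers are not bound by an implicit ordering contract. That contract is the one the reviewer asked to document.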

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

tillrohrmann · 2021-02-27T16:06:52Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

+/** Utilities for building {@link EdgeManager}. */
+public class EdgeManagerBuildUtil {
+
+    public static void connectVertexToResult(


Do we have some tests for this method?

As with EdgeManager, since we didn't change the original logic, we think it's covered by existing tests: the all-to-all edges are tested by ExecutionGraphConstructionTest, and the pointwise edges are tested by PointwisePatternTest.

Cool, could you add a comment to the two test classes saying that they effectively test EdgeManagerBuildUtil now?

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

tillrohrmann · 2021-02-27T16:12:19Z

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

+    public List<IntermediateResultPartitionID> getResultPartitions() {
+        return Collections.unmodifiableList(resultPartitions);
+    }


We could hide this implementation detail by letting ConsumedPartitionGroup implement the methods we need in order to work with it directly. For example, if it implements size() and Iterable, that should already go a long way. Maybe we also need get(int index). The same applies to ConsumerVertexGroup.

Totally agreed. This will simplify the calls on ConsumedPartitionGroup. After discussing with @zhuzhurk, we decided to add the following methods:

iterator()

size()

getFirst() (to replace get(0))

isEmpty()
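A minimal sketch of a group type that exposes only the four methods listed above and not its backing list. String ids stand in for `IntermediateResultPartitionID`. The class name is hypothetical, and the way `getFirst()` handles an empty group is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical group type: callers iterate over it instead of receiving the backing list. */
final class ConsumedPartitionGroupSketch implements Iterable<String> {
    private final List<String> resultPartitions;

    ConsumedPartitionGroupSketch(List<String> resultPartitions) {
        // defensive copy so later changes to the caller's list do not leak in
        this.resultPartitions = new ArrayList<>(resultPartitions);
    }

    @Override
    public Iterator<String> iterator() {
        // iterator over an unmodifiable view, so remove() is not possible
        return Collections.unmodifiableList(resultPartitions).iterator();
    }

    int size() {
        return resultPartitions.size();
    }

    boolean isEmpty() {
        return resultPartitions.isEmpty();
    }

    /** Replaces get(0); fails fast on an empty group (assumed behaviour). */
    String getFirst() {
        if (resultPartitions.isEmpty()) {
            throw new IllegalStateException("The group is empty.");
        }
        return resultPartitions.get(0);
    }
}
```

Callers then use the group directly in a loop such as `for (String id : group)`, and the backing `List` stays an internal detail.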

Thesharing

Thank you so much for these patient and enlightening suggestions, @tillrohrmann. I've made several changes according to them. Would you mind reviewing it again when you have time?

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

Thesharing · 2021-03-01T03:47:56Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

+        // sanity check
+        checkState(consumedPartitions.size() == inputNumber);


Yes, this ordering constraint is unnecessary; previously there was no restriction on the order. I'd prefer to remove inputNumber from the parameters, since EdgeManagerBuildUtil currently adds the ConsumedPartitionGroups one by one, per JobEdge.

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

Thesharing · 2021-03-01T07:31:32Z

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

+    public List<IntermediateResultPartitionID> getResultPartitions() {
+        return Collections.unmodifiableList(resultPartitions);
+    }


Totally agreed. This will simplify the calls on ConsumedPartitionGroup. After discussing with @zhuzhurk, we decided to add the following methods:

iterator()

size()

getFirst() (to replace get(0))

isEmpty()

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumerVertexGroup.java

zhuzhurk · 2021-03-02T09:48:38Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

+     * based on the {@link DistributionPattern}. The connection information is stored in the {@link
+     * EdgeManager}.
+     */
+    public static void connectVertexToResult(


Seems it can be package private.

zhuzhurk · 2021-03-02T09:59:52Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

-            if (parallelism % numSources == 0) {
-                // same number of targets per source
-                int factor = parallelism / numSources;
-                sourcePartition = subTaskIndex / factor;
-            } else {
-                // different number of targets per source
-                float factor = ((float) parallelism) / numSources;
-                sourcePartition = (int) (subTaskIndex / factor);
-            }


I think the previous special handling of the case parallelism % numSources == 0 was an unnecessary complication, and we can simplify it a bit. The result should be the same, and PointwiseTest#testPointwiseConnectionSequence was added to ensure this.
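The review does not show the replacement code. The following sketch is therefore only an illustration, not the code that was merged. It transcribes the removed two-branch computation, which appears to handle the case where there are more consumer subtasks than source partitions. It places that computation next to a hypothetical single integer formula, `subTaskIndex * numSources / parallelism`. When `parallelism % numSources == 0`, the two give the same result by construction. In the other case, the integer form computes the exact floor of the quotient. The `float` branch, by contrast, may be affected by rounding for large values.

```java
/** Hypothetical helper comparing the removed code with a single-formula variant. */
final class PointwiseIndexSketch {

    /** Transcription of the removed code: consumer subtask index -> source partition index. */
    static int sourcePartitionOld(int subTaskIndex, int parallelism, int numSources) {
        int sourcePartition;
        if (parallelism % numSources == 0) {
            // same number of targets per source
            int factor = parallelism / numSources;
            sourcePartition = subTaskIndex / factor;
        } else {
            // different number of targets per source
            float factor = ((float) parallelism) / numSources;
            sourcePartition = (int) (subTaskIndex / factor);
        }
        return sourcePartition;
    }

    /** Hypothetical simplification: floor(subTaskIndex * numSources / parallelism). */
    static int sourcePartitionSimplified(int subTaskIndex, int parallelism, int numSources) {
        // long arithmetic avoids int overflow of the intermediate product
        return (int) ((long) subTaskIndex * numSources / parallelism);
    }
}
```

For example, with 3 consumer subtasks and 2 sources, both variants map subtasks 0, 1 and 2 to sources 0, 0 and 1.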

tillrohrmann

Thanks for updating this PR @Thesharing. I had a few more comments. Please take a look.

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

tillrohrmann · 2021-03-02T15:54:25Z

...ntime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java

        this.partitionId = new IntermediateResultPartitionID(totalResult.getId(), partitionNumber);
+
+        producer.getExecutionGraph().registerResultPartition(partitionId, this);


I am not a huge fan of coupling components by these kind of constructs. Couldn't we register the IntermediateResultPartition where it is created (e.g. in the ExecutionVertex)?

Yes, this couples them too tightly. I think it's better to register all the ExecutionVertices and IntermediateResultPartitions after all of them have been created.

Now we register them in ExecutionGraph#registerExecutionVerticesAndResultPartitions, and this method is called in ExecutionGraph#attachJobGraph, right after creating all the ExecutionJobVertices.

tillrohrmann · 2021-03-02T15:55:47Z

...ntime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java

+    private EdgeManager getEdgeManager() {
+        return producer.getExecutionGraph().getEdgeManager();
    }


I think this shows that we are overly coupling the IntermediateResultPartition with the ExecutionGraph. Couldn't we give an EdgeManager to the IntermediateResultPartition when we create it? This makes the dependency explicit.

Resolved. I'm wondering whether we should also pass the EdgeManager to the constructor of ExecutionVertex. I'm not sure about it.

Theoretically yes. But the ExecutionVertex is already coupled quite tightly to the ExecutionGraph. Hence, it might not make a big difference to not pass it in.
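A minimal sketch of the suggestion to make the dependency explicit. The partition receives the edge manager when it is created. It no longer reaches it through `producer.getExecutionGraph().getEdgeManager()`. Both classes are simplified and hypothetical stand-ins for `EdgeManager` and `IntermediateResultPartition`, and String ids replace Flink's ID types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for EdgeManager: partition id -> consumer vertex ids. */
final class SimpleEdgeManager {
    private final Map<String, List<String>> consumersByPartition = new HashMap<>();

    void addConsumer(String partitionId, String vertexId) {
        consumersByPartition.computeIfAbsent(partitionId, ignored -> new ArrayList<>()).add(vertexId);
    }

    List<String> getConsumers(String partitionId) {
        return Collections.unmodifiableList(
                consumersByPartition.getOrDefault(partitionId, Collections.emptyList()));
    }
}

/** Hypothetical stand-in for IntermediateResultPartition with an explicit dependency. */
final class ResultPartitionSketch {
    private final String partitionId;
    private final SimpleEdgeManager edgeManager;

    ResultPartitionSketch(String partitionId, SimpleEdgeManager edgeManager) {
        this.partitionId = Objects.requireNonNull(partitionId);
        // injected at creation time instead of being looked up via the execution graph
        this.edgeManager = Objects.requireNonNull(edgeManager);
    }

    List<String> getConsumers() {
        return edgeManager.getConsumers(partitionId);
    }
}
```

The constructor signature now documents what the partition depends on. A test can also hand it a small edge manager without building a whole graph.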

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumerVertexGroup.java

Thesharing

Thanks for providing these great suggestions, @tillrohrmann. They really helped me improve this pull request. Would you mind reviewing it again when you have time? Thank you in advance.

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

Thesharing · 2021-03-03T06:36:42Z

...ntime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java

+    private EdgeManager getEdgeManager() {
+        return producer.getExecutionGraph().getEdgeManager();
    }


Resolved. I'm wondering whether we should also pass the EdgeManager to the constructor of ExecutionVertex. I'm not sure about it.

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

...k-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumerVertexGroup.java

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java

Thesharing · 2021-03-05T08:23:09Z

I've rebased onto the latest master branch due to changes introduced in FLINK-21347.

tillrohrmann · 2021-03-05T08:37:23Z

I'll try to give it another pass today.

tillrohrmann

Thanks for updating this PR @Thesharing. It looks really nice now. Well done! I had a few very minor comments. I will address them myself while merging this PR.

tillrohrmann · 2021-03-05T13:03:18Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraph.java

+    private void registerExecutionVerticesAndResultPartitions(
+            List<ExecutionJobVertex> executionJobVertices) {
+        for (ExecutionJobVertex executionJobVertex : executionJobVertices) {
+            for (ExecutionVertex executionVertex : executionJobVertex.getTaskVertices()) {
+                executionVerticesById.put(executionVertex.getID(), executionVertex);
+                resultPartitionsById.putAll(executionVertex.getProducedPartitions());
+            }
+        }
+    }


Nice, this is a very good solution :-)

tillrohrmann · 2021-03-05T13:04:46Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManager.java

+        // sanity check
+        checkState(
+                consumers.isEmpty(), "Currently there has to be exactly one consumer in real jobs");


This checkState and the one above seem to be testing the same thing. I would keep only one. Ideally one with an explanation message.
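A minimal sketch of the suggestion: keep a single sanity check and give it a message that explains the invariant. `checkState` below is a local stand-in for the Preconditions-style helper used in the diff. The class, its methods and the exact message are hypothetical, and String ids replace Flink's ID types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example of one explanatory check instead of two redundant ones. */
final class SingleConsumerGroupSketch {
    private final Map<String, List<String>> consumerGroupByPartition = new HashMap<>();

    /** Local stand-in for a Preconditions-style checkState(boolean, String). */
    private static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    void connectPartitionWithConsumerGroup(String partitionId, List<String> consumerGroup) {
        // a single sanity check that says which invariant it protects
        checkState(
                !consumerGroupByPartition.containsKey(partitionId),
                "Partition " + partitionId
                        + " already has a consumer vertex group; currently exactly one is expected.");
        consumerGroupByPartition.put(partitionId, new ArrayList<>(consumerGroup));
    }

    List<String> getConsumerGroup(String partitionId) {
        return Collections.unmodifiableList(
                consumerGroupByPartition.getOrDefault(partitionId, Collections.emptyList()));
    }
}
```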

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java

...untime/src/main/java/org/apache/flink/runtime/scheduler/strategy/ConsumedPartitionGroup.java

tillrohrmann · 2021-03-05T13:14:15Z

flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/ExecutingTest.java

+
+        @Override
+        public EdgeManager getEdgeManager() {
+            return null;


Let's implement this method by throwing an UnsupportedOperationException.

tillrohrmann · 2021-03-05T13:14:20Z

flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/ExecutingTest.java

+
+        @Override
+        public ExecutionVertex getExecutionVertexOrThrow(ExecutionVertexID id) {
+            return null;


Let's implement this method by throwing an UnsupportedOperationException.

tillrohrmann · 2021-03-05T13:14:23Z

flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/ExecutingTest.java

+        @Override
+        public IntermediateResultPartition getResultPartitionOrThrow(
+                IntermediateResultPartitionID id) {
+            return null;


Let's implement this method by throwing an UnsupportedOperationException.
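A minimal sketch of the requested stubbing pattern. `GraphLookup` is a hypothetical interface that stands in for the graph interface the test double in ExecutingTest implements. Its names and return types are simplified and are not Flink's. Each method the test does not need throws, so any unexpected call fails loudly. Returning `null` would surface later as a confusing `NullPointerException`.

```java
/** Hypothetical, simplified interface standing in for the one the test double implements. */
interface GraphLookup {
    Object getEdgeManager();

    Object getExecutionVertexOrThrow(String id);

    Object getResultPartitionOrThrow(String id);
}

/** Test double: methods the test does not need fail fast instead of returning null. */
final class StubGraphLookup implements GraphLookup {
    @Override
    public Object getEdgeManager() {
        throw new UnsupportedOperationException("Not implemented in this test stub.");
    }

    @Override
    public Object getExecutionVertexOrThrow(String id) {
        throw new UnsupportedOperationException("Not implemented in this test stub.");
    }

    @Override
    public Object getResultPartitionOrThrow(String id) {
        throw new UnsupportedOperationException("Not implemented in this test stub.");
    }
}
```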


Thesharing · 2021-03-05T18:45:29Z

Thank you, @tillrohrmann and @zhuzhurk! I believe I've learnt a lot from your suggestions. I'll start to prepare the next pull request. Thank you for these enlightening reviews!


rmetzger added review=description? component=Runtime/Coordination labels Feb 4, 2021

Thesharing changed the title ~~[FLINK-21110][runtime] Optimize building topology when initializing ExecutionGraph~~ [FLINK-21326][runtime] Optimize building topology when initializing ExecutionGraph Feb 9, 2021

Thesharing force-pushed the flink-21110 branch from 85af2bb to 997d4dd (February 19, 2021 01:05)

zhuzhurk reviewed Feb 23, 2021

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java (outdated, resolved)

Thesharing force-pushed the flink-21110 branch from 997d4dd to 62df715 (February 24, 2021 06:35)

Thesharing commented Feb 24, 2021

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/EdgeManagerBuildUtil.java (outdated, resolved)

zhuzhurk reviewed Feb 24, 2021

flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/DistributionPattern.java (outdated, resolved)

zhuzhurk reviewed Feb 24, 2021

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java (outdated, resolved)

Thesharing force-pushed the flink-21110 branch 3 times, most recently from e19dc30 to caf86ff (February 25, 2021 12:15)

zhuzhurk reviewed Feb 26, 2021

flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/PointwisePatternTest.java (outdated, resolved)

zhuzhurk reviewed Feb 26, 2021

flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/PointwisePatternTest.java (outdated, resolved)

Thesharing force-pushed the flink-21110 branch from cad67a0 to cb77b97 (February 26, 2021 07:47)

zhuzhurk approved these changes Feb 26, 2021

tillrohrmann requested changes Feb 27, 2021

Thesharing commented Mar 2, 2021

Thesharing force-pushed the flink-21110 branch from cb77b97 to 2fd0280 (March 2, 2021 09:43)

zhuzhurk reviewed Mar 2, 2021

Thesharing force-pushed the flink-21110 branch from 2fd0280 to 016599e (March 2, 2021 10:01)

tillrohrmann requested changes Mar 2, 2021

Thesharing mentioned this pull request Mar 3, 2021

[FLINK-21576][runtime] Remove legacy ExecutionVertex#getPreferredLocations() #15069

Merged

Thesharing commented Mar 3, 2021

Thesharing force-pushed the flink-21110 branch 2 times, most recently from 782c2e6 to 3ed3f44 (March 4, 2021 15:39)

Thesharing force-pushed the flink-21110 branch 3 times, most recently from 9b89841 to 1a9fa79 (March 5, 2021 08:21)

tillrohrmann approved these changes Mar 5, 2021

Thesharing added 4 commits March 5, 2021 14:22

[hotfix] Expose the partitionNum and IntermediateDataSetID from IntermediateResultPartitionID · fa3bbf4

[FLINK-21326] Add tests to make sure the descendant logic of building POINTWISE edges follows the initial logic · 7e5b388

[FLINK-21326] Introduce EdgeManager, ConsumerVertexGroup and ConsumedPartitionGroup · e2379c1

[FLINK-21326] Optimize the topology building in ExecutionGraph · 91c2d95

This closes apache#14868.

tillrohrmann force-pushed the flink-21110 branch from 1a9fa79 to 91c2d95 (March 5, 2021 13:23)

tillrohrmann closed this in 91c2d95 Mar 5, 2021

tillrohrmann merged commit 91c2d95 into apache:master Mar 5, 2021

autophagy pushed a commit to autophagy/flink that referenced this pull request Mar 16, 2021

[FLINK-21326] Optimize the topology building in ExecutionGraph

3e2989b

This closes apache#14868.
