
Conversation

@Thesharing
Contributor

@Thesharing Thesharing commented Mar 22, 2021

What is the purpose of the change

This pull request introduces an optimization for releasing result partitions in RegionPartitionReleaseStrategy.
RegionPartitionReleaseStrategy is responsible for releasing result partitions once all the downstream tasks have finished.

The current implementation is:

for each consumed SchedulingResultPartition of current finished SchedulingPipelinedRegion:
  for each consumer SchedulingPipelinedRegion of the SchedulingResultPartition:
    if all the regions are finished:
      release the partitions

The time complexity of releasing a single result partition is O(N^2). However, considering that all the result partitions need to be released over the lifetime of a job, the overall time complexity is actually O(N^3).

Based on FLINK-21228, the consumed result partitions of a pipelined region are grouped. Since the result partitions in one group are isomorphic, we can simply cache the finished status of the pipelined regions and the fully-consumed status of the result partition groups.

The optimized implementation is:

for each ConsumedPartitionGroup of current finished SchedulingPipelinedRegion:
  if all consumer SchedulingPipelinedRegions of the ConsumedPartitionGroup are finished:
    set the ConsumedPartitionGroup to be fully consumed
    for each result partition in the ConsumedPartitionGroup:
      if all the ConsumedPartitionGroups it belongs to are fully consumed:
        release the result partition

After the optimization, the complexity decreases from O(N^3) to O(N).
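
To make the bookkeeping concrete, below is a minimal Java sketch of the optimized check. This is only an illustration under assumed names: the counter field, the map layout and the regionFinished() helper are invented for this example and do not reflect the actual flink-runtime classes.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Minimal sketch of the cached release check; all names and fields are illustrative. */
class PartitionReleaseSketch {

    /** A group of isomorphic result partitions consumed by one or more pipelined regions. */
    static class ConsumedPartitionGroup {
        final List<String> partitionIds;
        int unfinishedConsumerRegions; // cached, instead of re-checking every region

        ConsumedPartitionGroup(List<String> partitionIds, int consumerRegionCount) {
            this.partitionIds = partitionIds;
            this.unfinishedConsumerRegions = consumerRegionCount;
        }

        boolean isFullyConsumed() {
            return unfinishedConsumerRegions == 0;
        }
    }

    /** For each pipelined region, the partition groups it consumes. */
    private final Map<String, List<ConsumedPartitionGroup>> consumedGroupsByRegion;

    /** For each result partition, the groups it belongs to. */
    private final Map<String, List<ConsumedPartitionGroup>> groupsByPartition;

    PartitionReleaseSketch(
            Map<String, List<ConsumedPartitionGroup>> consumedGroupsByRegion,
            Map<String, List<ConsumedPartitionGroup>> groupsByPartition) {
        this.consumedGroupsByRegion = consumedGroupsByRegion;
        this.groupsByPartition = groupsByPartition;
    }

    /** Called when a pipelined region finishes; returns the partitions that can be released. */
    List<String> regionFinished(String finishedRegionId) {
        List<String> releasable = new ArrayList<>();
        for (ConsumedPartitionGroup group : consumedGroupsByRegion.get(finishedRegionId)) {
            if (--group.unfinishedConsumerRegions > 0) {
                continue; // some consumer region of this group is still running
            }
            // this group has just become fully consumed; check its partitions
            for (String partitionId : group.partitionIds) {
                boolean allGroupsConsumed =
                        groupsByPartition.get(partitionId).stream()
                                .allMatch(ConsumedPartitionGroup::isFullyConsumed);
                if (allGroupsConsumed) {
                    releasable.add(partitionId);
                }
            }
        }
        return releasable;
    }
}

Each group's counter is decremented at most once per consumer region, and a partition is only re-examined when one of its groups has just become fully consumed, which is what brings the total work for a whole job down to O(N).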

For more details, please check FLINK-21332.

Brief change log

  • Optimize RegionPartitionReleaseStrategy#filterReleasablePartitions

Verifying this change

Since this optimization does not change the original logic of releasing result partitions in RegionPartitionReleaseStrategy, we believe that this change is already covered by RegionPartitionReleaseStrategyTest.

For the newly added classes ConsumerRegionGroupExecutionViewMaintainer and ConsumerRegionGroupExecutionView, we added the test case ConsumerRegionGroupExecutionViewMaintainerTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Mar 22, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit f6a937c (Sat Aug 28 11:09:01 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Mar 22, 2021

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@Thesharing
Contributor Author

Thesharing commented Mar 22, 2021

It's worth noting that although the complexity is O(N) after the optimization, the time spent in RegionPartitionReleaseStrategy#filterReleasablePartitions is still rather long. As illustrated in the figure below, most of the time is spent in HashMap.get.

Illustration

@Thesharing Thesharing changed the title [FLINK-21332] Optimize releasing result partitions in RegionPartitionReleaseStrategy [FLINK-21332][runtime] Optimize releasing result partitions in RegionPartitionReleaseStrategy Mar 24, 2021
@Thesharing Thesharing force-pushed the flink-21332 branch 2 times, most recently from 2dc45db to d69e745, on March 25, 2021 10:07
@Thesharing
Contributor Author

@flinkbot run azure

@zhuzhurk
Contributor

It's worth noting that although the complexity is O(N) after the optimization, the time spent in RegionPartitionReleaseStrategy#filterReleasablePartitions is still rather long. As illustrated in the figure below, most of the time is spent in HashMap.get.

Illustration

I would suggest introducing the assumption that one IntermediateResultPartition can have only one ConsumerVertexGroup (indicating that one IntermediateDataSet can have only one consumer JobEdge, which is already a widely-held assumption in flink-runtime at the moment). This can help reduce the complexity of each vertexFinished() invocation to O(1).
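
As a rough illustration of why the assumption helps (a sketch with assumed names only, not the actual flink-runtime API): if every result partition belongs to exactly one ConsumedPartitionGroup, the per-partition releasability check collapses to a single constant-time field access, so no HashMap lookup over all groups of the partition is needed.

/** Sketch only: names and fields are assumptions for illustration. */
class SinglePartitionGroupSketch {

    static class ConsumedPartitionGroup {
        int unfinishedConsumerRegions;

        boolean isFullyConsumed() {
            return unfinishedConsumerRegions == 0;
        }
    }

    static class ResultPartition {
        final ConsumedPartitionGroup consumedPartitionGroup; // exactly one, by assumption

        ResultPartition(ConsumedPartitionGroup group) {
            this.consumedPartitionGroup = group;
        }
    }

    /** O(1): no iteration over multiple groups of the partition is required. */
    static boolean canBeReleased(ResultPartition partition) {
        return partition.consumedPartitionGroup.isFullyConsumed();
    }
}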

@Thesharing
Contributor Author

I would suggest introducing the assumption that one IntermediateResultPartition can have only one ConsumerVertexGroup (indicating that one IntermediateDataSet can have only one consumer JobEdge, which is already a widely-held assumption in flink-runtime at the moment). This can help reduce the complexity of each vertexFinished() invocation to O(1).

Thanks for proposing this solution 👍 I've already added the check and comments according to this assumption. Would you mind re-reviewing it when you have some free time?

Contributor

@zhuzhurk zhuzhurk left a comment


Here are some comments for the renaming commits.

consumerRegion = new TestingSchedulingPipelinedRegion(Collections.singleton(consumer));
}

private void createConsumerRegionGroupExecutionViewTracker() {
Contributor Author


Suggested change
private void createConsumerRegionGroupExecutionViewTracker() {
private void createConsumerRegionGroupExecutionViewMaintainer() {

Contributor Author


Resolved.

Contributor

@zhuzhurk zhuzhurk left a comment


Thanks for addressing all the comments @Thesharing
The change looks good to me except for several minor comments.

Contributor

@zhuzhurk zhuzhurk left a comment


LGTM.
Merging.

@zhuzhurk zhuzhurk closed this in 9951be8 Mar 30, 2021
