New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-28380][runtime] Produce one intermediate dataset for multiple consumer job vertices consuming the same data #20351
Conversation
…consumer job vertices consuming the same data This closes apache#20351.
Migrated tests include StreamingJobGraphGeneratorTest, ForwardForConsecutiveHashPartitionerTest and ForwardForUnspecifiedPartitionerTest. This closes apache#20351.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wsry for the PR! I left some comments.
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamConfig.java
Outdated
Show resolved
Hide resolved
...ming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java
Outdated
Show resolved
Hide resolved
...ming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java
Outdated
Show resolved
Hide resolved
...ming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java
Outdated
Show resolved
Hide resolved
…consumer job vertices consuming the same data This closes apache#20351.
Migrated tests include StreamingJobGraphGeneratorTest, ForwardForConsecutiveHashPartitionerTest and ForwardForUnspecifiedPartitionerTest. This closes apache#20351.
@gaoyunhaii Thanks for the review and feed back. I added a fixup commit. Please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wsry Thanks for your nice work, I only left some minor comments about test.
...rc/test/java/org/apache/flink/runtime/executiongraph/BlockingResultPartitionReleaseTest.java
Outdated
Show resolved
Hide resolved
...test/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraphConstructionTest.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionJobVertexTest.java
Outdated
Show resolved
Hide resolved
...e/src/test/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopologyTest.java
Outdated
Show resolved
Hide resolved
...e/src/test/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopologyTest.java
Outdated
Show resolved
Hide resolved
...e/src/test/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopologyTest.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/flink/runtime/scheduler/adapter/DefaultResultPartitionTest.java
Outdated
Show resolved
Hide resolved
…consumer job vertices consuming the same data This closes apache#20351.
Migrated tests include StreamingJobGraphGeneratorTest, ForwardForConsecutiveHashPartitionerTest and ForwardForUnspecifiedPartitionerTest. This closes apache#20351.
@gaoyunhaii I have updated the PR. |
…consumer job vertices consuming the same data This closes apache#20351.
Migrated tests include StreamingJobGraphGeneratorTest, ForwardForConsecutiveHashPartitionerTest and ForwardForUnspecifiedPartitionerTest. This closes apache#20351.
Thanks @wsry for the PR! LGTM, let's have a final check after the commit merged. |
…consumer job vertices consuming the same data This closes apache#20351.
Migrated tests include StreamingJobGraphGeneratorTest, ForwardForConsecutiveHashPartitionerTest and ForwardForUnspecifiedPartitionerTest. This closes apache#20351.
…consumer job vertices consuming the same data This closes apache#20351.
…consumer job vertices consuming the same data This closes #20351.
…consumer job vertices consuming the same data This closes apache#20351.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wsry for the PR! LGTM
…consumer job vertices consuming the same data This closes apache#20351.
…consumer job vertices consuming the same data This closes apache#20351.
Tests passed: https://dev.azure.com/kevin-flink/flink/_build/results?buildId=624&view=results, merging. |
…consumer job vertices consuming the same data This closes apache#20351.
What is the purpose of the change
Currently, if one output of an upstream job vertex is consumed by multiple downstream job vertices, the upstream vertex will produce multiple dataset. For blocking shuffle, it means serialize and persist the same data multiple times. This ticket aims to optimize this behavior and make the upstream job vertex produce one dataset which will be read by multiple downstream vertex.
Brief change log
Verifying this change
This change added tests.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)Documentation