[FLINK-30631][runtime] Limit the max number of subpartitions consumed by each downstream task #21646
Conversation
Force-pushed from 50715d0 to 4129ede
zhuzhurk
left a comment
Thanks for opening this PR. @wanglijie95
I only have one comment regarding the first commit.
...ache/flink/runtime/scheduler/adaptivebatch/DefaultVertexParallelismAndInputInfosDecider.java
@zhuzhurk Thanks for the review, I've addressed the comment; please take a look.
zhuzhurk
left a comment
LGTM.
…ism and input infos
…by each downstream task
Force-pushed from 8f0f55a to 81f6e2b
@flinkbot run azure
…by each downstream task This closes apache#21646
What is the purpose of the change
In the current implementation (FLINK-25035), when the upstream vertex parallelism is much greater than the downstream vertex parallelism, the downstream tasks may end up with a very large number of channels. For example, consider A -> B with an all-to-all edge and a max parallelism of 1000: if the parallelism of A is 1000 and the parallelism of B is decided to be 1, the only subtask of B will consume 1000 * 1000 subpartitions. Processing this many channels causes significant overhead.
In this ticket, we temporarily address this issue by limiting the maximum number of subpartitions consumed by each downstream task. The ultimate solution is to support a single channel consuming multiple subpartitions.
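The parallelism lower bound implied by such a limit can be sketched as follows. This is a hypothetical, simplified illustration (the class, method names, and the example cap of 32768 are assumptions, not Flink's actual implementation): for an all-to-all edge where each upstream task produces one subpartition per possible downstream task, it computes the minimum downstream parallelism that keeps the number of subpartitions per downstream task under a cap.

```java
public class SubpartitionLimitSketch {

    /**
     * Hypothetical helper (not Flink's real API): for an all-to-all edge where
     * each upstream task produces maxParallelism subpartitions, return the
     * minimum downstream parallelism such that no downstream task consumes
     * more than maxSubpartitionsPerTask subpartitions.
     */
    static int minParallelismForLimit(int upstreamParallelism,
                                      int maxParallelism,
                                      int maxSubpartitionsPerTask) {
        // Total subpartitions produced across all upstream tasks.
        long totalSubpartitions = (long) upstreamParallelism * maxParallelism;
        // Ceiling division: enough downstream tasks so each stays under the cap.
        long minParallelism =
                (totalSubpartitions + maxSubpartitionsPerTask - 1) / maxSubpartitionsPerTask;
        // Clamp to the valid parallelism range [1, maxParallelism].
        return (int) Math.min(Math.max(minParallelism, 1), maxParallelism);
    }

    public static void main(String[] args) {
        // Example from the description: A -> B, all-to-all, max parallelism 1000.
        // Without a limit, a single B subtask could consume 1000 * 1000 = 1,000,000
        // subpartitions; with an assumed cap of 32768, B needs at least 31 subtasks.
        System.out.println(minParallelismForLimit(1000, 1000, 32768));
    }
}
```

With such a lower bound folded into the parallelism decision, the scheduler can no longer pick a downstream parallelism so small that individual tasks are overwhelmed by channel-management overhead.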
Verifying this change
Unit tests:
- DefaultVertexParallelismAndInputInfosDeciderTest#testEvenlyDistributeDataWithMaxSubpartitionLimitation
- DefaultVertexParallelismAndInputInfosDeciderTest#testDecideParallelismWithMaxSubpartitionLimitation

Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation