
[FLINK-25226][doc] Add documentation about the AdaptiveBatchScheduler #18757

Closed
wants to merge 1 commit

Conversation

wanglijie95
Contributor

What is the purpose of the change

Add documentation about the AdaptiveBatchScheduler

Verifying this change

Document change without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 8f1fbaa (Mon Feb 14 13:57:17 UTC 2022)

Warnings:

  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Feb 14, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Contributor

@zhuzhurk left a comment

Thanks for adding docs for adaptive batch scheduler! @wanglijie95
I have a few comments. Please take a look.

docs/content/docs/deployment/adaptive_batch_scheduler.md (4 resolved review threads)

### Limitations

- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch Scheduler only supports ALL-EDGES-BLOCKING batch jobs only.
Contributor

ALL-EDGES-BLOCKING -> ALL-EXCHANGES-BLOCKING

And maybe add a link to the config option "execution.batch-shuffle-mode" for reference?

Contributor

The first version of -> At the moment,

Contributor

there are 2 only and either should be removed

Member

What does this mean? (from the user perspective)

Contributor Author

+1 for @zhuzhurk 's comment. Just tell users that the adaptive batch scheduler only supports the case where execution.batch-shuffle-mode is ALL-EXCHANGES-BLOCKING, and link to the config pages.
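In flink-conf.yaml terms, the prerequisite discussed here would look like the following sketch (the option name is the one quoted in this thread; treat the exact enum value as something to verify against the linked config page):

```yaml
# The adaptive batch scheduler requires all data exchanges to be blocking.
# ALL_EXCHANGES_BLOCKING is the mode this thread refers to.
execution.batch-shuffle-mode: ALL_EXCHANGES_BLOCKING
```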

@zhuzhurk
Contributor

@tillrohrmann @dmvk would you help take a look at the English version of the document, if convenient?

@dmvk
Member

dmvk commented Feb 15, 2022

I'm not sure whether ABS should have its own "top level" section under the deployment menu. Would it make sense to incorporate this into the elastic scaling page?

Member

@dmvk left a comment

Thanks for the PR @wanglijie95 👍 It's great to see that this feature will get proper documentation 😍. I'm mostly concerned about what audience we are targeting with these docs; I think we should take less advanced users into consideration here, because this is a really cool feature that many people will want to try out.

Also it would be nice to add a section about how this could be integrated with the external shuffle service (without it, this effort lacks the benefit of being resource efficient).

I left some comments in-line, please take a look.

For the grammar, once you're finished, you can ping @infoverload and she can help to correct it.

Are you also planning a blog post for this? It would be a good opportunity to enhance this with some high level pictures that could be then reused.

👍

Comment on lines 28 to 31
The Adaptive Batch Scheduler can automatically decide parallelisms of job vertices for batch jobs. If a job vertex is not set with a parallelism, the scheduler will decide parallelism for the job vertex according to the size of its consumed datasets. This can bring many benefits:
- Batch job users can be relieved from parallelism tuning
- Automatically tuned parallelisms can be vertex level and can better fit consumed datasets which have a varying volume size every day
- Vertices from SQL batch jobs can be assigned with different parallelisms which are automatically tuned
Member

What's the target audience? Is a regular Flink user supposed to know what a job vertex is? Overall this page feels a bit too low level 🤔.

On the other hand, I don't think the other pages within this section are much better in this regard 🤔

Contributor Author

What's the target audience? Is a regular Flink user supposed to know what a job vertex is? Overall this page feels a bit too low level 🤔.

Thanks for pointing that out. Maybe stage is more appropriate?

On the other hand, I don't think the other pages within this section are much better in this regard 🤔

I'll check the rest of the content.

Contributor Author

Or use operator, although it is not exactly the same as the job vertex.

Contributor

@zhuzhurk Feb 18, 2022

+1 for operator. It is the concept that users can/must understand. I think adaptively deciding parallelisms does mean adaptively deciding parallelisms for operators. We just do not want to break beneficial operator chaining, so parallelisms are decided per OperatorChain/JobVertex.


#### Set the parallelism of job vertices to `-1`
Adaptive Batch Scheduler will only decide parallelism for job vertices whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of vertices can be decided automatically, you should configure as follows:
- Set `paralleims.default` to `-1`
Member

typo

- Set the parallelism of job vertices to `-1`.

#### Configure to use Adaptive Batch Scheduler
To use Adaptive Batch Scheduler, you need to set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler) to `AdpaptiveBatch`. In addition, there are several optional config options that might need adjustment when using Adaptive Batch Scheduler:
Member

typo AdpaptiveBatch

- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data volume to expect each task instance to process
- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The default parallelism of source vertices
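For orientation, the options quoted above could be combined into a minimal flink-conf.yaml sketch. Note that the correct scheduler value is AdaptiveBatch (the snippet under review contains a typo, as pointed out above), and the numeric values below are purely illustrative, not recommendations:

```yaml
# Enable the adaptive batch scheduler.
jobmanager.scheduler: AdaptiveBatch
# -1 leaves the default parallelism unset so the scheduler can decide it.
parallelism.default: -1
# Expected data volume per task instance (illustrative value).
jobmanager.scheduler.adaptive-batch.data-volume-per-task: 1g
# Default parallelism of source vertices (illustrative value).
jobmanager.scheduler.adaptive-batch.source-parallelism.default: 4
```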

#### Set the parallelism of job vertices to `-1`
Member

Why don't we set the defaults automatically when the ABS is enabled? Are there cases where we can't assume that this is what the user wants?

Contributor Author

If users explicitly configure parallelism.default (with a value > 0) in flink-conf but we override this value with -1, I think this may give users the feeling that the configuration does not take effect. Maybe we can check the value of parallelism.default and print an ERROR or WARNING log if the value is > 0?


### Performance tuning

1. It's recommended to use `Sort Shuffle` and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This can decouple the network memory consumption from parallelism, so for large scale jobs, the possibility of "Insufficient number of network buffers" error can be decreased.
Member

Would it make sense to link this with a blog post?

Contributor

@zhuzhurk Feb 16, 2022

+1 to add a link to "https://flink.apache.org/2021/10/26/sort-shuffle-part1.html" (or maybe "https://flink.apache.org/2021/10/26/sort-shuffle-part1.html#motivation-behind-the-sort-based-implementation" which explains the benefits of Sort Shuffle including saving network buffers).

Contributor Author

+1 for this. I will add it.

### Performance tuning

1. It's recommended to use `Sort Shuffle` and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This can decouple the network memory consumption from parallelism, so for large scale jobs, the possibility of "Insufficient number of network buffers" error can be decreased.
2. It's not recommended to configure an excessive value for [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler-adaptive-batch-max-parallelism), otherwise it will affect the performance. Because this option can affect the number of subpartitions produced by upstream tasks, excessive number of subpartitions may degrade the performance of hash shuffle and the performance of network transmission due to small packets.
Member

What is an excessive value in this context?

Contributor Author

I think the maximum parallelism should be set to the parallelism you expect to need to process the data in the worst case; a value larger than that can be considered an "excessive value". I will revise the description in this part.
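Taken together, the tuning advice in this thread could be sketched as a flink-conf.yaml fragment. The sort-shuffle option name is an assumption based on the Flink configuration reference, and the max-parallelism value is purely illustrative (per the comment above, set it to the worst-case parallelism you expect to need):

```yaml
# Enable sort shuffle for all result partitions and decouple
# network memory consumption from parallelism.
taskmanager.network.sort-shuffle.min-parallelism: 1
taskmanager.network.memory.buffers-per-channel: 0
# Upper bound for automatically decided parallelisms; avoid excessive
# values, as they increase the number of subpartitions (illustrative).
jobmanager.scheduler.adaptive-batch.max-parallelism: 1024
```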


### Limitations

- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch Scheduler only supports ALL-EDGES-BLOCKING batch jobs only.
Member

What does this mean? (from the user perspective)

Contributor

@tillrohrmann left a comment

Thanks for creating this PR @wanglijie95. I think it is already really good. One thing that I am missing is what David said: an explanation for less advanced users would be really cool. I think it could go along the lines of how the batch scheduler works and what benefits it brings.

docs/content/docs/deployment/adaptive_batch_scheduler.md (resolved)
@zhuzhurk
Contributor

I'm not sure whether ABS should have its own "top level" section under the deployment menu. Would it make sense to incorporate this into elastic scaling page?

Good idea! +1 to add this doc as an Adaptive Batch Scheduler section in the Elastic Scaling page.

@wanglijie95
Contributor Author

Are you also planning a blog post for this? It would be a good opportunity to enhance this with some high level pictures that could be then reused.

Yes, a blog post is in our plan, maybe shortly after the 1.15 release.

@wanglijie95
Contributor Author

wanglijie95 commented Feb 17, 2022

Thanks for your comments @zhuzhurk @dmvk @tillrohrmann; this goes a long way toward perfecting this document. I've updated the document, looking forward to your further feedback.

@dmvk Currently RSS does not support ABS (mainly, it does not support one input gate consuming a subpartition range), so I think the part about integrating with external shuffle services can be added after RSS is adapted to ABS. When posting the blog post in the future, if the adaptation of RSS has finished, we can also recommend users to use it.

docs/content.zh/docs/deployment/elastic_scaling.md (resolved)
@zhuzhurk
Contributor

Thanks for addressing the comments! The change looks good to me.
@dmvk do you want to take another look?

docs/content/docs/deployment/elastic_scaling.md (6 resolved review threads)
docs/content.zh/docs/deployment/elastic_scaling.md (4 resolved review threads)
Contributor

@zhuzhurk left a comment

Thanks for addressing all the comments. @wanglijie95
The doc now looks good to me.

@wanglijie95
Contributor Author

@flinkbot run azure

@zhuzhurk zhuzhurk closed this in 192351e Mar 20, 2022
@wanglijie95 wanglijie95 deleted the FLINK-25226 branch March 21, 2022 01:52
JasonLeeCoding pushed a commit to JasonLeeCoding/flink that referenced this pull request May 27, 2022
zstraw pushed a commit to zstraw/flink that referenced this pull request Jul 4, 2022