[FLINK-21926][doc] Add docs for fine-grained resource management #16561
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit 8cd1586 (Thu Jul 22 03:37:50 UTC 2021)
✅ no warnings
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
sjwiesman
left a comment
I've made a first pass
| ## How it works | ||
|
|
||
| As described in [Flink Architecture]({{< ref "docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster), | ||
| the resource for task execution in TaskManager is split into a bunch of slots, where job tasks are scheduled. |
| the resource for task execution in TaskManager is split into a bunch of slots, where job tasks are scheduled. | |
| task execution resources in a TaskManager are split into many slots. |
|
|
||
| As described in [Flink Architecture]({{< ref "docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster), | ||
| the resource for task execution in TaskManager is split into a bunch of slots, where job tasks are scheduled. | ||
| The slot is the basic unit of both resource scheduling and resource requirement in Flink runtime. |
| The slot is the basic unit of both resource scheduling and resource requirement in Flink runtime. | |
| The slot is the basic unit of both resource scheduling and resource requirement in Flinks runtime. |
Maybe Flink's?
| In fine-grained resource management, the slots are requested with specific resource profiles, which can be specified by users. | ||
| Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available | ||
| resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it. |
| In fine-grained resource management, the slots are requested with specific resource profiles, which can be specified by users. | |
| Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available | |
| resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it. | |
| With fine-grained resource management, the slot requests contain specific resource profiles, which users can specify. | |
| Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available | |
| resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory, and Flink allocates *Slot 1* for it. |
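To make the "cut an exactly-matched slot" behaviour in this paragraph concrete, here is a minimal toy model. It is only a sketch, not Flink's implementation: `ResourceProfile` and `TaskManagerModel` are hypothetical names, and the TaskManager size (1 core, 4096 MB) is an assumption, since the figure referenced by "As shown above" is not part of this conversation.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical stand-in for a slot's resource profile."""
    cpu_cores: Fraction
    memory_mb: int


class TaskManagerModel:
    """Toy model: tracks free resources and cuts slots out of them on request."""

    def __init__(self, cpu_cores, memory_mb):
        self.free = ResourceProfile(Fraction(cpu_cores), memory_mb)
        self.slots = []

    def try_allocate(self, request):
        """Cut a slot that exactly matches `request`; return None if it does not fit."""
        if (request.cpu_cores > self.free.cpu_cores
                or request.memory_mb > self.free.memory_mb):
            return None
        # Shrink the free pool by exactly the requested amount.
        self.free = ResourceProfile(
            self.free.cpu_cores - request.cpu_cores,
            self.free.memory_mb - request.memory_mb,
        )
        self.slots.append(request)
        return request


# Assumed TaskManager size; the request mirrors the 0.25 Core / 1GB example.
tm = TaskManagerModel(cpu_cores=1, memory_mb=4096)
slot_1 = tm.try_allocate(ResourceProfile(Fraction(1, 4), 1024))
```

In this model a request that exceeds the remaining free resources is simply rejected; how the real runtime reacts in that case (for example by requesting another TaskManager) is outside what this sketch covers.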
| Previously in Flink, the resource requirement only contained the number of the required slots, without fine-grained resource | ||
| profiles, namely **coarse-grained resource management**. The TaskManager contained a fixed number of identical slots to fulfill those requirements. |
| Previously in Flink, the resource requirement only contained the number of the required slots, without fine-grained resource | |
| profiles, namely **coarse-grained resource management**. The TaskManager contained a fixed number of identical slots to fulfill those requirements. |
I moved this down
| In fine-grained resource management, the slots are requested with specific resource profiles, which can be specified by users. | ||
| Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available | ||
| resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it. | ||
|
|
| {{< hint info >}} | |
| Previously in Flink, the resource requirement only contained the number of the required slots, without fine-grained resource | |
| profiles, namely **coarse-grained resource management**. The TaskManager contained a fixed number of identical slots to fulfill those requirements. | |
| {{< /hint >}} |
| <div class="alert alert-info"> | ||
| <strong>Note:</strong> Each slot sharing group can only attach to one specified resource, any conflict will fail the compiling of your job. | ||
| </div> |
| <div class="alert alert-info"> | |
| <strong>Note:</strong> Each slot sharing group can only attach to one specified resource, any conflict will fail the compiling of your job. | |
| </div> | |
| {{< hint warning >}} | |
| **Note:** Each slot sharing group can only attach to one specified resource, any conflict will fail the compiling of your job. | |
| {{< /hint >}} |
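The rule in this note (one resource per slot sharing group, with any conflict failing the job) can be illustrated with a small toy check. This is a sketch under assumptions, not Flink's API: `SlotSharingGroupRegistry` and `attach` are hypothetical names, and the resource is represented as a plain tuple of (CPU cores, memory in MB).

```python
class SlotSharingGroupRegistry:
    """Toy model: each group name may be bound to at most one resource spec."""

    def __init__(self):
        self._resources = {}

    def attach(self, group_name, resource):
        """Bind `resource` to `group_name`; raise if a different one is already bound."""
        existing = self._resources.setdefault(group_name, resource)
        if existing != resource:
            raise ValueError(
                f"Slot sharing group {group_name!r} already has resource {existing}, "
                f"cannot also attach {resource}"
            )
        return existing


registry = SlotSharingGroupRegistry()
registry.attach("ssg", (0.25, 1024))   # first binding succeeds
registry.attach("ssg", (0.25, 1024))   # repeating the same spec is not a conflict

try:
    registry.attach("ssg", (1.0, 2048))  # a different spec for the same group
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Whether Flink treats a repeated identical specification as acceptable is an assumption of this model; the note only states that conflicts fail the job.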
| Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available | ||
| resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it. | ||
|
|
||
| For the resource requirement without a specified resource profile, Flink will automatically decide the resource profile of it. |
| For the resource requirement without a specified resource profile, Flink will automatically decide the resource profile of it. | |
| For the resource requirement without a specified resource profile, Flink will automatically decide a resource profile. |
| the Flink runtime selects a TaskManager to cut slots and allocates TaskManagers on [Native Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}}) | ||
| and [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}). Note that the resource allocation strategy is pluggable in | ||
| Flink runtime and here we introduce its default implementation in the first step of fine-grained resource | ||
| management. In the future, there might be various strategies that can be selected for different scenarios. |
| management. In the future, there might be various strategies that can be selected for different scenarios. | |
| management. In the future, there might be various strategies that users can select for different scenarios. |
| Apache Flink allows you to control the resource consumption of your workload in a finer granularity, namely **fine-grained resource management**. | ||
| It provides means for users to further improve Flink’s resource efficiency with knowledge of their specific scenarios. |
| Apache Flink allows you to control the resource consumption of your workload in a finer granularity, namely **fine-grained resource management**. | |
| It provides means for users to further improve Flink’s resource efficiency with knowledge of their specific scenarios. | |
| Apache Flink works hard to auto-derive sensible default resource requirements for all applications out of the box. | |
| For users who wish to fine-tune their resource consumption, based on knowledge of their specific scenarios, Flink offers **fine-grained resource management**. |
I want to make it clear to new users that you don't have to do this.
Thanks for the valuable comments! @sjwiesman PR updated.
@sjwiesman Hi, would you like to give it another pass?
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
Brief change log
(for example:)
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation