[FLINK-14566] Enable to get/set whether an operator uses managed memory #10427
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. Automated checks: last check on commit 22ccb31 (Thu Dec 05 04:38:52 UTC 2019).
 * memory in runtime (linear association). Note that it only works in cases of UNKNOWN
 * resources.
 */
private int managedMemoryWeight = DEFAULT_MANAGED_MEMORY_WEIGHT;
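The field comment describes a linear association between weights and the managed memory fraction at runtime. A minimal sketch of that idea follows. It assumes that the operators share one slot and all have UNKNOWN resources, and it gives each operator its weight divided by the sum of all weights. The class and method names are hypothetical, not Flink API. Under these assumptions, a weight of 0 yields a fraction of 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Flink code: weights -> managed memory fractions
// for operators with UNKNOWN resources that share one slot.
public class ManagedMemoryFractions {

    // fraction(op) = weight(op) / sum of all weights ("linear association").
    static Map<String, Double> computeFractions(Map<String, Integer> weights) {
        long total = 0;
        for (int w : weights.values()) {
            if (w < 0) {
                throw new IllegalArgumentException("weight must be non-negative: " + w);
            }
            total += w;
        }
        Map<String, Double> fractions = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // If every weight is 0, no operator gets managed memory (and we avoid dividing by 0).
            fractions.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return fractions;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("source", 0); // explicitly opts out of managed memory
        weights.put("map", 0);    // explicitly opts out of managed memory
        weights.put("sort", 1);   // assumed default weight of 1
        weights.put("join", 1);   // assumed default weight of 1
        System.out.println(computeFractions(weights));
    }
}
```

Here sort and join each receive half of the slot's managed memory, while source and map receive none.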
How do I express that an operator doesn't need managed memory?
One can set the weight to value 0 explicitly.
How about making "doesn't need managed memory" the default?
This is to stay aligned with other resources like CPU and heap memory. Currently, operators with UNKNOWN resources can always acquire all kinds of available resources.
I'd prefer not to make managed memory a special case.
So what kind of ResourceSpec should I use when an operator doesn't need managed memory? First set UNKNOWN resources and then set the managed memory weight to 0 explicitly?
> So what kind of ResourceSpec I should use when some operator doesn't need managed memory? First set UNKNOWN resource and then set managed memory to 0 explicitly?
Yes. Weights only work in cases of UNKNOWN resources.
I am +1 for making "doesn't need managed memory" the default.
Note that this is managed memory: the only way to use it is to request it explicitly from the memory manager. That makes it different from other resources.
And imagine a user writes an operator that uses managed memory, believing it is the only operator that uses managed memory. If he uses the DataStream/DataSet API, all the other operators, including map, source, and so on, will take memory away from his operator.
> Note it is manage memory, only way is requiring memory from memory manager explicitly. It is different from other resources.
How much managed memory an operator can acquire actually depends on the declared resources rather than the needed resources. For example, one can set the task_heap_memory/task_offheap_memory/managed_memory of an operator to a large value even if the operator does not use that much, or does not use that kind of resource at all.
The framework should respect those settings, and I don't see managed memory as special here.
> And image user write an operator which using manage memory, and he believe there is only one operator to use manage memory, that is his operator. But if he use DataStream/DataSet api, whatever operators including map/source... these operators will rob his memories.
The weight is not a public interface, so users cannot set it. If the default weight were 0, a user relying on the fraction would always get a managed memory fraction of 0, even if the operator requires managed memory.
I agree with @zhuzhurk that the default value should not be 0. Otherwise, we have a problem if a user writes a stateful DataStream program using RocksDB, because he cannot set the weight value. And even if he could, he would need to remember to set it; otherwise his operator wouldn't get any managed memory.
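The arithmetic behind this argument can be sketched as follows. The sketch assumes that fractions are weight divided by total weight within a slot. It also assumes that a DataStream user cannot set any weight explicitly, so every operator carries the default. The operator names and methods are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: effect of the default weight when users cannot set weights.
public class DefaultWeightPitfall {

    // Weight of an operator: an explicitly set value if there is one, otherwise the default.
    static int weightOf(String op, Map<String, Integer> explicitWeights, int defaultWeight) {
        return explicitWeights.getOrDefault(op, defaultWeight);
    }

    // Fraction of the slot's managed memory for `target`, assuming weight / total weight.
    static double fraction(String target, List<String> opsInSlot,
                           Map<String, Integer> explicitWeights, int defaultWeight) {
        int total = 0;
        for (String op : opsInSlot) {
            total += weightOf(op, explicitWeights, defaultWeight);
        }
        return total == 0 ? 0.0 : (double) weightOf(target, explicitWeights, defaultWeight) / total;
    }

    public static void main(String[] args) {
        // A DataStream job: the user cannot set weights, so none are explicit.
        List<String> ops = List.of("source", "map", "keyedProcess(RocksDB)");
        Map<String, Integer> noExplicitWeights = Map.of();
        for (int defaultWeight : new int[] {0, 1}) {
            double f = fraction("keyedProcess(RocksDB)", ops, noExplicitWeights, defaultWeight);
            System.out.println("default weight " + defaultWeight + " -> RocksDB operator fraction " + f);
        }
    }
}
```

With a default of 0, the RocksDB operator's fraction is 0 regardless of how many operators share the slot. With a default of 1 and three operators, it receives one third.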
flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java
Thanks for this PR @zhuzhurk. LGTM. I'll be addressing my comment while merging this PR.
assertEquals(resources, iterationPair.f0.getMinResources());
assertEquals(ResourceSpec.ZERO, iterationPair.f1.getMinResources());
Here we are testing implementation details. Wouldn't it be better to test that the sum of the source and sink resources equals `resources`?
That's right. Testing the merged resources would be better since the way to split the resources should make no difference at the moment and we actually do not care about it.
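A test along the lines suggested here asserts only that the source and sink resources add up to the original resources. It therefore keeps passing however the split is done. The sketch below uses a hypothetical two-field stand-in for ResourceSpec (CPU cores and managed memory only). It also uses a hypothetical splitForIteration that assigns everything to the first element of the pair, which we read as the iteration source, and nothing to the second, mirroring the assertions quoted above. It is not the actual Flink test.

```java
import java.util.Objects;

// Hypothetical sketch: an implementation-agnostic check on how resources are split
// between an iteration source and sink.
public class IterationResourceSplitCheck {

    // Hypothetical minimal stand-in for ResourceSpec (not the Flink class).
    static final class Res {
        final double cpuCores;
        final int managedMemoryMb;

        Res(double cpuCores, int managedMemoryMb) {
            this.cpuCores = cpuCores;
            this.managedMemoryMb = managedMemoryMb;
        }

        // Component-wise sum of two resource declarations.
        Res merge(Res other) {
            return new Res(cpuCores + other.cpuCores, managedMemoryMb + other.managedMemoryMb);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Res)) {
                return false;
            }
            Res r = (Res) o;
            return Double.compare(cpuCores, r.cpuCores) == 0 && managedMemoryMb == r.managedMemoryMb;
        }

        @Override
        public int hashCode() {
            return Objects.hash(cpuCores, managedMemoryMb);
        }
    }

    static final Res ZERO = new Res(0.0, 0);

    // Hypothetical split: everything goes to the source, nothing to the sink.
    static Res[] splitForIteration(Res resources) {
        return new Res[] {resources, ZERO};
    }

    // The check ignores how the split is done; it only verifies nothing is lost or added.
    static boolean splitPreservesTotal(Res resources) {
        Res[] sourceAndSink = splitForIteration(resources);
        return sourceAndSink[0].merge(sourceAndSink[1]).equals(resources);
    }

    public static void main(String[] args) {
        System.out.println(splitPreservesTotal(new Res(1.5, 128)));
    }
}
```

In a real test, the two halves would come from the graph under test rather than from a local helper.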
Currently, resources are only validated in the DataStream API. However, the table planner may set resources directly on a Transformation via Transformation#setResources, which is a public interface. We must therefore validate the resource parameters in Transformation#setResources.
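The following is a minimal sketch of the validation this comment asks for. It assumes that min and preferred resources must be non-null and non-negative, and that min must not exceed preferred component by component. The Res and SketchTransformation types are hypothetical stand-ins, not Flink's ResourceSpec or Transformation, and the real checks may differ.

```java
import java.util.Objects;

// Hypothetical sketch: validating resources inside the public setter itself.
public class SetResourcesValidation {

    // Hypothetical stand-in for ResourceSpec with two components.
    static final class Res {
        final double cpuCores;
        final int managedMemoryMb;

        Res(double cpuCores, int managedMemoryMb) {
            this.cpuCores = cpuCores;
            this.managedMemoryMb = managedMemoryMb;
        }

        boolean isNonNegative() {
            return cpuCores >= 0 && managedMemoryMb >= 0;
        }

        // True if every component of this is <= the same component of other.
        boolean lessThanOrEqual(Res other) {
            return cpuCores <= other.cpuCores && managedMemoryMb <= other.managedMemoryMb;
        }
    }

    // Hypothetical stand-in for a Transformation whose setter is a public entry point.
    static final class SketchTransformation {
        private Res minResources;
        private Res preferredResources;

        // Validating here means callers that bypass the DataStream API are checked too.
        void setResources(Res minResources, Res preferredResources) {
            Objects.requireNonNull(minResources, "minResources must not be null");
            Objects.requireNonNull(preferredResources, "preferredResources must not be null");
            if (!minResources.isNonNegative() || !preferredResources.isNonNegative()) {
                throw new IllegalArgumentException("resource values must be non-negative");
            }
            if (!minResources.lessThanOrEqual(preferredResources)) {
                throw new IllegalArgumentException("minResources must not exceed preferredResources");
            }
            this.minResources = minResources;
            this.preferredResources = preferredResources;
        }

        Res getMinResources() {
            return minResources;
        }

        Res getPreferredResources() {
            return preferredResources;
        }
    }

    public static void main(String[] args) {
        SketchTransformation t = new SketchTransformation();
        t.setResources(new Res(1.0, 64), new Res(2.0, 128)); // accepted
        try {
            t.setResources(new Res(2.0, 64), new Res(1.0, 128)); // min CPU exceeds preferred CPU
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```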
…urces when the head node has specified resources
…rresponding StreamNode
…arding managed memory weights. This only applies to vertices with UNKNOWN resources. This closes apache#10427.
Branch updated from 80562d5 to 976039e.
What is the purpose of the change
To calculate the managed memory fraction of an operator with UNKNOWN resources, we need to know whether the operator uses managed memory at all. This lets memory be utilized more effectively for better performance, according to FLINK-14062.
To achieve this, we need an interface to set/get whether an operator uses managed memory.
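Here is a minimal sketch of such a get/set interface. It assumes, as the review discussion suggests, that "uses managed memory" is expressed through a non-negative weight, where 0 means the operator does not need managed memory. SketchTransformation, usesManagedMemory, and the default value of 1 are hypothetical.

```java
// Hypothetical sketch of getting/setting whether an operator uses managed memory.
public class ManagedMemoryUsageFlag {

    // Assumed default; the actual DEFAULT_MANAGED_MEMORY_WEIGHT value may differ.
    static final int DEFAULT_MANAGED_MEMORY_WEIGHT = 1;

    // Hypothetical stand-in for an operator's transformation.
    static final class SketchTransformation {
        private int managedMemoryWeight = DEFAULT_MANAGED_MEMORY_WEIGHT;

        void setManagedMemoryWeight(int weight) {
            if (weight < 0) {
                throw new IllegalArgumentException("managed memory weight must be >= 0, got " + weight);
            }
            this.managedMemoryWeight = weight;
        }

        int getManagedMemoryWeight() {
            return managedMemoryWeight;
        }

        // In this sketch, an operator uses managed memory iff its weight is positive.
        boolean usesManagedMemory() {
            return managedMemoryWeight > 0;
        }
    }

    public static void main(String[] args) {
        SketchTransformation map = new SketchTransformation();
        map.setManagedMemoryWeight(0); // e.g. a stateless map opts out
        SketchTransformation sort = new SketchTransformation(); // keeps the default
        System.out.println("map: " + map.usesManagedMemory() + ", sort: " + sort.usesManagedMemory());
    }
}
```

Keeping a single weight field, rather than a separate boolean, means "does not use managed memory" and "how much relative to others" cannot contradict each other.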
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation