-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-5473] Limit MaxParallelism to 1 for non-parallel operators and improve choice of max parallelism without explicit configuration #3182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @uce |
f6081f3 to
fc2e0fb
Compare
tillrohrmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good changes @StefanRRichter.
I was wondering whether we could simplify the max parallelism logic in ExecutionJobVertex a little bit. For example, we could have only the field maxParallelism and boolean maxParallelismConfigured which we initialize in the constructor. If the passed parameter equals VALUE_NOT_SET, then we use KeyGroupRangeAssignment.computeDefaultMaxParallelism and set maxParallelismConfigured to false. If not, then we set it to true. Now we only allow changes to maxParallelism if maxParallelismConfigured == false. I think this could simplify the logic a little bit (especially in getMaxParallelism). What do you think?
Apart from that +1 for merging.
| this.tasks = tasks; | ||
| this.latest = latest; | ||
| this.taskStates = taskStates; | ||
| this.allowNonRestoredState = allowNonRestoredState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precondition checks could be helpful here.
| this.allowNonRestoredState = allowNonRestoredState; | ||
| } | ||
|
|
||
| public boolean assignStates() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems a bit lengthy. Maybe we could split it up.
| JobVertex jobVertex, | ||
| int defaultParallelism, | ||
| Time timeout) throws JobException, IOException { | ||
| Time timeout) throws JobException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method declaration parameters which are broken into multiple lines are usually indented twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, but I kept the indentation to avoid formatting changes.
|
|
||
| Preconditions.checkArgument(maxParallelism > 0 | ||
| && maxParallelism <= KeyGroupRangeAssignment.UPPER_BOUND_MAX_PARALLELISM, | ||
| "Overriding max parallelism is not in valid bounds: " + maxParallelism); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add the valid bounds here.
|
|
||
| List<List<ExecutionEdge>> consumers = partition.getConsumers(); | ||
|
|
||
| if(consumers.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace missing between if and (
| StreamEdge outEdge = outEdgesInOrder.get(i); | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One line break would probably be enough here.
| /** | ||
| * Returns the effective max parallelism. This value is determined in the following order of priority: | ||
| * <p> | ||
| * (maxParallelismConfigured) overrides (maxParallelismOverride) override (max(128, roundUp(parallelism)) / default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxParallelismOverride => maxParallelismDerived?
|
Thanks for the review, @tillrohrmann! I followed all of your suggestions, except for the indentation formatting. |
41fb61e to
c277a7d
Compare
c277a7d to
e9aa755
Compare
|
Rebased. |
|
Changes look good. Travis passed. Merging this PR. Thanks for your work @StefanRRichter :-) |
[FLINK-5473] Better default behaviours for unspecified maximum parallelism This closes apache#3182.
[FLINK-5473] Better default behaviours for unspecified maximum parallelism This closes apache#3182.
|
I've merged the PR to the |
[FLINK-5473] Better default behaviours for unspecified maximum parallelism This closes apache#3182.
[FLINK-5473] Better default behaviours for unspecified maximum parallelism This closes apache#3182.
This PR limits the maximum parallelism for non-parallel operator to 1.
Furthermore, this improves the default behaviour if the user did not explicitly specify a maximum parallelism. In particular, maximum parallelism can now be derived from savepoints, allowing users that migrate from Flink 1.1 to Flink 1.2 to keep their job unchanged.