-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-13473][table-blink] Add stream Windowed FlatAggregate support for blink planner #9396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 0df9723 (Fri Aug 23 10:16:20 UTC 2019) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
|
@flinkbot attention @JingsongLi @wuchong @hequn8128 @aljoscha I appreciate if you can have look at this PR :) |
|
I would suggest to modify the component name in commit message a bit. How about: [FLINK-13473][table-planner-blink] Support windowed TableAggregate in some MetadataHandle And for the pull request title, I would suggest to use "[table-blink]" because it doesn't contain API changes. What do you think? |
|
Thanks for the review @wuchong ! |
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sunjincheng121 , just left some minor comments.
| if (isWindow) { | ||
| // no need to bind input | ||
| val exprGenerator = new ExprCodeGenerator(ctx, INPUT_NOT_NULL) | ||
| var valueExprs = Seq[GeneratedExpression]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use val valueExprs = getWindowExpressions(windowProperties)? Because windowProperties is empty when hasNamespace is false.
| val COLLECTOR_TERM = "out" | ||
| val MEMBER_COLLECTOR_TERM = "convertCollector" | ||
| val CONVERT_COLLECTOR_TYPE_TERM = "ConvertCollector" | ||
| val KEY_TERM = "groupKey" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can name it NAMESPACE_TABLE_AGG_KEY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, and how about NAMESPACE_TABLE_AGG_KEY_TERM. It looks more consistent with other variables which end with _TERM.
|
|
||
| val builder = getWindowOperatorBuilder(inputFields, timeIdx) | ||
| builder | ||
| .aggregate(aggsHandler, accTypes, aggValueTypes, windowPropertyTypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should window table agg support withAllowedLateness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ou are right. We can support withAllowedLateness for table aggregate. However, there are two reasons that it is not supported in this PR.
- This PR mainly dedicate to port table aggregate from flink to blink. It maybe good to keep both align with each other first.
- withAllowedLateness is not a simple change for table aggregate. We also need to think about how to support emitStrategy in Table API.
So, in this PR we just align with flink planner table aggregate, What do you think?
| /** | ||
| * A {@link WindowOperator} that dedicates for group aggregate. | ||
| * | ||
| * <p>When an element arrives it gets assigned a key using a {@link KeySelector} and it gets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeated comments? can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove this paragraph, the next paragraph("Each pane gets its own instance...") would be hard to understand.
Maybe we can keep this paragraph, or remove these two paragraphs in the base class? What do you think?
| /** | ||
| * A {@link WindowOperator} that dedicates for group table aggregate. | ||
| * | ||
| * <p>When an element arrives it gets assigned a key using a {@link KeySelector} and it gets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeated comments? can we remove it?
| public WindowOperator build() { | ||
| checkNotNull(trigger, "trigger is not set"); | ||
| if (generatedAggregateFunction != null && generatedEqualiser != null) { | ||
| if (generatedTableAggregateFunction != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of branches, can we abstract this builder to multiple builders?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. We can make the WindowOperatorBuilder as a base builer and add two more builders for aggregate and table aggregate.
What do you think?
9a5651f to
63ad06a
Compare
|
@JingsongLi Thanks for the review, I have update the PR, and I also have been shared my throughs on your comments. :) |
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sunjincheng121 , I left some comments.
|
|
||
| override def replaceInputNode( | ||
| ordinalInParent: Int, | ||
| newInputNode: ExecNode[StreamPlanner, _]): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent
| } | ||
|
|
||
| override protected def translateToPlanInternal( | ||
| planner: StreamPlanner): Transformation[BaseRow] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent
| val window: LogicalWindow, | ||
| namedProperties: Seq[PlannerNamedWindowProperty], | ||
| inputTimeFieldIndex: Int, | ||
| val emitStrategy: Option[WindowEmitStrategy], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think window TableAggregate can also have emit strategy. Because we also pass the sendRetraction and allowedLateness parameter into TableAggregateWindowOperator.
If we don't support it currently, we can throw exception if an emit strategy is defined when translateToPlan . So that we can keep the physical node more consistent for agg and tableAgg.
| * .build(); | ||
| * </pre> | ||
| */ | ||
| public final class AggregateWindowOperatorBuilder extends WindowOperatorBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid introducing AggregateXXX and TableAggregateXXX for all window operator relative classes. For the WindowOperatorBuilder, I think we can keep only one WindowOperatorBuilder entry point, and return an inner AggregateWindowOperatorBuilder/TableAggregateWindowOperatorBuilder class when corresponding aggregate(...) is called.
So that, users don't need to choose builder, but separate the build() logic. What do you think ?
| return new AggregateWindowOperatorBuilder(); | ||
| } | ||
|
|
||
| public AggregateWindowOperatorBuilder withAllowedLateness(Duration allowedLateness) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can push withAllowedLateness into the base builder, because table aggregate can also support it. We can throw an exception in build() if allowedLateness is specified for now.
| generator | ||
| } | ||
|
|
||
| protected def enrichWindowOperatorBuilder( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to finish the build logic in one place, rather than passing a half-work builder here and there.
Can we use a if-else to generate the window operator in the base class depends on whether it is a table aggregate? like what we do in DataStreamGroupWindowAggregateBase?
aljoscha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good as far as I can see. But I don't have much familiarity with the Blink code. I had two comments about the wording of javadocs.
| import static org.apache.flink.util.Preconditions.checkNotNull; | ||
|
|
||
| /** | ||
| * A {@link WindowOperator} that dedicates for group aggregate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * A {@link WindowOperator} that dedicates for group aggregate. | |
| * A {@link WindowOperator} for grouped window aggregates. |
| import org.apache.flink.util.Collector; | ||
|
|
||
| /** | ||
| * A {@link WindowOperator} that dedicates for group table aggregate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * A {@link WindowOperator} that dedicates for group table aggregate. | |
| * A {@link WindowOperator} for grouped and windowed table aggregates. |
…gat on blink planner
… some MetadataHandle
b6c1e2d to
0df9723
Compare
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @sunjincheng121 . Looks good from my side.
|
Thanks for the review @JingsongLi @wuchong @aljoscha |
|
LGTM |
…for blink planner This close apache#9396
What is the purpose of the change
This pull request ports window flatAggregate from Flink planner to Blink planner.
The changes mainly include two modules: blink-planner and blink-runtime. The code of the API part can be reused as the two planner share the same API.
Brief change log
This pr is further split into the following 4 commits:
Add plan support for table aggregate. 0e875a2
This commit adds RelNodes(Logical, FlinkLogical and StreamExec) and Rules in blink planner. The execution procedures are:
First, the WindowAggregate QueryOperation node in API level will be converted to a LogicalWindowTableAggregate relnode in
QueryOperationConverter.Second, the LogicalWindowTableAggregate will be converted to a FlinkLogicalWindowTableAggregate by the rule in logical optimization phase.
Third, the FlinkLogicalWindowTableAggregate will be converted to a StreamExecGroupWindowTableAggregate by the rule in physical optimization phase.
Add runtime support for table aggregate. a1120036
This commit adds codegen and window operator support for window table aggregete.
The AggCodeGen generates a NamespaceTableAggsHandleFunction which is used in the window operator.
For window operator, this commit refactors the previous window operator by adding a base window operator(
WindowOperator) and two sub window operators(AggregateWindowOperatorandTableAggregateWindowOperator).Additionally, to make the refactor possible, this commit also change the window processing function(
InternalWindowProcessFunction) by changing thegetWindowAggregationResultmethod toprepareAggregateAccumulatorForEmit.The
prepareAggregateAccumulatorForEmitmethod returns the accumulator, with which theAggregateWindowOperatorcan use getValue to return result and theTableAggregateWindowOperatorcan use emitValue to return results.Add window operator tests for window table aggregate. 686ab9df
Change the
WindowOperatorTestinto a parameterized test which can test both aggregate and table aggregate.To simplify the testing logic, the table aggregate outputs same value with the aggregate except that the table aggregate outputs two same records each time.
Support WindowTableAggregate in some MetadataHandler. 9a5651f1d
MetadataHandler is used to provide Metadata of a relational expression. For example, provide the unique key information of a relnode. The information can be used during optimization.
This commit adds WindowTableAggregate support in
FlinkRelMdColumnInterval,FlinkRelMdFilteredColumnIntervalandFlinkRelMdModifiedMonotonicity. The absence of WindowTableAggregate in other MetadataHandlers mainly because:BTW: I do not add the emitpolicy for window aggregate just we want align with Flink Planner.
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)Documentation