Skip to content

Conversation

@luoyuxia
Copy link
Contributor

@luoyuxia luoyuxia commented May 16, 2022

What is the purpose of the change

To make Flink support CUME_DIST function.

Brief change log

Verifying this change

Test in OverAggregateITCase.scala

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

@flinkbot
Copy link
Collaborator

flinkbot commented May 16, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@luoyuxia luoyuxia force-pushed the FLINK-27617 branch 5 times, most recently from 05d76dd to 44e9bc4 Compare June 22, 2022 02:13
@luoyuxia luoyuxia force-pushed the FLINK-27617 branch 3 times, most recently from 43e7fba to 6e3706f Compare June 29, 2022 07:05
@luoyuxia luoyuxia changed the title [FLINK-27617][sql] Flink supports CUME_DIST function [FLINK-27618][sql] Flink supports CUME_DIST function Jul 4, 2022
@luoyuxia
Copy link
Contributor Author

luoyuxia commented Jul 4, 2022

@flinkbot run azure

@luoyuxia
Copy link
Contributor Author

luoyuxia commented Jul 8, 2022

@flinkbot run azure

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @luoyuxia for the contribution. I have left some comments.

if (expression instanceof UnresolvedReferenceExpression) {
UnresolvedReferenceExpression expr = (UnresolvedReferenceExpression) expression;
String name = expr.getName();
if (function instanceof SizeBasedWindowFunction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

public interface SizeBasedWindowFunction {

/** The field for the window size. */
UnresolvedReferenceExpression windowSize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A window function should have only a context-provided window size expression. So we don't need to let users define the expression. The interface can pre-define one. In addition, the expression can be a resolved one, to make it easy to use when code generation (don't need toWindowSizeExpr). For example:

default ResolvedExpression windowSizeAttribute() {
        return localRef("window_size", DataTypes.INT());
}

Then devs dont' need to implement this method and can directly be used just like method operand(i).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really a good suggestion!


override def setWindowSize(generator: ExprCodeGenerator): String = {
throw new TableException(
"Distinct shouldn't set window size, this is a bug, please file a issue.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that COUNT(DISTINCT) OVER ... may hit this?

Copy link
Contributor Author

@luoyuxia luoyuxia Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently, the dictinct + over is not supported, see the code.
Also, in this pr, it'll skip setWindowSize when it's for DistinctAggCodeGen.

}

@Test
def testCumeDist(): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add additional tests to cover values of order-key containing duplicates.

@luoyuxia luoyuxia force-pushed the FLINK-27617 branch 5 times, most recently from e206652 to e94b0d2 Compare July 22, 2022 01:38
@luoyuxia
Copy link
Contributor Author

@wuchong Thanks for review. I have updated this pr.

initialAggregateInformation(aggInfoList)

// generates all methods body first to add necessary reuse code to context
val setWindowSizeCode = if (isWindowSizeNeeded) genSetWindowSize() else ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the way of genRetract() to generate throw exception when window size is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the current implementation, we will always call method setWindowSize, so we can't generate the code to throw exception . But I add some comments for this method genSetWindowSize.

Comment on lines 41 to 49
* Set window size for the aggregate function. Some aggregate functions may requires the size of
* current window to do calculation.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add more explanation about the "window size", it may confuse devs the relationship with tumble/hop window size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wuchong wuchong merged commit 8017a41 into apache:master Jul 23, 2022
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants