Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: UDAFs with multiple/variadic args #9361

Merged
merged 68 commits into from Aug 2, 2022

Conversation

reneesoika
Copy link
Contributor

@reneesoika reneesoika commented Jul 28, 2022

Description

This is the main PR to add support for multiple UDAF arguments. One argument can be variadic, either through wrapping the last column argument type in VariadicArgs or making the UDAF factory method have a variadic argument.

There are other changes that will be made as separate, smaller PRs:

  • support for up to ten five column arguments. This PR only adds support for two and three column arguments (though adding the rest is trivial).
  • correlation
  • var args when there are initial arguments. Currently, this is disallowed because we do not support resolving functions with var args in the middle.
  • variadic TOPK
  • user documentation

This PR should not introduce breaking changes.

The most important changes are in ExpressionTypeManager, AggregateNode, UdafUtil, UdafAggregateFunctionFactory (resolving multi-param UDAFs), UdafTypes, UdafFactoryInvoker (handling multi-param UDAF signatures), and KudafAggregator (applying multiple columns). Most of the other changes are tweaks to use lists for column indices or arrays for param schemas.

Testing done

Added unit tests to existing classes that were tweaked. Added QTTs to cover typical and edge cases with multi-param UDAFs.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@reneesoika reneesoika requested a review from jzaralim July 28, 2022 17:13
@reneesoika reneesoika requested a review from a team as a code owner July 28, 2022 17:13
Copy link
Contributor

@jzaralim jzaralim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass at reviewing. i havent reviewed the unit tests. this looks great so far!

this.right = right;
}

public T1 getLeft() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KLIP only mentions 10 because the javatuples library goes up to 10. If we're implementing our own classes, I don't think we really need to go up to 10. How were you planning to implement Quartet+?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created classes for each tuple type (Quadruple, Quintuple, ...), so I was going to add these as cases in BaseAggregateFunction#determineInputConverter(). Ten column arguments does seem a bit excessive, though. Maybe stop at five?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Five is reasonable. My question was more, were you planning on following the same getLeft/getRight pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Quadruple and later, I've been using getFirst(), getSecond(), getThird(), and so on since getLeft() and similar didn't fit. I used getLeft() and getRight() for Triple since Pair uses these terms.

@@ -73,8 +73,7 @@ public AggregateFunctionFactory(final UdfMetadata metadata) {
this.metadata = Objects.requireNonNull(metadata, "metadata can't be null");
}

public abstract KsqlAggregateFunction<?, ?, ?> createAggregateFunction(
List<SqlArgument> argTypeList, AggregateFunctionInitArguments initArgs);
public abstract FunctionSource getFunction(List<SqlType> argTypeList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it impossible to write aggregate functions with time unit initial args.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zara and I discussed this on Slack, and this is not a regression. In the old implementation and the current implementation, initial args must be Literals, but time units are not Literals (IntervalUnit extends Expression instead of Literal). This might be reported as a separate issue.

Copy link
Contributor

@jzaralim jzaralim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants