Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Variadic TopK that can select other columns #9493

Merged
merged 12 commits into from Aug 26, 2022

Conversation

reneesoika
Copy link
Contributor

@reneesoika reneesoika commented Aug 24, 2022

Description

Follow-up to #9361. Extends the TopK UDAF to allow users to include other columns for each value that is selected.

If only one column is provided, then an array of values is returned like the existing TopK. If more columns are provided, then a struct is returned. The struct has a field named sort_col that contains the column that was used to sort the values. Other columns are included as col0, col1, col2, etc. in the order that the user provided. Note that UDAFs do not have access to names of columns, only the values, so we have to use this generic naming.

The aggregate function documentation has been updated to reflect this new variant of TopK.

Related: #403, #5300, #5747. Users have also requested an extended TopKDistinct in these issues. Extending TopKDistinct should be similar to the changes made in this PR.

Testing done

Added unit tests and QTTs.

I added one QTT for the pre-existing TopK to check that it does not work with delimited. Delimited doesn't support the array return type, so this isn't a breaking change.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@reneesoika reneesoika requested review from a team and JimGalasyn as code owners August 24, 2022 22:32
Copy link
Member

@JimGalasyn JimGalasyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a couple of suggestions.

Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>
@reneesoika reneesoika merged commit 2a4c696 into confluentinc:master Aug 26, 2022
@reneesoika reneesoika deleted the feat_variadic_topk branch August 26, 2022 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants