feat: Variadic TopK that can select other columns #9493
Merged: reneesoika merged 12 commits into confluentinc:master from reneesoika:feat_variadic_topk on Aug 26, 2022
Conversation
JimGalasyn reviewed (three times) and approved these changes on Aug 24, 2022
LGTM, with a couple of suggestions.
Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>
jzaralim reviewed on Aug 25, 2022
ksqldb-engine/src/main/java/io/confluent/ksql/function/udaf/topk/TopkKudaf.java
ksqldb-functional-tests/src/test/resources/query-validation-tests/topk-group-by.json
jzaralim approved these changes on Aug 26, 2022
Description
Follow-up to #9361. Extends the TopK UDAF to allow users to include other columns for each value that is selected.
If only one column is provided, an array of values is returned, as with the existing TopK. If more columns are provided, each selected entry is returned as a struct instead. The struct has a field named sort_col that contains the value of the column used to sort. The other columns are included as col0, col1, col2, etc., in the order the user provided them. Note that UDAFs have access only to column values, not column names, so we have to use this generic naming.
The aggregate function documentation has been updated to reflect this new variant of TopK.
Related: #403, #5300, #5747. Users have also requested an extended TopKDistinct in these issues. Extending TopKDistinct should be similar to the changes made in this PR.
Testing done
Added unit tests and QTTs.
I added one QTT for the pre-existing TopK to check that it does not work with the DELIMITED format. DELIMITED doesn't support the ARRAY return type, so the existing TopK never worked with it, and this isn't a breaking change.
Reviewer checklist