Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding query function override for Aggregate functions of multi valued column #11307

Merged
merged 7 commits into from Sep 25, 2023

Conversation

eaugene
Copy link
Contributor

@eaugene eaugene commented Aug 9, 2023

Adding query function override for distinct functions of multi valued column .

DISTINCTCOUNT & Other distinct aggregate functions have a different behaviour when queried over a multivalued column with Dict enabled & disabled. This PR fixes that by overriding the function names to use multivalued Aggragation functions.

Fixes : #11292

@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2023

Codecov Report

Merging #11307 (765c828) into master (2a3a171) will increase coverage by 0.01%.
The diff coverage is 58.97%.

@@             Coverage Diff              @@
##             master   #11307      +/-   ##
============================================
+ Coverage     63.04%   63.06%   +0.01%     
+ Complexity     1121     1120       -1     
============================================
  Files          2343     2343              
  Lines        125705   125743      +38     
  Branches      19309    19318       +9     
============================================
+ Hits          79252    79294      +42     
+ Misses        40808    40802       -6     
- Partials       5645     5647       +2     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 63.00% <58.97%> (-0.02%) ⬇️
java-17 62.91% <58.97%> (+0.02%) ⬆️
java-20 62.90% <58.97%> (-0.01%) ⬇️
temurin 63.06% <58.97%> (+0.01%) ⬆️
unittests 63.05% <58.97%> (+0.01%) ⬆️
unittests1 67.23% <ø> (+0.02%) ⬆️
unittests2 14.44% <58.97%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...roker/requesthandler/BaseBrokerRequestHandler.java 46.47% <58.97%> (+0.48%) ⬆️

... and 11 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

Other than distinct count, other aggregation function also have the MV variance. Let's handle all of them here since the logic is the same.

@eaugene
Copy link
Contributor Author

eaugene commented Aug 12, 2023

Other than distinct count, other aggregation function also have the MV variance. Let's handle all of them here since the logic is the same.

I currently added for

  • DISTINCTCOUNT
  • DISTINCTCOUNTBITMAP
  • DISTINCTCOUNTHLL
  • DISTINCTCOUNTRAWHLL
  • DISTINCTSUM
  • DISTINCTAVG

Other possible Aggregations are :
COUNT
MIN
MAX
SUM
AVG
MINMAXRANGE

But was a bit concerned about overriding for COUNT as that would result in not being able to get total rows in MultiValued column .

@Jackie-Jiang
Copy link
Contributor

@eaugene If user explicitly use COUNT over a MV column, I would assume the intention is to do COUNTMV, or they should just use COUNT(*)

@eaugene eaugene changed the title Adding query function override for distinct functions of multi valued column Adding query function override for Aggregate functions of multi valued column Sep 21, 2023
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

@eaugene
Copy link
Contributor Author

eaugene commented Sep 24, 2023

Thanks @Jackie-Jiang for the review. Addressed the final set of comments & included override for the new distinctCountHLLplus aggregations introduced as a part of #11346

@Jackie-Jiang Jackie-Jiang merged commit 272a199 into apache:master Sep 25, 2023
21 checks passed
@eaugene eaugene deleted the distinct_multi_value_overide branch September 25, 2023 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants