Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is DataType.BYTES control in aggregate() aggregateGroupBySV() aggregateGroupByMV() needed? #5356

Closed
damianoporta opened this issue May 9, 2020 · 5 comments

Comments

@damianoporta
Copy link

Hello, in different aggregators like the following:

I see a "pseudo duplication" of the code because of the column DataType. But, is this really needed?
I think that serialization/deserialization is done in aggregationResultHolder object only, no?

@kishoreg
Copy link
Member

@Jackie-Jiang @mayankshriv what do you think about this? Its not clear from the code why we are handling BYTES in AVG and MinMax

@Jackie-Jiang
Copy link
Contributor

We allow serialized pre-aggregated BYTES for these aggregation functions so that users can pre-aggregate the records, or enable star-tree on these functions:

  • AVG
  • MINMAXRANGE
  • DISTINCTCOUNTHLL
  • PERCENTILEEST
  • PERCENTILETDIGEST

@kishoreg
Copy link
Member

so, this is not a must for other functions right?

@Jackie-Jiang
Copy link
Contributor

It is required if these conditions are met:

  • The intermediate result type is Object instead of Number or String
  • User might want to pre-aggregate for the aggregation function

E.g. for DistinctCountAggregationFunction, even though the intermediate result type is IntOpenHashSet, we don't want user to pre-aggregate because the size of the pre-aggregated result is unbounded. In such case, we don't support BYTES type.

@kishoreg
Copy link
Member

Not an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants