Is DataType.BYTES control in aggregate() aggregateGroupBySV() aggregateGroupByMV() needed? #5356

damianoporta · 2020-05-09T08:02:56Z

Hello, in different aggregators like the following:

I see a "pseudo duplication" of the code that branches on the column DataType. Is this really needed?
I thought serialization/deserialization was done only in the aggregationResultHolder object, no?

kishoreg · 2020-05-11T20:00:08Z

@Jackie-Jiang @mayankshriv what do you think about this? It's not clear from the code why we are handling BYTES in AVG and MinMax.

Jackie-Jiang · 2020-05-11T20:04:09Z

We allow serialized pre-aggregated BYTES for these aggregation functions so that users can pre-aggregate the records, or enable star-tree on these functions:

AVG
MINMAXRANGE
DISTINCTCOUNTHLL
PERCENTILEEST
PERCENTILETDIGEST
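
To illustrate why the BYTES branch exists, here is a minimal sketch for the AVG case. It assumes the intermediate result is a (sum, count) pair with a fixed 16-byte encoding. The names `AvgPreAggSketch`, `AvgPair`, `aggregateRaw` and `aggregateBytes` are hypothetical and are not Pinot's actual classes or methods; the point is only that a numeric column contributes (value, 1) per row, while a BYTES column contributes an already-aggregated pair that has to be deserialized first.

```java
import java.nio.ByteBuffer;

public class AvgPreAggSketch {
    /** Hypothetical intermediate result for AVG: a running sum and count. */
    static final class AvgPair {
        double sum;
        long count;

        AvgPair(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Merge another partial result into this one.
        void apply(double otherSum, long otherCount) {
            sum += otherSum;
            count += otherCount;
        }

        // Assumed fixed-size encoding: 8 bytes for the sum, 8 bytes for the count.
        byte[] toBytes() {
            return ByteBuffer.allocate(Double.BYTES + Long.BYTES)
                    .putDouble(sum)
                    .putLong(count)
                    .array();
        }

        static AvgPair fromBytes(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            return new AvgPair(buf.getDouble(), buf.getLong());
        }

        double average() {
            return sum / count;
        }
    }

    // Raw numeric column: each row contributes (value, 1).
    static void aggregateRaw(double[] values, AvgPair holder) {
        for (double v : values) {
            holder.apply(v, 1);
        }
    }

    // BYTES column: each row is a serialized, pre-aggregated (sum, count) pair.
    static void aggregateBytes(byte[][] values, AvgPair holder) {
        for (byte[] b : values) {
            AvgPair p = AvgPair.fromBytes(b);
            holder.apply(p.sum, p.count);
        }
    }

    public static void main(String[] args) {
        // Four raw rows: 1, 2, 3, 6.
        AvgPair raw = new AvgPair(0, 0);
        aggregateRaw(new double[] {1, 2, 3, 6}, raw);

        // The same data pre-aggregated into two rows: (1+2, 2) and (3+6, 2).
        byte[][] pre = {new AvgPair(3, 2).toBytes(), new AvgPair(9, 2).toBytes()};
        AvgPair fromPre = new AvgPair(0, 0);
        aggregateBytes(pre, fromPre);

        System.out.println(raw.average() + " " + fromPre.average()); // prints "3.0 3.0"
    }
}
```

Both paths end with the same (sum, count) state, which is why the two branches look like near-duplicates: they differ only in how each row is turned into a partial result.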

kishoreg · 2020-05-11T20:12:43Z

So, this is not a must for other functions, right?

Jackie-Jiang · 2020-05-11T21:18:23Z

It is required if both of these conditions are met:

The intermediate result type is Object instead of Number or String
User might want to pre-aggregate for the aggregation function

E.g. for DistinctCountAggregationFunction, even though the intermediate result type is IntOpenHashSet, we don't want users to pre-aggregate because the size of the pre-aggregated result is unbounded. In such cases, we don't support the BYTES type.
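
The size argument can be sketched as follows. This is only an illustration under assumed encodings: a plain `java.util.HashSet<Integer>` stands in for IntOpenHashSet, and `PreAggSizeSketch`, `serializeDistinctSet` and `serializeAvg` are hypothetical helpers, not Pinot's serializers. A serialized distinct-value set grows with the number of distinct values, while an AVG pair stays the same size no matter how many rows it summarizes.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

public class PreAggSizeSketch {
    // Assumed encoding of a distinct-value set: 4-byte size header + 4 bytes per value.
    static byte[] serializeDistinctSet(Set<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (1 + values.size()));
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Assumed encoding of an AVG intermediate result: always 16 bytes.
    static byte[] serializeAvg(double sum, long count) {
        return ByteBuffer.allocate(Double.BYTES + Long.BYTES)
                .putDouble(sum)
                .putLong(count)
                .array();
    }

    public static void main(String[] args) {
        Set<Integer> distinct = new HashSet<>();
        double sum = 0;
        long count = 0;

        // Feed 100,000 distinct values into both intermediate results.
        for (int i = 0; i < 100_000; i++) {
            distinct.add(i);
            sum += i;
            count++;
        }

        // The set's encoding scales with cardinality; the AVG pair does not.
        System.out.println("distinct set bytes: " + serializeDistinctSet(distinct).length); // 400004
        System.out.println("avg pair bytes: " + serializeAvg(sum, count).length);           // 16
    }
}
```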

kishoreg · 2020-06-12T00:16:17Z

Not an issue

kishoreg closed this as completed Jun 12, 2020
