Ideas for performance optimizations of aggregation in order

After #35111, the aggregation-in-order optimization can be applied to queries whose aggregation keys are a superset of the storage `ORDER BY` key.

If data is sorted by `(a, b)` and we want to perform `GROUP BY a, b, c`, the current implementation does the following:

1. Split each block into segments with the same `(a, b)` values.
2. Perform aggregation over each segment using the key set `(a, b, c)`.
3. Sort each block by `(a, b, c)`.
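
To make the three steps concrete, here is a minimal Python sketch. It is only a model of the behavior described above, not the actual implementation: the function name `aggregate_in_order_current` is hypothetical, rows are plain `(a, b, c, value)` tuples, the whole input is treated as a single block, and `sum` stands in for an arbitrary aggregate function.

```python
from itertools import groupby


def aggregate_in_order_current(rows):
    """Model of the current behavior for GROUP BY a, b, c.

    `rows` is a list of (a, b, c, value) tuples already sorted by (a, b).
    The aggregate is assumed to be sum(value) for illustration.
    """
    result = []
    # Step 1: split the block into segments with equal (a, b).
    for _, segment in groupby(rows, key=lambda r: (r[0], r[1])):
        # Step 2: aggregate the segment using the full key (a, b, c).
        table = {}
        for a, b, c, value in segment:
            key = (a, b, c)
            table[key] = table.get(key, 0) + value
        result.extend((a, b, c, total) for (a, b, c), total in table.items())
    # Step 3: sort the resulting block by the full key (a, b, c).
    result.sort(key=lambda r: (r[0], r[1], r[2]))
    return result
```

Note that both the hash table key in step 2 and the sort key in step 3 include `a` and `b`, even though they are constant inside each segment.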

I think there are opportunities for optimization:
* In step 2 we know that the values of `(a, b)` are equal within a segment, so we only need to aggregate over `(c)`. The hash table key can be reduced to `(c)`, which may lower hash table lookup latency.
* In step 3 the blocks are already sorted by `(a, b)`, so there is no need to sort by those columns again. It should be possible to pass the equality ranges into the sort and order rows only by `(c)` within each range.
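
Both ideas can be sketched together in Python. This is an illustration under assumptions, not a proposal for the real code: `aggregate_in_order_reduced_key` is a hypothetical name, rows are `(a, b, c, value)` tuples sorted by `(a, b)`, and `sum` stands in for the aggregate function.

```python
from itertools import groupby


def aggregate_in_order_reduced_key(rows):
    """Model of the proposed behavior for GROUP BY a, b, c.

    `rows` is a list of (a, b, c, value) tuples already sorted by (a, b).
    The aggregate is assumed to be sum(value) for illustration.
    """
    result = []
    # Each group is one equality range of (a, b).
    for (a, b), segment in groupby(rows, key=lambda r: (r[0], r[1])):
        # Idea 1: (a, b) is constant here, so the hash table is keyed by c only.
        table = {}
        for _, _, c, value in segment:
            table[c] = table.get(c, 0) + value
        # Idea 2: ranges already arrive in (a, b) order, so sorting by c
        # inside the range is enough to get (a, b, c) order overall.
        for c in sorted(table):
            result.append((a, b, c, table[c]))
    return result
```

The output is the same as sorting by the full key, because segments are emitted in their existing `(a, b)` order and only the rows inside each segment are reordered.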

cc @azat 

Ideas for performance optimizations of aggregation in order #42696
