After #35111 we can apply the aggregation-in-order optimization to queries where the aggregation keys are a superset of the storage ORDER BY.
If data is sorted by `(a, b)` and we want to perform `GROUP BY a, b, c`, the current implementation does the following:
1. Split each block into segments with the same `(a, b)` values.
2. Perform aggregation over each segment using the key set `(a, b, c)`.
3. Sort each block by `(a, b, c)`.
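
For illustration, the three steps can be modelled roughly as follows. This is a minimal Python sketch, not the actual ClickHouse code: the function name `aggregate_in_order_current`, the row layout `(a, b, c, value)`, and the choice of `SUM(value)` as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby


def aggregate_in_order_current(block):
    """Model of the current behaviour (hypothetical helper, not ClickHouse code).

    `block` is a list of (a, b, c, value) rows already sorted by (a, b).
    The aggregate is assumed to be SUM(value).
    """
    result = []
    # Step 1: split the block into segments with equal (a, b).
    for _, segment in groupby(block, key=lambda row: (row[0], row[1])):
        # Step 2: aggregate the segment using the full key (a, b, c).
        sums = defaultdict(int)
        for a, b, c, value in segment:
            sums[(a, b, c)] += value
        result.extend((a, b, c, total) for (a, b, c), total in sums.items())
    # Step 3: sort the whole block by (a, b, c).
    result.sort(key=lambda row: (row[0], row[1], row[2]))
    return result


block = [
    (1, 1, "y", 10), (1, 1, "x", 5), (1, 1, "y", 1),
    (1, 2, "x", 7),
    (2, 1, "z", 3), (2, 1, "x", 4),
]
print(aggregate_in_order_current(block))
```

Note that the hash table key carries `a` and `b` even though they are constant within a segment, and the final sort compares all three columns.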
I think there are opportunities for optimization:
- In step 2 we know that the values of `(a, b)` are equal, so we need to aggregate data only over `(c)`. This means we can reduce the hash table key to `(c)`, which may reduce hash table lookup latency.
- In step 3 the blocks are already sorted by `(a, b)`, so we don't need to sort them by the values of these columns. It's possible to pass information about equality ranges into the sort and sort only by `(c)`.
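
A rough sketch of what both ideas could look like together, again in Python rather than ClickHouse's C++. The function name `aggregate_in_order_reduced_key`, the row layout `(a, b, c, value)`, and the `SUM(value)` aggregate are assumptions for the example; each `(a, b)` segment is treated as one equality range.

```python
from collections import defaultdict
from itertools import groupby


def aggregate_in_order_reduced_key(block):
    """Sketch of the proposed optimization (hypothetical helper).

    `block` is a list of (a, b, c, value) rows already sorted by (a, b).
    The aggregate is assumed to be SUM(value).
    """
    result = []
    # Each group is an equality range: (a, b) is constant inside it.
    for (a, b), segment in groupby(block, key=lambda row: (row[0], row[1])):
        # The hash table is keyed by c only, since (a, b) cannot differ here.
        sums = defaultdict(int)
        for _, _, c, value in segment:
            sums[c] += value
        # Sort only by c within the range; ranges already arrive in (a, b) order.
        for c in sorted(sums):
            result.append((a, b, c, sums[c]))
    return result


rows = [
    (1, 1, "y", 10), (1, 1, "x", 5), (1, 1, "y", 1),
    (1, 2, "x", 7),
    (2, 1, "z", 3), (2, 1, "x", 4),
]
print(aggregate_in_order_reduced_key(rows))
```

Because the input is sorted by `(a, b)`, emitting the ranges in arrival order and sorting each one by `c` yields output that is sorted by `(a, b, c)` without ever comparing `a` or `b`.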
cc @azat