[TRAFODION-2246] group by rollup feature #733
Conversation
Syntax: select item1, item2... from ... group by rollup (v1,...,vn) Restrictions related to grouby rollup usage and query transformations listed below. Some may be lifted in later checkins: -- grouping column list must not have duplicate entries -- aggrs must be min/max/sum/avg/count/count(*) -- distinct aggregates are not supported -- grouping columns/values must be nullable. To group on a non-nullable item, use function CAST(v as nullable). This will convert v to nullable, if needed. -- HAVING predicate will not be pushed down to scan predicate. -- group by in a subquery will not be eliminated -- grouping comparison predicate does not use pcode -- group by rollup cannot be replaced by primary key -- groupby will be done using sortgroupby since ordering is needed to materialize rollup groups. -- rollup groupby will be done at root (master root or esp root). This is needed to detect rollup group changes across the whole result set. -- since groupby can materialize new null rows for rollup groups, a final sort operator is needed to be added after groupy even if sortgroupby has already guaranteed that it gets rows in order. Rollup computation logic is discussed in file executor/ex_sort_grby.cpp prior to ex_sort_grby_rollup_tcb::work method. See regress/seabase/TEST033 for tests of various 'group by rollup' functionality.
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1191/ |
+1 Didn't look at all the details, especially not in the executor, but the overall idea looks really good to me. I'm glad we are able to extend the sort groupby operator, our earlier idea of using a UDF would have been more messy. This also adds a few useful features we can use elsewhere, like an expression that can tell us where an error occurred. This could be useful when converting Hive records to Trafodion format, so we can handle errors better. |
|
||
if (isRollup()) | ||
cwa += " roll:"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to add the rollupGroupExprList() to the cache key. If we rebuild the list above from a ValueIdSet on line 418 above, it is probably going to be in the same order, regardless whether it was ROLLUP(a,b) or ROLLUP(b,a).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good point. Tried it and it does incorrectly caches them.
Will fix it.
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/1191/ |
cache key computation changes based on review comment. And another change that only shows up during release run (some old code was within #ifndef DEBUG define)
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1196/ |
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/1196/ |
See comments listed below as part of commit.