Skip to content
This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-2246] group by rollup feature #733

Merged
merged 4 commits into from Sep 30, 2016

Conversation

anoopsharma00
Copy link
Contributor

See comments listed below as part of commit.

Syntax:
  select item1, item2... from ... group by rollup (v1,...,vn)

Restrictions related to grouby rollup usage and query transformations
listed below.  Some may be lifted in later checkins:

 -- grouping column list must not have duplicate entries
 -- aggrs must be min/max/sum/avg/count/count(*)
 -- distinct aggregates are not supported
 -- grouping columns/values must be nullable.
    To group on a non-nullable item, use function
     CAST(v as nullable).
    This will convert v to nullable, if needed.
 -- HAVING predicate will not be pushed down to scan predicate.
 -- group by in a subquery will not be eliminated
 -- grouping comparison predicate does not use pcode
 -- group by rollup cannot be replaced by primary key
 -- groupby will be done using sortgroupby since ordering is needed to
    materialize rollup groups.
 -- rollup groupby will be done at root (master root or esp root). This
   is needed to detect rollup group changes across the whole result set.
 -- since groupby can materialize new null rows for rollup groups,
    a final sort operator is needed to be added after groupy even
   if sortgroupby has already guaranteed that it gets rows in order.

Rollup computation logic is discussed in file executor/ex_sort_grby.cpp
prior to ex_sort_grby_rollup_tcb::work method.

See regress/seabase/TEST033 for tests of various 'group by rollup'
functionality.
@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1191/

@zellerh
Copy link
Contributor

zellerh commented Sep 29, 2016

+1

Didn't look at all the details, especially not in the executor, but the overall idea looks really good to me. I'm glad we are able to extend the sort groupby operator, our earlier idea of using a UDF would have been more messy. This also adds a few useful features we can use elsewhere, like an expression that can tell us where an error occurred. This could be useful when converting Hive records to Trafodion format, so we can handle errors better.


if (isRollup())
cwa += " roll:";

Copy link
Contributor

@zellerh zellerh Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to add the rollupGroupExprList() to the cache key. If we rebuild the list above from a ValueIdSet on line 418 above, it is probably going to be in the same order, regardless whether it was ROLLUP(a,b) or ROLLUP(b,a).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point. Tried it and it does incorrectly caches them.
Will fix it.

@Traf-Jenkins
Copy link

cache key computation changes based on review comment.
And another change that only shows up during release run
(some old code was within #ifndef DEBUG define)
@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1196/

@Traf-Jenkins
Copy link

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants