-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for GROUP BY ALL/ORDER BY ALL #2899
Conversation
WHERE | ||
l_shipdate <= CAST('1998-09-02' AS date) | ||
GROUP BY ALL | ||
ORDER BY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ORDER BY *
here to stress that code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds good!
c_address, | ||
c_comment | ||
ORDER BY | ||
revenue DESC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a fine stress test for ORDER BY * DESC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly this won't work since revenue
is not the first column in the select list.
GROUP BY | ||
c_count | ||
ORDER BY | ||
custdist DESC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple ORDER BY * DESC
if you swap the SELECT
clause order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapping the SELECT
clause will change the result, which means we can't read the result from the given CSV anymore.
1 3 | ||
|
||
query II | ||
SELECT * FROM integers UNION SELECT * FROM integers ORDER BY ALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the implied order here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as if there was no set operation, the first and then the second column. Set operations do not change the number of columns.
|
||
query IIIII nosort v1 | ||
SELECT * FROM v1 | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing something here - why are these results empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result checker here uses the labels (v1
and v2
) to verify that the result does not change after restarting. The result is empty because when labels are present the result is ignored and instead results with the same label are compared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh I knew that. Sorry!
First, I'm still super excited about this feature! I had one small question. How would this interact with Grouping sets / cube / rollup? Are they mutually exclusive? Or would it be possible to do a syntax like: Group By
Cube ALL Thanks! |
For now they are mutually exclusive, but in theory this could work like that as well. This would require a bit more work, however. If there is enough demand to extend the syntax to allow CUBE ALL/ROLLUP ALL I would be happy to have a go at that in a future PR. |
I think that your current solution solves 99% of cases! I'm just curious for my own use (but I'm an outlier and only use those features very occasionally!). |
The
GROUP BY
clause is often superfluous, as it often simply contains all of the elements in theSELECT
clause that do not contain aggregations. For example, we might write a query like this:We see here that the two columns
state
andcity
are repeated in theGROUP BY
. In reality, the system knows we need to group by at least state and city. If we alter the query and remove eitherstate
orcity
from the group by, we will receive an error:While we can specify additional columns to group by that are not contained in the selection list (e.g. we could specify
GROUP BY country, state, city
) this is rare in practice. After all, if we group by certain columns but do not add them to theSELECT
list, we do not know what we are looking at in the query result!As an example, in the set of 22 TPC-H queries, every single group by clause contains only columns duplicated from the
SELECT
clause (and can thus be replaced byGROUP BY ALL
).As a result, in almost all scenarios, the
GROUP BY
simply contains duplicated columns from theSELECT
list. This is not particularly useful, since the system can easily figure out by itself by which columns to group (in fact, it needs to know that to be able to emit such an error message!).GROUP BY ALL
For that reason, we add support for the
GROUP BY ALL
, which causes a grouping to happen on all ungrouped columns in theSELECT
list. For example:Now we no longer need to duplicate the columns in the
GROUP BY
list, and it becomes much easier to add or remove groups, since we only need to add or remove groups in one location! For example, if we want to group just by state, we only need to change the query as follows:ORDER BY ALL
In many cases you will want a result to be ordered for easy viewing/browsing. You might add a clause such as
ORDER BY 1 ,2, 3, ...
to the query. However, this requires you to keep track of the number of columns and also change the number of orders when you alter the query and change the number of columns. For that reason, we also add anORDER BY ALL
clause, which automatically orders by the columns from left to right.ORDER BY ALL can also contain the usual order specifiers (
ASC/DESC
andNULLS FIRST/NULLS LAST
), these will be applied to all columns: