Support for GROUP BY ALL/ORDER BY ALL #2899

Mytherin · 2022-01-10T14:21:41Z

The GROUP BY clause is often superfluous, as it often simply contains all of the elements in the SELECT clause that do not contain aggregations. For example, we might write a query like this:

SELECT state, city, COUNT(*) as population
FROM people
GROUP BY state, city;

We see here that the two columns state and city are repeated in the GROUP BY. In reality, the system knows we need to group by at least state and city. If we alter the query and remove either state or city from the group by, we will receive an error:

SELECT state, city, COUNT(*) as population
FROM people
GROUP BY state;
-- Error: Binder Error: column "city" must appear in the GROUP BY clause or be used in an aggregate function

While we can specify additional columns to group by that are not contained in the selection list (e.g. we could specify GROUP BY country, state, city) this is rare in practice. After all, if we group by certain columns but do not add them to the SELECT list, we do not know what we are looking at in the query result!

As an example, in the set of 22 TPC-H queries, every single group by clause contains only columns duplicated from the SELECT clause (and can thus be replaced by GROUP BY ALL).

As a result, in almost all scenarios, the GROUP BY simply contains duplicated columns from the SELECT list. This is not particularly useful, since the system can easily figure out by itself by which columns to group (in fact, it needs to know that to be able to emit such an error message!).

GROUP BY ALL

For that reason, we add support for the GROUP BY ALL, which causes a grouping to happen on all ungrouped columns in the SELECT list. For example:

SELECT state, city, COUNT(*) as population
FROM people
GROUP BY ALL;
-- group by state, city

Now we no longer need to duplicate the columns in the GROUP BY list, and it becomes much easier to add or remove groups, since we only need to add or remove groups in one location! For example, if we want to group just by state, we only need to change the query as follows:

SELECT state, COUNT(*) as population
FROM people
GROUP BY ALL;

ORDER BY ALL

In many cases you will want a result to be ordered for easy viewing/browsing. You might add a clause such as ORDER BY 1 ,2, 3, ... to the query. However, this requires you to keep track of the number of columns and also change the number of orders when you alter the query and change the number of columns. For that reason, we also add an ORDER BY ALL clause, which automatically orders by the columns from left to right.

SELECT state, COUNT(*) as population
FROM people
GROUP BY ALL
ORDER BY ALL;
-- GROUP BY state
-- ORDER BY state, population

ORDER BY ALL can also contain the usual order specifiers (ASC/DESC and NULLS FIRST/NULLS LAST), these will be applied to all columns:

SELECT state, COUNT(*) as population
FROM people
GROUP BY ALL
ORDER BY ALL DESC NULLS LAST;
-- GROUP BY state
-- ORDER BY state DESC NULLS LAST, population DESC NULLS LAST

hawkfish · 2022-01-10T18:21:40Z

test/sql/aggregate/group/tpch_group_by_all.test_slow

+WHERE
+    l_shipdate <= CAST('1998-09-02' AS date)
+GROUP BY ALL
+ORDER BY


Maybe ORDER BY * here to stress that code path?

Sure, sounds good!

hawkfish · 2022-01-10T18:23:26Z

test/sql/aggregate/group/tpch_group_by_all.test_slow

+    c_address,
+    c_comment
+ORDER BY
+    revenue DESC


This would be a fine stress test for ORDER BY * DESC.

Sadly this won't work since revenue is not the first column in the select list.

hawkfish · 2022-01-10T18:24:15Z

test/sql/aggregate/group/tpch_group_by_all.test_slow

+GROUP BY
+    c_count
+ORDER BY
+    custdist DESC,


Multiple ORDER BY * DESC if you swap the SELECT clause order?

Swapping the SELECT clause will change the result, which means we can't read the result from the given CSV anymore.

hawkfish · 2022-01-10T18:25:01Z

test/sql/order/order_by_all.test

+1	3
+
+query II
+SELECT * FROM integers UNION SELECT * FROM integers ORDER BY ALL


What is the implied order here?

Same as if there was no set operation, the first and then the second column. Set operations do not change the number of columns.

hawkfish · 2022-01-10T18:26:49Z

test/sql/storage/store_group_order_all.test

+
+query IIIII nosort v1
+SELECT * FROM v1
+----


I'm missing something here - why are these results empty?

The result checker here uses the labels (v1 and v2) to verify that the result does not change after restarting. The result is empty because when labels are present the result is ignored and instead results with the same label are compared.

Oh duh I knew that. Sorry!

Alex-Monahan · 2022-01-11T00:45:55Z

First, I'm still super excited about this feature!

I had one small question. How would this interact with Grouping sets / cube / rollup? Are they mutually exclusive? Or would it be possible to do a syntax like:

Group By
Cube ALL

Thanks!

Mytherin · 2022-01-11T01:19:35Z

For now they are mutually exclusive, but in theory this could work like that as well. This would require a bit more work, however. If there is enough demand to extend the syntax to allow CUBE ALL/ROLLUP ALL I would be happy to have a go at that in a future PR.

Alex-Monahan · 2022-01-11T01:38:20Z

I think that your current solution solves 99% of cases! I'm just curious for my own use (but I'm an outlier and only use those features very occasionally!).

Mytherin added 5 commits January 8, 2022 13:16

GROUP BY ALL support (initial)

04cc6a6

Add support for ORDER BY ALL

622b117

More tests for GROUP BY/ORDER BY ALL

2c2b44c

Increment storage version

09d36f4

Format fix

cca28f5

hawkfish approved these changes Jan 10, 2022

View reviewed changes

Use ORDER BY ALL in this test

fedf835

hawkfish approved these changes Jan 11, 2022

View reviewed changes

Mytherin merged commit cd8ea60 into duckdb:master Jan 11, 2022

Mytherin deleted the groupbyall branch January 17, 2022 21:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for GROUP BY ALL/ORDER BY ALL #2899

Support for GROUP BY ALL/ORDER BY ALL #2899

Mytherin commented Jan 10, 2022 •

edited

hawkfish Jan 10, 2022

Mytherin Jan 11, 2022

hawkfish Jan 10, 2022

Mytherin Jan 11, 2022

hawkfish Jan 10, 2022

Mytherin Jan 11, 2022

hawkfish Jan 10, 2022

Mytherin Jan 11, 2022

hawkfish Jan 10, 2022

Mytherin Jan 11, 2022

hawkfish Jan 11, 2022

Alex-Monahan commented Jan 11, 2022

Mytherin commented Jan 11, 2022

Alex-Monahan commented Jan 11, 2022

Support for GROUP BY ALL/ORDER BY ALL #2899

Support for GROUP BY ALL/ORDER BY ALL #2899

Conversation

Mytherin commented Jan 10, 2022 • edited

GROUP BY ALL

ORDER BY ALL

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Alex-Monahan commented Jan 11, 2022

Mytherin commented Jan 11, 2022

Alex-Monahan commented Jan 11, 2022

Mytherin commented Jan 10, 2022 •

edited