Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add support for GROUPING SETS, CUBE, and ROLLUP #46280

Open
yuzefovich opened this issue Mar 19, 2020 · 10 comments
Open

sql: add support for GROUPING SETS, CUBE, and ROLLUP #46280

yuzefovich opened this issue Mar 19, 2020 · 10 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-quick-win Likely to be a quick win for someone experienced. E-starter Might be suitable for a starter project for new employees or team members. T-sql-queries SQL Queries Team
Projects

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Mar 19, 2020

According to Postgres docs:

More complex grouping operations are possible using the concept of *grouping
sets*. The data selected by the `FROM` and `WHERE` clauses is grouped separately
by each specified grouping set, aggregates computed for each group just as for
simple `GROUP BY` clauses, and then the results returned.

We (maybe) should add support for GROUPING SETS, CUBE, and ROLLUP.

I have no idea how hard that would be, but 7 out 99 queries of TPC-DS benchmark use rollup.

Jira issue: CRDB-5096

@yuzefovich yuzefovich added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 19, 2020
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Optimizer via automation Mar 19, 2020
@yuzefovich
Copy link
Member Author

cc @awoods187

@RaduBerinde RaduBerinde moved this from Triage to New features in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020
@asubiotto asubiotto moved this from Triage to [TENT] SQL Features in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 20, 2020
@asubiotto asubiotto moved this from [TENT] SQL Features to [TENT] SQL Exec in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 20, 2020
@batara666
Copy link

we need this

craig bot pushed a commit that referenced this issue Sep 18, 2020
54536: sql: add unimplemented errors for trigrams r=solongordon a=otan

Resolves #51137 

**sql: add unimplemented errors for index with operator classes**

* Amend lexing to do lookahead for NULLS FIRST / NULLS LAST to avoid
  shift/reduce conflicts.
* Amend index_params support to be Postgre-compatible. Doing a_exprs
  inside indexes now require surrounding braces.
* Add operator_class argument to index_params.
* Add unimplemented error for operator class support.

Release note: None

**builtins: add unimplemented notices around trigram builtins**

Release note: None



54541: opt: build constraint for containment operator r=rytaft a=mgartner

This commit builds a non-tight, non-null constraint for containment
operators. This is valid because `NULL` cannot contain any elements.

This fixes a minor issue with the statistics of partial inverted index
scans. Tests for these scans have been added for in this commit.

Release note: None

54558: opt: deduce not-null constraints for functions with non-nullable args r=rytaft a=rytaft

Release justification: low risk, high benefit changes to existing
functionality, bug fixes and low-risk updates to new functionality

This commit enhances the constraint builder code so that it can deduce
not-null constraints for any variable arguments to functions with non-
nullable args. This is important for improving optimizer support for
spatial functions, since many of these functions have non-nullable args
and are used as filter predicates. Deducing not-null constraints for
the arguments to these functions enables us to calculate more accurate
statistics, and may unlock other optimizations.

Release note (performance improvement): The optimizer can now deduce that
certain variable arguments to functions must be non-null. This improves
cardinality estimation for those variables and unlocks other types of
optimizations. As a result, the optimizer may choose better query plans
when a function is used as a filter predicate.

54579: parser: add unimplemented for GROUP BY {GROUPING SETS, ROLLUP, CUBE} r=solongordon a=otan

Refs #51424 
Refs #46280

Release note: None

54582: parser: add unimplemented error for COPY ... WHERE ... r=solongordon a=otan

Refs #54580 
Refs #51424 

Release note: None

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Oct 25, 2020
@yuzefovich yuzefovich moved this from Triage to [GENERAL BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Oct 25, 2020
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@lacasaprivata2
Copy link

lacasaprivata2 commented Sep 26, 2021

is there a timeline on this feature? if not, i'm going to install spark and/or pandas for in-memory post-processing....

@yuzefovich
Copy link
Member Author

cc @vy-ton

@vy-ton
Copy link
Contributor

vy-ton commented Sep 28, 2021

@vitruvvius We don't have a timeline for this feature yet.

If you could share a sample query/use case that would help future prioritization?

@batara666
Copy link

@vy-ton i could use CUBE for saving []float64, usually i use CUBE for saving my machine learning encoding data

@lacasaprivata2
Copy link

rollups are very common when computing analytics for customers - ie, pivot tables

@rytaft
Copy link
Collaborator

rytaft commented Apr 21, 2022

All of these seem like they can be implemented with transformations in the optimizer. Check out these pages: SQL GROUPING SETS - Generate Multiple Grouping Sets, SQL ROLLUP, SQL CUBE

The example in the GROUPING SETS page shows that this query:

SELECT
    warehouse,
    product, 
    SUM (quantity) qty
FROM
    inventory
GROUP BY
    GROUPING SETS(
        (warehouse,product),
        (warehouse),
        (product),
        ()
    );

can be rewritten as:

SELECT
    warehouse,
    product, 
    SUM (quantity) qty
FROM
    inventory
GROUP BY
    warehouse,
    product
UNION ALL
SELECT
    warehouse, 
    null,
    SUM (quantity) qty
FROM
    inventory
GROUP BY
    warehouse
UNION ALL
SELECT
    null,
    product, 
    SUM (quantity) qty
FROM
    inventory
GROUP BY
    product
UNION ALL
SELECT
    null,
    null,
    SUM(quantity) qty
FROM
    inventory;

I think we can perform similar transformations for the others as well.

@rytaft rytaft added this to Triage in SQL Queries via automation Apr 21, 2022
@michae2 michae2 moved this from Triage to Backlog in SQL Queries Apr 26, 2022
@michae2 michae2 removed this from New features in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 26, 2022
@michae2 michae2 removed this from [GENERAL BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Apr 26, 2022
@michae2 michae2 added E-starter Might be suitable for a starter project for new employees or team members. E-quick-win Likely to be a quick win for someone experienced. labels Apr 26, 2022
@kenliu-crl
Copy link
Contributor

manually reviewed and brought up to date

@cockroachdb cockroachdb deleted a comment from mari-crl May 24, 2022
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-quick-win Likely to be a quick win for someone experienced. E-starter Might be suitable for a starter project for new employees or team members. T-sql-queries SQL Queries Team
Projects
Status: Backlog
SQL Queries
Backlog (DO NOT ADD NEW ISSUES)
Development

No branches or pull requests

8 participants