Add SELECT * EXCLUDE, SELECT * EXCEPT support #6481

Merged
mustafasrepo merged 9 commits into apache:main from synnada-ai:feature/exclude_support on Jun 1, 2023


@mustafasrepo
Contributor

Which issue does this PR close?

Closes #6439.

Rationale for this change

With this PR, SELECT queries can use EXCLUDE to project every column except those listed in the EXCLUDE clause, as in the query below.

SELECT * EXCLUDE(a, b)
  FROM table1

The same behavior is available through the EXCEPT clause, so the query below also runs with this PR:

SELECT * EXCEPT(a, b)
  FROM table1
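
To make the semantics concrete, here is a minimal Python sketch of how a wildcard with an exclusion list can be expanded. It is a model for illustration only, not DataFusion's planner code; the function name `expand_wildcard` and the four-column schema used for `table1` are assumptions.

```python
def expand_wildcard(columns, excluded=()):
    """Return the column names that `* EXCLUDE(...)` would project.

    `columns` is the table's column list in schema order; `excluded`
    holds the names listed inside EXCLUDE(...) or EXCEPT(...).
    """
    excluded_names = set(excluded)
    # Keep schema order and drop every column named in the exclusion list.
    return [name for name in columns if name not in excluded_names]


# Hypothetical schema for table1 (an assumption for illustration).
table1_columns = ["a", "b", "c", "d"]

# Models: SELECT * EXCLUDE(a, b) FROM table1
print(expand_wildcard(table1_columns, ["a", "b"]))  # ['c', 'd']
```

Under that assumed schema, both queries above would project only `c` and `d`.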

What changes are included in this PR?

Are these changes tested?

Yes, new tests are added to the select.slt file to check whether each query produces the expected columns.

Are there any user-facing changes?

@github-actions github-actions Bot added the core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), sql (SQL Planner), and sqllogictest (SQL Logic Tests (.slt)) labels on May 30, 2023
Comment thread datafusion/core/tests/sqllogictests/test_files/select.slt
Contributor

@comphead comphead left a comment

Can we also have test cases for negative scenarios, like excluding nonexistent columns, excluding what is included, or excluding duplicated columns?

@mustafasrepo
Contributor Author

> Can we also have test cases for negative scenarios, like excluding nonexistent columns, excluding what is included, or excluding duplicated columns?

I have extended the tests to cover negative scenarios. Thanks for the suggestion, @comphead.

@mustafasrepo
Contributor Author

CI is failing because of the issue in #6495. I will fix the CI problem once that issue is resolved.


# EXCEPT, or EXCLUDE can only be used after wildcard *
# below query should give 5 columns, a1, b1, b, c, d
query IIIII
Contributor

@comphead comphead May 31, 2023

That is a very interesting scenario: if column a is aliased as a1, then EXCEPT/EXCLUDE won't exclude it, but at the same time the query won't fail.
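
The behavior noted here can be modeled with a small Python sketch, assuming a select list of the shape `a AS a1, b AS b1, * EXCEPT(a)`. The query text is not shown in this thread, so that shape, the `output_columns` helper, and the tuple encoding of select items are all assumptions. In this model the exclusion list filters only the wildcard's own expansion, so the aliased `a1` survives.

```python
def output_columns(select_items, table_columns):
    """Compute output column names for a simplified SELECT list.

    Each item is either ("col", name, alias) for an explicit column
    or ("wildcard", excluded_names) for `* EXCEPT(...)`.
    """
    out = []
    for item in select_items:
        if item[0] == "col":
            _, name, alias = item
            # An explicit column keeps its alias (or its name if unaliased).
            out.append(alias or name)
        else:
            _, excluded = item
            excluded_names = set(excluded)
            # The exclusion list applies only to the wildcard expansion.
            out.extend(c for c in table_columns if c not in excluded_names)
    return out


# Assumed shape: SELECT a AS a1, b AS b1, * EXCEPT(a) FROM table1
items = [("col", "a", "a1"), ("col", "b", "b1"), ("wildcard", ["a"])]
print(output_columns(items, ["a", "b", "c", "d"]))  # ['a1', 'b1', 'b', 'c', 'd']
```

The five resulting names match the five integer columns declared by `query IIIII`.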

# EXCEPT, or EXCLUDE shouldn't contain duplicate column names
statement error DataFusion error: Error during planning: EXCLUDE or EXCEPT contains duplicate column names
SELECT * EXCLUDE(a, a)
FROM table1
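
As a rough illustration of the duplicate check this test exercises, the Python sketch below rejects an exclusion list that names a column twice. The `PlanError` class and `check_exclusion_list` helper are hypothetical stand-ins, not DataFusion APIs; only the error text comes from the test above.

```python
class PlanError(Exception):
    """Stand-in for a planning error (hypothetical name)."""


def check_exclusion_list(excluded):
    """Raise PlanError if any column name appears more than once."""
    seen = set()
    for name in excluded:
        if name in seen:
            raise PlanError("EXCLUDE or EXCEPT contains duplicate column names")
        seen.add(name)


# Models the exclusion list in: SELECT * EXCLUDE(a, a) FROM table1
try:
    check_exclusion_list(["a", "a"])
except PlanError as err:
    print(err)  # EXCLUDE or EXCEPT contains duplicate column names
```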
Contributor

What about this scenario?

SELECT a EXCLUDE(a)
FROM table1

Contributor

AFAIK EXCLUDE can only be used to modify the wildcard, so that query wouldn't be valid

Contributor

okay, what if

select * exclude (a) from (select a from table1) x

Contributor

I think that's a good test!

Contributor Author

I merged this PR as is. @comphead, when I run the query you suggested, it returns an error during projection pushdown. I have opened issue #6510 to track this problem.

Contributor

@alamb alamb left a comment

Looks really nice to me. Thank you @mustafasrepo and @ozankabak

@mustafasrepo mustafasrepo merged commit 25df887 into apache:main Jun 1, 2023
@alamb
Contributor

alamb commented Jun 1, 2023

Here is a proposed addition to the user guide for this feature: #6512

@mustafasrepo mustafasrepo deleted the feature/exclude_support branch June 2, 2023 07:15

Development

Successfully merging this pull request may close these issues.

Add support for EXCLUDE in SELECT