Add support for DISTINCT projections in decorrelate_where_exists #3724

@andygrove

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Setup

$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ echo "1,2" > test.csv
$ cd datafusion-cli
$ cargo run
❯ create external table test (a int, b int) stored as csv location 'test.csv';

Test

This query works:

❯ select * from test where exists (select a from test t2 where test.a = t2.a);
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+

If I add the DISTINCT keyword, the optimizer fails:

❯ select * from test where exists (select distinct a from test t2 where test.a = t2.a);
Internal("Optimizer rule 'decorrelate_where_exists' failed due to unexpected error: cannot optimize non-correlated subquery at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/decorrelate_where_exists.rs:141\ncaused by\nError during planning: Could not coerce into Filter! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1157")

Describe the solution you'd like
Support DISTINCT projections in correlated EXISTS subqueries, so that decorrelate_where_exists can rewrite them instead of failing.
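
For reference, EXISTS only checks whether the subquery returns at least one row, so DISTINCT in the subquery projection should not change the result. A minimal sketch of the expected behaviour, using Python's built-in sqlite3 module as a stand-in engine (this is not DataFusion, and the in-memory table is an assumption that mirrors the one-row test.csv from the setup):

```python
import sqlite3

# In-memory stand-in for the external CSV table from the setup (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INT, b INT)")
conn.execute("INSERT INTO test VALUES (1, 2)")

# The query that already works in datafusion-cli.
plain = conn.execute(
    "SELECT * FROM test "
    "WHERE EXISTS (SELECT a FROM test t2 WHERE test.a = t2.a)"
).fetchall()

# The same query with DISTINCT in the subquery projection.
distinct = conn.execute(
    "SELECT * FROM test "
    "WHERE EXISTS (SELECT DISTINCT a FROM test t2 WHERE test.a = t2.a)"
).fetchall()

print(plain)
print(distinct)
```

Both queries return the same single row in SQLite, which is the result I would expect from DataFusion once the rule handles DISTINCT.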

Describe alternatives you've considered
None

Additional context
None
