Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push down filters below Unnest in sub queries #10935

Closed
ahirner opened this issue Jun 16, 2024 · 11 comments · Fixed by #10974
Closed

Push down filters below Unnest in sub queries #10935

ahirner opened this issue Jun 16, 2024 · 11 comments · Fixed by #10974
Assignees
Labels
enhancement New feature or request

Comments

@ahirner
Copy link

ahirner commented Jun 16, 2024

Is your feature request related to a problem or challenge?

Currently where clauses on a sub query are not pushed below Unnest expressions. In conjunction with some other limitations, this makes composing queries with unnest() not ideal.

In 39.0.0 cli:

> CREATE TABLE IF NOT EXISTS v AS VALUES(1,[1,2,3]),(2,[3,4,5]);
0 row(s) fetched.
Elapsed 0.008 seconds.

We get equal results with or w/o a subquery:

> select unnest(column2) from v where column1=2;
+-------------------+
| unnest(v.column2) |
+-------------------+
| 3                 |
| 4                 |
| 5                 |
+-------------------+
3 row(s) fetched.
Elapsed 0.011 seconds.
> select "unnest(v.column2)" from (select unnest(column2), column1 from v) where column1=2;
+-------------------+
| unnest(v.column2) |
+-------------------+
| 3                 |
| 4                 |
| 5                 |
+-------------------+
3 row(s) fetched.
Elapsed 0.015 seconds.

Filter and FilterExec are above projections with unnest, instead of below.
Ok:

> explain select unnest(column2) from v where column1=2;
+---------------+------------------------------------------------------------------------+
| plan_type     | plan                                                                   |
+---------------+------------------------------------------------------------------------+
| logical_plan  | Unnest: lists[unnest(v.column2)] structs[]                             |
|               |   Projection: v.column2 AS unnest(v.column2)                           |
|               |     Filter: v.column1 = Int64(2)                                       |
|               |       TableScan: v projection=[column1, column2]                       |
| physical_plan | UnnestExec                                                             |
|               |   RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1 |
|               |     ProjectionExec: expr=[column2@1 as unnest(v.column2)]              |
|               |       CoalesceBatchesExec: target_batch_size=8192                      |
|               |         FilterExec: column1@0 = 2                                      |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                |
|               |                                                                        |
+---------------+------------------------------------------------------------------------+

Problematic:

> explain select "unnest(v.column2)" from (select unnest(column2), column1 from v) where column1=2;
+---------------+---------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                  |
+---------------+---------------------------------------------------------------------------------------+
| logical_plan  | Projection: unnest(v.column2)                                                         |
|               |   Filter: v.column1 = Int64(2)                                                        |
|               |     Unnest: lists[unnest(v.column2)] structs[]                                        |
|               |       Projection: v.column2 AS unnest(v.column2), v.column1                           |
|               |         TableScan: v projection=[column1, column2]                                    |
| physical_plan | ProjectionExec: expr=[unnest(v.column2)@0 as unnest(v.column2)]                       |
|               |   CoalesceBatchesExec: target_batch_size=8192                                         |
|               |     FilterExec: column1@1 = 2                                                         |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1            |
|               |         UnnestExec                                                                    |
|               |           ProjectionExec: expr=[column2@1 as unnest(v.column2), column1@0 as column1] |
|               |             MemoryExec: partitions=1, partition_sizes=[1]                             |
|               |                                                                                       |
+---------------+---------------------------------------------------------------------------------------+

Describe the solution you'd like

FilterExec is pushed down to tables that support them.

Describe alternatives you've considered

A workaround is trying to not use subqueries. However, then one cannot unest structs with multiple lists, or unnest a column more than once. Those are also unsupported.

> select unnest(unnest([[1],[1,2],[1,2,3]]));
type_coercion
caused by
This feature is not implemented: Unnest should be rewritten to LogicalPlan::Unnest before type coercion
> select unnest(column2) as x, unnest(column2) as y from v;
Error during planning: Projections require unique expression names but the expression "v.column2 AS unnest(v.column2)" at position 0 and "v.column2 AS unnest(v.column2)" at position 1 have the same name. Consider aliasing ("AS") one of them.

(note: a real case would be something like select unnest(named_struct)['foo'] as foo, unnest(named_struct)['bar'] as bar).

Additional context

Maybe related: #5364

@ahirner ahirner added the enhancement New feature or request label Jun 16, 2024
@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 16, 2024

For this kind of query select unnest(unnest([[1],[1,2],[1,2,3]]));, I think we can follow what Duckdb does.

Unnest with recursive, and max_depth

select unnest([[1],[1,2],[1,2,3]], recursive := true);
or select unnest([[1],[1,2],[1,2,3]], max_depth := 2);

@duongcongtoai
Copy link
Contributor

I think this is related #10660

@ahirner
Copy link
Author

ahirner commented Jun 16, 2024

Thanks for the pointers @duongcongtoai
I think the recursive and max_depth options can be a special workaround, but introduce a good amount of complexity. Fixing the push down makes single unnest composable in general.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 17, 2024

@ahirner
I found an alternative query is to move filter into the query

select "unnest(v.column2)" from (select unnest(column2), column1 from v where column1=2);

query TT
explain select "unnest(v.column2)" from (select unnest(column2), column1 from v where column1=2);
----
logical_plan
01)Projection: unnest(v.column2)
02)--Unnest: lists[unnest(v.column2)] structs[]
03)----Projection: v.column2 AS unnest(v.column2), v.column1
04)------Filter: v.column1 = Int64(2)
05)--------TableScan: v projection=[column1, column2]
physical_plan
01)ProjectionExec: expr=[unnest(v.column2)@0 as unnest(v.column2)]
02)--UnnestExec
03)----RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
04)------ProjectionExec: expr=[column2@1 as unnest(v.column2), column1@0 as column1]
05)--------CoalesceBatchesExec: target_batch_size=8192
06)----------FilterExec: column1@0 = 2
07)------------MemoryExec: partitions=1, partition_sizes=[1]

Given that, it might not be necessary to support pushing down outer filter

@ahirner
Copy link
Author

ahirner commented Jun 17, 2024

alternative query

Yes. The problem is that one can't move the filter to the inner query (views) or one shouldn't have to with other common sql constructs (sub queries, ctes).

@jayzhan211 jayzhan211 self-assigned this Jun 18, 2024
@jayzhan211
Copy link
Contributor

I think we can rewrite Unnest plan in push down filter

@alamb
Copy link
Contributor

alamb commented Jun 18, 2024

PR to do so: #10974

@ahirner
Copy link
Author

ahirner commented Jun 19, 2024

I confirm that 1b396c4 fixes our more involved case in practice.

@ahirner
Copy link
Author

ahirner commented Jun 19, 2024

Thanks for the swift action.

I just found out this second, that the change introduces a new issue when filtering on an unnested struct column (Optimizer rule 'common_sub_expression_eliminate' failed). Sorry for being late.

I'll file a new issue. The reproducer is less trivial as here.

@ahirner
Copy link
Author

ahirner commented Jun 19, 2024

Added issue that's quite minimal but maybe not entirely minimal: #10990

@alamb
Copy link
Contributor

alamb commented Jun 19, 2024

I confirm that 1b396c4 fixes our more involved case in practice.

Thank you @jayzhan211 for the quick response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants