Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

push down unnest filter only on non-unnest column #10989

Closed
jayzhan211 opened this issue Jun 19, 2024 · 4 comments · Fixed by #10991
Closed

push down unnest filter only on non-unnest column #10989

jayzhan211 opened this issue Jun 19, 2024 · 4 comments · Fixed by #10991
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 19, 2024

Is your feature request related to a problem or challenge?

In #10974, we push down unnest filter what ever the filter plan is, but if we filter on unnest column, we should not push down the filter plan since the unnest column is not evaluated yet.

# test unnest with filter
statement ok
create table t (a int[]) as values ([1,2,3]), ([4,5,6]);

query error DataFusion error: External error: Arrow error: Invalid argument error: Invalid comparison operation: List\(Field \{ name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) > Int32
select ua from (select unnest(a) as ua from t) where ua > 3;

The expected result is

query
select ua from (select unnest(a) as ua from t) where ua > 3;
----
4
5
6

Describe the solution you'd like

No response

Push down filter plan only if the column is non-unnest.

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label Jun 19, 2024
@jonahgao
Copy link
Member

It seems to be supported on the main branch.

DataFusion CLI v39.0.0
> create table t (a int[]) as values ([1,2,3]), ([4,5,6]);
0 row(s) fetched.
Elapsed 0.030 seconds.

> select ua from (select unnest(a) as ua from t) where ua > 3;

+----+
| ua |
+----+
| 4  |
| 5  |
| 6  |
+----+
3 row(s) fetched.
Elapsed 0.009 seconds.

@jayzhan211
Copy link
Contributor Author

It seems to be supported on the main branch.

DataFusion CLI v39.0.0
> create table t (a int[]) as values ([1,2,3]), ([4,5,6]);
0 row(s) fetched.
Elapsed 0.030 seconds.

> select ua from (select unnest(a) as ua from t) where ua > 3;

+----+
| ua |
+----+
| 4  |
| 5  |
| 6  |
+----+
3 row(s) fetched.
Elapsed 0.009 seconds.

I still get the error 🤔

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jun 19, 2024

It seems to be supported on the main branch.

DataFusion CLI v39.0.0
> create table t (a int[]) as values ([1,2,3]), ([4,5,6]);
0 row(s) fetched.
Elapsed 0.030 seconds.

> select ua from (select unnest(a) as ua from t) where ua > 3;

+----+
| ua |
+----+
| 4  |
| 5  |
| 6  |
+----+
3 row(s) fetched.
Elapsed 0.009 seconds.

I still get the error 🤔

Ok, my bad. It is introduced by the latest PR #10974

@jayzhan211 jayzhan211 changed the title Support filter on unnest column push down unnest filter only on non-unnest column Jun 19, 2024
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jun 19, 2024

At the point of push down filter, we no longer have Expr::Unnest but Expr::Column or Expr::BinaryExpr

expr: BinaryExpr(BinaryExpr { left: Column(Column { relation: None, name: "unnest(t.a)" }), op: Gt, right: Literal(Int32(3)) })

Instead of checking whether there is unnest contained in filter expression, should we move the conversion of Expr::Unnest to LogicalPlan::Unnest try_process_unnest to the optimization step, specifically at some point AFTER push down filter step? I think we can check whether unnest exists by Expr::Unnest, and also I think we don't need to specialize LogicalPlan::Unnest in push down filter step, we can deal with LogicalPlan::Projection only, Expr::Unnest should be part of the LogicalPlan::Unnest. After push down filter, we convert Expr::Unnest to LogicalPlan::Unnest

But ideally logical plan optimization should be optional, which means even the optimization step is skipped, we still expect to get the result. Based on this, it might not be a good idea to move the conversion to optmization step 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants