ARROW-10685: [Rust] [DataFusion] Added support for Join on filter-pushdown optimizer. #8738


jorgecarleitao

This PR extends the filter pushdown optimizer to support nodes with multiple children. For a join, this allows filters to be pushed down below the join. For example, the plan

"\
Filter: #a LtEq Int64(1)\
\n  Join: a = a\
\n    TableScan: test projection=None\
\n    TableScan: test projection=None"

is optimized to

"\
Join: a = a\
\n  Filter: #a LtEq Int64(1)\
\n    TableScan: test projection=None\
\n  Filter: #a LtEq Int64(1)\
\n    TableScan: test projection=None"

This also reduces the complexity of the optimizer, which now performs a single pass over the plan.

Because a join is an expensive operation, filtering its inputs before they are joined can improve performance substantially.
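The rewrite described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `Plan`, `Predicate`, `push_down`, `wrap`, and `display` are hypothetical names on a toy plan type, the only predicate shape modelled is `column <= value`, and the join is assumed to be an inner equi-join, so a predicate on a join key is assumed to be valid on both inputs. Predicates on any other column are conservatively kept above the join because the toy plan carries no schema.

```rust
// Toy model only: these types are stand-ins, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    column: String,
    max: i64, // models `column <= max`
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter { predicate: Predicate, input: Box<Plan> },
    // Assumed to be an inner equi-join on `left_key = right_key`.
    Join { left_key: String, right_key: String, left: Box<Plan>, right: Box<Plan> },
}

// Re-attach predicates on top of `plan`, keeping their original nesting order.
fn wrap(plan: Plan, predicates: Vec<Predicate>) -> Plan {
    predicates.into_iter().rev().fold(plan, |input, predicate| Plan::Filter {
        predicate,
        input: Box::new(input),
    })
}

// Single top-down pass: collect filters while descending and re-attach them
// at the lowest node where this toy model considers them valid.
fn push_down(plan: Plan, mut pending: Vec<Predicate>) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            pending.push(predicate);
            push_down(*input, pending)
        }
        Plan::Join { left_key, right_key, left, right } => {
            let (mut keep, mut to_left, mut to_right) = (Vec::new(), Vec::new(), Vec::new());
            for p in pending {
                if p.column == left_key || p.column == right_key {
                    // A predicate on the join key goes to both children.
                    to_left.push(Predicate { column: left_key.clone(), max: p.max });
                    to_right.push(Predicate { column: right_key.clone(), max: p.max });
                } else {
                    // No schema information here, so stay above the join.
                    keep.push(p);
                }
            }
            let join = Plan::Join {
                left_key,
                right_key,
                left: Box::new(push_down(*left, to_left)),
                right: Box::new(push_down(*right, to_right)),
            };
            wrap(join, keep)
        }
        scan @ Plan::TableScan { .. } => wrap(scan, pending),
    }
}

// Render the plan in the indented format used in the description above.
fn display(plan: &Plan) -> String {
    fn go(plan: &Plan, depth: usize, out: &mut String) {
        let pad = "  ".repeat(depth);
        match plan {
            Plan::TableScan { table } => {
                out.push_str(&format!("{}TableScan: {} projection=None\n", pad, table));
            }
            Plan::Filter { predicate, input } => {
                out.push_str(&format!(
                    "{}Filter: #{} LtEq Int64({})\n",
                    pad, predicate.column, predicate.max
                ));
                go(input, depth + 1, out);
            }
            Plan::Join { left_key, right_key, left, right } => {
                out.push_str(&format!("{}Join: {} = {}\n", pad, left_key, right_key));
                go(left, depth + 1, out);
                go(right, depth + 1, out);
            }
        }
    }
    let mut out = String::new();
    go(plan, 0, &mut out);
    out
}
```

Calling `push_down(plan, Vec::new())` on a `Filter` over a `Join` of two `test` scans and passing the result to `display` gives the second plan shown above, under the stated assumptions.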


jorgecarleitao commented Nov 22, 2020

fyi @andygrove and @alamb :)

left_cols,
right_cols,
)?
.join(&right.to_logical_plan(), join_type, left_cols, right_cols)?

This is unrelated: it is just a small simplification of the signature of `LogicalPlanBuilder::join` so that it no longer requires the `Arc`.
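A minimal sketch of what such a signature change looks like, assuming a toy builder rather than DataFusion's real one: `ToyPlan` and `ToyBuilder` are hypothetical names, and the parameter list is simplified compared with the `join(&right.to_logical_plan(), join_type, left_cols, right_cols)` call in the diff above. The point is only that the caller passes a plain reference and the builder creates the `Arc` itself.

```rust
use std::sync::Arc;

// Toy plan type; a stand-in for illustration, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum ToyPlan {
    Scan(String),
    Join { left: Arc<ToyPlan>, right: Arc<ToyPlan>, on: (String, String) },
}

// Hypothetical builder mirroring the shape of the call in the diff.
struct ToyBuilder {
    plan: ToyPlan,
}

impl ToyBuilder {
    fn scan(table: &str) -> Self {
        ToyBuilder { plan: ToyPlan::Scan(table.to_string()) }
    }

    // The right side is taken by reference; wrapping it in an `Arc` is an
    // internal detail, so callers no longer need to construct one.
    fn join(&self, right: &ToyPlan, left_col: &str, right_col: &str) -> Result<Self, String> {
        if left_col.is_empty() || right_col.is_empty() {
            return Err("join columns must not be empty".to_string());
        }
        Ok(ToyBuilder {
            plan: ToyPlan::Join {
                left: Arc::new(self.plan.clone()),
                right: Arc::new(right.clone()),
                on: (left_col.to_string(), right_col.to_string()),
            },
        })
    }

    fn build(self) -> ToyPlan {
        self.plan
    }
}
```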


@andygrove andygrove left a comment


I haven't done a thorough review, but I pulled these changes and tested them against TPC-H query 12, and it worked great.

@andygrove

@jorgecarleitao could you try pushing an empty commit to trigger CI again?

@andygrove andygrove closed this in ca6e838 Nov 23, 2020
@jorgecarleitao jorgecarleitao deleted the filter_push branch November 23, 2020 16:47
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…hdown optimizer.

Closes apache#8738 from jorgecarleitao/filter_push

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>