Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataFusion] Add constant folding to expressions during logically planning #96

Closed
alamb opened this issue Apr 26, 2021 · 2 comments
Closed
Labels
arrow Changes to the arrow crate

Comments

@alamb
Copy link
Contributor

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-9770

The high level idea is that if an expression can be partially evaluated during planning time then

The execution time will be increased

There may be additional optimizations possible (like removing entire LogicalPlan nodes, for example)

I recently saw the following selection expression created (by the [predicate push down|https://github.com/apache/arrow/pull/7880])

{code}
Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
TableScan: test projection=None
{code}

This could be simplified significantly:

  1. Duplicate clauses could be removed (e.g. #a Eq Int64(1) And #a Eq Int64(1) --> #a Eq Int64(1))
  2. Algebraic simplification (e.g. if A<=B and A=5, is the same as A=5)

Inspiration can be taken from the postgres code that evaluates constant expressions https://doxygen.postgresql.org/clauses_8c.html#ac91c4055a7eb3aa6f1bc104479464b28

(in this case, for example if you have a predicate A=5 then you can basically substitute in A=5 for any expression higher up in the the plan).

Other classic optimizations include things such as A OR TRUE --> A, A AND TRUE --> TRUE, etc.

@alamb alamb added the arrow Changes to the arrow crate label Apr 26, 2021
@alamb
Copy link
Contributor Author

alamb commented Apr 26, 2021

Comment from Jorge Leitão(jorgecarleitao) @ 2020-08-23T21:51:08.856+0000:

That is a cool idea. It seems to be a case of algebraic simplification. Maybe there is already a rust library that can do that work for us?

Comment from Remi Dettai(rdettai) @ 2020-12-08T14:11:11.412+0000:

This could also be used to apply filter pushdown into the catalog, if the combination (and) of:
 - the file/partition statistics ({{col( x ) > min and col( x ) < max}})
 - the filter expression

can be simplified to {{false}}, their is no need to read that file/partition. Isn't that wonderful ? :)

Note: there is an implementation of this folding for the C++ dataset API: [https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/filter.h]

@alamb
Copy link
Contributor Author

alamb commented Apr 26, 2021

wrong repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

No branches or pull requests

2 participants