Optimize the evaluation of DATE_TRUNC(<col>) == <constant> when pushed down #18319

@alamb

Is your feature request related to a problem or challenge?

Filing on behalf of @colinmarc and @drin so we can get more community help

It is much easier for table providers, such as the built-in Parquet one as well as Iceberg, to push down <col> op <constant> expressions. They often don't push predicates down when a scalar function wraps the column.

For example using the predicate

date_trunc(part, column) <= constant_rhs 

typically cannot be pushed down.

However, this equivalent expression can be

column < date_trunc(part, date_add(constant_rhs, INTERVAL 1 part))
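
A rewrite like this is easy to sanity-check by brute force on a toy model. The sketch below is plain Python, not DataFusion code. It assumes `part` is a day and uses a hypothetical `date_trunc_day` helper as a stand-in for date_trunc. It compares the original predicate with the rewritten one over a range of column values. It also shows that the upper bound has to be exclusive: a value sitting exactly on the next day boundary truncates to itself, which is already greater than the constant.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def date_trunc_day(ts: datetime) -> datetime:
    """Hypothetical stand-in for date_trunc('day', ts)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def original_predicate(col: datetime, rhs: datetime) -> bool:
    # date_trunc(part, column) <= constant_rhs
    return date_trunc_day(col) <= rhs


def rewritten_predicate(col: datetime, rhs: datetime) -> bool:
    # column < date_trunc(part, date_add(constant_rhs, INTERVAL 1 part))
    return col < date_trunc_day(rhs + ONE_DAY)


def non_strict_variant(col: datetime, rhs: datetime) -> bool:
    # Same bound, but with <= instead of <, for comparison only.
    return col <= date_trunc_day(rhs + ONE_DAY)


def first_mismatch(rhs: datetime, candidate):
    """Return the first sampled column value where `candidate` disagrees
    with the original predicate, or None if they agree everywhere."""
    start = date_trunc_day(rhs) - ONE_DAY
    for minutes in range(0, 3 * 24 * 60, 15):  # three days, 15-minute steps
        col = start + timedelta(minutes=minutes)
        if original_predicate(col, rhs) != candidate(col, rhs):
            return col
    return None


rhs = datetime(2024, 1, 15, 10, 30)
mismatch_strict = first_mismatch(rhs, rewritten_predicate)
mismatch_non_strict = first_mismatch(rhs, non_strict_variant)
```

On this sample the strict form agrees with the original predicate everywhere, while the non-strict variant first disagrees at the next day boundary. Parts with non-uniform length (months, years) and time zones are not modeled here and would need their own checks.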

Describe the solution you'd like

It would be good if DataFusion could do this type of rewrite

Describe alternatives you've considered

This seems to fit well into the existing simplify expressions framework:
https://github.com/apache/datafusion/tree/main/datafusion/optimizer/src/simplify_expressions

And we already do something very similar for casts in unwrap casts: https://github.com/apache/datafusion/blob/e12ef3ae90677fe4b1bc548feea2b3082eecdaa2/datafusion/optimizer/src/simplify_expressions/unwrap_cast.rs#

For example

CAST(x AS float) = 5

is rewritten to

x = CAST(5 as float)

The code for that is in unwrap_cast.rs, linked above.

Perhaps we can follow a similar model for date_trunc
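
To make the idea concrete, here is a minimal sketch of what such a rule could look like on a toy expression tree. It is written in Python for brevity and does not use DataFusion's API: `Column`, `Literal`, `DateTrunc`, `BinaryExpr`, and `unwrap_date_trunc` are all hypothetical names, only the `<=` operator and the `hour` and `day` parts are handled, and anything else is returned unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: datetime


@dataclass(frozen=True)
class DateTrunc:
    part: str
    arg: object


@dataclass(frozen=True)
class BinaryExpr:
    left: object
    op: str
    right: object


# Only fixed-width parts are supported in this sketch.
STEP = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def trunc(part: str, ts: datetime) -> datetime:
    """Toy date_trunc for the parts listed in STEP."""
    if part == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported part: {part}")


def unwrap_date_trunc(expr):
    """Rewrite `date_trunc(part, col) <= literal` into `col < bound`,
    where bound = trunc(part, literal) + one part.
    Any other expression is returned unchanged."""
    if (
        isinstance(expr, BinaryExpr)
        and expr.op == "<="
        and isinstance(expr.left, DateTrunc)
        and isinstance(expr.left.arg, Column)
        and isinstance(expr.right, Literal)
        and expr.left.part in STEP
    ):
        part = expr.left.part
        bound = trunc(part, expr.right.value) + STEP[part]
        return BinaryExpr(expr.left.arg, "<", Literal(bound))
    return expr


before = BinaryExpr(
    DateTrunc("day", Column("ts")), "<=", Literal(datetime(2024, 1, 15, 10, 30))
)
after = unwrap_date_trunc(before)
untouched = unwrap_date_trunc(BinaryExpr(Column("ts"), "<=", Literal(datetime(2024, 1, 15))))
```

Here `after` compares the bare column against a constant, which is the shape table providers can push down. A real rule would need to cover the other comparison operators, each with its own bound, as well as variable-width parts and time zones.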

Note: there is already a ScalarUDFImpl::simplify method for simplifying functions. However, that method doesn't see any part of the larger expression (the = in the example above), so we might have to extend the API somewhat.

Additional context

No response

Labels: enhancement (New feature or request), performance (Make DataFusion faster)
