Description
Is your feature request related to a problem or challenge?
As mentioned in #18319, optimizers can reason much more easily about predicates of the form <col> op <constant>. They often cannot optimize nearly as well when a scalar function wraps the column.
This includes DataFusion's PruningPredicate.
For example, the predicate looking for a particular year
WHERE EXTRACT (YEAR FROM k) = 2024
can be rewritten as
k >= 2024-01-01 AND k < 2025-01-01
and then k is easier to push down and is subject to range analysis, etc.
The ClickHouse paper (https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) calls this rewrite a "preimage" (from the mathematical term). I think toYear(k) is the equivalent of EXTRACT(YEAR FROM k):
Second, some functions can compute the preimage of a given function result. This is used to replace comparisons of constants with function calls on the key columns by comparing the key column value with the preimage. For example, toYear(k) = 2024 can be replaced by k >= 2024-01-01 && k < 2025-01-01.
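To make the idea concrete, here is a minimal standalone sketch of the year preimage in Rust. It does not use any DataFusion API: the names DateRange, year_preimage and rewrite_year_eq are hypothetical, dates are plain ISO-8601 strings, and the output is SQL-like text rather than an Expr tree. It only illustrates that the set of k values with year(k) = Y is the half-open range [Y-01-01, (Y+1)-01-01).

```rust
/// Half-open date range [start, end), with dates as ISO-8601 strings.
/// Hypothetical type for illustration only, not part of DataFusion.
#[derive(Debug, PartialEq)]
struct DateRange {
    start: String,
    end: String,
}

/// Preimage of `EXTRACT(YEAR FROM k) = year`: every date from January 1 of
/// `year` up to, but not including, January 1 of the following year.
/// Returns None if `year + 1` would overflow.
fn year_preimage(year: i32) -> Option<DateRange> {
    let next = year.checked_add(1)?;
    Some(DateRange {
        start: format!("{:04}-01-01", year),
        end: format!("{:04}-01-01", next),
    })
}

/// Rewrite `EXTRACT(YEAR FROM <column>) = <year>` into a range predicate on
/// the bare column, rendered as SQL-like text (an assumption of this sketch).
fn rewrite_year_eq(column: &str, year: i32) -> Option<String> {
    let r = year_preimage(year)?;
    Some(format!(
        "{col} >= DATE '{}' AND {col} < DATE '{}'",
        r.start,
        r.end,
        col = column
    ))
}
```

Calling rewrite_year_eq("k", 2024) yields the range predicate from the example above. The half-open upper bound avoids having to know the last representable instant of the year, so the same bounds would work for date and timestamp columns.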
Describe the solution you'd like
I would like DataFusion to do this rewrite too.
Describe alternatives you've considered
This might be possible to do by implementing ScalarUDFImpl::simplify for date_part 👍
https://github.com/apache/datafusion/blob/main/datafusion/functions/src/datetime/date_part.rs
Additional context
No response