Support "pre-image" for pruning predicate evaluation #18320

@alamb

Description

Is your feature request related to a problem or challenge?

As mentioned in #18319, optimizers can reason much more easily about predicates of the form <col> op <constant>. They often cannot optimize nearly as well when a scalar function wraps the column.

This includes DataFusion's PruningPredicate.

For example, a predicate that looks for a particular year

WHERE EXTRACT (YEAR FROM k) = 2024

can be rewritten as

k >= 2024-01-01 AND k < 2025-01-01

After this rewrite, the predicate on k is easier to push down and can be subject to range analysis, etc.
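
To make the rewrite concrete, here is a minimal sketch over a toy expression tree. Everything in it (the `Expr` enum, `first_day_of`, `rewrite_year_eq`) is hypothetical and is not DataFusion's actual `Expr` or any existing API. It also assumes `k` is a plain date column, so time zones and timestamp precision are ignored.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Year(Box<Expr>), // stands in for EXTRACT(YEAR FROM <expr>)
    IntLit(i32),
    DateLit { year: i32, month: u32, day: u32 },
    Eq(Box<Expr>, Box<Expr>),
    GtEq(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Hypothetical helper: the literal for January 1st of `year`.
fn first_day_of(year: i32) -> Expr {
    Expr::DateLit { year, month: 1, day: 1 }
}

// Rewrites `Year(col) = y` into `col >= y-01-01 AND col < (y+1)-01-01`.
// Any other expression is returned unchanged.
fn rewrite_year_eq(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(lhs, rhs) => match (*lhs, *rhs) {
            // Only rewrite when the function wraps a bare column and
            // `y + 1` cannot overflow.
            (Expr::Year(inner), Expr::IntLit(y))
                if matches!(*inner, Expr::Column(_)) && y < i32::MAX =>
            {
                let col = *inner;
                Expr::And(
                    Box::new(Expr::GtEq(Box::new(col.clone()), Box::new(first_day_of(y)))),
                    Box::new(Expr::Lt(Box::new(col), Box::new(first_day_of(y + 1)))),
                )
            }
            // Not the pattern we are looking for: rebuild the original node.
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

Calling `rewrite_year_eq` on the toy equivalent of `EXTRACT(YEAR FROM k) = 2024` yields the toy equivalent of `k >= 2024-01-01 AND k < 2025-01-01`. The result only contains comparisons between the column and constants, which is the shape the optimizers mentioned above handle well.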

The ClickHouse paper (https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) calls this rewrite a "preimage" (from the mathematical term). I think toYear(k) is the equivalent of EXTRACT(YEAR FROM k). From the paper:

Second, some functions can compute the preimage of a given function result. This is used to replace comparisons of constants with function calls on the key columns by comparing the key column value with the preimage. For example, toYear(k) = 2024 can be replaced by k >= 2024-01-01 && k < 2025-01-01.
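
The quoted passage only shows the equality case. As a sketch of how the same preimage could serve other comparison operators, the code below computes the half-open date interval for a year and derives a bound on the raw column from it. The types and functions (`Date`, `CmpOp`, `Range`, `year_preimage`, `rewrite_year_cmp`) are hypothetical illustrations, not ClickHouse or DataFusion APIs, and the handling of `<`, `<=`, `>`, `>=` is my assumption about a natural extension rather than something the paper or this issue specifies.

```rust
// A calendar date; the derived ordering compares year, then month, then day.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Date {
    year: i32,
    month: u32,
    day: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CmpOp {
    Eq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

// A range predicate on the raw column: `lower` is inclusive, `upper` is
// exclusive, and `None` means unbounded on that side.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    lower: Option<Date>,
    upper: Option<Date>,
}

impl Range {
    fn contains(&self, d: Date) -> bool {
        self.lower.map_or(true, |lo| d >= lo) && self.upper.map_or(true, |hi| d < hi)
    }
}

// Preimage of `year` under the "extract year" function, as the half-open
// interval [Jan 1 of year, Jan 1 of year + 1). Returns None on overflow.
fn year_preimage(year: i32) -> Option<(Date, Date)> {
    let next = year.checked_add(1)?;
    Some((
        Date { year, month: 1, day: 1 },
        Date { year: next, month: 1, day: 1 },
    ))
}

// Turns `year(col) <op> year` into a range predicate on `col` itself.
fn rewrite_year_cmp(op: CmpOp, year: i32) -> Option<Range> {
    let (start, end) = year_preimage(year)?;
    Some(match op {
        CmpOp::Eq => Range { lower: Some(start), upper: Some(end) },
        CmpOp::Lt => Range { lower: None, upper: Some(start) },
        CmpOp::LtEq => Range { lower: None, upper: Some(end) },
        CmpOp::Gt => Range { lower: Some(end), upper: None },
        CmpOp::GtEq => Range { lower: Some(start), upper: None },
    })
}
```

This works because extracting the year is monotonic in the date, so each comparison against a year maps to one side (or both sides) of the preimage interval. A function without that property would need a different treatment.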

Describe the solution you'd like

I would like DataFusion to do this rewrite too.

Describe alternatives you've considered

This might be possible to do by implementing ScalarUDFImpl::simplify for date_part 👍
https://github.com/apache/datafusion/blob/main/datafusion/functions/src/datetime/date_part.rs

Additional context

No response

Labels

enhancement: New feature or request
