Description
Is your feature request related to a problem or challenge?
In #21077, we set distinct_count to Exact(1) when a filter predicate collapses a column interval to a single value. This works for numeric types (Int8, Int16, Int32, Int64, UInt32, Float32, Float64) but not for temporal types (Timestamp, Interval, Duration).
Despite next_value_helper() having implementations for these types, the constraint propagation solver does not collapse equality predicates on temporal columns to single-value intervals.
For example, WHERE ts = '2021-01-01T00:00:00' should produce NDV=Exact(1), but the NDV currently keeps its original Inexact value.
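To make the expected behavior concrete, here is a minimal, self-contained sketch of the rule. It does not use DataFusion's real types: `Precision`, `TsInterval`, and `ndv_after_filter` are hypothetical stand-ins, and timestamps are modeled as nanoseconds since the Unix epoch in an `i64`.

```rust
/// Simplified stand-in for a statistics precision wrapper (hypothetical,
/// not the DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical closed interval [lower, upper] over timestamps, expressed
/// as nanoseconds since the Unix epoch.
#[derive(Debug, Clone, Copy)]
pub struct TsInterval {
    pub lower: i64,
    pub upper: i64,
}

/// Returns the distinct count for a column after filtering.
/// If the filter narrowed the column to a single value, at most one
/// distinct value can survive, so the count is reported as Exact(1).
/// Otherwise this sketch leaves the original estimate untouched
/// (an assumption made for brevity).
pub fn ndv_after_filter(original: Precision, narrowed: TsInterval) -> Precision {
    if narrowed.lower == narrowed.upper {
        Precision::Exact(1)
    } else {
        original
    }
}
```

With `ts = '2021-01-01T00:00:00'` (1_609_459_200_000_000_000 ns in UTC), a solver that collapses the interval to `[ts, ts]` would yield `Exact(1)` here. The problem described in this issue is that the interval for temporal columns apparently never reaches that collapsed state.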
Describe the solution you'd like
Investigate why interval analysis does not collapse temporal equality predicates despite next_value support, and fix the root cause. Alternatively, detect equality predicates directly in collect_new_statistics, bypassing interval analysis for the NDV computation.
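A rough sketch of the second option (direct equality detection), under stated assumptions: the `Expr` enum and `is_pinned_by_equality` below are hypothetical and only model the shape of the check, they are not DataFusion's physical expression API. The idea is to walk the predicate and report whether a column is pinned to a literal by a top-level equality, possibly nested under AND.

```rust
/// Toy predicate tree (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    /// Timestamp literal as nanoseconds since the Unix epoch.
    TimestampLiteral(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Returns true if `pred` forces `column` to equal a single literal.
/// Only `col = literal`, `literal = col`, and conjunctions containing
/// such a term are recognized; everything else (including OR) is
/// conservatively treated as "not pinned".
pub fn is_pinned_by_equality(pred: &Expr, column: &str) -> bool {
    match pred {
        Expr::Eq(left, right) => match (left.as_ref(), right.as_ref()) {
            (Expr::Column(name), Expr::TimestampLiteral(_))
            | (Expr::TimestampLiteral(_), Expr::Column(name)) => name.as_str() == column,
            _ => false,
        },
        // One pinned conjunct is enough: AND can only narrow further.
        Expr::And(left, right) => {
            is_pinned_by_equality(left, column) || is_pinned_by_equality(right, column)
        }
        _ => false,
    }
}
```

A caller in the statistics code could then set distinct_count to Exact(1) for any column where this returns true, without depending on the interval solver. The trade-off is that this check is purely syntactic, so it would miss equalities that are only implied (for example `ts >= x AND ts <= x`), which the interval-based route would catch once the root cause is fixed.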
Describe alternatives you've considered
The direct equality detection approach proposed in #21109 (for strings) could also cover temporal types in one pass.
Additional context
- Part of EPIC #20766: Making use of NDVs (number of distinct values) in DataFusion
- Related: #21077 (feat: Set NDV to Exact(1) for numeric equality filter predicates) and #21109 (feat: Extend single-value NDV optimization to string types)