Background
Interval and statistics propagation in DataFusion is a best-effort optimization: it refines bounds on expressions to enable better pruning, filter pushdown, and join planning, but it is never required for query correctness. If propagation can't compute precise bounds for an expression, the optimizer should fall back to an unbounded interval (or unknown distribution) and continue planning.
Today, several propagation paths return Result<Interval> and ?-propagate errors all the way out to the user as Internal error: .... A single arithmetic combination that hasn't been wired through the type-coercion plumbing — or any other unexpected failure inside Interval arithmetic — turns into a hard query failure rather than degraded statistics.
A concrete recent example: #22027 fixed five assert_eq_or_internal_err! calls in Interval::mul/div/intersect/union/contains that were firing for Decimal128(38, 10) / Decimal128(20, 0) and similar mismatched-type cases. The underlying bug was real (missing type coercion), but the way it surfaced — as an internal error from a plain SQL numeric / bigint query — was disproportionate to the actual problem (we couldn't refine bounds for one expression).
Reproducer (current behavior, prior to #22027)
CREATE TABLE t(c1 BIGINT) AS VALUES (1::bigint), (2::bigint), (10::bigint);
SELECT
  c1,
  CASE WHEN c1 = 0 THEN 100.0
       ELSE ROUND((1.0 - (c1::numeric / c1)) * 100, 2)
  END AS rate
FROM t
WHERE (1.0 - (c1::numeric / c1)) * 100 < 95.0;
This fails in datafusion-cli with:
Internal error: Assertion failed: ... Intervals must have the same data type for division
After #22027 this specific path returns rows. But any future operator/type pair that isn't fully coerced will reintroduce the same shape of failure — that's what this issue is about preventing structurally.
Proposal
Treat interval/stats propagation as advisory: if any error escapes Interval arithmetic during propagation, log a debug warning and substitute an unbounded interval (or Distribution::new_generic with an unknown range), then continue optimizing.
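As a minimal sketch of the advisory-fallback policy, assuming simplified stand-in types (the Interval and Error here are toys for illustration, not DataFusion's real Interval or DataFusionError):

```rust
// Toy stand-ins: not DataFusion's real Interval / DataFusionError types.
#[derive(Debug, Clone, PartialEq)]
enum Error {
    Internal(String),
    ResourcesExhausted(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Interval {
    lower: Option<i64>, // None = unbounded on that side
    upper: Option<i64>,
}

impl Interval {
    fn make_unbounded() -> Self {
        Interval { lower: None, upper: None }
    }
}

/// Convert an internal propagation failure into "no information":
/// an unbounded interval. Other error kinds still propagate.
fn advisory(result: Result<Interval, Error>) -> Result<Interval, Error> {
    match result {
        Err(Error::Internal(msg)) => {
            // In DataFusion this would be a log::debug!/log::warn! call.
            eprintln!("interval propagation degraded to unbounded: {msg}");
            Ok(Interval::make_unbounded())
        }
        other => other, // Ok values and genuine failures pass through
    }
}

fn main() {
    let degraded = advisory(Err(Error::Internal("type mismatch".into())));
    assert_eq!(degraded, Ok(Interval::make_unbounded()));

    let fatal = advisory(Err(Error::ResourcesExhausted("oom".into())));
    assert!(fatal.is_err());
    println!("ok");
}
```

The key property is that the fallback is applied at the boundary, so callers see a valid (if uninformative) interval and planning continues.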
Natural seams
The two main entry points where errors should be caught and converted into "no information":
BinaryExpr::evaluate_bounds in datafusion/physical-expr/src/expressions/binary.rs (around line 411): wrap the apply_operator call so that on Err, we log and return Interval::make_unbounded(&self.data_type(...)).
apply_operator callers in statistics.rs (datafusion/expr-common/src/statistics.rs lines 738 and 763). Both compute a range_evaluation from operand ranges; on error they should degrade to a Bernoulli(unknown) (line 738) or unbounded Distribution::new_generic (line 763) rather than failing the entire stats computation.
propagate_constraints in BinaryExpr and the CP solver paths in cp_solver.rs (update_ranges, propagate_arithmetic, etc.) likely deserve the same treatment — failure to propagate should yield "no refinement" (return Ok(Some(vec![])) or skip the refinement), not bubble out.
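The "no refinement" fallback for the propagation paths can be sketched as follows, again with toy stand-ins (propagate_inner is a hypothetical placeholder for the fallible cp_solver/propagate_constraints arithmetic, not the actual DataFusion function):

```rust
// Toy Interval; propagate_inner is a stand-in for DataFusion's
// fallible constraint-propagation arithmetic.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    lower: i64,
    upper: i64,
}

/// Fallible inner step: clamp each child to the parent's bounds.
/// A malformed child interval simulates an internal arithmetic error.
fn propagate_inner(
    parent: &Interval,
    children: &[Interval],
) -> Result<Vec<Interval>, String> {
    if children.iter().any(|c| c.lower > c.upper) {
        return Err("internal: malformed child interval".into());
    }
    Ok(children
        .iter()
        .map(|c| Interval {
            lower: c.lower.max(parent.lower),
            upper: c.upper.min(parent.upper),
        })
        .collect())
}

/// Advisory wrapper: on error, skip the refinement and keep the
/// children's existing bounds instead of aborting optimization.
fn propagate(parent: &Interval, children: &[Interval]) -> Vec<Interval> {
    match propagate_inner(parent, children) {
        Ok(refined) => refined,
        Err(msg) => {
            eprintln!("constraint propagation skipped: {msg}");
            children.to_vec() // no refinement
        }
    }
}

fn main() {
    let parent = Interval { lower: 0, upper: 10 };
    let refined = propagate(&parent, &[Interval { lower: -5, upper: 20 }]);
    assert_eq!(refined, vec![Interval { lower: 0, upper: 10 }]);

    // Internal failure: children come back unchanged.
    let unchanged = propagate(&parent, &[Interval { lower: 5, upper: 1 }]);
    assert_eq!(unchanged, vec![Interval { lower: 5, upper: 1 }]);
    println!("ok");
}
```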
Considerations
Visibility: silently swallowing internal errors masks real bugs in interval arithmetic. Mitigations:
Always log at warn! or debug! with the operator and operand types.
Gate the fallback on DataFusionError::Internal specifically — let External, IoError, ResourcesExhausted, etc. continue to propagate.
Add a session config like optimizer.fail_on_interval_error (default false) so tests and CI can opt into the strict behavior and catch regressions.
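The error-kind gate and the strict-mode config can combine into one decision point; a sketch under the same stand-in-type assumptions (the flag name mirrors the proposed optimizer.fail_on_interval_error, which does not exist yet):

```rust
// Error and Bounds are stand-ins, not DataFusion types; the boolean
// mirrors the proposed optimizer.fail_on_interval_error session config.
#[derive(Debug, PartialEq)]
enum Error {
    Internal(String),
    External(String),
}

#[derive(Debug, PartialEq)]
enum Bounds {
    Known(i64, i64),
    Unbounded,
}

fn bounds_or_fallback(
    result: Result<Bounds, Error>,
    fail_on_interval_error: bool, // strict mode for tests/CI
) -> Result<Bounds, Error> {
    match result {
        // Only Internal errors degrade to "no information", and only
        // when strict mode is off; in CI they still fail loudly.
        Err(Error::Internal(msg)) if !fail_on_interval_error => {
            eprintln!("interval arithmetic failed, using unbounded: {msg}");
            Ok(Bounds::Unbounded)
        }
        // External/IO/resource errors (and strict-mode Internal) propagate.
        other => other,
    }
}

fn main() {
    let lenient = bounds_or_fallback(Err(Error::Internal("mismatch".into())), false);
    assert_eq!(lenient, Ok(Bounds::Unbounded));

    let strict = bounds_or_fallback(Err(Error::Internal("mismatch".into())), true);
    assert!(strict.is_err());

    let external = bounds_or_fallback(Err(Error::External("io".into())), false);
    assert!(external.is_err());

    let ok = bounds_or_fallback(Ok(Bounds::Known(1, 2)), true);
    assert_eq!(ok, Ok(Bounds::Known(1, 2)));
    println!("ok");
}
```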
Test coverage: existing tests rely on errors propagating in some cases (e.g. CP-solver edge cases). Each call site needs a careful audit to make sure "no information" is the right fallback for that specific function's contract.
Alternatives
Make every Interval method infallible by always returning an unbounded interval on internal failure — pushes the policy down into the data structure but loses signal for legitimately fallible cases like cast errors during coercion.