Stats/interval propagation should degrade to unbounded intervals on internal errors, not fail the query

## Background

Interval and statistics propagation in DataFusion is a best-effort optimization: it refines bounds on expressions to enable better pruning, filter pushdown, and join planning, but it is never *required* for query correctness. If propagation can't compute precise bounds for an expression, the optimizer should fall back to an unbounded interval (or unknown distribution) and continue planning.

Today, several propagation paths return `Result<Interval>` and `?`-propagate errors all the way out to the user as `Internal error: ...`. A single arithmetic combination that hasn't been wired through the type-coercion plumbing — or any other unexpected failure inside `Interval` arithmetic — turns into a hard query failure rather than degraded statistics.

A concrete recent example: #22027 fixed five `assert_eq_or_internal_err!` calls in `Interval::mul`/`div`/`intersect`/`union`/`contains` that were firing for `Decimal128(38, 10) / Decimal128(20, 0)` and similar mismatched-type cases. The underlying bug was real (missing type coercion), but the way it surfaced — as an internal error from a plain SQL `numeric / bigint` query — was disproportionate to the actual problem (we couldn't refine bounds for one expression).

## Reproducer (current behavior, prior to #22027)

```sql
CREATE TABLE t(c1 BIGINT) AS VALUES (1::bigint), (2::bigint), (10::bigint);

SELECT
  c1,
  CASE WHEN c1 = 0 THEN 100.0
       ELSE ROUND((1.0 - (c1::numeric / c1)) * 100, 2)
  END AS rate
FROM t
WHERE (1.0 - (c1::numeric / c1)) * 100 < 95.0;
```
fails in `datafusion-cli` with:
```
Internal error: Assertion failed: ... Intervals must have the same data type for division
```

After #22027 this specific path returns rows. But any future operator/type pair that isn't fully coerced will reintroduce the same shape of failure — that's what this issue is about preventing structurally.

## Proposal

Treat interval/stats propagation as *advisory*: if any error escapes `Interval` arithmetic during propagation, log a debug warning and substitute an unbounded interval (or `Distribution::new_generic` with an unknown range), then continue optimizing.

### Natural seams

The two main entry points where errors should be caught and converted into "no information":

1. **`BinaryExpr::evaluate_bounds`** in `datafusion/physical-expr/src/expressions/binary.rs` (around line 411). Currently:
   ```rust
   fn evaluate_bounds(&self, children: &[&Interval]) -> Result<Interval> {
       let left_interval = children[0];
       let right_interval = children[1];
       apply_operator(&self.op, left_interval, right_interval)
   }
   ```
   Wrap the `apply_operator` call so that on `Err`, we log and return `Interval::make_unbounded(&self.data_type(...))`.

2. **`apply_operator` callers in `statistics.rs`** (`datafusion/expr-common/src/statistics.rs` lines 738 and 763). Both compute a `range_evaluation` from operand ranges; on error they should degrade to a `Bernoulli(unknown)` (line 738) or unbounded `Distribution::new_generic` (line 763) rather than failing the entire stats computation.

`propagate_constraints` in `BinaryExpr` and the CP solver paths in `cp_solver.rs` (`update_ranges`, `propagate_arithmetic`, etc.) likely deserve the same treatment — failure to propagate should yield "no refinement" (return `Ok(Some(vec![]))` or skip the refinement), not bubble out.

### Considerations

- **Visibility**: silently swallowing internal errors masks real bugs in interval arithmetic. Mitigations:
  - Always log at `warn!` or `debug!` with the operator and operand types.
  - Gate the fallback on `DataFusionError::Internal` specifically — let `External`, `IoError`, `ResourcesExhausted`, etc. continue to propagate.
  - Add a session config like `optimizer.fail_on_interval_error` (default `false`) so tests and CI can opt into the strict behavior and catch regressions.
- **Test coverage**: existing tests rely on errors propagating in some cases (e.g. CP-solver edge cases). Each call site needs a careful audit to make sure "no information" is the right fallback for that specific function's contract.
- **Scope**: this is a behavioral change, not a bug fix. Worth landing on its own with explicit reviewer sign-off, separate from #22027.

## Alternatives

- Make every `Interval` method infallible by always returning an unbounded interval on internal failure — pushes the policy down into the data structure but loses signal for legitimately fallible cases like cast errors during coercion.
- Status quo + fix bugs as they're reported. This is what we've been doing; the cost is that each new operator/type combination that wasn't anticipated turns into a user-visible regression (cf. #22027).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stats/interval propagation should degrade to unbounded intervals on internal errors, not fail the query #22028

Background

Reproducer (current behavior, prior to #22027)

Proposal

Natural seams

Considerations

Alternatives

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Stats/interval propagation should degrade to unbounded intervals on internal errors, not fail the query #22028

Description

Background

Reproducer (current behavior, prior to #22027)

Proposal

Natural seams

Considerations

Alternatives

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions