Skip to content

v2 stat pipeline: default= swallows errors instead of recording them alongside the fallback #687

@paddymul

Description

@paddymul

Problem

In the v2 stat framework, @stat(default=...) is a binary switch: on exception you get either a StatError (no default) or a silent Ok(default) (default set). There is no "use the fallback for display and surface the error" path.

This means any @stat that wants a display-safe fallback has to choose between (a) failing the cell entirely or (b) hiding real failures from the user.

Concrete case: the ibis histogram (#686) declares default=[] so a backend that rejects the GROUP BY still renders a blank histogram cell instead of breaking the row. With today's framework, that GROUP BY failure is invisible — no error in errs, no log, nothing.

Where

buckaroo/pluggable_analysis_framework/stat_pipeline.py:221-233:

except Exception as e:
    if sf.default is not MISSING:
        for sk in sf.provides:
            accumulator[sk.name] = Ok(sf.default)   # error vanishes
    else:
        for sk in sf.provides:
            accumulator[sk.name] = Err(
                error=e, stat_func_name=sf.name,
                column_name=column_name, inputs=kwargs.copy(),
            )

resolve_accumulator builds the errors list by scanning the accumulator for Err entries, so an Ok(default) write there is unrecoverable downstream.

Note: this is independent of #686's batch path — IbisStatPipeline Phase 1 always emits Err and ignores default, so the same @stat with default=[] behaves differently depending on whether it landed in the batch phase or the per-column phase. Worth fixing the inconsistency at the same time.

Desired behavior

  • accumulator[name] = Ok(default) so dependents and the rendered table see the fallback.
  • The exception is also surfaced in the errors list returned from process_df / process_table.
  • errs (v1-compat dict) gets the entry too, so existing consumers see it.

Sketch of fix

Two options:

A. Side-channel error list (small). Pass a mutable error list into _execute_stat_func; on default-fallback, append a StatError there while still writing Ok(default) to the accumulator. process_column / process_table merge it with errors derived from accumulator Err entries.

B. New StatResult variant: OkWithWarning(value, error) (cleaner, bigger). Every isinstance(x, Ok) site becomes isinstance(x, (Ok, OkWithWarning)). resolve_accumulator lifts the warning into the errors list while the value flows to dependents. Better long-term model; touches more code.

A is ~10 lines and contained. B is the right shape if we expect more "non-fatal" stat outcomes.

Out of scope for this issue

  • The ibis-specific batch/per-column inconsistency in Add ibis/xorq analysis backend (v2 @stat pipeline) #686 — fixing here will make them converge.
  • Whether default should be allowed at all on stats that have downstream dependents (separate question about how upstream defaults propagate).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions