feat(paf): @stat(cost=...) decorator + tag known-expensive stats#808

Open
paddymul wants to merge 2 commits into main from feat/stat-cost-decorator

@paddymul
Collaborator

Summary

Phase 1 of the JS-driven progressive stats design — the metadata field only. Adds a cost: str field to StatFunc declaring the compute-cost class of a stat. "scalar" (default) means cheap — ships in the initial state response. "aggregate" is the opt-in for slow stats that a future JS-driven router will fetch via a separate WS round-trip after a debounce.

This PR ships the metadata only — no consumer yet. That's a separate PR (Phase 2). Shipping the field independently means the tags don't have to be batched with downstream changes; consumers can adopt incrementally.

Changes

  • StatFunc.cost: str — default "scalar". Validated against VALID_COSTS = ("scalar", "aggregate") at decoration time.
  • @stat(cost=...) decorator kwarg. Bad values (e.g. cost="bigly") raise ValueError loudly rather than silently bypassing a router downstream.
  • Tag the four built-in histogram stats as cost="aggregate":
    • buckaroo.customizations.pd_stats_v2.histogram
    • buckaroo.customizations.pd_stats_v2.histogram_series
    • buckaroo.customizations.pl_stats_v2.pl_histogram_series
    • buckaroo.customizations.xorq_stats_v2.histogram

These are the known per-column-querying expensive funcs — ~250 ms × 26 cols on the boston restaurant dataset via xorq datafusion, ~6.5 s total. Tagging them now is harmless until a router consumes the field.
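
A minimal sketch of what a decoration-time check like this could look like. `StatFunc` here is a simplified stand-in with only `name`, `func`, and `cost`; the real class in buckaroo carries more fields, and the actual `@stat` signature may differ. The `length` and `histogram` bodies are illustrative, not the library's implementations.

```python
from dataclasses import dataclass
from typing import Callable

VALID_COSTS = ("scalar", "aggregate")


@dataclass(frozen=True)
class StatFunc:
    """Simplified stand-in; the real StatFunc has more fields."""
    name: str
    func: Callable
    cost: str = "scalar"


def stat(cost: str = "scalar"):
    """Decorator factory that validates `cost` before wrapping anything."""
    if cost not in VALID_COSTS:
        # Fail at decoration (import) time, so a typo cannot silently
        # leave a slow stat on the default path.
        raise ValueError(f"cost must be one of {VALID_COSTS}, got {cost!r}")

    def wrap(func: Callable) -> StatFunc:
        return StatFunc(name=func.__name__, func=func, cost=cost)

    return wrap


@stat()
def length(values):
    return len(values)


@stat(cost="aggregate")
def histogram(values):
    # Hypothetical body: count occurrences per distinct value.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts
```

With this shape, `length.cost` is `"scalar"`, `histogram.cost` is `"aggregate"`, and `stat(cost="bigly")` raises `ValueError` before any function is wrapped.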

Why these specific stats

Profile of one state_change on boston (xorq backend, 883K rows × 26 cols, instrumented):

xorq.process_table phase 2 (per-column queries) — 6.5 s
└─ histogram queries (one per column) dominate
batch_execute (scalars + value_counts)         — 360 ms

The histogram producers are the only stats that re-query the backend per column. Everything else (length, min, max, mean, std, distinct_count, top-K) is computed in the scalar batch. So the cost-class line is clear: histograms are aggregate, the rest are scalar.
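
As a rough check of those numbers, the approximate per-column histogram latency from the profile multiplied across 26 columns accounts for the 6.5 s phase. The variable names below are illustrative, and the share is computed over only the two profiled phases quoted above.

```python
per_column_histogram_s = 0.250  # approx. latency of one histogram query on xorq
n_columns = 26
scalar_batch_s = 0.360          # batch_execute (scalars + value_counts)

histogram_phase_s = per_column_histogram_s * n_columns
total_s = histogram_phase_s + scalar_batch_s
histogram_share = histogram_phase_s / total_s

print(f"histogram phase: {histogram_phase_s:.1f} s")
print(f"share of the two profiled phases: {histogram_share:.0%}")
```

Under these figures the histogram queries take roughly 95% of the profiled time, which is why they are the ones moved out of the initial response.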

value_counts itself is a borderline case — on polars/pandas it's relatively fast (~10-20 ms on 883K-row strings); on xorq it's part of the scalar batch already. Leaving it as scalar for now; a future PR can split it if needed.

Commit split

  • 84d802d3 — failing tests (3 new in TestStatDecorator: default cost, explicit aggregate, invalid cost rejection).
  • 898414f7 — implementation + tags + one more test (test_known_expensive_stats_marked_aggregate) pinning the four tagged stats.

Test plan

  • TestStatDecorator — 10/10 pass (3 new for cost + 1 new for tags + 6 existing)
  • Full Python suite: 968 passed, 1 skipped (the pre-existing flaky MCP test in editable-install worktrees)
  • No downstream consumer yet → no behavior change. Existing pipelines run unchanged.
  • CI green (pending push)

Risk

Zero behavior change. The cost field is read by nobody yet. The decorator's validation only affects new @stat(cost=...) callers, of which there are 4 (all under buckaroo/customizations/).

Next

  • Phase 2: split process_table into process_table_scalars / _aggregates keyed off the new field. New WS message types.
  • Phase 3: BuckarooStateOrchestrator JS class with the 2× adaptive debounce.
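
One way the Phase 2 split could key off the new field is a simple partition of registered stats by `cost`. This is a hypothetical sketch: `partition_by_cost` and the stand-in `StatFunc` are not part of this PR, and the real `process_table_scalars` / `_aggregates` split may look different.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class StatFunc:
    """Stand-in with only the fields this sketch needs."""
    name: str
    cost: str = "scalar"


def partition_by_cost(stats: Iterable[StatFunc]) -> Dict[str, List[StatFunc]]:
    """Group stats by cost class, preserving registration order within each group."""
    groups: Dict[str, List[StatFunc]] = defaultdict(list)
    for s in stats:
        groups[s.cost].append(s)
    return dict(groups)


stats = [
    StatFunc("length"),
    StatFunc("mean"),
    StatFunc("histogram", cost="aggregate"),
]
groups = partition_by_cost(stats)

# Scalars would ship in the initial response; aggregates would be fetched later.
scalar_names = [s.name for s in groups.get("scalar", [])]
aggregate_names = [s.name for s in groups.get("aggregate", [])]
```

Because untagged stats default to `"scalar"`, existing stats would land in the first group without any changes.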

🤖 Generated with Claude Code

paddymul and others added 2 commits May 21, 2026 06:57
Phase 1 of the JS-driven progressive stats design
(plans/js-driven-stat-debounce.md). Adds a ``cost: str`` field to
``StatFunc`` declaring the stat's compute-cost class. Default
``"scalar"`` (cheap, ships in the initial state_change response).
``"aggregate"`` opts in to the slow path (histograms, value_counts,
anything per-column-querying) that the JS orchestrator will fetch
via a separate ``compute_stat_group`` round-trip after a debounce.

Three failing tests in ``TestStatDecorator``:
  - ``test_stat_default_cost_is_scalar`` — undecorated cost is scalar.
  - ``test_stat_explicit_cost_aggregate`` — ``@stat(cost="aggregate")``
    round-trips into ``StatFunc.cost``.
  - ``test_stat_invalid_cost_rejected`` — typos like ``cost="bigly"``
    raise ``ValueError`` at decoration time, not silently downstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 1 of plans/js-driven-stat-debounce.md — the metadata field.

  - New ``StatFunc.cost: str`` (default ``"scalar"``). ``"aggregate"``
    is the opt-in for slow stats (histograms, per-column queries) that
    a future JS-driven router can fetch via a separate WS round-trip
    after a debounce.
  - New ``@stat(cost=...)`` kwarg. Validated against ``VALID_COSTS``
    at decoration time; bad values raise ``ValueError`` loudly rather
    than silently bypassing the router downstream.
  - Tag the four histogram stats (``pd_stats_v2.histogram``,
    ``pd_stats_v2.histogram_series``, ``pl_stats_v2.pl_histogram_series``,
    ``xorq_stats_v2.histogram``) as ``cost="aggregate"``. These are
    the known per-column-querying expensive funcs (~250 ms × N cols
    on xorq).

This commit ships the metadata only. No router consumer yet — that's
phases 2/3 in the plan. Tagging now means the consumer PR is purely
additive and tags don't have to be batched with downstream changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26221977178

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26221977178

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.3.dev26221977178" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

Labels

triaged Reviewed and triaged
