feat(paf): @stat(cost=...) decorator + tag known-expensive stats#808
Open
paddymul wants to merge 2 commits into
Phase 1 of the JS-driven progressive stats design
(plans/js-driven-stat-debounce.md). Adds a ``cost: str`` field to
``StatFunc`` declaring the stat's compute-cost class. Default
``"scalar"`` (cheap, ships in the initial state_change response).
``"aggregate"`` opts in to the slow path (histograms, value_counts,
anything per-column-querying) that the JS orchestrator will fetch
via a separate ``compute_stat_group`` round-trip after a debounce.
Three failing tests in ``TestStatDecorator``:
- ``test_stat_default_cost_is_scalar`` — undecorated cost is scalar.
- ``test_stat_explicit_cost_aggregate`` — ``@stat(cost="aggregate")``
round-trips into ``StatFunc.cost``.
- ``test_stat_invalid_cost_rejected`` — typos like ``cost="bigly"``
raise ``ValueError`` at decoration time, not silently downstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of plans/js-driven-stat-debounce.md — the metadata field.
- New ``StatFunc.cost: str`` (default ``"scalar"``). ``"aggregate"``
is the opt-in for slow stats (histograms, per-column queries) that
a future JS-driven router can fetch via a separate WS round-trip
after a debounce.
- New ``@stat(cost=...)`` kwarg. Validated against ``VALID_COSTS``
at decoration time; bad values raise ``ValueError`` loudly rather
than silently bypassing the router downstream.
- Tag the four histogram stats (``pd_stats_v2.histogram``,
``pd_stats_v2.histogram_series``, ``pl_stats_v2.pl_histogram_series``,
``xorq_stats_v2.histogram``) as ``cost="aggregate"``. These are
the known expensive funcs that query per column (~250 ms × N cols
on xorq).
This commit ships the metadata only. No router consumer yet — that's
phases 2/3 in the plan. Tagging now means the consumer PR is purely
additive and tags don't have to be batched with downstream changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
📦 TestPyPI package published
pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26221977178
or with uv:
uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26221977178
MCP server for Claude Code:
claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.3.dev26221977178" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table
Summary
Phase 1 of the JS-driven progressive stats design: the metadata field only. Adds a `cost: str` field to `StatFunc` declaring the compute-cost class of a stat. `"scalar"` (default) means cheap: it ships in the initial state response. `"aggregate"` is the opt-in for slow stats that a future JS-driven router will fetch via a separate WS round-trip after a debounce.
This PR ships the metadata only, with no consumer yet. That's a separate PR (Phase 2). Shipping the field independently means the tags don't have to be batched with downstream changes; consumers can adopt incrementally.
Changes
- `StatFunc.cost: str`, default `"scalar"`. Validated against `VALID_COSTS = ("scalar", "aggregate")` at decoration time.
- `@stat(cost=...)` decorator kwarg. Bad values (e.g. `cost="bigly"`) raise `ValueError` loudly rather than silently bypassing a router downstream.
- `cost="aggregate"`:
  - `buckaroo.customizations.pd_stats_v2.histogram`
  - `buckaroo.customizations.pd_stats_v2.histogram_series`
  - `buckaroo.customizations.pl_stats_v2.pl_histogram_series`
  - `buckaroo.customizations.xorq_stats_v2.histogram`
These are the known expensive funcs that query per column: ~250 ms × 26 cols on the boston restaurant dataset via xorq datafusion, ~6.5 s total. Tagging them now is harmless until a router consumes the field.
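For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of decoration-time validation. It is not the code in this PR: the `StatFunc` fields other than `cost`, the `name`/`func` attributes, and the requirement to call the decorator as `@stat()` are assumptions made for illustration. Only `cost`, `VALID_COSTS`, the `"scalar"` default, and the `ValueError` on bad values come from the description above.

```python
from dataclasses import dataclass
from typing import Callable

# From the PR description: the two allowed cost classes.
VALID_COSTS = ("scalar", "aggregate")


@dataclass
class StatFunc:
    # `name` and `func` are hypothetical fields for this sketch;
    # only `cost` and its default are described in the PR.
    name: str
    func: Callable
    cost: str = "scalar"


def stat(cost: str = "scalar"):
    # Validation runs when the decorator expression is evaluated,
    # i.e. when the stat is defined, not when it is later called.
    if cost not in VALID_COSTS:
        raise ValueError(f"invalid cost {cost!r}; expected one of {VALID_COSTS}")

    def wrap(func: Callable) -> StatFunc:
        return StatFunc(name=func.__name__, func=func, cost=cost)

    return wrap


@stat()  # no cost given, so the default "scalar" applies
def length(values):
    return len(values)


@stat(cost="aggregate")  # opt in to the slow path
def histogram(values):
    return {v: values.count(v) for v in set(values)}
```

In this sketch a typo such as `@stat(cost="bigly")` fails at import of the module that defines the stat, which is the "loud" failure mode the description asks for.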
Why these specific stats
Profile of one state_change on boston (xorq backend, 883K rows × 26 cols, instrumented):
The histogram producers are the only stats that re-query the backend per column. Everything else (length, min, max, mean, std, distinct_count, top-K) is computed in the scalar batch. So the cost-class line is clear: histograms are aggregate, the rest are scalar.
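To make the cost-class line concrete, the sketch below groups stats by their `cost` tag and redoes the timing arithmetic from the Changes section. No router exists in this PR, so `split_by_cost` and the `StatFunc` stand-in are hypothetical; the stat names and the ~250 ms and 26-column figures are taken from the text above.

```python
from collections import namedtuple

# Hypothetical stand-in: only the `cost` tag matters for this sketch.
StatFunc = namedtuple("StatFunc", ["name", "cost"])


def split_by_cost(stat_funcs):
    """Group stat names by cost class (hypothetical helper, not in this PR)."""
    groups = {"scalar": [], "aggregate": []}
    for sf in stat_funcs:
        groups[sf.cost].append(sf.name)
    return groups


scalar_names = ("length", "min", "max", "mean", "std", "distinct_count")
stats = [StatFunc(name, "scalar") for name in scalar_names]
stats.append(StatFunc("histogram", "aggregate"))

groups = split_by_cost(stats)

# Figures quoted in the PR: ~250 ms per column, 26 columns.
PER_COLUMN_MS = 250
N_COLS = 26
aggregate_ms = PER_COLUMN_MS * N_COLS  # 6500 ms, i.e. the ~6.5 s total
```

A consumer along these lines could return the `"scalar"` group immediately and defer the `"aggregate"` group, which is where the ~6.5 s goes.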
`value_counts` itself is a borderline case: on polars/pandas it's relatively fast (~10-20 ms on 883K-row strings); on xorq it's part of the scalar batch already. Leaving it as scalar for now; a future PR can split it if needed.
Commit split
- `84d802d3`: failing tests (3 new in `TestStatDecorator`: default cost, explicit aggregate, invalid cost rejection).
- `898414f7`: implementation + tags + one more test (`test_known_expensive_stats_marked_aggregate`) pinning the four tagged stats.
Test plan
- `TestStatDecorator`: 10/10 pass (3 new for cost + 1 new for tags + 6 existing)
Risk
Zero behavior change. The `cost` field is read by nobody yet. The decorator's validation only affects new `@stat(cost=...)` callers, of which there are 4 (all under `buckaroo/customizations/`).
Next
- `process_table` into `process_table_scalars` / `_aggregates` keyed off the new field. New WS message types.
- `BuckarooStateOrchestrator` JS class with the 2× adaptive debounce.
🤖 Generated with Claude Code