feat(jlisp): sd channel — SDResult return + sd-as-arg read-only view by paddymul · Pull Request #755 · buckaroo-data/buckaroo

paddymul · 2026-05-17T14:40:11Z

Summary

Extends the jlisp interpreter so a Command can interact with the running summary-stats dict (sd) alongside the dataframe. Supersedes #744 with a cleaner design.

Three transform shapes share one mutable sd dict in the lisp env and compose freely in the same pipeline:

bare df (legacy) — passed through unchanged. All current commands.
SDResult(df, sd_updates) — interpreter merges sd_updates into the running sd via apply-result! (a per-call closure over the mutable dict).
s('sd') in command_default — transform receives a MappingProxyType read-only view of the current sd. To write, return SDResult from the same call (read-then-write).
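The three shapes can be sketched in plain Python. This is a hypothetical, self-contained illustration, not the buckaroo interpreter. The following are all assumptions: `run_pipeline`, the `(transform, wants_sd)` step tuples, the row-dict stand-in for a dataframe, and the key-by-key per-column merge policy. Only the dispatch rule follows the description above: an isinstance check on SDResult, a MappingProxyType view for readers, and a closure that owns the mutable dict.

```python
import copy
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class SDResult:
    """Hypothetical stand-in: a transform's output df plus per-column sd updates."""
    df: Any
    sd_updates: dict


def run_pipeline(steps, df, initial_sd):
    """Run (transform, wants_sd) steps in order, threading one mutable sd dict.

    Hypothetical helper; not buckaroo's interpreter.
    """
    sd_dict = copy.deepcopy(initial_sd)  # seed copy: caller's nested dicts stay untouched
    sd_view = MappingProxyType(sd_dict)  # live, top-level read-only view for sd-as-arg readers

    def apply_result(result):
        # Per-call closure: the only code path that writes to sd_dict.
        if isinstance(result, SDResult):
            for col, updates in result.sd_updates.items():
                # Assumed merge policy: update key-by-key within each column.
                sd_dict.setdefault(col, {}).update(updates)
            return result.df
        return result  # bare df (legacy shape) passes through unchanged

    for transform, wants_sd in steps:
        result = transform(df, sd_view) if wants_sd else transform(df)
        df = apply_result(result)
    return df, sd_dict


# A list of row dicts stands in for a dataframe.
def drop_missing_a(df):  # shape 1: bare df
    return [row for row in df if row["a"] is not None]


def tag_a(df):  # shape 2: SDResult return
    return SDResult(df, {"a": {"note": "filtered"}})


def read_then_write(df, sd):  # shape 3: read via sd arg, write via SDResult
    note = sd.get("a", {}).get("note")
    return SDResult(df, {"a": {"seen_note": note}})


caller_sd = {"a": {"_type": "int"}}
out_df, out_sd = run_pipeline(
    [(drop_missing_a, False), (tag_a, False), (read_then_write, True)],
    [{"a": 1}, {"a": None}],
    caller_sd,
)
```

Routing writes through the closure stops transforms from replacing or adding top-level sd keys. The nested per-column dicts are still reachable through the view, however, and that is the gap the Codex review raises.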

Design doc: docs/lisp-sd-channel-plan.md.

Key decisions (from grilling pass)

SDResult is @dataclass(frozen=True) in buckaroo/jlisp/configure_utils.py, re-exported from the four Command-defining modules.
Dispatch is isinstance(result, SDResult) — no tuple-shape heuristic. Future PivotResult etc. add cases.
sd-as-arg is contractually read-only; MappingProxyType raises TypeError on top-level mutation.
apply-result! is a per-call closure (not a registered primitive) so it has access to the mutable dict while only the proxy is bound under name sd.
initial_sd is deep-copied at the buckaroo_transform boundary; caller's nested dicts can't leak mutations.
handle_ops_and_clean passes cleaning_sd as initial_sd so sd-as-arg readers see autocleaning analysis state (orig_col_name, _type, etc.) alongside upstream op writes.
s('sd') lives at index 2 in command_default (right after s('df')); transform sig is transform(df, sd, ...).
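The standard-library behavior behind the read-only and deep-copy decisions can be checked directly. In this sketch `seed` is a hypothetical helper and the sd contents are made up. It shows three things: a MappingProxyType is live (readers see writes to the owning dict); it blocks top-level assignment; and a shallow copy at the boundary would leak nested writes back to the caller.

```python
import copy
from types import MappingProxyType


def seed(initial_sd, deep=True):
    """Hypothetical boundary helper: copy the caller's sd, wrap it read-only."""
    sd_dict = copy.deepcopy(initial_sd) if deep else dict(initial_sd)
    return sd_dict, MappingProxyType(sd_dict)


# Deep copy: a nested write by the interpreter never reaches the caller.
caller_sd = {"a": {"orig_col_name": "price", "_type": "float"}}
sd_dict, sd_view = seed(caller_sd)
sd_dict["a"]["note"] = "upstream write"
deep_leaked = "note" in caller_sd["a"]  # False

# The proxy is a live view, so readers see writes made to the owning dict.
view_sees_write = sd_view["a"].get("note") == "upstream write"  # True

# Top-level assignment through the proxy raises TypeError.
try:
    sd_view["b"] = {}
    top_level_blocked = False
except TypeError:
    top_level_blocked = True

# Shallow copy: the inner dicts are shared, so the same nested write leaks.
other_caller_sd = {"a": {"orig_col_name": "price"}}
shallow_dict, _ = seed(other_caller_sd, deep=False)
shallow_dict["a"]["note"] = "leaked"
shallow_leaked = "note" in other_caller_sd["a"]  # True
```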

Test plan

pytest tests/unit/jlisp/ tests/unit/dataflow/ tests/unit/commands/ — passes
Full unit suite (excluding contrib/file_cache): 819 passed, 7 skipped
CI green

Tests

3 interpreter-level tests in tests/unit/jlisp/test_sd_channel.py:

test_sdresult_merges_via_apply_result_closure
test_sd_arg_is_read_only_mappingproxy
test_sd_arg_sees_upstream_sdresult_mutations

1 integration test in tests/unit/dataflow/test_sd_channel_integration.py:

test_three_shapes_compose_through_handle_ops_and_clean

TDD: the failing-tests commit (b4b15bfd) was visibly failing on CI before the implementation commit (5b74824) landed.

🤖 Generated with Claude Code

github-actions · 2026-05-17T14:41:54Z

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25996720230

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25996720230

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.13.5.dev25996720230" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b74824b0a


chatgpt-codex-connector · 2026-05-17T14:42:38Z

    buckaroo_eval, raw_parse = make_interpreter(transform_lisp_primitives)

-    def buckaroo_transform(instructions, df):
+    def buckaroo_transform(instructions, df, initial_sd):


Restore transform compatibility for auto_type_df2

This required initial_sd argument leaves the existing auto_type_df2() helper in buckaroo/auto_clean/cleaning_commands.py still calling transform(full_ops, df) with the old two-argument shape, so any caller of that helper now gets a TypeError before auto-typing runs. I checked repo-wide references with rg "auto_type_df2|buckaroo_transform"; the helper is the in-repo call site that was not updated, and even after passing {} it would need to preserve its previous df-only return contract rather than leaking the new (df, sd) tuple.


chatgpt-codex-connector · 2026-05-17T14:42:38Z

-        return ret_val
+
+        sd_dict = copy.deepcopy(initial_sd)
+        sd_view = MappingProxyType(sd_dict)


Deep-freeze sd before passing it to commands

Wrapping only the top-level dict with MappingProxyType does not make the nested per-column metadata read-only, so an sd-aware command can still mutate state with a write like sd[col]["note"] = val and bypass the intended SDResult write path. Because sd is explicitly documented here as {col: {key: value}} and read-only for command authors, such a write silently changes the live copied sd_dict instead of raising, as the new tests expect for top-level writes.
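One way to act on this suggestion is to deep-freeze the view. The sketch below uses a hypothetical `deep_freeze` helper, not code from this PR. It assumes sd values are built only from dicts, lists, and immutable scalars.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def deep_freeze(obj: Any) -> Any:
    """Hypothetical helper: recursively build a read-only copy of obj.

    Dicts become MappingProxyType over fresh dicts and lists become tuples, so
    the result is a snapshot, not a live view of the original.
    """
    if isinstance(obj, Mapping):
        return MappingProxyType({k: deep_freeze(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return tuple(deep_freeze(v) for v in obj)
    return obj


sd_dict = {"a": {"note": "x", "highlight_phrase": ["foo"]}}
frozen = deep_freeze(sd_dict)

try:
    frozen["a"]["note"] = "y"  # nested write, the case the review describes
    nested_blocked = False
except TypeError:
    nested_blocked = True

sd_dict["a"]["note"] = "z"  # later writes to the source are not reflected
snapshot_value = frozen["a"]["note"]  # "x"
```

The trade-off is that a frozen copy is a snapshot. To keep the property that sd-as-arg readers see upstream SDResult writes, the interpreter would have to re-freeze before each sd-aware call, at a cost proportional to the size of sd. Converting lists to tuples also changes the types readers see.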


The auto_type_df2 helper has been broken since Oct 2023: get_auto_type_operations gained `metadata_f` and `recommend_f` required args in 2325d10, but auto_type_df2 still called it with a single argument. The signature change in PR #755 (transform now takes initial_sd) stacks a second TypeError on the same dead path. No in-repo callers (zero grep hits across .py/.ipynb/.md/configs) and the function has no tests or docs. Drop it along with the now-unused cleaning_classes list and configure_buckaroo / get_auto_type_operations imports; keep the SDResult re-export.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pins the contract for three jlisp transform return shapes (df-only, SDResult-return, sd-as-arg) sharing one mutable sd dict in the lisp env. Tests import SDResult from configure_utils and polars_commands; both fail at collection until the implementation lands.
- tests/unit/jlisp/test_sd_channel.py: three interpreter-level tests (SDResult merge via apply-result!, sd-as-arg read-only proxy enforcement, sd-as-arg sees upstream SDResult mutations).
- tests/unit/dataflow/test_sd_channel_integration.py: one end-to-end test through handle_ops_and_clean covering all three shapes composing.
- docs/lisp-sd-channel-plan.md: design plan with decisions recorded from grilling pass (frozen dataclass, MappingProxyType view, per-call closure, deepcopy seed, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three transform shapes share one mutable sd dict in the lisp env:
- bare df: passed through unchanged (legacy)
- SDResult(df, sd_updates): interpreter merges sd_updates into the running sd dict via apply-result! (a per-call closure over the mutable dict)
- s('sd') in command_default: transform receives a MappingProxyType view; top-level mutation raises TypeError. To write, return SDResult.
The two channels compose: a downstream sd-as-arg reader sees mutations from upstream SDResult ops within the same pipeline. A single command can do both (read via arg, write via SDResult).
SDResult is @dataclass(frozen=True) defined in configure_utils.py and re-exported from the four Command-defining modules (all_transforms.py, polars_commands.py, pandas_commands.py, auto_clean/cleaning_commands.py) so authors import it alongside Command.
buckaroo_transform(instructions, df, initial_sd) deep-copies initial_sd at the boundary and returns (df, sd). _run_df_interpreter mirrors the signature; wrap_set_df threads each form through apply-result!. handle_ops_and_clean passes cleaning_sd as initial_sd so sd-as-arg readers see autocleaning analysis state alongside upstream op writes.
The to_py codegen interpreter binds {'df': 5, 'sd': {}} so sd-aware commands' transform_to_py can be invoked; authors emit standalone Python (typically a df-only equivalent) since generated code has no sd.
Design plan in docs/lisp-sd-channel-plan.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The empty-ops short-circuit exists to skip self.df_interpreter, which internally does df.copy()/clone() and would churn df identity, firing traitlets and causing a frontend resync of unchanged data. My initial implementation defeated that on the sd side with copy.deepcopy(initial_sd); the fix is to return initial_sd as-is. Nothing mutated it (no ops ran), so identity preservation is safe and matches the df side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two no-op short-circuits return df by reference rather than letting the interpreter run and copy: one in _run_df_interpreter, one in handle_ops_and_clean. The short-circuits are load-bearing for two reasons, the second of which is non-obvious and easy to "clean up":
1. df.copy()/clone() churns DfTrait identity → traitlets fires → frontend resyncs unchanged data over the anywidget boundary.
2. During widget init, where the df/operations traits can be set in either order, creating fresh objects on the no-op path used to cascade into observer-chain infinite loops.
Comments capture both reasons so future reviewers don't strip the short-circuit as redundant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
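Why identity matters can be shown with a toy observer. `IdentityObserved`, `run_ops`, and `fake_interpret` are hypothetical stand-ins, not buckaroo's DfTrait or interpreter. The sketch assumes change detection compares by identity, as the commit message describes.

```python
import copy


class IdentityObserved:
    """Toy stand-in for a trait that notifies whenever a new object is assigned."""

    def __init__(self, value):
        self.value = value
        self.notifications = 0

    def set(self, new_value):
        if new_value is not self.value:  # identity, not equality
            self.notifications += 1
        self.value = new_value


def run_ops(ops, df, initial_sd, interpret):
    """Hypothetical wrapper with an empty-ops short-circuit."""
    if not ops:
        # Hand back the caller's objects by reference: nothing ran, nothing changed.
        return df, initial_sd
    return interpret(ops, copy.copy(df), copy.deepcopy(initial_sd))


def fake_interpret(ops, df, sd):
    return df, sd  # pretend every op is a no-op; the copies still churn identity


df = [{"a": 1}]
sd = {"a": {"_type": "int"}}
trait = IdentityObserved(df)

out_df, out_sd = run_ops([], df, sd, fake_interpret)
trait.set(out_df)  # same object: no notification
after_empty = trait.notifications  # 0

out_df, out_sd = run_ops([("noop",)], df, sd, fake_interpret)
trait.set(out_df)  # equal contents, new object: notifies
after_ops = trait.notifications  # 1
```

With the short-circuit, the empty-ops path reports no change. Any path that copies, even with identical contents, looks like new data to an identity-based observer.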


Fixes the Python / Lint CI failure on this branch: paddy_format.py --check flagged these four files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… channel
Mirrors #758 for the xorq backend. The xorq path doesn't go through configure_buckaroo's lisp interpreter (ibis exprs can't .copy()), so it can't reuse the SDResult machinery from #755 directly. Instead this adds an analogous sd channel inside XorqAutocleaning:
- Handlers in _XORQ_OP_HANDLERS may now return either a bare expr (legacy) or (expr, sd_updates). _apply_xorq_ops accumulates the per-column sd entries across ops, merging col-by-col.
- handle_ops_and_clean runs the accumulated updates through _rekey_op_sd_to_internal (the same helper PandasAutocleaning uses since #758) so orig-named entries land on buckaroo's internal a/b/c letter keys and compose cleanly with the summary_sd that XorqDataflow._get_summary_sd produces (also keyed by letter).
- _xorq_search returns the filtered expr plus {col: {'highlight_phrase': [val]}} for every ibis-String column. It uses highlight_phrase (list of literal needles) rather than highlight_regex because ibis StringValue.contains is a literal substring match; matching the filter semantics on the highlight side avoids regex-metacharacter divergence.
Scope: only the search command is wired today. The sd channel itself is generic; other ops can opt in by returning (expr, sd_updates).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
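The accumulation step can be sketched as follows. `apply_ops`, `search`, and `drop` are hypothetical stand-ins for _apply_xorq_ops and entries in _XORQ_OP_HANDLERS, and a list of row dicts replaces the ibis expression. The rekeying to internal letter keys is omitted.

```python
def apply_ops(expr, ops, handlers):
    """Hypothetical op loop whose handlers return a bare expr or (expr, sd_updates)."""
    accumulated = {}
    for name, *args in ops:
        result = handlers[name](expr, *args)
        if isinstance(result, tuple):
            expr, sd_updates = result
            for col, entries in sd_updates.items():
                accumulated.setdefault(col, {}).update(entries)  # merge col-by-col
        else:
            expr = result
    return expr, accumulated


# A list of row dicts stands in for an ibis expression.
def search(rows, needle):
    str_cols = sorted({k for r in rows for k, v in r.items() if isinstance(v, str)})
    # Literal substring match, like a contains() filter; "." is not a wildcard.
    kept = [r for r in rows if any(isinstance(v, str) and needle in v for v in r.values())]
    return kept, {col: {"highlight_phrase": [needle]} for col in str_cols}


def drop(rows, col):  # legacy shape: bare "expr" only
    return [{k: v for k, v in r.items() if k != col} for r in rows]


handlers = {"search": search, "drop": drop}
rows = [{"name": "a.b", "n": 1}, {"name": "axb", "n": 2}]
out_rows, out_sd = apply_ops(rows, [("search", "a.b"), ("drop", "n")], handlers)
```

The literal filter drops "axb", whereas a regex highlight for "a.b" would match it. That mismatch is the divergence the commit avoids by using highlight_phrase. Note also that `isinstance(result, tuple)` is exactly the tuple-shape dispatch the jlisp side replaced with SDResult. It relies on an expression never being a tuple itself, which holds for the list stand-in here.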

paddymul temporarily deployed to testpypi May 17, 2026 14:41 with GitHub Actions (inactive)

chatgpt-codex-connector Bot reviewed May 17, 2026


paddymul temporarily deployed to testpypi May 17, 2026 16:27 with GitHub Actions (inactive)

paddymul temporarily deployed to testpypi May 17, 2026 16:30 with GitHub Actions (inactive)

This was referenced May 17, 2026:

#757 SDResult.sd_updates is mutable despite frozen=True dataclass framing (open)
#744 feat(jlisp): transform may return (df, sd_updates) tuple (closed)
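Issue #757's point is that frozen=True only blocks field rebinding, and it can be reproduced in a few lines. `SDResult` here is a hypothetical stand-in. `ProxiedSDResult` is one possible mitigation, not necessarily what #757 will adopt, since that issue is listed as open.

```python
from collections.abc import Mapping
from dataclasses import FrozenInstanceError, dataclass
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class SDResult:
    """Hypothetical stand-in mirroring the frozen-dataclass framing."""
    df: Any
    sd_updates: dict


r = SDResult(df=None, sd_updates={"a": {"note": "x"}})
try:
    r.sd_updates = {}  # rebinding the field is blocked...
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True
r.sd_updates["a"]["note"] = "changed"  # ...but the dict it holds is not


@dataclass(frozen=True)
class ProxiedSDResult:
    """One possible mitigation: wrap a copy of sd_updates at construction."""
    df: Any
    sd_updates: Mapping

    def __post_init__(self):
        # frozen=True blocks normal assignment, so go through object.__setattr__.
        object.__setattr__(self, "sd_updates", MappingProxyType(dict(self.sd_updates)))


p = ProxiedSDResult(df=None, sd_updates={"a": {"note": "x"}})
try:
    p.sd_updates["b"] = {}  # top-level write now raises TypeError
    top_level_blocked = False
except TypeError:
    top_level_blocked = True
```

ProxiedSDResult only protects the top level. `p.sd_updates["a"]` is still an ordinary dict, so the same nested-mutation gap raised in review applies here too.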

paddymul temporarily deployed to testpypi May 17, 2026 16:42 with GitHub Actions (inactive)

paddymul and others added 6 commits May 17, 2026 12:43

style: apply paddy-format to sd-channel files

0ec5968


paddymul force-pushed the feat/jlisp-sd-channel branch from e114926 to 0ec5968 May 17, 2026 16:43

paddymul enabled auto-merge May 17, 2026 16:44

paddymul temporarily deployed to testpypi May 17, 2026 16:45 with GitHub Actions (inactive)

paddymul added this pull request to the merge queue May 17, 2026

Merged via the queue into main with commit 08cc2ce May 17, 2026
27 checks passed

paddymul merged 6 commits into main from feat/jlisp-sd-channel
