feat(jlisp): sd channel — SDResult return + sd-as-arg read-only view#755
Conversation
📦 TestPyPI package publishedpip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25996720230or with uv: uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25996720230MCP server for Claude Codeclaude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.13.5.dev25996720230" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table📖 Docs preview🎨 Storybook preview |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b74824b0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| buckaroo_eval, raw_parse = make_interpreter(transform_lisp_primitives) | ||
|
|
||
| def buckaroo_transform(instructions, df): | ||
| def buckaroo_transform(instructions, df, initial_sd): |
There was a problem hiding this comment.
Restore transform compatibility for auto_type_df2
This required initial_sd argument leaves the existing auto_type_df2() helper in buckaroo/auto_clean/cleaning_commands.py still calling transform(full_ops, df) with the old two-argument shape, so any caller of that helper now gets a TypeError before auto-typing runs. I checked repo-wide references with rg "auto_type_df2|buckaroo_transform"; the helper is the in-repo call site that was not updated, and even after passing {} it would need to preserve its previous df-only return contract rather than leaking the new (df, sd) tuple.
Useful? React with 👍 / 👎.
| return ret_val | ||
|
|
||
| sd_dict = copy.deepcopy(initial_sd) | ||
| sd_view = MappingProxyType(sd_dict) |
There was a problem hiding this comment.
Deep-freeze sd before passing it to commands
Wrapping only the top-level dict with MappingProxyType does not make the nested per-column metadata read-only, so an sd-aware command can still mutate state with inputs like sd[col]["note"] = val and bypass the intended SDResult write path. Because sd is explicitly documented here as {col: {key: value}} and read-only for command authors, this scenario silently changes the live copied sd_dict instead of raising like the new tests expect for top-level writes.
Useful? React with 👍 / 👎.
The helper has been broken since Oct 2023 — get_auto_type_operations gained `metadata_f` and `recommend_f` required args in 2325d10, but auto_type_df2 still called it with a single argument. The signature change in PR #755 (transform now takes initial_sd) stacks a second TypeError on the same dead path. No in-repo callers (zero grep hits across .py/.ipynb/.md/configs) and the function has no tests or docs. Drop it along with the now-unused cleaning_classes list and configure_buckaroo / get_auto_type_operations imports; keep the SDResult re-export. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the contract for three jlisp transform return shapes (df-only, SDResult-return, sd-as-arg) sharing one mutable sd dict in the lisp env. Tests import SDResult from configure_utils and polars_commands; both fail at collection until the implementation lands. - tests/unit/jlisp/test_sd_channel.py — three interpreter-level tests: SDResult merge via apply-result!, sd-as-arg read-only proxy enforcement, sd-as-arg sees upstream SDResult mutations. - tests/unit/dataflow/test_sd_channel_integration.py — one end-to-end test through handle_ops_and_clean covering all three shapes composing. - docs/lisp-sd-channel-plan.md — design plan with decisions recorded from grilling pass (frozen dataclass, MappingProxyType view, per-call closure, deepcopy seed, etc.). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three transform shapes share one mutable sd dict in the lisp env:
- bare df : passed through unchanged (legacy)
- SDResult(df, sd_updates) : interpreter merges sd_updates into the
running sd dict via apply-result! (a per-call
closure over the mutable dict)
- s('sd') in command_default : transform receives a MappingProxyType
view; top-level mutation raises TypeError.
To write, return SDResult.
The two channels compose: a downstream sd-as-arg reader sees mutations
from upstream SDResult ops within the same pipeline. A single command
can do both (read via arg, write via SDResult).
SDResult is @DataClass(frozen=True) defined in configure_utils.py and
re-exported from the four Command-defining modules (all_transforms.py,
polars_commands.py, pandas_commands.py, auto_clean/cleaning_commands.py)
so authors import it alongside Command.
buckaroo_transform(instructions, df, initial_sd) deep-copies initial_sd
at the boundary and returns (df, sd). _run_df_interpreter mirrors the
signature; wrap_set_df threads each form through apply-result!.
handle_ops_and_clean passes cleaning_sd as initial_sd so sd-as-arg
readers see autocleaning analysis state alongside upstream op writes.
The to_py codegen interpreter binds {'df': 5, 'sd': {}} so sd-aware
commands' transform_to_py can be invoked; authors emit standalone
Python (typically df-only equivalent) since generated code has no sd.
Design plan in docs/lisp-sd-channel-plan.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The empty-ops short-circuit exists to skip self.df_interpreter — which internally does df.copy()/clone() and would churn df identity, firing traitlets and causing a frontend resync of unchanged data. My initial implementation defeated that on the sd side with copy.deepcopy(initial_sd); fix is to return initial_sd as-is. Nothing mutated it (no ops ran), so identity preservation is safe and matches the df side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two no-op short-circuits return df by reference rather than letting the interpreter run and copy: one in _run_df_interpreter, one in handle_ops_and_clean. The short-circuits are load-bearing for two reasons, the second of which is non-obvious and easy to "clean up": 1. df.copy()/clone() churns DfTrait identity → traitlets fires → frontend resyncs unchanged data over the anywidget boundary. 2. During widget init, where the df/operations traits can be set in either order, creating fresh objects on the no-op path used to cascade into observer-chain infinite loops. Comments capture both reasons so future reviewers don't strip the short-circuit as redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The helper has been broken since Oct 2023 — get_auto_type_operations gained `metadata_f` and `recommend_f` required args in 2325d10, but auto_type_df2 still called it with a single argument. The signature change in PR #755 (transform now takes initial_sd) stacks a second TypeError on the same dead path. No in-repo callers (zero grep hits across .py/.ipynb/.md/configs) and the function has no tests or docs. Drop it along with the now-unused cleaning_classes list and configure_buckaroo / get_auto_type_operations imports; keep the SDResult re-export. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes the Python / Lint CI failure on this branch — paddy_format.py --check flagged these four files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e114926 to
0ec5968
Compare
… channel Mirrors #758 for the xorq backend. The xorq path doesn't go through configure_buckaroo's lisp interpreter (ibis exprs can't .copy()), so it can't reuse the SDResult machinery from #755 directly. Instead this adds an analogous sd channel inside XorqAutocleaning: - Handlers in _XORQ_OP_HANDLERS may now return either a bare expr (legacy) or (expr, sd_updates). _apply_xorq_ops accumulates the per-column sd entries across ops, merging col-by-col. - handle_ops_and_clean runs the accumulated updates through _rekey_op_sd_to_internal (the same helper PandasAutocleaning uses since #758) so orig-named entries land on buckaroo's internal a/b/c letter keys and compose cleanly with the summary_sd that XorqDataflow._get_summary_sd produces (also keyed by letter). - _xorq_search returns the filtered expr plus {col: {'highlight_phrase': [val]}} for every ibis-String column. Uses highlight_phrase (list of literal needles) rather than highlight_regex because ibis StringValue.contains is a literal substring match — matching the filter semantics on the highlight side avoids regex-metacharacter divergence. Scope: only the search command is wired today. The sd channel itself is generic — other ops can opt in by returning (expr, sd_updates). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Extends the jlisp interpreter so a Command can interact with the running summary-stats dict (
sd) alongside the dataframe. Supersedes #744 with a cleaner design.Three transform shapes share one mutable sd dict in the lisp env and compose freely in the same pipeline:
SDResult(df, sd_updates)— interpreter mergessd_updatesinto the running sd viaapply-result!(a per-call closure over the mutable dict).s('sd')incommand_default— transform receives aMappingProxyTyperead-only view of the current sd. To write, returnSDResultfrom the same call (read-then-write).Design doc:
docs/lisp-sd-channel-plan.md.Key decisions (from grilling pass)
SDResultis@dataclass(frozen=True)inbuckaroo/jlisp/configure_utils.py, re-exported from the fourCommand-defining modules.isinstance(result, SDResult)— no tuple-shape heuristic. FuturePivotResultetc. add cases.MappingProxyTyperaisesTypeErroron top-level mutation.apply-result!is a per-call closure (not a registered primitive) so it has access to the mutable dict while only the proxy is bound under namesd.initial_sdis deep-copied at thebuckaroo_transformboundary; caller's nested dicts can't leak mutations.handle_ops_and_cleanpassescleaning_sdasinitial_sdso sd-as-arg readers see autocleaning analysis state (orig_col_name, _type, etc.) alongside upstream op writes.s('sd')lives at index 2 incommand_default(right afters('df')); transform sig istransform(df, sd, ...).Test plan
pytest tests/unit/jlisp/ tests/unit/dataflow/ tests/unit/commands/— passesTests
3 interpreter-level tests in
tests/unit/jlisp/test_sd_channel.py:test_sdresult_merges_via_apply_result_closuretest_sd_arg_is_read_only_mappingproxytest_sd_arg_sees_upstream_sdresult_mutations1 integration test in
tests/unit/dataflow/test_sd_channel_integration.py:test_three_shapes_compose_through_handle_ops_and_cleanTDD: failing-tests commit (b4b15bfd) was visible failing on CI before the implementation commit (5b74824) landed.
🤖 Generated with Claude Code