feat(xorq): autocleaning + interpreter version with 5 ported commands #767
paddymul wants to merge 1 commit into
XorqAutocleaning previously sidestepped the lisp interpreter and ran a single dict-dispatched _xorq_search handler. Route it through the same configure_buckaroo interpreter that pandas/polars use, and port four more commands (NoOp, DropCol, FillNA, DropDuplicates) into customizations/xorq_commands.py. The interpreter's df_copy fork in jlisp/configure_utils.py grew a third branch: pandas → .copy(), polars → .clone(), ibis exprs → passthrough. Ibis expressions are immutable, so transforms must return a new expr anyway and a defensive copy is both unavailable and unnecessary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
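The three-way copy fork described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in `jlisp/configure_utils.py`: it dispatches by duck typing rather than by whatever type checks the real function uses, and `PandasLike`, `PolarsLike`, and `ExprLike` are hypothetical stand-ins so the snippet runs without pandas, polars, or ibis installed.

```python
class PandasLike:
    """Hypothetical stand-in for a pandas DataFrame (exposes .copy())."""
    def __init__(self, data):
        self.data = list(data)

    def copy(self):
        return PandasLike(self.data)


class PolarsLike:
    """Hypothetical stand-in for a polars DataFrame (exposes .clone())."""
    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        return PolarsLike(self.data)


class ExprLike:
    """Hypothetical stand-in for an immutable ibis expression (no copy method)."""


def df_copy(df):
    """Give the interpreter an object that transforms cannot corrupt."""
    if hasattr(df, "copy"):    # pandas branch: defensive .copy()
        return df.copy()
    if hasattr(df, "clone"):   # polars branch: defensive .clone()
        return df.clone()
    # Expression branch: immutable, so every transform already returns a
    # new object and passing the original through is safe.
    return df


pandas_source = PandasLike([1, 2])
polars_source = PolarsLike([3, 4])
expr_source = ExprLike()

pandas_result = df_copy(pandas_source)
polars_result = df_copy(polars_source)
expr_result = df_copy(expr_source)
```

The pass-through branch is what makes the interpreter dialect-agnostic: mutable frames get a copy, while an immutable expression is handed over by reference.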
📦 TestPyPI package published
`pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev26000095273`
or with uv:
`uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev26000095273`
MCP server for Claude Code
`claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.2.dev26000095273" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table`
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92b652db6e
    def transform_to_py(expr, col):
        return f" expr = expr.drop('{col}')"
Generate runnable code for xorq transforms
When a user applies this command, the inherited code generator wraps these snippets as def clean(df): ... return df in configure_utils.buckaroo_to_py, but the new xorq snippets assign to expr without ever defining it. For example, a drop-column op generates a function that raises UnboundLocalError at expr = expr.drop(...) instead of providing usable generated code in operation_results['generated_py_code']; the same pattern appears in the other non-noop xorq commands.
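The failure mode described above can be reproduced in isolation. The sketch below is not buckaroo's code generator: `build_clean` is a hypothetical helper that mimics the `def clean(df): ... return df` wrapping, and `FakeExpr` is a stand-in for an ibis expression. Rebinding `df` instead of `expr` is shown as one possible fix, not necessarily the one the PR adopts.

```python
class FakeExpr:
    """Hypothetical stand-in for an immutable table expression."""
    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, col):
        # Return a new expression instead of mutating in place.
        return FakeExpr(c for c in self.columns if c != col)


def build_clean(body_lines):
    """Wrap snippet lines as `def clean(df): ... return df` and compile them."""
    source = "def clean(df):\n" + "\n".join(body_lines) + "\n    return df\n"
    namespace = {}
    exec(source, namespace)
    return namespace["clean"]


# Snippet in the style under review: assigns to `expr`, which is never defined.
broken = build_clean(["    expr = expr.drop('b')"])
# One possible fix: rebind the wrapper's own parameter name.
fixed = build_clean(["    df = df.drop('b')"])

table = FakeExpr(["a", "b"])

try:
    broken(table)
    broken_error = None
except UnboundLocalError as exc:
    # `expr` is local to clean() because it is assigned there, so reading it
    # on the right-hand side fails before the assignment completes.
    broken_error = exc

fixed_columns = fixed(table).columns
```

Because the fixed snippet rebinds `df`, the trailing `return df` in the wrapper returns the transformed expression rather than the untouched input.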
Polars commands previously ran eager: each lisp op materialised a new DataFrame, so an N-op cleaning pipeline paid N times for what polars's query optimiser can fuse into one plan. Switch the polars autoclean conf to thread a LazyFrame through the interpreter and collect once at exit.

- `AutocleaningConfig` grows two staticmethod hooks, `lazy_enter` and `lazy_exit`, defaulting to identity. Pandas inherits unchanged; xorq (when #767 lands) inherits the no-op default. Ibis exprs are already lazy, so the unified pattern fits both dialects.
- `NoCleaningConfPl` overrides with `df.lazy()` on entry and `df.collect() if isinstance(df, pl.LazyFrame) else df` on exit. The isinstance guard handles `GroupBy.transform`, which materialises mid-pipeline; anything downstream of a groupby runs eager and the exit becomes a no-op.
- `_run_df_interpreter` wraps the interpreter call with the hooks. The no-op short-circuit fires *before* lazy_enter, preserving the by-reference identity contract the traitlets/anywidget init path depends on.
- `Search.transform` switches to `df.collect_schema()` to avoid polars's PerformanceWarning when handed a LazyFrame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
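The hook pattern in that commit message can be sketched as below. Only the names `lazy_enter` and `lazy_exit` come from the message; `BaseConf`, `LazyConf`, `run_interpreter`, `FakeEager`, and `FakeLazy` are hypothetical stand-ins so the snippet runs without polars, and the real `_run_df_interpreter` will differ in detail.

```python
class FakeEager:
    """Hypothetical stand-in for an eager frame."""
    def __init__(self, rows):
        self.rows = list(rows)

    def lazy(self):
        return FakeLazy(self.rows, [])


class FakeLazy:
    """Hypothetical stand-in for a lazy frame: records steps, runs them on collect()."""
    def __init__(self, rows, steps):
        self.rows = rows
        self.steps = steps

    def map_rows(self, fn):
        return FakeLazy(self.rows, self.steps + [fn])

    def collect(self):
        rows = self.rows
        for fn in self.steps:
            rows = [fn(r) for r in rows]
        return FakeEager(rows)


class BaseConf:
    """Identity hooks: suitable for eager frames and already-lazy expressions."""
    @staticmethod
    def lazy_enter(df):
        return df

    @staticmethod
    def lazy_exit(df):
        return df


class LazyConf(BaseConf):
    """Go lazy on entry, materialise once on exit."""
    @staticmethod
    def lazy_enter(df):
        return df.lazy()

    @staticmethod
    def lazy_exit(df):
        # Guard: an op may have materialised mid-pipeline, leaving nothing to collect.
        return df.collect() if isinstance(df, FakeLazy) else df


def run_interpreter(conf, df, ops):
    if not ops:
        return df  # short-circuit before lazy_enter keeps object identity
    current = conf.lazy_enter(df)
    for op in ops:
        current = op(current)
    return conf.lazy_exit(current)


source = FakeEager([1, 2, 3])
untouched = run_interpreter(LazyConf, source, [])
fused = run_interpreter(
    LazyConf,
    source,
    [lambda lf: lf.map_rows(lambda r: r + 1), lambda lf: lf.map_rows(lambda r: r * 2)],
)
# An op that materialises mid-pipeline; the exit hook then passes it through.
materialised = run_interpreter(LazyConf, source, [lambda lf: lf.collect()])
```

Placing the empty-ops check ahead of `lazy_enter` is what lets callers rely on getting the very same object back when nothing is applied.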
Summary
- `XorqAutocleaning` now flows through the same `configure_buckaroo` lisp interpreter that pandas/polars use, instead of a dict-dispatched `_xorq_search` shortcut. The override on `handle_ops_and_clean` is gone; the parent's full pipeline (quick-ops → merge → interpret → make_origs → code gen) handles ibis exprs unchanged.
- `buckaroo/customizations/xorq_commands.py`: `NoOp`, `DropCol`, `FillNA`, `DropDuplicates`, `Search`, wired up via a new `NoCleaningConfXorq` in `xorq_autoclean_conf.py` (matching the pandas/polars conf modules).
- `buckaroo_transform` in `jlisp/configure_utils.py` grew a third copy branch: pandas → `.copy()`, polars → `.clone()`, ibis exprs → pass-through (immutable, transforms return new exprs).

Test plan
- `uv run --extra xorq pytest tests/unit/test_xorq_commands.py`: 9 new tests covering each command via the interpreter, a two-op pipeline, and conf registration
- `uv run --extra xorq pytest tests/unit/test_xorq_buckaroo_widget.py`: 37 existing widget tests still green (Search regression via the new path)
- `uv run --extra xorq pytest tests/unit/ --ignore=tests/unit/contrib --ignore=tests/unit/file_cache`: full unit suite (935 pass), confirming pandas/polars interpreter paths unaffected
- `uv run ruff check` on touched files: clean

🤖 Generated with Claude Code