feat(polars-search): plumb search term to JS as highlight_regex via SDResult by paddymul · Pull Request #758 · buckaroo-data/buckaroo

paddymul · 2026-05-17T17:08:14Z

Summary

Polars Search.transform returns SDResult(filtered_df, sd_updates) so the search term flows into cleaning_sd as highlight_regex on every polars-String column. DefaultMainStyling.style_column reads highlight_phrase / highlight_regex / highlight_color off col_meta and threads them into the string displayer_args, where the JS side already renders matches as <mark>.

First concrete consumer of the SDResult channel from #755 (which superseded #744).

Replaces #745 — rebased onto main after #755 merged, with the obsolete (df, sd_updates) tuple commit dropped and Search ported to return SDResult instead.
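
To make the flow in the summary concrete, here is a minimal sketch using only the standard library. It is not buckaroo's code: the `SDResult` NamedTuple, the `search_transform` function, and the dict-of-lists "frame" are hypothetical stand-ins for the polars implementation, and string-column detection is simplified to "every value is a str".

```python
import re
from typing import Any, NamedTuple


# Hypothetical stand-in for buckaroo's SDResult: a transformed frame plus
# per-column summary-dict updates. The real class may look different.
class SDResult(NamedTuple):
    df: dict[str, list[Any]]
    sd_updates: dict[str, dict[str, Any]]


def search_transform(df: dict[str, list[Any]], term: str) -> SDResult:
    """Keep rows where any string column matches `term` and report the term
    as `highlight_regex` for every string column."""
    # Assumption: a column is a string column when all of its values are str.
    str_cols = [c for c, vals in df.items() if all(isinstance(v, str) for v in vals)]
    pattern = re.compile(term)
    n_rows = len(next(iter(df.values()), []))
    keep = [
        i for i in range(n_rows)
        if any(pattern.search(df[c][i]) for c in str_cols)
    ]
    filtered = {c: [vals[i] for i in keep] for c, vals in df.items()}
    # The search term rides along as metadata, keyed by original column name.
    sd_updates = {c: {"highlight_regex": term} for c in str_cols}
    return SDResult(filtered, sd_updates)


frame = {"name": ["alice", "bob", "carol"], "age": [31, 45, 27]}
result = search_transform(frame, "ar")
```

Here `result.df` keeps only the "carol" row, and `result.sd_updates` carries `highlight_regex` for the `name` column only, since `age` is not a string column.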

Supporting changes

autocleaning.handle_ops_and_clean rekeys op-supplied sd entries from orig col names onto buckaroo's internal a/b/c letter keys, so they merge into the matching analysis entry instead of sitting alongside as orphans. Runs after make_origs (which uses the keys as column names on cleaned_df).
style_column falls back to obj when _type is absent, so a stray op-only entry (no analysis ran for that column) can't KeyError the styling pass.
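
The styling side of these changes can be sketched as follows. This is an illustrative approximation, not `DefaultMainStyling.style_column` itself: the return shape, the `"displayer"` values, and the `"string"` type label are assumptions. Only the three highlight key names and the fallback to `obj` come from the description above.

```python
from typing import Any

HIGHLIGHT_KEYS = ("highlight_phrase", "highlight_regex", "highlight_color")


def style_column(col: str, col_meta: dict[str, Any]) -> dict[str, Any]:
    # Fall back to "obj" when `_type` is absent, e.g. for an op-only entry
    # that no analysis ever ran on. A plain col_meta["_type"] would KeyError.
    col_type = col_meta.get("_type", "obj")
    if col_type == "string":
        displayer_args: dict[str, Any] = {"displayer": "string"}
        # Copy through whichever highlight keys are present in col_meta.
        for key in HIGHLIGHT_KEYS:
            if key in col_meta:
                displayer_args[key] = col_meta[key]
    else:
        displayer_args = {"displayer": "obj"}
    return {"col_name": col, "displayer_args": displayer_args}


styled = style_column("a", {"_type": "string", "highlight_regex": "ar"})
orphan = style_column("zzz", {"highlight_regex": "ar"})  # no `_type` key
```

The `orphan` call shows why the fallback matters: a stray entry with no `_type` still produces a valid column config instead of raising.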

Test plan

pytest tests/unit/dataflow/ tests/unit/jlisp/ tests/unit/commands/polars_command_test.py — 147 pass
Full unit suite (excluding contrib/file_cache): 825 passed, 7 skipped
CI green

Tests added

test_sdresult_lands_in_cleaning_sd_through_handle_ops_and_clean — wiring of SDResult.sd_updates into cleaning_sd via handle_ops_and_clean
test_search_threads_highlight_regex_into_cleaning_sd_under_rename — Search contributes highlight_regex keyed by the renamed (a/b/c) column, not the orig name
test_default_main_styling_emits_highlight_regex_into_displayer_args — style_column copies highlight_regex into the string displayer_args
test_style_column_handles_col_meta_missing_type — fallback to obj when _type absent
test_search_op_delivers_highlight_regex_into_displayer_args — e2e through PolarsBuckarooInfiniteWidget into df_viewer_config.column_config
test_column_config_overrides_preserves_highlight_phrase_and_color — user-supplied overrides survive the merge

🤖 Generated with Claude Code

…DResult Polars Search.transform now returns SDResult(filtered_df, sd_updates) so the search term flows into cleaning_sd as `highlight_regex` on every polars-String column. DefaultMainStyling.style_column reads `highlight_phrase` / `highlight_regex` / `highlight_color` off col_meta and threads them into the string displayer_args, where the JS-side displayer already renders matches as <mark>. First concrete consumer of the SDResult channel from #755. Supporting changes: - autocleaning.handle_ops_and_clean rekeys op-supplied sd entries from orig col names onto buckaroo's internal a/b/c letter keys, so they merge into the matching analysis entry instead of sitting alongside as orphans. Runs after make_origs (which uses the keys as column names on cleaned_df). - style_column falls back to obj when `_type` is absent, so a stray op-only entry (no analysis ran for that column) can't KeyError the styling pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


github-actions · 2026-05-17T17:10:00Z

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.0.dev25997510667

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.0.dev25997510667

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.0.dev25997510667" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de38f9c3f3


chatgpt-codex-connector · 2026-05-17T17:10:24Z

+    rewrites = dict(old_col_new_col(cleaned_df))
+    out = {}
+    for col, kv in cleaning_sd.items():
+        target = rewrites.get(col, col)


Do not rekey existing analysis metadata

This pass rewrites every cleaning_sd key that happens to match an original column name, but the dict already contains analysis entries keyed by Buckaroo's internal names. With a frame like ['b', 'foo'] and any non-empty cleaning method, the analysis entry for internal column b (the second column) matches rewrites['b'] == 'a' here and gets merged into column a, so styling/metadata such as _type, orig_col_name, and highlights can be applied to the wrong column. The rekey needs to distinguish op-contributed original-name entries from existing rewritten-name analysis entries before applying old_col_new_col.

Good catch — fixed in f57289f. With cols ['b', 'foo'] and any non-empty cleaning method the analysis entry for internal b (orig foo) was being merged into a because rewrites['b']=='a'. Both pandas (analysis_management.py:42) and polars (polars_analysis_management.py:203) analysis pipelines set rewritten_col_name on every entry, and op-contributed SDResult entries don't — so the rekey now skips any entry carrying that marker. Failing repro test landed in 87419f2 first.
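
The collision and the fix described in this thread can be sketched roughly as below. The function name and dict shapes are simplified assumptions rather than the actual `_rekey_op_sd_to_internal` source. The only detail taken from the thread is that analysis entries carry `rewritten_col_name` and op-contributed entries do not.

```python
from typing import Any

SD = dict[str, dict[str, Any]]


def rekey_op_sd_to_internal(cleaning_sd: SD, rewrites: dict[str, str]) -> SD:
    out: SD = {}
    for col, kv in cleaning_sd.items():
        if "rewritten_col_name" in kv:
            # Analysis entry: already keyed by the internal letter, leave it.
            target = col
        else:
            # Op-contributed entry: keyed by the original name, so rekey it.
            target = rewrites.get(col, col)
        out.setdefault(target, {}).update(kv)
    return out


# Frame with columns ['b', 'foo']: internal a='b', internal b='foo'.
rewrites = {"b": "a", "foo": "b"}
cleaning_sd = {
    "a": {"rewritten_col_name": "a", "orig_col_name": "b", "_type": "string"},
    "b": {"rewritten_col_name": "b", "orig_col_name": "foo", "_type": "string"},
    "foo": {"highlight_regex": "x"},  # op entry keyed by the original name
}
merged = rekey_op_sd_to_internal(cleaning_sd, rewrites)
```

Without the marker check, the analysis entry under `'b'` would be looked up in `rewrites` and folded into `'a'`. With it, the only entry that moves is the op-contributed `'foo'` entry, which lands on internal `'b'`.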

…ollide with letters

Failing test for the codex review on #758. With df cols ['b', 'foo'], analysis assigns internal a='b' and b='foo'. _rekey_op_sd_to_internal sees cleaning_sd['b'] (an analysis entry for orig 'foo') and rewrites it to 'a' because rewrites['b']=='a', merging the second column's analysis metadata into the first column's entry and dropping the second column's entry entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…name marker

Codex review on #758 found that _rekey_op_sd_to_internal corrupted analysis metadata for frames where an orig col name collides with an internal letter key. Example: cols ['b', 'foo'] → internal a='b' and b='foo'. The pass would see cleaning_sd['b'] (the analysis entry for orig 'foo'), look up rewrites['b']=='a', and merge it into 'a', so the first column ends up with the second column's _type, orig_col_name, and any contributed highlights.

Both pandas and polars analysis pipelines set `rewritten_col_name` on every entry (analysis_management.py:42, polars_analysis_management.py:203); op-contributed SDResult entries do not. Skip rekeying for any entry carrying the marker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fixes paddy-format --check failure on main (introduced via #758) that was blocking CI on this branch. Single-file mechanical reformat — no test behavior changes. Unblocks #759. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…SDResult

Mirrors #758 for the pandas backend. Pandas `Search.transform` now returns `SDResult(filtered_df, sd_updates)` so the search term flows into `cleaning_sd` as `highlight_phrase` on every string/object column. Together with the existing `style_column` reader (added in #758) the phrase lands in the string `displayer_args`, where the JS-side displayer already renders matches as `<mark>`.

Uses `highlight_phrase` (list of literal needles) rather than the `highlight_regex` (single regex string) variant polars emits because `search_df_str` uses `Series.str.find`, a literal substring match. Matching the filter semantics on the highlight side avoids the case where a search term containing regex metacharacters would filter on literal text but try to highlight as a regex.

The string-column detection mirrors `search_df_str`: union of `select_dtypes("string")` and `select_dtypes("object")` columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
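
The divergence this commit message guards against is easy to reproduce with the standard library alone. The snippet below is a small illustration, not code from the PR: `str.find` stands in for the literal filter, and `re.search` stands in for a regex-based highlighter.

```python
import re

term = "a.c"  # "." is a regex metacharacter
cells = ["a.c", "abc"]

# Literal substring match, the semantics the pandas filter is described as using.
literal_hits = [cell for cell in cells if cell.find(term) != -1]

# Treating the same term as a regex matches more cells than the filter kept.
regex_hits = [cell for cell in cells if re.search(term, cell)]

# Escaping the term restores literal semantics on the regex side.
escaped_hits = [cell for cell in cells if re.search(re.escape(term), cell)]

# Shape of the per-column update described above: a list of literal needles.
sd_updates = {"name": {"highlight_phrase": [term]}}
```

Only `"a.c"` matches literally, while the unescaped regex also matches `"abc"`. Emitting `highlight_phrase` keeps the highlighter on the same literal semantics as the filter.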

… channel

Mirrors #758 for the xorq backend. The xorq path doesn't go through configure_buckaroo's lisp interpreter (ibis exprs can't .copy()), so it can't reuse the SDResult machinery from #755 directly. Instead this adds an analogous sd channel inside XorqAutocleaning:

- Handlers in _XORQ_OP_HANDLERS may now return either a bare expr (legacy) or (expr, sd_updates). _apply_xorq_ops accumulates the per-column sd entries across ops, merging col-by-col.
- handle_ops_and_clean runs the accumulated updates through _rekey_op_sd_to_internal (the same helper PandasAutocleaning uses since #758) so orig-named entries land on buckaroo's internal a/b/c letter keys and compose cleanly with the summary_sd that XorqDataflow._get_summary_sd produces (also keyed by letter).
- _xorq_search returns the filtered expr plus {col: {'highlight_phrase': [val]}} for every ibis-String column. Uses highlight_phrase (list of literal needles) rather than highlight_regex because ibis StringValue.contains is a literal substring match; matching the filter semantics on the highlight side avoids regex-metacharacter divergence.

Scope: only the search command is wired today. The sd channel itself is generic; other ops can opt in by returning (expr, sd_updates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
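
The accumulation scheme described for the xorq path might look roughly like this. It is a sketch under stated assumptions: `apply_ops`, `search`, and `head` are hypothetical names, and a list of row dicts stands in for an ibis expression so the example runs without xorq installed.

```python
from typing import Any, Callable

SD = dict[str, dict[str, Any]]
Rows = list[dict[str, Any]]


def apply_ops(expr: Rows, ops: list[tuple[str, Any]],
              handlers: dict[str, Callable[..., Any]]) -> tuple[Rows, SD]:
    sd_acc: SD = {}
    for name, arg in ops:
        result = handlers[name](expr, arg)
        # A handler may return a bare expr (legacy) or (expr, sd_updates).
        # Assumption: a bare expr is never itself a tuple.
        if isinstance(result, tuple):
            expr, sd_updates = result
        else:
            expr, sd_updates = result, {}
        # Merge column by column so later ops add to earlier entries.
        for col, kv in sd_updates.items():
            sd_acc.setdefault(col, {}).update(kv)
    return expr, sd_acc


def search(rows: Rows, val: str) -> tuple[Rows, SD]:
    # Opted-in handler: returns filtered rows plus highlight metadata.
    str_cols = [c for c, v in (rows[0] if rows else {}).items() if isinstance(v, str)]
    kept = [r for r in rows if any(val in r[c] for c in str_cols)]
    return kept, {c: {"highlight_phrase": [val]} for c in str_cols}


def head(rows: Rows, n: int) -> Rows:
    # Legacy handler: returns only the transformed rows.
    return rows[:n]


rows = [{"name": "alice", "n": 1}, {"name": "alina", "n": 2}, {"name": "bob", "n": 3}]
expr, sd = apply_ops(rows, [("search", "al"), ("head", 1)],
                     {"search": search, "head": head})
```

Accepting both return shapes lets existing handlers keep working unchanged while individual ops opt in to contributing metadata.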

paddymul mentioned this pull request May 17, 2026

feat(polars-search): plumb search term to JS as highlight_regex #745

Closed

2 tasks

paddymul temporarily deployed to testpypi May 17, 2026 17:09 — with GitHub Actions Inactive

chatgpt-codex-connector Bot reviewed May 17, 2026

paddymul and others added 2 commits May 17, 2026 13:14

paddymul had a problem deploying to testpypi May 17, 2026 17:16 — with GitHub Actions Error

Merge branch 'main' into feat/polars-search-highlight-sd-rebased

934d2a8

paddymul enabled auto-merge May 17, 2026 17:18

paddymul temporarily deployed to testpypi May 17, 2026 17:22 — with GitHub Actions Inactive

paddymul added this pull request to the merge queue May 17, 2026

Merged via the queue into main with commit 22fe196 May 17, 2026
26 of 27 checks passed

paddymul mentioned this pull request May 17, 2026

feat(styling): init_sd as augmentation channel (nested merge + delete_keys + demo) #748

Closed

4 tasks

This was referenced May 17, 2026

feat(pandas-search): plumb search term to JS as highlight_phrase via SDResult #764

Merged

feat(xorq-search): plumb search term to JS as highlight_phrase via sd channel #765

Open

paddymul merged 4 commits into main from feat/polars-search-highlight-sd-rebased
