
feat(pandas-search): plumb search term to JS as highlight_phrase via SDResult #764

Merged
paddymul merged 3 commits into main from feat/pandas-search-highlight-sd
May 18, 2026

@paddymul
Collaborator

Summary

Mirrors #758 for the pandas backend. Pandas Search.transform now returns SDResult(filtered_df, sd_updates) so the search term flows into cleaning_sd as highlight_phrase on every string/object column. The existing DefaultMainStyling.style_column reader (landed in #758) threads it into the string displayer_args, where the JS side already renders matches as <mark>.
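To make the intended end result concrete, here is a minimal Python sketch of what highlighting a list of literal phrases could look like. It is not the actual JS displayer: `mark_phrases` is a hypothetical name, and case-sensitive matching plus HTML escaping are assumptions made for illustration only.

```python
import html


def mark_phrases(text: str, phrases: list[str]) -> str:
    """Wrap each literal occurrence of any phrase in <mark>, escaping HTML.

    Hypothetical illustration of `highlight_phrase` semantics; the real
    rendering lives on the JS side and may differ.
    """
    # Collect (start, end) spans for every literal needle; no regex involved.
    spans = []
    for phrase in phrases:
        if not phrase:
            continue  # an empty needle would match everywhere
        start = text.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = text.find(phrase, start + len(phrase))
    spans.sort()

    out, pos = [], 0
    for start, end in spans:
        if start < pos:
            continue  # skip a span that overlaps one already emitted
        out.append(html.escape(text[pos:start]))
        out.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


highlighted = mark_phrases("Bay area pizza", ["area"])
```

Because the needles are literal, a phrase such as `"."` marks only actual dots and never acts as a wildcard.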

Uses highlight_phrase (list of literal needles) rather than the highlight_regex variant polars emits because search_df_str uses Series.str.find — a literal substring match. Matching the filter semantics on the highlight side avoids the case where a search term containing regex metacharacters would filter on literal text but try to highlight as a regex.
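The mismatch described above can be reproduced with the standard library alone. This sketch uses plain `str.find` and `re.search` as stand-ins for the filter and a regex-based highlighter; the helper names and sample cells are made up for illustration.

```python
import re


def literal_match(cell: str, term: str) -> bool:
    # Same semantics as a literal substring search: -1 means "not found".
    return cell.find(term) != -1


def regex_match(cell: str, term: str) -> bool:
    # Treats the search term as a regular expression.
    return re.search(term, cell) is not None


cases = [
    ("$5 (off) today", "$5 (off)"),  # literal hit, regex miss ($ is an anchor)
    ("125 Main St", "1.5"),          # literal miss, regex hit (. is a wildcard)
]
results = [(literal_match(c, t), regex_match(c, t)) for c, t in cases]

# Escaping the term would bring a regex highlighter back in line with the filter.
escaped_agrees = all(
    literal_match(c, t) == regex_match(c, re.escape(t)) for c, t in cases
)
```

In the first case a row passes the filter but nothing is highlighted; in the second a regex highlighter would match text that the filter does not treat as a match. Sending literal phrases sidesteps both.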

String-column detection mirrors search_df_str exactly: union of select_dtypes("string") and select_dtypes("object") columns.
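A rough sketch of the column detection and the resulting `sd_updates` shape, assuming pandas is installed. `string_like_columns` and `highlight_updates` are hypothetical helpers, not the code in `pandas_commands.py`, and the per-column dict layout is inferred from the test assertions quoted in this PR.

```python
import pandas as pd


def string_like_columns(df: pd.DataFrame) -> list:
    """Union of "string" and "object" dtype columns, without duplicates."""
    cols = list(df.select_dtypes("string").columns)
    cols += [c for c in df.select_dtypes("object").columns if c not in cols]
    return cols


def highlight_updates(df: pd.DataFrame, term: str) -> dict:
    """Build per-column summary-dict updates carrying the literal search term."""
    return {col: {"highlight_phrase": [term]} for col in string_like_columns(df)}


df = pd.DataFrame({
    "businessname": pd.Series(["Bay Area Pizza", "Taco Stop"], dtype="string"),
    "neighborhood": pd.Series(["bay area", "mission"], dtype=object),
    "rating": [4, 5],
})
sd_updates = highlight_updates(df, "area")
```

Only the two text columns receive a `highlight_phrase` entry; the integer `rating` column is left out, matching the behaviour the new tests assert.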

Test plan

  • pytest tests/unit/dataflow/autocleaning_pd_test.py tests/unit/dataflow/customizable_dataflow_test.py tests/unit/commands/pandas_commands_test.py — passes locally
  • Full unit suite (excluding contrib/file_cache + an unrelated standalone.js env failure): 834 passed
  • CI green

Tests added

  • test_search_threads_highlight_phrase_into_cleaning_sd_under_rename (autocleaning_pd_test.py) — unit-level wiring of pandas Search → SDResult → _rekey_op_sd_to_internal. Asserts the orig-named entry (businessname) is rekeyed onto its internal letter (a) and the integer column (b/rating) gets no highlight.
  • test_search_op_delivers_highlight_phrase_into_displayer_args (customizable_dataflow_test.py) — end-to-end through BuckarooInfiniteWidget with NoCleaningConf. Sets an operation on the dataflow and asserts highlight_phrase == ['area'] lands in displayer_args for both string columns, skipping the numeric column.
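For readers unfamiliar with the rekey step the first test exercises, the idea can be sketched as follows. This is a hypothetical stand-in: the real `_rekey_op_sd_to_internal` signature is not shown in this PR, so the function name, arguments, and merge behaviour here are assumptions.

```python
def rekey_sd(op_sd: dict, orig_to_internal: dict, cleaning_sd: dict) -> dict:
    """Merge op-provided updates, keyed by original column name, into a
    summary dict keyed by internal column letters (hypothetical sketch)."""
    # Copy so the caller's cleaning_sd is not mutated.
    merged = {col: dict(meta) for col, meta in cleaning_sd.items()}
    for orig_name, updates in op_sd.items():
        # Fall back to the given key when no rename is known.
        internal = orig_to_internal.get(orig_name, orig_name)
        merged.setdefault(internal, {}).update(updates)
    return merged


rekeyed = rekey_sd(
    op_sd={"businessname": {"highlight_phrase": ["pizza"]}},
    orig_to_internal={"businessname": "a", "rating": "b"},
    cleaning_sd={},
)
```

With this input the `businessname` entry ends up under `a`, and `b` (the integer `rating` column) gets no entry, which is the shape the unit test checks for.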

TDD: failing-tests commit was pushed first; CI run on that commit will be visible failing before the implementation commit lands.

🤖 Generated with Claude Code

Pins the pandas equivalent of #758 polars Search → SDResult, but using
`highlight_phrase` (list) rather than `highlight_regex` (string) — pandas
search_df_str uses literal `Series.str.find`, so a phrase match on the
JS side matches the actual filter semantics.

- tests/unit/dataflow/autocleaning_pd_test.py: unit-level — Search
  contributes `highlight_phrase` keyed by the renamed (a/b) column under
  PandasAutocleaning, with the rekey running over `cleaning_sd` so the
  orig-named entry merges into the internal letter key.
- tests/unit/dataflow/customizable_dataflow_test.py: end-to-end through
  BuckarooInfiniteWidget with NoCleaningConf — a `search` op should land
  `highlight_phrase` in `displayer_args` for each string column and skip
  the numeric column.

Both fail today: pandas `Search.transform` still returns a bare df.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a35323cd6f

_cleaned, cleaning_sd, _gen, _ops = ac.handle_ops_and_clean(
    df, cleaning_method='', quick_command_args={}, existing_operations=[search_op])

assert cleaning_sd.get('a', {}).get('highlight_phrase') == ['pizza']

P1: Include the pandas Search SDResult implementation

This assertion is unreachable with the code in this commit because only tests changed: Search.transform in buckaroo/customizations/pandas_commands.py still returns the bare search_df_str(df, val) result, so no SDResult metadata is merged and cleaning_sd remains empty. I verified the new tests with .venv/bin/python -m pytest tests/unit/dataflow/autocleaning_pd_test.py::test_search_threads_highlight_phrase_into_cleaning_sd_under_rename tests/unit/dataflow/customizable_dataflow_test.py::test_search_op_delivers_highlight_phrase_into_displayer_args -q; both fail, blocking CI until the pandas Search implementation is added.

@github-actions
Contributor

github-actions Bot commented May 17, 2026

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev25999757528

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.2.dev25999757528

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.2.dev25999757528" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

…SDResult

Mirrors #758 for the pandas backend. Pandas `Search.transform` now
returns `SDResult(filtered_df, sd_updates)` so the search term flows
into `cleaning_sd` as `highlight_phrase` on every string/object column.
Together with the existing `style_column` reader (added in #758) the
phrase lands in the string `displayer_args`, where the JS-side
displayer already renders matches as `<mark>`.

Uses `highlight_phrase` (list of literal needles) rather than the
`highlight_regex` (single regex string) variant polars emits because
`search_df_str` uses `Series.str.find` — a literal substring match.
Matching the filter semantics on the highlight side avoids the case
where a search term containing regex metacharacters would filter on
literal text but try to highlight as a regex.

The string-column detection mirrors `search_df_str`: union of
`select_dtypes("string")` and `select_dtypes("object")` columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ruff F401 on CI. SDResult was imported speculatively for the failing
test but never used (only Search is referenced directly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@paddymul paddymul added this pull request to the merge queue May 18, 2026
Merged via the queue into main with commit 8100093 May 18, 2026
27 checks passed