Add visual search image tooling plugin#8

Draft
eric-tramel wants to merge 1 commit into main from codex/visual-search-plugin

Conversation


@eric-tramel eric-tramel commented May 5, 2026

What

Adds data-designer-visual-search, a self-contained Data Designer plugin that registers a new visual-search custom column type for VLM-driven image inspection.

The column accepts an image input column plus a prompt and model alias, then exposes a small local image-tool interface to the model:

  • open_image: open the row image and return the root image_id
  • get_image_info: inspect dimensions and lineage metadata for an image ID
  • list_images: list the current image tree
  • crop_image: create a pixel or percentage crop from any previous image ID
  • transform_image: rotate, flip, or resize any previous image ID
  • edit_color: adjust brightness, contrast, saturation, sharpness, grayscale, or inversion
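Each of these tools is exposed to the model as an OpenAI-compatible function schema. The sketch below shows what the crop_image schema might look like; the field layout follows the OpenAI tools format, but the exact descriptions and defaults here are assumptions, not the plugin's actual schema.

```python
# Illustrative OpenAI-style function schema for the crop_image tool.
# The real schemas are owned by VisualSearchToolExecutor; names of the
# parameters (image_id, x, y, width, height, unit) follow the PR text.
CROP_IMAGE_SCHEMA = {
    "type": "function",
    "function": {
        "name": "crop_image",
        "description": "Create a pixel or percentage crop from any previous image ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_id": {"type": "string", "description": "Source image ID, e.g. img_0000."},
                "x": {"type": "number"},
                "y": {"type": "number"},
                "width": {"type": "number"},
                "height": {"type": "number"},
                "unit": {"type": "string", "enum": ["pixel", "percent"]},
            },
            "required": ["image_id", "x", "y", "width", "height"],
        },
    },
}
```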

Each tool-created image is stored in a row-local in-memory workspace and receives an ID such as img_0001. The column re-attaches tool-created images to the next model turn, so the model can crop or edit an image, inspect the result, branch from an earlier image ID, and continue reasoning. The main output column contains the final model answer, and {column_name}__image_history records the operation tree by default.

This PR also adds plugin-owned documentation under plugins/data-designer-visual-search/docs/ and generated site docs under docs/plugins/data-designer-visual-search/.

Why

Users want visual-search workflows where a model can zoom into regions, improve contrast, compare multiple crops, and recover from a bad crop by rewinding to an earlier image. A normal multimodal column can pass the initial image, but successive tool-generated image content needs extra plumbing: the output image bytes from a local tool must become model input on the next turn, while preserving IDs and lineage so the model can decide what to operate on next.

This plugin owns that column-specific loop instead of pushing image state management into generic Data Designer behavior.

Usage

import pandas as pd

from data_designer.config.config_builder import DataDesignerConfigBuilder
from data_designer.config.models import ChatCompletionInferenceParams, ModelConfig, ModelProvider
from data_designer.config.seed_source_dataframe import DataFrameSeedSource
from data_designer.interface.data_designer import DataDesigner

seed_df = pd.DataFrame(
    {
        "image_path": ["/path/to/store-shelf.png"],
        "target": ["the nutrition label on the cereal box"],
    }
)

provider = ModelProvider(
    name="nvidia",
    endpoint="https://integrate.api.nvidia.com/v1",
    api_key="NVIDIA_API_KEY",
    provider_type="openai",
)

vision_model = ModelConfig(
    alias="vision",
    model="qwen/qwen3.5-122b-a10b",
    provider="nvidia",
    inference_parameters=ChatCompletionInferenceParams(
        temperature=0,
        max_tokens=512,
        timeout=60,
    ),
)

builder = DataDesignerConfigBuilder(model_configs=[vision_model])
builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
builder.add_column(
    name="visual_answer",
    column_type="visual-search",
    image_column="image_path",
    prompt=(
        "Find {{ target }}. Use crop_image or edit_color if that helps. "
        "Return the text you can read and explain which image_id you used."
    ),
    model_alias="vision",
    max_tool_call_turns=4,
)

result = DataDesigner(
    artifact_path="artifacts",
    model_providers=[provider],
).preview(builder, num_records=1)

The resulting dataset includes:

  • visual_answer: the final VLM answer.
  • visual_answer__image_history: metadata for the image operation tree, including image_id, parent/child IDs, operation name, dimensions, and operation parameters.
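An illustrative (not verbatim) shape for one history column value, assuming one record per image node with the fields named above. The exact field names and nesting in the real side-effect column may differ.

```python
# Hypothetical visual_answer__image_history payload for a root image and
# one crop; field names follow the PR description (image_id, parent/child
# IDs, operation, dimensions, parameters).
history = [
    {"image_id": "img_0000", "parent_id": None, "children": ["img_0001"],
     "operation": "open_image", "width": 1024, "height": 768, "params": {}},
    {"image_id": "img_0001", "parent_id": "img_0000", "children": [],
     "operation": "crop_image", "width": 512, "height": 384,
     "params": {"x": 0, "y": 0, "width": 50, "height": 50, "unit": "percent"}},
]

# Given such records, the root of any image is recoverable by following
# parent links.
by_id = {rec["image_id"]: rec for rec in history}

def root_of(image_id: str) -> str:
    while by_id[image_id]["parent_id"] is not None:
        image_id = by_id[image_id]["parent_id"]
    return image_id
```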

Practical branching example:

open_image() -> img_0000
crop_image(image_id="img_0000", x=0, y=0, width=50, height=50, unit="percent") -> img_0001
edit_color(image_id="img_0001", contrast=1.5) -> img_0002
crop_image(image_id="img_0000", x=50, y=50, width=50, height=50, unit="percent") -> img_0003

The second crop branches from img_0000 instead of continuing from the edited first crop, which lets the model compare independent regions of the same source image.

Useful configuration options include:

  • allowed_tools: restrict the model to a subset of the built-in tools.
  • image_data_type and image_format: handle explicit URL or base64 image inputs.
  • image_placeholder: add a model-specific image token for endpoints that require text placeholders for attached images.
  • attach_images_after_tool_calls: control whether tool output images are sent back into model context.
  • with_trace and extract_reasoning_content: capture side-effect trace columns for debugging.
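Combined, these options might be wired into a single column definition as below. The option names come from the list above, but every value here is illustrative; consult the plugin docs for the accepted values of image_data_type, image_format, and image_placeholder.

```python
# Hedged sketch: one visual-search column using several of the options
# listed above. Values (e.g. "base64", "<image>") are assumptions.
visual_column_options = {
    "name": "shelf_reading",
    "column_type": "visual-search",
    "image_column": "image_path",
    "prompt": "Read the price tag under {{ target }}.",
    "model_alias": "vision",
    "allowed_tools": ["open_image", "crop_image", "edit_color"],
    "image_data_type": "base64",
    "image_format": "png",
    "image_placeholder": "<image>",
    "attach_images_after_tool_calls": True,
    "with_trace": True,
}
# builder.add_column(**visual_column_options)
```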

How

  • VisualSearchColumnConfig defines the public column interface and side-effect columns.
  • VisualImageWorkspace keeps the per-row in-memory image tree and exposes image IDs, data URI conversion, and history metadata.
  • VisualSearchToolExecutor provides OpenAI-compatible function schemas and executes the Pillow-backed local image tools.
  • VisualSearchColumnGenerator runs the chat/tool loop directly so it can append local tool results and attach generated image content blocks to the next model request.
  • The plugin documentation is authored in the plugin package and generated into the repository docs site with make plugin-docs / make all.
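The chat/tool loop that VisualSearchColumnGenerator runs can be sketched roughly as follows, with a stubbed model and tool executor so only the control flow is shown. Function names and message shapes here are illustrative (OpenAI-style), not the generator's actual API; the key step is re-attaching tool-produced images as model input for the next turn.

```python
import json

# Simplified sketch of the column's chat/tool loop (hypothetical code):
# call the model, execute any tool calls locally, append tool results,
# and re-attach generated images so the model can see them next turn.
def run_tool_loop(call_model, execute_tool, messages, max_turns):
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # final answer, no more tool use
        for call in tool_calls:
            result, image_uri = execute_tool(call)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
            if image_uri is not None:
                # Attach the tool-created image to the next model request.
                messages.append({"role": "user", "content": [
                    {"type": "image_url", "image_url": {"url": image_uri}}]})
    return None  # turn budget exhausted without a final answer

# Stubbed run: one crop_image call, then a final text answer.
turns = iter([
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "t1", "name": "crop_image"}]},
    {"role": "assistant", "content": "The label reads 240 kcal."},
])
answer = run_tool_loop(
    call_model=lambda msgs: next(turns),
    execute_tool=lambda call: ({"image_id": "img_0001"},
                               "data:image/png;base64,..."),
    messages=[{"role": "user", "content": "Find the label."}],
    max_turns=4,
)
```

The max_turns cap mirrors the max_tool_call_turns column option shown in the usage example.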

Intermediate images are intentionally kept in memory per row and are not persisted as dataset artifacts. The persisted side-effect output is metadata describing the image tree, not image bytes.

Validation

  • make sync
  • make lint
  • make test-plugin PLUGIN=data-designer-visual-search
  • make all
  • Live smoke test against https://integrate.api.nvidia.com/v1 using qwen/qwen3.5-122b-a10b: one generated image row, the model called crop_image, the crop was re-attached as the next model image input, and the final answer correctly identified the crop color.

@eric-tramel eric-tramel self-assigned this May 5, 2026
@eric-tramel eric-tramel force-pushed the codex/visual-search-plugin branch from 3d4fc7e to 31506ef on May 5, 2026 at 22:01
@eric-tramel eric-tramel changed the title [codex] add visual search plugin Add visual search image tooling plugin May 5, 2026