Add visual search image tooling plugin#8
Draft
eric-tramel wants to merge 1 commit intomainfrom
Draft
Conversation
3d4fc7e to
31506ef
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds
data-designer-visual-search, a self-contained Data Designer plugin that registers a newvisual-searchcustom column type for VLM-driven image inspection.The column accepts an image input column plus a prompt and model alias, then exposes a small local image-tool interface to the model:
open_image: open the row image and return the rootimage_idget_image_info: inspect dimensions and lineage metadata for an image IDlist_images: list the current image treecrop_image: create a pixel or percentage crop from any previous image IDtransform_image: rotate, flip, or resize any previous image IDedit_color: adjust brightness, contrast, saturation, sharpness, grayscale, or inversionEach tool-created image is stored in a row-local in-memory workspace and receives an ID such as
img_0001. The column re-attaches tool-created images to the next model turn, so the model can crop or edit an image, inspect the result, branch from an earlier image ID, and continue reasoning. The main output column contains the final model answer, and{column_name}__image_historyrecords the operation tree by default.This PR also adds plugin-owned documentation under
plugins/data-designer-visual-search/docs/and generated site docs underdocs/plugins/data-designer-visual-search/.Why
Users want visual-search workflows where a model can zoom into regions, improve contrast, compare multiple crops, and recover from a bad crop by rewinding to an earlier image. A normal multimodal column can pass the initial image, but successive tool-generated image content needs extra plumbing: the output image bytes from a local tool must become model input on the next turn, while preserving IDs and lineage so the model can decide what to operate on next.
This plugin owns that column-specific loop instead of pushing image state management into generic Data Designer behavior.
Usage
The resulting dataset includes:
visual_answer: the final VLM answer.visual_answer__image_history: metadata for the image operation tree, includingimage_id, parent/child IDs, operation name, dimensions, and operation parameters.Practical branching example:
The second crop branches from
img_0000instead of continuing from the edited first crop, which lets the model compare independent regions of the same source image.Useful configuration options include:
allowed_tools: restrict the model to a subset of the built-in tools.image_data_typeandimage_format: handle explicit URL or base64 image inputs.image_placeholder: add a model-specific image token for endpoints that require text placeholders for attached images.attach_images_after_tool_calls: control whether tool output images are sent back into model context.with_traceandextract_reasoning_content: capture side-effect trace columns for debugging.How
VisualSearchColumnConfigdefines the public column interface and side-effect columns.VisualImageWorkspacekeeps the per-row in-memory image tree and exposes image IDs, data URI conversion, and history metadata.VisualSearchToolExecutorprovides OpenAI-compatible function schemas and executes the Pillow-backed local image tools.VisualSearchColumnGeneratorruns the chat/tool loop directly so it can append local tool results and attach generated image content blocks to the next model request.make plugin-docs/make all.Intermediate images are intentionally kept in memory per row and are not persisted as dataset artifacts. The persisted side-effect output is metadata describing the image tree, not image bytes.
Validation
make syncmake lintmake test-plugin PLUGIN=data-designer-visual-searchmake allhttps://integrate.api.nvidia.com/v1usingqwen/qwen3.5-122b-a10b: one generated image row, the model calledcrop_image, the crop was re-attached as the next model image input, and the final answer correctly identified the crop color.