Datasets Vision Models
Vision models add AI-powered image understanding to your datasets. When a vision model is selected on a dataset, uploaded images are automatically analyzed to generate searchable descriptions — making your image content accessible through conversations with agents.
When you upload images to a dataset with a vision model configured:
- Image analysis: the vision model examines each image and generates a detailed text description covering objects, text, layout, and visual elements. Images embedded in PDFs are also extracted and analyzed individually.
- Embedding: the description is indexed as searchable content in your dataset.
- RAG search: agents can find and reference your images when answering questions.
- On-demand deep analysis: during a conversation, agents go back to the source image for a closer, more targeted look based on your prompt.
Without a vision model selected, uploaded images are stored with basic metadata only (filename, dimensions, file size) and will not be searchable by content.
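The flow above can be made concrete with a toy Python model. This is a minimal sketch under stated assumptions, not the product's implementation. `vision_model` is a hypothetical callable that stands in for the configured model, and simple keyword matching stands in for the embedding-based RAG search. The sketch shows why an image uploaded without a vision model never turns up in a content search: only its metadata is stored.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for a vision model: takes an image filename and
# returns a text description. The real product calls a hosted model.
VisionModel = Callable[[str], str]


@dataclass
class StoredImage:
    filename: str
    width: int
    height: int
    size_bytes: int
    description: Optional[str] = None  # set only when a vision model is configured


def ingest(filename: str, width: int, height: int, size_bytes: int,
           vision_model: Optional[VisionModel]) -> StoredImage:
    """Always store basic metadata; add a description only if a vision model is selected."""
    description = vision_model(filename) if vision_model is not None else None
    return StoredImage(filename, width, height, size_bytes, description)


def search(index: List[StoredImage], query: str) -> List[str]:
    """Toy content search: every query word must appear in the description.
    Keyword matching stands in for embedding similarity; images without a
    description are never matched, whatever their filename."""
    words = query.lower().split()
    return [
        img.filename
        for img in index
        if img.description is not None
        and all(w in img.description.lower() for w in words)
    ]


if __name__ == "__main__":
    fake_model: VisionModel = lambda name: "A bar chart showing Q3 revenue by region"
    index = [
        ingest("q3.png", 800, 600, 120_000, fake_model),  # vision model selected
        ingest("q3_copy.png", 800, 600, 120_000, None),   # no vision model
    ]
    print(search(index, "revenue chart"))  # ['q3.png']
```

A real system would compare embeddings rather than keywords. The point of the sketch is only that search runs over the generated descriptions, not over pixels or file names.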
To create a new dataset with a vision model:
- Go to Datasets in the sidebar
- Click Create Dataset
- Fill in the name, description, and category
- In the Vision Model dropdown, select a model (Llama 4 Maverick is selected by default)
- Click Create Dataset
- Click the Upload Documents button and select the images you want to upload
To change the vision model on an existing dataset:
- Open the dataset details page
- Click Edit
- Find the Vision Model dropdown (marked with a purple eye icon)
- Select a vision model from the list
- Click Update Dataset
- To re-analyze images that are already in the dataset, delete those files and upload them again
Important: Changing the vision model affects future uploads only. Images already in the dataset keep their existing descriptions. To re-analyze existing images with a new vision model, re-upload them.
Supported image formats:

| Format | Extension | Notes |
|---|---|---|
| PNG | .png | Best for screenshots, diagrams, text-heavy images |
| JPEG | .jpg, .jpeg | Best for photographs |
| WebP | .webp | Modern format, good compression |
Images are supported alongside your usual text document file formats (PDF, Word, text, etc.). When you upload a PDF that contains embedded images, those images are automatically extracted and analyzed by the vision model — so your visual content is searchable even when it's inside a document.
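Before a large upload, you may want to check which files match the supported image formats. The helper below is a hypothetical local pre-check in Python: `split_by_support` and `SUPPORTED_IMAGE_EXTENSIONS` are illustrative names, not part of the product. It looks only at file extensions, ignoring case, and does not inspect file contents. The files it sets aside, such as PDFs or Word documents, can still be uploaded as documents. They just won't be handled as standalone images.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Image extensions taken from the supported-formats table; this is an
# illustrative constant, not a setting exposed by the product.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def split_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (supported_images, other_files) using a case-insensitive extension check."""
    supported: List[str] = []
    other: List[str] = []
    for p in paths:
        if Path(p).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS:
            supported.append(p)
        else:
            other.append(p)
    return supported, other


if __name__ == "__main__":
    images, others = split_by_support(["chart.PNG", "photo.jpeg", "logo.gif", "ui.webp"])
    print(images)  # ['chart.PNG', 'photo.jpeg', 'ui.webp']
    print(others)  # ['logo.gif']
```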
To upload images from the dataset page:
- Open your dataset (it must have a vision model selected)
- Click Upload Documents or drag files into the upload area
- Select your image files — PNG, JPG, and WebP are all supported
- Wait for processing to complete — the vision model will analyze each image
- Images appear in the dataset details alongside your other documents
To upload images directly from a conversation:
- In any conversation, click the paperclip icon to open the dataset panel
- At the top of the panel, you'll see a drag-and-drop area — drop your images there (or click to browse)
- Choose whether to upload into an existing dataset or create a new one
- Click Upload — the files start processing and the dataset is automatically attached to your conversation
Note: A dataset created during the file upload process can be renamed later from the Datasets menu.
Once images are uploaded to a vision-enabled dataset:
- Start or continue a conversation with any agent that has access to the vision-enabled dataset
- For example, attach the dataset to the conversation: click the paperclip icon and check the box next to a dataset that contains images
- Send your message. The agent will reference both the documents and the image files in your vision-enabled dataset
Important: Your dataset must have a Vision Model selected for images to be analyzed. Without a vision model configured, uploaded images won't be processed for content.
What vision models can do:

| Capability | Example |
|---|---|
| Describing scenes and objects | "A bar chart showing Q3 revenue by region" |
| Identifying text in images | Reading labels, signs, document text |
| Recognizing visual elements | Logos, icons, UI components, charts, diagrams |
| Categorizing content | Photos vs. screenshots vs. diagrams |
| Summarizing visual information | Key takeaways from infographics or dashboards |
Known limitations:

| Limitation | Details |
|---|---|
| Not pixel-perfect | May miss fine details, small text, or low-contrast elements |
| Interpretation varies by model | Different vision models may describe the same image differently |
| Quality depends on input | Clear, well-lit, high-resolution images with clearly defined borders or image outlines produce better results |
| Not a replacement for human review | Always verify critical details — treat AI descriptions as a helpful starting point |
The vision models available on your instance depend on what your Super Admin has provisioned. Besides the default, Llama 4 Maverick, options may include models from other providers. Check with your Tenant Admin to request additional models.
Tip: Different models may produce different quality results for different types of images. Try multiple models to see which works best for your use case.
| Tip | Why |
|---|---|
| Start with a small test dataset | Upload a few sample images to evaluate vision model results before committing to large batches |
| Try different vision models | Switch models on your dataset (Edit → Vision Model dropdown) and re-upload to compare description quality |
| Use high-quality images | Clear, well-lit, high-resolution images produce significantly better descriptions |
| Experiment with file types | PNG for screenshots and text-heavy images, JPEG for photos, WebP for web content |
| Search to check descriptions | After uploading, ask an agent to search your dataset to see what the vision model detected |
| Combine images with documents | Mix connected images and text documents in a dataset for richer, multi-modal context |
| Chat with a Vision Agent | Upload files to a dataset and use an agent (such as the Vision Chat Agent template below) that's optimized for image Q&A |
Get started quickly with a pre-configured agent optimized for chatting with datasets that contain images.
This agent is designed to help you analyze, discuss, and ask questions about images in your datasets.
Click the link below. On the CSV file's GitHub page, click the "Download Raw File" button in the top right of the page to download the Agent Template CSV file.
Download Vision Chat Agent Template
- Download the Agent Template CSV file using the link above
- Go to Agents in the sidebar
- Click Import in the top right
- Select the downloaded CSV file
- Review the agent configuration
- Click Import to create the agent
- The agent appears in the Agent Configuration list. Click the Add Favorites button to add it to your Favorites list
- Start a new conversation with the Vision Chat Agent
- Click the paperclip icon and select a dataset that contains images
- Ask questions about your images, for example:
a. "What do the images in this dataset show?"
b. "Is there any text visible in these images?"
c. "Describe the diagram in the uploaded screenshot"
d. "Compare the images and summarize key differences"
See Agent Templates for more details on importing and customizing templates.
If images are not being analyzed:
- Check vision model selection: Open the dataset → click Edit → verify a vision model is selected in the dropdown
- Verify image format: Only PNG, JPG/JPEG, and WebP are supported
- Make sure processing has finished: images are analyzed and become searchable only after upload and processing are fully complete
If image analysis results are vague or inaccurate:
- Be more specific in your prompts: tell the agent exactly what you want it to find in the images, or why you want them analyzed
- Use higher resolution images with good lighting and contrast
- Try a different vision model (edit dataset → change model → re-upload images)
- Simple, focused images with clear subjects produce better results than busy or cluttered images
If the agent can't find your images:
- Verify the dataset is attached to the conversation (click the paperclip icon)
- Confirm images have been processed (check the document list in dataset details)
- Try more specific search queries related to what the vision model would describe
- Remember: without a vision model selected, images have only basic file metadata
- Creating Datasets — Full guide to dataset setup
- Managing Documents — Add and organize files in datasets
- Using Datasets in Chat — Attach datasets to conversations
- Agent Templates — Browse all available agent templates