Datasets
Gen 3: This is a legacy Gen 2 article. For current GT AI OS 3.0 guidance, see gen3/datasets.
Datasets are collections of documents that provide knowledge to AI agents. This guide helps you create, manage, and share datasets in GT AI OS.
- Creating Datasets - Upload documents and configure your dataset
- Managing Documents - Add, remove, and organize documents
- Sharing Datasets - Share with teams and export/import
- Vision Models - Use AI vision to analyze and search images
A dataset is a collection of documents that can be:
- Attached to agents to give them specialized knowledge
- Used in conversations to provide context for your questions
- Shared with team members for collaborative work
When you attach a dataset to an agent or conversation, the agent can search your documents and use relevant information in its responses.
A dataset goes through the following stages:
- Upload documents (PDF, Word, text files, images, etc.)
- Processing happens automatically - text is extracted, split into chunks, and indexed
- Images are analyzed (optional) - if a vision model is configured, images are described by AI and made searchable
- Attach to agents or conversations - the agent can now search your content
- Ask questions - relevant information is retrieved and used in responses
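The stages above can be pictured with a small, self-contained sketch. This is not GT AI OS code: the function names (`split_into_chunks`, `search`), the chunk sizes, and the keyword-overlap scoring are all assumptions chosen for illustration, and the scoring is only a stand-in for whatever indexing the product uses internally.

```python
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that overlap slightly.

    Hypothetical helper: the real chunk size and strategy are not documented here.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(chunks: list[str], query: str, top_k: int = 3) -> list[str]:
    """Rank chunks by how often the query terms occur in them.

    Keyword overlap is a simple stand-in for the retrieval step.
    """
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

For example, `search(split_into_chunks(document_text), "refund policy")` would return the chunks that mention those terms most often, which is the material an agent would then draw on when answering.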
To create a dataset:
- Go to Datasets in the sidebar
- Click Create Dataset
- Enter a name and description
- Upload your documents
- Wait for processing to complete
See Creating Datasets for a detailed guide.
You can add datasets to a conversation at any time:
- Click the Paperclip icon in the chat interface
- Select existing datasets or upload new files
- The agent can now search your content
Give an agent permanent access to your documents:
- Edit the agent
- In the Datasets section, select datasets to attach
- Save the agent
Create categories to group related datasets (e.g., Legal, Engineering, Sales). Filter by category on the Datasets page.
Add tags for additional filtering. Use consistent tags across your organization for easy discovery.
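As a rough illustration of how category and tag filtering combine, the sketch below models datasets as plain Python records. The `DatasetRecord` class and `filter_datasets` function are hypothetical and do not correspond to a GT AI OS API; the sketch assumes a dataset has exactly one category and any number of tags, and that matching is case-insensitive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a dataset entry on the Datasets page."""
    name: str
    category: str
    tags: set[str] = field(default_factory=set)


def filter_datasets(
    datasets: list[DatasetRecord],
    category: Optional[str] = None,
    required_tags: Optional[list[str]] = None,
) -> list[DatasetRecord]:
    """Keep datasets in the given category that carry all required tags."""
    wanted = {tag.lower() for tag in (required_tags or [])}
    result = []
    for ds in datasets:
        if category is not None and ds.category.lower() != category.lower():
            continue  # wrong category
        if not wanted <= {tag.lower() for tag in ds.tags}:
            continue  # at least one required tag is missing
        result.append(ds)
    return result
```

Because matching here ignores case, "Contracts" and "contracts" count as the same tag. That leniency is an assumption of the sketch, so agreeing on one spelling per tag across the organization remains the safer habit.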
Use clear, descriptive names:
- Good: "Q3 2024 Sales Reports"
- Good: "Engineering - API Documentation"
- Avoid: "Documents" or "New Dataset"
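A team that wants to enforce this convention could run a simple check like the one below before creating datasets. It is a heuristic sketch only: the `GENERIC_NAMES` list and the two-word minimum are assumptions made for this example, not rules applied by GT AI OS.

```python
# Hypothetical list of names considered too generic to be useful.
GENERIC_NAMES = {"documents", "new dataset", "dataset", "untitled", "files"}


def is_descriptive_name(name: str) -> bool:
    """Return True if the name is not generic and has at least two words."""
    cleaned = " ".join(name.split()).lower()  # normalize whitespace and case
    if cleaned in GENERIC_NAMES:
        return False
    return len(cleaned.split()) >= 2
```

With this check, "Q3 2024 Sales Reports" and "Engineering - API Documentation" pass, while "Documents" and "New Dataset" are rejected.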
General tips for working with datasets:
- Keep datasets focused - smaller, topic-specific datasets often work better than large ones
- Update regularly - remove outdated documents and add new ones
- Test retrieval - ask questions to verify the agent finds relevant content
- Share thoughtfully - make datasets available to those who need them