Creating Datasets
Gen 3: This is a legacy Gen 2 article. For current GT AI OS 3.0 guidance, see gen3/datasets/uploading.
This guide walks you through creating and setting up datasets in GT AI OS step by step.
- You need appropriate permissions to create datasets
- Have documents ready to upload
- Go to Datasets in the sidebar
- Click Create Dataset
- The creation form opens
Fill in the dataset details:
| Field | Required | Description | Example |
|---|---|---|---|
| Name | Yes | Clear, descriptive name | "Q3 2024 Sales Reports" |
| Description | Yes | What this dataset contains | "Quarterly sales data and analysis for North America region" |
| Category | Yes | Category for organization | Create or select existing |
| Tags | No | Keywords for filtering | "sales", "q3", "2024", "reports" |
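As a rough illustration of the required and optional fields above, the sketch below models the form as a small Python data class and reports which required fields are still empty. The `DatasetDetails` class and `missing_required` function are hypothetical names used only for this example; they are not part of GT AI OS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetDetails:
    """Hypothetical container mirroring the creation form fields."""
    name: str
    description: str
    category: str
    tags: list[str] = field(default_factory=list)  # optional in the form


def missing_required(details: DatasetDetails) -> list[str]:
    """Return the labels of required fields that are empty or whitespace."""
    required = {
        "Name": details.name,
        "Description": details.description,
        "Category": details.category,
    }
    return [label for label, value in required.items() if not value.strip()]


details = DatasetDetails(
    name="Q3 2024 Sales Reports",
    description="Quarterly sales data and analysis for North America region",
    category="Sales",
    tags=["sales", "q3", "2024", "reports"],
)
print(missing_required(details))  # prints []
```

Tags are left out of the check because the form marks them as optional.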
Categories help organize datasets across your organization:
- Select from existing categories or create a new one
- Categories appear as filter options on the Datasets page
- Common categories: Legal, Engineering, Sales, Marketing, HR, Operations
Tip: Coordinate with your team on category names to keep things consistent.
Tags provide additional ways to find datasets:
- Enter keywords relevant to the content
- Press Enter or type a comma to add each tag
- Tags are searchable and filterable
- Use multiple tags to improve discoverability
Tag examples:
- Content type: "reports", "policies", "procedures", "templates"
- Time period: "q1-2024", "annual", "monthly"
- Department: "engineering", "sales", "legal"
- Status: "current", "archived", "draft"
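If you prepare tag lists outside the UI (for example in a spreadsheet), a small helper can normalize them before you type them in. This is a minimal sketch that assumes tags are separated by commas or newlines, mirroring the Enter and comma behavior described above. `parse_tags` is a hypothetical helper, and dropping repeated tags is an assumption rather than documented product behavior.

```python
from __future__ import annotations

import re


def parse_tags(raw: str) -> list[str]:
    """Split raw input on commas or newlines and return clean, unique tags."""
    tags: list[str] = []
    for piece in re.split(r"[,\n]", raw):
        tag = piece.strip()
        # Skip blanks and repeats (assumed behavior, not confirmed by the product).
        if tag and tag not in tags:
            tags.append(tag)
    return tags


print(parse_tags("sales, q3,2024\nreports, sales"))
# prints ['sales', 'q3', '2024', 'reports']
```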
Choose who can use this dataset:
| Visibility | Who Can Access | Best For |
|---|---|---|
| Individual | Only you | Personal knowledge bases |
| Team | Selected team members | Team-specific resources |
| Organization | All users in your organization | Company-wide knowledge |
Note: Only Tenant Admins can set Organization visibility. Once set, all users can access the dataset.
When selecting Team visibility:
- Select which teams should have access
- Team members will see the dataset in their list
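The visibility rules above can be summarized in a few lines of code. The sketch below is illustrative only: `Visibility`, `Dataset`, `can_access`, and `can_set_visibility` are hypothetical names, and it assumes that the dataset owner keeps access under Team visibility and that all users belong to the same organization.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable


class Visibility(Enum):
    INDIVIDUAL = "individual"
    TEAM = "team"
    ORGANIZATION = "organization"


@dataclass
class Dataset:
    owner: str
    visibility: Visibility
    team_ids: set[str] = field(default_factory=set)  # teams selected for Team visibility


def can_access(dataset: Dataset, user: str, user_team_ids: Iterable[str]) -> bool:
    """Apply the access table: Individual, Team, or Organization."""
    if dataset.visibility is Visibility.ORGANIZATION:
        return True  # all users in the organization
    if dataset.visibility is Visibility.TEAM:
        # Assumption: the owner keeps access even if not on a selected team.
        return user == dataset.owner or bool(dataset.team_ids & set(user_team_ids))
    return user == dataset.owner  # Individual: only the creator


def can_set_visibility(is_tenant_admin: bool, visibility: Visibility) -> bool:
    """Only Tenant Admins may choose Organization visibility."""
    return visibility is not Visibility.ORGANIZATION or is_tenant_admin


team_dataset = Dataset(owner="ana", visibility=Visibility.TEAM, team_ids={"sales"})
print(can_access(team_dataset, "ben", ["sales", "ops"]))  # prints True
print(can_access(team_dataset, "cy", ["legal"]))          # prints False
```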
Configure a vision model to enable AI-powered image analysis for documents in this dataset.
- Find the Vision Model dropdown (marked with a purple eye icon)
- A default vision model is pre-selected automatically
- To disable image analysis, select the "None - Images will not be analyzed" option
| Option | Behavior |
|---|---|
| Vision model selected | Images in uploaded documents (PDFs, standalone images) are analyzed by AI. Descriptions are generated and made searchable via RAG. |
| None | Images are not analyzed. Standard text extraction only. |
How it works: When a vision model is configured, the system extracts images from your documents (including images embedded in PDFs), sends them to the vision model for analysis, and stores the generated descriptions as searchable chunks. This makes your visual content discoverable through natural language queries.
Tip: Leave the default vision model selected unless you have a specific reason to disable image analysis.
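To make the image analysis flow more concrete, here is a minimal sketch of the idea: each extracted image is passed to a description function, and the resulting text becomes a searchable chunk. Everything here is hypothetical, including the `image_chunks` function, the chunk dictionary shape, and the stand-in `fake_vision_model`. The real system's internals are not documented in this guide.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def image_chunks(
    images: Iterable[bytes],
    describe: Optional[Callable[[bytes], str]],
) -> list[dict]:
    """Turn extracted images into description chunks, or nothing if disabled."""
    if describe is None:
        # Corresponds to selecting "None": images are not analyzed.
        return []
    return [{"kind": "image_description", "text": describe(img)} for img in images]


def fake_vision_model(image: bytes) -> str:
    """Stand-in for a real vision model; it only reports the image size."""
    return f"image of {len(image)} bytes"


print(image_chunks([b"abc"], fake_vision_model))
# prints [{'kind': 'image_description', 'text': 'image of 3 bytes'}]
print(image_chunks([b"abc"], None))
# prints []
```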
- Review your settings
- Click Create Dataset
- The dataset is created (empty)
- You're taken to the dataset detail view
After creating the dataset:
- Click Upload Documents or drag files into the upload area
- Select files to upload
Supported file types:
| Type | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text is extracted automatically |
| Word | .docx | Modern Word format |
| Text | .txt | Plain text files |
| Markdown | .md | Markdown files |
| CSV | .csv | Tabular data |
| JSON | .json | Structured data |
| Image | .png, .jpg, .jpeg | Requires vision model for analysis |
Limits:
- Maximum 50MB per file
Tip: For Excel files (.xlsx), convert to CSV for best results.
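Before uploading a large batch, it can help to check files locally against the supported types and the size limit. The sketch below is a hypothetical pre-upload check, not part of GT AI OS. It assumes "50MB" means 50 × 1024 × 1024 bytes and that PDFs are accepted, since this guide mentions PDFs as uploaded documents.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Extensions taken from this guide; .pdf is assumed from the mentions of PDF uploads.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".txt", ".md", ".csv", ".json", ".png", ".jpg", ".jpeg",
}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 50MB interpreted as 50 MiB


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a description of the problem, or None if the file looks acceptable."""
    ext = Path(filename).suffix.lower()
    if ext == ".xlsx":
        return "Excel files: convert to CSV for best results"
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported file type: {ext or '(none)'}"
    if size_bytes > MAX_FILE_BYTES:
        return "file exceeds the 50MB limit"
    return None


print(check_upload("report.PDF", 1024))   # prints None
print(check_upload("archive.zip", 1024))  # prints unsupported file type: .zip
```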
When you upload documents:
- Uploading: Files are transferred to the server
- Processing: Text is extracted from files
- Chunking: Text is split into searchable segments
- Embedding: Each chunk is converted to a vector
- Image Analysis (if vision model configured): Images are described by AI
- Complete: Documents are ready for use
Processing time depends on document size and complexity. You can continue working while documents process.
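To give a feel for the chunking stage, the sketch below splits text into overlapping word windows. This is one common approach and purely illustrative: the actual chunking strategy, chunk size, and overlap used by GT AI OS are not stated in this guide, and `chunk_words` is a hypothetical function.

```python
from __future__ import annotations


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
# prints ['a b c d', 'd e f g', 'g h i j']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks. Each chunk would then be passed to the embedding stage.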
After processing completes:
- Check that all documents appear in the list
- Verify document counts and sizes
- Note the chunk count (segments created)
To give an agent permanent access to your dataset:
- Edit the agent (or create a new one)
- Scroll to the Dataset Selection section
- Search for your dataset by name
- Check the box next to your dataset
- Save the agent
Add datasets to any conversation:
- In the chat interface, click the paperclip icon
- Select the dataset to reference
- The agent can now search that dataset during this conversation
Use clear, descriptive names:
- Good: "Legal - Contract Templates 2024"
- Good: "Engineering - API Documentation"
- Avoid: "Documents" or "Stuff" or "New Dataset"
Before creating:
- Identify what documents you'll include
- Consider who needs access
- Plan the category and tags
- Keep datasets focused on specific topics
| Tip | Why |
|---|---|
| Use focused datasets | Better retrieval than one large dataset |
| Keep documents updated | Remove outdated information |
| Include diverse content | Cover different aspects of a topic |
| Avoid duplicates | Same content in multiple documents hurts quality |
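One simple way to act on the "avoid duplicates" tip is to compare content hashes before uploading. The sketch below is a hypothetical local helper, not a GT AI OS feature. It detects only byte-identical files, so near-duplicates such as the same report exported twice with different metadata will not be caught.

```python
from __future__ import annotations

import hashlib


def find_duplicates(files: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (duplicate, original) name pairs for byte-identical contents."""
    seen: dict[str, str] = {}  # content hash -> first file name with that content
    duplicates: list[tuple[str, str]] = []
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


batch = {
    "policy-v1.txt": b"Remote work policy",
    "handbook.txt": b"Employee handbook",
    "policy-copy.txt": b"Remote work policy",
}
print(find_duplicates(batch))
# prints [('policy-copy.txt', 'policy-v1.txt')]
```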
To add documents to an existing dataset:
- Open the dataset
- Click Upload Documents
- Select new files
- Wait for processing
To remove a document:
- Open the dataset
- Find the document in the list
- Click the delete icon
- Confirm removal
Note: Removing documents triggers re-indexing.
To update name, description, category, or tags:
- Open the dataset
- Click Edit
- Make your changes
- Save
After creating your dataset:
- Add and manage documents
- Share with your team
- Attach to agents to provide knowledge