Datasets Creating

Gen 3: This is a legacy Gen 2 article. For current GT AI OS 3.0 guidance, see gen3/datasets/uploading.

Creating Datasets

This guide walks you through creating and setting up datasets in GT AI OS step by step.

Prerequisites

Permission to create datasets
Documents ready to upload

Creating a New Dataset

Step 1: Access Dataset Creation

Go to Datasets in the sidebar
Click Create Dataset
The creation form opens

Step 2: Basic Information

Fill in the dataset details:

Field	Required	Description	Example
Name	Yes	Clear, descriptive name	"Q3 2024 Sales Reports"
Description	Yes	What this dataset contains	"Quarterly sales data and analysis for North America region"
Category	Yes	Category for organization	"Sales" (select an existing category or create a new one)
Tags	No	Keywords for filtering	"sales", "q3", "2024", "reports"
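If you prepare metadata for several datasets in advance, a quick local check can catch empty required fields before you open the form. This is a minimal Python sketch, not a GT AI OS API: `DatasetDetails` and `missing_required_fields` are hypothetical names that simply mirror the table above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDetails:
    """Hypothetical local model of the Step 2 form fields (not a GT AI OS API)."""
    name: str
    description: str
    category: str
    tags: list[str] = field(default_factory=list)  # optional field


def missing_required_fields(details: DatasetDetails) -> list[str]:
    """Return the names of required fields that are empty or whitespace-only."""
    required = {
        "name": details.name,
        "description": details.description,
        "category": details.category,
    }
    return [key for key, value in required.items() if not value.strip()]


draft = DatasetDetails(
    name="Q3 2024 Sales Reports",
    description="Quarterly sales data and analysis for North America region",
    category="",
    tags=["sales", "q3", "2024", "reports"],
)
print(missing_required_fields(draft))  # ['category']
```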

Using Categories

Categories help organize datasets across your organization:

Select from existing categories or create a new one
Categories appear as filter options on the Datasets page
Common categories: Legal, Engineering, Sales, Marketing, HR, Operations

Tip: Coordinate with your team on category names to keep things consistent.

Using Tags

Tags provide additional ways to find datasets:

Enter keywords relevant to the content
Press Enter or type a comma to add each tag
Tags are searchable and filterable
Use multiple tags to improve discoverability

Tag examples:

Content type: "reports", "policies", "procedures", "templates"
Time period: "q1-2024", "annual", "monthly"
Department: "engineering", "sales", "legal"
Status: "current", "archived", "draft"
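The tag examples above use lowercase words joined by hyphens. To keep tags consistent across a team, you could normalize a tag list before entering it. The helper below is a hypothetical Python sketch of that one convention; it does not describe how GT AI OS itself stores tags, which may be exactly as typed.

```python
import re


def parse_tags(raw: str) -> list[str]:
    """Split comma- or newline-separated input into lowercase, de-duplicated tags.

    Hypothetical helper: it mimics pressing Enter or typing a comma after each
    tag, and replaces inner whitespace with hyphens so "Q1 2024" becomes "q1-2024".
    """
    tags: list[str] = []
    for piece in re.split(r"[,\n]", raw):
        tag = re.sub(r"\s+", "-", piece.strip().lower())
        if tag and tag not in tags:  # skip empties and repeats, keep first-seen order
            tags.append(tag)
    return tags


print(parse_tags("Reports, Q1 2024, sales\nreports, "))
# ['reports', 'q1-2024', 'sales']
```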

Step 3: Visibility Settings

Choose who can use this dataset:

Visibility	Who Can Access	Best For
Individual	Only you	Personal knowledge bases
Team	Selected team members	Team-specific resources
Organization	All users in your organization	Company-wide knowledge

Note: Only Tenant Admins can set Organization visibility. Once it is set, all users in the organization can access the dataset.

When selecting Team visibility:

Select which teams should have access
Team members will see the dataset in their list
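The visibility rules can be summarized as a small access model. This Python sketch is illustrative only: the `User`, `Dataset`, `can_set_visibility`, and `can_access` names are hypothetical, and two details are assumptions not stated in this guide, namely that the owner of a Team dataset keeps access and that no dataset is visible outside its organization.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    INDIVIDUAL = "individual"
    TEAM = "team"
    ORGANIZATION = "organization"


@dataclass
class User:
    user_id: str
    org_id: str
    team_ids: set[str] = field(default_factory=set)
    is_tenant_admin: bool = False


@dataclass
class Dataset:
    owner_id: str
    org_id: str
    visibility: Visibility
    team_ids: set[str] = field(default_factory=set)  # teams selected for Team visibility


def can_set_visibility(user: User, visibility: Visibility) -> bool:
    """Only Tenant Admins may choose Organization visibility."""
    return visibility is not Visibility.ORGANIZATION or user.is_tenant_admin


def can_access(user: User, dataset: Dataset) -> bool:
    """Model the three visibility levels from the table above."""
    if user.org_id != dataset.org_id:  # assumption: no cross-organization access
        return False
    if dataset.visibility is Visibility.INDIVIDUAL:
        return user.user_id == dataset.owner_id
    if dataset.visibility is Visibility.TEAM:
        # Assumption: the owner keeps access; others must belong to a selected team.
        return user.user_id == dataset.owner_id or bool(user.team_ids & dataset.team_ids)
    return True  # ORGANIZATION: every user in the organization


alice = User("alice", "acme", {"sales"})
bob = User("bob", "acme", {"engineering"})
sales_reports = Dataset("alice", "acme", Visibility.TEAM, {"sales"})
print(can_access(bob, sales_reports))  # False
```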

Step 4: Vision Model (Optional) (Added in 2.0.36)

Configure a vision model to enable AI-powered image analysis for documents in this dataset.

Find the Vision Model dropdown (marked with a purple eye icon)
A default vision model is pre-selected automatically
To disable image analysis, select "None - Images will not be analyzed"

Option	Behavior
Vision model selected	Images in uploaded documents (PDFs, standalone images) are analyzed by AI. Descriptions are generated and made searchable via RAG.
None	Images are not analyzed. Standard text extraction only.

How it works: When a vision model is configured, the system extracts images from your documents (including embedded images in PDFs), sends them to the vision model for analysis, and stores the generated descriptions as searchable chunks. This makes your visual content discoverable through natural language queries.

Tip: Leave the default vision model selected unless you have a specific reason to disable image analysis.
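The vision model choice can be pictured as an optional extra step when building searchable chunks. In this hypothetical Python sketch, `describe_image` stands in for the configured vision model and passing `None` mirrors the "None - Images will not be analyzed" option. It illustrates the behavior described above, not GT AI OS's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    source: str  # "text" or "image"
    content: str


def build_chunks(
    text_chunks: list[str],
    images: list[bytes],
    describe_image: Optional[Callable[[bytes], str]],
) -> list[Chunk]:
    """Combine text chunks with AI-generated image descriptions.

    describe_image is a stand-in for the configured vision model; None means
    images are skipped and only extracted text becomes searchable.
    """
    chunks = [Chunk("text", t) for t in text_chunks]
    if describe_image is not None:
        chunks.extend(Chunk("image", describe_image(img)) for img in images)
    return chunks


def fake_vision_model(image: bytes) -> str:
    # Placeholder for a real vision model call (hypothetical).
    return f"Image of {len(image)} bytes"


print(len(build_chunks(["Q3 revenue rose."], [b"\x89PNG..."], fake_vision_model)))  # 2
print(len(build_chunks(["Q3 revenue rose."], [b"\x89PNG..."], None)))  # 1
```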

Step 5: Create the Dataset

Review your settings
Click Create Dataset
The dataset is created, empty until you upload documents
You're taken to the dataset detail view

Adding Documents

Step 6: Upload Your Documents

After creating the dataset:

Click Upload Documents or drag files into the upload area
Select files to upload

Supported file types:

Type	Extension	Notes
PDF	.pdf	Text is extracted automatically
Word	.docx	Modern Word format
Text	.txt	Plain text files
Markdown	.md	Markdown files
CSV	.csv	Tabular data
JSON	.json	Structured data
Image	.png, .jpg, .jpeg	Requires vision model for analysis

Limits:

Maximum 50MB per file

Tip: For Excel files (.xlsx), convert to CSV for best results.
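Before a large upload, you can screen a folder locally against the supported types and size limit listed above. This is a minimal Python sketch using only the standard library; the script and its names are hypothetical. Because this guide does not say whether 50MB means 50,000,000 or 52,428,800 bytes, the sketch uses the stricter decimal value.

```python
from pathlib import Path

# Extensions and limit taken from the table above; adjust if your deployment differs.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json", ".png", ".jpg", ".jpeg"}
MAX_BYTES = 50 * 1000 * 1000  # "50MB", using the stricter decimal interpretation


def check_files(paths: list[Path]) -> dict[Path, str]:
    """Return a problem description for each file that would likely be rejected."""
    problems: dict[Path, str] = {}
    for path in paths:
        suffix = path.suffix.lower()
        if suffix == ".xlsx":
            problems[path] = "Excel file: convert to CSV first"
        elif suffix not in SUPPORTED_EXTENSIONS:
            problems[path] = f"unsupported type {suffix or '(none)'}"
        elif path.stat().st_size > MAX_BYTES:
            problems[path] = "larger than 50MB"
    return problems


if __name__ == "__main__":
    import sys

    folder = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    files = [p for p in folder.iterdir() if p.is_file()]
    for path, problem in check_files(files).items():
        print(f"{path.name}: {problem}")
```

Saved under any name (for example check_uploads.py), it can be run as `python check_uploads.py path/to/folder`; any file it lists is worth converting or splitting before upload.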

Step 7: Wait for Processing

After upload, each document goes through these stages:

Uploading: Files are transferred to the server
Processing: Text is extracted from files
Chunking: Text is split into searchable segments
Embedding: Each chunk is converted to a vector
Image Analysis (if vision model configured): Images are described by AI
Complete: Documents are ready for use

Processing time depends on document size and complexity. You can continue working while documents process.
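The Chunking stage is why a single document produces many segments, and why Step 8 reports a chunk count. The Python sketch below shows one common strategy, fixed-size character windows with overlap, so that text cut at a boundary still appears whole in a neighboring chunk. The chunk size, overlap, and the strategy itself are illustrative assumptions; this guide does not document how GT AI OS splits text.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: chunk_size and overlap are arbitrary example values,
    not GT AI OS settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
print([len(c) for c in chunk_text(sample, chunk_size=12, overlap=4)])  # [12, 12, 12, 6]
```

Larger overlap preserves more context across boundaries but produces more chunks for the same document.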

Step 8: Verify Upload

After processing completes:

Check that all documents appear in the list
Verify document counts and sizes
Note the chunk count (the number of searchable segments created)

Using Your Dataset

Attach to Agents

To give an agent permanent access to your dataset:

Edit the agent (or create a new one)
Scroll to the Dataset Selection section
Search for your dataset by name
Check the box next to your dataset
Save the agent

Use During Chat

Add datasets to any conversation:

In the chat interface, click the paperclip icon
Select the dataset to reference
The agent can now search that dataset during the conversation

Best Practices

Naming Conventions

Use clear, descriptive names:

Good: "Legal - Contract Templates 2024"
Good: "Engineering - API Documentation"
Avoid: "Documents" or "Stuff" or "New Dataset"
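The good examples above follow a "Department - Topic" pattern. If your team adopts that convention, a small check can flag names before a dataset is created. This is a hypothetical Python helper; the pattern is inferred from the examples and is not a GT AI OS requirement.

```python
# Generic names from the "Avoid" examples above, compared case-insensitively.
GENERIC_NAMES = {"documents", "stuff", "new dataset"}


def name_warnings(name: str) -> list[str]:
    """Return style warnings for a proposed dataset name (hypothetical helper)."""
    warnings = []
    cleaned = name.strip()
    if cleaned.lower() in GENERIC_NAMES:
        warnings.append("name is too generic")
    if " - " not in cleaned:
        warnings.append('consider the "Department - Topic" pattern')
    return warnings


print(name_warnings("Legal - Contract Templates 2024"))  # []
print(name_warnings("Stuff"))
```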

Content Planning

Before creating:

Identify what documents you'll include
Consider who needs access
Plan the category and tags
Keep datasets focused on specific topics

Quality Tips

Tip	Why
Use focused datasets	Focused datasets give better retrieval than one large, mixed dataset
Keep documents updated	Removing outdated documents keeps answers current
Include diverse content	Covers different aspects of a topic
Avoid duplicates	Same content in multiple documents hurts quality
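To act on the "Avoid duplicates" tip before uploading, you can group files with identical contents by hashing them. This Python sketch uses only the standard library and is a hypothetical helper, not a GT AI OS feature. It only catches byte-identical copies; the same text saved as both a PDF and a Word file will not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Group files whose bytes are identical, using a SHA-256 digest of each file."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB blocks so large files are not loaded into memory at once.
            for block in iter(lambda: handle.read(1 << 20), b""):
                digest.update(block)
        by_digest[digest.hexdigest()].append(path)
    # Only groups with more than one file are duplicates.
    return [group for group in by_digest.values() if len(group) > 1]
```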

Managing Your Dataset

Adding More Documents

To add documents to an existing dataset:

Open the dataset
Click Upload Documents
Select new files
Wait for processing

Removing Documents

To remove a document:

Open the dataset
Find the document in the list
Click the delete icon
Confirm removal

Note: Removing documents triggers re-indexing.

Editing Dataset Info

To update name, description, category, or tags:

Open the dataset
Click Edit
Make your changes
Save

Next Steps

After creating your dataset:

Add and manage documents
Share with your team
Attach to agents to provide knowledge
