Datasets Managing Documents
GT AI OS Release edited this page Jun 11, 2026 · 3 revisions
Gen 3: This is a legacy Gen 2 article. For current GT AI OS 3.0 guidance, see gen3/datasets/managing.
This guide covers how to add, manage, and organize documents within datasets.
- Open the dataset details page
- Click Upload Documents or drag files to the upload area
- Select files from your computer
- Wait for processing to complete
| Type | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Best for formatted documents |
| Word | .docx | Standard office documents |
| Text | .txt | Plain text files |
| Markdown | .md | Documentation, notes |
| CSV | .csv | Tabular data |
| JSON | .json | Structured data |
- Maximum file size: 50MB per file
- Large files may take longer to process
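The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of GT AI OS: the function name `check_upload` is hypothetical, `.pdf` is included on the assumption that the "formatted documents" row of the table refers to PDF, and the 50MB limit is assumed to mean 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions taken from the supported-types table.
# ".pdf" is an assumption (the "formatted documents" row).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json"}

# Assumption: "50MB" is interpreted as 50 MiB.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the 50MB limit")
    return problems
```

For a file on disk, the size can be read with `Path(path).stat().st_size` and passed in together with the file name.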
When you upload a document:
- Extraction: Text is extracted from the file
- Chunking: Text is split into searchable segments
- Embedding: Each chunk is converted to a vector
- Indexing: Vectors are stored for fast retrieval
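The four stages above can be illustrated with a toy pipeline. This is a sketch for intuition only and does not reflect how GT AI OS implements them: the chunk size, the overlap, the hash-based "embedding", and the in-memory list used as an index are all stand-ins, and every function name is hypothetical. A real system would use a trained embedding model and a vector store.

```python
import hashlib
import math


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; adjacent windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(chunk: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash each word into one of `dims` buckets, then L2-normalize."""
    vec = [0.0] * dims
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(text: str) -> list[tuple[str, list[float]]]:
    """Chunk the extracted text and store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text)]


def search(index, query: str, top_k: int = 3):
    """Rank chunks by cosine similarity to the query (vectors are unit length)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    return sorted(scored, reverse=True)[:top_k]
```

Calling `build_index(extracted_text)` and then `search(index, "some question")` returns the best-matching chunks with their similarity scores, which is the step that retrieval relies on once a document reaches the Completed status.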
Documents show their status:
- Pending: Waiting to upload
- Uploading: File transfer in progress
- Processing: Being analyzed and embedded
- Completed: Available for use
- Failed: Error during processing
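The statuses above behave like a small state machine. The sketch below models them as an enum; the allowed transitions are an assumption inferred from the order of the list (the guide does not specify them), and the names `DocumentStatus`, `can_transition`, and `is_searchable` are hypothetical.

```python
from enum import Enum


class DocumentStatus(Enum):
    PENDING = "Pending"
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed transitions, inferred from the order of the status list.
ALLOWED_TRANSITIONS = {
    DocumentStatus.PENDING: {DocumentStatus.UPLOADING},
    DocumentStatus.UPLOADING: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),  # terminal
    DocumentStatus.FAILED: set(),     # terminal
}


def can_transition(current: DocumentStatus, new: DocumentStatus) -> bool:
    """Return True if moving from `current` to `new` is allowed in this model."""
    return new in ALLOWED_TRANSITIONS[current]


def is_searchable(status: DocumentStatus) -> bool:
    """Only completed documents are available for use."""
    return status is DocumentStatus.COMPLETED
```

Under this model, a document that shows Failed never becomes searchable on its own; it has to be uploaded again.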
Click on a document to see:
- File name and type
- Upload date
- Processing status
- Chunk and vector counts
To delete a document:
- Find it in the document list
- Click the delete icon
- Confirm removal
Note: Removal triggers reindexing of the dataset.
Documents are managed individually within a dataset. For bulk operations at the dataset level, use the Export and Import features on the Datasets page.
Content quality:
- Use clear, well-structured text
- Avoid scanned images; text inside images is not searchable (OCR limitations)
- Include context and definitions
File quality:
- Use native text formats when possible
- Ensure files aren't corrupted
- Keep file sizes reasonable
Help agents find relevant content:
- Use descriptive headings
- Include key terms naturally
- Organize content logically
If processing gets stuck:
- Check that the file format is supported
- Verify that the file isn't corrupted
- Try uploading a smaller version
- Contact support if the issue persists
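A quick local check can rule out a corrupted file before contacting support. The sketch below is an assumption-laden illustration, not a GT AI OS tool: `looks_intact` is a hypothetical helper, it assumes text-based files are UTF-8 encoded, and it covers only the `.docx`, `.txt`, `.md`, `.csv`, and `.json` types. It relies on the fact that a `.docx` file is a ZIP archive.

```python
import json
import zipfile
from pathlib import Path

TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".json"}


def looks_intact(path: str) -> bool:
    """Return True if the file opens cleanly for its type, False otherwise."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext != ".docx" and ext not in TEXT_EXTENSIONS:
        raise ValueError(f"no check implemented for {ext or '(no extension)'}")
    try:
        if ext == ".docx":
            # testzip() returns None when every archive member passes its CRC check.
            with zipfile.ZipFile(p) as archive:
                return archive.testzip() is None
        # Assumption: text-based formats are stored as UTF-8.
        text = p.read_text(encoding="utf-8")
        if ext == ".json":
            json.loads(text)  # raises ValueError on malformed JSON
        return True
    except (OSError, ValueError, zipfile.BadZipFile):
        return False
```

A `False` result only means the file failed this basic check. A `True` result does not guarantee that processing will succeed.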
If expected content isn't retrieved:
- Verify that the document processed successfully (status: Completed)
- Check that the content isn't inside images (image text is not searchable)
- Try searching with exact phrases
- Review the document chunking settings
If search results aren't relevant:
- Review document content quality
- Consider splitting large documents
- Update chunking configuration
- Add more context to documents
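One way to split a large document, as suggested above, is to cut a Markdown file at its section headings and upload each section as its own file. The following is a minimal sketch under that assumption: `split_markdown_by_heading` is a hypothetical helper, and it does not account for heading-like lines inside fenced code blocks.

```python
import re


def split_markdown_by_heading(text: str, level: int = 2) -> list[tuple[str, str]]:
    """Split Markdown into (title, section_text) pairs at headings of the given level."""
    pattern = re.compile(rf"^{'#' * level} +(.+?)[ \t]*$", re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections = []
    # Text before the first matching heading is kept as an untitled preamble.
    preamble_end = matches[0].start() if matches else len(text)
    preamble = text[:preamble_end].strip()
    if preamble:
        sections.append(("", preamble))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Each section keeps its own heading line, so it stays self-describing.
        sections.append((match.group(1), text[match.start():end].strip()))
    return sections
```

Deeper headings (for example `###` when `level=2`) stay inside their parent section, so each resulting file keeps its local context.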