Gen3 Datasets Managing
- Open Datasets and open a dataset's documents view.
- Review the documents table for status, preprocessing stages, errors, and duplicates.
- Use View summary to open the floating document summary panel for a selected file.
- Remove or replace stale files before major reviews; use bulk reprocess/delete/export when many rows need the same action.
- Re-share or adjust group access when membership changes.
Managing content keeps retrieval trustworthy: operators can rely on answers only when the underlying document set is current.
Dataset management in active Gen 3 stays inside the datasets workspace. Use the dataset documents view and the dataset edit form together to keep the retrieval set accurate, current, and appropriately shared.
- review which documents are in the dataset
- remove outdated files
- upload replacement files
- confirm import results
- review sharing posture
- verify retrieval defaults such as embedding model and chunking strategy
Open the dataset documents view when you need to inspect the files inside one dataset. This is the right place to confirm whether ingestion succeeded and whether old material should be removed.
If you are able to modify the dataset, select individual rows (or all visible rows) and run bulk Reprocess, Delete, or Export JSON from the bulk bar in the documents modal.
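Once rows have been exported with Export JSON, the file can be checked offline for failed or duplicated documents. The sketch below is only illustrative: the export schema is not described here, so the assumption that the file is a JSON array of records with `filename` and `status` fields is hypothetical, as are the helper names. Adjust the keys to match what your export actually contains.

```python
import json
from collections import defaultdict


def load_export(text: str) -> list[dict]:
    """Parse exported text; assumes (hypothetically) a JSON array of records."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of document records")
    return data


def group_duplicates(records: list[dict], key: str = "filename") -> dict[str, list[int]]:
    """Return normalized key values that occur more than once, with row indexes."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        value = record.get(key)
        if value is not None:
            # Normalize case and surrounding whitespace so near-identical names match.
            groups[str(value).strip().lower()].append(index)
    return {name: rows for name, rows in groups.items() if len(rows) > 1}


def failed_documents(
    records: list[dict],
    status_key: str = "status",
    failed_label: str = "Processing failed",
) -> list[dict]:
    """Return records whose status matches the failure label."""
    return [r for r in records if r.get(status_key) == failed_label]
```

A typical use would be to load the exported file, list the duplicate groups, and then decide in the documents table which copies to delete or reprocess.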
View summary on a document row opens a draggable, resizable floating panel with the generated summary text. Use it to skim long corpora without leaving the documents table.
While ingestion runs, the progress column shows human-readable stages such as Queued for processing, Preparing document, Reading document, Analyzing images, Transcribing audio, Generating embeddings, Generating summary, and Finalizing. Wait for Ready (or resolve Processing failed) before relying on retrieval in GT Chat.
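The readiness rule above can be sketched as a small check over the status labels shown in the progress column. The labels are taken from the list above; the function names and the idea of passing statuses as plain strings are assumptions made for illustration, not part of the product.

```python
# Stage labels copied from the progress column described above.
IN_PROGRESS_STAGES = {
    "Queued for processing",
    "Preparing document",
    "Reading document",
    "Analyzing images",
    "Transcribing audio",
    "Generating embeddings",
    "Generating summary",
    "Finalizing",
}
READY = "Ready"
FAILED = "Processing failed"


def classify(status: str) -> str:
    """Map a progress label to 'ready', 'failed', 'in_progress', or 'unknown'."""
    if status == READY:
        return "ready"
    if status == FAILED:
        return "failed"
    if status in IN_PROGRESS_STAGES:
        return "in_progress"
    return "unknown"


def dataset_is_ready(statuses: list[str]) -> bool:
    """True only when at least one document exists and every one is Ready."""
    return bool(statuses) and all(classify(s) == "ready" for s in statuses)


def blocking_documents(rows: dict[str, str]) -> dict[str, str]:
    """Return the documents (name -> status) that are not yet Ready."""
    return {name: status for name, status in rows.items() if classify(status) != "ready"}
```

Treating an empty dataset as not ready is a deliberate choice in this sketch: retrieval over zero documents cannot return useful answers.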
Return to the dataset edit workflow when you need to change:
- name or description
- access posture
- group sharing
- retrieval defaults
- other metadata that affects how the dataset should be used
- Open the dataset.
- Review the current document list.
- Remove outdated or incorrect material.
- Upload or import replacement content.
- Re-check the sharing posture.
- Validate the result in GT Chat or in any agent that depends on the dataset.
If the problem is bad or missing source material, fix the dataset. If the problem is how the agent uses otherwise-correct source material, then review the agent instead. Separating those two causes prevents unnecessary agent churn.
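The triage rule above can be written down as a tiny decision function. This is a sketch of the reasoning only; the two boolean inputs and the names `NextStep` and `triage` are hypothetical and stand in for whatever checks you perform by hand in the documents view.

```python
from enum import Enum


class NextStep(Enum):
    FIX_DATASET = "fix the dataset"
    REVIEW_AGENT = "review the agent"


def triage(source_present: bool, source_correct: bool) -> NextStep:
    """Pick where to look first when an answer is wrong.

    source_present: the needed material exists in the dataset.
    source_correct: that material is current and accurate.
    """
    # Bad or missing source material is a dataset problem.
    if not source_present or not source_correct:
        return NextStep.FIX_DATASET
    # Otherwise the source is fine, so the agent's use of it is the suspect.
    return NextStep.REVIEW_AGENT
```

Checking the dataset first keeps agent configuration stable until the source material has been ruled out as the cause.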
- Keep one dataset aligned to one coherent body of source material.
- Remove stale files promptly so retrieval does not mix old and new guidance.
- Revisit group sharing after imports, because imported content may deserve a narrower or broader audience.
- Test the real downstream workflow after major changes.