Gen3 Admin Contextual Memory
- Open Contextual Memory in the Control Panel sidebar (Super Admin only).
- Enable the platform layer, select embedding and summary models, and set TTL and auto-inject options as needed.
- Click Save Contextual Memory settings before expecting tenant chat or GT Helper to index or recall prior messages.
- After changing the embedding model, use Force re-embed all to rebuild stored embeddings and summaries (review the cost estimate in the confirmation dialog).
- Monitor pipeline jobs on this page and in tenant Observability → Contextual Memory; configure per-agent defaults in Building Agents.

Contextual Memory is GT AI OS’s three-layer recall system for chat and helper threads. The Control Panel page controls the platform layer only: indexing, embedding, summarization, optional auto-inject into helper context, and deployment-wide pipeline operations.
When the platform layer is disabled or missing an embedding model, the brain control in GT Chat is hidden and memory tools are unavailable—even if an agent default would otherwise enable recall. Tenant operators still choose per-conversation scope in chat; agents set defaults for new conversations in Building Agents.
| Layer | Where configured | What it controls |
|---|---|---|
| 1. Platform | Control Panel Contextual Memory (this page) | Indexing, embeddings, summaries, TTL, auto-inject, force re-embed |
| 2. Agent default | Tenant Building Agents → agent configuration | Default recall mode for new conversations (this_conversation, this_agent, all_agents) |
| 3. Per-conversation | GT Chat brain control in the composer | User toggles memory on/off and recall scope for that thread |
The platform layer must be enabled with a valid embedding model before layers 2 and 3 affect runtime behavior.
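The precedence between the three layers can be sketched as follows. This is a minimal illustration, not GT AI OS code: `PlatformSettings`, `effective_recall_mode`, and their parameters are hypothetical names, and only the three mode identifiers come from the layer table.

```python
from dataclasses import dataclass
from typing import Optional

# Mode identifiers as listed in the layer table; everything else is hypothetical.
RECALL_MODES = ("this_conversation", "this_agent", "all_agents")


@dataclass
class PlatformSettings:
    enabled: bool
    embedding_model: Optional[str]  # None or "" means no model is selected


def effective_recall_mode(
    platform: PlatformSettings,
    agent_default: str,
    conversation_override: Optional[str] = None,
    memory_on: bool = True,
) -> Optional[str]:
    """Return the recall mode in effect for a thread, or None if memory is off."""
    # Layer 1 gates everything: without it, layers 2 and 3 have no effect.
    if not platform.enabled or not platform.embedding_model:
        return None
    # Layer 3: the user can switch memory off for this thread.
    if not memory_on:
        return None
    # Layer 3 overrides the layer 2 default when the user picked a scope.
    mode = conversation_override or agent_default
    if mode not in RECALL_MODES:
        raise ValueError(f"unknown recall mode: {mode!r}")
    return mode
```

With the platform layer disabled, `effective_recall_mode(PlatformSettings(False, "some-model"), "all_agents")` returns `None` regardless of the agent default, which mirrors the hidden brain control described above.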
| Setting | Purpose |
|---|---|
| Enable Contextual Memory | Master switch for tenant indexing and recall tooling |
| Embedding model | Model used to embed chat and helper messages (required when enabled) |
| Summary model | Chat-capable model for rolling conversation summaries |
| Message embedding TTL (days) | Retention for message-level embeddings |
| Summary TTL (days) | Retention for generated summaries |
| Auto-inject into helper context | Prepend relevant memory excerpts into GT Helper / CTP Helper inference |
| Auto-inject max runes | Upper bound on injected memory text per helper turn |
Deployment-wide default chat and embedding models for non-memory flows remain on Models → Default Models. This page uses dedicated memory model IDs so you can tune recall without changing general chat defaults.
Saving is blocked when Contextual Memory is enabled but no embedding model is selected.
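The save rule can be expressed as a small validation step. The sketch below assumes a hypothetical `MemorySettings` shape whose fields loosely mirror the settings table; it is not the product's schema, and it checks only the one rule stated on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemorySettings:
    # Illustrative field names only; they are not taken from a real schema.
    enabled: bool = False
    embedding_model: Optional[str] = None
    summary_model: Optional[str] = None


def validate(settings: MemorySettings) -> List[str]:
    """Return a list of problems; an empty list means the settings can be saved."""
    errors: List[str] = []
    # The rule stated by the page: an enabled platform layer needs an embedding model.
    if settings.enabled and not settings.embedding_model:
        errors.append("embedding model is required when Contextual Memory is enabled")
    return errors
```

A disabled configuration passes validation even without a model, which matches the page allowing you to save with the master switch off.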
Use Force re-embed all when you:
- Change the memory embedding model
- Need to recover from a bad backfill or widespread embedding failures
- Migrate after a major model catalog change
The workflow:
- Click Force re-embed all (enabled only when the platform layer is on and an embedding model is set).
- Review the estimate: embeddable characters, estimated tokens, and estimated USD cost.
- Confirm in the dialog to enqueue a full backfill.
The operation deletes existing message embeddings and summaries, then enqueues workers to rebuild tenant chat and helper threads. Expect elevated embedding traffic until the pipeline drains.
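To build intuition for the numbers in the confirmation dialog, here is a rough sketch of how such an estimate could be derived. The characters-per-token ratio and the price are assumptions chosen for illustration, and `estimate_reembed` is a hypothetical helper; the product's actual estimator may work differently.

```python
import math
from dataclasses import dataclass
from typing import Iterable

# Assumptions for illustration only, not values used by the product.
CHARS_PER_TOKEN = 4.0          # rough heuristic for English text
USD_PER_MILLION_TOKENS = 0.02  # hypothetical embedding price


@dataclass
class ReembedEstimate:
    embeddable_chars: int
    estimated_tokens: int
    estimated_usd: float


def estimate_reembed(
    messages: Iterable[str],
    chars_per_token: float = CHARS_PER_TOKEN,
    usd_per_million_tokens: float = USD_PER_MILLION_TOKENS,
) -> ReembedEstimate:
    """Estimate characters, tokens, and cost for re-embedding the given messages."""
    chars = sum(len(m) for m in messages)
    tokens = math.ceil(chars / chars_per_token)
    usd = tokens / 1_000_000 * usd_per_million_tokens
    return ReembedEstimate(chars, tokens, usd)
```

Because the operation also regenerates summaries, a real estimate would likely include summary-model usage as well; this sketch covers embedding input only.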
When the platform layer is enabled or recent pipeline activity exists, the page shows Contextual Memory pipeline status for the Control Panel helper stream:
- Job counts: pending, running, succeeded, failed
- Message embeddings and summaries: active, expired, total
- Jobs by kind table
- Recent failures with error text
Use Refresh status after a re-embed or when investigating stuck jobs.
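The active, expired, and total counts relate to the TTL settings. A minimal sketch of that relationship, assuming expiry is measured from each artifact's creation time (the page does not specify this) and using a hypothetical `tally_by_ttl` helper:

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable


def tally_by_ttl(created_at: Iterable[datetime], ttl_days: int, now: datetime) -> Dict[str, int]:
    """Count artifacts as active or expired given a TTL in days."""
    # Assumption: an artifact expires once it is older than ttl_days.
    cutoff = now - timedelta(days=ttl_days)
    counts = {"active": 0, "expired": 0, "total": 0}
    for ts in created_at:
        counts["total"] += 1
        counts["expired" if ts < cutoff else "active"] += 1
    return counts
```

Under this assumption, lowering Message embedding TTL (days) moves older embeddings from active to expired without changing the total until they are cleaned up.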
Tenant users review memory pipeline metrics under Management → Observability, on the Contextual Memory tab:
- Scope follows tenant role (owner-wide, managed-group, or personal)
- Objective metrics: job counts, stored artifacts, usage signals, recent failures
- Complements billing breakdowns on the Billing tab when financial controls expose memory spend
Control Panel pipeline status focuses on the operator helper stream; tenant observability covers the deployment scope your role can see.
In Building Agents, set default memory mode for new chats:
- This conversation only — no cross-thread recall
- Just conversations with this agent — agent-scoped memory search
- All my conversations — user-wide recall when platform layer allows
In GT Chat, users override per conversation with the brain control when the platform layer is enabled: toggle memory on/off and choose recall scope for that thread.
Agents may invoke memory search tools when scope and platform settings allow. Activity labels such as Searching Contextual Memory appear in the chat timeline during recall.
Before enabling Contextual Memory in production:
- Configure a reachable embedding provider on Models (for example, Ollama with `embeddinggemma`, per Ollama host setup).
- Set the deployment default embedding model if datasets also depend on embeddings.
- Select memory-specific embedding and summary models on this page.
- Plan a maintenance window before Force re-embed all on large tenants.
| Symptom | What to check |
|---|---|
| Brain control missing in GT Chat | Platform layer disabled or embedding model unset on this page |
| Memory tools fail in agent chat | Agent default mode, per-conversation scope, and platform enablement |
| High failed job count | Recent failures table on this page or tenant Contextual Memory observability tab |
| Stale recall after model change | Run Force re-embed all and monitor pipeline until succeeded counts stabilize |
| Helper auto-inject too large | Lower Auto-inject max runes or disable auto-inject |
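For the auto-inject row, it helps to know what a rune limit means in practice. The sketch below assumes "runes" means Unicode code points (as in Go) and that the limit is applied by simple truncation; `clamp_runes` is a hypothetical helper, not a product function.

```python
def clamp_runes(text: str, max_runes: int) -> str:
    """Cut text to at most max_runes Unicode code points."""
    if max_runes <= 0:
        return ""
    # Python strings are indexed by code point, so slicing never splits
    # a multi-byte character the way a byte-based limit could.
    return text[:max_runes]


excerpt = clamp_runes("héllo wörld", 5)
# The clamped text has 5 code points but 6 bytes in UTF-8, because "é" takes 2 bytes.
byte_length = len(excerpt.encode("utf-8"))
```

A rune limit therefore bounds the number of characters injected, not the byte size or the token count, so the effect on helper context length varies with the language of the stored messages.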
- Models
- Ollama host setup
- Observability (tenant Contextual Memory tab)
- Building Agents
- GT Helper Settings
- Financial Controls