Fix unbounded query context growth in chat sessions #453

@kovtcharov

Description

Problem

`src/gaia/chat/ui/server.py:690-723` — chat history grows without checking total token count against the model's context window:

```python
messages = db.get_messages(request.session_id, limit=20)
# Takes the last 4 pairs = 8 messages.
# If each message is 10KB, that's ~80KB of context per query.
# Over many queries the LLM context fills up and response quality degrades.
```

No truncation if history + RAG context + system prompt approaches model limit.
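A rough back-of-envelope illustration of the failure mode (the 4-characters-per-token heuristic and the message/prompt sizes below are illustrative assumptions, not values from the codebase):

```python
def estimate_tokens(text: str) -> int:
    """Cheap estimate: ~4 characters per token for English text."""
    return len(text) // 4

history = ["x" * 10_000] * 8      # 8 messages of 10KB each, as in the issue
system_prompt = "x" * 2_000       # hypothetical size
rag_context = "x" * 16_000        # hypothetical size

total = (
    sum(estimate_tokens(m) for m in history)
    + estimate_tokens(system_prompt)
    + estimate_tokens(rag_context)
)
print(total)  # prints 24500 -- already past an 8K or 16K context window
```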

Impact

  • Long conversations degrade response quality as context fills up
  • Very large messages (pasted documents, code blocks) overflow context
  • No token counting before sending to LLM
  • RAG chunks may be pushed out of context by long history

Proposed Fix

  1. Estimate token count of history before sending to LLM
  2. Implement sliding window with token budget (e.g., 80% of model context)
  3. Truncate oldest history pairs when budget exceeded
  4. Reserve fixed budget for RAG chunks and system prompt
  5. Add `max_history_tokens` config parameter
  6. Log when history truncation occurs
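Steps 1-4 and 6 could be sketched roughly as follows. Everything here is an assumption for discussion: the function names, the reserved-budget defaults, the chars/4 token estimate (a real tokenizer such as tiktoken should replace it), and the user/assistant pairing of history; none of this is the existing gaia API.

```python
import logging

logger = logging.getLogger(__name__)

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (step 1).
    return len(text) // 4

def truncate_history(
    history: list[dict],          # [{"role": ..., "content": ...}], oldest first
    model_context: int,
    rag_reserve: int = 2048,      # fixed budget for RAG chunks (step 4)
    system_reserve: int = 512,    # fixed budget for the system prompt (step 4)
    budget_fraction: float = 0.8, # use at most 80% of the context (step 2)
) -> list[dict]:
    budget = int(model_context * budget_fraction) - rag_reserve - system_reserve
    kept: list[dict] = []
    used = 0
    # Walk newest-first in user/assistant pairs, dropping the oldest pairs
    # once the budget would be exceeded (step 3).
    pairs = [history[i:i + 2] for i in range(0, len(history), 2)]
    for pair in reversed(pairs):
        cost = sum(estimate_tokens(m["content"]) for m in pair)
        if used + cost > budget:
            logger.info(  # step 6: log when truncation occurs
                "Truncating %d older history messages", len(history) - len(kept)
            )
            break
        kept = pair + kept
        used += cost
    return kept
```

Dropping whole pairs rather than single messages keeps the history coherent (no assistant reply without its user turn).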

Files

  • src/gaia/chat/ui/server.py (lines 690-723, _get_chat_response)
  • src/gaia/chat/sdk.py (conversation history management)

Acceptance Criteria

  • History truncated when approaching model context limit
  • RAG context always has reserved token budget
  • Config option for max history tokens
  • Test: long conversation → verify truncation occurs cleanly
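The last criterion might look something like the self-contained pytest sketch below. The `truncate_history` helper is a minimal stand-in for whatever the fix actually implements; its name, signature, and chars/4 token estimate are assumptions, not existing gaia code.

```python
def truncate_history(history, model_context, budget_fraction=0.8):
    # Stand-in: keep newest user/assistant pairs until ~80% of the
    # context budget is spent (tokens estimated as chars / 4).
    budget = int(model_context * budget_fraction)
    kept, used = [], 0
    pairs = [history[i:i + 2] for i in range(0, len(history), 2)]
    for pair in reversed(pairs):
        cost = sum(len(m["content"]) // 4 for m in pair)
        if used + cost > budget:
            break
        kept = pair + kept
        used += cost
    return kept

def test_long_conversation_truncates_cleanly():
    # 50 user/assistant pairs, ~1,000 estimated tokens per message.
    history = [
        {"role": role, "content": "x" * 4000}
        for _ in range(50)
        for role in ("user", "assistant")
    ]
    kept = truncate_history(history, model_context=8192)
    assert len(kept) < len(history)      # truncation occurred
    assert kept == history[-len(kept):]  # newest messages survive
    assert len(kept) % 2 == 0            # pairs stay intact
```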

Metadata

    Labels

    bug (Something isn't working), chat (Chat SDK changes), p1 (medium priority), performance (Performance-critical changes), rag (RAG system changes)
