Fix unbounded query context growth in chat sessions #453

@kovtcharov

Description

Problem

`src/gaia/chat/ui/server.py:690-723` — chat history grows without checking total token count against the model's context window:

```python
messages = db.get_messages(request.session_id, limit=20)
# Takes the last 4 pairs = 8 messages.
# If each message is 10KB, that's ~80KB of context per query.
# Over many queries the LLM context fills up and response quality degrades.
```

No truncation if history + RAG context + system prompt approaches model limit.
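A rough back-of-envelope illustration of the failure mode (the 4-characters-per-token heuristic and the message/prompt sizes below are illustrative assumptions, not values from the codebase):

```python
def estimate_tokens(text: str) -> int:
    """Cheap estimate: ~4 characters per token for English text."""
    return len(text) // 4

history = ["x" * 10_000] * 8      # 8 messages of 10KB each, as in the issue
system_prompt = "x" * 2_000       # hypothetical size
rag_context = "x" * 16_000        # hypothetical size

total = (
    sum(estimate_tokens(m) for m in history)
    + estimate_tokens(system_prompt)
    + estimate_tokens(rag_context)
)
print(total)  # prints 24500 -- already past an 8K or 16K context window
```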

Impact

  • Long conversations degrade response quality as context fills up
  • Very large messages (pasted documents, code blocks) overflow context
  • No token counting before sending to LLM
  • RAG chunks may be pushed out of context by long history

Proposed Fix

  1. Estimate token count of history before sending to LLM
  2. Implement sliding window with token budget (e.g., 80% of model context)
  3. Truncate oldest history pairs when budget exceeded
  4. Reserve fixed budget for RAG chunks and system prompt
  5. Add `max_history_tokens` config parameter
  6. Log when history truncation occurs
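Steps 1-4 and 6 could be sketched roughly as follows. Everything here is an assumption for discussion: the function names, the reserved-budget defaults, the chars/4 token estimate (a real tokenizer such as tiktoken should replace it), and the user/assistant pairing of history; none of this is the existing gaia API.

```python
import logging

logger = logging.getLogger(__name__)

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (step 1).
    return len(text) // 4

def truncate_history(
    history: list[dict],          # [{"role": ..., "content": ...}], oldest first
    model_context: int,
    rag_reserve: int = 2048,      # fixed budget for RAG chunks (step 4)
    system_reserve: int = 512,    # fixed budget for the system prompt (step 4)
    budget_fraction: float = 0.8, # use at most 80% of the context (step 2)
) -> list[dict]:
    budget = int(model_context * budget_fraction) - rag_reserve - system_reserve
    kept: list[dict] = []
    used = 0
    # Walk newest-first in user/assistant pairs, dropping the oldest pairs
    # once the budget would be exceeded (step 3).
    pairs = [history[i:i + 2] for i in range(0, len(history), 2)]
    for pair in reversed(pairs):
        cost = sum(estimate_tokens(m["content"]) for m in pair)
        if used + cost > budget:
            logger.info(  # step 6: log when truncation occurs
                "Truncating %d older history messages", len(history) - len(kept)
            )
            break
        kept = pair + kept
        used += cost
    return kept
```

Dropping whole pairs rather than single messages keeps the history coherent (no assistant reply without its user turn).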

Files

  • src/gaia/chat/ui/server.py (lines 690-723, _get_chat_response)
  • src/gaia/chat/sdk.py (conversation history management)

Acceptance Criteria

  • History truncated when approaching model context limit
  • RAG context always has reserved token budget
  • Config option for max history tokens
  • Test: long conversation → verify truncation occurs cleanly
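The last criterion might look something like the self-contained pytest sketch below. The `truncate_history` helper is a minimal stand-in for whatever the fix actually implements; its name, signature, and chars/4 token estimate are assumptions, not existing gaia code.

```python
def truncate_history(history, model_context, budget_fraction=0.8):
    # Stand-in: keep newest user/assistant pairs until ~80% of the
    # context budget is spent (tokens estimated as chars / 4).
    budget = int(model_context * budget_fraction)
    kept, used = [], 0
    pairs = [history[i:i + 2] for i in range(0, len(history), 2)]
    for pair in reversed(pairs):
        cost = sum(len(m["content"]) // 4 for m in pair)
        if used + cost > budget:
            break
        kept = pair + kept
        used += cost
    return kept

def test_long_conversation_truncates_cleanly():
    # 50 user/assistant pairs, ~1,000 estimated tokens per message.
    history = [
        {"role": role, "content": "x" * 4000}
        for _ in range(50)
        for role in ("user", "assistant")
    ]
    kept = truncate_history(history, model_context=8192)
    assert len(kept) < len(history)      # truncation occurred
    assert kept == history[-len(kept):]  # newest messages survive
    assert len(kept) % 2 == 0            # pairs stay intact
```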

Metadata

    Labels

    bug (Something isn't working), chat (Chat SDK changes), p1 (medium priority), performance (Performance-critical changes), rag (RAG system changes)
