Merged
3 changes: 2 additions & 1 deletion docs/docs.json
@@ -403,6 +403,7 @@
"group": "Release Notes",
"pages": [
"releases/index",
"releases/v0.17.4",
"releases/v0.17.3",
"releases/v0.17.2",
"releases/v0.17.1",
@@ -450,7 +451,7 @@
"navbar": {
"links": [
{
"label": "v0.17.3 \u00b7 Lemonade 10.0.0",
"label": "v0.17.4 \u00b7 Lemonade 10.0.0",
"href": "https://github.com/amd/gaia/releases"
},
{
2 changes: 1 addition & 1 deletion docs/plans/email-triage-agent.mdx
@@ -2598,7 +2598,7 @@ Choices the spec implies but does not resolve:
- [Whittaker & Sidner, "Email Overload" (CHI 1996)](https://dl.acm.org/doi/10.1145/238386.238530)
- [Bellotti et al., "Taking Email to Task" / Taskmaster (CHI 2003)](https://www.semanticscholar.org/paper/Taking-email-to-task/8a28a1ee766d87ca9acbd741a7c1972d69217359)
- [Aberdeen, Pacovsky & Slater, "Gmail Priority Inbox" (NIPS 2010)](https://research.google/pubs/pub36955/)
- [Cohen, Carvalho & Mitchell, "Email Speech Acts" (EMNLP 2004)](https://www.cs.cmu.edu/~tom/EMNLP2004_final.pdf)
- [Cohen, Carvalho & Mitchell, "Learning to Classify Email into 'Speech Acts'" (EMNLP 2004)](https://aclanthology.org/W04-3240/)
- [Vellum, "Levels of Agentic Behavior"](https://www.vellum.ai/blog/levels-of-agentic-behavior)
- [Knight Institute, "Levels of Autonomy for AI Agents"](https://knightcolumbia.org/content/levels-of-autonomy-for-ai-agents-1)

50 changes: 50 additions & 0 deletions docs/releases/v0.17.4.mdx
@@ -0,0 +1,50 @@
---
title: "v0.17.4"
description: "Custom-agent model selection, C++ null-safety, and docs link fix"
---

# GAIA v0.17.4 Release Notes

GAIA v0.17.4 is a patch release covering two correctness fixes in the Agent UI custom-agent path, a null-safety fix in the C++ library for smaller LLMs, and a fix for a broken docs citation.

**Why upgrade:**
- **Custom agents use their declared model** — If a custom agent sets a model via `kwargs.setdefault("model_id", ...)`, the Agent UI now respects that setting when the session is at the DB default, instead of falling back to the session model.
- **Compatibility with smaller LLMs in the C++ library** — The C++ JSON parser now tolerates `null` values in the `"tool"` and `"answer"` fields, which some smaller models emit in place of omitting the field.

---

## What's New

### Custom Agent `model_id` Respected in the Agent UI

`_chat_helpers.py` previously passed `model_id=<session model>` explicitly to `registry.create_agent()`, which defeated `kwargs.setdefault("model_id", ...)` in custom agents — `setdefault` only fires when the key is absent (PR [#841](https://github.com/amd/gaia/pull/841)). The Agent UI now builds `create_kwargs` conditionally, omitting `model_id` when the session is at the DB default so the agent's `__init__` setdefault governs. Three-branch precedence is now explicit: `custom_model` setting > session-explicit model > agent's own `setdefault`.
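The pitfall is plain `dict.setdefault` semantics, and can be shown in a few lines of standalone Python (`make_agent` here is a hypothetical stand-in for a custom agent's `__init__`, not GAIA code):

```python
def make_agent(**kwargs):
    # Mirrors a custom agent's __init__: only supplies a model when the
    # caller did not pass one at all.
    kwargs.setdefault("model_id", "Qwen3.5-4B-GGUF")
    return kwargs["model_id"]

# Key absent -> setdefault fires and the agent's declared model wins.
assert make_agent() == "Qwen3.5-4B-GGUF"

# Key present -- even when set to the DB default -- setdefault is a no-op,
# which is exactly how the old explicit model_id kwarg defeated custom agents.
assert make_agent(model_id="Qwen3.5-35B-A3B-GGUF") == "Qwen3.5-35B-A3B-GGUF"
```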

A follow-up fix (PR [#842](https://github.com/amd/gaia/pull/842)) restored the pre-construction `model_id` as the agent-cache key. The initial landing of PR #841 had switched `_store_agent` to use the post-construction `_effective_model(agent, model_id)` while `_get_cached_agent` still looked up with `model_id`, so the keys never matched for custom-model agents and the agent was rebuilt on every turn. A two-turn cache-hit regression test and a static guard on `_store_agent` call sites were added alongside the fix.
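The regression mechanics reduce to a store/lookup key mismatch; a toy sketch (the cache shape and names are illustrative, not the real `_chat_helpers` API):

```python
cache = {}

def store_agent(session_id, model_key, agent):
    cache[(session_id, model_key)] = agent

def get_cached_agent(session_id, model_key):
    return cache.get((session_id, model_key))

session, pre_model = "s1", None    # session at DB default: model_id omitted
post_model = "Qwen3.5-4B-GGUF"     # filled in by the agent's setdefault

# Buggy: store under the post-construction (effective) model...
store_agent(session, post_model, "agent-object")
# ...but look up under the pre-construction key -> miss on every turn.
assert get_cached_agent(session, pre_model) is None

# Fixed: store and look up under the same pre-construction key.
store_agent(session, pre_model, "agent-object")
assert get_cached_agent(session, pre_model) == "agent-object"
```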

Supporting refactor: extracted `_build_create_kwargs()` and `_effective_model()` helpers in `src/gaia/ui/_chat_helpers.py` to deduplicate the three-branch logic across streaming and non-streaming paths, and exported `SESSION_DEFAULT_MODEL` from `database.py` as the single source of truth.

---

### C++ Library: Null-Safety in LLM Response Parsing

`parseLlmResponse()` in `cpp/src/json_utils.cpp` now guards `.get<std::string>()` calls on the `"tool"` and `"answer"` JSON fields with `.is_string()` / `.is_null()` checks (PR [#780](https://github.com/amd/gaia/pull/780)). This fixes a crash (`json.exception.type_error.302: type must be string, but is null`) when smaller LLMs (for example `qwen3.5:9b`) return `null` for those fields instead of omitting them. `json.contains()` returns `true` for `null` values, so the existing presence checks were insufficient.
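The same presence-vs-type pitfall can be sketched in Python, where JSON `null` maps to `None` and the `in` operator (like `json.contains()` in nlohmann/json) is satisfied by a key holding `null`; `get_string_field` below is a hypothetical analogue of the added `.is_string()` guard, not the C++ code itself:

```python
import json

# JSON null maps to Python None; key presence says nothing about the type.
payload = json.loads('{"tool": null, "answer": "done"}')

assert "tool" in payload          # the presence check passes...
assert payload["tool"] is None    # ...but the value is not a string

def get_string_field(obj, key):
    # Analogue of the guarded access: only accept actual strings,
    # treating null (None) or a missing key as "not provided".
    value = obj.get(key)
    return value if isinstance(value, str) else None

assert get_string_field(payload, "tool") is None
assert get_string_field(payload, "answer") == "done"
```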

---

## Bug Fixes

- **Email-triage agent plan: broken CMU citation link** (PR [#817](https://github.com/amd/gaia/pull/817)) — Swapped the failing `www.cs.cmu.edu/~tom/EMNLP2004_final.pdf` URL in `docs/plans/email-triage-agent.mdx` for the canonical ACL Anthology record at [W04-3240](https://aclanthology.org/W04-3240/). The CMU URL was failing DNS resolution in CI, breaking the `Verify external URLs` check on every open docs PR. Restored the paper's full title ("Learning to Classify Email into 'Speech Acts'") for consistency with other citations in the same references list.

---

## Full Changelog

**5 commits** since v0.17.3:

- `8fc43f3f` — fix(cpp): add null-safety checks for JSON string fields in LLM response parsing (#780)
- `62722de2` — fix(ui): honor custom agent model_id when session is at DB default (#841)
- `4acfd400` — fix(ui): extract _build_create_kwargs/_effective_model, import SESSION_DEFAULT_MODEL
- `8f5c7621` — fix(ui): restore intent-key for agent cache store to fix miss regression (#842)
- `a0fdb109` — docs(plans): fix broken CMU link to EMNLP 2004 Email Speech Acts paper (#817)

Full Changelog: [v0.17.3...v0.17.4](https://github.com/amd/gaia/compare/v0.17.3...v0.17.4)
2 changes: 1 addition & 1 deletion src/gaia/apps/webui/package.json
@@ -1,6 +1,6 @@
{
"name": "@amd-gaia/agent-ui",
"version": "0.17.3",
"version": "0.17.4",
"type": "module",
"productName": "GAIA Agent UI",
"description": "Privacy-first agentic AI interface with document Q&A - runs 100% locally on AMD Ryzen AI",
5 changes: 4 additions & 1 deletion src/gaia/mcp/mcp_bridge.py
@@ -628,7 +628,10 @@ def handle_jsonrpc(self, data):
400,
{
"jsonrpc": "2.0",
"error": {"code": -32600, "message": "Invalid Request: expected JSON object"},
"error": {
"code": -32600,
"message": "Invalid Request: expected JSON object",
},
"id": None,
},
)
104 changes: 87 additions & 17 deletions src/gaia/ui/_chat_helpers.py
@@ -23,7 +23,7 @@
import time as _time
from pathlib import Path

from .database import ChatDatabase
from .database import SESSION_DEFAULT_MODEL, ChatDatabase
from .models import ChatRequest
from .sse_handler import (
_ANSWER_JSON_SUB_RE,
@@ -73,6 +73,9 @@ def get_agent_registry():
_agent_cache_lock = threading.Lock()
_MAX_CACHED_AGENTS = 10

# Alias so call-sites read naturally; the canonical value lives in database.py.
_DB_DEFAULT_MODEL = SESSION_DEFAULT_MODEL

# Last known MCP runtime status — updated after each agent setup so
# GET /api/mcp/status can return it without needing a running chat.
_mcp_status_cache: list[dict] = []
@@ -84,6 +87,56 @@ def get_agent_registry():
model_load_lock = threading.Lock()


def _build_create_kwargs(
*,
custom_model: str | None,
model_id: str | None,
streaming: bool = False,
) -> dict:
"""Return the kwargs dict for registry.create_agent().

Precedence (high → low):
1. custom_model setting (explicit user override from db)
2. session-explicit model (differs from SESSION_DEFAULT_MODEL)
3. omit model_id — lets the agent's kwargs.setdefault govern (fix #841)

Note: if registry.resolve_model() already promoted model_id before this
call, it is forwarded as-is via branch 2 (resolve_model result ≠ default).
"""
suffix = " (streaming)" if streaming else ""
kwargs: dict = {"silent_mode": not streaming, "debug": False}
if streaming:
kwargs["streaming"] = True

if custom_model:
kwargs["model_id"] = custom_model
logger.info("create_agent: custom_model override -> %s%s", custom_model, suffix)
elif model_id and model_id != _DB_DEFAULT_MODEL:
kwargs["model_id"] = model_id
logger.info("create_agent: session-explicit model -> %s%s", model_id, suffix)
else:
# Omit model_id so kwargs.setdefault in the agent's __init__ fires.
# setdefault only works when the key is ABSENT. Passing the DB default
# (or None / empty) explicitly defeats it — this is the fix for #841.
logger.info(
"create_agent: omitting model_id kwarg (session at DB default %s); "
"agent's kwargs.setdefault or AgentConfig fallback will govern%s",
_DB_DEFAULT_MODEL,
suffix,
)
return kwargs


def _effective_model(agent, fallback: str | None) -> str | None:
"""Return agent.model_id if set, else fallback.

Uses explicit None check (not `or`) to avoid treating empty-string
model_id as missing — which would silently load the wrong model.
"""
effective = getattr(agent, "model_id", None)
return effective if effective is not None else fallback


def get_cached_mcp_status() -> list[dict]:
"""Return the last known MCP server connection status from any cached agent."""
with _mcp_status_lock:
@@ -556,17 +609,23 @@ def _do_chat():
)
agent = registry.create_agent(
agent_type,
model_id=model_id,
silent_mode=True,
debug=False,
**_build_create_kwargs(
custom_model=custom_model, model_id=model_id
),
)
logger.info(
"chat: Invoking agent %s for session %s, model=%s",
agent_type,
session_id[:8],
_effective_model(agent, model_id),
)
_store_agent(
session_id,
model_id,
document_ids,
agent,
agent_type,
)
_store_agent(session_id, model_id, document_ids, agent, agent_type)

# Restore conversation history (limited to prevent context overflow).
# Always re-inject from DB so the history is consistent with what was
@@ -585,8 +644,11 @@
agent.conversation_history.append({"role": "user", "content": u})
agent.conversation_history.append({"role": "assistant", "content": a})

# Pre-flight: same fix as the streaming path — see _maybe_load_expected_model.
_maybe_load_expected_model(model_id)
# Pre-flight on agent's ACTUAL effective model. When model_id kwarg was
# omitted, the agent's __init__ set model_id via kwargs.setdefault —
# a value invisible pre-construction. Using _effective_model preserves
# the existing 100-900s silent-hang protection for all code paths.
_maybe_load_expected_model(_effective_model(agent, model_id))

result = agent.process_query(request.message)
if isinstance(result, dict):
@@ -915,17 +977,18 @@ def _run_agent():
t_construct = _time.monotonic()
agent = registry.create_agent(
agent_type,
model_id=model_id,
streaming=True,
silent_mode=False,
debug=False,
**_build_create_kwargs(
custom_model=custom_model,
model_id=model_id,
streaming=True,
),
)
agent.console = sse_handler
logger.info(
"chat: Invoking agent %s for session %s, model=%s took=%.3fs",
agent_type,
session_id[:8],
model_id,
_effective_model(agent, model_id),
_time.monotonic() - t_construct,
)

@@ -937,7 +1000,11 @@
_index_rag_with_progress(agent, rag_file_paths, sse_handler)

_store_agent(
session_id, model_id, document_ids, agent, agent_type
session_id,
model_id,
document_ids,
agent,
agent_type,
)

sse_handler._emit(
@@ -987,10 +1054,13 @@
if sse_handler.cancelled.is_set():
return

# Pre-flight: ensure a chat-capable LLM is active before sending the query.
# Lemonade silently hangs when no model is loaded or the embedding model is
# active — no error is returned, so _execute_with_auto_download never fires.
_maybe_load_expected_model(model_id, sse_handler)
# Pre-flight on agent's ACTUAL effective model. When model_id kwarg was
# omitted, the agent's __init__ set model_id via kwargs.setdefault — a value
# invisible pre-construction. Using agent.model_id preserves the existing
# 100-900s silent-hang protection for all code paths including setdefault.
_maybe_load_expected_model(
_effective_model(agent, model_id), sse_handler
)

# -- Phase 5: Query processing --
t_query = _time.monotonic()
6 changes: 5 additions & 1 deletion src/gaia/ui/database.py
@@ -20,6 +20,10 @@

DEFAULT_DB_PATH = Path.home() / ".gaia" / "chat" / "gaia_chat.db"

# Default model for new sessions — kept in sync with the SQL schema DEFAULT and
# any code that reads session["model"] and falls back when the field is NULL.
SESSION_DEFAULT_MODEL = "Qwen3.5-35B-A3B-GGUF"

SCHEMA_SQL = """
-- Global document library
CREATE TABLE IF NOT EXISTS documents (
@@ -230,7 +234,7 @@ def create_session(
"""Create a new chat session."""
session_id = str(uuid.uuid4())
now = self._now()
model = model or "Qwen3.5-35B-A3B-GGUF"
model = model or SESSION_DEFAULT_MODEL
title = title or "New Chat"
agent_type = agent_type or "chat"

91 changes: 91 additions & 0 deletions tests/integration/test_chat_ui_integration.py
@@ -1591,3 +1591,94 @@ def test_delete_messages_from_session_not_found(self, client):
"""DELETE .../and-below returns 404 for non-existent session."""
resp = client.delete("/api/sessions/nonexistent/messages/1/and-below")
assert resp.status_code == 404


# ── Issue #841 regression: custom agent model_id honored through API ──────────


class TestCustomAgentModelChoice:
"""Verify that a custom Python agent's kwargs.setdefault model_id reaches the
registry.create_agent call without model_id being passed as an explicit kwarg.

This is the integration-layer pin for issue #841. It exercises the full
path: HTTP POST → session → _get_chat_response → registry.create_agent.
"""

def test_custom_agent_model_id_honored_through_api(self, tmp_path):
import textwrap

agents_dir = tmp_path / ".gaia" / "agents" / "smallbot"
agents_dir.mkdir(parents=True)
(agents_dir / "agent.py").write_text(textwrap.dedent("""
from gaia.agents.base.agent import Agent

class SmallBot(Agent):
AGENT_ID = "smallbot"
AGENT_NAME = "SmallBot"

def __init__(self, **kwargs):
kwargs.setdefault("model_id", "Qwen3.5-4B-GGUF")
super().__init__(skip_lemonade=True, **kwargs)

def _get_system_prompt(self):
return "x"

def _register_tools(self):
pass
"""))

# HOME patch must wrap the full lifespan: discover() fires on __enter__.
with patch("gaia.agents.registry.Path.home", return_value=tmp_path):
app = create_app(db_path=":memory:")

with TestClient(app) as client:
# Spy on create_agent AFTER lifespan fires (registry exists now).
captured = {}
original_create = app.state.agent_registry.create_agent

def _spy(agent_id, **kwargs):
if agent_id == "smallbot":
captured["model_id_kwarg"] = kwargs.get("model_id", "<omitted>")
agent = original_create(agent_id, **kwargs)
if agent_id == "smallbot":
captured["agent_model_id"] = getattr(agent, "model_id", None)
return agent

app.state.agent_registry.create_agent = _spy

# Create a session typed to our custom agent.
sess_resp = client.post(
"/api/sessions",
json={"title": "841-test", "agent_type": "smallbot"},
)
assert sess_resp.status_code == 200, sess_resp.text
sid = sess_resp.json()["id"]

# Send a chat message, bypassing Lemonade and LLM.
with (
patch("gaia.ui._chat_helpers._maybe_load_expected_model"),
patch(
"gaia.ui._chat_helpers._agent_registry",
app.state.agent_registry,
),
):
chat_resp = client.post(
"/api/chat/send",
json={
"session_id": sid,
"message": "hi",
"stream": False,
},
)

assert chat_resp.status_code == 200, chat_resp.text

assert captured, "create_agent spy was never called for smallbot"
assert captured["model_id_kwarg"] == "<omitted>", (
f"Issue #841: model_id kwarg must be omitted when session is at DB default; "
f"got model_id_kwarg={captured['model_id_kwarg']!r}"
)
assert captured["agent_model_id"] == "Qwen3.5-4B-GGUF", (
f"Issue #841: agent.model_id must reflect kwargs.setdefault value; "
f"got {captured['agent_model_id']!r}"
)