Release v0.17.4 #855

Merged
itomek merged 7 commits into main from v0.17.4-release
Apr 24, 2026

@itomek (Collaborator) commented Apr 23, 2026

GAIA v0.17.4 Release Notes

GAIA v0.17.4 is a patch release covering two correctness fixes in the Agent UI custom-agent path, a null-safety fix in the C++ library for smaller LLMs, and a fix for a broken docs citation.

Why upgrade:

  • Custom agents use their declared model — If a custom agent sets a model via kwargs.setdefault("model_id", ...), the Agent UI now respects that setting when the session is at the DB default, instead of falling back to the session model.
  • Compatibility with smaller LLMs in the C++ library — The C++ JSON parser now tolerates null values in "tool" and "content" fields, which some smaller models emit in place of omitting the field.

What's New

Custom Agent model_id Respected in the Agent UI

_chat_helpers.py previously passed model_id=<session model> explicitly to registry.create_agent(), which defeated kwargs.setdefault("model_id", ...) in custom agents — setdefault only fires when the key is absent (PR #841). The Agent UI now builds create_kwargs conditionally, omitting model_id when the session is at the DB default so the agent's __init__ setdefault governs. Three-branch precedence is now explicit: custom_model setting > session-explicit model > agent's own setdefault.
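
The precedence described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `_chat_helpers.py`: the `CustomAgent` class, the `build_create_kwargs` function, and every model name (including the placeholder value given to `SESSION_DEFAULT_MODEL`) are assumptions made up for the example.

```python
# Placeholder value for illustration; not the real DB default.
SESSION_DEFAULT_MODEL = "db-default-model"


class CustomAgent:
    """Hypothetical custom agent that declares its own model via setdefault."""

    def __init__(self, **kwargs):
        # setdefault only writes the value when "model_id" is absent from kwargs.
        kwargs.setdefault("model_id", "agent-declared-model")
        self.model_id = kwargs["model_id"]


def build_create_kwargs(session_model, custom_model=None):
    """Illustrative three-branch precedence; not the real _build_create_kwargs()."""
    if custom_model:
        # Branch 1: the custom_model setting wins outright.
        return {"model_id": custom_model}
    if session_model and session_model != SESSION_DEFAULT_MODEL:
        # Branch 2: the session carries an explicitly chosen model.
        return {"model_id": session_model}
    # Branch 3: omit the key so the agent's own setdefault can fire.
    return {}


# Pre-fix behaviour: the key is always present, so setdefault never fires.
old_style = CustomAgent(model_id=SESSION_DEFAULT_MODEL)

# Post-fix behaviour for each of the three branches.
at_default = CustomAgent(**build_create_kwargs(SESSION_DEFAULT_MODEL))
explicit = CustomAgent(**build_create_kwargs("user-picked-model"))
overridden = CustomAgent(
    **build_create_kwargs("user-picked-model", custom_model="settings-model")
)
```

In this sketch only `at_default` ends up on the agent's declared model; `old_style` shows why passing the default explicitly defeated `setdefault`.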

A follow-up fix (PR #842) restored the pre-construction model_id as the agent-cache key. The initial PR #841 landing had switched _store_agent to use the post-construction _effective_model(agent, model_id) while _get_cached_agent still looked up with model_id, so keys never matched for custom-model agents and the agent was rebuilt on every turn. A two-turn cache-hit regression test and a static guard on _store_agent call sites were added alongside the fix.
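
The cache-miss mechanism can be reproduced with a plain dictionary. The sketch below is an assumption-laden model of the problem, not the real `_store_agent` / `_get_cached_agent` code: the `Agent` class, the `(session_id, model)` key shape, and the model names are all hypothetical.

```python
class Agent:
    """Hypothetical agent whose setdefault replaced the session model."""

    def __init__(self, model_id):
        self.model_id = model_id


def run_two_turns(store_with_effective_model: bool) -> int:
    """Simulate two chat turns and return how many times the agent was built."""
    cache = {}
    builds = 0
    session_model = "db-default-model"  # pre-construction value used for lookup

    for _turn in range(2):
        # Lookup always uses the pre-construction model.
        agent = cache.get(("session-1", session_model))
        if agent is None:
            agent = Agent("agent-declared-model")
            builds += 1
            # Buggy variant stores under the post-construction model;
            # fixed variant stores under the same key the lookup uses.
            store_model = agent.model_id if store_with_effective_model else session_model
            cache[("session-1", store_model)] = agent
    return builds


buggy_builds = run_two_turns(store_with_effective_model=True)
fixed_builds = run_two_turns(store_with_effective_model=False)
```

With mismatched keys the agent is constructed on both turns; with matching keys the second turn is a cache hit.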

Supporting refactor: extracted _build_create_kwargs() and _effective_model() helpers in src/gaia/ui/_chat_helpers.py to deduplicate the three-branch logic across streaming and non-streaming paths, and exported SESSION_DEFAULT_MODEL from database.py as the single source of truth.


C++ Library: Null-Safety in LLM Response Parsing

parseLlmResponse() in cpp/src/json_utils.cpp now guards .get<std::string>() calls on the "tool" and "answer" JSON fields with .is_string() / .is_null() checks (PR #780). This fixes a crash (json.exception.type_error.302: type must be string, but is null) when smaller LLMs (for example qwen3.5:9b) return null for those fields instead of omitting them. json.contains() returns true for null values, so the existing presence checks were insufficient.
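
The same pitfall exists in most JSON libraries, so it can be shown with a Python analogue. This is not a translation of `parseLlmResponse()`; the function name, the empty-string fallback, and the sample payload are assumptions for illustration, and the field names follow the paragraph above.

```python
import json


def parse_llm_response(raw: str) -> dict:
    """Illustrative Python analogue of a null-tolerant parser (hypothetical)."""
    data = json.loads(raw)
    result = {"tool": "", "answer": ""}
    for field in result:
        value = data.get(field)
        # A presence check alone ("field in data") is True for JSON null,
        # so the value's type is checked before it is used as a string.
        if isinstance(value, str):
            result[field] = value
    return result


payload = '{"tool": null, "answer": "done"}'
key_is_present = "tool" in json.loads(payload)  # presence check passes for null
parsed = parse_llm_response(payload)
```

Here `key_is_present` is true even though the value is null, which is the Python counterpart of `json.contains()` returning true in the C++ code.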


Bug Fixes

  • Email-triage agent plan: broken CMU citation link (PR #817) — Swapped the failing www.cs.cmu.edu/~tom/EMNLP2004_final.pdf URL in docs/plans/email-triage-agent.mdx for the canonical ACL Anthology record at W04-3240. The CMU URL was failing DNS resolution in CI, breaking the Verify external URLs check on every open docs PR. Restored the paper's full title ("Learning to Classify Email into 'Speech Acts'") for consistency with other citations in the same references list.

Full Changelog

5 commits since v0.17.3:

Full Changelog: v0.17.3...v0.17.4


Release checklist

  • util/validate_release_notes.py docs/releases/v0.17.4.mdx --tag v0.17.4 passes
  • src/gaia/version.py → 0.17.4
  • src/gaia/apps/webui/package.json → 0.17.4
  • Navbar label in docs/docs.json → v0.17.4 · Lemonade 10.0.0
  • All 5 commits in the range (v0.17.3..HEAD) are represented in the notes
  • Review from @kovtcharov-amd addressed

itomek and others added 6 commits April 20, 2026 18:50
)

Previously, _chat_helpers.py always passed model_id=<session model> explicitly
to registry.create_agent(), defeating kwargs.setdefault("model_id", ...) in
custom agents — which only fires when the key is absent.

Fix: build create_kwargs conditionally, omitting model_id when the session is
at the DB default so the agent's __init__ setdefault governs. Also use
agent.model_id (post-construction) for both _store_agent cache key and the
pre-flight _maybe_load_expected_model call.

Three-branch precedence: custom_model setting > session-explicit > omit kwarg.

Closes #841
…N_DEFAULT_MODEL

Addresses code review feedback on PR #842:

- Export SESSION_DEFAULT_MODEL from database.py (single source of truth)
  instead of duplicating the string literal in _chat_helpers.py
- Extract _build_create_kwargs() helper to eliminate the duplicate three-branch
  create_kwargs logic across non-streaming and streaming code paths
- Extract _effective_model() helper using explicit None check (not `or`)
  to safely read agent.model_id post-construction without treating empty
  string as missing
- Fix static regression guard regex to use [^()]* so nested helper calls
  inside create_agent() are not falsely flagged
- Update unit test to import SESSION_DEFAULT_MODEL instead of hardcoding
…ion (#842)

_store_agent was changed by the #842 fix to use _effective_model(agent,
model_id) as the cache key — the post-construction value set by kwargs.setdefault.
_get_cached_agent still looks up using the pre-construction model_id variable.
For custom agents whose setdefault model differs from the session model, the
keys never match and the agent is rebuilt on every turn.

Revert the two _store_agent call sites to use model_id (the pre-construction
intent key), matching what the lookup uses. _effective_model stays at the two
_maybe_load_expected_model sites (Lemonade pre-flight needs the actual model)
and in log statements (observability).

Add two regression guards:
- test_cache_hit_on_second_turn_for_setdefault_agent: two-turn cache-hit test
  with four assertions (call count, object identity, stored-key equality,
  agent.model_id). Covers the builder/template.py setdefault pattern.
- test_no_effective_model_in_store_agent_calls: static grep guard that asserts
  _store_agent never receives _effective_model(...) as a positional arg,
  preventing this pattern from silently returning in a future cleanup pass.
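
A static guard of the kind described above could look roughly like this. The regex, the function name, and the sample call strings are assumptions for illustration; the real test and the real `_store_agent` argument list may differ.

```python
import re

# [^()]* matches any run of characters containing no parentheses, so the
# pattern stays inside the argument list of a single _store_agent(...) call
# up to the first nested call.
STORE_WITH_EFFECTIVE = re.compile(r"_store_agent\([^()]*_effective_model\(")


def violates_guard(source: str) -> bool:
    """Return True if _store_agent receives _effective_model(...) as an argument."""
    return STORE_WITH_EFFECTIVE.search(source) is not None


# Hypothetical call sites; the argument names are made up.
bad_source = "_store_agent(session_id, _effective_model(agent, model_id), agent)"
good_source = (
    "_store_agent(session_id, model_id, agent)\n"
    "_maybe_load_expected_model(_effective_model(agent, model_id))"
)

bad_flagged = violates_guard(bad_source)
good_flagged = violates_guard(good_source)
```

Because the character class excludes parentheses, an `_effective_model(...)` call on a later line is not attributed to an earlier, already closed `_store_agent(...)` call.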
#817)

## Summary

One-line fix: swap the failing `www.cs.cmu.edu/~tom/EMNLP2004_final.pdf`
URL in `docs/plans/email-triage-agent.mdx:2601` for the canonical ACL
Anthology record at [W04-3240](https://aclanthology.org/W04-3240/). The
CMU URL fails DNS resolution in CI (see [recent
run](https://github.com/amd/gaia/actions/runs/24595902571/job/72072156929)),
breaking the ``Verify external URLs`` check for every open PR that
touches docs. ACL Anthology is the permanent archive for ACL/EMNLP
papers — stable URL, no more link rot.

Also restored the paper's actual full title ("Learning to Classify Email
into 'Speech Acts'") for consistency with the other full-title citations
in the same references list.

## Test plan

- [x] `curl -sI https://aclanthology.org/W04-3240/` returns 200
- [ ] After merge, `Verify external URLs` check should go green on
downstream PRs
Patch release: custom agents now honor their declared model, and the C++
library no longer crashes on null JSON fields from smaller LLMs.

- Agent UI custom agents honor kwargs.setdefault("model_id", ...) when the
  session is at the DB default (#841, follow-up #842 restores cache hits).
- C++ library adds null-safety guards in parseLlmResponse() to tolerate
  smaller LLMs that return null for "tool" or "content" (#780).
- Docs: swap broken CMU link for canonical ACL Anthology URL (#817).
@itomek requested a review from kovtcharov-amd April 23, 2026 23:08
@itomek self-assigned this Apr 23, 2026
@itomek added the release label Apr 23, 2026
@github-actions (bot) added the documentation, mcp, and tests labels Apr 23, 2026
@github-actions (Contributor) commented

Summary

PR #855 is the v0.17.4 release: a version bump plus the _chat_helpers.py fix for issue #841 (custom-agent model_id ignored), its #842 follow-up (agent-cache key divergence), and new regression tests. The refactor is clean — _build_create_kwargs() / _effective_model() de-duplicate the three-branch precedence across the streaming and non-streaming paths, and the tests pin both the behavior and the source-level antipattern. Ship it. Two tiny doc nits and one optional readability suggestion below.

Issues Found

🟢 Minor — Release notes changelog undercounts commits (docs/releases/v0.17.4.mdx:82)

The "Full Changelog" claims 5 commits since v0.17.3, but git log v0.17.3..HEAD on main shows 7 non-release commits including 3b51ca92 style(mcp): apply Black formatting to mcp_bridge.py (CI lint fix) — which is present in this PR's diff (src/gaia/mcp/mcp_bridge.py). Either bump the count and add the lint-fix bullet, or phrase it as "notable commits" so the omission isn't a contradiction with the diff.

**6 commits** since v0.17.3:

- `8fc43f3f` — fix(cpp): add null-safety checks for JSON string fields in LLM response parsing (#780)
- `62722de2` — fix(ui): honor custom agent model_id when session is at DB default (#841)
- `4acfd400` — fix(ui): extract _build_create_kwargs/_effective_model, import SESSION_DEFAULT_MODEL
- `8f5c7621` — fix(ui): restore intent-key for agent cache store to fix miss regression (#842)
- `a0fdb109` — docs(plans): fix broken CMU link to EMNLP 2004 Email Speech Acts paper (#817)
- `3b51ca92` — style(mcp): apply Black formatting to mcp_bridge.py (CI lint fix)

🟢 Minor — Documented edge case: user explicitly picking the DB default (src/gaia/ui/_chat_helpers.py:114)

elif model_id and model_id != _DB_DEFAULT_MODEL treats a session whose model happens to equal SESSION_DEFAULT_MODEL as "unset", and falls through to branch 3 where the agent's setdefault governs. That's the intended fix for #841, but it does mean a user who explicitly chose Qwen3.5-35B-A3B-GGUF will silently have a custom agent's setdefault override them. The code comment at lines 118-120 explains the mechanism but not this user-visible consequence. Consider adding one sentence to the branch-3 log line so the behavior is discoverable in support logs, e.g.:

        # Omit model_id so kwargs.setdefault in the agent's __init__ fires.
        # setdefault only works when the key is ABSENT. Passing the DB default
        # (or None / empty) explicitly defeats it — this is the fix for #841.
        # Note: a session whose model coincidentally equals the DB default is
        # indistinguishable from "unset" here and will surrender to setdefault.
        logger.info(
            "create_agent: omitting model_id kwarg (session at DB default %s); "
            "agent's kwargs.setdefault or AgentConfig fallback will govern%s",
            _DB_DEFAULT_MODEL,
            suffix,
        )

🟢 Minor — Test helper uses deprecated asyncio.get_event_loop() (tests/unit/chat/ui/test_chat_helpers_model_resolution.py:27)

asyncio.get_event_loop() emits DeprecationWarning under Python 3.12+ when no loop is running (GAIA requires >=3.10). There's existing precedent in tests/unit/chat/ui/test_history_limits.py:42 so this isn't a regression, but asyncio.run(coro) is the modern drop-in. Not blocking for the release — worth a follow-up issue to migrate both.
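
A minimal sketch of the suggested migration, assuming the test helper only needs to drive a coroutine to completion from synchronous code. The `run_sync` helper and `fetch_value` coroutine are hypothetical names, not taken from the GAIA test suite.

```python
import asyncio


async def fetch_value() -> int:
    """Stand-in coroutine for whatever the test actually awaits."""
    await asyncio.sleep(0)
    return 42


def run_sync(coro):
    """Hypothetical helper: run a coroutine to completion on a fresh event loop."""
    # asyncio.run creates a new loop, runs the coroutine, and closes the loop,
    # so no module-level loop lookup is needed.
    return asyncio.run(coro)


result = run_sync(fetch_value())
```

One caveat: `asyncio.run` cannot be called while another event loop is already running in the same thread, so this substitution only fits helpers invoked from synchronous tests.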

Strengths

  • Regression pins at three layers are what make this PR durable: a unit-level functional test for the kwarg selection, a source-level re.search guard against reintroducing create_agent(..., model_id=model_id, ...) (test_chat_helpers_model_resolution.py:759), and a matching guard against _store_agent(..., _effective_model(...)) to catch the #842 cache-key divergence (line 776). Future refactors that re-introduce either antipattern will fail loudly in <5ms.
  • _effective_model uses explicit None check, not or (_chat_helpers.py:137) — correctly avoids the footgun of treating empty-string model_id as missing, and the docstring names the silent-wrong-model failure it's preventing. This is the kind of small decision worth documenting.
  • Cache-hit regression test is object-identity based (test_chat_helpers_model_resolution.py:731) — second_agent is first_agent plus the create_agent.call_count == 1 assertion gives two independent signals that the cache actually hit. Much stronger than asserting on the stored key alone.
  • SESSION_DEFAULT_MODEL exported from database.py as a single source of truth (line 26) — correctly preferred over duplicating the literal in _chat_helpers.py. The _DB_DEFAULT_MODEL alias at _chat_helpers.py:77 keeps call-sites readable without inventing a second canonical value.
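
The difference between the explicit None check and `or` that the second bullet praises can be shown in isolation. Both functions below are illustrative stand-ins, not the real `_effective_model()`; the `Agent` class and model names are assumptions.

```python
class Agent:
    """Hypothetical agent exposing a model_id attribute."""

    def __init__(self, model_id):
        self.model_id = model_id


def effective_model_with_or(agent, fallback):
    # Truthiness check: an empty string is falsy, so it is replaced by fallback.
    return getattr(agent, "model_id", None) or fallback


def effective_model(agent, fallback):
    # Explicit None check: only a missing value falls back.
    value = getattr(agent, "model_id", None)
    return fallback if value is None else value


empty = Agent("")
unset = Agent(None)

or_result = effective_model_with_or(empty, "session-model")
none_check_result = effective_model(empty, "session-model")
unset_result = effective_model(unset, "session-model")
```

The two variants agree when `model_id` is None and diverge only for falsy-but-present values such as the empty string.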

Verdict

Approve with suggestions. No blockers. The one concrete suggestion worth taking before merge is the commit count in the release notes — the rest are nits that can land in a follow-up.

@itomek enabled auto-merge April 23, 2026 23:54
@itomek added this pull request to the merge queue Apr 24, 2026
Merged via the queue into main with commit 9a74298 Apr 24, 2026
40 of 43 checks passed
@itomek deleted the v0.17.4-release branch April 24, 2026 00:20