feat: Lemonade version mismatch warning, eval perf tracking, MCP stats by kovtcharov · Pull Request #637 · amd/gaia

kovtcharov · 2026-03-27T21:36:41Z

Summary

Lemonade version mismatch warning: Warns when the running server version differs from the expected version; checks both the CLI version and the running server's version, and warns on minor and patch differences as well
Eval performance tracking: Aggregates per-turn performance data (tok/s, TTFT, token counts) into scenario-level summaries and scorecard Performance section
MCP agent UI stats: Captures inference stats from done SSE events and surfaces them in MCP responses
Eval judge prompt improvements: Updated judge and simulator prompts for better scoring
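The eval performance tracking item above can be illustrated with a small sketch. The PR's actual code is not shown on this page, so the function name `summarize_turns` and the dictionary keys below are assumptions chosen for illustration. It averages throughput and TTFT over the turns that reported them and sums token counts, treating missing values as absent rather than as zero.

```python
from statistics import mean


def summarize_turns(turns):
    """Aggregate per-turn inference stats into one scenario-level summary.

    `turns` is a list of dicts; the key names are hypothetical.
    """
    # Only average over turns that actually reported a value.
    tps = [t["tokens_per_second"] for t in turns
           if t.get("tokens_per_second") is not None]
    ttft = [t["ttft_s"] for t in turns if t.get("ttft_s") is not None]
    return {
        "turns": len(turns),
        "avg_tokens_per_second": mean(tps) if tps else None,
        "avg_ttft_s": mean(ttft) if ttft else None,
        # Token counts are summed; a missing or None count contributes 0.
        "total_input_tokens": sum(t.get("input_tokens") or 0 for t in turns),
        "total_output_tokens": sum(t.get("output_tokens") or 0 for t in turns),
    }


example_turns = [
    {"tokens_per_second": 40.0, "ttft_s": 0.5, "input_tokens": 100, "output_tokens": 50},
    {"tokens_per_second": 60.0, "ttft_s": 1.5, "input_tokens": 200, "output_tokens": 70},
    {"tokens_per_second": None, "ttft_s": None, "input_tokens": None, "output_tokens": 10},
]
summary = summarize_turns(example_turns)
```

A scorecard-level Performance section could presumably apply the same kind of reduction across scenario summaries.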

Test plan

python -m pytest tests/unit/test_lemonade_version_check.py -xvs
Run gaia eval and verify performance summary in scorecard
Verify Lemonade version warning appears when versions differ
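The comparison exercised by the unit test in the first step is not shown on this page. As a rough sketch of a check that distinguishes major, minor, and patch differences, assuming plain numeric `major.minor.patch` strings: `parse_version` and `version_mismatch` are hypothetical names, not GAIA's API, and the version numbers are made up.

```python
def parse_version(text):
    """Parse '9.1.4' or 'v9.1.4' into a (major, minor, patch) tuple of ints.

    Assumes purely numeric components; suffixes such as '-rc1' are not handled.
    """
    parts = text.strip().lstrip("v").split(".")
    nums = [int(p) for p in parts[:3]]
    # Pad short versions such as '9.1' with zeros.
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def version_mismatch(expected, actual):
    """Return 'major', 'minor', or 'patch' for the first differing level,
    or None when the versions agree."""
    exp, act = parse_version(expected), parse_version(actual)
    for level, e, a in zip(("major", "minor", "patch"), exp, act):
        if e != a:
            return level
    return None


level = version_mismatch("9.1.4", "v9.2.0")
if level is not None:
    print(f"Warning: Lemonade Server {level} version differs from expected")
```

Returning the level rather than a boolean lets the caller choose how loudly to warn for each kind of mismatch.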

- Warn when Lemonade Server version doesn't match expected (minor/patch), not just on major version mismatch. Also check server-reported version from health endpoint and display it during initialization.
- Add performance data collection to eval framework: per-turn inference stats (tok/s, TTFT, token counts) aggregated into scenario and scorecard summaries.
- Capture inference stats from SSE done events in agent_ui_mcp and expose them in get_messages responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
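For the SSE item in this commit message, here is a minimal sketch of pulling stats out of a done event. The wire format is an assumption: each `data:` line is taken to carry a JSON object with a `type` field and, on the final event, a `stats` object. Neither the field names nor `extract_done_stats` come from the PR.

```python
import json


def extract_done_stats(sse_text):
    """Return the 'stats' object from the last 'done' event, or None.

    Assumes one JSON object per 'data:' line (a hypothetical payload shape).
    """
    stats = None
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip 'event:' lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore keep-alives or non-JSON payloads
        if isinstance(event, dict) and event.get("type") == "done":
            stats = event.get("stats")
    return stats


stream = (
    'data: {"type": "chunk", "content": "Hello"}\n'
    "\n"
    'data: {"type": "done", "stats": {"tokens_per_second": 42.5, "output_tokens": 12}}\n'
    "\n"
)
done_stats = extract_done_stats(stream)
```

The returned dictionary could then be attached to the message record that a get_messages-style response returns.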

…alth get_status() previously took ctx_size from the first model in all_models_loaded, which could be an embedding model with an irrelevant context size. Now iterates to find the first non-embedding model, falling back to the legacy health.context_size field.
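The selection logic described here can be sketched as follows. The `all_models_loaded`, `ctx_size`, and `context_size` names come from the commit message, but the way an embedding model is identified (a `type` key equal to `"embedding"`) and the function name `pick_context_size` are assumptions for illustration only.

```python
def pick_context_size(health):
    """Pick ctx_size from the first non-embedding loaded model.

    Falls back to the legacy top-level 'context_size' field when no
    suitable model is found.
    """
    for model in health.get("all_models_loaded") or []:
        if model.get("type") == "embedding":  # hypothetical marker field
            continue
        if model.get("ctx_size") is not None:
            return model["ctx_size"]
        break  # first non-embedding model has no ctx_size: use the fallback
    return health.get("context_size")


health = {
    "all_models_loaded": [
        {"model_name": "embedder", "type": "embedding", "ctx_size": 512},
        {"model_name": "chat-model", "type": "llm", "ctx_size": 8192},
    ],
    "context_size": 4096,
}
ctx = pick_context_size(health)
```

With the embedding model listed first, taking index 0 would have reported 512; skipping it yields the chat model's context size instead.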

- Fix pylint W1404: merge implicit string concatenation in version warning
- Fix flake8 F401: remove unused pytest import in version check tests
- Guard get_mcp_status_report() against None _mcp_manager (crashes when MCP dependencies are not installed)
- Use `or 0` pattern in MCP perf logging to handle None stat values
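The `or 0` pattern in the last item matters because `dict.get(key, 0)` only applies its default when the key is missing; a key that is present with the value `None` still yields `None`, which then fails in numeric formatting. A small sketch, with a hypothetical `format_perf` helper and made-up stat keys:

```python
def format_perf(stats):
    """Format perf stats for a log line, tolerating None values."""
    stats = stats or {}  # also tolerate a missing stats object
    # `or 0` covers both a missing key and a key whose value is None.
    tps = stats.get("tokens_per_second") or 0
    ttft = stats.get("time_to_first_token") or 0
    return f"{tps:.1f} tok/s, TTFT {ttft:.2f}s"


line = format_perf({"tokens_per_second": None, "time_to_first_token": 0.25})
```

One trade-off is that a genuinely reported `0` and a missing value become indistinguishable, which is usually acceptable for logging.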

kovtcharov and others added 2 commits March 25, 2026 12:36

Merge remote-tracking branch 'origin/main' into optimize/agent-response-quality

a73cf08

github-actions bot added the mcp, llm, eval, tests, and performance labels Mar 27, 2026

kovtcharov added 2 commits March 29, 2026 07:30

Merge remote-tracking branch 'origin/main' into optimize/agent-response-quality

33bed2f

kovtcharov marked this pull request as ready for review March 30, 2026 21:45

kovtcharov requested a review from kovtcharov-amd as a code owner March 30, 2026 21:45

kovtcharov self-assigned this Mar 30, 2026

kovtcharov requested a review from itomek March 30, 2026 22:04

kovtcharov and others added 2 commits March 30, 2026 15:04

fix: resolve pylint implicit-str-concat and flake8 unused-import lint errors

8e97a1c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

itomek approved these changes Mar 30, 2026

kovtcharov added this pull request to the merge queue Mar 30, 2026

Merged via the queue into main with commit 780a711 Mar 30, 2026
36 checks passed

kovtcharov deleted the optimize/agent-response-quality branch March 30, 2026 22:41

kovtcharov merged 6 commits into main from optimize/agent-response-quality
