…timeouts

The ChatAgent's _get_system_prompt() was sending a 17,648-token system prompt unconditionally, causing every message to time out with Qwen3.5-35B on local hardware (Lemonade's ~5 min timeout was exceeded before the first token).

Changes:
- Merge duplicate platform blocks into one (current OS only, ~50 lines saved)
- Two-tier RAG gating: Tier 1 (discovery/tool-usage rules) is always present since RAG tools are always registered; Tier 2 (query/factual-accuracy rules) is only injected when has_indexed=True
- Remove the hardcoded AVAILABLE TOOLS REFERENCE (it duplicated the auto-generated list)
- Remove both UNSUPPORTED FEATURES blocks and replace them with a 3-line version
- Deduplicate the MULTI-FACT QUERY RULE and CONVERSATION SUMMARY RULE
- Remove a git merge conflict marker (<<<<<<< HEAD) from the prompt string
- Condense WRONG/RIGHT examples throughout

Result: the no-docs prompt is ~3,853 tokens (was ~17,648), a 78% reduction.

Also increase the streaming and non-streaming chat timeouts from 120/180s to 600s, add "Sending to model..." status events for better UX feedback, and un-suppress print_processing_start() in the SSE handler.
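A minimal sketch of the two-tier gating described above. The rule constants and the function name are illustrative, not the real identifiers; the actual _get_system_prompt() assembles many more sections.

```python
TIER1_RAG_RULES = "SMART DISCOVERY WORKFLOW: ..."  # discovery/tool-usage rules, always present
TIER2_RAG_RULES = "FACTUAL ACCURACY: ..."          # query/factual-accuracy rules

def build_system_prompt(base_prompt: str, has_indexed: bool) -> str:
    """Tier 1 always ships because RAG tools are always registered;
    Tier 2 is injected only once documents have been indexed."""
    parts = [base_prompt, TIER1_RAG_RULES]
    if has_indexed:
        parts.append(TIER2_RAG_RULES)
    return "\n\n".join(parts)
```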
Tracks which MCP servers are actually connected at runtime and exposes this status in the Settings modal.

Backend:
- MCPClientManager._failed: records connection errors from load_from_config()
- MCPClientManager.get_status_report(): returns connected/failed status with tool counts for all configured servers
- MCPClientMixin.get_mcp_status_report(): delegates to _mcp_manager
- _chat_helpers.py: emits an mcp_status SSE event after each agent setup; caches the last-known status for the REST endpoint
- GET /api/mcp/status: returns the cached runtime MCP status
- Fix: list_mcp_server_tools used .get() on the MCPTool dataclass (AttributeError)
- Fix: remove_server now clears _failed to avoid ghost entries

Frontend:
- types/index.ts: MCPServerStatus type, mcp_status StreamEventType
- services/api.ts: getMCPRuntimeStatus()
- SettingsModal.tsx: MCP Servers section showing connected/failed status with tool counts (only shown when MCP servers are configured)

Tests: 6 new tests for MCPClientManager._failed tracking and get_status_report()
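A sketch of the status bookkeeping. The field and method names follow the commit message; the class name, the dict shapes, and the assumption that a connected client exposes a `tools` list are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MCPStatusTracker:
    _clients: dict[str, Any] = field(default_factory=dict)  # name -> connected client
    _failed: dict[str, str] = field(default_factory=dict)   # name -> error from load_from_config()

    def get_status_report(self) -> list[dict]:
        """One entry per configured server, connected or not."""
        report = [{"name": n, "connected": True, "tool_count": len(c.tools), "error": None}
                  for n, c in self._clients.items()]
        report += [{"name": n, "connected": False, "tool_count": 0, "error": e}
                   for n, e in self._failed.items()]
        return report

    def remove_server(self, name: str) -> None:
        # Drop from both maps so a removed server leaves no ghost entry in the report
        self._clients.pop(name, None)
        self._failed.pop(name, None)
```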
Verify that:
- Tier 2 rules (FACTUAL ACCURACY, POST-INDEX QUERY, etc.) are absent from the system prompt when no documents are indexed
- Tier 2 rules appear after a document is indexed and rebuild_system_prompt() is called (has_indexed=True gates the Tier 2 block)
- Tier 1 rules (SMART DISCOVERY WORKFLOW, FILE SEARCH) are always present, since RAG tools are registered unconditionally
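The checks might look roughly like this. The `agent` fixture and the test bodies are hypothetical; the real tests live in the repo's suite.

```python
def test_tier2_absent_before_indexing(agent):
    prompt = agent._get_system_prompt()
    assert "SMART DISCOVERY WORKFLOW" in prompt  # Tier 1: always present
    assert "FACTUAL ACCURACY" not in prompt      # Tier 2: gated off, no docs indexed

def test_tier2_present_after_indexing(agent):
    agent.has_indexed = True                     # simulate a completed index
    agent.rebuild_system_prompt()
    assert "FACTUAL ACCURACY" in agent.system_prompt
```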
…iptions
Replace Prompts.format_chat_history() + generate() with structured
messages + chat() in send_messages() and send_messages_stream(). This
prevents the model from seeing nested/malformed ChatML tokens that caused
it to recite system prompt instructions instead of responding normally.
Also trims tool descriptions to first line only (~1660 token savings),
adds heartbeat status events during prompt prefill, and gates the SD
mixin prompt on sd_default_model presence.
Fixes: model generating garbage text ("According to my Instructions")
instead of valid JSON responses in GAIA Agent UI.
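The shape of the fix, sketched: instead of flattening history into one ChatML string (where stray special tokens in prior turns can be re-parsed as template markup), build a structured message list and let the server apply its own chat template. The function name and history shape here are illustrative.

```python
def build_messages(system_prompt: str,
                   history: list[tuple[str, str]],
                   user_msg: str) -> list[dict[str, str]]:
    """Structured messages for a chat() endpoint; nothing is hand-templated,
    so ChatML tokens embedded in earlier turns stay inert content."""
    messages = [{"role": "system", "content": system_prompt}]
    messages += [{"role": role, "content": content} for role, content in history]
    messages.append({"role": "user", "content": user_msg})
    return messages
```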
…prompt

Restores all removed example pairs and multi-turn worked examples into the modular prompt structure (base_prompt, tool_rules, discovery_rules, rag_query_rules, data_file_rules). The examples were removed for token savings but are necessary for LLM adherence to complex behavioral rules.

Also moves the POST-INDEX QUERY RULE from the conditional rag_query_rules to the always-present tool_rules, because Smart Discovery can trigger indexing mid-conversation even before docs are initially indexed, so the rule must be present from the start. Updates the test to reflect this design.

Removes the dead variable has_docs (defined but never used).
Recommend creating your own MCP-focused agent.
DocumentLibrary, FileBrowser, and ChatView drag-and-drop now call api.attachDocument() after indexing so the agent's RAG system receives the session's documents. Also fixes context bar to show session-scoped docs, changes X button to detach (not hard-delete), evicts agent cache on detach, and adds early-exit guard in read_file for binary formats. Closes #609
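The read_file guard presumably short-circuits before any raw bytes reach the context window. A sketch, with the suffix set and the message wording as assumptions:

```python
from pathlib import Path

BINARY_SUFFIXES = {".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpg", ".zip"}

def read_file(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix in BINARY_SUFFIXES:
        # Early exit: binary formats go through indexing/RAG, not raw reads
        return f"{path} is a binary {suffix} file; attach and query it via the document tools."
    return Path(path).read_text(encoding="utf-8", errors="replace")
```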
Auto-detect when frontend source files are newer than the built dist and run `npm run build` before starting the server. Also add no-cache headers to index.html responses so browsers and tunnel proxies always pick up rebuilt assets. Add VS Code Dev Tunnels to CORS origins. Co-Authored-By: Tomasz Iniewicz <infancy_shred.0d@icloud.com>
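A sketch of the staleness check, assuming conventional src/ and dist/ locations and that `npm run build` is the right build command for this frontend:

```python
import subprocess
from pathlib import Path

def ensure_frontend_built(src_dir: Path, dist_dir: Path) -> None:
    """Rebuild the frontend when any source file is newer than the newest dist file."""
    def newest_mtime(root: Path) -> float:
        return max((p.stat().st_mtime for p in root.rglob("*") if p.is_file()), default=0.0)

    if not dist_dir.exists() or newest_mtime(src_dir) > newest_mtime(dist_dir):
        # On Windows, npm may need to be invoked as npm.cmd
        subprocess.run(["npm", "run", "build"], cwd=src_dir.parent, check=True)
```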
Disable uvicorn access logs by default (enable with --debug flag). Gate frontend console info/debug/timed logs behind ?debug URL param or localStorage, keeping only warnings and errors visible. Co-Authored-By: Tomasz Iniewicz <infancy_shred.0d@icloud.com>
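On the backend side, the access-log toggle is likely just a flag threaded into uvicorn; a sketch, with the app import path and argument wiring as assumptions:

```python
import argparse
import uvicorn

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true", help="enable uvicorn access logs")
args = parser.parse_args()

# access_log=False silences per-request lines; errors still reach the error logger
uvicorn.run("gaia.app:app", host="127.0.0.1", port=8000, access_log=args.debug)
```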
- Add a _maybe_load_expected_model() pre-flight check in _chat_helpers.py that detects when Lemonade has no chat-capable model loaded (empty all_models_loaded list or embedding-only) and calls load_model() before process_query(). Lemonade silently hangs HTTP connections in this state instead of returning an error, causing 100-900s hangs.
- Suppress the false-alarm "Wrong model" banner in ConnectionBanner.tsx Case 4 when the embedding model is transiently active after indexing.
- Add a 10s connection timeout to MCPClientManager.load_from_config() so a hanging MCP stdio server cannot block agent construction indefinitely before the pre-flight check is reached.
- Add 12 unit tests covering all pre-flight scenarios, including the fast path, embedding-only, no model, error handling, and concurrency.
_chat_helpers.py:
- Add _model_load_lock (threading.Lock) to prevent concurrent load_model() calls from multiple sessions arriving simultaneously with no model active
- Add a _maybe_load_expected_model() pre-flight check that inspects Lemonade's all_models_loaded before process_query(). When no llm/vlm is active (empty list or embedding-only), it calls a blocking load_model() and emits a "Loading LLM model..." SSE status event. This prevents the 100-900s silent hang caused by Lemonade accepting chat completions but producing zero tokens when no text-generation model is loaded.
- Call _maybe_load_expected_model() in both the streaming (_run_agent) and non-streaming (_do_chat) paths.

ConnectionBanner.tsx:
- Suppress the Case 4 "Wrong model loaded" banner when embedding_model_loaded is true. The embedding model is transiently active after RAG indexing; the pre-flight in _chat_helpers.py loads the correct LLM before the first query.

mcp_client_manager.py:
- Add cancel_futures=True to pool.shutdown() so pending MCP connections are not started after the 10s timeout fires.
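Putting the pieces together, the pre-flight probably looks something like this. The client interface (all_models_loaded() returning dicts with a "type" key, load_model()) and the emit_status callback are assumptions inferred from the description.

```python
import threading

_model_load_lock = threading.Lock()  # one load at a time across sessions

def _maybe_load_expected_model(client, expected_model: str, emit_status) -> None:
    """If Lemonade has no text-generation model active, load one before
    process_query(); otherwise chat completions hang producing zero tokens."""
    def llm_active() -> bool:
        loaded = client.all_models_loaded()
        # Empty list or embedding-only both count as "no chat-capable model"
        return any(m.get("type") in ("llm", "vlm") for m in loaded)

    if llm_active():
        return  # fast path, no lock taken
    with _model_load_lock:
        if llm_active():  # re-check: another session may have loaded it meanwhile
            return
        emit_status("Loading LLM model...")
        client.load_model(expected_model)  # blocking load
```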
Cover the re-check inside the lock returning non-200 (load_model proceeds) and the re-check raising ConnectError (the outer handler catches it and emits a warning SSE). Brings the test count to 14.
@claude review this PR in detail. Identify any technical debt or bugs.
Claude finished @kovtcharov's task

PR Review: Technical Debt and Bug Analysis

Reviewing PR #617 in detail for technical debt and bugs...
Summary

This PR makes significant improvements to system prompt size (78% reduction), MCP runtime status visibility, and timeout handling. The implementation is generally solid with good test coverage. However, I've identified several issues ranging from minor to important.

Issues Found

🟡 Important Issues

1. Potential Race Condition in MCP Status Cache (
- Remove redundant `import shutil` in cli.py (already imported at top)
- Fix test_system_status_llm_health_fields_have_safe_defaults to pin LEMONADE_BASE_URL so it passes regardless of environment variables
- Reformat test_chat_preflight.py (black formatting)

https://claude.ai/code/session_01E8XUu1vYUvGs6wGYoDSUsy
- Use copy.deepcopy() in get_cached_mcp_status() to prevent callers from mutating cached dicts after the lock is released (race condition)
- Add type parameters to the _agent_cache and _mcp_status_cache annotations

https://claude.ai/code/session_01E8XUu1vYUvGs6wGYoDSUsy
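The cache fix in miniature. The module-level names mirror the description; the cache shape is a sketch.

```python
import copy
import threading

_mcp_status_lock = threading.Lock()
_mcp_status_cache: dict[str, list[dict]] = {}

def get_cached_mcp_status() -> dict[str, list[dict]]:
    # Deep-copy under the lock: callers get an independent snapshot and
    # cannot mutate the shared cache after the lock is released.
    with _mcp_status_lock:
        return copy.deepcopy(_mcp_status_cache)
```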
## Summary

Release v0.17.0 — **GAIA Agent UI**, eval benchmark framework, tool execution guardrails, system prompt optimization, and security hardening.

### Files Changed

- **`docs/releases/v0.17.0.mdx`** — Comprehensive release notes (new file)
- **`docs/docs.json`** — Added `releases/v0.17.0` to Releases tab, updated navbar to `v0.17.0 · Lemonade 10.0.0`
- **`src/gaia/version.py`** — Already at `0.17.0` on main (no change needed)

### Release Highlights

**New Features:**
- **GAIA Agent UI** — Full-stack privacy-first desktop chat with streaming responses, 53+ format document Q&A, ngrok tunnel for mobile, page-level citations, session management (PR #428)
- **Agent UI Eval Framework** — `gaia eval agent` command with 7-dimension weighted scoring across 34 scenarios, redesigned Settings modal, `<think>` block display, performance stats (PR #607)
- **Tool Execution Guardrails** — Blocking confirmation popup (Allow/Deny/Always Allow) before write/shell tools, 60s timeout (PR #565, #604)
- **Device Support Detection** — AMD Ryzen AI Max + Radeon ≥24GB detection, `--base-url` remote bypass, `GAIA_SKIP_DEVICE_CHECK` override (PR #593)
- **Terminal UI Design** — Typewriter welcome page, pixelated AMD cursor, glassmorphism, `prefers-reduced-motion` support (PR #568)

**Performance:**
- **78% System Prompt Reduction** — 17,600 → 3,853 tokens via two-tier RAG gating, 600s chat timeout, MCP runtime status display (PR #617)

**Security:**
- **TOCTOU Race Condition** — Atomic `O_NOFOLLOW` + `fstat` fix in document upload, per-file `asyncio.Lock` (PR #564)

**Bug Fixes:**
- LRU eviction silent failure + new `--max-indexed-files`/`--max-total-chunks` CLI flags (PR #567)
- Lemonade v10 device key renames: `npu` → `amd_npu`, `gpu` → `amd_igpu`/`amd_dgpu` (PR #548)
- Agent UI rendering, Windows paths, JSON safety regex, RAG indexing guards (PR #566, #604, #605)
- Restored accidentally reverted changes from PRs #564, #565, #568 (PR #608)

### Post-Merge

After merging, tag and push:

```bash
git checkout main && git pull
git tag v0.17.0 && git push origin v0.17.0
```

CI runs `validate-release` → `publish-release`. PyPI is gated on Kalin approval.

## Test plan

- [ ] `docs.json` is valid JSON and renders on Mintlify
- [ ] `validate_release_notes.py` passes for v0.17.0
- [ ] `version.py` reads `0.17.0`
- [ ] Release notes content matches actual PR changes
Summary

- Fix `MCPTool.get()` AttributeError in tools endpoint

Changes

System prompt optimization (`perf: reduce system prompt...`)

- `_get_system_prompt()` reduced from 733 → 280 lines using two-tier RAG gating
- Tier 2 rules gated on `has_indexed`: FACTUAL ACCURACY, DOCUMENT SILENCE, POST-INDEX QUERY, etc.
- Removed: `<<<<<<< HEAD` merge marker, hardcoded AVAILABLE TOOLS REFERENCE, both UNSUPPORTED FEATURES blocks, duplicate rules

MCP runtime status (`feat: MCP runtime status in Agent UI`)

- `MCPClientManager._failed`: tracks connection errors from `load_from_config()`
- `MCPClientManager.get_status_report()`: returns `{name, connected, tool_count, error}` for all servers
- `GET /api/mcp/status`: REST endpoint returning cached runtime status
- `mcp_status` event emitted after each agent setup
- Tests cover `_failed` tracking and `get_status_report()`

Bug fixes

- `list_mcp_server_tools` was calling `.get()` on the `MCPTool` dataclass (pre-existing AttributeError)
- `remove_server()` now clears `_failed` to prevent ghost entries in the status report
- `has_docs` → `has_indexed` for Tier 2 gating (library docs aren't indexed yet)

Tests

- `MCPClientManager._failed` tracking and `get_status_report()`

Test plan

- `uv run python -m pytest tests/unit/mcp/ tests/test_chat_agent.py -x -q` — 136 pass, 1 pre-existing network failure
- `gaia chat --ui` → send "hi" with no documents → responds without timeout (was timing out with Qwen3.5-35B)
- `GET /api/mcp/status` returns `{"servers": []}` before any chat, populated after

Closes #609