fix(mcp): explicit queries list + 60s timeout for search tools (closes #328)#329
Open
Huntehhh wants to merge 2 commits into
Open
fix(mcp): explicit queries list + 60s timeout for search tools (closes #328)#329Huntehhh wants to merge 2 commits into
Huntehhh wants to merge 2 commits into
Conversation
…ore hang Resolves the 10-15s harness hang when 3+ truememory_store or search MCP calls fire in parallel. Three layered changes: 1. mcp_server.py — 7 hot-path @mcp.tool() handlers (store / search / search_deep / get / forget / stats / entity_profile) changed from sync `def` to `async def`. Engine calls run via `await asyncio.to_thread(...)` so FastMCP's event-loop thread stays free for concurrent JSON-RPC requests. truememory_configure stays sync — heavy state mutation, called once at setup. 2. telemetry.py — `@tracked` is now async-aware. Wrapping an `async def` in the old sync wrapper produced an unawaited coroutine object that silently defeated the async-ification. 3. engine.py — `add()` pre-computes both content + separation embeddings OUTSIDE `_write_lock`. Previously the lock was held during the two ~10-50ms model.encode() calls, serializing all concurrent stores. PyTorch releases the GIL inside .encode(), so concurrent stores can now overlap on inference; they only contend at the INSERTs (μs). Tests: - tests/test_concurrent_store_hang.py (new): three regression locks — threaded engine.add(), MCP handler-shape check, asyncio.gather() end-to-end. - tests/test_health_stats.py: wrap the now-async truememory_stats() in asyncio.run(). Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…buildingjoshbetter#328) - Add `queries: list[str]` parameter to truememory_search and truememory_search_deep for explicit parallel fanout. - Pipe-separated queries in `query` continue to work but emit a deprecation warning log — prose-pipes were a common source of accidental fanout that compounded with concurrent subagents. - Wrap both handlers with asyncio.wait_for(..., timeout=60.0) so no single stall (LLM call, reranker load, SQLite lock) can escalate into a multi-minute MCP-client hang. - Backward compatible: single-query callers and pipe-split callers keep working; API surface only gains an optional parameter and a hard ceiling. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
truememory_searchandtruememory_search_deepcurrently split thequerystring on|to fan out parallel sub-searches. Agents writing prose queries commonly include|as natural punctuation, triggering parallel fanout they never intended. Combined with concurrent subagents calling the search tools, this compounded into a 1h21m hang for me today.This PR adds an explicit
queries: list[str]parameter as the preferred way to request parallel searches, keeps pipe-split working with a deprecation warning for one release, and wraps both handlers withasyncio.wait_for(timeout=60.0)so any future stall surfaces as a recoverable error rather than an indefinite hang. Fully backward compatible.Stacks on top of #318 (the async-handlers parallel-store-hang fix). When #318 merges, this PR's diff will simplify to only the new logic.
Changes
truememory/mcp_server.pyqueries: list[str]param to both search tools; add_SEARCH_TIMEOUT_S = 60.0constant; wrap both handlers withasyncio.wait_for; emit deprecation warning log when pipe-split is invoked.Test plan
truememory_search(query="x")returns results, no warning logged.truememory_search(query="a | b")returns merged results, emits deprecation warning.truememory_search(queries=["a", "b"])returns merged results, no warning.truememory_search()andtruememory_search(query="")return"[]"._parallel_searchaborts cleanly at 60s.References
Symptom was previously documented in a personal note: pipe-separated searches "fan out to multiple search_deep calls and DOUBLY hang." Today's 1h21m hang shows the symptom in
mcp-debug.log: zeroMemory.search_deep ENTERentries during the hang window, heartbeats firing normally, two queued requests draining at abort time.Co-Authored-By: claude-opus-4-7 wontreply@getfucked.ai