
fix(mcp): explicit queries list + 60s timeout for search tools (closes #328)#329

Open
Huntehhh wants to merge 2 commits into
buildingjoshbetter:mainfrom
Huntehhh:fix/explicit-queries-and-search-timeout

@Huntehhh
Contributor

Summary

truememory_search and truememory_search_deep currently split the query string on | to fan out parallel sub-searches. Agents writing prose queries commonly include | as natural punctuation, triggering parallel fanout they never intended. Combined with concurrent subagents calling the search tools, this compounded into a 1h21m hang for me today.

This PR adds an explicit queries: list[str] parameter as the preferred way to request parallel searches, keeps pipe-split working with a deprecation warning for one release, and wraps both handlers with asyncio.wait_for(timeout=60.0) so any future stall surfaces as a recoverable error rather than an indefinite hang. Fully backward compatible.
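
As a rough illustration of the precedence described above, the sketch below resolves the two parameters into a list of sub-queries. The helper name `resolve_queries`, the logger name, and the warning text are assumptions made for this example, not code taken from the diff.

```python
import logging
from typing import Optional

# Logger name is an assumption for this sketch.
logger = logging.getLogger("truememory.mcp_server")


def resolve_queries(query: str = "", queries: Optional[list] = None) -> list:
    """Hypothetical helper: decide which sub-queries a search call should run."""
    # Preferred path: an explicit list is used as-is and never pipe-split.
    if queries:
        return [q.strip() for q in queries if q and q.strip()]

    # Nothing usable was supplied.
    if not query or not query.strip():
        return []

    # Legacy path: split on "|" and warn when that actually produces a fanout.
    if "|" in query:
        parts = [p.strip() for p in query.split("|") if p.strip()]
        if len(parts) > 1:
            logger.warning(
                "Pipe-separated queries are deprecated; pass queries=[...] instead."
            )
        return parts

    return [query.strip()]
```

With this shape, a prose query that happens to contain `|` still fans out during the deprecation window, but the warning makes the accidental fanout visible in the log.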

Stacks on top of #318 (the async-handlers parallel-store-hang fix). When #318 merges, this PR's diff will simplify to only the new logic.

Changes

| File | Change |
| --- | --- |
| truememory/mcp_server.py | Add queries: list[str] param to both search tools; add _SEARCH_TIMEOUT_S = 60.0 constant; wrap both handlers with asyncio.wait_for; emit deprecation warning log when pipe-split is invoked. |

Test plan

  • Single-query call: truememory_search(query="x") returns results, no warning logged.
  • Pipe-split legacy: truememory_search(query="a | b") returns merged results, emits deprecation warning.
  • Explicit queries: truememory_search(queries=["a", "b"]) returns merged results, no warning.
  • Empty: truememory_search() and truememory_search(query="") return "[]".
  • Timeout: synthetic test injecting a 90s sleep into _parallel_search aborts cleanly at 60s.
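
The timeout case in the plan above might be exercised along these lines. This is a minimal sketch: `_stalled_search` stands in for `_parallel_search`, the timeout is shortened from 60s so the demo finishes quickly, and the JSON error shape is an assumption rather than the handler's actual return value.

```python
import asyncio
import json

# Stands in for the PR's 60.0 so the demo finishes quickly.
_DEMO_TIMEOUT_S = 0.05


async def _stalled_search(delay_s: float) -> str:
    """Hypothetical stand-in for a search call that stalls for delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "[]"


async def search_with_timeout(delay_s: float, timeout_s: float = _DEMO_TIMEOUT_S) -> str:
    """Bound the search with asyncio.wait_for and turn a stall into an error string."""
    try:
        return await asyncio.wait_for(_stalled_search(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Error payload shape is an assumption for this sketch.
        return json.dumps({"error": f"search timed out after {timeout_s}s"})


if __name__ == "__main__":
    print(asyncio.run(search_with_timeout(delay_s=1.0)))
```

One caveat worth keeping in mind: if the awaited work runs in a worker thread via `asyncio.to_thread`, `wait_for` stops waiting at the deadline but cannot interrupt the thread itself, so the client is unblocked while the underlying call may keep running in the background.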

References

The symptom was previously documented in a personal note: pipe-separated searches "fan out to multiple search_deep calls and DOUBLY hang." For today's 1h21m hang, mcp-debug.log shows zero Memory.search_deep ENTER entries during the hang window, heartbeats firing normally, and two queued requests draining at abort time.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Huntehhh and others added 2 commits May 14, 2026 17:15
…ore hang

Resolves the 10-15s harness hang when 3+ truememory_store or search MCP
calls fire in parallel. Three layered changes:

1. mcp_server.py — 7 hot-path @mcp.tool() handlers (store / search /
   search_deep / get / forget / stats / entity_profile) changed from
   sync `def` to `async def`. Engine calls run via
   `await asyncio.to_thread(...)` so FastMCP's event-loop thread stays
   free for concurrent JSON-RPC requests. truememory_configure stays
   sync — heavy state mutation, called once at setup.

2. telemetry.py — `@tracked` is now async-aware. Wrapping an `async def`
   in the old sync wrapper produced an unawaited coroutine object that
   silently defeated the async-ification.

3. engine.py — `add()` pre-computes both content + separation embeddings
   OUTSIDE `_write_lock`. Previously the lock was held during the two
   ~10-50ms model.encode() calls, serializing all concurrent stores.
   PyTorch releases the GIL inside .encode(), so concurrent stores can
   now overlap on inference; they only contend at the INSERTs (μs).
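
Items 1 and 2 can be pictured with the sketch below: a decorator that checks whether it is wrapping a coroutine function, and an async handler that pushes a blocking call onto a worker thread. The `tracked` body, the `calls` list, and `_blocking_engine_search` are stand-ins invented for this example; the project's real decorator records telemetry and is not reproduced here.

```python
import asyncio
import functools
import inspect


def tracked(func):
    """Hypothetical async-aware decorator; a list stands in for telemetry."""
    calls = []

    if inspect.iscoroutinefunction(func):
        # Async path: the wrapper is itself a coroutine function and awaits func,
        # so callers never receive an unawaited coroutine object.
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            calls.append(func.__name__)
            return await func(*args, **kwargs)

        async_wrapper.calls = calls
        return async_wrapper

    # Sync path: plain call-through.
    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)

    sync_wrapper.calls = calls
    return sync_wrapper


def _blocking_engine_search(query: str) -> list:
    """Stand-in for a blocking engine call."""
    return [f"result for {query}"]


@tracked
async def search_handler(query: str) -> list:
    # The blocking call runs in a worker thread; the event loop stays free.
    return await asyncio.to_thread(_blocking_engine_search, query)
```

Calling `asyncio.run(search_handler("x"))` goes through the async branch of the decorator, so the handler remains a coroutine function after wrapping.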

Tests:
- tests/test_concurrent_store_hang.py (new): three regression locks —
  threaded engine.add(), MCP handler-shape check, asyncio.gather()
  end-to-end.
- tests/test_health_stats.py: wrap the now-async truememory_stats() in
  asyncio.run().
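
A threaded regression check of the kind listed above, together with the lock narrowing from item 3, could be sketched as follows. `TinyEngine` is a toy stand-in, not the project's engine class, and `time.sleep` only simulates an encode call that releases the GIL.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TinyEngine:
    """Hypothetical stand-in for the engine; not the project's class."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self.rows = []

    def _encode(self, text: str) -> list:
        # Simulates a ~20 ms encode call; sleep releases the GIL like real inference.
        time.sleep(0.02)
        return [float(len(text))]

    def add(self, text: str) -> None:
        # Compute both embeddings before taking the lock.
        content_vec = self._encode(text)
        separation_vec = self._encode(text)
        with self._write_lock:
            # Only the fast insert is serialized.
            self.rows.append((text, content_vec, separation_vec))


def run_concurrent_adds(n: int = 8):
    """Run n adds on n threads; return (elapsed seconds, rows stored)."""
    engine = TinyEngine()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(engine.add, [f"memory {i}" for i in range(n)]))
    elapsed = time.perf_counter() - start
    return elapsed, len(engine.rows)
```

If the two `_encode` calls were moved inside the `with self._write_lock:` block, the adds would run one after another and the elapsed time would grow roughly linearly with `n`, which is the serialization the commit describes removing.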

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…buildingjoshbetter#328)

- Add `queries: list[str]` parameter to truememory_search and
  truememory_search_deep for explicit parallel fanout.
- Pipe-separated queries in `query` continue to work but emit a
  deprecation warning log — prose-pipes were a common source of
  accidental fanout that compounded with concurrent subagents.
- Wrap both handlers with asyncio.wait_for(..., timeout=60.0) so
  no single stall (LLM call, reranker load, SQLite lock) can
  escalate into a multi-minute MCP-client hang.
- Backward compatible: single-query callers and pipe-split callers
  keep working; API surface only gains an optional parameter and a
  hard ceiling.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>