fix(mcp): logging hygiene + Popen file-handle cleanup#349
Open
Huntehhh wants to merge 1 commit into
Open
Conversation
Two tiny additive cleanups in mcp_server.py. Neither changes
happy-path behaviour; both close gaps that bite on the failure path.
_parallel_search (line ~520)
- Per-query exceptions in the ThreadPoolExecutor batch were swallowed
with bare `except Exception: pass`. Per-query timeouts are part of
normal load behaviour so WARNING would be noisy, but zero log call
meant operators had no trace when "search quality dropped." Now
`log.debug("parallel search query failed: %s", e)` — recoverable
with one log-level bump.
_drain_batch_from_backlog (line ~1172-1183)
- The per-session `.log` file was opened, handed to Popen as stdout,
then closed AFTER Popen. On any Popen failure (resource error, fd
exhaustion, ASR block, child-binary missing), the close never ran
and the FD leaked. Wrapped the Popen call in try/finally so close
always runs. A `with open(...)` form would close too early — Popen
needs the FD inheritable through the child.
Tests (tests/test_mcp_logging_hygiene.py — 2 new):
- _parallel_search logs failures at DEBUG and still returns successful
sibling query results.
- _drain_batch_from_backlog closes the log file even when Popen
raises (simulated OSError EMFILE).
Coordination notes (multi-agent sweep):
- Cross-cuts agent-A's mcp_server.py:1-1180 region (line 521) and
agent-B's mcp_server.py:~1181 region (line 1172-1183). Both
additive — no merge conflict with buildingjoshbetter#344 or PR-2b.
- Sibling Popen pattern at session_start.py:217-227 has the same
leak but lives in PR-2b's territory (start_new_session at line 226
will be added there). Left for agent-B to fold in or split as they
prefer.
Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
This was referenced May 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two tiny additive cleanups in
mcp_server.py. Neither changes happy-path behaviour; both close gaps that bite on the failure path.Branched off
origin/maindirectly — no PR dependencies, ships standalone.What changed
_parallel_search— line ~520Per-query exceptions in the
ThreadPoolExecutorbatch were swallowed with a bareexcept Exception: pass. Per-query timeouts and recoverable failures are part of normal load behaviour, so WARNING-level logging would be noisy. But with zero log call at all, an operator triaging "search quality dropped" had no breadcrumb that any query had failed.Replaced the bare
passwithlog.debug("parallel search query failed: %s", e). Keeps the signal recoverable with one log-level bump (TRUEMEMORY_LOG_LEVEL=DEBUGor equivalent) without raising baseline noise._drain_batch_from_backlog— line ~1172-1183The per-session
.logfile was opened, handed tosubprocess.Popenasstdout, and then closed AFTER Popen returned. If Popen raised (resource error, fd exhaustion, ASR block, child-binary missing), theclose()never ran and the FD leaked on every spawn failure.Wrapped the Popen call in
try/finallyso close always runs. Awith open(...)form would close the file too early — Popen needs the FD inheritable through the child. Thetry/finallyshape preserves that contract.Tests
tests/test_mcp_logging_hygiene.py— 2 new, both passing:test_parallel_search_logs_query_failures_at_debug— one query raises, sibling query succeeds; verify the DEBUG log fires with the failure detail AND the successful sibling's results still come through the batch.test_backlog_drainer_closes_log_file_when_popen_raises—subprocess.Popenpatched to raiseOSError(EMFILE); verify every opened.logfile is closed afterward.The second test injects a fake
_sharedmodule intosys.modulesto dodge the pre-existingimport fcntlWindows-only bug that agent-A's #345 fixes — keeps the test cross-platform without waiting on #345.Test plan
pytest tests/test_mcp_logging_hygiene.py -v— both pass on Linux / macOS / WindowsTRUEMEMORY_LOG_LEVEL=DEBUG(or stdliblogging.basicConfig(level=DEBUG)) and trigger a query failure — verify the DEBUG line appearslsof/Process Exploreragainst the MCP server process under a simulated Popen failure (e.g.ulimit -n 1on Linux) — verify no.logfile handle persists after the drainer iterationCoordination context (multi-agent sweep)
This PR cross-cuts two other agents' file ownership regions, both purely additively:
mcp_server.py:1-1180region (covered by fix(mcp): cold-start resilience — async handlers, reranker timeout, Windows portability #344's async-handler conversion). Adding alog.debugcall is additive — no behaviour change, no merge conflict with fix(mcp): cold-start resilience — async handlers, reranker timeout, Windows portability #344.start_new_sessionplatform branch at the same Popen call). Thetry/finallywrapper here is additive too — PR-2b'sstart_new_sessionchange goes inside thetryblock without conflict.A sibling Popen pattern at
truememory/ingest/hooks/session_start.py:217-227has the same leak. It lives in PR-2b's territory (line 226 is where thestart_new_sessionWindows branch goes), so I left it for agent-B to fold into PR-2b or split as they prefer.Coordination details are tracked in the multi-agent registry (private; ping
agent-Cfor context).Merge ordering
Order-independent. Additive cross-cut into agent-A's
mcp_server.py:1-1180region (line 521_parallel_searchper-query failure →log.debug) and agent-B'smcp_server.py:~1181region (lines 1172-1183 try/finally wrap around the backlog drainer Popen +_log_filecleanup). Pure additions — zero semantic conflict with #344 (async-handler conversion) or #351 (subprocess portability).Blocks: none.
Recommended sequence position: late in the train (after #344 / #351 land so the rebase, if any, is trivial). Can also ship anywhere — no hard constraint either direction.
Sibling pattern:
truememory/ingest/hooks/session_start.py:217-227has the same Popen+log_file leak. That file lives in #351's region (line 226 =start_new_sessionWindows branch), so left for agent-B to fold into #351 or split as preferred.