Skip to content

fix(mcp): logging hygiene + Popen file-handle cleanup#349

Open
Huntehhh wants to merge 1 commit into
buildingjoshbetter:mainfrom
Huntehhh:fix/mcp-logging-popen-cleanup
Open

fix(mcp): logging hygiene + Popen file-handle cleanup#349
Huntehhh wants to merge 1 commit into
buildingjoshbetter:mainfrom
Huntehhh:fix/mcp-logging-popen-cleanup

Conversation

@Huntehhh
Copy link
Copy Markdown
Contributor

@Huntehhh Huntehhh commented May 17, 2026

Summary

Two tiny additive cleanups in mcp_server.py. Neither changes happy-path behaviour; both close gaps that bite on the failure path.

Branched off origin/main directly — no PR dependencies, ships standalone.

What changed

_parallel_search — line ~520

Per-query exceptions in the ThreadPoolExecutor batch were swallowed with a bare except Exception: pass. Per-query timeouts and recoverable failures are part of normal load behaviour, so WARNING-level logging would be noisy. But with zero log call at all, an operator triaging "search quality dropped" had no breadcrumb that any query had failed.

Replaced the bare pass with log.debug("parallel search query failed: %s", e). Keeps the signal recoverable with one log-level bump (TRUEMEMORY_LOG_LEVEL=DEBUG or equivalent) without raising baseline noise.

_drain_batch_from_backlog — line ~1172-1183

The per-session .log file was opened, handed to subprocess.Popen as stdout, and then closed AFTER Popen returned. If Popen raised (resource error, fd exhaustion, ASR block, child-binary missing), the close() never ran and the FD leaked on every spawn failure.

Wrapped the Popen call in try / finally so close always runs. A with open(...) form would close the file too early — Popen needs the FD inheritable through the child. The try / finally shape preserves that contract.

Tests

tests/test_mcp_logging_hygiene.py — 2 new, both passing:

  • test_parallel_search_logs_query_failures_at_debug — one query raises, sibling query succeeds; verify the DEBUG log fires with the failure detail AND the successful sibling's results still come through the batch.
  • test_backlog_drainer_closes_log_file_when_popen_raisessubprocess.Popen patched to raise OSError(EMFILE); verify every opened .log file is closed afterward.

The second test injects a fake _shared module into sys.modules to dodge the pre-existing import fcntl Windows-only bug that agent-A's #345 fixes — keeps the test cross-platform without waiting on #345.

Test plan

  • pytest tests/test_mcp_logging_hygiene.py -v — both pass on Linux / macOS / Windows
  • Run with TRUEMEMORY_LOG_LEVEL=DEBUG (or stdlib logging.basicConfig(level=DEBUG)) and trigger a query failure — verify the DEBUG line appears
  • lsof / Process Explorer against the MCP server process under a simulated Popen failure (e.g. ulimit -n 1 on Linux) — verify no .log file handle persists after the drainer iteration

Coordination context (multi-agent sweep)

This PR cross-cuts two other agents' file ownership regions, both purely additively:

A sibling Popen pattern at truememory/ingest/hooks/session_start.py:217-227 has the same leak. It lives in PR-2b's territory (line 226 is where the start_new_session Windows branch goes), so I left it for agent-B to fold into PR-2b or split as they prefer.

Coordination details are tracked in the multi-agent registry (private; ping agent-C for context).

Merge ordering

Order-independent. Additive cross-cut into agent-A's mcp_server.py:1-1180 region (line 521 _parallel_search per-query failure → log.debug) and agent-B's mcp_server.py:~1181 region (lines 1172-1183 try/finally wrap around the backlog drainer Popen + _log_file cleanup). Pure additions — zero semantic conflict with #344 (async-handler conversion) or #351 (subprocess portability).

Blocks: none.

Recommended sequence position: late in the train (after #344 / #351 land so the rebase, if any, is trivial). Can also ship anywhere — no hard constraint either direction.

Sibling pattern: truememory/ingest/hooks/session_start.py:217-227 has the same Popen+log_file leak. That file lives in #351's region (line 226 = start_new_session Windows branch), so left for agent-B to fold into #351 or split as preferred.

Two tiny additive cleanups in mcp_server.py. Neither changes
happy-path behaviour; both close gaps that bite on the failure path.

_parallel_search (line ~520)
- Per-query exceptions in the ThreadPoolExecutor batch were swallowed
  with bare `except Exception: pass`. Per-query timeouts are part of
  normal load behaviour so WARNING would be noisy, but zero log call
  meant operators had no trace when "search quality dropped." Now
  `log.debug("parallel search query failed: %s", e)` — recoverable
  with one log-level bump.

_drain_batch_from_backlog (line ~1172-1183)
- The per-session `.log` file was opened, handed to Popen as stdout,
  then closed AFTER Popen. On any Popen failure (resource error, fd
  exhaustion, ASR block, child-binary missing), the close never ran
  and the FD leaked. Wrapped the Popen call in try/finally so close
  always runs. A `with open(...)` form would close too early — Popen
  needs the FD inheritable through the child.

Tests (tests/test_mcp_logging_hygiene.py — 2 new):
- _parallel_search logs failures at DEBUG and still returns successful
  sibling query results.
- _drain_batch_from_backlog closes the log file even when Popen
  raises (simulated OSError EMFILE).

Coordination notes (multi-agent sweep):
- Cross-cuts agent-A's mcp_server.py:1-1180 region (line 521) and
  agent-B's mcp_server.py:~1181 region (line 1172-1183). Both
  additive — no merge conflict with buildingjoshbetter#344 or PR-2b.
- Sibling Popen pattern at session_start.py:217-227 has the same
  leak but lives in PR-2b's territory (start_new_session at line 226
  will be added there). Left for agent-B to fold in or split as they
  prefer.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant