Add mcplex doctor / healthcheck subcommand for liveness probing #13

@doobidoo

Problem

When MCPlex is deployed as a long-running service (launchd, systemd, Docker, etc.) and the process dies or is accidentally unloaded, downstream clients (Claude Code bridge, Claude Desktop, custom agents) see their connections fail with no easy way to tell which of the following is the cause:

  • Gateway process not running
  • Port collision / wrong bind address
  • Config parse error
  • One or more backing MCP servers failing handshake

In my case, launchd had silently unloaded com.mcplex.gateway.local. The downstream Hermes agent just showed mcplex-gateway [http]: failed with no actionable info, and the stale error log (see separate issue) pointed at a TOML error that had been fixed days earlier.

Proposed

A mcplex doctor (or mcplex health, mcplex status) CLI subcommand that:

  1. Reads the config and reports listen address + configured servers
  2. Probes the configured listen address (e.g. http://127.0.0.1:3100/mcp) for an MCP initialize response
  3. If reachable: prints per-server handshake status (similar to the startup log summary), tool/resource counts, and uptime
  4. If unreachable: prints the probe error and a hint (is the gateway running? check launchd/systemd/docker status)
  5. Exits non-zero on any failure so it can be wired into external monitors (launchd WatchPaths, cron, CI smoke tests, Hermes pre-start checks)
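
To make steps 2, 4, and 5 concrete, here is a rough sketch of the probe in Python (stdlib only). It is not MCPlex code. It assumes the gateway accepts a JSON-RPC `initialize` POST at the URL from the example above, that `2025-03-26` is an acceptable protocol revision, and that any HTTP 2xx reply counts as "reachable"; the client name `mcplex-doctor` is made up for illustration.

```python
import http.client
import json
import sys
import urllib.error
import urllib.request

# Assumed default, taken from the example URL in this issue.
DEFAULT_URL = "http://127.0.0.1:3100/mcp"


def build_initialize_request(request_id=1):
    """Build a JSON-RPC 2.0 `initialize` request in the shape the MCP spec describes."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: a revision the gateway speaks
            "capabilities": {},
            "clientInfo": {"name": "mcplex-doctor", "version": "0.0.0"},  # hypothetical
        },
    }


def probe(url=DEFAULT_URL, timeout=3.0):
    """Return (ok, detail); ok is True only if the gateway answered with HTTP 2xx."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_initialize_request()).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
    # Ignore proxy settings from the environment; this is a local liveness probe.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # Something is listening on the port but did not accept the request.
        return False, f"HTTP {exc.code}: listener rejected initialize"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, timeout, DNS failure, malformed reply, and similar.
        return False, str(getattr(exc, "reason", exc))


if __name__ == "__main__":
    ok, detail = probe(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_URL)
    print(f"Gateway: {'OK' if ok else 'UNREACHABLE'} ({detail})")
    sys.exit(0 if ok else 1)  # non-zero exit so external monitors can react
```

A real implementation would also parse the `initialize` result to report per-server status (step 3); the sketch stops at reachability.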

Example output:

$ mcplex doctor --config ~/.config/mcplex/macmini.toml
Gateway: http://127.0.0.1:3100  [OK, uptime 2h 14m]
Dashboard: http://127.0.0.1:9090  [OK]
Servers (7/7 connected):
  memory              [OK]  16 tools  3 resources  5 prompts
  context-provider    [OK]  10 tools
  github              [OK]  26 tools
  applescript         [OK]   1 tool
  desktop-commander   [OK]  26 tools  2 resources
  constrictor         [OK]  13 tools
  telegram            [OK]   9 tools
Router: Semantic (MetaTool, top_k=5)
Cache: enabled (TTL 300s)
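
For what it's worth, the "Servers" block and the overall exit status could both be derived from one list of per-server results. A minimal sketch, assuming a hypothetical `ServerStatus` record and a `[FAILED]` tag for servers that did not complete the handshake (neither exists in MCPlex as far as this issue knows):

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    """Hypothetical per-server handshake result."""
    name: str
    connected: bool
    tools: int = 0
    resources: int = 0
    prompts: int = 0


def plural(count, noun):
    """Format '1 tool' vs '16 tools'."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


def render_servers(servers):
    """Return (report text, exit code); exit code is 1 if any server is down."""
    connected = sum(1 for s in servers if s.connected)
    width = max((len(s.name) for s in servers), default=0)
    lines = [f"Servers ({connected}/{len(servers)} connected):"]
    for s in servers:
        tag = "[OK]" if s.connected else "[FAILED]"  # assumed failure label
        counts = [
            plural(n, noun)
            for n, noun in ((s.tools, "tool"), (s.resources, "resource"), (s.prompts, "prompt"))
            if n  # omit zero counts, as in the example output
        ]
        lines.append(f"  {s.name.ljust(width)}  {tag}  {'  '.join(counts)}".rstrip())
    exit_code = 0 if connected == len(servers) else 1
    return "\n".join(lines), exit_code
```

Deriving the exit code from the same data as the report keeps the two from drifting apart.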

And on failure:

$ mcplex doctor
Gateway: http://127.0.0.1:3100  [UNREACHABLE]
  → connection refused; no process listening on 3100
  → hint: launchctl list | grep mcplex  (or: systemctl status mcplex)
exit 1

Why it matters

  • Turns silent failures into actionable output
  • Enables external wrappers (agents, orchestrators, CI) to gate behavior on gateway health
  • Complements --check (which only validates the config without running the gateway) with a runtime-health counterpart
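
As an illustration of the gating use case, a wrapper only needs the exit code. A sketch in Python, assuming the proposed `mcplex doctor` subcommand exists and follows the exit-code convention described above (it does not exist today; the command is passed in as a parameter so any health command can be substituted):

```python
import subprocess
import sys

# Hypothetical: `mcplex doctor` is the subcommand proposed in this issue.
DOCTOR_COMMAND = ["mcplex", "doctor"]


def gateway_is_healthy(command=DOCTOR_COMMAND, timeout=10.0):
    """Run the health command; return (healthy, combined stdout/stderr)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"{command[0]}: command not found"
    except subprocess.TimeoutExpired:
        return False, f"health check timed out after {timeout}s"
    # Exit status 0 means healthy; anything else is treated as a failure.
    return result.returncode == 0, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    healthy, report = gateway_is_healthy()
    if not healthy:
        print(report, file=sys.stderr)
        sys.exit(1)  # refuse to start the agent against a dead gateway
    print("gateway healthy; continuing startup")
```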

Happy to contribute a PR if this direction looks right.
