Skip to content

fix(DDE-365): security hardening — leak #1–#17#7

Open
nc-markovic wants to merge 21 commits into
mainfrom
fix/DDE-365-security-hardening
Open

fix(DDE-365): security hardening — leak #1–#17#7
nc-markovic wants to merge 21 commits into
mainfrom
fix/DDE-365-security-hardening

Conversation

@nc-markovic

Copy link
Copy Markdown
Collaborator

Summary

Security-hardening series covering 17 leaks identified during a code audit
(see plan/mcp-cert-auth-implementation.md for the full plan). Each leak is
a discrete commit on this branch so that git log --grep "leak #" and
git bisect work cleanly. Numbering follows commit-count convention on the
branch — leak #11/#12/#13 were folded into earlier fixes or retired
during the audit and intentionally skipped to keep numbering aligned with
the branch's commit count.

Leaks shipped

# Title Commit
1 sanitize URLs + reject embedded credentials in host a9a1a59
1b sanitize URLs in error messages f6f12db
2 redact CLI params before logging session init f586da6
3 strip Authorization on cross-origin redirect 18d4780
4 summarize AEM response bodies in error details 11e4964
5 deduplicate concurrent OAuth token mints (single-flight) fe6ad4e
6 use fresh AbortSignal on 401 retry 77427a6
7 skip 401 retry on permission errors 016776b
8 encode Basic auth credentials as Latin-1 ee30674
9 skip 401 retry for Basic auth 6c2c571
10 allow IMS endpoint override via AEM_IMS_URL a647b0c
14 reject expires_in <= 60 to prevent IMS storm c1dcc36
15 validate Origin header (allowlist) — prevent DNS rebinding 3216fc5
16 bind to loopback by default — prevent LAN exposure be4cb2e
17 graceful drain on SIGINT/SIGTERM + fatal-error fallbacks d5eab21

Two docs: commits codify project-level workflow rules in CLAUDE.md
(Testing workflow and Clarify before deciding).

Highlights

  • DNS-rebinding hardening (#15): cors({ origin: '*' }) replaced with a
    narrow allowlist (Inspector :6274/:6277 on localhost and 127.0.0.1).
    Extra origins via --allow-origin flag or MCP_ALLOWED_ORIGINS env.
    Disallowed Origin → HTTP 403 + JSON-RPC -32600 body.
  • LAN exposure (#16): app.listen(port) defaulted to 0.0.0.0
    anyone on the same WiFi could drive every tool. Now defaults to
    127.0.0.1. --bind 0.0.0.0 / MCP_BIND=0.0.0.0 for explicit opt-in.
  • Graceful drain (#17): SIGINT/SIGTERM now run a drain sequence with a
    60s default budget (--shutdown-drain-seconds / MCP_SHUTDOWN_DRAIN_SECONDS)
    so multi-minute bulk operations finish cleanly. Exit code is 0 on clean
    drain, 1 on forced timeout so orchestrators can alert on stuck handlers.
    Adds uncaughtException / unhandledRejection fallbacks (sync-only
    cleanup + exit 1).
  • OAuth resilience (Release merge #5, #14): Single-flight inflightToken dedupes
    concurrent 401-retry refreshes (5 parallel callers → 1 IMS call).
    expires_in <= 60 guard rejects unusable tokens that would force a
    per-request IMS round-trip.

Test plan

No test runner is configured — each leak ships with an ad-hoc Node harness
under src/test/ (gitignored) that exercises the affected code path. For
the four hardening leaks merged in this push:

  • npm run build — typecheck clean
  • node src/test/test-leak-14.mjs — 12 checks (expires_in guard, leak Release merge #5/feat(DDE-365): stdio transport — subprocess mode for Claude Desktop / Cursor / VS Code #9 regression)
  • node src/test/test-leak-15.mjs — 13 checks (Origin allowlist + preflight + CORS headers)
  • node src/test/test-leak-16.mjs — 10 checks (loopback default, --bind, env, CLI overrides env)
  • node src/test/test-leak-17.mjs — 19 checks (SIGINT/SIGTERM drain, deadline timeout, fatal handlers)
  • node src/test/test-leak-17-real-aem.mjs — 9 checks (drain against real AEM at localhost:4502)
  • Inspector UI smoke test against 127.0.0.1:8502/mcp (manual — recommended after merge)

Out of scope

Cert-auth (mTLS) is not in this PR — it lands as feat #1#8 on a
separate branch (feat/DDE-XXX-cert-auth-mtls) from a hardened main
after this merges. See plan/mcp-cert-auth-implementation.md §
"Cert-Auth Feature".

Zoran Markovic and others added 21 commits June 10, 2026 12:10
Guard the OAuth token mint path in AEMFetch.getAuthToken: when IMS returns
expires_in <= 60 (or NaN/undefined), tokenExpiry would be placed at-or-before
'now', forcing an IMS round-trip on every request. Single-flight dedups
concurrent callers within a tick but cannot prevent the per-call burn.
Throw a readable error instead of accepting the unusable token.
Replace cors({ origin: '*' }) with a narrow allowlist enforced ahead of the
cors middleware. Defaults cover the MCP Inspector (UI :6274, proxy :6277) on
both localhost and 127.0.0.1; extra origins via repeatable --allow-origin CLI
flag or comma-separated MCP_ALLOWED_ORIGINS env var. Disallowed Origin returns
HTTP 403 + JSON-RPC -32600 body. Requests with no Origin (CLI curl) pass
through unchanged. Access-Control-Expose-Headers: Mcp-Session-Id preserved.
Without an explicit bind argument, Express's `app.listen(port, …)` binds
0.0.0.0, exposing /mcp to anyone on the same LAN (cafe WiFi, office, hotel)
who can then drive every tool with whatever AEM credentials the server was
launched with. Default to 127.0.0.1; add a `--bind` CLI flag and `MCP_BIND`
env var for explicit all-interfaces opt-in. CLI flag overrides env. No auth
flow or MCP protocol surface changed.
Replace the unconditional process.exit(0) on SIGINT with a drain sequence:
server.close() to stop accepting new connections, closeIdleConnections() to
drop keep-alive sockets, transports[].close() for MCP session teardown, then
wait up to --shutdown-drain-seconds (env: MCP_SHUTDOWN_DRAIN_SECONDS, default
60s) for in-flight tool calls to finish. Worst-case bulk operations
(bulkUpdateComponents) can hold for minutes — a shorter window would still
abandon work. Wire the same handler to SIGTERM.

Exit code semantics: clean drain -> 0; forced timeout -> 1 so orchestrators
(Kubernetes, systemd) can distinguish "everything finished" from "had to
abandon pending work" and surface alerts on the latter. A "0" on timeout
would hide hostile keep-opens or stuck handlers from operators.

Add uncaughtException / unhandledRejection fallbacks: log to stderr with the
"[fatal]" prefix, sync-close transports (process state is undefined per Node
docs — no async drain), exit 1.
Two workflow rules added to make Claude's behaviour in this repo predictable
without needing to repeat the instructions every session:

- `Clarify before deciding` — before any non-trivial decision (credentials,
  destructive vs read-only, scope, approach against real environments), ask
  the user first with concrete options. Do not infer or default-to-
  destructive just because a plan lists a destructive path.

- `Testing workflow` — for items with a `How to Test` block, run every
  terminal-driven test yourself (ad-hoc Node harnesses under src/test/,
  curl against a locally-started server, port checks). Show results and only
  delegate to the user for tests that genuinely cannot run from a session
  (browser UIs, real cloud tenants, multi-machine signals) — with paste-
  ready commands when handing off. Typecheck is necessary but not sufficient.
…init

Replace the 400 Bad Request on unknown Mcp-Session-Id with 404 Not Found
per the MCP StreamableHTTP specification §6.3. Clients that send a stale
session ID (e.g. after a server restart) must receive 404 so that
inspection tooling like the MCP Inspector can detect the lost session and
trigger its automatic reinitialization loop. A 400 leaves the Inspector
stuck on a generic error with no recovery path, requiring a manual
reconnect.

Also tighten the JSON-RPC error code from -32000 (generic server error)
to -32001 (session not found) so programmatic clients can distinguish a
bad-session rejection from an internal server fault.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant