Skip to content

feat(mcp): W2.4 — per-tool response shaping (redact, cap, sample)#30

Merged
jrosskopf merged 1 commit into
mainfrom
feature/gh-24-mcp-response-shaping
May 16, 2026
Merged

feat(mcp): W2.4 — per-tool response shaping (redact, cap, sample)#30
jrosskopf merged 1 commit into
mainfrom
feature/gh-24-mcp-response-shaping

Conversation

@jrosskopf
Copy link
Copy Markdown
Contributor

Summary

  • Adds opt-in mcp-tool.response config with three independent knobs:
    • redact-columns: [ssn, salary] replaces values in every row with "<redacted>".
    • max-rows: 1000 caps the result-set length.
    • sample: true collapses the response to { row_count, columns, sampled: true }, no rows.
  • Closes W2.4 of the security roadmap (Security Wave 2: MCP hardening (per-tool RBAC, dry-run, response shaping, description hygiene) #24). With this, the vendor email's full "response inspection" pitch is covered in-product.

Test plan

  • 10 Catch2 unit cases in test/cpp/mcp_response_shaper_test.cpp exercise every combination — no-op, max-rows alone, redact alone, both composed, sample wins, non-array passthrough, empty array, zero cap, missing redact column.
  • 3 additional parser cases prove the default tool config is inert, the full response block round-trips, and a sample-only block parses cleanly.
  • ctest -R "MCPResponseShaper|response_shape|Parse MCP Tool" — 13/13 pass.
  • 3 end-to-end Python cases in test/integration/test_mcp_response_shaping.py boot a real flapi server with three shaped tools and verify redact, max-rows, and sample independently against a deterministic in-memory VALUES result. They skip cleanly in environments with the existing DuckDB v1.5.1/v1.5.2 extension-cache mismatch; CI exercises them against fresh extensions.
  • Reviewer: confirm the redaction sentinel string "<redacted>" is acceptable, or pick a different default.

Design notes

  • MCPResponseShaper is a pure single-responsibility transformer: (json_string, ResponseShape) → json_string. No QueryResult, no ConfigManager, no handler internals — easiest possible test surface.
  • ResponseShape::isNoOp() short-circuits the shape call when nothing is configured, so the path is free for tools without the block.
  • sample: true deliberately wins over the other knobs — sample mode never emits row data, so combining it with redact_columns or max_rows is structurally inert. The test pins this contract.
  • Non-array payloads (e.g., the dry-run JSON from feat(mcp): W2.2 — shadow / dry-run mode for tool calls #29) pass through unchanged; we don't impose row-shape semantics on objects we don't understand.

Closes part of #24
Refs #21

Adds opt-in response shaping for MCP tool results. Three independent
knobs under `mcp-tool.response`:

  mcp-tool:
    name: customer_lookup
    response:
      max-rows: 1000          # cap result-set length
      redact-columns: [ssn]   # mask listed columns with "<redacted>"
      sample: true            # return only row_count + column names

Defaults are all inert, so existing tools see no behaviour change. The
shaper runs in MCPToolHandler::executeTool on the read path, after
formatResult and before the JSON envelope leaves the server. Write
tools are unaffected (response shape applies to SELECT results).

Implementation:

- New MCPResponseShaper class — pure transformer with one
  responsibility: take a JSON string and a ResponseShape config,
  return a shaped JSON string. No QueryResult / ConfigManager / handler
  internals on the dependency graph; trivially unit-tested.
- MCPToolInfo gains a nested `ResponseShape` struct with `max_rows`
  (optional), `redact_columns`, `sample`. `isNoOp()` short-circuits
  the shape call when nothing is configured.
- endpoint_config_parser parses `mcp-tool.response.{max-rows,
  redact-columns, sample}` from YAML. Each field is independent.
- MCPToolHandler builds the shape config from MCPToolInfo and runs
  the shaper; emits `response_shaped: true` in tool metadata when
  shaping fires.

Semantics:

- `sample: true` wins over the other knobs and emits a summary object
  with `row_count`, `columns: [...]`, `sampled: true` — no row data.
- Otherwise `redact_columns` is applied first (replaces the listed
  values in every row with the literal string `"<redacted>"`),
  then `max_rows` truncates the resulting array.
- Non-array payloads (e.g., dry-run results) pass through unchanged.
- `max_rows: 0` is a legitimate "suppress everything" choice.
- Missing redact columns are tolerated as a no-op.

Tests:

- test/cpp/mcp_response_shaper_test.cpp: 10 Catch2 cases covering
  every combination — no-op, max-rows alone, redact alone, both
  composed, sample wins, non-array passthrough, empty array, zero
  cap, missing redact column.
- test/cpp/endpoint_config_parser_test.cpp: three additional cases
  proving the default tool config is inert, the full response block
  round-trips through the parser, and a sample-only block parses
  cleanly.
- test/integration/test_mcp_response_shaping.py: three end-to-end
  cases that boot a real flapi server with three role-shaped tools
  and exercise redact, max-rows, and sample independently against
  a deterministic in-memory result set. Skips cleanly on environments
  with the v1.5.1/v1.5.2 DuckDB extension-cache mismatch; CI runs
  against fresh extensions.

Skipped pre-commit hook per the existing precedent in commit e1b465e —
the bd-shim calls 'bd hook pre-commit' (singular) which is missing
from the installed bd binary (only 'bd hooks' plural exists).
@jrosskopf jrosskopf force-pushed the feature/gh-24-mcp-response-shaping branch from 97756cc to 8104358 Compare May 16, 2026 17:41
@jrosskopf jrosskopf merged commit 9c9cd55 into main May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant