feat(MCP): Expose Prometheus metrics #7705

Merged
khvn26 merged 13 commits into main from feat/mcp-prometheus-metrics on Jun 4, 2026

khvn26 (Member) commented Jun 4, 2026

Thanks for submitting a PR! Please check the boxes below:

  • I have read the Contributing Guide.
  • I have added information to docs/ if required so people know about the feature.
  • I have filled in the "Changes" section below.
  • I have filled in the "How did you test this code" section below.

Changes

Contributes to https://github.com/Flagsmith/flagsmith-private/issues/152.

Instrument the MCP server with Prometheus metrics, served by prometheus_client's standalone HTTP server on a dedicated port (METRICS_PORT, disabled by default) so the scrape endpoint stays off the MCP port and works under stdio transport too.

A FastMCP middleware records:

  • flagsmith_mcp_tool_call_duration_seconds{tool, status} — tool call latency, including the upstream Flagsmith API request.
  • flagsmith_mcp_tool_result_bytes{tool, content} — result payload size as a proxy for token cost. FastMCP ships the payload twice (as a text content block and as structuredContent), and MCP clients differ in which of the two they render into the agent's context, so the content label (unstructured or structured) records each separately. See modelcontextprotocol#1624 for more info.
  • flagsmith_mcp_tool_catalogue_bytes — serialised tools/list payload size, the token cost every MCP session pays up front.
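The duration metric can be sketched as a wrapper around the next handler in the chain. This deliberately avoids FastMCP's middleware API: `instrumented_call` is a hypothetical helper, and the `success`/`error` status values are assumptions. Observing in a `finally` clause means failed calls (such as an upstream error) are timed too.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import TypeVar

from prometheus_client import CollectorRegistry, Histogram

T = TypeVar("T")

# A private registry keeps the sketch self-contained; a real server would
# more likely use prometheus_client's default REGISTRY.
registry = CollectorRegistry()

tool_call_duration = Histogram(
    "flagsmith_mcp_tool_call_duration_seconds",
    "Tool call latency, including the upstream Flagsmith API request.",
    ["tool", "status"],
    registry=registry,
)


async def instrumented_call(tool: str, call_next: Callable[[], Awaitable[T]]) -> T:
    """Run one tool call and record its duration under {tool, status}.

    Hypothetical helper: the "success"/"error" status values are assumptions.
    """
    start = time.perf_counter()
    status = "error"  # overwritten only if call_next() returns normally
    try:
        result = await call_next()
        status = "success"
        return result
    finally:
        # Runs on success and on exceptions, so failed calls are timed too.
        tool_call_duration.labels(tool=tool, status=status).observe(
            time.perf_counter() - start
        )


if __name__ == "__main__":
    # Example: a trivial "tool" that completes immediately.
    print(asyncio.run(instrumented_call("demo_tool", lambda: asyncio.sleep(0, result="ok"))))
```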

Also adds flagsmith-common[test-tools] (for the assert_metric fixture) and pytest-mock as dev dependencies. Note: requires-python is bumped to >=3.11 to match flagsmith-common's floor (3.10 reaches EOL this October).

How did you test this code?

Unit and integration tests (100% coverage gate). Manually: ran the server with METRICS_PORT=9464, verified the exposition on :9464, /metrics absent from the MCP port, and watched the metrics live in a local Prometheus while exercising tool calls (success, upstream error) and tools/list against the real OpenAPI catalogue.

khvn26 added 13 commits June 3, 2026 16:54
Add a /metrics endpoint alongside /health, and a FastMCP middleware
recording per-tool histograms:

- flagsmith_mcp_tool_call_duration_seconds{tool, status}
- flagsmith_mcp_tool_result_bytes{tool} — a proxy for the token cost
  a tool call incurs on the calling agent's context

beep boop
Add flagsmith_mcp_tool_catalogue_bytes, the serialised tools/list
payload: a proxy for the token cost every MCP session pays before any
tool is called.

beep boop
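A sketch of measuring the catalogue, assuming the metric is a histogram observed once per tools/list response (the PR names the metric but not its type here) and that "serialised size" means compact JSON encoded as UTF-8 without ASCII escaping. The bucket boundaries and the `record_catalogue` helper are hypothetical.

```python
import json
from typing import Any

from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()

# Assumption: a histogram with illustrative byte buckets (+Inf is appended).
tool_catalogue_bytes = Histogram(
    "flagsmith_mcp_tool_catalogue_bytes",
    "Serialised tools/list payload size in bytes.",
    buckets=(1_000, 10_000, 50_000, 100_000, 250_000, 500_000),
    registry=registry,
)


def compact_json_size(payload: Any) -> int:
    """UTF-8 byte size of payload as compact JSON (no whitespace separators)."""
    encoded = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def record_catalogue(tools_list_result: dict[str, Any]) -> int:
    """Hypothetical helper: observe and return the tools/list payload size."""
    size = compact_json_size(tools_list_result)
    tool_catalogue_bytes.observe(size)
    return size
```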
The assert_metric fixture resets the metrics registry per test, so
before/after sample deltas are no longer needed. Bump requires-python
to match flagsmith-common's 3.11 floor.

beep boop
FastMCP tool results carry the payload both as a text content block and
as structuredContent; serialising the whole result counted ~2.1x what a
client renders into the agent's context. Measure the already-serialised
text blocks instead, which also avoids marshalling the result a second
time.

beep boop
MCP clients differ in whether they render text blocks, structured
content, or both into the agent's context (see
modelcontextprotocol/modelcontextprotocol#1624), so no single number
represents a call's token cost. Label flagsmith_mcp_tool_result_bytes
with content={unstructured,structured,total}, observing all three per
call so counts stay aligned. Sizes are measured with compact JSON
encoding to match the wire.

beep boop
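A sketch of the per-content measurement, assuming the result is the wire-shaped CallToolResult dict (a `content` list of blocks plus optional `structuredContent`). Text blocks are already serialised, so their UTF-8 length is counted directly; the structured payload is encoded as compact JSON. The helper names are hypothetical, and observing `0` when `structuredContent` is absent is an assumption made so every call adds one observation per content type.

```python
import json
from typing import Any

from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()

tool_result_bytes = Histogram(
    "flagsmith_mcp_tool_result_bytes",
    "Tool result payload size in bytes, by content type.",
    ["tool", "content"],
    registry=registry,
)


def result_sizes(result: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: byte size of each copy of a tool result payload."""
    # Text blocks are already serialised strings; no second marshalling needed.
    unstructured = sum(
        len(block["text"].encode("utf-8"))
        for block in result.get("content", [])
        if block.get("type") == "text"
    )
    structured_payload = result.get("structuredContent")
    structured = 0
    if structured_payload is not None:
        encoded = json.dumps(structured_payload, separators=(",", ":"), ensure_ascii=False)
        structured = len(encoded.encode("utf-8"))
    return {"unstructured": unstructured, "structured": structured}


def record_result(tool: str, result: dict[str, Any]) -> None:
    # One observation per content type per call keeps the series' counts aligned.
    for content, size in result_sizes(result).items():
        tool_result_bytes.labels(tool=tool, content=content).observe(size)
```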
Drop the content="total" series: sums and counts of the unstructured
and structured series add up at query time; only the per-call total
distribution is lost, which no dashboard uses.

beep boop
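A worked illustration of what dropping the total series keeps and loses. The two hypothetical pairings below contain the same per-content sizes, so they produce identical unstructured and structured histograms and the same summed `_sum` and aligned `_count` (730 bytes over 2 calls). Their per-call totals differ, though (180 and 550 versus 350 and 380), which is the distribution no longer recoverable. The bucket boundaries and helper names are assumptions.

```python
from prometheus_client import CollectorRegistry, Histogram

BUCKETS = (200, 400, 600)  # illustrative bucket boundaries


def record(calls: list[tuple[int, int]]) -> CollectorRegistry:
    """Record (unstructured, structured) byte sizes per call into a fresh registry."""
    registry = CollectorRegistry()
    hist = Histogram(
        "flagsmith_mcp_tool_result_bytes",
        "Tool result payload size in bytes, by content type.",
        ["tool", "content"],
        buckets=BUCKETS,
        registry=registry,
    )
    for unstructured, structured in calls:
        hist.labels(tool="list_flags", content="unstructured").observe(unstructured)
        hist.labels(tool="list_flags", content="structured").observe(structured)
    return registry


def total_sum_and_count(registry: CollectorRegistry) -> tuple[float, float]:
    """Query-time aggregation: add the per-content _sum series; counts are aligned."""
    labels = {"tool": "list_flags"}
    total = sum(
        registry.get_sample_value(
            "flagsmith_mcp_tool_result_bytes_sum", {**labels, "content": content}
        )
        for content in ("unstructured", "structured")
    )
    count = registry.get_sample_value(
        "flagsmith_mcp_tool_result_bytes_count", {**labels, "content": "unstructured"}
    )
    return total, count


# Two ways the same per-content sizes could have been paired into calls.
pairing_a = [(100, 80), (300, 250)]  # per-call totals: 180, 550
pairing_b = [(100, 250), (300, 80)]  # per-call totals: 350, 380
```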
assert_metric clears the registry, so the flagsmith_mcp_* lines of
/metrics are deterministic; snapshot them wholesale. The default
process and GC collectors are not resettable and vary per run, so
they stay out of the snapshot.

beep boop
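A sketch of restricting an exposition snapshot to the flagsmith_mcp_* lines, using prometheus_client's `generate_latest`. The `own_metric_lines` helper and the stand-in `process_open_fds` gauge are hypothetical. One assumption beyond the commit message: prometheus_client also emits `_created` timestamp series by default, which vary per run, so this sketch drops those too.

```python
from prometheus_client import CollectorRegistry, Gauge, Histogram, generate_latest

PREFIX = "flagsmith_mcp_"


def own_metric_lines(registry: CollectorRegistry) -> list[str]:
    """Hypothetical helper: keep the exposition lines of flagsmith_mcp_* metrics."""
    kept = []
    for line in generate_latest(registry).decode("utf-8").splitlines():
        if line.startswith("# "):
            # Metadata lines: "# HELP <name> <text>" or "# TYPE <name> <type>".
            name = line.split(" ")[2]
        else:
            # Sample lines: "<name>{<labels>} <value>" or "<name> <value>".
            name = line.split("{", 1)[0].split(" ", 1)[0]
        # Assumption: _created series hold creation timestamps that differ per run.
        if name.startswith(PREFIX) and not name.endswith("_created"):
            kept.append(line)
    return kept


# Example registry: one of "our" metrics next to a stand-in for a default
# process collector series, which the filter should exclude.
registry = CollectorRegistry()
Gauge("process_open_fds", "Stand-in for a process collector metric.", registry=registry).set(3)
catalogue_bytes = Histogram(
    "flagsmith_mcp_tool_catalogue_bytes",
    "Serialised tools/list payload size in bytes.",
    buckets=(1_000,),
    registry=registry,
)
catalogue_bytes.observe(512)
```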
Replace the /metrics route on the MCP port with prometheus_client's
standalone metrics server, gated behind a new METRICS_PORT setting
(disabled by default). Metrics stay unexposed to MCP clients and
become available under stdio transport too.

beep boop
Replace hand-rolled fakes with autospecced mocks, and drop the metrics
exposition snapshot test — it exercised prometheus_client's own HTTP
server rather than our code.

beep boop
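A sketch of why autospecced mocks are stricter than hand-rolled fakes, using the standard library's `unittest.mock.create_autospec`. `FlagsmithClient`, `get_flags` and `enabled_features` are hypothetical stand-ins, not the MCP server's actual classes. With `instance=True`, the mock mirrors the real method signatures (minus `self`), async methods become `AsyncMock`s, and a call with the wrong arguments raises `TypeError` instead of silently passing.

```python
import asyncio
from unittest import mock


class FlagsmithClient:
    """Hypothetical upstream API client that a tool depends on."""

    async def get_flags(self, environment_key: str) -> list[dict]:
        raise NotImplementedError


async def enabled_features(client: FlagsmithClient, environment_key: str) -> list[str]:
    """Hypothetical tool logic under test."""
    flags = await client.get_flags(environment_key)
    return [flag["feature"] for flag in flags if flag["enabled"]]


def make_client_mock() -> mock.NonCallableMagicMock:
    # instance=True specs an instance: `self` is skipped in signature checks,
    # and async methods become AsyncMocks whose return_value is the awaited result.
    client = mock.create_autospec(FlagsmithClient, instance=True)
    client.get_flags.return_value = [
        {"feature": "dark_mode", "enabled": True},
        {"feature": "beta_banner", "enabled": False},
    ]
    return client


if __name__ == "__main__":
    print(asyncio.run(enabled_features(make_client_mock(), "env-key")))
```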
khvn26 requested a review from a team as a code owner June 4, 2026 09:16
khvn26 requested review from Zaimwa9 and removed request for a team June 4, 2026 09:16

vercel Bot commented Jun 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Actions Updated (UTC)
docs Ignored Ignored Jun 4, 2026 9:16am
flagsmith-frontend-preview Ignored Ignored Jun 4, 2026 9:16am
flagsmith-frontend-staging Ignored Ignored Jun 4, 2026 9:16am

github-actions bot added the feature (New feature or request) label Jun 4, 2026

github-actions Bot commented Jun 4, 2026

Docker builds report

Image Build Status Security report
ghcr.io/flagsmith/flagsmith-e2e:pr-7705 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-api-test:pr-7705 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-api:pr-7705 Finished ✅ Results
ghcr.io/flagsmith/flagsmith:pr-7705 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-frontend:pr-7705 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-7705 Finished ✅ Results


github-actions Bot commented Jun 4, 2026

Playwright Test Results (commit b67bcc1, run #17240, attempt 1)

  • oss - depot-ubuntu-latest-16: 1 passed (1 test across 1 suite), 40.7 seconds
  • oss - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 45.3 seconds
  • private-cloud - depot-ubuntu-latest-arm-16: 2 passed (2 tests across 2 suites), 42.7 seconds
  • private-cloud - depot-ubuntu-latest-16: 3 passed (3 tests across 3 suites), 31.7 seconds


github-actions Bot commented Jun 4, 2026

Visual Regression

19 screenshots compared. See report for details.
View full report

emyller left a comment

LGTM

Comment thread mcp/tests/integration/test_metrics.py
Comment thread mcp/pyproject.toml
khvn26 merged commit 8ef95e0 into main Jun 4, 2026
32 checks passed
Labels

feature New feature or request

3 participants