feat(MCP): Expose Prometheus metrics by khvn26 · Pull Request #7705 · Flagsmith/flagsmith

khvn26 · 2026-06-04T09:16:42Z

Thanks for submitting a PR! Please check the boxes below:

I have read the Contributing Guide.
I have added information to docs/ if required so people know about the feature.
I have filled in the "Changes" section below.
I have filled in the "How did you test this code" section below.

Changes

Contributes to https://github.com/Flagsmith/flagsmith-private/issues/152.

Instrument the MCP server with Prometheus metrics, served by prometheus_client's standalone HTTP server on a dedicated port (METRICS_PORT, disabled by default) so the scrape endpoint stays off the MCP port and works under stdio transport too.
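For reviewers who want to picture the gating, here is a minimal sketch of "disabled by default, dedicated port". It assumes METRICS_PORT is read from the environment as a plain integer. The helper names `metrics_port_from_env` and `maybe_start_metrics_server` are hypothetical and not taken from this PR; `start_http_server` is prometheus_client's standalone server.

```python
import os
from typing import Mapping, Optional


def metrics_port_from_env(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the configured metrics port, or None when metrics are disabled."""
    raw = env.get("METRICS_PORT", "").strip()
    if not raw:
        # Unset or empty: the scrape endpoint stays off (assumed default).
        return None
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"METRICS_PORT out of range: {port}")
    return port


def maybe_start_metrics_server(env: Mapping[str, str] = os.environ) -> bool:
    """Start the standalone exposition server only when a port is configured."""
    port = metrics_port_from_env(env)
    if port is None:
        return False
    # Imported lazily so the disabled path does not touch prometheus_client.
    from prometheus_client import start_http_server

    # Runs in a daemon thread on its own port, independent of the MCP transport.
    start_http_server(port)
    return True
```

Because the server runs on its own port and thread, the same call would work whether the MCP server speaks HTTP or stdio.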

A FastMCP middleware records:

- flagsmith_mcp_tool_call_duration_seconds{tool, status}: tool call latency, including the upstream Flagsmith API request.
- flagsmith_mcp_tool_result_bytes{tool, content}: result payload size as a proxy for token cost. FastMCP ships the payload twice (as a text content block and as structuredContent), and MCP clients differ in which of the two they render into the agent's context, so the content label separates the sizes by content type. See modelcontextprotocol/modelcontextprotocol#1624 for more info.
- flagsmith_mcp_tool_catalogue_bytes: serialised tools/list payload size, the token cost every MCP session pays up front.
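The three metrics above could be declared roughly as follows with prometheus_client. This is a sketch, not the code in this PR: the bucket boundaries, the "success"/"error" status values, the use of a Gauge for the catalogue size, the private registry, and the `record_tool_call` helper are all assumptions made for illustration. It also uses a plain context manager instead of FastMCP's middleware interface.

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Gauge, Histogram

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

TOOL_CALL_DURATION = Histogram(
    "flagsmith_mcp_tool_call_duration_seconds",
    "Tool call latency, including the upstream Flagsmith API request.",
    labelnames=("tool", "status"),
    registry=registry,
)
TOOL_RESULT_BYTES = Histogram(
    "flagsmith_mcp_tool_result_bytes",
    "Tool result payload size by content type.",
    labelnames=("tool", "content"),
    buckets=(256, 1024, 4096, 16384, 65536, 262144),  # assumed byte buckets
    registry=registry,
)
TOOL_CATALOGUE_BYTES = Gauge(  # metric type is an assumption
    "flagsmith_mcp_tool_catalogue_bytes",
    "Serialised tools/list payload size.",
    registry=registry,
)


@contextmanager
def record_tool_call(tool: str):
    """Time the wrapped block and label the observation by outcome."""
    status = "success"
    start = time.perf_counter()
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        elapsed = time.perf_counter() - start
        TOOL_CALL_DURATION.labels(tool=tool, status=status).observe(elapsed)
```

A middleware hook would wrap the downstream call in `record_tool_call(tool_name)` and then observe the result sizes once the call returns.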

Also brings in flagsmith-common[test-tools] for the assert_metric fixture and pytest-mock as dev dependencies. Note: requires-python is bumped to >=3.11 to match flagsmith-common's floor (3.10 is EOL this October).

How did you test this code?

Unit and integration tests (100% coverage gate). Manually: ran the server with METRICS_PORT=9464, verified that the exposition is served on :9464 and that /metrics is absent from the MCP port, and watched the metrics live in a local Prometheus while exercising tool calls (success, upstream error) and tools/list against the real OpenAPI catalogue.

Add a /metrics endpoint alongside /health, and a FastMCP middleware recording per-tool histograms:
- flagsmith_mcp_tool_call_duration_seconds{tool, status}
- flagsmith_mcp_tool_result_bytes{tool}: a proxy for the token cost a tool call incurs on the calling agent's context

beep boop

Add flagsmith_mcp_tool_catalogue_bytes, the serialised tools/list payload: a proxy for the token cost every MCP session pays before any tool is called. beep boop

beep boop

The fixture resets the metrics registry per test, so before/after sample deltas are no longer needed. Bump requires-python to match flagsmith-common's 3.11 floor. beep boop

beep boop

FastMCP tool results carry the payload both as a text content block and as structuredContent; serialising the whole result counted ~2.1x what a client renders into the agent's context. Measure the already-serialised text blocks instead, which also avoids marshalling the result a second time. beep boop

MCP clients differ in whether they render text blocks, structured content, or both into the agent's context (see modelcontextprotocol/modelcontextprotocol#1624), so no single number represents a call's token cost. Label flagsmith_mcp_tool_result_bytes with content={unstructured,structured,total}, observing all three per call so counts stay aligned. Sizes are measured with compact JSON encoding to match the wire. beep boop
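To make the three-way labelling concrete, here is a small sketch of how the per-call sizes might be computed. It assumes the text blocks are measured as UTF-8 bytes of their already-serialised text and that structured content is measured as compact JSON. The function names and the `ensure_ascii=False` choice are illustrative assumptions, not the PR's code.

```python
import json
from typing import Any, Dict, List, Optional


def compact_json_size(value: Any) -> int:
    """Size in bytes of value encoded as compact JSON (no padding spaces)."""
    encoded = json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def result_sizes(
    text_blocks: List[str], structured: Optional[Dict[str, Any]]
) -> Dict[str, int]:
    """Return one size per content label so every call observes all three."""
    # Text blocks are already serialised, so they are measured as-is.
    unstructured = sum(len(text.encode("utf-8")) for text in text_blocks)
    structured_size = compact_json_size(structured) if structured is not None else 0
    return {
        "unstructured": unstructured,
        "structured": structured_size,
        "total": unstructured + structured_size,
    }
```

When the text block is simply the compact JSON of the structured payload, `total` comes out at twice either part, which matches the double-shipping described above.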

Sums and counts of the unstructured and structured series add up fine at query time; only the per-call total distribution is lost, which no dashboard uses. beep boop

beep boop

assert_metric clears the registry, so the flagsmith_mcp_* lines of /metrics are deterministic; snapshot them wholesale. The default process and GC collectors are not resettable and vary per run, so they stay out of the snapshot. beep boop

Replace the /metrics route on the MCP port with prometheus_client's standalone metrics server, gated behind a new METRICS_PORT setting (disabled by default). Metrics stay unexposed to MCP clients and become available under stdio transport too. beep boop

Replace hand-rolled fakes with autospecced mocks, and drop the metrics exposition snapshot test — it exercised prometheus_client's own HTTP server rather than our code. beep boop

vercel · 2026-06-04T09:16:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| docs | Ignored | Jun 4, 2026 9:16am |
| flagsmith-frontend-preview | Ignored | Jun 4, 2026 9:16am |
| flagsmith-frontend-staging | Ignored | Jun 4, 2026 9:16am |

github-actions · 2026-06-04T09:17:58Z

Docker builds report

| Image | Build Status | Security report |
| --- | --- | --- |
| `ghcr.io/flagsmith/flagsmith-e2e:pr-7705` | Finished ✅ | Skipped |
| `ghcr.io/flagsmith/flagsmith-api-test:pr-7705` | Finished ✅ | Skipped |
| `ghcr.io/flagsmith/flagsmith-api:pr-7705` | Finished ✅ | Results ✅ |
| `ghcr.io/flagsmith/flagsmith:pr-7705` | Finished ✅ | Results ✅ |
| `ghcr.io/flagsmith/flagsmith-frontend:pr-7705` | Finished ✅ | Results ✅ |
| `ghcr.io/flagsmith/flagsmith-private-cloud:pr-7705` | Finished ✅ | Results ✅ |

github-actions · 2026-06-04T09:23:38Z

Playwright Test Results (oss - depot-ubuntu-latest-16)

1 passed

Details

1 test across 1 suite
40.7 seconds
b67bcc1
🔄 Run: #17240 (attempt 1)

Playwright Test Results (oss - depot-ubuntu-latest-arm-16)

1 passed

Details

1 test across 1 suite
45.3 seconds
b67bcc1
🔄 Run: #17240 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)

2 passed

Details

2 tests across 2 suites
42.7 seconds
b67bcc1
🔄 Run: #17240 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

3 passed

Details

3 tests across 3 suites
31.7 seconds
b67bcc1
🔄 Run: #17240 (attempt 1)

github-actions · 2026-06-04T09:25:30Z

Visual Regression

19 screenshots compared. See report for details.

emyller

LGTM

khvn26 added 13 commits June 3, 2026 16:54

feat(MCP): Measure the tool catalogue size

2909453

Add flagsmith_mcp_tool_catalogue_bytes, the serialised tools/list payload: a proxy for the token cost every MCP session pays before any tool is called. beep boop
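A rough sketch of what "serialised tools/list payload" could mean in bytes. It assumes the measured payload is the `{"tools": [...]}` result object encoded as compact JSON; the function name `catalogue_bytes` and the sample tool are hypothetical.

```python
import json
from typing import Any, Dict, List


def catalogue_bytes(tools: List[Dict[str, Any]]) -> int:
    """Size in bytes of a tools/list result encoded as compact JSON."""
    payload = {"tools": tools}
    encoded = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


# Hypothetical tool entry using the MCP tool fields name, description, inputSchema.
example_tool = {
    "name": "list_flags",
    "description": "List feature flags in a project.",
    "inputSchema": {"type": "object", "properties": {}},
}
```

Every additional tool, and every extra sentence in a description or schema, grows this number, which is why it works as an up-front cost indicator.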

refactor(MCP): Inline metric sample lookups in tests

5ff350b

beep boop

test(MCP): Use assert_metric fixture from flagsmith-common

a7abb07

The fixture resets the metrics registry per test, so before/after sample deltas are no longer needed. Bump requires-python to match flagsmith-common's 3.11 floor. beep boop

refactor(MCP): Name metric constants after the metrics they hold

2f00a68

beep boop

refactor(MCP): Drop the total content label from tool result sizes

dad0065

Sums and counts of the unstructured and structured series add up fine at query time; only the per-call total distribution is lost, which no dashboard uses. beep boop

docs(MCP): Tighten the tool result size metric description

c3d34bb

beep boop

test(MCP): Cover the metrics endpoint next to the other custom routes

a46aa41

beep boop

test(MCP): Snapshot the metrics exposition

3bdce4b

assert_metric clears the registry, so the flagsmith_mcp_* lines of /metrics are deterministic; snapshot them wholesale. The default process and GC collectors are not resettable and vary per run, so they stay out of the snapshot. beep boop
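The filtering idea in this commit can be sketched without any test framework: keep only the flagsmith_mcp_* samples and their HELP/TYPE lines from the exposition text, and drop the process and GC collectors. The function name and default prefix argument are illustrative assumptions, not the test code from this commit.

```python
from typing import List


def flagsmith_mcp_lines(exposition: str, prefix: str = "flagsmith_mcp_") -> List[str]:
    """Keep only samples and HELP/TYPE comments for metrics with the prefix."""
    kept = []
    for line in exposition.splitlines():
        if line.startswith(prefix):
            # A sample line such as: flagsmith_mcp_tool_catalogue_bytes 12.0
            kept.append(line)
        elif line.startswith(("# HELP ", "# TYPE ")):
            # Comment lines look like "# HELP <metric name> <text>".
            metric_name = line.split(" ", 3)[2]
            if metric_name.startswith(prefix):
                kept.append(line)
    return kept
```

The resulting list is stable across runs as long as the registry is reset between tests, so it can be compared against a stored snapshot.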

test(MCP): Mock run() collaborators with pytest-mock

b67bcc1

Replace hand-rolled fakes with autospecced mocks, and drop the metrics exposition snapshot test — it exercised prometheus_client's own HTTP server rather than our code. beep boop
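For context on why autospecced mocks are preferable to hand-rolled fakes, here is a standard-library sketch. `start_metrics_server` is a hypothetical stand-in for a collaborator of run(). pytest-mock's `mocker.patch(..., autospec=True)` builds the same kind of signature-checked mock.

```python
from unittest.mock import create_autospec


def start_metrics_server(port: int) -> None:
    """Hypothetical collaborator; the real one would bind a socket."""
    raise NotImplementedError


def rejects_bad_call() -> bool:
    """Return True if the autospecced mock enforces the real signature."""
    fake = create_autospec(start_metrics_server)
    try:
        fake()  # missing the required port argument
    except TypeError:
        return True
    return False


# A correct call is recorded and can be asserted on afterwards.
fake_server = create_autospec(start_metrics_server)
fake_server(9464)
fake_server.assert_called_once_with(9464)
```

A hand-written fake would accept any arguments silently, so a signature change in the real function could go unnoticed by the tests.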

khvn26 requested a review from a team as a code owner June 4, 2026 09:16

khvn26 requested review from Zaimwa9 and removed request for a team June 4, 2026 09:16

flagsmith-engineering Bot assigned Zaimwa9 Jun 4, 2026

github-actions Bot added the feature New feature or request label Jun 4, 2026

This was referenced Jun 4, 2026

feat: Django-free structlog to OTel integration via otel extra Flagsmith/flagsmith-common#229

Merged

feat(MCP): Set up logging and OpenTelemetry export #7706

Merged

khvn26 requested a review from emyller June 4, 2026 10:59

flagsmith-engineering Bot assigned emyller Jun 4, 2026

khvn26 unassigned emyller and Zaimwa9 Jun 4, 2026

emyller approved these changes Jun 4, 2026

Comment thread mcp/tests/integration/test_metrics.py

Comment thread mcp/pyproject.toml

khvn26 merged commit 8ef95e0 into main Jun 4, 2026
32 checks passed

flagsmithdev mentioned this pull request Jun 4, 2026

chore(main): release 2.239.0 #7675

Open

khvn26 merged 13 commits into main from feat/mcp-prometheus-metrics
