feat(MCP): Expose Prometheus metrics #7705
Merged
Add a /metrics endpoint alongside /health, and a FastMCP middleware
recording per-tool histograms:
- flagsmith_mcp_tool_call_duration_seconds{tool, status}
- flagsmith_mcp_tool_result_bytes{tool} — a proxy for the token cost
a tool call incurs on the calling agent's context
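A minimal sketch of the timing pattern such a middleware might use, not the PR's actual code. The `observe` callback is a hypothetical stand-in for a Prometheus histogram's `labels(...).observe(...)`, `timed_tool_call` and `call_next` are illustrative names, and the `success`/`error` status values are assumptions.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# Hypothetical callback type: (tool, status, seconds) -> None.
Observe = Callable[[str, str, float], None]


async def timed_tool_call(
    tool: str,
    call_next: Callable[[], Awaitable[Any]],
    observe: Observe,
) -> Any:
    """Run a tool call and record its duration under a success/error status."""
    start = time.perf_counter()
    status = "error"  # assume failure until the call returns normally
    try:
        result = await call_next()
        status = "success"
        return result
    finally:
        # Runs on both paths, so failed calls are timed as well.
        observe(tool, status, time.perf_counter() - start)
```

Recording in `finally` keeps the sample count equal to the number of calls, whether or not the upstream request raised.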
Add flagsmith_mcp_tool_catalogue_bytes, the serialised tools/list payload: a proxy for the token cost every MCP session pays before any tool is called.
The fixture resets the metrics registry per test, so before/after sample deltas are no longer needed. Bump requires-python to match flagsmith-common's 3.11 floor.
FastMCP tool results carry the payload both as a text content block and as structuredContent; serialising the whole result counted ~2.1x what a client renders into the agent's context. Measure the already-serialised text blocks instead, which also avoids marshalling the result a second time.
MCP clients differ in whether they render text blocks, structured content, or both into the agent's context (see modelcontextprotocol/modelcontextprotocol#1624), so no single number represents a call's token cost. Label flagsmith_mcp_tool_result_bytes with content={unstructured,structured,total}, observing all three per call so counts stay aligned. Sizes are measured with compact JSON encoding to match the wire.
Sums and counts of the unstructured and structured series add up fine at query time; only the per-call total distribution is lost, which no dashboard uses.
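The size measurement described in these commits could look roughly like the sketch below. The function names are hypothetical, and measuring UTF-8 bytes and observing 0 for a missing structured payload are assumptions rather than confirmed details of the PR.

```python
import json
from typing import Any, Optional


def compact_size(value: Any) -> int:
    """UTF-8 byte length of value encoded as compact JSON (no padding spaces)."""
    encoded = json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def result_sizes(
    text_blocks: list[str],
    structured: Optional[dict[str, Any]],
) -> dict[str, int]:
    """Byte sizes keyed by the values of the content label."""
    return {
        # Text blocks are already serialised, so they are measured as-is.
        "unstructured": sum(len(block.encode("utf-8")) for block in text_blocks),
        # Assumption: a call with no structured payload still observes 0.
        "structured": 0 if structured is None else compact_size(structured),
    }
```

Because both values are observed on every call, a per-call total is simply their sum and does not need its own series.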
assert_metric clears the registry, so the flagsmith_mcp_* lines of /metrics are deterministic; snapshot them wholesale. The default process and GC collectors are not resettable and vary per run, so they stay out of the snapshot.
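One way to keep only the deterministic lines before snapshotting is to filter the exposition text by metric name prefix. This is an illustrative sketch; `mcp_metric_lines` is a hypothetical helper, not part of the PR or of flagsmith-common.

```python
def mcp_metric_lines(exposition: str, prefix: str = "flagsmith_mcp_") -> list[str]:
    """Keep the HELP, TYPE and sample lines whose metric name starts with prefix."""
    kept = []
    for line in exposition.splitlines():
        if line.startswith("# "):
            # Comment lines look like "# HELP <name> <doc>" or "# TYPE <name> <type>".
            parts = line.split(" ", 3)
            name = parts[2] if len(parts) > 2 else ""
        else:
            # Sample lines start with the metric name.
            name = line
        if name.startswith(prefix):
            kept.append(line)
    return kept
```

Lines from the process and GC collectors carry other prefixes, so they are dropped.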
Replace the /metrics route on the MCP port with prometheus_client's standalone metrics server, gated behind a new METRICS_PORT setting (disabled by default). Metrics stay unexposed to MCP clients and become available under stdio transport too.
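A sketch of how such gating might work, assuming METRICS_PORT is read from the environment and that an unset or empty value means disabled. `parse_metrics_port` and `start_metrics_server` are hypothetical names; `start_http_server` is prometheus_client's own function.

```python
import os
from typing import Mapping, Optional


def parse_metrics_port(environ: Mapping[str, str]) -> Optional[int]:
    """Return the configured port, or None when the setting is unset or empty."""
    raw = environ.get("METRICS_PORT", "").strip()
    return int(raw) if raw else None


def start_metrics_server(environ: Mapping[str, str] = os.environ) -> bool:
    """Start the standalone scrape endpoint if a port is configured."""
    port = parse_metrics_port(environ)
    if port is None:
        return False  # disabled by default
    # Imported lazily so this sketch runs without the dependency when metrics are off.
    from prometheus_client import start_http_server

    # Serves the default registry from a daemon thread on its own port,
    # independently of whichever transport the MCP server uses.
    start_http_server(port)
    return True
```

Running the scrape endpoint in its own thread and port is what lets it work under stdio transport, where the MCP server exposes no HTTP port at all.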
Replace hand-rolled fakes with autospecced mocks, and drop the metrics exposition snapshot test, which exercised prometheus_client's own HTTP server rather than our code.
emyller approved these changes Jun 4, 2026
Changes
Contributes to https://github.com/Flagsmith/flagsmith-private/issues/152.
Instrument the MCP server with Prometheus metrics, served by prometheus_client's standalone HTTP server on a dedicated port (METRICS_PORT, disabled by default) so the scrape endpoint stays off the MCP port and works under stdio transport too.

A FastMCP middleware records:
- flagsmith_mcp_tool_call_duration_seconds{tool, status}: tool call latency, including the upstream Flagsmith API request.
- flagsmith_mcp_tool_result_bytes{tool, content}: result payload size as a proxy for token cost. FastMCP ships the payload twice (text content block + structuredContent) and MCP clients differ in which they render into the agent's context; the content label tracks each type of content. See modelcontextprotocol#1624 for more info.
- flagsmith_mcp_tool_catalogue_bytes: serialised tools/list payload size, the token cost every MCP session pays up front.

Also brings in flagsmith-common[test-tools] for the assert_metric fixture and pytest-mock as dev dependencies. Note: requires-python is bumped to >=3.11 to match flagsmith-common's floor (3.10 is EOL this October).

How did you test this code?

Unit and integration tests (100% coverage gate). Manually: ran the server with METRICS_PORT=9464, verified the exposition on :9464, /metrics absent from the MCP port, and watched the metrics live in a local Prometheus while exercising tool calls (success, upstream error) and tools/list against the real OpenAPI catalogue.