Add HTTP timing metrics and update Grafana dashboard #2586
Merged
Introduce Prometheus metrics for HTTP timing (histograms and last-value gauges) and instrument the httpx provider to observe/set these metrics from request/response data and timing headers. Update Grafana dashboard to include stat and timeseries panels for request duration, upstream TTFB/total, proxy total, percentiles, and request rates. Adjust tests to construct mock request URLs using httpx.URL for compatibility with the provider instrumentation.
Pull request overview
This PR adds new Prometheus metrics to capture client-side HTTP request duration plus upstream/proxy timing breakdowns, instruments the HTTPX provider to emit those metrics per response, and extends the Grafana dashboard to visualize the new signals.
Changes:
- Introduces new Prometheus Histograms/Gauges for HTTP request duration, upstream total/TTFB, and proxy total timings.
- Instruments `providers.httpx` response hooks to record the new metrics based on elapsed time and timing headers.
- Updates the Grafana dashboard with new stat and timeseries panels; adjusts HTTPX provider tests to use `httpx.URL`.
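As a rough illustration of the header-based part of this instrumentation, the sketch below parses per-hop timings out of response headers before they would be passed to the histograms and gauges. The header names (`x-upstream-ttfb-ms` and friends), the millisecond unit, and the function name `parse_timing_headers` are assumptions made for this example; the PR does not state which headers the provider actually reads.

```python
import math
from typing import Dict, Mapping

# Hypothetical header names and units; the real provider may use different ones.
TIMING_HEADERS = {
    "upstream_ttfb": "x-upstream-ttfb-ms",
    "upstream_total": "x-upstream-total-ms",
    "proxy_total": "x-proxy-total-ms",
}


def parse_timing_headers(headers: Mapping[str, str]) -> Dict[str, float]:
    """Return the timings (in seconds) that are present and well-formed.

    Note: a plain dict is case-sensitive, unlike httpx's Headers object,
    so this sketch expects lowercase header names.
    """
    timings: Dict[str, float] = {}
    for metric, header in TIMING_HEADERS.items():
        raw = headers.get(header)
        if raw is None:
            continue  # header absent: record nothing rather than a fake zero
        try:
            millis = float(raw)
        except ValueError:
            continue  # malformed value: skip it
        if math.isfinite(millis) and millis >= 0:
            timings[metric] = millis / 1000.0  # convert milliseconds to seconds
    return timings
```

Skipping absent or malformed headers, instead of defaulting them to zero, keeps the histograms from being skewed by responses that never passed through the proxy.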
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/prometheus/__init__.py` | Adds new HTTP timing Histograms/Gauges exported via Prometheus. |
| `src/providers/httpx.py` | Records new HTTP timing metrics from request/response context and headers. |
| `tests/providers/test_httpx_provider.py` | Updates mocks to use `httpx.URL` objects for request URLs. |
| `grafana/dashboards/om1-dashboard.json` | Adds panels for “last” timings, percentiles, and request rates. |
Replace the Prometheus metrics' 'url' label with 'host' (labels now ['host', 'method', 'status_code']) and update all metric invocations accordingly. In the httpx provider, extract host from response.request.url.host and rename usages, change start_time default to None, and add a guard that logs a warning and skips metric recording when start_time is missing to avoid spurious metrics.
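A minimal sketch of the two ideas in this commit, using only the standard library: deriving a low-cardinality `host` label instead of the full URL, and skipping the measurement when no start time was recorded. Here `urlsplit` stands in for httpx's `response.request.url.host`; the helper names `metric_labels` and `elapsed_since`, the `"unknown"` fallback, and the use of `time.perf_counter` are assumptions for illustration, not the provider's actual code.

```python
import logging
import time
from typing import Optional, Tuple
from urllib.parse import urlsplit


def metric_labels(url: str, method: str, status_code: int) -> Tuple[str, str, str]:
    """Build (host, method, status_code) label values for a request.

    Using the host rather than the full URL avoids creating a new time
    series for every distinct path or query string.
    """
    host = urlsplit(url).hostname or "unknown"  # fallback value is an assumption
    return host, method.upper(), str(status_code)


def elapsed_since(start_time: Optional[float]) -> Optional[float]:
    """Return seconds elapsed since start_time, or None if it is missing."""
    if start_time is None:
        # Without a start time any duration would be meaningless, so warn and skip.
        logging.warning("No start_time recorded; skipping HTTP timing metrics")
        return None
    return time.perf_counter() - start_time
```

A caller would record metrics only when `elapsed_since` returns a number, which mirrors the warn-and-skip guard described above.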
Prune several HTTP timing and stat panels from grafana/dashboards/om1-dashboard.json to simplify the dashboard and remove redundant metrics. Removed panels include: "HTTP Request Duration (Last)", "HTTP Upstream TTFB (Last)", "HTTP Upstream Total (Last)", "HTTP Proxy Total (Last)", "HTTP Request Duration Percentiles", "HTTP Proxy & Upstream Timing Breakdown (p95)", and "HTTP Request Rate by Method & Status Code". This reduces clutter and focuses the dashboard on the remaining LLM request metrics.
Update tests for httpx event hooks to reflect expected behavior when a request has no start_time: add a start_time to the mock_request in the relevant test, rename the test to indicate it should warn-and-skip when start_time is absent, and change assertions to patch logging.warning (ensuring logging.info is not called) and verify the warning message contains "No start_time recorded".
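The test pattern described here can be sketched with `unittest.mock` alone. The `on_response` function below is a hypothetical stand-in for the provider's response hook, and the request is a `SimpleNamespace` in place of the project's mock request; only the patching of `logging.warning` and `logging.info` and the "No start_time recorded" message come from the commit description.

```python
import logging
from types import SimpleNamespace
from unittest.mock import patch


def on_response(request) -> bool:
    """Hypothetical stand-in for the provider's response hook.

    Returns True if metrics would be recorded, False if they are skipped.
    """
    start_time = getattr(request, "start_time", None)
    if start_time is None:
        logging.warning("No start_time recorded for %s; skipping metrics", request.url)
        return False
    logging.info("Recording HTTP timing metrics for %s", request.url)
    return True


def test_warns_and_skips_without_start_time() -> None:
    request = SimpleNamespace(url="https://api.example.com/v1", start_time=None)
    # Patch the module-level logging functions so the calls can be inspected.
    with patch("logging.warning") as mock_warning, patch("logging.info") as mock_info:
        recorded = on_response(request)
    assert recorded is False
    mock_info.assert_not_called()
    mock_warning.assert_called_once()
    # The first positional argument is the log format string.
    assert "No start_time recorded" in mock_warning.call_args[0][0]
```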
This pull request introduces comprehensive HTTP client-side and upstream/proxy timing metrics, exposing them via Prometheus and visualizing them in Grafana. The changes include new Prometheus histograms and gauges for various HTTP timing metrics, instrumentation in the HTTPX provider to record these metrics, updates to the Grafana dashboard to display them, and test adjustments to ensure compatibility with the new metrics.
Metrics instrumentation and export:
- Added Prometheus histograms and gauges in `src/prometheus/__init__.py` to track HTTP request duration, upstream/proxy timings (total and TTFB), and their most recent values, all labeled by method, status code, and URL.
- Instrumented the HTTPX provider (`src/providers/httpx.py`) to observe and set these metrics on every response, extracting timing information from response headers and request metadata.

Dashboard and observability:
- Updated `grafana/dashboards/om1-dashboard.json` to add new panels visualizing the metrics: last HTTP request duration, upstream/proxy timings, request duration percentiles, and request rates by method and status code.

Testing adjustments:
- Updated `tests/providers/test_httpx_provider.py` to use `httpx.URL` objects for mock request URLs, ensuring compatibility with the updated metric labeling logic.