feat: add Grafana dashboard to prometheus metrics example #295
📝 Walkthrough
Adds a complete observability example: Prometheus + Grafana configs, a docker-compose stack with Grafana provisioning/init, a prebuilt Grafana dashboard, README quick-start updates, metric renames, and example code adjusted to use `PrometheusMetricReader`/`MeterProvider` and async `agent.join`.
Sequence Diagram(s)
sequenceDiagram
participant App as Stream Agent\n(:9464 /metrics)
participant Prom as Prometheus\n(Scraper)
participant Graf as Grafana\n(Visualizer)
participant Init as Grafana Init\n(Provisioner)
App->>Prom: Expose /metrics endpoint
loop every 5s
Prom->>App: GET /metrics
App-->>Prom: Metrics payload (Prometheus format)
Note over Prom: Store time-series
end
Init->>Graf: Poll /api/health
Graf-->>Init: 200 OK
Init->>Graf: POST /api/org/preferences (set home dashboard UID)
Graf-->>Init: 200 OK
Init->>Graf: POST /api/user/preferences (set admin home UID)
Graf-->>Init: 200 OK
Graf->>Prom: PromQL queries for dashboard panels
Prom-->>Graf: Time-series responses
Graf->>Graf: Render dashboard
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 passed checks
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@examples/03_prometheus_metrics_example/docker-compose.yml`:
- Around line 25-32: Change the exposed port mapping from "3000:3000" to bind
only on localhost (e.g. "127.0.0.1:3000:3000") and update the Grafana env vars
to reduce anonymous privileges and re-enable login by setting
GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer and GF_AUTH_DISABLE_LOGIN_FORM=false (leave
GF_AUTH_ANONYMOUS_ENABLED as needed); this keeps the service inaccessible from
external hosts and prevents anonymous users from having Admin rights.
In `@examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh`:
- Around line 18-21: The curl PUT to Grafana's /api/org/preferences is missing
basic auth and will likely 401; update the cURL command that targets
http://grafana:3000/api/org/preferences to include the same authentication as
the other call (e.g., add -u "admin:admin" or use the existing credential
variable) so the request is authorized when calling the Org preferences endpoint
in init-home-dashboard.sh.
In `@examples/03_prometheus_metrics_example/README.md`:
- Around line 59-67: The fenced code blocks in README.md that contain the ASCII
diagram (the three-box Stream Agent → Prometheus → Grafana diagram) and the
directory tree under "03_prometheus_metrics_example/" must include a language
identifier to satisfy MD040; change the opening triple backticks for those
blocks from ``` to ```text (both the diagram block and the subsequent
directory-tree block), and ensure any other similar fenced blocks in the file
(e.g., the ones referenced around lines 155-172) are updated the same way.
- Line 23: Replace bare URLs in the README (specifically occurrences of the
string "http://localhost:9464/metrics" found in the content) with proper
Markdown links using the [label](url) syntax (e.g., a descriptive label like
"metrics endpoint" linking to http://localhost:9464/metrics) so markdownlint
rule MD034 is satisfied; update all occurrences (the one shown and the other two
instances) to use this link syntax and ensure any other bare URLs in the file
are converted similarly.
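The missing-authentication issue flagged above for `init-home-dashboard.sh` can be illustrated with a minimal Python sketch. It only builds the request and does not send it. The helper name `build_org_preferences_request`, the `homeDashboardUID` payload field, and the `example-dashboard-uid` value are assumptions made for illustration; the actual script uses curl, and its payload is not shown in this review.

```python
import base64
import json
import urllib.request


def build_org_preferences_request(base_url, user, password, dashboard_uid):
    """Build (but do not send) an authenticated PUT to /api/org/preferences."""
    # HTTP basic auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Assumed payload shape; check the Grafana preferences API for your version.
    body = json.dumps({"homeDashboardUID": dashboard_uid}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/org/preferences",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: Grafana's default admin credentials and a placeholder UID.
request = build_org_preferences_request(
    "http://grafana:3000", "admin", "admin", "example-dashboard-uid"
)
```

Without the `Authorization` header, the same request would be treated as anonymous, which is the failure mode the comment describes.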
🧹 Nitpick comments (5)
examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml (1)
9-16: `host.docker.internal` may not resolve on Linux without additional configuration.
This hostname works out of the box on Docker Desktop (macOS/Windows), but on native Linux Docker it requires adding `extra_hosts` to the docker-compose service definition. Since this is a development example, consider adding a note in the README or docker-compose file for Linux users.
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)
3-8: Consider adding a timeout to the health check loop.
The current loop will wait indefinitely if Grafana fails to start. For a more robust script, consider adding a maximum retry count or timeout.
Optional: Add timeout
     # Wait for Grafana to be ready
     echo "Waiting for Grafana to be ready..."
    +MAX_RETRIES=30
    +RETRY_COUNT=0
    -until curl -s http://grafana:3000/api/health > /dev/null 2>&1; do
    +until curl -s http://grafana:3000/api/health > /dev/null 2>&1 || [ $RETRY_COUNT -ge $MAX_RETRIES ]; do
       echo "Grafana not ready yet, waiting..."
       sleep 2
    +  RETRY_COUNT=$((RETRY_COUNT + 1))
     done
    +
    +if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
    +  echo "Grafana failed to start within timeout"
    +  exit 1
    +fi
examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json (1)
1-822: LGTM! Well-structured dashboard with consistent panel configurations.
The dashboard is well organized, with appropriate PromQL queries using `histogram_quantile` for latency percentiles. The datasource UID `prometheus` correctly references the provisioned datasource.
One observation: the threshold value of 80ms is uniform across all latency panels (LLM, STT, TTS, Turn Detection). These services typically have different latency profiles; LLM responses are often slower than STT/TTS. Consider customizing thresholds per panel type for more meaningful alerting, though this is purely a nice-to-have refinement.
examples/03_prometheus_metrics_example/docker-compose.yml (2)
48-50: Ensure grafana-init waits for Grafana readiness.
`depends_on` only waits for container start, not service readiness. The init script can race and fail. Wrap the command with a wait loop to poll the `/api/health` endpoint:
🛠️ Suggested implementation
    -    command: sh /init-home-dashboard.sh
    +    command: sh -c 'until curl -sf http://grafana:3000/api/health; do sleep 2; done; sh /init-home-dashboard.sh'
4-4: Pin Docker image tags to specific versions instead of `latest`.
Using `latest` makes deployments non-reproducible and risks unexpected breaking changes. For example, `prom/prometheus:latest` currently points to v2.x while v3.x releases exist. Pin explicit version tags (e.g., `v2.53.5`, `v3.8.1`) or use environment variables for version management.
Affected lines: 4 (prometheus), 22 (grafana), 44 (curl)
Example using env-vars
    -    image: prom/prometheus:latest
    +    image: prom/prometheus:${PROMETHEUS_VERSION}
     ...
    -    image: grafana/grafana:latest
    +    image: grafana/grafana:${GRAFANA_VERSION}
     ...
    -    image: curlimages/curl:latest
    +    image: curlimages/curl:${CURL_VERSION}
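The two readiness nitpicks above (a bounded health-check loop, and waiting for `/api/health` before running the init script) share one pattern: poll a probe a limited number of times, then give up. A minimal Python sketch of that pattern follows. The function names `wait_until_ready` and `grafana_is_healthy` are hypothetical, and the defaults of 30 retries and a 2-second delay are simply borrowed from the suggested shell diff.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    max_retries: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `max_retries` times; return True on the first success."""
    for attempt in range(max_retries):
        if probe():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < max_retries - 1:
            sleep(delay_seconds)
    return False


def grafana_is_healthy(
    url: str = "http://grafana:3000/api/health", timeout: float = 2.0
) -> bool:
    """Hypothetical probe: True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A caller would run `wait_until_ready(grafana_is_healthy)` and exit non-zero when it returns `False`. Because the probe result is checked before the retry budget, a success on the last attempt is reported as ready rather than as a timeout.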
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (7)
examples/03_prometheus_metrics_example/README.md
examples/03_prometheus_metrics_example/docker-compose.yml
examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh
examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml
examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml
examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml
🧰 Additional context used
🪛 Gitleaks (8.30.0)
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh
[high] 26-27: Discovered a potential basic authorization token provided in a curl command, which could compromise the curl accessed resource.
(curl-auth-user)
🪛 markdownlint-cli2 (0.18.1)
examples/03_prometheus_metrics_example/README.md
23-23: Bare URL used
(MD034, no-bare-urls)
41-41: Bare URL used
(MD034, no-bare-urls)
59-59: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
155-155: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
194-194: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Ruff
🔇 Additional comments (6)
examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml (1)
1-12: LGTM!
The datasource configuration is well-structured. The UID `prometheus` correctly matches the dashboard references, and the `timeInterval: 5s` aligns with the job-level scrape interval defined in the Prometheus configuration.
examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml (1)
1-13: LGTM!
The dashboard provisioning configuration is appropriate for a development example. Setting `allowUiUpdates: true` enables experimentation, and `updateIntervalSeconds: 10` provides reasonably responsive dashboard reloading.
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)
26-30: Hardcoded default credentials are acceptable for this local development example.
The static analysis tool flagged the `admin:admin` credentials. For a local observability stack example, using Grafana's default credentials is reasonable. If this were production code, you'd want to parameterize these via environment variables.
examples/03_prometheus_metrics_example/docker-compose.yml (1)
55-61: LGTM for volumes and network isolation.
Clean separation and persistence defaults look solid.
examples/03_prometheus_metrics_example/README.md (2)
3-11: Nice overview expansion for Grafana visualization.
Clear, concise positioning of the Prometheus + Grafana flow.
103-125: Grafana dashboard section reads well.
The PromQL examples and feature list are concise and actionable.
Add a docker-compose setup with Prometheus and Grafana for visualizing agent metrics. Includes a pre-built dashboard showing LLM, STT, TTS, and turn detection latencies with percentile breakdowns.
e8b055d to 03e64f2
The agent.join() method returns an async context manager, not an awaitable, so it should be used with `async with` instead of `with await`.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
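The distinction in this commit message can be reproduced with a small, self-contained sketch. `FakeAgent` is a hypothetical stand-in built with `contextlib.asynccontextmanager`; it is not the real `vision_agents` `Agent`, whose `join()` implementation is not shown here.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


class FakeAgent:
    """Hypothetical stand-in for an agent whose join() is an async context manager."""

    @asynccontextmanager
    async def join(self, call_id):
        events.append(f"joined {call_id}")
        try:
            yield self
        finally:
            events.append(f"left {call_id}")


async def main():
    agent = FakeAgent()

    # Correct: enter the object returned by join() with `async with`.
    async with agent.join("demo-call"):
        events.append("in call")

    # Incorrect: the returned object is not awaitable, so `await` raises TypeError
    # before the `with` statement is ever entered.
    try:
        with await agent.join("demo-call"):
            pass
    except TypeError:
        events.append("with await failed")


asyncio.run(main())
```

In this sketch the `async with` form runs both the setup and the teardown, while the `with await` form fails immediately without joining.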
- Change label grouping from llm_class/stt_class/tts_class to provider (matching actual metric labels emitted by MetricsCollector)
- Use increase()[15m] instead of rate()[5m] for STT percentiles to better handle sparse data
- Add average latency calculation for STT that always displays data
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@examples/06_prometheus_metrics_example/prometheus_metrics_example.py`:
- Around line 26-40: Move the OpenTelemetry setup so
metrics.set_meter_provider(provider) runs before importing any vision_agents
modules: initialize PrometheusMetricReader(), create
MeterProvider(metric_readers=[reader]) and call
metrics.set_meter_provider(provider) prior to the import lines that bring in
Agent / AgentLauncher / Runner / User (the from vision_agents.core import ...
statement) so the module-level metric instruments in
vision_agents.core.observability.metrics bind to the configured provider instead
of the no-op default.
♻️ Duplicate comments (2)
examples/06_prometheus_metrics_example/README.md (2)
23-23: Replace bare URLs with Markdown links.
Lines 23, 41, and 212 use bare URLs and trip MD034. Please convert them to Markdown link syntax.
✍️ Proposed fix
    -Then open http://localhost:9464/metrics in your browser to see raw metrics as you talk to the agent.
    +Then open [http://localhost:9464/metrics](http://localhost:9464/metrics) in your browser to see raw metrics as you talk to the agent.
    -3. Open Grafana at http://localhost:3000 (no login required - anonymous access enabled)
    +3. Open Grafana at [http://localhost:3000](http://localhost:3000) (no login required - anonymous access enabled)
    -- Check Prometheus targets at http://localhost:9090/targets
    +- Check Prometheus targets at [http://localhost:9090/targets](http://localhost:9090/targets)
Also applies to: 41-41, 212-212
60-68: Add language identifiers to fenced blocks (use `text`).
Lines 60–68 and 171–188 are missing a language identifier, which trips MD040.
✍️ Proposed fix
    -```
    +```text
     ┌─────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
     │  Stream Agent       │────▶│  Prometheus     │────▶│  Grafana        │
     │  (port 9464)        │     │  (port 9090)    │     │  (port 3000)    │
     │                     │     │                 │     │                 │
     │  Metrics endpoint:  │     │  Scrapes every  │     │  Pre-built      │
     │  /metrics           │     │  5 seconds      │     │  dashboards     │
     └─────────────────────┘     └─────────────────┘     └─────────────────┘
    -```
    +```

    -```
    +```text
     06_prometheus_metrics_example/
     ├── prometheus_metrics_example.py    # Main example code
     ├── docker-compose.yml               # Prometheus + Grafana stack
     ├── observability/
     │   ├── prometheus/
     │   │   └── prometheus.yml           # Prometheus config
     │   └── grafana/
     │       ├── dashboards/
     │       │   └── stream-agents.json   # Pre-built dashboard
     │       ├── provisioning/
     │       │   ├── dashboards/
     │       │   │   └── default.yml      # Dashboard provisioning
     │       │   └── datasources/
     │       │       └── prometheus.yml   # Datasource config
     │       └── init-home-dashboard.sh   # Sets home dashboard
     └── README.md
    -```
    +```
Also applies to: 171-188
🧹 Nitpick comments (1)
examples/06_prometheus_metrics_example/prometheus_metrics_example.py (1)
42-49: Guard against duplicate log handlers on repeated imports.
Line 49 adds a handler unconditionally; re-imports can double-log. A simple `logger.handlers` guard prevents that.
♻️ Proposed fix
     handler = logging.StreamHandler(sys.stderr)
     handler.setFormatter(
         logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
     )
    -logger.addHandler(handler)
    +if not logger.handlers:
    +    logger.addHandler(handler)
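The guard proposed above can be checked in isolation with the standard `logging` module. In this sketch the function name `configure_logger` and the logger name `prometheus_metrics_example_demo` are hypothetical; the example file configures its logger at module level rather than through a helper.

```python
import logging
import sys


def configure_logger(name: str) -> logging.Logger:
    """Attach a stderr handler to the named logger only if it has none yet."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Calling the setup twice simulates the module body running a second time.
first = configure_logger("prometheus_metrics_example_demo")
second = configure_logger("prometheus_metrics_example_demo")
```

Because `logging.getLogger` returns the same object for the same name, the second call sees the existing handler and skips `addHandler`, so each record is emitted once.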
Summary
Extracted from the old `more-metrics` branch to add the Grafana visualization setup to the existing prometheus metrics example.
Test plan
`docker compose up -d` in the example directory