feat: add Grafana dashboard to prometheus metrics example by d3xvn · Pull Request #295 · GetStream/Vision-Agents

d3xvn · 2026-01-16T17:13:52Z

Summary

- Add docker-compose setup with Prometheus and Grafana for visualizing agent metrics
- Include pre-built Grafana dashboard showing LLM, STT, TTS, and turn detection latencies
- Update README with setup instructions for the full observability stack

Extracted from the old more-metrics branch to add the Grafana visualization setup to the existing prometheus metrics example.

Test plan

1. Run `docker compose up -d` in the example directory
2. Start the prometheus_metrics_example agent
3. Open Grafana at http://localhost:3000
4. Verify the dashboard loads and shows metrics when making a call

Summary by CodeRabbit

New Features
- Full observability stack (Prometheus + Grafana) deployable via docker-compose, with auto-provisioned Stream Agents performance dashboard and init automation.
- Two Quick Start options: metrics-only endpoint or full observability deployment.
- Example runtime now exposes a metrics endpoint and logs metrics/startup info.
Documentation
- Expanded Quick Start, architecture/flow, project layout, and troubleshooting for Prometheus/Grafana.
- Renamed metrics with _milliseconds/_total suffixes and updated example PromQL and Grafana guidance.


coderabbitai · 2026-01-16T17:14:13Z

📝 Walkthrough

Adds a complete observability example: Prometheus + Grafana configs, docker-compose stack and Grafana provisioning/init, a prebuilt Grafana dashboard, README quick-start updates, metric renames, and example code adjusted to use PrometheusMetricReader/MeterProvider and async agent.join.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| README / Docs `examples/06_prometheus_metrics_example/README.md` | Reworked Quick Start with two options (metrics-only, full stack), architecture/flow updates, metric name suffixes (`_milliseconds`, `_total`), PromQL examples, Files and Troubleshooting sections. |
| Docker Compose / Infra `examples/06_prometheus_metrics_example/docker-compose.yml` | New compose stack: `prometheus`, `grafana`, and `grafana-init` services; volumes, observability network, startup ordering. |
| Grafana - Dashboards & Provisioning `examples/06_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json`, `.../provisioning/dashboards/default.yml`, `.../provisioning/datasources/prometheus.yml` | New prebuilt dashboard JSON (latency percentiles, durations, error rates), provisioning provider, and Prometheus datasource (uid `prometheus`, proxy to http://prometheus:9090). |
| Grafana Init Script `examples/06_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh` | Init script that polls Grafana health and sets org/admin user home dashboard via Grafana API. |
| Prometheus Config `examples/06_prometheus_metrics_example/observability/prometheus/prometheus.yml` | New Prometheus scrape config: scrape stream-agents at `host.docker.internal:9464` every 5s; self-scrape Prometheus. |
| Example Code `examples/06_prometheus_metrics_example/prometheus_metrics_example.py` | Moved to runtime PrometheusMetricReader + MeterProvider setup, dotenv logging, removed MetricsCollector attachment, started Prometheus HTTP server in main, and switched to `async with agent.join(call):`. |

Sequence Diagram(s)

sequenceDiagram
    participant App as Stream Agent\n(:9464 /metrics)
    participant Prom as Prometheus\n(Scraper)
    participant Graf as Grafana\n(Visualizer)
    participant Init as Grafana Init\n(Provisioner)

    App->>Prom: Expose /metrics endpoint
    loop every 5s
        Prom->>App: GET /metrics
        App-->>Prom: Metrics payload (Prometheus format)
        Note over Prom: Store time-series
    end

    Init->>Graf: Poll /api/health
    Graf-->>Init: 200 OK
    Init->>Graf: PUT /api/org/preferences (set home dashboard UID)
    Graf-->>Init: 200 OK
    Init->>Graf: PUT /api/user/preferences (set admin home UID)
    Graf-->>Init: 200 OK

    Graf->>Prom: PromQL queries for dashboard panels
    Prom-->>Graf: Time-series responses
    Graf->>Graf: Render dashboard

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: added metrics and example #278: Related Prometheus/Grafana metrics integration and metric naming changes.

Suggested labels

assets

Suggested reviewers

Nash0x7E2

Poem

I count the seconds like small teeth of glass,
the dashboards bloom black as a closed mouth.
Metrics peel open: p50, p95, a bright / 99,
the agent keeps its ledger of the hush.
Prometheus takes the pulse; Grafana makes it look like weather.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly matches the primary change: adding Grafana dashboard visualization to the prometheus metrics example. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@examples/03_prometheus_metrics_example/docker-compose.yml`:
- Around line 25-32: Change the exposed port mapping from "3000:3000" to bind
only on localhost (e.g. "127.0.0.1:3000:3000") and update the Grafana env vars
to reduce anonymous privileges and re-enable login by setting
GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer and GF_AUTH_DISABLE_LOGIN_FORM=false (leave
GF_AUTH_ANONYMOUS_ENABLED as needed); this keeps the service inaccessible from
external hosts and prevents anonymous users from having Admin rights.
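
The difference between the two port mappings can be illustrated with a small Python sketch. The helper name `binds_loopback_only` is hypothetical and only handles the short `HOST_IP:HOST_PORT:CONTAINER_PORT` and `HOST_PORT:CONTAINER_PORT` forms; it is not part of Docker or Compose tooling.

```python
import ipaddress


def binds_loopback_only(port_mapping: str) -> bool:
    """Return True if a short-form Compose port mapping publishes only on loopback.

    Hypothetical helper for illustration. A mapping without a host IP
    (e.g. "3000:3000") is published on all interfaces by default.
    """
    # Drop an optional protocol suffix such as "/tcp".
    mapping = port_mapping.split("/", 1)[0]
    parts = mapping.rsplit(":", 2)
    if len(parts) < 3:
        return False  # no host IP given, so the default (all interfaces) applies
    host_ip = parts[0].strip("[]")  # allow bracketed IPv6 such as [::1]
    return ipaddress.ip_address(host_ip).is_loopback
```

With this sketch, `"3000:3000"` is reported as reachable from other hosts, while `"127.0.0.1:3000:3000"` is reported as loopback-only.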

In
`@examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh`:
- Around line 18-21: The curl PUT to Grafana's /api/org/preferences is missing
basic auth and will likely 401; update the cURL command that targets
http://grafana:3000/api/org/preferences to include the same authentication as
the other call (e.g., add -u "admin:admin" or use the existing credential
variable) so the request is authorized when calling the Org preferences endpoint
in init-home-dashboard.sh.
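
For reference, the shape of an authenticated request to that endpoint can be sketched in Python with the standard library. This only builds the request and does not send it. The `homeDashboardUID` payload field and the `stream-agents` UID are assumptions made for illustration, and `build_org_preferences_request` is a hypothetical helper, not code from the PR.

```python
import base64
import json
import urllib.request


def build_org_preferences_request(
    base_url: str, user: str, password: str, dashboard_uid: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated PUT to /api/org/preferences."""
    # HTTP basic auth is "Basic " + base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Assumed payload shape; check the Grafana API docs for your version.
    body = json.dumps({"homeDashboardUID": dashboard_uid}).encode()
    return urllib.request.Request(
        url=f"{base_url}/api/org/preferences",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


request = build_org_preferences_request(
    "http://grafana:3000", "admin", "admin", "stream-agents"  # UID is a placeholder
)
```

Without the `Authorization` header the same request would be anonymous, which is the case the comment above expects to be rejected.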

In `@examples/03_prometheus_metrics_example/README.md`:
- Around line 59-67: The fenced code blocks in README.md that contain the ASCII
diagram (the three-box Stream Agent → Prometheus → Grafana diagram) and the
directory tree under "03_prometheus_metrics_example/" must include a language
identifier to satisfy MD040; change the opening triple backticks for those
blocks from ``` to ```text (both the diagram block and the subsequent
directory-tree block), and ensure any other similar fenced blocks in the file
(e.g., the ones referenced around lines 155-172) are updated the same way.
- Line 23: Replace bare URLs in the README (specifically occurrences of the
string "http://localhost:9464/metrics" found in the content) with proper
Markdown links using the [label](url) syntax (e.g., a descriptive label like
"metrics endpoint" linking to http://localhost:9464/metrics) so markdownlint
rule MD034 is satisfied; update all occurrences (the one shown and the other two
instances) to use this link syntax and ensure any other bare URLs in the file
are converted similarly.

🧹 Nitpick comments (5)

examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml (1)

9-16: host.docker.internal may not resolve on Linux without additional configuration.

This hostname works out-of-the-box on Docker Desktop (macOS/Windows), but on native Linux Docker, it requires adding extra_hosts in the docker-compose service definition. Since this is a development example, consider adding a note in the README or docker-compose file for Linux users.
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)
3-8: Consider adding a timeout to the health check loop.

The current loop will wait indefinitely if Grafana fails to start. For a more robust script, consider adding a maximum retry count or timeout.
Optional: Add timeout
 # Wait for Grafana to be ready
 echo "Waiting for Grafana to be ready..."
+MAX_RETRIES=30
+RETRY_COUNT=0
-until curl -s http://grafana:3000/api/health > /dev/null 2>&1; do
+until curl -s http://grafana:3000/api/health > /dev/null 2>&1 || [ $RETRY_COUNT -ge $MAX_RETRIES ]; do
     echo "Grafana not ready yet, waiting..."
     sleep 2
+    RETRY_COUNT=$((RETRY_COUNT + 1))
 done
+
+if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
+    echo "Grafana failed to start within timeout"
+    exit 1
+fi
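
The same bounded-retry idea, sketched in Python for comparison. `wait_until_ready` and its parameters are hypothetical names; the probe is passed in as a callable so the loop can be exercised without a running Grafana.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    max_attempts: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or the attempt budget is exhausted."""
    for attempt in range(1, max_attempts + 1):
        if probe():
            return True
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next attempt
    return False  # caller decides how to fail, e.g. exit with status 1
```

Returning a boolean keeps "ready" and "timed out" distinct, so the caller does not need to re-inspect the retry counter after the loop.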
examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json (1)

1-822: LGTM! Well-structured dashboard with consistent panel configurations.

The dashboard is well-organized with appropriate PromQL queries using histogram_quantile for latency percentiles. The datasource UID prometheus correctly references the provisioned datasource.

One observation: the threshold value of 80ms is uniform across all latency panels (LLM, STT, TTS, Turn Detection). These services typically have different latency profiles—LLM responses are often slower than STT/TTS. Consider customizing thresholds per panel type for more meaningful alerting, though this is purely a nice-to-have refinement.
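
As background for the percentile panels, the estimate that `histogram_quantile` produces can be sketched in Python. This is a simplified model of the linear interpolation described in the Prometheus documentation, assuming positive bucket bounds and a final `+Inf` bucket; the bucket values in the usage note are made up and do not come from the dashboard.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        return math.nan  # a usable histogram needs a +Inf bucket
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for index, (upper, cumulative) in enumerate(ordered):
        if cumulative >= rank:
            break
    if index == len(ordered) - 1:
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return ordered[-2][0]
    lower = ordered[index - 1][0] if index > 0 else 0.0
    below = ordered[index - 1][1] if index > 0 else 0.0
    in_bucket = cumulative - below
    if in_bucket == 0:
        return lower  # only reachable when q == 0 and the first bucket is empty
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

For example, with hypothetical cumulative buckets `[(50, 10), (100, 30), (250, 40), (inf, 40)]` in milliseconds, the median lands halfway through the 50 to 100 ms bucket. Because the result is interpolated within a bucket, a fixed 80 ms threshold is only as precise as the bucket boundaries around it.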
examples/03_prometheus_metrics_example/docker-compose.yml (2)
48-50: Ensure grafana-init waits for Grafana readiness.

depends_on only waits for container start, not service readiness. The init script can race and fail. Wrap the command with a wait loop to poll the /api/health endpoint:
🛠️ Suggested implementation
-    command: sh /init-home-dashboard.sh
+    command: sh -c 'until curl -sf http://grafana:3000/api/health; do sleep 2; done; sh /init-home-dashboard.sh'
4-4: Pin Docker image tags to specific versions instead of latest.

Using latest makes deployments non-reproducible and risks unexpected breaking changes. For example, prom/prometheus:latest currently points to v2.x while v3.x releases exist. Pin explicit version tags (e.g., v2.53.5, v3.8.1) or use environment variables for version management.

Affected lines: 4 (prometheus), 22 (grafana), 44 (curl)
Example using env-vars
-    image: prom/prometheus:latest
+    image: prom/prometheus:${PROMETHEUS_VERSION}
...
-    image: grafana/grafana:latest
+    image: grafana/grafana:${GRAFANA_VERSION}
...
-    image: curlimages/curl:latest
+    image: curlimages/curl:${CURL_VERSION}
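
A check of this kind could also be automated. The following Python sketch scans compose text for `image:` lines that use `latest` or no tag at all. `find_unpinned_images` is a hypothetical helper based on a simple regular expression, not a full YAML or image-reference parser, and it treats a tag supplied through a variable such as `${PROMETHEUS_VERSION}` as pinned.

```python
import re
from typing import List

# Matches lines such as "    image: prom/prometheus:latest".
IMAGE_LINE = re.compile(r"^\s*image:\s*(?P<ref>\S+)\s*$", re.MULTILINE)


def find_unpinned_images(compose_text: str) -> List[str]:
    """Return image references that use `latest` or carry no tag at all."""
    unpinned = []
    for match in IMAGE_LINE.finditer(compose_text):
        ref = match.group("ref").strip("'\"")
        # The tag is whatever follows the first colon after the final slash.
        name = ref.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else ""
        if tag in ("", "latest"):
            unpinned.append(ref)
    return unpinned
```

Run against the compose file before the change, a helper like this would be expected to list all three images named above.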

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b75e384 and e8b055d.

📒 Files selected for processing (7)

examples/03_prometheus_metrics_example/README.md
examples/03_prometheus_metrics_example/docker-compose.yml
examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh
examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml
examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml
examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml

🧰 Additional context used

🪛 Gitleaks (8.30.0)

examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh

[high] 26-27: Discovered a potential basic authorization token provided in a curl command, which could compromise the curl accessed resource.

(curl-auth-user)

🪛 markdownlint-cli2 (0.18.1)

examples/03_prometheus_metrics_example/README.md

23-23: Bare URL used

(MD034, no-bare-urls)

41-41: Bare URL used

(MD034, no-bare-urls)

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

155-155: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

194-194: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: unit / Ruff
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
GitHub Check: unit / Mypy
GitHub Check: unit / Mypy
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
GitHub Check: unit / Ruff

🔇 Additional comments (6)

examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml (1)

1-12: LGTM!

The datasource configuration is well-structured. The UID prometheus correctly matches the dashboard references, and the timeInterval: 5s aligns with the job-level scrape interval defined in the Prometheus configuration.

examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml (1)

1-13: LGTM!

The dashboard provisioning configuration is appropriate for a development example. Setting allowUiUpdates: true enables experimentation, and updateIntervalSeconds: 10 provides reasonably responsive dashboard reloading.

examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)

26-30: Hardcoded default credentials are acceptable for this local development example.

The static analysis tool flagged the admin:admin credentials. For a local observability stack example, using Grafana's default credentials is reasonable. If this were production code, you'd want to parameterize these via environment variables.

examples/03_prometheus_metrics_example/docker-compose.yml (1)

55-61: LGTM for volumes and network isolation.

Clean separation and persistence defaults look solid.

examples/03_prometheus_metrics_example/README.md (2)

3-11: Nice overview expansion for Grafana visualization.

Clear, concise positioning of the Prometheus + Grafana flow.

103-125: Grafana dashboard section reads well.

The PromQL examples and feature list are concise and actionable.

- Change label grouping from llm_class/stt_class/tts_class to provider (matching actual metric labels emitted by MetricsCollector)
- Use increase()[15m] instead of rate()[5m] for STT percentiles to better handle sparse data
- Add average latency calculation for STT that always displays data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@examples/06_prometheus_metrics_example/prometheus_metrics_example.py`:
- Around line 26-40: Move the OpenTelemetry setup so
metrics.set_meter_provider(provider) runs before importing any vision_agents
modules: initialize PrometheusMetricReader(), create
MeterProvider(metric_readers=[reader]) and call
metrics.set_meter_provider(provider) prior to the import lines that bring in
Agent / AgentLauncher / Runner / User (the from vision_agents.core import ...
statement) so the module-level metric instruments in
vision_agents.core.observability.metrics bind to the configured provider instead
of the no-op default.

♻️ Duplicate comments (2)

examples/06_prometheus_metrics_example/README.md (2)

23-23: Replace bare URLs with Markdown links.
Lines 23, 41, and 212 use bare URLs and trip MD034. Please convert them to Markdown link syntax.

✍️ Proposed fix

-Then open http://localhost:9464/metrics in your browser to see raw metrics as you talk to the agent.
+Then open [http://localhost:9464/metrics](http://localhost:9464/metrics) in your browser to see raw metrics as you talk to the agent.

-3. Open Grafana at http://localhost:3000 (no login required - anonymous access enabled)
+3. Open Grafana at [http://localhost:3000](http://localhost:3000) (no login required - anonymous access enabled)

-- Check Prometheus targets at http://localhost:9090/targets
+- Check Prometheus targets at [http://localhost:9090/targets](http://localhost:9090/targets)

Also applies to: 41-41, 212-212

60-68: Add language identifiers to fenced blocks (use text).
Lines 60–68 and 171–188 are missing a language identifier, which trips MD040.

✍️ Proposed fix

-```
+```text
 ┌─────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │   Stream Agent      │────▶│   Prometheus    │────▶│    Grafana      │
 │  (port 9464)        │     │   (port 9090)   │     │  (port 3000)    │
 │                     │     │                 │     │                 │
 │  Metrics endpoint:  │     │  Scrapes every  │     │  Pre-built      │
 │  /metrics           │     │  5 seconds      │     │  dashboards     │
 └─────────────────────┘     └─────────────────┘     └─────────────────┘
-```
+```

-```
+```text
 06_prometheus_metrics_example/
 ├── prometheus_metrics_example.py   # Main example code
 ├── docker-compose.yml              # Prometheus + Grafana stack
 ├── observability/
 │   ├── prometheus/
 │   │   └── prometheus.yml          # Prometheus config
 │   └── grafana/
 │       ├── dashboards/
 │       │   └── stream-agents.json  # Pre-built dashboard
 │       ├── provisioning/
 │       │   ├── dashboards/
 │       │   │   └── default.yml     # Dashboard provisioning
 │       │   └── datasources/
 │       │       └── prometheus.yml  # Datasource config
 │       └── init-home-dashboard.sh  # Sets home dashboard
 └── README.md
-```
+```

Also applies to: 171-188

🧹 Nitpick comments (1)

examples/06_prometheus_metrics_example/prometheus_metrics_example.py (1)
42-49: Guard against duplicate log handlers on repeated imports.
Line 49 adds a handler unconditionally; re-imports can double-log. A simple logger.handlers guard prevents that.
♻️ Proposed fix
 handler = logging.StreamHandler(sys.stderr)
 handler.setFormatter(
     logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 )
-logger.addHandler(handler)
+if not logger.handlers:
+    logger.addHandler(handler)
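
A self-contained version of the same guard, as a sketch: wrapping the setup in a function makes it safe to call repeatedly. The function name `configure_logger` and the logger name used below are illustrative and do not come from the example file.

```python
import logging
import sys


def configure_logger(name: str) -> logging.Logger:
    """Attach a stderr handler to the named logger only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # skip if a previous call already attached one
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Calling twice returns the same logger with a single handler.
first = configure_logger("prometheus_metrics_example_demo")
second = configure_logger("prometheus_metrics_example_demo")
```

Because `logging.getLogger` returns the same object for the same name, the second call finds the existing handler and leaves it alone.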

examples/06_prometheus_metrics_example/prometheus_metrics_example.py

github-actions bot added examples config docs project-info labels Jan 16, 2026

coderabbitai bot reviewed Jan 16, 2026

feat: add Grafana dashboard to prometheus metrics example

03e64f2

Add a docker-compose setup with Prometheus and Grafana for visualizing agent metrics. Includes a pre-built dashboard showing LLM, STT, TTS, and turn detection latencies with percentile breakdowns.

d3xvn force-pushed the feat/add-grafana branch from e8b055d to 03e64f2 · January 23, 2026 16:25

d3xvn and others added 4 commits January 23, 2026 17:48

fix: use async with for agent.join() in prometheus example

8638c8b

The agent.join() method returns an async context manager, not an awaitable, so it should be used with `async with` instead of `with await`. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Cleanup Prometheus example

df1757f

Remove call id from README.md

b2a933a

dangusev approved these changes Jan 23, 2026

coderabbitai bot reviewed Jan 23, 2026

examples/06_prometheus_metrics_example/prometheus_metrics_example.py (resolved)

Fix docstring

4881c84

dangusev merged commit b3d336a into main Jan 26, 2026
10 checks passed

dangusev deleted the feat/add-grafana branch January 26, 2026 10:26

coderabbitai bot mentioned this pull request Jan 26, 2026

fix: prometheus metrics example documentation #311

Merged

dangusev merged 6 commits into main from feat/add-grafana
