
feat: add Grafana dashboard to prometheus metrics example #295

Merged: dangusev merged 6 commits into main from feat/add-grafana on Jan 26, 2026

d3xvn (Contributor) commented Jan 16, 2026

Summary

  • Add docker-compose setup with Prometheus and Grafana for visualizing agent metrics
  • Include pre-built Grafana dashboard showing LLM, STT, TTS, and turn detection latencies
  • Update README with setup instructions for the full observability stack

Extracted from the old more-metrics branch to add the Grafana visualization setup to the existing prometheus metrics example.

Test plan

  • Run docker compose up -d in the example directory
  • Start the prometheus_metrics_example agent
  • Open Grafana at http://localhost:3000
  • Verify the dashboard loads and shows metrics when making a call
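The endpoint check in this test plan can also be scripted. The sketch below fetches the agent's metrics endpoint and returns the distinct metric names it exposes, which makes it easy to confirm that the renamed `_milliseconds` and `_total` series appear after a call. It assumes the endpoint is `http://localhost:9464/metrics`, as described in the README; `fetch_metrics` and `metric_names` are hypothetical helpers built on the standard library only, not part of the example.

```python
import re
import urllib.request

# A sample line is: metric name, optional {labels}, whitespace, value.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+\S+")


def metric_names(exposition: str) -> set[str]:
    """Return the distinct metric names in Prometheus text-format output."""
    names: set[str] = set()
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE.match(line)
        if match:
            names.add(match.group(1))
    return names


def fetch_metrics(url: str = "http://localhost:9464/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text from the agent's metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the agent running, `sorted(n for n in metric_names(fetch_metrics()) if n.endswith("_total"))` lists the counters. The regex is deliberately simple and assumes label values never contain `}`.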

Summary by CodeRabbit

  • New Features

    • Full observability stack (Prometheus + Grafana) deployable via docker-compose, with auto-provisioned Stream Agents performance dashboard and init automation.
    • Two Quick Start options: metrics-only endpoint or full observability deployment.
    • Example runtime now exposes a metrics endpoint and logs metrics/startup info.
  • Documentation

    • Expanded Quick Start, architecture/flow, project layout, and troubleshooting for Prometheus/Grafana.
    • Renamed metrics with _milliseconds/_total suffixes and updated example PromQL and Grafana guidance.


coderabbitai bot commented Jan 16, 2026

Walkthrough

Adds a complete observability example: Prometheus + Grafana configs, docker-compose stack and Grafana provisioning/init, a prebuilt Grafana dashboard, README quick-start updates, metric renames, and example code adjusted to use PrometheusMetricReader/MeterProvider and `async with agent.join(call)`.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| README / Docs | examples/06_prometheus_metrics_example/README.md | Reworked Quick Start with two options (metrics-only, full stack), architecture/flow updates, metric name suffixes (`_milliseconds`, `_total`), PromQL examples, Files and Troubleshooting sections. |
| Docker Compose / Infra | examples/06_prometheus_metrics_example/docker-compose.yml | New compose stack: prometheus, grafana, and grafana-init services; volumes, observability network, startup ordering. |
| Grafana - Dashboards & Provisioning | examples/06_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json, .../provisioning/dashboards/default.yml, .../provisioning/datasources/prometheus.yml | New prebuilt dashboard JSON (latency percentiles, durations, error rates), provisioning provider, and Prometheus datasource (uid `prometheus`, proxy to `http://prometheus:9090`). |
| Grafana Init Script | examples/06_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh | Init script that polls Grafana health and sets the org and admin-user home dashboard via the Grafana API. |
| Prometheus Config | examples/06_prometheus_metrics_example/observability/prometheus/prometheus.yml | New Prometheus scrape config: scrapes stream-agents at `host.docker.internal:9464` every 5s and self-scrapes Prometheus. |
| Example Code | examples/06_prometheus_metrics_example/prometheus_metrics_example.py | Moved to runtime PrometheusMetricReader + MeterProvider setup, dotenv logging, removed MetricsCollector attachment, started the Prometheus HTTP server in main, and switched to `async with agent.join(call):`. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Stream Agent\n(:9464 /metrics)
    participant Prom as Prometheus\n(Scraper)
    participant Graf as Grafana\n(Visualizer)
    participant Init as Grafana Init\n(Provisioner)

    App->>Prom: Expose /metrics endpoint
    loop every 5s
        Prom->>App: GET /metrics
        App-->>Prom: Metrics payload (Prometheus format)
        Note over Prom: Store time-series
    end

    Init->>Graf: Poll /api/health
    Graf-->>Init: 200 OK
    Init->>Graf: POST /api/org/preferences (set home dashboard UID)
    Graf-->>Init: 200 OK
    Init->>Graf: POST /api/user/preferences (set admin home UID)
    Graf-->>Init: 200 OK

    Graf->>Prom: PromQL queries for dashboard panels
    Prom-->>Graf: Time-series responses
    Graf->>Graf: Render dashboard
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

assets

Suggested reviewers

  • Nash0x7E2

Poem

I count the seconds like small teeth of glass,
the dashboards bloom black as a closed mouth.
Metrics peel open: p50, p95, a bright / 99,
the agent keeps its ledger of the hush.
Prometheus takes the pulse; Grafana makes it look like weather.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly matches the primary change: adding Grafana dashboard visualization to the prometheus metrics example. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@examples/03_prometheus_metrics_example/docker-compose.yml`:
- Around lines 25-32: Change the exposed port mapping from "3000:3000" to bind
only on localhost (e.g. "127.0.0.1:3000:3000") and update the Grafana env vars
to reduce anonymous privileges and re-enable login by setting
GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer and GF_AUTH_DISABLE_LOGIN_FORM=false (leave
GF_AUTH_ANONYMOUS_ENABLED as needed); this keeps the service inaccessible from
external hosts and prevents anonymous users from having Admin rights.

In `@examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh`:
- Around lines 18-21: The curl PUT to Grafana's /api/org/preferences is missing
basic auth and will likely 401; update the cURL command that targets
http://grafana:3000/api/org/preferences to include the same authentication as
the other call (e.g., add -u "admin:admin" or use the existing credential
variable) so the request is authorized when calling the Org preferences endpoint
in init-home-dashboard.sh.
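For reference, the same authenticated call can be expressed without curl. This sketch builds the PUT request with an HTTP Basic `Authorization` header, which is what `curl -u admin:admin` sends. The `homeDashboardUID` body field is assumed from Grafana's preferences API; the function name, the dashboard UID, and the `admin`/`admin` defaults are placeholders for illustration, not taken from the script.

```python
import base64
import json
import urllib.request


def org_preferences_request(
    base_url: str, dashboard_uid: str, user: str = "admin", password: str = "admin"
) -> urllib.request.Request:
    """Build an authenticated PUT to Grafana's org preferences endpoint (hypothetical helper)."""
    # Basic auth is base64("user:password"), the same header curl -u produces.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"homeDashboardUID": dashboard_uid}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/org/preferences",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then `urllib.request.urlopen(org_preferences_request("http://grafana:3000", "<dashboard-uid>"))`; without the header, Grafana rejects the request with 401 as the comment above predicts.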

In `@examples/03_prometheus_metrics_example/README.md`:
- Around lines 59-67: The fenced code blocks in README.md that contain the ASCII
diagram (the three-box Stream Agent → Prometheus → Grafana diagram) and the
directory tree under "03_prometheus_metrics_example/" must include a language
identifier to satisfy MD040; change the opening triple backticks for those
blocks from ``` to ```text (both the diagram block and the subsequent
directory-tree block), and ensure any other similar fenced blocks in the file
(e.g., the ones referenced around lines 155-172) are updated the same way.
- Line 23: Replace bare URLs in the README (specifically occurrences of the
string "http://localhost:9464/metrics" found in the content) with proper
Markdown links using the [label](url) syntax (e.g., a descriptive label like
"metrics endpoint" linking to http://localhost:9464/metrics) so markdownlint
rule MD034 is satisfied; update all occurrences (the one shown and the other two
instances) to use this link syntax and ensure any other bare URLs in the file
are converted similarly.
🧹 Nitpick comments (5)
examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml (1)

9-16: host.docker.internal may not resolve on Linux without additional configuration.

This hostname works out-of-the-box on Docker Desktop (macOS/Windows), but on native Linux Docker it must be mapped explicitly, for example by adding extra_hosts: ["host.docker.internal:host-gateway"] to the prometheus service in docker-compose.yml (the host-gateway value requires Docker Engine 20.10 or later). Since this is a development example, consider adding a note in the README or docker-compose file for Linux users.

examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)

3-8: Consider adding a timeout to the health check loop.

The current loop will wait indefinitely if Grafana fails to start. For a more robust script, consider adding a maximum retry count or timeout.

Optional: Add timeout
```diff
 # Wait for Grafana to be ready
 echo "Waiting for Grafana to be ready..."
+MAX_RETRIES=30
+RETRY_COUNT=0
-until curl -s http://grafana:3000/api/health > /dev/null 2>&1; do
+until curl -s http://grafana:3000/api/health > /dev/null 2>&1 || [ $RETRY_COUNT -ge $MAX_RETRIES ]; do
     echo "Grafana not ready yet, waiting..."
     sleep 2
+    RETRY_COUNT=$((RETRY_COUNT + 1))
 done
+
+if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
+    echo "Grafana failed to start within timeout"
+    exit 1
+fi
```
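The same bounded-retry idea, sketched in Python for comparison. `grafana_healthy` and `wait_until_ready` are hypothetical helpers; the URL is the in-network Grafana address the init script polls, and the probe and sleep functions are injectable so the loop can be exercised without a running Grafana.

```python
import time
import urllib.request
from typing import Callable


def grafana_healthy(url: str = "http://grafana:3000/api/health", timeout: float = 2.0) -> bool:
    """True only for HTTP 200; plain `curl -s` would also succeed on error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts all derive from OSError.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    max_retries: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to max_retries times with `delay` seconds between attempts.

    Returns True as soon as the probe succeeds, False once the budget is spent.
    """
    for attempt in range(max_retries):
        if probe():
            return True
        if attempt < max_retries - 1:  # no point sleeping after the final attempt
            sleep(delay)
    return False
```

`wait_until_ready(grafana_healthy)` roughly matches the 30 × 2 s budget of the shell version. Because success is tested before the retry budget, a probe that first succeeds on the last allowed attempt still reports ready.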
examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json (1)

1-822: LGTM! Well-structured dashboard with consistent panel configurations.

The dashboard is well-organized with appropriate PromQL queries using histogram_quantile for latency percentiles. The datasource UID prometheus correctly references the provisioned datasource.

One observation: the threshold value of 80ms is uniform across all latency panels (LLM, STT, TTS, Turn Detection). These services typically have different latency profiles—LLM responses are often slower than STT/TTS. Consider customizing thresholds per panel type for more meaningful alerting, though this is purely a nice-to-have refinement.
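For readers less familiar with `histogram_quantile`, the sketch below reproduces its core estimate for a single series: find the bucket whose cumulative count reaches `q × total`, then interpolate linearly inside it, assuming observations are spread evenly across that bucket. It is a simplification written for illustration (it skips Prometheus's handling of negative bucket bounds and non-monotonic counts). In the dashboard's queries, the input counts would be the per-bucket results of `rate(...)` or `increase(...)` over the chosen window.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs, i.e. each `le`
    label with its *_bucket value, and must include the +Inf bucket.
    Simplified: q must be in (0, 1] and bounds are assumed non-negative.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan  # no observations in the window: nothing to rank
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Cannot interpolate into +Inf; report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan
```

The estimate is only as fine-grained as the bucket layout: with buckets at 200 and 500 ms, any p95 between them is inferred by interpolation rather than measured.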

examples/03_prometheus_metrics_example/docker-compose.yml (2)

48-50: Ensure grafana-init waits for Grafana readiness.

depends_on only waits for container start, not service readiness. The init script can race and fail. Wrap the command with a wait loop to poll the /api/health endpoint:

🛠️ Suggested implementation
```diff
-    command: sh /init-home-dashboard.sh
+    command: sh -c 'until curl -sf http://grafana:3000/api/health; do sleep 2; done; sh /init-home-dashboard.sh'
```

4-4: Pin Docker image tags to specific versions instead of latest.

Using latest makes deployments non-reproducible and risks unexpected breaking changes: prom/prometheus:latest, for example, can silently move across major versions (v2.x to v3.x). Pin explicit version tags (e.g., v2.53.5, v3.8.1) or use environment variables for version management.

Affected lines: 4 (prometheus), 22 (grafana), 44 (curl)

Example using env-vars
```diff
-    image: prom/prometheus:latest
+    image: prom/prometheus:${PROMETHEUS_VERSION}
...
-    image: grafana/grafana:latest
+    image: grafana/grafana:${GRAFANA_VERSION}
...
-    image: curlimages/curl:latest
+    image: curlimages/curl:${CURL_VERSION}
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between b75e384 and e8b055d.

📒 Files selected for processing (7)
  • examples/03_prometheus_metrics_example/README.md
  • examples/03_prometheus_metrics_example/docker-compose.yml
  • examples/03_prometheus_metrics_example/observability/grafana/dashboards/stream-agents.json
  • examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh
  • examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml
  • examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml
  • examples/03_prometheus_metrics_example/observability/prometheus/prometheus.yml
🧰 Additional context used
🪛 Gitleaks (8.30.0)
examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh

[high] 26-27: Discovered a potential basic authorization token provided in a curl command, which could compromise the curl accessed resource.

(curl-auth-user)

🪛 markdownlint-cli2 (0.18.1)
examples/03_prometheus_metrics_example/README.md

- 23-23: Bare URL used (MD034, no-bare-urls)
- 41-41: Bare URL used (MD034, no-bare-urls)
- 59-59: Fenced code blocks should have a language specified (MD040, fenced-code-language)
- 155-155: Fenced code blocks should have a language specified (MD040, fenced-code-language)
- 194-194: Bare URL used (MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. (8)
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
🔇 Additional comments (6)
examples/03_prometheus_metrics_example/observability/grafana/provisioning/datasources/prometheus.yml (1)

1-12: LGTM!

The datasource configuration is well-structured. The UID prometheus correctly matches the dashboard references, and the timeInterval: 5s aligns with the job-level scrape interval defined in the Prometheus configuration.

examples/03_prometheus_metrics_example/observability/grafana/provisioning/dashboards/default.yml (1)

1-13: LGTM!

The dashboard provisioning configuration is appropriate for a development example. Setting allowUiUpdates: true enables experimentation, and updateIntervalSeconds: 10 provides reasonably responsive dashboard reloading.

examples/03_prometheus_metrics_example/observability/grafana/init-home-dashboard.sh (1)

26-30: Hardcoded default credentials are acceptable for this local development example.

The static analysis tool flagged the admin:admin credentials. For a local observability stack example, using Grafana's default credentials is reasonable. If this were production code, you'd want to parameterize these via environment variables.
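A minimal sketch of that parameterization, assuming the init logic were ported to Python. It reuses `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD`, the variables Grafana itself reads to configure its admin account, so docker-compose could pass the same values to both the grafana and grafana-init services. The function name is hypothetical.

```python
import os
from collections.abc import Mapping


def grafana_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (user, password) for Grafana's HTTP API.

    Falls back to Grafana's built-in admin/admin defaults, which are only
    acceptable for a local development stack like this example.
    """
    user = env.get("GF_SECURITY_ADMIN_USER", "admin")
    password = env.get("GF_SECURITY_ADMIN_PASSWORD", "admin")
    return user, password
```

`user, password = grafana_credentials()` can then supply the Basic-auth credentials for each API call instead of a hardcoded `admin:admin`.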

examples/03_prometheus_metrics_example/docker-compose.yml (1)

55-61: LGTM for volumes and network isolation.

Clean separation and persistence defaults look solid.

examples/03_prometheus_metrics_example/README.md (2)

3-11: Nice overview expansion for Grafana visualization.

Clear, concise positioning of the Prometheus + Grafana flow.


103-125: Grafana dashboard section reads well.

The PromQL examples and feature list are concise and actionable.


Add a docker-compose setup with Prometheus and Grafana for visualizing
agent metrics. Includes a pre-built dashboard showing LLM, STT, TTS,
and turn detection latencies with percentile breakdowns.
d3xvn and others added 4 commits January 23, 2026 17:48
The agent.join() method returns an async context manager, not an
awaitable, so it should be used with `async with` instead of `with await`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
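The difference this commit describes can be reproduced with the standard library. Here `join` is a hypothetical stand-in for `agent.join(call)`, built with `contextlib.asynccontextmanager` so that, as the commit message states for the real method, calling it returns an async context manager rather than an awaitable.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def join(call_id: str):
    """Hypothetical stand-in for agent.join(call): returns an async context manager."""
    events = [f"joined {call_id}"]
    try:
        yield events  # the body of the `async with` block runs at this point
    finally:
        events.append(f"left {call_id}")  # runs on normal exit, errors, and cancellation


async def correct_usage() -> list[str]:
    async with join("demo-call") as events:
        events.append("talking")
    return events


async def broken_usage() -> None:
    # The pre-fix pattern: the object has no __await__, so this raises TypeError.
    with await join("demo-call"):
        pass
```

`async with` awaits the manager's `__aenter__` and `__aexit__`, so the cleanup step (leaving the call) runs even if the body raises; `await` on the same object fails because it is not awaitable.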
- Change label grouping from llm_class/stt_class/tts_class to provider
  (matching actual metric labels emitted by MetricsCollector)
- Use increase(...[15m]) instead of rate(...[5m]) for STT percentiles to
  better handle sparse data
- Add average latency calculation for STT that always displays data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
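Why a longer `increase()` window helps with sparse data can be seen with a simplified model. `simple_increase` below sums a counter's deltas inside a look-back window and treats any drop as a counter reset; unlike Prometheus, it does not extrapolate to the window edges, so its numbers are illustrative only. The series used in the check is hypothetical: a bucket counter scraped every 5 s that steps from 0 to 1 at t = 600 s.

```python
from typing import Optional

Sample = tuple[float, float]  # (timestamp in seconds, counter value)


def simple_increase(samples: list[Sample], now: float, window: float) -> Optional[float]:
    """Counter increase over the window (now - window, now].

    Simplified for illustration: counter resets are handled, but unlike
    Prometheus there is no extrapolation to the window boundaries.
    """
    values = [v for t, v in sorted(samples) if now - window < t <= now]
    if len(values) < 2:
        return None  # too few points to compute a delta (Prometheus also needs two)
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # A drop means the counter restarted from zero, so count cur in full.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: list[Sample], now: float, window: float) -> Optional[float]:
    """Per-second average: the simplified increase divided by the window length."""
    inc = simple_increase(samples, now, window)
    return None if inc is None else inc / window
```

Evaluated at t = 1200 s, a 5-minute window sees only flat samples and reports an increase of 0, while a 15-minute window still contains the step and reports 1. When every bucket's increase is 0, `histogram_quantile` has no observations to rank and the percentile comes out as NaN instead of a number.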
coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/06_prometheus_metrics_example/prometheus_metrics_example.py`:
- Around line 26-40: Move the OpenTelemetry setup so
metrics.set_meter_provider(provider) runs before importing any vision_agents
modules: initialize PrometheusMetricReader(), create
MeterProvider(metric_readers=[reader]) and call
metrics.set_meter_provider(provider) prior to the import lines that bring in
Agent / AgentLauncher / Runner / User (the from vision_agents.core import ...
statement) so the module-level metric instruments in
vision_agents.core.observability.metrics bind to the configured provider instead
of the no-op default.
♻️ Duplicate comments (2)
examples/06_prometheus_metrics_example/README.md (2)

23-23: Replace bare URLs with Markdown links.
Lines 23, 41, and 212 use bare URLs and trip MD034. Please convert them to Markdown link syntax.

✍️ Proposed fix
```diff
-Then open http://localhost:9464/metrics in your browser to see raw metrics as you talk to the agent.
+Then open [http://localhost:9464/metrics](http://localhost:9464/metrics) in your browser to see raw metrics as you talk to the agent.

-3. Open Grafana at http://localhost:3000 (no login required - anonymous access enabled)
+3. Open Grafana at [http://localhost:3000](http://localhost:3000) (no login required - anonymous access enabled)

-- Check Prometheus targets at http://localhost:9090/targets
+- Check Prometheus targets at [http://localhost:9090/targets](http://localhost:9090/targets)
```

Also applies to: 41-41, 212-212


60-68: Add language identifiers to fenced blocks (use text).
Lines 60–68 and 171–188 are missing a language identifier, which trips MD040.

✍️ Proposed fix
````diff
-```
+```text
 ┌─────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │   Stream Agent      │────▶│   Prometheus    │────▶│    Grafana      │
 │  (port 9464)        │     │   (port 9090)   │     │  (port 3000)    │
 │                     │     │                 │     │                 │
 │  Metrics endpoint:  │     │  Scrapes every  │     │  Pre-built      │
 │  /metrics           │     │  5 seconds      │     │  dashboards     │
 └─────────────────────┘     └─────────────────┘     └─────────────────┘
-```
+```

-```
+```text
 06_prometheus_metrics_example/
 ├── prometheus_metrics_example.py   # Main example code
 ├── docker-compose.yml              # Prometheus + Grafana stack
 ├── observability/
 │   ├── prometheus/
 │   │   └── prometheus.yml          # Prometheus config
 │   └── grafana/
 │       ├── dashboards/
 │       │   └── stream-agents.json  # Pre-built dashboard
 │       ├── provisioning/
 │       │   ├── dashboards/
 │       │   │   └── default.yml     # Dashboard provisioning
 │       │   └── datasources/
 │       │       └── prometheus.yml  # Datasource config
 │       └── init-home-dashboard.sh  # Sets home dashboard
 └── README.md
-```
+```
````

Also applies to: 171-188

🧹 Nitpick comments (1)
examples/06_prometheus_metrics_example/prometheus_metrics_example.py (1)

42-49: Guard against duplicate log handlers on repeated imports.
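Line 49 adds a handler unconditionally; if the module body runs more than once (for example after importlib.reload), a second handler is attached and every record is logged twice. A simple logger.handlers guard prevents that.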

♻️ Proposed fix
```diff
 handler = logging.StreamHandler(sys.stderr)
 handler.setFormatter(
     logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 )
-logger.addHandler(handler)
+if not logger.handlers:
+    logger.addHandler(handler)
```
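The same guard can be packaged as a function so the setup stays idempotent wherever it is called from, and the handler is only constructed when it is actually needed. This is a sketch, not the example's code: `configure_logger` is a hypothetical name, and the guard inspects only handlers attached directly to this logger, not those inherited from ancestors such as the root logger.

```python
import logging
import sys


def configure_logger(name: str) -> logging.Logger:
    """Attach a stderr handler on the first call; later calls are no-ops."""
    logger = logging.getLogger(name)  # same name always returns the same logger object
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```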

dangusev merged commit b3d336a into main on Jan 26, 2026 (10 checks passed)
dangusev deleted the feat/add-grafana branch on January 26, 2026 at 10:26