Phase 1 Infrastructure Rollout
Deploy the new telemetry backends following the same ArgoCD + Helm pattern established for Loki. This is a one-time cost and unblocks all service onboarding.
Total estimated effort: ~2–2.5 days
Prerequisite for: Phase 2: Service Onboarding
Labels: phase-1 infrastructure tracing
Estimated Effort: ~1 day
Deploy Grafana Tempo as the distributed trace storage backend. Tempo is the core requirement for end-to-end request visibility — without it, no traces from instrumented services can be stored or queried.
- ArgoCD Application + Helm chart for Tempo following the same pattern as the Loki deployment
- New SeaweedFS bucket: `telemetry-tempo`
- Tempo configured to use SeaweedFS S3 gateway as object storage backend
- Grafana datasource added pointing to Tempo
Grafana shows a working Tempo datasource. A test trace (e.g. from a manually instrumented script) is queryable in the Explore view.
- ArgoCD Application for Tempo is healthy and synced
- `telemetry-tempo` bucket exists in SeaweedFS
- Tempo is storing trace data to SeaweedFS (verify via bucket contents after test trace)
- Grafana Tempo datasource is configured and returns results
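The "manually instrumented script" mentioned above could be as small as the following sketch. It builds a single-span trace in the OTLP/JSON encoding and posts it over OTLP/HTTP using only the standard library. The service name, scope name, and the default endpoint `http://alloy:4318/v1/traces` are assumptions for illustration; point it at whichever OTLP/HTTP receiver is actually reachable in the cluster.

```python
import json
import secrets
import time
import urllib.request


def build_test_trace(service_name="phase1-smoke-test", span_name="test-span"):
    """Return (trace_id, payload) for a one-span trace in OTLP/JSON form."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    end_ns = time.time_ns()
    start_ns = end_ns - 50_000_000    # pretend the span lasted 50 ms
    payload = {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},  # hypothetical scope name
                "spans": [{
                    "traceId": trace_id,
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # OTLP/JSON allows 64-bit integers to be sent as strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }
    return trace_id, payload


def send_trace(payload, endpoint="http://alloy:4318/v1/traces", timeout=5):
    """POST the payload to an OTLP/HTTP receiver; the endpoint is an assumption."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keep the returned `trace_id`: pasting it into the Explore view (TraceID query) is the quickest way to confirm the datasource returns the span.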
Labels: phase-1 infrastructure metrics
Estimated Effort: ~1 day
Deploy Grafana Mimir as a unified metrics backend. Mimir consolidates the current pattern of one Grafana datasource per namespace into a single backend, and allows alert rules to be written once across all namespaces. It also enables OTel Exemplars — Trace IDs embedded in metric data points that link directly to the causative trace in Tempo.
- ArgoCD Application + Helm chart for Mimir following the same pattern as the Loki deployment
- New SeaweedFS bucket: `telemetry-mimir`
- Mimir configured to use SeaweedFS S3 gateway as object storage backend
- Grafana datasource added pointing to Mimir
- Alloy remote-write config pointing to Mimir (covered in Alloy config issue below)
Grafana shows a single Mimir datasource. Metrics from all namespaces are queryable without switching datasources.
- ArgoCD Application for Mimir is healthy and synced
- `telemetry-mimir` bucket exists in SeaweedFS
- Alloy is remote-writing metrics to Mimir (verify via Mimir query returning namespace metrics)
- Grafana Mimir datasource is configured and returns results
- Existing per-namespace Prometheus datasources can be deprecated (or documented for removal)
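One way to run the "Mimir query returning namespace metrics" check is sketched below. It assumes Mimir's Prometheus-compatible API is served under the default `/prometheus` prefix, that scraped series carry a `namespace` label, and that the `up` metric is present; the base URL and tenant ID are placeholders, and the `X-Scope-OrgID` header is only needed if multi-tenancy is enabled.

```python
import json
import urllib.parse
import urllib.request


def mimir_query_url(base_url, promql):
    """Build an instant-query URL against Mimir's Prometheus-compatible API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query}"


def namespaces_in_response(body):
    """Extract the sorted, de-duplicated namespace labels from a query response."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return sorted({
        series["metric"]["namespace"]
        for series in body["data"]["result"]
        if "namespace" in series.get("metric", {})
    })


def fetch_namespaces(base_url, tenant=None, timeout=10):
    """List namespaces that currently have at least one `up` series in Mimir."""
    url = mimir_query_url(base_url, "count by (namespace) (up)")
    headers = {"X-Scope-OrgID": tenant} if tenant else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return namespaces_in_response(json.load(response))
```

If the list returned by `fetch_namespaces` covers every namespace that previously had its own Prometheus datasource, the single Mimir datasource is a plausible replacement for them.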
Labels: phase-1 infrastructure alloy
Estimated Effort: ~2–3h
Update the shared Alloy Helm values to enable OTLP trace receiving (so instrumented services can push traces to Tempo) and Mimir remote-write (so Alloy forwards scraped metrics to Mimir). This is a small diff to the existing shared values file — the same pattern used for Loki log routing.
- Enable the OTLP receiver (`otelcol.receiver.otlp`) in Alloy config (gRPC on port 4317, HTTP on port 4318)
- Add `prometheus.remote_write` block pointing to Mimir
- Route received OTLP traces to Tempo
- Verify no disruption to existing Loki log forwarding
Alloy accepts OTLP pushes from instrumented services and forwards traces to Tempo. Scraped metrics are remote-written to Mimir.
- Alloy config diff reviewed and merged
- ArgoCD syncs the updated Alloy config to all namespaces without errors
- A test OTLP push to `http://alloy:4317` results in a trace visible in Tempo
- Mimir receives metrics from Alloy (verify via Mimir query)
- Loki log forwarding unaffected
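The "trace visible in Tempo" criterion can be checked from a script as well as from Grafana. The sketch below polls Tempo's trace-by-ID endpoint (`/api/traces/<traceID>`) until the trace appears, since ingestion is not instantaneous. The base URL (for example `http://tempo:3200`), the retry count, and the delay are assumptions, not values from this plan.

```python
import re
import time
import urllib.error
import urllib.request

_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def tempo_trace_url(base_url, trace_id):
    """Build the trace-by-ID URL, rejecting IDs that are not 32 hex characters."""
    trace_id = trace_id.lower()
    if not _TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"not a 32-character hex trace ID: {trace_id!r}")
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


def trace_visible(base_url, trace_id, timeout=5):
    """Return True if Tempo serves the trace, False if it answers 404."""
    try:
        url = tempo_trace_url(base_url, trace_id)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error indicates a real problem


def wait_for_trace(base_url, trace_id, attempts=10, delay=2.0, check=trace_visible):
    """Poll until the trace is visible or the attempts run out."""
    for attempt in range(attempts):
        if check(base_url, trace_id):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `check` parameter exists so the polling loop can be exercised without a live Tempo; in normal use it is left at its default.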
Labels: phase-1 infrastructure dashboards
Estimated Effort: ~½ day
Import and configure pre-built community dashboards for the new telemetry backends. These provide immediate operational value as soon as services begin pushing traces.
- Trace explorer / service map dashboard (Tempo)
- Service overview dashboard (RED metrics: Rate, Errors, Duration)
- Namespace metrics overview dashboard (Mimir)
- Light customisation to match namespace/label conventions
Grafana home shows populated dashboards with trace, service, and metrics views available out of the box once services are onboarded.
- Tempo trace explorer dashboard is imported and functional
- Service overview dashboard populates once at least one service is instrumented
- Namespace metrics dashboard queries Mimir (not per-namespace Prometheus)
- Dashboards are provisioned via config (not manually created in UI)
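The criterion that the namespace dashboard queries Mimir rather than a per-namespace Prometheus can be checked mechanically against the dashboard JSON before it is provisioned. The following is a rough sketch, assuming the usual Grafana dashboard JSON layout (a top-level `panels` list, row panels that may nest further `panels`, and `datasource` references on panels and targets); the datasource UID `mimir` used in the usage note is hypothetical.

```python
def iter_panels(dashboard):
    """Yield every panel, including panels nested inside row panels."""
    for panel in dashboard.get("panels", []):
        yield panel
        # Collapsed rows keep their children under their own "panels" key.
        yield from iter_panels(panel)


def datasource_uid(ref):
    """Normalise a datasource reference to a single identifier."""
    if isinstance(ref, dict):
        return ref.get("uid")  # newer dashboards: {"type": ..., "uid": ...}
    return ref                 # older dashboards: plain datasource name


def panels_not_using(dashboard, expected_uid):
    """Return titles of panels that reference any datasource other than expected_uid."""
    offenders = []
    for panel in iter_panels(dashboard):
        refs = [panel.get("datasource")]
        refs += [target.get("datasource") for target in panel.get("targets", [])]
        uids = {datasource_uid(ref) for ref in refs if ref is not None}
        if uids - {expected_uid}:
            offenders.append(panel.get("title", "<untitled>"))
    return offenders
```

Loading a dashboard file with `json.load` and calling `panels_not_using(dashboard, "mimir")` should return an empty list once every panel has been repointed; panels with no explicit datasource (such as rows) are ignored.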