Phase 1 Infrastructure Rollout

Deploy the new telemetry backends following the same ArgoCD + Helm pattern established for Loki. This is a one-time cost and unblocks all service onboarding.

Total estimated effort: ~2–2.5 days
Prerequisite for: Phase 2 (Service Onboarding)

Issue: Deploy Grafana Tempo

Labels: phase-1, infrastructure, tracing
Estimated Effort: ~1 day

Description

Deploy Grafana Tempo as the distributed trace storage backend. Tempo is the core requirement for end-to-end request visibility — without it, no traces from instrumented services can be stored or queried.

Requirements

ArgoCD Application + Helm chart for Tempo following the same pattern as the Loki deployment
New SeaweedFS bucket: telemetry-tempo
Tempo configured to use SeaweedFS S3 gateway as object storage backend
Grafana datasource added pointing to Tempo

Expected Outcome

Grafana shows a working Tempo datasource. A test trace (e.g. from a manually instrumented script) is queryable in the Explore view.

Acceptance Criteria

ArgoCD Application for Tempo is healthy and synced
telemetry-tempo bucket exists in SeaweedFS
Tempo is storing trace data to SeaweedFS (verify via bucket contents after test trace)
Grafana Tempo datasource is configured and returns results
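
The "test trace is queryable" check can also be scripted against Tempo's HTTP API, which serves a trace by ID at `/api/traces/<traceID>`. The following is a minimal sketch using only the Python standard library. The hostname `tempo` is a hypothetical in-cluster service name, and 3200 is Tempo's default HTTP port; adjust both to the actual deployment.

```python
import re
import urllib.error
import urllib.request

# Assumption: hypothetical in-cluster address of Tempo's HTTP API.
TEMPO_URL = "http://tempo:3200"

_TRACE_ID = re.compile(r"^[0-9a-f]{32}$")


def trace_url(base_url: str, trace_id: str) -> str:
    """Build the trace-by-ID URL for a 32-character hex trace ID."""
    trace_id = trace_id.lower()
    if not _TRACE_ID.match(trace_id):
        raise ValueError(f"not a 32-character hex trace ID: {trace_id!r}")
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


def trace_exists(base_url: str, trace_id: str, timeout: float = 5.0) -> bool:
    """Return True if Tempo returns the trace, False if it answers 404."""
    try:
        with urllib.request.urlopen(trace_url(base_url, trace_id), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other status is worth surfacing
```

A freshly pushed trace may take a few seconds to become queryable, so a short retry loop around `trace_exists(TEMPO_URL, trace_id)` is a reasonable addition.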

Issue: Deploy Grafana Mimir

Labels: phase-1, infrastructure, metrics
Estimated Effort: ~1 day

Description

Deploy Grafana Mimir as a unified metrics backend. Mimir consolidates the current pattern of one Grafana datasource per namespace into a single backend, and allows alert rules to be written once across all namespaces. It also enables OTel Exemplars — Trace IDs embedded in metric data points that link directly to the causative trace in Tempo.
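
To make the exemplar idea concrete: in the OpenMetrics text format, an exemplar is appended to a sample after a `#`, carrying its own label set and value. The sketch below formats such a line. The metric name, label, and trace ID are illustrative only, and `trace_id` as the exemplar label name is a common convention rather than something this page specifies.

```python
def exemplar_line(name: str, labels: dict, value, trace_id: str, exemplar_value) -> str:
    """Format one OpenMetrics sample line with a trace_id exemplar attached."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f'{name}{{{label_str}}} {value} # {{trace_id="{trace_id}"}} {exemplar_value}'


# Illustrative values: a histogram bucket whose exemplar points at one slow request.
print(exemplar_line(
    "http_request_duration_seconds_bucket",
    {"le": "0.5"},
    17,
    "4bf92f3577b34da6a3ce929d0e0e4736",
    0.42,
))
```

Grafana can turn the `trace_id` value into a link to the Tempo datasource when exemplar support is enabled on the metrics datasource.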

Requirements

ArgoCD Application + Helm chart for Mimir following the same pattern as the Loki deployment
New SeaweedFS bucket: telemetry-mimir
Mimir configured to use SeaweedFS S3 gateway as object storage backend
Grafana datasource added pointing to Mimir
Alloy remote-write config pointing to Mimir (covered in Alloy config issue below)

Expected Outcome

Grafana shows a single Mimir datasource. Metrics from all namespaces are queryable without switching datasources.

Acceptance Criteria

ArgoCD Application for Mimir is healthy and synced
telemetry-mimir bucket exists in SeaweedFS
Alloy is remote-writing metrics to Mimir (verify via Mimir query returning namespace metrics)
Grafana Mimir datasource is configured and returns results
Existing per-namespace Prometheus datasources are deprecated, or documented for removal
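
The "Mimir query returning namespace metrics" check could be scripted roughly as follows. Mimir exposes a Prometheus-compatible query API under the `/prometheus` prefix by default and identifies tenants via the `X-Scope-OrgID` header. The base URL and tenant name below are hypothetical, and the sketch assumes scraped targets carry a `namespace` label.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: hypothetical gateway address and tenant ID.
MIMIR_URL = "http://mimir-gateway"
TENANT = "anonymous"


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL against Mimir's Prometheus-compatible API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/prometheus/api/v1/query?{params}"


def namespaces_in(response: dict) -> list[str]:
    """Extract the sorted set of namespace label values from a query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    series = response["data"]["result"]
    return sorted({s["metric"]["namespace"] for s in series if "namespace" in s["metric"]})


def fetch_namespaces(base_url: str = MIMIR_URL, tenant: str = TENANT) -> list[str]:
    """Ask Mimir which namespaces currently report the `up` metric."""
    req = urllib.request.Request(
        query_url(base_url, "count by (namespace) (up)"),
        headers={"X-Scope-OrgID": tenant},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return namespaces_in(json.load(resp))
```

If `fetch_namespaces()` lists every namespace that Alloy scrapes, the remote-write path is working end to end.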

Issue: Update Alloy Configuration

Labels: phase-1, infrastructure, alloy
Estimated Effort: ~2–3h

Description

Update the shared Alloy Helm values to enable OTLP trace receiving (so instrumented services can push traces to Tempo) and Mimir remote-write (so Alloy forwards scraped metrics to Mimir). This is a small diff to the existing shared values file — the same pattern used for Loki log routing.

Requirements

Enable the OTLP receiver (the otelcol.receiver.otlp component) in the Alloy config (gRPC on port 4317, HTTP on port 4318)
Add prometheus.remote_write block pointing to Mimir
Route received OTLP traces to Tempo
Verify no disruption to existing Loki log forwarding

Expected Outcome

Alloy accepts OTLP pushes from instrumented services and forwards traces to Tempo. Scraped metrics are remote-written to Mimir.

Acceptance Criteria

Alloy config diff reviewed and merged
ArgoCD syncs the updated Alloy config to all namespaces without errors
A test OTLP push to Alloy (gRPC on alloy:4317, or HTTP on http://alloy:4318/v1/traces) results in a trace visible in Tempo
Mimir receives metrics from Alloy (verify via Mimir query)
Loki log forwarding unaffected
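
A test OTLP push does not require an SDK. The sketch below builds a single-span OTLP/JSON export request and posts it to the HTTP receiver using only the Python standard library. The hostname `alloy` and the service and span names are placeholders; `/v1/traces` on port 4318 is the standard OTLP/HTTP traces path.

```python
import json
import secrets
import time
import urllib.request

# Assumption: Alloy's OTLP/HTTP receiver is reachable under this hostname.
ALLOY_OTLP_HTTP = "http://alloy:4318/v1/traces"


def build_test_trace(service_name: str = "otlp-smoke-test", span_name: str = "smoke-test-span"):
    """Return (trace_id, payload) for a one-span OTLP/JSON export request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes, hex-encoded to 32 characters
    span_id = secrets.token_hex(8)    # 8 random bytes, hex-encoded to 16 characters
    end = time.time_ns()
    start = end - 50_000_000          # pretend the span lasted 50 ms
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": trace_id,
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                }],
            }],
        }]
    }
    return trace_id, payload


def push_trace(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as OTLP/JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `trace_id, payload = build_test_trace()` followed by `push_trace(ALLOY_OTLP_HTTP, payload)` yields a trace ID that can then be searched for in Grafana's Explore view with the Tempo datasource selected.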

Issue: Import Grafana Dashboards

Labels: phase-1, infrastructure, dashboards
Estimated Effort: ~½ day

Description

Import and configure pre-built community dashboards for the new telemetry backends. These provide immediate operational value as soon as services begin pushing traces.

Requirements

Trace explorer / service map dashboard (Tempo)
Service overview dashboard (RED metrics: Rate, Errors, Duration)
Namespace metrics overview dashboard (Mimir)
Light customisation to match namespace/label conventions

Expected Outcome

Grafana home shows populated dashboards with trace, service, and metrics views available out of the box once services are onboarded.

Acceptance Criteria

Tempo trace explorer dashboard is imported and functional
Service overview dashboard populates once at least one service is instrumented
Namespace metrics dashboard queries Mimir (not per-namespace Prometheus)
Dashboards are provisioned via config (not manually created in UI)
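
Because the dashboards are provisioned from JSON files, the "queries Mimir, not per-namespace Prometheus" criterion can be checked before merging. The sketch below walks a dashboard's JSON model and reports any datasource reference outside an allowed set. The UIDs `mimir` and `prom-team-a` in the usage note are hypothetical, and the walker assumes datasources are referenced either as `{"type": ..., "uid": ...}` objects or, in older exports, as plain name strings.

```python
import json


def datasource_refs(node) -> set:
    """Recursively collect every datasource UID (or legacy name) in a dashboard tree."""
    found = set()
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and "uid" in ds:
            found.add(ds["uid"])
        elif isinstance(ds, str):
            found.add(ds)  # older dashboards reference datasources by name
        for value in node.values():
            found |= datasource_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= datasource_refs(item)
    return found


def unexpected_datasources(dashboard: dict, allowed) -> set:
    """Return datasource references that are not in the allowed set."""
    return datasource_refs(dashboard) - set(allowed)


def check_file(path: str, allowed) -> set:
    """Load a provisioned dashboard JSON file and report stray datasources."""
    with open(path, encoding="utf-8") as fh:
        return unexpected_datasources(json.load(fh), allowed)
```

For example, `check_file("namespace-overview.json", {"mimir"})` returning `{"prom-team-a"}` would indicate a panel still pointing at a per-namespace Prometheus datasource. Built-in references such as Grafana's own annotation datasource would need to be added to the allowed set.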
