Phase 1 Infrastructure Rollout
Deploy the new telemetry backends following the same ArgoCD + Helm pattern established for Loki. This is a one-time cost and unblocks all service onboarding.
Total estimated effort: ~3.5–4 days

Prerequisite for: Phase 2 (Service Onboarding)
| Component | Dev | Test | Prod |
|---|---|---|---|
| Tempo | `grafana-community/tempo` (monolithic) | `grafana-community/tempo` (monolithic) | `grafana-community/tempo-distributed` |
| Mimir | `mimir-distributed` (replicas: 1) | `mimir-distributed` (replicas: 1) | `mimir-distributed` (HA) |
| Pyroscope | deferred to Phase 3 | deferred to Phase 3 | deferred to Phase 3 |
Tempo chart rationale: The monolithic chart runs all components in a single binary — sufficient for dev/test at low traffic. tempo-distributed splits into independent components (distributor, ingester, querier, query-frontend, compactor, store-gateway) so each can scale independently in prod. Mimir has no monolithic chart; replicas: 1 on all components gives a minimal functional deployment for dev/test.
dev → test → prod
Deploy to dev first to validate Helm values and SeaweedFS S3 config. Promote proven values to test (different bucket names, same chart). Deploy prod last with distributed chart and HA replica counts.
All deployments follow this policy across all three backends:
| Component type | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| Ingesters | ✓ | — | ✓ | ✓ (4–8Gi) |
| Store-gateways | ✓ | — | ✓ | ✓ (2–4Gi) |
| Distributors | ✓ | — | ✓ | — |
| Queriers | ✓ | — | ✓ | — |
| Query-frontend | ✓ | — | ✓ | — |
| Compactors | ✓ | — | ✓ | — |
CPU limits are not set to avoid CFS quota throttling — write-path components (ingesters, distributors) experience latency spikes when throttled even when node capacity is available. CPU requests are retained for scheduling and HPA. Memory limits are set only on ingesters and store-gateways where unbounded growth would risk a node-level OOM; a single replica OOM kill is recoverable, a node OOM is not.
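As a rough illustration of the policy above, the following sketch builds a Kubernetes-style `resources` mapping per component type. The function name, the `MEMORY_LIMITED` set, and the request values passed in the example are hypothetical and only chosen to show the shape; the real values belong in the Helm values files.

```python
from typing import Optional

# Component types that get a memory limit, per the policy table above.
MEMORY_LIMITED = {"ingester", "store-gateway"}


def resources_for(component: str, cpu_request: str, memory_request: str,
                  memory_limit: Optional[str] = None) -> dict:
    """Build a Kubernetes-style `resources` mapping for one component."""
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    if component in MEMORY_LIMITED:
        if memory_limit is None:
            raise ValueError(f"{component} requires a memory limit")
        # Only memory is capped; a CPU limit is never emitted.
        resources["limits"] = {"memory": memory_limit}
    elif memory_limit is not None:
        raise ValueError(f"{component} must not set a memory limit")
    return resources


# Illustrative values only (hypothetical, not taken from the rollout plan).
ingester = resources_for("ingester", "200m", "1Gi", memory_limit="4Gi")
distributor = resources_for("distributor", "100m", "256Mi")
print(ingester)
print(distributor)
```

Rejecting a memory limit on non-stateful components keeps an accidental override from quietly diverging from the policy.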
Following the dts-{type}-{env}-{purpose} convention established for Loki:
seaweedfs-dev:
- `dts-traces-dev-blocks`
- `dts-metrics-dev-blocks`, `dts-metrics-dev-ruler`, `dts-metrics-dev-alertmanager`

seaweedfs-ha (test + prod):
- `dts-traces-test-blocks`, `dts-traces-prod-blocks`
- `dts-metrics-test-blocks`, `dts-metrics-test-ruler`, `dts-metrics-test-alertmanager`
- `dts-metrics-prod-blocks`, `dts-metrics-prod-ruler`, `dts-metrics-prod-alertmanager`
Phase 3 (Pyroscope) will add: dts-profiles-{dev,test,prod}-blocks
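The naming convention can be expanded mechanically, which may help when scripting bucket creation. This is a minimal sketch: the helper names are hypothetical, and the mapping of environments to SeaweedFS instances simply mirrors the list above.

```python
# Purposes per signal type, following dts-{type}-{env}-{purpose}.
PURPOSES = {
    "traces": ["blocks"],
    "metrics": ["blocks", "ruler", "alertmanager"],
}

# dev uses seaweedfs-dev; test and prod share seaweedfs-ha.
INSTANCE_FOR_ENV = {
    "dev": "seaweedfs-dev",
    "test": "seaweedfs-ha",
    "prod": "seaweedfs-ha",
}


def bucket_name(signal_type: str, env: str, purpose: str) -> str:
    return f"dts-{signal_type}-{env}-{purpose}"


def buckets_by_instance() -> dict:
    """Group every Phase 1 bucket name under its SeaweedFS instance."""
    plan = {}
    for env, instance in INSTANCE_FOR_ENV.items():
        for signal_type, purposes in PURPOSES.items():
            for purpose in purposes:
                plan.setdefault(instance, []).append(
                    bucket_name(signal_type, env, purpose))
    return plan


plan = buckets_by_instance()
for instance, buckets in plan.items():
    print(instance, buckets)
```

Adding `"profiles": ["blocks"]` to `PURPOSES` would yield the Phase 3 bucket names as well.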
Each environment gets a matched set of datasources in the shared Grafana instance:
- `Tempo (dev)` / `Tempo (test)` / `Tempo (prod)`
- `Mimir (dev)` / `Mimir (test)` / `Mimir (prod)`
| Namespace | CPU available | Memory available | Storage available |
|---|---|---|---|
| ca7f8f-dev | 2530m | ~11.3Gi | ~33Gi |
| ca7f8f-test | 5340m | ~24.8Gi | ~30Gi |
| ca7f8f-prod | 1370m | ~7.5Gi | ~9Gi |
| Environment | CPU req | Memory req | Storage PVCs |
|---|---|---|---|
| dev (Tempo monolithic + Mimir ×1) | ~400m | ~2Gi | ~4Gi |
| test (Tempo monolithic + Mimir ×1) | ~900m | ~4Gi | ~8Gi |
| prod (Tempo-distributed + Mimir HA) | ~2900m | ~13.5Gi | ~16Gi |
Phase 3 addition to prod (Pyroscope HA): ~950m CPU, ~3.3Gi memory, ~2Gi storage.
Dev and test fit within existing quotas with comfortable headroom. Prod requires quota increases — included below covering both Phase 1 and Phase 3 to avoid a follow-up request.
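The fit check behind that statement is simple arithmetic over the two tables above. A small sketch, using the approximate figures as given (CPU in millicores, memory and storage in Gi); the dictionary and function names are illustrative only:

```python
# (CPU millicores, memory Gi, storage Gi), copied from the tables above.
AVAILABLE = {
    "dev": (2530, 11.3, 33),
    "test": (5340, 24.8, 30),
    "prod": (1370, 7.5, 9),
}
REQUIRED = {
    "dev": (400, 2, 4),
    "test": (900, 4, 8),
    "prod": (2900, 13.5, 16),
}


def fits(env: str) -> bool:
    """True when every required resource is within what the namespace has left."""
    return all(need <= have
               for need, have in zip(REQUIRED[env], AVAILABLE[env]))


for env in ("dev", "test", "prod"):
    print(env, "fits" if fits(env) else "needs quota increase")
```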
| Component | Replicas | CPU req | Memory req | Storage PVC |
|---|---|---|---|---|
| Tempo distributor | ×2 | 200m | 512Mi | — |
| Tempo ingester | ×3 | 600m | 3Gi | 6Gi (2Gi×3) |
| Tempo querier | ×2 | 200m | 1Gi | — |
| Tempo query-frontend | ×2 | 100m | 512Mi | — |
| Tempo compactor | ×1 | 100m | 512Mi | — |
| Tempo store-gateway | ×2 | 200m | 1Gi | — |
| Mimir distributor | ×2 | 200m | 512Mi | — |
| Mimir ingester | ×3 | 600m | 3Gi | 6Gi |
| Mimir querier | ×2 | 200m | 1Gi | — |
| Mimir query-frontend | ×2 | 100m | 512Mi | — |
| Mimir compactor | ×1 | 100m | 512Mi | 2Gi |
| Mimir store-gateway | ×2 | 200m | 1Gi | — |
| Mimir alertmanager | ×2 | 100m | 512Mi | 2Gi |
| Phase 1 total | | ~2900m | ~13.5Gi | ~16Gi |
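
The Phase 1 total can be cross-checked by summing the rows. This sketch assumes the CPU and memory figures in each row are already totals across that component's replicas, which is consistent with the `6Gi (2Gi×3)` storage entry and with the stated totals.

```python
# (component, CPU request in millicores, memory request in Mi, PVC storage in Gi)
PROD_COMPONENTS = [
    ("tempo-distributor", 200, 512, 0),
    ("tempo-ingester", 600, 3072, 6),
    ("tempo-querier", 200, 1024, 0),
    ("tempo-query-frontend", 100, 512, 0),
    ("tempo-compactor", 100, 512, 0),
    ("tempo-store-gateway", 200, 1024, 0),
    ("mimir-distributor", 200, 512, 0),
    ("mimir-ingester", 600, 3072, 6),
    ("mimir-querier", 200, 1024, 0),
    ("mimir-query-frontend", 100, 512, 0),
    ("mimir-compactor", 100, 512, 2),
    ("mimir-store-gateway", 200, 1024, 0),
    ("mimir-alertmanager", 100, 512, 2),
]

cpu_total = sum(row[1] for row in PROD_COMPONENTS)
memory_gi = sum(row[2] for row in PROD_COMPONENTS) / 1024
storage_total = sum(row[3] for row in PROD_COMPONENTS)

print(f"CPU {cpu_total}m, memory {memory_gi}Gi, storage {storage_total}Gi")
```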
| Resource | Current quota | Currently used | Phase 1 new | Phase 3 new | Total required | Requested | Headroom |
|---|---|---|---|---|---|---|---|
| CPU | 4000m | 2630m | ~2900m | ~950m | ~6480m | 10000m | ~35% |
| Memory | 16Gi | ~8.5Gi | ~13.5Gi | ~3.3Gi | ~25.3Gi | 32Gi | ~21% |
| Storage | 64Gi | ~55Gi | ~16Gi | ~2Gi | ~73Gi | 96Gi | ~24% |
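
The "Total required" and "Headroom" columns follow from the other columns: total is current usage plus both phases, and headroom is the unused share of the requested quota. A short sketch of that calculation, with a hypothetical helper name:

```python
def headroom(used, phase1, phase3, requested):
    """Return (total required, fraction of the requested quota left unused)."""
    total = used + phase1 + phase3
    return total, (requested - total) / requested


cpu_total, cpu_headroom = headroom(2630, 2900, 950, 10000)   # millicores
mem_total, mem_headroom = headroom(8.5, 13.5, 3.3, 32)       # Gi
sto_total, sto_headroom = headroom(55, 16, 2, 96)            # Gi

for name, total, frac in [("CPU", cpu_total, cpu_headroom),
                          ("Memory", mem_total, mem_headroom),
                          ("Storage", sto_total, sto_headroom)]:
    print(f"{name}: total {round(total, 1)}, headroom {round(frac * 100)}%")
```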
We are deploying Grafana Tempo (distributed tracing) and Grafana Mimir (long-term metrics) as high-availability backends to `ca7f8f-prod` in Phase 1, with Grafana Pyroscope (continuous profiling) to follow in Phase 3. These are infrastructure-tier services that underpin monitoring, alerting, and observability for all production workloads.

HA requirement: A minimum of 3 ingester replicas per backend (replication factor 2) is required to survive a single pod failure without write-path data loss. Stateless components (distributors, queriers, query-frontends, store-gateways) run at 2 replicas to allow rolling updates without downtime. Ingesters require local WAL PVCs: data is flushed to SeaweedFS S3 on graceful shutdown, but the local PVC ensures no loss during pod restarts or rolling deployments.
No CPU limits: CPU limits are intentionally not set. The Linux CFS scheduler throttles pods that exceed their CPU quota within a 100ms window, causing latency spikes in write-path components even when node capacity is available. CPU requests are retained and are sufficient for scheduling and HPA-based autoscaling.
Single request for Phases 1 and 3: Pyroscope prod resource requirements (~950m CPU, ~3.3Gi memory, ~2Gi storage) are included in this request to avoid a follow-up quota increase when Phase 3 is deployed.
Requested increases: CPU 4000m → 10000m, Memory 16Gi → 32Gi, Storage 64Gi → 96Gi.
Labels: phase-1 infrastructure tracing
Estimated Effort: ~1.5 days
Deploy Grafana Tempo as the distributed trace storage backend across all three environments. Tempo is the core requirement for end-to-end request visibility — without it, no traces from instrumented services can be stored or queried.
- ArgoCD ApplicationSet (or three Applications) for Tempo following the same pattern as Loki
- `grafana-community/tempo` for dev and test; `grafana-community/tempo-distributed` for prod
- SeaweedFS buckets created per the naming convention above
- Tempo configured to use SeaweedFS S3 gateway as object storage backend
- Resource policy applied (no CPU limits; memory limits on ingesters and store-gateways only)
- Grafana datasources added: `Tempo (dev)`, `Tempo (test)`, `Tempo (prod)`
distributor×2, ingester×3 (RF=2), querier×2, query-frontend×2, compactor×1, store-gateway×2
All three Grafana Tempo datasources are functional. A test trace (e.g. from a manually instrumented script or telemetrygen) is queryable in the Explore view for each environment.
- ArgoCD Application(s) for Tempo are healthy and synced in all three namespaces
- `dts-traces-{dev,test,prod}-blocks` buckets exist in the appropriate SeaweedFS instance
- Tempo is storing trace data to SeaweedFS (verify via bucket contents after test trace)
- `Tempo (dev)`, `Tempo (test)`, `Tempo (prod)` datasources configured in Grafana and returning results
- No CPU limits set; memory limits applied to ingesters and store-gateways only
Labels: phase-1 infrastructure metrics
Estimated Effort: ~1.5 days
Deploy Grafana Mimir as a unified long-term metrics backend across all three environments. Mimir consolidates the current pattern of one Grafana datasource per namespace into a single queryable backend, enables OTel Exemplars (trace IDs embedded in metric datapoints linking directly to Tempo), and supports cross-namespace alerting rules written once.
- ArgoCD ApplicationSet (or three Applications) for `mimir-distributed` with environment-specific values
- Dev and test: all component replicas set to 1 (minimal deployment)
- Prod: HA replica counts per breakdown above; PodDisruptionBudgets enabled
- SeaweedFS buckets created: `dts-metrics-{env}-{blocks,ruler,alertmanager}`
- Resource policy applied (no CPU limits; memory limits on ingesters and store-gateways only)
- Grafana datasources added: `Mimir (dev)`, `Mimir (test)`, `Mimir (prod)`
- Alloy remote-write config pointing to Mimir (covered in Alloy config issue below)
A single Mimir datasource per environment replaces the per-namespace Prometheus datasources. Metrics from all namespaces are queryable without switching datasources.
- ArgoCD Application(s) for Mimir are healthy and synced in all three namespaces
- `dts-metrics-{dev,test,prod}-{blocks,ruler,alertmanager}` buckets exist in the appropriate SeaweedFS instance
- Alloy is remote-writing metrics to Mimir (verify via Mimir query returning namespace metrics)
- `Mimir (dev)`, `Mimir (test)`, `Mimir (prod)` datasources configured in Grafana and returning results
- No CPU limits set; memory limits applied to ingesters and store-gateways only
- Existing per-namespace Prometheus datasources documented for deprecation
Labels: phase-1 infrastructure alloy
Estimated Effort: ~2–3h
Update the shared Alloy Helm values to enable the OTLP receiver for traces and application metrics, and wire routing to the correct Tempo and Mimir instances per namespace. The OTLP receiver handles both signals on the same port (4317 gRPC / 4318 HTTP). Logs are intentionally not wired to the OTLP pipeline — logs continue to flow via the existing loki.source.kubernetes stdout scraping → Loki path.
- Enable OTLP receiver in Alloy config (gRPC on port 4317, HTTP on port 4318)
- Route OTLP traces → correct Tempo instance per namespace (dev → Tempo dev, test → Tempo test, prod → Tempo prod)
- Route OTLP metrics → correct Mimir instance per namespace via remote-write
- Log signal intentionally not wired to OTLP — existing stdout scraping pipeline unchanged
- Verify no disruption to existing Loki log forwarding
Signal routing pattern:
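```alloy
// Accept OTLP over gRPC (4317) and HTTP (4318) on all interfaces.
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }

  output {
    // The "tempo" and "mimir" exporters are defined elsewhere in the Alloy config.
    traces  = [otelcol.exporter.otlp.tempo.input]
    // otelcol.exporter.prometheus converts OTLP metrics and forwards them
    // to a prometheus.remote_write component that targets Mimir.
    metrics = [otelcol.exporter.prometheus.mimir.input]
    // logs intentionally omitted: handled by stdout scraping
  }
}
```

Alloy accepts OTLP pushes and routes traces to the correct Tempo instance and application metrics to the correct Mimir instance per environment. The existing log collection pipeline is unaffected.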
- Alloy config diff reviewed and merged
- ArgoCD syncs the updated Alloy config to all namespaces without errors
- A test OTLP push to `alloy:4317` (gRPC) or `http://alloy:4318` (HTTP) results in a trace visible in the matching Tempo datasource
- OTLP metrics from the same push appear in Mimir (verify `http.server.request.duration` or equivalent RED metric)
- Loki log forwarding unaffected
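
For the test push, a tool such as telemetrygen is probably the simplest option. As an alternative with no dependencies, the sketch below posts a single-span trace as OTLP/HTTP JSON to the receiver's HTTP port (4318). The service name, span name, and the `alloy` hostname are placeholders, and the actual send is left commented out because it only works from inside the cluster.

```python
import json
import os
import time
import urllib.request


def build_test_trace(service_name: str = "otlp-smoke-test") -> dict:
    """Build a single-span OTLP/JSON trace payload."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex characters
                    "spanId": os.urandom(8).hex(),    # 16 hex characters
                    "name": "smoke-test-span",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),
                    "endTimeUnixNano": str(now),
                }],
            }],
        }],
    }


def push_trace(base_url: str, payload: dict) -> int:
    """POST the payload to the OTLP/HTTP traces path and return the HTTP status."""
    request = urllib.request.Request(
        f"{base_url}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_test_trace()
print(payload["resourceSpans"][0]["scopeSpans"][0]["spans"][0]["traceId"])
# Inside the cluster (hostname is a placeholder for the Alloy service):
# print(push_trace("http://alloy:4318", payload))
```

The printed trace ID can then be pasted into the Tempo datasource's Explore view to confirm the trace arrived.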
Labels: phase-1 infrastructure dashboards
Estimated Effort: ~½ day
Import and configure pre-built community dashboards for the new telemetry backends. These provide immediate operational value as soon as services begin pushing traces and metrics.
- Trace explorer / service map dashboard (Tempo)
- Service overview dashboard (RED metrics: Rate, Errors, Duration)
- Namespace metrics overview dashboard (Mimir)
- Dashboards parameterised by datasource variable so a single dashboard covers dev/test/prod via the environment selector
- Light customisation to match namespace/label conventions
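
One way to parameterise a dashboard by environment is a Grafana template variable of type `datasource`, which panels then reference by name. The sketch below generates the relevant JSON fragments; the variable names and labels are hypothetical, and it assumes the Mimir datasources are registered in Grafana with the `prometheus` plugin type.

```python
import json


def datasource_variable(name: str, plugin_type: str, label: str) -> dict:
    """Template variable that lists all datasources of one plugin type."""
    return {
        "name": name,
        "label": label,
        "type": "datasource",  # Grafana variable type that lists datasources
        "query": plugin_type,  # plugin id to filter on, e.g. "tempo"
    }


def panel_datasource_ref(variable_name: str, plugin_type: str) -> dict:
    """Datasource reference for a panel, resolved from the variable at render time."""
    return {"type": plugin_type, "uid": f"${{{variable_name}}}"}


templating = {"list": [
    datasource_variable("tempo_ds", "tempo", "Traces environment"),
    datasource_variable("mimir_ds", "prometheus", "Metrics environment"),
]}
panel_ref = panel_datasource_ref("mimir_ds", "prometheus")

print(json.dumps({"templating": templating, "panel_datasource": panel_ref}, indent=2))
```

With this shape, choosing `Mimir (prod)` in the selector repoints every panel that uses `${mimir_ds}` without duplicating the dashboard.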
Grafana home shows populated dashboards with trace, service, and metrics views available out of the box once services are onboarded in Phase 2.
- Tempo trace explorer dashboard imported and functional
- Service overview dashboard populates once at least one service is instrumented
- Namespace metrics dashboard queries Mimir (not per-namespace Prometheus)
- Dashboards are provisioned via config (not manually created in UI)
- Datasource variable allows switching between dev/test/prod environments