Phase 1 Infrastructure Rollout

Ivan P edited this page Jun 5, 2026 · 6 revisions

Deploy the new telemetry backends following the same ArgoCD + Helm pattern established for Loki. This is a one-time cost and unblocks all service onboarding.

Total estimated effort: ~3.5–4 days
Prerequisite for: Phase 2 (Service Onboarding)


Deployment Architecture

Chart Selection

| Component | Dev | Test | Prod |
|---|---|---|---|
| Tempo | grafana-community/tempo (monolithic) | grafana-community/tempo (monolithic) | grafana-community/tempo-distributed |
| Mimir | mimir-distributed (replicas: 1) | mimir-distributed (replicas: 1) | mimir-distributed (HA) |
| Pyroscope | deferred to Phase 3 | deferred to Phase 3 | deferred to Phase 3 |

Tempo chart rationale: The monolithic chart runs all components in a single binary — sufficient for dev/test at low traffic. tempo-distributed splits into independent components (distributor, ingester, querier, query-frontend, compactor, store-gateway) so each can scale independently in prod. Mimir has no monolithic chart; replicas: 1 on all components gives a minimal functional deployment for dev/test.

Environment Rollout Order

dev → test → prod

Deploy to dev first to validate Helm values and SeaweedFS S3 config. Promote proven values to test (different bucket names, same chart). Deploy prod last with distributed chart and HA replica counts.

Resource Policy

All deployments follow this policy across all three backends:

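| Component type | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| Ingesters | set | not set | set | ✓ (4–8Gi) |
| Store-gateways | set | not set | set | ✓ (2–4Gi) |
| Distributors | set | not set | set | not set |
| Queriers | set | not set | set | not set |
| Query-frontend | set | not set | set | not set |
| Compactors | set | not set | set | not set |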

CPU limits are not set to avoid CFS quota throttling — write-path components (ingesters, distributors) experience latency spikes when throttled even when node capacity is available. CPU requests are retained for scheduling and HPA. Memory limits are set only on ingesters and store-gateways where unbounded growth would risk a node-level OOM; a single replica OOM kill is recoverable, a node OOM is not.
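As a minimal sketch of how this policy could be encoded when generating per-component values, the helper below emits a Kubernetes-style `resources` mapping with no CPU limit and a memory limit only for the two component types named above. The function name, the per-replica request values, and the choice to default to the low end of each memory range are illustrative assumptions, not part of the rollout plan.

```python
# Component types that get a memory limit under the policy above.
# The (low, high) ranges are the ones quoted in the policy table.
MEMORY_LIMIT_RANGES = {
    "ingester": ("4Gi", "8Gi"),
    "store-gateway": ("2Gi", "4Gi"),
}


def build_resources(component, cpu_request, memory_request, memory_limit=None):
    """Return a Kubernetes-style resources dict that follows the policy.

    CPU limits are never emitted. A memory limit is emitted only for
    component types listed in MEMORY_LIMIT_RANGES.
    """
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    if component in MEMORY_LIMIT_RANGES:
        low, _high = MEMORY_LIMIT_RANGES[component]
        # Assumption: fall back to the low end of the range if none is given.
        resources["limits"] = {"memory": memory_limit or low}
    elif memory_limit is not None:
        raise ValueError(f"policy sets no memory limit on {component}")
    return resources


# Hypothetical per-replica request values, for illustration only.
ingester = build_resources("ingester", "200m", "1Gi", memory_limit="4Gi")
distributor = build_resources("distributor", "100m", "256Mi")
```

Raising on an unexpected memory limit keeps a stray value from quietly widening the policy to other component types.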

Storage Buckets

Following the dts-{type}-{env}-{purpose} convention established for Loki:

seaweedfs-dev:

  • dts-traces-dev-blocks
  • dts-metrics-dev-blocks, dts-metrics-dev-ruler, dts-metrics-dev-alertmanager

seaweedfs-ha (test + prod):

  • dts-traces-test-blocks, dts-traces-prod-blocks
  • dts-metrics-test-blocks, dts-metrics-test-ruler, dts-metrics-test-alertmanager
  • dts-metrics-prod-blocks, dts-metrics-prod-ruler, dts-metrics-prod-alertmanager

Phase 3 (Pyroscope) will add: dts-profiles-{dev,test,prod}-blocks
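
The naming convention is mechanical enough to generate. The sketch below derives the Phase 1 bucket list from the `dts-{type}-{env}-{purpose}` pattern and groups it by SeaweedFS instance. The function and constant names are hypothetical; the bucket names and the instance mapping come from the list above.

```python
def bucket_name(data_type, env, purpose):
    """Apply the dts-{type}-{env}-{purpose} convention."""
    return f"dts-{data_type}-{env}-{purpose}"


# Purposes per data type for Phase 1 (Tempo traces, Mimir metrics).
PURPOSES = {
    "traces": ["blocks"],
    "metrics": ["blocks", "ruler", "alertmanager"],
}

# Which SeaweedFS instance hosts each environment's buckets.
INSTANCE_FOR_ENV = {
    "dev": "seaweedfs-dev",
    "test": "seaweedfs-ha",
    "prod": "seaweedfs-ha",
}


def phase1_buckets():
    """Return {seaweedfs_instance: [bucket names]} for Phase 1."""
    plan = {}
    for env, instance in INSTANCE_FOR_ENV.items():
        for data_type, purposes in PURPOSES.items():
            for purpose in purposes:
                plan.setdefault(instance, []).append(
                    bucket_name(data_type, env, purpose)
                )
    return plan


plan = phase1_buckets()
```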

Grafana Datasources

Each environment gets a matched set of datasources in the shared Grafana instance:

  • Tempo (dev) / Tempo (test) / Tempo (prod)
  • Mimir (dev) / Mimir (test) / Mimir (prod)

Namespace Resource Analysis

Available Headroom (pre-deployment)

| Namespace | CPU available | Memory available | Storage available |
|---|---|---|---|
| ca7f8f-dev | 2530m | ~11.3Gi | ~33Gi |
| ca7f8f-test | 5340m | ~24.8Gi | ~30Gi |
| ca7f8f-prod | 1370m | ~7.5Gi | ~9Gi |

Estimated New Resource Requests

| Environment | CPU req | Memory req | Storage PVCs |
|---|---|---|---|
| dev (Tempo monolithic + Mimir ×1) | ~400m | ~2Gi | ~4Gi |
| test (Tempo monolithic + Mimir ×1) | ~900m | ~4Gi | ~8Gi |
| prod (Tempo-distributed + Mimir HA) | ~2900m | ~13.5Gi | ~16Gi |

Phase 3 addition to prod (Pyroscope HA): ~950m CPU, ~3.3Gi memory, ~2Gi storage.

Dev and test fit within existing quotas with comfortable headroom. Prod does not fit on any of the three resources and requires quota increases. The request below covers both Phase 1 and Phase 3 to avoid a follow-up request.
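
The fit check can be reproduced from the two tables above. This sketch subtracts available headroom from the estimated Phase 1 requests and reports any shortfall per resource; the figures are the approximate ones quoted above, and the function name is hypothetical.

```python
# (CPU in millicores, memory in Gi, storage in Gi), from the tables above.
HEADROOM = {
    "dev": (2530, 11.3, 33),
    "test": (5340, 24.8, 30),
    "prod": (1370, 7.5, 9),
}
PHASE1_REQUESTS = {
    "dev": (400, 2, 4),
    "test": (900, 4, 8),
    "prod": (2900, 13.5, 16),
}


def shortfall(env):
    """Return (cpu_m, memory_gi, storage_gi) missing in a namespace.

    A value of 0 means the estimated request fits in the current headroom.
    """
    return tuple(
        max(0, round(requested - available, 1))
        for requested, available in zip(PHASE1_REQUESTS[env], HEADROOM[env])
    )


for env in ("dev", "test", "prod"):
    print(env, shortfall(env))
```

With these inputs only prod reports a non-zero shortfall, which is what motivates the quota request.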

Prod Component Breakdown (Phase 1)

| Component | Replicas | CPU req | Memory req | Storage PVC |
|---|---|---|---|---|
| Tempo distributor | ×2 | 200m | 512Mi | - |
| Tempo ingester | ×3 | 600m | 3Gi | 6Gi (2Gi×3) |
| Tempo querier | ×2 | 200m | 1Gi | - |
| Tempo query-frontend | ×2 | 100m | 512Mi | - |
| Tempo compactor | ×1 | 100m | 512Mi | - |
| Tempo store-gateway | ×2 | 200m | 1Gi | - |
| Mimir distributor | ×2 | 200m | 512Mi | - |
| Mimir ingester | ×3 | 600m | 3Gi | 6Gi |
| Mimir querier | ×2 | 200m | 1Gi | - |
| Mimir query-frontend | ×2 | 100m | 512Mi | - |
| Mimir compactor | ×1 | 100m | 512Mi | 2Gi |
| Mimir store-gateway | ×2 | 200m | 1Gi | - |
| Mimir alertmanager | ×2 | 100m | 512Mi | 2Gi |
| Phase 1 total | | ~2900m | ~13.5Gi | ~16Gi |
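
The totals row can be cross-checked by summing the breakdown. The figures only add up to the stated totals if each row is read as the total across its replicas rather than a per-replica value, so the sketch below makes that assumption. The tuple layout and function name are illustrative.

```python
# (component, replicas, cpu_request_m, memory_request_mi, pvc_gi)
# CPU, memory and PVC values are taken as totals across all replicas.
PROD_COMPONENTS = [
    ("Tempo distributor", 2, 200, 512, 0),
    ("Tempo ingester", 3, 600, 3072, 6),
    ("Tempo querier", 2, 200, 1024, 0),
    ("Tempo query-frontend", 2, 100, 512, 0),
    ("Tempo compactor", 1, 100, 512, 0),
    ("Tempo store-gateway", 2, 200, 1024, 0),
    ("Mimir distributor", 2, 200, 512, 0),
    ("Mimir ingester", 3, 600, 3072, 6),
    ("Mimir querier", 2, 200, 1024, 0),
    ("Mimir query-frontend", 2, 100, 512, 0),
    ("Mimir compactor", 1, 100, 512, 2),
    ("Mimir store-gateway", 2, 200, 1024, 0),
    ("Mimir alertmanager", 2, 100, 512, 2),
]


def phase1_totals(rows):
    """Sum CPU (millicores), memory (Gi) and PVC storage (Gi)."""
    cpu_m = sum(row[2] for row in rows)
    memory_gi = sum(row[3] for row in rows) / 1024
    storage_gi = sum(row[4] for row in rows)
    return cpu_m, memory_gi, storage_gi


totals = phase1_totals(PROD_COMPONENTS)
```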

Prod Quota Increase Request

| Resource | Current quota | Currently used | Phase 1 new | Phase 3 new | Total required | Requested | Headroom |
|---|---|---|---|---|---|---|---|
| CPU | 4000m | 2630m | ~2900m | ~950m | ~6480m | 10000m | ~35% |
| Memory | 16Gi | ~8.5Gi | ~13.5Gi | ~3.3Gi | ~25.3Gi | 32Gi | ~21% |
| Storage | 64Gi | ~55Gi | ~16Gi | ~2Gi | ~73Gi | 96Gi | ~24% |
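
The "Total required" and "Headroom" columns follow from simple arithmetic: total is current usage plus both phases, and headroom is the unused share of the requested quota. A short sketch using the table's figures (the function name is hypothetical):

```python
def quota_row(used, phase1, phase3, requested):
    """Return (total_required, headroom_percent) for one resource."""
    total = used + phase1 + phase3
    headroom_pct = (requested - total) / requested * 100
    return total, headroom_pct


# Units: CPU in millicores, memory and storage in Gi.
cpu_total, cpu_headroom = quota_row(2630, 2900, 950, 10000)
mem_total, mem_headroom = quota_row(8.5, 13.5, 3.3, 32)
sto_total, sto_headroom = quota_row(55, 16, 2, 96)

for name, total, headroom in [
    ("cpu", cpu_total, cpu_headroom),
    ("memory", mem_total, mem_headroom),
    ("storage", sto_total, sto_headroom),
]:
    print(f"{name}: total={total:g} headroom={headroom:.0f}%")
```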

Justification

We are deploying Grafana Tempo (distributed tracing) and Grafana Mimir (long-term metrics) as high-availability backends to ca7f8f-prod in Phase 1, with Grafana Pyroscope (continuous profiling) to follow in Phase 3. These are infrastructure-tier services that underpin monitoring, alerting, and observability for all production workloads.

HA requirement: A minimum of 3 ingester replicas per backend (replication factor 2) is required to survive a single pod failure without write-path data loss. Stateless components (distributors, queriers, query-frontends, store-gateways) run at 2 replicas to allow rolling updates without downtime. Ingesters require local WAL PVCs: data is flushed to SeaweedFS S3 on graceful shutdown, and the local PVC preserves unflushed data across pod restarts and rolling deployments.

No CPU limits: CPU limits are intentionally not set. The Linux CFS scheduler throttles pods that exceed their CPU quota within a 100ms window, causing latency spikes in write-path components even when node capacity is available. CPU requests are retained and are sufficient for scheduling and HPA-based autoscaling.
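
To make the throttling argument concrete, here is a deliberately simplified model of CFS bandwidth control: a container with a CPU limit gets `limit × period` of CPU time per period, and once its threads have consumed that, they are paused until the next period starts. The model assumes every busy thread runs flat out from the start of the period, and the example limits and thread counts are hypothetical.

```python
CFS_PERIOD_MS = 100  # the 100ms enforcement window mentioned above


def throttled_ms_per_period(cpu_limit_cores, busy_threads):
    """Wall-clock ms per period during which the container is paused.

    Simplified model: all busy threads run continuously from the start
    of the period until the quota is exhausted.
    """
    quota_ms = cpu_limit_cores * CFS_PERIOD_MS   # CPU time allowed per period
    demand_ms = busy_threads * CFS_PERIOD_MS     # CPU time wanted per period
    if demand_ms <= quota_ms:
        return 0.0
    # Threads burn quota in parallel, so it lasts quota/threads of wall time.
    runtime_wall_ms = quota_ms / busy_threads
    return CFS_PERIOD_MS - runtime_wall_ms


# A 1-core limit with 4 busy threads: quota is gone after 25ms of wall time.
burst = throttled_ms_per_period(cpu_limit_cores=1, busy_threads=4)
# A 2-core limit with 2 busy threads never exceeds its quota.
steady = throttled_ms_per_period(cpu_limit_cores=2, busy_threads=2)
```

In the first case the container sits idle for 75ms of every 100ms regardless of how much spare CPU the node has, which is the latency effect the policy is meant to avoid.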

Single request for Phases 1 and 3: Pyroscope prod resource requirements (~950m CPU, ~3.3Gi memory, ~2Gi storage) are included in this request to avoid a follow-up quota increase when Phase 3 is deployed.

Requested increases: CPU 4000m → 10000m, Memory 16Gi → 32Gi, Storage 64Gi → 96Gi.


Issue: Deploy Grafana Tempo

Labels: phase-1, infrastructure, tracing
Estimated Effort: ~1.5 days

Description

Deploy Grafana Tempo as the distributed trace storage backend across all three environments. Tempo is the core requirement for end-to-end request visibility — without it, no traces from instrumented services can be stored or queried.

Requirements

  • ArgoCD ApplicationSet (or three Applications) for Tempo following the same pattern as Loki
  • grafana-community/tempo for dev and test; grafana-community/tempo-distributed for prod
  • SeaweedFS buckets created per the naming convention above
  • Tempo configured to use SeaweedFS S3 gateway as object storage backend
  • Resource policy applied (no CPU limits; memory limits on ingesters and store-gateways only)
  • Grafana datasources added: Tempo (dev), Tempo (test), Tempo (prod)

Prod replica baseline

distributor×2, ingester×3 (RF=2), querier×2, query-frontend×2, compactor×1, store-gateway×2

Expected Outcome

All three Grafana Tempo datasources are functional. A test trace (e.g. from a manually instrumented script or telemetrygen) is queryable in the Explore view for each environment.

Acceptance Criteria

  • ArgoCD Application(s) for Tempo are healthy and synced in all three namespaces
  • dts-traces-{dev,test,prod}-blocks buckets exist in the appropriate SeaweedFS instance
  • Tempo is storing trace data to SeaweedFS (verify via bucket contents after test trace)
  • Tempo (dev), Tempo (test), Tempo (prod) datasources configured in Grafana and returning results
  • No CPU limits set; memory limits applied to ingesters and store-gateways only

Issue: Deploy Grafana Mimir

Labels: phase-1, infrastructure, metrics
Estimated Effort: ~1.5 days

Description

Deploy Grafana Mimir as a unified long-term metrics backend across all three environments. Mimir consolidates the current pattern of one Grafana datasource per namespace into a single queryable backend, enables OTel Exemplars (trace IDs embedded in metric datapoints linking directly to Tempo), and supports cross-namespace alerting rules written once.

Requirements

  • ArgoCD ApplicationSet (or three Applications) for mimir-distributed with environment-specific values
  • Dev and test: all component replicas set to 1 (minimal deployment)
  • Prod: HA replica counts per breakdown above; PodDisruptionBudgets enabled
  • SeaweedFS buckets created: dts-metrics-{env}-{blocks,ruler,alertmanager}
  • Resource policy applied (no CPU limits; memory limits on ingesters and store-gateways only)
  • Grafana datasources added: Mimir (dev), Mimir (test), Mimir (prod)
  • Alloy remote-write config pointing to Mimir (covered in Alloy config issue below)

Expected Outcome

A single Mimir datasource per environment replaces the per-namespace Prometheus datasources. Metrics from all namespaces are queryable without switching datasources.

Acceptance Criteria

  • ArgoCD Application(s) for Mimir are healthy and synced in all three namespaces
  • dts-metrics-{dev,test,prod}-{blocks,ruler,alertmanager} buckets exist in the appropriate SeaweedFS instance
  • Alloy is remote-writing metrics to Mimir (verify via Mimir query returning namespace metrics)
  • Mimir (dev), Mimir (test), Mimir (prod) datasources configured in Grafana and returning results
  • No CPU limits set; memory limits applied to ingesters and store-gateways only
  • Existing per-namespace Prometheus datasources documented for deprecation

Issue: Update Alloy Configuration

Labels: phase-1, infrastructure, alloy
Estimated Effort: ~2–3h

Description

Update the shared Alloy Helm values to enable OTLP trace receiving (so instrumented services can push traces to Tempo) and Mimir remote-write (so Alloy forwards scraped metrics to Mimir). This is a small diff to the existing shared values file — the same pattern used for Loki log routing.

Requirements

  • Enable an otelcol.receiver.otlp component in the Alloy config (gRPC on port 4317, HTTP on port 4318)
  • Add prometheus.remote_write blocks pointing to the appropriate Mimir instance per namespace (dev → Mimir dev, test → Mimir test, prod → Mimir prod)
  • Route received OTLP traces to the appropriate Tempo instance per namespace
  • Verify no disruption to existing Loki log forwarding

Expected Outcome

Alloy accepts OTLP pushes from instrumented services and routes traces to the correct Tempo instance. Scraped metrics are remote-written to the correct Mimir instance per environment.

Acceptance Criteria

  • Alloy config diff reviewed and merged
  • ArgoCD syncs the updated Alloy config to all namespaces without errors
  • A test OTLP push to alloy:4317 (gRPC) or http://alloy:4318 (HTTP) in each namespace results in a trace visible in the matching Tempo datasource
  • Mimir receives metrics from Alloy in each environment (verify via Mimir query)
  • Loki log forwarding unaffected
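
For the test push, one option that needs no extra tooling is a hand-built OTLP/HTTP JSON request. The sketch below constructs a single-span trace and targets the HTTP receiver on port 4318 listed in the requirements; the `alloy` hostname follows the criteria above, while the service name, span name, and function name are placeholders. It builds the request without sending it, so the final call is left commented out.

```python
import json
import secrets
import time
import urllib.request


def build_test_trace_request(host="alloy", port=4318,
                             service_name="otlp-smoke-test"):
    """Build (but do not send) an OTLP/HTTP JSON request with one span."""
    now_ns = time.time_ns()
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name",
                 "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    # OTLP JSON encodes IDs as hex: 16-byte trace, 8-byte span.
                    "traceId": secrets.token_hex(16),
                    "spanId": secrets.token_hex(8),
                    "name": "smoke-test-span",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now_ns - 1_000_000),
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }]
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_test_trace_request()
# To send it from a pod inside the namespace:
# urllib.request.urlopen(request, timeout=5)
```

Printing the generated `traceId` before sending makes it easy to look the trace up by ID in the matching Tempo datasource.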

Issue: Import Grafana Dashboards

Labels: phase-1, infrastructure, dashboards
Estimated Effort: ~½ day

Description

Import and configure pre-built community dashboards for the new telemetry backends. These provide immediate operational value as soon as services begin pushing traces and metrics.

Requirements

  • Trace explorer / service map dashboard (Tempo)
  • Service overview dashboard (RED metrics: Rate, Errors, Duration)
  • Namespace metrics overview dashboard (Mimir)
  • Dashboards parameterised by datasource variable so a single dashboard covers dev/test/prod via the environment selector
  • Light customisation to match namespace/label conventions

Expected Outcome

Grafana home shows populated dashboards with trace, service, and metrics views available out of the box once services are onboarded in Phase 2.

Acceptance Criteria

  • Tempo trace explorer dashboard imported and functional
  • Service overview dashboard populates once at least one service is instrumented
  • Namespace metrics dashboard queries Mimir (not per-namespace Prometheus)
  • Dashboards are provisioned via config (not manually created in UI)
  • Datasource variable allows switching between dev/test/prod environments
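
A rough sketch of what the datasource-variable requirement looks like in dashboard JSON: a template variable of type `datasource` lists the matching Grafana datasources, and each panel refers to it by variable rather than by a fixed datasource. The dashboard title, panel, query, and name filter below are hypothetical; the filter assumes the `Mimir (dev)` / `Mimir (test)` / `Mimir (prod)` naming from this page and that Mimir is registered with Grafana's Prometheus datasource type.

```python
import json


def datasource_variable(name="datasource", plugin_type="prometheus",
                        name_regex="/^Mimir/"):
    """Template variable that lists datasources of one plugin type."""
    return {
        "name": name,
        "label": "Environment",
        "type": "datasource",
        "query": plugin_type,   # datasource plugin id to list
        "regex": name_regex,    # keep only datasources whose name matches
    }


def panel(title, expr, variable_name="datasource"):
    """Time series panel that queries whichever datasource is selected."""
    return {
        "title": title,
        "type": "timeseries",
        "datasource": {"type": "prometheus", "uid": f"${{{variable_name}}}"},
        "targets": [{"expr": expr, "refId": "A"}],
    }


dashboard = {
    "title": "Namespace metrics overview",
    "templating": {"list": [datasource_variable()]},
    "panels": [panel("Targets up", "sum(up)")],
}

dashboard_json = json.dumps(dashboard, indent=2)
```

The resulting JSON can be written to a file and loaded through Grafana's file-based dashboard provisioning, which keeps the dashboards out of manual UI edits.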
