Phase 1 Infrastructure Rollout

Ivan P edited this page Jun 8, 2026 · 6 revisions

Deploy the new telemetry backends following the same ArgoCD + Helm pattern established for Loki. This is a one-time cost and unblocks all service onboarding.

Total estimated effort: ~4–5 days (includes stack validation in dev)

Prerequisite for: Phase 2: Service Onboarding


Deployment Architecture

Chart Selection

| Component | Dev | Test | Prod |
|---|---|---|---|
| Tempo | grafana-community/tempo (monolithic) | grafana-community/tempo (monolithic) | grafana-community/tempo-distributed |
| Mimir | mimir-distributed (replicas: 1) | mimir-distributed (replicas: 1) | mimir-distributed (HA) |
| Pyroscope | deferred to Phase 3 | deferred to Phase 3 | deferred to Phase 3 |

Tempo chart rationale: The monolithic chart runs all components in a single binary — sufficient for dev/test at low traffic. tempo-distributed splits into independent components (distributor, ingester, querier, query-frontend, compactor, store-gateway) so each can scale independently in prod. Mimir has no monolithic chart; replicas: 1 on all components gives a minimal functional deployment for dev/test.

Environment Rollout Order

dev → test → prod

Deploy to dev first to validate the Helm values and the SeaweedFS S3 configuration. Promote the proven values to test (same chart, different bucket names). Deploy prod last, using the distributed chart and HA replica counts.

Resource Policy

All deployments follow this policy across all three backends:

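| Component type | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| Ingesters | ✓ | not set | ✓ | ✓ (4–8Gi) |
| Store-gateways | ✓ | not set | ✓ | ✓ (2–4Gi) |
| Distributors | ✓ | not set | ✓ | not set |
| Queriers | ✓ | not set | ✓ | not set |
| Query-frontend | ✓ | not set | ✓ | not set |
| Compactors | ✓ | not set | ✓ | not set |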

CPU limits are not set to avoid CFS quota throttling — write-path components (ingesters, distributors) experience latency spikes when throttled even when node capacity is available. CPU requests are retained for scheduling and HPA. Memory limits are set only on ingesters and store-gateways where unbounded growth would risk a node-level OOM; a single replica OOM kill is recoverable, a node OOM is not.
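The policy can be expressed as a small check, sketched here in Python. The function names and the specific limit values picked from within the stated ranges are illustrative assumptions, not part of any chart; the real settings live in the Helm values for each backend.

```python
# Component types that receive a memory limit under the policy.
# The ranges in the policy are 4-8Gi and 2-4Gi; the lower bound is
# used here purely as an illustrative assumption.
MEMORY_LIMITED = {"ingester": "4Gi", "store-gateway": "2Gi"}


def build_resources(component: str, cpu_request: str, memory_request: str) -> dict:
    """Return a Kubernetes-style resources block that follows the policy."""
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    if component in MEMORY_LIMITED:
        # Memory limit only; a CPU limit is never emitted.
        resources["limits"] = {"memory": MEMORY_LIMITED[component]}
    return resources


def policy_violations(component: str, resources: dict) -> list[str]:
    """List the ways a resources block departs from the policy."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if "cpu" not in requests:
        problems.append("CPU request missing")
    if "cpu" in limits:
        problems.append("CPU limit set")
    if component in MEMORY_LIMITED and "memory" not in limits:
        problems.append("memory limit missing")
    if component not in MEMORY_LIMITED and "memory" in limits:
        problems.append("unexpected memory limit")
    return problems
```

A check like `policy_violations` could run against rendered manifests in CI so that a chart upgrade that reintroduces CPU limits is caught before it reaches prod.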

Storage Buckets

Following the dts-{type}-{env}-{purpose} convention established for Loki:

seaweedfs-dev:

  • dts-traces-dev-blocks
  • dts-metrics-dev-blocks, dts-metrics-dev-ruler, dts-metrics-dev-alertmanager

seaweedfs-ha (test + prod):

  • dts-traces-test-blocks, dts-traces-prod-blocks
  • dts-metrics-test-blocks, dts-metrics-test-ruler, dts-metrics-test-alertmanager
  • dts-metrics-prod-blocks, dts-metrics-prod-ruler, dts-metrics-prod-alertmanager

Phase 3 (Pyroscope) will add: dts-profiles-{dev,test,prod}-blocks
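
As a rough illustration of the naming convention, the Phase 1 bucket lists above can be generated rather than typed by hand. This is a minimal Python sketch; the helper names and the dictionary layout are assumptions made for the example.

```python
# Purposes per data type, as listed for Phase 1.
PURPOSES = {
    "traces": ["blocks"],
    "metrics": ["blocks", "ruler", "alertmanager"],
}

# Which environments each SeaweedFS cluster hosts.
CLUSTER_ENVS = {
    "seaweedfs-dev": ["dev"],
    "seaweedfs-ha": ["test", "prod"],
}


def bucket_name(data_type: str, env: str, purpose: str) -> str:
    """Apply the dts-{type}-{env}-{purpose} convention."""
    return f"dts-{data_type}-{env}-{purpose}"


def buckets_by_cluster() -> dict[str, list[str]]:
    """Return every Phase 1 bucket, grouped by SeaweedFS cluster."""
    return {
        cluster: [
            bucket_name(data_type, env, purpose)
            for data_type, purposes in PURPOSES.items()
            for env in envs
            for purpose in purposes
        ]
        for cluster, envs in CLUSTER_ENVS.items()
    }
```

Adding `"profiles": ["blocks"]` to `PURPOSES` would yield the Phase 3 buckets under the same convention.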

Grafana Datasources

Each environment gets a matched set of datasources in the shared Grafana instance:

  • Tempo (dev) / Tempo (test) / Tempo (prod)
  • Mimir (dev) / Mimir (test) / Mimir (prod)

Namespace Resource Analysis

Available Headroom (pre-deployment)

| Namespace | CPU available | Memory available | Storage available |
|---|---|---|---|
| ca7f8f-dev | 2530m | ~11.3Gi | ~33Gi |
| ca7f8f-test | 5340m | ~24.8Gi | ~30Gi |
| ca7f8f-prod | 1370m | ~7.5Gi | ~9Gi |

Estimated New Resource Requests

| Environment | CPU req | Memory req | Storage PVCs |
|---|---|---|---|
| dev (Tempo monolithic + Mimir ×1) | ~400m | ~2Gi | ~4Gi |
| test (Tempo monolithic + Mimir ×1) | ~900m | ~4Gi | ~8Gi |
| prod (Tempo-distributed + Mimir HA) | ~2900m | ~13.5Gi | ~16Gi |

Phase 3 addition to prod (Pyroscope HA): ~950m CPU, ~3.3Gi memory, ~2Gi storage.

Dev and test fit within existing quotas with comfortable headroom. Prod does not: the Phase 1 requests exceed the available CPU, memory, and storage, so a quota increase is required. The request below covers both Phase 1 and Phase 3 to avoid a follow-up request.
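
The fit check behind that conclusion is simple arithmetic on the two tables above. The following Python sketch reproduces it; the figures are copied from the tables (approximate values treated as exact), and the function names are illustrative.

```python
RESOURCES = ("cpu_m", "memory_gi", "storage_gi")

# (CPU millicores, memory Gi, storage Gi) available before deployment.
AVAILABLE = {
    "dev": (2530, 11.3, 33),
    "test": (5340, 24.8, 30),
    "prod": (1370, 7.5, 9),
}

# Estimated new Phase 1 requests per environment.
REQUESTED = {
    "dev": (400, 2, 4),
    "test": (900, 4, 8),
    "prod": (2900, 13.5, 16),
}


def fits(env: str) -> bool:
    """True if every requested resource fits in the available headroom."""
    return all(req <= avail for req, avail in zip(REQUESTED[env], AVAILABLE[env]))


def shortfall(env: str) -> dict:
    """Amount by which each request exceeds what is available (0 if it fits)."""
    return {
        name: max(0, req - avail)
        for name, req, avail in zip(RESOURCES, REQUESTED[env], AVAILABLE[env])
    }


# shortfall("prod") -> {"cpu_m": 1530, "memory_gi": 6.0, "storage_gi": 7}
```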

Prod Component Breakdown (Phase 1)

| Component | Replicas | CPU req | Memory req | Storage PVC |
|---|---|---|---|---|
| Tempo distributor | ×2 | 200m | 512Mi | |
| Tempo ingester | ×3 | 600m | 3Gi | 6Gi (2Gi×3) |
| Tempo querier | ×2 | 200m | 1Gi | |
| Tempo query-frontend | ×2 | 100m | 512Mi | |
| Tempo compactor | ×1 | 100m | 512Mi | |
| Tempo store-gateway | ×2 | 200m | 1Gi | |
| Mimir distributor | ×2 | 200m | 512Mi | |
| Mimir ingester | ×3 | 600m | 3Gi | 6Gi |
| Mimir querier | ×2 | 200m | 1Gi | |
| Mimir query-frontend | ×2 | 100m | 512Mi | |
| Mimir compactor | ×1 | 100m | 512Mi | 2Gi |
| Mimir store-gateway | ×2 | 200m | 1Gi | |
| Mimir alertmanager | ×2 | 100m | 512Mi | 2Gi |
| Phase 1 total | | ~2900m | ~13.5Gi | ~16Gi |
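
The totals row can be re-derived from the component rows. The Python sketch below assumes each row's figures are already summed across that component's replicas, which is the reading under which the rows add up to the stated totals; the quantity parsers are simplified helpers written for this example and handle only the `m`, `Mi`, and `Gi` suffixes used in the table.

```python
# (component, CPU request, memory request, PVC storage or None)
COMPONENTS = [
    ("Tempo distributor", "200m", "512Mi", None),
    ("Tempo ingester", "600m", "3Gi", "6Gi"),
    ("Tempo querier", "200m", "1Gi", None),
    ("Tempo query-frontend", "100m", "512Mi", None),
    ("Tempo compactor", "100m", "512Mi", None),
    ("Tempo store-gateway", "200m", "1Gi", None),
    ("Mimir distributor", "200m", "512Mi", None),
    ("Mimir ingester", "600m", "3Gi", "6Gi"),
    ("Mimir querier", "200m", "1Gi", None),
    ("Mimir query-frontend", "100m", "512Mi", None),
    ("Mimir compactor", "100m", "512Mi", "2Gi"),
    ("Mimir store-gateway", "200m", "1Gi", None),
    ("Mimir alertmanager", "100m", "512Mi", "2Gi"),
]


def millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '200m' or '1' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def gibibytes(quantity) -> float:
    """Parse a 'Gi' or 'Mi' quantity into Gi; None counts as zero."""
    if quantity is None:
        return 0.0
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported quantity: {quantity}")


def totals(rows) -> tuple:
    """Return (CPU millicores, memory Gi, storage Gi) summed over all rows."""
    cpu = sum(millicores(row[1]) for row in rows)
    memory = sum(gibibytes(row[2]) for row in rows)
    storage = sum(gibibytes(row[3]) for row in rows)
    return cpu, memory, storage


# totals(COMPONENTS) -> (2900, 13.5, 16.0)
```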

Prod Quota Increase Request

| Resource | Current quota | Currently used | Phase 1 new | Phase 3 new | Total required | Requested | Headroom |
|---|---|---|---|---|---|---|---|
| CPU | 4000m | 2630m | ~2900m | ~950m | ~6480m | 10000m | ~35% |
| Memory | 16Gi | ~8.5Gi | ~13.5Gi | ~3.3Gi | ~25.3Gi | 32Gi | ~21% |
| Storage | 64Gi | ~55Gi | ~16Gi | ~2Gi | ~73Gi | 96Gi | ~24% |
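
The "Total required" and "Headroom" columns follow from the other columns: total required is current usage plus both phases, and headroom is the unused share of the requested quota. A minimal Python sketch of that arithmetic, with illustrative function names:

```python
def total_required(used: float, phase1: float, phase3: float) -> float:
    """Current usage plus the new Phase 1 and Phase 3 requests."""
    return used + phase1 + phase3


def headroom_pct(requested: float, required: float) -> int:
    """Share of the requested quota left unused, as a whole percentage."""
    return round(100 * (requested - required) / requested)


cpu_required = total_required(2630, 2900, 950)      # millicores
memory_required = total_required(8.5, 13.5, 3.3)    # Gi
storage_required = total_required(55, 16, 2)        # Gi

cpu_headroom = headroom_pct(10000, cpu_required)       # 35
memory_headroom = headroom_pct(32, memory_required)    # 21
storage_headroom = headroom_pct(96, storage_required)  # 24
```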

Justification

We are deploying Grafana Tempo (distributed tracing) and Grafana Mimir (long-term metrics) as high-availability backends to ca7f8f-prod in Phase 1, with Grafana Pyroscope (continuous profiling) to follow in Phase 3. These are infrastructure-tier services that underpin monitoring, alerting, and observability for all production workloads.

HA requirement: A minimum of 3 ingester replicas per backend (replication factor 2) is required to survive a single pod failure without write-path data loss. Stateless components (distributors, queriers, query-frontends, store-gateways) run at 2 replicas to allow rolling updates without downtime. Ingesters require local WAL PVCs: data is flushed to SeaweedFS S3 on graceful shutdown, and the local PVC ensures that unflushed data is not lost during pod restarts or rolling deployments.

No CPU limits: CPU limits are intentionally not set. The Linux CFS scheduler throttles pods that exceed their CPU quota within a 100ms window, causing latency spikes in write-path components even when node capacity is available. CPU requests are retained and are sufficient for scheduling and HPA-based autoscaling.
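
To make the throttling effect concrete, here is a deliberately simplified Python model of CFS bandwidth control. It assumes the default 100ms period and a process whose threads are all CPU-bound for the whole period; real workloads are burstier, so treat the numbers as an illustration rather than a prediction. The function names are made up for this sketch.

```python
CFS_PERIOD_MS = 100.0  # default CFS enforcement period


def quota_ms(cpu_limit_cores: float) -> float:
    """CPU time (ms) a container may consume per period under its limit."""
    return cpu_limit_cores * CFS_PERIOD_MS


def throttled_ms(cpu_limit_cores: float, busy_threads: int) -> float:
    """Wall-clock ms per period spent throttled when all threads are busy.

    With N busy threads the quota is consumed N times faster than wall-clock
    time, so the container is paused for the remainder of the period.
    """
    wall_time_until_exhausted = quota_ms(cpu_limit_cores) / busy_threads
    if wall_time_until_exhausted >= CFS_PERIOD_MS:
        return 0.0  # quota is never exhausted within the period
    return CFS_PERIOD_MS - wall_time_until_exhausted


# A 500m limit with 4 busy threads: 50ms of quota is used up in 12.5ms,
# and the container then stalls for the remaining 87.5ms of the period.
stall = throttled_ms(0.5, 4)  # 87.5
```

In this model the stall happens regardless of how idle the node is, which is why the policy relies on CPU requests alone.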

Single request for Phases 1 and 3: Pyroscope prod resource requirements (~950m CPU, ~3.3Gi memory, ~2Gi storage) are included in this request to avoid a follow-up quota increase when Phase 3 is deployed.

Requested increases: CPU 4000m → 10000m, Memory 16Gi → 32Gi, Storage 64Gi → 96Gi.


Issue: Deploy Grafana Tempo

Estimated Effort: ~1.5 days

https://github.com/bcgov/DITP-DevOps/issues/331


Issue: Deploy Grafana Mimir

Estimated Effort: ~1.5 days

https://github.com/bcgov/DITP-DevOps/issues/332


Issue: Update Alloy Configuration

Estimated Effort: ~2–3h

https://github.com/bcgov/DITP-DevOps/issues/333


Issue: Import Grafana Dashboards

Estimated Effort: ~½ day

https://github.com/bcgov/DITP-DevOps/issues/334


Issue: Loki and SeaweedFS Self-Monitoring

Estimated Effort: ~1 day

https://github.com/bcgov/DITP-DevOps/issues/336


Issue: Stack Validation (dev only)

Estimated Effort: ~½–1 day

https://github.com/bcgov/DITP-DevOps/issues/335