Architecture and Infrastructure

Ivan P edited this page Jun 5, 2026 · 3 revisions

Stack Overview

flowchart LR
  subgraph SVC["Service Namespaces (× N)"]
    APP[App Pods]
    AY[Grafana Alloy]
  end

  subgraph MON[Monitoring Namespaces]
    LK["Loki\ndev / test / prod"]
    TM["Tempo\ndev / test / prod\n— Phase 1 —"]
    MR["Mimir\ndev / test / prod\n— Phase 1 —"]
    GF["Grafana\n(shared)"]
  end

  subgraph TOOLS[ca7f8f-tools]
    SWD[SeaweedFS dev\nnon-HA]
    SWH[SeaweedFS HA\ntest + prod]
  end

  APP -->|logs / OTLP| AY
  AY -->|logs| LK
  AY -.->|traces — Phase 1| TM
  AY -.->|metrics remote-write — Phase 1| MR

  LK --> SWD & SWH
  TM -.-> SWD & SWH
  MR -.-> SWD & SWH

  GF -->|queries| LK
  GF -.->|queries — Phase 1| TM & MR

Solid lines — existing. Dashed lines — Phase 1 additions.


Existing Components

Grafana

Single shared instance (HPA, ca7f8f-test). Keycloak SSO with role mapping. Stateless — no PVC; config and session state externalised to PostgreSQL and Redis.

Datasources are provisioned from values (not via UI). Currently: one Thanos datasource per managed namespace + one Loki datasource per environment. Phase 1 adds Mimir and Tempo datasources per environment.

Dashboards are ConfigMap-driven via a sidecar — any namespace can ship a dashboard by labelling a ConfigMap grafana_dashboard=1. No manual UI imports.
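
As an illustration of the labelling convention above, here is a minimal sketch that builds such a ConfigMap manifest as a plain Python dict. The function name `dashboard_configmap`, the `<name>.json` data key, and the example namespace `abc123-dev` are assumptions made for this example; only the `grafana_dashboard=1` label comes from the text.

```python
import json


def dashboard_configmap(name: str, namespace: str, dashboard: dict) -> dict:
    """Build a ConfigMap manifest carrying one dashboard for the Grafana sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Label named in the text; Kubernetes label values are strings.
            "labels": {"grafana_dashboard": "1"},
        },
        # Assumption: one dashboard per ConfigMap, stored under "<name>.json".
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


manifest = dashboard_configmap(
    "orders-overview",            # hypothetical dashboard name
    "abc123-dev",                 # hypothetical namespace
    {"title": "Orders overview", "panels": []},
)
```

Serialising `manifest` to YAML or JSON and applying it in the owning namespace would be enough for the sidecar to discover it, assuming the sidecar is configured to watch that namespace.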

Grafana Alloy

One Deployment per managed namespace (monitoring-collector), with namespace-scoped RBAC only. It currently collects pod logs and forwards them to the Loki instance that matches the namespace's environment (dev → loki-dev, test → loki-test, prod → loki-prod).
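
The environment-to-Loki mapping can be sketched as a simple lookup. This is an illustration only: it assumes the environment is the last dash-separated segment of the namespace name (as in ca7f8f-test), and the helper name `loki_instance_for` is hypothetical.

```python
# Mapping taken from the text: each environment has its own Loki instance.
LOKI_BY_ENV = {"dev": "loki-dev", "test": "loki-test", "prod": "loki-prod"}


def loki_instance_for(namespace: str) -> str:
    """Return the Loki instance for a namespace such as 'abc123-dev'."""
    # Assumption: the environment is the suffix after the last dash.
    env = namespace.rsplit("-", 1)[-1]
    try:
        return LOKI_BY_ENV[env]
    except KeyError:
        raise ValueError(f"no Loki instance for namespace {namespace!r}") from None
```

Namespaces that do not end in a known environment (for example a tools namespace) raise a `ValueError` rather than silently falling back to a default instance.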

The log_source pod label maps to a Loki retention tier. The default is cdt. Pods can override it (e.g. log_source=partner-team) to land in a different retention bucket.
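
A minimal sketch of the label fallback described above, assuming pod labels are available as a plain dict. The function name `resolve_log_source` is hypothetical; the label key and the `cdt` default come from the text.

```python
DEFAULT_LOG_SOURCE = "cdt"  # default retention tier named in the text


def resolve_log_source(pod_labels: dict) -> str:
    """Return the retention tier for a pod, falling back to the default."""
    value = pod_labels.get("log_source", "").strip()
    # A missing or empty label lands in the default tier.
    return value or DEFAULT_LOG_SOURCE
```

For example, `resolve_log_source({"app": "api"})` yields the default tier, while `resolve_log_source({"log_source": "partner-team"})` selects the override.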

Phase 1 extends Alloy with an OTLP receiver (traces → Tempo) and Mimir remote-write (metrics).

Grafana Loki

Three separate instances — loki-dev, loki-test, loki-prod — each deployed in its own monitoring namespace. All run Simple Scalable mode (write / read / backend targets). Per-stream retention is enforced via log_source label selectors.

Storage: loki-dev → seaweedfs-dev; loki-test and loki-prod → seaweedfs-ha.

SeaweedFS

Two instances in ca7f8f-tools:

  • seaweedfs-dev — non-HA (a single replica of each component, no replication). Serves dev workloads only.
  • seaweedfs-ha — HA (3 master / 3 volume / 2 filer, rack replication). Serves test and prod workloads.

All backends (Loki, and Phase 1 Tempo/Mimir) access SeaweedFS exclusively via its S3 gateway — an internal ClusterIP service. SeaweedFS internals are not reachable from application namespaces.

Bucket naming convention: dts-{type}-{env}-{purpose} (e.g. dts-logs-prod-chunks, dts-traces-dev-blocks).
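
The naming convention and the environment-to-instance mapping can be captured in two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and restricting each segment to lowercase letters and digits is an assumption of this example, not a documented rule.

```python
import re

# From the text: dev uses the non-HA instance, test and prod share the HA one.
SEAWEEDFS_BY_ENV = {
    "dev": "seaweedfs-dev",
    "test": "seaweedfs-ha",
    "prod": "seaweedfs-ha",
}

# Assumption: segments are lowercase alphanumerics so dashes stay unambiguous.
_SEGMENT = re.compile(r"[a-z0-9]+")


def seaweedfs_instance(env: str) -> str:
    """Return the SeaweedFS instance that serves the given environment."""
    if env not in SEAWEEDFS_BY_ENV:
        raise ValueError(f"unknown environment: {env!r}")
    return SEAWEEDFS_BY_ENV[env]


def bucket_name(data_type: str, env: str, purpose: str) -> str:
    """Build a bucket name following dts-{type}-{env}-{purpose}."""
    seaweedfs_instance(env)  # validates the environment
    for segment in (data_type, purpose):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid bucket segment: {segment!r}")
    return f"dts-{data_type}-{env}-{purpose}"
```

With these helpers, `bucket_name("logs", "prod", "chunks")` reproduces the first example name from the convention above.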


New Backends — Phase 1

| Component | Why | Effort |
| --- | --- | --- |
| Grafana Tempo | Distributed trace storage: end-to-end request visibility across services | ~1.5 days |
| Grafana Mimir | Unified metrics backend: single queryable store replacing per-namespace datasources; enables OTel Exemplars | ~1.5 days |
| Alloy config update | Enable OTLP receiver for traces + Mimir remote-write for metrics | ~2–3h |
| Grafana Dashboards | Import community dashboards for traces, service overview, namespace metrics | ~½ day |

Pyroscope (profiling backend) is deferred to Phase 3.

See Phase 1 — Infrastructure Rollout for chart selection, environment rollout order, resource policy, storage buckets, and the prod quota increase request.


Data Flow

  1. Edge Collection: Services push traces and application metrics via OTLP (port 4317 gRPC / 4318 HTTP) to the Alloy deployment in their namespace. Logs are not pushed via OTLP; they continue to flow via pod stdout, which Alloy collects and forwards to Loki (existing pipeline). In Phase 3, profiling SDKs (pyroscope-io / @pyroscope/nodejs) push profiling data to Alloy's Pyroscope receiver on port 12347. Faro browser payloads are sent to a proxy route on each service's own backend, which forwards them to the local Alloy Faro receiver.
  2. Routing: Alloy routes OTLP traces → Tempo and OTLP metrics → Mimir. Logs arrive via stdout collection → Loki (separate pipeline). OTel log correlation (OTEL_PYTHON_LOG_CORRELATION=true / Node.js equivalent) injects trace_id and span_id into existing stdout log records, which enables log-to-trace linking in Grafana without changing the log collection pipeline. OTel Exemplars embed trace IDs in metric data points for metric-to-trace drill-down.
  3. Storage Offload: Backends flush immutable blocks to SeaweedFS S3 buckets periodically, decoupling retention from container lifecycles.
  4. Visualization: Grafana queries Mimir and Tempo centrally, bypassing namespace isolation proxies. A latency spike on a Mimir chart can be clicked through directly to the causative Tempo trace. A Loki log entry with a trace_id field links directly to the corresponding Tempo trace.
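
The routing rules in steps 1 and 2 can be summarised as a small lookup table. This is only a sketch: the `Route` type and `destination` helper are hypothetical, and the per-environment instance names for Tempo and Mimir (for example `tempo-dev`) are assumed to follow the existing `loki-{env}` pattern.

```python
from dataclasses import dataclass

ENVIRONMENTS = ("dev", "test", "prod")


@dataclass(frozen=True)
class Route:
    transport: str  # how the signal reaches Alloy
    backend: str    # which backend Alloy forwards it to


# One entry per signal, as described in the data flow above.
ROUTES = {
    "traces": Route(transport="otlp", backend="tempo"),
    "metrics": Route(transport="otlp", backend="mimir"),
    "logs": Route(transport="stdout", backend="loki"),
}


def destination(signal: str, env: str) -> str:
    """Return the backend instance a signal lands in for an environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    if signal not in ROUTES:
        raise ValueError(f"unknown signal: {signal!r}")
    # Assumption: instances are named "<backend>-<env>", like loki-dev.
    return f"{ROUTES[signal].backend}-{env}"
```

Profiling and Faro payloads are left out of the table on purpose, since they use their own receivers rather than the OTLP or stdout paths.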

Faro Collection Pattern

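Browser-side telemetry cannot be sent straight to an in-cluster Alloy endpoint: the Alloy service is not reachable from outside the cluster, and posting to a different origin would run into same-origin (CORS) restrictions. Pattern: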

Browser SDK → /faro (same origin, app's own domain)
                    ↓
              Backend proxy route (Express / FastAPI)
                    ↓
              http://alloy:12347 (in-cluster, local namespace)
                    ↓
              Alloy faro.receive_http
              ├── errors / events / Web Vitals → Loki
              └── browser traces → Tempo
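
The proxy step in the middle of this chain can be sketched with the standard library alone. This example only builds the upstream request; a real route in Express or FastAPI would also send it and relay the response status. The `/collect` path, the header allowlist, and the function name `build_upstream_request` are assumptions of this sketch and should be checked against the Alloy receiver configuration actually deployed.

```python
import urllib.request

ALLOY_FARO_UPSTREAM = "http://alloy:12347"  # in-cluster address from the diagram
COLLECT_PATH = "/collect"                   # assumption: receiver ingest path
FORWARDED_HEADERS = ("content-type",)       # assumption: minimal allowlist


def build_upstream_request(
    body: bytes,
    headers: dict,
    upstream: str = ALLOY_FARO_UPSTREAM,
) -> urllib.request.Request:
    """Build the POST that relays a browser Faro payload to the local Alloy."""
    # Forward only allowlisted headers so cookies and auth tokens stay behind.
    forwarded = {
        name: value
        for name, value in headers.items()
        if name.lower() in FORWARDED_HEADERS
    }
    if not forwarded:
        forwarded["Content-Type"] = "application/json"
    url = upstream.rstrip("/") + COLLECT_PATH
    return urllib.request.Request(url, data=body, headers=forwarded, method="POST")
```

Dropping everything outside the allowlist is a deliberate choice here: the browser request arrives on the application's own origin and may carry session cookies that the telemetry pipeline has no need to see.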

Helm Values Design

To keep this non-prescriptive for other deployments, Faro support is opt-in and every value defaults to off:

faro:
  enabled: false       # opt-in — no footprint on existing deployments
  collectorUrl: ""     # URL injected into the browser bundle
  proxy:
    enabled: false     # adds a /faro proxy route to the backend service
    path: "/faro"
    upstreamUrl: ""    # e.g. http://alloy:12347 for in-cluster Alloy

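Teams using the in-cluster proxy set faro.enabled: true and faro.proxy.enabled: true, and point faro.proxy.upstreamUrl at their local Alloy service. Teams who expose Alloy via an ingress Route set faro.enabled: true, leave faro.proxy.enabled: false, and point faro.collectorUrl at their public endpoint. Teams not using Faro leave everything false.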
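
The three configurations can be told apart with a small resolver over the parsed values. This is a sketch, not chart logic: the function name `faro_collector_url` is hypothetical, and it assumes that with the proxy enabled the browser should post to the same-origin proxy path.

```python
from typing import Optional


def faro_collector_url(values: dict) -> Optional[str]:
    """Return the URL the browser SDK should post to, or None if Faro is off."""
    faro = values.get("faro", {})
    if not faro.get("enabled", False):
        return None  # Faro not in use: nothing is injected into the bundle

    proxy = faro.get("proxy", {})
    if proxy.get("enabled", False):
        if not proxy.get("upstreamUrl"):
            raise ValueError("faro.proxy.upstreamUrl is required when the proxy is enabled")
        # Assumption: the browser talks to the same-origin proxy route.
        return proxy.get("path") or "/faro"

    if not faro.get("collectorUrl"):
        raise ValueError("faro.collectorUrl is required when the proxy is disabled")
    return faro["collectorUrl"]
```

Failing fast on a missing URL is a design choice for this example; it surfaces a misconfigured release at render or startup time instead of as silently dropped browser telemetry.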


Environment Variable Conventions

All services use standard OpenTelemetry environment variable names — no custom naming. Values are set per-deployment via Helm chart values to point at the local namespace Alloy service.

# All services (Phase 2)
- name: OTEL_SERVICE_NAME
  value: "<service-name>"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://alloy:4317"        # gRPC port; use :4318 for HTTP and set OTEL_EXPORTER_OTLP_PROTOCOL to match
- name: OTEL_TRACES_EXPORTER
  value: "otlp"
- name: OTEL_METRICS_EXPORTER
  value: "otlp"

# Python services only
- name: OTEL_PYTHON_LOG_CORRELATION
  value: "true"

# Phase 3
- name: PYROSCOPE_SERVER_ADDRESS
  value: "http://alloy:12347"

OTEL_TRACES_EXPORTER and OTEL_METRICS_EXPORTER activate both signals over the same OTLP connection. The log signal (OTEL_LOGS_EXPORTER) is intentionally not set — logs are collected via pod stdout scraping, not OTLP push. OTEL_PYTHON_LOG_CORRELATION injects trace_id / span_id into existing Python log records so Grafana can link Loki entries to Tempo traces; the Node.js equivalent is achieved via @opentelemetry/instrumentation-pino or -winston depending on the logging library in use.
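
To illustrate how a correlated log line can be linked to a trace, the sketch below pulls the trace ID out of a log line with a regular expression, which is roughly the kind of rule a log-to-trace link would apply. The `trace_id=<32 hex chars>` layout and the sample log line are assumptions of this example; the pattern would need adjusting to the log format the services actually emit.

```python
import re
from typing import Optional

# Assumption: trace IDs appear as "trace_id=" followed by 32 lowercase hex chars.
TRACE_ID_RE = re.compile(r"\btrace_id=([0-9a-f]{32})\b")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the trace ID embedded in a log line, or None if there is none."""
    match = TRACE_ID_RE.search(log_line)
    return match.group(1) if match else None


# Hypothetical log line in the assumed format.
sample = (
    "2026-06-05 12:00:00 INFO [orders] "
    "[trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7] "
    "- order accepted"
)
trace_id = extract_trace_id(sample)
```

Lines logged outside an active span carry no valid 32-character ID under this assumption, so they simply produce `None` and no link.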
