Architecture and Infrastructure

Existing Stack (Already Deployed via ArgoCD)

Component	Details
Grafana	Visualization — queries Loki, Mimir, Tempo, Pyroscope
Grafana Alloy	Collector deployed as a Deployment to every managed namespace; services push OTLP; metrics scraped via ServiceMonitors
Grafana Loki	3 instances (dev / test / prod), Simple Scalable mode; dev Alloys → dev Loki; test+prod Alloys → prod Loki
SeaweedFS	S3-compatible object storage; Dev instance + HA instance; all telemetry backends use S3 gateway only

The components above need no redeployment; only configuration updates are required where noted.

New Backends — Phase 1

Component	Why	Effort
Grafana Tempo	Distributed trace storage — end-to-end request visibility across services	~1 day
Grafana Mimir	Unified metrics backend — eliminates per-namespace datasource and alert rule duplication	~1 day
Alloy config update	Enable OTLP receiver for traces + Mimir remote-write for metrics	~2–3h
Grafana Dashboards	Import community dashboards for traces, service overview, namespace metrics	~½ day

Pyroscope (profiling backend) is deferred to Phase 2.

All new backends follow the same ArgoCD Helm chart pattern established for Loki, backed by new SeaweedFS buckets (telemetry-tempo, telemetry-mimir).

Data Flow

Edge Collection: Services push traces and metrics via OTLP to the Alloy deployment in their namespace. In Phase 2, language-native profiling SDKs (pyroscope-io / @pyroscope/nodejs) push profiles to Alloy's Pyroscope receiver. Browser Faro payloads are sent to a proxy route on each service's own backend, which forwards them to the local Alloy Faro receiver, so no Alloy endpoint is exposed publicly.
Routing: Alloy routes traces → Tempo, metrics → Mimir (remote-write), profiles → Pyroscope, logs → Loki. OpenTelemetry exemplars attach trace IDs to individual metric data points, which enables metric-to-trace drill-down.
Storage Offload: Backends flush immutable blocks to SeaweedFS S3 buckets periodically, decoupling retention from container lifecycles.
Visualization: Grafana queries Mimir and Tempo centrally, bypassing namespace isolation proxies. A latency spike on a Mimir chart can be clicked through directly to an exemplar trace in Tempo that was recorded during the spike.
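To make the exemplar drill-down concrete, here is a minimal Python sketch of the idea under stated assumptions; it is not how Mimir or Grafana implement it. Each latency data point may carry exemplars that pair an observed value with the trace ID active when it was recorded, and a spike is resolved to candidate Tempo traces by reading those IDs back. The `LatencyPoint` and `Exemplar` types, the threshold, and the sample trace IDs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Exemplar:
    """One sampled observation plus the trace that produced it (hypothetical model)."""
    value_seconds: float
    trace_id: str


@dataclass
class LatencyPoint:
    """A latency data point with any exemplars attached to it."""
    timestamp: int
    value_seconds: float
    exemplars: list[Exemplar] = field(default_factory=list)


def traces_for_spike(points: list[LatencyPoint], threshold_seconds: float) -> list[str]:
    """Return exemplar trace IDs from points above the threshold, slowest first.

    The metric itself stores no request detail; only the exemplar's trace ID
    links a point on the chart to a full trace in Tempo.
    """
    hits = [
        ex
        for point in points
        if point.value_seconds > threshold_seconds
        for ex in point.exemplars
    ]
    hits.sort(key=lambda ex: ex.value_seconds, reverse=True)
    return [ex.trace_id for ex in hits]


if __name__ == "__main__":
    series = [
        LatencyPoint(1000, 0.12, [Exemplar(0.11, "a1b2")]),
        LatencyPoint(1015, 2.40, [Exemplar(1.90, "c3d4"), Exemplar(2.70, "e5f6")]),
        LatencyPoint(1030, 0.15),
    ]
    print(traces_for_spike(series, threshold_seconds=1.0))  # → ['e5f6', 'c3d4']
```

Sorting slowest first is a design choice for the sketch: the slowest sampled request is usually the most useful first trace to open when investigating a spike.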

Faro Collection Pattern

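Browser-side telemetry cannot be sent straight to the in-cluster Alloy endpoint: that service is not reachable from outside the cluster, and posting to any other origin would also require CORS handling. Sending to a same-origin path on the service's own backend avoids both. Pattern: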

Browser SDK → https://app.example.com/faro (same origin)
                    ↓
              Backend proxy route (Express / FastAPI)
                    ↓
              http://alloy:12347 (in-cluster, local namespace)
                    ↓
              Alloy faro.receiver
              ├── errors / events / Web Vitals → Loki
              └── browser traces → Tempo
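The backend proxy route in the diagram can be sketched with Python's standard library alone. This stands in for the Express or FastAPI route the pattern names and is a sketch under assumptions, not a prescribed implementation: the `/faro` prefix and the `http://alloy:12347` upstream come from the diagram, while `make_faro_proxy`, the 5-second timeout, and forwarding only `POST` with its `Content-Type` header are illustrative choices. Requests to `/faro` go to the upstream root and `/faro/<rest>` goes to `<upstream>/<rest>`, so whatever path Alloy's Faro receiver expects can be appended to the browser-side URL.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler


def make_faro_proxy(upstream_url: str, prefix: str = "/faro"):
    """Return a handler class that relays POSTs under `prefix` to `upstream_url` (hypothetical helper)."""
    upstream = upstream_url.rstrip("/")

    class FaroProxyHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            path, sep, query = self.path.partition("?")
            if path != prefix and not path.startswith(prefix + "/"):
                self.send_error(404)
                return
            # Drop the public prefix: /faro -> upstream root, /faro/x -> upstream/x.
            rest = path[len(prefix):] or "/"
            target = upstream + rest + sep + query

            length = int(self.headers.get("Content-Length") or 0)
            body = self.rfile.read(length)
            request = urllib.request.Request(
                target,
                data=body,
                headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
                method="POST",
            )
            try:
                with urllib.request.urlopen(request, timeout=5) as resp:
                    status, payload = resp.status, resp.read()
            except urllib.error.HTTPError as err:
                # Pass upstream rejections (e.g. a 4xx from the receiver) back to the browser.
                status, payload = err.code, err.read()
            except OSError:
                # Alloy unreachable or timed out: report a gateway error instead of crashing.
                status, payload = 502, b""

            self.send_response(status)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep high-volume telemetry traffic out of the access log

    return FaroProxyHandler
```

Served standalone, it would be started with `ThreadingHTTPServer(("0.0.0.0", 8080), make_faro_proxy("http://alloy:12347")).serve_forever()`; inside an existing web framework, the same prefix-stripping and pass-through logic would live in the framework's own route handler.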

Helm Values Design

To keep the chart non-prescriptive for other deployments, Faro support is off by default and each piece is opt-in:

faro:
  enabled: false       # opt-in — no footprint on existing deployments
  collectorUrl: ""     # URL injected into the browser bundle
  proxy:
    enabled: false     # adds a /faro proxy route to the backend service
    path: "/faro"
    upstreamUrl: ""    # e.g. http://alloy:12347 for in-cluster Alloy

Teams that expose Alloy through an ingress or Route set faro.enabled: true and faro.proxy.enabled: false, and point faro.collectorUrl at their public endpoint. Teams not using Faro leave both enabled flags false.
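The supported setups (Faro off, same-origin proxy, publicly exposed Alloy) can be checked before rendering. Below is a hypothetical Python helper, `faro_mode`, that classifies a `faro:` values dict using the keys from the values block and rejects incomplete combinations. It assumes `collectorUrl` is always set explicitly when Faro is enabled; in a real chart these checks would more likely be written in the templates with Helm's `required` or `fail` functions, so treat this only as an illustration of the rules.

```python
def faro_mode(values: dict) -> str:
    """Classify a `faro:` values block as 'disabled', 'proxy', or 'direct' (hypothetical helper).

    Raises ValueError for combinations none of the supported setups cover.
    """
    faro = values.get("faro", {})
    proxy = faro.get("proxy", {})

    if not faro.get("enabled", False):
        if proxy.get("enabled", False):
            raise ValueError("faro.proxy.enabled requires faro.enabled")
        return "disabled"  # opt-in default: nothing is rendered

    # Assumption: the browser bundle always needs an explicit collector URL.
    if not faro.get("collectorUrl"):
        raise ValueError("faro.collectorUrl must be set when Faro is enabled")

    if proxy.get("enabled", False):
        if not proxy.get("upstreamUrl"):
            raise ValueError("faro.proxy.upstreamUrl must be set when the proxy is enabled")
        return "proxy"  # browser -> same-origin /faro route -> in-cluster Alloy

    return "direct"  # browser -> Alloy exposed via ingress or Route
```

For example, `faro_mode({"faro": {"enabled": True, "collectorUrl": "https://app.example.com/faro", "proxy": {"enabled": True, "upstreamUrl": "http://alloy:12347"}}})` returns `"proxy"`.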

Environment Variable Conventions

All services use the standard OpenTelemetry environment variable names rather than custom ones (plus PYROSCOPE_SERVER_ADDRESS for Phase 2 profiling). Values are set per deployment via Helm chart values and point at the Alloy service in the local namespace:

env:
  - name: OTEL_SERVICE_NAME
    value: "<service-name>"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://alloy:4317"          # OTLP/gRPC; for OTLP/HTTP use :4318 with OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  - name: PYROSCOPE_SERVER_ADDRESS
    value: "http://alloy:<pyroscope-port>"  # Phase 2; must differ from the Faro receiver's port (12347)
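Whether 4317 or 4318 is the right port depends on the exporter protocol, and the two protocols also treat `OTEL_EXPORTER_OTLP_ENDPOINT` differently. Per the OpenTelemetry exporter specification, OTLP/HTTP exporters append a per-signal path (`v1/traces`, `v1/metrics`, `v1/logs`) to this base endpoint, while OTLP/gRPC uses it as-is. The hypothetical helper below reproduces that rule with the standard library only. Treating a missing protocol as `grpc` matches the `:4317` value above but is an assumption, because SDK defaults vary by language; setting `OTEL_EXPORTER_OTLP_PROTOCOL` explicitly avoids surprises.

```python
import os
from typing import Mapping, Optional

# Per-signal paths that OTLP/HTTP exporters append to the base endpoint.
SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def otlp_target(signal: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return where an OTLP exporter would send `signal`, from the standard OTEL_* variables.

    Covers the base OTEL_EXPORTER_OTLP_ENDPOINT only; signal-specific overrides
    such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT are deliberately ignored here.
    """
    env = os.environ if env is None else env
    endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"]
    # Assumption: a missing protocol is treated as gRPC to match the :4317 convention.
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
    if protocol == "grpc":
        return endpoint  # gRPC selects the signal by service/method, not by URL path
    if protocol in ("http/protobuf", "http/json"):
        return endpoint.rstrip("/") + "/" + SIGNAL_PATHS[signal]
    raise ValueError(f"unsupported OTEL_EXPORTER_OTLP_PROTOCOL: {protocol!r}")


if __name__ == "__main__":
    http_env = {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://alloy:4318",
                "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf"}
    print(otlp_target("traces", http_env))  # → http://alloy:4318/v1/traces
```

This is why switching a service from gRPC to HTTP means changing both the port and the protocol variable: pointing an HTTP exporter at `:4317`, or a gRPC exporter at `:4318`, sends data to a listener speaking the other protocol.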
