
Phase 3 Profiling and RUM

Ivan P edited this page Jun 18, 2026 · 6 revisions

Each service is an independent unit of work. Pyroscope applies to all 8 services. Faro applies to the 5 services with browser-facing frontends.

Each service has separate issues for code changes and Helm chart changes.

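Total estimated effort: ~32.5h of per-service work (the sum of the per-issue estimates below), plus ~½ day of infrastructure validation and ~1.5 days for the Pyroscope deployment
Prerequisite: Phase 2 complete; Pyroscope and the Alloy faro.receiver deployed to dev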


Issue: Deploy Grafana Pyroscope

Labels: phase-3 infrastructure profiling Estimated Effort: ~1.5 days

Description

Deploy Grafana Pyroscope as the continuous profiling backend, following the same ArgoCD + Helm pattern used for Tempo and Mimir in Phase 1. This is the infrastructure prerequisite for all per-service Phase 3 profiling work — no service onboarding can proceed until Pyroscope is running and reachable from Alloy in each namespace.

Chart Selection

Environment Chart Mode
dev grafana/pyroscope All-in-one (replicas: 1)
test grafana/pyroscope All-in-one (replicas: 1)
prod grafana/pyroscope Distributed (HA)

Rationale: The grafana/pyroscope chart supports both all-in-one and distributed modes. All-in-one runs all components (distributor, ingester, querier, compactor, store-gateway) in a single process, which is sufficient for dev/test at low profiling volume. Distributed mode splits the components so they can be scaled independently in prod, matching the pattern used for Mimir.

Rollout Order

dev → test → prod

Deploy and validate in dev first (confirm Alloy → Pyroscope pipeline, Grafana datasource, S3 bucket access). Promote proven values to test with updated bucket names. Deploy prod last with HA replica counts.

Storage Buckets

Following the dts-{type}-{env}-{purpose} convention established for Loki, Tempo, and Mimir:

SeaweedFS instance Bucket
seaweedfs-dev dts-profiles-dev-blocks
seaweedfs-ha dts-profiles-test-blocks
seaweedfs-ha dts-profiles-prod-blocks

Provision buckets using the existing SeaweedFS bucket provisioning pattern before deploying Pyroscope.
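As a quick cross-check of the naming convention, the bucket names can be derived rather than typed by hand. This is a minimal sketch; the helper names (bucket_name, profile_buckets) are illustrative and not part of any existing provisioning tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProfileBucket:
    env: str
    seaweedfs_instance: str
    name: str


def bucket_name(data_type: str, env: str, purpose: str) -> str:
    """Apply the dts-{type}-{env}-{purpose} convention."""
    return f"dts-{data_type}-{env}-{purpose}"


# Environment -> SeaweedFS instance, copied from the table above.
SEAWEEDFS_INSTANCE = {
    "dev": "seaweedfs-dev",
    "test": "seaweedfs-ha",
    "prod": "seaweedfs-ha",
}


def profile_buckets() -> List[ProfileBucket]:
    """Return the three Pyroscope block buckets in rollout order."""
    return [
        ProfileBucket(env, instance, bucket_name("profiles", env, "blocks"))
        for env, instance in SEAWEEDFS_INSTANCE.items()
    ]


for bucket in profile_buckets():
    print(f"{bucket.seaweedfs_instance}: {bucket.name}")
```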

Resource Policy

Follows the same policy established in Phase 1:

  • No CPU limits (avoids CFS throttling on write-path components)
  • Memory limits set on ingesters only (unbounded growth risk)
  • CPU and memory requests set on all components for scheduling

Prod Component Breakdown (HA)

Component            Replicas   CPU req   Memory req   Storage PVC
Distributor          ×2         100m      256Mi        -
Ingester             ×3         300m      1.5Gi        2Gi (WAL)
Querier              ×2         150m      512Mi        -
Query-frontend       ×1         100m      256Mi        -
Compactor            ×1         150m      512Mi        -
Store-gateway        ×2         150m      512Mi        -
Total (sum of rows)             ~950m     ~3.5Gi       ~2Gi

These figures are already accounted for in the Phase 1 quota increase request for ca7f8f-prod.
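The table does not state whether the request columns are per replica or per component, and that changes what has to fit in the quota. The tally below is a small sketch that sums the listed values both ways so the figure can be compared against the quota request. The row values are copied from the table; the function name and the two readings are only illustrative.

```python
from typing import Tuple

# (component, replicas, cpu_millicores, memory_mib, pvc_gib), copied from the table.
COMPONENTS = [
    ("distributor", 2, 100, 256, 0),
    ("ingester", 3, 300, 1536, 2),
    ("querier", 2, 150, 512, 0),
    ("query-frontend", 1, 100, 256, 0),
    ("compactor", 1, 150, 512, 0),
    ("store-gateway", 2, 150, 512, 0),
]


def totals(weight_by_replicas: bool) -> Tuple[int, float, int]:
    """Return (CPU in millicores, memory in Gi, PVC storage in Gi)."""
    cpu = mem = pvc = 0
    for _name, replicas, cpu_m, mem_mib, pvc_gib in COMPONENTS:
        factor = replicas if weight_by_replicas else 1
        cpu += cpu_m * factor
        mem += mem_mib * factor
        pvc += pvc_gib * factor
    return cpu, mem / 1024, pvc


# Reading 1: each row is already the total for that component.
print(totals(weight_by_replicas=False))  # (950, 3.5, 2)
# Reading 2: each row is a per-replica request.
print(totals(weight_by_replicas=True))   # (1950, 7.75, 6)
```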

Requirements

  • Deploy Pyroscope Helm chart via ArgoCD in ca7f8f-dev, ca7f8f-test, and ca7f8f-prod
  • Create SeaweedFS S3 buckets (dts-profiles-{dev,test,prod}-blocks) before deploying
  • Configure S3 storage backend (endpoint, bucket, access credentials via existing secrets — same pattern as Tempo/Mimir)
  • Update Alloy configuration to add a pyroscope.receive_http component listening on port 12347 and a pyroscope.write component forwarding to the Pyroscope backend in the same namespace
  • Apply network policies to allow:
    • Alloy → Pyroscope traffic on port 4040 (Pyroscope HTTP) within each namespace
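    • Service pods → Alloy on port 12347 (already open for OTLP; verify the Pyroscope receiver is included and that it does not share a listen port with faro.receiver, which the Faro issues in this plan also address on 12347)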
  • Provision Pyroscope (dev) / Pyroscope (test) / Pyroscope (prod) datasources in Grafana
  • Add Pyroscope entries to the Grafana dashboard imports (Flame Graph explorer)

Acceptance Criteria

  • Pyroscope deployed and healthy in all three namespaces (ca7f8f-dev, ca7f8f-test, ca7f8f-prod)
  • SeaweedFS buckets created and S3 connectivity confirmed from each Pyroscope instance
  • Alloy pyroscope.receive_http component active on port 12347 in each namespace
  • Alloy pyroscope.write component forwarding profiles to the local Pyroscope backend
  • Pyroscope (dev), Pyroscope (test), Pyroscope (prod) datasources visible and queryable in Grafana
  • Pyroscope /ready endpoint returns healthy in all three namespaces
  • Network policies allow Alloy → Pyroscope and service → Alloy (port 12347) traffic
  • No resource quota violations in any namespace
  • Prod HA: ingester replicas ×3 confirmed with WAL PVCs bound
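The /ready criterion can be checked from a debug pod with a few lines of standard-library Python. This is a sketch under assumptions: the base URL passed in is whatever Service name the chart actually creates (the one in the usage note is hypothetical), and port 4040 is the Pyroscope HTTP port named in the requirements above.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry a status code.
        return exc.code
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return 0


def pyroscope_ready(base_url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """True when GET {base_url}/ready answers 200."""
    return fetch(base_url.rstrip("/") + "/ready") == 200
```

For example, pyroscope_ready("http://pyroscope:4040") could be run once per namespace. The fetch parameter exists only so the check can be exercised without a cluster.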

Pyroscope Continuous Profiling

acapy-agent

Issue: acapy-agent — Pyroscope Profiling (Code)

Labels: phase-3 profiling python acapy Estimated Effort: ~1h

Description: Add pyroscope-io to continuously profile CPU usage in acapy-agent. The Python SDK collects CPU profiles only, so memory profiling is out of scope for this issue.

Requirements:

  • Add pyroscope-io to dependencies
  • Call pyroscope.configure(application_name="acapy-agent", server_address=..., tags={"version": ...}) once in acapy_agent/__main__.py, at startup before the main entry point starts the agent

Acceptance Criteria:

  • Profiling data appears in Grafana Pyroscope for acapy-agent
  • CPU flame graphs visible and navigable
  • No measurable performance regression
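The configure call from the requirements might look like the sketch below. It assumes the address arrives through the PYROSCOPE_SERVER_ADDRESS variable added by the Helm issue. Skipping profiling when that variable is unset is a design choice made here, not something the plan prescribes, and APP_VERSION is a hypothetical variable name for the version tag.

```python
import os
from typing import Mapping, Optional


def pyroscope_settings(env: Mapping[str, str]) -> Optional[dict]:
    """Build keyword arguments for pyroscope.configure(), or None when profiling is off."""
    server_address = env.get("PYROSCOPE_SERVER_ADDRESS")
    if not server_address:
        return None
    return {
        "application_name": "acapy-agent",
        "server_address": server_address,
        # APP_VERSION is a hypothetical variable; use whatever the image exposes.
        "tags": {"version": env.get("APP_VERSION", "unknown")},
    }


def start_profiling(env: Mapping[str, str] = os.environ) -> bool:
    """Configure the profiler once at startup; return True if it was enabled."""
    settings = pyroscope_settings(env)
    if settings is None:
        return False
    import pyroscope  # provided by the pyroscope-io package

    pyroscope.configure(**settings)
    return True
```

Calling start_profiling() as the first statement of the entry point keeps the profiler out of any loop and leaves existing deployments unchanged until the env var is set.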

Issue: acapy-agent — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling acapy Estimated Effort: ~15min

Description: Add PYROSCOPE_SERVER_ADDRESS to the acapy-agent deployment.

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS env var present in deployed pod
  • Profiling data visible in Pyroscope

acapy-vc-authn-oidc

Issue: acapy-vc-authn-oidc — Pyroscope Profiling (Code)

Labels: phase-3 profiling python vc-authn Estimated Effort: ~1h

Requirements: Add pyroscope-io; configure in FastAPI startup before the app begins serving requests.

Acceptance Criteria:

  • Profiling data for vc-authn-oidc-controller visible in Pyroscope
  • No measurable performance regression

Issue: acapy-vc-authn-oidc — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling vc-authn Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS env var present in deployed pod

traction

Issue: traction — Pyroscope Profiling (Code)

Labels: phase-3 profiling nodejs python traction Estimated Effort: ~2h

Requirements:

  • tenant-ui: Add @pyroscope/nodejs; initialize via Pyroscope.init() and Pyroscope.start() in tracing.ts
  • ACA-Py plugin: Add pyroscope-io; call pyroscope.configure() at startup

Acceptance Criteria:

  • Profiling data for traction-tenant-ui and traction-acapy visible in Pyroscope
  • No measurable performance regression in either component

Issue: traction — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling traction Estimated Effort: ~15min

Requirements: Add PYROSCOPE_SERVER_ADDRESS to both tenant-ui and ACA-Py deployments.

Acceptance Criteria:

  • Env var present in both pods

acapy-endorser-service

Issue: acapy-endorser-service — Pyroscope Profiling (Code)

Labels: phase-3 profiling python endorser Estimated Effort: ~1h

Requirements: Add pyroscope-io; configure via pyroscope.configure() early in main.py using env vars.

Acceptance Criteria:

  • Profiling data for acapy-endorser-service visible in Pyroscope

Issue: acapy-endorser-service — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling endorser Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS present in deployed pod

didwebvh-server-py

Issue: didwebvh-server-py — Pyroscope Profiling (Code)

Labels: phase-3 profiling python didwebvh Estimated Effort: ~1h

Requirements: Add pyroscope-io to server/pyproject.toml; configure in server/main.py before uvicorn.run.

Acceptance Criteria:

  • Profiling data for didwebvh-server visible in Pyroscope

Issue: didwebvh-server-py — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling didwebvh Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS present in deployed pod

credo-ts

Issue: credo-ts — Pyroscope Profiling (Code)

Labels: phase-3 profiling typescript credo Estimated Effort: ~1h

Requirements: Add @pyroscope/nodejs; initialize it from env vars early in the SDK wrapper, before the OTel SDK setup.

Acceptance Criteria:

  • Profiling data for credo-agent visible in Pyroscope

Issue: credo-ts consuming service — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling credo Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS present in consuming service pod

didcomm-mediator-credo

Issue: didcomm-mediator-credo — Pyroscope Profiling (Code)

Labels: phase-3 profiling typescript mediator Estimated Effort: ~1h

Requirements: Add @pyroscope/nodejs; call Pyroscope.init() and Pyroscope.start() in apps/mediator/instrumentation.js before OTel SDK initialization.

Acceptance Criteria:

  • Profiling data for didcomm-mediator-credo visible in Pyroscope

Issue: didcomm-mediator-credo — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling mediator Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS present in deployed pod

bc-wallet-demo

Issue: bc-wallet-demo — Pyroscope Profiling (Code)

Labels: phase-3 profiling typescript bc-wallet Estimated Effort: ~2h

Requirements: Add @pyroscope/nodejs to the server workspace; call Pyroscope.init({ serverAddress, appName }) followed by Pyroscope.start() in server/src/index.ts at startup.

Acceptance Criteria:

  • Profiling data for bc-wallet-demo-server visible in Pyroscope

Issue: bc-wallet-demo — Pyroscope Helm Chart Updates

Labels: phase-3 helm profiling bc-wallet Estimated Effort: ~15min

Acceptance Criteria:

  • PYROSCOPE_SERVER_ADDRESS present in deployed pod

Faro Real User Monitoring

Applies to: traction, acapy-vc-authn-oidc, didwebvh-server-py, credo-ts, bc-wallet-demo.

Each service requires a backend proxy route (code) and opt-in Helm chart values. See Architecture & Infrastructure for the Faro collection pattern and Helm values design.


traction

Issue: traction — Faro RUM (Code)

Labels: phase-3 faro rum nodejs traction Estimated Effort: ~3h

Requirements:

  • Add @grafana/faro-web-sdk, @grafana/faro-web-tracing to the frontend workspace
  • Add Express proxy route (POST /faro) in tenant-ui that forwards browser payloads to the Alloy Faro receiver at http://alloy:12347/collect
  • Initialize Faro in Vue's main.ts inside loadApp(), pointing collectorUrl at the app's own /faro path

Acceptance Criteria:

  • Browser errors and Web Vitals appear in Loki as structured log entries
  • Browser-side traces (XHR/Fetch spans) appear in Tempo linked to backend spans
  • Proxy route forwards correctly without CORS errors in browser console

Issue: traction — Faro Helm Chart Updates

Labels: phase-3 helm faro traction Estimated Effort: ~30min

Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to the tenant-ui chart values. The ACA-Py deployment has no browser-facing frontend and needs no Faro values.

Acceptance Criteria:

  • faro.enabled: false by default — no change to existing deployments
  • When enabled, faro.collectorUrl is injected into browser bundle
  • Proxy route activated when faro.proxy.enabled: true

acapy-vc-authn-oidc

Issue: acapy-vc-authn-oidc — Faro RUM (Code)

Labels: phase-3 faro rum python vc-authn Estimated Effort: ~4h

Requirements:

  • Add FastAPI proxy route (POST /faro) forwarding browser payloads to the Alloy Faro receiver at http://alloy:12347/collect
  • Create Nuxt 3 client plugin faro.client.ts pointing collectorUrl at the app's own /faro path
  • Inject Faro CDN script tag into Jinja2 templates (e.g. verified_credentials.html)
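A possible shape for the FastAPI proxy route is sketched below. It is not the project's implementation: the function names and the header allow-list are assumptions, and the upstream URL is passed in so it can come from faro.proxy.upstreamUrl. Because the browser posts to the app's own origin, no CORS configuration is involved.

```python
import asyncio
import urllib.error
import urllib.request
from typing import Mapping

# Only these request headers are passed on to Alloy; cookies and auth headers are dropped.
FORWARDED_HEADERS = ("content-type",)


def upstream_request(body: bytes, headers: Mapping[str, str], upstream_url: str) -> urllib.request.Request:
    """Build the POST that relays a browser payload to the Faro receiver."""
    forwarded = {"content-type": "application/json"}
    for name, value in headers.items():
        if name.lower() in FORWARDED_HEADERS:
            forwarded[name.lower()] = value
    return urllib.request.Request(upstream_url, data=body, headers=forwarded, method="POST")


def forward(body: bytes, headers: Mapping[str, str], upstream_url: str, timeout: float = 5.0) -> int:
    """Send the payload upstream and return the status code to mirror back."""
    try:
        with urllib.request.urlopen(upstream_request(body, headers, upstream_url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except OSError:
        return 502  # upstream unreachable


def create_app(upstream_url: str):
    """Return a FastAPI app exposing POST /faro (fastapi is imported lazily)."""
    from fastapi import FastAPI, Request, Response

    app = FastAPI()

    @app.post("/faro")
    async def faro_proxy(request: Request):
        body = await request.body()
        # urllib is blocking, so run it off the event loop.
        status = await asyncio.to_thread(forward, body, request.headers, upstream_url)
        return Response(status_code=status)

    return app
```

In the real service the route would be registered on the existing app rather than on a new one; create_app only keeps the sketch self-contained.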

Acceptance Criteria:

  • Browser errors and Web Vitals from both Nuxt 3 and Jinja2 surfaces appear in Loki
  • Browser traces appear in Tempo
  • No CORS errors in browser console

Issue: acapy-vc-authn-oidc — Faro Helm Chart Updates

Labels: phase-3 helm faro vc-authn Estimated Effort: ~30min

Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to chart values.

Acceptance Criteria:

  • faro.enabled: false by default
  • Faro collector URL injected when enabled

didwebvh-server-py

Issue: didwebvh-server-py — Faro RUM (Code)

Labels: phase-3 faro rum python didwebvh Estimated Effort: ~2h

Requirements:

  • Add FastAPI proxy route (POST /faro) forwarding browser payloads to the Alloy Faro receiver at http://alloy:12347/collect
  • Inject Faro Web SDK initialization script tag into server/app/templates/components/head.jinja
  • Pass collectorUrl via server/config.py from environment

Acceptance Criteria:

  • Web Explorer browser events appear in Loki
  • Proxy route forwards without CORS errors

Issue: didwebvh-server-py — Faro Helm Chart Updates

Labels: phase-3 helm faro didwebvh Estimated Effort: ~30min

Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to chart values.

Acceptance Criteria:

  • faro.enabled: false by default

credo-ts

Issue: credo-ts — Faro RUM (Code)

Labels: phase-3 faro rum typescript credo Estimated Effort: ~5h

Requirements:

  • Apply the standard Faro Web SDK configuration in the consumer application's React or Vue root
  • Enable browser tracing by setting instrumentations: [...getWebInstrumentations(), new TracingInstrumentation()]

Note: Proxy route responsibility falls on the consuming application's backend, not credo-ts itself.

Acceptance Criteria:

  • Browser events from consumer app appear in Loki
  • Browser traces appear in Tempo

Issue: credo-ts consuming service — Faro Helm Chart Updates

Labels: phase-3 helm faro credo Estimated Effort: ~30min

Requirements: Add faro.* values to the consuming service's Helm chart.

Acceptance Criteria:

  • faro.enabled: false by default

bc-wallet-demo

Issue: bc-wallet-demo — Faro RUM (Code)

Labels: phase-3 faro rum typescript bc-wallet Estimated Effort: ~4h

Requirements:

  • Add Express proxy route (POST /faro) on the backend server forwarding to the Alloy Faro receiver at http://alloy:12347/collect
  • Add @grafana/faro-web-sdk, @grafana/faro-web-tracing to frontend workspace
  • Initialize Faro in frontend/src/index.tsx before createRoot, pointing collectorUrl at the app's own /faro path

Acceptance Criteria:

  • React app browser errors and Web Vitals appear in Loki
  • Browser XHR/Fetch traces appear in Tempo linked to backend spans
  • No CORS errors in browser console

Issue: bc-wallet-demo — Faro Helm Chart Updates

Labels: phase-3 helm faro bc-wallet Estimated Effort: ~30min

Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, and faro.collectorUrl to chart values. Inject faro.collectorUrl into the frontend either as a build-time argument or through public/config.json.

Acceptance Criteria:

  • faro.enabled: false by default
  • Faro collector URL injected into React bundle when enabled

Issue: Phase 3 Infrastructure Validation (dev only)

Estimated Effort: ~½ day

Description

Validate that Pyroscope and Alloy's Faro receiver are correctly deployed and accepting data in ca7f8f-dev before beginning per-service Phase 3 onboarding. This is the go/no-go gate for Phase 3 service work.

Unlike Phase 1, there is no synthetic profiling data generator equivalent to telemetrygen. Pyroscope is validated via health check and API reachability. The Faro receiver is validated by sending a test payload directly to Alloy — no browser or instrumented service required. Per-service data flow validation is covered by each service's acceptance criteria.

Requirements

  • Confirm Pyroscope is running and healthy in ca7f8f-dev (query the /ready endpoint)
  • Confirm Pyroscope is reachable from within the namespace (e.g. from a debug pod)
  • Send a minimal test payload to Alloy's Faro receiver endpoint (http://alloy:12347/collect) and confirm it is accepted (a 2xx response; the receiver typically answers 202 Accepted rather than 200)
  • Confirm the test Faro payload appears as a log entry in Loki (dev)
  • Resolve any RBAC, network policy, or Alloy pipeline issues before proceeding
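The Faro smoke test can be scripted from a debug pod. The payload shape below (a meta.app block plus a single logs entry) is an assumption based on the Faro Web SDK transport format, and the app name and message are made up for the test. Adjust the payload if the receiver rejects it.

```python
import json
import urllib.request
from datetime import datetime, timezone


def faro_test_payload(app_name: str = "phase3-validation",
                      message: str = "faro receiver smoke test") -> dict:
    """Build a minimal Faro payload containing one log line."""
    return {
        "meta": {"app": {"name": app_name, "version": "0.0.0", "environment": "dev"}},
        "logs": [
            {
                "message": message,
                "level": "info",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        ],
    }


def send(payload: dict, url: str = "http://alloy:12347/collect", timeout: float = 5.0) -> int:
    """POST the payload to the Faro receiver and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Running send(faro_test_payload()) and treating any 2xx status as success covers the acceptance check; searching Loki (dev) for the message text then confirms the pipeline end to end.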

Expected Outcome

Pyroscope is running and reachable from within ca7f8f-dev. Alloy's Faro receiver is accepting payloads and routing them to Loki. The infrastructure is confirmed ready for per-service Phase 3 onboarding.

Acceptance Criteria

  • Pyroscope /ready endpoint returns healthy in ca7f8f-dev
  • Pyroscope reachable from within the namespace on its configured port
  • Alloy faro.receiver accepts a test POST to /collect without error
  • Test Faro payload visible as a log entry in Grafana → Loki (dev)
  • Any network policy or RBAC issues identified and resolved
