Phase 3 Profiling and RUM
Each service is an independent unit of work. Pyroscope applies to all 8 services. Faro applies to the 5 services with browser-facing frontends.
Each service has separate issues for code changes and Helm chart changes.
Total estimated effort: ~27h (includes infrastructure validation)

Prerequisite: Phase 2 complete; Pyroscope and Alloy faro.receiver deployed to dev
Labels: phase-3 infrastructure profiling
Estimated Effort: ~1.5 days
Deploy Grafana Pyroscope as the continuous profiling backend, following the same ArgoCD + Helm pattern used for Tempo and Mimir in Phase 1. This is the infrastructure prerequisite for all per-service Phase 3 profiling work — no service onboarding can proceed until Pyroscope is running and reachable from Alloy in each namespace.
| Environment | Chart | Mode |
|---|---|---|
| dev | `grafana/pyroscope` | All-in-one (replicas: 1) |
| test | `grafana/pyroscope` | All-in-one (replicas: 1) |
| prod | `grafana/pyroscope` | Distributed (HA) |
Rationale: The grafana/pyroscope chart supports both all-in-one and distributed modes via the same chart. All-in-one runs all components (distributor, ingester, querier, compactor, store-gateway) in a single binary — sufficient for dev/test at low profiling volume. Distributed mode splits components for independent scaling in prod, matching the pattern used for Mimir.
dev → test → prod
Deploy and validate in dev first (confirm Alloy → Pyroscope pipeline, Grafana datasource, S3 bucket access). Promote proven values to test with updated bucket names. Deploy prod last with HA replica counts.
Following the dts-{type}-{env}-{purpose} convention established for Loki, Tempo, and Mimir:
| SeaweedFS instance | Bucket |
|---|---|
| seaweedfs-dev | dts-profiles-dev-blocks |
| seaweedfs-ha | dts-profiles-test-blocks |
| seaweedfs-ha | dts-profiles-prod-blocks |
Provision buckets using the existing SeaweedFS bucket provisioning pattern before deploying Pyroscope.
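As a minimal sketch of the naming convention above, the helper below derives the three Pyroscope bucket names from the `dts-{type}-{env}-{purpose}` pattern. The function and constant names are hypothetical and exist only for illustration; the actual provisioning follows the existing SeaweedFS pattern.

```python
# Illustrative only: helper and constant names are assumptions, not existing tooling.
ENVIRONMENTS = ("dev", "test", "prod")


def bucket_name(data_type: str, env: str, purpose: str) -> str:
    """Build a bucket name following the dts-{type}-{env}-{purpose} convention."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"dts-{data_type}-{env}-{purpose}"


# Pyroscope stores profile blocks, so type="profiles" and purpose="blocks".
PYROSCOPE_BUCKETS = {env: bucket_name("profiles", env, "blocks") for env in ENVIRONMENTS}

if __name__ == "__main__":
    for env, name in PYROSCOPE_BUCKETS.items():
        print(f"{env}: {name}")
```

Generating the names from one function keeps the dev, test, and prod values files from drifting apart when the bucket names are promoted between environments.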
Follows the same policy established in Phase 1:
- No CPU limits (avoids CFS throttling on write-path components)
- Memory limits set on ingesters only (unbounded growth risk)
- CPU and memory requests set on all components for scheduling
| Component | Replicas | CPU req | Memory req | Storage PVC |
|---|---|---|---|---|
| Distributor | ×2 | 100m | 256Mi | — |
| Ingester | ×3 | 300m | 1.5Gi | 2Gi (WAL) |
| Querier | ×2 | 150m | 512Mi | — |
| Query-frontend | ×1 | 100m | 256Mi | — |
| Compactor | ×1 | 150m | 512Mi | — |
| Store-gateway | ×2 | 150m | 512Mi | — |
| Total | | ~950m | ~3.3Gi | ~2Gi |
These figures are already accounted for in the Phase 1 quota increase request for ca7f8f-prod.
- Deploy Pyroscope Helm chart via ArgoCD in `ca7f8f-dev`, `ca7f8f-test`, and `ca7f8f-prod`
- Create SeaweedFS S3 buckets (`dts-profiles-{dev,test,prod}-blocks`) before deploying
- Configure S3 storage backend (endpoint, bucket, and access credentials via existing secrets; same pattern as Tempo/Mimir)
- Update Alloy configuration to add a `pyroscope.receive` component listening on port 12347 and a `pyroscope.write` component forwarding to the Pyroscope backend in the same namespace
- Apply network policies to allow:
  - Alloy → Pyroscope traffic on port 4040 (Pyroscope HTTP) within each namespace
  - Service pods → Alloy on port 12347 (already open for OTLP; verify Pyroscope receiver is included)
- Provision `Pyroscope (dev)` / `Pyroscope (test)` / `Pyroscope (prod)` datasources in Grafana
- Add Pyroscope entries to the Grafana dashboard imports (Flame Graph explorer)
- Pyroscope deployed and healthy in all three namespaces (`ca7f8f-dev`, `ca7f8f-test`, `ca7f8f-prod`)
- SeaweedFS buckets created and S3 connectivity confirmed from each Pyroscope instance
- Alloy `pyroscope.receive` component active on port 12347 in each namespace
- Alloy `pyroscope.write` component forwarding profiles to the local Pyroscope backend
- `Pyroscope (dev)`, `Pyroscope (test)`, `Pyroscope (prod)` datasources visible and queryable in Grafana
- Pyroscope `/ready` endpoint returns healthy in all three namespaces
- Network policies allow Alloy → Pyroscope and service → Alloy (port 12347) traffic
- No resource quota violations in any namespace
- Prod HA: ingester replicas ×3 confirmed with WAL PVCs bound
Labels: phase-3 profiling python acapy
Estimated Effort: ~1h
Description: Add pyroscope-io to continuously profile CPU and memory usage in acapy-agent.
Requirements:
- Add `pyroscope-io` to dependencies
- Call `pyroscope.configure(application_name="acapy-agent", server_address=..., tags={"version": ...})` in `acapy_agent/__main__.py`, once at startup before the main entry loop runs
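The requirement above might be wired up roughly as follows. This is a sketch under stated assumptions: `APP_VERSION` is a hypothetical environment variable for the version tag, the helper names are invented for illustration, and profiling is skipped entirely when `PYROSCOPE_SERVER_ADDRESS` is unset so that local runs are unaffected.

```python
import os
from typing import Mapping, Optional


def build_pyroscope_kwargs(app_name: str,
                           environ: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return keyword arguments for pyroscope.configure(), or None if profiling is off."""
    env = os.environ if environ is None else environ
    server_address = env.get("PYROSCOPE_SERVER_ADDRESS")
    if not server_address:
        return None  # No server configured: leave profiling disabled.
    return {
        "application_name": app_name,
        "server_address": server_address,
        # APP_VERSION is an assumed variable name, not part of the original plan.
        "tags": {"version": env.get("APP_VERSION", "unknown")},
    }


def start_profiling(app_name: str,
                    environ: Optional[Mapping[str, str]] = None) -> bool:
    """Configure Pyroscope once at startup; returns True if profiling was enabled."""
    kwargs = build_pyroscope_kwargs(app_name, environ)
    if kwargs is None:
        return False
    import pyroscope  # provided by the pyroscope-io package; imported only when needed

    pyroscope.configure(**kwargs)
    return True
```

A service entry point would call `start_profiling("acapy-agent")` once before entering its main loop. Keeping the import lazy means the dependency is only touched when the Helm chart actually sets the server address.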
Acceptance Criteria:
- Profiling data appears in Grafana Pyroscope for `acapy-agent`
- CPU flame graphs visible and navigable
- No measurable performance regression
Labels: phase-3 helm profiling acapy
Estimated Effort: ~15min
Description: Add PYROSCOPE_SERVER_ADDRESS to the acapy-agent deployment.
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` env var present in deployed pod
- Profiling data visible in Pyroscope
Labels: phase-3 profiling python vc-authn
Estimated Effort: ~1h
Requirements: Add pyroscope-io; configure in FastAPI startup before the app begins serving requests.
Acceptance Criteria:
- Profiling data for `vc-authn-oidc-controller` visible in Pyroscope
- No measurable performance regression
Labels: phase-3 helm profiling vc-authn
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` env var present in deployed pod
Labels: phase-3 profiling nodejs python traction
Estimated Effort: ~2h
Requirements:
- tenant-ui: Add `@pyroscope/nodejs`; initialize via `Pyroscope.init()` and `Pyroscope.start()` in `tracing.ts`
- ACA-Py plugin: Add `pyroscope-io`; call `pyroscope.configure()` at startup
Acceptance Criteria:
- Profiling data for `traction-tenant-ui` and `traction-acapy` visible in Pyroscope
- No measurable performance regression in either component
Labels: phase-3 helm profiling traction
Estimated Effort: ~15min
Requirements: Add PYROSCOPE_SERVER_ADDRESS to both tenant-ui and ACA-Py deployments.
Acceptance Criteria:
- Env var present in both pods
Labels: phase-3 profiling python endorser
Estimated Effort: ~1h
Requirements: Add pyroscope-io; configure via pyroscope.configure() early in main.py using env vars.
Acceptance Criteria:
- Profiling data for `acapy-endorser-service` visible in Pyroscope
Labels: phase-3 helm profiling endorser
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` present in deployed pod
Labels: phase-3 profiling python didwebvh
Estimated Effort: ~1h
Requirements: Add pyroscope-io to server/pyproject.toml; configure in server/main.py before uvicorn.run.
Acceptance Criteria:
- Profiling data for `didwebvh-server` visible in Pyroscope
Labels: phase-3 helm profiling didwebvh
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` present in deployed pod
Labels: phase-3 profiling typescript credo
Estimated Effort: ~1h
Requirements: Add @pyroscope/nodejs; initialize early in the SDK wrapper before the OTel SDK setup using env vars.
Acceptance Criteria:
- Profiling data for `credo-agent` visible in Pyroscope
Labels: phase-3 helm profiling credo
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` present in consuming service pod
Labels: phase-3 profiling typescript mediator
Estimated Effort: ~1h
Requirements: Add `@pyroscope/nodejs`; call `Pyroscope.init()` and `Pyroscope.start()` in `apps/mediator/instrumentation.js` before OTel SDK initialization.
Acceptance Criteria:
- Profiling data for `didcomm-mediator-credo` visible in Pyroscope
Labels: phase-3 helm profiling mediator
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` present in deployed pod
Labels: phase-3 profiling typescript bc-wallet
Estimated Effort: ~2h
Requirements: Add @pyroscope/nodejs to server workspace; initialize via Pyroscope.init({ serverAddress, appName }) in server/src/index.ts at startup.
Acceptance Criteria:
- Profiling data for `bc-wallet-demo-server` visible in Pyroscope
Labels: phase-3 helm profiling bc-wallet
Estimated Effort: ~15min
Acceptance Criteria:
- `PYROSCOPE_SERVER_ADDRESS` present in deployed pod
Applies to: traction, acapy-vc-authn-oidc, didwebvh-server-py, credo-ts, bc-wallet-demo.
Each service requires a backend proxy route (code) and opt-in Helm chart values. See Architecture & Infrastructure for the Faro collection pattern and Helm values design.
Labels: phase-3 faro rum nodejs traction
Estimated Effort: ~3h
Requirements:
- Add `@grafana/faro-web-sdk`, `@grafana/faro-web-tracing` to the frontend workspace
- Add Express proxy route (`POST /faro`) in tenant-ui that forwards browser payloads to `http://alloy:12347`
- Initialize Faro in Vue's `main.ts` inside `loadApp()`, pointing `collectorUrl` at the app's own `/faro` path
Acceptance Criteria:
- Browser errors and Web Vitals appear in Loki as structured log entries
- Browser-side traces (XHR/Fetch spans) appear in Tempo linked to backend spans
- Proxy route forwards correctly without CORS errors in browser console
Labels: phase-3 helm faro traction
Estimated Effort: ~30min
Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to chart values for both tenant-ui and ACA-Py deployments.
Acceptance Criteria:
- `faro.enabled: false` by default (no change to existing deployments)
- When enabled, `faro.collectorUrl` is injected into browser bundle
- Proxy route activated when `faro.proxy.enabled: true`
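The opt-in behaviour described by these criteria can be modelled in a few lines. This is only a sketch of the gating logic, not chart code: the default values for `collectorUrl` and `upstreamUrl` are taken from the paths and URLs mentioned elsewhere on this page, and the function and result key names are hypothetical.

```python
from copy import deepcopy
from typing import Optional

# Assumed defaults: everything is off unless a deployment opts in.
DEFAULT_FARO_VALUES = {
    "enabled": False,
    "collectorUrl": "/faro",
    "proxy": {"enabled": False, "upstreamUrl": "http://alloy:12347"},
}


def resolve_faro(overrides: Optional[dict] = None) -> dict:
    """Merge user overrides onto the defaults and report what would be activated."""
    values = deepcopy(DEFAULT_FARO_VALUES)
    overrides = overrides or {}
    # Merge top-level keys, then the nested proxy block, so partial overrides work.
    values.update({k: v for k, v in overrides.items() if k != "proxy"})
    values["proxy"].update(overrides.get("proxy", {}))
    return {
        # Injected into the browser bundle only when faro.enabled is true.
        "browser_collector_url": values["collectorUrl"] if values["enabled"] else None,
        # Backend proxy route is active only when faro.proxy.enabled is true.
        "proxy_upstream_url": (
            values["proxy"]["upstreamUrl"] if values["proxy"]["enabled"] else None
        ),
    }
```

With no overrides both results are `None`, which corresponds to the "no change to existing deployments" criterion.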
Labels: phase-3 faro rum python vc-authn
Estimated Effort: ~4h
Requirements:
- Add FastAPI proxy route (`POST /faro`) forwarding browser payloads to `http://alloy:12347`
- Create Nuxt 3 client plugin `faro.client.ts` pointing `collectorUrl` at the app's own `/faro` path
- Inject Faro CDN script tag into Jinja2 templates (e.g. `verified_credentials.html`)
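The forwarding step of the proxy route could look roughly like the sketch below. It is framework-agnostic and uses only the standard library; a FastAPI handler for `POST /faro` would read the raw request body and pass it to `forward_faro_payload`. The function names are hypothetical, and the `/collect` path on the upstream URL is an assumption based on the receiver endpoint used in the infrastructure validation section.

```python
import urllib.request

# Assumed upstream: Alloy's Faro receiver, as referenced in the validation steps.
ALLOY_FARO_URL = "http://alloy:12347/collect"


def build_upstream_request(body: bytes,
                           content_type: str = "application/json",
                           upstream_url: str = ALLOY_FARO_URL) -> urllib.request.Request:
    """Wrap the browser payload, unchanged, in a POST request to Alloy."""
    return urllib.request.Request(
        upstream_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def forward_faro_payload(body: bytes,
                         content_type: str = "application/json",
                         upstream_url: str = ALLOY_FARO_URL,
                         timeout: float = 5.0) -> int:
    """Send the payload to Alloy and return the upstream HTTP status code."""
    request = build_upstream_request(body, content_type, upstream_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Because the browser posts to the application's own `/faro` path and the backend makes the call to Alloy, the request stays same-origin from the browser's point of view, which is what keeps CORS out of the picture.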
Acceptance Criteria:
- Browser errors and Web Vitals from both Nuxt 3 and Jinja2 surfaces appear in Loki
- Browser traces appear in Tempo
- No CORS errors in browser console
Labels: phase-3 helm faro vc-authn
Estimated Effort: ~30min
Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to chart values.
Acceptance Criteria:
- `faro.enabled: false` by default
- Faro collector URL injected when enabled
Labels: phase-3 faro rum python didwebvh
Estimated Effort: ~2h
Requirements:
- Add FastAPI proxy route (`POST /faro`) forwarding browser payloads to `http://alloy:12347`
- Inject Faro Web SDK initialization script tag into `server/app/templates/components/head.jinja`
- Pass `collectorUrl` via `server/config.py` from environment
Acceptance Criteria:
- Web Explorer browser events appear in Loki
- Proxy route forwards without CORS errors
Labels: phase-3 helm faro didwebvh
Estimated Effort: ~30min
Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl to chart values.
Acceptance Criteria:
- `faro.enabled: false` by default
Labels: phase-3 faro rum typescript credo
Estimated Effort: ~5h
Requirements:
- Standard Faro Web SDK configuration inside React or Vue consumer roots: `instrumentations: [...getWebInstrumentations(), new TracingInstrumentation()]`
Note: Proxy route responsibility falls on the consuming application's backend, not credo-ts itself.
Acceptance Criteria:
- Browser events from consumer app appear in Loki
- Browser traces appear in Tempo
Labels: phase-3 helm faro credo
Estimated Effort: ~30min
Requirements: Add faro.* values to the consuming service's Helm chart.
Acceptance Criteria:
- `faro.enabled: false` by default
Labels: phase-3 faro rum typescript bc-wallet
Estimated Effort: ~4h
Requirements:
- Add Express proxy route (`POST /faro`) on backend server forwarding to `http://alloy:12347`
- Add `@grafana/faro-web-sdk`, `@grafana/faro-web-tracing` to frontend workspace
- Initialize Faro in `frontend/src/index.tsx` before `createRoot`, pointing `collectorUrl` at the app's own `/faro` path
Acceptance Criteria:
- React app browser errors and Web Vitals appear in Loki
- Browser XHR/Fetch traces appear in Tempo linked to backend spans
- No CORS errors in browser console
Labels: phase-3 helm faro bc-wallet
Estimated Effort: ~30min
Requirements: Add faro.enabled, faro.proxy.enabled, faro.proxy.upstreamUrl, faro.collectorUrl (injected as build-time args or public/config.json) for the frontend build.
Acceptance Criteria:
- `faro.enabled: false` by default
- Faro collector URL injected into React bundle when enabled
Estimated Effort: ~½ day
Validate that Pyroscope and Alloy's Faro receiver are correctly deployed and accepting data in ca7f8f-dev before beginning per-service Phase 3 onboarding. This is the go/no-go gate for Phase 3 service work.
Unlike Phase 1, there is no synthetic profiling data generator equivalent to telemetrygen. Pyroscope is validated via health check and API reachability. The Faro receiver is validated by sending a test payload directly to Alloy — no browser or instrumented service required. Per-service data flow validation is covered by each service's acceptance criteria.
- Confirm Pyroscope is running and healthy in `ca7f8f-dev` (query the `/ready` endpoint)
- Confirm Pyroscope is reachable from within the namespace (e.g. from a debug pod)
- Send a minimal test payload to Alloy's Faro receiver endpoint (`http://alloy:12347/collect`) and confirm it is accepted (HTTP 200)
- Confirm the test Faro payload appears as a log entry in Loki (dev)
- Resolve any RBAC, network policy, or Alloy pipeline issues before proceeding
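The first three checks could be scripted from a debug pod along these lines. This is a sketch, not a validated tool: the `http://pyroscope:4040` service address is an assumed in-namespace name (only the port comes from this page), the payload shape is a best-effort minimal Faro body with a single log line, and any 2xx status is treated as acceptance.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/ready answers with a 2xx status."""
    url = base_url.rstrip("/") + "/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False  # Unreachable, refused, timed out, or non-2xx.


def minimal_faro_payload(app_name: str = "phase3-validation") -> dict:
    """Build a small test payload; the exact field set is an assumption."""
    return {
        "meta": {"app": {"name": app_name, "version": "0.0.0"}},
        "logs": [{
            "message": "faro receiver validation",
            "level": "info",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }],
    }


def post_faro_payload(collect_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        collect_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # Surface 4xx/5xx to the caller instead of raising.


if __name__ == "__main__":
    # Hypothetical in-namespace addresses; adjust to the deployed service names.
    print("pyroscope ready:", is_ready("http://pyroscope:4040"))
    status = post_faro_payload("http://alloy:12347/collect", minimal_faro_payload())
    print("faro receiver status:", status, "accepted:", 200 <= status < 300)
```

Confirming that the test entry reaches Loki remains a manual step in Grafana; the message string in the payload gives something distinctive to search for.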
Pyroscope is running and reachable from within ca7f8f-dev. Alloy's Faro receiver is accepting payloads and routing them to Loki. The infrastructure is confirmed ready for per-service Phase 3 onboarding.
- Pyroscope `/ready` endpoint returns healthy in `ca7f8f-dev`
- Pyroscope reachable from within the namespace on its configured port
- Alloy faro.receiver accepts a test POST to `/collect` without error
- Test Faro payload visible as a log entry in Grafana → Loki (dev)
- Any network policy or RBAC issues identified and resolved