feat(metric): DORA (real) — Datadog-derived CFR + MTTR + rollback rate (#15)#41

Merged
trentas merged 1 commit into main from feat/datadog-slice-4
May 13, 2026

trentas commented May 13, 2026

Stacked on top of #40 (slice 3). Re-target to main after #40 merges.

Summary

Slice 4 wires the engine to the external events that slice 3 ingests. Every analysis run with an active Datadog integration now emits a dora_* metric family on ReportMetrics.

Engine (Python)

  • iris/models/external.py — ExternalDeployment / ExternalDeploymentCommit / ExternalIncident / ExternalDORAData (the payload shape the aggregator consumes).
  • iris/analysis/dora_real.py — computes CFR, MTTR per-deploy (p50 / p90), MTTR per-incident (p50 / p90), rollback rate, lead time, deploy frequency, and a remediation_type distribution. Tri-state change_failure is handled correctly: null deploys are excluded from the CFR denominator and surfaced as dora_deployments_pending_evaluation.
  • iris/metrics/aggregator.py — aggregate(...) gains an optional external_data parameter; the result is merged into ReportMetrics.
  • iris/models/metrics.py — 15 new dora_* fields.
  • iris/reports/narrative.py + iris/i18n.py — descriptive findings (CFR, MTTR, rollback rate) in en + pt-br.
  • iris/ingestion/external_reader.py + iris/cli.py — opportunistic fetch when the CLI is logged in. Any failure (no auth, no integration, network, malformed) falls through with None — standalone iris . runs keep working unchanged.
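
The tri-state handling described above can be sketched as follows. This is a minimal illustration, not the code in dora_real.py: the `Deploy` class and `summarize` function are hypothetical stand-ins, the `"roll_forward"` value is made up, and the choice of failed deploys as the rollback-rate denominator is an assumption inferred from "null when no failures".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical stand-in for ExternalDeployment; only the fields used here.
    change_failure: Optional[bool]          # True = failed, False = ok, None = not evaluated yet
    remediation_type: Optional[str] = None  # "rollback" comes from the PR; other values are assumed


def summarize(deploys: list[Deploy]) -> dict:
    # Pending (None) deploys are left out of the CFR denominator entirely.
    evaluated = [d for d in deploys if d.change_failure is not None]
    failed = [d for d in evaluated if d.change_failure]
    rollbacks = [d for d in failed if d.remediation_type == "rollback"]
    return {
        "cfr": len(failed) / len(evaluated) if evaluated else None,
        # Assumed denominator: failed deploys, so the rate is None without failures.
        "rollback_rate": len(rollbacks) / len(failed) if failed else None,
        "pending_evaluation": len(deploys) - len(evaluated),
    }


result = summarize([
    Deploy(True, "rollback"),
    Deploy(True, "roll_forward"),  # made-up remediation value for illustration
    Deploy(False),
    Deploy(False),
    Deploy(None),                  # counted as pending, not as a success
])
print(result)
```

Counting the pending deploy as a success would lower the CFR from 2/4 to 2/5, which is why it is reported in its own bucket instead.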

Platform

  • src/app/api/integrations/datadog/events/route.ts — new GET endpoint authenticated by the iris_* API token. Returns deployments (with their commits) and incidents for the org in the requested window. Distinguishes "no active integration" (source: null, empty arrays) from "no events in window" (source: "datadog", empty arrays).
  • src/types/metrics.ts — TS mirror of the new fields.
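
A consumer of this endpoint has to treat the two empty cases differently. The sketch below shows one way a client might interpret the response body. `parse_events_payload` is a hypothetical helper, and the `incidents` key name is an assumption (`source` and `deployments` appear in this PR).

```python
import json
from typing import Optional


def parse_events_payload(raw: str) -> Optional[dict]:
    """Return the events dict, or None when there is nothing usable."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return None  # malformed body: fall through like any other failure
    if not isinstance(payload, dict) or payload.get("source") is None:
        return None  # no active integration
    deployments = payload.get("deployments")
    incidents = payload.get("incidents")  # assumed key name
    if not isinstance(deployments, list) or not isinstance(incidents, list):
        return None
    # source is set: empty lists here mean "no events in window", which is valid data.
    return {"source": payload["source"], "deployments": deployments, "incidents": incidents}


no_integration = parse_events_payload('{"source": null, "deployments": [], "incidents": []}')
empty_window = parse_events_payload('{"source": "datadog", "deployments": [], "incidents": []}')
print(no_integration, empty_window)
```

Returning None only for the first case keeps "integration connected but quiet" distinct from "nothing connected".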

Docs / tests

  • docs/METRICS.md — entries for the entire dora_* family, including the tri-state semantics and the dual-MTTR (per-deploy vs per-incident) story.
  • tests/test_dora_real.py — 10 tests: tri-state CFR, MTTR per deploy/incident, rollback rate (incl. null when no failures), lead time aggregation, deploy frequency windowing.
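
For readers unfamiliar with the p50 / p90 figures, here is a minimal percentile sketch. It uses the nearest-rank method, which is an assumption; the PR does not say which method dora_real.py uses, and the sample durations are made up. The same helper could be applied to two different samples, one duration per failed deploy and one per incident, which is the dual-MTTR split mentioned above.

```python
import math
from typing import Optional


def percentile(values: list[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile; returns None for an empty sample."""
    if not values:
        return None
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small percentiles still pick a value.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


restore_minutes = [12.0, 30.0, 45.0, 60.0, 240.0]  # made-up restore times
p50 = percentile(restore_minutes, 50)
p90 = percentile(restore_minutes, 90)
print(p50, p90)
```

With one long outage in the sample, p50 stays at 45 minutes while p90 jumps to 240, which is the reason for reporting both.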

Verified locally

  • python -m pytest tests/ -q → 109 passed
  • cd platform && npx tsc --noEmit → clean
  • cd platform && npx vitest run → 171 passed
  • cd platform && npx eslint ... → clean

Out of scope (lands in slice 5)

  • Dashboard surfacing of the dora_* fields (DORA card on /[tenant]/dashboard with the "Datadog" badge).
  • "CFR by code origin" correlation card — pure Supabase join (external_deployment_commits.commit_sha ↔ commit_origin.commit_sha filtered on change_failure = true).
  • Rollback-rate by origin — same join, filtered on remediation_type = 'rollback'.
  • "Days since last incident registered" detail-page hint (§9.8).

Test plan

  • PR #40 (feat(integrations): slice 3 — daily Datadog sync + cron + UI surfacing (#15)) merges first → rebase this branch onto main, re-target to main
  • After deploy: with an org that has a connected Datadog integration AND a successful cron run, hit GET /api/integrations/datadog/events?from=…&to=… with a valid iris_* token; confirm source: "datadog" and a populated deployments array
  • With a brand-new org (no integration), the same endpoint returns source: null and empty arrays
  • Running iris . while logged in: the resulting metrics.json carries dora_source: "datadog" and the fifteen dora_* fields
  • Running iris . while logged out: no dora_* fields in the payload (standalone behavior preserved)
  • report.md "Key findings" section includes the DORA descriptive bullet when dora_cfr is set

🤖 Generated with Claude Code


vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickbus-iris | Ready | Preview, Comment | May 13, 2026 9:26pm |

trentas added a commit that referenced this pull request May 13, 2026
Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas force-pushed the feat/datadog-slice-3 branch from 5a87157 to 8a147f0 on May 13, 2026 21:22
#15)

Slice 4 of the Datadog integration. Wires the engine to consume the
external events ingested in slice 3 and emit a new dora_* metric family
on every analysis run that has an active integration.

Engine (Python):
- iris/models/external.py — dataclasses for the pre-fetched payload
  (ExternalDeployment / ExternalDeploymentCommit / ExternalIncident /
  ExternalDORAData) the aggregator consumes.
- iris/analysis/dora_real.py — computes CFR, per-deploy MTTR (p50/p90),
  per-incident MTTR (p50/p90), rollback rate, lead time, deploy
  frequency, and a remediation_type distribution. Tri-state
  change_failure is handled correctly: null deploys are excluded from
  the CFR denominator and surfaced as a separate "pending" bucket.
- iris/metrics/aggregator.py — new optional external_data param that
  routes through dora_real and merges the result into ReportMetrics.
- iris/models/metrics.py — adds the fifteen dora_* fields.
- iris/reports/narrative.py + iris/i18n.py — descriptive findings
  (CFR, MTTR per-deploy, rollback rate; en + pt-br copy) emitted when
  the metric is populated.
- iris/ingestion/external_reader.py + iris/cli.py — opportunistic
  fetch from the platform when the CLI is logged in. Any failure
  (no auth, no integration, network, malformed payload) falls through
  with None, so standalone `iris .` runs are unaffected.

Platform:
- src/app/api/integrations/datadog/events/route.ts — new GET endpoint
  scoped by api token. Returns deployments (with their commits) and
  incidents for the org in the requested window. Distinguishes "no
  active integration" (source: null, empty arrays) from "no events
  in window" (source: "datadog", empty arrays).
- src/types/metrics.ts — mirrors the new ReportMetrics fields.

Docs/tests:
- docs/METRICS.md — full entries for the dora_* family with the tri-state
  semantics and the dual-MTTR (per-deploy vs per-incident) story.
- tests/test_dora_real.py — 10 tests covering tri-state CFR, MTTR per
  deploy/incident, rollback rate (incl. null when no failures),
  lead-time aggregation, and deploy-frequency windowing.

Out of scope (slice 5): dashboard surfacing, "CFR by code origin"
correlation card, and the rollback-by-origin breakdown — those are
platform-side joins of external_deployment_commits against commit_origin
and don't need the engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas force-pushed the feat/datadog-slice-4 branch from b7e21d3 to b58823e on May 13, 2026 21:25
trentas changed the base branch from feat/datadog-slice-3 to main on May 13, 2026 21:25
trentas merged commit 62ca616 into main on May 13, 2026
5 checks passed
trentas added a commit that referenced this pull request May 13, 2026
#15) (#42)

* feat(dashboard): DORA section + CFR-by-origin correlation + setup docs (#15)

Slice 5 — closes the Datadog integration loop. The dashboard now
surfaces the dora_* metric family with a "Datadog" badge, an
AI-vs-human CFR correlation card backed by a per-commit join, and a
silent-decay guard on the integration detail page. Final piece:
customer-facing setup documentation.

Engine (Python):
- iris/analysis/dora_real.py — new ``cfr_by_origin`` /
  ``rollback_rate_by_origin`` breakdowns when the aggregator passes
  the local commit-origin map. Per-commit join: each commit on each
  evaluated deploy is bucketed by its origin; commits not present in
  the local window are dropped silently but reflected in coverage_pct
  so the dashboard can warn when attribution is thin.
- iris/metrics/aggregator.py — passes ``origin_map`` through to
  ``analyze_dora_real``.
- iris/models/metrics.py — adds ``dora_cfr_by_origin`` and
  ``dora_rollback_rate_by_origin``.
- tests/test_dora_real.py — 4 new tests covering the per-commit join,
  unknown-commit handling with coverage reporting, rollback filtering,
  and the no-origin-map default.
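
The per-commit join described for dora_real.py can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: the function name, the tuple shape for deploys, and the "ai" / "human" origin labels are hypothetical, and the commit (not the deploy) is taken as the counting unit based on the wording above.

```python
from collections import defaultdict


def cfr_by_origin(deploys, origin_map):
    """deploys: iterable of (change_failure, [commit_sha, ...]) pairs.
    origin_map: commit_sha -> origin label from the local analysis window."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    unknown = 0
    for change_failure, shas in deploys:
        if change_failure is None:
            continue  # pending deploys are not evaluated
        for sha in shas:
            origin = origin_map.get(sha)
            if origin is None:
                unknown += 1  # commit outside the local window: dropped from the breakdown
                continue
            totals[origin] += 1
            if change_failure:
                failures[origin] += 1
    breakdown = {origin: failures[origin] / totals[origin] for origin in totals}
    return breakdown, sum(totals.values()), unknown


deploys = [(True, ["a1", "h1"]), (False, ["a2", "zz"]), (None, ["h2"])]
origin_map = {"a1": "ai", "a2": "ai", "h1": "human", "h2": "human"}
breakdown, known, unknown = cfr_by_origin(deploys, origin_map)
print(breakdown, known, unknown)
```

Returning the known and unknown commit counts alongside the breakdown lets the caller report how much of the data could actually be attributed.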

Platform:
- src/types/metrics.ts — TS mirrors of the two new dora_* fields.
- src/types/org-summary.ts — new OrgDORA aggregation type.
- lib/queries/org-summary.ts — computeDORA() sums deploys / failures /
  rollbacks across repos, weights CFR by evaluated deploys, and
  aggregates the by-origin breakdown. Returns null when no repo has an
  active integration.
- src/app/[tenant]/dashboard/sections/DORAOverview.tsx — headline
  cards (CFR, MTTR per failed deploy, deploy frequency, lead time)
  plus a "Datadog" badge, a fact strip (deploys / rollback rate /
  pending), and the CFR-by-origin + rollback-rate-by-origin
  correlation tables. The correlation card stays hidden until the org
  has ≥ 10 failed deploys (per §9.6 — was 10 incidents pre-revision).
- src/app/[tenant]/dashboard/page.tsx — wires the new section in.
- src/components/integrations/datadog-connect-form.tsx +
  src/app/[tenant]/settings/integrations/[provider]/page.tsx — the
  §9.8 silent-decay hint: "last incident registered X days ago" on
  the detail page. Days are server-computed to keep the client
  component pure.
- platform/lib/translations.ts — full en + pt-br copy for the new
  surfaces.
- platform/tests/dora-aggregation.test.ts — 4 tests for computeDORA().
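
Weighting CFR by evaluated deploys is equivalent to dividing summed failures by summed evaluated deploys. The sketch below illustrates that in Python for consistency with the engine examples; computeDORA() itself is TypeScript, and the function name, dict keys, and numbers here are hypothetical. Returning None for zero evaluated deploys is a simplification of the "no active integration" case.

```python
from typing import Optional


def org_cfr(repos: list[dict]) -> Optional[float]:
    """repos: per-repo dicts with hypothetical keys 'evaluated' and 'failures'."""
    evaluated = sum(r["evaluated"] for r in repos)
    if evaluated == 0:
        return None  # nothing to weight by
    return sum(r["failures"] for r in repos) / evaluated


repos = [
    {"evaluated": 90, "failures": 9},  # per-repo CFR 0.10
    {"evaluated": 10, "failures": 5},  # per-repo CFR 0.50
]
weighted = org_cfr(repos)        # 14 failures over 100 evaluated deploys
unweighted = (0.10 + 0.50) / 2   # plain mean of per-repo rates, for comparison
print(weighted, unweighted)
```

A plain mean would let the low-volume repo dominate the org figure; weighting keeps the result equal to the failure share of all evaluated deploys.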

Docs:
- docs/integrations/datadog.md — customer setup guide. Covers the
  Application Key scope, regional sites, the connect flow, the cron
  schedule, what we read / don't read, repository matching, the
  disconnect behavior, and operational notes (backfill window, rate
  limits, encryption rotation).
- docs/METRICS.md — adds the two new dora_*_by_origin fields and the
  module-map row.

Verified:
- python -m pytest tests/ -q → 113 passed (4 new)
- platform: npx tsc --noEmit → clean
- platform: npx vitest run → 175 passed (4 new)
- platform: npx eslint → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrations): coverage_pct math + dead code + form gating on error (#15)

Three issues surfaced in the slice 5 audit:

1. `iris/analysis/dora_real.py` — the per-origin `coverage_pct` divided
   each origin's commits by `(this origin + ALL unknowns)`, so every
   origin's coverage dropped by the full unknown count. The right
   semantic is org-wide attribution coverage. Hoisted to a single
   result field `cfr_by_origin_coverage_pct` and removed from each
   per-origin dict.

2. Same file — `_referenced` was assigned and immediately popped from
   the dict; dead code, dropped.

3. `platform/src/components/integrations/datadog-connect-form.tsx` —
   the connected card only rendered when `status === "active"`, so an
   integration in `status: "error"` fell through to the connect form
   and lost the very surfaces (last_sync_at, last_error, unmatched
   count, days-since-last-incident) the operator needs to debug. Now
   renders the status card for both `active` and `error`, with the
   shield icon and copy switched to an error variant when the sync is
   failing.
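
For item 1, a small worked example shows why the old per-origin figure was misleading. The commit counts are made up, the variable names are illustrative, and expressing coverage on a 0 to 100 scale is an assumption.

```python
counts = {"ai": 30, "human": 60}  # attributed commits per origin (made-up numbers)
unknown = 10                      # commits with no origin in the local window

# Pre-fix: every origin's denominator absorbed the full unknown count,
# so each origin was penalized for unknowns that are not specific to it.
old_per_origin = {origin: 100 * n / (n + unknown) for origin, n in counts.items()}

# Post-fix: a single org-wide attribution coverage figure.
known = sum(counts.values())
coverage_pct = 100 * known / (known + unknown)

print(old_per_origin)
print(coverage_pct)
```

Here 90 of 100 commits are attributed, yet the old formula reported 75% for one origin and about 85.7% for the other, and neither number described anything real.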

Schema / TS / docs aligned:
- `iris/models/metrics.py` adds `dora_cfr_by_origin_coverage_pct`.
- `iris/metrics/aggregator.py` wires it.
- `platform/src/types/metrics.ts` drops `coverage_pct` from the
  per-origin shape and adds the new top-level field.
- `docs/METRICS.md` updates the field table and the explanatory blurb;
  module-map row picks up the new field.
- `platform/lib/translations.ts` — en + pt-br copy for the new error
  state.

Tests:
- `tests/test_dora_real.py` — old `coverage_pct` assertion replaced by
  two focused tests (mixed known/unknown drops org-wide coverage; full
  attribution reports 100%; no origin map → None).
- `platform/tests/dora-aggregation.test.ts` — adjusts mock payloads to
  drop the (now-removed) `coverage_pct` field on per-origin entries.

Verified: pytest 115 passed (16 dora_real tests), tsc clean, vitest 175
passed, eslint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): restore build version in footer (#15)

The footer's `process.env.NEXT_PUBLIC_BUILD_VERSION || "dev"` lookup
fell back to "dev" on every Vercel deploy because the env var was
never wired up. The version is now read from `package.json` at
config-load time, with the Vercel commit SHA (`VERCEL_GIT_COMMIT_SHA`)
appended when present, so production / preview deploys carry a unique
identifier between releases.

Loaded via fs instead of an ESM JSON import to stay portable across
Next's TS loader and direct Node ESM execution (the latter requires
`with { type: "json" }`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.0.6 — Datadog DORA integration (#15)

Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas deleted the feat/datadog-slice-4 branch on May 14, 2026 18:41