feat(metric): DORA (real) — Datadog-derived CFR + MTTR + rollback rate (#15) #41
Merged
trentas added a commit that referenced this pull request May 13, 2026

Bumps platform/package.json (1.0.0 → 1.0.6, catching up from the initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and iris/cli.py:VERSION (v1.0.5 → v1.0.6).

Adds the CHANGELOG entry for v1.0.6 covering the Datadog integration end-to-end across slices 1-5 (PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits / _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5a87157 to 8a147f0
#15)

Slice 4 of the Datadog integration. Wires the engine to consume the external events ingested in slice 3 and emit a new dora_* metric family on every analysis run that has an active integration.

Engine (Python):
- iris/models/external.py: dataclasses for the pre-fetched payload (ExternalDeployment / ExternalDeploymentCommit / ExternalIncident / ExternalDORAData) the aggregator consumes.
- iris/analysis/dora_real.py: computes CFR, per-deploy MTTR (p50/p90), per-incident MTTR (p50/p90), rollback rate, lead time, deploy frequency, and a remediation_type distribution. Tri-state change_failure is handled correctly: null deploys are excluded from the CFR denominator and surfaced as a separate "pending" bucket.
- iris/metrics/aggregator.py: new optional external_data param that routes through dora_real and merges the result into ReportMetrics.
- iris/models/metrics.py: adds the fifteen dora_* fields.
- iris/reports/narrative.py + iris/i18n.py: descriptive findings (CFR, MTTR per-deploy, rollback rate; en + pt-br copy) emitted when the metric is populated.
- iris/ingestion/external_reader.py + iris/cli.py: opportunistic fetch from the platform when the CLI is logged in. Any failure (no auth, no integration, network, malformed payload) falls through with None, so standalone `iris .` runs are unaffected.

Platform:
- src/app/api/integrations/datadog/events/route.ts: new GET endpoint scoped by api token. Returns deployments (with their commits) and incidents for the org in the requested window. Distinguishes "no active integration" (source: null, empty arrays) from "no events in window" (source: "datadog", empty arrays).
- src/types/metrics.ts: mirrors the new ReportMetrics fields.

Docs/tests:
- docs/METRICS.md: full entries for the dora_* family with the tri-state semantics and the dual-MTTR (per-deploy vs per-incident) story.
- tests/test_dora_real.py: 10 tests covering tri-state CFR, MTTR per deploy/incident, rollback rate (incl. null when no failures), lead-time aggregation, and deploy-frequency windowing.

Out of scope (slice 5): dashboard surfacing, "CFR by code origin" correlation card, and the rollback-by-origin breakdown. Those are platform-side joins of external_deployment_commits against commit_origin and don't need the engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
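The tri-state CFR rule in the commit message above (null deploys leave the denominator and are counted as pending) can be illustrated with a minimal sketch. This is not the code in `iris/analysis/dora_real.py`: the `Deploy` dataclass, the `summarize` function, and the result keys are hypothetical names, and the sketch assumes the rollback rate is computed over failed deploys, which is what "null when no failures" suggests.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deploy:
    """Hypothetical stand-in for ExternalDeployment; field names are assumed."""
    change_failure: Optional[bool]          # True = failed, False = healthy, None = not yet evaluated
    remediation_type: Optional[str] = None  # e.g. "rollback" on a failed deploy


def summarize(deploys: Sequence[Deploy]) -> dict:
    # Only deploys with a definite verdict enter the CFR denominator.
    evaluated = [d for d in deploys if d.change_failure is not None]
    failures = [d for d in evaluated if d.change_failure]
    rollbacks = [d for d in failures if d.remediation_type == "rollback"]
    return {
        # None (not 0.0) when nothing has been evaluated yet.
        "cfr": len(failures) / len(evaluated) if evaluated else None,
        # Assumed denominator: failed deploys. None when there are no failures.
        "rollback_rate": len(rollbacks) / len(failures) if failures else None,
        # Deploys still awaiting evaluation are reported separately.
        "pending": len(deploys) - len(evaluated),
    }
```

Returning `None` instead of `0.0` keeps "no data" distinguishable from "no failures", which matters for a dashboard that hides or greys out empty metrics.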
b7e21d3 to b58823e
trentas added a commit that referenced this pull request May 13, 2026

#15) (#42)

* feat(dashboard): DORA section + CFR-by-origin correlation + setup docs (#15)

Slice 5: closes the Datadog integration loop. The dashboard now surfaces the dora_* metric family with a "Datadog" badge, an AI-vs-human CFR correlation card backed by a per-commit join, and a silent-decay guard on the integration detail page. Final piece: customer-facing setup documentation.

Engine (Python):
- iris/analysis/dora_real.py: new ``cfr_by_origin`` / ``rollback_rate_by_origin`` breakdowns when the aggregator passes the local commit-origin map. Per-commit join: each commit on each evaluated deploy is bucketed by its origin; commits not present in the local window are dropped silently but reflected in coverage_pct so the dashboard can warn when attribution is thin.
- iris/metrics/aggregator.py: passes ``origin_map`` through to ``analyze_dora_real``.
- iris/models/metrics.py: adds ``dora_cfr_by_origin`` and ``dora_rollback_rate_by_origin``.
- tests/test_dora_real.py: 4 new tests covering the per-commit join, unknown-commit handling with coverage reporting, rollback filtering, and the no-origin-map default.

Platform:
- src/types/metrics.ts: TS mirrors of the two new dora_* fields.
- src/types/org-summary.ts: new OrgDORA aggregation type.
- lib/queries/org-summary.ts: computeDORA() sums deploys / failures / rollbacks across repos, weights CFR by evaluated deploys, and aggregates the by-origin breakdown. Returns null when no repo has an active integration.
- src/app/[tenant]/dashboard/sections/DORAOverview.tsx: headline cards (CFR, MTTR per failed deploy, deploy frequency, lead time) plus a "Datadog" badge, a fact strip (deploys / rollback rate / pending), and the CFR-by-origin + rollback-rate-by-origin correlation tables. The correlation card stays hidden until the org has ≥ 10 failed deploys (per §9.6; was 10 incidents pre-revision).
- src/app/[tenant]/dashboard/page.tsx: wires the new section in.
- src/components/integrations/datadog-connect-form.tsx + src/app/[tenant]/settings/integrations/[provider]/page.tsx: the §9.8 silent-decay hint, "last incident registered X days ago", on the detail page. Days are server-computed to keep the client component pure.
- platform/lib/translations.ts: full en + pt-br copy for the new surfaces.
- platform/tests/dora-aggregation.test.ts: 4 tests for computeDORA().

Docs:
- docs/integrations/datadog.md: customer setup guide. Covers the Application Key scope, regional sites, the connect flow, the cron schedule, what we read / don't read, repository matching, the disconnect behavior, and operational notes (backfill window, rate limits, encryption rotation).
- docs/METRICS.md: adds the two new dora_*_by_origin fields and the module-map row.

Verified:
- python -m pytest tests/ -q → 113 passed (4 new)
- platform: npx tsc --noEmit → clean
- platform: npx vitest run → 175 passed (4 new)
- platform: npx eslint → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrations): coverage_pct math + dead code + form gating on error (#15)

Three issues surfaced in the slice 5 audit:

1. `iris/analysis/dora_real.py`: the per-origin `coverage_pct` divided each origin's commits by `(this origin + ALL unknowns)`, so every origin's coverage dropped by the full unknown count. The right semantic is org-wide attribution coverage. Hoisted to a single result field `cfr_by_origin_coverage_pct` and removed from each per-origin dict.
2. Same file: `_referenced` was assigned and immediately popped from the dict; dead code, dropped.
3. `platform/src/components/integrations/datadog-connect-form.tsx`: the connected card only rendered when `status === "active"`, so an integration in `status: "error"` fell through to the connect form and lost the very surfaces (last_sync_at, last_error, unmatched count, days-since-last-incident) the operator needs to debug. Now renders the status card for both `active` and `error`, with the shield icon and copy switched to an error variant when the sync is failing.

Schema / TS / docs aligned:
- `iris/models/metrics.py` adds `dora_cfr_by_origin_coverage_pct`.
- `iris/metrics/aggregator.py` wires it.
- `platform/src/types/metrics.ts` drops `coverage_pct` from the per-origin shape and adds the new top-level field.
- `docs/METRICS.md` updates the field table and the explanatory blurb; module-map row picks up the new field.
- `platform/lib/translations.ts`: en + pt-br copy for the new error state.

Tests:
- `tests/test_dora_real.py`: old `coverage_pct` assertion replaced by two focused tests (mixed known/unknown drops org-wide coverage; full attribution reports 100%; no origin map → None).
- `platform/tests/dora-aggregation.test.ts`: adjusts mock payloads to drop the (now-removed) `coverage_pct` field on per-origin entries.

Verified: pytest 115 passed (16 dora_real tests), tsc clean, vitest 175 passed, eslint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): restore build version in footer (#15)

The footer's `process.env.NEXT_PUBLIC_BUILD_VERSION || "dev"` lookup fell back to "dev" on every Vercel deploy because the env var was never wired up. Reads from `package.json` at config-load time and appends the Vercel commit SHA (`VERCEL_GIT_COMMIT_SHA`) when present so production / preview deploys carry a unique identifier between releases. Loaded via fs instead of an ESM JSON import to stay portable across Next's TS loader and direct Node ESM execution (the latter requires `with { type: "json" }`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.0.6 — Datadog DORA integration (#15)

Bumps platform/package.json (1.0.0 → 1.0.6, catching up from the initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and iris/cli.py:VERSION (v1.0.5 → v1.0.6).

Adds the CHANGELOG entry for v1.0.6 covering the Datadog integration end-to-end across slices 1-5 (PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits / _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
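The `coverage_pct` fix described in the squash commit above is easier to follow with a worked sketch of the per-commit join. The code below is a hedged illustration, not the contents of `iris/analysis/dora_real.py`: the function name `cfr_by_origin_sketch`, the `(change_failure, commit_shas)` tuple shape, and the origin labels `"AI"` / `"HUMAN"` are all assumptions. It computes coverage once, org-wide, as known commits over all commits seen on evaluated deploys, rather than once per origin.

```python
from collections import Counter
from typing import Dict, Mapping, Optional, Sequence, Tuple

# Hypothetical input shape: (change_failure, commit SHAs) for each evaluated deploy.
EvaluatedDeploy = Tuple[bool, Sequence[str]]


def cfr_by_origin_sketch(
    deploys: Sequence[EvaluatedDeploy],
    origin_map: Optional[Mapping[str, str]],
) -> Tuple[Optional[Dict[str, dict]], Optional[float]]:
    """Return (per-origin breakdown, org-wide coverage percentage)."""
    if origin_map is None:
        # No local commit-origin map: no breakdown and no coverage figure.
        return None, None

    totals: Counter = Counter()
    failed: Counter = Counter()
    unknown = 0
    for change_failure, shas in deploys:
        for sha in shas:
            origin = origin_map.get(sha)
            if origin is None:
                unknown += 1  # dropped from the breakdown, kept for coverage
                continue
            totals[origin] += 1
            if change_failure:
                failed[origin] += 1

    known = sum(totals.values())
    seen = known + unknown
    # One org-wide figure: the share of commits that could be attributed at all.
    coverage_pct = 100.0 * known / seen if seen else None
    breakdown = {
        origin: {"commits": n, "cfr": failed[origin] / n}
        for origin, n in totals.items()
    }
    return breakdown, coverage_pct
```

With the earlier per-origin formula, a single unattributed commit would have lowered every origin's coverage independently; a single top-level value avoids that double counting.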
Stacked on top of #40 (slice 3). Re-target to `main` after #40 merges.

## Summary

Slice 4 wires the engine to the external events that slice 3 ingests. Every analysis run that has an active Datadog integration now emits a `dora_*` family on `ReportMetrics`.

### Engine (Python)

- `iris/models/external.py`: `ExternalDeployment` / `ExternalDeploymentCommit` / `ExternalIncident` / `ExternalDORAData` (the payload shape the aggregator consumes).
- `iris/analysis/dora_real.py`: computes CFR, MTTR per-deploy (p50 / p90), MTTR per-incident (p50 / p90), rollback rate, lead time, deploy frequency, and a `remediation_type` distribution. Tri-state `change_failure` is handled correctly: `null` is excluded from the CFR denominator and surfaced as `dora_deployments_pending_evaluation`.
- `iris/metrics/aggregator.py`: `aggregate(...)` gains an optional `external_data` parameter; the result is merged into `ReportMetrics`.
- `iris/models/metrics.py`: 15 new `dora_*` fields.
- `iris/reports/narrative.py` + `iris/i18n.py`: descriptive findings (CFR, MTTR, rollback rate) in en + pt-br.
- `iris/ingestion/external_reader.py` + `iris/cli.py`: opportunistic fetch when the CLI is logged in. Any failure (no auth, no integration, network, malformed) falls through with `None`, so standalone `iris .` runs keep working unchanged.

### Platform

- `src/app/api/integrations/datadog/events/route.ts`: new `GET` endpoint authed by the `iris_*` api token. Returns deployments (with their commits) and incidents for the org window. Distinguishes "no active integration" (`source: null`) from "no events in window" (`source: "datadog"`, empty arrays).
- `src/types/metrics.ts`: TS mirror of the new fields.

### Docs / tests

- `docs/METRICS.md`: entries for the entire `dora_*` family, including the tri-state semantics and the dual-MTTR (per-deploy vs per-incident) story.
- `tests/test_dora_real.py`: 10 tests covering tri-state CFR, MTTR per deploy/incident, rollback rate (incl. null when no failures), lead time aggregation, and deploy frequency windowing.

## Verified locally

- `python -m pytest tests/ -q` → 109 passed
- `cd platform && npx tsc --noEmit` → clean
- `cd platform && npx vitest run` → 171 passed
- `cd platform && npx eslint ...` → clean

## Out of scope (lands in slice 5)

- Dashboard surfacing of the `dora_*` fields (DORA card on `/[tenant]/dashboard` with the "Datadog" badge).
- "CFR by code origin" correlation card (`external_deployment_commits.commit_sha ↔ commit_origin.commit_sha` filtered on `change_failure = true`).
- Rollback-by-origin breakdown: `remediation_type = 'rollback'`.

## Test plan

- After #40 merges into `main`, re-target to `main`
- `GET /api/integrations/datadog/events?from=…&to=…` with a valid `iris_*` token; confirm `source: "datadog"` and a populated `deployments` array
- Same request with no active integration: `source: null` and empty arrays
- `iris .` while logged in: the resulting `metrics.json` carries `dora_source: "datadog"` and the fifteen `dora_*` fields
- `iris .` while logged out: no `dora_*` fields in the payload (standalone behavior preserved)
- `report.md` "Key findings" section includes the DORA descriptive bullet when `dora_cfr` is set

🤖 Generated with Claude Code
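The "opportunistic fetch" behaviour in the description above (every failure mode collapses to `None`) could look roughly like the sketch below. This is an assumption-laden illustration rather than the code in `iris/ingestion/external_reader.py`: `fetch_external_data` and the injected `fetch` callable are hypothetical, and the payload keys follow the `source` / `deployments` / `incidents` shape that the endpoint is described as returning.

```python
import json
from typing import Callable, Optional


def fetch_external_data(
    token: Optional[str],
    fetch: Callable[[str], str],
) -> Optional[dict]:
    """Return the decoded events payload, or None if anything at all goes wrong.

    `fetch` is a hypothetical transport hook that takes the API token and
    returns the raw response body; injecting it keeps the sketch network-free.
    """
    if not token:
        return None  # CLI is not logged in
    try:
        payload = json.loads(fetch(token))
    except Exception:
        return None  # network failure or body that is not valid JSON
    if not isinstance(payload, dict) or payload.get("source") is None:
        return None  # malformed payload, or no active integration
    if not isinstance(payload.get("deployments"), list):
        return None  # malformed payload
    if not isinstance(payload.get("incidents"), list):
        return None  # malformed payload
    # An active integration with empty arrays is still a valid result:
    # "no events in window" is different from "no integration".
    return payload
```

Because the caller only ever sees a payload or `None`, a standalone run can skip the DORA step with a single `is None` check instead of handling each failure separately.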