docs(integrations): planning doc for Datadog integration (#15)#36

Merged: trentas merged 1 commit into main from plan/datadog-integration on May 13, 2026

trentas (Contributor) commented on May 13, 2026

Summary

Pre-implementation plan for #15. Doesn't ship code — resolves the five open questions in the issue, lays out a 5-PR breakdown, and flags the decisions that still need explicit user input before ingestion slices can land.

Decisions made (all overridable on review):

| # | Question | Picked | Reason |
|---|----------|--------|--------|
| 1 | DORA API v2 vs generic Metrics API | DORA API v2 | Event-level data with `commit_sha`/`repository_url`, needed for attribution. |
| 2 | Cron infra | Vercel Cron | Same repo, same deploy; the 300 s budget is enough for daily pulls. |
| 3 | Encryption | Postgres pgcrypto + env master key | Defense-in-depth; KMS deferred to v2. |
| 4 | RLS | Service-role-only + RLS as defense-in-depth | Mirrors `organizations` access. |
| 5 | Service → repo matching | Auto by `repository_url` + manual override | Customer-controlled mapping when DD lacks the URL. |
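
For decision 5, a minimal sketch of how auto-matching with a manual override could work. The function names, the `overrides` and `known_repos` dictionaries, and the choice to let an override win over the URL match are all assumptions for illustration; the plan only says matching is automatic by `repository_url` with a manual override.

```python
from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Reduce a repository URL to 'host/owner/name' so variants compare equal."""
    url = url.strip()
    if url.startswith("git@") and ":" in url:
        # scp-style SSH form, e.g. git@github.com:acme/api.git
        host, path = url[len("git@"):].split(":", 1)
    else:
        # Add a scheme when missing so urlparse can find the host.
        parsed = urlparse(url if "://" in url else f"https://{url}")
        host, path = parsed.hostname or "", parsed.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}/{path}".lower()


def match_service(service, repository_url, known_repos, overrides):
    """Return a repo id, or None when the service stays unmatched.

    known_repos: normalized URL -> repo id (hypothetical shape)
    overrides:   Datadog service name -> repo id, set by the customer
    """
    if service in overrides:  # assumption: a manual mapping always wins
        return overrides[service]
    if repository_url:
        return known_repos.get(normalize_repo_url(repository_url))
    return None
```

Normalizing both sides before comparing means `git@github.com:Acme/API.git` and `https://github.com/acme/api/` resolve to the same key, which keeps the unmatched list short.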

PR plan (each slice is roughly independent and depends only on the one before it):

| # | Slice | LOC |
|---|-------|-----|
| 1 | UI skeleton (page + nav, no DB) | ~120 |
| 2 | DB + encryption + connect flow | ~450 |
| 3 | Ingestion + cron + tables | ~600 |
| 4 | Engine consumes external data | ~350 |
| 5 | Dashboard surfacing + new correlation | ~250 |

User can stop at any boundary — slice 2 alone gives "Connect Datadog" UX useful for marketing/sales validation, slice 3 starts producing data, etc.

Decisions still needed from the user (block slice 3+)

  1. Initial backfill window (proposed: 30 days)
  2. Cron cadence (proposed: daily at 04:00 UTC)
  3. Real Datadog API key to test against
  4. First correlation card to surface on the dashboard
  5. Confirmation that picking this issue effectively opens Stage 3

PR 1 ships independent of all of these.
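
To make the first open decision concrete, here is a small sketch of how a backfill cap could bound the sync window. It uses the proposed 30-day value; `sync_window`, `BACKFILL_DAYS`, and the rule that later syncs are also clamped to the same floor are assumptions, not part of the plan.

```python
from datetime import datetime, timedelta, timezone

BACKFILL_DAYS = 30  # proposed value from the list above, still open for review


def sync_window(now, last_sync_at=None, backfill_days=BACKFILL_DAYS):
    """Return (start, end) for the next pull.

    The first sync starts at the backfill floor; later syncs resume from the
    last successful sync but never reach further back than the floor
    (assumption made for this sketch).
    """
    floor = now - timedelta(days=backfill_days)
    start = floor if last_sync_at is None else max(last_sync_at, floor)
    return start, now


now = datetime(2026, 5, 13, 4, 0, tzinfo=timezone.utc)
first_start, _ = sync_window(now)
resumed_start, _ = sync_window(now, last_sync_at=now - timedelta(days=1))
```

With these inputs the first sync starts 30 days before `now`, and a sync that last ran a day earlier starts one day before `now`.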

Test plan

  • Doc reads end-to-end and references real file paths / existing modules
  • Review the proposed answers in §1 + the schema in §3; flag anything you want to change before PR 1 starts
  • Answer the §7 decisions when ready to move past slice 1

🤖 Generated with Claude Code

Pre-implementation plan for the Datadog integration that resolves the
five open questions from #15, proposes a 5-PR breakdown, and surfaces
the decisions that still need user input before slice 3 ships.

Headline decisions documented (all overridable on review):

- DORA Metrics API v2 (event-level), not the generic Metrics API
- Vercel Cron for the daily pull, not Supabase Edge Functions or a
  standalone worker
- Postgres pgcrypto with INTEGRATIONS_ENCRYPTION_KEY for at-rest
  credential encryption; KMS deferred to v2
- service→repo matched via Datadog's repository_url field with manual
  override when unmatched
- Initial backfill capped at 30 days to bound first-sync API cost

Five user-facing decisions are flagged at the end (backfill window,
cron cadence, test Datadog key, first correlation card, Stage 3
opening). PR 1 (UI skeleton) ships independent of all of them; PRs 2-5
gate on these answers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel (bot) commented on May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---------|------------|---------|---------------|
| clickbus-iris | Building | Preview, Comment | May 13, 2026 2:53am |

trentas merged commit 2b85f70 into main on May 13, 2026
3 of 4 checks passed
trentas deleted the plan/datadog-integration branch on May 13, 2026 at 02:54
trentas added a commit that referenced this pull request May 13, 2026
Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas added a commit that referenced this pull request May 13, 2026
#15) (#42)

* feat(dashboard): DORA section + CFR-by-origin correlation + setup docs (#15)

Slice 5 — closes the Datadog integration loop. The dashboard now
surfaces the dora_* metric family with a "Datadog" badge, an
AI-vs-human CFR correlation card backed by a per-commit join, and a
silent-decay guard on the integration detail page. Final piece:
customer-facing setup documentation.

Engine (Python):
- iris/analysis/dora_real.py — new ``cfr_by_origin`` /
  ``rollback_rate_by_origin`` breakdowns when the aggregator passes
  the local commit-origin map. Per-commit join: each commit on each
  evaluated deploy is bucketed by its origin; commits not present in
  the local window are dropped silently but reflected in coverage_pct
  so the dashboard can warn when attribution is thin.
- iris/metrics/aggregator.py — passes ``origin_map`` through to
  ``analyze_dora_real``.
- iris/models/metrics.py — adds ``dora_cfr_by_origin`` and
  ``dora_rollback_rate_by_origin``.
- tests/test_dora_real.py — 4 new tests covering the per-commit join,
  unknown-commit handling with coverage reporting, rollback filtering,
  and the no-origin-map default.
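
The per-commit join described for `dora_real.py` can be sketched roughly as follows. This is an illustration under assumptions, not the module's code: the input shapes (`commit_shas`, `failed`), the function signature, and the definition of per-origin CFR as failed-deploy commits over all bucketed commits are hypothetical.

```python
from collections import defaultdict


def cfr_by_origin(deploys, origin_map):
    """Bucket every commit of every evaluated deploy by its origin.

    deploys:    list of {"commit_shas": [...], "failed": bool} (assumed shape)
    origin_map: commit sha -> origin label such as "ai" or "human"
    Returns (per-origin stats, number of commits missing from origin_map).
    """
    buckets = defaultdict(lambda: {"commits": 0, "failed": 0})
    unknown = 0
    for deploy in deploys:
        for sha in deploy["commit_shas"]:
            origin = origin_map.get(sha)
            if origin is None:
                # Commit is outside the local window: skip it, but count it
                # so the caller can report how thin the attribution is.
                unknown += 1
                continue
            buckets[origin]["commits"] += 1
            if deploy["failed"]:
                buckets[origin]["failed"] += 1
    stats = {
        origin: {**counts, "cfr": counts["failed"] / counts["commits"]}
        for origin, counts in buckets.items()
    }
    return stats, unknown


deploys = [
    {"commit_shas": ["a", "b"], "failed": True},
    {"commit_shas": ["c", "x"], "failed": False},
]
stats, unknown = cfr_by_origin(deploys, {"a": "ai", "b": "human", "c": "human"})
```

Here commit `x` is not in the origin map, so it is dropped from the buckets and shows up only in the `unknown` count.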

Platform:
- src/types/metrics.ts — TS mirrors of the two new dora_* fields.
- src/types/org-summary.ts — new OrgDORA aggregation type.
- lib/queries/org-summary.ts — computeDORA() sums deploys / failures /
  rollbacks across repos, weights CFR by evaluated deploys, and
  aggregates the by-origin breakdown. Returns null when no repo has an
  active integration.
- src/app/[tenant]/dashboard/sections/DORAOverview.tsx — headline
  cards (CFR, MTTR per failed deploy, deploy frequency, lead time)
  plus a "Datadog" badge, a fact strip (deploys / rollback rate /
  pending), and the CFR-by-origin + rollback-rate-by-origin
  correlation tables. The correlation card stays hidden until the org
  has ≥ 10 failed deploys (per §9.6 — was 10 incidents pre-revision).
- src/app/[tenant]/dashboard/page.tsx — wires the new section in.
- src/components/integrations/datadog-connect-form.tsx +
  src/app/[tenant]/settings/integrations/[provider]/page.tsx — the
  §9.8 silent-decay hint: "last incident registered X days ago" on
  the detail page. Days are server-computed to keep the client
  component pure.
- platform/lib/translations.ts — full en + pt-br copy for the new
  surfaces.
- platform/tests/dora-aggregation.test.ts — 4 tests for computeDORA().
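
The aggregation rules listed for `computeDORA()` can be illustrated with a short sketch. The real function is TypeScript in lib/queries/org-summary.ts; the Python below only mirrors the arithmetic described here, and every field name (`has_active_integration`, `evaluated_deploys`, and so on) is a hypothetical stand-in.

```python
MIN_FAILED_DEPLOYS_FOR_CORRELATION = 10  # threshold quoted in the commit message


def compute_dora(repos):
    """Aggregate per-repo DORA numbers into one org-level summary."""
    active = [r for r in repos if r["has_active_integration"]]
    if not active:
        return None  # no repo has an active integration
    evaluated = sum(r["evaluated_deploys"] for r in active)
    failures = sum(r["failures"] for r in active)
    # Weight each repo's CFR by its evaluated deploys so a tiny repo with a
    # single failure does not dominate the org-wide figure.
    weighted_cfr = (
        sum(r["cfr"] * r["evaluated_deploys"] for r in active) / evaluated
        if evaluated
        else None
    )
    return {
        "deploys": sum(r["deploys"] for r in active),
        "failures": failures,
        "rollbacks": sum(r["rollbacks"] for r in active),
        "cfr": weighted_cfr,
        "show_correlation_card": failures >= MIN_FAILED_DEPLOYS_FOR_CORRELATION,
    }
```

With one repo at CFR 0.5 over 2 evaluated deploys and another at 0.1 over 10, the weighted result is 2/12 rather than the unweighted mean of 0.3.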

Docs:
- docs/integrations/datadog.md — customer setup guide. Covers the
  Application Key scope, regional sites, the connect flow, the cron
  schedule, what we read / don't read, repository matching, the
  disconnect behavior, and operational notes (backfill window, rate
  limits, encryption rotation).
- docs/METRICS.md — adds the two new dora_*_by_origin fields and the
  module-map row.

Verified:
- python -m pytest tests/ -q → 113 passed (4 new)
- platform: npx tsc --noEmit → clean
- platform: npx vitest run → 175 passed (4 new)
- platform: npx eslint → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrations): coverage_pct math + dead code + form gating on error (#15)

Three issues surfaced in the slice 5 audit:

1. `iris/analysis/dora_real.py` — the per-origin `coverage_pct` divided
   each origin's commits by `(this origin + ALL unknowns)`, so every
   origin's coverage dropped by the full unknown count. The right
   semantic is org-wide attribution coverage. Hoisted to a single
   result field `cfr_by_origin_coverage_pct` and removed from each
   per-origin dict.

2. Same file — `_referenced` was assigned and immediately popped from
   the dict; dead code, dropped.

3. `platform/src/components/integrations/datadog-connect-form.tsx` —
   the connected card only rendered when `status === "active"`, so an
   integration in `status: "error"` fell through to the connect form
   and lost the very surfaces (last_sync_at, last_error, unmatched
   count, days-since-last-incident) the operator needs to debug. Now
   renders the status card for both `active` and `error`, with the
   shield icon and copy switched to an error variant when the sync is
   failing.
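
A worked example of the `coverage_pct` problem in item 1, using made-up counts. Both helper functions are hypothetical reconstructions from the description above, and returning `None` for an empty input is an assumption.

```python
def per_origin_coverage_old(known_by_origin, unknown):
    """Old behavior: each origin is divided by (its own commits + ALL unknowns)."""
    return {
        origin: 100.0 * n / (n + unknown)
        for origin, n in known_by_origin.items()
        if n + unknown > 0
    }


def org_wide_coverage(known_by_origin, unknown):
    """Fixed behavior: one org-wide figure, attributed commits over all commits."""
    known = sum(known_by_origin.values())
    total = known + unknown
    if total == 0:
        return None  # nothing to attribute (assumption for this sketch)
    return 100.0 * known / total


known = {"ai": 30, "human": 50}  # illustrative counts, not real data
old = per_origin_coverage_old(known, unknown=20)
fixed = org_wide_coverage(known, unknown=20)
```

With 20 unknown commits, the old formula reports 60% for `ai` and about 71% for `human`, even though 80 of the 100 commits are attributed. The org-wide figure is 80%.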

Schema / TS / docs aligned:
- `iris/models/metrics.py` adds `dora_cfr_by_origin_coverage_pct`.
- `iris/metrics/aggregator.py` wires it.
- `platform/src/types/metrics.ts` drops `coverage_pct` from the
  per-origin shape and adds the new top-level field.
- `docs/METRICS.md` updates the field table and the explanatory blurb;
  module-map row picks up the new field.
- `platform/lib/translations.ts` — en + pt-br copy for the new error
  state.

Tests:
- `tests/test_dora_real.py` — old `coverage_pct` assertion replaced by
  two focused tests (mixed known/unknown drops org-wide coverage; full
  attribution reports 100%; no origin map → None).
- `platform/tests/dora-aggregation.test.ts` — adjusts mock payloads to
  drop the (now-removed) `coverage_pct` field on per-origin entries.

Verified: pytest 115 passed (16 dora_real tests), tsc clean, vitest 175
passed, eslint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): restore build version in footer (#15)

The footer's `process.env.NEXT_PUBLIC_BUILD_VERSION || "dev"` lookup
fell back to "dev" on every Vercel deploy because the env var was
never wired up. Reads from `package.json` at config-load time and
appends the Vercel commit SHA (`VERCEL_GIT_COMMIT_SHA`) when present
so production / preview deploys carry a unique identifier between
releases.

Loaded via fs instead of an ESM JSON import to stay portable across
Next's TS loader and direct Node ESM execution (the latter requires
`with { type: "json" }`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.0.6 — Datadog DORA integration (#15)

Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>