feat(integrations): slice 3 — daily Datadog sync + cron + UI surfacing (#15) #40

Merged
trentas merged 2 commits into main from feat/datadog-slice-3 on May 13, 2026

trentas commented May 13, 2026

Stacked on top of #39 (slice 2). Re-target to main after #39 merges, then rebase this branch onto main so reviewers see a clean diff.

Summary

Slice 3 of #15: ingest DORA events daily and persist them so slice 4/5 can read them.

  • Migrations: 015_external_deployments, 016_external_deployment_commits, 017_external_incidents. Schema follows §9.2 of docs/PLAN-datadog.md (tri-state change_failure, recovery_time_sec, remediation_*, service/env/team as text[] on incidents). UNIQUE (provider, provider_event_id) makes the upsert idempotent.
  • Client extensions: listDeployments / listFailures wrap POST /api/v2/dora/{deployments,failures} with the body shape from §8 (data.type = "dora_*_list_request", ISO 8601 from/to with trailing Z).
  • Sync module (platform/lib/integrations/datadog/sync.ts) — per-org pipeline:
    • Cursor by last_sync_at; 30-day default backfill on first run.
    • Time-slicing pagination (§9.5 — DORA v2 has no cursor mechanism). Anti-spin guard for the sub-second-co-occurrence edge case.
    • Idempotent upsert by (provider, provider_event_id); deployment commits upserted on the composite PK so reruns don't fail.
    • Repository matching via normalizeRepoSlug — handles git@host:org/repo.git, ssh://, https://, .git, www., and trailing slashes.
    • Flips status to error and writes last_error on failure; resets to active with a fresh last_sync_at on success.
  • Cron: vercel.json registers 0 4 * * * UTC against /api/cron/sync-integrations. The route is auth-gated by CRON_SECRET (Authorization: Bearer … from Vercel Cron, or x-cron-secret for manual dev triggers); in non-production with no secret, the route is open for npm run dev ergonomics.
  • UI — provider detail page now queries the unmatched-deployments count and the connect form surfaces it alongside last_sync_at / last_error. Translation keys in en-US + pt-BR.
  • Tests: tests/datadog-sync.test.ts covers slug normalization across all common remote-URL shapes, pagination-helper edge cases, and the DD zero-value timestamp guard.
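
The repository-matching bullet above lists the remote-URL shapes that normalizeRepoSlug has to accept. As a rough illustration only, a normalizer along these lines could reduce each shape to one comparable key. The canonical output form (lowercased "host/org/repo") and the null return for unusable input are assumptions made for this sketch; the real helper in sync.ts may choose differently.

```typescript
// Hypothetical sketch of a repo-slug normalizer. The canonical output form
// ("host/org/repo", lowercased) is an assumption, not taken from sync.ts.
function normalizeRepoSlug(remote: string): string | null {
  let s = remote.trim().toLowerCase();
  // Drop a URL scheme such as https:// or ssh://.
  s = s.replace(/^[a-z][a-z0-9+.-]*:\/\//, "");
  // Drop userinfo such as "git@" or "user:token@".
  s = s.replace(/^[^@\/]+@/, "");
  // Turn scp-style "host:org/repo" into "host/org/repo"; this also drops a
  // numeric port ("host:22/org/repo"). A purely numeric org in scp-style
  // input would be misread as a port, which this sketch accepts.
  s = s.replace(/^([^\/:]+):(\d+\/)?/, "$1/");
  // Strip the remaining decorations: www. prefix, trailing slashes, .git.
  s = s.replace(/^www\./, "");
  s = s.replace(/\/+$/, "");
  s = s.replace(/\.git$/, "");
  // Require at least host, org, and repo segments, all non-empty.
  const parts = s.split("/");
  if (parts.length < 3 || parts.some((p) => p === "")) return null;
  return parts.join("/");
}
```

With this shape, both sides of the match (the Datadog repository id and repositories.remote_url) would be passed through the same function and compared as plain strings.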

Open decisions (default applied, redirect welcome)

Out of scope (deferred to slice 4 / 5)

  • iris/analysis/dora_real.py (CFR + MTTR computation from external_*)
  • Dashboard badges (Datadog vs Estimated)
  • CFR-by-code-origin correlation card
  • Rollback-rate metric
  • "Days since last incident registered" hint (§9.8)

Test plan

  • PR #39 (feat(integrations): slice 2 — DB + encryption + connect flow for Datadog (#15)) merges first; then rebase this branch onto main and re-target to main
  • npx supabase migration up applies 015/016/017 cleanly
  • Set CRON_SECRET in Vercel (Preview + Production)
  • Manually trigger GET /api/cron/sync-integrations with x-cron-secret: …; verify response shows succeeded: 1, failed: 0 for the active integration
  • After the sync, inspect external_deployments: every row should have a non-null started_at, and an idempotent rerun should produce zero net changes
  • Confirm external_deployment_commits is populated with change_lead_time / time_to_deploy per commit
  • Confirm external_incidents.service is a text array and the GIN index is usable
  • On the provider detail page, the unmatched-deployments hint appears only when count > 0
  • Run npx vitest run tests/datadog-sync.test.ts — 13 tests pass
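
The manual-trigger step above depends on the CRON_SECRET gate described in the summary. A minimal sketch of that decision is shown below as a pure function, so it can be read without any framework in mind. The function name, the input shape, and the choice to reject production requests when no secret is configured are assumptions; only the two accepted headers and the open non-production fallback come from the PR text.

```typescript
// Hypothetical sketch of the cron auth decision; not the actual route code.
interface CronAuthInput {
  authorization: string | null;    // value of the Authorization header
  cronSecretHeader: string | null; // value of the x-cron-secret header
  secret: string | undefined;      // configured CRON_SECRET, if any
  isProduction: boolean;
}

function isCronRequestAuthorized(input: CronAuthInput): boolean {
  const { authorization, cronSecretHeader, secret, isProduction } = input;
  // No secret configured: open outside production (local dev), and
  // (assumption) closed in production.
  if (!secret) return !isProduction;
  // Vercel Cron sends "Authorization: Bearer <secret>"; manual triggers
  // may send the raw secret in x-cron-secret.
  return authorization === `Bearer ${secret}` || cronSecretHeader === secret;
}
```

A real handler would probably compare secrets in constant time rather than with ===; the plain comparison here keeps the sketch short.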

🤖 Generated with Claude Code

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: clickbus-iris | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): May 13, 2026 9:22pm

trentas added a commit that referenced this pull request May 13, 2026
Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@trentas trentas changed the base branch from feat/datadog-slice-2 to main May 13, 2026 21:21
trentas and others added 2 commits May 13, 2026 18:21
…ts + incidents (#15)

Adds the three tables that the daily Datadog sync writes into:

- external_deployments — DORA deployment events (one row per Datadog
  event id). Tri-state change_failure (TRUE | FALSE | NULL = pending
  evaluation), recovery_time_sec and remediation_* for per-deploy MTTR,
  and dd_repository_id retained for debuggability when repo matching
  fails. UNIQUE (provider, provider_event_id) for idempotent upsert.
- external_deployment_commits — per-commit detail unpacked from
  attributes.commits[]. This is the join key for the AI-vs-human CFR
  correlation in slice 5 (commit_sha ↔ commit_origin.commit_sha).
- external_incidents — DORA failure events. service / env / team as
  TEXT[] because Datadog returns them as arrays; GIN index on service
  for the per-service incident queries.

Schema and field choices match the production probe in
docs/PLAN-datadog.md §9.2 (sampled 500 deploys / 5 failures on a real
tenant); no RLS, consistent with the rest of the iris-specific tables.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
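
To make the role of UNIQUE (provider, provider_event_id) concrete, the sketch below models the upsert against an in-memory map instead of the database. It is an illustration of the idempotency property only: the type, the field names in camelCase, and the inserted/updated/unchanged counters are assumptions, and the real sync writes through SQL.

```typescript
// Hypothetical in-memory model of an upsert keyed on
// (provider, provider_event_id); the real sync writes to Postgres.
interface ExternalDeployment {
  provider: string;
  providerEventId: string;
  startedAt: string;             // ISO 8601 timestamp
  changeFailure: boolean | null; // tri-state: null = pending evaluation
}

interface UpsertResult {
  inserted: number;
  updated: number;
  unchanged: number;
}

function upsertDeployments(
  store: Map<string, ExternalDeployment>,
  rows: ExternalDeployment[],
): UpsertResult {
  const result: UpsertResult = { inserted: 0, updated: 0, unchanged: 0 };
  for (const row of rows) {
    // JSON-encode the composite key so no delimiter can collide.
    const key = JSON.stringify([row.provider, row.providerEventId]);
    const existing = store.get(key);
    if (existing === undefined) {
      result.inserted += 1;
    } else if (
      existing.startedAt === row.startedAt &&
      existing.changeFailure === row.changeFailure
    ) {
      result.unchanged += 1;
    } else {
      result.updated += 1; // e.g. change_failure moved from NULL to TRUE
    }
    store.set(key, { ...row });
  }
  return result;
}
```

Feeding the same batch twice leaves the store size unchanged and reports every row as unchanged, which is the behavior a rerun of the daily sync relies on.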
#15)

Builds the daily ingestion pipeline on top of the slice 3 migrations:

- platform/lib/integrations/datadog/client.ts — adds listDeployments
  and listFailures wrappers around POST /api/v2/dora/{deployments,failures}.
  Time-slicing pagination is implemented in the caller; the client just
  shapes the request body and surfaces structured errors.
- platform/lib/integrations/datadog/sync.ts — the per-org sync. Cursor
  by last_sync_at with a 30-day default backfill, idempotent upsert by
  (provider, provider_event_id), per-commit join table populated from
  attributes.commits[], DD repository_id ↔ repositories.remote_url
  matched via normalized slug, anti-spin guard for the §9.5 boundary
  case. Status / last_error / last_sync_at flip on the org_integrations
  row at the end of each run.
- platform/src/app/api/cron/sync-integrations/route.ts — Vercel Cron
  handler. Auth-gated by CRON_SECRET (Bearer or x-cron-secret header);
  in non-production with no secret configured, the route is open so
  `npm run dev` can hit it directly. Iterates active integrations and
  fans out per-provider sync sequentially within the 300s budget.
- platform/vercel.json — registers the cron at 0 4 * * * UTC.
- env.example — documents CRON_SECRET and how to generate it.
- platform/src/app/[tenant]/settings/integrations/[provider]/page.tsx
  + datadog-connect-form.tsx — surfaces the unmatched-deployments
  count alongside last_sync_at / last_error / connected_at on the
  detail page so customers can spot repo-mapping drift early.
- platform/tests/datadog-sync.test.ts — covers slug normalization,
  pagination-helper edge cases, and the Datadog zero-value timestamp
  guard observed in pull_requests[] during the §9.2 probe.

Engine integration (dora_real.py) and the dashboard surfacing land in
slices 4 and 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
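
Because the DORA list endpoints expose no cursor, the caller has to page by moving the start of the time window. The sketch below shows one way such time-slicing could work, including a guard against the window failing to advance when a full page of events shares a single timestamp. Everything here is an assumption for illustration: the event shape, the FetchPage signature, ascending ordering with an inclusive lower bound, and the 1 ms bump used by the guard. The actual helper in sync.ts may behave differently.

```typescript
// Hypothetical sketch of time-slicing pagination with an anti-spin guard.
interface DoraEvent {
  id: string;
  startedAt: number; // epoch milliseconds
}

// Assumed to return up to `limit` events with fromMs <= startedAt < toMs.
type FetchPage = (fromMs: number, toMs: number, limit: number) => Promise<DoraEvent[]>;

// Decide where the next window starts, or null when the window is exhausted.
function nextWindowStart(fromMs: number, page: DoraEvent[], limit: number): number | null {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  if (page.length < limit) return null; // short page: nothing left to fetch
  let last = fromMs;
  for (const e of page) last = Math.max(last, e.startedAt);
  // Anti-spin guard: if the full page did not move past fromMs, step forward
  // by 1 ms. In this sketch, events beyond `limit` at that exact timestamp
  // would be skipped.
  return last > fromMs ? last : fromMs + 1;
}

async function fetchAllSliced(
  fetchPage: FetchPage,
  fromMs: number,
  toMs: number,
  limit: number,
): Promise<DoraEvent[]> {
  const seen = new Map<string, DoraEvent>(); // de-duplicates overlapping slices
  let cursor: number | null = fromMs;
  while (cursor !== null && cursor < toMs) {
    const page = await fetchPage(cursor, toMs, limit);
    for (const e of page) seen.set(e.id, e);
    cursor = nextWindowStart(cursor, page, limit);
  }
  return Array.from(seen.values());
}
```

Re-querying from the last seen timestamp returns some events twice, so the de-duplication by id (and the idempotent upsert downstream) is what keeps the overlap harmless.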
@trentas trentas force-pushed the feat/datadog-slice-3 branch from 5a87157 to 8a147f0 Compare May 13, 2026 21:22
@trentas trentas merged commit c34e450 into main May 13, 2026
4 checks passed
trentas added a commit that referenced this pull request May 13, 2026
#15) (#42)

* feat(dashboard): DORA section + CFR-by-origin correlation + setup docs (#15)

Slice 5 — closes the Datadog integration loop. The dashboard now
surfaces the dora_* metric family with a "Datadog" badge, an
AI-vs-human CFR correlation card backed by a per-commit join, and a
silent-decay guard on the integration detail page. Final piece:
customer-facing setup documentation.

Engine (Python):
- iris/analysis/dora_real.py — new ``cfr_by_origin`` /
  ``rollback_rate_by_origin`` breakdowns when the aggregator passes
  the local commit-origin map. Per-commit join: each commit on each
  evaluated deploy is bucketed by its origin; commits not present in
  the local window are dropped silently but reflected in coverage_pct
  so the dashboard can warn when attribution is thin.
- iris/metrics/aggregator.py — passes ``origin_map`` through to
  ``analyze_dora_real``.
- iris/models/metrics.py — adds ``dora_cfr_by_origin`` and
  ``dora_rollback_rate_by_origin``.
- tests/test_dora_real.py — 4 new tests covering the per-commit join,
  unknown-commit handling with coverage reporting, rollback filtering,
  and the no-origin-map default.

Platform:
- src/types/metrics.ts — TS mirrors of the two new dora_* fields.
- src/types/org-summary.ts — new OrgDORA aggregation type.
- lib/queries/org-summary.ts — computeDORA() sums deploys / failures /
  rollbacks across repos, weights CFR by evaluated deploys, and
  aggregates the by-origin breakdown. Returns null when no repo has an
  active integration.
- src/app/[tenant]/dashboard/sections/DORAOverview.tsx — headline
  cards (CFR, MTTR per failed deploy, deploy frequency, lead time)
  plus a "Datadog" badge, a fact strip (deploys / rollback rate /
  pending), and the CFR-by-origin + rollback-rate-by-origin
  correlation tables. The correlation card stays hidden until the org
  has ≥ 10 failed deploys (per §9.6 — was 10 incidents pre-revision).
- src/app/[tenant]/dashboard/page.tsx — wires the new section in.
- src/components/integrations/datadog-connect-form.tsx +
  src/app/[tenant]/settings/integrations/[provider]/page.tsx — the
  §9.8 silent-decay hint: "last incident registered X days ago" on
  the detail page. Days are server-computed to keep the client
  component pure.
- platform/lib/translations.ts — full en + pt-br copy for the new
  surfaces.
- platform/tests/dora-aggregation.test.ts — 4 tests for computeDORA().
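
The computeDORA() description above says CFR is weighted by evaluated deploys. One reading of that, sketched below under assumed names, is that the org-level CFR is the sum of failed deploys divided by the sum of evaluated deploys, which equals a per-repo CFR average weighted by each repo's evaluated-deploy count. The RepoDora and OrgDoraSketch shapes, the aggregateDora name, and the use of null for repos without an integration are hypothetical.

```typescript
// Hypothetical sketch of org-level DORA aggregation; not the real computeDORA().
interface RepoDora {
  deploys: number;
  evaluatedDeploys: number; // deploys whose change_failure is TRUE or FALSE
  failedDeploys: number;
  rollbacks: number;
}

interface OrgDoraSketch {
  deploys: number;
  evaluatedDeploys: number;
  failedDeploys: number;
  rollbacks: number;
  cfrPct: number | null; // null when nothing has been evaluated yet
}

// A null entry stands for a repo with no active integration data.
function aggregateDora(repos: Array<RepoDora | null>): OrgDoraSketch | null {
  let any = false;
  const total: OrgDoraSketch = {
    deploys: 0, evaluatedDeploys: 0, failedDeploys: 0, rollbacks: 0, cfrPct: null,
  };
  for (const r of repos) {
    if (r === null) continue;
    any = true;
    total.deploys += r.deploys;
    total.evaluatedDeploys += r.evaluatedDeploys;
    total.failedDeploys += r.failedDeploys;
    total.rollbacks += r.rollbacks;
  }
  if (!any) return null;
  // Summing numerators and denominators weights each repo by its
  // evaluated-deploy count, so a tiny repo cannot dominate the org CFR.
  if (total.evaluatedDeploys > 0) {
    total.cfrPct = (total.failedDeploys / total.evaluatedDeploys) * 100;
  }
  return total;
}
```

For example, a repo with 10 failures in 100 evaluated deploys and another with 5 failures in 10 would give 15 / 110, roughly 13.6%, rather than the 30% an unweighted mean of the two per-repo rates would report.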

Docs:
- docs/integrations/datadog.md — customer setup guide. Covers the
  Application Key scope, regional sites, the connect flow, the cron
  schedule, what we read / don't read, repository matching, the
  disconnect behavior, and operational notes (backfill window, rate
  limits, encryption rotation).
- docs/METRICS.md — adds the two new dora_*_by_origin fields and the
  module-map row.

Verified:
- python -m pytest tests/ -q → 113 passed (4 new)
- platform: npx tsc --noEmit → clean
- platform: npx vitest run → 175 passed (4 new)
- platform: npx eslint → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrations): coverage_pct math + dead code + form gating on error (#15)

Three issues surfaced in the slice 5 audit:

1. `iris/analysis/dora_real.py` — the per-origin `coverage_pct` divided
   each origin's commits by `(this origin + ALL unknowns)`, so every
   origin's coverage dropped by the full unknown count. The right
   semantic is org-wide attribution coverage. Hoisted to a single
   result field `cfr_by_origin_coverage_pct` and removed from each
   per-origin dict.

2. Same file — `_referenced` was assigned and immediately popped from
   the dict; dead code, dropped.

3. `platform/src/components/integrations/datadog-connect-form.tsx` —
   the connected card only rendered when `status === "active"`, so an
   integration in `status: "error"` fell through to the connect form
   and lost the very surfaces (last_sync_at, last_error, unmatched
   count, days-since-last-incident) the operator needs to debug. Now
   renders the status card for both `active` and `error`, with the
   shield icon and copy switched to an error variant when the sync is
   failing.
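
The arithmetic behind the first issue can be shown with a small worked example. The engine code is Python, so the block below is only a TypeScript rendering of the math, with a hypothetical function name and input shape; returning null when there are no commits at all is also an assumption.

```typescript
// Hypothetical sketch of org-wide attribution coverage.
// Example: ai = 30 commits, human = 60, unknown = 10.
//   Old per-origin math: ai 30 / (30 + 10) = 75%, human 60 / (60 + 10) ≈ 85.7%
//   Org-wide math:       (30 + 60) / (30 + 60 + 10) = 90%
function attributionCoveragePct(
  knownByOrigin: Map<string, number>, // commits per known origin
  unknownCommits: number,             // commits missing from the origin map
): number | null {
  let known = 0;
  knownByOrigin.forEach((count) => {
    known += count;
  });
  const total = known + unknownCommits;
  // No commits at all: coverage is undefined rather than 0% or 100%.
  return total === 0 ? null : (known / total) * 100;
}
```

Reporting a single org-wide figure avoids charging the full unknown count against every origin separately, which is what made each per-origin number look worse than the actual attribution quality.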

Schema / TS / docs aligned:
- `iris/models/metrics.py` adds `dora_cfr_by_origin_coverage_pct`.
- `iris/metrics/aggregator.py` wires it.
- `platform/src/types/metrics.ts` drops `coverage_pct` from the
  per-origin shape and adds the new top-level field.
- `docs/METRICS.md` updates the field table and the explanatory blurb;
  module-map row picks up the new field.
- `platform/lib/translations.ts` — en + pt-br copy for the new error
  state.

Tests:
- `tests/test_dora_real.py` — old `coverage_pct` assertion replaced by
  two focused tests (mixed known/unknown drops org-wide coverage; full
  attribution reports 100%; no origin map → None).
- `platform/tests/dora-aggregation.test.ts` — adjusts mock payloads to
  drop the (now-removed) `coverage_pct` field on per-origin entries.

Verified: pytest 115 passed (16 dora_real tests), tsc clean, vitest 175
passed, eslint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): restore build version in footer (#15)

The footer's `process.env.NEXT_PUBLIC_BUILD_VERSION || "dev"` lookup
fell back to "dev" on every Vercel deploy because the env var was
never wired up. Reads from `package.json` at config-load time and
appends the Vercel commit SHA (`VERCEL_GIT_COMMIT_SHA`) when present
so production / preview deploys carry a unique identifier between
releases.

Loaded via fs instead of an ESM JSON import to stay portable across
Next's TS loader and direct Node ESM execution (the latter requires
`with { type: "json" }`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.0.6 — Datadog DORA integration (#15)

Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@trentas trentas deleted the feat/datadog-slice-3 branch May 14, 2026 18:41