
feat(integrations): slice 2 — DB + encryption + connect flow for Datadog (#15) #39

Merged: trentas merged 6 commits into main from feat/datadog-slice-2 on May 13, 2026

@trentas commented May 13, 2026

Summary

Slice 2 of the Datadog integration (#15): persist tenant-scoped credentials, encrypt them at rest, and ship the connect/disconnect flow end-to-end.

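  • DB: migration 014_org_integrations.sql adds the multi-provider org_integrations table (one row per tenant × provider), enables pgcrypto, and defines the credential RPCs (encrypt_credentials / decrypt_credentials), revoked from the anon and authenticated roles so only the service-role path can call them. The pgcrypto calls inside the RPCs are schema-qualified (extensions.pgp_sym_encrypt / extensions.pgp_sym_decrypt) because Supabase installs pgcrypto into the extensions schema.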
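  • Encryption: platform/lib/encryption.ts wraps those pgcrypto-backed RPCs in typed helpers keyed by a server-side master key (INTEGRATIONS_ENCRYPTION_KEY, documented in env.example with an openssl-based generation recipe) and masks secrets for display. Credentials are stored only as ciphertext and are never returned to the browser in plaintext; the GET endpoint exposes a masked value only.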
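  • Datadog client: platform/lib/integrations/datadog/client.ts validates the API/APP key pair with a 1-hour, limit=1 ping to POST /api/v2/dora/deployments on the site-specific base URL. It handles the six supported Datadog sites and normalizes the common us1.datadoghq.com mislabel to the actual host.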
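  • Connect flow: the GET/POST/DELETE /api/organizations/[organizationId]/integrations/[provider] route (GET returns status plus a masked credential; POST validates, encrypts, and upserts; DELETE marks the integration disconnected), the datadog-connect-form.tsx client component, and the wired [provider]/page.tsx settings detail page. Error messages come from lib/translations.ts (settings.integrations.datadog.* in en-US and pt-BR).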
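Per the slice 2 commit message, the POST handler validates the keys, encrypts them, and upserts the org_integrations row, in that order. The sketch below shows why that ordering gives the "invalid keys → no row written" case from the test plan for free. It is a minimal sketch, assuming hypothetical `ConnectDeps` callbacks standing in for the Datadog client, the encryption helpers, and the Supabase upsert; the real route's names, types, and translation keys may differ.

```typescript
// Hypothetical shapes: the real route's types are not shown in this PR.
interface DatadogCredentials {
  apiKey: string;
  appKey: string;
  site: string;
}

interface ConnectDeps {
  // Pings Datadog with the submitted keys; resolves false on auth failure.
  validate(creds: DatadogCredentials): Promise<boolean>;
  // Encrypts the credential payload server-side (e.g. via a pgcrypto RPC).
  encrypt(plaintext: string): Promise<string>;
  // Upserts one row per (organization, provider).
  upsert(row: {
    organizationId: string;
    provider: string;
    credentialsEncrypted: string;
    status: string;
  }): Promise<void>;
}

type ConnectResult =
  | { ok: true; status: "active" }
  | { ok: false; errorKey: string };

async function connectDatadog(
  organizationId: string,
  creds: DatadogCredentials,
  deps: ConnectDeps,
): Promise<ConnectResult> {
  // 1. Validate first: invalid keys never reach encryption or the table.
  if (!(await deps.validate(creds))) {
    // Illustrative translation key, not the one used by the platform.
    return { ok: false, errorKey: "settings.integrations.datadog.invalidCredentials" };
  }
  // 2. Encrypt the whole credential object in one call.
  const credentialsEncrypted = await deps.encrypt(JSON.stringify(creds));
  // 3. Upsert; status becomes "active" only after a successful write.
  await deps.upsert({ organizationId, provider: "datadog", credentialsEncrypted, status: "active" });
  return { ok: true, status: "active" };
}
```

Injecting the dependencies keeps the ordering testable without a live Datadog tenant or database.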

Bundled chores:

  • chore(supabase): broaden the .gitignore to **/supabase/.temp/ and untrack the per-machine CLI cache that was leaking into commits.
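  • chore(husky): a prepare-commit-msg hook that appends an AI Co-Authored-By trailer, plus a post-commit hook that runs the Iris auto-push (analyze + push to the platform) once a day in the background. Flag for review: the auto-push hook is opinionated; happy to drop it or make it local-only if the team doesn't want it shared.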
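  • docs: post-probe revision of PLAN-datadog.md (new §9) and scripts/datadog_dora_probe.py, the probe run against a live Datadog tenant that settled the open schema questions: deployment events carry change_failure / recovery_time_sec / remediation directly, failures have no commit attribution, and pagination requires time-slicing.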

Closes part of #15.

Test plan

  • pnpm --filter platform exec supabase migration up applies cleanly locally
  • INTEGRATIONS_ENCRYPTION_KEY set in Vercel env (Preview + Production)
  • Connect a Datadog account from /settings/integrations/datadog with valid keys → row appears in org_integrations, status flips to connected
  • Connect with invalid keys → translated error, no row written
  • Disconnect → row marked disconnected with credentials_encrypted set to NULL (historical row preserved), UI back to disconnected state
  • Decrypted credentials round-trip through get_integration_credentials RPC

🤖 Generated with Claude Code

trentas and others added 6 commits May 13, 2026 14:25
…15)

Live probes against a real Datadog tenant closed the high-impact unknowns
that the original plan left open. Adds §9 to docs/PLAN-datadog.md with the
schema corrections that came out of those probes — most notably:

- Deployment events carry change_failure, recovery_time_sec, and
  remediation directly, so CFR and per-deploy MTTR both live on the
  deployment table, not on the failures table.
- Failures have no commit attribution; field types are arrays (service,
  env, team) and triggering_commit_sha should be dropped from §3.
- Pagination has no cursor mechanism — time-slicing with an inclusive
  boundary is the only path; idempotency via (provider, provider_event_id)
  makes the boundary overlap a no-op.
- All 500 sampled deploys are source: "apm_deployments"; commits[]
  doesn't truncate even at number_of_commits=85; change_failure is
  tri-state (true/false/null).
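The time-slicing strategy in the findings above can be sketched as follows: walk the window in fixed slices whose bounds are inclusive on both ends, so an event stamped exactly on a boundary is fetched twice, and let a store keyed on `(provider, provider_event_id)` absorb the duplicate. A minimal sketch, assuming a hypothetical `fetchSlice` callback in place of the real Datadog list call and an in-memory `Map` in place of the database's unique constraint.

```typescript
interface DeployEvent {
  id: string; // provider_event_id
  timestampMs: number;
}

// Hypothetical stand-in for one Datadog list call bounded by [fromMs, toMs].
type FetchSlice = (fromMs: number, toMs: number) => DeployEvent[];

// Returns how many events were fetched (including boundary re-fetches).
function syncWindow(
  provider: string,
  startMs: number,
  endMs: number,
  sliceMs: number,
  fetchSlice: FetchSlice,
  store: Map<string, DeployEvent>,
): number {
  if (sliceMs <= 0) throw new Error("sliceMs must be positive");
  let fetched = 0;
  for (let from = startMs; from < endMs; from += sliceMs) {
    const to = Math.min(from + sliceMs, endMs);
    // Both bounds inclusive: an event exactly at `to` comes back again
    // as part of the next slice, whose `from` equals this `to`.
    for (const ev of fetchSlice(from, to)) {
      fetched++;
      // Key mirrors (provider, provider_event_id); a re-fetched boundary
      // event overwrites its own entry instead of adding a duplicate.
      store.set(`${provider}:${ev.id}`, ev);
    }
  }
  return fetched;
}
```

With no cursor to resume from, the overlap is the price of never missing a boundary event, and the idempotency key makes that overlap free.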

scripts/datadog_dora_probe.py is the throwaway exploration tool that
produced these findings. Stdlib-only, three modes: single-page list,
--dump for offline inspection, and --paginate-test for the cursor probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dog (#15)

Second slice of the Datadog integration per docs/PLAN-datadog.md §4 PR 2.
Wires the "Coming soon" stub from slice 1 into a working connect flow
against Datadog's DORA Metrics API v2.

What lands:

- Migration 014_org_integrations.sql with the multi-provider table,
  pgcrypto extension, and two SECURITY INVOKER RPCs
  (encrypt_credentials / decrypt_credentials) revoked from anon and
  authenticated roles so only the service-role path can touch them.
- platform/lib/encryption.ts wraps the RPCs in typed helpers and masks
  secrets for display.
- platform/lib/integrations/datadog/client.ts validates credentials with
  a 1-hour, limit=1 ping to POST /api/v2/dora/deployments. Handles the
  six supported Datadog sites and normalizes the common us1.datadoghq.com
  mislabel to the actual host.
- /api/organizations/[organizationId]/integrations/[provider] route with
  GET (status + masked credential), POST (validate → encrypt → upsert),
  DELETE (mark disconnected, NULL out credentials_encrypted, preserve
  historical row per the issue's AC).
- Datadog detail page now renders a real form (or the connected view)
  instead of the "Coming soon" card. The catalog page reads status from
  org_integrations.
- INTEGRATIONS_ENCRYPTION_KEY documented in env.example with the
  openssl-based generation recipe.
- Translations under settings.integrations.datadog.* in en-US and pt-BR.
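Two of the helpers in the list above, site normalization and secret masking, reduce to small pure functions. A minimal sketch, assuming the six sites are Datadog's public US1, US3, US5, EU, AP1, and US1-FED sites and that the API host is the site prefixed with `api.`; the function names and the mask format are hypothetical, not the ones in `client.ts` / `encryption.ts`.

```typescript
// Assumed site list; the PR only says six sites are supported.
const DATADOG_SITES = new Set([
  "datadoghq.com", // US1
  "us3.datadoghq.com",
  "us5.datadoghq.com",
  "datadoghq.eu",
  "ap1.datadoghq.com",
  "ddog-gov.com", // US1-FED
]);

// Returns the API base URL for a user-entered site, or null if unsupported.
function resolveDatadogApiBase(input: string): string | null {
  let site = input
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, "") // tolerate pasted URLs
    .replace(/^(app|api)\./, "") // tolerate app./api. hostnames
    .replace(/\/+$/, "");
  // US1 is often written "us1.datadoghq.com", but its real host has no prefix.
  if (site === "us1.datadoghq.com") site = "datadoghq.com";
  return DATADOG_SITES.has(site) ? `https://api.${site}` : null;
}

// Shows only the last `visible` characters; a fixed-width prefix hides the length.
function maskSecret(secret: string, visible = 4): string {
  if (secret.length <= visible) return "*".repeat(secret.length);
  return "*".repeat(8) + secret.slice(-visible);
}
```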

Slice 2 does not call Datadog except at validation time. Slice 3 will
add the cron-driven sync that populates external_deployments and
external_incidents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eaks

- .husky/prepare-commit-msg detects AI agent env vars and appends a
  Co-Authored-By line before the commit is created. Safer than a
  post-commit + amend flow.
- .husky/post-commit triggers the Iris auto-push (analyze + push to
  the platform) once per day in the background.
- .gitignore adds .vercel/ for Vercel CLI local state.
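The hook behaviour above reduces to two checks: append the trailer only when an agent is detected and the message does not already carry it, and run the background push only if the last recorded run fell on an earlier calendar day. The sketch is in TypeScript for illustration only; the hooks' actual implementation is not shown in this PR. It assumes a caller-supplied list of agent env var names (the PR does not name the variables the hook checks) and a last-run timestamp persisted somewhere by the hook; both function names are hypothetical.

```typescript
const DEFAULT_TRAILER =
  "Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>";

// Appends the trailer when any agent env var is set and the message lacks it.
function withCoAuthor(
  message: string,
  env: Record<string, string | undefined>,
  agentVars: string[], // hypothetical: env var names that signal an AI agent
  trailer: string = DEFAULT_TRAILER,
): string {
  const agentDetected = agentVars.some((name) => Boolean(env[name]));
  if (!agentDetected || message.includes(trailer)) return message;
  // Appended as its own final paragraph so git parses it as a trailer.
  return `${message.trimEnd()}\n\n${trailer}\n`;
}

// True when there is no previous run or it happened on an earlier UTC day.
function shouldAutoPushToday(lastRunIso: string | null, now: Date): boolean {
  if (lastRunIso === null) return true;
  const lastDay = new Date(lastRunIso).toISOString().slice(0, 10);
  return lastDay !== now.toISOString().slice(0, 10);
}
```

Doing the append in prepare-commit-msg means the trailer is part of the commit as created, which is why it avoids the post-commit + amend round trip.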

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the cached project ref, pooler URL, and runtime version stamps
written by the Supabase CLI on link/status. No functional impact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

Supabase installs pgcrypto into the `extensions` schema, not `public`,
so unqualified `pgp_sym_encrypt` / `pgp_sym_decrypt` fail with
SQLSTATE 42883 when the migration runs against a Supabase project.

Qualifies both calls as `extensions.pgp_sym_encrypt` and
`extensions.pgp_sym_decrypt`. `encode` / `decode` are built-ins in
`pg_catalog` and don't need qualification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The supabase/.temp/ pattern only matched the repo root; the actual cache
lives at platform/supabase/.temp/ and was getting committed on every CLI
status/link. Switch to **/supabase/.temp/ and remove the tracked files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel bot commented May 13, 2026

The latest updates on your projects:

  • clickbus-iris: deployment Ready; actions: Preview, Comment; updated May 13, 2026 7:57pm (UTC)


@trentas trentas merged commit 816ebea into main May 13, 2026
4 checks passed
trentas added a commit that referenced this pull request May 13, 2026
Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas added a commit that referenced this pull request May 13, 2026
#15) (#42)

* feat(dashboard): DORA section + CFR-by-origin correlation + setup docs (#15)

Slice 5 — closes the Datadog integration loop. The dashboard now
surfaces the dora_* metric family with a "Datadog" badge, an
AI-vs-human CFR correlation card backed by a per-commit join, and a
silent-decay guard on the integration detail page. Final piece:
customer-facing setup documentation.

Engine (Python):
- iris/analysis/dora_real.py — new ``cfr_by_origin`` /
  ``rollback_rate_by_origin`` breakdowns when the aggregator passes
  the local commit-origin map. Per-commit join: each commit on each
  evaluated deploy is bucketed by its origin; commits not present in
  the local window are dropped silently but reflected in coverage_pct
  so the dashboard can warn when attribution is thin.
- iris/metrics/aggregator.py — passes ``origin_map`` through to
  ``analyze_dora_real``.
- iris/models/metrics.py — adds ``dora_cfr_by_origin`` and
  ``dora_rollback_rate_by_origin``.
- tests/test_dora_real.py — 4 new tests covering the per-commit join,
  unknown-commit handling with coverage reporting, rollback filtering,
  and the no-origin-map default.

Platform:
- src/types/metrics.ts — TS mirrors of the two new dora_* fields.
- src/types/org-summary.ts — new OrgDORA aggregation type.
- lib/queries/org-summary.ts — computeDORA() sums deploys / failures /
  rollbacks across repos, weights CFR by evaluated deploys, and
  aggregates the by-origin breakdown. Returns null when no repo has an
  active integration.
- src/app/[tenant]/dashboard/sections/DORAOverview.tsx — headline
  cards (CFR, MTTR per failed deploy, deploy frequency, lead time)
  plus a "Datadog" badge, a fact strip (deploys / rollback rate /
  pending), and the CFR-by-origin + rollback-rate-by-origin
  correlation tables. The correlation card stays hidden until the org
  has ≥ 10 failed deploys (per §9.6 — was 10 incidents pre-revision).
- src/app/[tenant]/dashboard/page.tsx — wires the new section in.
- src/components/integrations/datadog-connect-form.tsx +
  src/app/[tenant]/settings/integrations/[provider]/page.tsx — the
  §9.8 silent-decay hint: "last incident registered X days ago" on
  the detail page. Days are server-computed to keep the client
  component pure.
- platform/lib/translations.ts — full en + pt-br copy for the new
  surfaces.
- platform/tests/dora-aggregation.test.ts — 4 tests for computeDORA().
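The computeDORA() rule described above (sum counts across repos, weight CFR by evaluated deploys, return null when no repo has an active integration) can be sketched as follows. Weighting each repo's CFR by its evaluated deploys is the same as pooling failures over evaluated deploys, so a repo with 2 evaluated deploys and a 50% CFR does not drag the org figure as far as an unweighted average would. A minimal sketch, assuming a hypothetical per-repo input shape ("evaluated" meaning change_failure is true or false, not pending); the real `OrgDORA` type and field names may differ. The ≥ 10 failed-deploy gate for the correlation card is included as a flag.

```typescript
// Hypothetical per-repo input shape.
interface RepoDORA {
  integrationActive: boolean;
  deploys: number;
  evaluatedDeploys: number; // change_failure is true or false (not null/pending)
  failedDeploys: number;
  rollbacks: number;
}

interface OrgDORASketch {
  deploys: number;
  evaluatedDeploys: number;
  failedDeploys: number;
  rollbacks: number;
  cfr: number | null; // null when nothing has been evaluated yet
  showCorrelation: boolean; // correlation card gate: >= 10 failed deploys
}

function computeOrgDORA(repos: RepoDORA[]): OrgDORASketch | null {
  const active = repos.filter((r) => r.integrationActive);
  if (active.length === 0) return null;
  const sum = (pick: (r: RepoDORA) => number) =>
    active.reduce((acc, r) => acc + pick(r), 0);
  const evaluatedDeploys = sum((r) => r.evaluatedDeploys);
  const failedDeploys = sum((r) => r.failedDeploys);
  return {
    deploys: sum((r) => r.deploys),
    evaluatedDeploys,
    failedDeploys,
    rollbacks: sum((r) => r.rollbacks),
    // Evaluated-deploy-weighted CFR equals pooled failed / evaluated.
    cfr: evaluatedDeploys > 0 ? failedDeploys / evaluatedDeploys : null,
    showCorrelation: failedDeploys >= 10,
  };
}
```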

Docs:
- docs/integrations/datadog.md — customer setup guide. Covers the
  Application Key scope, regional sites, the connect flow, the cron
  schedule, what we read / don't read, repository matching, the
  disconnect behavior, and operational notes (backfill window, rate
  limits, encryption rotation).
- docs/METRICS.md — adds the two new dora_*_by_origin fields and the
  module-map row.

Verified:
- python -m pytest tests/ -q → 113 passed (4 new)
- platform: npx tsc --noEmit → clean
- platform: npx vitest run → 175 passed (4 new)
- platform: npx eslint → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrations): coverage_pct math + dead code + form gating on error (#15)

Three issues surfaced in the slice 5 audit:

1. `iris/analysis/dora_real.py` — the per-origin `coverage_pct` divided
   each origin's commits by `(this origin + ALL unknowns)`, so every
   origin's coverage dropped by the full unknown count. The right
   semantic is org-wide attribution coverage. Hoisted to a single
   result field `cfr_by_origin_coverage_pct` and removed from each
   per-origin dict.

2. Same file — `_referenced` was assigned and immediately popped from
   the dict; dead code, dropped.

3. `platform/src/components/integrations/datadog-connect-form.tsx` —
   the connected card only rendered when `status === "active"`, so an
   integration in `status: "error"` fell through to the connect form
   and lost the very surfaces (last_sync_at, last_error, unmatched
   count, days-since-last-incident) the operator needs to debug. Now
   renders the status card for both `active` and `error`, with the
   shield icon and copy switched to an error variant when the sync is
   failing.
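The corrected semantic from item 1 can be sketched as a per-commit join: each commit on each evaluated deploy is bucketed by origin, commits missing from the origin map are dropped, and a single org-wide coverage figure is reported instead of one per origin. This is a minimal TypeScript transliteration for illustration, assuming hypothetical input shapes and origin labels; the real implementation is Python in `iris/analysis/dora_real.py`, and its bucketing details may differ.

```typescript
interface EvaluatedDeploy {
  changeFailure: boolean; // evaluated deploys only (true/false), not pending
  commitShas: string[];
}

interface OriginBucket {
  commits: number;
  failedCommits: number;
  cfr: number;
}

function cfrByOrigin(
  deploys: EvaluatedDeploy[],
  originMap: Map<string, string> | null, // sha -> origin label (hypothetical)
): { byOrigin: Record<string, OriginBucket>; coveragePct: number | null } {
  if (originMap === null) return { byOrigin: {}, coveragePct: null };
  const byOrigin: Record<string, OriginBucket> = {};
  let known = 0;
  let unknown = 0;
  for (const deploy of deploys) {
    for (const sha of deploy.commitShas) {
      const origin = originMap.get(sha);
      if (origin === undefined) {
        unknown++; // outside the local window: dropped, but counted for coverage
        continue;
      }
      known++;
      let bucket = byOrigin[origin];
      if (!bucket) {
        bucket = { commits: 0, failedCommits: 0, cfr: 0 };
        byOrigin[origin] = bucket;
      }
      bucket.commits++;
      if (deploy.changeFailure) bucket.failedCommits++;
    }
  }
  for (const bucket of Object.values(byOrigin)) {
    bucket.cfr = bucket.failedCommits / bucket.commits;
  }
  const total = known + unknown;
  // One org-wide figure: share of all joined commits that could be attributed
  // (null when there was nothing to attribute).
  return { byOrigin, coveragePct: total > 0 ? (100 * known) / total : null };
}
```

Because unknown commits only enter the single denominator, they lower coverage once for the whole org rather than once per origin.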

Schema / TS / docs aligned:
- `iris/models/metrics.py` adds `dora_cfr_by_origin_coverage_pct`.
- `iris/metrics/aggregator.py` wires it.
- `platform/src/types/metrics.ts` drops `coverage_pct` from the
  per-origin shape and adds the new top-level field.
- `docs/METRICS.md` updates the field table and the explanatory blurb;
  module-map row picks up the new field.
- `platform/lib/translations.ts` — en + pt-br copy for the new error
  state.

Tests:
- `tests/test_dora_real.py` — old `coverage_pct` assertion replaced by
  focused tests covering mixed known/unknown commits (org-wide coverage
  drops), full attribution (reports 100%), and no origin map (None).
- `platform/tests/dora-aggregation.test.ts` — adjusts mock payloads to
  drop the (now-removed) `coverage_pct` field on per-origin entries.

Verified: pytest 115 passed (16 dora_real tests), tsc clean, vitest 175
passed, eslint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform): restore build version in footer (#15)

The footer's `process.env.NEXT_PUBLIC_BUILD_VERSION || "dev"` lookup
fell back to "dev" on every Vercel deploy because the env var was
never wired up. Reads from `package.json` at config-load time and
appends the Vercel commit SHA (`VERCEL_GIT_COMMIT_SHA`) when present
so production / preview deploys carry a unique identifier between
releases.

Loaded via fs instead of an ESM JSON import to stay portable across
Next's TS loader and direct Node ESM execution (the latter requires
`with { type: "json" }`).
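The footer fix can be sketched as a config-time helper: read `version` from `package.json` with `fs`, and append a short form of `VERCEL_GIT_COMMIT_SHA` when Vercel provides it. A minimal sketch, assuming a `version+sha` format with a 7-character short SHA (the commit does not state the exact format) and hypothetical function names; how the result is exposed as `NEXT_PUBLIC_BUILD_VERSION` (for example through the `env` key in the Next config) is not shown.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Pure part: derive the display version from package.json text and an optional SHA.
function buildVersion(packageJsonText: string, commitSha?: string): string {
  const { version } = JSON.parse(packageJsonText) as { version?: string };
  const base = version ?? "dev";
  // A short SHA keeps preview/production builds distinguishable between releases.
  return commitSha ? `${base}+${commitSha.slice(0, 7)}` : base;
}

// I/O part: read package.json via fs instead of an ESM JSON import, so the
// same code runs under Next's TS loader and plain Node ESM.
function readBuildVersion(projectDir: string = process.cwd()): string {
  const text = readFileSync(join(projectDir, "package.json"), "utf8");
  return buildVersion(text, process.env.VERCEL_GIT_COMMIT_SHA);
}
```

Splitting the parsing from the file read keeps the version logic testable without touching the filesystem.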

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.0.6 — Datadog DORA integration (#15)

Bumps platform/package.json (1.0.0 → 1.0.6 — catching up from the
initial scaffold), pyproject.toml (1.0.5 → 1.0.6), and
iris/cli.py:VERSION (v1.0.5 → v1.0.6). Adds the CHANGELOG entry for
v1.0.6 covering the Datadog integration end-to-end across slices 1-5
(PRs #36, #37, #39, #40, #41, #42).

Highlights:
- Connect flow + encrypted credentials (slice 2)
- Daily Vercel Cron sync into external_deployments / _commits /
  _incidents (slice 3)
- Engine consumes events and emits 18 new dora_* fields including
  CFR / MTTR per-deploy / MTTR per-incident / rollback rate / lead
  time / deploy frequency / by-origin breakdowns (slice 4 + 5)
- Dashboard DORA section with the "Datadog" badge and the AI-vs-human
  correlation card (slice 5)
- Setup docs at docs/integrations/datadog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trentas added a commit that referenced this pull request May 13, 2026
…ir (#43)

The `supabase/.temp/` glob only matches at repo root, but the actual
Supabase CLI cache lives at `platform/supabase/.temp/`. The fix landed
on the slice 2 branch (commit 356591d) but didn't make it into the
squash merge that became #39 on main — so every `supabase status` /
`supabase link` was re-creating untracked files locally.

Switch to `**/supabase/.temp/` so the rule applies regardless of where
the cache directory is rooted.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>