Spec 20: PR / CI webhook ingest — link commits to sessions to PRs to CI runs #92

@0bserver07

Goal

Ingest GitHub / GitLab PR webhooks and CI run results into the local store so we can trace the chain "session X → commit Y → PR Z → CI run W → 2 reverts later." This is the foundation for outcome attribution v2 (Spec 22) and the comparative benchmark (Spec 26).

Why now

The Yield tab today correlates sessions with git activity by cwd only, which is coarse. To answer "did this session ship code that actually held up?", we need PR, CI, and downstream-touch data. That data currently lives only outside the local store.

Schema

v017 — two additive tables:

CREATE TABLE pr_outcomes (
  id INTEGER PRIMARY KEY,
  provider TEXT NOT NULL,         -- 'github' | 'gitlab'
  repo_slug TEXT NOT NULL,        -- 'owner/repo'
  pr_number INTEGER NOT NULL,
  title TEXT,
  state TEXT NOT NULL,            -- 'open' | 'merged' | 'closed'
  merged_at TEXT,
  reverted_at TEXT,
  author TEXT,
  raw_json TEXT NOT NULL,
  UNIQUE (provider, repo_slug, pr_number)
);

CREATE TABLE ci_runs (
  id INTEGER PRIMARY KEY,
  provider TEXT NOT NULL,         -- 'github-actions' | 'gitlab-ci' | 'circleci' | ...
  repo_slug TEXT NOT NULL,
  run_id TEXT NOT NULL,           -- provider-side id
  commit_sha TEXT NOT NULL,
  status TEXT NOT NULL,           -- 'success' | 'failure' | 'cancelled' | 'in_progress'
  workflow_name TEXT,
  started_ts TEXT,
  completed_ts TEXT,
  raw_json TEXT NOT NULL,
  UNIQUE (provider, run_id)
);
CREATE INDEX idx_pr_outcomes_repo ON pr_outcomes(repo_slug, state);
CREATE INDEX idx_ci_runs_commit ON ci_runs(commit_sha);

raw_json keeps the full webhook payload for future analysis.
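
Because webhooks are redelivered and the REST backfill can overlap with live events, inserts need to be idempotent. The UNIQUE constraint above makes an upsert the natural fit. A minimal sketch, assuming the local store is SQLite 3.24 or newer (for ON CONFLICT ... DO UPDATE) and that the payload is a GitHub REST pull-request object; upsert_pr is a hypothetical helper name, not an existing function in the codebase:

```python
import json
import sqlite3

# Copy of the pr_outcomes table from the schema above.
SCHEMA = """
CREATE TABLE pr_outcomes (
  id INTEGER PRIMARY KEY,
  provider TEXT NOT NULL,
  repo_slug TEXT NOT NULL,
  pr_number INTEGER NOT NULL,
  title TEXT,
  state TEXT NOT NULL,
  merged_at TEXT,
  reverted_at TEXT,
  author TEXT,
  raw_json TEXT NOT NULL,
  UNIQUE (provider, repo_slug, pr_number)
);
"""

# reverted_at is deliberately left out of the UPDATE so that a later
# redelivery of the same PR does not wipe a revert recorded elsewhere.
UPSERT = """
INSERT INTO pr_outcomes
  (provider, repo_slug, pr_number, title, state, merged_at, author, raw_json)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (provider, repo_slug, pr_number) DO UPDATE SET
  title = excluded.title,
  state = excluded.state,
  merged_at = excluded.merged_at,
  author = excluded.author,
  raw_json = excluded.raw_json
"""


def upsert_pr(conn, provider, repo_slug, pr):
    """Hypothetical helper: insert or refresh one PR row from a payload dict."""
    # GitHub reports a merged PR as state 'closed' with a non-null merged_at,
    # so map it onto the 'merged' value used by this schema.
    state = "merged" if pr.get("merged_at") else pr["state"]
    author = (pr.get("user") or {}).get("login")
    conn.execute(
        UPSERT,
        (provider, repo_slug, pr["number"], pr.get("title"), state,
         pr.get("merged_at"), author, json.dumps(pr)),
    )
```

Calling upsert_pr twice for the same (provider, repo_slug, pr_number) leaves one row holding the latest state and payload.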

User-visible surface

  • CLI: stackunderflow ingest github --repo owner/repo --token $GH_TOKEN [--since 30d] — backfill via REST API.
  • CLI: stackunderflow ingest webhook serve --port 8096 — opt-in webhook receiver. Validates the HMAC signature (GitHub) or secret token (GitLab) before accepting a payload.
  • API: POST /api/webhooks/github, POST /api/webhooks/gitlab, POST /api/webhooks/ci — receive + validate + insert.
  • Meta-agent tool: get_pr_outcomes(repo, state?, since?) and get_ci_runs(commit_sha?, status?).
  • UI: extend Yield tab to show PR + CI columns alongside commit data.
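
The get_ci_runs tool listed above takes two optional filters. One possible shape for the query behind it, as a sketch only: it assumes a sqlite3 connection to a store that already has the ci_runs table, and the column list, ordering, and dict return shape are illustrative choices rather than part of the spec.

```python
import sqlite3

_COLUMNS = (
    "provider, repo_slug, run_id, commit_sha, status, "
    "workflow_name, started_ts, completed_ts"
)


def get_ci_runs(conn, commit_sha=None, status=None):
    """Return ci_runs rows as dicts, filtered by whichever arguments are given."""
    clauses, params = [], []
    if commit_sha is not None:
        clauses.append("commit_sha = ?")
        params.append(commit_sha)
    if status is not None:
        clauses.append("status = ?")
        params.append(status)

    sql = f"SELECT {_COLUMNS} FROM ci_runs"
    if clauses:
        # Only fixed clause strings are joined here; values stay parameterized.
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY started_ts DESC"

    cur = conn.execute(sql, params)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]
```

A lookup by commit_sha alone can use idx_ci_runs_commit from the schema; filtering by status alone would scan the table, which is likely acceptable at local-store sizes.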

Implementation plan

  1. v017 migration.
  2. New service stackunderflow/services/github_ingest.py (REST backfill).
  3. New stackunderflow/routes/webhooks.py (signature validation + insert).
  4. CLI commands.
  5. Meta-agent tool entries.
  6. Yield-tab UI extension (optional in v1 — JSON API alone unblocks Spec 22).
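
For step 2, the GitHub REST API paginates list endpoints through the Link response header, where the entry marked rel="next" carries the URL of the following page. The sketch below shows one way the backfill loop could follow those links. The fetch callable is a hypothetical stand-in for whatever HTTP client the service ends up using, which also keeps the loop easy to drive from mocked responses.

```python
import re

# Matches entries such as: <https://api.github.com/...&page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def iter_items(fetch, first_url):
    """Yield every item across all pages, starting at first_url.

    `fetch` is a hypothetical callable: url -> (list_of_items, headers_dict).
    The headers dict is assumed to expose the header under the key "Link".
    """
    url = first_url
    while url:
        items, headers = fetch(url)
        yield from items
        url = next_page_url(headers.get("Link"))
```

Rate-limit retry is left out of this sketch; it would wrap the fetch call rather than change the pagination loop.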

Tests

  • Signature validation for GitHub (HMAC-SHA256) and GitLab (token compare).
  • REST-backfill with mocked responses (pagination, rate-limit retry).
  • Schema migration idempotency.
  • Yield-tab integration test (if UI shipped).
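
The migration idempotency test could look roughly like the following. This is a sketch under two assumptions that the spec does not confirm: that migrations are plain SQL scripts run against SQLite, and that re-runs are made safe with IF NOT EXISTS rather than by a version check in the migration runner. Only ci_runs is shown to keep it short; apply_v017 and schema_objects are hypothetical names.

```python
import sqlite3

# Assumed form of the migration: IF NOT EXISTS turns a second run into a no-op.
MIGRATION_V017 = """
CREATE TABLE IF NOT EXISTS ci_runs (
  id INTEGER PRIMARY KEY,
  provider TEXT NOT NULL,
  repo_slug TEXT NOT NULL,
  run_id TEXT NOT NULL,
  commit_sha TEXT NOT NULL,
  status TEXT NOT NULL,
  workflow_name TEXT,
  started_ts TEXT,
  completed_ts TEXT,
  raw_json TEXT NOT NULL,
  UNIQUE (provider, run_id)
);
CREATE INDEX IF NOT EXISTS idx_ci_runs_commit ON ci_runs(commit_sha);
"""


def apply_v017(conn):
    """Hypothetical migration entry point."""
    conn.executescript(MIGRATION_V017)


def schema_objects(conn):
    """List user-defined tables and indexes, skipping SQLite's internal ones."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'index') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    return [row[0] for row in rows]


def test_v017_is_idempotent():
    conn = sqlite3.connect(":memory:")
    apply_v017(conn)
    before = schema_objects(conn)
    apply_v017(conn)  # must not raise "table ci_runs already exists"
    assert schema_objects(conn) == before
```

If the project's migration runner already tracks applied versions, the same test still applies; only the body of apply_v017 changes.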

Hard parts

  • Webhook signature validation is security-critical. Compare signatures with hmac.compare_digest (constant-time, so it does not leak timing information) and reject missing or mismatched signatures with 403.
  • Token storage: don't store the GH token in the database. Read from env (STACKUNDERFLOW_GITHUB_TOKEN) or settings file (encrypted-at-rest — defer encryption to Spec 28).
  • Linking sessions to commits is Spec 22's job. This spec only ingests the data; the linking happens downstream.
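
The two validation paths can be sketched as below. GitHub sends the X-Hub-Signature-256 header in the form sha256= followed by the hex HMAC-SHA256 of the raw request body; GitLab sends the configured secret verbatim in X-Gitlab-Token. The function names are hypothetical, and the route is assumed to pass in the raw body bytes before any JSON parsing, since re-serialized JSON will not match the signature.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check X-Hub-Signature-256 against an HMAC-SHA256 of the raw body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    return hmac.compare_digest(expected.encode(), header.encode())


def verify_gitlab_token(secret: str, header: Optional[str]) -> bool:
    """Check X-Gitlab-Token against the configured secret in constant time."""
    if not header:
        return False
    return hmac.compare_digest(secret.encode(), header.encode())
```

A route handler would return 403 whenever either function returns False, including when the header is absent, so that a missing signature is never treated as "validation skipped".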

Out of scope

  • Bitbucket / Codeberg / self-hosted Gitea (defer — same pattern, just adapter work).
  • Encrypted token storage (Spec 28).
  • Auto-linking sessions to PRs (Spec 22).

Dependencies

  • Spec 22 (outcome attribution) consumes this.

Estimated effort

Size L — single agent, ~2 hr.

Hard rules

  • DO NOT touch versions / CHANGELOG headings.
  • Pre-assigned schema slot: v017.
  • Branch: feat/pr-ci-webhook-ingest off main.

Metadata

Labels: size-l (~2 hr agent run), spec (Spec/feature for an agent to implement), wave-2 (Wave 2: outcome-attribution rails)