Spec 21: per-session static analysis pass — complexity / coverage / lint deltas

## Goal
For every session that touched code, compute pre/post static-analysis deltas: cyclomatic complexity, test coverage, lint findings, type completeness (coverage is deferred in v1; see Hard parts). Surface "sessions where the agent reduced complexity by 20%+" vs "increased it by 20%+": the first metric that lets you say agent X is actually better than agent Y on YOUR code.
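
The 20% thresholds can be read as a relative change against the pre-session value. A minimal sketch, assuming session-level `pre` / `post` complexity totals have already been aggregated; the aggregation rule, the bucket names, and the function name `classify_complexity_change` are hypothetical, not part of the spec:

```python
from typing import Optional

THRESHOLD = 0.20  # the 20% cutoff named in the Goal


def classify_complexity_change(pre: Optional[float], post: Optional[float]) -> str:
    """Bucket a session by its relative complexity change."""
    if pre is None or post is None:
        return "unknown"  # one side is unobservable, so there is no delta
    if pre == 0:
        return "unknown"  # relative change is undefined (assumption: do not guess)
    relative = (post - pre) / pre
    if relative <= -THRESHOLD:
        return "reduced"
    if relative >= THRESHOLD:
        return "increased"
    return "neutral"
```

Using a relative rather than absolute delta keeps small and large files comparable, at the cost of leaving the zero-baseline case unclassified.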

## Why now
Outcome attribution today is git-correlation. That tells you the code shipped; it doesn't tell you whether the code is good. Static analysis is the cheapest objective signal.

## Schema
**v018** — `static_analysis_findings` table:

```sql
CREATE TABLE IF NOT EXISTS static_analysis_findings (
  id INTEGER PRIMARY KEY,
  session_id TEXT NOT NULL,
  file_path TEXT NOT NULL,
  language TEXT NOT NULL,         -- 'python' | 'typescript' | 'go' | ...
  ts TEXT NOT NULL,               -- when the analysis ran
  metric TEXT NOT NULL,           -- 'complexity' | 'coverage' | 'lint_count' | 'type_completeness'
  pre_value REAL,                 -- before the session's edits
  post_value REAL,                -- after
  delta REAL,                     -- post - pre (NULL when one side is unobservable)
  details_json TEXT,              -- per-metric extras (e.g. lint rule ids)
  UNIQUE (session_id, file_path, metric)
);
CREATE INDEX IF NOT EXISTS idx_sa_session ON static_analysis_findings(session_id);
CREATE INDEX IF NOT EXISTS idx_sa_file ON static_analysis_findings(file_path);
```

Additive, `IF NOT EXISTS`-guarded.
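
One way to lean on the `UNIQUE (session_id, file_path, metric)` constraint for idempotent writes is an upsert. The following is a minimal sketch, assuming SQLite 3.24+ (for `ON CONFLICT ... DO UPDATE`); the helper name `record_finding` is hypothetical and this is not the project's actual persistence code:

```python
import json
import sqlite3
from typing import Optional

UPSERT_SQL = """
INSERT INTO static_analysis_findings
  (session_id, file_path, language, ts, metric,
   pre_value, post_value, delta, details_json)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (session_id, file_path, metric) DO UPDATE SET
  language = excluded.language,
  ts = excluded.ts,
  pre_value = excluded.pre_value,
  post_value = excluded.post_value,
  delta = excluded.delta,
  details_json = excluded.details_json
"""


def record_finding(
    conn: sqlite3.Connection,
    session_id: str,
    file_path: str,
    language: str,
    ts: str,
    metric: str,
    pre: Optional[float],
    post: Optional[float],
    details: Optional[dict] = None,
) -> None:
    """Insert or overwrite one finding; delta stays NULL if either side is missing."""
    delta = None if pre is None or post is None else post - pre
    details_json = json.dumps(details) if details is not None else None
    conn.execute(
        UPSERT_SQL,
        (session_id, file_path, language, ts, metric, pre, post, delta, details_json),
    )
```

Re-running the analysis for the same session, file, and metric then overwrites the existing row instead of adding a second one.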

## User-visible surface
- **CLI**: `stackunderflow analyze session <id>` runs analysis on a single session's touched files (using Playback v2 to reconstruct pre/post states).
- **CLI**: `stackunderflow analyze backfill [--since 30d] [--limit N]` runs analysis on every recent session lacking findings.
- **API**: `GET /api/static-analysis/session/{id}` — return findings for a session.
- **Meta-agent tool**: `get_session_quality(session_id)` returns a structured quality summary.
- **UI**: Quality column on Sessions tab + a "Quality" panel on the per-session detail view.

## Implementation plan
1. v018 migration.
2. New module `stackunderflow/services/static_analysis/` with one analyzer per language:
   - `python_analyzer.py`: `radon` for complexity (widely used, MIT, optional dep), `coverage.py` report parsing, `ruff check --output-format=json` for lint, `mypy --no-error-summary` for type completeness.
   - `typescript_analyzer.py`: `tsc --noEmit --pretty false` for type errors, `eslint --format json` for lint. Complexity: defer (no clean cross-toolchain answer).
   - `go_analyzer.py`: `go vet`, `gocyclo`, `go test -coverprofile`. Defer if `go` is not on PATH.
3. Coordinator in `services/static_analysis/runner.py` — reconstruct pre/post via Playback v2's `reconstruct_fs_at(at_pre)` / `reconstruct_fs_at(at_post)`, write to a tmpdir, run the analyzer, persist deltas.
4. Optional dep: add `[analysis]` extra in `pyproject.toml` with `radon`, `ruff`, `coverage`, `mypy`. Check for external binaries (`tsc`, `eslint`, `go`, `gocyclo`) at runtime; skip cleanly if missing.
5. CLI + API + meta-agent wiring.
6. Backfill batch with concurrency cap (analyzers fork shell processes — cap at `min(4, cpu_count)`).
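
Steps 4 and 6 can be sketched together: a runtime check for the external binaries, plus a worker pool capped at `min(4, cpu_count)`. This is a rough sketch only. The `REQUIRED_BINARIES` mapping and the names `missing_binaries`, `worker_cap`, and `run_backfill` are hypothetical, and `analyze_one` stands in for whatever the coordinator exposes per session.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical mapping of language -> external binaries named in the plan.
REQUIRED_BINARIES = {
    "typescript": ("tsc", "eslint"),
    "go": ("go", "gocyclo"),
}


def missing_binaries(language: str, which: Callable = shutil.which) -> List[str]:
    """Return the binaries for `language` that are not on PATH."""
    return [b for b in REQUIRED_BINARIES.get(language, ()) if which(b) is None]


def worker_cap(cpu_count: Optional[int] = None) -> int:
    """min(4, cpu_count), never below 1 (os.cpu_count() may return None)."""
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(4, n))


def run_backfill(
    session_ids: Iterable[str],
    analyze_one: Callable[[str], T],
    max_workers: Optional[int] = None,
) -> List[T]:
    """Analyze sessions with bounded concurrency, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers or worker_cap()) as pool:
        return list(pool.map(analyze_one, session_ids))
```

Threads are likely sufficient here because the heavy work happens in the forked analyzer processes, not in the Python interpreter. Passing `which` as a parameter makes the missing-binary path easy to exercise in tests without touching PATH.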

## Tests
- Each analyzer: synthetic file with known complexity/coverage/lint result, assert metric.
- Coordinator: pre + post fixture, assert delta computation.
- Missing-binary handling: TS analyzer skips cleanly when `tsc` not on PATH.
- Backfill: idempotent (re-running doesn't duplicate findings).
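
As an illustration of the "synthetic file with known result" pattern, here is a hedged sketch of a fixture-based delta test. To stay dependency-free it uses a simplified stand-in counter, `approx_complexity`, built on the standard `ast` module; it is not `radon` and is not guaranteed to match its scores. The real analyzer tests would call the analyzer instead.

```python
import ast

_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_complexity(source: str) -> int:
    """1 + number of branch points. Simplified stand-in, not radon's algorithm."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths
            score += len(node.values) - 1
    return score


# Synthetic pre/post fixtures with hand-counted scores.
PRE = """
def f(x):
    if x > 0:
        for i in range(x):
            if i % 2 and i > 3:
                return i
    return 0
"""

POST = """
def f(x):
    if x > 4:
        return 5
    return 0
"""


def test_complexity_delta_on_fixtures():
    pre, post = approx_complexity(PRE), approx_complexity(POST)
    assert pre == 5   # 1 + if + for + if + one `and`
    assert post == 2  # 1 + if
    assert post - pre == -3
```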

## Hard parts
- Cross-language support is genuinely hard. Python / TS / Go cover ~80% of usage; the long tail (Rust, Ruby, Java, Swift, etc.) is per-language adapter work. Document explicitly which languages are supported in v1.
- "pre" state for a session sometimes doesn't exist (the file was created in the session). Handle: pre_value = NULL, delta = NULL, details_json = `{"reason": "file_created_in_session"}`.
- Some analyzers are slow (mypy on a big project can be 30s+). Use timeouts (default 60s per file) and cache results.
- Coverage requires running tests — that's a SEPARATE deliverable, defer (Spec 22 sub-task). v1 handles complexity + lint + types only.

## Out of scope
- Test-running for coverage measurement (separate spec — needs sandboxing).
- Rust / Java / Swift / Ruby analyzers.
- Real-time analysis as the agent edits (defer; this is offline backfill).

## Dependencies
- None blocking. Playback v2 (shipped) provides pre/post reconstruction.
- Consumed by Spec 22 (outcome attribution v2) and Spec 26 (comparative benchmark).

## Estimated effort
**Size L** — single agent, ~2-2.5 hr.

## Hard rules
- DO NOT touch versions / CHANGELOG headings.
- Pre-assigned schema slot: **v018**.
- Branch: `feat/static-analysis-pass` off main.
- New optional dep `[analysis]` in `pyproject.toml` is allowed (similar to `[embeddings]`).
