
feat(q2): AI-Native Synthesis MVP — discovery sections from commits + statics #72

Merged
masonwyatt23 merged 1 commit into main from feat/q2-ai-synthesis-mvp on May 23, 2026

Conversation

@masonwyatt23
Member

Summary

Implements the Q2 supporting pillar from wad-d-instrumentation-genome-2-sequencing.md — a background LLM pass that converts recent commit sections + relevant static genome sections into AI-synthesized discoveries.

  • New discovery section type (manifest v2 extension, additive): kind: "discovery" + sourceTrust: "synthesis" (new union value), stored at .ashlrcode/genome/discoveries/<id>.json, capped at DISCOVERY_RETENTION_LIMIT=30 (mirrors commit retention).
  • scripts/genome-synthesizer.ts — CLI-only background pass that uses the existing servers/_llm-providers/ infrastructure via selectProvider().
  • Wired into ashlr__grep: when discovery sections exist and keywords match, they're prepended under a ## Discoveries block ahead of commits and static sections. Static/commit ordering unchanged when no discoveries exist.
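
The ordering rule in the last bullet can be sketched as follows. This is a minimal illustration, not the code in servers/grep-server.ts: the `Section` shape, `tokenizeQuery`, `matches`, and `renderGenomeContext` are hypothetical names, and exact keyword overlap stands in for whatever scoring the PR actually uses.

```typescript
// Hypothetical shapes; the PR's real types live in servers/_manifest-v2.ts.
type Kind = "discovery" | "commit" | "static";

interface Section {
  kind: Kind;
  title: string;
  keywords: string[];
}

// Lowercase word tokens. A stand-in for the tokenize() the PR exports.
function tokenizeQuery(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
}

// A section matches when any query token equals one of its keywords.
function matches(section: Section, queryTokens: string[]): boolean {
  const keywords = new Set(section.keywords.map((k) => k.toLowerCase()));
  return queryTokens.some((t) => keywords.has(t));
}

function renderGenomeContext(query: string, sections: Section[]): string {
  const tokens = tokenizeQuery(query);
  const hits = sections.filter((s) => matches(s, tokens));
  const discoveries = hits.filter((s) => s.kind === "discovery");
  // filter() keeps the incoming order, so commits and statics are not reshuffled.
  const rest = hits.filter((s) => s.kind !== "discovery");

  const lines: string[] = [];
  if (discoveries.length > 0) {
    lines.push("## Discoveries");
    for (const d of discoveries) lines.push(`- ${d.title}`);
  }
  for (const s of rest) lines.push(`- [${s.kind}] ${s.title}`);
  return lines.join("\n");
}
```

With no matching discovery sections the `## Discoveries` header is never emitted, so the output is identical to what the commit and static sections alone would produce.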

Tier gating (hardcoded at the top of synthesize())

| Tier | Behavior |
| --- | --- |
| free | Hard-gated OFF. synthesize() returns { skipped: true, reason: "free-tier" } before ever touching the LLM. Even --force cannot override this. |
| pro | Throttled to once per 7 days per repo via .ashlrcode/genome/_synthesis-state.json. |
| team | Throttled to once per 24 hours per repo. |
| --force | Bypasses throttle for manual runs (does NOT bypass the free-tier gate). |
| --dry-run | Parse + score + prompt, but never write to disk. |
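
The gate order described above (free-tier check first, then --force, then the throttle window) can be sketched as a pure function. This is an illustration under assumptions: `checkGate`, `GateInput`, and passing the last-run timestamp in as a number are hypothetical, and the real synthesize() reads its state from _synthesis-state.json.

```typescript
type Tier = "free" | "pro" | "team";

interface GateInput {
  tier: Tier;
  lastRunAt: number | null; // epoch ms of the last synthesis run, or null if none
  now: number;              // epoch ms, injected so the function stays testable
  force: boolean;
}

type GateResult =
  | { skipped: true; reason: "free-tier" | "throttled" }
  | { skipped: false };

const HOUR_MS = 60 * 60 * 1000;

// Throttle windows per paid tier: 7 days for pro, 24 hours for team.
const THROTTLE_MS: Record<Exclude<Tier, "free">, number> = {
  pro: 7 * 24 * HOUR_MS,
  team: 24 * HOUR_MS,
};

function checkGate(input: GateInput): GateResult {
  const { tier, lastRunAt, now, force } = input;

  // The free-tier gate comes first so that --force can never reach the LLM.
  if (tier === "free") return { skipped: true, reason: "free-tier" };

  // --force only bypasses the throttle, not the gate above.
  if (force) return { skipped: false };

  if (lastRunAt !== null && now - lastRunAt < THROTTLE_MS[tier]) {
    return { skipped: true, reason: "throttled" };
  }
  return { skipped: false };
}
```

Putting the free-tier check ahead of the force check is what makes "even --force cannot override this" hold by construction rather than by convention.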

Privacy

  • Only commit-section summaries (user-authored commit messages) are sent to the LLM — never raw diff content.
  • Paths matching secret patterns (secrets/, .env, *.pem, id_rsa*, credentials.json) are redacted from filesChanged lists and stripped from the prompt body before construction.
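
A minimal sketch of the path filter, assuming the five patterns listed above map to the regular expressions below. `SECRET_PATTERNS`, `isSecretPath`, and `redactFilesChanged` are hypothetical names, matching `.env` variants such as `.env.local` is an assumption, and this version drops matching paths outright where the real code may substitute a placeholder.

```typescript
// One regex per pattern named in the PR description.
const SECRET_PATTERNS: RegExp[] = [
  /(^|\/)secrets\//,          // anything under a secrets/ directory
  /(^|\/)\.env(\.[^/]*)?$/,   // .env, plus variants like .env.local (assumption)
  /\.pem$/,                   // *.pem
  /(^|\/)id_rsa[^/]*$/,       // id_rsa, id_rsa.pub, ...
  /(^|\/)credentials\.json$/, // credentials.json at any depth
];

function isSecretPath(path: string): boolean {
  // Normalize Windows separators so the patterns only need to handle "/".
  const normalized = path.replace(/\\/g, "/");
  return SECRET_PATTERNS.some((re) => re.test(normalized));
}

// Returns a copy of the list with secret-looking paths removed.
function redactFilesChanged(files: string[]): string[] {
  return files.filter((f) => !isSecretPath(f));
}
```

Anchoring each pattern on a path separator keeps ordinary files such as `src/environment.ts` from being caught by the `.env` rule.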

Sample discovery section (illustrative)

{
  "id": "9f1c4ae27e8a3d12",
  "summary": "Recent activity tightens v1.31 SessionEnd hook diagnostics (hook-health surfacing + onboarding hero) and renames Grok references to grok-4.3 — the WAD-D telemetry path is being prepped without touching the post-commit hot path.",
  "evidence": [
    { "path": "hooks/session-end.ts" },
    { "path": "scripts/install-genome-hooks.ts", "lineRange": [40, 120] },
    { "path": "servers/_ask-router.ts" }
  ],
  "sourceCommits": [
    "1b1632a036383af3cc21a66f2cd8ac3254474eaf",
    "7b1430c4b44bf5984c805e71fa464721815dfb94",
    "d20bc62a181cc6a8135daa736ffd1e1cf31536b5"
  ],
  "synthesizedAt": "2026-05-22T22:00:00.000Z",
  "confidence": 0.74
}
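
The sample's shape can be written as a TypeScript interface, with the retention cap applied on top of it. This is a sketch: the field names come from the sample above, but the `Discovery` and `Evidence` type names, the `idsToPrune` helper, and evicting the oldest entries by `synthesizedAt` are assumptions (the PR's real types are the DiscoverySection types in servers/_manifest-v2.ts).

```typescript
interface Evidence {
  path: string;
  lineRange?: [number, number]; // optional [start, end] lines within the file
}

interface Discovery {
  id: string;
  summary: string;
  evidence: Evidence[];
  sourceCommits: string[]; // full commit SHAs the insight was derived from
  synthesizedAt: string;   // ISO 8601 timestamp
  confidence: number;      // expected to lie in [0, 1]
}

const DISCOVERY_RETENTION_LIMIT = 30;

// Returns the ids that would have to be deleted so that only the `limit`
// most recently synthesized discoveries remain (assumed eviction policy).
function idsToPrune(all: Discovery[], limit: number = DISCOVERY_RETENTION_LIMIT): string[] {
  const newestFirst = [...all].sort(
    (a, b) => Date.parse(b.synthesizedAt) - Date.parse(a.synthesizedAt),
  );
  return newestFirst.slice(limit).map((d) => d.id);
}
```

Returning ids rather than deleting files keeps the helper free of I/O, so the caller decides when to touch `.ashlrcode/genome/discoveries/`.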

Files

  • New: scripts/genome-synthesizer.ts, servers/_genome-discoveries.ts, __tests__/genome-synthesizer.test.ts
  • Modified: servers/_manifest-v2.ts (added DiscoverySection types, "synthesis" to SourceTrust, "discovery" to kind union), servers/_genome-commits.ts (exported tokenize and renamed scoreCommit to scoreSectionMeta for reuse), servers/grep-server.ts (prepend discovery block).

Strict LLM-response validation

  • Malformed entries dropped: missing summary, hallucinated SHA (not in known-commits set), wrong shape.
  • Confidence clamped to [0, 1].
  • Output capped at 3 discoveries per run.
  • Markdown code fences (```json) stripped defensively.
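
The four rules above can be sketched as a single validation pass. This is a hedged illustration, not the PR's implementation: `validateResponse`, `stripCodeFences`, and `Candidate` are hypothetical names, and it assumes the model returns a JSON array and that every entry must cite at least one known commit.

```typescript
interface Candidate {
  summary: string;
  evidencePaths: string[]; // line ranges omitted here for brevity
  sourceCommits: string[];
  confidence: number;
}

const MAX_DISCOVERIES_PER_RUN = 3;

// Remove a leading ```lang fence and a trailing ``` fence if present.
function stripCodeFences(raw: string): string {
  return raw.trim().replace(/^```[a-zA-Z]*\s*/, "").replace(/\s*```$/, "").trim();
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateResponse(raw: string, knownCommits: Set<string>): Candidate[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripCodeFences(raw));
  } catch {
    return []; // unparseable output yields no discoveries
  }
  if (!Array.isArray(parsed)) return [];

  const out: Candidate[] = [];
  for (const entry of parsed as unknown[]) {
    if (out.length >= MAX_DISCOVERIES_PER_RUN) break; // cap per run
    if (!isRecord(entry)) continue;                   // wrong shape
    const { summary, evidence, sourceCommits, confidence } = entry;
    if (typeof summary !== "string" || summary.trim() === "") continue;
    if (!Array.isArray(sourceCommits) || sourceCommits.length === 0) continue;
    // Any SHA outside the known set marks the whole entry as hallucinated.
    const shasOk = sourceCommits.every((s) => typeof s === "string" && knownCommits.has(s));
    if (!shasOk) continue;
    if (typeof confidence !== "number" || Number.isNaN(confidence)) continue;

    const evidencePaths: string[] = [];
    if (Array.isArray(evidence)) {
      for (const e of evidence as unknown[]) {
        if (isRecord(e) && typeof e.path === "string") evidencePaths.push(e.path);
      }
    }
    out.push({
      summary: summary.trim(),
      evidencePaths,
      sourceCommits: sourceCommits as string[],
      confidence: Math.min(1, Math.max(0, confidence)), // clamp to [0, 1]
    });
  }
  return out;
}
```

Invalid entries are skipped individually, so one bad entry does not discard the valid ones that come with it.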

Test plan

  • 23 new bun:test cases in __tests__/genome-synthesizer.test.ts
  • Free tier verified to NEVER invoke the mock provider (assertion in test)
  • Pro tier + within 7d window → "throttled"
  • Team tier + 36h since last run → not throttled, writes
  • --force bypasses throttle
  • Malformed LLM entries dropped, valid ones still write
  • --dry-run writes no files, no state update, returns wouldWriteIds
  • Secret-path redaction verified (no secrets/, .env paths reach the LLM)
  • retrieveDiscoverySections surfaces matching discoveries via ashlr__grep
  • Backward compat: no discovery sections → empty list, grep behaves as Q1
  • Full suite: 3033 pass / 0 fail / 1 skip
  • bunx tsc --noEmit clean

Constraints honored

  • No new external dependencies.
  • Free tier never makes an LLM call (gated at function top, before any provider lookup).
  • CLI-only — synthesizer never runs in a hook hot path; respects the 2s safety net.
  • Cherry-picked cleanly off main.

🤖 Generated with Claude Code

@vercel

vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| ashlr-plugin-site | Ready | Preview, Comment | May 23, 2026 1:22am |


… statics

Implements the Q2 supporting pillar from
wad-d-instrumentation-genome-2-sequencing.md:

- New `discovery` section type (manifest v2 extension, additive):
  `kind: "discovery"` + `sourceTrust: "synthesis"` (new union value),
  stored at `.ashlrcode/genome/discoveries/<id>.json`, capped at
  DISCOVERY_RETENTION_LIMIT=30 (mirrors commit retention pattern).

- `scripts/genome-synthesizer.ts` — background LLM pass that reads
  the last N commit sections + relevant static sections and emits 1-3
  structured insights ("3 files use this util wrong", etc). Uses the
  existing `servers/_llm-providers/` infrastructure via selectProvider().
  CLI: `bun run scripts/genome-synthesizer.ts [--dry-run] [--max-commits=N] [--force]`.

- Tiered cadence (per the plan):
  * free → hard-gated OFF at the top of synthesize(); never reaches the LLM
  * pro  → throttled to once per 7 days per repo via _synthesis-state.json
  * team → throttled to once per 24 hours per repo
  * --force bypasses throttle for manual runs

- Privacy: only commit-section summaries (user-authored) go to the LLM,
  never raw diff content. Paths under secrets/, .env, *.pem, id_rsa,
  credentials.json are redacted before prompt construction.

- Wired into ashlr__grep: when discovery sections exist and keywords
  match the query, they're prepended under a `## Discoveries` block
  ahead of commits and static sections. Backward-compat preserved —
  static/commit ordering unchanged when no discoveries exist.

- Strict LLM-response validation: malformed entries (missing summary,
  hallucinated SHA, wrong shape) are dropped. Confidence clamped to
  [0,1]. Output capped at 3 discoveries per run.

- Tests: 23 new bun:test cases covering tier gating (free never calls
  LLM, pro 7d throttle, team 24h throttle, --force bypass), happy path,
  malformed-response handling, dry-run no-write contract, no-provider /
  llm-failed branches, secret-path redaction, CLI arg parsing, and
  grep retrieval wire-up. 3059 plugin / 0 fail.

No new dependencies. CLI-only — never in a hook hot path; respects the
2s hook safety net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@masonwyatt23 force-pushed the feat/q2-ai-synthesis-mvp branch from bf13689 to 751154d on May 23, 2026 01:18
@masonwyatt23 merged commit bee0c4e into main on May 23, 2026
13 of 14 checks passed
@masonwyatt23 deleted the feat/q2-ai-synthesis-mvp branch on May 23, 2026 01:18