
feat(addie): lead with coverage before drafting RFC issues #3815

Merged
bokelley merged 2 commits into main from bokelley/spec-feedback-coverage-rule
May 2, 2026

@bokelley bokelley commented May 2, 2026

Summary

  • Extends Addie's "Spec Feedback Response Pattern" in server/src/addie/rules/behaviors.md with two clauses inside the existing 1-VERIFY / 2-POSITION / 3-CLOSE-LOOP structure.
  • Step 2 gains "Lead with coverage when verification reveals overlap": if search_docs / get_schema show overlap with existing primitives, the reply must open with what's already covered before anything else. Factually wrong premises (proposal extends a non-existent field) get a one-line correction first.
  • Step 3 gains "Draft only after the caller has seen the coverage statement": when step 2 surfaces overlap or factual error, draft_github_issue is NOT called in the same turn as verification. Coverage-leading reply first, draft on confirmation.
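
The two clauses above can be read as a small decision rule for the verification turn. A minimal sketch follows, assuming hypothetical names (`Verification`, `Plan`, `planReply`) that are not taken from the Addie codebase; the actual rule is prose in behaviors.md, not code.

```typescript
// Hypothetical types for illustration only; none of these names exist in the repo.
type Verification = {
  // Existing primitives (fields, tasks, modes) that search_docs / get_schema surfaced.
  overlappingPrimitives: string[];
  // Wrong premises, e.g. "proposal extends a field that does not exist".
  factualErrors: string[];
};

type Plan = {
  replyOpensWith: "correction" | "coverage" | "position";
  // Whether draft_github_issue may be called in the same turn as verification.
  mayDraftThisTurn: boolean;
};

// Models the verification turn only: how the reply opens, and whether a draft is allowed yet.
function planReply(v: Verification): Plan {
  const hasError = v.factualErrors.length > 0;
  const hasOverlap = v.overlappingPrimitives.length > 0;

  if (!hasError && !hasOverlap) {
    // Nothing to surface first, so the new clauses do not block drafting.
    return { replyOpensWith: "position", mayDraftThisTurn: true };
  }

  return {
    // A factual correction comes first; otherwise open with what is already covered.
    replyOpensWith: hasError ? "correction" : "coverage",
    // The draft waits for a later turn, after the caller has seen the coverage statement.
    mayDraftThisTurn: false,
  };
}
```

Under this reading, drafting is deferred rather than refused: the caller sees the coverage statement, confirms, and the draft happens in a following turn.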

Why

Same-Addie-different-answers failure on 2026-05-01: Jeffrey Mayer (DanAds) drafted four RFCs with web-Addie and posted them to Slack, where Slack-Addie pushed back on each with spec citations. Both surfaces use the same rules, so the rule itself was missing the load-bearing instruction. The existing rule listed "does the spec already handle this?" as one of four bullets to consider; it didn't say "if yes, lead with that before drafting."

Validation

Ran the eval from #3804 with RFC_USE_LOCAL_PROMPT=1 (which validates against this branch's rules, not the deployed prod prompt) at N=3 runs per scenario:

| Scenario | Without rule (prod prompt) | With rule (this branch) |
| --- | --- | --- |
| CPQ pricing | 0/3 (drafted on every run) | 3/3 (no draft) |
| TMP signals cap | varies (sometimes drafts) | 3/3 (no draft) |
| Bilateral trust | varies | 3/3 (no draft) |
| brand.json | varies | 3/3 (no draft) |
| Total | inconsistent | 12/12 runs, 0 premature drafts |

Under the rule, draft_github_issue does not fire in the same turn as the verification on any of the 12 runs. The model leads with "most of this is already covered: pricing_options + buying_mode: refine + account" and offers to draft a narrower scope on confirmation — exactly the Slack-Addie behavior from Jeffrey's screenshot.
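
The pass criterion used in the table (no draft_github_issue call in the verification turn) could be tallied along these lines. This is a sketch under assumptions: the `RunRecord` shape and the function names are hypothetical, and the eval in #3804 may record runs differently.

```typescript
// Hypothetical record of one eval run; the real eval output format is not shown in this PR.
type RunRecord = {
  scenario: string;
  // Names of tools the model called in the same turn as verification.
  toolsCalledInVerificationTurn: string[];
};

type Tally = { passed: number; total: number };

// A run passes when draft_github_issue was NOT called in the verification turn.
function tallyByScenario(runs: RunRecord[]): Record<string, Tally> {
  const tallies: Record<string, Tally> = {};
  for (const run of runs) {
    const existing: Tally | undefined = tallies[run.scenario];
    const tally: Tally = existing ?? { passed: 0, total: 0 };
    tally.total += 1;
    if (run.toolsCalledInVerificationTurn.indexOf("draft_github_issue") === -1) {
      tally.passed += 1;
    }
    tallies[run.scenario] = tally;
  }
  return tallies;
}

// Formats a tally in the "passed/total" style used in the table.
function formatTally(t: Tally): string {
  return `${t.passed}/${t.total}`;
}
```

Counting per scenario rather than overall keeps a single regressing scenario visible even when the total looks healthy.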

Dependencies

Should land after #3804 (eval framework). The eval is the regression check that this rule edit holds up over time.

Files

  • server/src/addie/rules/behaviors.md — the rule edit
  • .changeset/addie-spec-feedback-coverage-rule.md — empty changeset (non-protocol change)
  • static/schemas/source/index.json — pre-existing main-branch hygiene fix the pre-push hook required (one-line "published_version": "3.0.3" added by npm run build:schemas)

Test plan

  • CI green
  • Manual smoke: eval at RFC_USE_LOCAL_PROMPT=1 shows 3/3 on CPQ at N=3
  • After deploy, the same eval at default (prod prompt) should also show 3/3 on CPQ — confirms the rule is live
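
The two smoke steps differ only in which prompt the eval runs against. A minimal sketch of that switch, assuming the flag counts as enabled only when set to exactly "1"; `promptSource` and the `PromptSource` labels are hypothetical names, and the eval's real handling of RFC_USE_LOCAL_PROMPT may differ.

```typescript
// Hypothetical labels for where the eval reads Addie's rules from.
type PromptSource = "local-branch" | "deployed-prod";

// Assumption: only the exact value "1" turns the local prompt on; anything else means prod.
function promptSource(env: Record<string, string | undefined>): PromptSource {
  return env["RFC_USE_LOCAL_PROMPT"] === "1" ? "local-branch" : "deployed-prod";
}
```

In a Node script this would typically be called with `process.env`. Before deploy only the "local-branch" run is expected to show 3/3 on CPQ; after deploy the default run should match it.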

bokelley and others added 2 commits May 2, 2026 07:05
Extends "Spec Feedback Response Pattern" with two clauses inside the
existing 1-VERIFY / 2-POSITION / 3-CLOSE-LOOP structure:

- Step 2 gains "Lead with coverage when verification reveals overlap" —
  if search_docs/get_schema show overlap with existing primitives, the
  reply must open with what's already covered (named fields, tasks,
  modes) before anything else. If the proposal extends a non-existent
  field, state that as a factual correction first; reviewers bounce
  the issue on the wrong-shape premise alone.

- Step 3 gains "Draft only after the caller has seen the coverage
  statement" — when step 2 surfaces overlap or factual error,
  draft_github_issue is NOT called in the same turn as verification.
  Coverage-leading reply first, draft on confirmation.

Targets the same-Addie-different-answers failure mode (Jeffrey Mayer
2026-05-01): web-Addie verified, found overlap, drafted anyway. Slack-Addie
got the same data and led with coverage. Both surfaces use the same rules,
so the rule itself was the leverage point.

Validated end-to-end against the eval at PR #3804 with RFC_USE_LOCAL_PROMPT=1
across N=3 runs per scenario:
- CPQ pricing:   prod prompt 0/3 (drafted) → with rule 3/3 (no draft)
- TMP signals:   3/3 with rule, no draft on any run
- Bilateral:     3/3 with rule, no draft on any run
- brand.json:    3/3 with rule, no draft on any run

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-sync

`npm run build:schemas` output that pre-push hook (verify-version-sync)
requires; main was missing the field on this branch base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley merged commit 609e27e into main May 2, 2026
14 checks passed
@bokelley bokelley deleted the bokelley/spec-feedback-coverage-rule branch May 2, 2026 16:34