feat(addie): lead with coverage before drafting RFC issues#3815
Merged
feat(addie): lead with coverage before drafting RFC issues#3815
Conversation
Extends "Spec Feedback Response Pattern" with two clauses inside the existing 1-VERIFY / 2-POSITION / 3-CLOSE-LOOP structure: - Step 2 gains "Lead with coverage when verification reveals overlap" — if search_docs/get_schema show overlap with existing primitives, the reply must open with what's already covered (named fields, tasks, modes) before anything else. If the proposal extends a non-existent field, state that as a factual correction first; reviewers bounce the issue on the wrong-shape premise alone. - Step 3 gains "Draft only after the caller has seen the coverage statement" — when step 2 surfaces overlap or factual error, draft_github_issue is NOT called in the same turn as verification. Coverage-leading reply first, draft on confirmation. Targets the same-Addie-different-answers failure mode (Jeffrey Mayer 2026-05-01): web-Addie verified, found overlap, drafted anyway. Slack-Addie got the same data and led with coverage. Both surfaces use the same rules, so the rule itself was the leverage point. Validated end-to-end against the eval at PR #3804 with RFC_USE_LOCAL_PROMPT=1 across N=3 runs per scenario: - CPQ pricing: prod prompt 0/3 (drafted) → with rule 3/3 (no draft) - TMP signals: 3/3 with rule, no draft on any run - Bilateral: 3/3 with rule, no draft on any run - brand.json: 3/3 with rule, no draft on any run Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-sync `npm run build:schemas` output that pre-push hook (verify-version-sync) requires; main was missing the field on this branch base. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
server/src/addie/rules/behaviors.mdwith two clauses inside the existing 1-VERIFY / 2-POSITION / 3-CLOSE-LOOP structure.search_docs/get_schemashow overlap with existing primitives, the reply must open with what's already covered before anything else. Factually wrong premises (proposal extends a non-existent field) get a one-line correction first.draft_github_issueis NOT called in the same turn as verification. Coverage-leading reply first, draft on confirmation.Why
Same-Addie-different-answers failure on 2026-05-01: Jeffrey Mayer (DanAds) drafted four RFCs with web-Addie, posted to Slack, Slack-Addie pushed back on each with spec citations. Both surfaces use the same rules — the rule itself was missing the load-bearing instruction. The existing rule listed "does the spec already handle this?" as one of four bullets to consider; it didn't say "if yes, lead with that before drafting."
Validation
Eval at #3804 with
RFC_USE_LOCAL_PROMPT=1(validates against this branch's rules, not deployed prod) at N=3 runs per scenario:Under the rule,
draft_github_issuedoes not fire in the same turn as the verification on any of the 12 runs. The model leads with "most of this is already covered: pricing_options + buying_mode: refine + account" and offers to draft a narrower scope on confirmation — exactly the Slack-Addie behavior from Jeffrey's screenshot.Dependencies
Should land after #3804 (eval framework). The eval is the regression check that this rule edit holds up over time.
Files
server/src/addie/rules/behaviors.md— the rule edit.changeset/addie-spec-feedback-coverage-rule.md— empty changeset (non-protocol change)static/schemas/source/index.json— pre-existing main-branch hygiene fix the pre-push hook required (one-line"published_version": "3.0.3"added bynpm run build:schemas)Test plan
RFC_USE_LOCAL_PROMPT=1shows 3/3 on CPQ at N=3