feat(ce-brainstorm,ce-plan): surface agent's scope synthesis before doc-write#705
Captures the cause-fix framing (scope under-visibility upstream → density downstream) and the four-requirement plan (R1 ce-brainstorm synthesis, R2 ce-plan synthesis, R3 anti-expansion). Depth-calibration mechanisms deferred to follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces the agent's interpretation (Stated / Inferred / Out-of-scope) to the user before the requirements doc is written, so scope misinterpretation is caught at the cheapest moment rather than discovered post-write. Fires for all tiers including Lightweight (transition checkpoint value).
Headless mode skips the prompt and embeds the synthesis with the Inferred list omitted — pipelines consume without human review, so propagating un-validated agent inferences as authoritative content is unsafe.
Open prose feedback per Interaction Rule 5(a); option sets would leak the agent's framing of valid corrections. Soft-cut fires on circularity (same item revised twice), not iteration count — new-item revisions across rounds proceed without limit.
The confirmed synthesis becomes the first section of the requirements doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
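The soft-cut rule above (circularity, not iteration count) can be sketched as a small counter. This is a minimal illustration only: the skill itself is prompt text, and `RevisionTracker`, its threshold parameter, and the item labels are hypothetical names introduced here, not anything from the PR.

```python
from collections import Counter
from typing import Iterable, List


class RevisionTracker:
    """Counts how often each synthesis item has been revised across feedback rounds.

    Hypothetical helper: the threshold of 2 mirrors "same item revised twice".
    """

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record_round(self, revised_items: Iterable[str]) -> List[str]:
        """Record one round and return the items that now trigger a soft-cut."""
        items = set(revised_items)  # one revision per item per round
        self.counts.update(items)
        return sorted(i for i in items if self.counts[i] >= self.threshold)


tracker = RevisionTracker()
tracker.record_round(["auth scope"])               # new item, no soft-cut
tracker.record_round(["export format"])            # another new item, still none
hits = tracker.record_round(["auth scope", "retention"])
print(hits)  # only the item revised a second time is flagged
```

Rounds that only touch new items never trip the threshold, which matches the "proceed without limit" behavior described above.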
…eanup
Solo invocation (no upstream brainstorm doc) hits Phase 0.4's brief bootstrap and then runs sub-agent dispatch in Phase 1 — substantial inference happens between user input and research, and the user has no checkpoint to validate scope before research effort is spent.
Phase 0.7 surfaces the agent's synthesis (Stated / Inferred / Out-of-scope) post-bootstrap and pre-research, so scope misinterpretation is caught at the cheapest moment.
Guards:
- skips on Phase 0.1 fast paths (resume / deepen)
- skips when Phase 0.4 routes out to ce-debug / ce-work / universal-planning
- skips when Phase 0.2 found a brainstorm doc (defers to Phase 5.1.5 — coming in next commit)
Headless mode skips the prompt and embeds with Inferred omitted.
Folds in cleanup of stale SLFG references in this skill (4 hits: SKILL.md lines 781/798/847 + plan-handoff.md + universal-planning.md). The SLFG skill no longer exists. ce-code-review still has one stale reference; deferred to follow-up since that's a different skill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When ce-plan inherits an upstream brainstorm doc, the brainstorm + R1 synthesis already validated WHAT to build. Plan-time decisions about HOW (which files/modules to touch, which patterns to extend vs. introduce new, test scope, refactor scope) emerge during research and structuring — and those are exactly the decisions a user might want to correct before the plan commits to disk.
Phase 5.1.5 surfaces those plan-time decisions as a synthesis (Stated / Inferred / Out) just before Phase 5.2 plan-write. Brainstorm-validated WHAT is assumed and not re-stated; the synthesis focuses on plan-perspective.
Guards:
- skips on Phase 0.1 fast paths (resume / deepen)
- skips in solo invocation (defers to Phase 0.7)
Graceful fallback for upstream brainstorms that pre-date the R1 synthesis section — Phase 5.1.5 runs as normal because plan-time decisions are independent of upstream synthesis presence.
Headless mode skips prompt and embeds with Inferred omitted, matching the behavior of R1 (ce-brainstorm) and the Phase 0.7 solo variant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
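The guards for the two ce-plan checkpoints (Phase 0.7 for solo runs, Phase 5.1.5 for brainstorm-sourced runs) are mutually exclusive. A minimal sketch of that selection follows. The real skill expresses this as prose in SKILL.md, so `PlanContext` and `synthesis_phase` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanContext:
    fast_path: bool           # Phase 0.1 resume / deepen
    routed_out: bool          # Phase 0.4 handed off to ce-debug / ce-work / universal-planning
    has_brainstorm_doc: bool  # Phase 0.2 found an upstream brainstorm doc


def synthesis_phase(ctx: PlanContext) -> Optional[str]:
    """Return which synthesis checkpoint runs, or None when both are skipped."""
    if ctx.fast_path or ctx.routed_out:
        return None
    # Brainstorm-sourced plans defer to the pre-write checkpoint;
    # solo plans validate scope before research starts.
    return "5.1.5" if ctx.has_brainstorm_doc else "0.7"


print(synthesis_phase(PlanContext(False, False, False)))
print(synthesis_phase(PlanContext(False, False, True)))
```

Exactly one checkpoint fires per run, so a user is never asked to confirm the same scope twice within ce-plan.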
…anup
Reinforces R2's scope discipline at plan-time: when research surfaces adjacent refactors, "while we're here" cleanups, or scope-adjacent nice-to-haves, they route to the existing Deferred to Follow-Up Work subsection rather than being included in active Implementation Units. Distinct from Phase 3.6 (unknowns at plan time) — 3.7 covers known but tangential work.
The user's confirmed scope is what the active plan executes; everything else defers. User's explicit ask overrides — if they asked for a refactor, it's in-scope.
Does NOT impose architectural bias on extend-vs-invent decisions within confirmed scope. That judgment stays with the agent and is surfaced via the Phase 5.1.5 synthesis when material. (Pressure-tested during planning and rejected as risking perpetuation of bad patterns when existing code is the problem; recorded in plan's Key Technical Decisions.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s templates
Bullets are good for scanning specifics, but a 1-3 line prose summary above them gives users the gist. They may agree with each individual Stated bullet but disagree with the overall framing — the prose surfaces the gestalt that bullets fragment, and forces the agent to commit to a synthesis-as-narrative the user can pattern-match against intent.
Required for Standard / Deep tiers; skipped for Lightweight when bullets ARE the summary (no value in restating).
Per-variant framing:
- ce-brainstorm R1: gist of WHAT is being proposed (product behavior)
- ce-plan R2 solo: gist of what scope the plan targets
- ce-plan R2 brainstorm-sourced: gist of HOW the implementation approaches the work (WHAT is assumed from brainstorm)
Anti-fluff guidance: lead with the actual thing in plain words; no qualifiers, no re-stating context the user just lived through. If the prose can't say what the work is in 1-3 lines without filler, the synthesis isn't ready yet.
The prose summary stays in headless mode (it summarizes what's in the doc, not the un-validated agent inferences that get omitted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sting
The branch's initial Phase 2.5/0.7/5.1.5 design embedded the three-bucket
synthesis structure into the doc as a parallel `## Synthesis` first
section, with an italic capture-context note. Scenario A testing surfaced
several issues: synthesis bullets duplicated downstream sections (Stated
shadowed Requirements, Out shadowed Scope Boundaries), the italic
capture-context note leaked engineering process into the artifact, and the
agent loaded the synthesis-summary reference inconsistently — composing
syntheses from memory of SKILL.md alone, missing the prose-summary
requirement and decision-level discipline rules entirely.
Changes:
- Three-bucket structure becomes a chat-time-only artifact. After user
confirmation, only the prose summary embeds in the doc as `## Summary`.
Bucket content distributes into doc body sections: Stated to
Requirements, Inferred to Key Decisions, Out-of-scope to Scope
Boundaries.
- Drop the italic capture-context note ("Captured at Phase 2.5...") from
the doc template — process exhaust that future readers do not need.
- Drop the `## Next Steps` doc section — process exhaust duplicating the
chat-time handoff (Phase 4 already presents next-step options).
- Headless mode: skip Phase 2.5/0.7/5.1.5 entirely. No user to confirm to;
composing a synthesis only to discard before doc-write is ceremony. Doc
is mode-agnostic — interactive and headless produce structurally
identical artifacts.
- Add Summary vs Problem Frame discipline: forward-looking proposal vs
backward-looking situation. Problem Frame must not restate the proposal.
- Add "no fourth status" rule: every scope-shaping question by synthesis
time must be in Stated, Inferred, or Out — open questions surfaced
outside the buckets duplicate the prompt and give no resolution path.
- Force reliable reference load: STOP language at the top of each phase
point, naming the load-bearing rules in the reference and the failure
modes that result from skipping the load. Strip duplicated rule content
from SKILL.md (three-bucket description, prose-feedback paragraph,
soft-cut paragraph, self-redirect paragraph) so the agent has no inline
alternative that competes with the reference.
- Plan template: differentiate sections by origin presence (Summary,
Problem Frame, Requirements, Key Technical Decisions, Open Questions)
with single-sentence guidance per section. Brainstorm-sourced plans
reference origin for product context; solo plans carry both WHAT and
HOW.
- Plan template: rename `## Requirements Trace` to `## Requirements`. The
"trace" framing was specific to brainstorm-sourced plans and read odd in
solo plans where there's no upstream to trace to.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ements and legacy Requirements Trace section names
The ce-plan plan template renamed `## Requirements Trace` to `## Requirements` (parent commit). Existing plan docs in user repos still use the legacy name, and tooling that reads plan docs (ce-work, ce-work-beta, ce-code-review) needs to recognize both forms during the transition.
Changes:
- ce-work and ce-work-beta SKILL.md and shipping-workflow.md: section lookup accepts `Requirements` or legacy `Requirements Trace`
- ce-code-review SKILL.md: same — Stage 2 plan parsing accepts both
- ce-plan/references/deepening-workflow.md: rename internal references to `Requirements` (this file describes plans this repo writes, not legacy plans)
No breaking change: tooling continues to parse old plan docs. New plans use the new section name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
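The dual-name lookup can be illustrated with a small parser. This is a sketch under assumptions: the skills in this PR are prompt files that instruct an agent to accept both headings, so `find_requirements_section` is a hypothetical helper showing the matching rule, not code from the repository.

```python
import re
from typing import List, Optional

# Current heading first, legacy heading second.
ACCEPTED_HEADINGS = ("Requirements", "Requirements Trace")


def find_requirements_section(markdown: str) -> Optional[str]:
    """Return the body of the first level-2 section with an accepted heading."""
    body: Optional[List[str]] = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.*?)\s*$", line)  # level-2 headings only
        if heading:
            if body is not None:
                break  # the next level-2 heading ends the section
            if heading.group(1) in ACCEPTED_HEADINGS:
                body = []
            continue
        if body is not None:
            body.append(line)
    return None if body is None else "\n".join(body).strip()


new_plan = "## Summary\nx\n## Requirements\n- R1\n## Scope Boundaries\ny"
legacy_plan = "## Requirements Trace\n- R1\n- R2\n"
print(find_requirements_section(new_plan))
print(find_requirements_section(legacy_plan))
```

Matching on an explicit allow-list rather than a prefix keeps an unrelated heading such as `## Requirements Backlog` from being picked up by accident.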
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3c8c1bd8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Brainstorm and plan handoff messages reported the doc location with relative paths (e.g., `docs/brainstorms/...`). Modern terminals (Warp, iTerm2, ghostty, kitty) only auto-link absolute paths and `file://` URIs as clickable references — bare relative paths and `./`-prefixed paths are not detected, and Claude Code's markdown renderer does not emit OSC 8 hyperlinks for `[label](url)` syntax to bridge the gap. Net effect: users couldn't click the path to open the doc they just created. Manual copy-paste was the workaround.
Changes:
- ce-brainstorm/references/handoff.md: placeholder updated to `<absolute path to requirements doc>` in both preamble templates; one-sentence rule added.
- ce-plan/references/plan-handoff.md: same — `<absolute path to plan>` in the post-generation question; one-sentence rule added.
Doc bodies and commit messages keep relative paths (portability across machines and worktrees) — this rule applies only to chat output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
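A minimal sketch of the chat-output rule, assuming the repo root is known to the caller. `chat_path` and the example paths are hypothetical; `PurePosixPath` is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def chat_path(doc_path: str, repo_root: str) -> str:
    """Return the absolute form of a doc path for use in chat handoff messages.

    Doc bodies and commit messages would keep the relative form; only the
    chat-facing string is made absolute.
    """
    path = PurePosixPath(doc_path)
    if not path.is_absolute():
        path = PurePosixPath(repo_root) / path  # "./" prefixes are dropped by pathlib
    return str(path)


print(chat_path("docs/brainstorms/x-requirements.md", "/home/u/repo"))
print(chat_path("./docs/plans/p.md", "/home/u/repo"))
```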
…n; don't expand it
Scenario A re-runs surfaced that Phase 2.5 (cli-printing-press)
and Phase 5.1.5 (busyblock) syntheses leak plan-body content into
synthesis bullets — column names, table.column references, file
paths with line numbers, exact JSON shapes, HTTP status codes,
exact event/log type names, SQL syntax. The agent only re-cut at
the right level when prompted with explicit examples; first-pass
output was implementation-flow narrative, not affirmable
decisions.
The existing rules forbade specific banned shapes ("file paths,
JSON shapes, exact error wording") but didn't state the positive
principle. A future leak in a shape not on the banlist still
slipped past — e.g., HTTP codes, exact wording of internal
identifiers — because the agent could rationalize "that's not on
the don't-include list."
Changes:
- ce-brainstorm/references/synthesis-summary.md: new section
"Granularity: name the decision; don't expand it" with
not-allowed list (paths, methods, JSON shapes, HTTP codes, SQL)
and three concrete bad-vs-good example pairs from the
cli-printing-press testing (manifest discovery, provenance
recording, reuse-signal copy).
- ce-plan/references/synthesis-summary.md: new shared section
with the same principle, allowed/not-allowed lists, variant-
specific line drawing (solo stricter than brainstorm-sourced
since brainstorm-validated WHAT hasn't constrained scope yet),
and four bad-vs-good example pairs from the busyblock testing
(timezone source, skip filter integration, reactivation guard,
partial cleanup failure response).
Both versions share the same test for the agent at runtime: a
scanner reading an Inferred bullet should affirm or reject it
without needing to read code. If they would have to look up a
column name, method name, or call graph to evaluate the bullet,
the granularity is wrong — that's plan-body content.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pe, not architecture
The busyblock pause-rule scenario surfaced architecture leakage in
Phase 2 approach descriptions. The agent's three approaches each
named specific implementation surfaces ("two new nullable
timestamp columns on syncRules", "RuleMatcher adds one check",
"reuse the existing excludeConditions JSON column", "new table
keyed by ruleId") rather than mechanism-level distinctions. Phase
2.5 caught and filtered these out, but Phase 2 itself shouldn't
produce that level of detail.
The shape failure forces the user to make architectural decisions
during brainstorming on ce-brainstorm's intentionally-shallow
research. ce-plan's research phase goes deeper; architecture
decisions should land there, with better data, not be locked in
at brainstorm time. ce-brainstorm answers WHAT to build; column
names, table names, file paths, service classes, JSON shapes
belong in ce-plan.
Change: ce-brainstorm SKILL.md Phase 2 ("Explore Approaches")
gains an "Approach granularity" rule. Approach descriptions name
mechanism-level distinctions ("pause as a rule property" vs
"pause as an event filter" vs "pause as a separate entity") and
product-relevant trade-offs (plan-tier coupling, complexity
surface, migration difficulty). They do not name implementation
specifics. The rule explicitly cites why: bringing architecture
forward at brainstorm time forces premature commitment, and the
synthesis at Phase 2.5 then has to filter out the leak instead of
carrying it forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sition sentence to the remedy
The busyblock requirements doc's Problem Frame ended with a soft proposal-restatement: "A dedicated pause primitive collapses both pains into a single date-range action and removes the human-memory dependency on remembering to re-enable the rule on return." That single transition sentence violates the Summary vs Problem Frame discipline — Problem Frame is supposed to establish the situation and stop on the pain, with the remedy living in Summary.
The existing rule said "establishes the situation, the specific moment of pain, and the cost shape — then stops." The agent slipped past it because "then stops" left room for a closing transition sentence. The doc reader hits the proposal twice: once in Summary, once at the end of Problem Frame. That's the exact duplication the discipline is meant to prevent.
Change: extends the Problem Frame discipline bullet in requirements-capture.md with an explicit failure-mode example — the busyblock-shaped sentence is shown verbatim as a sign to cut. Adds a positive instruction: if the last paragraph of Problem Frame names what the doc is proposing, cut it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecee14a98e
…lates
The earlier commit 274a332 updated the absolute-path rule in references/plan-handoff.md but missed two inline copies of the handoff templates duplicated in ce-plan SKILL.md:
- Phase 5.2 confirmation: "Plan written to docs/plans/[filename]"
- Phase 5.4 menu question: "Plan ready at `docs/plans/YYYY-MM-DD-NNN-<type>-<name>-plan.md`..."
Real-world testing surfaced this: an agent in C1 testing correctly followed SKILL.md (always-loaded layer) and reported a repo-relative path, leaving the path unclickable in the user's terminal. The agent's pushback when challenged ("the skill is internally consistent on repo-relative — no bug") was right about its loaded source, but the loaded source was inconsistent with the reference.
The reference plan-handoff.md already had the absolute-path rule and updated placeholder. SKILL.md had the conflicting inline copy. SKILL.md wins at runtime because it loads at session start; the reference loads on-demand and gets overridden by the already-loaded SKILL.md content.
Both inline templates now use `<absolute path to plan>`. The brief inline rationale ("use absolute path so the reference is clickable in modern terminals") is intentionally short — fuller context lives in the reference.
ce-brainstorm SKILL.md does not have the same duplication (Phase 4 just delegates to references/handoff.md), so no mirror edit needed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…to-route with accessibility-checked suggestion
Two findings from C1 testing (sparse-input solo bug fix on
cli-printing-press), bundled because both touch ce-plan SKILL.md
and emerged from the same test pass.
1. Problem Frame omission at Lightweight tier
----------------------------------------
The C1 plan doc skipped Problem Frame entirely on a focused
bug fix. The agent's instinct was right — Summary already
carried the situational context for a one-unit Lightweight
plan — but the spec didn't license it. Adding explicit license
to omit Problem Frame at Lightweight when Summary covers the
situation, so the agent doesn't have to rationalize.
Edit: extends the plan template's Problem Frame description to
note "Omit entirely at Lightweight tier when Summary already
carries the situational context."
2. ce-debug routing: replace auto-route with accessibility-checked
suggestion
----------------------------------------
The existing Phase 0.4 rule auto-routed to ce-debug when the
prompt was "symptom without a root cause." Two problems with
that:
a. The auto-route is more aggressive than ce-work's adjacent
"suggest alongside continuing" pattern. Asymmetric.
b. Auto-routing assumes ce-debug can act, but ce-debug requires
the buggy code to be accessible from cwd. Cross-repo bugs,
dependency-related bugs, and bugs about repos the user
isn't currently in produce useless ce-debug runs because
ce-debug can't read the relevant code.
Even *suggesting* ce-debug for inaccessible code is worse than
not — the user takes the suggestion, ce-debug switches in,
produces nothing useful, and trust in the suggestion mechanism
erodes.
New rule: bug-shaped prompts get ce-debug surfaced as a
route-out option alongside continuing with ce-plan, BUT only
after a quick accessibility check passes:
- No surface named in prompt → assume cwd, surface ce-debug
- Surface named matches cwd (files exist locally, named repo
matches cwd identity) → surface ce-debug
- Surface named clearly doesn't match cwd (different repo,
files not found locally) → do NOT surface ce-debug. Stay in
ce-plan silently — paper-planning is valid for cross-repo
work.
Check is conservative — it under-suggests in monorepos,
dependency bugs, or after renames. The spec explicitly notes
that users can manually invoke /ce-debug when the check
misses, accepting under-suggest as the right side to err on.
Symmetrizes with the existing ce-work routing pattern (suggest
alongside continuing, user decides). No more asymmetric
auto-route on bug-shaped prompts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6071ccc1a7
…oss-repo cd
K2 testing surfaced that the binary in-cwd-vs-not check was wrong. The previous rule said: bug surface not in cwd → stay in ce-plan silently. In testing, the agent ignored that and did `cd /path/to/other-repo && grep ...` to investigate the named repo from disk. The behavior was helpful (the bug WAS reachable; paper-planning would have been useless) but it was a silent context switch the user did not authorize.
The binary rule conflated "different cwd" with "inaccessible." In practice there are three states:
1. In cwd — bug surface is in current repo (named repo matches, or no specific repo named, or named files exist locally). Suggest ce-debug as route-out option. Same as before.
2. On disk but not in cwd — different repo is named, and a quick disk check confirms it's checked out at another local path. This is the case the binary rule got wrong. New rule: ASK the user explicitly with three options (investigate from the other path / paper-plan from current cwd / switch context first). Do NOT silently cd to the other repo and start investigating, even though the code is reachable. The agent's role is to surface the cross-repo signal, not to decide the context unilaterally.
3. Not on disk — named repo isn't found anywhere local. Stay in ce-plan silently and paper-plan. Same as before.
State 2 is the new addition. It captures the K2 failure mode specifically and prescribes the right behavior (ASK, don't auto-investigate or auto-stay-silent). The "don't silently cd" anti-pattern is named explicitly so the agent has direct guidance against the K2 behavior pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…default outputs to target repo
The previous three-state rule (in 880f9a5) was overcorrected. It treated silent cd to another local repo as a context switch requiring user permission via blocking question. On reflection, that conflates two different concerns:
- Reading files at another path: not actually a context switch. The agent reads files via absolute paths all the time. Whether cwd matches the file path is incidental to file access.
- Writing outputs (plan doc) to the wrong location: this IS the actual harm. A busyblock bug plan written to `cli-printing-press/docs/plans/` is a discoverability disaster — the user goes to busyblock to act on it and can't find it.
The light-touch fix: announce the target repo explicitly before any cross-repo investigation, default plan outputs to the target repo's `docs/plans/`, and let the user interrupt if they want different behavior. No blocking question needed.
Changes:
- Collapse the previous three states ("in cwd" / "on disk elsewhere ASK" / "not on disk") into two ("reachable" / "unreachable"). Reachable surfaces ce-debug as option; not-reachable stays in ce-plan silently for paper-planning.
- Add explicit announcement requirement when the bug is at another local path: name the path being read AND the default plan output destination (target repo's `docs/plans/`) before any investigation.
- State the actual harm explicitly: silent investigation isn't the problem; silent operation on the wrong repo IS, especially output destination.
- User can interrupt to redirect: write plan in cwd, switch context first, paper-plan only.
The blocking-question requirement was friction that didn't match the underlying concern. Announce-and-proceed is the right shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d40aa653d5
…arate ce-debug routing
K2 retest revealed two more issues with the previous cross-repo
rule:
1. The 3-option menu I proposed (proceed / paper-plan from cwd /
switch context) merged two orthogonal decisions into one
question. Location ("where outputs land") and ce-debug routing
("which skill") are different axes; users picking
"Switch to /ce-debug" weren't told whether ce-debug operates
on the target repo or current cwd, leaving the option
underspecified.
2. The "paper-plan from current cwd" option doesn't cleanly map
to a real workflow. Cases that might want it (think
abstractly, capture for later, no-investigation plan) are
better served by /ce-brainstorm, an issue ticket, or paper-
planning to the TARGET repo. Producing a plan-for-busyblock
in cli-printing-press/docs/plans/ is a discoverability
disaster — the same harm we built the announcement to
prevent.
Simpler design that emerged in conversation:
- Drop the location menu entirely. The announcement makes the
cross-repo nature visible; the user can interrupt if they
want unusual behavior. No need to enumerate options that have
thin or contradictory use cases.
- Default behavior: proceed from target repo for both
investigation and plan output. Respects the user's stated
intent (they named that repo) without requiring confirmation.
- After announcing + proceeding, fire the standard ce-debug
routing menu — same shape as in-cwd case. Cross-repo location
and skill routing are explicitly orthogonal.
Net effect: cleaner UX, less ceremony, same protection against
the actual harm (silent cross-repo operation, mis-filed plan).
The cross-repo case now mostly behaves like the in-cwd case
plus an announcement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
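The announce-and-proceed design described in this commit can be sketched as a small routing function. Everything here is illustrative: the skill is prompt text, so `Routing`, `route_bug_prompt`, the argument names, and the announcement wording are assumptions. The output directory for the unreachable case is left as `None` because the commit messages do not specify it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Routing:
    announcement: Optional[str]     # shown before any cross-repo investigation
    plan_output_dir: Optional[str]  # None where the commit messages do not say
    offer_ce_debug: bool            # whether the ce-debug routing menu fires


def route_bug_prompt(cwd: str, named_repo: Optional[str] = None,
                     local_checkout: Optional[str] = None) -> Routing:
    """Decide location first, then skill routing, as two separate axes."""
    if named_repo is None or local_checkout == cwd:
        # Bug surface is in the current repo: no announcement needed.
        return Routing(None, str(PurePosixPath(cwd) / "docs/plans"), True)
    if local_checkout is None:
        # Named repo is not on disk: stay in ce-plan silently and paper-plan.
        return Routing(None, None, False)
    # Reachable at another local path: announce, default outputs to the target repo.
    out_dir = str(PurePosixPath(local_checkout) / "docs/plans")
    note = (f"Bug surface is in {named_repo} at {local_checkout}; "
            f"reading from there and writing the plan to {out_dir}.")
    return Routing(note, out_dir, True)


cross = route_bug_prompt("/work/cli-printing-press", "busyblock", "/work/busyblock")
print(cross.announcement)
```

The cross-repo branch differs from the in-cwd branch only by the announcement and the output directory, which reflects the "in-cwd case plus an announcement" framing above.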
…thoring AGENTS.md
This branch's testing-driven tightening surfaced design lessons worth capturing for future skill edits. Adds five principles to the plugin's authoring AGENTS.md (which guides contributors editing skills, not runtime agent behavior — that's already correctly excluded per the "Runtime vs Authoring Context" rule).
Principles distilled from this session's testing:
1. Calibrate prescription to the failure mode — three levels: hard rules for deterministic safety; strong guidance with examples for judgment; trust where prescription would harm. Lean toward less prescription when in doubt.
2. SKILL.md caches at session start; references load on demand. Load-bearing rules need strong language at the top of SKILL.md phase, not just in references. SKILL.md and references that share rules must be updated together to avoid drift.
3. Split orthogonal decisions into sequential questions. Don't conflate location with skill routing, or other multi-axis questions, into a single menu — options become underspecified.
4. Process exhaust stays out of artifacts. Phase capture notes, "next steps" pointers to other skills, mode markers — don't leak engineering process into user-facing docs.
5. Test the spec by running it. Real-world tests reveal failure modes desk review misses. Before tightening: ask whether the agent's behavior was actually wrong, whether SKILL.md and references drifted apart, and whether this is load-reliability vs rule-content failure. Sometimes the fix is to loosen, not to tighten.
These were earned from the testing-driven tightening on this branch — the load-reliability fix (Phase 2.5 strong-language load instruction), the SKILL.md vs reference inconsistency on absolute paths in handoff, the cross-repo K2 menu conflation that we then split into sequential decisions, and the broader move from over-prescriptive auto-route rules to announce-and-proceed defaults.
Authoring AGENTS.md only — does not ship with the installed plugin per the existing "Runtime vs Authoring Context" rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ficial-prescription cases explicitly
The first pass of the Skill Design Principles section had a
one-sided closing bias ("when in doubt, lean toward less
prescription") that under-sold how often prescription IS the
right call in this plugin's actual work.
Real audit: this branch alone uses prescription in many
load-bearing places — strong-language load instructions, the
three-bucket synthesis structure, decision-level granularity
with bad-vs-good examples, Summary vs Problem Frame discipline,
"no fourth status" rule, Phase 3.7 anti-expansion, "re-present
after revision," "absolute paths in chat handoff," "no silent cd
to other repos." Most of these are essential, not optional.
Changes:
- Add concrete examples of beneficial prescription from this
plugin to the "hard rules" and "strong guidance" levels.
Future contributors can see what the levels actually look
like in practice.
- Replace the one-sided "lean toward less prescription" bias
with a balanced framing: match the level to the failure mode
in both directions. Both over-prescription and under-
prescription have real failure modes, and the plugin's actual
prescription pattern is closer to "match precisely" than
"lean loose."
- Add a concrete diagnostic: can you name a specific bad outcome
the prescription prevents? If yes, it's justified. If no,
lean toward trust. This gives contributors a sharper question
than "do I feel like this is too prescriptive."
Net: the calibration ladder remains the same shape, but the
guidance is balanced and reflects the actual prescription
profile this plugin uses successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed example lists
Previous edit added 5+5 enumerated examples per level which
weren't asked for and bloated the section. The actual fix needed
was just balancing the closing bias so contributors don't read
it as "never be prescriptive."
Restore one example per level (concise), keep the balanced
framing ("match the level to the failure mode in both
directions"), keep the diagnostic question ("can you name a
specific bad outcome the prescription prevents?").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0a9d6ac74
…ns, fast-path announce-mode, plus 8 alignment fixes
PR #705 review surfaced gaps from the testing-driven tightening that shipped earlier on this branch. Most notable: a philosophical reframe of what "headless mode" means.
The big reframe: headless = non-interactive, not unaudited
----------------------------------------------------------
Earlier commits on this branch landed "skip Phase 2.5 / 0.7 / 5.1.5 entirely in headless." Justification: process-exhaust principle. But this conflated two concepts:
- Headless mode means no synchronous user during the run.
- It does NOT mean no human ever reviews the artifact: ce-doc-review, ce-plan, and human PR reviewers all read the doc later.
When we skip synthesis composition entirely, the artifact has no way to surface which decisions were user-stated vs agent-inferred. Un-validated agent bets propagate as authoritative decisions, indistinguishable from confirmed scope.
New shape:
- Compose the synthesis in non-interactive mode (forcing function preserved).
- Stated → Requirements (user's actual constraints).
- Out-of-scope → Scope Boundaries.
- Inferred → new `## Assumptions` section, explicitly labeled as un-validated agent bets that downstream review must scrutinize.
Do NOT route Inferred to Key Technical Decisions in non-interactive mode; that hides un-validated bets as authoritative content.
The `## Assumptions` section appears in non-interactive docs only. Interactive docs are unchanged: Inferred bets get user-corrected in chat and either become Key Technical Decisions or are revised away.
This restores the original design's intent (un-validated bets must not propagate as authoritative content) but surfaces them under their own label rather than hiding them.
The Phase 0.2 fast path: announce-mode
--------------------------------------
Codex pointed out Phase 0.2's "requirements already clear" fast path goes straight to Phase 3, bypassing Phase 2.5 entirely.
Fix: fast path now routes through Phase 2.5 in announce-mode: emit the synthesis for visibility, then proceed to Phase 3 without blocking. User can interrupt if they spot a wrong inference. Preserves visibility on clear-input cases without adding interaction cost.
Eight alignment fixes
---------------------
- Add `Summary` to ce-doc-review's framing-section list (premise-chain root detector now recognizes the new heading; `Overview` retained as legacy).
- Distinguish per-variant timing in plan-side headless guidance: solo Phase 0.7 fires before research; brainstorm-sourced Phase 5.1.5 fires after research. Earlier text incorrectly said "directly to plan-write" for both.
- Rename `## Key Decisions` to `## Key Technical Decisions` in plan-side reference's doc-shape routing table to match the canonical plan template heading.
- Tighten plan-side orientation: don't list "exact method signatures, JSON schemas" as plan-body content; Planning Rules (Phase 4.3) explicitly forbid those.
- Tighten brainstorm-side orientation: implementation detail goes to ce-plan, not the requirements doc. Earlier text said impl detail "belongs IN the doc" but requirements-capture.md forbids it.
- Add headless default for ce-debug routing: skip the suggestion menu, default to continuing with /ce-plan. No synchronous user to resolve the route-out choice.
- Update visual-communication.md: stale "Problem/Overview" reference now reads "Summary or Problem Frame" with `Overview` noted as legacy.
- Clarify AGENTS.md "process exhaust" principle: distinguish exhaust (agent bookkeeping) from audit content (downstream readers need it to evaluate the artifact). The `## Assumptions` section in non-interactive mode is audit content, not exhaust.
Threads addressed: 10 (all unresolved review threads).
Next: needs end-to-end test in fresh sessions to validate the Assumptions-routing and announce-mode behaviors in practice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
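The bucket routing described in this commit can be pictured as a small mapping function. The sketch below is illustrative only: the skills are prose specs, and `Synthesis` and `route_synthesis` are hypothetical names, not part of the plugin.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synthesis:
    """Hypothetical container for the three synthesis buckets."""
    stated: List[str] = field(default_factory=list)
    inferred: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)


def route_synthesis(synthesis: Synthesis, interactive: bool) -> Dict[str, List[str]]:
    """Map synthesis buckets to document sections.

    Assumption: in interactive mode the Inferred list passed in has
    already been corrected by the user in chat, so what remains can be
    treated as decisions. In non-interactive mode nothing was validated.
    """
    sections = {
        "Requirements": list(synthesis.stated),
        "Scope Boundaries": list(synthesis.out_of_scope),
    }
    if interactive:
        sections["Key Technical Decisions"] = list(synthesis.inferred)
    else:
        # Un-validated bets keep their own label for downstream review.
        sections["Assumptions"] = list(synthesis.inferred)
    return sections
```

The point of the branch is that the same Inferred content lands under a different heading depending on whether a human confirmed it, so a downstream reader can tell the two apart.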
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16d97ef367
…dary to announce-mode
Two threads from PR #705 review pass 2:
1. ce-plan/references/synthesis-summary.md had a stale orientation line saying "Both skip entirely in headless mode" while the detailed Headless section directly below specified compose + route to Assumptions. Same file, contradictory guidance; it would make automated behavior nondeterministic. Aligned the orientation with the detail block.
2. Announce-mode (Phase 0.2 fast path → Phase 2.5) emitted synthesis and immediately fired the Write tool in the same turn. In Claude Code's streaming UX, the user has no real interruption window between synthesis emission and doc-write; the Esc-during-stream theoretical interrupt is fragile in practice. Codex pointed out this undermines Phase 2.5's stated purpose as the final pre-write scope checkpoint.
Resolution: announce-mode now emits synthesis and ends the turn (no Write tool call in the same turn). On the user's next message: any acknowledgment proceeds to doc-write; any correction triggers synthesis revision. Lighter than full Phase 2.5 (no AskUserQuestion menu, no formal confirm) but gives the user a real interruption window before the doc lands. ce-brainstorm sits early in the workflow, and a wrong doc feeds downstream into ce-plan and implementation, so the turn boundary is justified even on the fast path.
Both fixes align with the Skill Design Principles section in AGENTS.md: hard rules for deterministic safety where the failure mode justifies it (announce-mode without a turn boundary fails to provide a real correction window in practice; the turn boundary is the safety condition).
Threads addressed: 2 of 2 new threads. Cross-invocation signal fired on this run (10 prior threads from round 1 are visible to the cross-invocation analysis); no clusters formed because the two new threads sit in different subtrees.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
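The turn boundary in announce-mode amounts to a two-step state machine. The following is a minimal sketch under stated assumptions: `Action` and `announce_mode_step` are hypothetical names, and deciding whether a reply is an acknowledgment or a correction is left to the caller (in practice, the agent's judgment).

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    EMIT_SYNTHESIS_AND_END_TURN = "emit_synthesis_and_end_turn"
    WRITE_DOC = "write_doc"
    REVISE_SYNTHESIS = "revise_synthesis"


def announce_mode_step(synthesis_emitted: bool,
                       reply_is_correction: Optional[bool]) -> Action:
    """Pick the next action for the announce-mode fast path.

    reply_is_correction is None until the user has sent a message
    after the synthesis was emitted.
    """
    if not synthesis_emitted:
        # First turn: show the synthesis and stop. No Write call here.
        return Action.EMIT_SYNTHESIS_AND_END_TURN
    if reply_is_correction is None:
        # The turn boundary has not been crossed yet, so writing the
        # doc now would remove the user's interruption window.
        raise ValueError("waiting for the user's next message")
    if reply_is_correction:
        return Action.REVISE_SYNTHESIS
    return Action.WRITE_DOC
```

Raising on a missing reply makes the safety condition explicit: doc-write is unreachable within the turn that emitted the synthesis.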
…d/Deep
Calibration call surfaced during PR #705 testing: a Lightweight synthesis with 16 bullets across three buckets is qualitatively different from a Lightweight synthesis with 2 bullets. The "Skip for Lightweight when bullets ARE the summary" rule was being applied to both, but only the second is the case the rule was designed for.
Result: detail-rich Lightweight syntheses (e.g., a single-flag addition with 7 Stated requirements) shipped without a prose gestalt. Reading 16 bullets without a 1-3 line "what's the gist?" forces the reader to construct framing themselves.
New rule: prose summary required for ALL tiers. Skip only for truly-trivial cases where the synthesis is ≤ 2 bullets that echo the prompt (e.g., "fix the typo on line 47" producing a synthesis of "Stated: fix the typo on line 47").
Updated:
- ce-brainstorm/references/synthesis-summary.md: prose summary discipline + prompt template placeholder
- ce-plan/references/synthesis-summary.md: same for both solo and brainstorm-sourced variants
- ce-brainstorm/references/requirements-capture.md: section matrix Lightweight column
- ce-plan/SKILL.md plan template: Summary description
The "When the synthesis would be redundant" section in the brainstorm reference is unchanged; it still describes the truly-trivial single-paragraph case correctly.
Per AGENTS.md Skill Design Principles, this tightens a rule that had a real failure mode (rote-feeling syntheses without a gestalt) without over-prescribing: there's still a truly-trivial escape hatch where the rule would create padding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
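The new rule can be read as a predicate over the synthesis bullets. This is a rough sketch, not the spec: `needs_prose_summary` is a hypothetical helper, and "echoes the prompt" is approximated here as a case-insensitive substring check after dropping a leading bucket label such as "Stated:".

```python
from typing import List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a loose comparison."""
    return " ".join(text.lower().split())


def needs_prose_summary(bullets: List[str], prompt: str) -> bool:
    """Return True unless the synthesis is truly trivial.

    Trivial means: at most 2 bullets, each of which merely echoes the
    prompt. Tier (Lightweight / Standard / Deep) plays no role.
    """
    if len(bullets) > 2:
        return True
    normalized_prompt = _normalize(prompt)
    for bullet in bullets:
        # Drop a bucket label like "Stated:" if one is present.
        body = _normalize(bullet.split(":", 1)[-1])
        if body not in normalized_prompt:
            return True
    return False
```

With the commit's own example, `needs_prose_summary(["Stated: fix the typo on line 47"], "fix the typo on line 47")` returns `False`, while any 16-bullet synthesis returns `True` regardless of tier.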
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f59b3d48ca
… Summary alignment
Two stale lines surfaced by Codex review pass 3, both contradictions I introduced and didn't sweep fully on the previous commits:
1. ce-brainstorm/references/synthesis-summary.md line 9 still said "Skip Phase 2.5 entirely in headless mode" while the detailed Headless mode section in the same file requires composing the synthesis and routing Inferred to `## Assumptions`. Same pattern I fixed on the plan side in eef4a69 but missed on the brainstorm side. Now aligned: "Phase 2.5 still fires: synthesis composed but not user-confirmed; Inferred bets route to `## Assumptions`."
2. ce-brainstorm/references/requirements-capture.md line 41 said Lightweight may omit Summary "when bullets are the summary", contradicting the section matrix above (updated in f59b3d4 to require Summary across all tiers, with only the truly-trivial escape). Now aligned: matches the matrix's "skip only when synthesis ≤ 2 bullets that echo the prompt" framing.
Pattern: my edits in commits eef4a69 and f59b3d4 each caught the primary location but didn't sweep all sibling locations referencing the same rule. Codex's incremental reviews are now my safety net catching these. Worth noting in AGENTS.md as a reminder for future spec edits: when changing a rule, grep for ALL co-located restatements of it, not just the headline location.
Threads addressed: 2 of 2 new (third review pass on PR #705).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…havioral-conditional requirements
Previous trigger criterion ("include when behavior is hard to pin
down without a concrete scenario") was judgment-based and produced
real variance: same `--quiet` flag synthesis, two test runs, one
agent generated 4 AEs and another generated 0. Both were defensible
reads of the spec.
The asymmetric failure mode favors more-inclusion: under-inclusion
makes downstream planners invent missing context; over-inclusion is
just ceremony. Per AGENTS.md "match prescription to failure mode,"
tightening here is justified.
New rule: AEs are REQUIRED for behavioral-conditional requirements
(any "When X, Y" or "If X, Y" framing) regardless of tier — even
Lightweight. Conditional framing signals state-dependent behavior
where prose alone leaves implicit ambiguity (e.g., "When --quiet is
set, errors continue to surface" — does that cover warnings? binary
errors? AE pins it down).
Non-conditional requirements remain triggered (Standard/Deep) or
omit-unless-triggered (Lightweight) per existing rules. The section
is still not exhaustive — AEs cover ambiguity, not every R-ID.
Updated:
- requirements-capture.md section matrix: Acceptance Examples row
- requirements-capture.md trigger criterion paragraph
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
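The trigger in this commit is mechanical enough to sketch as a pattern match. The helper below is a hypothetical illustration, not something shipped in the skill: it assumes requirement text is passed without an R-ID prefix and treats only a leading "When ..., ..." or "If ..., ..." as conditional framing.

```python
import re

# A requirement that opens with "When" or "If", followed by a clause,
# a comma, and then the consequence.
_CONDITIONAL = re.compile(r"^\s*(when|if)\b.+?,\s*\S", re.IGNORECASE)


def requires_acceptance_example(requirement: str) -> bool:
    """Return True for behavioral-conditional requirements.

    Tier is deliberately not a parameter: the rule applies to
    Lightweight, Standard, and Deep alike.
    """
    return bool(_CONDITIONAL.search(requirement))
```

For instance, `requires_acceptance_example("When --quiet is set, errors continue to surface")` returns `True`, while `requires_acceptance_example("Add a --quiet flag")` returns `False` and falls back to the existing per-tier rules.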
…oc-write (EveryInc#705)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
ce-brainstorm and ce-plan now catch scope misinterpretation BEFORE work is wasted. Agents echo back their interpretation of the user's request (what was Stated, what was Inferred, what's deliberately Out-of-scope) at the cheapest correction points — before sub-agent research is dispatched, before requirements docs are written, before plans commit to disk.
The cost of catching scope drift went from "rerun the skill / rewrite the doc" to "correct a bullet in chat and continue." For a typical `/ce-plan` invocation that means ~10-15 minutes of wasted research averted; for `/ce-brainstorm` it means a doc on disk that actually reflects user intent rather than the agent's assumptions.
Two additional results:
… `## Assumptions` section in the artifact. Downstream review can scrutinize them as bets, not mistake them for authoritative requirements.
Addresses #676 at the cause: the reporter described brownfield artifact bloat as the symptom; the originating brainstorm concluded scope under-visibility was the upstream cause. This iteration ships the upstream fix.
Phases added
Deferred to Follow-Up Work, not active diff
Synthesis prompts use Stated / Inferred / Out-of-scope structure plus a 1-3 line prose summary above the bullets. Open prose feedback (no menu) per Interaction Rule 5(a). In headless mode, Phase 2.5 / 0.7 / 5.1.5 still fire: the synthesis is composed but not user-confirmed, and Inferred bets route to a `## Assumptions` section.
Testing-driven tightening
Real-world testing surfaced gaps the desk-reviewed spec missed. Each fix commit traces to a specific test observation:
- … `## Synthesis` section in doc entirely. Only the prose summary embeds, as `## Summary`. Bucket content distributes into Requirements / Key Decisions / Scope Boundaries. No italic capture-context note. No `## Next Steps` section.
- `## Requirements Trace` → `## Requirements`; downstream consumers (ce-work, ce-work-beta, ce-code-review) recognize both names for back-compat with existing plan docs.
Skill Design Principles (captured for future contributors)
A new section in `plugins/compound-engineering/AGENTS.md` distills the meta-learnings:
Design decisions worth scrutiny
Solo Phase 0.7 fires pre-research; brainstorm-sourced Phase 5.1.5 fires pre-write — asymmetric by design. Solo has minimal pre-write interview; catching scope errors before sub-agent dispatch is where correction is cheapest. Brainstorm-sourced has validated WHAT, so research is well-targeted; plan-time decisions emerge during research, so pre-write catches them at the latest cheap moment.
Soft-cut on circularity, not iteration count. Blocking question fires only when the same item is revised twice. New-item revisions across rounds proceed without limit — revising different aspects of a wrong synthesis is exactly what the mechanism should support; a hard iteration cap would punish that.
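The circularity check can be sketched as a per-item revision counter. This is an illustrative sketch only: `RevisionTracker` and the idea of stable item identifiers are assumptions made for the example, since the skill itself describes the behavior in prose.

```python
from collections import Counter


class RevisionTracker:
    """Track which synthesis items the user has revised.

    The soft-cut keys on repeated revision of the SAME item, never on
    the number of feedback rounds.
    """

    def __init__(self) -> None:
        self._revisions = Counter()

    def record(self, item_id: str) -> bool:
        """Record one revision; return True when the soft-cut fires."""
        self._revisions[item_id] += 1
        # Fires on the second revision of the same item.
        return self._revisions[item_id] >= 2
```

Revising items "a", "b", and "c" in successive rounds never fires; revising "a" a second time does, which is the point at which the blocking question would be asked.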
No extend-over-invent bias within confirmed scope. An earlier draft included this; pressure-testing rejected it because "default to extending existing code" risks perpetuating bad patterns when existing code is the problem. The architectural extend-vs-invent decision stays with the agent and surfaces via brainstorm-sourced synthesis when material. Phase 3.7 (anti-expansion) handles legitimate scope-creep without imposing architectural bias.
Cross-repo handling: announce, don't ASK. Earlier drafts treated silent `cd` to a different local repo as a context switch requiring user permission. The actual harm is silent operation on the wrong repo (especially output destination), not file reads. The current rule announces target path AND output destination explicitly, then proceeds. User interrupts if they want different behavior. The previous "block until user confirms location" was over-prescription that conflated file access with context switching.
Deferred work
Density-control tools — calibrated tier exemplars and brevity passes for defensive sections — are deliberately deferred. The working hypothesis is that scope under-visibility was the upstream cause; density should follow from disciplined scope. The plan's Validation section names signals to watch (synthesis-correction rate, self-redirect rate, density signal); revisit at 30/60/90-day intervals.
Test plan
`bun test` (960 tests) and `bun run release:validate` pass at every commit.
The 11 fix commits are the real test plan, each driven by a specific real-world test scenario across two repos. Plugin caching at session start means behavioral validation requires fresh sessions after spec edits; the testing methodology surfaced this, and the AGENTS.md Skill Design Principles section captures it for future contributors.
References
- `docs/brainstorms/2026-04-24-surface-scope-earlier-requirements.md`
- `docs/plans/2026-04-26-feat-surface-scope-earlier-plan.md`