feat(ce-brainstorm,ce-plan): surface agent's scope synthesis before doc-write by tmchow · Pull Request #705 · EveryInc/compound-engineering-plugin

tmchow · 2026-04-27T07:16:46Z

Summary

ce-brainstorm and ce-plan now catch scope misinterpretation BEFORE work is wasted. Agents echo back their interpretation of the user's request (what was Stated, what was Inferred, what's deliberately Out-of-scope) at the cheapest correction points — before sub-agent research is dispatched, before requirements docs are written, before plans commit to disk.

The cost of catching scope drift went from "rerun the skill / rewrite the doc" to "correct a bullet in chat and continue." For a typical /ce-plan invocation that means ~10-15 minutes of wasted research averted; for /ce-brainstorm it means a doc on disk that actually reflects user intent rather than the agent's assumptions.

Two additional results:

- Non-interactive runs (LFG, batch invocations) now surface un-validated agent inferences in a labeled `## Assumptions` section in the artifact. Downstream review can scrutinize them as bets, not mistake them for authoritative requirements.
- Meta-principles are captured in the plugin's authoring AGENTS.md: calibration of prescription vs trust, SKILL.md vs reference load semantics, when to split menu decisions, and what counts as process exhaust vs audit content. Future skill edits inherit the discipline rather than rediscovering it through testing.

Addresses #676 at the cause: the reporter described brownfield artifact bloat as the symptom; the originating brainstorm concluded scope under-visibility was the upstream cause. This iteration ships the upstream fix.

Phases added

| Skill | Phase | Fires when | Surfaces |
|---|---|---|---|
| ce-brainstorm | Phase 2.5 | After Phase 2 approach selection, before Phase 3 doc-write. All tiers including Lightweight. | Scope synthesis before the requirements doc lands |
| ce-plan | Phase 0.7 | Solo invocation: after Phase 0.4 bootstrap, before Phase 1 research | Full-breadth scope synthesis (Phase 0.4 inferences are load-bearing) |
| ce-plan | Phase 5.1.5 | Brainstorm-sourced invocation: after Phase 1 research, before Phase 5.2 plan-write | Plan-time decisions only (HOW; brainstorm validated WHAT) |
| ce-plan | Phase 3.7 | Plan structuring | Anti-expansion: tangential cleanup → `Deferred to Follow-Up Work`, not active diff |

Synthesis prompts use Stated / Inferred / Out-of-scope structure plus a 1-3 line prose summary above the bullets. Open prose feedback (no menu) per Interaction Rule 5(a). Headless mode skips Phase 2.5 / 0.7 / 5.1.5 entirely — no user to confirm to; composing a synthesis only to discard before doc-write is ceremony.
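The ce-plan rows of the table, together with the guards listed in the commit messages (fast paths, route-outs, headless), can be read as a small decision function. The sketch below is only an illustration: the skills are prompt specs, not code, and `PlanInvocation` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanInvocation:
    """Hypothetical flags describing one /ce-plan run (names are illustrative)."""
    headless: bool = False            # non-interactive run: no user to confirm with
    fast_path: bool = False           # Phase 0.1 resume / deepen
    routed_out: bool = False          # Phase 0.4 handed off to another skill
    has_brainstorm_doc: bool = False  # Phase 0.2 found an upstream brainstorm doc


def ce_plan_synthesis_phase(inv: PlanInvocation) -> Optional[str]:
    """Return the synthesis phase that fires, or None when it is skipped."""
    if inv.headless or inv.fast_path or inv.routed_out:
        return None
    # Brainstorm-sourced runs confirm HOW just before plan-write;
    # solo runs confirm full scope before research is dispatched.
    return "5.1.5" if inv.has_brainstorm_doc else "0.7"
```

Exactly one of the two ce-plan synthesis phases fires per interactive run, which matches the "defers to" wording in the Phase 0.7 and Phase 5.1.5 commits.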

Testing-driven tightening

Real-world testing surfaced gaps the desk-reviewed spec missed. Each fix commit traces to a specific test observation:

| Concern | Tightening |
|---|---|
| Synthesis embedding shape | Drop `## Synthesis` section in doc entirely. Only the prose summary embeds, as `## Summary`. Bucket content distributes into Requirements / Key Decisions / Scope Boundaries. No italic capture-context note. No `## Next Steps` section. |
| Three-bucket structure | Explicit chat-time-only artifact; doesn't carry into doc as a parallel section |
| Decision-level granularity | "Name the decision; don't expand it" with bad-vs-good examples. Synthesis bullets must be affirmable/rejectable without reading code: no column names, file paths, JSON shapes, HTTP codes, or exact wording leak into bullets |
| Phase 2 approach granularity | Approach pros/cons name mechanism shape, not architecture. Implementation specifics belong to ce-plan, not ce-brainstorm |
| Problem Frame discipline | Strict stop on the pain, with no transition sentence to the remedy. Lightweight tier may omit Problem Frame entirely when Summary covers the situation |
| Plan template differentiation | Per-section guidance for brainstorm-sourced vs solo plans. Renamed `## Requirements Trace` → `## Requirements`; downstream consumers (ce-work, ce-work-beta, ce-code-review) recognize both names for back-compat with existing plan docs |
| Reference load reliability | "STOP. Read references/synthesis-summary.md before composing" at top of phase. Stripped duplicated rule content from SKILL.md so the reference is the single source of truth; competing inline content in SKILL.md was making the reference feel optional |
| Absolute paths in handoff | Chat output uses absolute paths (clickable in modern terminals via OSC 8 / auto-detection). Doc bodies and commit messages keep relative paths for portability |
| ce-debug routing | Replace auto-route with accessibility-checked suggestion. Bug-shaped prompt + reachable code → surface ce-debug as menu option. Cross-repo case: announce target repo and output destination, default proceed from target for both investigation and plan-write |

Skill Design Principles (captured for future contributors)

A new section in plugins/compound-engineering/AGENTS.md distills the meta-learnings:

- Calibrate prescription level to the failure mode: three rough levels (hard rules for deterministic safety; strong guidance with examples for judgment calls; trust where prescription harms). Match in both directions; can you name a specific bad outcome the prescription prevents?
- SKILL.md caches at session start; references load on demand. This has implications for where load-bearing rules live and why SKILL.md and references must stay in sync.
- Split orthogonal decisions into sequential questions: don't conflate "where to operate" with "which skill" into one menu.
- Process exhaust stays out of artifacts: engineering process metadata doesn't belong in user-facing docs.
- Test the spec by running it: desk review misses what real-world testing surfaces (load reliability, plugin caching, agent interpretation drift, conflation in menu shapes).

Design decisions worth scrutiny

Solo Phase 0.7 fires pre-research; brainstorm-sourced Phase 5.1.5 fires pre-write — asymmetric by design. Solo has minimal pre-write interview; catching scope errors before sub-agent dispatch is where correction is cheapest. Brainstorm-sourced has validated WHAT, so research is well-targeted; plan-time decisions emerge during research, so pre-write catches them at the latest cheap moment.

Soft-cut on circularity, not iteration count. Blocking question fires only when the same item is revised twice. New-item revisions across rounds proceed without limit — revising different aspects of a wrong synthesis is exactly what the mechanism should support; a hard iteration cap would punish that.
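To make the circularity rule concrete, here is a minimal sketch of the bookkeeping it implies. The skill expresses this as prose guidance to the agent; `RevisionTracker` and the item identifiers are hypothetical names used only for illustration.

```python
from collections import Counter


class RevisionTracker:
    """Counts revisions per synthesis item (hypothetical helper, not part of the plugin)."""

    def __init__(self) -> None:
        self._revisions = Counter()

    def record(self, item_id: str) -> bool:
        """Record one revision; return True when the blocking question should fire."""
        self._revisions[item_id] += 1
        # Fire on the second revision of the SAME item. The total number of
        # rounds or of distinct items revised never triggers the soft-cut.
        return self._revisions[item_id] >= 2
```

A user who corrects three different bullets across three rounds never trips the check; a user who revises the same bullet a second time does.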

No extend-over-invent bias within confirmed scope. An earlier draft included this; pressure-testing rejected it because "default to extending existing code" risks perpetuating bad patterns when existing code is the problem. The architectural extend-vs-invent decision stays with the agent and surfaces via brainstorm-sourced synthesis when material. Phase 3.7 (anti-expansion) handles legitimate scope-creep without imposing architectural bias.

Cross-repo handling: announce, don't ASK. Earlier drafts treated silent cd to a different local repo as a context switch requiring user permission. The actual harm is silent operation on the wrong repo (especially output destination) — not file reads. The current rule announces target path AND output destination explicitly, then proceeds. User interrupts if they want different behavior. The previous "block until user confirms location" was over-prescription that conflated file access with context switching.

Deferred work

Density-control tools — calibrated tier exemplars and brevity passes for defensive sections — are deliberately deferred. The working hypothesis is that scope under-visibility was the upstream cause; density should follow from disciplined scope. The plan's Validation section names signals to watch (synthesis-correction rate, self-redirect rate, density signal); revisit at 30/60/90-day intervals.

Test plan

bun test (960 tests) and bun run release:validate pass at every commit.

The 11 fix commits are the real test plan — each driven by a specific real-world test scenario across two repos. Plugin caching at session start means behavioral validation requires fresh sessions after spec edits; the testing methodology surfaced this, and the AGENTS.md Skill Design Principles section captures it for future contributors.

References

Origin: docs/brainstorms/2026-04-24-surface-scope-earlier-requirements.md
Plan: docs/plans/2026-04-26-feat-surface-scope-earlier-plan.md
Issue: EveryInc/compound-engineering-plugin#676

Captures the cause-fix framing (scope under-visibility upstream → density downstream) and the four-requirement plan (R1 ce-brainstorm synthesis, R2 ce-plan synthesis, R3 anti-expansion). Depth-calibration mechanisms deferred to follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaces the agent's interpretation (Stated / Inferred / Out-of-scope) to the user before the requirements doc is written, so scope misinterpretation is caught at the cheapest moment rather than discovered post-write. Fires for all tiers including Lightweight (transition checkpoint value). Headless mode skips the prompt and embeds the synthesis with the Inferred list omitted — pipelines consume without human review, so propagating un-validated agent inferences as authoritative content is unsafe. Open prose feedback per Interaction Rule 5(a); option sets would leak the agent's framing of valid corrections. Soft-cut fires on circularity (same item revised twice), not iteration count — new-item revisions across rounds proceed without limit. The confirmed synthesis becomes the first section of the requirements doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eanup Solo invocation (no upstream brainstorm doc) hits Phase 0.4's brief bootstrap and then runs sub-agent dispatch in Phase 1 — substantial inference happens between user input and research, and the user has no checkpoint to validate scope before research effort is spent. Phase 0.7 surfaces the agent's synthesis (Stated / Inferred / Out-of-scope) post-bootstrap and pre-research, so scope misinterpretation is caught at the cheapest moment. Guards: skips on Phase 0.1 fast paths (resume / deepen), skips when Phase 0.4 routes out to ce-debug / ce-work / universal-planning, skips when Phase 0.2 found a brainstorm doc (defers to Phase 5.1.5 — coming in next commit). Headless mode skips the prompt and embeds with Inferred omitted. Folds in cleanup of stale SLFG references in this skill (4 hits: SKILL.md lines 781/798/847 + plan-handoff.md + universal-planning.md). The SLFG skill no longer exists. ce-code-review still has one stale reference; deferred to follow-up since that's a different skill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When ce-plan inherits an upstream brainstorm doc, the brainstorm + R1 synthesis already validated WHAT to build. Plan-time decisions about HOW (which files/modules to touch, which patterns to extend vs. introduce new, test scope, refactor scope) emerge during research and structuring — and those are exactly the decisions a user might want to correct before the plan commits to disk. Phase 5.1.5 surfaces those plan-time decisions as a synthesis (Stated / Inferred / Out) just before Phase 5.2 plan-write. Brainstorm-validated WHAT is assumed and not re-stated; the synthesis focuses on plan-perspective. Guards: skips on Phase 0.1 fast paths (resume / deepen), skips in solo invocation (defers to Phase 0.7). Graceful fallback for upstream brainstorms that pre-date the R1 synthesis section — Phase 5.1.5 runs as normal because plan-time decisions are independent of upstream synthesis presence. Headless mode skips prompt and embeds with Inferred omitted, matching the behavior of R1 (ce-brainstorm) and the Phase 0.7 solo variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…anup Reinforces R2's scope discipline at plan-time: when research surfaces adjacent refactors, "while we're here" cleanups, or scope-adjacent nice-to-haves, they route to the existing Deferred to Follow-Up Work subsection rather than being included in active Implementation Units. Distinct from Phase 3.6 (unknowns at plan time) — 3.7 covers known but tangential work. The user's confirmed scope is what the active plan executes; everything else defers. User's explicit ask overrides — if they asked for a refactor, it's in-scope. Does NOT impose architectural bias on extend-vs-invent decisions within confirmed scope. That judgment stays with the agent and is surfaced via the Phase 5.1.5 synthesis when material. (Pressure-tested during planning and rejected as risking perpetuation of bad patterns when existing code is the problem; recorded in plan's Key Technical Decisions.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s templates

Bullets are good for scanning specifics, but a 1-3 line prose summary above them gives users the gist. They may agree with each individual Stated bullet but disagree with the overall framing; the prose surfaces the gestalt that bullets fragment, and forces the agent to commit to a synthesis-as-narrative the user can pattern-match against intent.

Required for Standard / Deep tiers; skipped for Lightweight when bullets ARE the summary (no value in restating).

Per-variant framing:

- ce-brainstorm R1: gist of WHAT is being proposed (product behavior)
- ce-plan R2 solo: gist of what scope the plan targets
- ce-plan R2 brainstorm-sourced: gist of HOW the implementation approaches the work (WHAT is assumed from brainstorm)

Anti-fluff guidance: lead with the actual thing in plain words; no qualifiers, no re-stating context the user just lived through. If the prose can't say what the work is in 1-3 lines without filler, the synthesis isn't ready yet.

The prose summary stays in headless mode (it summarizes what's in the doc, not the un-validated agent inferences that get omitted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sting

The branch's initial Phase 2.5/0.7/5.1.5 design embedded the three-bucket synthesis structure into the doc as a parallel `## Synthesis` first section, with an italic capture-context note. Scenario A testing surfaced several issues:

- Synthesis bullets duplicated downstream sections (Stated shadowed Requirements, Out shadowed Scope Boundaries).
- The italic capture-context note leaked engineering process into the artifact.
- The agent loaded the synthesis-summary reference inconsistently, composing syntheses from memory of SKILL.md alone and missing the prose-summary requirement and decision-level discipline rules entirely.

Changes:

- Three-bucket structure becomes a chat-time-only artifact. After user confirmation, only the prose summary embeds in the doc as `## Summary`. Bucket content distributes into doc body sections: Stated to Requirements, Inferred to Key Decisions, Out-of-scope to Scope Boundaries.
- Drop the italic capture-context note ("Captured at Phase 2.5...") from the doc template. It is process exhaust that future readers do not need.
- Drop the `## Next Steps` doc section. It is process exhaust duplicating the chat-time handoff (Phase 4 already presents next-step options).
- Headless mode: skip Phase 2.5/0.7/5.1.5 entirely. No user to confirm to; composing a synthesis only to discard before doc-write is ceremony. Doc is mode-agnostic: interactive and headless produce structurally identical artifacts.
- Add Summary vs Problem Frame discipline: forward-looking proposal vs backward-looking situation. Problem Frame must not restate the proposal.
- Add "no fourth status" rule: every scope-shaping question by synthesis time must be in Stated, Inferred, or Out. Open questions surfaced outside the buckets duplicate the prompt and give no resolution path.
- Force reliable reference load: STOP language at the top of each phase point, naming the load-bearing rules in the reference and the failure modes that result from skipping the load. Strip duplicated rule content from SKILL.md (three-bucket description, prose-feedback paragraph, soft-cut paragraph, self-redirect paragraph) so the agent has no inline alternative that competes with the reference.
- Plan template: differentiate sections by origin presence (Summary, Problem Frame, Requirements, Key Technical Decisions, Open Questions) with single-sentence guidance per section. Brainstorm-sourced plans reference origin for product context; solo plans carry both WHAT and HOW.
- Plan template: rename `## Requirements Trace` to `## Requirements`. The "trace" framing was specific to brainstorm-sourced plans and read odd in solo plans where there's no upstream to trace to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
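The bucket-to-section distribution described in this commit can be sketched as a plain mapping. This is an illustration under assumptions: the agent performs the step by following the reference doc, and `distribute_synthesis` and the dictionary-shaped doc are hypothetical.

```python
# Mapping stated in the commit message: each chat-time bucket feeds one doc section.
BUCKET_TO_SECTION = {
    "Stated": "Requirements",
    "Inferred": "Key Decisions",
    "Out-of-scope": "Scope Boundaries",
}


def distribute_synthesis(summary: str, buckets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build doc sections from a confirmed synthesis (hypothetical helper).

    Only the prose summary is embedded as its own section; the three buckets
    never appear in the doc as a parallel section.
    """
    doc: dict[str, list[str]] = {"Summary": [summary]}
    for bucket, bullets in buckets.items():
        # An unknown bucket raises KeyError, mirroring the "no fourth status" rule.
        section = BUCKET_TO_SECTION[bucket]
        doc.setdefault(section, []).extend(bullets)
    return doc
```

Because the function only emits `Summary` plus the three mapped sections, its output has no `Synthesis` section, which is the duplication the commit removes.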

…ements and legacy Requirements Trace section names

The ce-plan plan template renamed `## Requirements Trace` to `## Requirements` (parent commit). Existing plan docs in user repos still use the legacy name, and tooling that reads plan docs (ce-work, ce-work-beta, ce-code-review) needs to recognize both forms during the transition.

Changes:

- ce-work and ce-work-beta SKILL.md and shipping-workflow.md: section lookup accepts `Requirements` or legacy `Requirements Trace`
- ce-code-review SKILL.md: same; Stage 2 plan parsing accepts both
- ce-plan/references/deepening-workflow.md: rename internal references to `Requirements` (this file describes plans this repo writes, not legacy plans)

No breaking change: tooling continues to parse old plan docs. New plans use the new section name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
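For a script that reads plan docs, the "accept both names" rule might look like the sketch below. The skills themselves implement the lookup as instructions to the agent, so `find_requirements_section` is a hypothetical helper, and it assumes the section is a level-2 markdown heading.

```python
import re
from typing import Optional

# Matches "## Requirements" or the legacy "## Requirements Trace" on its own line.
_REQUIREMENTS_HEADING = re.compile(
    r"^##[ \t]+Requirements(?:[ \t]+Trace)?[ \t]*$", re.MULTILINE
)
_NEXT_H2 = re.compile(r"^##\s", re.MULTILINE)


def find_requirements_section(plan_markdown: str) -> Optional[str]:
    """Return the body of the requirements section, or None if it is absent."""
    match = _REQUIREMENTS_HEADING.search(plan_markdown)
    if match is None:
        return None
    rest = plan_markdown[match.end():]
    # The section ends at the next level-2 heading; deeper headings stay in the body.
    next_heading = _NEXT_H2.search(rest)
    body = rest if next_heading is None else rest[: next_heading.start()]
    return body.strip()
```

Old and new plan docs go through the same code path, so no migration of existing docs is needed.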

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3c8c1bd8f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Brainstorm and plan handoff messages reported the doc location with relative paths (e.g., `docs/brainstorms/...`). Modern terminals (Warp, iTerm2, ghostty, kitty) only auto-link absolute paths and `file://` URIs as clickable references. Bare relative paths and `./`-prefixed paths are not detected, and Claude Code's markdown renderer does not emit OSC 8 hyperlinks for `[label](url)` syntax to bridge the gap. Net effect: users couldn't click the path to open the doc they just created. Manual copy-paste was the workaround.

Changes:

- ce-brainstorm/references/handoff.md: placeholder updated to `<absolute path to requirements doc>` in both preamble templates; one-sentence rule added.
- ce-plan/references/plan-handoff.md: same; `<absolute path to plan>` in the post-generation question; one-sentence rule added.

Doc bodies and commit messages keep relative paths (portability across machines and worktrees). This rule applies only to chat output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
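The split between chat output and doc bodies can be illustrated with two small helpers. This is a sketch only: the rule lives in the handoff references as a one-sentence instruction, and the function names and the example plan filename are hypothetical.

```python
import os
from pathlib import Path


def path_for_chat(repo_relative: str, repo_root: str) -> str:
    """Absolute path for chat output, so terminals can auto-link it."""
    return str(Path(repo_root, repo_relative).resolve())


def path_for_doc_body(absolute: str, repo_root: str) -> str:
    """Repo-relative path for doc bodies and commit messages (portable across machines)."""
    return Path(os.path.relpath(absolute, repo_root)).as_posix()
```

For example, `path_for_chat("docs/plans/example-plan.md", repo_root)` yields a path starting at the filesystem root, while `path_for_doc_body` turns that path back into `docs/plans/example-plan.md` for anything written to disk.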

…n; don't expand it

Scenario A re-runs surfaced that Phase 2.5 (cli-printing-press) and Phase 5.1.5 (busyblock) syntheses leak plan-body content into synthesis bullets: column names, table.column references, file paths with line numbers, exact JSON shapes, HTTP status codes, exact event/log type names, SQL syntax. The agent only re-cut at the right level when prompted with explicit examples; first-pass output was implementation-flow narrative, not affirmable decisions.

The existing rules forbade specific banned shapes ("file paths, JSON shapes, exact error wording") but didn't state the positive principle. A future leak in a shape not on the banlist still slipped past (e.g., HTTP codes, exact wording of internal identifiers) because the agent could rationalize "that's not on the don't-include list."

Changes:

- ce-brainstorm/references/synthesis-summary.md: new section "Granularity: name the decision; don't expand it" with not-allowed list (paths, methods, JSON shapes, HTTP codes, SQL) and three concrete bad-vs-good example pairs from the cli-printing-press testing (manifest discovery, provenance recording, reuse-signal copy).
- ce-plan/references/synthesis-summary.md: new shared section with the same principle, allowed/not-allowed lists, variant-specific line drawing (solo stricter than brainstorm-sourced since brainstorm-validated WHAT hasn't constrained scope yet), and four bad-vs-good example pairs from the busyblock testing (timezone source, skip filter integration, reactivation guard, partial cleanup failure response).

Both versions share the same test for the agent at runtime: a scanner reading an Inferred bullet should affirm or reject it without needing to read code. If they would have to look up a column name, method name, or call graph to evaluate the bullet, the granularity is wrong; that's plan-body content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pe, not architecture The busyblock pause-rule scenario surfaced architecture leakage in Phase 2 approach descriptions. The agent's three approaches each named specific implementation surfaces ("two new nullable timestamp columns on syncRules", "RuleMatcher adds one check", "reuse the existing excludeConditions JSON column", "new table keyed by ruleId") rather than mechanism-level distinctions. Phase 2.5 caught and filtered these out, but Phase 2 itself shouldn't produce that level of detail. The shape failure forces the user to make architectural decisions during brainstorming on ce-brainstorm's intentionally-shallow research. ce-plan's research phase goes deeper; architecture decisions should land there, with better data, not be locked in at brainstorm time. ce-brainstorm answers WHAT to build; column names, table names, file paths, service classes, JSON shapes belong in ce-plan. Change: ce-brainstorm SKILL.md Phase 2 ("Explore Approaches") gains an "Approach granularity" rule. Approach descriptions name mechanism-level distinctions ("pause as a rule property" vs "pause as an event filter" vs "pause as a separate entity") and product-relevant trade-offs (plan-tier coupling, complexity surface, migration difficulty). They do not name implementation specifics. The rule explicitly cites why: bringing architecture forward at brainstorm time forces premature commitment, and the synthesis at Phase 2.5 then has to filter out the leak instead of carrying it forward. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sition sentence to the remedy

The busyblock requirements doc's Problem Frame ended with a soft proposal-restatement: "A dedicated pause primitive collapses both pains into a single date-range action and removes the human-memory dependency on remembering to re-enable the rule on return." That single transition sentence violates the Summary vs Problem Frame discipline: Problem Frame is supposed to establish the situation and stop on the pain, with the remedy living in Summary.

The existing rule said "establishes the situation, the specific moment of pain, and the cost shape — then stops." The agent slipped past it because "then stops" left room for a closing transition sentence. The doc reader hits the proposal twice: once in Summary, once at the end of Problem Frame. That's the exact duplication the discipline is meant to prevent.

Change: extends the Problem Frame discipline bullet in requirements-capture.md with an explicit failure-mode example; the busyblock-shaped sentence is shown verbatim as a sign to cut. Adds a positive instruction: if the last paragraph of Problem Frame names what the doc is proposing, cut it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecee14a98e

…lates

The earlier commit 274a332 updated the absolute-path rule in references/plan-handoff.md but missed two inline copies of the handoff templates duplicated in ce-plan SKILL.md:

- Phase 5.2 confirmation: "Plan written to docs/plans/[filename]"
- Phase 5.4 menu question: "Plan ready at `docs/plans/YYYY-MM-DD-NNN-<type>-<name>-plan.md`..."

Real-world testing surfaced this: an agent in C1 testing correctly followed SKILL.md (always-loaded layer) and reported a repo-relative path, leaving the path unclickable in the user's terminal. The agent's pushback when challenged ("the skill is internally consistent on repo-relative — no bug") was right about its loaded source, but the loaded source was inconsistent with the reference.

The reference plan-handoff.md already had the absolute-path rule and updated placeholder. SKILL.md had the conflicting inline copy. SKILL.md wins at runtime because it loads at session start; the reference loads on-demand and gets overridden by the already-loaded SKILL.md content.

Both inline templates now use `<absolute path to plan>`. The brief inline rationale ("use absolute path so the reference is clickable in modern terminals") is intentionally short; fuller context lives in the reference.

ce-brainstorm SKILL.md does not have the same duplication (Phase 4 just delegates to references/handoff.md), so no mirror edit is needed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…to-route with accessibility-checked suggestion

Two findings from C1 testing (sparse-input solo bug fix on cli-printing-press), bundled because both touch ce-plan SKILL.md and emerged from the same test pass.

1. Problem Frame omission at Lightweight tier

The C1 plan doc skipped Problem Frame entirely on a focused bug fix. The agent's instinct was right (Summary already carried the situational context for a one-unit Lightweight plan) but the spec didn't license it. Adding explicit license to omit Problem Frame at Lightweight when Summary covers the situation, so the agent doesn't have to rationalize.

Edit: extends the plan template's Problem Frame description to note "Omit entirely at Lightweight tier when Summary already carries the situational context."

2. ce-debug routing: replace auto-route with accessibility-checked suggestion

The existing Phase 0.4 rule auto-routed to ce-debug when the prompt was "symptom without a root cause." Two problems with that:

a. The auto-route is more aggressive than ce-work's adjacent "suggest alongside continuing" pattern. Asymmetric.
b. Auto-routing assumes ce-debug can act, but ce-debug requires the buggy code to be accessible from cwd. Cross-repo bugs, dependency-related bugs, and bugs about repos the user isn't currently in produce useless ce-debug runs because ce-debug can't read the relevant code. Even *suggesting* ce-debug for inaccessible code is worse than not: the user takes the suggestion, ce-debug switches in, produces nothing useful, and trust in the suggestion mechanism erodes.

New rule: bug-shaped prompts get ce-debug surfaced as a route-out option alongside continuing with ce-plan, BUT only after a quick accessibility check passes:

- No surface named in prompt → assume cwd, surface ce-debug
- Surface named matches cwd (files exist locally, named repo matches cwd identity) → surface ce-debug
- Surface named clearly doesn't match cwd (different repo, files not found locally) → do NOT surface ce-debug. Stay in ce-plan silently; paper-planning is valid for cross-repo work.

Check is conservative: it under-suggests in monorepos, dependency bugs, or after renames. The spec explicitly notes that users can manually invoke /ce-debug when the check misses, accepting under-suggest as the right side to err on.

Symmetrizes with the existing ce-work routing pattern (suggest alongside continuing, user decides). No more asymmetric auto-route on bug-shaped prompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6071ccc1a7

…oss-repo cd

K2 testing surfaced that the binary in-cwd-vs-not check was wrong. The previous rule said: bug surface not in cwd → stay in ce-plan silently. In testing, the agent ignored that and did `cd /path/to/other-repo && grep ...` to investigate the named repo from disk. The behavior was helpful (the bug WAS reachable; paper-planning would have been useless) but it was a silent context switch the user did not authorize.

The binary rule conflated "different cwd" with "inaccessible." In practice there are three states:

1. In cwd: bug surface is in current repo (named repo matches, or no specific repo named, or named files exist locally). Suggest ce-debug as route-out option. Same as before.
2. On disk but not in cwd: different repo is named, and a quick disk check confirms it's checked out at another local path. This is the case the binary rule got wrong. New rule: ASK the user explicitly with three options (investigate from the other path / paper-plan from current cwd / switch context first). Do NOT silently cd to the other repo and start investigating, even though the code is reachable. The agent's role is to surface the cross-repo signal, not to decide the context unilaterally.
3. Not on disk: named repo isn't found anywhere local. Stay in ce-plan silently and paper-plan. Same as before.

State 2 is the new addition. It captures the K2 failure mode specifically and prescribes the right behavior (ASK, don't auto-investigate or auto-stay-silent). The "don't silently cd" anti-pattern is named explicitly so the agent has direct guidance against the K2 behavior pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…default outputs to target repo

The previous three-state rule (in 880f9a5) was overcorrected. It treated silent cd to another local repo as a context switch requiring user permission via blocking question. On reflection, that conflates two different concerns:

- Reading files at another path: not actually a context switch. The agent reads files via absolute paths all the time. Whether cwd matches the file path is incidental to file access.
- Writing outputs (plan doc) to the wrong location: this IS the actual harm. A busyblock bug plan written to `cli-printing-press/docs/plans/` is a discoverability disaster: the user goes to busyblock to act on it and can't find it.

The light-touch fix: announce the target repo explicitly before any cross-repo investigation, default plan outputs to the target repo's `docs/plans/`, and let the user interrupt if they want different behavior. No blocking question needed.

Changes:

- Collapse the previous three states ("in cwd" / "on disk elsewhere ASK" / "not on disk") into two ("reachable" / "unreachable"). Reachable surfaces ce-debug as option; not-reachable stays in ce-plan silently for paper-planning.
- Add explicit announcement requirement when the bug is at another local path: name the path being read AND the default plan output destination (target repo's `docs/plans/`) before any investigation.
- State the actual harm explicitly: silent investigation isn't the problem; silent operation on the wrong repo IS, especially output destination.
- User can interrupt to redirect: write plan in cwd, switch context first, paper-plan only.

The blocking-question requirement was friction that didn't match the underlying concern. Announce-and-proceed is the right shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d40aa653d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…arate ce-debug routing

K2 retest revealed two more issues with the previous cross-repo rule:

1. The 3-option menu I proposed (proceed / paper-plan from cwd / switch context) merged two orthogonal decisions into one question. Location ("where outputs land") and ce-debug routing ("which skill") are different axes; users picking "Switch to /ce-debug" weren't told whether ce-debug operates on the target repo or current cwd, leaving the option underspecified.
2. The "paper-plan from current cwd" option doesn't cleanly map to a real workflow. Cases that might want it (think abstractly, capture for later, no-investigation plan) are better served by /ce-brainstorm, an issue ticket, or paper-planning to the TARGET repo. Producing a plan-for-busyblock in cli-printing-press/docs/plans/ is a discoverability disaster, the same harm we built the announcement to prevent.

Simpler design that emerged in conversation:

- Drop the location menu entirely. The announcement makes the cross-repo nature visible; the user can interrupt if they want unusual behavior. No need to enumerate options that have thin or contradictory use cases.
- Default behavior: proceed from target repo for both investigation and plan output. Respects the user's stated intent (they named that repo) without requiring confirmation.
- After announcing + proceeding, fire the standard ce-debug routing menu, same shape as the in-cwd case. Cross-repo location and skill routing are explicitly orthogonal.

Net effect: cleaner UX, less ceremony, same protection against the actual harm (silent cross-repo operation, mis-filed plan). The cross-repo case now mostly behaves like the in-cwd case plus an announcement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…thoring AGENTS.md

This branch's testing-driven tightening surfaced design lessons worth capturing for future skill edits. Adds five principles to the plugin's authoring AGENTS.md (which guides contributors editing skills, not runtime agent behavior; that's already correctly excluded per the "Runtime vs Authoring Context" rule).

Principles distilled from this session's testing:

1. Calibrate prescription to the failure mode, at three levels: hard rules for deterministic safety; strong guidance with examples for judgment; trust where prescription would harm. Lean toward less prescription when in doubt.
2. SKILL.md caches at session start; references load on demand. Load-bearing rules need strong language at the top of the SKILL.md phase, not just in references. SKILL.md and references that share rules must be updated together to avoid drift.
3. Split orthogonal decisions into sequential questions. Don't conflate location with skill routing, or other multi-axis questions, into a single menu; options become underspecified.
4. Process exhaust stays out of artifacts. Phase capture notes, "next steps" pointers to other skills, mode markers: don't leak engineering process into user-facing docs.
5. Test the spec by running it. Real-world tests reveal failure modes desk review misses. Before tightening: ask whether the agent's behavior was actually wrong, whether SKILL.md and references drifted apart, and whether this is a load-reliability vs rule-content failure. Sometimes the fix is to loosen, not to tighten.

These were earned from the testing-driven tightening on this branch: the load-reliability fix (Phase 2.5 strong-language load instruction), the SKILL.md vs reference inconsistency on absolute paths in handoff, the cross-repo K2 menu conflation that we then split into sequential decisions, and the broader move from over-prescriptive auto-route rules to announce-and-proceed defaults.

Authoring AGENTS.md only: does not ship with the installed plugin per the existing "Runtime vs Authoring Context" rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ficial-prescription cases explicitly

The first pass of the Skill Design Principles section had a one-sided closing bias ("when in doubt, lean toward less prescription") that under-sold how often prescription IS the right call in this plugin's actual work.

Real audit: this branch alone uses prescription in many load-bearing places (strong-language load instructions, the three-bucket synthesis structure, decision-level granularity with bad-vs-good examples, Summary vs Problem Frame discipline, "no fourth status" rule, Phase 3.7 anti-expansion, "re-present after revision," "absolute paths in chat handoff," "no silent cd to other repos"). Most of these are essential, not optional.

Changes:

- Add concrete examples of beneficial prescription from this plugin to the "hard rules" and "strong guidance" levels. Future contributors can see what the levels actually look like in practice.
- Replace the one-sided "lean toward less prescription" bias with a balanced framing: match the level to the failure mode in both directions. Both over-prescription and under-prescription have real failure modes, and the plugin's actual prescription pattern is closer to "match precisely" than "lean loose."
- Add a concrete diagnostic: can you name a specific bad outcome the prescription prevents? If yes, it's justified. If no, lean toward trust. This gives contributors a sharper question than "do I feel like this is too prescriptive."

Net: the calibration ladder remains the same shape, but the guidance is balanced and reflects the actual prescription profile this plugin uses successfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ed example lists

Previous edit added 5+5 enumerated examples per level, which weren't asked for and bloated the section. The actual fix needed was just balancing the closing bias so contributors don't read it as "never be prescriptive."

Restore one example per level (concise), keep the balanced framing ("match the level to the failure mode in both directions"), keep the diagnostic question ("can you name a specific bad outcome the prescription prevents?").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: d0a9d6ac74

…ns, fast-path announce-mode, plus 8 alignment fixes

PR #705 review surfaced gaps from the testing-driven tightening that shipped earlier on this branch. Most notable: a philosophical reframe of what "headless mode" means.

The big reframe: headless = non-interactive, not unaudited
----------------------------------------------------------

Earlier commits on this branch landed "skip Phase 2.5 / 0.7 / 5.1.5 entirely in headless." Justification: process-exhaust principle. But this conflated two concepts:

- Headless mode means no synchronous user during the run.
- It does NOT mean no human ever reviews the artifact: ce-doc-review, ce-plan, and human PR reviewers all read the doc later.

When we skip synthesis composition entirely, the artifact has no way to surface which decisions were user-stated vs agent-inferred. Un-validated agent bets propagate as authoritative decisions, indistinguishable from confirmed scope.

New shape:

- Compose the synthesis in non-interactive mode (forcing function preserved).
- Stated → Requirements (user's actual constraints).
- Out-of-scope → Scope Boundaries.
- Inferred → new `## Assumptions` section, explicitly labeled as un-validated agent bets that downstream review must scrutinize.

Do NOT route Inferred to Key Technical Decisions in non-interactive mode; that hides un-validated bets as authoritative content.

The `## Assumptions` section appears in non-interactive docs only. Interactive docs are unchanged: Inferred bets get user-corrected in chat and either become Key Technical Decisions or are revised away.

This restores the original design's intent (un-validated bets must not propagate as authoritative content) but surfaces them under their own label rather than hiding them.

The Phase 0.2 fast path: announce-mode
--------------------------------------

Codex pointed out Phase 0.2's "requirements already clear" fast path goes straight to Phase 3, bypassing Phase 2.5 entirely.

Fix: fast path now routes through Phase 2.5 in announce-mode: emit the synthesis for visibility, then proceed to Phase 3 without blocking. User can interrupt if they spot a wrong inference. Preserves visibility on clear-input cases without adding interaction cost.

Eight alignment fixes
---------------------

- Add `Summary` to ce-doc-review's framing-section list (premise-chain root detector now recognizes the new heading; `Overview` retained as legacy).
- Distinguish per-variant timing in plan-side headless guidance: solo Phase 0.7 fires before research; brainstorm-sourced Phase 5.1.5 fires after research. Earlier text incorrectly said "directly to plan-write" for both.
- Rename `## Key Decisions` to `## Key Technical Decisions` in plan-side reference's doc-shape routing table to match the canonical plan template heading.
- Tighten plan-side orientation: don't list "exact method signatures, JSON schemas" as plan-body content; Planning Rules (Phase 4.3) explicitly forbid those.
- Tighten brainstorm-side orientation: implementation detail goes to ce-plan, not the requirements doc. Earlier text said impl detail "belongs IN the doc" but requirements-capture.md forbids it.
- Add headless default for ce-debug routing: skip the suggestion menu, default to continuing with /ce-plan. No synchronous user to resolve the route-out choice.
- Update visual-communication.md: stale "Problem/Overview" reference now reads "Summary or Problem Frame" with `Overview` noted as legacy.
- Clarify AGENTS.md "process exhaust" principle: distinguish exhaust (agent bookkeeping) from audit content (downstream readers need it to evaluate the artifact). The `## Assumptions` section in non-interactive mode is audit content, not exhaust.

Threads addressed: 10 (all unresolved review threads).

Next: needs end-to-end test in fresh sessions to validate the Assumptions-routing and announce-mode behaviors in practice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
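The bucket routing from the headless reframe above can be sketched as a mapping from synthesis buckets to document sections. This is an illustration only: `route_synthesis` and its parameters are hypothetical, and in interactive mode `inferred` is assumed to hold only the bets that survived user correction in chat.

```python
from typing import Dict, List


def route_synthesis(
    stated: List[str],
    inferred: List[str],
    out_of_scope: List[str],
    interactive: bool,
) -> Dict[str, List[str]]:
    """Map the three synthesis buckets to document section headings."""
    sections: Dict[str, List[str]] = {
        "Requirements": list(stated),
        "Scope Boundaries": list(out_of_scope),
    }
    if interactive:
        # Assumption: the user already confirmed or corrected these in chat.
        sections["Key Technical Decisions"] = list(inferred)
    else:
        # Un-validated agent bets get their own label for downstream review.
        sections["Assumptions"] = list(inferred)
    return sections
```

The two branches never both fire, which mirrors the rule that `## Assumptions` appears in non-interactive docs only.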

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 16d97ef367

…dary to announce-mode

Two threads from PR #705 review pass 2:

1. ce-plan/references/synthesis-summary.md had a stale orientation line saying "Both skip entirely in headless mode" while the detailed Headless section directly below specified compose + route to Assumptions. Same file, contradictory guidance, which would make automated behavior nondeterministic. Aligned the orientation with the detail block.
2. Announce-mode (Phase 0.2 fast path → Phase 2.5) emitted synthesis and immediately fired the Write tool in the same turn. In Claude Code's streaming UX, the user has no real interruption window between synthesis emission and doc-write; the Esc-during-stream theoretical interrupt is fragile in practice. Codex pointed out this undermines Phase 2.5's stated purpose as the final pre-write scope checkpoint.

Resolution: announce-mode now emits synthesis and ends the turn (no Write tool call in the same turn). On the user's next message: any acknowledgment proceeds to doc-write; any correction triggers synthesis revision. Lighter than full Phase 2.5 (no AskUserQuestion menu, no formal confirm) but gives the user a real interruption window before the doc lands. ce-brainstorm sits early in the workflow, and a wrong doc feeds downstream into ce-plan and implementation, so the turn boundary is justified even on the fast path.

Both fixes align with the Skill Design Principles section in AGENTS.md: hard rules for deterministic safety where the failure mode justifies it (announce-mode without a turn boundary fails to provide a real correction window in practice; the turn boundary is the safety condition).

Threads addressed: 2 of 2 new threads. Cross-invocation signal fired on this run (10 prior threads from round 1 are visible to the cross-invocation analysis); no clusters formed because the two new threads sit in different subtrees.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
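The turn boundary described above can be modeled as a two-step exchange in which the write never happens in the same call that emits the synthesis. This is a toy model under assumptions: `AnnounceModeSession` is a hypothetical name, and deciding whether a user reply is an acknowledgment or a correction is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnounceModeSession:
    synthesis: str
    doc_written: bool = False

    def emit(self) -> str:
        """Turn 1: show the synthesis, then end the turn. No write here."""
        return self.synthesis

    def on_user_message(self, correction: Optional[str] = None) -> str:
        """Turn 2: an acknowledgment writes the doc; a correction revises."""
        if correction is None:
            self.doc_written = True
            return "write requirements doc"
        # Simplification: the correction text stands in for the revised synthesis.
        self.synthesis = correction
        return "revise synthesis"
```

Because `emit` has no code path that sets `doc_written`, the interruption window exists by construction rather than depending on the user interrupting mid-stream.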

…d/Deep

Calibration call surfaced during PR #705 testing: a Lightweight synthesis with 16 bullets across three buckets is qualitatively different from a Lightweight synthesis with 2 bullets. The "Skip for Lightweight when bullets ARE the summary" rule was being applied to both, but only the second is the case the rule was designed for.

Result: detail-rich Lightweight syntheses (e.g., a single-flag addition with 7 Stated requirements) shipped without a prose gestalt. Reading 16 bullets without a 1-3 line "what's the gist?" forces the reader to construct framing themselves.

New rule: prose summary required for ALL tiers. Skip only for truly-trivial cases where the synthesis is ≤ 2 bullets that echo the prompt (e.g., "fix the typo on line 47" producing a synthesis of "Stated: fix the typo on line 47").

Updated:

- ce-brainstorm/references/synthesis-summary.md: prose summary discipline + prompt template placeholder
- ce-plan/references/synthesis-summary.md: same for both solo and brainstorm-sourced variants
- ce-brainstorm/references/requirements-capture.md: section matrix Lightweight column
- ce-plan/SKILL.md plan template: Summary description

The "When the synthesis would be redundant" section in the brainstorm reference is unchanged; it still describes the truly-trivial single-paragraph case correctly.

Per AGENTS.md Skill Design Principles, this tightens a rule that had a real failure mode (rote-feeling syntheses without a gestalt) without over-prescribing: there's still a truly-trivial escape hatch where the rule would create padding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
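The skip condition in the new rule can be expressed as a predicate. This sketch is an assumption-laden simplification: `needs_prose_summary` is a hypothetical name, bullets are passed without their bucket labels, and "echoes the prompt" is approximated as a case-insensitive substring match, which the skill leaves to the agent's judgment.

```python
from typing import List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def needs_prose_summary(bullets: List[str], prompt: str) -> bool:
    """Summary is required unless the synthesis is <= 2 prompt-echoing bullets."""
    if len(bullets) > 2:
        return True
    prompt_norm = _normalize(prompt)
    # Hypothetical echo heuristic: every bullet appears verbatim in the prompt.
    echoes_prompt = all(_normalize(b) in prompt_norm for b in bullets)
    return not echoes_prompt
```

Under this heuristic the typo example from the commit message skips the summary, while a 16-bullet Lightweight synthesis always gets one.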

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: f59b3d48ca

… Summary alignment

Two stale lines surfaced by Codex review pass 3, both contradictions I introduced and didn't sweep fully on the previous commits:

1. ce-brainstorm/references/synthesis-summary.md line 9 still said "Skip Phase 2.5 entirely in headless mode" while the detailed Headless mode section in the same file requires composing the synthesis and routing Inferred to `## Assumptions`. Same pattern I fixed on the plan side in eef4a69 but missed on the brainstorm side. Now aligned: "Phase 2.5 still fires — synthesis composed but not user-confirmed; Inferred bets route to `## Assumptions`."
2. ce-brainstorm/references/requirements-capture.md line 41 said Lightweight may omit Summary "when bullets are the summary", contradicting the section matrix above (updated in f59b3d4 to require Summary across all tiers, with only the truly-trivial escape). Now aligned: matches the matrix's "skip only when synthesis ≤ 2 bullets that echo the prompt" framing.

Pattern: my edits in commits eef4a69 and f59b3d4 each caught the primary location but didn't sweep all sibling locations referencing the same rule. Codex's incremental reviews are now my safety net catching these. Worth noting in AGENTS.md as a reminder for future spec edits: when changing a rule, grep for ALL co-located restatements of it, not just the headline location.

Threads addressed: 2 of 2 new (third review pass on PR #705).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
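The "grep for ALL co-located restatements" reminder can be sketched as a small scan over the skill's Markdown files. The helper name and the `*.md` default are assumptions for illustration; a plain `grep -rni` over the skills directory would serve the same purpose.

```python
from pathlib import Path
from typing import List, Tuple


def find_restatements(
    root: Path, phrase: str, pattern: str = "*.md"
) -> List[Tuple[Path, int, str]]:
    """List every (file, line number, line) under root that contains phrase."""
    needle = phrase.lower()
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive match so reworded capitalization is still caught.
            if needle in line.lower():
                hits.append((path, lineno, line.strip()))
    return hits
```

Running it with a distinctive fragment of the old rule (for instance "entirely in headless") before committing would list every sibling file that still carries the stale wording.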

…havioral-conditional requirements

Previous trigger criterion ("include when behavior is hard to pin down without a concrete scenario") was judgment-based and produced real variance: same `--quiet` flag synthesis, two test runs, one agent generated 4 AEs and another generated 0. Both were defensible reads of the spec.

The asymmetric failure mode favors more-inclusion: under-inclusion makes downstream planners invent missing context; over-inclusion is just ceremony. Per AGENTS.md "match prescription to failure mode," tightening here is justified.

New rule: AEs are REQUIRED for behavioral-conditional requirements (any "When X, Y" or "If X, Y" framing) regardless of tier, even Lightweight. Conditional framing signals state-dependent behavior where prose alone leaves implicit ambiguity (e.g., "When --quiet is set, errors continue to surface": does that cover warnings? binary errors? AE pins it down).

Non-conditional requirements remain triggered (Standard/Deep) or omit-unless-triggered (Lightweight) per existing rules. The section is still not exhaustive: AEs cover ambiguity, not every R-ID.

Updated:

- requirements-capture.md section matrix: Acceptance Examples row
- requirements-capture.md trigger criterion paragraph

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
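The "When X, Y" / "If X, Y" trigger is mechanical enough to sketch as a pattern check. The regex below is a hypothetical heuristic, not the skill's wording: it only recognizes a conditional that leads the sentence, so a requirement phrased as "Y when X" would need the agent's judgment instead.

```python
import re

# Leading "When"/"If" as whole words, a condition clause, a comma, then a consequence.
_CONDITIONAL = re.compile(r"^\s*(?:when|if)\b[^,]+,\s*\S", re.IGNORECASE)


def requires_acceptance_example(requirement: str) -> bool:
    """True when the requirement uses leading 'When X, Y' or 'If X, Y' framing."""
    return bool(_CONDITIONAL.match(requirement))
```

A deterministic check like this removes the run-to-run variance the commit describes for the conditional case, while non-conditional requirements keep the existing tier-based rules.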

…oc-write (EveryInc#705)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tmchow and others added 8 commits April 26, 2026 21:35

chatgpt-codex-connector Bot reviewed Apr 27, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated

chatgpt-codex-connector Bot reviewed Apr 27, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md

Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated

tmchow and others added 4 commits April 27, 2026 12:50

chatgpt-codex-connector Bot reviewed Apr 27, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated

tmchow and others added 2 commits April 27, 2026 14:02

chatgpt-codex-connector Bot reviewed Apr 27, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

tmchow and others added 2 commits April 27, 2026 16:54

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/SKILL.md Outdated

tmchow and others added 4 commits April 27, 2026 17:05

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated

Comment thread plugins/compound-engineering/skills/ce-brainstorm/SKILL.md Outdated

tmchow and others added 2 commits April 27, 2026 20:00

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/requirements-capture.md Outdated

tmchow and others added 2 commits April 27, 2026 20:16

tmchow changed the title ~~feat(skills): surface agent's scope synthesis before doc-write~~ feat(ce-brainstorm,ce-plan): surface agent's scope synthesis before doc-write Apr 28, 2026

tmchow merged commit 41e7f72 into main Apr 28, 2026
2 checks passed

github-actions Bot mentioned this pull request Apr 27, 2026

chore: release main #703

Merged

tmchow commented Apr 27, 2026 • edited
