feat(ce-ideate): subject gate, surprise-me, and warrant contract by tmchow · Pull Request #671 · EveryInc/compound-engineering-plugin

tmchow · 2026-04-24T09:03:45Z

Summary

/ce-ideate no longer burns ~9 agents on a wrong interpretation of a bare prompt: when the subject is ambiguous (improvements, ideas, an empty prompt) the skill asks one scope question and offers "Surprise me — let the agent decide what to focus on" as a first-class option. Every surviving idea now carries tagged warrant (direct:, external:, or reasoned:) so the reader can trace the leap back to specific evidence, prior art, or a first-principles argument — filtering rejects unjustified speculation and subject-replacement moves outright. Three commits bundle the ce-ideate redesign, a plugin-authoring note surfaced by that work (runtime vs. authoring context for AGENTS.md), and a scratch-path convention change applied consistently to ce-ideate and ce-demo-reel in the same pass.

ce-ideate: what changed

Phase 0 rebuilt around what downstream agents actually need. The old Phase 0 classified mode first and only sometimes asked a question when mode confidence was low; the new Phase 0 asks about subject first, then about grounding substance, then about mode only when it genuinely stays ambiguous:

- 0.2 Subject-Identification Gate (all modes) asks one scope question when the subject is ambiguous. The model applies judgment to what the words refer to (a generic catch-all vs. a named feature), not to phrase length: "browser sniff" and "dark mode" default to identifiable, while "improvements" and "ideas" trigger the gate. It always offers "Surprise me" as a real option.
- 0.3 Mode Classification drives dispatch routing only; active confirmation fires only when mode stays ambiguous after subject is known. Prescribed correction phrases ("say 'actually this is outside the repo' to switch") removed in favor of plain mode statements.
- 0.4 Context-Substance Gate (elsewhere modes only) asks for URL/paste/description when Phase 1 agents would otherwise have nothing to synthesize. Scoped to substance, never subject characterization.
- 0.5 Focus Modulation detects tactical scope signals (polish, typos, quick wins) and lowers the Phase 2 ambition floor for this run.
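
The ordering of the four gates can be sketched as a small decision function. This is an illustration only: the skill itself is prompt-driven, and the `Intake` fields, the `phase0_plan` name, and the action labels are all hypothetical. Each boolean stands in for a judgment the model has already made, since the PR states that subject identification uses model judgment rather than word lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Intake:
    """Judgments already made about the prompt (hypothetical fields)."""
    subject_identifiable: bool  # 0.2: does the prompt name something specific?
    mode: str                   # 0.3: best guess, "repo-grounded" or "elsewhere"
    mode_ambiguous: bool        # 0.3: still unclear once the subject is known?
    has_substance: bool         # 0.4: URL, paste, or description supplied?
    tactical_focus: bool        # 0.5: polish, typos, quick wins


def phase0_plan(intake: Intake) -> List[str]:
    """Return the Phase 0 actions for one run, in gate order."""
    steps: List[str] = []
    if not intake.subject_identifiable:
        # 0.2: one scope question, with "Surprise me" offered as an option.
        steps.append("ask_subject")
    if intake.mode_ambiguous:
        # 0.3: confirm mode only when it stays ambiguous.
        steps.append("confirm_mode")
    if intake.mode == "elsewhere" and not intake.has_substance:
        # 0.4: elsewhere modes only; ask for material to synthesize.
        steps.append("ask_for_context")
    if intake.tactical_focus:
        # 0.5: lower the Phase 2 ambition floor for this run.
        steps.append("lower_ambition_floor")
    return steps
```

Under these assumptions, a clearly named subject in a repo-grounded run yields an empty plan, so no question is asked before dispatch.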

Surprise-me promoted to a first-class mode. Phase 1 produces richer material when invoked — representative file sampling per top-level area, recent PR/commit activity as signal, issue themes passed as first-class input rather than footnote. Phase 2 sub-agents each discover their own subject through their frame's lens (different frames finding different subjects is the feature). Cross-cutting synthesis becomes the magic layer and expects 5-8 combinations vs. 3-5 in specified mode.

Universal warrant contract. Every idea articulates tagged warrant:

- direct: for quoted evidence from the repo, docs, issues, or user-supplied context
- external: for cited prior art or domain research with a source
- reasoned: for explicit first-principles argument (written out, not gestured at)

Warrant is required, not optional. Filtering rejects any idea that lacks warrant, whose warrant does not support the claimed move, or that replaces the subject rather than operating on it. The meeting-test applies as the default ambition floor and is relaxed only when Phase 0.5 detected tactical focus. The artifact template and Phase 4 presentation surface Warrant: explicitly so users see the chain from their reality to the leap.
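
A minimal sketch of the three rejection criteria follows, assuming each idea stores its warrant as a single tagged string. The `Idea` class, its fields, and both function names are hypothetical. In the skill these checks are judgment calls made by the model, so the two booleans below merely stand in for those judgments.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# The three tags named in the warrant contract.
_WARRANT_RE = re.compile(r"^(direct|external|reasoned):\s*(\S.*)$", re.DOTALL)


@dataclass
class Idea:
    title: str
    warrant: str
    supports_move: bool = True      # reviewer judgment (hypothetical field)
    replaces_subject: bool = False  # reviewer judgment (hypothetical field)


def parse_warrant(text: str) -> Optional[Tuple[str, str]]:
    """Split 'tag: body' into (tag, body); None if untagged or empty."""
    match = _WARRANT_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def rejection_reason(idea: Idea) -> Optional[str]:
    """Return why an idea is filtered out, or None if it survives."""
    if parse_warrant(idea.warrant) is None:
        return "missing or untagged warrant"
    if not idea.supports_move:
        return "warrant does not support the claimed move"
    if idea.replaces_subject:
        return "replaces the subject instead of operating on it"
    return None
```

A bare tag with no body (for example `external:`) is treated as missing warrant here, which is one reading of "written out, not gestured at".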

Scratch paths: /tmp directly

$TMPDIR on macOS resolves to /var/folders/64/.../T/, which is hostile UX for users who want to inspect scratch checkpoints, grep them, or copy them out. Per-run throwaway (mktemp -d) continues to use $TMPDIR, since those files are never meant to be user-accessible. Cross-invocation reusable scratch (ce-ideate's raw-candidates.md, survivors.md, V15 web-research cache; ce-demo-reel's --output-dir) now lives at /tmp/compound-engineering/<skill>/ directly. The repo-root AGENTS.md Scratch Space convention and cross-platform note were updated to match.
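
The split between the two kinds of scratch can be sketched in Python as below. The function names and the `root` parameter are illustrative assumptions, not the helpers changed in this PR; only the `/tmp/compound-engineering/<skill>/` layout comes from the description above.

```python
import tempfile
from pathlib import Path

# Reusable, user-inspectable scratch root, per the convention described above.
SCRATCH_ROOT = Path("/tmp/compound-engineering")


def reusable_scratch_dir(skill: str, root: Path = SCRATCH_ROOT) -> Path:
    """Stable per-skill directory that survives across invocations."""
    path = root / skill
    path.mkdir(parents=True, exist_ok=True)
    return path


def throwaway_dir(prefix: str = "ce-") -> Path:
    """Per-run directory; tempfile honors $TMPDIR, like mktemp -d on macOS."""
    return Path(tempfile.mkdtemp(prefix=prefix))
```

For example, `reusable_scratch_dir("ce-ideate")` returns the same path on every run, so a user can open or grep the checkpoints there, while `throwaway_dir()` returns a fresh directory each time.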

Plugin AGENTS.md: runtime vs. authoring

Plugin-level AGENTS.md and CLAUDE.md are authoring context — they do not ship with the installed plugin. Skills run against the user's AGENTS.md, not this repo's. Runtime guidance for skills must live in SKILL.md or files under references/, never in plugin or repo AGENTS.md files. Surfaced during the ce-ideate design (early drafts placed shared runtime principles in AGENTS.md, where they would have been invisible to the installed skill at runtime). Noting it so future plugin edits don't repeat the mistake.

Test plan

Validated live in two end-to-end runs on a separate repo (printing-press):

Specified mode — /ce-ideate browser sniff → 7 survivors, every one with tagged warrant citing specific files or plan documents; ambition floor held; subject identity preserved; rejections included appropriate warrant-integrity cuts ("HAR-input adapters — tactical; no current catalog evidence," "SaaS hosted — solo project; no multi-user signal in grounding").
Surprise-me — bare prompt → "Surprise me" selected → 7 survivors spanning 7 distinct subjects (measurement infrastructure, IR + fingerprint lattice, retros as temporal knowledge, preflight/security, drift detection, MCP as contract, hazard classification). External warrants included FDA bioequivalence, IBC §307 building code, Kahl 2021 BirdNET abstention, Cloudflare Code Mode benchmarks. Cross-cutting synthesis produced 8 additional combinations (the surprise-me target).

Full bun test passes (905 tests, 0 failures). bun run release:validate clean.

… runtime

The plugin is distributed and installed into end-user environments, where skills run against the user's AGENTS.md, not this repo's. Runtime guidance for skills must live inside SKILL.md or references/, never in plugin or repo AGENTS.md files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…based generation

Phase 0 restructured around what downstream agents actually need:

- 0.2 Subject-Identification Gate asks one scope question ("Specify a subject / Surprise me / Cancel") when the subject is ambiguous, preserving greenfield intent and keeping "Surprise me" as a real first-class option. Questions are only about what to ideate on, never about solution direction, constraints, or characterization - those belong to ce-brainstorm.
- 0.3 Mode Classification drives dispatch routing only; active confirmation fires only when mode is ambiguous after subject is known. Prescribed correction phrases ("say X to switch") removed in favor of plain mode statements.
- 0.4 Context-Substance Gate asks for URL/paste/description in elsewhere modes when Phase 1 agents would otherwise have nothing substantive to work with. Scoped to substance, not subject characterization.
- 0.5 Focus Modulation detects tactical scope signals (polish, typos, quick wins) and lowers the Phase 2 ambition floor accordingly.

Surprise-me promoted to a first-class mode. Phase 1 produces richer grounding when invoked (representative file sampling, recent activity, issue themes as first-class input). Phase 2 sub-agents discover their own subjects per frame - different frames finding different subjects is the feature. Cross-cutting synthesis is the magic layer and expects 5-8 combinations rather than the 3-5 of specified mode.

Universal warrant contract across all frames, all modes: every idea articulates tagged warrant (direct: for quoted evidence, external: for cited prior art, reasoned: for written-out first-principles argument). Unjustified speculation does not surface. Meeting-test floor applies as default; subject identity preserved (no subject-replacement ideas). Post-ideation filtering gains warrant-integrity rejection criteria; artifact template and Phase 4 presentation surface Warrant: explicitly.

universal-ideation (non-software elsewhere mode) inherits the same contract. Its discrimination-test paragraph now cross-references the SKILL.md Phase 0.2 questioning principles instead of leaving intake-scope undefined.

Subject-identification uses model judgment over word lists: vagueness is about what words refer to (a generic catch-all quality) rather than phrase length. Short phrases that plausibly name a feature or concept (browser sniff, dark mode, cache invalidation) default to identifiable; a cheap repo check resolves genuine ambiguity without triggering the gate unnecessarily.

Scratch relocated from ${TMPDIR:-/tmp} to /tmp directly. /var/folders/... on macOS is hostile UX for users who want to inspect checkpoints or copy them out; per-user isolation was not valuable for ephemeral ideation scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ratch

$TMPDIR on macOS resolves to /var/folders/64/.../T/, which is hostile for users who want to inspect scratch checkpoints, grep them, or copy them out. The per-user isolation $TMPDIR provides is not valuable for cross-invocation reusable scratch where users are the intended audience. /tmp is writable on macOS, Linux, and WSL and is universally accessible.

Per-run throwaway continues to use mktemp -d, which resolves to $TMPDIR on macOS - correct for throwaway files that are never meant to be user-accessible.

Updates the repo-root Scratch Space convention in AGENTS.md and brings ce-demo-reel into compliance (Python default_dir + upload options doc). ce-ideate was already updated in the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b2421759f


chatgpt-codex-connector · 2026-04-24T09:08:38Z

 **Decision 1 — repo-grounded vs elsewhere.** Weigh prompt content first, topic-repo coherence second, and CWD repo presence as supporting evidence only.

- Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase.
+- Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase. Issue-tracker intent from 0.2 is always repo-grounded.


Do not force issue-tracker prompts into repo mode

This line makes any run flagged as issue-tracker intent in 0.2 automatically repo-grounded, even though 0.2 executes before mode classification and can match generic prompts like “GitHub issues” that refer to an external project. In those cases, Phase 1 will run codebase/issue intelligence against the current repo (or fail due to a missing remote or missing auth) instead of the user’s intended target, which breaks routing for a valid class of requests that previously stayed outside repo mode.


chatgpt-codex-connector · 2026-04-24T09:08:38Z

+- **Repo mode surprise-me:** the codebase-scan sub-agent samples a few representative files per top-level area (not just reads the top-level layout + AGENTS.md), surfaces recent PR/commit activity as signal about what's actively being worked on, and — when issue intelligence runs — passes issue themes as first-class input rather than footnote. Keep the scan bounded: representative, not exhaustive.
+- **Elsewhere mode surprise-me:** user-context synthesis extracts themes, recurring language, tensions, and omissions from whatever the user supplied, rather than just restating it. Web research broadens beyond narrow prior-art for a single subject toward the domain's landscape.


Make surprise-me grounding instructions executable

This adds a requirement to sample representative files and surface recent PR/commit activity in repo surprise-me runs, but the actual quick-scan dispatch prompt below still instructs the sub-agent to read only top-level docs/layout and avoid deeper search. Because the dispatched prompt is what controls behavior, surprise-me grounding cannot reliably produce the richer signals this section now requires, so the new mode degrades to shallow scans.


tmchow and others added 3 commits April 24, 2026 01:54

tmchow merged commit 6514b1f into main Apr 24, 2026
2 checks passed

github-actions Bot mentioned this pull request Apr 24, 2026

chore: release main #661

Merged

chatgpt-codex-connector Bot reviewed Apr 24, 2026

github-actions Bot mentioned this pull request Apr 24, 2026

chore: release main #680

Merged

tmchow merged 3 commits into main from tmchow/ce-ideate-clarify-q
