feat(ce-ideate): subject gate, surprise-me, and warrant contract#671
feat(ce-ideate): subject gate, surprise-me, and warrant contract#671
Conversation
… runtime The plugin is distributed and installed into end-user environments, where skills run against the user's AGENTS.md, not this repo's. Runtime guidance for skills must live inside SKILL.md or references/, never in plugin or repo AGENTS.md files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…based generation
Phase 0 restructured around what downstream agents actually need:
- 0.2 Subject-Identification Gate asks one scope question ("Specify a
subject / Surprise me / Cancel") when the subject is ambiguous,
preserving greenfield intent and keeping "Surprise me" as a real
first-class option. Questions are only about what to ideate on,
never about solution direction, constraints, or characterization -
those belong to ce-brainstorm.
- 0.3 Mode Classification drives dispatch routing only; active
confirmation fires only when mode is ambiguous after subject is
known. Prescribed correction phrases ("say X to switch") removed
in favor of plain mode statements.
- 0.4 Context-Substance Gate asks for URL/paste/description in
elsewhere modes when Phase 1 agents would otherwise have nothing
substantive to work with. Scoped to substance, not subject
characterization.
- 0.5 Focus Modulation detects tactical scope signals (polish, typos,
quick wins) and lowers the Phase 2 ambition floor accordingly.
Surprise-me promoted to a first-class mode. Phase 1 produces richer
grounding when invoked (representative file sampling, recent activity,
issue themes as first-class input). Phase 2 sub-agents discover their
own subjects per frame - different frames finding different subjects
is the feature. Cross-cutting synthesis is the magic layer and expects
5-8 combinations rather than the 3-5 of specified mode.
Universal warrant contract across all frames, all modes: every idea
articulates tagged warrant (direct: for quoted evidence, external:
for cited prior art, reasoned: for written-out first-principles
argument). Unjustified speculation does not surface. Meeting-test
floor applies as default; subject identity preserved (no
subject-replacement ideas). Post-ideation filtering gains
warrant-integrity rejection criteria; artifact template and Phase 4
presentation surface Warrant: explicitly.
universal-ideation (non-software elsewhere mode) inherits the same
contract. Its discrimination-test paragraph now cross-references the
SKILL.md Phase 0.2 questioning principles instead of leaving
intake-scope undefined.
Subject-identification uses model judgment over word lists:
vagueness is about what words refer to (a generic catch-all quality)
rather than phrase length. Short phrases that plausibly name a
feature or concept (browser sniff, dark mode, cache invalidation)
default to identifiable; a cheap repo check resolves genuine
ambiguity without triggering the gate unnecessarily.
Scratch relocated from \${TMPDIR:-/tmp} to /tmp directly.
/var/folders/... on macOS is hostile UX for users who want to
inspect checkpoints or copy them out; per-user isolation was not
valuable for ephemeral ideation scratch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ratch $TMPDIR on macOS resolves to /var/folders/64/.../T/, which is hostile for users who want to inspect scratch checkpoints, grep them, or copy them out. The per-user isolation $TMPDIR provides is not valuable for cross-invocation reusable scratch where users are the intended audience. /tmp is writable on macOS, Linux, and WSL and is universally accessible. Per-run throwaway continues to use mktemp -d, which resolves to $TMPDIR on macOS - correct for throwaway files that are never meant to be user-accessible. Updates the repo-root Scratch Space convention in AGENTS.md and brings ce-demo-reel into compliance (Python default_dir + upload options doc). ce-ideate was already updated in the prior commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b2421759f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| **Decision 1 — repo-grounded vs elsewhere.** Weigh prompt content first, topic-repo coherence second, and CWD repo presence as supporting evidence only. | ||
|
|
||
| - Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase. | ||
| - Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase. Issue-tracker intent from 0.2 is always repo-grounded. |
There was a problem hiding this comment.
Do not force issue-tracker prompts into repo mode
This line makes any run flagged as issue-tracker intent in 0.2 automatically repo-grounded, even though 0.2 executes before mode classification and can match generic prompts like “GitHub issues” that refer to an external project. In those cases, Phase 1 will run codebase/issue intelligence against the current repo (or fail due missing remote/auth) instead of the user’s intended target, which breaks routing for a valid class of requests that previously stayed outside repo mode.
Useful? React with 👍 / 👎.
| - **Repo mode surprise-me:** the codebase-scan sub-agent samples a few representative files per top-level area (not just reads the top-level layout + AGENTS.md), surfaces recent PR/commit activity as signal about what's actively being worked on, and — when issue intelligence runs — passes issue themes as first-class input rather than footnote. Keep the scan bounded: representative, not exhaustive. | ||
| - **Elsewhere mode surprise-me:** user-context synthesis extracts themes, recurring language, tensions, and omissions from whatever the user supplied, rather than just restating it. Web research broadens beyond narrow prior-art for a single subject toward the domain's landscape. |
There was a problem hiding this comment.
Make surprise-me grounding instructions executable
This adds a requirement to sample representative files and surface recent PR/commit activity in repo surprise-me runs, but the actual quick-scan dispatch prompt below still instructs the sub-agent to read only top-level docs/layout and avoid deeper search. Because the dispatched prompt is what controls behavior, surprise-me grounding cannot reliably produce the richer signals this section now requires, so the new mode degrades to shallow scans.
Useful? React with 👍 / 👎.
Summary
/ce-ideateno longer burns ~9 agents on a wrong interpretation of a bare prompt: when the subject is ambiguous (improvements,ideas, an empty prompt) the skill asks one scope question and offers "Surprise me — let the agent decide what to focus on" as a first-class option. Every surviving idea now carries tagged warrant (direct:,external:, orreasoned:) so the reader can trace the leap back to specific evidence, prior art, or a first-principles argument — filtering rejects unjustified speculation and subject-replacement moves outright. Three commits bundle the ce-ideate redesign, a plugin-authoring note surfaced by that work (runtime vs. authoring context for AGENTS.md), and a scratch-path convention change applied consistently to ce-ideate and ce-demo-reel in the same pass.ce-ideate: what changed
Phase 0 rebuilt around what downstream agents actually need. The old Phase 0 classified mode first and only sometimes asked a question when mode confidence was low; the new Phase 0 asks about subject first, then about grounding substance, then about mode only when it genuinely stays ambiguous:
browser sniffanddark modedefault to identifiable;improvementsandideastrigger the gate. Always offers "Surprise me" as a real option.polish,typos,quick wins) and lowers the Phase 2 ambition floor for this run.Surprise-me promoted to a first-class mode. Phase 1 produces richer material when invoked — representative file sampling per top-level area, recent PR/commit activity as signal, issue themes passed as first-class input rather than footnote. Phase 2 sub-agents each discover their own subject through their frame's lens (different frames finding different subjects is the feature). Cross-cutting synthesis becomes the magic layer and expects 5-8 combinations vs. 3-5 in specified mode.
Universal warrant contract. Every idea articulates tagged warrant:
direct:for quoted evidence from the repo, docs, issues, or user-supplied contextexternal:for cited prior art or domain research with a sourcereasoned:for explicit first-principles argument (written out, not gestured at)Warrant is required, not optional. Filtering rejects any idea that lacks warrant, whose warrant does not support the claimed move, or that replaces the subject rather than operating on it. Meeting-test applies as default ambition floor, relaxed only when Phase 0.5 detected tactical focus. The artifact template and Phase 4 presentation surface
Warrant:explicitly so users see the chain from their reality to the leap.Scratch paths: /tmp directly
$TMPDIRon macOS resolves to/var/folders/64/.../T/, which is hostile UX for users who want to inspect scratch checkpoints, grep them, or copy them out. Per-run throwaway (mktemp -d) continues to use$TMPDIR— those files are never meant to be user-accessible. Cross-invocation reusable scratch (ce-ideate'sraw-candidates.md,survivors.md, V15 web-research cache; ce-demo-reel's--output-dir) now lives at/tmp/compound-engineering/<skill>/directly. The repo-rootAGENTS.mdScratch Space convention and cross-platform note updated to match.Plugin AGENTS.md: runtime vs. authoring
Plugin-level
AGENTS.mdandCLAUDE.mdare authoring context — they do not ship with the installed plugin. Skills run against the user'sAGENTS.md, not this repo's. Runtime guidance for skills must live inSKILL.mdor files underreferences/, never in plugin or repoAGENTS.mdfiles. Surfaced during the ce-ideate design (early drafts placed shared runtime principles in AGENTS.md, where they would have been invisible to the installed skill at runtime). Noting it so future plugin edits don't repeat the mistake.Test plan
Validated live in two end-to-end runs on a separate repo (
printing-press):/ce-ideate browser sniff→ 7 survivors, every one with tagged warrant citing specific files or plan documents; ambition floor held; subject identity preserved; rejections included appropriate warrant-integrity cuts ("HAR-input adapters — tactical; no current catalog evidence," "SaaS hosted — solo project; no multi-user signal in grounding").Full
bun testpasses (905 tests, 0 failures).bun run release:validateclean.