Skip to content

feat(ce-ideate): subject gate, surprise-me, and warrant contract#671

Merged
tmchow merged 3 commits intomainfrom
tmchow/ce-ideate-clarify-q
Apr 24, 2026
Merged

feat(ce-ideate): subject gate, surprise-me, and warrant contract#671
tmchow merged 3 commits intomainfrom
tmchow/ce-ideate-clarify-q

Conversation

@tmchow
Copy link
Copy Markdown
Collaborator

@tmchow tmchow commented Apr 24, 2026

Summary

/ce-ideate no longer burns ~9 agents on a wrong interpretation of a bare prompt: when the subject is ambiguous (improvements, ideas, an empty prompt) the skill asks one scope question and offers "Surprise me — let the agent decide what to focus on" as a first-class option. Every surviving idea now carries tagged warrant (direct:, external:, or reasoned:) so the reader can trace the leap back to specific evidence, prior art, or a first-principles argument — filtering rejects unjustified speculation and subject-replacement moves outright. Three commits bundle the ce-ideate redesign, a plugin-authoring note surfaced by that work (runtime vs. authoring context for AGENTS.md), and a scratch-path convention change applied consistently to ce-ideate and ce-demo-reel in the same pass.

ce-ideate: what changed

Phase 0 rebuilt around what downstream agents actually need. The old Phase 0 classified mode first and only sometimes asked a question when mode confidence was low; the new Phase 0 asks about subject first, then about grounding substance, then about mode only when it genuinely stays ambiguous:

  • 0.2 Subject-Identification Gate (all modes) asks one scope question when the subject is ambiguous. Model applies judgment to what the words refer to (a generic catch-all vs. a named feature), not phrase length. browser sniff and dark mode default to identifiable; improvements and ideas trigger the gate. Always offers "Surprise me" as a real option.
  • 0.3 Mode Classification drives dispatch routing only; active confirmation fires only when mode stays ambiguous after subject is known. Prescribed correction phrases ("say 'actually this is outside the repo' to switch") removed in favor of plain mode statements.
  • 0.4 Context-Substance Gate (elsewhere modes only) asks for URL/paste/description when Phase 1 agents would otherwise have nothing to synthesize. Scoped to substance, never subject characterization.
  • 0.5 Focus Modulation detects tactical scope signals (polish, typos, quick wins) and lowers the Phase 2 ambition floor for this run.

Surprise-me promoted to a first-class mode. Phase 1 produces richer material when invoked — representative file sampling per top-level area, recent PR/commit activity as signal, issue themes passed as first-class input rather than footnote. Phase 2 sub-agents each discover their own subject through their frame's lens (different frames finding different subjects is the feature). Cross-cutting synthesis becomes the magic layer and expects 5-8 combinations vs. 3-5 in specified mode.

Universal warrant contract. Every idea articulates tagged warrant:

  • direct: for quoted evidence from the repo, docs, issues, or user-supplied context
  • external: for cited prior art or domain research with a source
  • reasoned: for explicit first-principles argument (written out, not gestured at)

Warrant is required, not optional. Filtering rejects any idea that lacks warrant, whose warrant does not support the claimed move, or that replaces the subject rather than operating on it. Meeting-test applies as default ambition floor, relaxed only when Phase 0.5 detected tactical focus. The artifact template and Phase 4 presentation surface Warrant: explicitly so users see the chain from their reality to the leap.

Scratch paths: /tmp directly

$TMPDIR on macOS resolves to /var/folders/64/.../T/, which is hostile UX for users who want to inspect scratch checkpoints, grep them, or copy them out. Per-run throwaway (mktemp -d) continues to use $TMPDIR — those files are never meant to be user-accessible. Cross-invocation reusable scratch (ce-ideate's raw-candidates.md, survivors.md, V15 web-research cache; ce-demo-reel's --output-dir) now lives at /tmp/compound-engineering/<skill>/ directly. The repo-root AGENTS.md Scratch Space convention and cross-platform note updated to match.

Plugin AGENTS.md: runtime vs. authoring

Plugin-level AGENTS.md and CLAUDE.md are authoring context — they do not ship with the installed plugin. Skills run against the user's AGENTS.md, not this repo's. Runtime guidance for skills must live in SKILL.md or files under references/, never in plugin or repo AGENTS.md files. Surfaced during the ce-ideate design (early drafts placed shared runtime principles in AGENTS.md, where they would have been invisible to the installed skill at runtime). Noting it so future plugin edits don't repeat the mistake.

Test plan

Validated live in two end-to-end runs on a separate repo (printing-press):

  • Specified mode/ce-ideate browser sniff → 7 survivors, every one with tagged warrant citing specific files or plan documents; ambition floor held; subject identity preserved; rejections included appropriate warrant-integrity cuts ("HAR-input adapters — tactical; no current catalog evidence," "SaaS hosted — solo project; no multi-user signal in grounding").
  • Surprise-me — bare prompt → "Surprise me" selected → 7 survivors spanning 7 distinct subjects (measurement infrastructure, IR + fingerprint lattice, retros as temporal knowledge, preflight/security, drift detection, MCP as contract, hazard classification). External warrants included FDA bioequivalence, IBC §307 building code, Kahl 2021 BirdNET abstention, Cloudflare Code Mode benchmarks. Cross-cutting synthesis produced 8 additional combinations (the surprise-me target).

Full bun test passes (905 tests, 0 failures). bun run release:validate clean.


Compound Engineering
Claude Code

tmchow and others added 3 commits April 24, 2026 01:54
… runtime

The plugin is distributed and installed into end-user environments,
where skills run against the user's AGENTS.md, not this repo's.
Runtime guidance for skills must live inside SKILL.md or references/,
never in plugin or repo AGENTS.md files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…based generation

Phase 0 restructured around what downstream agents actually need:

- 0.2 Subject-Identification Gate asks one scope question ("Specify a
  subject / Surprise me / Cancel") when the subject is ambiguous,
  preserving greenfield intent and keeping "Surprise me" as a real
  first-class option. Questions are only about what to ideate on,
  never about solution direction, constraints, or characterization -
  those belong to ce-brainstorm.
- 0.3 Mode Classification drives dispatch routing only; active
  confirmation fires only when mode is ambiguous after subject is
  known. Prescribed correction phrases ("say X to switch") removed
  in favor of plain mode statements.
- 0.4 Context-Substance Gate asks for URL/paste/description in
  elsewhere modes when Phase 1 agents would otherwise have nothing
  substantive to work with. Scoped to substance, not subject
  characterization.
- 0.5 Focus Modulation detects tactical scope signals (polish, typos,
  quick wins) and lowers the Phase 2 ambition floor accordingly.

Surprise-me promoted to a first-class mode. Phase 1 produces richer
grounding when invoked (representative file sampling, recent activity,
issue themes as first-class input). Phase 2 sub-agents discover their
own subjects per frame - different frames finding different subjects
is the feature. Cross-cutting synthesis is the magic layer and expects
5-8 combinations rather than the 3-5 of specified mode.

Universal warrant contract across all frames, all modes: every idea
articulates tagged warrant (direct: for quoted evidence, external:
for cited prior art, reasoned: for written-out first-principles
argument). Unjustified speculation does not surface. Meeting-test
floor applies as default; subject identity preserved (no
subject-replacement ideas). Post-ideation filtering gains
warrant-integrity rejection criteria; artifact template and Phase 4
presentation surface Warrant: explicitly.

universal-ideation (non-software elsewhere mode) inherits the same
contract. Its discrimination-test paragraph now cross-references the
SKILL.md Phase 0.2 questioning principles instead of leaving
intake-scope undefined.

Subject-identification uses model judgment over word lists:
vagueness is about what words refer to (a generic catch-all quality)
rather than phrase length. Short phrases that plausibly name a
feature or concept (browser sniff, dark mode, cache invalidation)
default to identifiable; a cheap repo check resolves genuine
ambiguity without triggering the gate unnecessarily.

Scratch relocated from \${TMPDIR:-/tmp} to /tmp directly.
/var/folders/... on macOS is hostile UX for users who want to
inspect checkpoints or copy them out; per-user isolation was not
valuable for ephemeral ideation scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ratch

$TMPDIR on macOS resolves to /var/folders/64/.../T/, which is hostile
for users who want to inspect scratch checkpoints, grep them, or copy
them out. The per-user isolation $TMPDIR provides is not valuable for
cross-invocation reusable scratch where users are the intended
audience. /tmp is writable on macOS, Linux, and WSL and is universally
accessible.

Per-run throwaway continues to use mktemp -d, which resolves to
$TMPDIR on macOS - correct for throwaway files that are never meant
to be user-accessible.

Updates the repo-root Scratch Space convention in AGENTS.md and
brings ce-demo-reel into compliance (Python default_dir + upload
options doc). ce-ideate was already updated in the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit 6514b1f into main Apr 24, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 24, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b2421759f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

**Decision 1 — repo-grounded vs elsewhere.** Weigh prompt content first, topic-repo coherence second, and CWD repo presence as supporting evidence only.

- Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase.
- Positive signals for **repo-grounded**: prompt references repo files, code, architecture, modules, tests, or workflows; topic is clearly bounded by the current codebase. Issue-tracker intent from 0.2 is always repo-grounded.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not force issue-tracker prompts into repo mode

This line makes any run flagged as issue-tracker intent in 0.2 automatically repo-grounded, even though 0.2 executes before mode classification and can match generic prompts like “GitHub issues” that refer to an external project. In those cases, Phase 1 will run codebase/issue intelligence against the current repo (or fail due missing remote/auth) instead of the user’s intended target, which breaks routing for a valid class of requests that previously stayed outside repo mode.

Useful? React with 👍 / 👎.

Comment on lines +211 to +212
- **Repo mode surprise-me:** the codebase-scan sub-agent samples a few representative files per top-level area (not just reads the top-level layout + AGENTS.md), surfaces recent PR/commit activity as signal about what's actively being worked on, and — when issue intelligence runs — passes issue themes as first-class input rather than footnote. Keep the scan bounded: representative, not exhaustive.
- **Elsewhere mode surprise-me:** user-context synthesis extracts themes, recurring language, tensions, and omissions from whatever the user supplied, rather than just restating it. Web research broadens beyond narrow prior-art for a single subject toward the domain's landscape.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make surprise-me grounding instructions executable

This adds a requirement to sample representative files and surface recent PR/commit activity in repo surprise-me runs, but the actual quick-scan dispatch prompt below still instructs the sub-agent to read only top-level docs/layout and avoid deeper search. Because the dispatched prompt is what controls behavior, surprise-me grounding cannot reliably produce the richer signals this section now requires, so the new mode degrades to shallow scans.

Useful? React with 👍 / 👎.

@github-actions github-actions Bot mentioned this pull request Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant