feat(ce-plan): approach-altitude plan-for-a-plan with ce-work non-code carve-out#905

Merged: tmchow merged 7 commits into main from tmchow/ce-plan-articulation-flow on Jun 4, 2026
@tmchow tmchow commented Jun 4, 2026

What

Adds an approach altitude to ce-plan: for a hard problem, answer one level up — produce a grounded plan for how the deliverable will be made and hold at a checkpoint before committing, instead of zero-shotting a fragile result. Non-code deliverables flow to a new lightweight ce-work carve-out.

How it works

  • Entry (ce-plan SKILL.md Phase 0.1a — sits between the resume/deepen fast paths and the domain split, so it's domain-general): the explicit path ("plan for a plan", "don't write it yet — plan the approach") is always honored; a proactive offer fires only when method-uncertainty AND cost-of-getting-it-wrong are both high — a single dismissible line, never a nag.
  • Flow (references/approach-altitude.md): light recon (skim, not deep-read) → chat-first approach-plan (file-optional, deepenable) → checkpoint (do it now / save for later).
  • Boundary = code vs. knowledge-work, not plan vs. execute. Code still flows to ce-work's normal path; a non-code deliverable is marked execution: knowledge-work and runs through ce-work's carve-out (references/non-code-execution.md) — or any agent, since the plan stays portable. ce-plan never executes.
  • Disjoint from the three existing in-chat approach surfaces (answer-seeking's plan-of-attack, the scoping synthesis, deepening) via explicit guards (R16).
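The entry rule above can be sketched as a small decision function. This is only an illustrative model of the Phase 0.1a gate as described here: the real gate is skill prose, not code, and every name below (`PlanRequest`, `approachAltitudeGate`, the `Level` values) is hypothetical.

```typescript
// Hypothetical model of the Phase 0.1a gate. The actual gate is prose in
// ce-plan's SKILL.md; these names and types are assumptions for illustration.
type Level = "low" | "high";

interface PlanRequest {
  explicitApproachAsk: boolean; // e.g. "plan for a plan", "don't write it yet"
  methodUncertainty: Level;     // is the *method* genuinely unclear?
  costOfGettingItWrong: Level;  // how expensive is a fragile first attempt?
}

type GateResult = "enter-approach-altitude" | "offer-once" | "silent";

function approachAltitudeGate(req: PlanRequest): GateResult {
  // The explicit path is always honored.
  if (req.explicitApproachAsk) return "enter-approach-altitude";
  // The proactive offer needs BOTH signals high; cost alone never fires.
  if (req.methodUncertainty === "high" && req.costOfGettingItWrong === "high") {
    return "offer-once"; // a single dismissible line
  }
  return "silent";
}

// A costly but method-obvious task (like the N4 migration) stays silent.
console.log(
  approachAltitudeGate({
    explicitApproachAsk: false,
    methodUncertainty: "low",
    costOfGettingItWrong: "high",
  })
);
```

Modeling the two signals as a conjunction makes the "cost alone never fires" property a structural consequence rather than a separate rule.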

Validation

  • Mechanical: bun test tests/frontmatter.test.ts → 331 pass; bun run release:validate → in sync.
  • Behavioral (the key gate, as there is no beta): ran the proactive-trigger eval in fresh sessions, because skill prose is cached at session start and so cannot be tested in the authoring session. 42 faithful dry-runs covered negative controls, should-offer, explicit, and borderline cases. The first pass caught a real over-fire: N4 (a costly but method-obvious 40-endpoint migration) fired the offer in 1 of 12 runs because "multiple methodologies" was read broadly enough to include routine rollout and sequencing variants, which is exactly the new-hammer failure mode the design guards against. Fixed by tightening the method-uncertainty wording to exclude rollout, scope, and ordering variants (routing them to the Phase 0.7 scoping synthesis per R16) and restating "cost alone never fires." Re-validated in a fresh session: N4 went from 1/12 to 0/8, with no regression on should-offer (6/6) or explicit-hold (6/6).
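The pass criteria implied by these numbers can be written down as a tiny tally check. This is a sketch under assumptions, not the eval harness used for this PR: the `Tally` shape and `passes` function are hypothetical, and treating "6/6" as "the expected behavior was observed in all 6 runs" is an interpretation of the figures above.

```typescript
// Hypothetical tally check; the eval itself was run as dry-runs in fresh
// sessions, not with this code.
type Expectation = "must-not-fire" | "must-fire";

interface Tally {
  label: string;
  expectation: Expectation;
  observed: number; // runs in which the offer (or hold) occurred
  runs: number;
}

function passes(t: Tally): boolean {
  // Negative controls tolerate zero fires; positive cases must fire every run.
  return t.expectation === "must-not-fire" ? t.observed === 0 : t.observed === t.runs;
}

// Figures reported in the Validation section.
const firstPassN4: Tally = { label: "N4", expectation: "must-not-fire", observed: 1, runs: 12 };
const revalidation: Tally[] = [
  { label: "N4", expectation: "must-not-fire", observed: 0, runs: 8 },
  { label: "should-offer", expectation: "must-fire", observed: 6, runs: 6 },
  { label: "explicit-hold", expectation: "must-fire", observed: 6, runs: 6 },
];

console.log(passes(firstPassN4));        // the 1/12 over-fire fails the gate
console.log(revalidation.every(passes)); // the re-validated figures pass
```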

Deferred (see plan)

  • Config kill-switch for the proactive offer (no beta; the trigger wording is the lever — an accepted exposure).
  • ce-work carve-out save/commit behavior beyond write + report.

Surfaces changed

ce-plan (SKILL.md gate + approach-altitude.md + execution marker field + R16 guards), ce-work and ce-work-beta (non-code carve-out, kept in parity), and the ce-plan / ce-work skill docs.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e0edcd443

Comment thread plugins/compound-engineering/skills/ce-work/SKILL.md Outdated
tmchow commented Jun 4, 2026

Regression validation (existing behavior)

Ran a fresh-session regression eval covering the existing flows this change touches — 19/19 match pre-change behavior, 0 regressions:

  • ce-work Phase 0 triage (the highest-risk change): unmarked code plan → normal code lifecycle (3/3); bare prompt → normal triage, marker path untouched (3/3); execution: knowledge-work plan → non-code carve-out (2/2).
  • ce-plan non-software plan-seeking (trip / study / offsite): the 0.1a gate stays silent and routes to universal-planning unchanged (9/9).
  • Software-with-brainstorm: 0.1a not reached, normal planning (2/2).
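The three ce-work Phase 0 triage outcomes listed above can be modeled as a routing function. This is a minimal sketch that assumes the marker is available as a simple `execution` field on a parsed plan; `TriageInput`, `Route`, and `triage` are hypothetical names, and the real triage is defined in skill prose rather than code.

```typescript
// Hypothetical model of ce-work Phase 0 triage as exercised by the regression
// eval. Only the "knowledge-work" marker value comes from the PR description.
type TriageInput =
  | { kind: "bare-prompt" }               // no plan supplied
  | { kind: "plan"; execution?: string }; // plan, optionally marked

type Route = "normal-triage" | "code-lifecycle" | "non-code-carve-out";

function triage(input: TriageInput): Route {
  // A bare prompt never touches the marker path.
  if (input.kind === "bare-prompt") return "normal-triage";
  // Only an explicit knowledge-work marker diverts to the carve-out;
  // an unmarked plan keeps the existing code lifecycle.
  return input.execution === "knowledge-work" ? "non-code-carve-out" : "code-lifecycle";
}

console.log(triage({ kind: "plan" }));
console.log(triage({ kind: "bare-prompt" }));
console.log(triage({ kind: "plan", execution: "knowledge-work" }));
```

Defaulting to the code lifecycle whenever the marker is absent is what keeps existing plans on their pre-change path.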

Combined with the earlier proactive-trigger eval — which caught and fixed an over-fire (N4, a costly-but-method-obvious migration, fired 1/12; tightened wording; re-validated 0/8) — the gate is validated for both new behavior and existing-behavior regression.

@tmchow tmchow merged commit fbd0faf into main Jun 4, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Jun 4, 2026