feat(ce-plan,ce-brainstorm): add output:html mode by tmchow · Pull Request #826 · EveryInc/compound-engineering-plugin

tmchow · 2026-05-12T20:40:56Z

Summary

ce-plan and ce-brainstorm can now emit a single self-contained HTML rendering alongside the markdown source when invoked with output:html, or with plan_output: html / brainstorm_output: html set in .compound-engineering/config.local.yaml. Markdown remains canonical; HTML is a projection composed per artifact by the agent following content-shape questions and a minimal opinionated fallback style, with no hardcoded template grammar.

Both skills change together because the HTML composition rules are shared. The reference content (references/html-output.md) is duplicated byte-for-byte across both skills per the plugin's cross-skill convention, and the SKILL.md, handoff reference, and test changes run in mirror.

Design decisions

Agent-driven composition, not script-driven. The skill's core value here is per-artifact judgment about which HTML affordances fit which content (tabular, sequential, branching, interactive). A deterministic transformer would produce uniform output that defeats the point. The reference gives the agent content-shape questions and an opinionated fallback CSS; the agent composes per artifact.

The reference treats markdown as content source, not structural authority. Markdown's lists/sections/tables are presentation defaults, not semantic ground truth. The agent re-derives structure from semantic content per section. Content-shape questions include a hard rule (5+ items sharing uniform structure render as <table> regardless of how the markdown source structured them), a sticky-TOC affordance with concrete trigger (5+ top-level sections OR ~400 lines), and reverse-traceability columns for ID-anchored content (R-IDs showing which U-IDs satisfy each). These three nudges turn "could be HTML" into "is reliably HTML-shaped."

Markdown stays canonical; HTML is a projection. ce-work and other downstream consumers still read the markdown. The HTML is for richer human review and sharing. Pipeline mode (LFG, any disable-model-invocation context) forces md regardless of CLI or config preference, so ce-work never gets orphan HTML.

Precedence with explicit-wins-over-implicit. Style preferences resolve in this order: conversation, then any preferred stylesheet reference in loaded agent-instruction context (typically AGENTS.md or CLAUDE.md), then DESIGN.md from the repo, then the skill's fallback default. Phase 0.0 tracks OUTPUT_FORMAT_SOURCE (cli / config / default / pipeline-forced) alongside the resolved value. An explicit output:md (CLI or config) beats a sibling .html presence on resume; users can disable HTML emission without manually deleting files. At compose time the agent scans loaded context for any stylesheet reference (file path, URL, named library, or style brand) and inlines or composes-in-spirit accordingly.

Composition after safe_auto. HTML composes after ce-doc-review's safe_auto fixes land on the markdown, so the first emission reflects autofixes. Within a single skill run, HTML re-renders whenever the markdown is mutated (deepen, doc-review, HITL Proof resync, post-pull resync). Across multi-run lifecycles the HTML can drift; a visible staleness banner in every artifact surfaces source path and composition timestamp for detection.

Single-file invariant survives every path. Inline CSS, inline SVG, inline images via base64 or SVG. No companion .css / .js / .svg files. CDN webfonts permitted only with a complete offline-readable fallback font stack. Frontmatter preserved as <script type="application/json"> with < escaped to &lt; (HTML entity) to prevent </script> injection. Small inline <script> for active-section TOC tracking is acceptable; React/Vue/etc. frameworks are not.

Agent-consumability rules in the reference. A downstream agent reading the HTML as text linearly (not via DOM extraction) needs semantic structure reachable in source: <article> per unit card, <dl> for metadata, <table> for tabular content, <details> / <summary> for collapsibles, field labels as visible text rather than data attributes, and U-IDs / R-IDs as visible text in headings and cells rather than only as id="".

Menu mutual exclusion in HTML mode. Proof operates on markdown plans/requirements docs and cannot ingest HTML, so HTML-mode users see "Open in browser" in place of "Open in Proof" at the same option slot. Menu bullets are split into two 4. (or 3.) lines, each tagged with its OUTPUT_FORMAT precondition; the agent renders exactly one. /ce-work stays the recommended option in both modes since ce-work consumes the markdown.

Evidence

The HTML dogfood for this PR's own plan is committed at docs/plans/2026-05-11-001-feat-output-html-mode-plan.html. It was composed by hand during plan authoring, before any skill code was written, and validated the approach (per-affordance judgment, single-file rendering, sticky TOC with active-section indicator, anchor permalinks, collapsible Implementation Unit subsections via native <details> / <summary>). Open the file locally to see what output:html produces.

Test plan

bun test: 1389 pass, 0 fail.

tests/skills/ce-plan-output-mode.test.ts: argument-hint advertises the flag, Phase 0.0 resolution lives inline, token-parsing names both mode: and output:, Phase 5.2 defers HTML compose, Phase 5.3.9 owns the compose, menu shows "Open in browser" in HTML mode.
tests/skills/ce-brainstorm-output-mode.test.ts: mirror per skill, plus the handoff non-propagation rule (ce-plan re-resolves its own config independently).
tests/skills/html-output-invariants.test.ts: every promised rule in the reference holds (single-file invariant, inline CSS, < → &lt; escape near a <script> reference, anchor IDs, precedence stack, active-recall, content-shape questions as actual questions, affordance idioms, fallback CSS with dark-mode and responsive breakpoints, agent-consumability rules, staleness signal, post-compose audit, markdown-not-structural-authority, uniform-shape table rule, sticky-TOC affordance with concrete trigger, reverse traceability, inline-script permission for active-section tracking).
tests/compound-support-files.test.ts: byte-for-byte duplication enforced across the two skills' html-output.md.
tests/skills/ce-plan-handoff-routing.test.ts: "Open in browser" added to the per-option inline-routing assertions.

bun run release:validate: in sync, no drift.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f95c05a338

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Two findings from the Codex review on #826.

P1: html-output.md said "Escape `<` as `<`", which is a no-op. The intended guidance is to escape `<` as `&lt;` (the HTML entity) inside the JSON payload embedded in `<script type="application/json">`, so any frontmatter value containing the literal substring `</script>` cannot terminate the script tag. Browsers unescape when reading the script element's text content, so downstream JSON parsers see clean characters. The invariant test was loose enough to pass on either `&lt;` or the word "escape"; tightened to require the literal entity AND require it to sit near a `<script>` reference.

P2: HTML composition in Phase 5.3.9 (ce-plan) and Phase 3 (ce-brainstorm) recomposed whenever an `.html` sibling was marked for re-render in Phase 0.1, even when pipeline mode forced `OUTPUT_FORMAT=md`. In automated contexts (LFG) that resume an existing plan with an HTML sibling, this would emit HTML, violating the explicit md-only pipeline rule and the ce-work consumption guarantee. The composition gate is now two conditions: (a) HTML wanted via CLI/config/sibling-re-render AND (b) not in pipeline mode. Both must hold. Same gate applies to mid-run re-renders triggered by HITL Proof resync.

The plan body was also updated to match the corrected escape rule.

Refs: #826

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a898b3eeff

…after Proof pull

Two follow-up findings from the Codex re-review of #826.

P2: Sibling-rerender suppression on explicit output:md. The prior fix gated HTML composition on (HTML wanted OR sibling marked) AND not-in-pipeline, but the sibling-marked branch still fired even when a user explicitly passed output:md on resume. That made it impossible to disable HTML emission once a plan had an .html sibling without manually deleting the sibling file, which contradicted the documented CLI arg > config > default precedence. Phase 0.0 now tracks OUTPUT_FORMAT_SOURCE alongside OUTPUT_FORMAT, with values cli / config / default / pipeline-forced. Phase 0.1 only marks the sibling for re-render when SOURCE=default, that is, when neither an explicit CLI arg nor a config preference resolved. An explicit md choice (CLI or config) now beats sibling presence; users can drop back to markdown-only on resume without filesystem surgery.

P2: Recompose HTML after Proof pull. The localSynced:false and done_for_now branches in plan-handoff.md and ce-brainstorm/handoff.md offered a Pull workflow that updated the local markdown and re-ran ce-doc-review, but they didn't re-apply the HTML composition rule. The .html sibling could be stale in the same run after a successful pull, contradicting the recompose-after-markdown-mutations contract. Both branches now re-apply the HTML composition rule (gated by its two conditions) after the pull before re-rendering the menu, matching the localSynced:true branch behavior.

Refs: #826
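
The resolution order and the sibling-marking rule described above can be sketched in TypeScript. This is only an illustration: the skill expresses these rules as prose the agent follows, and every name here (`resolveOutputFormat`, `markSiblingForRerender`, the input fields) is hypothetical. The sketch also assumes that an unrecognized value is treated the same as an absent one.

```typescript
type OutputFormat = "md" | "html";
type FormatSource = "cli" | "config" | "default" | "pipeline-forced";

interface ResolveInput {
  cliArg?: string;       // raw value passed as output:<value>, if any
  configValue?: string;  // active plan_output / brainstorm_output value, if any
  pipelineMode: boolean; // LFG or any disable-model-invocation context
}

interface Resolved {
  format: OutputFormat;
  source: FormatSource;
}

const isFormat = (v: string | undefined): v is OutputFormat =>
  v === "md" || v === "html";

// Pipeline mode is checked first here because it overrides everything;
// the outcome is the same as applying it as a final override step.
function resolveOutputFormat(input: ResolveInput): Resolved {
  if (input.pipelineMode) return { format: "md", source: "pipeline-forced" };
  if (isFormat(input.cliArg)) return { format: input.cliArg, source: "cli" };
  if (isFormat(input.configValue)) {
    return { format: input.configValue, source: "config" };
  }
  return { format: "md", source: "default" };
}

// An existing .html sibling is only re-rendered when nothing explicit resolved.
function markSiblingForRerender(
  resolved: Resolved,
  htmlSiblingExists: boolean,
): boolean {
  return htmlSiblingExists && resolved.source === "default";
}
```

Under this sketch, a resume with `output:md` resolves with source `cli`, so an existing `.html` sibling is left alone, while a resume with no argument and no config key resolves with source `default` and the sibling is marked.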

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e565de2db

… Proof in HTML mode

Proof operates on markdown plans/requirements docs and cannot ingest HTML. The previous menu rendering presented option 4 (ce-plan) and option 3 (ce-brainstorm) as a single bullet containing both "Open in Proof" and "Open in browser" labels with an inline italic conditional. An agent reading that bullet could render both labels visibly, leaving an HTML-mode user with the option of picking Proof and getting markdown loaded in the editor unexpectedly.

Splitting the conditional bullet into two separate option-N lines, each tagged with its `OUTPUT_FORMAT` precondition, gives the agent an unambiguous instruction: render exactly one. The mutual exclusion stays intact and the menu cap is still honored. Adds explicit "Proof operates on markdown" rationale to the prose so the why is reachable without skill redesign.

Refs: #826

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26e814ce25

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b521043b1

… resolve webfont contradiction

Two findings from the Codex re-review of #826.

P2: Menu gate / composition gate mismatch. The "Open in browser" menu option gated on OUTPUT_FORMAT=html, but the HTML composition step (Phase 5.3.9 in ce-plan, Phase 3 in ce-brainstorm) fires under a broader predicate: OUTPUT_FORMAT=html OR Phase 0.1 marked an existing .html sibling for re-render. The OUTPUT_FORMAT_SOURCE machinery added in the previous round preserved the explicit-md-beats-sibling rule but exposed this latent gap. A resume run with a .html sibling but OUTPUT_FORMAT=md (default-source) regenerated the HTML in Phase 3/5.3.9, but the menu still surfaced "Open in Proof" and hid the fresh HTML the user just implicitly requested via sibling re-render. The menu gate now uses the same predicate as the composition gate (HTML wanted OR sibling-marked re-render), so resume runs surface the artifact they just produced.

P2: Webfont link contradiction in reference. The active-recall block said "Never emit a <link rel='stylesheet'> to an external sheet" while the Fallback default style section showed exactly that: a <link rel="stylesheet"> pointing to Google Fonts CSS. Both could not be right. The documented webfont exception ("CDN webfonts permitted only with offline-readable fallback stack") is the intended rule; the absolute prohibition was the bug. Qualified the rule to scope the prohibition to layout/typography stylesheets the doc cannot read offline, while explicitly permitting <link rel="stylesheet"> for CDN webfont CSS with the fallback condition stated nearby.

Three new invariant tests pin both fixes: menu-gate-matches-composition-gate on both skills, and no-absolute-stylesheet-prohibition in the reference. Byte-for-byte parity preserved across ce-plan and ce-brainstorm html-output.md.

Refs: #826

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a464837eae

… </script> injection

Two findings from the Codex re-review of #826.

P1: Wrong escape was shipped across two prior rounds. The reference said "escape `<` as `&lt;`" and claimed browsers unescape it when reading the script tag's text content. That claim is factually incorrect: <script type="application/json"> is HTML raw-text content where HTML entities are NOT decoded. .textContent returns the literal four-character string '&lt;', not '<', so any frontmatter value containing '<' is corrupted instead of round-tripping. The correct escape is the JSON Unicode escape `\u003c`, which JSON parsers natively decode back to '<'. Replaced the escape rule with explicit literal-six-character `\u003c` guidance, added an explicit warning against the HTML-entity approach naming the round-trip failure mode, and tightened the invariant test to require the literal `\u003c` AND forbid future regression to `&lt;`.

P2: Hard-invariant bullet still ambiguous on `<link rel="stylesheet">`. Line 22 said "No companion .css, .js, or .svg files" and an agent could reasonably read that as banning ALL `<link rel="stylesheet">`, even though the webfont exception was documented in the next bullet. Folded the exception into line 22 so an agent reading the no-companion-files rule cannot miss the webfont carve-out and the explicit boundary on what kinds of external stylesheets are permitted (CDN webfont CSS only, never layout, color, or design-system stylesheets).

Refs: #826
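
The round-trip difference between the two escapes can be shown with a short TypeScript sketch. The helper names (`serializeFrontmatterForScript`, `wrapInScriptTag`) and the sample frontmatter are hypothetical; in the PR the agent composes the HTML by following the reference, so this only illustrates the rule, not how the skill implements it.

```typescript
// Serialize frontmatter for embedding in <script type="application/json">.
// Every "<" becomes the six-character JSON escape \u003c, so the payload can
// never contain "</script>", and JSON.parse decodes it back to "<".
function serializeFrontmatterForScript(
  frontmatter: Record<string, unknown>,
): string {
  return JSON.stringify(frontmatter).replace(/</g, "\\u003c");
}

function wrapInScriptTag(json: string): string {
  return `<script type="application/json">${json}</script>`;
}

// Hypothetical frontmatter with a hostile value.
const sampleFrontmatter = { title: "Plan </script><b>x</b>", status: "draft" };

const payload = serializeFrontmatterForScript(sampleFrontmatter);
const embedded = wrapInScriptTag(payload);

// JSON Unicode escape: the original value survives the round trip.
const roundTripped = JSON.parse(payload) as { title: string };

// HTML-entity escape, for contrast: script content is raw text, so a reader
// sees the literal "&lt;" and the parsed value no longer matches the source.
const entityPayload = JSON.stringify(sampleFrontmatter).replace(/</g, "&lt;");
const corrupted = JSON.parse(entityPayload) as { title: string };

console.log(roundTripped.title === sampleFrontmatter.title); // true
console.log(corrupted.title === sampleFrontmatter.title); // false
```

Replacing every `<` after `JSON.stringify` is safe because `<` has no structural meaning in JSON; it can only occur inside string values, where `\u003c` is an equivalent spelling.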

…enu-rendering predicate

Codex caught a real follow-on bug from the previous round. Menu rendering now uses the broader HTML-emitted predicate (OUTPUT_FORMAT=html OR Phase 0.1 marked a sibling for re-render), but the routing block was still gated on OUTPUT_FORMAT=html only. So a resume run that surfaces "Open in browser" via the sibling-rerender branch would fail to actually open the file when the user picked it: the routing's condition is false even though the menu showed the option.

Aligned all three Open-in-browser routing surfaces (ce-plan SKILL.md inline, plan-handoff.md elaborate, ce-brainstorm handoff.md) to the same HTML-emitted predicate the menu uses. Same predicate as the composition gate. All three surfaces now consistent.

Refs: #826
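
One way to picture the alignment is a single predicate that composition, menu rendering, and routing all consult. The TypeScript below is a sketch under assumptions: `RunState`, `htmlEmitted`, `menuLabel`, and `routeOpen` are hypothetical names, and the pipeline-mode condition is carried over from the composition gate described in the earlier rounds.

```typescript
interface RunState {
  outputFormat: "md" | "html";
  siblingMarkedForRerender: boolean; // set in Phase 0.1
  pipelineMode: boolean;
}

// Shared predicate: HTML wanted OR sibling marked, AND not in pipeline mode.
function htmlEmitted(state: RunState): boolean {
  const wanted = state.outputFormat === "html" || state.siblingMarkedForRerender;
  return wanted && !state.pipelineMode;
}

type OpenLabel = "Open in browser" | "Open in Proof";

// Menu rendering: exactly one of the two labels occupies the option slot.
function menuLabel(state: RunState): OpenLabel {
  return htmlEmitted(state) ? "Open in browser" : "Open in Proof";
}

// Routing: uses the same predicate, so a shown option always has a handler.
function routeOpen(state: RunState): "browser" | "proof" {
  return htmlEmitted(state) ? "browser" : "proof";
}
```

Because all three call sites go through `htmlEmitted`, a resume run that reaches the browser option through the sibling-rerender branch also routes to the browser.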

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c8de465db

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69c7be2b40

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1187a56b9

…e prose

Codex review (PR #826) flagged two precision bugs in the Phase 0.0 output-mode spec:

1. Step 2 said "if YAML contains `plan_output: md|html`", which would match commented examples like `# plan_output: html` shipped in the config template, silently flipping users into HTML on every run. Now the spec requires an ACTIVE (non-commented) key, names the shipped-template failure mode inline, and says comments must be ignored.
2. The unknown-output-value note hardcoded "defaulting to md", but step 1's drop-and-fall-through still lets step 2 (config) resolve to html or step 4 (pipeline) force md. The note now reflects the actual final resolved mode after steps 2-4, not a hardcoded value.

Same fix on both ce-plan (plan_output) and ce-brainstorm (brainstorm_output) for parity. Tests pin the principle's presence (active-key requirement, final-mode-reflection) rather than specific phrasings.
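
The active-key requirement can be illustrated with a small line-based check in TypeScript. This is a sketch, not the skill's mechanism (the spec is prose for the agent), and `readActiveOutputKey` is a hypothetical helper. It assumes flat `key: value` lines with unquoted values and is not a general YAML parser.

```typescript
type OutputFormat = "md" | "html";

// Return the value of an ACTIVE (non-commented) key, or undefined.
// Lines whose first non-space character is "#" are ignored, so the commented
// examples shipped in the config template never flip the mode.
function readActiveOutputKey(
  yamlText: string,
  key: string,
): OutputFormat | undefined {
  for (const rawLine of yamlText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("#")) continue;
    // key, colon, bare value, optional trailing comment
    const match = /^([A-Za-z_]+)\s*:\s*([A-Za-z]+)\s*(?:#.*)?$/.exec(line);
    if (match === null || match[1] !== key) continue;
    const value = match[2];
    if (value === "md" || value === "html") return value;
  }
  return undefined;
}
```

With the shipped template line `# plan_output: html` the function returns `undefined`, so resolution falls through to the default; with `plan_output: html` it returns `"html"`.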

Plan document for adding an output:html argument to ce-plan and ce-brainstorm. The HTML sibling is a dogfood artifact composed by hand from the plan during authoring — it validated the approach (agent-driven composition with content-shape questions and a minimal opinionated fallback style) before any skill code was written.

ce-plan and ce-brainstorm now accept an output:html / output:md argument that emits a single self-contained HTML rendering alongside the markdown source. Markdown remains canonical; HTML is a projection composed per artifact by the agent following content-shape questions and a minimal opinionated fallback style — no hardcoded template grammar. Resolution precedence is CLI arg > config (.compound-engineering/config.local.yaml keys plan_output and brainstorm_output) > default md, with a hard pipeline-mode override that forces md so ce-work and other downstream consumers always have their markdown input. Composition timing is after ce-doc-review safe_auto fixes apply, so the first HTML emission reflects autofixes. Within a single skill run, HTML re-renders whenever the markdown is mutated (deepen fast path, post-doc-review, HITL Proof resync). Across multi-run lifecycles the HTML may drift; this is a known limitation surfaced via a staleness banner. The HTML composition guidance lives in references/html-output.md, duplicated byte-for-byte between ce-plan and ce-brainstorm (enforced by tests/compound-support-files.test.ts). The reference carries hard invariants (single self-contained file, inline CSS, inline SVG, CDN webfonts only with offline fallback stack), content-shape questions, affordance idioms, agent-consumability rules, and a post-compose audit checklist. User style preferences supersede skill defaults via active-recall at compose time: the agent scans loaded context for any preferred stylesheet reference (file path, URL, named library, or style brand — typically in AGENTS.md or CLAUDE.md), and inlines or composes-in-spirit accordingly. DESIGN.md sits below the agent-instruction tier as the only filesystem read the skill does. Post-generation menu adds Open in browser as a mutual-exclusion replacement for Open in Proof in HTML mode (keeps the option cap honored). /ce-work remains the recommended next step in both modes since ce-work consumes the markdown. 
ce-setup config-template gains commented plan_output and brainstorm_output keys so users discover them through the standard config-bootstrap path. Tests assert reference-content invariants and skill-body load-bearing rules (argument-hint advertises output:html, resolution prose is inline not solely in references, mutual-exclusion menu rule present, byte-for-byte parity of the duplicated reference enforced).

…ime variance

Observed a real-world plan rendered with output:html that produced a materially different artifact than the dogfood, despite identical reference content being loaded. Diagnosis: the reference under-prescribed on four specific affordances I baked into the dogfood by hand but never wrote down as instructions. Different agent runs picked validly from the affordance menu and produced visibly different outputs.

Four tightenings, all in references/html-output.md (duplicated byte-for-byte across both skills):

1. "Markdown is the content source, not the structural authority." Markdown's lists/sections/tables are presentation defaults, NOT semantic ground truth. The agent must re-derive structure from semantic content per section. Without this, agents inherit the markdown source's bulleted-list rendering for content that would scan faster as a table.
2. Uniform-shape hard rule. If 5+ items in a section share uniform structure (ID + body, name + value, label + description, decision + rationale, risk + mitigation), render as <table> regardless of how the markdown source structured them. Promoted from soft content-shape question to load-bearing directive with concrete example shapes named.
3. Sticky TOC sidebar with active-section indicator added to affordance idioms, with a concrete trigger (5+ top-level sections OR ~400 lines). Single-column-only on a long plan was the biggest UX miss in the observed divergence.
4. Reverse traceability via downstream-references column on ID-anchored tables (e.g., R3 row shows "U2, U5, U7" satisfying it). Forward references read easily; reverse lookup requires the column or scanning every unit.

Also clarified the inline-script boundary: a small inline <script> for active-section tracking / anchor-permalink behavior is acceptable. The no-JS-framework rule applies to React/Vue/etc., not to ~15 lines of vanilla IntersectionObserver code.

Five new invariant tests pin each tightening so future drift fails CI. Byte-for-byte parity preserved across ce-plan and ce-brainstorm copies.
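
The reverse-traceability column and the two numeric triggers can be sketched in TypeScript. The `Unit` shape and all function names are hypothetical, and the "~400 lines" trigger is approximate in the reference, so the exact cutoff of 400 is an assumption here. In the PR these decisions are made by the agent per artifact, not by code.

```typescript
// Hypothetical shape: an implementation unit and the requirement IDs it satisfies.
interface Unit {
  id: string;          // e.g. "U2"
  satisfies: string[]; // e.g. ["R1", "R3"]
}

// Build the reverse index: requirement ID -> unit IDs that satisfy it,
// in the order the units appear.
function reverseTraceability(units: Unit[]): Map<string, string[]> {
  const byRequirement = new Map<string, string[]>();
  for (const unit of units) {
    for (const requirementId of unit.satisfies) {
      const unitIds = byRequirement.get(requirementId) ?? [];
      unitIds.push(unit.id);
      byRequirement.set(requirementId, unitIds);
    }
  }
  return byRequirement;
}

// Uniform-shape rule: 5+ items sharing one structure render as a table.
const shouldRenderAsTable = (uniformItemCount: number): boolean =>
  uniformItemCount >= 5;

// Sticky-TOC trigger: 5+ top-level sections OR roughly 400 lines.
const needsStickyToc = (topLevelSections: number, lineCount: number): boolean =>
  topLevelSections >= 5 || lineCount >= 400;

const sampleUnits: Unit[] = [
  { id: "U2", satisfies: ["R3"] },
  { id: "U5", satisfies: ["R1", "R3"] },
  { id: "U7", satisfies: ["R3"] },
];
const r3Cell = (reverseTraceability(sampleUnits).get("R3") ?? []).join(", ");
console.log(r3Cell); // "U2, U5, U7"
```

The `r3Cell` string is what would go in the downstream-references column of the R3 row.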

…s-design distinction The earlier framing — "markdown is the content source, not the structural authority" — implied chat context was the authoritative content source and markdown was secondary. That's not right: when the user names a markdown doc as input (e.g., pointing the skill at a source path), the markdown IS a valid input alongside chat context. The actual rule is about WHICH ASPECT of the source the agent treats as authoritative — not which source. Content and semantics: yes, authoritative. Design and presentation (bullet vs section vs table): no, not authoritative. The agent re-chooses HTML affordances per content shape regardless of how the markdown source presented the same content. Reframed the rule as "the markdown is a source of content, not a source of design" and updated the test predicate to match.

…fault-closed collapsibles

Three under-specifications in references/html-output.md surfaced by a real-world dogfood (cli-printing-press cloak plan):

Palette / body-bold readability. The dark-mode fallback used --accent #5eead4 (Tailwind teal-300 brightness) and --accent-text #99f6e4 (brighter still). When the agent styled .kd-list strong with var(--accent-text), every <strong> in a 10-item Key Technical Decisions section became bright teal, producing a "screaming" effect that fatigued the eye. Two-part fix: (1) tone down the palette one notch (--accent: #2dd4bf, --accent-text: #5eead4), keeping the teal family but reducing visual loudness; (2) more fundamentally, a new "Color usage rules" section instructs agents to reserve --accent text color for status chips, ID chips, links, and section borders, and NOT to color <strong> in body content by default. Bold weight already carries emphasis; coloring every bold in a long list overwhelms regardless of which hue is chosen. The palette tweak is the safety net; the don't-color-bolds rule is the harder constraint.

Diagram trigger / per-shape rule. The earlier diagram idiom was a soft "Inline SVG flowcharts/sequences/data-flow for branching or temporal logic", which is easy to answer "no" to. The cloak plan was architectural (CDP attach topology, port-discovery sequence, preflight contract flow) and produced zero SVGs because the rule had no trigger threshold or multi-diagram framing. New "Diagrams: when and how many" section names explicit triggers per shape category (3+ components → component topology, 3+ named steps → sequence, 3+ states → state machine, 3+ decision points → flowchart, 3+ stages → data-flow). Trigger fires PER SHAPE, not per diagram, so a plan with multiple distinct shapes renders multiple diagrams rather than one combined. Anti-padding rule with explicit "does each diagram add information not in the others?" test prevents agents from rendering redundant diagrams to look thorough.

Default-closed for unit collapsibles. The collapsibles idiom said "readers expand only what they need", implying closed-by-default but never stating it explicitly. The cloak plan rendered Approach with <details open> on every Implementation Unit, defeating the scan-friendly compactness the pattern is meant to provide. Rule is now explicit: all <details> inside repeating cards start closed (no `open` attribute). The metadata strip above is the primary surface; subsection labels are clickable affordances.

Five new invariant tests pin each tightening so future drift fails CI. Byte-for-byte parity preserved across ce-plan and ce-brainstorm copies.
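The per-shape trigger can be read as a small decision table. The sketch below assumes the agent has already counted each shape in the plan; the `ShapeCounts` fields and the function name are hypothetical, while the thresholds and diagram kinds are the ones listed in the commit message.

```typescript
// Hypothetical tally of shapes found in a plan; field names are assumptions.
interface ShapeCounts {
  components: number;
  namedSteps: number;
  states: number;
  decisionPoints: number;
  stages: number;
}

type DiagramKind = "component topology" | "sequence" | "state machine" | "flowchart" | "data-flow";

const TRIGGER_THRESHOLD = 3; // the "3+" in every trigger

// Each shape category is evaluated independently, so several can fire at once
// and each firing category asks for its own diagram.
function requiredDiagrams(counts: ShapeCounts): DiagramKind[] {
  const required: DiagramKind[] = [];
  if (counts.components >= TRIGGER_THRESHOLD) required.push("component topology");
  if (counts.namedSteps >= TRIGGER_THRESHOLD) required.push("sequence");
  if (counts.states >= TRIGGER_THRESHOLD) required.push("state machine");
  if (counts.decisionPoints >= TRIGGER_THRESHOLD) required.push("flowchart");
  if (counts.stages >= TRIGGER_THRESHOLD) required.push("data-flow");
  return required;
}

// Illustrative counts, not measured from the cloak plan.
const exampleCounts: ShapeCounts = { components: 5, namedSteps: 4, states: 0, decisionPoints: 3, stages: 1 };
const exampleDiagrams = requiredDiagrams(exampleCounts);
```

For `exampleCounts`, three categories fire, so the sketch asks for three separate diagrams rather than one combined figure. The anti-padding test ("does each diagram add information not in the others?") remains a judgment call that this table does not capture.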

…dd presence audits

Real-world dogfood (cli-printing-press cloak plan, re-run with the prior per-shape diagram fix loaded) STILL produced zero diagrams despite three architecture triggers firing (5-component topology, lifecycle, 3-decision branching logic). Agent self-diagnosed: "I made an in-the-moment call to skip the SVG for this iteration to save tokens." Trigger fired, agent saw it, agent skipped anyway.

Root causes per the agent's analysis:

1. Trigger language was softer than intent. The "Render an inline SVG diagram when..." phrasing reads as recommendation. Compare to the uniform-shape rule which IS marked **load-bearing**: that marker is what makes the table rule fire reliably even when the agent is token-pressured. Diagram trigger lacked the equivalent weight.
2. Post-compose audit only flagged style issues in diagrams that EXIST (spatial logic, color usage). It did NOT check "did the content satisfy any architecture triggers, and if so are matching diagrams in the output?" The most common dogfood failure mode (recognize trigger, skip rendering) was structurally invisible to the audit.

Three fixes:

- Mark Architecture trigger as **(load-bearing)** with same-footing language as the uniform-shape table rule. Add an explicit clause: "Token cost is not a valid reason to skip a triggered diagram." Reframe "render when..." as "render for EVERY shape category the doc satisfies."
- New "Diagram-presence audit (load-bearing)" step at the top of the post-compose audit checklist. Procedure: count the firing triggers; count the SVGs in the output; SVG count must be at least the count of distinct shape categories that fired. Missing required diagrams is the most common dogfood failure; making it structurally visible in the audit catches it.
- New "Table-presence audit" and "Body-bold color audit" steps, parallel to the diagram-presence audit. Each catches the recognize-but-skip failure for its respective rule.

The audit section now leads with presence checks before style checks. Two new invariant tests pin the load-bearing marker and the presence-audit structure so a future drift back to soft language fails CI. Byte-for-byte parity preserved across both skills' copies.
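The diagram-presence procedure (count firing triggers, count SVGs, compare) is mechanical enough to sketch. In the PR the audit is a checklist the agent follows, not a script, so this is only an illustration: `countSvgs` and `diagramPresenceAudit` are hypothetical names, and counting `<svg` openings with a regular expression is an assumed simplification.

```typescript
// Count opening <svg ...> tags in composed HTML (simplified; assumes no "<svg"
// text inside comments or script content).
function countSvgs(html: string): number {
  const matches = html.match(/<svg[\s>]/gi);
  return matches === null ? 0 : matches.length;
}

interface PresenceAudit {
  required: number; // distinct shape categories whose trigger fired
  found: number;    // SVGs present in the output
  pass: boolean;
}

function diagramPresenceAudit(firingCategories: string[], html: string): PresenceAudit {
  // De-duplicate so two triggers in the same category require one diagram.
  const distinct: string[] = [];
  for (const category of firingCategories) {
    if (distinct.indexOf(category) === -1) distinct.push(category);
  }
  const found = countSvgs(html);
  return { required: distinct.length, found: found, pass: found >= distinct.length };
}

// The failure described above: three categories fire, the output has no SVG.
const skippedAudit = diagramPresenceAudit(
  ["component topology", "lifecycle", "flowchart"],
  "<main><h1>Plan</h1><p>No diagrams rendered.</p></main>"
);
```

Here `skippedAudit.pass` is `false` with `required` 3 and `found` 0, which is the recognize-but-skip case the earlier style-only audit could not see.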

The HTML-side diagram audit caught zero SVGs on a plan with three triggers firing. The agent self-diagnosed two failures: the HTML-side trigger was soft (now fixed with the load-bearing marker and presence audit), AND the markdown-side Phase 3.4 "(Optional)" framing meant the agent never produced the upstream sketches that would have rendered into HTML.

Same softness pattern, parallel fix:

- Phase 3.4 header changed from "(Optional)" to "(Load-bearing when triggers fire)". The 'optional' marker was what agents read as skippable.
- New "Architecture triggers (load-bearing)" subsection inside 3.4 with the same per-shape trigger conditions used on the HTML side (3+ components, 3+ protocol steps, 3+ states, lifecycle, 3+ decision points, 3+ data-flow stages, mode/flag combos, DSL/API design, non-obvious single-component). Each trigger names its required sketch type.
- Per-shape rule: multiple firing triggers produce multiple sketches in the same HTD section, not one combined. Anti-padding rule prevents redundant sketches.
- Explicit "token cost is not a valid reason to skip" clause, matching the HTML-side language and addressing the actual failure mode the agent self-reported.

Phase 5.1 review checklist gains a "High-Level Technical Design presence audit (load-bearing)": count firing triggers, count sketches, sketches >= firing trigger categories. Parallel to the HTML-side diagram-presence audit added in the previous commit.

plan-template.md HTD section comment rewritten from "Optional: Include this section only when..." to LOAD-BEARING framing with the trigger conditions and the multi-sketch rule. The template comment is what agents see at plan-write time; aligning it with the SKILL.md guidance removes the softness from both surfaces simultaneously.

Six new tests in tests/skills/ce-plan-htd-trigger.test.ts pin: no (Optional) marker, load-bearing language, trigger-condition presence, token-skip rejection, per-shape framing, Phase 5.1 audit, and template comment alignment. Drift back to soft framing fails CI.
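The invariant tests are described only by what they pin, so the following is a guess at their general shape rather than the contents of `ce-plan-htd-trigger.test.ts`. It assumes the checks are plain string and regex assertions over the skill text; `softFramingViolations` and both sample headers are hypothetical.

```typescript
// Returns a list of human-readable violations; an empty list means the text
// keeps the load-bearing framing. The exact phrases checked are assumptions
// based on the wording quoted in the commit message.
function softFramingViolations(skillText: string): string[] {
  const violations: string[] = [];
  if (skillText.indexOf("(Optional)") !== -1) {
    violations.push("soft (Optional) marker present");
  }
  if (!/load-bearing/i.test(skillText)) {
    violations.push("load-bearing language missing");
  }
  if (!/token cost is not a valid reason to skip/i.test(skillText)) {
    violations.push("token-skip rejection clause missing");
  }
  return violations;
}

// Illustrative before/after snippets, not copied from SKILL.md.
const softText = "Phase 3.4 High-Level Technical Design (Optional)\nInclude when helpful.";
const firmText =
  "Phase 3.4 High-Level Technical Design (Load-bearing when triggers fire)\n" +
  "Token cost is not a valid reason to skip a triggered sketch.";

const softViolations = softFramingViolations(softText);
const firmViolations = softFramingViolations(firmText);
```

Checking for the presence of a principle and the absence of a known soft marker, rather than for an exact sentence, is what lets such a test survive rewording while still failing on drift back to "(Optional)".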

Parallel to the ce-plan Phase 3.4 / html-output.md fixes: same softness risk on the ce-brainstorm side, different content shapes. Requirements docs trigger on user flows, actors, acceptance examples, and lifecycles rather than architectural components and protocols, but the structural issue is identical: a soft "include when significantly easier" framing reads as skippable to a token-pressured agent. The diagram type differs; the risk does not.

Three edits:

- visual-communication.md trigger table is now marked load-bearing with same-footing language as the ce-plan HTD fix. Trigger conditions rephrased with concrete "3+ X" thresholds: Key Flow with 3+ steps, Key Flow with 2+ actors handing off, 3+ behavioral modes, 3+ interacting participants in Actors, entity lifecycle with 3+ states, acceptance examples with 3+ branching steps, multiple competing approaches. Per-shape rule (multiple triggers → multiple visuals) and anti-padding rule (each visual must add information the others don't) added. Explicit "token cost is not a valid reason to skip" clause matches the ce-plan side.
- requirements-capture.md "## Visual communication" section rewritten from soft "when significantly easier to understand with one" to load-bearing framing that names the trigger categories inline and points to the full table in the reference. This section is the agent's first encounter with the visual rule at brainstorm-write time; soft framing here would defeat the detailed trigger table.
- requirements-capture.md Finalization checklist gains a load-bearing "Visual-aid presence audit": count firing triggers, count visuals, visuals >= firing trigger categories. Parallel to the ce-plan Phase 5.1 HTD presence audit added in the previous commit.

Five new tests in tests/skills/ce-brainstorm-visual-trigger.test.ts pin the load-bearing markers, concrete thresholds, per-shape rule, anti-padding rule, capture-side framing, and finalization audit. Drift back to soft framing fails CI.

Color palette fixes (dark-mode --accent tone-down, don't-color-body-bolds rule) were already in ce-brainstorm via byte-for-byte parity in references/html-output.md; the duplication test enforces sync, so the earlier commits applied automatically.

Brainstorm HTML output can now include directional wireframe mockups when the requirements doc describes a user-facing visual surface. Scoped to HTML-only (the canonical markdown stays prose); plans deliberately do not get this affordance because mockups in plans over-prescribe implementation choices that should remain open during build. Guardrails keep the affordance from drifting into mockup-as-spec: fidelity ceiling (gray boxes + placeholder copy, not pixel-perfect), static only (no JS / no live data), anti-padding (one wireframe per distinct visual concept), and a mandatory directional caption with required wording. Reference content is duplicated byte-for-byte across ce-plan and ce-brainstorm; tests pin section presence, scope gate, plan exclusion, each guardrail, and required caption wording.

…on plan diagrams The 2026-05-12 cloak-browser plan dogfood surfaced two reference gaps. The topology SVG had a long curved arrow running through two text labels ("DevToolsActivePort", "writes port") and a caption disassociated from the arrow it described — source looked fine, rendered output had legibility bugs. The agent designed coordinates by hand without rendering. New layout-legibility rules give a pre-emit checklist: no arrow paths through text labels (with paint-order halo as the named fix), labels adjacent to their arrow midpoint, avoid long curves traversing the diagram, and call out component topology with 5+ boxes as the highest-risk shape. Separately, the same plan added "directional guidance for review, not implementation specification" before the SVGs and a "(directional, not implementation spec)" qualifier on a unit-card subsection — the wireframe caption pattern over-generalized to plan architecture diagrams. Plan diagrams render the same authoritative content as the prose; the prose-is-authoritative rule already governs disagreement. Explicitly forbid the hedging phrases on plan diagrams and tie the rule back to the existing principle.

…s aren't buried In dogfood, Key Decisions sat at section 9 (between Scope Boundaries and Dependencies) where decisions like "default engine is cloak, not opt-in", "library install, not CLI install", and "anonymous-only scope" got lost under the detail. Those are framing choices that constrain Requirements, Flows, Acceptance Examples, and Scope — readers should encounter them as the doc's narrative spine, not as bottom-of-doc reference material. Move Key Decisions to position 3, right after Problem Frame, so the order reads: what (Summary) → why (Problem Frame) → opinionated choices that shape everything below (Key Decisions) → detail. R-ID references inside decisions still read fine even though R-IDs are introduced later in the doc; "Affects R7" doesn't require R7 to have been enumerated yet. Update both the Section matrix and the Template; add a rationale paragraph under the matrix explaining the placement so a future maintainer optimizing for "audit content at the end" doesn't slide it back down. Pin the order with five invariant tests covering matrix position, template order, single-heading uniqueness, and inline rationale presence.
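An order-pinning check of the kind described could look like the sketch below. It assumes the template uses `## ` headings for top-level sections and that Summary and Problem Frame occupy positions 1 and 2; the function names and the sample template are hypothetical, and the real tests also cover the Section matrix and the rationale paragraph, which this sketch omits.

```typescript
// Collect the text of every "## " heading, in document order.
function topLevelHeadings(markdown: string): string[] {
  const headings: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match !== null && match[1] !== undefined) headings.push(match[1]);
  }
  return headings;
}

// True when Key Decisions appears exactly once, at position 3,
// directly after Summary and Problem Frame.
function keyDecisionsPlacementOk(markdown: string): boolean {
  const headings = topLevelHeadings(markdown);
  const occurrences = headings.filter((h) => h === "Key Decisions").length;
  return (
    occurrences === 1 &&
    headings[0] === "Summary" &&
    headings[1] === "Problem Frame" &&
    headings[2] === "Key Decisions"
  );
}

// Illustrative template outline, not the plugin's actual template.
const reorderedTemplate = [
  "## Summary",
  "## Problem Frame",
  "## Key Decisions",
  "## Requirements",
  "### R1",
  "## Scope Boundaries",
  "## Dependencies",
].join("\n");

const placementOk = keyDecisionsPlacementOk(reorderedTemplate);
```

A template that slides Key Decisions back between Scope Boundaries and Dependencies would make `keyDecisionsPlacementOk` return `false`, which is the regression the rationale paragraph is meant to discourage.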

… not pinned values

Dogfood: the 2026-05-13 cloak brainstorm flowchart had a `stroke-width: 3px` halo on 11-13px diagram labels that bled into the glyph strokes, muting the text color in dark mode. The same brainstorm invented a `--surface-tint-2` tier to give decision diamonds a slightly-lighter grey than rectangle boxes, a 7-unit RGB delta that's barely perceptible and that dark-mode browser extensions can mishandle.

The fix is principle-level, not value-level. Pinning `2px not 3px` or banning `--surface-tint-2` specifically would drift across artifacts and recur with different numbers. Instead, teach:

- Halo width is a judgment call: narrow enough not to bleed into glyph strokes, wide enough to mask underlying arrows. Verify by inspecting rendered text against the same text outside the diagram.
- Differentiate diagram shapes by geometry first (diamond vs rect), fill semantics second (accent-soft, warn-soft). Resist additional neutral-tint tiers when geometry already differentiates; small RGB deltas don't survive dark-mode extensions and printing consistently.

Tests follow the same shape: check for the principle's presence (regex for "judgment call" / "verify by inspecting" / "geometry first"), not for specific values.

…ackground

Dogfood: the 2026-05-13 cloak brainstorm flowchart applied --text-muted to secondary labels inside --accent-soft (dark teal) and --warn-soft (dark amber) container shapes. The result reads as washed-out grey on tinted fill: F1's "R1, R2, R5, R6" subtitle and Stock Chromium's "summary: WAF risk (R6)" subtitle both lose perceptual contrast even though the luminance delta looks acceptable on paper.

The failure is contextual: --text-muted is calibrated for prose on the page bg. On a tinted container fill the hue contrast collapses (grey on dark teal, grey on dark amber) regardless of luminance. The principle-level fix: text color is chosen against the LOCAL fill, not the page bg. Inside tinted containers, use the same-hue lighter variant (accent-text / warn-text / info-text) or drop the muting entirely and rely on font-size and weight for hierarchy. Verify per artifact by reading each filled shape at rendered scale.

Tests follow the principle, not specific colors.

…e prose

Codex review (PR #826) flagged two precision bugs in the Phase 0.0 output-mode spec:

1. Step 2 said "if YAML contains `plan_output: md|html`", which would match commented examples like `# plan_output: html` shipped in the config template, silently flipping users into HTML on every run. Now the spec requires an ACTIVE (non-commented) key, names the shipped-template failure mode inline, and says comments must be ignored.
2. The unknown-output-value note hardcoded "defaulting to md", but step 1's drop-and-fall-through still lets step 2 (config) resolve to html or step 4 (pipeline) force md. The note now reflects the actual final resolved mode after steps 2-4, not a hardcoded value.

Same fix on both ce-plan (plan_output) and ce-brainstorm (brainstorm_output) for parity. Tests pin the principle's presence (active-key requirement, final-mode-reflection) rather than specific phrasings.
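Both fixes can be illustrated with a small resolver. This is a sketch under stated assumptions, not the skill's implementation: the spec is prose the agent follows, the line scan below stands in for real YAML parsing, the default of `md` when nothing is set is assumed (the commit does not describe step 3), and `activeConfigValue` and `resolveOutputMode` are hypothetical names.

```typescript
type OutputMode = "md" | "html";

function isOutputMode(value: string | undefined): value is OutputMode {
  return value === "md" || value === "html";
}

// Value of an ACTIVE `key: value` line; commented lines are ignored.
// Simplified line scan (assumption): no nesting, quoting, or multi-line values.
function activeConfigValue(yamlText: string, key: string): string | undefined {
  const prefix = key + ":";
  for (const rawLine of yamlText.split("\n")) {
    const line = rawLine.trim();
    if (line.charAt(0) === "#") continue; // e.g. "# plan_output: html" in the shipped template
    if (line.indexOf(prefix) !== 0) continue;
    const rest = line.slice(prefix.length);
    const hashAt = rest.indexOf("#");
    return (hashAt === -1 ? rest : rest.slice(0, hashAt)).trim();
  }
  return undefined;
}

interface Resolution {
  mode: OutputMode;
  note: string | undefined; // set only when an unknown CLI value was dropped
}

function resolveOutputMode(cliValue: string | undefined, yamlText: string, pipelineMode: boolean): Resolution {
  let mode: OutputMode | undefined = undefined;
  let dropped: string | undefined = undefined;
  // Step 1: CLI value; an unknown value is dropped and resolution falls through.
  if (cliValue !== undefined) {
    if (isOutputMode(cliValue)) { mode = cliValue; } else { dropped = cliValue; }
  }
  // Step 2: active config key.
  if (mode === undefined) {
    const fromConfig = activeConfigValue(yamlText, "plan_output");
    if (isOutputMode(fromConfig)) { mode = fromConfig; }
  }
  // Assumed default when neither source resolves.
  if (mode === undefined) { mode = "md"; }
  // Step 4: pipeline mode forces md.
  if (pipelineMode) { mode = "md"; }
  // The note is built last so it reports the final resolved mode.
  const note = dropped === undefined
    ? undefined
    : 'Unknown output value "' + dropped + '"; resolved mode is ' + mode + ".";
  return { mode: mode, note: note };
}

const templateOnly = resolveOutputMode(undefined, "# plan_output: html\n", false);
const unknownCli = resolveOutputMode("pdf", "plan_output: html\n", false);
```

In this sketch `templateOnly.mode` stays `md` because the only match is a comment, and `unknownCli.note` names `html`, the mode the config actually resolved to, instead of a hardcoded "defaulting to md".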

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ff5fcca34

chatgpt-codex-connector · 2026-05-14T18:05:02Z

+HTML view written to <absolute path to .html>
+```
+
+If a later phase in this run mutates the `.md` again (HITL Proof resync), re-compose the HTML at that point — same two-condition gate — so the view stays aligned with the markdown source.


Recompose HTML after every post-write markdown mutation

This new rule only mandates HTML recomposition after HITL Proof resync, but the same handoff flow still allows other markdown-mutating paths (Run deeper doc review can apply edits, and Other free-form input explicitly accepts plan revisions) before returning to the menu. In runs where HTML is being emitted (output:html or sibling re-render), those non-HITL edits can leave the .html sibling stale within the same session, contradicting the stated "view stays aligned" contract.

chatgpt-codex-connector · 2026-05-14T18:05:02Z

+- **Avoid long curves that traverse the diagram** to connect a component on one side to one on the other. If A and D need a labeled connection across a multi-component layout, prefer (a) reordering boxes so A and D are adjacent, (b) numbered step badges next to each participant that the caption ties together, or (c) a short labeled-channel notation with a co-located legend, rather than one curve that crosses three or four unrelated elements.
+- **Reserve label-clear corridors.** When placing arrows in a dense diagram, plan label positions before drawing lines so labels and lines do not have to compete for the same pixels. Component topology diagrams with 5+ boxes and multiple cross-arrows are the highest-risk shape for this — sequence diagrams (vertical lifelines) and flowcharts (vertical decision flow) are lower-risk because their layout is constrained.
+
+**Plan architecture diagrams are not directional sketches.** Do not add hedging captions or section preambles to plan SVG diagrams — phrases like "directional guidance for review, not implementation specification," "treat as context, not code to reproduce," or "(directional, not implementation spec)" do not belong on plan diagrams or on unit-card technical-design subsections. Plan diagrams render the same authoritative content as the surrounding prose; the prose-is-authoritative rule already governs disagreement. Hedging language is reserved for the wireframe affordance below, which carries a *required* directional caption because the wireframe is explicitly NOT a spec. Architecture diagrams in plans are the opposite: an alternate rendering of the authoritative content.


Harmonize plan-diagram caption rule across plan and HTML guides

This new prohibition says plan diagrams must not use directional/hedging captions, but ce-plan guidance still instructs authors to add exactly that framing (e.g., in skills/ce-plan/SKILL.md and references/plan-template.md). The contradictory instructions make HTML composition nondeterministic: agents can reasonably keep or strip the same caption depending on which source they prioritize, so output consistency regresses.

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/html-output.md Outdated

Comment thread plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md Outdated

tmchow mentioned this pull request May 12, 2026

feat(ce-plan,_shared): add --html flag and shared HTML output reference #809

Closed

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md

Comment thread plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md Outdated

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md Outdated

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/handoff.md Outdated

Comment thread plugins/compound-engineering/skills/ce-plan/references/html-output.md

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/html-output.md Outdated

Comment thread plugins/compound-engineering/skills/ce-plan/references/html-output.md

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md Outdated

tmchow marked this pull request as draft May 13, 2026 00:44

tmchow marked this pull request as ready for review May 13, 2026 22:39

chatgpt-codex-connector Bot reviewed May 13, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md Outdated

chatgpt-codex-connector Bot reviewed May 13, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/SKILL.md Outdated

chatgpt-codex-connector Bot reviewed May 14, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md Outdated

tmchow added 11 commits May 14, 2026 10:55

tmchow added 9 commits May 14, 2026 10:55

tmchow force-pushed the tmchow/ce-plan-html-output branch from ab81216 to 5ff5fcc Compare May 14, 2026 17:59

chatgpt-codex-connector Bot reviewed May 14, 2026
