feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite#601

Merged
tmchow merged 22 commits into main from tmchow/doc-review-autofix
Apr 20, 2026

Conversation


@tmchow tmchow commented Apr 19, 2026

Summary

ce-doc-review previously surfaced 14+ findings per run, all routed to manual, forcing reviewers to re-litigate the same premise through many rephrasings. This PR cuts engagement to ~4-6 real decisions per review while preserving coverage quality.

Six compounding mechanisms deliver the drop:

| Mechanism | What it does |
| --- | --- |
| 3-tier autofix classification | safe_auto (silent) / gated_auto (one-click) / manual (judgment) replaces binary auto/manual |
| Premise-dependency chain grouping | Collapses findings whose fixes hinge on one root into a single decision with cascading dependents |
| FYI routing | Advisory observations surface without forcing a decision |
| Walk-through + bulk preview + 4-option routing | Ports ce-code-review's interaction model |
| In-doc Open Questions deferral | Deferred findings attach to the document, no external tracker |
| Multi-round decision memory | Prior-round Skip/Defer suppresses re-raises; fix-landed verification confirms prior applies landed |

Ships alongside: ce-learnings-researcher rewritten for domain-agnostic institutional knowledge (architecture patterns, design patterns, tooling decisions, conventions, not just bugs); ce-compound frontmatter enum expanded with four knowledge-track values; 10 existing entries migrated.

Docs:

Key design decisions

A reviewer should scrutinize these specifically; the rest of the diff follows from them.

  • Premise-dependency chain linking uses scope dominance, not confidence, to pick the surviving root when two premise candidates nest. A higher-confidence narrow-scope finding does not dominate a lower-confidence broader-scope finding whose fix would moot the narrower one. See synthesis-and-presentation.md §3.5c.
  • FYI routing required per-persona LOW bands and a template-level advisory rule. Neither alone populates the FYI bucket — per-persona bands need the template anchoring to fire consistently, and the template rule needs per-persona calibration bands to map to. Both shipped together intentionally.
  • Schema conformance fixed by inline enum callouts at the top of the output contract, not just {schema} injection. The JSON schema was insufficient on its own for longer personas (adversarial at 89 lines) — attention got pushed down. Inline enum anchoring at the top of the contract held where injection alone didn't.
  • Coherence safe_auto resilience got strawman-resistance guidance. Without it, coherence silently demoted wrong-count and stale-cross-ref findings to manual on over-charitable re-interpretation. The persona-level patterns section is short and specific — three patterns, explicit strawman test per pattern.
  • ce-doc-review does NOT dispatch ce-learnings-researcher, per the pipeline-separation learning (docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md). The learnings-researcher refactor benefits ce-plan / ce-code-review / ce-optimize / ce-ideate callers, not doc-review.

Testing

  • bun test: 798 pass, 0 fail. 13 new contract tests cover the three-tier enum, premise-chain mechanics, FYI floor, cross-persona boost, R29/R30 suppression, headless envelope tiers, terminal-question structure, and schema expansion.
  • 6+ headless runs and 5 interactive tests validated the UX target. Engagement on the primary test fixture dropped from 14-19 decisions (pre-work baseline) to 4-6 (target hit) across run variance. Interactive tests covered walk-through + cascade, bulk preview, in-doc deferral, multi-round primer with R29/R30, and the 3-option terminal question with label adaptation.
  • Three seeded fixtures under tests/fixtures/ce-doc-review/ exercise complementary shapes: rename/infra (nested chains), auth migration (peer multi-root + security persona), UI feature (zero-root + design-lens + proportional review). Seed maps documented in each fixture's header comment. Variance is expected; validate trends across runs, not single samples.

Scope boundaries

  • Defer is an in-doc section; no external tracker integration.
  • No persona activation/selection logic change; the 7 personas and conditional triggers stay as-is.
  • No cross-session primer persistence; in-memory across rounds within one session.

Fixes #562 spiritually (document-review analog of the ce-review interactive judgment overhaul).


Compound Engineering
Claude Code

tmchow and others added 9 commits April 18, 2026 16:42
…ion overhaul

Capture requirements brainstorm (38 requirements across 11 sections) and
implementation plan (8 units, 4 phases) for overhauling ce-doc-review:
three-tier autofix classification (safe_auto / gated_auto / manual)
aligned with ce-code-review names, per-severity confidence gates, ported
walk-through + bulk-preview + routing-question interaction, in-doc
Open Questions deferral, multi-round decision memory, domain-agnostic
learnings-researcher rewrite, and ce:compound problem_type enum
expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… / design_pattern / tooling_decision / convention

Adds four knowledge-track values to the ce:compound frontmatter schema
to absorb the best_practice overflow that had been serving as a catch-all
for architecture, design, tooling, and convention learnings. best_practice
remains valid but is now documented as the fallback for entries that no
narrower knowledge-track value covers.

Schema updates land in both ce-compound and ce-compound-refresh (duplicate
files are kept in sync intentionally). Category mappings added for the
four new values under docs/solutions/. Plugin AGENTS.md gets a new
"Documented Solutions" discoverability section naming all knowledge-track
and bug-track categories so agents reading project instructions know
the full taxonomy.

Migrates 10 existing best_practice entries to narrower values and resolves
one correctness-gap schema violation (workflow/todo-status-lifecycle.md
-> workflow_issue). Remaining best_practice entries are genuinely broad
enough to stay in that bucket (e.g., ce-pipeline-end-to-end-learnings).

bun test tests/frontmatter.test.ts passes.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… knowledge

The learnings-researcher was shaped around bug-hunting — its taxonomy,
category table, output framing, and DO/DON'T guidelines all assumed the
caller was looking for past defects. But the team captures more than
bugs: architecture patterns, design patterns, tooling decisions,
conventions, and workflow learnings are all first-class in docs/solutions/
now. Authors were retreating to best_practice as a catch-all because the
agent didn't reliably surface non-bug entries.

This rewrite treats all learning shapes equally:

- Identity framing dropped "preventing repeated mistakes" for "find and
  distill applicable learnings" — bug avoidance is one flavor, not the
  premise
- Accepts a structured <work-context> input (Activity / Concepts /
  Decisions / Domains) and still handles free-form text for callers that
  don't use the structured form
- Replaced the hardcoded Feature-Type-to-Directory table with a dynamic
  probe of docs/solutions/ at invocation time so new subdirectories
  (architecture-patterns/, design-patterns/, conventions/, etc.) are
  picked up automatically
- Expanded keyword extraction from 4 dimensions (Module, Technical terms,
  Problem indicators, Component) to 8 by adding Concepts, Decisions,
  Approaches, Domains — so design-pattern and convention queries have
  something to match on
- Made Step 3b (critical-patterns.md read) conditional on file existence
  — the file does not exist in this repo and demanding it was emitting
  warnings
- Reframed output to "Applicable Past Learnings" with a Type field so
  knowledge-track entries surface as themselves (architecture_pattern,
  design_pattern, etc.) instead of being flattened to a bug-shape
- Reworked Integration Points to drop the ce-doc-review reference (it
  explicitly does not dispatch this agent) and stop framing planning-time
  as the agent's primary home
- Updated DO/DON'T guidelines: don't discard candidates for missing
  bug-shaped fields, treat non-bug learnings as first-class, don't assume
  critical-patterns.md exists
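The dynamic-probe replacement for the hardcoded category table can be sketched roughly as follows. The agent itself is prompt-driven, so this is an illustrative code analogue, not the actual implementation; `probeSolutionCategories` and its filtering details are assumptions:

```typescript
import { readdirSync, existsSync } from "node:fs";

// Hypothetical sketch: enumerate docs/solutions/ subdirectories at invocation
// time, so new categories (architecture-patterns/, conventions/, etc.) are
// picked up automatically instead of living in a hardcoded table.
function probeSolutionCategories(root = "docs/solutions"): string[] {
  if (!existsSync(root)) return []; // tolerate repos without the directory
  return readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort();
}
```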

Preserved the grep-first filtering strategy intact (it was already
efficient) and the overall 7-step flow. File grew from 246 to 283 lines
due to the expanded taxonomy and structured input documentation.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion, strawman rule, framing guidance

The persona subagent template was producing output that pushed too much
into 'present' (requires user judgment) because the classification rule
was permissive about what counted as a 'reasonable alternative' — (a)
real fix + (b) 'accept the defect' strawman + (c) 'document in release
notes' strawman looked like three valid approaches, inflating the
manual/present bucket. This upgrade sharpens classification and improves
persona output quality across all seven document-review personas via
one file edit.

Changes:

- autofix_class enum renamed and expanded: [auto, present] -> [safe_auto,
  gated_auto, manual]. Aligns names with ce-code-review's first three
  tiers so cross-skill vocabulary is consistent. Drops the advisory tier
  in favor of a presentation-layer FYI subsection handled by synthesis
- Three-tier routing spelled out with one-sentence descriptions and
  examples: safe_auto for single-option fixes, gated_auto for concrete
  fixes that touch meaning/scope (the new default for 'I know the fix
  but the author should sign off'), manual for genuine judgment calls
- Strawman-aware classification rule added: a 'do nothing / accept the
  defect' option is not a real alternative — it is the failure state the
  finding describes. Same for 'document in release notes' and similar
  sidesteps. Listed alternatives must be ones a competent implementer
  would genuinely weigh. Positive and negative example pair included
- Strawman safeguard on safe_auto: if safe_auto is chosen via
  strawman-dismissal, the dismissed alternatives must be named in
  why_it_matters. Any genuine non-strawman alternative downgrades the
  finding to gated_auto (user confirms) rather than silent apply
- Auto-promotion patterns consolidated into an explicit rule set:
  factually incorrect behavior, missing standard security/reliability
  controls, codebase-pattern-resolved fixes (with cited pattern),
  framework-native-API substitutions, mechanically-implied completeness
- Framing guidance block added (ported from ce-code-review's
  subagent-template): observable-consequence-first phrasing,
  why-the-fix-works grounding, 2-4 sentence budget, weak/strong
  illustrative pair. Tuned for document-review's surface — 'observable
  consequence' rather than 'observable behavior' since doc problems
  surface as disagreements, misreads, and wrong decisions rather than
  runtime behavior
- Persona exclusion rule added: personas must exclude '## Deferred /
  Open Questions' sections and their timestamped subsections from review
  scope. Prevents round-2 feedback loops where deferred entries get
  flagged as new findings or quoted as evidence for suppression checks.
  Unit 6 is going to create these sections as the document-native defer
  mechanic

Template grows from 52 to 95 lines, still under the 150-line @-inclusion
threshold, so SKILL.md continues to use @-inclusion.

findings-schema.json autofix_class enum and description updated to match.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s and FYI subsection

Rewrites the synthesis pipeline to route findings by the three new tiers
(safe_auto / gated_auto / manual) with per-severity confidence gates and
a distinct FYI subsection for low-confidence manual findings. This is
the load-bearing synthesis change that enables the interaction model
(Unit 5) and multi-round memory (Unit 7) to follow.

Pipeline changes:

- Step 3.2: replace flat 0.50 gate with per-severity table. P0 0.50,
  P1 0.60, P2 0.65, P3 0.75. FYI floor at 0.40: below-gate manual
  findings above the floor survive in an FYI subsection rather than
  being dropped — preserves observational value without forcing decisions
- Step 3.4: delete residual-concern promotion. Replace with cross-persona
  agreement boost (+0.10, capped at 1.0) applied to merged findings that
  multiple personas flagged. Mirrors ce-code-review stage 5 step 4.
  Residual concerns surface in Coverage only; below-gate findings are
  not promoted back into the review surface
- Step 3.5: tier language updated for three tiers. Contradictions route
  to manual with both perspectives
- Step 3.6: auto-promotion patterns consolidated into an explicit rule
  set: codebase-pattern-resolved, factually incorrect behavior, missing
  standard security/reliability controls, framework-native-API
  substitutions, mechanically-implied completeness additions. Promotion
  targets safe_auto or gated_auto depending on whether the addition is
  substantive. Strawman-downgrade safeguard: if safe_auto was chosen via
  strawman-dismissal per the subagent template, synthesis verifies the
  dismissed alternatives were genuinely strawmen and downgrades to
  gated_auto if any were plausible design choices
- Step 3.7: routing rewritten for three tiers + FYI. safe_auto applies
  silently in Phase 4. gated_auto enters walk-through with Apply
  recommended. manual enters walk-through with user-judgment framing.
  FYI-subsection routes outside the walk-through entirely
- State-machine narration added to the phase header so each transition
  (Raised -> Classified -> one of four buckets) is a named step in
  synthesis prose, not an implied carry-forward
- R30 fix-landed matching predicate specified explicitly: fingerprint +
  evidence-substring overlap >50%. Section renames count as different
  locations so a rename-caused re-raise is treated as new, not as
  'fix did not land'
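The Step 3.2 gates, FYI floor, and Step 3.4 agreement boost compose into one routing decision. A minimal sketch of that logic, with illustrative names (`Finding`, `routeFinding`) that are not from the repo:

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";
type Tier = "safe_auto" | "gated_auto" | "manual";
type Bucket = Tier | "fyi" | "dropped";

interface Finding {
  severity: Severity;
  tier: Tier;
  confidence: number;   // 0..1
  personaCount: number; // personas that flagged the merged finding
}

// Per-severity confidence gates (Step 3.2) and FYI floor
const GATES: Record<Severity, number> = { P0: 0.50, P1: 0.60, P2: 0.65, P3: 0.75 };
const FYI_FLOOR = 0.40;

function routeFinding(f: Finding): Bucket {
  // Step 3.4: cross-persona agreement boost (+0.10), capped at 1.0
  const boosted = Math.min(1.0, f.confidence + (f.personaCount > 1 ? 0.10 : 0));
  if (boosted >= GATES[f.severity]) return f.tier;
  // Below gate: manual findings above the floor survive as FYI, not dropped
  if (f.tier === "manual" && boosted >= FYI_FLOOR) return "fyi";
  return "dropped";
}
```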

Phase 4 updates:

- Headless envelope extended with distinct 'Gated-auto findings' and
  'FYI observations' sections alongside existing 'Applied N safe_auto
  fixes' and 'Manual findings' sections. Backward-compatible: callers
  that only read the old sections continue to work; new sections surface
  alongside, not instead of, the existing buckets. Landing the envelope
  change here (Unit 4) rather than Unit 8 means ce-plan 5.3.8 headless
  invocations never observe an interstitial state where new tiers are
  produced but the envelope can't carry them
- Interactive presentation split by tier and finding type. FYI observations
  get their own subsection. Applied safe_auto fixes list every change
  with section + reviewer attribution
- Phase 5 simplified: handoff to Unit 5 walkthrough.md for routing
  question and per-finding interaction. Terminal question (Unit 7's
  three-option structure) fires after walk-through or bulk action

Creates tests/fixtures/ce-doc-review/seeded-plan.md — a test fixture
plan doc with known issues planted across tier shapes (3 safe_auto, 3
gated_auto, 5 manual, 5 FYI candidates, 3 drop-worthy P3s). Used for
the validation gate: run the new pipeline against this fixture and
verify bucket distribution matches expectations before shipping Phase 2.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 4
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…and bulk preview

Ports ce-code-review's interactive flow (post-PR #590) into ce-doc-review
while adapting for document-specific semantics. Replaces the current
binary 'refine / complete' Phase 5 question with a four-option routing
question that dispatches to focused interactions sized to the user's
intent.

New references:

- walkthrough.md — the per-finding walk-through that fires when the user
  picks routing option A. Each finding renders in two surfaces: a
  terminal markdown block carrying the explanation (heading, What's
  wrong, Proposed fix, Why it works) and a compact two-line question
  stem carrying the decision. Four per-finding options: Apply / Defer
  (append to Open Questions) / Skip / LFG-the-rest. Post-tie-break
  recommendation marked with '(recommended)'. N=1 and no-append
  adaptations included. Unified completion report covers walk-through,
  LFG, Append-to-Open-Questions, and zero-findings terminal paths

- bulk-preview.md — the compact plan preview before every bulk action
  (routing B, routing C, walk-through LFG-the-rest). Grouped buckets
  (Applying / Appending to Open Questions / Skipping) with one-line
  plain-English summary per finding. Two options: Proceed or Cancel.
  Cancel semantics route back to the originating question (routing or
  per-finding). Proceed executes the plan and emits the unified
  completion report

Differences from ce-code-review's equivalents:

- No tracker-detection tuple or sink-availability logic — Defer is always
  an in-doc append, no external tracker
- No [TRACKER] label substitution — bucket is always 'Appending to Open
  Questions'
- No fixer-subagent batch dispatch — the orchestrator applies
  gated_auto/manual fixes inline (document edits are single-file
  markdown with no cross-file dependencies)
- No advisory variant with Acknowledge — advisory-style findings surface
  in the FYI subsection at the presentation layer, outside the
  walk-through
- No .context/ artifact-lookup paths — ce-doc-review personas don't
  write run artifacts
- Observable-consequence phrasing tuned for document surface (readers,
  implementers, downstream decisions) rather than ce-code-review's
  runtime-behavior framing

SKILL.md changes:

- Added 'Interactive mode rules' section at the top with the
  AskUserQuestion pre-load directive and numbered-list fallback rule.
  Mirrors ce-code-review's approach for reliable question-tool loading
  on Claude Code
- Updated the Phases 3-5 handoff to load walkthrough.md and
  bulk-preview.md after agent dispatch completes, lazily via backtick
  paths per the skill compliance checklist

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 5
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the Defer action's document-native analogue to ce-code-review's
tracker-ticket mechanic. When the user chooses Defer on a finding (from
the walk-through or from the bulk-preview Append-to-Open-Questions path),
an entry appends to a '## Deferred / Open Questions' section at the end
of the document under review.

Design decisions:

- In-doc append instead of sibling follow-up file or external tracker:
  deferred findings stay attached to the document they came from,
  naturally discoverable by anyone reading the doc, no new infrastructure
  required. Trade-off: deferred findings visibly mutate the doc — but
  that's the point, an Open Questions section is a canonical place for
  'worth remembering but not acting on now' in planning docs

- Section location handling covers three cases: end-of-doc heading (most
  common — append inside), mid-doc heading (user positioned deliberately
  — still append at that location, don't duplicate at end), absent
  (create at end, or above trailing separators/footers, or after
  frontmatter for frontmatter-only docs)

- Timestamped subsections (### From YYYY-MM-DD review) group multiple
  defers per review and keep sequential reviews distinguishable. Same-day
  reviews within a session share one subsection; cross-day reviews get
  distinct subsections

- Entry format: title, severity, reviewer attribution, confidence, and
  the why_it_matters framing. Intentionally excludes suggested_fix and
  evidence — those live in the run artifact when applicable; the in-doc
  entry is a concern summary for the reader returning months later, not
  a full decision packet

- Idempotence on title collisions within the same subsection prevents
  duplicate entries from LFG-the-rest re-routing or orchestrator retry
  paths

- Concurrent-edit safety: re-read document from disk before every
  append to reduce collision window with user-in-editor writes. No
  persistent lock — interactive review needs observation-before-write,
  not coordination

- Failure path via blocking sub-question (Retry / Record-only / Convert-
  to-Skip) ensures no silent failures. Default-to-Record-only when the
  sub-question goes unanswered keeps in-memory state consistent

- Upstream availability signal cached at Phase 4 start — known-unwritable
  documents suppress the Defer option in the walk-through menu and
  routing question; mid-flow failures route through the per-finding
  failure path without flipping the session signal

Referenced by references/walkthrough.md (per-finding Defer option) and
references/bulk-preview.md (routing option C Proceed). Includes a
worked example showing the appended-content format in context.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 6
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd decision primer

Replaces the current binary 'refine / complete' Phase 5 question with a
three-option terminal question that separates 'apply decisions' from
're-review' — the most common user frustration the brainstorm surfaced
was that the old wording forced a re-review even when the user wanted
to apply fixes and move on.

Terminal question:

- 'Apply decisions and proceed to <next stage>' — default when fixes
  were applied or decisions were made. <next stage> substitutes to
  ce-plan for requirements docs and ce-work for plan docs
- 'Apply decisions and re-review' — opt-in re-review when the user
  believes their edits warrant another pass
- 'Exit without further action' — user wants to stop for now

When fixes_applied_count == 0 (zero-actionable case or all-Skip walk-
through), the 're-review' option drops (re-review is useless when
nothing changed) and the 'Apply decisions and' prefix drops from the
primary option's label so it matches what the system is doing.
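The label adaptation above can be sketched as a small option builder. This is an illustrative analogue of the prompt-level rule, not actual skill code; `terminalOptions` is a hypothetical name:

```typescript
// Hypothetical sketch: build the terminal-question option labels.
// Zero-actionable runs drop re-review and the 'Apply decisions and' prefix.
function terminalOptions(fixesAppliedCount: number, nextStage: string): string[] {
  if (fixesAppliedCount === 0) {
    return [`Proceed to ${nextStage}`, "Exit without further action"];
  }
  return [
    `Apply decisions and proceed to ${nextStage}`,
    "Apply decisions and re-review",
    "Exit without further action",
  ];
}
```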

Caller-context handling is implicit rather than requiring a formal
nested:true argument. The agent interprets 'Proceed to <next stage>'
contextually from the visible conversation state — when invoked from
inside another skill's flow, it returns control to the caller; when
standalone, it dispatches the next stage skill. No caller-side change
to ce-brainstorm or ce-plan. If implicit handling proves unreliable,
an explicit nested:true flag can be added as a follow-up.

Multi-round decision memory:

- SKILL.md Phase 2 adds a new {decision_primer} template variable that
  accumulates prior-round decisions across the session. Round 1 renders
  an empty <prior-decisions> block; round 2+ renders applied and
  rejected entries by round with section, title, reviewer, confidence,
  and rejection reason. Skip and Defer both count as 'rejected' for
  suppression purposes. Cross-session persistence is out of scope —
  fresh invocations start a fresh round 1
- synthesis-and-presentation.md adds an R29 suppression step. For each
  current-round finding, compare against the primer's rejected list
  using the same fingerprint + evidence-substring predicate as R30.
  On match, drop the current-round finding and log the suppression in
  Coverage. Materially-different exception: if the section was edited
  enough that the prior-round evidence quote no longer appears in the
  current document, treat as new. Synthesis is the authoritative gate;
  the persona instruction below is a soft hint
- subagent-template.md adds a <decision-primer-rules> block that
  instructs personas to honor the primer: don't re-raise a finding whose
  title and evidence pattern-match a prior-round rejected entry unless
  materially different. Prior-round Applied findings are informational
  — the orchestrator verifies those landed via R30. Round 1 runs with
  no primer constraints
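The shared R29/R30 matching predicate (fingerprint plus >50% evidence-substring overlap) can be sketched as below. The overlap measure here is a word-level approximation chosen for illustration; the actual prompt-level predicate may differ:

```typescript
interface PrimerEntry { fingerprint: string; evidence: string }

// Fraction of the prior evidence quote's words that reappear in the
// current-round evidence (0..1). Hypothetical helper for illustration.
function evidenceOverlap(prior: string, current: string): number {
  const words = prior.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const haystack = current.toLowerCase();
  return words.filter((w) => haystack.includes(w)).length / words.length;
}

// R29/R30 predicate: same fingerprint AND majority evidence overlap.
function matchesPrior(current: PrimerEntry, prior: PrimerEntry): boolean {
  return current.fingerprint === prior.fingerprint &&
         evidenceOverlap(prior.evidence, current.evidence) > 0.5;
}
```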

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 7
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugh, defer, enum expansion

Extends pipeline-review-contract.test.ts with structural assertions for
the ce-doc-review overhaul. Uses pattern matching on key tokens rather
than exact prose so future wording improvements don't ossify the tests.

New test groups:

- ce-doc-review contract (8 tests):
  - findings-schema autofix_class enum uses [safe_auto, gated_auto,
    manual] — aligned with ce-code-review's first three tier names,
    no advisory tier, no legacy auto/present values
  - subagent template carries framing guidance, strawman rule, strawman
    safeguard, Deferred/Open-Questions exclusion, and decision-primer
    slot + rules
  - synthesis pipeline has per-severity gates (P0 0.50 / P1 0.60 /
    P2 0.65 / P3 0.75), FYI floor at 0.40, cross-persona agreement
    boost (+0.10), three-tier routing, R29 rejected-finding suppression,
    R30 fix-landed matching predicate
  - headless envelope surfaces new tiers distinctly (Applied N safe_auto
    fixes / Gated-auto findings / Manual findings / FYI observations)
    with preserved 'Review complete' terminal signal
  - terminal question is three-option by default with label adaptation
    in the zero-actionable case; next-stage substitution rules
    documented
  - SKILL.md has Interactive mode rules with AskUserQuestion pre-load
    via ToolSearch, decision_primer variable in dispatch, lazy
    references to walkthrough.md and bulk-preview.md
  - walkthrough has routing question with front-loaded distinguishing
    words, four per-finding options, (recommended) marker, no advisory
    variant, no tracker-detection machinery; bulk-preview has Proceed/
    Cancel with three bucket labels, no Acknowledging bucket
  - open-questions-defer implements the append mechanic with timestamped
    subsections, field-level entry format, failure-path sub-question
    (Retry / Record-only / Convert-to-Skip), no tracker-detection logic

- ce-compound frontmatter schema expansion contract (3 tests):
  - problem_type enum includes architecture_pattern, design_pattern,
    tooling_decision, convention — with best_practice retained as
    fallback
  - ce-compound-refresh schema is byte-identical to canonical
    ce-compound schema (kept in sync per AGENTS.md duplicate convention)
  - yaml-schema.md documents category mappings for the four new values

- ce-learnings-researcher domain-agnostic contract (1 test):
  - Agent prompt frames as domain-agnostic (not bug-focused), names
    Architecture / Design / Tooling / Conventions as first-class
    learning shapes, accepts structured <work-context> input, replaces
    the hardcoded category table with a dynamic probe, reads
    critical-patterns.md conditionally, and drops ce-doc-review from
    Integration Points (agent is ce-plan-owned)

All 13 new tests pass. Full suite: 798 pass, 0 fail. bun run
release:validate: in sync.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 8
of 8 — final unit; this ships the contract tests that lock in the
three-tier / multi-round / interaction-model changes shipped in Units 3
through 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d897823f52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-doc-review/SKILL.md
- Preserve evidence snippets in decision primer entries (thread r3106059451, P1)
  R29 rejected-finding suppression and R30 fix-landed verification both
  use fingerprint + evidence-substring overlap (>50%) as their matching
  predicate. The primer format in SKILL.md previously stored only
  section/title/reviewer/confidence/reason, leaving the orchestrator
  without prior-round evidence needed to compute the overlap check.
  Adds an 'Evidence:' line to each primer entry containing the first
  evidence quote truncated to ~120 chars (word-boundary-preserving,
  internal quotes escaped). Documented inline why the field is required.

- Align review-output-template.md with new three-tier taxonomy (thread
  r3106059452, P2)
  The interactive presentation template still referenced the legacy
  'auto/present' contract — summary line, 'Auto-fixes Applied' heading,
  'Auto | Present' coverage columns, and rules text all leaked stale
  bucket semantics into user-facing output. Rewrites the template for
  the new taxonomy: 'Applied N safe_auto fixes' summary line, a Tier
  column in severity tables, a dedicated 'FYI Observations' section,
  and Coverage columns for SafeAuto | GatedAuto | Manual | FYI |
  Residual. Residual Concerns section clarified: no longer promoted
  via residual-concern promotion (dropped in Unit 4) — listed for
  transparency only. Adds a note up-front that this template covers
  Phase 4 interactive presentation specifically; the headless envelope
  lives in synthesis-and-presentation.md.

All contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


Reviewed commit: df79db5015


Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
Both findings fall in the same pattern the prior round surfaced:
high-level rules written without the deterministic mechanics the
orchestrator needs to actually implement them. Cross-invocation
signal from resolve-pr-feedback flagged the theme.

- Add deterministic recommended-action tie-break to synthesis
  (thread r3106064985, P1)
  walkthrough.md said 'recommended action has already been normalized
  by synthesis step 3.6' but step 3.6 only handled auto-promotion;
  there was no rule for picking Apply/Defer/Skip when contributing
  personas implied different actions. Without a tie-break, LFG
  outcomes are non-deterministic across runs and walk-through
  '(recommended)' marks drift.

  Adds synthesis step 3.5b 'Deterministic Recommended-Action Tie-Break'
  implementing 'Skip > Defer > Apply' (most conservative first). Scans
  contributing personas in that order; first match wins. Documents
  persona-to-action mapping (safe_auto/gated_auto -> Apply, manual
  tradeoffs -> Defer, residual-suppression-eligible -> Skip,
  contradiction-set with 'keep as-is' -> Skip). Silent-personas
  fallback to Apply. Records a conflict-context string on the merged
  finding for the walk-through's R15 conflict-context line. Updates
  walkthrough.md's reference from 'synthesis step 3.6' to the correct
  '3.5b'.

  Invariant: walk-through and bulk-preview read recommended_action
  and never recompute. LFG-the-rest and routing option B execute the
  recorded recommendation in bulk, so the same review artifact
  always produces the same bulk plan.

- Strengthen dedup key in open-questions-defer.md (thread r3106064987,
  P2)
  Title-only idempotence silently drops legitimate deferred entries
  when two different concerns share a short title in the same review
  date. Loses user-visible backlog context.

  Compound key is now 'normalize(section) + normalize(title) +
  evidence_fingerprint', mirroring the R29/R30 matching predicate
  (fingerprint + evidence-substring overlap) so cross-round and
  intra-round dedup behave consistently. Evidence fingerprint is the
  first ~120 chars of the finding's first evidence quote (same slice
  used in the decision primer). Falls back to section+title when
  evidence is unavailable.
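The compound key above can be sketched in TypeScript; `normalize()` and the exact 120-char slice mechanics are assumptions for illustration, and the reference doc remains authoritative:

```typescript
// Illustrative compound dedup key: normalize(section) +
// normalize(title) + evidence_fingerprint. normalize() here is an
// assumption about the exact normalization rules.
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ");
}

// First ~120 chars of the finding's first evidence quote; empty
// when evidence is unavailable (key falls back to section+title).
function evidenceFingerprint(firstQuote?: string): string {
  return firstQuote ? normalize(firstQuote).slice(0, 120) : "";
}

function dedupKey(section: string, title: string, firstQuote?: string): string {
  return [normalize(section), normalize(title), evidenceFingerprint(firstQuote)].join("|");
}
```

Two concerns sharing a short title but carrying different evidence now produce distinct keys instead of silently colliding.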

All contract tests pass.
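The step 3.5b tie-break described above ('Skip > Defer > Apply', first match wins) reduces to a first-match scan. A minimal sketch; type and function names are illustrative, not the skill's actual identifiers:

```typescript
// Hypothetical sketch of synthesis step 3.5b's "most conservative
// first" tie-break.
type Action = "Skip" | "Defer" | "Apply";

interface PersonaVote {
  persona: string;
  impliedAction?: Action; // undefined = silent on action
}

// Scan contributing personas in Skip > Defer > Apply order; the
// first action any persona implies wins, so the same merged finding
// always yields the same recommendation across runs.
function tieBreak(votes: PersonaVote[]): Action {
  const order: Action[] = ["Skip", "Defer", "Apply"];
  for (const action of order) {
    if (votes.some((v) => v.impliedAction === action)) return action;
  }
  return "Apply"; // all personas silent on action
}
```

Walk-through and bulk preview would then read the recorded recommendation rather than recompute it, per the invariant above.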

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: afc699da8f

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
Both findings continue the pattern: specification gaps in deterministic
orchestration mechanics. Different gaps each round — Codex keeps
finding adjacent spots where high-level rules need concrete
implementation detail.

- Drop routing-level recommendation labeling (thread r3106084407, P2)
  walkthrough.md's routing question said any of A/B/C/D could be
  marked '(recommended)' based on synthesis, but synthesis only
  defines per-finding recommended_action — there was no rule for
  picking a routing-level recommendation. Rather than invent one
  (a mapping from finding-set shape to routing recommendation would
  pressure users toward automated paths in ways that conflict with
  the user-intent framing of the four options), remove the routing-
  level (recommended) labeling. Per-finding (recommended) inside the
  walk-through and bulk preview still applies — that's deterministic
  via step 3.5b. The routing choice itself is user-intent:
  engage / trust / triage / skim.

- Persist dedup-key fields in the appended entry (thread r3106084409,
  P2)
  Previous round added a compound dedup key (section + title +
  evidence_fingerprint) but the deferred-entry format on disk only
  stored title/severity/reviewer/confidence/why_it_matters — so
  retries or same-day reruns couldn't reliably reconstruct the key
  from persisted data. Duplicates could still slip through.

  Adds an HTML-comment line on each appended entry carrying the
  machine-readable dedup fields: <!-- dedup-key: section=\"...\"
  title=\"...\" evidence=\"...\" -->. Keeps the visible content
  reader-friendly (section now surfaces in the header line alongside
  severity/reviewer/confidence; the HTML comment stays hidden in
  rendered markdown) while persisting exactly the fields Step 4's
  compound-key check needs. Legacy entries without the comment fall
  back to title-only comparison — imperfect but strictly better than
  duplicate-appending.
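The comment line's round-trip can be sketched as below; the escaping details (collapse whitespace, strip comment-closing sequences, escape quotes) are assumptions, not the persisted format's exact contract:

```typescript
// Illustrative dedup-key comment writer/reader. Field names mirror
// the entry format above; escaping details are assumptions.
function esc(s: string): string {
  return s
    .replace(/\s+/g, " ")    // keep the comment on one line
    .replace(/-->/g, "")     // a value must not close the comment
    .replace(/"/g, '\\"')
    .trim();
}

function dedupComment(section: string, title: string, evidence: string): string {
  return `<!-- dedup-key: section="${esc(section)}" title="${esc(title)}" evidence="${esc(evidence)}" -->`;
}

// Returns null for legacy entries without the comment, so callers
// can fall back to title-only comparison.
function parseDedupComment(line: string): { section: string; title: string; evidence: string } | null {
  const m = line.match(/^<!-- dedup-key: section="(.*)" title="(.*)" evidence="(.*)" -->$/);
  return m ? { section: m[1], title: m[2], evidence: m[3] } : null;
}
```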

All contract tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: f65830da9b

@tmchow
Collaborator Author

tmchow commented Apr 19, 2026

Thanks for the thorough review — these are all good catches. Summary of what I plan to address:

P1s (will fix before merge):

  • Add evidence snippet/fingerprint to decision primer entries so R29/R30 can compute overlap deterministically across rounds (SKILL.md:156).
  • Add a deterministic tie-break rule to synthesis step 3.6 for choosing Apply/Defer/Skip when reviewers disagree, so walk-through recommendations are reproducible (walkthrough.md).
  • Gate recommended_action: Apply on suggested_fix presence in 3.5b so silent-on-action manual findings don't auto-route to Apply with no executable fix (synthesis-and-presentation.md:82).

P2s:

  • Update references/review-output-template.md to match the new safe_auto/gated_auto/manual taxonomy, or drop the dependency.
  • Strengthen deferred-entry dedup key to section + title (or the same fingerprint used elsewhere), and persist the key fields in the entry so reruns can reconstruct it (open-questions-defer.md x2).
  • Add routing-level recommendation tie-break (or remove (recommended) from A/B/C/D) so interactive runs are deterministic (walkthrough.md).
  • Suppress routing option C when append_available: false so read-only docs don't expose a guaranteed-failure path (walkthrough.md:23).

Will push a follow-up commit shortly.

tmchow and others added 5 commits April 19, 2026 11:32
…ting

Second seeded fixture alongside seeded-plan.md, covering auth-migration
domain with two deliberately distinct premise roots (managed-auth
migration, custom policy-layer warranted?) at different scopes so the
synthesis multi-root path gets exercised. Includes the same tier
distribution as the rename fixture: 3 safe_auto, 3 gated_auto, roots
with dependents, root-independent manual findings, 4 FYI, 3 drop-worthy
P3s.

Run in parallel with seeded-plan.md to compare calibration across
domains and catch domain-specific persona bias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d design-lens coverage

Third seeded fixture alongside seeded-plan.md (rename/infra) and
seeded-auth-plan.md (auth migration). Designed to exercise three paths
the other fixtures do not cover:

1. design-lens persona activation — the fixture contains UI components,
   user flows, visual hierarchy, and interaction descriptions so the
   design-lens reviewer triggers and can be observed.
2. Zero-root chain path — every seeded finding is an independent
   execution detail; no premise-level challenges exist. Synthesis
   should skip chain grouping entirely (report 'Chains: 0 roots' or
   omit the line). If a chain appears, a reviewer has over-elevated an
   execution finding to premise status — a calibration signal.
3. Small-document / proportional-review path — ~130 lines of content
   (vs ~210 for the infra fixtures), so the adversarial persona
   either runs in Quick mode (≤3 findings) or does not activate, and
   scope-guardian may stay inactive given no priority tiers.

Run alongside seeded-plan.md and seeded-auth-plan.md to catch
design-lens regressions, zero-root bugs, and domain bias when reviewing
user-facing feature plans rather than infrastructure plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…im legacy debris

Tighten the agent for domain-agnostic institutional knowledge retrieval:

- Add runtime guardrails: output bound (up to 5 findings), fallibility
  caveat (flag when learning conflicts with observable code, include
  entry date so caller can judge supersession), gap signal (suggest
  /ce-compound capture when no matches).
- Drop ~40 lines of Rails-specific component/root_cause enum debris
  duplicated from the authoritative yaml-schema.md reference. The
  inline duplication undermined the single-source-of-truth contract
  and carried irrelevant vocabulary (rails_model, thread_violation,
  missing_index) into every invocation in a TypeScript/Bun plugin
  repo.
- Realign Step 3 example queries to domain-agnostic shapes so the
  grep-pre-filter examples don't quietly re-prime bug-shaped search.
- Collapse 4x repetition of the bug-parity rule to a single canonical
  statement; disambiguate the DO/DON'T scope-vs-depth rule (reading
  frontmatter of search-matched candidates is fine; reading frontmatter
  of every file in docs/solutions/ is the anti-pattern).
- Drop the unenforced "30 seconds" performance claim.
- Expand integration points to document all callers: /ce-plan,
  /ce-code-review, /ce-optimize, /ce-ideate, plus standalone invocation.

Net: -39 lines with material quality gain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ot and cascade

Cut user engagement burden on reviews with premise-level fanout: a
single premise challenge ("is this work justified?") often generates
downstream findings that all evaporate if the premise is rejected.
Surfacing each as an independent decision forces the user to
re-litigate the same root question N times.

Synthesis changes (synthesis-and-presentation.md):

- Add step 3.5c Premise-Dependency Chain Linking. Identify P0/P1
  manual findings challenging a foundational premise as roots; link
  dependent findings whose suggested_fix becomes moot if the root's
  fix is accepted. Annotate with depends_on / dependents fields
  (purely annotative — no reclassification).
- Multi-root elevation: elevate ALL candidates matching the root
  criteria independently. A single-root default strands the second
  root's dependents as top-level manual findings — the exact UX
  problem chains are meant to solve. Sanity diagnostic at >3 without
  a hard cap (criteria are the filter).
- Peer-vs-nested mechanical test: roots are peers when accepting
  one's fix does not resolve the other's concern. Nested when one's
  fix moots the other.
- Surviving-root selection under nested: scope dominates confidence.
  When nested, the surviving root is the one whose fix moots the
  other, not the higher-confidence candidate. Confidence is for
  tie-breaking among peers, not for picking dominant root.
- Independence safeguard: findings identifying problems that exist
  regardless of the premise (rollback plans, error handling, test
  coverage, security gaps) stay standalone; safe_auto and gated_auto
  findings never link.
- Count invariant: the dependents array populated in Step 4 is the
  single source of truth for both coverage counting and rendering.
  Prevents coverage/rendering drift (e.g., "6 dependents" reported
  but 5 nested in output).
- Two worked examples in different domains (rename-shape, auth-shape)
  paired with the lesson "same shape, different vocabulary" to
  prevent example-vocabulary overfitting.
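The peer-vs-nested test and scope-dominant surviving-root rule above can be sketched as follows; `moots` is a hypothetical predicate standing in for the mechanical "accepting this fix resolves the other's concern" check, and the mutual-mooting branch is an assumption this sketch adds, not part of the spec:

```typescript
// Illustrative root-candidate resolution. Scope (mooting) dominates
// confidence; confidence only breaks ties among peers.
interface RootCandidate {
  id: string;
  confidence: number;
  moots: (other: RootCandidate) => boolean;
}

function resolveRoots(a: RootCandidate, b: RootCandidate): RootCandidate[] {
  const aMootsB = a.moots(b);
  const bMootsA = b.moots(a);
  if (!aMootsB && !bMootsA) return [a, b]; // peers: elevate both roots
  if (aMootsB && !bMootsA) return [a];     // nested: broader fix survives,
  if (bMootsA && !aMootsB) return [b];     //   even at lower confidence
  // Mutual mooting (assumed here): treat like a tie, confidence decides.
  return a.confidence >= b.confidence ? [a] : [b];
}
```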

Walkthrough changes (walkthrough.md):

- Root-first iteration regardless of severity order within a chain.
- Cascade on Skip/Defer of a root: prompt with Cascade
  (apply same action to all dependents) or Decide each individually.
  Apply on root does NOT cascade — the premise held, dependents each
  need their own decision.
- Orphaned dependents (root suppressed in prior round per R29): treat
  as standalone findings with no chain context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce, and safe_auto resilience

Populate the FYI bucket, stabilize schema conformance across long
persona prompts, and prevent safe_auto from silently collapsing under
over-charitable re-interpretation.

Persona calibration bands (7 personas):

- Add LOW (0.40-0.59) Advisory band tailored to each persona's scope
  (coherence, feasibility, product-lens, design-lens, security-lens,
  scope-guardian, adversarial). Drop suppress floor from 0.50 to 0.40
  so advisory observations without articulable consequence can land
  in FYI rather than either being promoted above gate or silently
  dropped. Without this band, advisory findings (filename asymmetry,
  speculative future-work, stylistic preference) either clutter
  manual or disappear entirely — the FYI bucket stayed empty across
  initial calibration runs.
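Ignoring the per-persona tailoring, the band thresholds above reduce to a routing like this (bucket names are illustrative):

```typescript
// Confidence-band routing sketch. 0.40 is the new suppress floor;
// 0.40-0.59 is the LOW Advisory band that lands in FYI.
type Route = "suppressed" | "fyi" | "gate";

function routeByConfidence(confidence: number): Route {
  if (confidence < 0.4) return "suppressed"; // below the floor: drop
  if (confidence < 0.6) return "fyi";        // advisory, no decision forced
  return "gate";                             // normal severity routing
}
```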

Subagent template (subagent-template.md):

- Add Schema conformance block at the top of the output contract
  listing exact enum values inline (severity P0/P1/P2/P3, finding_type
  error/omission, autofix_class safe_auto/gated_auto/manual, evidence
  as array). Schema injection via {schema} alone was insufficient —
  longer personas (adversarial 89 lines, scope-guardian 54 lines)
  pushed schema attention down, producing "high/medium/low" severity
  and singleton string evidence. Inline enum anchoring at the top of
  the contract survives longer prompts. Add a severity translation
  rule so informal priority language in persona rubrics maps to P0-P3
  at emit time.
- Add Advisory routing rule using the "what actually breaks if we
  don't fix this?" heuristic to anchor confidence scoring into the
  0.40-0.59 band. Per-persona bands without this template rule
  produce inconsistent results across personas; the template rule
  without per-persona bands has nothing to calibrate against. Both
  are needed.

Coherence persona (ce-coherence-reviewer.agent.md):

- Add Safe_auto patterns you own section naming the three high-stakes
  patterns (header/body count mismatch, cross-reference to a named
  section that does not exist, terminology drift between
  interchangeable synonyms) with strawman-resistance guidance.
  Addresses observed regression where coherence demoted wrong-count
  findings from safe_auto to P1 manual on the hypothetical that a
  missing R6 might exist. The new guidance instructs: "If the only
  reason to not pick safe_auto is that the author might have meant
  something else, consider whether that interpretation is actually
  plausible vs a strawman." Let synthesis's strawman-downgrade
  safeguard catch genuine strawmen; do not pre-demote at the persona
  level.
- Remove stale autofix_class: auto reference (old binary value, now
  split into the 3-tier enum).

Calibration patterns documented (docs/solutions/skill-design/):

- Capture nine durable calibration patterns from the calibration
  runs so future contributors don't "fix" intentional behavior:
  external stale cross-refs route to gated_auto (not safe_auto);
  multi-surface terminology drift routes to gated_auto; peer-vs-nested
  decision test; surviving-root scope dominance; multi-root explicit
  elevation; FYI band + template anchoring both required; inline enum
  callouts vs schema injection; coverage/rendering single source of
  truth; reviewer variance is inherent (validate across multiple
  runs).
- Placed in docs/solutions/skill-design/ so persona agents don't read
  it during dispatch (ce-doc-review does not dispatch
  ce-learnings-researcher per the plan's scope cut, so the doc stays
  outside persona runtime context).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tmchow tmchow changed the title feat(ce-doc-review): autofix tiers, per-finding walk-through, multi-round memory feat(ce-doc-review): autofix tiers, chains, and multi-round memory Apr 19, 2026
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: c56c64c3f5

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
@tmchow tmchow changed the title feat(ce-doc-review): autofix tiers, chains, and multi-round memory feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite Apr 19, 2026
Close four spec gaps the Codex reviewer flagged on the downstream
implications of recently-added mechanisms:

- 3.5b actionless default — when contributing personas are all silent
  on action, gate the Apply default on suggested_fix presence; fall
  back to Defer when no fix exists. Adds a cross-branch downgrade:
  if the tie-break winner is Apply but the merged finding has no
  suggested_fix after 3.6/3.7 run, demote to Defer. Prevents LFG and
  bulk-preview from scheduling non-executable Apply actions.
- Routing question option C — suppress when
  open-questions-defer.md's cached append_available is false. Drops
  the menu to three options (A/B/D) and appends a one-line
  explanation, mirroring the per-finding option B suppression
  already documented under Adaptations.
- Cascade persistence — every cascaded decision lands in the
  in-memory decision list (the canonical state-model home for all
  answered findings), with action-specific side effects layered on
  top: cascaded Apply also joins the Apply set, cascaded Defer also
  invokes the open-questions append flow (with per-finding failure
  path on append error), cascaded Skip is decision-list-only. Aligns
  the cascade with the Per-finding routing rules and the state model.
- evidence_fingerprint sanitization — collapse any whitespace run
  (newlines/CRs/tabs) to single spaces, strip HTML-comment-breaking
  sequences, escape quotes, trim — applied before the 120-char
  slice so the dedup-key HTML comment stays on one parseable line.
  Step 4 reconstruction updated to reference the sanitization and
  fall through to the legacy-fallback path on malformed entries
  rather than attempting partial parse.
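A sketch of the sanitization, with the exact replacement choices as assumptions (only the "sanitize before slicing" ordering is load-bearing):

```typescript
// evidence_fingerprint sanitization sketch: strip comment-breaking
// sequences, collapse whitespace runs (newlines/CRs/tabs included),
// escape quotes, trim — then take the 120-char slice, so the
// dedup-key HTML comment stays on one parseable line.
function sanitizeFingerprint(raw: string): string {
  return raw
    .replace(/-->/g, "")      // a value must not close the comment
    .replace(/\s+/g, " ")     // newlines/CRs/tabs -> single spaces
    .replace(/"/g, '\\"')     // keep attribute quoting parseable
    .trim()
    .slice(0, 120);           // slice last, after sanitization
}
```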

All 38 contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 21fb160adf

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
…ply on no-fix findings

Close two spec gaps flagged by Codex review:

- 3.5c chain predicate mismatch — the linking rule defined dependents
  as findings that "become moot when the root's fix is accepted", but
  the walkthrough and worked examples cascade dependents on root
  Skip/Defer (rejection). Rewrite the Step 2 predicate and
  substitution check around the rejection branch so linking is
  semantically consistent with cascade execution. Worked examples A
  and B already read naturally under the new framing; independence
  safeguard is branch-neutral and unchanged.
- Apply execution on no-fix findings — the prior 3.5b fix demoted
  Apply to Defer for the automated default, but users can still
  manually pick Apply on a manual finding with no suggested_fix
  during walk-through. Add two-layer guards:
    * Decision-time: if user picks Apply on a no-fix finding, fire a
      sub-question with Defer-to-Open-Questions (recommended) / Skip
      / Acknowledge-without-applying. Includes append-availability
      adaptation and cascading-root handling.
    * Execution-time: defensive check before dispatching each
      Apply-set edit — if suggested_fix is missing, skip with a
      recorded warning in the completion report, continue the batch.
  Completion report gains an Acknowledged bucket (omitted when zero).

All 38 contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 236de00ab5

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
…ded) to A/B/C

Two small spec gaps flagged by Codex review:

- Acknowledged decisions were not persisted across rounds.
  R29/primer carried forward Skip and Defer but not Acknowledged,
  leaving the no-fix-Apply guard case re-surfaceable on round 2.
  Extended the primer carry-forward and R29 rule to treat
  Acknowledged as a rejected-class decision, semantically equivalent
  to Skip for round-to-round suppression.
- The (recommended) marker on per-finding options said "any of the
  four" when synthesis only emits Apply/Defer/Skip — A/B/C.
  Corrected to exclude D (LFG the rest) since that is a workflow
  shortcut for bulk execution, not a finding-level resolution action.

All 38 contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 2712a2feb9

…teractive template

Two bundled concerns:

1. Separate internal schema vocabulary from user-facing rendered text.
   Schema enum values (safe_auto / gated_auto / manual) are correct
   for the persona output contract and synthesis routing logic, but
   leak into user-facing output as jargon. Rename in rendered text:
     - "Applied N safe_auto fixes" → "Applied N fixes"
     - "Gated-auto findings (concrete fix, requires user
       confirmation)" → "Proposed fixes (confirm each)"
     - "Manual findings (requires judgment)" → "Decisions (requires
       your judgment)"
     - FYI kept as-is (already natural English)
     - Coverage columns SafeAuto/GatedAuto/Manual → Auto/Proposed/
       Decisions
   Schema definitions, persona output contract, synthesis pipeline
   prose, and Tier-column enum values stay — those are internal.

2. Add chain-rendering rules to review-output-template.md
   (interactive mode) to mirror the headless envelope. Dependents
   render only under their root as a Dependents (N) sub-block; MUST
   NOT appear at their own severity position. Count invariant:
   Findings = Auto + Proposed + Decisions + FYI, each counted once in
   its assigned bucket. Optional Chains: N root(s) with M dependents
   line for parity with the headless envelope.
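The count invariant above is cheap to check mechanically; a sketch, with bucket names following the renamed user-facing labels:

```typescript
// Findings = Auto + Proposed + Decisions + FYI, each finding counted
// once in its assigned bucket; chain dependents render under their
// root and never add a second count at their own severity position.
interface Buckets {
  auto: number;
  proposed: number;
  decisions: number;
  fyi: number;
}

function countInvariantHolds(totalFindings: number, b: Buckets): boolean {
  return totalFindings === b.auto + b.proposed + b.decisions + b.fyi;
}
```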

Contract test `headless envelope surfaces new tiers distinctly`
updated to match the new literal substrings (semantic check — bucket
headers exist — preserved).

Verified no caller parses the old substrings: grep against
ce-brainstorm and ce-plan returned zero matches for the legacy
substrings the spec claimed "backward compatibility" on. Dropped
that claim.

All 38 contract tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 5afe103e17

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
walkthrough.md:197 said the completion report groups acknowledged
entries alongside the Skipped bucket, but lines 249-256 require a
dedicated Acknowledged bucket with its own count, its own per-finding
action label, and its own position in the report ordering
(Applied / Deferred / Skipped / Acknowledged).

Two distinct concerns had gotten tangled in one sentence:

- Report display: Acknowledged is its own bucket (distinct audit
  trail; users can see what was consciously acknowledged vs what was
  skipped as irrelevant).
- R29 suppression: Acknowledged is treated as rejected-class,
  equivalent to Skip (for suppressing re-raises in round N+1).

Rewrote the Acknowledge routing bullet to state both explicitly and
point at the completion-report spec sections below for the full
contract.

All 38 contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit c1f68d4 into main Apr 20, 2026
2 checks passed
This was referenced Apr 20, 2026

Successfully merging this pull request may close these issues.

What is the best workflow after ce:review - worth introducing a ce:fix?