Skip to content

feat(ce-code-review): add cross-model adversarial review pass#1007

Merged
tmchow merged 20 commits into
mainfrom
tmchow/codex-adversarial-review
Jun 26, 2026
Merged

feat(ce-code-review): add cross-model adversarial review pass#1007
tmchow merged 20 commits into
mainfrom
tmchow/codex-adversarial-review

Conversation

@tmchow

@tmchow tmchow commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

ce-code-review's adversarial pass ran only as an in-process subagent — same model, same process as the orchestrator — so its "independent" findings shared the host model's blind spots. This adds a cross-model adversarial pass: when the adversarial reviewer is selected, the review also runs the same brief through a different model family in a separate read-only process and folds its findings into Stage 5, where agreement between the two becomes the strongest signal in the set. The PR also reworks how the review is presented and hardens host self-identification so the pass works across Claude Code, Codex, and Cursor.

The cross-model pass

Self-identifies the host at runtime (no build-time wiring) and picks a peer from a different model family:

Running in Peer Read-only via
Claude Code / Cursor Codex (codex exec) -s read-only hard sandbox
Codex Claude (claude -p) dontAsk + denied Bash/edit tools + embedded diff

The peer returns the same findings-schema.json shape every reviewer uses, enters Stage 5 as adversarial-<peer>, and a shared dedup fingerprint with the in-process adversarial persona promotes the finding (different model families, separate processes — the strongest corroboration available).

The invocation lives in a bundled script, scripts/cross-model-adversarial-review.sh, invoked via a SKILL_DIR anchor: the agent sets SKILL_DIR to the directory it read the SKILL.md from, then runs "$SKILL_DIR/scripts/…". The Bash tool's working directory is the user's project on Claude Code, Codex, and Cursor alike, and no harness exposes a skill-dir env var, so the model supplies the path — deterministic on every host. The script self-locates the persona + schema via BASH_SOURCE, composes the prompt from the canonical references/personas/adversarial-reviewer.md (single source — no duplication, no prompt passed by the orchestrator), and writes adversarial-<peer>.json.

Design decisions

  • Runtime self-identification, not build/convert-time — host markers (CLAUDECODE, CODEX_SANDBOX/CODEX_SESSION_ID/CODEX_THREAD_ID/CODEX_CI, CURSOR_AGENT) plus a process-ancestry fallback when env markers are stripped. An unrecognized host degrades to a silent skip — never a wrong peer.
  • Non-blocking — any gap (no peer, unauthed, timeout, unparseable output) skips the pass; the review never fails on it.
  • Bounded, orphan-safe peer — the peer runs under gtimeout/timeout (kills the whole process tree). Codex streams its reasoning, so it gets an idle-timeout (killed only when output stalls, letting a productive long run continue) with a hard backstop; Claude's single-shot output gets the hard cap.
  • Genuinely read-only — Codex is sandboxed; Claude (no sandbox) denies Bash outright so it can't mutate even under a broad user Bash(*) allow-rule, and reviews an embedded diff.
  • Robust I/O — the diff goes into the peer via stdin (no ARG_MAX ceiling on large diffs); codex's findings come back via -o (written by the CLI outside the sandbox, so it works under -s read-only) with a stdout-recovery fallback if that file ever doesn't materialize. Every peer return is validated against the full Stage 5 contract (reviewer + findings/residual_risks/testing_gaps arrays) and its reviewer name normalized to adversarial-<peer> before fold-in.
  • Launched in parallel with the persona reviewers (a background shell-out, not a subagent), so its runtime overlaps theirs.

Presentation rework

Replaced Stage 6's rigid "use this exact pipe table or it's an instant fail" with action-shaped direction: make four things clear per finding (what/why/response/confidence), group by unit-of-work or decision (decisions a human must make vs mechanical work), spend words only on what the diff can't show, and keep a self-sufficient closing because in long terminal output the bottom is the most-read screen. The team announce drops internal [session model]/[mid-tier] plumbing (model-tier selection is preserved as an internal correctness invariant) and surfaces the cross-model peer prominently. Hard constraints kept: ASCII-only / no box-drawing, stable numbering, verdict + actionable list last.

Authoring guidance (AGENTS.md)

Recast the bundled-script path-resolution guidance into a three-tier model — read-time references (relative, no anchor), prose pointers the agent acts on (relative + a "from this skill's directory" cue), and executed shell (the SKILL_DIR anchor as the conservative default; bare relative is agent-resolved and often works but isn't deterministic). Reframed ${CLAUDE_SKILL_DIR} as a footgun in this cross-agent plugin (it silently misses on every non-Claude install).

Validation

  • Cross-host, end-to-end: all three directions exercised on a live diff, read-only, zero edits to the target — Claude→Codex, Codex→Claude, Cursor→Codex.
  • Path resolution settled empirically across Claude Code, Codex, and Cursor (project CWD on all three; the SKILL_DIR anchor runs the bundled script on all three).
  • skill-creator eval: a fresh agent ran the full skill from disk on a planted-bug repo — set SKILL_DIR, launched the script in the background, codex ran read-only, findings folded into Stage 5, and the cross-user export IDOR was promoted to P0 via in-process adversarial + adversarial-codex agreement.
  • Codex PR-review feedback resolved across rounds: peer-process bounding + idle-timeout, reviewer-name normalization, Claude read-only hardening, full Stage 5 contract validation, claude-via-stdin (ARG_MAX), codex stdout-recovery, schema/auth.
  • bun test 1613 / 0; release:validate in sync.

Behavior is skill prose plus a bundled script, not a runnable UI — validation is the cross-host runs, the eval, and the test suite above.


Compound Engineering
Claude Code

Run the adversarial review through a different model family than the host, in a
separate process, for genuine independence the in-process subagent cannot
provide. Self-identifies the host at runtime (Claude, Codex, or Cursor) and
shells out to the peer CLI read-only -- Codex when host is Claude/Cursor, Claude
when host is Codex. The peer returns the same findings-schema.json shape and
folds into Stage 5 as reviewer `adversarial-<peer>`, so agreement with the
in-process adversarial persona promotes the finding (different model families,
separate processes -- the strongest signal in the set).

Gated on adversarial-reviewer selection plus local-aligned/standalone scope, and
non-blocking: any host-detect, auth, timeout, or parse gap degrades to a silent
skip and never fails the review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d59e24c9e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/references/cross-model-review.md Outdated
tmchow added 3 commits June 25, 2026 18:19
Codex exposes different session markers per surface: the CLI sets
CODEX_SANDBOX/CODEX_SESSION_ID, but web/API/CI runs set only
CODEX_THREAD_ID/CODEX_CI. Host detection checked only the former, so on
those surfaces XHOST fell through to `unknown` and the cross-model pass
silently skipped even with claude installed and authed. Check the union.

Addresses Codex review feedback on #1007.
No env var Codex sets is guaranteed across every surface, and
shell_environment_policy / cloud / IDE inheritance can strip markers from
subprocesses -- so the env-marker tier (now Tier A) will always be
incomplete and a miss left the cross-model pass silently skipping. Add
Tier B: when Tier A yields `unknown`, walk the process ancestry (basename
only, via `ps -o comm=`) for codex/claude/cursor-agent before giving up.
The launching CLI sits in the process tree even when its env is stripped.
Verified the walk resolves the host on macOS. A miss still degrades to a
silent skip, never a wrong peer.
The team announce leaked internal plumbing to the user (model-tier labels,
scope-mode codenames, disk-staging and dispatch narration) and buried the
cross-model pass in a generic unnamed footer. Add a "report outcomes, not
machinery" operating principle, derive each reviewer's model tier from the
fixed persona rule at dispatch (so the tier no longer has to be printed in
the announce), and require the cross-model pass to surface as its own
prominent line that names the peer model. Guidance is principle-based with
falsifiable requirements, not scripted wording.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ebc73660c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/references/cross-model-review.md Outdated
Comment thread skills/ce-code-review/references/cross-model-review.md Outdated
…l pass

Presentation: replace Stage 6's rigid "use this exact pipe table or it's an
instant fail" with principle-based direction that shapes output to the
reader's next action — per-finding clarity (what/why/response/confidence),
shape serves the finding type, group by unit of work/decision (apply-queue
vs decision-gate), economy of expression not coverage (don't paste files),
and a self-sufficient closing because in long output the bottom is the
most-read screen. Keep only the hard constraints (ASCII / no box-drawing /
stable numbering / verdict last). Template reframed to a canonical section
skeleton with one example rendering; contract test retargeted from the
table shape to the direction + hard constraints.

Cross-model pass: launch the peer in parallel (background) with the persona
reviewers instead of after them, and drop --output-schema on the Codex path
(its strict mode rejects the permissive findings-schema, wasting a call) —
rely on -o capture + the JSON-demanding prompt + Stage 5 malformed-drop.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bef32fdfa6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/references/cross-model-review.md Outdated
tmchow added 2 commits June 25, 2026 22:55
…ness invariant

The earlier presentation cleanup removed the [session model]/[mid-tier]
labels from the user-facing announce, but those labels did two jobs:
visibility AND serving as the external-memory record that the right model
gets applied at dispatch (on a top-tier session, a missed override silently
runs a reviewer at the expensive tier). Removing the display is correct;
losing the discipline is not. Reinstate the tier decision as an explicit
internal working list applied on every dispatch call, framed as a
correctness guarantee with its rationale — display removed, discipline kept.
…SKILL_DIR anchor

Empirical 3-host test (Claude Code, Codex, Cursor) showed all three run a
skill's bundled-script command with pwd = the project, not the skill dir, so
bare-relative fails everywhere (not just Claude), and no *_SKILL_DIR env var
exists on any host. The portable, verified pattern is the model-filled
SKILL_DIR anchor (set SKILL_DIR to the dir the agent read SKILL.md from, run
"$SKILL_DIR/scripts/..."), as used in production by last30days. Reframe
${CLAUDE_SKILL_DIR} as a Claude-only alternative and replace the "avoid
scripts for portability" guidance with "anchor them instead."

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fcffb66e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/references/cross-model-review.md Outdated
…ored script

The cross-model adversarial pass's invocation (prompt composition, read-only
flags, output capture, timeouts) moves from inline reference prose into
scripts/cross-model-adversarial-review.sh, invoked via the SKILL_DIR-from-read-
path anchor (portable across Claude/Codex/Cursor, empirically verified; bare
relative and ${CLAUDE_SKILL_DIR} both fail or are non-portable). The script
reads the canonical adversarial-reviewer persona + findings-schema itself
(single source, no duplication, no prompt passed by the orchestrator), adds a
per-peer timeout (idle for streaming codex, hard cap for single-shot claude),
self-locates via BASH_SOURCE, and is non-blocking. The reference shrinks to:
detect host -> gate -> announce -> run script -> fold in. e2e-verified: the
script ran codex read-only on a real diff and produced 3 schema-shaped findings.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc4bfe4ee8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
…sallowed-tools

Addresses Codex review feedback on #1007:
- Bound the peer process itself, not just the wrapper shell: run codex/claude
  in the foreground under gtimeout/timeout (kills the whole process tree on
  expiry), replacing the hand-rolled background+kill that leaked orphans on
  macOS (no setsid) and could write adversarial-<peer>.json after Stage 5 had
  already skipped it. Hard cap 300s via CROSS_MODEL_HARD_SECS.
- Pass --disallowedTools as separate variadic args (Edit Write NotebookEdit)
  rather than one quoted string, which is ambiguous since tool names can
  contain spaces.

(Two other threads — Claude API-key auth and the Codex findings schema — were
already resolved by the earlier script refactor: there is no claude-auth-status
preflight anymore, and the full schema is embedded in the codex prompt, verified
to produce complete schema-shaped findings.)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a58591463

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
The previous fix for the orphan-process bug over-corrected to a plain wall-clock
cap, dropping the idle-timeout the cross-model "second opinion" pattern calls for.
Restore it correctly: codex exec streams its reasoning, so watch that stream and
kill only when it stalls for CROSS_MODEL_IDLE_SECS (default 120s) — a productive
long run continues — with a hard backstop CROSS_MODEL_HARD_SECS (default 600s).
Orphan-safety is kept because the peer runs under gtimeout/timeout, which forwards
an external kill to the whole process tree (verified on this host), so the idle
watchdog signals gtimeout rather than an un-propagated subshell. Claude stays
hard-cap only (its --output-format json output is single-shot, nothing to watch).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdc3ceda56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
tmchow added 3 commits June 26, 2026 01:19
…de peer to read-only

Addresses two Codex review threads on #1007:
- Force reviewer = adversarial-<peer> after capture. The persona's example JSON
  uses reviewer:"adversarial"; if the peer echoed that, Stage 5 would fold it as
  the in-process adversarial reviewer and lose the cross-model agreement signal.
- Make the Claude peer genuinely read-only: deny Bash entirely (deny overrides a
  user's broad Bash(*) allow-rule; dontAsk + edit-tool denial alone did not) and
  embed the diff in the prompt so Claude needs no shell. Codex is unchanged --
  it fetches its own diff inside the -s read-only hard sandbox.
- Bare-relative fails LOUDLY (exit 127) with nondeterministic model recovery,
  not a silent skip; frame the SKILL_DIR anchor's value as determinism.
- The genuine silent skip is the guarded ${CLAUDE_SKILL_DIR} form off-Claude
  (unset var -> then-branch never fires); fold the native-Codex/converter
  rationale into that narrower point and make the anchor the stated default.
- Note shell state doesn't persist across Bash-tool calls, so SKILL_DIR must be
  set inline per invocation, and a script needing its own dir uses BASH_SOURCE
  (ref impl: cross-model-adversarial-review.sh).
- Permission caveat: the anchor's dynamic absolute path won't match a static
  Bash pin, so expect a one-time prompt per command; the pin trick is for the
  fixed ${CLAUDE_SKILL_DIR} form.
…agent plugin

Every skill here is authored once and installed across Claude/Codex/Cursor/
Gemini, so ${CLAUDE_SKILL_DIR} (a Claude-only content substitution, empty
elsewhere) silently misses on every non-Claude install. Reframe it from a
neutral alternative to "avoid": use the model-filled SKILL_DIR anchor for all
bundled-script invocations; reach for ${CLAUDE_SKILL_DIR} only for genuinely
Claude-only behavior that never runs elsewhere (essentially never here), and
treat any new use as a smell. Existing guarded uses (ce-compound) should migrate.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77dda0247c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
…pt via stdin

Addresses two Codex review threads on #1007:
- Validate the peer return against the full Stage 5 top-level contract before
  fold-in (reviewer string + findings/residual_risks/testing_gaps arrays), and
  backfill the two soft arrays / drop non-array findings. Previously only
  `.findings` was checked, so a return Stage 5 would later drop could pass here.
- Pass the claude peer prompt via stdin instead of a `$(cat …)` argv. Now that
  the diff is embedded in the prompt, a large diff could exceed ARG_MAX and make
  claude fail silently on exactly the big reviews that need it. (claude -p reads
  the prompt from stdin; verified.)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5e6a57385

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/SKILL.md Outdated
tmchow added 2 commits June 26, 2026 01:52
…ction skeleton

Addresses Codex review on #1007: the hard constraint "Verdict and Actionable
list are present, last" appeared to contradict the Stage 6 skeleton, which
places the Actionable Findings section at position 5 (before Coverage/Verdict).
Clarify that the constraint is satisfied by the closing, not the skeleton: the
Verdict is the final report section, immediately followed by the post-report
Actionable recap ("Emit actionable findings summary"), which is the self-
sufficient last word; the in-report Actionable Findings section keeps position 5
as the detailed table.
… model

Independent research surfaced that "bare relative does not work / fails loudly"
overstated it: relative paths in executed bash are agent-resolved (the agent
translates them against the skill dir it loaded), so they often work — they're
just non-deterministic. Recast the section into three tiers:
- Tier 1 read-time refs: relative, no anchor (unchanged).
- Tier 2 prose pointers the agent acts on: relative + a "from this skill's
  directory" cue.
- Tier 3 executed shell: the SKILL_DIR anchor as the conservative house default —
  bare relative can work via agent resolution but isn't deterministic; the
  failure mode is a fenced block copied verbatim into a Bash call.
Scope the CLAUDE_SKILL_DIR replacement to tier-3 executed-shell (was "all
invocations"); keep the footgun reasoning, the two SKILL_DIR constraints, and the
permission caveat. Simplify the example to the minimal inline form.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b25034e62

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh
Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
… unavailable

codex exec -o writes the artifact at the CLI level, outside the model's sandbox,
so it succeeds under -s read-only (verified). But to be robust to any sandbox/
config where it doesn't materialize, fall back to recovering the same JSON from
codex's captured stdout (which we already collect for the idle-monitor) via a
balanced-brace scan for the last object with a findings key. No data lost on the
read-only path. Also capture stdout in the no-(g)timeout branch so the fallback
works there too.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc5efa4a46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
…liveness

Addresses Codex review thread "Kill the Codex peer, not only the timeout wrapper":
killing the (g)timeout wrapper on idle wasn't guaranteed to reap codex (-k only
escalates on gtimeout's own expiry, not an external kill, so a peer deferring
SIGTERM could survive wait and write $OUT after a skip). Codex now runs in its own
process group (set -m); the watchdog owns both the idle-timeout and the hard cap
and reaps the whole group (TERM then KILL after a grace). A second-opinion codex
review then found that the grace loop checked only the leader pid — a leader that
exits while a child defers TERM would skip the group KILL — so the loop now checks
group liveness (kill -0 on the negative pgid). Verified against that exact repro
(leader exits on TERM, child ignores it -> group reaped, 0 survivors). claude keeps
the gtimeout wrapper (single-shot; its own -k escalates correctly).
…g can't misfire

Addresses Codex review thread "Avoid treating hidden Codex reasoning as idle": under
a user config with hide_agent_reasoning = true, codex exec can stop writing to the
stream during a long reasoning phase, so the idle watchdog could reap a still-working
peer on big diffs. Force reasoning on for our subprocess (-c hide_agent_reasoning=false)
so the stream stays a reliable liveness signal regardless of the user's global config,
and raise the idle default 120s -> 180s for extra headroom. The hard cap (600s) remains
the backstop.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef3a0a1b99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/ce-code-review/scripts/cross-model-adversarial-review.sh Outdated
…de peer

Addresses Codex review thread "Deny MCP tools for the Claude peer": under
--permission-mode dontAsk, a user's pre-approved MCP tools (Linear/Slack/deploy
writers) would run without prompting, so the peer wasn't actually read-only despite
the deny list covering only built-in edit/shell tools. Add `mcp__*` (the documented
way to remove MCP tools) and `Task` (a spawned subagent would bypass the deny list).
Also drop `MultiEdit`, which `claude` reports as "matches no known tool" (a no-op
that only emitted a warning). Verified the new list parses with no warnings.
@tmchow tmchow merged commit fdfad38 into main Jun 26, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Jun 26, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcc6933185

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# read-only even with MCP servers configured. claude can't write a file under those
# perms, so it emits the JSON envelope on stdout (captured to PEERLOG); we extract it.
if [ -n "$TO_BIN" ]; then
"$TO_BIN" -k 10 "$HARD_SECS" claude -p --model opus --permission-mode dontAsk \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Disable Claude hooks for read-only peer

When this runs on a Codex host with user or project Claude hooks configured, this claude -p invocation still loads those hooks; --disallowedTools only removes model-visible tools. I checked the Claude CLI reference, which says --bare skips hooks/MCP/CLAUDE.md, and the hooks reference, which says hooks execute automatically at lifecycle points, so SessionStart/UserPromptSubmit/Stop hook shell or HTTP handlers can still mutate the checkout or trigger side effects despite this pass being advertised as read-only. Add --bare/safe-mode or otherwise disable hook loading for the peer invocation.

Useful? React with 👍 / 👎.

tmchow added a commit that referenced this pull request Jun 26, 2026
…anded

#1007 merged the three-tier AGENTS.md rewrite to main, so the doc no
longer supersedes AGENTS.md — it is the rationale behind it. Replace the
'Repo migration status' note (which assumed the old guard guidance) with
'Where this is codified', and restore the AGENTS.md reference to state
the two agree. Makes the Codex P2 on #1012 moot.
@jarodtaylor

Copy link
Copy Markdown

🔥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants