Add Claude Code and Codex harnesses by xeophon · Pull Request #1426 · PrimeIntellect-ai/verifiers

xeophon · 2026-05-20T17:28:45Z

Summary

Stack on #1425 (Add V1 harness type aliases) to add packaged Claude Code and Codex command harnesses.
Register the claude, claude-code, codex, and codex-cli aliases through their config classes.
Export the new harnesses, document their TOML names, and add construction and alias tests.

Testing

uv run pytest tests/test_v1_config_extension.py tests/test_v1_harbor_cli.py tests/test_eval_cli.py -q
uv run pre-commit run --all-files
git diff --check harness-type-aliases...HEAD

Stacked on #1425.

Note

Medium Risk
Adds new packaged CLI harness implementations that generate/install/run command programs and wire MCP proxying, which can affect sandbox execution behavior and config validation for users selecting these harness types.

Overview
Adds two new bundled v1 command harnesses, ClaudeCode and Codex, including their typed configs and type/alias registration (e.g. claude/claude-code, codex/codex-cli) so TOML harness.type can select them.
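The alias registration described above can be pictured as a lookup table from harness.type strings to config classes. The following is only a minimal sketch: the stand-in classes, their default field values, `HARNESS_TYPES`, and `resolve_harness_config` are hypothetical names for illustration and are not taken from the PR's code. Only the four alias strings come from the description.

```python
from dataclasses import dataclass


@dataclass
class ClaudeCodeConfig:
    """Hypothetical stand-in for the real ClaudeCode config class."""

    permission_mode: str = "default"  # assumed default, for illustration only


@dataclass
class CodexConfig:
    """Hypothetical stand-in for the real Codex config class."""

    sandbox: str = "workspace-write"  # assumed default, for illustration only


# Every accepted harness.type string maps to exactly one config class.
HARNESS_TYPES: dict[str, type] = {
    "claude": ClaudeCodeConfig,
    "claude-code": ClaudeCodeConfig,
    "codex": CodexConfig,
    "codex-cli": CodexConfig,
}


def resolve_harness_config(section: dict) -> object:
    """Build a config object from an already-parsed [harness] table."""
    options = dict(section)  # copy so the caller's dict is not mutated
    type_name = options.pop("type")
    try:
        config_cls = HARNESS_TYPES[type_name]
    except KeyError:
        known = ", ".join(sorted(HARNESS_TYPES))
        raise ValueError(
            f"unknown harness.type {type_name!r}; expected one of: {known}"
        ) from None
    return config_cls(**options)
```

With a table like this, `{"type": "claude"}` and `{"type": "claude-code"}` resolve to the same class, so both spellings behave identically.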

ClaudeCode runs the Anthropic Claude Code CLI in non-interactive mode with MCP proxy config generation, log artifact collection, and configurable permission mode and turn limits. Codex similarly runs the OpenAI Codex CLI via a generated .codex/config.toml, supports a sandbox mode and optional reasoning-effort tuning, and reads the Responses API key from rollout State. It explicitly rejects max_turns overrides.
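To make the MCP proxy config generation more concrete, here is a rough sketch of writing a stdio MCP server declaration into a generated shell script. It assumes the `mcpServers` JSON layout used by Claude Code MCP config files. The server name, the proxy command, and both helper functions are hypothetical. Only the `EOFMCP` heredoc marker appears in the reviewed diff.

```python
import json
import shlex


def build_mcp_config(server_name: str, proxy_command: list[str]) -> str:
    """Return JSON declaring one stdio MCP server in the mcpServers layout."""
    config = {
        "mcpServers": {
            server_name: {
                "command": proxy_command[0],
                "args": proxy_command[1:],
            }
        }
    }
    return json.dumps(config, indent=2)


def heredoc_for(path: str, body: str, marker: str = "EOFMCP") -> str:
    """Render a quoted heredoc that writes body to path without shell expansion."""
    if marker in body:
        # Conservative check: the marker must never occur inside the payload.
        raise ValueError("heredoc marker appears in body")
    return f"cat > {shlex.quote(path)} <<'{marker}'\n{body}\n{marker}\n"
```

Quoting the heredoc marker keeps `$` and backticks in the JSON from being expanded by the shell, which matters when the config is embedded in a generated program.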

Exports are plumbed through verifiers.v1 and the root verifiers package, docs/examples are updated to reference the new harness names, and tests are extended to cover alias selection, re-exports, and program-building behavior for both harnesses.

Reviewed by Cursor Bugbot for commit e0970f6.

Note

Add ClaudeCode and Codex harnesses to the verifiers framework

Adds ClaudeCode (aliases: claude, claude-code) and Codex (aliases: codex, codex-cli) as new harness types, each selectable via harness.type in config.
ClaudeCode runs the Claude Code CLI in non-interactive mode, piping instructions with configurable permission_mode and max_turns, and writes logs to a configurable path.
Codex runs codex exec with configurable sandbox mode and reasoning effort; CODEX_API_KEY is populated dynamically from the active responses endpoint at runtime.
Both harnesses install their respective npm packages during setup and wire MCP integration to the verifiers proxy via stdio.
CodexConfig rejects any attempt to set max_turns with a validation error, as Codex does not support it.
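The max_turns rejection summarized above could look roughly like the sketch below. It uses a plain dataclass rather than whatever config base class the framework actually provides. The class name `CodexConfigSketch`, the field names other than max_turns, and the default values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodexConfigSketch:
    """Hypothetical stand-in showing how an unsupported option can be rejected."""

    sandbox: str = "workspace-write"  # assumed default, for illustration only
    reasoning_effort: Optional[str] = None
    # Accepted as a field only so that setting it fails loudly at construction.
    max_turns: Optional[int] = None

    def __post_init__(self) -> None:
        if self.max_turns is not None:
            raise ValueError("max_turns is not supported by the Codex harness")
```

Failing at config construction surfaces the problem when the TOML is loaded, instead of silently ignoring the setting during a rollout.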

Macroscope summarized e0970f6.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit b3feccf.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3feccf4fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-20T17:33:20Z

+EOFMCP
+
+cd "$CLAUDE_WORKDIR"
+claude -p "$(cat {shlex.quote(instruction_path)})" \


Avoid passing full Claude prompt as a CLI argument

This command inlines the entire instruction file into claude -p "$(cat ...)", which makes rollout success depend on OS argv limits (ARG_MAX). Large benchmark tasks or injected context can exceed that limit and fail with an argument-length error before the model runs, causing avoidable rollout failures. Feeding prompt content via stdin or a file-based option avoids this size ceiling.
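One way to act on this suggestion is to redirect the instruction file into the CLI's stdin instead of expanding it into an argument. The sketch below builds such a shell fragment. It assumes that `claude -p` reads the prompt from stdin when no prompt argument is given, which should be checked against the installed CLI version. The function name is hypothetical.

```python
import shlex


def claude_prompt_command(instruction_path: str) -> str:
    """Shell fragment that feeds the instruction file to `claude -p` on stdin.

    Assumption: print mode takes the prompt from stdin when no prompt
    argument is supplied.
    """
    # Only the file path appears on the command line, so the prompt size no
    # longer counts against the kernel's argv limits.
    return f"claude -p < {shlex.quote(instruction_path)}"
```

For example, `claude_prompt_command("/tmp/my task.md")` quotes the path so that the space survives shell parsing.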

chatgpt-codex-connector · 2026-05-20T17:33:20Z

+  --sandbox {shlex.quote(codex_sandbox)} \
+  --model "$OPENAI_MODEL" \
+  --output-last-message {shlex.quote(final_path)} \
+  "$(cat {shlex.quote(prompt_path)})" > {shlex.quote(log_path)} 2>&1


Avoid passing full Codex prompt as a CLI argument

The script builds a prompt file and then expands it into a single argv value via "$(cat ... )" for codex exec. That can hit command-line length limits on larger tasks and fail the rollout with an argument-size error, even though the prompt file already exists on disk. Using stdin or a file-based prompt path keeps behavior stable for long inputs.
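The stdin approach can be demonstrated without the Codex CLI itself. The sketch below attaches a prompt file as a child process's stdin and uses a small Python child as a stand-in for the real command. Whether `codex exec` accepts its prompt on stdin (for instance via a `-` placeholder) is an assumption to verify against the installed CLI. The helper names are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_prompt_on_stdin(
    argv: list[str], prompt_path: Path
) -> subprocess.CompletedProcess:
    """Run argv with the prompt file attached as stdin rather than as an argument."""
    with prompt_path.open("rb") as prompt_file:
        return subprocess.run(
            argv, stdin=prompt_file, capture_output=True, check=True
        )


def demo(prompt_size: int = 5_000_000) -> int:
    """Send a large prompt to a stand-in child and return the bytes it received."""
    # Stand-in child process: counts the bytes that arrive on stdin.
    child = [
        sys.executable,
        "-c",
        "import sys; print(len(sys.stdin.buffer.read()))",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        prompt_path = Path(tmp) / "prompt.txt"
        # Several megabytes, larger than typical argv limits.
        prompt_path.write_bytes(b"x" * prompt_size)
        result = run_with_prompt_on_stdin(child, prompt_path)
    return int(result.stdout.decode().strip())
```

Because the prompt travels through a file descriptor, its size is bounded by disk and memory rather than by the command-line length limit.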

macroscopeapp · 2026-05-20T17:35:22Z

Approvability

Verdict: Needs human review

This PR introduces two new harness integrations (ClaudeCode and Codex) with new runtime behavior. Additionally, unresolved review comments flag potential issues with command-line argument length limits and missing automation flags that could cause runtime failures.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86dcf29b91

chatgpt-codex-connector · 2026-05-20T17:41:26Z

+  --skip-git-repo-check \
+  --sandbox {shlex.quote(codex_sandbox)} \
+  --model "$OPENAI_MODEL" \
+  --output-last-message {shlex.quote(final_path)} \


Force non-interactive approval mode for codex exec

This harness launches codex exec in unattended eval mode but never sets an explicit automation approval mode (for example --full-auto), so runs can block on approval prompts and eventually timeout instead of completing. I checked the Codex Exec docs (“Approval Modes for Automation” and troubleshooting), which call out --full-auto for automated execution when tasks do not complete automatically; relying on implicit defaults here makes rollout behavior unstable across prompts/configs.

cursor Bot reviewed May 20, 2026

Comment thread verifiers/v1/packages/harnesses/codex.py

chatgpt-codex-connector Bot reviewed May 20, 2026

xeophon force-pushed the harness-type-aliases branch from b0256d7 to 163d2de on May 20, 2026 17:34

xeophon force-pushed the claude-codex-harnesses branch from b3feccf to 86dcf29 on May 20, 2026 17:35

chatgpt-codex-connector Bot reviewed May 20, 2026

xeophon force-pushed the harness-type-aliases branch from 163d2de to b3f0633 on May 20, 2026 17:55

xeophon force-pushed the claude-codex-harnesses branch from 86dcf29 to d431e08 on May 20, 2026 18:01

Add Claude Code and Codex harnesses (e0970f6)

xeophon force-pushed the claude-codex-harnesses branch from d431e08 to e0970f6 on May 20, 2026 18:03

xeophon wants to merge 1 commit into harness-type-aliases from claude-codex-harnesses
