fix(codex): re-land approval, sandbox, multi-agentMessage fixes by dcellison · Pull Request #491 · dcellison/kai

dcellison · 2026-05-15T22:14:44Z

Supersedes #488. That PR was filed against an older main and never merged; the codex wizard hardening PR landed in the interim and touched the __init__ block #488 was based on, so this PR re-applies the three fixes on a fresh branch off current main rather than rebasing the original. The original branch is left intact for history.

What this does

Three fixes from the original smoke-test session, folded into one commit:

approvalPolicy: "never" at thread/start. The codex default is on-request, which makes the server emit approval-request notifications and wait for the client to respond before any tool call. The bot has no in-loop approver, so on-request gates silently: codex waits forever, the readline ceiling fires, the operator sees "Codex timed out". This is the user-visible symptom that triggered this whole investigation.
sandbox: "danger-full-access" at thread/start. read-only blocks file writes; workspace-write disables network egress by default (the gh access is blocked by the sandbox symptom). danger-full-access matches the claude backend's posture: claude --print runs unsandboxed with the bot's os_user's permissions, and codex now has the same authority profile. Sandbox variants are kebab-case; the server rejects camelCase at thread/start with "unknown variant".
Per-item agentMessage tracking in the streaming parser. A single turn can emit multiple agentMessage items (preamble, tool call, post-tool summary). The previous parser concatenated deltas across items without a separator and let item N's item/completed.text override the entire accumulated string, erasing prior committed items. Now: committed_text holds completed items joined with \n\n; current_item_text holds the in-flight item's delta sum; current_item_id correlates which item the deltas belong to. item/completed authoritatively sets the CURRENT item's text only, commits it to the prefix, and resets.

Test plan

make test (3136 passed, 1 skipped on this branch; 2 new tests in TestStreamParsing lock the multi-item join with a blank-line separator and the per-item override scope; all 63 existing codex tests still pass without modification)
make check (ruff check + ruff format)
Pre-push grep gate clean
End-to-end smoke on production after merge: reinstall, send a Telegram message that requires gh/curl (currently blocked by the sandbox default), confirm the response arrives without "Codex timed out" or GitHub access is blocked by the sandbox.

What happens to #488

After this PR merges and the production smoke passes, close #488 with a comment pointing at this PR. Until then it stays open as the historical record.

…-agentMessage tracking Restores the three changes from the unmerged fix/codex-approval-policy branch, freshly applied on top of current main so the workspace_config block (touched by the codex wizard hardening PR that merged after the old branch was filed) keeps its current shape. Three fixes folded into one commit: 1. approvalPolicy=never. Codex's default approvalPolicy is "on-request", which makes the server emit approval-request notifications and wait for the client to respond before any tool call. The bot has no human in the loop to approve from Telegram, so on-request gates silently: codex waits forever, the bot's readline ceiling fires, the operator sees "Codex timed out". 2. sandbox=danger-full-access. The default readOnly sandbox prevents codex from writing files. workspace-write disables network egress by default, breaking gh, curl, pip, and anything else codex needs to reach outside the local filesystem. The claude backend runs unsandboxed under the bot's os_user; codex should have the same authority profile, no more, no less. Sandbox variants are kebab-case; the server rejects camelCase spellings at thread/start with "unknown variant". 3. Multi-agentMessage tracking. A single turn can emit MORE than one agentMessage item (e.g. preamble, tool call, post-tool summary). The previous parser concatenated deltas across items without a separator AND let item N's item/completed text field override the entire accumulated string, erasing prior committed items. The Telegram message displayed only the last item's text. Rewrite tracks per-item: committed_text holds completed items joined with blank-line separators; current_item_text holds the in-flight item's delta sum; current_item_id correlates which item the deltas belong to. item/completed authoritatively sets the CURRENT item's text only, commits it to the prefix, and resets. Defensive schema-drift handling: a delta arriving without a prior item/started treats the new itemId as opening a new item; turn/completed flushes any uncommitted current_item_text into the prefix before returning. Two new tests lock the multi-item join with a blank-line separator and the per-item override scope.

Folded into the sandbox+approval+parser PR because it's the same "codex backend works under real load" surface and the same restart cycle. A single codex event can carry the full text of a tool call's output inline (e.g. a `gh pr diff` result on a reasoning item, or an item/completed for a long agentMessage). The asyncio.StreamReader default limit is 64KB; we previously bumped it to 1MB. Real codex PR- review turns observed in production still exceed 1MB and surface "Separator is not found, and chunk exceed the limit" from readline. Bumping to 16MB covers any plausible single-event payload while still bounding memory if codex ever produces a runaway line. A proper streaming reader (chunked + reassemble on newline) would remove the ceiling entirely; deferred until the 16MB ceiling is itself observed in practice. Regression test asserts the spawn-time limit is at least 16MB so a future shrinkback gets caught at test time, not on a live operator turn.

Addresses the non-blocking review warning on PR #491: the parser fix and 16MB stdout limit both have regression tests, but the two fields that unblock production - approvalPolicy=never and sandbox=danger-full-access - were only asserted by the operator manually re-installing and re-pinging. Lock them in the existing handshake test so a future regression that drops or misspells either field gets caught at test time, not on the next live turn.

zigguratt added 3 commits May 15, 2026 18:12

dcellison merged commit cab7ba9 into main May 16, 2026
1 check passed

dcellison deleted the fix/codex-sandbox-and-approval branch May 16, 2026 11:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(codex): re-land approval, sandbox, multi-agentMessage fixes#491

fix(codex): re-land approval, sandbox, multi-agentMessage fixes#491
dcellison merged 3 commits into
mainfrom
fix/codex-sandbox-and-approval

dcellison commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dcellison commented May 15, 2026

What this does

Test plan

What happens to #488

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants