fix(codex): re-land approval, sandbox, multi-agentMessage fixes#491
Merged
Conversation
…-agentMessage tracking Restores the three changes from the unmerged fix/codex-approval-policy branch, freshly applied on top of current main so the workspace_config block (touched by the codex wizard hardening PR that merged after the old branch was filed) keeps its current shape. Three fixes folded into one commit: 1. approvalPolicy=never. Codex's default approvalPolicy is "on-request", which makes the server emit approval-request notifications and wait for the client to respond before any tool call. The bot has no human in the loop to approve from Telegram, so on-request gates silently: codex waits forever, the bot's readline ceiling fires, the operator sees "Codex timed out". 2. sandbox=danger-full-access. The default readOnly sandbox prevents codex from writing files. workspace-write disables network egress by default, breaking gh, curl, pip, and anything else codex needs to reach outside the local filesystem. The claude backend runs unsandboxed under the bot's os_user; codex should have the same authority profile, no more, no less. Sandbox variants are kebab-case; the server rejects camelCase spellings at thread/start with "unknown variant". 3. Multi-agentMessage tracking. A single turn can emit MORE than one agentMessage item (e.g. preamble, tool call, post-tool summary). The previous parser concatenated deltas across items without a separator AND let item N's item/completed text field override the entire accumulated string, erasing prior committed items. The Telegram message displayed only the last item's text. Rewrite tracks per-item: committed_text holds completed items joined with blank-line separators; current_item_text holds the in-flight item's delta sum; current_item_id correlates which item the deltas belong to. item/completed authoritatively sets the CURRENT item's text only, commits it to the prefix, and resets. Defensive schema-drift handling: a delta arriving without a prior item/started treats the new itemId as opening a new item; turn/completed flushes any uncommitted current_item_text into the prefix before returning. Two new tests lock the multi-item join with a blank-line separator and the per-item override scope.
Folded into the sandbox+approval+parser PR because it's the same "codex backend works under real load" surface and the same restart cycle. A single codex event can carry the full text of a tool call's output inline (e.g. a `gh pr diff` result on a reasoning item, or an item/completed for a long agentMessage). The asyncio.StreamReader default limit is 64KB; we previously bumped it to 1MB. Real codex PR- review turns observed in production still exceed 1MB and surface "Separator is not found, and chunk exceed the limit" from readline. Bumping to 16MB covers any plausible single-event payload while still bounding memory if codex ever produces a runaway line. A proper streaming reader (chunked + reassemble on newline) would remove the ceiling entirely; deferred until the 16MB ceiling is itself observed in practice. Regression test asserts the spawn-time limit is at least 16MB so a future shrinkback gets caught at test time, not on a live operator turn.
Addresses the non-blocking review warning on PR #491: the parser fix and 16MB stdout limit both have regression tests, but the two fields that unblock production - approvalPolicy=never and sandbox=danger-full-access - were only asserted by the operator manually re-installing and re-pinging. Lock them in the existing handshake test so a future regression that drops or misspells either field gets caught at test time, not on the next live turn.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Supersedes #488. That PR was filed against an older main and never merged; the codex wizard hardening PR landed in the interim and touched the
__init__block #488 was based on, so this PR re-applies the three fixes on a fresh branch off current main rather than rebasing the original. The original branch is left intact for history.What this does
Three fixes from the original smoke-test session, folded into one commit:
approvalPolicy: "never"at thread/start. The codex default ison-request, which makes the server emit approval-request notifications and wait for the client to respond before any tool call. The bot has no in-loop approver, so on-request gates silently: codex waits forever, the readline ceiling fires, the operator sees "Codex timed out". This is the user-visible symptom that triggered this whole investigation.sandbox: "danger-full-access"at thread/start.read-onlyblocks file writes;workspace-writedisables network egress by default (thegh access is blocked by the sandboxsymptom).danger-full-accessmatches the claude backend's posture: claude --print runs unsandboxed with the bot'sos_user's permissions, and codex now has the same authority profile. Sandbox variants are kebab-case; the server rejects camelCase at thread/start with "unknown variant".agentMessagetracking in the streaming parser. A single turn can emit multipleagentMessageitems (preamble, tool call, post-tool summary). The previous parser concatenated deltas across items without a separator and let item N'sitem/completed.textoverride the entire accumulated string, erasing prior committed items. Now:committed_textholds completed items joined with\n\n;current_item_textholds the in-flight item's delta sum;current_item_idcorrelates which item the deltas belong to.item/completedauthoritatively sets the CURRENT item's text only, commits it to the prefix, and resets.Test plan
make test(3136 passed, 1 skipped on this branch; 2 new tests inTestStreamParsinglock the multi-item join with a blank-line separator and the per-item override scope; all 63 existing codex tests still pass without modification)make check(ruff check + ruff format)gh/curl(currently blocked by the sandbox default), confirm the response arrives without "Codex timed out" orGitHub access is blocked by the sandbox.What happens to #488
After this PR merges and the production smoke passes, close #488 with a comment pointing at this PR. Until then it stays open as the historical record.