fix(codex): rewrite to real app-server thread/turn/item protocol #485
Merged
The codex backend and triage path were modeled on goose's ACP
session/* + agent_message_chunk vocabulary, but the actual codex
app-server protocol uses a distinct thread/turn/item vocabulary.
The persistent backend's handshake failed with `unknown variant
session/new` (codex enumerated 50+ valid method names instead),
and the triage path's NDJSON parser looked for fields the codex
exec --json schema never emits. Per the operator's go-ahead, this
PR rewrites both surfaces against the schema in
codex-rs/app-server/README.md and codex-rs/exec/src/exec_events.rs
in the openai/codex repo.
App-server (codex.py):
- Handshake is now initialize -> initialized notification -> thread/start.
protocolVersion field dropped from initialize (not in the schema).
initialize.capabilities.optOutNotificationMethods suppresses
remoteControl/status/changed, mcpServer/startupStatus/updated,
thread/started, thread/tokenUsage/updated.
- _session_id holds the codex thread.id for ABC consistency.
- send loop uses turn/start with input array of {type:"text",text}
blocks, optional model override per-turn.
- Event parser: item/started, item/agentMessage/delta (concatenate
delta per itemId), item/completed (authoritative for agentMessage),
turn/completed (terminal), error notification (terminal).
- _write_notification helper for the initialized step.
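The handshake order and event handling listed above can be sketched roughly as follows. This is an illustration only: the JSON-RPC-style envelope (`id` / `method` / `params`), the placement of `itemId`, `delta`, and `item` inside `params`, the `item.id` / `item.text` field names, and the helper names `handshake_messages` and `fold_turn_events` are all assumptions, not taken from codex.py. The authoritative schema is codex-rs/app-server/README.md.

```python
import json
from typing import Any, Iterable

# Notification methods suppressed via optOutNotificationMethods (from the list above).
OPT_OUT = [
    "remoteControl/status/changed",
    "mcpServer/startupStatus/updated",
    "thread/started",
    "thread/tokenUsage/updated",
]


def handshake_messages() -> list[dict[str, Any]]:
    """Return the three handshake messages in send order.

    The envelope shape is an assumption; note that no protocolVersion is sent.
    """
    return [
        {"id": 1, "method": "initialize",
         "params": {"capabilities": {"optOutNotificationMethods": OPT_OUT}}},
        {"method": "initialized"},  # notification: no id, no reply expected
        {"id": 2, "method": "thread/start", "params": {}},
    ]


def fold_turn_events(events: Iterable[dict[str, Any]]) -> str:
    """Fold one turn's notifications into the agent's reply text."""
    deltas: dict[str, str] = {}  # streamed text, concatenated per itemId
    final: dict[str, str] = {}   # text from item/completed, which wins over deltas
    for event in events:
        method = event.get("method")
        params = event.get("params", {})
        if method == "item/agentMessage/delta":
            item_id = params["itemId"]
            deltas[item_id] = deltas.get(item_id, "") + params["delta"]
        elif method == "item/completed":
            item = params.get("item", {})
            if item.get("type") == "agentMessage":
                final[item["id"]] = item.get("text", "")
        elif method == "error":
            # Terminal. Raising is this sketch's choice, not necessarily codex.py's.
            raise RuntimeError(json.dumps(params))
        elif method == "turn/completed":
            break  # terminal: stop reading events for this turn
    # Completed text overrides streamed deltas for the same item.
    return "".join({**deltas, **final}.values())
```

Keeping the streamed deltas alongside the completed text means a turn that ends without an `item/completed` still yields whatever was streamed.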
codex exec --json (triage.py):
- _extract_codex_text now walks events keyed on top-level type
discriminator (thread.started / turn.started / item.started /
item.updated / item.completed / turn.completed / turn.failed /
error). For item.completed of an agent_message item, returns
item.text. Falls back to latest item.updated if no completed
arrives. turn.failed short-circuits to empty string.
- _recover_terminal_text / _recover_chunk_text removed; the new
_recover_agent_message_text replaces them with the right shape.
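A minimal sketch of the extraction rule described above, assuming the `codex exec --json` output is one JSON object per line. The function name `extract_agent_text` is hypothetical (the real helper is `_extract_codex_text`), and two behaviors here are this sketch's own choices: it skips lines that are not valid JSON, and it keeps the last completed `agent_message` rather than returning on the first.

```python
import json


def extract_agent_text(ndjson: str) -> str:
    """Return the agent message text from codex exec --json style NDJSON."""
    completed = None  # text from the latest item.completed
    updated = None    # text from the latest item.updated (fallback)
    for line in ndjson.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise (assumption)
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "turn.failed":
            return ""  # short-circuit: a failed turn yields no text
        item = event.get("item")
        if not isinstance(item, dict) or item.get("type") != "agent_message":
            continue
        if kind == "item.completed":
            completed = item.get("text", "")
        elif kind == "item.updated":
            updated = item.get("text", "")
    if completed is not None:
        return completed
    return updated or ""
```

For example, feeding it the single line `{"type":"item.completed","item":{"type":"agent_message","text":"hi"}}` returns `hi`.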
Tests updated to match the real schemas; full suite 3084 pass.
Smoke-test discovery: sudo from the kai service user uses kai's PATH to resolve bare command names, and on multi-user installs codex typically lives in a per-os_user home (e.g. /Users/daniel/.npm-global/bin) that is NOT on the service user's PATH. The bot's bare `codex` spawn then fails with "a password is required" because sudo cannot find a binary to match the sudoers rule against.
- src/kai/codex.py: argv[0] now reads from CODEX_BIN env var, falling back to bare "codex" for single-user installs where it's on PATH.
- src/kai/triage.py: same lever for the codex exec --json triage path.
- src/kai/install.py: persist CODEX_BIN to /etc/kai/env when set at install time, so the running bot picks up the same absolute path the sudoers rule names.
- Tests lock both branches: bare "codex" when env unset, full path when set.
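The CODEX_BIN fallback can be sketched as below. The helper name `codex_argv` and its `env` parameter are hypothetical, and treating an empty CODEX_BIN as unset is an assumption of this sketch rather than confirmed behavior.

```python
import os
from typing import Mapping, Optional


def codex_argv(*args: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build the codex command line, preferring CODEX_BIN for argv[0]."""
    env = os.environ if env is None else env
    # Fall back to bare "codex" when CODEX_BIN is unset (or empty, by assumption).
    binary = env.get("CODEX_BIN") or "codex"
    return [binary, *args]
```

With CODEX_BIN unset, `codex_argv("app-server")` gives `["codex", "app-server"]`. With it set, argv[0] becomes the absolute path, so sudo can match it against the sudoers rule.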
Previous commit wrote CODEX_BIN inside _cmd_config (the wizard), so operators running 'sudo CODEX_BIN=... kai install apply' bypassed it - the wizard had never seen the var. Apply now reads CODEX_BIN from the environment after loading install.conf and injects it into the env dict before _apply_secrets writes /etc/kai/env. Apply-time env wins over any stale install.conf value so the operator's explicit override is honored.
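The precedence rule can be illustrated with a small sketch. `merge_codex_bin` is a hypothetical helper, not the code in install.py. It assumes the install.conf values are already loaded into a dict and that the merged dict is what later gets written to /etc/kai/env.

```python
from typing import Mapping


def merge_codex_bin(conf_env: Mapping[str, str],
                    process_env: Mapping[str, str]) -> dict[str, str]:
    """Return the env dict to persist, letting apply-time CODEX_BIN win."""
    merged = dict(conf_env)
    override = process_env.get("CODEX_BIN")
    if override:
        # The operator's explicit apply-time value replaces any stale conf value.
        merged["CODEX_BIN"] = override
    return merged
```

When the apply-time environment has no CODEX_BIN, the value from install.conf (if any) passes through unchanged.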
Summary
The codex backend (#482) and codex triage branch (#483) were modeled on goose's ACP `session/*` + `agent_message_chunk` vocabulary. The smoke test that started with PR #484 surfaced that codex app-server actually speaks a different protocol (`thread/*` + `turn/*` + `item/*`) and `codex exec --json` emits a third schema again (`{"type":"item.completed","item":{"type":"agent_message","text":"..."}}`). The persistent backend's handshake failed immediately with `unknown variant session/new` (the server enumerated 50+ valid method names), and the triage NDJSON parser looked for fields the real schema never emits. Smoke test also revealed that sudo from the kai service user can't resolve bare `codex` when the binary lives in a per-os_user npm-global home not on the service PATH.

This PR rewrites both surfaces against the actual schema documented in `codex-rs/app-server/README.md` and `codex-rs/exec/src/exec_events.rs` in the openai/codex repo, and adds a `CODEX_BIN` lever so the bot invokes codex by absolute path on multi-user installs.

Commits
- `9dac019` Protocol rewrite. `codex.py` handshake is now `initialize -> initialized -> thread/start` (no `protocolVersion`, opt-out list suppresses noisy notifications). `_send_locked` uses `turn/start` with `input: [{type:"text",text}]`. Event parser handles `item/started`, `item/agentMessage/delta` (concatenate `delta` per itemId), `item/completed` (authoritative), `turn/completed` (terminal), `error` (terminal). `triage.py` `_extract_codex_text` walks events keyed on top-level `type` discriminator and returns `item.text` for `agent_message` items.
- `c4ca1f9` Runtime `CODEX_BIN`. `codex.py` and `triage.py` argv[0] reads from `CODEX_BIN` env var, falling back to bare `codex` for single-user installs where it's on PATH. `install.py` writes `CODEX_BIN` to `/etc/kai/env` when set during the wizard.
- `177d813` Apply-time `CODEX_BIN`. Operators running `sudo CODEX_BIN=... kai install apply` bypass the wizard; the apply path now reads the env var directly and injects it into the env dict before `_apply_secrets` writes `/etc/kai/env`.

Tests
- `tests/test_codex.py`: helpers rebuilt against the real schema; new assertions on the `optOutNotificationMethods` list; new test locks `CODEX_BIN` precedence in the argv.
- `tests/test_triage.py`: `TestExtractCodexText` rewritten against the real type-discriminated event schema; new test locks `CODEX_BIN` precedence in the codex exec argv.
- `tests/test_install.py`: existing sudoers tests still cover the codex SETENV rule.

Smoke test (validated on Mac mini, 2026-05-15)
- `pytest` green
- `make check` clean
- `kai` to `daniel` reaches codex `initialize` -> `initialized` -> `thread/start`
- `ps` confirms `codex app-server` is `daniel`-owned at runtime
- `Starting persistent Codex app-server process (model=gpt-5.4-mini, user=daniel)` in the log

Closes the protocol-mismatch finding from PR #484's smoke test. Refs #480. Follow-up issue tracks wizard-side hardening that surfaced during the smoke test.