feat(triage): codex branch for one-shot issue triage#483
Merged
Conversation
Phase 3 of the codex backend epic (#480). Adds the codex branch to `run_triage` so GitHub webhook-driven issue triage runs through codex when AGENT_BACKEND=codex. src/kai/triage.py: - New `if agent_backend == "codex":` branch invoking `codex exec --json --model <model>` with the model resolved via get_model_for(ModelRole.ISSUE_TRIAGE, "codex", override=...). The override env var is ISSUE_TRIAGE_MODEL_CODEX, read at the call site to match the existing claude/goose pattern. - New `_extract_codex_text` and `_recover_text_from_event` helpers that walk codex's NDJSON event stream and recover the final agent message text. Defensive about field names: tries top-level "text", "delta.text", "content" as string or list-of-blocks, and "item.content" with the list-of-blocks shape. The actual codex CLI schema is not yet pinned by smoke test, so the parser accepts multiple field paths; whichever fires on the pinned version wins, the others stay as fallbacks for schema drift. - No sudo wrap (codex on subscription auth uses the service user's own ~/.codex/auth.json); no --max-budget-usd (subscription auth has no per-call billing, runaway protection comes from _TRIAGE_TIMEOUT). tests/test_triage.py: - TestRunTriageCodex (7 tests): argv asserts "codex" + "exec" + --json (not claude / sudo / --max-budget-usd); registry model selection; ISSUE_TRIAGE_MODEL_CODEX env override; claude_user ignored on codex (no sudo); subprocess failure raises with stderr; NDJSON extraction returns final text not raw stdout. - TestExtractCodexText (10 tests): each defensive field path locked independently (top-level text, delta.text, content as string, content as list-of-blocks, item.content, unknown events skipped, multi-event accumulation, whitespace stripped, non-JSON lines tolerated, empty input returns empty string). Refs #480
…rser
PR review M1: _extract_codex_text accumulated text from every
recognized field path across every event. On the standard streaming
shape that codex emits (delta chunks followed by a terminal
consolidated message), this returned the message twice concatenated.
For triage where the agent message is a JSON object, the doubled
output is `{"labels":[]}{"labels":[]}`-shaped and _parse_triage_json
cannot parse it, breaking the triage path on every codex run that
streams.
Fix: split recovery into terminal and chunk paths.
- _recover_terminal_text recognizes complete-message field paths
(top-level "content" as string or list-of-blocks, "item.content"
as list-of-blocks). When any terminal event appears in the stream,
the most recent one wins; delta chunks are discarded.
- _recover_chunk_text recognizes streaming-delta paths ("delta.text",
top-level "text" as a string). Chunks accumulate only when no
terminal event was emitted in the stream.
The existing 10 TestExtractCodexText cases continue to pass under
the new design (the single-path streams in those tests are
classified consistently). Three new regression tests:
- test_terminal_text_wins_over_accumulated_deltas: worked example
from the review. Asserts deltas + terminal returns the terminal
text once.
- test_last_terminal_wins_on_multiple_terminals: multiple terminal
events resolve to the most recent.
- test_deltas_only_when_no_terminal: no-terminal stream accumulates
deltas as before. Locks the fallback the new logic must not
short-circuit.
Plus an integration test in TestRunTriageCodex:
- test_codex_handles_streaming_deltas_plus_terminal: end-to-end
through run_triage on a delta+terminal stream returns the terminal
JSON exactly once, parseable downstream.
M2 (unverified stdin contract for `codex exec`) and L1 (smoke
checklist needs explicit `codex exec` triage path) folded into the
PR body's smoke checklist rather than code changes; both are
post-merge verification items.
This was referenced May 15, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 3 of the codex backend epic. Adds the codex branch to
run_triageso GitHub webhook-driven issue triage runs through codex on a codex-active install:run_triagecodex branch invokescodex exec --json --model <model>with the model fromget_model_for(ModelRole.ISSUE_TRIAGE, "codex", override=os.environ.get("ISSUE_TRIAGE_MODEL_CODEX", "")). No sudo wrap (codex uses the service user's own~/.codex/auth.json); no--max-budget-usd(subscription auth has no per-call billing;_TRIAGE_TIMEOUTis the runaway gate)._extract_codex_text/_recover_terminal_text/_recover_chunk_texthelpers parse codex's NDJSON event stream with a terminal-wins discipline: complete-message events (top-levelcontentas string or list-of-blocks;item.contentlist-of-blocks) win over streaming chunks (delta.textor top-leveltext). Without the discipline, a stream containing both deltas AND a terminal consolidated event would yield the message twice concatenated.Phases 1 and 2 (registry +
CodexBackend) are prerequisites and are merged.Refs #480.
Smoke checklist (post-merge, requires
codexCLI installed)Phase 2's smoke gate validates the
codex app-serverJSON-RPC conversational path. This PR introduces a SECOND codex surface (codex exec --jsonfor one-shot triage) with a different event vocabulary and a different stdin/argv contract. The Phase 2 smoke test does NOT validate this surface; the items below need explicit verification:codex exec. The branch passes the prompt viaproc.communicate(input=prompt.encode())with no positional prompt argument and no-argv marker. Goose usesgoose run -i -, claude uses--print; codex has no equivalent explicit affordance verified yet. Ifcodex exec --json --model <m>requires the prompt as a positional argv (or some other marker), the branch needs the prompt appended tocmdrather than written to stdin. Verify by triggering a real GitHub webhook against a codex-active install and inspecting the resulting triage subprocess.codex exec --jsonevent vocabulary. The parser tolerates several field-path variants (terminal:contentstring/list,item.contentlist; chunk:delta.text, top-leveltext). The Phase 2 smoke test exercisessession/update+agent_message_chunk(the conversational ACP-shaped envelope), which is a DIFFERENT vocabulary. Capture a real NDJSON stream fromcodex exec --jsonagainst a known prompt, store it as a fixture undertests/fixtures/codex/, and reshape_recover_terminal_text/_recover_chunk_textif the pinned schema lands outside the recognized paths._parse_triage_json.Test plan (pre-merge)
make check(ruff + format)TestRunTriageCodex(8 tests) covering argv asserts ("codex"+"exec"+--json, never claude / sudo /--max-budget-usd), registry model selection,ISSUE_TRIAGE_MODEL_CODEXenv override, no-sudo even withclaude_userpassed, subprocess failure handling, final-text extraction from NDJSON, AND end-to-end throughrun_triageon a deltas+terminal streamTestExtractCodexText(13 tests) — 10 single-path tests plus 3 regression guards for the terminal-wins discipline (deltas+terminal, last-terminal-wins, deltas-only fallback)Out of scope
review.pyandeval/behavioral.py. Phases 4 and 5 of the epic.