Skip to content

feat(triage): codex branch for one-shot issue triage#483

Merged
dcellison merged 2 commits into
mainfrom
feature/codex-triage-branch
May 15, 2026
Merged

feat(triage): codex branch for one-shot issue triage#483
dcellison merged 2 commits into
mainfrom
feature/codex-triage-branch

Conversation

@dcellison
Copy link
Copy Markdown
Owner

@dcellison dcellison commented May 15, 2026

Summary

Phase 3 of the codex backend epic. Adds the codex branch to run_triage so GitHub webhook-driven issue triage runs through codex on a codex-active install:

  • run_triage codex branch invokes codex exec --json --model <model> with the model from get_model_for(ModelRole.ISSUE_TRIAGE, "codex", override=os.environ.get("ISSUE_TRIAGE_MODEL_CODEX", "")). No sudo wrap (codex uses the service user's own ~/.codex/auth.json); no --max-budget-usd (subscription auth has no per-call billing; _TRIAGE_TIMEOUT is the runaway gate).
  • _extract_codex_text / _recover_terminal_text / _recover_chunk_text helpers parse codex's NDJSON event stream with a terminal-wins discipline: complete-message events (top-level content as string or list-of-blocks; item.content list-of-blocks) win over streaming chunks (delta.text or top-level text). Without the discipline, a stream containing both deltas AND a terminal consolidated event would yield the message twice concatenated.

Phases 1 and 2 (registry + CodexBackend) are prerequisites and are merged.

Refs #480.

Smoke checklist (post-merge, requires codex CLI installed)

Phase 2's smoke gate validates the codex app-server JSON-RPC conversational path. This PR introduces a SECOND codex surface (codex exec --json for one-shot triage) with a different event vocabulary and a different stdin/argv contract. The Phase 2 smoke test does NOT validate this surface; the items below need explicit verification:

  • Stdin contract for codex exec. The branch passes the prompt via proc.communicate(input=prompt.encode()) with no positional prompt argument and no - argv marker. Goose uses goose run -i -, claude uses --print; codex has no equivalent explicit affordance verified yet. If codex exec --json --model <m> requires the prompt as a positional argv (or some other marker), the branch needs the prompt appended to cmd rather than written to stdin. Verify by triggering a real GitHub webhook against a codex-active install and inspecting the resulting triage subprocess.
  • codex exec --json event vocabulary. The parser tolerates several field-path variants (terminal: content string/list, item.content list; chunk: delta.text, top-level text). The Phase 2 smoke test exercises session/update + agent_message_chunk (the conversational ACP-shaped envelope), which is a DIFFERENT vocabulary. Capture a real NDJSON stream from codex exec --json against a known prompt, store it as a fixture under tests/fixtures/codex/, and reshape _recover_terminal_text / _recover_chunk_text if the pinned schema lands outside the recognized paths.
  • Webhook-driven triage path end-to-end. Trigger a real GitHub issue webhook against a codex-active install and verify the triage comment posts. This exercises the integration of the codex branch with the dispatch logic, NDJSON extraction, and _parse_triage_json.

Test plan (pre-merge)

  • make check (ruff + format)
  • Full pytest suite (3074 passed, 1 skipped)
  • Targeted: TestRunTriageCodex (8 tests) covering argv asserts ("codex" + "exec" + --json, never claude / sudo / --max-budget-usd), registry model selection, ISSUE_TRIAGE_MODEL_CODEX env override, no-sudo even with claude_user passed, subprocess failure handling, final-text extraction from NDJSON, AND end-to-end through run_triage on a deltas+terminal stream
  • Targeted: TestExtractCodexText (13 tests) — 10 single-path tests plus 3 regression guards for the terminal-wins discipline (deltas+terminal, last-terminal-wins, deltas-only fallback)
  • Pre-push grep gate clean

Out of scope

  • Codex branches in review.py and eval/behavioral.py. Phases 4 and 5 of the epic.
  • Codex memory extraction. Phase 6 / future memory epic.

zigguratt added 2 commits May 15, 2026 08:54
Phase 3 of the codex backend epic (#480). Adds the codex branch
to `run_triage` so GitHub webhook-driven issue triage runs through
codex when AGENT_BACKEND=codex.

src/kai/triage.py:
- New `if agent_backend == "codex":` branch invoking
  `codex exec --json --model <model>` with the model resolved via
  get_model_for(ModelRole.ISSUE_TRIAGE, "codex", override=...). The
  override env var is ISSUE_TRIAGE_MODEL_CODEX, read at the call
  site to match the existing claude/goose pattern.
- New `_extract_codex_text` and `_recover_text_from_event` helpers
  that walk codex's NDJSON event stream and recover the final
  agent message text. Defensive about field names: tries top-level
  "text", "delta.text", "content" as string or list-of-blocks, and
  "item.content" with the list-of-blocks shape. The actual codex
  CLI schema is not yet pinned by smoke test, so the parser
  accepts multiple field paths; whichever fires on the pinned
  version wins, the others stay as fallbacks for schema drift.
- No sudo wrap (codex on subscription auth uses the service user's
  own ~/.codex/auth.json); no --max-budget-usd (subscription auth
  has no per-call billing, runaway protection comes from
  _TRIAGE_TIMEOUT).

tests/test_triage.py:
- TestRunTriageCodex (7 tests): argv asserts "codex" + "exec" +
  --json (not claude / sudo / --max-budget-usd); registry model
  selection; ISSUE_TRIAGE_MODEL_CODEX env override; claude_user
  ignored on codex (no sudo); subprocess failure raises with
  stderr; NDJSON extraction returns final text not raw stdout.
- TestExtractCodexText (10 tests): each defensive field path
  locked independently (top-level text, delta.text, content as
  string, content as list-of-blocks, item.content, unknown
  events skipped, multi-event accumulation, whitespace stripped,
  non-JSON lines tolerated, empty input returns empty string).

Refs #480
…rser

PR review M1: _extract_codex_text accumulated text from every
recognized field path across every event. On the standard streaming
shape that codex emits (delta chunks followed by a terminal
consolidated message), this returned the message twice concatenated.
For triage where the agent message is a JSON object, the doubled
output is `{"labels":[]}{"labels":[]}`-shaped and _parse_triage_json
cannot parse it, breaking the triage path on every codex run that
streams.

Fix: split recovery into terminal and chunk paths.

- _recover_terminal_text recognizes complete-message field paths
  (top-level "content" as string or list-of-blocks, "item.content"
  as list-of-blocks). When any terminal event appears in the stream,
  the most recent one wins; delta chunks are discarded.
- _recover_chunk_text recognizes streaming-delta paths ("delta.text",
  top-level "text" as a string). Chunks accumulate only when no
  terminal event was emitted in the stream.

The existing 10 TestExtractCodexText cases continue to pass under
the new design (the single-path streams in those tests are
classified consistently). Three new regression tests:

- test_terminal_text_wins_over_accumulated_deltas: worked example
  from the review. Asserts deltas + terminal returns the terminal
  text once.
- test_last_terminal_wins_on_multiple_terminals: multiple terminal
  events resolve to the most recent.
- test_deltas_only_when_no_terminal: no-terminal stream accumulates
  deltas as before. Locks the fallback the new logic must not
  short-circuit.

Plus an integration test in TestRunTriageCodex:

- test_codex_handles_streaming_deltas_plus_terminal: end-to-end
  through run_triage on a delta+terminal stream returns the terminal
  JSON exactly once, parseable downstream.

M2 (unverified stdin contract for `codex exec`) and L1 (smoke
checklist needs explicit `codex exec` triage path) folded into the
PR body's smoke checklist rather than code changes; both are
post-merge verification items.
@dcellison dcellison merged commit 6864797 into main May 15, 2026
1 check passed
@dcellison dcellison deleted the feature/codex-triage-branch branch May 15, 2026 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants