Isolate writable Codex session state for LLM triage runs #12

Merged
R00T-Kim merged 1 commit into R00T-Kim:main from NightStalkers-160th:fix-codex-llm-triage-session on Apr 23, 2026
@NightStalkers-160th
Contributor

Summary

This change fixes a reproducible Codex CLI session-initialization failure in SCOUT's LLM driver by giving Codex an isolated writable home under the run directory instead of launching it in a fully read-only state.

Problem

In llm_triage, SCOUT launched codex exec with -s read-only. In real runs this caused Codex to fail before inference with errors like:

Failed to create session: Read-only file system
failed to open state db at /home/.../.codex/state_5.sqlite
attempt to write a readonly database

That failure mode was reproduced from saved llm_trace artifacts in a firmware analysis run.
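
For anyone re-checking saved traces for this failure mode, a minimal sketch of a signature check follows. It assumes the trace's captured Codex output is available as plain text; the function name and the marker tuple are hypothetical and not part of the driver, and the marker strings are copied from the errors quoted above.

```python
from __future__ import annotations

# Substrings copied from the error output quoted in this PR description.
READONLY_SESSION_MARKERS = (
    "Failed to create session: Read-only file system",
    "failed to open state db",
    "attempt to write a readonly database",
)


def is_readonly_session_failure(output_text: str) -> bool:
    """Return True if the text contains any of the quoted read-only errors.

    Hypothetical helper: how llm_trace stores Codex output is not described
    here, so the caller is expected to pass the captured text directly.
    """
    return any(marker in output_text for marker in READONLY_SESSION_MARKERS)
```

A plain substring match is used on purpose: the state db path varies per user, so only the fixed parts of each message are compared.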

What changed

  • Default Codex sandbox mode for the unified driver is now workspace-write instead of read-only.
  • When CODEX_HOME is not provided, the driver creates an isolated run-local home at run_dir/.codex-home.
  • When an external CODEX_HOME is provided outside the run directory, the driver adds it via --add-dir so Codex can write there intentionally.
  • For the isolated run-local home, the driver seeds auth.json from the user's existing Codex home when available, so authentication can continue without writing into the primary global state directory.
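
The four changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/aiedge/llm_driver.py: all function names are hypothetical, the user's existing Codex home is passed in by the caller, and the prompt and any other codex exec arguments are left out. Only the -s workspace-write and --add-dir flags and the run_dir/.codex-home and auth.json names come from this PR.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def resolve_codex_home(run_dir: Path, env: dict[str, str]) -> tuple[Path, bool]:
    """Return (codex_home, is_isolated) for a run.

    An explicit CODEX_HOME wins; otherwise an isolated home is created
    at run_dir/.codex-home.
    """
    explicit = env.get("CODEX_HOME")
    if explicit:
        return Path(explicit).expanduser().resolve(), False
    home = (run_dir / ".codex-home").resolve()
    home.mkdir(parents=True, exist_ok=True)
    return home, True


def seed_auth(isolated_home: Path, user_codex_home: Path) -> bool:
    """Copy auth.json into the isolated home when available; report if copied."""
    src = user_codex_home / "auth.json"
    dst = isolated_home / "auth.json"
    if not src.is_file() or dst.exists():
        return False
    shutil.copy2(src, dst)
    return True


def build_codex_args(run_dir: Path, codex_home: Path) -> list[str]:
    """Build the leading codex exec arguments (prompt not included)."""
    args = ["codex", "exec", "-s", "workspace-write"]
    run_root = run_dir.resolve()
    inside_run = codex_home == run_root or run_root in codex_home.parents
    if not inside_run:
        # External home outside the run directory: make it writable on purpose.
        args += ["--add-dir", str(codex_home)]
    return args


def prepare_codex_launch(
    run_dir: Path, env: dict[str, str], user_codex_home: Path
) -> tuple[list[str], dict[str, str]]:
    """Return (args, child_env) for launching Codex without mutating env."""
    codex_home, isolated = resolve_codex_home(run_dir, env)
    if isolated:
        seed_auth(codex_home, user_codex_home)
    child_env = dict(env)
    child_env["CODEX_HOME"] = str(codex_home)
    return build_codex_args(run_dir, codex_home), child_env
```

A caller would typically pass a copy of os.environ as env and, assuming the default location, Path.home() / ".codex" as user_codex_home. Copying auth.json rather than linking it keeps any later writes by Codex inside the run directory.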

Why this scope

This PR is intentionally narrow: it fixes the deterministic session/state write failure in the driver. It does not claim to fully resolve every possible live-service/auth/runtime cause of llm_triage partial outcomes.

Verification

  • pytest -q tests/test_llm_driver.py
  • pytest -q tests/test_llm_driver_conformance.py
  • python3 -m py_compile src/aiedge/llm_driver.py tests/test_llm_driver.py

Notes

A follow-up may still be needed for any separate live Codex service/auth issues, but this removes the reproducible SCOUT-side read-only session-init failure.


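Summary

This change fixes a reproducible session-initialization failure that occurred because SCOUT's LLM driver launched the Codex CLI in a fully read-only state. Codex now uses an isolated writable home under the run directory.

Problem

In llm_triage, SCOUT was running codex exec with -s read-only. In real runs, this made Codex fail before inference with errors like the following:

Failed to create session: Read-only file system
failed to open state db at /home/.../.codex/state_5.sqlite
attempt to write a readonly database

This failure was reproduced and confirmed from the llm_trace artifacts of a real firmware analysis run.

What changed

  • Changed the default sandbox mode of the unified Codex driver from read-only to workspace-write.
  • When CODEX_HOME is not set, the driver creates an isolated run-local home at run_dir/.codex-home.
  • When an external CODEX_HOME lies outside the run directory, the driver adds it via --add-dir so that it is intentionally writable.
  • When the isolated run-local home is used, the driver copies auth.json from the existing Codex home when available, so authentication can continue without writing to the global state directory.

Scope

This PR is intentionally narrow in scope: its goal is to fix the deterministic session/state write failure at the driver level. It does not claim that every cause of llm_triage partial outcomes is fully resolved.

Verification

  • pytest -q tests/test_llm_driver.py
  • pytest -q tests/test_llm_driver_conformance.py
  • python3 -m py_compile src/aiedge/llm_driver.py tests/test_llm_driver.py

Notes

Separate issues on the Codex service authentication or runtime side may still need follow-up, but this change removes the read-only session-init failure that was reproducible on the SCOUT side.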

SCOUT was launching codex exec in read-only mode, which caused Codex to fail before inference when it tried to initialize its local session/state files. This change gives Codex an isolated writable home under the run directory by default, keeps external CODEX_HOME overrides working, and seeds auth.json into the isolated home so the CLI can still authenticate without writing back into the user's primary Codex state.

Constraint: SCOUT should keep Codex writes scoped away from the user's primary ~/.codex state
Rejected: Keep read-only sandbox and rely on ~/.codex writes | reproduced readonly session-init failures in llm_triage traces
Rejected: Reuse ~/.codex directly with workspace-write | broadens write scope and mixes run-local state with user-global state
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Keep Codex writable state isolated to the run directory unless a caller explicitly provides CODEX_HOME
Tested: pytest -q tests/test_llm_driver.py
Tested: pytest -q tests/test_llm_driver_conformance.py
Tested: python3 -m py_compile src/aiedge/llm_driver.py tests/test_llm_driver.py
Not-tested: end-to-end llm_triage success against the live Codex service
@R00T-Kim merged commit 89927f7 into R00T-Kim:main on Apr 23, 2026
6 checks passed