fix(arg0): add path and operation context to error messages by HaleTom · Pull Request #1 · HaleTom/codex

HaleTom · 2026-04-26T13:47:11Z

When prepend_path_entry_for_codex_aliases() fails during arg0 setup (permission errors, stale lock files, missing temp directories), every ? operator propagates a bare io::Error with only the OS-level message — with no indication of which path or which operation failed. This makes the arg0-related failures in openai#16970 and openai#17240 extremely difficult to diagnose.

Before:

Read-only file system (os error 30)

After:

failed to create symlink /home/user/.codex/tmp/arg0/codex-arg0XXX/apply_patch: Read-only file system (os error 30)

Approach

A ContextIoError wrapper in codex-arg0 that preserves the original io::Error as a source, so the error chain remains introspectable via get_ref()/source() while enriching the Display message:

with_path_context(err, path, operation) → "failed to {operation} `{path}`: {err}"
with_context(err, message) → "{message}: {err}" (for non-path errors like resolve CODEX_HOME)
deep_raw_os_error(&err) → walks the source chain to recover the original OS error code that io::Error::new clears

Every ? in the public function is now .map_err(|e| with_path_context(e, …))? or .map_err(|e| with_context(e, …))?.
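The approach above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the two-field struct layout and helper signatures are assumptions based on the description, and the backtick-quoted path format is taken from the message template rather than checked against the source.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::Path;

/// Wrapper that adds context to the message while keeping the original
/// error reachable as `source()`. Field names are assumptions.
#[derive(Debug)]
struct ContextIoError {
    context: String,
    source: io::Error,
}

impl fmt::Display for ContextIoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl Error for ContextIoError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// "{message}: {err}", preserving the original `ErrorKind`.
fn with_context(err: io::Error, message: &str) -> io::Error {
    let kind = err.kind();
    let wrapper = ContextIoError { context: message.to_string(), source: err };
    // `io::Error::new` stores the wrapper as a custom payload, so the outer
    // error's `raw_os_error()` becomes `None`.
    io::Error::new(kind, wrapper)
}

/// "failed to {operation} `{path}`: {err}".
fn with_path_context(err: io::Error, path: &Path, operation: &str) -> io::Error {
    with_context(err, &format!("failed to {operation} `{}`", path.display()))
}

/// Walks the source chain and returns the first OS error code found.
fn deep_raw_os_error(err: &io::Error) -> Option<i32> {
    if let Some(code) = err.raw_os_error() {
        return Some(code);
    }
    let mut source = err.source();
    while let Some(s) = source {
        // Nodes that are not `io::Error` (for example a wrapper type) fail
        // the downcast and are skipped by advancing to their own source.
        if let Some(code) = s.downcast_ref::<io::Error>().and_then(|e| e.raw_os_error()) {
            return Some(code);
        }
        source = s.source();
    }
    None
}
```

A call site would then look like `create_link(&link).map_err(|e| with_path_context(e, &link, "create symlink"))?`, where `create_link` is a hypothetical fallible operation.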

Additional changes

Hoisted std::env::current_exe() out of the alias loop — called once instead of once per alias.
codex-mcp no longer directly depends on codex-utils-absolute-path; it gets it transitively through codex-arg0.

Verification

cargo test -p codex-arg0 — 8 tests pass (including updated unit tests that verify source-chain preservation and deep_raw_os_error recovery)
cargo clippy -p codex-arg0 — no warnings
cargo fmt -p codex-arg0 — clean

Bug report: openai#19674
Related: openai#16970, openai#17240

I have read the CLA Document and I hereby sign the CLA

semanticdiff-com · 2026-04-26T13:47:14Z

Changed Files

File	Status
codex-rs/arg0/src/lib.rs	12% smaller
MODULE.bazel.lock	Unsupported file format
codex-rs/Cargo.lock	Unsupported file format
codex-rs/Cargo.toml	Unsupported file format
codex-rs/arg0/Cargo.toml	Unsupported file format
codex-rs/arg0/clippy.toml	Unsupported file format

…codex_aliases

HaleTom · 2026-04-27T12:51:10Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

This PR improves diagnosability of codex-arg0 failures by enriching propagated io::Error messages with operation + path context (notably around prepend_path_entry_for_codex_aliases()), addressing hard-to-debug arg0 setup errors seen in prior reports.

Changes:

Add with_path_context / with_context helpers and apply them to error propagation sites in prepend_path_entry_for_codex_aliases().
Hoist std::env::current_exe() out of the alias creation loop.
Update lockfile dependencies to reflect a removed direct dependency.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`codex-rs/arg0/src/lib.rs`	Adds contextual error helpers, applies them to arg0 tempdir/lock/alias setup, and adds unit tests for the helpers.
`codex-rs/Cargo.lock`	Removes a direct `codex-utils-absolute-path` entry from `codex-mcp` dependency list (now transitive).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace format!-based error helpers with a ContextIoError wrapper type that stores the original io::Error as source, so get_ref()/ source() traversal still reaches the root cause while the Display message includes operation and path context. Also update unit tests to verify source-chain preservation.

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

…bility

- Replace raw string literal for batch script with format! using explicit \r\n to avoid stray quote line on Windows
- Use path.display().to_string() in test assertion instead of hardcoded Unix path separator

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

The ContextIoError wrapper clears raw_os_error() on the outer error since io::Error::new does not preserve it. Document this and add a unit test verifying the OS error code is still retrievable via the source chain.

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

and_downcast is not a standard method; replace with source-chain walk using downcast_ref.

Replaces the ugly inline downcast pattern in doc comments with a reusable helper, and updates the test to use it.

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated no new comments.

Replace raw std::fs calls with fs-err in codex-arg0 for ordinary filesystem operations, keeping with_path_context only for lock-acquire semantics.

Changes:
- Add fs-err as workspace and arg0 crate dependency
- Replace std::fs::{create_dir_all,read_dir,remove_dir_all,set_permissions,write} with fs_err equivalents for richer error messages
- Replace std::os::unix::fs::symlink with fs_err::os::unix::fs::symlink
- Add try_lock_arg0_dir helper: treats WouldBlock as AlreadyExists with path+operation context; janitor uses try_lock_dir which treats WouldBlock as Ok(None) for expected cleanup races
- Arg0PathEntryGuard._lock_file changed from std::fs::File to fs_err::File

Based on suggestion from joshka to adopt fs-err and add disallowed-methods clippy lint to prevent raw std::fs usage going forward.

- PermissionsExt::from_mode(0o700): use UFCS with std::fs::Permissions (removes unused PermissionsExt import)
- doc: remove deep_raw_os_error references (cfg(test) helper not available in production); describe source-chain walk explicitly
- reuse exe instead of calling current_exe() again
- WouldBlock: remove path from inner msg to avoid path duplication with outer with_path_context wrapper

Copilot

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

- Fix PermissionsExt import: add use std::os::unix::fs::PermissionsExt inside the #[cfg(unix)] block where from_mode is called.
- Remove #[cfg(unix)] from TempDir import (used on all platforms).
- Update doc comments on with_path_context/with_context: remove reference to cfg(test)-gated deep_raw_os_error, describe the source-chain walk pattern directly.
- Change try_lock_arg0_dir WouldBlock handling: use io::Error::from(WouldBlock) instead of AlreadyExists kind to preserve WouldBlock semantics (reviewed: callers expect WouldBlock for lock contention, not AlreadyExists).
- Remove duplicate lock path from inner WouldBlock error message (outer with_path_context already adds path context).
- Fix duplicate current_exe call: reuse exe for codex_self_exe.

HaleTom

Addressing all remaining threads in this round:

d_r3156857572 / d_r3158380827 (PermissionsExt): Added use std::os::unix::fs::PermissionsExt; inside the #[cfg(unix)] block where from_mode is called.

d_r3156857593 / d_r3158346764 / d_r3158346768 (deep_raw_os_error in docs): Updated both with_path_context and with_context doc comments — removed the reference to the #[cfg(test)]-gated helper and replaced with a description of the source-chain walk pattern directly (no function name, actionable in production code).

d_r3156857607 (WouldBlock → AlreadyExists): Changed to std::io::Error::from(std::fs::TryLockError::WouldBlock) preserving the WouldBlock kind as you suggested. The inner error message no longer includes "lock is already held" text (the outer with_path_context already adds "acquire lock" context).

d_r3158380839 (TempDir #[cfg(unix)]): Removed #[cfg(unix)] from use tempfile::TempDir; — TempDir is used unconditionally on all platforms.

d_r3158346778 (duplicate current_exe syscall): The std::env::current_exe() at line 196 is in arg0_dispatch_or_else and the exe computed at line 415 is in prepend_path_entry_for_codex_aliases — these are in different functions. The current_exe passed to run_main_with_arg0_guard serves arg0_dispatch_or_else's own call to linux_sandbox_exe_path, not the arg0 path setup. Not a duplicate.

Committed as dd0d119, pushed.

Copilot

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated no new comments.

- Create clippy.toml with disallowed-methods list to enforce #![deny(clippy::disallowed_methods)]
- Replace with_path_context with with_context on fs_err calls to eliminate redundant double-path wrapping
- Collapse redundant try_lock_arg0_dir Err arms
- Thread current_exe() through prepend_path_entry_for_codex_aliases to avoid redundant syscall in arg0_dispatch_or_else
- Restore security comment for set_permissions(0o700)
- Add doc comment noting with_path_context is for non-fs_err paths
- Add #[allow] on test module for legitimate std::fs exceptions

Copilot

Pull request overview

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.

The crate-level comment claimed fs_err has no create_dir (only create_dir_all), but fs_err::create_dir does exist. Remove the incorrect claim; both the test-only exception and the clippy.toml reason are now consistent with the actual fs_err API.

HaleTom · 2026-04-29T15:37:01Z

Production-ready bar for this PR

All std::fs call sites in codex-arg0 are replaced with fs_err equivalents, and a clippy::disallowed_methods deny list prevents regressions.
Error context is enriched with operation and path information via with_context / with_path_context, preserving the io::Error kind and source chain.
The current_exe() syscall is hoisted out of the loop and plumbed through, eliminating N-1 redundant syscalls.
deep_raw_os_error is test-only (#[cfg(test)]) and correctly walks the ContextIoError-wrapped source chain.
Clippy deny is scoped to codex-arg0/clippy.toml, not workspace-wide, avoiding breakage of the ~20 other crates.
std::fs::Permissions and std::fs::TryLockError remain as std references (type constructor and lock error enum, not filesystem calls) with inline #[allow] where used.
The fs_err dependency is properly declared in both the workspace Cargo.toml and arg0/Cargo.toml, and MODULE.bazel.lock is updated.
All 8 tests pass, including 3 new tests for the error context helpers.
No functional behavior change — same filesystem operations, same lock semantics, same janitor logic — just better error messages.

Findings

1. Correctness & functional completeness

No issues found in this area based on the diff and reviewed context.

The std::fs → fs_err migration covers all production call sites. The current_exe() hoist is correctly plumbed: the function parameter exe: Option<&Path> is used when provided, falling back to std::env::current_exe() only when None. The codex_self_exe field is now Some(exe) instead of std::env::current_exe().ok() (which could fail silently), which is a functional improvement.
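The exe plumbing described above might look like the sketch below. `resolve_exe` and `plan_aliases` are hypothetical names introduced for illustration; only the `exe: Option<&Path>` parameter and the fallback to `std::env::current_exe()` come from the review text.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Uses the caller-supplied path when present; otherwise falls back to a
/// single `current_exe()` lookup.
fn resolve_exe(exe: Option<&Path>) -> io::Result<PathBuf> {
    match exe {
        Some(path) => Ok(path.to_path_buf()),
        None => std::env::current_exe(),
    }
}

/// Hypothetical stand-in for the alias loop: every alias is paired with the
/// same resolved target instead of re-querying the executable per alias.
fn plan_aliases(
    dir: &Path,
    aliases: &[&str],
    exe: Option<&Path>,
) -> io::Result<Vec<(PathBuf, PathBuf)>> {
    let target = resolve_exe(exe)?; // hoisted: one lookup for all aliases
    Ok(aliases
        .iter()
        .map(|name| (dir.join(name), target.clone()))
        .collect())
}
```

Resolving the target before the loop also means a failure surfaces once, before any link is created, rather than partway through the alias list.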

The try_lock_arg0_dir extraction is semantically equivalent to the inlined code, with the addition of with_path_context error enrichment. The WouldBlock → real error behavior is preserved.

[NON-BLOCKING] `deep_raw_os_error` walk could skip `ContextIoError` early for clarity

Type: Plausible risk
Evidence: codex-rs/arg0/src/lib.rs:310-325
Why it matters: The walk does downcast_ref::<io::Error>() at every level, which will fail on ContextIoError intermediate nodes. It still works because source = s.source() advances past them. However, a future reader might wonder why the ContextIoError intermediate node isn't handled or documented. If someone adds another wrapper type, the walk could silently skip levels.
Recommendation: Consider adding a brief comment inside the while loop noting that ContextIoError (and any other non-io::Error wrapper) is transparently skipped via source = s.source(). Not blocking because the current behavior is correct.
Confidence: Low

2. Architecture & boundary integrity

No issues found in this area based on the diff and reviewed context.

The clippy::disallowed_methods deny is correctly scoped to codex-arg0/clippy.toml rather than a workspace-wide clippy.toml, matching the project memory note about ~1041 raw std::fs sites across 147 files needing migration first. The ContextIoError and helpers are private to the module, which is appropriate for a single-crate migration.

3. Code clarity, clean code & maintainability

[NIT] `with_path_context` duplicates path info that `fs_err` already provides

Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:329-348 (doc comment on with_path_context acknowledges this)
Why it matters: The doc comment correctly documents the distinction, but with_path_context exists only for try_lock_arg0_dir (one call site). For fs_err calls that already include the path, with_context is used instead. This is correctly documented and the separation is well-motivated, but it's worth noting that with_path_context is currently only used once.
Recommendation: No action needed — the doc comment is clear about the distinction, and the function exists for semantic completeness even with a single call site.
Confidence: High

4. Comments & code documentation

No issues found in this area based on the diff and reviewed context.

The raw_os_error() caveats on both with_path_context and with_context are thoroughly documented. The try_lock_arg0_dir doc comment clearly explains the semantic distinction from try_lock_dir. The clippy exceptions comment at the top of the file is precise about each allowed category.

5. Tests & validation

No issues found in this area based on the diff and reviewed context.

All 8 tests pass (verified). The 3 new tests cover:

with_path_context_includes_operation_path_and_source: verifies operation, path, original message, and source chain preservation.
deep_raw_os_error_retrieves_original_os_code: verifies OS error code retrieval through the wrapped chain, and that raw_os_error() is None on the outer error.
with_context_includes_custom_message_and_source: verifies context string, original message, and source chain.

These tests would fail if the wiring were missing (e.g., if ContextIoError.source() returned None or if Error::new dropped the kind). Clippy and cargo check pass cleanly.

6. Performance

No issues found in this area based on the diff and reviewed context.

The current_exe() hoist from inside the loop to a single call is a genuine improvement. On a system with 3-4 symlinks in the loop, this eliminates 2-3 redundant readlink("/proc/self/exe") syscalls. The fs_err wrapper is zero-cost (it inlines to the same std::fs call + path formatting only on error).

7. Operational risk

[NON-BLOCKING] `just bazel-lock-check` / `just bazel-lock-update` not verified

Type: Unverified concern
Evidence: MODULE.bazel.lock is updated in the diff; AGENTS.md requires running just bazel-lock-check after Cargo.lock changes.
Why it matters: The just command is not available in this environment (/bin/bash: line 1: just: command not found). The MODULE.bazel.lock was updated, but consistency against the actual Bazel build was not verified locally. CI would catch drift.
Recommendation: CI will validate this. Not blocking because the lockfile update is present and follows the documented process.
Confidence: Low

8. Adversarial review

[NON-BLOCKING] `raw_os_error()` returns `None` on wrapped errors — downstream consumers may break

Type: Plausible risk
Evidence: codex-rs/arg0/src/lib.rs:334-338, codex-rs/arg0/src/lib.rs:352-355
Why it matters: Any downstream code that calls raw_os_error() on errors returned from prepend_path_entry_for_codex_aliases or janitor_cleanup will now get None instead of the original OS code. The doc comments document this, and deep_raw_os_error provides the recovery path, but only under #[cfg(test)]. In practice, callers currently match on err.kind() (e.g., ErrorKind::NotFound in janitor_cleanup), which is preserved. The eprintln! warning in arg0_dispatch only uses Display, which is also fine. The risk is low because raw_os_error() was already unreliable on io::Error::new-constructed errors.
Recommendation: If any production code needs raw_os_error() from these wrapped errors, deep_raw_os_error (or equivalent) needs to be promoted out of #[cfg(test)]. Currently, no such code exists.
Confidence: Medium

[NIT] `exe` parameter shadowing in Windows batch script block

Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:454
Why it matters: In the Windows block, the outer exe (a PathBuf) is shadowed by let exe = exe.display(). This is intentional (formatting convenience) and the pattern existed in the pre-change code too. Not introduced by this PR.
Recommendation: No action needed.
Confidence: High

What I could not fully verify

Bazel build / just bazel-lock-check: just is not available in this environment. The MODULE.bazel.lock update is present and follows the documented process, but Bazel consistency was not validated locally.
Windows path correctness: The batch script template was changed from multi-line r#"..."# to single-line format!. The \r\n line endings are present, but actual Windows execution was not tested.
Cross-crate raw_os_error() consumers: No exhaustive search was performed for code outside codex-arg0 that calls raw_os_error() on errors originating from this crate's public functions. The io::Error return types are unchanged, so external callers are unaffected unless they relied on raw_os_error() — which was already unreliable for Error::new-constructed errors.
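For the batch script item in the list above, a sketch of the single-line format! approach follows. The script body here is an assumption for illustration (the PR's actual template is not shown in this thread); only the explicit \r\n endings and the `exe_display` naming come from the discussion.

```rust
use std::path::Path;

/// Builds a Windows batch shim that forwards all arguments to `exe`.
/// The script contents are hypothetical; the point is the explicit CRLF
/// line endings, which a multi-line raw string literal would not guarantee.
fn batch_script(exe: &Path) -> String {
    // A distinct name avoids shadowing the `exe` parameter.
    let exe_display = exe.display();
    format!("@echo off\r\n\"{exe_display}\" %*\r\n")
}
```

A unit test can check the line endings on any platform, which covers the formatting concern without requiring actual Windows execution.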

Final verdict

✅ Ready to merge — no blocking issues.

The change is a clean, well-scoped std::fs → fs_err migration with appropriate error context enrichment, a redundant-syscall elimination, and a correctly scoped clippy deny list. Tests are meaningful and pass. Documentation of the raw_os_error() caveat is thorough. The two non-blocking items (potential future raw_os_error consumer, just-less Bazel verification) are low-risk and CI-addressable respectively.

- Promote deep_raw_os_error from #[cfg(test)] to always-available utility with #[allow(dead_code)] since no production caller exists yet. This makes the recovery path for raw_os_error() available to future consumers of wrapped errors.
- Add doc comment explaining that ContextIoError wrapper nodes are transparently skipped via source() during the chain walk.
- Update with_path_context and with_context doc comments to reference deep_raw_os_error as the recovery path instead of manual chain walking.
- Rename shadowed `exe` variable to `exe_display` in Windows batch script block to avoid shadowing the outer PathBuf.

HaleTom

Requesting automated review

HaleTom · 2026-04-29T15:46:59Z

Bazel lockfile verification

Running just bazel-lock-check and just bazel-lock-update via mise exec:

`just bazel-lock-check` (via `mise exec just@1.50.0 -- just bazel-lock-check`)

Since the just recipe invokes a sub-shell script that calls bazel directly, I ran the underlying command with bazel on PATH:

$ mise exec bazel@9.1.0 -- bazel mod deps --lockfile_mode=error
<root> (codex@_)

Exit code 0 — lockfile is consistent.

`just bazel-lock-update` (via `mise exec bazel@9.1.0 -- bazel mod deps`)

After running the update, diff against the committed lockfile:

$ cp MODULE.bazel.lock MODULE.bazel.lock.before
$ mise exec bazel@9.1.0 -- bazel mod deps
<root> (codex@_)
$ diff MODULE.bazel.lock.before MODULE.bazel.lock
$ echo $?
0

Zero diff — the committed MODULE.bazel.lock is already up to date and consistent with the current Cargo.lock.

This resolves the "unverified" item from the review about Bazel lockfile consistency.

HaleTom · 2026-04-29T16:14:28Z

Audit: `raw_os_error()` downstream consumers

Result: No downstream raw_os_error() consumers affected.

The two public functions that now return ContextIoError-wrapped errors are:

prepend_path_entry_for_codex_aliases (codex-rs/arg0/src/lib.rs:385)
janitor_cleanup (codex-rs/arg0/src/lib.rs:517)

Both are only called within codex-arg0/src/lib.rs itself:

prepend_path_entry_for_codex_aliases called at line 150 — uses eprintln!("{err}") (Display only, no raw_os_error())
janitor_cleanup called at line 415 — uses eprintln!("{err}") (Display only, no raw_os_error())

All 18 raw_os_error() call sites across the repo are in other crates (cli, windows-sandbox-rs, linux-sandbox, core, utils/pty, shell-escalation) and operate on errors they construct themselves (via from_raw_os_error or last_os_error) — never on errors propagated from codex-arg0.

The NON-BLOCKING finding is confirmed safe: no production code currently needs raw_os_error() from these wrapped errors. deep_raw_os_error is promoted out of #[cfg(test)] with #[allow(dead_code)] as a safety net for any future consumer.

HaleTom · 2026-04-29T16:30:07Z

Related upstream issues

All of these exhibit the same stale ~/.codex/tmp/arg0/codex-arg0*/ path pattern that openai#17570 fixes. Our fs_err migration is orthogonal — it improves diagnosability of arg0 errors but does not prevent the deletion that causes ENOENT downstream.

Issue	Title	Related?	Reason
#16970	Unified exec caches stale `~/.codex/tmp/arg0` session path	Directly	Canonical WSL2 stale-arg0-path bug. Contains the exact `Failed to create unified exec process: Unable to spawn .../codex-linux-sandbox ... ENOENT` error. Explicitly fixed by openai#17570.
#16791	`apply_patch` fails with No such file or directory on existing files	Directly	`apply_patch` ENOENT on existing files — same stale arg0 shim symptom. Referenced by openai#17570 PR body.
#17778	`functions.apply_patch` fails with ENOENT after patch approval	Directly	Same pattern: `apply_patch` via arg0 shim fails while direct shell invocation works. Path: `~/.codex/tmp/arg0/codex-arg0ptO3zX/apply_patch`.
#17517	Apply Patch Bug	Directly	Identical pattern: MCP `apply_patch` ENOENT while shell `apply_patch` works. Path: `~/.codex/tmp/arg0/codex-arg0wFKksz/apply_patch`.
#17240	Codex tool runtime intermittently loses `apply_patch` path	Directly	Same root cause: `which apply_patch` returns different `~/.codex/tmp/arg0/codex-arg0*/apply_patch` paths within the same session. Also shows the `Failed to create unified exec process` error.
#4754	No such file or directory (OS error 2)	Weakly	Closed as not-planned. Windows ENOENT but no arg0 path mentioned and predates the arg0 system. Too vague to confirm.

How our PR relates

The with_path_context / with_context enrichment in janitor_cleanup makes arg0 error messages more diagnosable (e.g., read directory /home/user/.codex/tmp/arg0: No such file or directory instead of bare No such file or directory). But it does not prevent the underlying bug, in which the janitor deletes live directories and causes the ENOENT errors in those issues; that is what openai#17570's .pid sentinel fixes.

When openai#17570 merges, its new std::fs::write and std::fs::read_to_string calls in write_owner_pid / dir_belongs_to_running_process would need the same fs_err migration treatment if rebasing on top of our PR.

HaleTom · 2026-05-19T10:50:33Z

Production Readiness Review: enhance-arg0-error-context

PR: #1 (branch: enhance-arg0-error-context)
Reviewed: 2026-05-19T17:32:10+07:00
Scope: 5 files changed, +269 / -40 (codex-arg0 crate only)

Context files read

codex-rs/arg0/src/lib.rs:1-781 — full current file (main changed file)
/tmp/upstream-arg0-lib.rs:1-585 — upstream version for diff comparison
codex-rs/arg0/clippy.toml:1-22 — new file, disallowed-methods lint config
codex-rs/arg0/Cargo.toml:1-26 — added fs-err dependency
codex-rs/Cargo.toml:254-261 — workspace-level fs-err dependency
codex-rs/Cargo.lock — fs-err 3.3.0 added, transitive dep changes
GitHub issue openai/codex#19674, "arg0 error messages lack path and operation context, making failures hard to diagnose" (all 8 comments): design intent and scope discussion

Verification evidence

cargo check -p codex-arg0 → passed (1m 29s)
cargo clippy -p codex-arg0 → passed (0 warnings, 1m 09s)
cargo test -p codex-arg0 → 8/8 passed (0.00s)

Production-ready bar for this PR

All std::fs calls in production code replaced with fs_err equivalents — errors must include the file path
Lock-acquire failures include the lock file path and "acquire lock" operation context
ContextIoError preserves the Error::source() chain so anyhow and downstream error walkers see the original OS error
deep_raw_os_error correctly walks the source chain to recover raw_os_error() lost by io::Error::new
clippy::disallowed_methods lint prevents future std::fs regression in this crate
No duplicate path information in error messages — with_context for fs_err calls (path already in message), with_path_context only for non-fs_err sites
Public API change (prepend_path_entry_for_codex_aliases now takes exe: Option<&Path>) is intentional and the single caller is updated
current_exe() hoisted out of the per-filename loop — eliminates redundant syscalls
Tests cover all new helper functions and the source-chain walking behavior
No blocking clippy warnings, no new unsafe code introduced

Findings

1. Correctness & functional completeness

Finding 1.1 — `with_path_context` doc comment is misleading about `with_context` usage

Classification: NIT
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:334-339
Confidence: High

The doc says:

For fs_err calls that already include the path in their error message, prefer [with_context] instead to avoid duplicating the path in the output.

This reads as if with_context is for wrapping errors returned by fs_err calls. But with_context is also used for fs_err calls themselves (e.g., line 403: fs::create_dir_all(&codex_home).map_err(|e| with_context(e, "create CODEX_HOME directory"))). The sentence is technically correct but confusing. Recommend rewording to:

For errors from fs_err operations (which already include the path), prefer [with_context] to add operation context without duplicating the path.

Finding 1.2 — Source chain is correctly preserved

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:287-303 (ContextIoError), codex-rs/arg0/src/lib.rs:299-302 (Error impl)
Confidence: High

ContextIoError implements Error::source() returning Some(&self.source), which means anyhow and other error walkers will traverse: ContextIoError → inner io::Error → original OS error. The deep_raw_os_error function (line 314) correctly walks this chain via downcast_ref::<io::Error>(). Test at line 739 verifies this.

Finding 1.3 — `try_lock_arg0_dir` error conversion is correct

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:571-580
Confidence: High

std::io::Error::from(TryLockError) correctly converts TryLockError::WouldBlock to io::Error with ErrorKind::WouldBlock, and TryLockError::Io(e) extracts the inner error. The with_path_context wrapper then adds the lock path and "acquire lock" operation. Error messages will read: failed to acquire lock '/path/.lock': Resource temporarily unavailable.

Finding 1.4 — `current_exe` hoisting eliminates redundant syscall

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:435-440 (single call before loop), vs upstream line 339 (current_exe() inside loop)
Confidence: High

Upstream called std::env::current_exe() inside the per-filename loop (once per alias). The PR hoists it to a single call before the loop. This is correct — the executable path doesn't change during the function.

2. Architecture & boundary integrity

Finding 2.1 — `fs_err` migration is correctly scoped

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/Cargo.toml:24 (fs-err dependency), codex-rs/arg0/src/lib.rs:16-19 (imports)
Confidence: High

fs_err is imported as use fs_err as fs and use fs_err::File, replacing all std::fs usage in production code. The clippy.toml (22 lines, 19 methods) prevents regression. Test code uses #[allow(clippy::disallowed_methods)] at the module level, which is correct — tests legitimately use std::fs for setup.

Finding 2.2 — Public API change is intentional

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:385-387 (new signature), codex-rs/arg0/src/lib.rs:149-150 (caller updated)
Confidence: High

prepend_path_entry_for_codex_aliases changed from fn() to fn(exe: Option<&Path>). The single internal caller (arg0_dispatch) is updated. The arg0_dispatch_or_else function (line 197-200) also reuses the guard's exe when available, avoiding a redundant syscall.

Finding 2.3 — Two lock functions with distinct semantics

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:546-563 (try_lock_dir for janitor), codex-rs/arg0/src/lib.rs:565-580 (try_lock_arg0_dir for startup)
Confidence: High

try_lock_dir (janitor) treats WouldBlock as Ok(None) — expected during cleanup. try_lock_arg0_dir (startup) treats WouldBlock as a real error — unexpected during normal startup. This separation is clean and correct.
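The two lock policies can be sketched as below. To stay independent of toolchain version, the sketch defines a local `TryLockError` shaped like `std::fs::TryLockError` rather than using the real type. The function names `janitor_lock` and `startup_lock` are hypothetical, and the path-context wrapping applied by `try_lock_arg0_dir` is omitted.

```rust
use std::io;

/// Local stand-in shaped like `std::fs::TryLockError`.
#[derive(Debug)]
enum TryLockError {
    WouldBlock,
    Io(io::Error),
}

impl From<TryLockError> for io::Error {
    fn from(err: TryLockError) -> Self {
        match err {
            // Contention keeps its own kind instead of being remapped.
            TryLockError::WouldBlock => io::ErrorKind::WouldBlock.into(),
            TryLockError::Io(e) => e,
        }
    }
}

/// Janitor policy: contention is expected during cleanup, so a held lock is
/// reported as "not acquired" rather than as a failure.
fn janitor_lock<T>(attempt: Result<T, TryLockError>) -> io::Result<Option<T>> {
    match attempt {
        Ok(guard) => Ok(Some(guard)),
        Err(TryLockError::WouldBlock) => Ok(None),
        Err(TryLockError::Io(e)) => Err(e),
    }
}

/// Startup policy: contention is unexpected, so it surfaces as an error
/// whose kind is still `ErrorKind::WouldBlock`.
fn startup_lock<T>(attempt: Result<T, TryLockError>) -> io::Result<T> {
    attempt.map_err(io::Error::from)
}
```

Keeping the `WouldBlock` kind on the startup path lets callers distinguish lock contention from other I/O failures by matching on `err.kind()`.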

3. Code clarity, clean code & maintainability

Finding 3.1 — `ContextIoError` is minimal and correct

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:287-303
Confidence: High

The struct has exactly two fields (context: String, source: io::Error), implements Display as "{context}: {source}", and implements Error::source() correctly. No unnecessary complexity.

Finding 3.2 — Error context strings are descriptive

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:388-464 (all .map_err calls)
Confidence: High

Every map_err call uses a distinct, descriptive context string:

"resolve CODEX_HOME" (line 388)
"create CODEX_HOME directory" (line 403)
"create temp directory" (line 405)
"set permissions on temp directory" (line 411)
"create temp directory for arg0" (line 422)
"open lock file" (line 432)
"get current executable path" (line 438)
"create symlink" (line 453)
"write batch script" (line 464)

Each uniquely identifies the failing operation.

4. Comments & code documentation

Finding 4.1 — `with_path_context` doc comment wording (duplicate of 1.1)

Classification: NIT
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:334-339
Confidence: High

Same as Finding 1.1. The sentence "prefer [with_context] instead" is ambiguous about whether it means "for errors returned by fs_err" or "for fs_err calls themselves." Recommend clarifying.

Finding 4.2 — `deep_raw_os_error` doc comment is clear

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:305-312
Confidence: High

The doc clearly explains why io::Error::new clears raw_os_error(), how wrapper types are skipped, and how downcast_ref is used. The #[allow(dead_code)] annotation (line 313) is justified — the function is exercised by tests (line 739) and will be used by callers that need the raw OS code.

Finding 4.3 — `clippy.toml` comments explain exceptions

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:2-6 (inline comments at top)
Confidence: High

The crate-level #![deny(clippy::disallowed_methods)] has inline comments explaining the three exception categories: TryLockError (no fs_err lock API), Permissions (type constructor), and test-only usage.

5. Tests & validation

Finding 5.1 — New tests cover all helper functions

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:713-780
Confidence: High

Three new tests:

with_path_context_includes_operation_path_and_source (line 714) — verifies operation, path, original message, and source chain
deep_raw_os_error_retrieves_original_os_code (line 739) — verifies raw_os_error() is None on wrapper, deep_raw_os_error recovers it
with_context_includes_custom_message_and_source (line 756) — verifies context string, original message, and source chain

Finding 5.2 — Existing tests pass unchanged

Classification: PASS
Type: Verified issue
Evidence: cargo test -p codex-arg0 → 8/8 passed
Confidence: High

All 5 existing tests pass: linux_sandbox_exe_path_prefers_codex_linux_sandbox_alias, run_main_with_arg0_guard_keeps_aliases_alive_until_main_returns, janitor_skips_dirs_without_lock_file, janitor_skips_dirs_with_held_lock, janitor_removes_dirs_with_unlocked_lock.

Finding 5.3 — No test for `with_context` wrapping of `fs_err` errors

Classification: NIT
Type: Plausible risk
Evidence: No test exercises with_context(e, "...") where e came from an actual fs_err call
Confidence: Low

The tests use synthetic io::Error::new(...) values. A test that creates a real fs_err error (e.g., fs_err::read_to_string("/nonexistent")) and wraps it with with_context would verify the combined output format. This is low risk since the wrapping is straightforward, but would add confidence about the "no duplicate path" property.
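A test along these lines might look like the sketch below. It uses std::fs as a stand-in so the example has no external dependencies; with fs_err the inner message would already carry the path, which is the property the real test would need to assert on. The Wrapped type, the context string "read alias manifest", and the temp-dir path are all hypothetical.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical stand-in for the PR's wrapper type.
#[derive(Debug)]
struct Wrapped {
    context: String,
    source: io::Error,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Assumed helper shape: prefix a message and keep the original ErrorKind.
fn with_context(err: io::Error, message: &str) -> io::Error {
    io::Error::new(err.kind(), Wrapped { context: message.to_string(), source: err })
}

/// Produces a genuine OS-level error by reading a path that should not exist,
/// then wraps it the way the production code would.
fn wrapped_missing_file_error() -> io::Error {
    let missing = std::env::temp_dir()
        .join("codex-arg0-sketch-missing-dir")
        .join("nope.txt");
    let err = fs::read_to_string(&missing).expect_err("path is expected not to exist");
    with_context(err, "read alias manifest")
}
```

A real version would swap fs::read_to_string for the fs_err equivalent and additionally check that the path appears exactly once in the rendered message.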

6. Performance

Finding 6.1 — `current_exe` hoisting saves syscalls

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:435-440 vs upstream line 339
Confidence: High

Upstream called current_exe() once per filename in the loop (4 times on Linux, 2 on Windows). The PR calls it once. Each current_exe() is a syscall (readlink /proc/self/exe on Linux). This is a minor but correct optimization.
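The hoisting pattern can be sketched as follows. The function name plan_alias_links and its return type are hypothetical and chosen only to keep the example free of filesystem side effects; the point is that the executable path is resolved once, before the loop, with current_exe() as the fallback when the caller passes None.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns (link path, target) pairs instead of creating
/// symlinks, so the resolve-once behaviour is easy to see and test.
fn plan_alias_links(
    exe: Option<&Path>,
    dir: &Path,
    aliases: &[&str],
) -> io::Result<Vec<(PathBuf, PathBuf)>> {
    // Resolve the executable a single time, outside the loop.
    let exe: PathBuf = match exe {
        Some(path) => path.to_path_buf(),
        None => std::env::current_exe()?,
    };

    // Every alias reuses the same resolved path.
    Ok(aliases
        .iter()
        .map(|name| (dir.join(name), exe.clone()))
        .collect())
}
```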

Finding 6.2 — `ContextIoError` overhead is on error paths only

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:345-354 (with_path_context), codex-rs/arg0/src/lib.rs:361-369 (with_context)
Confidence: High

The ContextIoError wrapper allocates a String for the context, but only on the error path. Happy path has zero overhead from the context wrappers.

7. Operational risk

Finding 7.1 — `fs_err` migration is contained to `codex-arg0`

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/clippy.toml (prevents regression), codex-rs/arg0/Cargo.toml (dependency)
Confidence: High

The fs_err dependency and clippy lint are scoped to codex-arg0 only. Other crates are unaffected. The issue discussion confirms this is intentional — workspace-wide migration is deferred to follow-up PRs.

Finding 7.2 — Error messages now include paths (privacy consideration)

Classification: NON-BLOCKING
Type: Plausible risk
Evidence: All .map_err calls in prepend_path_entry_for_codex_aliases
Confidence: Medium

Error messages now include filesystem paths (e.g., CODEX_HOME path, temp directory path). The issue discussion (comment #4362753988) acknowledges this: "In the arg0 case, the paths are local temp/cache paths already implicated in the failure, so including them seems like a clear win." This is acceptable for CLI error output but worth noting — these paths would appear in eprintln! output visible to the user.

Finding 7.3 — `#![deny(clippy::disallowed_methods)]` is crate-level

Classification: PASS
Type: Verified issue
Evidence: codex-rs/arg0/src/lib.rs:1
Confidence: High

The deny is at crate level, enforced by clippy.toml. Test code uses #[allow(clippy::disallowed_methods)] at the module level. This is the correct pattern — production code is linted, test code is exempt.

8. Adversarial review

Adversarial 1 — Could `ContextIoError` break error downcasting?

Attack: If downstream code does err.downcast_ref::<io::Error>() on the wrapped error, it would get None because the outer type is ContextIoError, not io::Error.

Assessment: Low risk. The crate returns std::io::Result<...>, and the callers use ? or anyhow. anyhow walks the source chain, so the original io::Error is reachable. The raw_os_error() loss is documented and deep_raw_os_error provides recovery. No code in this crate downcasts the returned errors.

Adversarial 2 — Could the `exe` parameter to `prepend_path_entry_for_codex_aliases` be `None` when it shouldn't be?

Attack: If a caller passes None and current_exe() fails, the function returns an error.

Assessment: Correct behavior. The caller (arg0_dispatch) does let exe = std::env::current_exe().ok() and passes exe.as_deref(). If current_exe() fails, exe is None, and the function falls back to calling current_exe() again (lines 437-438). If that also fails, the error is propagated with context "get current executable path". This is the correct fallback chain.

Adversarial 3 — Could `deep_raw_os_error` infinite loop?

Attack: If Error::source() returns a cycle, the while loop would never terminate.

Assessment: Not possible in practice. ContextIoError::source() always returns the inner io::Error, and io::Error::source() returns None for most variants (or a finite chain for custom errors). Each error in this chain owns its source, so the chain built here cannot loop back on itself.

Adversarial 4 — Is the `with_context` call on `find_codex_home()` correct?

Attack: find_codex_home() returns Result<PathBuf, anyhow::Error>, not io::Error. Does .map_err(|e| with_context(e, ...)) compile?

Assessment: with_context takes std::io::Error, so the call at line 388 only type-checks if find_codex_home() returns io::Result<PathBuf>:

let codex_home = find_codex_home().map_err(|e| with_context(e, "resolve CODEX_HOME"))?;

If find_codex_home() returned anyhow::Result<PathBuf>, the closure would not compile: anyhow::Error is not std::io::Error, and map_err performs no implicit conversion.

The code compiled and passed cargo check, so find_codex_home (from codex-utils-home-dir) must return io::Result<PathBuf>. This is consistent with the upstream code, which uses ? directly (line 285: let codex_home = find_codex_home()?;) where the function returns io::Result.

Verdict: Not an issue. find_codex_home returns io::Result<PathBuf>. The with_context call is type-correct.

What I could not fully verify

Workspace-wide impact: Could not verify that no other crate in the workspace depends on the old signature of prepend_path_entry_for_codex_aliases() (the one without the exe parameter). The git diff shows only codex-rs/arg0/src/lib.rs was changed, but callers in other crates (e.g., codex-core, codex-tui) would need to be updated if they call this function. Mitigation: The diff only touches codex-rs/arg0/ files, and cargo check passed, which means either (a) no other crate calls this function, or (b) the callers are in deleted files (this fork removed the TUI and other crates). In the upstream context, the caller is arg0_dispatch_or_else which is in the same file.
Cross-platform behavior: The Windows batch script change (lines 458-464) uses \r\n line endings and quotes the exe path. Could not verify this works correctly on Windows with paths containing spaces. The upstream used \n without \r; the PR's \r\n is the conventional line ending for Windows batch files.
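fs_err lock API: Found no lock API in fs_err, which matches the TryLockError lint exception. Could not confirm which layer provides try_lock: TryLockError is a std::fs type (returned by std::fs::File::try_lock), which points to the standard library rather than fs2/fs4. The try_lock_arg0_dir function correctly handles the TryLockError type.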
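To make the cross-platform item concrete, a script generator with the two properties mentioned (CRLF line endings and a quoted executable path) could look like the sketch below. The helper name batch_script and the exact script body are assumptions; the PR's actual contents at lines 458-464 may differ.

```rust
use std::path::Path;

/// Hypothetical helper: builds a batch wrapper that forwards all arguments
/// (%*) to the given executable. The path is wrapped in double quotes so that
/// spaces in it do not split the command, and lines end in \r\n.
fn batch_script(exe: &Path) -> String {
    format!("@echo off\r\n\"{}\" %*\r\n", exe.display())
}
```

Quoting covers paths such as C:\Program Files\..., but this sketch does not escape other characters that cmd.exe treats specially (for example %), so it would still need testing on a real Windows host.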

Final verdict

✅ Ready to merge — no blocking issues.

The PR achieves its goal: all filesystem errors in codex-arg0 now include the path and operation context. The fs_err migration is clean, the clippy.toml prevents regression, and the custom ContextIoError wrapper correctly preserves the error source chain. The scope is appropriately limited to codex-arg0, with workspace-wide migration deferred to follow-up PRs (as discussed in the issue).

3 nits (non-blocking):

with_path_context doc comment wording could be clearer about with_context usage (Finding 1.1 / 4.1)
No test exercises with_context wrapping a real fs_err error (Finding 5.3)
Privacy note: error messages now include filesystem paths (Finding 7.2 — acceptable per issue discussion)

- Reword with_path_context doc to clarify that with_context is for errors from fs_err operations (which already include the path), not for wrapping errors before passing to fs_err
- Add test exercising with_context wrapping a real fs_err error to verify path is preserved without duplication

HaleTom · 2026-05-20T06:35:08Z

Workspace-wide impact verification

Claim from review: Could not verify that no other crate depends on the old signature of prepend_path_entry_for_codex_aliases().

Verification result: Non-issue — confirmed no other callers exist.

Only one .rs file references prepend_path_entry_for_codex_aliases across the entire workspace — codex-rs/arg0/src/lib.rs (the declaration and sole call site both live here).
Call sites:
- codex-rs/arg0/src/lib.rs:385 — declaration: pub fn prepend_path_entry_for_codex_aliases(exe: Option<&Path>)
- codex-rs/arg0/src/lib.rs:150 — only call: match prepend_path_entry_for_codex_aliases(exe.as_deref())
- codex-rs/arg0/src/lib.rs:196 — a comment only
Zero references in codex-core, codex-tui, or any other workspace crate — confirmed via rg -l for the symbol name across all .rs files in codex-rs/.
cargo check passing was already sufficient evidence; this check just confirms the concern has no workspace-wide scope. No other crate currently relies on the function's pub visibility.

feat(arg0): enrich error messages with path and operation context

d688609

HaleTom added 3 commits April 27, 2026 14:42

Remove unnecessary changes

9a9885b

style(arg0): move error context helpers after imports

c9cc7ac

fix(arg0): hoist current_exe() out of loop in prepend_path_entry_for_codex_aliases

0cf4bf5

HaleTom changed the title ~~Enrich arg0 error messages with path and operation context~~ fix(arg0): add path and operation context to error messages Apr 27, 2026

HaleTom marked this pull request as ready for review April 27, 2026 12:56

Copilot AI review requested due to automatic review settings April 27, 2026 12:56

HaleTom mentioned this pull request Apr 27, 2026

arg0 error messages lack path and operation context, making failures hard to diagnose openai/codex#19674

Open

Copilot started reviewing on behalf of HaleTom April 27, 2026 12:57