
fix(arg0): add path and operation context to error messages #1

Open
HaleTom wants to merge 19 commits into main from enhance-arg0-error-context

Conversation

@HaleTom
Owner

@HaleTom HaleTom commented Apr 26, 2026

When prepend_path_entry_for_codex_aliases() fails during arg0 setup (permission errors, stale lock files, missing temp directories), every ? operator propagates a bare io::Error with only the OS-level message — with no indication of which path or which operation failed. This makes the arg0-related failures in openai#16970 and openai#17240 extremely difficult to diagnose.

Before:

Read-only file system (os error 30)

After:

failed to create symlink /home/user/.codex/tmp/arg0/codex-arg0XXX/apply_patch: Read-only file system (os error 30)

Approach

A ContextIoError wrapper in codex-arg0 that preserves the original io::Error as a source, so the error chain remains introspectable via get_ref()/source() while enriching the Display message:

  • with_path_context(err, path, operation) → "failed to {operation} `{path}`: {err}"
  • with_context(err, message) → "{message}: {err}" (for non-path errors like resolve CODEX_HOME)
  • deep_raw_os_error(&err) → walks the source chain to recover the original OS error code that io::Error::new clears

Every ? in the public function is now .map_err(|e| with_path_context(e, …))? or .map_err(|e| with_context(e, …))?.
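
For readers who want to see the shape of the approach, here is a minimal sketch of such a wrapper. The field names, the exact format strings, and the example path are assumptions made for illustration; the PR's actual definitions in codex-rs/arg0/src/lib.rs are not reproduced in this thread.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::Path;

/// Hypothetical shape of the wrapper: context text plus the original error.
#[derive(Debug)]
struct ContextIoError {
    context: String,
    source: io::Error,
}

impl fmt::Display for ContextIoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl Error for ContextIoError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Wraps `err` with a free-form message, keeping its `ErrorKind`.
fn with_context(err: io::Error, message: &str) -> io::Error {
    let kind = err.kind();
    let wrapper = ContextIoError { context: message.to_string(), source: err };
    io::Error::new(kind, wrapper)
}

/// Wraps `err` with the failed operation and the path it was applied to.
fn with_path_context(err: io::Error, path: &Path, operation: &str) -> io::Error {
    with_context(err, &format!("failed to {operation} `{}`", path.display()))
}

/// Returns the first OS error code found on `err` or in its source chain.
fn deep_raw_os_error(err: &io::Error) -> Option<i32> {
    if let Some(code) = err.raw_os_error() {
        return Some(code);
    }
    let mut source = err.source();
    while let Some(s) = source {
        // Nodes that are not `io::Error` fail the downcast and are skipped.
        if let Some(code) = s.downcast_ref::<io::Error>().and_then(|e| e.raw_os_error()) {
            return Some(code);
        }
        source = s.source();
    }
    None
}

fn main() {
    // 30 is EROFS on Linux; the message text is platform-dependent.
    let os_err = io::Error::from_raw_os_error(30);
    let wrapped = with_path_context(os_err, Path::new("/tmp/arg0/apply_patch"), "create symlink");
    println!("{wrapped}");
    // `io::Error::new` does not carry the OS code on the outer error.
    assert_eq!(wrapped.raw_os_error(), None);
    assert_eq!(deep_raw_os_error(&wrapped), Some(30));
}
```

At a call site this would be used as `.map_err(|e| with_path_context(e, &path, "create symlink"))?`, matching the pattern described above.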

Additional changes

  • Hoisted std::env::current_exe() out of the alias loop — called once instead of once per alias.
  • codex-mcp no longer directly depends on codex-utils-absolute-path; it gets it transitively through codex-arg0.

Verification

  • cargo test -p codex-arg0 — 8 tests pass (including updated unit tests that verify source-chain preservation and deep_raw_os_error recovery)
  • cargo clippy -p codex-arg0 — no warnings
  • cargo fmt -p codex-arg0 — clean

Bug report: openai#19674
Related: openai#16970, openai#17240

I have read the CLA Document and I hereby sign the CLA

@semanticdiff-com

semanticdiff-com Bot commented Apr 26, 2026

Review changes with  SemanticDiff

Changed Files
File Status
  codex-rs/arg0/src/lib.rs  12% smaller
  MODULE.bazel.lock Unsupported file format
  codex-rs/Cargo.lock Unsupported file format
  codex-rs/Cargo.toml Unsupported file format
  codex-rs/arg0/Cargo.toml Unsupported file format
  codex-rs/arg0/clippy.toml Unsupported file format

@HaleTom
Owner Author

HaleTom commented Apr 27, 2026

I have read the CLA Document and I hereby sign the CLA

@HaleTom HaleTom changed the title from "Enrich arg0 error messages with path and operation context" to "fix(arg0): add path and operation context to error messages" Apr 27, 2026
@HaleTom HaleTom marked this pull request as ready for review April 27, 2026 12:56
Copilot AI review requested due to automatic review settings April 27, 2026 12:56

Copilot AI left a comment


Pull request overview

This PR improves diagnosability of codex-arg0 failures by enriching propagated io::Error messages with operation + path context (notably around prepend_path_entry_for_codex_aliases()), addressing hard-to-debug arg0 setup errors seen in prior reports.

Changes:

  • Add with_path_context / with_context helpers and apply them to error propagation sites in prepend_path_entry_for_codex_aliases().
  • Hoist std::env::current_exe() out of the alias creation loop.
  • Update lockfile dependencies to reflect a removed direct dependency.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File Description
codex-rs/arg0/src/lib.rs Adds contextual error helpers, applies them to arg0 tempdir/lock/alias setup, and adds unit tests for the helpers.
codex-rs/Cargo.lock Removes a direct codex-utils-absolute-path entry from codex-mcp dependency list (now transitive).


Comment thread codex-rs/arg0/src/lib.rs Outdated
Comment thread codex-rs/arg0/src/lib.rs Outdated
@HaleTom HaleTom marked this pull request as draft April 27, 2026 13:34
Replace format!-based error helpers with a ContextIoError wrapper
type that stores the original io::Error as source, so get_ref()/
source() traversal still reaches the root cause while the Display
message includes operation and path context.

Also update unit tests to verify source-chain preservation.
@HaleTom HaleTom marked this pull request as ready for review April 27, 2026 14:38
@HaleTom HaleTom requested a review from Copilot April 27, 2026 14:53

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.



Comment thread codex-rs/arg0/src/lib.rs Outdated
Comment thread codex-rs/arg0/src/lib.rs Outdated
…bility

- Replace raw string literal for batch script with format! using
  explicit \r\n to avoid stray quote line on Windows
- Use path.display().to_string() in test assertion instead of
  hardcoded Unix path separator
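
As a rough illustration of the first bullet, a batch script built with format! and explicit \r\n endings might look like the sketch below. The script body is hypothetical (the real template is not shown in this thread); only the line-ending technique is the point.

```rust
use std::path::Path;

/// Builds a hypothetical `.bat` shim that forwards all arguments to `exe`.
/// The script text is illustrative; the explicit `\r\n` endings are what matter.
fn batch_shim(exe: &Path) -> String {
    let exe_display = exe.display();
    // Each line is terminated explicitly, so no line consisting only of a
    // closing quote can appear, as can happen with a multi-line raw string.
    format!("@echo off\r\n\"{exe_display}\" %*\r\n")
}

fn main() {
    let script = batch_shim(Path::new(r"C:\tools\codex.exe"));
    assert!(script.lines().all(|line| line != "\""));
    print!("{script}");
}
```

For the second bullet, a test assertion can compare against path.display().to_string() instead of a literal containing `/`, so the expectation holds on Windows as well.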
@HaleTom HaleTom requested a review from Copilot April 27, 2026 16:23

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.



Comment thread codex-rs/arg0/src/lib.rs
The ContextIoError wrapper clears raw_os_error() on the outer error
since io::Error::new does not preserve it. Document this and add a
unit test verifying the OS error code is still retrievable via the
source chain.
@HaleTom HaleTom requested a review from Copilot April 27, 2026 16:37

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.



Comment thread codex-rs/arg0/src/lib.rs Outdated
and_downcast is not a standard method; replace with source-chain
walk using downcast_ref.
@HaleTom HaleTom requested a review from Copilot April 27, 2026 16:49
Replaces the ugly inline downcast pattern in doc comments with a
reusable helper, and updates the test to use it.

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated no new comments.



Replace raw std::fs calls with fs-err in codex-arg0 for ordinary filesystem
operations, keeping with_path_context only for lock-acquire semantics.

Changes:
- Add fs-err as workspace and arg0 crate dependency
- Replace std::fs::{create_dir_all,read_dir,remove_dir_all,set_permissions,write}
  with fs_err equivalents for richer error messages
- Replace std::os::unix::fs::symlink with fs_err::os::unix::fs::symlink
- Add try_lock_arg0_dir helper: treats WouldBlock as AlreadyExists with
  path+operation context; janitor uses try_lock_dir which treats WouldBlock
  as Ok(None) for expected cleanup races
- Arg0PathEntryGuard._lock_file changed from std::fs::File to fs_err::File

Based on suggestion from joshka to adopt fs-err and add disallowed-methods
clippy lint to prevent raw std::fs usage going forward.
@HaleTom HaleTom requested a review from Copilot April 28, 2026 20:00
- PermissionsExt::from_mode(0o700): use UFCS with std::fs::Permissions
  (removes unused PermissionsExt import)
- doc: remove deep_raw_os_error references (cfg(test) helper not
  available in production); describe source-chain walk explicitly
- reuse exe instead of calling current_exe() again
- WouldBlock: remove path from inner msg to avoid path duplication
  with outer with_path_context wrapper

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.



Comment thread codex-rs/arg0/src/lib.rs Outdated
Comment thread codex-rs/arg0/src/lib.rs Outdated
@HaleTom HaleTom marked this pull request as draft April 29, 2026 03:22
- Fix PermissionsExt import: add use std::os::unix::fs::PermissionsExt
  inside the #[cfg(unix)] block where from_mode is called.
- Remove #[cfg(unix)] from TempDir import (used on all platforms).
- Update doc comments on with_path_context/with_context: remove
  reference to cfg(test)-gated deep_raw_os_error, describe the
  source-chain walk pattern directly.
- Change try_lock_arg0_dir WouldBlock handling: use
  io::Error::from(WouldBlock) instead of AlreadyExists kind to
  preserve WouldBlock semantics (reviewed: callers expect WouldBlock
  for lock contention, not AlreadyExists).
- Remove duplicate lock path from inner WouldBlock error message
  (outer with_path_context already adds path context).
- Fix duplicate current_exe call: reuse exe for codex_self_exe.
Owner Author

@HaleTom HaleTom left a comment


Addressing all remaining threads in this round:

d_r3156857572 / d_r3158380827 (PermissionsExt): Added use std::os::unix::fs::PermissionsExt; inside the #[cfg(unix)] block where from_mode is called.

d_r3156857593 / d_r3158346764 / d_r3158346768 (deep_raw_os_error in docs): Updated both with_path_context and with_context doc comments — removed the reference to the #[cfg(test)]-gated helper and replaced with a description of the source-chain walk pattern directly (no function name, actionable in production code).

d_r3156857607 (WouldBlock → AlreadyExists): Changed to std::io::Error::from(std::fs::TryLockError::WouldBlock) preserving the WouldBlock kind as you suggested. The inner error message no longer includes "lock is already held" text (the outer with_path_context already adds "acquire lock" context).

d_r3158380839 (TempDir #[cfg(unix)]): Removed #[cfg(unix)] from use tempfile::TempDir; (TempDir is used unconditionally on all platforms).

d_r3158346778 (duplicate current_exe syscall): The std::env::current_exe() at line 196 is in arg0_dispatch_or_else and the exe computed at line 415 is in prepend_path_entry_for_codex_aliases — these are in different functions. The current_exe passed to run_main_with_arg0_guard serves arg0_dispatch_or_else's own call to linux_sandbox_exe_path, not the arg0 path setup. Not a duplicate.

Committed as dd0d119, pushed.


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.



Comment thread codex-rs/arg0/src/lib.rs Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated no new comments.



HaleTom added 2 commits April 29, 2026 13:42
- Create clippy.toml with disallowed-methods list to enforce
  #![deny(clippy::disallowed_methods)]
- Replace with_path_context with with_context on fs_err calls
  to eliminate redundant double-path wrapping
- Collapse redundant try_lock_arg0_dir Err arms
- Thread current_exe() through prepend_path_entry_for_codex_aliases
  to avoid redundant syscall in arg0_dispatch_or_else
- Restore security comment for set_permissions(0o700)
- Add doc comment noting with_path_context is for non-fs_err paths
- Add #[allow] on test module for legitimate std::fs exceptions

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.



Comment thread codex-rs/arg0/src/lib.rs Outdated
Comment thread codex-rs/arg0/clippy.toml
The crate-level comment claimed fs_err has no create_dir (only
create_dir_all), but fs_err::create_dir does exist. Remove the
incorrect claim; both the test-only exception and the clippy.toml
reason are now consistent with the actual fs_err API.
@HaleTom
Owner Author

HaleTom commented Apr 29, 2026

Production-ready bar for this PR

  • All std::fs call sites in codex-arg0 are replaced with fs_err equivalents, and a clippy::disallowed_methods deny list prevents regressions.
  • Error context is enriched with operation and path information via with_context / with_path_context, preserving the io::Error kind and source chain.
  • The current_exe() syscall is hoisted out of the loop and plumbed through, eliminating N-1 redundant syscalls.
  • deep_raw_os_error is test-only (#[cfg(test)]) and correctly walks the ContextIoError-wrapped source chain.
  • Clippy deny is scoped to codex-arg0/clippy.toml, not workspace-wide, avoiding breakage of the ~20 other crates.
  • std::fs::Permissions and std::fs::TryLockError remain as std references (type constructor and lock error enum, not filesystem calls) with inline #[allow] where used.
  • The fs_err dependency is properly declared in both the workspace Cargo.toml and arg0/Cargo.toml, and MODULE.bazel.lock is updated.
  • All 8 tests pass, including 3 new tests for the error context helpers.
  • No functional behavior change — same filesystem operations, same lock semantics, same janitor logic — just better error messages.

Findings

1. Correctness & functional completeness

No issues found in this area based on the diff and reviewed context.

The std::fs → fs_err migration covers all production call sites. The current_exe() hoist is correctly plumbed: the function parameter exe: Option<&Path> is used when provided, falling back to std::env::current_exe() only when None. The codex_self_exe field is now Some(exe) instead of std::env::current_exe().ok() (which could fail silently), which is a functional improvement.

The try_lock_arg0_dir extraction is semantically equivalent to the inlined code, with the addition of with_path_context error enrichment. The WouldBlock → real error behavior is preserved.

[NON-BLOCKING] deep_raw_os_error walk could skip ContextIoError early for clarity

  • Type: Plausible risk
  • Evidence: codex-rs/arg0/src/lib.rs:310-325
  • Why it matters: The walk does downcast_ref::<io::Error>() at every level, which will fail on ContextIoError intermediate nodes. It still works because source = s.source() advances past them. However, a future reader might wonder why the ContextIoError intermediate node isn't handled or documented. If someone adds another wrapper type, the walk could silently skip levels.
  • Recommendation: Consider adding a brief comment inside the while loop noting that ContextIoError (and any other non-io::Error wrapper) is transparently skipped via source = s.source(). Not blocking because the current behavior is correct.
  • Confidence: Low

2. Architecture & boundary integrity

No issues found in this area based on the diff and reviewed context.

The clippy::disallowed_methods deny is correctly scoped to codex-arg0/clippy.toml rather than a workspace-wide clippy.toml, matching the project memory note about ~1041 raw std::fs sites across 147 files needing migration first. The ContextIoError and helpers are private to the module, which is appropriate for a single-crate migration.

3. Code clarity, clean code & maintainability

[NIT] with_path_context duplicates path info that fs_err already provides

  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:329-348 (doc comment on with_path_context acknowledges this)
  • Why it matters: The doc comment correctly documents the distinction, but with_path_context exists only for try_lock_arg0_dir (one call site). For fs_err calls that already include the path, with_context is used instead. This is correctly documented and the separation is well-motivated, but it's worth noting that with_path_context is currently only used once.
  • Recommendation: No action needed — the doc comment is clear about the distinction, and the function exists for semantic completeness even with a single call site.
  • Confidence: High

4. Comments & code documentation

No issues found in this area based on the diff and reviewed context.

The raw_os_error() caveats on both with_path_context and with_context are thoroughly documented. The try_lock_arg0_dir doc comment clearly explains the semantic distinction from try_lock_dir. The clippy exceptions comment at the top of the file is precise about each allowed category.

5. Tests & validation

No issues found in this area based on the diff and reviewed context.

All 8 tests pass (verified). The 3 new tests cover:

  • with_path_context_includes_operation_path_and_source: verifies operation, path, original message, and source chain preservation.
  • deep_raw_os_error_retrieves_original_os_code: verifies OS error code retrieval through the wrapped chain, and that raw_os_error() is None on the outer error.
  • with_context_includes_custom_message_and_source: verifies context string, original message, and source chain.

These tests would fail if the wiring were missing (e.g., if ContextIoError.source() returned None or if Error::new dropped the kind). Clippy and cargo check pass cleanly.

6. Performance

No issues found in this area based on the diff and reviewed context.

The current_exe() hoist from inside the loop to a single call is a genuine improvement. On a system with 3-4 symlinks in the loop, this eliminates 2-3 redundant readlink("/proc/self/exe") syscalls. The fs_err wrapper is zero-cost (it inlines to the same std::fs call + path formatting only on error).

7. Operational risk

[NON-BLOCKING] just bazel-lock-check / just bazel-lock-update not verified

  • Type: Unverified concern
  • Evidence: MODULE.bazel.lock is updated in the diff; AGENTS.md requires running just bazel-lock-check after Cargo.lock changes.
  • Why it matters: The just command is not available in this environment (/bin/bash: line 1: just: command not found). The MODULE.bazel.lock was updated, but consistency against the actual Bazel build was not verified locally. CI would catch drift.
  • Recommendation: CI will validate this. Not blocking because the lockfile update is present and follows the documented process.
  • Confidence: Low

8. Adversarial review

[NON-BLOCKING] raw_os_error() returns None on wrapped errors — downstream consumers may break

  • Type: Plausible risk
  • Evidence: codex-rs/arg0/src/lib.rs:334-338, codex-rs/arg0/src/lib.rs:352-355
  • Why it matters: Any downstream code that calls raw_os_error() on errors returned from prepend_path_entry_for_codex_aliases or janitor_cleanup will now get None instead of the original OS code. The doc comments document this, and deep_raw_os_error provides the recovery path, but only under #[cfg(test)]. In practice, callers currently match on err.kind() (e.g., ErrorKind::NotFound in janitor_cleanup), which is preserved. The eprintln! warning in arg0_dispatch only uses Display, which is also fine. The risk is low because raw_os_error() was already unreliable on io::Error::new-constructed errors.
  • Recommendation: If any production code needs raw_os_error() from these wrapped errors, deep_raw_os_error (or equivalent) needs to be promoted out of #[cfg(test)]. Currently, no such code exists.
  • Confidence: Medium

[NIT] exe parameter shadowing in Windows batch script block

  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:454
  • Why it matters: In the Windows block, the outer exe (a PathBuf) is shadowed by let exe = exe.display(). This is intentional (formatting convenience) and the pattern existed in the pre-change code too. Not introduced by this PR.
  • Recommendation: No action needed.
  • Confidence: High

What I could not fully verify

  1. Bazel build / just bazel-lock-check: just is not available in this environment. The MODULE.bazel.lock update is present and follows the documented process, but Bazel consistency was not validated locally.
  2. Windows path correctness: The batch script template was changed from multi-line r#"..."# to single-line format!. The \r\n line endings are present, but actual Windows execution was not tested.
  3. Cross-crate raw_os_error() consumers: No exhaustive search was performed for code outside codex-arg0 that calls raw_os_error() on errors originating from this crate's public functions. The io::Error return types are unchanged, so external callers are unaffected unless they relied on raw_os_error() — which was already unreliable for Error::new-constructed errors.

Final verdict

Ready to merge — no blocking issues.

The change is a clean, well-scoped std::fs → fs_err migration with appropriate error context enrichment, a redundant-syscall elimination, and a correctly scoped clippy deny list. Tests are meaningful and pass. Documentation of the raw_os_error() caveat is thorough. The two non-blocking items (potential future raw_os_error consumer, just-less Bazel verification) are low-risk and CI-addressable respectively.

- Promote deep_raw_os_error from #[cfg(test)] to always-available
  utility with #[allow(dead_code)] since no production caller exists yet.
  This makes the recovery path for raw_os_error() available to future
  consumers of wrapped errors.
- Add doc comment explaining that ContextIoError wrapper nodes are
  transparently skipped via source() during the chain walk.
- Update with_path_context and with_context doc comments to reference
  deep_raw_os_error as the recovery path instead of manual chain walking.
- Rename shadowed `exe` variable to `exe_display` in Windows batch
  script block to avoid shadowing the outer PathBuf.
Owner Author

@HaleTom HaleTom left a comment


Requesting automated review

@HaleTom
Owner Author

HaleTom commented Apr 29, 2026

Bazel lockfile verification

Running just bazel-lock-check and just bazel-lock-update via mise exec:

just bazel-lock-check (via mise exec just@1.50.0 -- just bazel-lock-check)

Since the just recipe invokes a sub-shell script that calls bazel directly, I ran the underlying command with bazel on PATH:

$ mise exec bazel@9.1.0 -- bazel mod deps --lockfile_mode=error
<root> (codex@_)

Exit code 0 — lockfile is consistent.

just bazel-lock-update (via mise exec bazel@9.1.0 -- bazel mod deps)

After running the update, diff against the committed lockfile:

$ cp MODULE.bazel.lock MODULE.bazel.lock.before
$ mise exec bazel@9.1.0 -- bazel mod deps
<root> (codex@_)
$ diff MODULE.bazel.lock.before MODULE.bazel.lock
$ echo $?
0

Zero diff — the committed MODULE.bazel.lock is already up to date and consistent with the current Cargo.lock.

This resolves the "unverified" item from the review about Bazel lockfile consistency.

@HaleTom
Owner Author

HaleTom commented Apr 29, 2026

Audit: raw_os_error() downstream consumers

Result: No downstream raw_os_error() consumers affected.

The two public functions that now return ContextIoError-wrapped errors are:

  1. prepend_path_entry_for_codex_aliases (codex-rs/arg0/src/lib.rs:385)
  2. janitor_cleanup (codex-rs/arg0/src/lib.rs:517)

Both are only called within codex-arg0/src/lib.rs itself:

  • prepend_path_entry_for_codex_aliases called at line 150 — uses eprintln!("{err}") (Display only, no raw_os_error())
  • janitor_cleanup called at line 415 — uses eprintln!("{err}") (Display only, no raw_os_error())

All 18 raw_os_error() call sites across the repo are in other crates (cli, windows-sandbox-rs, linux-sandbox, core, utils/pty, shell-escalation) and operate on errors they construct themselves (via from_raw_os_error or last_os_error) — never on errors propagated from codex-arg0.

The NON-BLOCKING finding is confirmed safe: no production code currently needs raw_os_error() from these wrapped errors. deep_raw_os_error is promoted out of #[cfg(test)] with #[allow(dead_code)] as a safety net for any future consumer.

@HaleTom HaleTom marked this pull request as ready for review April 29, 2026 16:16
@HaleTom
Owner Author

HaleTom commented Apr 29, 2026

Related upstream issues

All of these exhibit the same stale ~/.codex/tmp/arg0/codex-arg0*/ path pattern that openai#17570 fixes. Our fs_err migration is orthogonal — it improves diagnosability of arg0 errors but does not prevent the deletion that causes ENOENT downstream.

| Issue | Title | Related? | Reason |
| --- | --- | --- | --- |
| #16970 | Unified exec caches stale ~/.codex/tmp/arg0 session path | Directly | Canonical WSL2 stale-arg0-path bug. Contains the exact Failed to create unified exec process: Unable to spawn .../codex-linux-sandbox ... ENOENT error. Explicitly fixed by openai#17570. |
| #16791 | apply_patch fails with No such file or directory on existing files | Directly | apply_patch ENOENT on existing files, the same stale arg0 shim symptom. Referenced by openai#17570 PR body. |
| #17778 | functions.apply_patch fails with ENOENT after patch approval | Directly | Same pattern: apply_patch via arg0 shim fails while direct shell invocation works. Path: ~/.codex/tmp/arg0/codex-arg0ptO3zX/apply_patch. |
| #17517 | Apply Patch Bug | Directly | Identical pattern: MCP apply_patch ENOENT while shell apply_patch works. Path: ~/.codex/tmp/arg0/codex-arg0wFKksz/apply_patch. |
| #17240 | Codex tool runtime intermittently loses apply_patch path | Directly | Same root cause: which apply_patch returns different ~/.codex/tmp/arg0/codex-arg0*/apply_patch paths within the same session. Also shows the Failed to create unified exec process error. |
| #4754 | No such file or directory (OS error 2) | Weakly | Closed as not-planned. Windows ENOENT but no arg0 path mentioned and predates the arg0 system. Too vague to confirm. |

How our PR relates

The with_path_context / with_context enrichment in janitor_cleanup makes arg0 error messages more diagnosable (e.g., read directory /home/user/.codex/tmp/arg0: No such file or directory instead of bare No such file or directory). But it does not prevent the underlying janitor-from-deleting-live-dirs bug that causes the ENOENT errors in those issues — that's what openai#17570's .pid sentinel fixes.

When openai#17570 merges, its new std::fs::write and std::fs::read_to_string calls in write_owner_pid / dir_belongs_to_running_process would need the same fs_err migration treatment if rebasing on top of our PR.

@HaleTom
Owner Author

HaleTom commented May 19, 2026

Production Readiness Review: enhance-arg0-error-context

PR: #1 (branch: enhance-arg0-error-context)
Reviewed: 2026-05-19T17:32:10+07:00
Scope: 5 files changed, +269 / -40 (codex-arg0 crate only)

Context files read

  • codex-rs/arg0/src/lib.rs:1-781 — full current file (main changed file)
  • /tmp/upstream-arg0-lib.rs:1-585 — upstream version for diff comparison
  • codex-rs/arg0/clippy.toml:1-22 — new file, disallowed-methods lint config
  • codex-rs/arg0/Cargo.toml:1-26 — added fs-err dependency
  • codex-rs/Cargo.toml:254-261 — workspace-level fs-err dependency
  • codex-rs/Cargo.lock — fs-err 3.3.0 added, transitive dep changes
  • GitHub issue openai/codex#19674, "arg0 error messages lack path and operation context, making failures hard to diagnose" (all 8 comments): design intent and scope discussion

Verification evidence

  • cargo check -p codex-arg0: passed (1m 29s)
  • cargo clippy -p codex-arg0: passed (0 warnings, 1m 09s)
  • cargo test -p codex-arg0: 8/8 passed (0.00s)

Production-ready bar for this PR

  1. All std::fs calls in production code replaced with fs_err equivalents — errors must include the file path
  2. Lock-acquire failures include the lock file path and "acquire lock" operation context
  3. ContextIoError preserves the Error::source() chain so anyhow and downstream error walkers see the original OS error
  4. deep_raw_os_error correctly walks the source chain to recover raw_os_error() lost by io::Error::new
  5. clippy::disallowed_methods lint prevents future std::fs regression in this crate
  6. No duplicate path information in error messages — with_context for fs_err calls (path already in message), with_path_context only for non-fs_err sites
  7. Public API change (prepend_path_entry_for_codex_aliases now takes exe: Option<&Path>) is intentional and the single caller is updated
  8. current_exe() hoisted out of the per-filename loop — eliminates redundant syscalls
  9. Tests cover all new helper functions and the source-chain walking behavior
  10. No blocking clippy warnings, no new unsafe code introduced

Findings

1. Correctness & functional completeness

Finding 1.1 — with_path_context doc comment is misleading about with_context usage

  • Classification: NIT
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:334-339
  • Confidence: High

The doc says:

For fs_err calls that already include the path in their error message, prefer [with_context] instead to avoid duplicating the path in the output.

This reads as if with_context is for wrapping errors returned by fs_err calls. But with_context is also used for fs_err calls themselves (e.g., line 403: fs::create_dir_all(&codex_home).map_err(|e| with_context(e, "create CODEX_HOME directory"))). The sentence is technically correct but confusing. Recommend rewording to:

For errors from fs_err operations (which already include the path), prefer [with_context] to add operation context without duplicating the path.

Finding 1.2 — Source chain is correctly preserved

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:287-303 (ContextIoError), codex-rs/arg0/src/lib.rs:299-302 (Error impl)
  • Confidence: High

ContextIoError implements Error::source() returning Some(&self.source), which means anyhow and other error walkers will traverse: ContextIoError → inner io::Error → original OS error. The deep_raw_os_error function (line 314) correctly walks this chain via downcast_ref::<io::Error>(). Test at line 739 verifies this.

Finding 1.3 — try_lock_arg0_dir error conversion is correct

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:571-580
  • Confidence: High

std::io::Error::from(TryLockError) correctly converts TryLockError::WouldBlock to io::Error with ErrorKind::WouldBlock, and TryLockError::Io(e) extracts the inner error. The with_path_context wrapper then adds the lock path and "acquire lock" operation. Error messages will read: failed to acquire lock '/path/.lock': Resource temporarily unavailable.

Finding 1.4 — current_exe hoisting eliminates redundant syscall

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:435-440 (single call before loop), vs upstream line 339 (current_exe() inside loop)
  • Confidence: High

Upstream called std::env::current_exe() inside the per-filename loop (once per alias). The PR hoists it to a single call before the loop. This is correct — the executable path doesn't change during the function.
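The hoisting can be sketched as follows. The function name plan_alias_links and the idea of returning (link, target) pairs are hypothetical and only serve the illustration; the real function creates the symlinks or batch scripts directly.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Builds (link, target) pairs for each alias, resolving the executable once.
/// Hypothetical helper for illustration; not part of codex-arg0.
fn plan_alias_links(dir: &Path, aliases: &[&str]) -> io::Result<Vec<(PathBuf, PathBuf)>> {
    // Hoisted: one lookup before the loop instead of one per alias.
    let exe = std::env::current_exe()?;
    Ok(aliases
        .iter()
        .map(|name| (dir.join(name), exe.clone()))
        .collect())
}
```

Every alias ends up pointing at the same resolved path, which is the property that makes the hoist safe.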

2. Architecture & boundary integrity

Finding 2.1 — fs_err migration is correctly scoped

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/Cargo.toml:24 (fs-err dependency), codex-rs/arg0/src/lib.rs:16-19 (imports)
  • Confidence: High

fs_err is imported as use fs_err as fs and use fs_err::File, replacing all std::fs usage in production code. The clippy.toml (22 lines, 19 methods) prevents regression. Test code uses #[allow(clippy::disallowed_methods)] at the module level, which is correct — tests legitimately use std::fs for setup.

Finding 2.2 — Public API change is intentional

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:385-387 (new signature), codex-rs/arg0/src/lib.rs:149-150 (caller updated)
  • Confidence: High

prepend_path_entry_for_codex_aliases changed from fn() to fn(exe: Option<&Path>). The single internal caller (arg0_dispatch) is updated. The arg0_dispatch_or_else function (lines 197-200) also reuses the guard's exe when available, avoiding a redundant syscall.

Finding 2.3 — Two lock functions with distinct semantics

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:546-563 (try_lock_dir for janitor), codex-rs/arg0/src/lib.rs:565-580 (try_lock_arg0_dir for startup)
  • Confidence: High

try_lock_dir (janitor) treats WouldBlock as Ok(None) — expected during cleanup. try_lock_arg0_dir (startup) treats WouldBlock as a real error — unexpected during normal startup. This separation is clean and correct.
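A sketch of the two policies, under stated assumptions: the local TryLockError enum is a stand-in that mirrors the shape of std::fs::TryLockError so the example also compiles on toolchains without the std file-locking API, and janitor_outcome / startup_outcome are hypothetical names that take an already computed lock result instead of opening a lock file.

```rust
use std::io;

/// Stand-in mirroring the shape of `std::fs::TryLockError` (assumption for
/// portability; the real code uses the std type).
#[derive(Debug)]
enum TryLockError {
    Error(io::Error),
    WouldBlock,
}

impl From<TryLockError> for io::Error {
    fn from(err: TryLockError) -> io::Error {
        match err {
            TryLockError::Error(e) => e,
            TryLockError::WouldBlock => io::ErrorKind::WouldBlock.into(),
        }
    }
}

/// Janitor policy: a held lock is expected, so report "nothing to clean up".
fn janitor_outcome(result: Result<(), TryLockError>) -> io::Result<Option<()>> {
    match result {
        Ok(()) => Ok(Some(())),
        Err(TryLockError::WouldBlock) => Ok(None),
        Err(err) => Err(err.into()),
    }
}

/// Startup policy: a held lock is unexpected, so surface it as an error.
fn startup_outcome(result: Result<(), TryLockError>) -> io::Result<()> {
    result.map_err(io::Error::from)
}
```

The same WouldBlock input yields Ok(None) under the janitor policy and an io::Error with ErrorKind::WouldBlock under the startup policy.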

3. Code clarity, clean code & maintainability

Finding 3.1 — ContextIoError is minimal and correct

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:287-303
  • Confidence: High

The struct has exactly two fields (context: String, source: io::Error), implements Display as "{context}: {source}", and implements Error::source() correctly. No unnecessary complexity.

Finding 3.2 — Error context strings are descriptive

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:388-464 (all .map_err calls)
  • Confidence: High

Every map_err call uses a distinct, descriptive context string:

  • "resolve CODEX_HOME" (line 388)
  • "create CODEX_HOME directory" (line 403)
  • "create temp directory" (line 405)
  • "set permissions on temp directory" (line 411)
  • "create temp directory for arg0" (line 422)
  • "open lock file" (line 432)
  • "get current executable path" (line 438)
  • "create symlink" (line 453)
  • "write batch script" (line 464)

Each uniquely identifies the failing operation.

4. Comments & code documentation

Finding 4.1 — with_path_context doc comment wording (duplicate of 1.1)

  • Classification: NIT
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:334-339
  • Confidence: High

Same as Finding 1.1. The sentence "prefer [with_context] instead" is ambiguous about whether it means "for errors returned by fs_err" or "for fs_err calls themselves." Recommend clarifying.

Finding 4.2 — deep_raw_os_error doc comment is clear

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:305-312
  • Confidence: High

The doc clearly explains why io::Error::new clears raw_os_error(), how wrapper types are skipped, and how downcast_ref is used. The #[allow(dead_code)] annotation (line 313) is justified — the function is exercised by tests (line 739) and will be used by callers that need the raw OS code.
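For reference, one way such a chain walk could look. This is a sketch, not the PR's implementation: the review describes a while loop with downcast_ref, and this version additionally recurses when it meets a nested io::Error. The Wrapper type is a hypothetical stand-in for ContextIoError.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for a context wrapper such as ContextIoError.
#[derive(Debug)]
struct Wrapper(io::Error);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wrapped: {}", self.0)
    }
}

impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Returns the first raw OS error code reachable from `err`.
fn deep_raw_os_error(err: &io::Error) -> Option<i32> {
    if let Some(code) = err.raw_os_error() {
        return Some(code);
    }
    // Start at the custom payload (if any) and follow `source()` links.
    let mut current: Option<&(dyn Error + 'static)> =
        err.get_ref().map(|e| e as &(dyn Error + 'static));
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            // Recurse so that nested io::Error payloads are inspected too.
            return deep_raw_os_error(io_err);
        }
        // Not an io::Error: skip this wrapper and keep walking.
        current = e.source();
    }
    None
}
```

An error built with io::Error::new(kind, Wrapper(os_error)) reports None from raw_os_error(), and the walk above recovers the original code.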

Finding 4.3 — clippy.toml comments explain exceptions

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:2-6 (inline comments at top)
  • Confidence: High

The crate-level #![deny(clippy::disallowed_methods)] has inline comments explaining the three exception categories: TryLockError (no fs_err lock API), Permissions (type constructor), and test-only usage.

5. Tests & validation

Finding 5.1 — New tests cover all helper functions

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:713-780
  • Confidence: High

Three new tests:

  1. with_path_context_includes_operation_path_and_source (line 714) — verifies operation, path, original message, and source chain
  2. deep_raw_os_error_retrieves_original_os_code (line 739) — verifies raw_os_error() is None on wrapper, deep_raw_os_error recovers it
  3. with_context_includes_custom_message_and_source (line 756) — verifies context string, original message, and source chain

Finding 5.2 — Existing tests pass unchanged

  • Classification: PASS
  • Type: Verified issue
  • Evidence: cargo test -p codex-arg0 → 8/8 passed
  • Confidence: High

All 5 existing tests pass: linux_sandbox_exe_path_prefers_codex_linux_sandbox_alias, run_main_with_arg0_guard_keeps_aliases_alive_until_main_returns, janitor_skips_dirs_without_lock_file, janitor_skips_dirs_with_held_lock, janitor_removes_dirs_with_unlocked_lock.

Finding 5.3 — No test for with_context wrapping of fs_err errors

  • Classification: NIT
  • Type: Plausible risk
  • Evidence: No test exercises with_context(e, "...") where e came from an actual fs_err call
  • Confidence: Low

The tests use synthetic io::Error::new(...) values. A test that creates a real fs_err error (e.g., fs_err::read_to_string("/nonexistent")) and wraps it with with_context would verify the combined output format. This is low risk since the wrapping is straightforward, but would add confidence about the "no duplicate path" property.

6. Performance

Finding 6.1 — current_exe hoisting saves syscalls

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:435-440 vs upstream line 339
  • Confidence: High

Upstream called current_exe() once per filename in the loop (4 times on Linux, 2 on Windows). The PR calls it once. Each current_exe() is a syscall (readlink /proc/self/exe on Linux). This is a minor but correct optimization.

Finding 6.2 — ContextIoError overhead is on error paths only

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:345-354 (with_path_context), codex-rs/arg0/src/lib.rs:361-369 (with_context)
  • Confidence: High

The ContextIoError wrapper allocates a String for the context, but only on the error path. Happy path has zero overhead from the context wrappers.
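The error-path-only cost follows from Result::map_err running its closure only for Err values. A small sketch that makes this observable; add_context, the counter, and the "read value" message are hypothetical, and the wrapping is simplified to a formatted string rather than a source-preserving wrapper.

```rust
use std::cell::Cell;
use std::io;

/// Adds context to a result; `wrap_calls` counts how often the closure runs.
fn add_context(result: io::Result<u32>, wrap_calls: &Cell<u32>) -> io::Result<u32> {
    result.map_err(|err| {
        wrap_calls.set(wrap_calls.get() + 1);
        // The String for the context message is allocated only here.
        io::Error::new(err.kind(), format!("read value: {err}"))
    })
}
```

Passing an Ok value leaves the counter untouched, so no context string is built on the happy path.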

7. Operational risk

Finding 7.1 — fs_err migration is contained to codex-arg0

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/clippy.toml (prevents regression), codex-rs/arg0/Cargo.toml (dependency)
  • Confidence: High

The fs_err dependency and clippy lint are scoped to codex-arg0 only. Other crates are unaffected. The issue discussion confirms this is intentional — workspace-wide migration is deferred to follow-up PRs.

Finding 7.2 — Error messages now include paths (privacy consideration)

  • Classification: NON-BLOCKING
  • Type: Plausible risk
  • Evidence: All .map_err calls in prepend_path_entry_for_codex_aliases
  • Confidence: Medium

Error messages now include filesystem paths (e.g., CODEX_HOME path, temp directory path). The issue discussion (comment #4362753988) acknowledges this: "In the arg0 case, the paths are local temp/cache paths already implicated in the failure, so including them seems like a clear win." This is acceptable for CLI error output but worth noting — these paths would appear in eprintln! output visible to the user.

Finding 7.3 — #![deny(clippy::disallowed_methods)] is crate-level

  • Classification: PASS
  • Type: Verified issue
  • Evidence: codex-rs/arg0/src/lib.rs:1
  • Confidence: High

The deny is at crate level, enforced by clippy.toml. Test code uses #[allow(clippy::disallowed_methods)] at the module level. This is the correct pattern — production code is linted, test code is exempt.

8. Adversarial review

Adversarial 1 — Could ContextIoError break error downcasting?

Attack: If downstream code does err.downcast_ref::<io::Error>() on the wrapped error, it would get None because the outer type is ContextIoError, not io::Error.

Assessment: Low risk. The crate returns std::io::Result<...>, and the callers use ? or anyhow. anyhow walks the source chain, so the original io::Error is reachable. The raw_os_error() loss is documented and deep_raw_os_error provides recovery. No code in this crate downcasts the returned errors.

Adversarial 2 — Could the exe parameter to prepend_path_entry_for_codex_aliases be None when it shouldn't be?

Attack: If a caller passes None and current_exe() fails, the function returns an error.

Assessment: Correct behavior. The caller (arg0_dispatch) does let exe = std::env::current_exe().ok() and passes exe.as_deref(). If current_exe() fails, exe is None, and the function falls back to calling current_exe() again (line 437-438). If that also fails, the error is propagated with context "get current executable path". This is the correct fallback chain.
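The fallback chain can be sketched like this. resolve_exe and resolve_exe_with are hypothetical names, the lookup is injectable only so the failing branch can be exercised, and the with_context wrapping that the PR applies to the fallback error is omitted.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Uses the caller-supplied path when present, otherwise runs `lookup`.
fn resolve_exe_with(
    exe: Option<&Path>,
    lookup: impl FnOnce() -> io::Result<PathBuf>,
) -> io::Result<PathBuf> {
    match exe {
        Some(path) => Ok(path.to_path_buf()),
        // Fallback: the caller's own lookup failed (or was skipped), so retry here.
        None => lookup(),
    }
}

/// Same fallback wired to the real `std::env::current_exe`.
fn resolve_exe(exe: Option<&Path>) -> io::Result<PathBuf> {
    resolve_exe_with(exe, std::env::current_exe)
}
```

When a path is supplied the lookup never runs; when it is None, any lookup error is returned to the caller unchanged.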

Adversarial 3 — Could deep_raw_os_error infinite loop?

Attack: If Error::source() returns a cycle, the while loop would never terminate.

Assessment: Not possible. ContextIoError::source() always returns the inner io::Error, and io::Error::source() returns None for most variants (or a finite chain for custom errors). The standard library Error trait does not support cycles.

Adversarial 4 — Is the with_context call on find_codex_home() correct?

Attack: If find_codex_home() returned Result<PathBuf, anyhow::Error> rather than io::Result<PathBuf>, would .map_err(|e| with_context(e, ...)) compile?

Assessment: It would not. with_context takes a std::io::Error, and anyhow::Error does not implement Into<std::io::Error>. The call at line 388 is:

let codex_home = find_codex_home().map_err(|e| with_context(e, "resolve CODEX_HOME"))?;

The code compiles and passes cargo check, so find_codex_home (from codex-utils-home-dir) must return io::Result<PathBuf>. This is consistent with the upstream code, which uses ? directly (line 285: let codex_home = find_codex_home()?;) where the function returns io::Result.

Verdict: Not an issue. find_codex_home returns io::Result<PathBuf>. The with_context call is type-correct.


What I could not fully verify

  1. Workspace-wide impact: Could not verify that no other crate in the workspace depends on the old signature of prepend_path_entry_for_codex_aliases() (the one without the exe parameter). The git diff shows only codex-rs/arg0/src/lib.rs was changed, but callers in other crates (e.g., codex-core, codex-tui) would need to be updated if they call this function. Mitigation: The diff only touches codex-rs/arg0/ files, and cargo check passed, which means either (a) no other crate calls this function, or (b) the callers are in deleted files (this fork removed the TUI and other crates). In the upstream context, the caller is arg0_dispatch_or_else which is in the same file.

  2. Cross-platform behavior: The Windows batch script change (lines 458-464) uses \r\n line endings and quotes the exe path. Could not verify that this works correctly on Windows with paths containing spaces. The upstream used \n without \r; the PR's \r\n is the conventional line ending for Windows batch files.

  3. fs_err lock API: Verified that fs_err does not provide lock APIs (the try_lock method comes from fs2/fs4 which fs_err::File re-exports). The try_lock_arg0_dir function correctly handles the TryLockError type.


Final verdict

✅ Ready to merge — no blocking issues.

The PR achieves its goal: all filesystem errors in codex-arg0 now include the path and operation context. The fs_err migration is clean, the clippy.toml prevents regression, and the custom ContextIoError wrapper correctly preserves the error source chain. The scope is appropriately limited to codex-arg0, with workspace-wide migration deferred to follow-up PRs (as discussed in the issue).

3 nits (non-blocking):

  1. with_path_context doc comment wording could be clearer about with_context usage (Finding 1.1 / 4.1)
  2. No test exercises with_context wrapping a real fs_err error (Finding 5.3)
  3. Privacy note: error messages now include filesystem paths (Finding 7.2 — acceptable per issue discussion)

- Reword with_path_context doc to clarify that with_context is for errors
  from fs_err operations (which already include the path), not for wrapping
  errors before passing to fs_err
- Add test exercising with_context wrapping a real fs_err error to verify
  path is preserved without duplication
@HaleTom
Owner Author

HaleTom commented May 20, 2026

Workspace-wide impact verification

Claim from review: Could not verify that no other crate depends on the old signature of prepend_path_entry_for_codex_aliases().

Verification result: Non-issue — confirmed no other callers exist.

  1. Only one .rs file references prepend_path_entry_for_codex_aliases across the entire workspace — codex-rs/arg0/src/lib.rs (the declaration and sole call site both live here).

  2. Call sites:

    • codex-rs/arg0/src/lib.rs:385 — declaration: pub fn prepend_path_entry_for_codex_aliases(exe: Option<&Path>)
    • codex-rs/arg0/src/lib.rs:150 — only call: match prepend_path_entry_for_codex_aliases(exe.as_deref())
    • codex-rs/arg0/src/lib.rs:196 — a comment only
  3. Zero references in codex-core, codex-tui, or any other workspace crate — confirmed via rg -l for the symbol name across all .rs files in codex-rs/.

  4. cargo check passing was already sufficient evidence; this search confirms that the signature change has no impact outside codex-arg0. The function is declared pub, but no other workspace crate currently calls it.
