fix(login): hold Child handle and reap on cancel to close PID-reuse race#29
Merged
Conversation
PR #27's first cut at cancel stored only the OS PID and used shell kill / taskkill against it. Between wait_with_output returning and clear_login_pid running, the OS could have recycled the PID — a late cancel click would target whatever new owner held it. Worst case on Windows, taskkill /F /T would nuke an unrelated process tree. Replace the slot type with Mutex<Option<Child>> so cancel goes through Child::kill() on the actual handle. Both spawn-site exit and cancel atomically `take()` the same slot under one lock, so they can never race on the same Child. A new shared helper `wait_for_login_or_cancel(child) -> AppResult<Output>` drives a poll loop on Child::try_wait and is called from both mac and win run_codex_login (instead of each platform repeating the register/wait/clear dance). Adds typed `LOGIN_CANCELLED` error code and threads it through to the front-end. Three additional fixes from the local 3-agent review: - `cancel_login_in_progress` and `register_login_child`'s replace branch and `wait_for_login_or_cancel`'s try_wait Err arm all funnel through a new `drop_killed_child(child)` helper that does kill + wait. Without the wait, every cancel leaked a zombie on Unix (std::process::Child has no reaping Drop) and could let a stuck codex login keep running in the background long enough to write auth.json and confuse subsequent state. - `register_login_child` now uses `replace` and reaps any prior occupant rather than silently dropping it. Defensive in case acquire_login_lock is ever bypassed. - Front-end's catch handler now also recognises `LOGIN_CANCELLED` as a cancelled state, in addition to the optimistic `cancelledLoginProfile` flag. Tests: added end-to-end coverage of `wait_for_login_or_cancel` (happy path + cancel-from-sibling-thread). The cancel test polls the slot rather than sleeping a fixed amount so it stays stable on slow CI. Added `register_login_child_replaces_and_reaps_previous_occupant` to lock in the new replace/reap contract. Test module gated `#[cfg(all (test, unix))]` because it relies on /bin/sleep + /bin/sh. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Owner
Author
剩余在你侧
已知 follow-up(不阻塞)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 493c738196
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
3 tasks
…fill hang P1 from chatgpt-codex-connector review on PR #29: the spawn-site poll loop only called Child::try_wait without draining the piped stdout / stderr. A verbose codex login (warnings, debug output) would fill the ~64 KB pipe buffer, block on write, never reach exit, and our loop would see Running forever — login would appear stuck until manual cancel. wait_for_login_or_cancel now takes the stdio handles off the Child *before* registering it in the slot, hands each pipe to a dedicated drainer thread that read_to_end into a Vec<u8>, and joins them after the child exits naturally. On cancel / try_wait Err the drainers self-clean (the killed child's pipes close → read_to_end returns EOF → thread exits) and we discard their output. Added regression test wait_for_login_or_cancel_drains_oversized_pipe_output that pushes 256 KiB through stdout — well past the 64 KB buffer threshold — and asserts the helper still returns Ok with the full payload. The test uses absolute paths (`/usr/bin/yes`, `/usr/bin/head`) so it stays stable when other parallel tests modify PATH via `std::env::set_var`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-ups from the 3rd-round review on the pipe-drain fix: - Add `wait_for_login_or_cancel_captures_stderr_on_nonzero_exit` regression test. Counterpart to the stdout test: pushes 256 KiB through stderr only, exits non-zero, and asserts the helper still captures the full payload — the `LOGIN_FAILED` toast feeds `output.stderr`, so this is the path that actually matters in production. - Add `debug_assert!` to `register_login_child` enforcing that stdout/stderr are taken before parking the child. Documents the invariant `wait_for_login_or_cancel` already maintains and catches any future caller that forgets. - Document why `read_to_end`'s Err and `JoinHandle::join`'s panic branch silently collapse to empty bytes. Both are rare exceptional conditions; partial stdio on success is informational, on failure still better than nothing. We lose visibility because the project has no structured logger; flagged as a future revisit if one lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 9, 2026
Cmochance
added a commit
that referenced
this pull request
May 9, 2026
…#31) Two small UX/perf fixes flagged by the local 3-agent reviews on PR #27 and PR #29: - Login button now sets `aria-label` reflecting its dual role: "Log into <profile>" when idle, "Cancel login for <profile>" while a login is in flight. The visible label was already updated (spinner + red "取消") and the `title` attribute carried the cancel hint, but many screen readers prefer aria-label and ignore title; this makes the dual semantics announced consistently across assistive tech. New i18n keys `profileLoginCancelAria` / `profileLoginReadyAria` (en + zh-CN). - mac `suggested_codex_cli_paths` walked the entire `PATH` and stat'd `<entry>/codex` for every component, blocking the `get_codex_cli_status` Tauri command for seconds when PATH includes NFS / SMB / slow drives. Now bounded by a 500 ms soft deadline: fixed locations (Codex.app bundle, Homebrew, npm-global, etc) are checked first since those are guaranteed-fast local stats, then the bounded PATH walk fills in whatever remaining suggestions fit in the deadline. Worst case: dialog still opens promptly even on a pathologically slow PATH. Both are pure refinements — no behavior change on the happy path. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Cmochance
added a commit
that referenced
this pull request
May 9, 2026
Bumps package.json to 1.5.7 (and lets `version-sync.mjs` propagate to Cargo.toml + Cargo.lock + package-lock.json) and adds the 1.5.7 CHANGELOG entry covering #26, #29, #30, #31: - #29 critical: PID-reuse race + buffer-fill hang in login cancel - #26 baseurl button red warning style - #30 InstallState + codex CLI helpers refactor to shared - #31 a11y aria-label + bounded mac PATH probe Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the PID-reuse race carried over from PR #27's first cut at cancel.
Mutex<Option<u32>>(PID slot) withMutex<Option<Child>>so cancel callsChild::kill()on the actual handle. Both natural-exit and cancel paths atomicallytake()the same slot under one lock — no more recycled-PID hazard.wait_for_login_or_cancel(child) -> AppResult<Output>drives aChild::try_waitpoll loop and is reused by bothmac/runtime/process.rsandwin/runtime/process.rsrun_codex_login. Removes the duplicated register/wait/clear dance.drop_killed_child(child)helper doeskill()+wait()together. Closes a zombie-leak hole flagged by the local 3-agent review:std::process::Childhas no reapingDrop, so dropping a killed child on Unix leaves a zombie until app exit — and could let a stuckcodex loginkeep running long enough to writeauth.json. Funneled through cancel, thetry_waitErr cleanup branch, andregister_login_child's replace-and-reap path.LOGIN_CANCELLEDplumbed through to the front-end. BothcancelledLoginProfile === profile(optimistic flag) anderror_code === "LOGIN_CANCELLED"(authoritative) now route to the same "已取消登录" toast.Test plan
cargo test --manifest-path src-tauri/Cargo.toml --lib— 66 passed (4 new tests onlogin_cancel: e2e happy-path, e2e cancel-from-thread, register-replaces-and-reaps, plus the existing peek/cancel unit tests, all gated#[cfg(all(test, unix))]).acquire_login_lock).Known follow-ups (not blocking)
Child::wait()after a failedTerminateProcess(e.g. elevated child) blocks until the child exits naturally. In practice we always spawn codex login as a non-elevated child of the app, so access-denied shouldn't fire — but if it ever does we'd want to bound the wait or move it onto a detached thread. Documented indrop_killed_child's rustdoc.try_waitErr branch is genuinely hard to unit-test portably; left as an accepted gap with the cleanup logic exercised in code review only.🤖 Generated with Claude Code