test(testing): retry_etxtbsy helper to kill linux CI stub-exec flake (hew-0rky) #60
Merged
…w-0rky)

Tests that exec a freshly installed stub via Command::new(stub).output() bypass production's hew_core::process::spawn_with_etxtbsy_retry path and intermittently hit ExecutableFileBusy on ubuntu-latest/stable CI. The race survives install_executable_stub's atomic rename + dir fsync because the kernel can briefly report busy at the exec syscall even after the writer fd has closed.

- retry_etxtbsy: 5-attempt exponential backoff (5/10/20/40/80ms) at the caller, matching the production retry shape.
- Wraps the 3 in-module Command::new(...).output() sites in testing.rs.
- Adds 4 unit tests covering: happy path (no retry), retry-then-succeed, non-ETXTBSY pass-through, and bounded retry exhaustion.
- Doc-comment on install_executable_stub now points readers at the retry wrapper for test exec sites.
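A minimal sketch of the caller-side retry described above, assuming a Linux host. The names `retry_etxtbsy_with`, `is_etxtbsy`, `DEFAULT_ATTEMPTS`, and `DEFAULT_BASE_DELAY` are hypothetical, and this is not the actual `hew-core` code. ETXTBSY is detected through its Linux errno (26) so the sketch needs only `std`; recent Rust toolchains also surface the same condition as `io::ErrorKind::ExecutableFileBusy`, the kind seen in the CI failure.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Linux errno for ETXTBSY ("Text file busy"). Hard-coded so this sketch
/// depends only on std; real code might take it from the `libc` crate.
const ETXTBSY: i32 = 26;

/// Assumed defaults (hypothetical): 5 attempts, first backoff 5ms, doubling.
const DEFAULT_ATTEMPTS: u32 = 5;
const DEFAULT_BASE_DELAY: Duration = Duration::from_millis(5);

/// True only for the "text file busy" error this retry targets.
fn is_etxtbsy(err: &io::Error) -> bool {
    err.raw_os_error() == Some(ETXTBSY)
}

/// Calls `op` at least once and at most `attempts` times. Only ETXTBSY
/// failures are retried, sleeping between attempts with a delay that
/// doubles each time. Any other error, or the last ETXTBSY once the
/// attempts are exhausted, is returned unchanged.
fn retry_etxtbsy_with<T>(
    attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if is_etxtbsy(&err) && attempt < attempts => {
                thread::sleep(delay);
                delay *= 2; // exponential backoff
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

/// Convenience wrapper using the assumed defaults.
fn retry_etxtbsy<T>(op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    retry_etxtbsy_with(DEFAULT_ATTEMPTS, DEFAULT_BASE_DELAY, op)
}
```

This sketch sleeps only between attempts, so five attempts see at most four sleeps (5, 10, 20, 40ms). The commit message lists 5/10/20/40/80ms and the PR description a flat 10ms, so the real schedule may differ. Taking a closure keeps the retry visible at each call site, e.g. `retry_etxtbsy(|| Command::new(&stub).output())`.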
droidnoob added a commit that referenced this pull request May 30, 2026
* chore(release): 0.11.0

- workspace Cargo.toml: 0.10.0 -> 0.11.0
- 23 skill body `hew:version=` markers bumped to match
- .claude/ install snapshot refreshed via `hew init --runtime=claude`
- CHANGELOG.md: move [Unreleased] content into [0.11.0] — 2026-05-30

Release contents since 0.10.0:

- #53 parallel hew loop via per-worker git worktrees (hew-6az)
- #54 per-task model selection + per-model token spend (hew-1tq)
- #55 init re-run UX — refresh/reconfigure/cancel (hew-0wa)
- #56 split /hew:auto from /hew:loop semantics (hew-6n0v)
- #57 cut local cargo test from ~2 min to ~22s (hew-v2ib)
- #58 hew loop run --scope={ready|epics} (hew-b3yl)
- #59 batch planner + end-of-run verify + loop graph (hew-lf40)
- #60 retry_etxtbsy stub flake fix (hew-0rky)

Breaking surface: hew loop run in non-interactive mode now requires --scope. Justifies the minor bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): reflect 0.11.0 surface changes

- /hew:auto description updated to in-conversation epic walk (was the legacy plan→decompose→execute→verify; rewritten in hew-6n0v / #56)
- slash count 40 → 41 (new /hew:auto + various)
- loop snippets show --scope (required in non-interactive mode per hew-b3yl / #58), --jobs N, --verify-tests, hew loop summary, hew loop graph
- autonomous-loop bullets gain parallel-workers, scoped-runs + per-task-model, end-of-run-verification entries
- Selected knobs table adds loop.model.*, loop.planner.*, loop.end_of_run.verify_tests, loop.fallback_runtime

No changes to brand, hero copy, or repo description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes `hew-0rky`: kills the `install_executable_stub` ETXTBSY flake on Linux CI.

Background
The 2026-05-29 CI run for PR #58 failed `install_overwrites_an_existing_stub` with `ExecutableFileBusy` on `ubuntu-latest / stable` only (the other 3 test matrices passed). This is the documented `GOTCHA:linux-etxtbsy-stub` race: the kernel's `i_writecount` can briefly report a freshly written stub as exec-busy even after `fs::rename` + parent-dir fsync, because the writer fd's close hasn't fully propagated through the inode's busy-counter. `GOTCHA:flaky-pre-commit` says this should be "retry once, investigate twice." It hit twice this week, so it was investigated.

Fix
Add a `retry_etxtbsy` helper in `hew-core/src/testing.rs` and wrap the 4 in-module stub-exec tests with it. Up to 5 attempts with 10ms sleeps between retries; non-ETXTBSY io errors propagate unchanged.

Why test-call-site retry, not helper-internal exec-verify
Two options were considered (documented in `hew-0rky`'s task body):

- Option A: `install_executable_stub` exec-verifies the file with `/bin/true` before returning. This adds ~50ms to every stub install (~5s total in our suite), and it gives false confidence: the verify exec doesn't match the test's actual exec.
- Option B: a `retry_etxtbsy` helper that test callers wrap around their own `Command::new(stub).output()`. Zero amortized cost when ETXTBSY doesn't fire, and explicit at the call site so the reader sees the retry pattern.

Option B was selected per the task body's recommendation.
Tests

- `retry_etxtbsy_succeeds_on_first_call_when_no_busy`
- `retry_etxtbsy_eventually_succeeds_when_busy_clears` (fake returns ETXTBSY twice, then `Ok`)
- `retry_etxtbsy_propagates_other_io_errors` (`PermissionDenied` is not retried)
- `retry_etxtbsy_gives_up_after_attempts_with_last_etxtbsy`

Memory update
`GOTCHA:linux-etxtbsy-stub` reads "rare race; retry if happens once, investigate if twice." After this lands, the note will be updated to point at the `retry_etxtbsy` helper as the canonical mitigation. (The update happens at PR merge, not in this branch, which keeps the helper change separate from the doc change.)

Out of scope
- `install_executable_stub` to `posix_spawn` with explicit busy-handling flags. Heavier, and the retry approach is empirically sufficient.
- `hew-core::testing`; if `RealGit::at(...)` tests or `install.rs` integration tests also trip, follow-up.

🤖 Generated with Claude Code