fix(oauth): retry secret-lock on Windows ERROR_ACCESS_DENIED (delete-pending race) #445
gnanam1990 wants to merge 1 commit into
…reateSecretFile

createSecretFile's secret-lock loop only retried on os.ErrExist. On Windows a concurrent holder's `defer os.Remove(lockPath)` leaves the lock file in a "delete pending" state, so a racer's O_EXCL create returns ERROR_ACCESS_DENIED (os.ErrPermission) rather than ErrExist, which fell through to a hard "create token secret lock: Access is denied" error instead of retrying.

Mirrors the identical handling already present in acquireFileLock (lock.go, from #261).

Fixes the flaky Windows-CI failure TestLoadOrCreateSecretConcurrentConverges that was intermittently red-flagging every PR's Windows smoke job.

Verified on macOS: build/vet/gofmt clean, oauth package green, concurrent-converge test passes under -race (5x). The Windows delete-pending path can't be reproduced off Windows, so the definitive proof is the Windows CI on this PR.
Zero automated PR review
Verdict: No blockers found
This deterministic review checks validation status and basic diff hygiene. A human reviewer still owns product judgment and design quality.
No actionable comments were generated in the recent review.
Walkthrough: The lock-file acquisition retry logic in …
Changes: Lock Acquisition Retry Fix
Estimated code review effort: 1 (Trivial) | ~3 minutes
Suggested labels: bug, windows, oauth
Suggested reviewers: none specifically identified from the provided context
Pre-merge checks: 5 passed
Vasanthdev2004 left a comment
Approve — genuine root-cause fix, and it kills the CI flake for real
Worth it: yes. On Windows a concurrent holder's defer os.Remove(lockPath) leaves the .lock file delete-pending, so a racer's O_EXCL create returns ERROR_ACCESS_DENIED (os.ErrPermission) rather than ErrExist. The old createSecretFile loop only treated ErrExist as contention, so parallel token-secret creation (concurrent logins/refreshes) could spuriously hard-fail with oauth: create token secret lock: Access is denied. Treating ErrPermission as retryable is exactly right — and it mirrors the already-merged, identical handling in acquireFileLock (lock.go, #261); that sibling lock got the fix and this one was simply missed. 6 lines, precedented, well-commented.
Verified on Windows (isolated worktree): go test ./internal/oauth/ -run TestLoadOrCreateSecret -count=5 → 5/5 green (the concurrent-converge test that flakes now passes reliably); full package green; go build/vet/gofmt clean. I also confirmed it against the original racy test at -count=100 → 100/100. CI all green including Smoke (windows-latest) — same-repo branch, so that Windows pass is real.
Security: safe. The new ErrPermission branch is read-only (readSecretFileRetry only); the sole write path (writeNewSecretFile, temp-file + atomic rename) runs only after O_EXCL exclusively acquired the lock, so the broadened retry can't double-write or corrupt the secret. Fail-closed behavior untouched.
Nit (non-blocking): broadening to ErrPermission means a genuine permission failure (dir truly unwritable) now surfaces as a "timed out waiting for token secret lock" error rather than the immediate "Access is denied". This only affects diagnostics, and it is the same tradeoff already accepted in lock.go #261.
This is the correct fix of the #445/#451 pair — see my note on #451. Thanks @gnanam1990.
Summary
`createSecretFile` (`internal/oauth/encrypt.go`) takes a `<secret>.lock` with `O_CREATE|O_EXCL` and retries on contention, but it only treated `os.ErrExist` as contention. On Windows, when a concurrent holder's `defer os.Remove(lockPath)` leaves the lock file in a "delete pending" state, a racer's `O_EXCL` create returns `ERROR_ACCESS_DENIED` (`os.ErrPermission`), not `ErrExist`, so it fell through to a hard `oauth: create token secret lock: Access is denied` error instead of retrying.

This treats `os.ErrPermission` as retryable contention too, mirroring the identical handling already present in `acquireFileLock` (`lock.go`, from #261) for the exact same OS quirk. That sibling lock got the fix; this second lock loop was missed.

Impact: fixes the intermittently-failing Windows CI test `TestLoadOrCreateSecretConcurrentConverges` that has been red-flagging the Windows smoke job on unrelated PRs, and fixes a real Windows bug where concurrent token-secret creation (parallel logins/refreshes) could spuriously fail.

Verification
- `go build ./...`, `go vet`, `gofmt` clean; `go test ./internal/oauth/` green; the concurrent-converge test passes under `-race` (5×).
- The `ERROR_ACCESS_DENIED` path can't be reproduced off Windows; the definitive confirmation is this PR's Windows smoke job going green (it was failing on that exact test before this change).

Linked issue
Team / internal-cycle PR — no separate issue; fixes a CI-blocking Windows flake surfaced across recent PRs.
Checklist
- `go build ./...`, `go vet ./...`, `go test ./internal/oauth/` pass locally (non-Windows).
- `gofmt` clean.
- Mirrors the `lock.go` handling; explanatory comment added.