fix(moarstats): open subprocess output with write access for fsync (Windows)#3831
Merged
Conversation
`sync_subprocess_output` was opening the file read-only and calling `sync_all()`. On Windows that maps to `FlushFileBuffers`, which rejects read-only handles with "Access is denied" (os error 5) — breaking moarstats on the Windows Lite CI run. Switch to `.write(true)` (no `truncate`, no `create`, no `append`). That gives a writable handle suitable for `FlushFileBuffers` on Windows and remains a safe no-op for the file contents on Linux/macOS. Failing run: https://github.com/dathere/qsv/actions/runs/25492014192 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
6 tasks
jqnatividad
added a commit
that referenced
this pull request
May 9, 2026
…arent dir (#3838) * fix(moarstats): retry on stats coverage mismatch + fsync joined CSV parent dir Two conservative guards for the Linux-only flake on test_moarstats::moarstats_join_type_full_runs_and_writes_bivariate (CI run 25545197594, surfaced via PR #3834). The macOS fix (#3830) and Windows fix (#3831) hardened the parent-side fsync. Linux still surfaces the same symptom — bivariate output missing secondary columns — at a much lower frequency. The most plausible remaining gap is the `qsv stats`-on-joined-CSV subprocess handoff: the spawned child has been observed to produce stats records for fewer columns than the joined CSV actually has. Two guards: 1. Retry once on stats-coverage mismatch. After `qsv stats` runs on the joined CSV, cross-check that every column in the joined CSV's header appears in the stats output's `field` column. On mismatch, re-fsync the joined CSV and re-run `qsv stats` once. Only fail loud if the mismatch persists. Self-heals the rare race; surfaces a precise error if a different root cause ever appears. 2. Fsync the joined CSV's parent directory after `join_datasets_internal` returns. `fsync(file)` on Linux doesn't flush parent-dir metadata, and rare FS/timing edge cases (overlayfs, fuse, network mounts) can affect read-after-write visibility for the follow-up subprocess. New helper `util::sync_directory()` is a no-op on Windows, where dir-handle `FlushFileBuffers` would error. Also includes `tools/repro_moarstats_linux_flake.sh` — a parallel-worker harness that replicates the failing test as a shell loop, optionally adds page-cache pressure (DD_LOAD=1), and classifies failures into "retry caught it" / "retry didn't catch it" / "different bug" buckets to confirm root cause on a Linux VM. Verified on macOS: `cargo build --locked --bin qsv -F all_features` and `cargo clippy --locked --bin qsv -F all_features -- -D warnings` clean; `cargo t -F all_features --test tests test_moarstats::` 80/80 pass at --test-threads=8; failing test in a 30x stress loop all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address copilot review: best-effort dir fsync, error propagation, script hardening All six Copilot comments on PR #3838 applied: src/util.rs - sync_directory: errors no longer propagated. Directory fsync is not uniformly supported across Linux filesystems (FAT, some FUSE/overlay/ network mounts return EINVAL/EOPNOTSUPP); a defensive guard layered on top of file-level fsync must NOT fail the caller. Errors are now logged at debug level and swallowed. Signature changed from CliResult<()> to () to make the contract explicit. - New unit test test_sync_directory_smoke covering both the existing-dir and missing-dir paths (Unix-only). src/cmd/moarstats.rs - Call site updated for the new sync_directory signature. - Stats-output reader now propagates csv parse errors instead of silently dropping rows via filter_map(Result::ok). A malformed/ truncated stats row now surfaces as "Failed to parse joined-stats row N" rather than as a misleading downstream "missing columns" assertion — which is exactly the diagnostic mode this fix is trying to eliminate. tools/repro_moarstats_linux_flake.sh - FIFO is now created inside a real mktemp -d directory, replacing the inherently racy mktemp -u. - cleanup trap now removes LOAD_DIR (when DD_LOAD=1) and FIFO_DIR so temp resources are not leaked on interrupts/early exits. - bivariate awk check now distinguishes "header malformed" (exit 2 -> FAIL_OTHER) from "coverage failed" (exit 1 -> FAIL_OLD_BIVARIATE), so a truncated bivariate header is not silently misclassified as a column-drop bug. Verified on macOS: cargo build --locked clean, cargo clippy -- -D warnings clean, all 80 test_moarstats tests pass at --test-threads=8, the new test_sync_directory_smoke unit test passes, smoke-run of repro script green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #3830. The newly-added `util::sync_subprocess_output` opened the file read-only and called `sync_all()`. On Windows, that maps to `FlushFileBuffers`, which rejects read-only handles with "Access is denied" (os error 5) — breaking moarstats on the Windows Lite CI run.
Switching to `.write(true)` (no `truncate`, no `create`, no `append`) gives a writable handle suitable for `FlushFileBuffers` on Windows and is a safe no-op for the file contents on Linux/macOS.
Failing run: https://github.com/dathere/qsv/actions/runs/25492014192/job/74802155053
Test plan
🤖 Generated with Claude Code