Skip to content

fix(moarstats): open subprocess output with write access for fsync (Windows)#3831

Merged
jqnatividad merged 1 commit into
masterfrom
fix-windows-fsync-202605
May 7, 2026
Merged

fix(moarstats): open subprocess output with write access for fsync (Windows)#3831
jqnatividad merged 1 commit into
masterfrom
fix-windows-fsync-202605

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

@jqnatividad jqnatividad commented May 7, 2026

Summary

Follow-up to #3830. The newly-added `util::sync_subprocess_output` opened the file read-only and called `sync_all()`. On Windows, that maps to `FlushFileBuffers`, which rejects read-only handles with "Access is denied" (os error 5) — breaking moarstats on the Windows Lite CI run.

Switching to `.write(true)` (no `truncate`, no `create`, no `append`) gives a writable handle suitable for `FlushFileBuffers` on Windows and is a safe no-op for the file contents on Linux/macOS.

Failing run: https://github.com/dathere/qsv/actions/runs/25492014192/job/74802155053

Test plan

  • `cargo build --locked --bin qsv -F all_features`
  • `cargo test --locked -F all_features test_moarstats::` — 80/80 pass on macOS
  • Windows / Windows Lite CI run on this PR

🤖 Generated with Claude Code

`sync_subprocess_output` was opening the file read-only and calling
`sync_all()`. On Windows that maps to `FlushFileBuffers`, which rejects
read-only handles with "Access is denied" (os error 5) — breaking
moarstats on the Windows Lite CI run.

Switch to `.write(true)` (no `truncate`, no `create`, no `append`). That
gives a writable handle suitable for `FlushFileBuffers` on Windows and
remains a safe no-op for the file contents on Linux/macOS.

Failing run: https://github.com/dathere/qsv/actions/runs/25492014192

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@jqnatividad jqnatividad merged commit ef54959 into master May 7, 2026
17 checks passed
@jqnatividad jqnatividad deleted the fix-windows-fsync-202605 branch May 7, 2026 11:51
jqnatividad added a commit that referenced this pull request May 9, 2026
…arent dir (#3838)

* fix(moarstats): retry on stats coverage mismatch + fsync joined CSV parent dir

Two conservative guards for the Linux-only flake on
test_moarstats::moarstats_join_type_full_runs_and_writes_bivariate
(CI run 25545197594, surfaced via PR #3834).

The macOS fix (#3830) and Windows fix (#3831) hardened the parent-side
fsync. Linux still surfaces the same symptom — bivariate output missing
secondary columns — at a much lower frequency. The most plausible
remaining gap is the `qsv stats`-on-joined-CSV subprocess handoff: the
spawned child has been observed to produce stats records for fewer
columns than the joined CSV actually has.

Two guards:

1. Retry once on stats-coverage mismatch. After `qsv stats` runs on the
   joined CSV, cross-check that every column in the joined CSV's header
   appears in the stats output's `field` column. On mismatch, re-fsync
   the joined CSV and re-run `qsv stats` once. Only fail loud if the
   mismatch persists. Self-heals the rare race; surfaces a precise error
   if a different root cause ever appears.

2. Fsync the joined CSV's parent directory after `join_datasets_internal`
   returns. `fsync(file)` on Linux doesn't flush parent-dir metadata, and
   rare FS/timing edge cases (overlayfs, fuse, network mounts) can affect
   read-after-write visibility for the follow-up subprocess. New helper
   `util::sync_directory()` is a no-op on Windows, where dir-handle
   `FlushFileBuffers` would error.

Also includes `tools/repro_moarstats_linux_flake.sh` — a parallel-worker
harness that replicates the failing test as a shell loop, optionally
adds page-cache pressure (DD_LOAD=1), and classifies failures into
"retry caught it" / "retry didn't catch it" / "different bug" buckets
to confirm root cause on a Linux VM.

Verified on macOS: `cargo build --locked --bin qsv -F all_features` and
`cargo clippy --locked --bin qsv -F all_features -- -D warnings` clean;
`cargo t -F all_features --test tests test_moarstats::` 80/80 pass at
--test-threads=8; failing test in a 30x stress loop all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address copilot review: best-effort dir fsync, error propagation, script hardening

All six Copilot comments on PR #3838 applied:

src/util.rs
- sync_directory: errors no longer propagated. Directory fsync is not
  uniformly supported across Linux filesystems (FAT, some FUSE/overlay/
  network mounts return EINVAL/EOPNOTSUPP); a defensive guard layered on
  top of file-level fsync must NOT fail the caller. Errors are now logged
  at debug level and swallowed. Signature changed from CliResult<()> to
  () to make the contract explicit.
- New unit test test_sync_directory_smoke covering both the existing-dir
  and missing-dir paths (Unix-only).

src/cmd/moarstats.rs
- Call site updated for the new sync_directory signature.
- Stats-output reader now propagates csv parse errors instead of
  silently dropping rows via filter_map(Result::ok). A malformed/
  truncated stats row now surfaces as "Failed to parse joined-stats
  row N" rather than as a misleading downstream "missing columns"
  assertion — which is exactly the diagnostic mode this fix is trying
  to eliminate.

tools/repro_moarstats_linux_flake.sh
- FIFO is now created inside a real mktemp -d directory, replacing the
  inherently racy mktemp -u.
- cleanup trap now removes LOAD_DIR (when DD_LOAD=1) and FIFO_DIR so
  temp resources are not leaked on interrupts/early exits.
- bivariate awk check now distinguishes "header malformed" (exit 2 ->
  FAIL_OTHER) from "coverage failed" (exit 1 -> FAIL_OLD_BIVARIATE), so
  a truncated bivariate header is not silently misclassified as a
  column-drop bug.

Verified on macOS: cargo build --locked clean, cargo clippy -- -D warnings
clean, all 80 test_moarstats tests pass at --test-threads=8, the new
test_sync_directory_smoke unit test passes, smoke-run of repro script
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant