Skip to content

macOS: skip set_readwrite_recursive walk after clonefile (~1.85x materialization speedup)#2349

Merged
MarcusSorealheis merged 5 commits into
TraceMachina:mainfrom
erneestoc:ec/macos-skip-chmod-walk
May 21, 2026
Merged

macOS: skip set_readwrite_recursive walk after clonefile (~1.85x materialization speedup)#2349
MarcusSorealheis merged 5 commits into
TraceMachina:mainfrom
erneestoc:ec/macos-skip-chmod-walk

Conversation

@erneestoc
Copy link
Copy Markdown
Contributor

@erneestoc erneestoc commented May 19, 2026

Description

Follow-up to #2338 (now merged). The macOS clonefile(2) fast path is followed by a recursive set_readwrite_recursive chmod walk that makes every file in the cloned tree writable. On real Bazel SwiftCompile shapes (~2000 inputs / ~466 MB) that walk accounts for ~46% of materialization time — ~33 µs per file, ~67 ms per action.

This PR replaces the walk with a single chmod(2) on the destination root (new helper chmod_dir_writable). Existing entries inside the clone inherit the source's read-only mode (0o555 dirs, 0o444 files). The worker can still create the action's declared output files because the root itself is 0o755.

Why this is correct (and not a regression)

Leaving inputs read-only is what Bazel itself does:

  • linux-sandbox bind-mounts inputs read-only — kernel rejects writes regardless of file mode.
  • darwin-sandbox / sandbox-exec denies writes outside declared output paths via Seatbelt.
  • REAPI Action.output_files / Action.output_directories are the only paths an action may write to.

Workers without sandbox primitives (NativeLink, bazel-remote, BuildBuddy, Buildfarm) substitute file mode for kernel enforcement: 0o444 files / 0o555 dirs is the hermeticity enforcement. The previous chmod walk weakened that — making inputs writable to "be nice" to misbehaved actions. Skipping the walk brings NativeLink in line with the REAPI contract and with what Bazel's own sandboxes do.

An action that tries to mutate an input now hits EACCES, which is the correct REAPI behavior — same failure mode as on Bazel's own sandbox. The new test test_clonefile_input_mutation_fails encodes this contract.

Bench evidence

From nativelink-util/benches/chmod_strategy.rs (on ec/macos-clonefile-optimizations-benchmarks, Apple M4 Max / APFS):

shape walk (current main) toplevel_only (this PR) speedup
small_flat (64 files, 64 KB) 4.66 ms 2.61 ms 1.79×
pcm_cluster (219 files, 40 MB) 15.17 ms 8.19 ms 1.85×
medium_flat (635 files, 180 MB) 46.36 ms 25.10 ms 1.85×
large_flat (1978 files, 466 MB) 147.39 ms 80.17 ms 1.84×

Walk fraction is ~46% across shapes regardless of file size — confirming the cost is per-file syscall, not per-byte.

Scope

  • macOS-only. Linux/Windows fall straight through to hardlink_directory_tree_recursive and never ran set_readwrite_recursive in this code path.
  • set_readwrite_recursive stays public — still called by nativelink-worker/src/directory_cache.rs:451 on the source side during eviction. (Note: PR directory_cache: fix CAS inode corruption from chmod-during-eviction #2347 replaces that remaining caller with a directories-only variant; after both PRs land, set_readwrite_recursive would be dead code and can be removed in a follow-up.)
  • No worker-side write wrapper, no audit phase. Trusting the REAPI hermeticity contract — if a future audit discovers worker-internal writes that hit EACCES inside the cloned tree, that's a bug to fix at the write site, not paper over by mutating input perms.

Note on backward compatibility

Not a breaking change in the semver sense. However, actions that previously mutated their own input files (a hermeticity violation) will now hit EACCES. This matches how Bazel's own sandboxes already behave, so any action that breaks here was also broken under linux-sandbox / darwin-sandbox — operators can detect such actions by the new error mode.

Type of change

  • Bug fix (non-breaking change which fixes an issue — hermeticity contract violation + perf)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • cargo test -p nativelink-util --lib fs_util:: — 9/9 pass on macOS arm64 (3 new + 6 existing)
  • cargo test -p nativelink-worker --lib directory_cache:: — 2/2 pass
  • cargo build -p nativelink-util -p nativelink-worker clean
  • cargo clippy -p nativelink-util --lib --tests -- -D warnings clean
  • cargo fmt --check clean
  • Disallowed-methods grep against clippy.toml — no matches in diff

New tests:

  • test_clonefile_root_writable_inputs_readonly — root is 0o755, subdirs stay 0o555, files stay 0o444 (replaces test_clonefile_dest_is_writable which assumed subdirs would be made writable by the walk).
  • test_clonefile_root_accepts_new_files — worker can create output files at the root even though everything inside the clone is read-only.
  • test_clonefile_input_mutation_fails — writes to existing input files fail with PermissionDenied — encodes the hermeticity contract.

Checklist

  • Updated documentation if needed (in-code doc comments updated; no public API change)
  • Tests added/amended
  • bazel test //... passes locally (verified via cargo only)
  • PR is contained in a single commit

This change is Reviewable

@palfrey
Copy link
Copy Markdown
Member

palfrey commented May 20, 2026

/build-image nativelink-worker-init

@palfrey
Copy link
Copy Markdown
Member

palfrey commented May 20, 2026

/build-image

@github-actions
Copy link
Copy Markdown

Image built and pushed!

ghcr.io/TraceMachina/nativelink:de74f3c

@github-actions
Copy link
Copy Markdown

Image built and pushed!

ghcr.io/TraceMachina/nativelink-worker-init:de74f3c

@palfrey
Copy link
Copy Markdown
Member

palfrey commented May 20, 2026

Image built and pushed!

ghcr.io/TraceMachina/nativelink:de74f3c

Given that this is a follow-up to an earlier PR and that the major category of problem here is a slowdown on Macs (and I don't have one to hand), is it possible for you to run a test with the newly listed images here to check this does actually resolve the problem in real-world testing v.s. just the CI tests?

Copy link
Copy Markdown
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

erneestoc and others added 3 commits May 20, 2026 23:02
The macOS clonefile fast path was followed by a recursive chmod walk that
made every file in the cloned tree writable (0o644 / 0o755). On real
Bazel input shapes (~2000-file SwiftCompile) that walk accounted for
~46% of materialization time — ~33 µs per file, ~67 ms per action.

Replace the walk with a single chmod(2) on the destination root.
Existing entries inherit the source's read-only mode (0o555 dirs,
0o444 files). The worker can still create the action's declared output
files inside the root because the root itself is 0o755.

This matches the hermeticity contract enforced by Bazel's local
sandbox (linux-sandbox bind-mounts inputs read-only;
darwin-sandbox / sandbox-exec denies writes outside declared output
paths) and the REAPI Action.output_files / output_directories
semantics: actions write only to declared outputs, never mutate
inputs. An action that does try to mutate an input now hits EACCES,
which is the correct REAPI behavior — same failure mode as on
Bazel's own sandbox.

Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch),
toplevel_only vs full walk:

  shape                         walk      toplevel_only   speedup
  small_flat   (64 files)       4.66 ms   2.61 ms         1.79x
  pcm_cluster  (219 files)     15.17 ms   8.19 ms         1.85x
  medium_flat  (635 files)     46.36 ms  25.10 ms         1.85x
  large_flat   (1978 files)   147.39 ms  80.17 ms         1.84x

set_readwrite_recursive stays public — directory_cache.rs:451 still
uses it on the source side during eviction.

Tests:
- test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs
  0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable
  which assumed subdirs would be made writable).
- test_clonefile_root_accepts_new_files: worker can create outputs at
  the root even though everything inside the clone is read-only.
- test_clonefile_input_mutation_fails: writes to existing input files
  fail with PermissionDenied — encodes the hermeticity contract.
…2347 cleanup

After TraceMachina#2347 (DirectoryCache cleanup uses set_dir_writable_recursive)
and this PR (post-clonefile path uses chmod_dir_writable), the
generic set_readwrite_recursive helper has zero remaining callers.
Both replacements are intentionally narrower - set_dir_writable_recursive
chmods only directories so file inodes aren't mutated, and
chmod_dir_writable chmods only the destination root so the clone
tree stays read-only inside. Delete the old helper and its private
companion set_readwrite_one_path.

Addresses palfrey's review feedback on TraceMachina#2347 ("set_readwrite_recursive
and set_readwrite_one_path can be dropped as part of this") - sequenced
on this PR instead because both PRs had to land before the helpers
became dead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR TraceMachina#2338's clonefile fast path is effectively dead in production — it
materialized 1 of 1771 actions in a real build. Three coupled bugs kept
the directory cache silently falling back to the slow download path, and
two of them only surface once the fast path actually fires, so they must
land together.

Bug 1: prepare_action_inputs received a work directory the caller had
already created, but hardlink_directory_tree and clonefile(2) both
require the destination to NOT exist. Every cache attempt failed its
precondition and fell back to download_to_directory. Fix: remove the
empty pre-created directory before invoking the cache, and recreate it
on cache failure so the download fallback (which needs an existing
destination) still works. Adds fs::remove_dir for the empty-dir removal.

Bug 2: set_readonly_recursive chmod'd files to 0o444, stripping the
execute bit from cached executables. Once a tree is cloned into a
workspace this makes an action's interpreter/wrapper script fail with
EACCES. Fix: mark files 0o555 instead of 0o444 — read + execute, still
no write bit, so the hermeticity contract is unchanged.

Bug 3: the clonefile path chmods only the destination root writable;
cloned subdirectories keep the source's 0o555 mode. Bazel actions
declare outputs at paths nested inside input subdirectories, and
creating those files needs write permission on the parent directory.
Fix: after a cache hit, set_dir_writable_recursive makes every directory
in the materialized tree writable. Files stay read-only — they may be
CAS-hardlinked and chmoding them would corrupt the shared inode.

Adds regression tests for nested output creation, which the existing
root-only clonefile tests did not cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erneestoc erneestoc force-pushed the ec/macos-skip-chmod-walk branch from 7d5e88e to ff3fede Compare May 21, 2026 07:07
@erneestoc
Copy link
Copy Markdown
Contributor Author

erneestoc commented May 21, 2026

@palfrey tested. Thanks for requesting, I found a few issues. There was a Directory cache failed, falling back to traditional download error, only first clonefile went through, and the rest falling back.

Good news is they're fixed and tested on a big app and actually seeing some performance improvements on my local setup (using tart macOS VM for the worker, dockerized scheduler 31% improvement compared to 1.2.0 release on a big modular iOS app "fetch" part of each action).

The original change (skip the set_readwrite_recursive walk) is correct, but it optimizes a path production almost never reaches — and it was only safe because
that path was dead (see Bug 3).

3 coupled bugs (fixed)

  • Bug 1 — the fast path never fires. The worker pre-creates the action's work/ directory before materializing inputs, but hardlink_directory_tree and clonefile(2)
    both require the destination to not exist. Every cache attempt failed its precondition and fell back to the slow download_to_directory path. Fix:
    prepare_action_inputs removes the empty pre-created directory before invoking the cache, and recreates it on cache failure so the download fallback still works.
  • Bug 2 — the execute bit gets stripped. set_readonly_recursive chmod'd files to 0o444, dropping +x from cached executables. Once a tree is cloned into a
    workspace, an action whose interpreter/wrapper script lost its exec bit fails with EACCES. Latent only because Bug 1 kept the directory-cache path dead. Fix:
    mark files 0o555 instead of 0o444 — read + execute, still no write bit, so the hermeticity contract is unchanged.
  • Bug 3 — nested outputs can't be created. This PR's original change chmods only the destination root to 0o755; cloned subdirectories keep the source's 0o555 mode.
    But Bazel actions declare outputs at paths nested inside input subdirectories (e.g. bazel-out/.../bin/foo.entitlements), and creating those needs write
    permission on the parent directory. The original tests only covered creating files at the root, so the PR passed CI while shipping this gap. Fix: after a cache
    hit, set_dir_writable_recursive makes every directory in the materialized tree writable; files stay read-only (they may be CAS-hardlinked — chmoding them would
    corrupt the shared inode, see directory_cache: fix CAS inode corruption from chmod-during-eviction #2347).

@MarcusSorealheis MarcusSorealheis enabled auto-merge (squash) May 21, 2026 17:09
@MarcusSorealheis MarcusSorealheis merged commit 6b17b68 into TraceMachina:main May 21, 2026
31 checks passed
@erneestoc erneestoc deleted the ec/macos-skip-chmod-walk branch May 22, 2026 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants