perf(walk): improvements to the walker hot path #575

Open
arcuru wants to merge 6 commits into bootandy:master from arcuru:walker-perf

@arcuru arcuru commented May 12, 2026

More changes, stacked on top of my previous #574. Ideally these would be stacked commits, but right now the previous commit shows up inside this PR.

Happy to split/rebase as requested, though I am also fine if you pull/modify yourself.

Five small reductions on the per-entry walker hot path. None of them change observable behavior; they just remove redundant work. They compound for noticeable wins on tree-heavy and wide-directory walks.

  1. Per-dir stat cache. walk_dir() was statting each directory for its is_dir() check, then build_node() statted it again to populate the Node. Cache the parsed metadata tuple in PendingDir via OnceLock. One statx saved per directory.

  2. Cache entry.path() / entry.file_type(). Both were called 2-3× per entry, and entry.path() allocates a fresh PathBuf each call. Compute once in process_entry and thread through. Also gate is_ignored_path on a cheap empty-check so walks without --ignore-directory skip the HashSet probe entirely.

  3. Skip ignore_file on default walks. Precompute has_any_filter when constructing WalkData. In the hot loop, three-way branch: full ignore_file when filters are active, inlined dot-file check when only --ignore-hidden is on, otherwise skip. When filters are active, also coalesce the duplicate get_metadata calls inside ignore_file, replace path.is_file() stats with the cached file_type, hoist fs::canonicalize out of the ignore_directories loop, and thread the prefetched tuple into build_node so each entry is statted at most once on filter-active walks.

  4. FxHash for inode dedup. clean_inodes hashes 25M (inode, dev) pairs on my test /nix/store. SipHash's DoS resistance is pointless for primitive keys from our own syscalls. Adds rustc-hash = "2", swaps HashSet for FxHashSet. This can be inlined to avoid the dep by defining a simple hasher.

  5. Skip per-file statx in --filecount mode. Under -f, every file's size is overwritten with 1, so the per-file statx is removable. The only field the dedup still needs is (inode, dev): inode comes from DirEntry::ino() (filled by getdents64), dev from the parent directory's cached tuple. Gated to -f without -L and without metadata-needing filters, so output is byte-identical to the previous stat path. Unix-only; Windows DirEntry has no cheap d_ino analogue.
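Item 4 mentions that the `rustc-hash` dependency could be avoided by defining a simple hasher inline. A minimal sketch of what that might look like, assuming nothing about the actual code in this PR: `InodeHasher`, `InodeSet`, `first_sighting`, and the constant `K` are hypothetical names chosen for illustration, and the mixing step is a generic multiply-rotate, not the exact `rustc-hash` algorithm.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in for FxHasher: a multiply-rotate mixer intended
/// only for trusted integer keys such as (inode, dev).
#[derive(Default)]
struct InodeHasher {
    state: u64,
}

// Odd multiplicative constant. This value is an assumption of the sketch;
// it is not taken from rustc-hash.
const K: u64 = 0x9E37_79B9_7F4A_7C15;

impl InodeHasher {
    #[inline]
    fn mix(&mut self, word: u64) {
        self.state = (self.state.rotate_left(5) ^ word).wrapping_mul(K);
    }
}

impl Hasher for InodeHasher {
    // Required fallback for arbitrary byte input; not hit by (u64, u64) keys.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.mix(u64::from(b));
        }
    }

    // Fast path: each u64 half of the tuple is mixed in one step.
    #[inline]
    fn write_u64(&mut self, n: u64) {
        self.mix(n);
    }

    #[inline]
    fn finish(&self) -> u64 {
        self.state
    }
}

/// Set of (inode, dev) pairs using the non-DoS-resistant hasher above.
type InodeSet = HashSet<(u64, u64), BuildHasherDefault<InodeHasher>>;

/// Returns true the first time an (inode, dev) pair is seen.
fn first_sighting(seen: &mut InodeSet, inode: u64, dev: u64) -> bool {
    seen.insert((inode, dev))
}
```

A set is created with `InodeSet::default()`. The trade-off is the one the description already names: no DoS resistance, which is acceptable only because the keys come from our own syscalls.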

Benchmarks

Cumulative for this batch of changes. "Before" is inclusive of #574.

host: 24 CPU; hyperfine --warmup 2 --runs 10 for synthetics, --warmup 1 --runs 3-5 for large targets

| target | before | after | speedup |
| --- | --- | --- | --- |
| balanced (1000 dirs / 20k files) | 9.3 ms | 8.8 ms | 1.06× |
| wide_flat (1 dir, 100k files) | 76.9 ms | 72.9 ms | 1.06× |
| deep_narrow (1500-deep chain) | 139.3 ms | 98.9 ms | 1.41× |
| -e "\.rs$" regex filter on dust src | 35.7 ms | 16.9 ms | 2.11× |
| -x dust src | 48.3 ms | 30.4 ms | 1.59× |
| -f balanced | 9.6 ms | 7.7 ms | 1.25× |
| -f wide_flat | 82.4 ms | 74.8 ms | 1.10× |
| ~ (273k entries) | 182 ms | 173 ms | 1.06× |
| /nix/store (7.5M+ entries, ZFS) | 33.5 s | 25.6 s | 1.31× |
| -f /nix/store | 25.6 s | 18.8 s | 1.36× |

Default-walk synthetics (balanced, wide_flat, ~) are tighter on this run. Those workloads are already close to the syscall floor; the per-entry CPU savings show up in user time more than wall time. I also sampled the system times for these runs, and they are more dramatic where they apply: -e -75% sys, -x -41% sys, -f /nix/store -61% sys. strace confirms -f wide_flat goes from ~100k statx calls to ~10.

Further Work

One more potential perf change stacked on top of this:

  • AT_STATX_DONT_SYNC on Linux. Biggest win on network filesystems (~3.8× on an 11 TB NFS mount in my testing); helps cold-cache local in smaller amounts. dust's default std::fs::Metadata uses AT_STATX_SYNC_AS_STAT, which for a read-only walker is stricter coherence than we need.

That is a small behavior change though, which may or may not be acceptable.

arcuru added 6 commits May 11, 2026 17:31
Assisted-by: Claude Opus 4.7 (code generation, refactoring, code review)
Syscall reduction verified via strace on a 3-dir / 5-file test tree:
statx count drops from 11 to 8, exactly one per directory saved.
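
The per-directory stat cache this commit message refers to could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `PendingDir` name comes from the description, but its fields, the reduced `MetaTuple` shape, and the `stat_calls` counter (added only to make the saving observable) are hypothetical.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical reduced metadata tuple: (size in bytes, is_dir).
type MetaTuple = (u64, bool);

struct PendingDir {
    path: PathBuf,
    /// Filled on the first metadata() call; None inside means the stat failed.
    meta: OnceLock<Option<MetaTuple>>,
    /// Debug counter for this sketch only: how many real stats were issued.
    stat_calls: AtomicUsize,
}

impl PendingDir {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self {
            path: path.into(),
            meta: OnceLock::new(),
            stat_calls: AtomicUsize::new(0),
        }
    }

    /// Stats the directory on the first call; later calls reuse the tuple,
    /// so an is_dir() check and a later node build share one syscall.
    fn metadata(&self) -> Option<MetaTuple> {
        *self.meta.get_or_init(|| {
            self.stat_calls.fetch_add(1, Ordering::Relaxed);
            fs::metadata(&self.path)
                .ok()
                .map(|m| (m.len(), m.is_dir()))
        })
    }
}
```

Caching the failure case as well means an unreadable directory is not retried, which keeps the "at most one stat per directory" property even on error paths.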

Assisted-by: Claude Opus 4.7 (code review, refactoring)
Assisted-by: Claude Opus 4.7 (code review, refactoring)
Also thread metadata through the ignore_file -> build_node boundary so
filter-active walks (-x / -M / -A / -y) stat each entry at most once.

Correctness verified by diffing output against the pre-change binary
with default flags, -i, -e, -v, -X, -M, -x.

Assisted-by: Claude Opus 4.7 (code review, refactoring)
Assisted-by: Claude Opus 4.7 (code review, refactoring)
Under `-f` the file's stat is mostly thrown away — `node_from_tuple`
sets size to 1 and the time fields are only read when -M/-A/-y are
on. The only field still needed downstream is `inode_device` for
`clean_inodes` dedup, and both halves are syscall-free: `inode`
from `DirEntry::ino()` (getdents64), `dev` from the parent
directory's cached tuple.

Gated to `-f` without `-L` and without metadata-needing filters,
so output is byte-identical to the previous stat path. Unix-only;
Windows DirEntry has no cheap d_ino analogue.
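
As a rough, Unix-only illustration of the idea in this commit message, the sketch below builds `(inode, dev)` pairs from `DirEntry::ino()` plus a single stat of the parent directory. The function name `inode_dev_pairs` is hypothetical, and the sketch assumes, as the commit message does, that non-directory entries share the device of their parent directory.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{DirEntryExt, MetadataExt};
use std::path::Path;

/// Collects (inode, dev) for every non-directory entry in `dir`,
/// issuing one stat for the directory itself and none per file.
fn inode_dev_pairs(dir: &Path) -> io::Result<Vec<(u64, u64)>> {
    // One stat for the parent; its device id is reused for all children.
    let dev = fs::metadata(dir)?.dev();

    let mut pairs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() is usually answered from the directory entry itself;
        // it may fall back to a stat on filesystems that do not report it.
        if entry.file_type()?.is_dir() {
            continue;
        }
        // ino() reads the inode number already returned by the directory read.
        pairs.push((entry.ino(), dev));
    }
    Ok(pairs)
}
```

Subdirectories are skipped here because they can be mount points with a different device, so they still need their own stat.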

Assisted-by: Claude Opus 4.7 (code review, refactoring)