perf: Rust-side scan_repo to eliminate FS listdir/stat storms (B-medium) #80

@doubleailes

Background

pyrer.solve is already very fast (~36× rez on the 188-case benchmark on the same machine, see README). But the end-to-end rez env <pkgs> path on a real repository still spends a substantial chunk of wall-clock time on filesystem operations before the solver ever runs — well outside what we can speed up by tuning the solver further.

Profile sketch from rez/src/rezplugins/package_repository/filesystem.py:

| Operation | Where | Per resolve (typical) |
|---|---|---|
| Family enumeration | _get_family_dirs: one os.listdir(root) + os.path.isdir per entry | 1 |
| Version enumeration | _get_version_dirs(family): multiple os.listdir per family, checking for .ignore* / .building* | 1 per family touched |
| Package file probe | _is_valid_package_directory → _get_file: up to 4 sequential os.path.isfile calls per version dir, checking package.py / package.yaml | 1 per version dir |

For a 50-package resolve on a repo with hundreds of families, that adds up to hundreds of filesystem syscalls issued from Python on every resolve. Caching helps only partially: the cache is only as warm as the memcached configuration allows. On cold resolves this dominates the pre-solve wall-clock.
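To make the call pattern concrete, here is a minimal sketch that mimics the listdir / isdir / isfile loop described in the table and counts the filesystem calls it issues on a tiny throwaway repo. It is not rez's code: `naive_enumerate`, `build_demo_repo`, and the probe order in `PACKAGE_FILES` are assumptions made for illustration.

```python
import os
import tempfile

# Assumed probe order; rez's real loader may check names in a different order.
PACKAGE_FILES = ("package.py", "package.yaml", "package.txt")


def naive_enumerate(root, counter):
    """Walk root the listdir + isdir + isfile way, counting each call."""
    found = []
    counter["listdir"] += 1
    for family in sorted(os.listdir(root)):
        family_dir = os.path.join(root, family)
        counter["isdir"] += 1
        if not os.path.isdir(family_dir):
            continue
        counter["listdir"] += 1
        for version in sorted(os.listdir(family_dir)):
            version_dir = os.path.join(family_dir, version)
            # Sequential probes: one isfile call per candidate until a hit.
            for name in PACKAGE_FILES:
                counter["isfile"] += 1
                if os.path.isfile(os.path.join(version_dir, name)):
                    found.append((family, version, name))
                    break
    return found


def build_demo_repo(root, families=3, versions=2):
    """Create families x versions directories, each with a package.yaml."""
    for f in range(families):
        for v in range(versions):
            version_dir = os.path.join(root, f"pkg{f}", f"1.{v}.0")
            os.makedirs(version_dir)
            with open(os.path.join(version_dir, "package.yaml"), "w") as fh:
                fh.write(f"name: pkg{f}\n")


with tempfile.TemporaryDirectory() as root:
    build_demo_repo(root)
    calls = {"listdir": 0, "isdir": 0, "isfile": 0}
    packages = naive_enumerate(root, calls)
    total_calls = sum(calls.values())
```

Even on this 3-family, 2-version layout the loop issues 19 calls for 6 packages, and the count grows with every family and version directory in the repository.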

What this issue covers

A Rust-side directory walker that produces the enumeration output (which families exist, which versions, where the package file lives) much faster than rez's Python loop — without taking on package.py parsing.

Out of scope: see the dedicated section below.

Proposed user-facing API

Low-level: pyrer.scan_repo(paths) -> list[ScannedPackage]

import pyrer

for entry in pyrer.scan_repo(["/sw/pkg", "/sw/site"]):
    entry.family       # str — package family name
    entry.version      # str — version directory name (as on disk)
    entry.format       # "py" | "yaml" | "txt"
    entry.path         # str — absolute path to the package file

Pure data, no package.py evaluation. Callers feed each entry.path through rez's existing loader (rez.serialise.load_py / etc.) to get a real Package object.

High-level: pyrer.solve(requests, *, package_paths=[...])

import pyrer

result = pyrer.solve(["maya-2024", "nuke-14"], package_paths=["/sw/pkg"])

Internally:

  1. scan_repo(package_paths) (Rust walk)
  2. For each entry, ask rez to load it — rez.serialise.load_py(entry.path) or equivalent
  3. Convert each loaded Package via PackageData.from_rez(pkg)
  4. Hand the list to the existing solver core
  5. Return a SolveResult exactly as today

Rez stays the loader. Rust is only the walker.

The existing pyrer.solve(requests, packages=[...]) (which takes list[PackageData] directly) stays unchanged — package_paths= is purely additive.
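The five internal steps can be sketched as a small piece of Python glue with every collaborator injected. This is only an illustration of the data flow: `Entry`, `solve_from_paths`, and the `scan` / `load` / `convert` / `solve_core` parameters are hypothetical names, and the lambdas in the usage part are stubs standing in for the Rust walker, rez's loader, `PackageData.from_rez`, and the solver core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for the proposed ScannedPackage."""
    family: str
    version: str
    format: str
    path: str


def solve_from_paths(requests, package_paths, *, scan, load, convert, solve_core):
    """Walk, load, convert, then solve. All collaborators are injected."""
    entries = scan(package_paths)                  # step 1: directory walk
    loaded = [load(entry) for entry in entries]    # step 2: loader (rez in practice)
    packages = [convert(pkg) for pkg in loaded]    # step 3: convert to solver input
    return solve_core(requests, packages)          # steps 4-5: solve and return


# Usage with stubs only; no rez or pyrer is involved here.
entries = [Entry("maya", "2024", "py", "/sw/pkg/maya/2024/package.py")]
result = solve_from_paths(
    ["maya-2024"],
    ["/sw/pkg"],
    scan=lambda paths: entries,
    load=lambda e: {"name": e.family, "version": e.version},
    convert=lambda pkg: f"{pkg['name']}-{pkg['version']}",
    solve_core=lambda reqs, pkgs: [r for r in reqs if r in pkgs],
)
```

Injecting the loader keeps the walk and the solver independent of rez, which matches the intent that rez is touched in exactly one place.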

Implementation sketch

Rust crate (new module in rer-resolver or a new sibling crate)

pub struct ScannedPackage {
    pub family: String,
    pub version: String,
    pub format: PackageFormat,   // Py | Yaml | Txt
    pub path: PathBuf,
}

pub fn scan_repo(paths: &[PathBuf]) -> Result<Vec<ScannedPackage>, ScanError>;

For each input path:

  1. std::fs::read_dir(path) to list family directories (one syscall).
  2. For each family dir:
    • One read_dir to list versions.
    • One read_dir again only if .ignore* / .building* filtering is enabled (or fold into the version scan — single pass).
    • For each version dir: single stat-equivalent check that picks the first existing of package.py / package.yaml / package.txt via DirEntry::file_type rather than four sequential isfile calls.
  3. Skip families/versions matching rez's standard ignore patterns (.ignore*, .building*, leading underscore, etc. — match rez's _is_valid_package_directory semantics exactly).

The Rust walk should match rez's _get_family_dirs / _get_version_dirs / _is_valid_package_directory byte-for-byte in terms of which entries it surfaces, so a pyrer.solve(..., package_paths=...) resolves against exactly the same set of packages as rez env would.
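The walk described above can be prototyped in Python with `os.scandir`, which could also serve as a readable oracle when testing the Rust version. This is a sketch under stated assumptions, not rez's actual rules: `Scanned`, `reference_scan`, the probe order in `FORMATS`, the `_skip` rule, and the reading of `.ignore<version>` / `.building<version>` as marker entries inside the family directory are all simplifications that would need to be checked against rez's source.

```python
import os
import tempfile
from typing import Iterable, List, NamedTuple


class Scanned(NamedTuple):
    """Hypothetical stand-in for the proposed ScannedPackage."""
    family: str
    version: str
    format: str
    path: str


FORMATS = ("py", "yaml", "txt")      # assumed probe order
MARKERS = (".ignore", ".building")   # assumed marker prefixes


def _skip(name: str) -> bool:
    # Assumption: simplified rule, not rez's exact semantics.
    return name.startswith((".", "_"))


def reference_scan(paths: Iterable[str]) -> List[Scanned]:
    found = []
    for root in paths:
        with os.scandir(root) as it:
            families = sorted(e.name for e in it if e.is_dir() and not _skip(e.name))
        for family in families:
            family_dir = os.path.join(root, family)
            hidden, versions = set(), []
            # Single pass: collect marker entries and version dirs together.
            with os.scandir(family_dir) as it:
                for e in it:
                    marker = next((m for m in MARKERS if e.name.startswith(m)), None)
                    if marker:
                        hidden.add(e.name[len(marker):])
                    elif e.is_dir() and not _skip(e.name):
                        versions.append(e.name)
            for version in sorted(v for v in versions if v not in hidden):
                version_dir = os.path.join(family_dir, version)
                # One listing per version dir instead of sequential isfile probes.
                with os.scandir(version_dir) as it:
                    present = {e.name for e in it if e.is_file()}
                for fmt in FORMATS:
                    filename = f"package.{fmt}"
                    if filename in present:
                        found.append(Scanned(family, version, fmt,
                                             os.path.join(version_dir, filename)))
                        break
    return found


def _touch(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as root:
    _touch(os.path.join(root, "pkgA", "1.0.0", "package.py"))
    _touch(os.path.join(root, "pkgA", "2.0.0", "package.yaml"))
    _touch(os.path.join(root, "pkgA", ".ignore2.0.0"))   # hides version 2.0.0
    _touch(os.path.join(root, "pkgB", "1.0", "package.txt"))
    _touch(os.path.join(root, "_scratch", "1.0", "package.py"))
    summary = [(s.family, s.version, s.format) for s in reference_scan([root])]
```

On this demo tree only `pkgA/1.0.0` and `pkgB/1.0` are surfaced: the marker hides `2.0.0` and the underscore-prefixed family is skipped.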

PyO3 binding (in rer-python)

  • #[pyclass] ScannedPackage mirroring the Rust struct.
  • #[pyfunction] scan_repo(paths: Vec<PathBuf>) -> PyResult<Vec<ScannedPackage>>.
  • solve(...) gains a package_paths: Option<Vec<PathBuf>> keyword that triggers the walk + rez-loader path (Python-side glue, since the loader call lives in Python).

Python shim layer (in rer-python's wheel)

A small Python file shipped alongside the cdylib that:

  • Imports rez lazily (guarded try: import rez; ... except ImportError: raise RuntimeError("package_paths= requires rez to be installed")).
  • Wraps solve(*, package_paths=...): calls scan_repo, iterates entries, calls rez.serialise.load_<format>(entry.path), converts via PackageData.from_rez, delegates to the Rust solver.
  • This is the only place where pyrer touches rez at all — keeps the Rust core rez-free.
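The lazy-import guard from the first bullet might look like the sketch below. `require_module` is a hypothetical helper name; the module name is a parameter so the behaviour can be exercised without rez installed.

```python
import importlib


def require_module(name, feature):
    """Import name on first use, or fail with a message naming the feature."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(f"{feature} requires {name} to be installed") from exc


# Usage: a module that exists is returned as-is.
json_module = require_module("json", "demo")

# A missing module produces the clear error instead of a bare ImportError.
try:
    require_module("module_that_does_not_exist_xyz", "package_paths=")
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

Calling the guard only inside the `package_paths=` branch means `scan_repo` and the `packages=` path never import rez at all.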

Acceptance criteria

  • pyrer.scan_repo(paths) produces the same set of (family, version, format, path) tuples that rez's FileSystemPackageRepository would, on a representative repo. Validated by a diff test against rez's own enumeration.
  • pyrer.solve(requests, package_paths=...) produces the same resolution as pyrer.solve(requests, packages=[...]) built from rez's existing iter_package_families for the same paths.
  • On rez's bundled 188-case benchmark, end-to-end pyrer.solve(reqs, package_paths=...) is measurably faster than the same path through rez's enumeration. Target: 50%+ reduction in pre-solve wall-clock on cold runs.
  • No regression in the 188/188 differential against rez.
  • Rez is an optional dependency: pyrer.scan_repo alone works without rez installed; solve(..., package_paths=...) raises a clear error if rez is missing.
  • Documentation: update rez-integration.md to show the new path-based solve(...) as the recommended integration shape.
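The diff test in the first criterion could be built around a small set comparison such as the one below. It is a sketch: `diff_enumerations` is a hypothetical helper, and the tuples are made-up sample data, not output from rez or pyrer.

```python
def diff_enumerations(expected, actual):
    """Compare two iterables of (family, version, format, path) tuples."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),      # reference has it, walker does not
        "unexpected": sorted(actual_set - expected_set),   # walker has it, reference does not
    }


# Made-up sample data for illustration only.
reference = [
    ("maya", "2024", "py", "/sw/pkg/maya/2024/package.py"),
    ("nuke", "14", "yaml", "/sw/pkg/nuke/14/package.yaml"),
]
walker = [
    ("maya", "2024", "py", "/sw/pkg/maya/2024/package.py"),
    ("nuke", "15", "yaml", "/sw/pkg/nuke/15/package.yaml"),
]
report = diff_enumerations(reference, walker)
```

A passing test would assert that both lists in the report are empty; reporting the two directions separately makes it easier to tell an over-eager ignore rule from a missing one.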

Out of scope — B-deep

This issue is explicitly not about porting package.py evaluation to Rust.

rez's package.py files in production routinely contain arbitrary Python:

  • non-literal expressions (requires = ["python-" + sys.platform]),
  • @early() decorators (build-time-evaluated),
  • @late() decorators (resolve-context-aware),
  • platform-conditional variants,
  • runtime imports, helper functions defined in the file body.

A Rust literal-AST parser cannot evaluate any of that, and shipping a real CPython embedding in the cdylib is a massive scope expansion. rez's exec()-based loader stays on the Python side. B-medium captures the FS-walk win without taking that on.
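The limitation is easy to demonstrate with Python's own literal-only evaluator, `ast.literal_eval`, used here as a stand-in for what a Rust literal-AST parser could handle. `try_literal` is a hypothetical helper name.

```python
import ast


def try_literal(source):
    """Return (True, value) for a pure literal, else (False, error type name)."""
    try:
        return True, ast.literal_eval(source)
    except (ValueError, SyntaxError) as exc:
        return False, type(exc).__name__


# A plain literal list evaluates fine.
literal_ok = try_literal('["python-3.9", "six"]')

# The non-literal expression from the list above is rejected,
# because evaluating it needs a real interpreter and the sys module.
dynamic = try_literal('["python-" + sys.platform]')
```

Anything beyond plain literals needs an actual Python runtime, which is why the loader stays on the Python side.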

Related
