
V2 watch mode: incremental compilation cache — scoping (follow-up to #126) #131

@TheHypnoo

Context

PR #126 shipped V1 of perry dev as v0.5.143 — filesystem watcher + debounced full rebuild on change. Hot rebuild after a one-line edit in a mid-sized program is ~330 ms, cold ~15 s (auto-optimize lib cache), and the PR's "Recent Changes" note explicitly earmarks V2 for "in-memory AST cache + per-module .o reuse for incremental compilation."

This issue scopes V2 before implementation. It splits naturally into two deliverables:

  • V2.1 — in-memory AST cache inside a single perry dev session. No disk state, no cross-session reuse, no ABI concerns. Small, low-risk, and measurable on its own. I'll send this as a standalone PR against the existing feat/dev-incremental-cache branch — it doesn't need design discussion.
  • V2.2 — on-disk per-module .o cache surviving across perry dev runs (and, eventually, across perry compile invocations). This is where the interesting design decisions are, and what this issue is actually about.

Problem

A hot rebuild in perry dev today does the full compile pipeline on every save: parse → lower → transform → per-module LLVM codegen → link. On a ~20-module program the ~330 ms breaks down roughly 60% codegen, 25% link, 15% parse+lower+transform. Codegen is the fat part and it's almost entirely cacheable: per-module .o files already exist as independent artifacts (see crates/perry/src/commands/compile.rs:4764-4780, produced inside the rayon loop at line 4425, then handed to the linker as separate objects at line 5656 — they are not currently archived into a static lib). The only reason we recompute them on every rebuild is that we throw them away at the end of each run.

Industry comparison:

  • Vite keeps the parsed module graph in memory and HMR-patches only the changed module. Dev-server only, not AOT.
  • esbuild holds ASTs warm in RAM between builds in watch mode, but re-codegens everything on each rebuild (fast enough that it doesn't matter for JS).
  • tsc persists .tsbuildinfo with a file-level dependency graph and timestamps; skips modules whose inputs didn't change transitively.
  • Cargo fingerprints each crate from its sources, dependencies, rustc flags, and toolchain version; on a fingerprint match it reuses the .rlib/.o, otherwise it recompiles. Closest analog to what we want.

The four coupling concerns

Perry's compile_module in perry-codegen/src/codegen.rs is called per-module but takes a CompileOptions that carries cross-module state. Any cache key has to pin these or we'll produce silently-wrong binaries:

  1. Class IDs: compile.rs:2271-2274 threads a shared next_class_id counter across all modules during collection. Module A's class ID depends on which modules were seen before it. If we cache A.o but the module list changes, the ID baked into A's vtables could collide. Fix: either key the cache on the full module-order-derived class-ID map, or snapshot the assignment and verify on hit.

  2. Import prefixes — symbols are mangled as perry_fn_<module>__<fn>. CompileOptions.import_function_prefixes + non_entry_module_prefixes tell each module how to spell its imports. If an import is renamed, every dependent module's .o contains a stale symbol reference and link fails. Cache key must hash the full import-prefix map a module depends on, not the module's own source alone.

  3. Monomorphization: perry-hir/src/monomorph.rs:1621's monomorphize_module takes a single module. If app.ts instantiates Box<number> and Box<string> from box.ts, where do those specializations live and what invalidates them?

    I ran this empirically rather than speculate. Two test cases, nm on the resulting .o files:

    // utils.ts
    export function identity<T>(x: T): T { return x; }
    // app.ts
    import { identity } from "./utils";
    const a = identity(42);
    const b = identity("hello");

    utils_ts.o contained exactly one symbol: T _perry_fn_utils_ts__identity. Same result for a generic class Box<T> with number and string instantiations — box_ts.o had a single Box_constructor and a single Box__get, both reused by app2_ts.o regardless of T.

    Root cause: Perry NaN-boxes every JS value into a 64-bit double. Generic code is bit-identical for any T, so there's nothing to specialize at the .o level. This coupling concern evaporates — monomorphization is a pure HIR-level rewrite that doesn't fork object code. The cache doesn't need to reason about it.

  4. Global i18n table: CompileOptions.i18n_table is materialized as rodata in whichever module has is_entry_module: true. Entry-module cache entries must key on the table hash; non-entry modules don't touch it.
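To make concern 1 concrete, here is a toy model of a shared class-ID counter. It is a sketch, not Perry's collection code: `assign_class_ids`, the module names, and the class names are all hypothetical, and only the "one counter threaded across modules in order" idea comes from the description above.

```rust
use std::collections::BTreeMap;

/// Toy model: visit modules in order and hand every class the next free ID.
/// (Hypothetical helper for illustration; not Perry's actual API.)
fn assign_class_ids(modules: &[(&str, Vec<&str>)]) -> BTreeMap<String, u32> {
    let mut next_class_id = 1u32;
    let mut ids = BTreeMap::new();
    for (module, classes) in modules {
        for class in classes {
            // The ID depends on how many classes were seen before this one,
            // i.e. on the module list and its order, not on the module alone.
            ids.insert(format!("{module}::{class}"), next_class_id);
            next_class_id += 1;
        }
    }
    ids
}
```

Adding a made-up `new.ts` ahead of `a.ts` shifts the ID of `a.ts::A` even though `a.ts` itself did not change, which is exactly the case where a cached A.o would carry a stale ID.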

The CompileOptions docstring at perry-codegen/src/codegen.rs:109-115 confirms most other fields are "informational for the CLI driver's auto-optimize rebuild + linker step — compile_module itself only consults output_type and i18n_table." That narrows the real key material substantially: source hash + import-prefix map + imported-function signatures + class-ID assignment + (entry-only) i18n table hash + perry version + target triple + LLVM opt flags.
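As a rough sketch of how that key material could be folded into one cache key: the struct, its field names, and `module_cache_key` are hypothetical, and FNV-1a is used only because it needs no external crate. A real implementation would probably pick a wider or cryptographic hash.

```rust
use std::collections::BTreeMap;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feed one length-prefixed field into an FNV-1a (64-bit) state. The length
/// prefix keeps ("ab", "c") and ("a", "bc") from producing the same key.
fn mix(h: &mut u64, field: &[u8]) {
    let len = (field.len() as u64).to_le_bytes();
    for &b in len.iter().chain(field) {
        *h ^= u64::from(b);
        *h = (*h).wrapping_mul(FNV_PRIME);
    }
}

/// Hypothetical container for the key material listed above.
#[derive(Clone, Copy)]
struct ModuleKeyInput<'a> {
    source: &'a [u8],
    import_prefixes: &'a BTreeMap<String, String>, // BTreeMap: sorted, stable order
    imported_signatures: &'a [String],
    class_ids: &'a BTreeMap<String, u32>,
    i18n_table_hash: Option<u64>, // Some(..) only for the entry module
    perry_version: &'a str,
    target: &'a str,
    opt_flags: &'a str,
}

fn module_cache_key(k: &ModuleKeyInput<'_>) -> String {
    let mut h = FNV_OFFSET;
    mix(&mut h, k.source);
    // Each variable-length section starts with its entry count.
    mix(&mut h, &(k.import_prefixes.len() as u64).to_le_bytes());
    for (name, prefix) in k.import_prefixes {
        mix(&mut h, name.as_bytes());
        mix(&mut h, prefix.as_bytes());
    }
    mix(&mut h, &(k.imported_signatures.len() as u64).to_le_bytes());
    for sig in k.imported_signatures {
        mix(&mut h, sig.as_bytes());
    }
    mix(&mut h, &(k.class_ids.len() as u64).to_le_bytes());
    for (class, id) in k.class_ids {
        mix(&mut h, class.as_bytes());
        mix(&mut h, &id.to_le_bytes());
    }
    match k.i18n_table_hash {
        Some(table) => mix(&mut h, &table.to_le_bytes()),
        None => mix(&mut h, b"no-i18n"),
    }
    mix(&mut h, k.perry_version.as_bytes());
    mix(&mut h, k.target.as_bytes());
    mix(&mut h, k.opt_flags.as_bytes());
    format!("{h:016x}")
}
```

Using sorted maps for the prefix and class-ID inputs keeps the key independent of insertion order, so two sessions that collect the same data in a different order still agree on the hash.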

Plans

Plan A — Conservative: content-hash cache, per-module only

Hash (source_bytes, import_prefix_map, imported_func_signatures, class_id_map, perry_version, target, opt_flags) per module. Cache at .perry-cache/objects/<target>/<hash>.o in the project root. On compile, check cache; on miss, run codegen and write. Entry module's hash additionally includes the i18n table.
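The check-then-fill flow could look roughly like this. `object_path` and `get_or_compile` are hypothetical names, the `codegen` closure stands in for the real per-module codegen call, and the write-then-rename step is my own assumption about how to avoid half-written entries, not something decided above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Layout from Plan A: <root>/.perry-cache/objects/<target>/<hash>.o
fn object_path(project_root: &Path, target: &str, key: &str) -> PathBuf {
    project_root
        .join(".perry-cache")
        .join("objects")
        .join(target)
        .join(format!("{key}.o"))
}

/// Return the cached object if present; otherwise run `codegen`, store its
/// output, and return the new path. The bool reports whether it was a hit.
fn get_or_compile(
    project_root: &Path,
    target: &str,
    key: &str,
    codegen: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<(PathBuf, bool)> {
    let path = object_path(project_root, target, key);
    if path.is_file() {
        return Ok((path, true));
    }
    let bytes = codegen()?;
    fs::create_dir_all(path.parent().expect("object path has a parent"))?;
    // Write under a temporary name, then rename, so an interrupted write
    // never leaves a truncated .o that a later run would treat as a hit.
    let tmp = path.with_extension("o.tmp");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, &path)?;
    Ok((path, false))
}
```

Because a miss simply runs the closure, any failure to read the cache degrades to the current uncached behavior.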

Expected win: the ~60% codegen portion drops to near-zero for unchanged modules. A one-line edit touching a single module → only that module re-codegens, ~200 ms saved on the ~330 ms rebuild. Link time (~85 ms) is unchanged.
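The arithmetic behind that estimate, as a small sketch. It assumes codegen cost is spread evenly across modules, which the measurements above do not establish; `estimated_rebuild_ms` is a hypothetical helper.

```rust
/// Estimated hot-rebuild time when only `changed` of `total` modules need
/// codegen again. Assumes an even per-module codegen cost and `total > 0`.
fn estimated_rebuild_ms(total_ms: f64, codegen_share: f64, changed: u32, total: u32) -> f64 {
    let codegen_ms = total_ms * codegen_share;
    let cached_modules = total.saturating_sub(changed);
    let saved_ms = codegen_ms * f64::from(cached_modules) / f64::from(total);
    total_ms - saved_ms
}
```

With the figures above, `estimated_rebuild_ms(330.0, 0.60, 1, 20)` comes out at about 142 ms: 19 of 20 modules skip codegen, saving roughly 188 of the ~198 ms codegen budget. The one changed module still pays its share, which is why the saving is slightly under the full ~200 ms.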

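Risk: low. Cache miss is always safe (falls back to current path). Most cache-key bugs produce mismatched .o → link errors, caught immediately. The exception is a stale class-ID assignment, which can link cleanly and misbehave at run time, so the class-ID map has to be part of the key. No ABI surface exposed.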

Plan B — Granular: split codegen + link caches

Plan A plus: cache the linked binary itself keyed on the set of input .o hashes + link flags. Unchanged set → skip cc/ld entirely.
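A possible shape for the link-cache key, as a sketch. `link_cache_key` is a hypothetical name, FNV-1a is a dependency-free stand-in for a real hash, and hashing the objects in link order (rather than as an unordered set) is my assumption, on the grounds that link order can affect the output.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feed one length-prefixed field into an FNV-1a (64-bit) state.
fn mix(h: &mut u64, field: &[u8]) {
    let len = (field.len() as u64).to_le_bytes();
    for &b in len.iter().chain(field) {
        *h ^= u64::from(b);
        *h = (*h).wrapping_mul(FNV_PRIME);
    }
}

/// Key for a cached linked binary: per-module object hashes in link order,
/// then the link flags, then the perry version.
fn link_cache_key(object_hashes: &[&str], link_flags: &[&str], perry_version: &str) -> String {
    let mut h = FNV_OFFSET;
    mix(&mut h, &(object_hashes.len() as u64).to_le_bytes());
    for object in object_hashes {
        mix(&mut h, object.as_bytes());
    }
    mix(&mut h, &(link_flags.len() as u64).to_le_bytes());
    for flag in link_flags {
        mix(&mut h, flag.as_bytes());
    }
    mix(&mut h, perry_version.as_bytes());
    format!("{h:016x}")
}
```

Since the inputs are the Plan A object hashes, a no-op rebuild produces the same key without reading any .o file contents.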

Expected win: the remaining ~85 ms link time disappears on no-op rebuilds (useful when the watcher fires on unrelated file changes in the project root — docs, configs). A typed-edit rebuild still pays link cost.

Risk: low-medium. Link cache is an append to Plan A — fall back to re-link on miss. Extra care for platforms where the linker has non-deterministic output (Windows PE timestamps, macOS LC_UUID); stamp both keys with perry version to force invalidation on compiler upgrade.

Plan C — HIR-addressable: content-hash at the HIR level

Hash the post-transform HIR instead of source. Source edits that leave the HIR unchanged (whitespace, comment edits) become cache hits. Monomorphization output folds into the HIR hash so any change to a generic's HIR form invalidates its specializations.

Expected win: covers the Plan A win plus free hits on formatter/linter noise (prettier-on-save, tools that rewrite " to ', etc.), which is realistic for an IDE-integrated dev loop.

Risk: medium. HIR serialization needs to be stable and deterministic; any non-determinism in lowering (HashMap iteration order, ID counters) becomes a cache bug. Likely warrants a dedicated Hash impl on HIR nodes. Slightly larger upfront investment.
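The HashMap hazard is easy to show in isolation. This sketch hashes a map's entries in sorted-key order so the result does not depend on iteration order; `stable_map_hash` is hypothetical, the `HashMap<String, u32>` is a stand-in for whatever maps the HIR holds, and FNV-1a is again just a dependency-free placeholder.

```rust
use std::collections::HashMap;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Run FNV-1a (64-bit) over `bytes`, continuing from state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hash the entries in sorted-key order. Iterating the HashMap directly would
/// make the result depend on its per-process randomized iteration order.
fn stable_map_hash(map: &HashMap<String, u32>) -> u64 {
    let mut entries: Vec<(&String, &u32)> = map.iter().collect();
    entries.sort();
    let mut h = FNV_OFFSET;
    for (key, value) in entries {
        h = fnv1a(h, &(key.len() as u64).to_le_bytes());
        h = fnv1a(h, key.as_bytes());
        h = fnv1a(h, &value.to_le_bytes());
    }
    h
}
```

Switching the underlying containers to BTreeMap would give the same property without the sort, at the cost of touching the lowering code.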

Plan D — Staged: Plan A now, Plan B next, Plan C when it pays off

Ship Plan A first because it captures most of the win with the least code and the least ABI surface. Observe hit rate in real use. If link time dominates post-Plan-A, add Plan B. If formatter-on-save is common, do Plan C. Each stage lands as an independent PR.

Recommendation

Plan D, Plan A as first deliverable. It's the lowest-risk path that still pays off, and each subsequent stage is independently valuable. Plan A alone brings the hot rebuild from ~330 ms to ~130 ms on the common "edit one file" case.

Proposal

Concrete decisions I'm committing to unless someone objects here:

  • Cache location: .perry-cache/objects/<target-triple>/<hash>.o under the project root (sibling of .perry-dev/), not under target/ (Cargo owns that) and not user-global ($HOME/.cache/perry/) so it's per-project and gitignorable. Add .perry-cache/ to the generated .gitignore in perry init.
  • Scope in first PR: dev-only. Wired into perry dev, not perry compile, so the cache's correctness risk is bounded to the watch loop (where the worst-case is a confusing rebuild the user resolves by rm -rf .perry-cache). Promote to perry compile in a follow-up once it's been exercised.
  • ABI gate: include CARGO_PKG_VERSION in the cache key. Perry ships patch versions frequently; any version bump invalidates the whole cache. Cheap and bulletproof.
  • Bitcode-link mode (PERRY_LLVM_BITCODE_LINK=1): disable the cache when set. That mode swaps per-module .o output for .ll, and the bitcode-link pipeline is experimental enough that layering a cache on top isn't worth the testing surface yet.
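Taken together, the scoping decisions above reduce to one predicate. This is a sketch: `object_cache_enabled` is a hypothetical name, the `--no-cache` input is still an open question rather than a decided flag, and treating only the literal value "1" as "set" is my assumption.

```rust
/// Decide whether the per-module object cache is active for this run.
/// `bitcode_link_env` is the raw value of PERRY_LLVM_BITCODE_LINK, if any.
fn object_cache_enabled(
    is_dev_command: bool,
    no_cache_flag: bool,
    bitcode_link_env: Option<&str>,
) -> bool {
    // Assumption: only the exact value "1" turns bitcode-link mode on.
    let bitcode_link = bitcode_link_env == Some("1");
    is_dev_command && !no_cache_flag && !bitcode_link
}
```

A caller would pass something like `std::env::var("PERRY_LLVM_BITCODE_LINK").ok().as_deref()` for the last argument, which keeps the predicate itself free of global state and easy to test.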

Questions

  1. Cache invalidation UX: a perry dev --no-cache flag, a perry cache clean subcommand, both, or neither (users rm -rf .perry-cache when needed)? Precedent leans toward having both a bypass and a clean command (Cargo has cargo clean, and a fresh --target-dir works as a bypass, though it has no literal --no-cache flag), but Perry has been happy to add flags lazily.
  2. Metrics — should V2.1 and V2.2 print hit-rate / time-saved telemetry on each rebuild, or stay silent like V1? Useful for tuning but can be noisy. I'd lean toward a PERRY_DEV_VERBOSE=1 env-gate.
  3. Cross-platform caveats — Plan B's link cache needs per-platform verification (PE timestamps, LC_UUID). Anyone aware of non-obvious non-determinism in the current link command I should watch for?
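For question 2, an env-gated report could be as small as the sketch below. `RebuildStats` and the output format are hypothetical, and it only tracks hit counts; time-saved reporting would need timing data this sketch does not model.

```rust
/// Per-rebuild cache counters (hypothetical; not part of V1).
#[derive(Default)]
struct RebuildStats {
    hits: u32,
    misses: u32,
}

impl RebuildStats {
    fn record(&mut self, hit: bool) {
        if hit {
            self.hits += 1;
        } else {
            self.misses += 1;
        }
    }

    /// Fraction of lookups that hit, or 0.0 when nothing was looked up.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            f64::from(self.hits) / f64::from(total)
        }
    }

    /// The line to print, or None unless the env value is exactly "1".
    /// `verbose_env` is the raw value of PERRY_DEV_VERBOSE, if any.
    fn report(&self, verbose_env: Option<&str>) -> Option<String> {
        if verbose_env != Some("1") {
            return None;
        }
        let total = self.hits + self.misses;
        let percent = if total == 0 {
            0.0
        } else {
            100.0 * f64::from(self.hits) / f64::from(total)
        };
        Some(format!("cache: {}/{} hits ({:.0}%)", self.hits, total, percent))
    }
}
```

Returning the line instead of printing it leaves the decision about where it goes (stderr, the watcher status line) to the caller, and keeps the default path silent like V1.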

Acceptance criteria for V2.2 Plan A

  • Hot rebuild of a one-line edit on a 20-module program drops from ~330 ms to ≤ 150 ms measured on macOS.
  • Cache misses on: source change, import-prefix change, class-ID reshuffle, imported-signature change, perry version bump, target change, opt-flag change.
  • Cache stays correct across perry dev sessions (stop and restart → hits, not misses).
  • --no-cache (or equivalent) bypass exists.
  • Regression test under test-files/ exercising hit + two invalidation paths.

Happy to take this on after V2.1 lands.
