## Context
PR #126 shipped V1 of `perry dev` as v0.5.143 — filesystem watcher + debounced full rebuild on change. Hot rebuild after a one-line edit in a mid-sized program is ~330 ms, cold ~15 s (auto-optimize lib cache), and the PR's "Recent Changes" note explicitly earmarks V2 for "in-memory AST cache + per-module `.o` reuse for incremental compilation."
This issue scopes V2 before implementation. It splits naturally into two deliverables:
- V2.1 — in-memory AST cache inside a single `perry dev` session. No disk state, no cross-session reuse, no ABI concerns. Small, low-risk, and measurable on its own. I'll send this as a standalone PR against the existing `feat/dev-incremental-cache` branch — it doesn't need design discussion.
- V2.2 — on-disk per-module `.o` cache surviving across `perry dev` runs (and, eventually, across `perry compile` invocations). This is where the interesting design decisions are, and what this issue is actually about.
## Problem
A hot rebuild in `perry dev` today does the full compile pipeline on every save: parse → lower → transform → per-module LLVM codegen → link. On a ~20-module program the ~330 ms breaks down roughly 60% codegen, 25% link, 15% parse+lower+transform. Codegen is the fat part and it's almost entirely cacheable: per-module `.o` files already exist as independent artifacts (see `crates/perry/src/commands/compile.rs:4764-4780`, produced inside the rayon loop at line 4425, then handed to the linker as separate objects at line 5656 — they are not currently archived into a static lib). The only reason we recompute them on every rebuild is that we throw them away at the end of each run.
Industry comparison:
- Vite keeps the parsed module graph in memory and HMR-patches only the changed module. Dev-server only, not AOT.
- esbuild holds ASTs warm in RAM between builds in watch mode, but re-codegens everything on each rebuild (fast enough that it doesn't matter for JS).
- tsc persists `.tsbuildinfo` with a file-level dependency graph and timestamps; skips modules whose inputs didn't change transitively.
- Cargo fingerprints each crate with a hash over source + deps + rustc flags + toolchain version; hits the fingerprint → reuse the `.rlib`/`.o`; else recompile. Closest analog to what we want.
## The four coupling concerns
Perry's `compile_module` in `perry-codegen/src/codegen.rs` is called per-module but takes a `CompileOptions` that carries cross-module state. Any cache key has to pin these or we'll produce silently-wrong binaries:
- Class IDs — `compile.rs:2271-2274` threads a shared `next_class_id` counter across all modules during collection. Module A's class ID depends on which modules were seen before it. If we cache `A.o` but the module list changes, the ID baked into A's vtables could collide. Fix: either key the cache on the full module-order-derived class-ID map, or snapshot the assignment and verify on hit.
- Import prefixes — symbols are mangled as `perry_fn_<module>__<fn>`. `CompileOptions.import_function_prefixes` + `non_entry_module_prefixes` tell each module how to spell its imports. If an import is renamed, every dependent module's `.o` contains a stale symbol reference and link fails. Cache key must hash the full import-prefix map a module depends on, not the module's own source alone.
- Monomorphization — `perry-hir/src/monomorph.rs:1621`'s `monomorphize_module` takes a single module. If `app.ts` instantiates `Box<number>` and `Box<string>` from `box.ts`, where do those specializations live and what invalidates them?
I ran this empirically rather than speculate. Two test cases, `nm` on the resulting `.o` files:

```ts
// utils.ts
export function identity<T>(x: T): T { return x; }

// app.ts
import { identity } from "./utils";
const a = identity(42);
const b = identity("hello");
```
`utils_ts.o` contained exactly one symbol: `T _perry_fn_utils_ts__identity`. Same result for a generic class `Box<T>` with `number` and `string` instantiations — `box_ts.o` had a single `Box_constructor` and a single `Box__get`, both reused by `app2_ts.o` regardless of T.
Root cause: Perry NaN-boxes every JS value into a 64-bit double. Generic code is bit-identical for any T, so there's nothing to specialize at the `.o` level. This coupling concern evaporates — monomorphization is a pure HIR-level rewrite that doesn't fork object code. The cache doesn't need to reason about it.
- Global i18n table — `CompileOptions.i18n_table` is materialized as rodata in whichever module has `is_entry_module: true`. Entry-module cache entries must key on the table hash; non-entry modules don't touch it.
The `CompileOptions` docstring at `perry-codegen/src/codegen.rs:109-115` confirms most other fields are "informational for the CLI driver's auto-optimize rebuild + linker step — `compile_module` itself only consults `output_type` and `i18n_table`." That narrows the real key material substantially: source hash + import-prefix map + imported-function signatures + class-ID assignment + (entry-only) i18n table hash + perry version + target triple + LLVM opt flags.
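The snapshot-and-verify option from the class-ID bullet above can be sketched in a few lines. Everything here is hypothetical (`ClassIdMap`, `cache_hit_still_valid` don't exist in Perry); the point is only that a `BTreeMap` snapshot makes the check deterministic and cheap:

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of the class-ID assignment produced during collection.
/// BTreeMap keeps iteration order deterministic, so the snapshot also hashes stably.
type ClassIdMap = BTreeMap<String, u32>; // fully-qualified class name -> assigned ID

/// Verify-on-hit: a cached module object is only reusable if every class ID it
/// was compiled against is unchanged in the current assignment.
fn cache_hit_still_valid(snapshot: &ClassIdMap, current: &ClassIdMap) -> bool {
    snapshot
        .iter()
        .all(|(name, id)| current.get(name) == Some(id))
}
```

If a new module shifts the shared counter so an existing class gets a different ID, the check fails and the module falls back to a normal recompile — a miss, never a wrong binary.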
## Plans
### Plan A — Conservative: content-hash cache, per-module only
Hash `(source_bytes, import_prefix_map, imported_func_signatures, class_id_map, perry_version, target, opt_flags)` per module. Cache at `.perry-cache/objects/<target>/<hash>.o` in the project root. On compile, check cache; on miss, run codegen and write. Entry module's hash additionally includes the i18n table.
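A minimal sketch of the key, with invented names (`ModuleKey`, `cache_filename` — nothing here is Perry's actual API). One real caveat worth baking in from day one: std's `DefaultHasher` is not guaranteed stable across Rust releases, so an on-disk cache should use a stable content hash (blake3, sha256) in production; `DefaultHasher` below is just to keep the sketch dependency-free:

```rust
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical Plan A key material for one module.
struct ModuleKey<'a> {
    source_bytes: &'a [u8],
    import_prefix_map: &'a BTreeMap<String, String>, // sorted map => deterministic hash
    imported_func_signatures: &'a [String],
    class_id_map: &'a BTreeMap<String, u32>,
    perry_version: &'a str,
    target: &'a str,
    opt_flags: &'a str,
    i18n_table_hash: Option<u64>, // Some(..) only for the entry module
}

impl ModuleKey<'_> {
    /// NOTE: DefaultHasher is NOT stable across toolchain versions; a real
    /// on-disk cache needs a stable content hash (e.g. blake3) here instead.
    fn cache_filename(&self) -> String {
        let mut h = std::collections::hash_map::DefaultHasher::new();
        self.source_bytes.hash(&mut h);
        self.import_prefix_map.hash(&mut h);
        self.imported_func_signatures.hash(&mut h);
        self.class_id_map.hash(&mut h);
        (self.perry_version, self.target, self.opt_flags).hash(&mut h);
        self.i18n_table_hash.hash(&mut h);
        format!("{:016x}.o", h.finish())
    }
}
```

Any single-field change (source byte, prefix map entry, version string) produces a different filename, which is exactly the miss-is-always-safe property Plan A's risk section relies on.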
Expected win: the ~60% codegen portion drops to near-zero for unchanged modules. A one-line edit touching a single module → only that module re-codegens, ~200 ms saved on the ~330 ms rebuild. Link time (~85 ms) is unchanged.
Risk: low. Cache miss is always safe (falls back to current path). Cache-key bugs produce mismatched `.o` → link errors, caught immediately. No ABI surface exposed.
### Plan B — Granular: split codegen + link caches
Plan A plus: cache the linked binary itself keyed on the set of input `.o` hashes + link flags. Unchanged set → skip `cc`/`ld` entirely.
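Because Plan B keys on the *set* of object hashes, the key should be order-independent — the rayon loop can finish modules in any order. A sketch under the same hypothetical-names caveat as above (`link_cache_key` is invented, and `DefaultHasher` again stands in for a stable hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Plan B link-cache key: sorted input-object hashes + link flags.
/// Sorting makes the key insensitive to the order modules finished codegen in.
fn link_cache_key(mut object_hashes: Vec<u64>, link_flags: &str, perry_version: &str) -> u64 {
    object_hashes.sort_unstable();
    let mut h = DefaultHasher::new();
    object_hashes.hash(&mut h);
    link_flags.hash(&mut h);
    perry_version.hash(&mut h); // force invalidation on compiler upgrade
    h.finish()
}
```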
Expected win: the remaining ~85 ms link time disappears on no-op rebuilds (useful when the watcher fires on unrelated file changes in the project root — docs, configs). A typed-edit rebuild still pays link cost.
Risk: low-medium. Link cache is an append to Plan A — fall back to re-link on miss. Extra care for platforms where the linker has non-deterministic output (Windows PE timestamps, macOS LC_UUID); stamp both keys with perry version to force invalidation on compiler upgrade.
### Plan C — HIR-addressable: content-hash at the HIR level
Hash the post-transform HIR instead of source. Source changes that leave the HIR unchanged (whitespace, comment edits, formatting-only rewrites) become cache hits. Monomorphization output folds into the HIR hash, so any change to a generic's HIR form invalidates its specializations.
Expected win: covers the Plan A win plus free hits on formatter/linter noise, prettier-on-save, tools that rewrite `"` ↔ `'`, etc. — realistic for an IDE-integrated dev loop.
Risk: medium. HIR serialization needs to be stable and deterministic; any non-determinism in lowering (`HashMap` iteration order, ID counters) becomes a cache bug. Likely warrants a dedicated `Hash` impl on HIR nodes. Slightly larger upfront investment.
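To make the determinism pitfall concrete, here's a toy illustration (`HirFunction` is a stand-in, not Perry's HIR): hashing a `HashMap` field naively would see nondeterministic iteration order, so the dedicated impl sorts entries first:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for an HIR node; the real thing would get this per node type.
struct HirFunction {
    name: String,
    locals: HashMap<String, u32>, // iteration order is nondeterministic!
}

impl HirFunction {
    /// Deterministic digest: sort map entries before hashing so two lowerings
    /// of the same source always produce the same value.
    fn stable_hash(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.name.hash(&mut h);
        let mut entries: Vec<_> = self.locals.iter().collect();
        entries.sort();
        entries.hash(&mut h);
        h.finish()
    }
}
```

The same discipline applies to ID counters: either exclude them from the hash or renumber them canonically before hashing.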
### Plan D — Staged: Plan A now, Plan B next, Plan C when it pays off
Ship Plan A first because it captures most of the win with the least code and the least ABI surface. Observe hit rate in real use. If link time dominates post-Plan-A, add Plan B. If formatter-on-save is common, do Plan C. Each stage lands as an independent PR.
## Recommendation
Plan D, Plan A as first deliverable. It's the lowest-risk path that still pays off, and each subsequent stage is independently valuable. Plan A alone brings the hot rebuild from ~330 ms to ~130 ms on the common "edit one file" case.
## Proposal
Concrete decisions I'm committing to unless someone objects here:
- Cache location: `.perry-cache/objects/<target-triple>/<hash>.o` under the project root (sibling of `.perry-dev/`), not under `target/` (Cargo owns that) and not user-global (`$HOME/.cache/perry/`), so it's per-project and gitignorable. Add `.perry-cache/` to the generated `.gitignore` in `perry init`.
- Scope in first PR: dev-only. Wired into `perry dev`, not `perry compile`, so the cache's correctness risk is bounded to the watch loop (where the worst case is a confusing rebuild the user resolves by `rm -rf .perry-cache`). Promote to `perry compile` in a follow-up once it's been exercised.
- ABI gate: include `CARGO_PKG_VERSION` in the cache key. Perry ships patch versions frequently; any version bump invalidates the whole cache. Cheap and bulletproof.
- Bitcode-link mode (`PERRY_LLVM_BITCODE_LINK=1`): disable the cache when set. That mode swaps per-module `.o` output for `.ll`, and the bitcode-link pipeline is experimental enough that layering a cache on top isn't worth the testing surface yet.
## Questions
- Cache invalidation UX — a `perry dev --no-cache` flag, a `perry cache clean` subcommand, both, or neither (users `rm -rf .perry-cache` when needed)? Precedent leans toward both existing (Cargo's closest analogs are redirecting `--target-dir` and `cargo clean` — it has no literal `--no-cache`), but Perry has been happy to add flags lazily.
- Metrics — should V2.1 and V2.2 print hit-rate / time-saved telemetry on each rebuild, or stay silent like V1? Useful for tuning but can be noisy. I'd lean toward a `PERRY_DEV_VERBOSE=1` env-gate.
- Cross-platform caveats — Plan B's link cache needs per-platform verification (PE timestamps, LC_UUID). Anyone aware of non-obvious non-determinism in the current link command I should watch for?
## Acceptance criteria for V2.2 Plan A
- Hot rebuild of a one-line edit on a 20-module program drops from ~330 ms to ≤ 150 ms measured on macOS.
- Cache misses on: source change, import-prefix change, class-ID reshuffle, imported-signature change, perry version bump, target change, opt-flag change.
- Cache stays correct across `perry dev` sessions (stop and restart → hits, not misses).
- `--no-cache` (or equivalent) bypass exists.
- Regression test under `test-files/` exercising hit + two invalidation paths.
Happy to take this on after V2.1 lands.