## Context
PR #126 shipped V1 of `perry dev` as v0.5.143 — filesystem watcher + debounced full rebuild on change. Hot rebuild after a one-line edit in a mid-sized program is ~330 ms, cold ~15 s (auto-optimize lib cache), and the PR's "Recent Changes" note explicitly earmarks V2 for "in-memory AST cache + per-module `.o` reuse for incremental compilation."
This issue scopes V2 before implementation. It splits naturally into two deliverables:
- V2.1 — in-memory AST cache inside a single `perry dev` session. No disk state, no cross-session reuse, no ABI concerns. Small, low-risk, and measurable on its own. I'll send this as a standalone PR against the existing `feat/dev-incremental-cache` branch — it doesn't need design discussion.
- V2.2 — on-disk per-module `.o` cache surviving across `perry dev` runs (and, eventually, across `perry compile` invocations). This is where the interesting design decisions are, and what this issue is actually about.
## Problem
A hot rebuild in `perry dev` today does the full compile pipeline on every save: parse → lower → transform → per-module LLVM codegen → link. On a ~20-module program the ~330 ms breaks down roughly 60% codegen, 25% link, 15% parse+lower+transform. Codegen is the fat part and it's almost entirely cacheable: per-module `.o` files already exist as independent artifacts (see `crates/perry/src/commands/compile.rs:4764-4780`, produced inside the rayon loop at line 4425, then handed to the linker as separate objects at line 5656 — they are not currently archived into a static lib). The only reason we recompute them on every rebuild is that we throw them away at the end of each run.
Industry comparison:
- Vite keeps the parsed module graph in memory and HMR-patches only the changed module. Dev-server only, not AOT.
- esbuild holds ASTs warm in RAM between builds in watch mode, but re-codegens everything on each rebuild (fast enough that it doesn't matter for JS).
- tsc persists `.tsbuildinfo` with a file-level dependency graph and timestamps; skips modules whose inputs didn't change transitively.
- Cargo fingerprints each crate with a hash over source + deps + rustc flags + toolchain version; hits the fingerprint → reuse the `.rlib`/`.o`; else recompile. Closest analog to what we want.
## The four coupling concerns
Perry's `compile_module` in `perry-codegen/src/codegen.rs` is called per-module but takes a `CompileOptions` that carries cross-module state. Any cache key has to pin these or we'll produce silently-wrong binaries:
- Class IDs — `compile.rs:2271-2274` threads a shared `next_class_id` counter across all modules during collection. Module A's class ID depends on which modules were seen before it. If we cache `A.o` but the module list changes, the ID baked into A's vtables could collide. Fix: either key the cache on the full module-order-derived class-ID map, or snapshot the assignment and verify on hit.
- Import prefixes — symbols are mangled as `perry_fn_<module>__<fn>`. `CompileOptions.import_function_prefixes` + `non_entry_module_prefixes` tell each module how to spell its imports. If an import is renamed, every dependent module's `.o` contains a stale symbol reference and link fails. Cache key must hash the full import-prefix map a module depends on, not the module's own source alone.
- Monomorphization — `perry-hir/src/monomorph.rs:1621`'s `monomorphize_module` takes a single module. If `app.ts` instantiates `Box<number>` and `Box<string>` from `box.ts`, where do those specializations live and what invalidates them?
I ran this empirically rather than speculate. Two test cases, `nm` on the resulting `.o` files:

```ts
// utils.ts
export function identity<T>(x: T): T { return x; }

// app.ts
import { identity } from "./utils";
const a = identity(42);
const b = identity("hello");
```
`utils_ts.o` contained exactly one symbol: `T _perry_fn_utils_ts__identity`. Same result for a generic class `Box<T>` with `number` and `string` instantiations — `box_ts.o` had a single `Box_constructor` and a single `Box__get`, both reused by `app2_ts.o` regardless of T.
Root cause: Perry NaN-boxes every JS value into a 64-bit double. Generic code is bit-identical for any T, so there's nothing to specialize at the `.o` level. This coupling concern evaporates — monomorphization is a pure HIR-level rewrite that doesn't fork object code. The cache doesn't need to reason about it.
- Global i18n table — `CompileOptions.i18n_table` is materialized as rodata in whichever module has `is_entry_module: true`. Entry-module cache entries must key on the table hash; non-entry modules don't touch it.
The `CompileOptions` docstring at `perry-codegen/src/codegen.rs:109-115` confirms most other fields are "informational for the CLI driver's auto-optimize rebuild + linker step — `compile_module` itself only consults `output_type` and `i18n_table`." That narrows the real key material substantially: source hash + import-prefix map + imported-function signatures + class-ID assignment + (entry-only) i18n table hash + perry version + target triple + LLVM opt flags.
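The snapshot-and-verify option from the class-ID bullet above can be sketched in a few lines. Everything here is hypothetical (`ClassIdMap`, `cache_hit_still_valid` don't exist in Perry); the point is only that a `BTreeMap` snapshot makes the check deterministic and cheap:

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of the class-ID assignment produced during collection.
/// BTreeMap keeps iteration order deterministic, so the snapshot also hashes stably.
type ClassIdMap = BTreeMap<String, u32>; // fully-qualified class name -> assigned ID

/// Verify-on-hit: a cached module object is only reusable if every class ID it
/// was compiled against is unchanged in the current assignment.
fn cache_hit_still_valid(snapshot: &ClassIdMap, current: &ClassIdMap) -> bool {
    snapshot
        .iter()
        .all(|(name, id)| current.get(name) == Some(id))
}
```

If a new module shifts the shared counter so an existing class gets a different ID, the check fails and the module falls back to a normal recompile — a miss, never a wrong binary.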
## Plans
### Plan A — Conservative: content-hash cache, per-module only
Hash `(source_bytes, import_prefix_map, imported_func_signatures, class_id_map, perry_version, target, opt_flags)` per module. Cache at `.perry-cache/objects/<target>/<hash>.o` in the project root. On compile, check cache; on miss, run codegen and write. Entry module's hash additionally includes the i18n table.
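A minimal sketch of the key, with invented names (`ModuleKey`, `cache_filename` — nothing here is Perry's actual API). One real caveat worth baking in from day one: std's `DefaultHasher` is not guaranteed stable across Rust releases, so an on-disk cache should use a stable content hash (blake3, sha256) in production; `DefaultHasher` below is just to keep the sketch dependency-free:

```rust
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical Plan A key material for one module.
struct ModuleKey<'a> {
    source_bytes: &'a [u8],
    import_prefix_map: &'a BTreeMap<String, String>, // sorted map => deterministic hash
    imported_func_signatures: &'a [String],
    class_id_map: &'a BTreeMap<String, u32>,
    perry_version: &'a str,
    target: &'a str,
    opt_flags: &'a str,
    i18n_table_hash: Option<u64>, // Some(..) only for the entry module
}

impl ModuleKey<'_> {
    /// NOTE: DefaultHasher is NOT stable across toolchain versions; a real
    /// on-disk cache needs a stable content hash (e.g. blake3) here instead.
    fn cache_filename(&self) -> String {
        let mut h = std::collections::hash_map::DefaultHasher::new();
        self.source_bytes.hash(&mut h);
        self.import_prefix_map.hash(&mut h);
        self.imported_func_signatures.hash(&mut h);
        self.class_id_map.hash(&mut h);
        (self.perry_version, self.target, self.opt_flags).hash(&mut h);
        self.i18n_table_hash.hash(&mut h);
        format!("{:016x}.o", h.finish())
    }
}
```

Any single-field change (source byte, prefix map entry, version string) produces a different filename, which is exactly the miss-is-always-safe property Plan A's risk section relies on.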
Expected win: the ~60% codegen portion drops to near-zero for unchanged modules. A one-line edit touching a single module → only that module re-codegens, ~200 ms saved on the ~330 ms rebuild. Link time (~85 ms) is unchanged.
Risk: low. Cache miss is always safe (falls back to current path). Cache-key bugs produce mismatched `.o` → link errors, caught immediately. No ABI surface exposed.
### Plan B — Granular: split codegen + link caches
Plan A plus: cache the linked binary itself keyed on the set of input `.o` hashes + link flags. Unchanged set → skip `cc`/`ld` entirely.
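Because Plan B keys on the *set* of object hashes, the key should be order-independent — the rayon loop can finish modules in any order. A sketch under the same hypothetical-names caveat as above (`link_cache_key` is invented, and `DefaultHasher` again stands in for a stable hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Plan B link-cache key: sorted input-object hashes + link flags.
/// Sorting makes the key insensitive to the order modules finished codegen in.
fn link_cache_key(mut object_hashes: Vec<u64>, link_flags: &str, perry_version: &str) -> u64 {
    object_hashes.sort_unstable();
    let mut h = DefaultHasher::new();
    object_hashes.hash(&mut h);
    link_flags.hash(&mut h);
    perry_version.hash(&mut h); // force invalidation on compiler upgrade
    h.finish()
}
```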
Expected win: the remaining ~85 ms link time disappears on no-op rebuilds (useful when the watcher fires on unrelated file changes in the project root — docs, configs). A typed-edit rebuild still pays link cost.
Risk: low-medium. Link cache is an append to Plan A — fall back to re-link on miss. Extra care for platforms where the linker has non-deterministic output (Windows PE timestamps, macOS LC_UUID); stamp both keys with perry version to force invalidation on compiler upgrade.
### Plan C — HIR-addressable: content-hash at the HIR level
Hash the post-transform HIR instead of source. Source changes that leave the HIR unchanged (whitespace, comment edits, formatting-only rewrites) become cache hits. Monomorphization output folds into the HIR hash, so any change to a generic's HIR form invalidates its specializations.
Expected win: covers the Plan A win plus free hits on formatter/linter noise, prettier-on-save, tools that rewrite `"` ↔ `'`, etc. — realistic for an IDE-integrated dev loop.
Risk: medium. HIR serialization needs to be stable and deterministic; any non-determinism in lowering (`HashMap` iteration order, ID counters) becomes a cache bug. Likely warrants a dedicated `Hash` impl on HIR nodes. Slightly larger upfront investment.
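To make the determinism pitfall concrete, here's a toy illustration (`HirFunction` is a stand-in, not Perry's HIR): hashing a `HashMap` field naively would see nondeterministic iteration order, so the dedicated impl sorts entries first:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for an HIR node; the real thing would get this per node type.
struct HirFunction {
    name: String,
    locals: HashMap<String, u32>, // iteration order is nondeterministic!
}

impl HirFunction {
    /// Deterministic digest: sort map entries before hashing so two lowerings
    /// of the same source always produce the same value.
    fn stable_hash(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.name.hash(&mut h);
        let mut entries: Vec<_> = self.locals.iter().collect();
        entries.sort();
        entries.hash(&mut h);
        h.finish()
    }
}
```

The same discipline applies to ID counters: either exclude them from the hash or renumber them canonically before hashing.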
### Plan D — Staged: Plan A now, Plan B next, Plan C when it pays off
Ship Plan A first because it captures most of the win with the least code and the least ABI surface. Observe hit rate in real use. If link time dominates post-Plan-A, add Plan B. If formatter-on-save is common, do Plan C. Each stage lands as an independent PR.
## Recommendation
Plan D, Plan A as first deliverable. It's the lowest-risk path that still pays off, and each subsequent stage is independently valuable. Plan A alone brings the hot rebuild from ~330 ms to ~130 ms on the common "edit one file" case.
## Proposal
Concrete decisions I'm committing to unless someone objects here:
- Cache location: `.perry-cache/objects/<target-triple>/<hash>.o` under the project root (sibling of `.perry-dev/`), not under `target/` (Cargo owns that) and not user-global (`$HOME/.cache/perry/`), so it's per-project and gitignorable. Add `.perry-cache/` to the generated `.gitignore` in `perry init`.
- Scope in first PR: dev-only. Wired into `perry dev`, not `perry compile`, so the cache's correctness risk is bounded to the watch loop (where the worst case is a confusing rebuild the user resolves by `rm -rf .perry-cache`). Promote to `perry compile` in a follow-up once it's been exercised.
- ABI gate: include `CARGO_PKG_VERSION` in the cache key. Perry ships patch versions frequently; any version bump invalidates the whole cache. Cheap and bulletproof.
- Bitcode-link mode (`PERRY_LLVM_BITCODE_LINK=1`): disable the cache when set. That mode swaps per-module `.o` output for `.ll`, and the bitcode-link pipeline is experimental enough that layering a cache on top isn't worth the testing surface yet.
## Questions
- Cache invalidation UX — a `perry dev --no-cache` flag, a `perry cache clean` subcommand, both, or neither (users `rm -rf .perry-cache` when needed)? Precedent leans toward both existing (Cargo's closest analogs are redirecting `--target-dir` and `cargo clean` — it has no literal `--no-cache`), but Perry has been happy to add flags lazily.
- Metrics — should V2.1 and V2.2 print hit-rate / time-saved telemetry on each rebuild, or stay silent like V1? Useful for tuning but can be noisy. I'd lean toward a `PERRY_DEV_VERBOSE=1` env-gate.
- Cross-platform caveats — Plan B's link cache needs per-platform verification (PE timestamps, LC_UUID). Anyone aware of non-obvious non-determinism in the current link command I should watch for?
## Acceptance criteria for V2.2 Plan A
- Hot rebuild of a one-line edit on a 20-module program drops from ~330 ms to ≤ 150 ms measured on macOS.
- Cache misses on: source change, import-prefix change, class-ID reshuffle, imported-signature change, perry version bump, target change, opt-flag change.
- Cache stays correct across `perry dev` sessions (stop and restart → hits, not misses).
- `--no-cache` (or equivalent) bypass exists.
- Regression test under `test-files/` exercising hit + two invalidation paths.
Happy to take this on after V2.1 lands.