feat(dev): in-memory AST cache for perry dev rebuilds (V2.1) #132

Merged
proggeramlug merged 1 commit into PerryTS:main from TheHypnoo:feat/dev-incremental-cache on Apr 22, 2026

@TheHypnoo
Contributor

Summary

V2.1 of the perry dev watch mode: an in-memory AST cache held across rebuilds within a single dev session. On each rebuild, files whose source bytes match the cached version skip SWC parsing and reuse the previous swc_ecma_ast::Module. Scope is strictly perry dev: perry compile and every other entry point pass None and behave byte-identically to v0.5.155.

This is the first of the two deliverables scoped in #131. The V2.2 follow-up (per-module .o reuse on disk, with cache keying over class-IDs / import prefixes / imported-function signatures / perry version / target / opt flags) is a separate PR tracked in that issue.

Smoke-tested on a two-module project with PERRY_DEV_VERBOSE=1:

Initial build:    parse cache: 0/2 hit (2 miss)
After edit:       parse cache: 1/2 hit (1 miss)   rebuilt in 502ms

What this changes

commands/compile.rs

  • New pub struct ParseCache (path-keyed HashMap<PathBuf, ParseCacheEntry> storing { source: String, module: swc_ecma_ast::Module } plus hit/miss counters).
  • New parse_cached(&mut ParseCache, &Path, &str, &str) -> Result<&Module> helper that does the full-source-bytes comparison and reparse-on-miss.
  • collect_modules gains a trailing Option<&mut ParseCache> parameter, re-borrowed at its four recursive call sites via Option::as_deref_mut().
  • New pub fn run_with_parse_cache(args, Option<&mut ParseCache>, ...). The existing pub fn run(...) delegates to it with None, so every callsite outside this PR is unchanged.
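
For readers who want the shape of the helper without opening the diff, here is a minimal sketch of the lookup described above. It is not the code from this PR: `Module` and `parse` are hypothetical stand-ins for `swc_ecma_ast::Module` and the SWC parse step so the example has no dependencies, the error type is a plain `String`, and `parse_cached` takes three arguments instead of the four in the real signature.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `swc_ecma_ast::Module`.
#[derive(Debug, Clone, PartialEq)]
struct Module {
    statements: Vec<String>,
}

/// Toy parser standing in for SWC: one "statement" per non-empty line.
/// Any line containing "@@" is treated as a syntax error.
fn parse(source: &str) -> Result<Module, String> {
    if let Some(bad) = source.lines().find(|l| l.contains("@@")) {
        return Err(format!("syntax error near: {bad}"));
    }
    let statements = source
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(String::from)
        .collect();
    Ok(Module { statements })
}

struct ParseCacheEntry {
    source: String,
    module: Module,
}

#[derive(Default)]
struct ParseCache {
    entries: HashMap<PathBuf, ParseCacheEntry>,
    hits: usize,
    misses: usize,
}

impl ParseCache {
    /// Clears the counters but keeps the cached entries.
    fn reset_counters(&mut self) {
        self.hits = 0;
        self.misses = 0;
    }
}

/// Hit when the cached source for `path` equals `source` byte-for-byte;
/// otherwise reparse and replace the entry. A parse error leaves any
/// previous entry for the path untouched.
fn parse_cached<'a>(
    cache: &'a mut ParseCache,
    path: &Path,
    source: &str,
) -> Result<&'a Module, String> {
    let is_hit = cache.entries.get(path).map_or(false, |e| e.source == source);
    if is_hit {
        cache.hits += 1;
    } else {
        cache.misses += 1;
        let module = parse(source)?;
        let entry = ParseCacheEntry { source: source.to_owned(), module };
        cache.entries.insert(path.to_path_buf(), entry);
    }
    Ok(&cache.entries[path].module)
}

fn main() -> Result<(), String> {
    let mut cache = ParseCache::default();
    let path = Path::new("src/a.ts");
    parse_cached(&mut cache, path, "let x = 1;")?; // cold lookup: miss
    parse_cached(&mut cache, path, "let x = 1;")?; // identical bytes: hit
    parse_cached(&mut cache, path, "let x = 2;")?; // changed: miss, entry replaced
    parse_cached(&mut cache, path, "let x = 1;")?; // revert: miss, no history kept
    println!(
        "parse cache: {}/{} hit ({} miss)",
        cache.hits,
        cache.hits + cache.misses,
        cache.misses
    );
    cache.reset_counters();
    Ok(())
}
```

Keeping only the latest source per path is what makes a revert a miss; storing history would trade memory for hits on undo-heavy editing, which this sketch (like the PR description) does not attempt.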

commands/dev.rs

  • Owns one ParseCache::new() across the watch loop; reset_counters() before each rebuild; prints the per-rebuild hit/miss ratio when PERRY_DEV_VERBOSE=1.
  • build_once threads the cache into run_with_parse_cache.
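
The plumbing described in the two lists above (an optional cache re-borrowed through recursion, `run` delegating with `None`, counters reset per rebuild) can be sketched as follows. Everything here is a simplified assumption for illustration: the toy `Project` map, the `lookup` method, and the reduced function signatures are hypothetical and do not match the real `collect_modules` or `run_with_parse_cache`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified cache: remembers the last-seen source per module.
#[derive(Default)]
struct ParseCache {
    sources: HashMap<String, String>,
    hits: usize,
    misses: usize,
}

impl ParseCache {
    fn reset_counters(&mut self) {
        self.hits = 0;
        self.misses = 0;
    }

    fn lookup(&mut self, name: &str, source: &str) {
        if self.sources.get(name).map(String::as_str) == Some(source) {
            self.hits += 1;
        } else {
            self.misses += 1;
            self.sources.insert(name.to_owned(), source.to_owned());
        }
    }
}

/// Toy project: module name -> (source text, imported module names).
type Project = HashMap<String, (String, Vec<String>)>;

fn collect_modules(
    project: &Project,
    name: &str,
    seen: &mut Vec<String>,
    mut cache: Option<&mut ParseCache>,
) {
    if seen.iter().any(|s| s == name) {
        return;
    }
    seen.push(name.to_owned());
    let (source, imports) = &project[name];
    if let Some(c) = cache.as_deref_mut() {
        c.lookup(name, source);
    }
    for import in imports {
        // Re-borrow instead of moving, so the cache survives to the next iteration.
        collect_modules(project, import, seen, cache.as_deref_mut());
    }
}

fn run_with_parse_cache(project: &Project, cache: Option<&mut ParseCache>) -> Vec<String> {
    let mut seen = Vec::new();
    collect_modules(project, "main", &mut seen, cache);
    seen
}

/// Non-dev entry points keep their old signature and never see a cache.
fn run(project: &Project) -> Vec<String> {
    run_with_parse_cache(project, None)
}

fn main() {
    let mut project: Project = HashMap::new();
    project.insert(
        "main".to_owned(),
        ("import './util'".to_owned(), vec!["util".to_owned()]),
    );
    project.insert("util".to_owned(), ("export const n = 1".to_owned(), Vec::new()));

    // One cache for the whole "watch session"; counters reset before each rebuild.
    let mut cache = ParseCache::default();
    let verbose = std::env::var("PERRY_DEV_VERBOSE").map_or(false, |v| v == "1");
    for rebuild in 0..2 {
        if rebuild == 1 {
            project.get_mut("util").unwrap().0 = "export const n = 2".to_owned();
        }
        cache.reset_counters();
        run_with_parse_cache(&project, Some(&mut cache));
        if verbose {
            let total = cache.hits + cache.misses;
            println!("parse cache: {}/{} hit ({} miss)", cache.hits, total, cache.misses);
        }
    }
    let _ = run(&project); // the uncached path still works unchanged
}
```

Passing `Option<&mut ParseCache>` by value would move it into the first recursive call; `Option::as_deref_mut` yields a fresh, shorter-lived `Option<&mut ParseCache>` each time, which is why the parameter has to be declared `mut`.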

Design choices worth calling out

Content-addressed invalidation, not mtime. The cached entry stores the last-seen source bytes and compares them byte-for-byte on lookup. A formatter-on-save that changes trivia is a miss (correctly); a touch that only bumps mtime is a hit (correctly); a git checkout that leaves the content identical to the cached version is a hit. Byte comparison is O(n), like parsing, but with a much smaller constant factor, and the bytes are already in hand from fs::read_to_string, so there's no extra I/O.

No metadata-based heuristics. I considered mtime+size but that can get fooled by tools that rewrite files with the same mtime (rare but real), and metadata-only invalidation leaves the door open for subtle bugs. Full byte compare is simpler to reason about and costs microseconds.

Scope is dev-only. Every non-dev path still calls the unchanged pub fn run(...). This keeps the correctness risk bounded to the watch loop — where the worst-case failure mode is a confusing rebuild the user resolves by restarting perry dev.

No serialization, no disk. The cache lives in the perry dev process and dies with it. The on-disk equivalent is V2.2 in #131, which has its own ABI and cache-key concerns (class-ID coupling, import-prefix mangling, perry version gate) that aren't relevant here.

Test plan

  • cargo test --workspace --exclude perry-ui-ios --exclude perry-ui-tvos --exclude perry-ui-watchos --exclude perry-ui-gtk4 --exclude perry-ui-android --exclude perry-ui-windows — all green.
  • New unit tests in commands::compile::parse_cache_tests (8 tests, all pass):
    • cold lookup is a miss
    • identical source twice → hit
    • changed source → miss + entry replaced
    • revert to prior source is still a miss (documented behaviour — cache keeps only the latest version, not history)
    • distinct paths are independent
    • reset_counters clears counters but preserves entries
    • hit returns an equivalent AST to a fresh parse
    • parse errors propagate without poisoning subsequent lookups on the same path
  • Existing commands::dev::tests (8 tests) still pass — V1 helpers untouched.
  • End-to-end smoke test: two-module project, initial 0/2 hit, post-edit 1/2 hit with correct rebuilt output.

What's NOT in this PR

Refs #131.

Introduces `ParseCache`, a path-keyed in-memory cache of parsed
`swc_ecma_ast::Module`s owned by `perry dev` for the lifetime of one
watch session. On each rebuild the cache is consulted at the parse
site in `collect_modules`; unchanged files reuse their prior AST and
skip the SWC parse step entirely.

Invalidation is content-addressed — the cached entry stores the last
seen source bytes and a fresh read that matches byte-for-byte is a
hit. Editor formatter-on-save, `touch`-style mtime bumps, and git
checkouts all route to the correct branch (hit when the resulting
bytes are identical, miss when they differ) without any mtime
reasoning.

Plumbing is the minimum needed:

- new `pub fn run_with_parse_cache(args, Option<&mut ParseCache>, ...)`
  in `commands/compile.rs`; existing `pub fn run(...)` delegates with
  `None` so every non-dev caller is byte-identical with v0.5.155;
- `collect_modules` grows a trailing `Option<&mut ParseCache>` and
  re-borrows through its four recursive calls via `as_deref_mut`;
- `commands/dev.rs` owns one `ParseCache` across the watch loop and
  threads it into `build_once`, resetting hit/miss counters per
  rebuild; `PERRY_DEV_VERBOSE=1` prints the per-rebuild ratio.

Scope is strictly `perry dev`. `perry compile` and every other entry
point pass `None` and behave unchanged. On-disk `.o` reuse lives in
the V2.2 scoping issue (PerryTS#131) as a separate, staged follow-up.

Smoke-tested on a two-module project: initial build reports
`0/2 hit (2 miss)`, after editing one file the next rebuild reports
`1/2 hit (1 miss)` in 502ms with correct output. 8 unit tests cover
the helper (cold miss, warm hit, source change → miss + replace,
revert-is-miss, path independence, `reset_counters` preserves
entries, hit-equals-fresh-parse, parse errors propagate without
poisoning the cache).

Refs PerryTS#131
@proggeramlug
Contributor

This is looking great, will merge once the tests clear!

@TheHypnoo
Contributor Author

TheHypnoo commented Apr 22, 2026

Empirical benchmark — V2.1 on its own is not a measurable improvement

Ran after opening the PR to sanity-check the real impact before merge. Synthetic project: 30 TypeScript modules, ~1700 LOC total, each rebuild editing a different module (→ 30/31 hit ratio under V2.1).

| Configuration | Median rebuild (excluding the first, which is warmup) |
|---|---|
| V2.1 cache ON | ~824 ms |
| Baseline (cache disabled = v0.5.155 behaviour) | ~782 ms |

V2.1 sits inside measurement noise and is slightly slower than baseline on some runs. The reason is purely mechanical:

  • SWC parses extremely fast (~1 ms per file, ~30 ms total on this sample).
  • The full source byte comparison on the 30 hit paths costs roughly the same as the parse it avoids.
  • Rebuild time is dominated by HIR lowering + codegen + link, none of which V2.1 touches.

I'm leaving the PR open anyway because:

  1. The scaffolding (ParseCache, run_with_parse_cache, the threading through collect_modules) is exactly what V2.2 needs for the on-disk .o cache — it's reused verbatim, only the cached value changes (AST → object file).
  2. The source byte comparison becomes the basis of V2.2's cache key (together with import_prefix_map + class_id_map + perry_version + target + opt_flags), so the work isn't thrown away.
  3. No regression introduced: 8 new unit tests + the 8 existing dev tests still pass, and the full workspace is green.

The actual performance win lands with V2.2 — issue #131, plan A (per-module content-hashed .o cache). That's the ~60% of rebuild time (current codegen → 0 ms on hits), not a few milliseconds of parsing. This PR is the stepping stone.

@proggeramlug proggeramlug merged commit 77ad295 into PerryTS:main Apr 22, 2026
8 checks passed
@TheHypnoo TheHypnoo deleted the feat/dev-incremental-cache branch April 22, 2026 09:02
proggeramlug pushed a commit that referenced this pull request Apr 22, 2026
Adds `.perry-cache/objects/<target>/<key:016x>.o`, shared across
`perry compile` / `perry run` / `perry dev` invocations. Each rayon
codegen worker computes a djb2 key from (source hash, every codegen-
affecting `CompileOptions` field, perry version) and reuses the cached
bytes instead of re-invoking LLVM on unchanged modules. On a 30-module
bench, warm rebuilds drop from ~714 ms → ~509 ms (~29% faster); a
single-module edit rebuilds in the same ~512 ms (cost scales with
changed modules, not total).
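
As an illustration of the keying scheme just described, here is a sketch of a djb2 key feeding the `<key:016x>.o` file name. The field set (`KeyInputs`), the tag and length-prefix framing in `mix_field`, and the 64-bit wrapping arithmetic are assumptions made for this example; the real key covers many more `CompileOptions` fields and may serialize them differently.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// djb2 step (h = h * 33 + byte) with wrapping 64-bit arithmetic.
fn djb2_update(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h = h.wrapping_mul(33).wrapping_add(u64::from(b));
    }
    h
}

/// Mixes one field with a tag and length prefix so adjacent values
/// cannot run together (an assumption of this sketch).
fn mix_field(h: u64, tag: &str, value: &[u8]) -> u64 {
    let h = djb2_update(h, tag.as_bytes());
    let h = djb2_update(h, &(value.len() as u64).to_le_bytes());
    djb2_update(h, value)
}

/// Hypothetical subset of the inputs that affect a module's object bytes.
#[derive(Clone, Copy)]
struct KeyInputs<'a> {
    source_hash: u64,
    target: &'a str,
    perry_version: &'a str,
    is_entry_module: bool,
    /// Unordered map: sorted before hashing.
    import_prefixes: &'a HashMap<String, String>,
    /// Topologically sorted list: order is significant and preserved.
    non_entry_prefixes: &'a [String],
}

fn compute_key(inp: &KeyInputs<'_>) -> u64 {
    let mut h = 5381u64; // conventional djb2 seed
    h = mix_field(h, "src", &inp.source_hash.to_le_bytes());
    h = mix_field(h, "tgt", inp.target.as_bytes());
    h = mix_field(h, "ver", inp.perry_version.as_bytes());
    h = mix_field(h, "ent", &[u8::from(inp.is_entry_module)]);

    // Sort map entries so HashMap iteration order cannot leak into the key.
    let mut pairs: Vec<(&String, &String)> = inp.import_prefixes.iter().collect();
    pairs.sort();
    for (k, v) in pairs {
        h = mix_field(h, "ipk", k.as_bytes());
        h = mix_field(h, "ipv", v.as_bytes());
    }

    // Keep list order, so a link-ordering change produces a different key.
    for p in inp.non_entry_prefixes {
        h = mix_field(h, "nep", p.as_bytes());
    }
    h
}

fn object_path(target: &str, key: u64) -> PathBuf {
    PathBuf::from(".perry-cache")
        .join("objects")
        .join(target)
        .join(format!("{key:016x}.o"))
}

fn main() {
    let mut prefixes = HashMap::new();
    prefixes.insert("./util".to_owned(), "util_".to_owned());
    let order = vec!["util".to_owned()];
    let inputs = KeyInputs {
        source_hash: djb2_update(5381, b"export const n = 1"),
        target: "x86_64-unknown-linux-gnu",
        perry_version: "0.5.155",
        is_entry_module: true,
        import_prefixes: &prefixes,
        non_entry_prefixes: &order,
    };
    println!("{}", object_path(inputs.target, compute_key(&inputs)).display());
}
```

Sorting the unordered inputs and preserving the ordered ones are two sides of the same requirement: the key should change exactly when the emitted bytes could change.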

Follows v2.1 (#132) — v2.1's in-memory AST cache only helps within a
single `perry dev` session and didn't pay off against SWC's ~1ms/file
parse cost; v2.2 is the real win because it skips the whole LLVM
pipeline, not just parsing.

Architecture:
- `ObjectCache` (thread-safe via AtomicUsize counters, file-per-entry
  so rayon workers don't contend) with atomic tmp-then-rename writes.
  IO errors are silently counted and degrade to the uncached codepath
  — the cache is strictly an optimization.
- `compute_object_cache_key` serializes every `CompileOptions` field
  that affects `compile_module`'s bytes: source hash, target triple,
  is_entry_module, all import maps/sets (sorted so HashMap iteration
  order doesn't leak in), imported classes (full signature incl. ids),
  imported enums, type aliases, enabled features, i18n snapshot,
  CARGO_PKG_VERSION. Topologically-sorted lists (non_entry_prefixes,
  native_module_init_names) preserve order so a link-ordering change
  (the v0.5.127-128 bug class) invalidates consumers.
- Disabled automatically in bitcode-link mode (PERRY_LLVM_BITCODE_LINK=1)
  since compile_module emits .ll text, not object bytes.
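
The store path in the first bullet (file-per-entry, tmp-then-rename, errors counted rather than raised) can be sketched like this. It is an illustration under stated assumptions, not the `ObjectCache` from the commit: the field names, the `seq` counter used to make temporary names unique, and the temp-directory location in `main` are all hypothetical.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical file-per-entry object cache; counters are atomics so
/// worker threads can share `&ObjectCache` without a lock.
struct ObjectCache {
    dir: PathBuf,
    hits: AtomicUsize,
    misses: AtomicUsize,
    stores: AtomicUsize,
    store_errors: AtomicUsize,
    seq: AtomicUsize, // makes temporary file names unique within the process
}

impl ObjectCache {
    fn new(dir: PathBuf) -> Self {
        ObjectCache {
            dir,
            hits: AtomicUsize::new(0),
            misses: AtomicUsize::new(0),
            stores: AtomicUsize::new(0),
            store_errors: AtomicUsize::new(0),
            seq: AtomicUsize::new(0),
        }
    }

    fn entry_path(&self, key: u64) -> PathBuf {
        self.dir.join(format!("{key:016x}.o"))
    }

    /// Any read failure counts as a miss; the caller then compiles as usual.
    fn lookup(&self, key: u64) -> Option<Vec<u8>> {
        match fs::read(self.entry_path(key)) {
            Ok(bytes) => {
                self.hits.fetch_add(1, Ordering::Relaxed);
                Some(bytes)
            }
            Err(_) => {
                self.misses.fetch_add(1, Ordering::Relaxed);
                None
            }
        }
    }

    /// Never fails from the caller's point of view: errors are only counted.
    fn store(&self, key: u64, bytes: &[u8]) {
        let counter = match self.try_store(key, bytes) {
            Ok(()) => &self.stores,
            Err(_) => &self.store_errors,
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Write to a temporary file, then rename it into place, so a reader
    /// sees either no entry or a complete one.
    fn try_store(&self, key: u64, bytes: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.dir)?;
        let n = self.seq.fetch_add(1, Ordering::Relaxed);
        let tmp = self
            .dir
            .join(format!("{key:016x}.{}.{n}.tmp", std::process::id()));
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        drop(file); // close before renaming
        fs::rename(&tmp, self.entry_path(key))
    }
}

fn main() {
    let dir = std::env::temp_dir().join(format!("perry-cache-sketch-{}", std::process::id()));
    let cache = ObjectCache::new(dir.clone());
    assert!(cache.lookup(0x2a).is_none()); // cold: miss
    cache.store(0x2a, b"fake object bytes");
    assert!(cache.lookup(0x2a).is_some()); // warm: hit
    println!(
        "object cache: {} hit, {} miss, {} store, {} store-err",
        cache.hits.load(Ordering::Relaxed),
        cache.misses.load(Ordering::Relaxed),
        cache.stores.load(Ordering::Relaxed),
        cache.store_errors.load(Ordering::Relaxed)
    );
    let _ = fs::remove_dir_all(&dir); // clean up the sketch's scratch directory
}
```

Because each entry is its own file and the rename is the only step that makes it visible, concurrent workers storing different keys never touch the same path.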

CLI:
- `--no-cache` flag on `perry compile` / `perry run` / `perry dev`.
- `PERRY_NO_CACHE=1` env var (CI-friendly override).
- `PERRY_CACHE_VERBOSE=1` prints `• object cache: H/T hit (M miss, S store, E store-err)`
- `perry cache info` — cache location, total size, per-target breakdown.
- `perry cache clean` — wipe `.perry-cache/` for the current project.

Tests:
- 15 unit tests in `object_cache_tests` covering key stability across
  HashMap-insertion-order permutations, key divergence on every
  invalidation axis (source hash, perry version, target, entry flag,
  non-entry-prefix order, imported class arity, bitcode mode),
  disabled-cache no-op semantics, cross-target isolation, and store
  round-trip.
- `scripts/run_cache_tests.sh` end-to-end smoke: 4-module project
  (test-files/module-init-order), asserts cold→warm→partial→rewarm
  hit/miss shapes and that a source edit is never served stale bytes.