feat: incremental AST cache to eliminate redundant AstEncoder work by daniplatform · Pull Request #59 · ParzivalHack/PySpector

daniplatform · 2026-06-04T07:58:53Z

Summary

Adds an incremental AST cache that eliminates redundant AstEncoder work: the pure-Python json.dumps(ast_tree, cls=AstEncoder) step, which is O(N) in the number of AST nodes and dominates parsing cost (ast.parse itself is implemented in C and negligible).

Three-level hierarchy:

L1 in-memory mtime guard: zero work on a hit within a single process run.
L2 disk content-hash guard: no parse/encode across runs.
L3 chunk-aware per-function / per-class subtree reuse when a file only partially changes.
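
As a rough illustration of how the first two levels could fit together, here is a minimal sketch. It is not the code in ast_cache.py: the class name `TwoLevelCache`, the `encode_ast` stand-in (which uses ast.dump instead of the real AstEncoder), the one-file-per-hash layout, and the `encodes` counter are all assumptions made for this example.

```python
import ast
import hashlib
import json
import os


def encode_ast(source: str) -> str:
    """Stand-in for json.dumps(tree, cls=AstEncoder); NOT the real encoder."""
    return json.dumps(ast.dump(ast.parse(source)))


class TwoLevelCache:
    """Hypothetical L1 (in-memory mtime) + L2 (disk content-hash) lookup."""

    def __init__(self, cache_dir: str) -> None:
        os.makedirs(cache_dir, exist_ok=True)
        self.cache_dir = cache_dir
        self.memory = {}   # path -> ((mtime_ns, size), encoded AST)
        self.encodes = 0   # counts how often the slow encode step ran

    def get(self, path: str) -> str:
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        hit = self.memory.get(path)
        if hit is not None and hit[0] == stamp:
            return hit[1]  # L1 hit: the file is not even read

        with open(path, "rb") as fh:
            data = fh.read()
        digest = hashlib.sha256(data).hexdigest()
        entry = os.path.join(self.cache_dir, digest + ".json")
        try:
            with open(entry, "r", encoding="utf-8") as fh:
                encoded = fh.read()  # L2 hit: no parse, no encode
        except OSError:
            # Miss on both levels: parse, encode, and persist the result.
            encoded = encode_ast(data.decode("utf-8"))
            self.encodes += 1
            with open(entry, "w", encoding="utf-8") as fh:
                fh.write(encoded)
        self.memory[path] = (stamp, encoded)
        return encoded
```

In this sketch a second `get()` in the same process returns from the dictionary, and a fresh `TwoLevelCache` pointed at the same directory finds the entry by content hash without encoding again. The mtime-plus-size stamp is a heuristic: a rewrite that keeps both values unchanged would go unnoticed by L1, which is one reason to keep a content-hash level behind it.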

Changes

src/pyspector/ast_cache.py — cache implementation. Persistence is JSON + base64 + zlib, deliberately not pickle (pickle executes arbitrary code on load, unsafe for cache files living in an untrusted repo directory). Atomic writes, LRU eviction, graceful degradation to L1-only when the cache dir can't be created.
src/pyspector/_ast_encode.py — shared AST → JSON encoder extracted as the single source of truth for the schema consumed by the Rust core, eliminating encoder drift between cli.py and the cache.
src/pyspector/cli.py — wires the cache into get_python_file_asts via a new optional cache parameter, and re-exports AstEncoder from _ast_encode (so from pyspector.cli import AstEncoder keeps working). The cache is bypassed when enable_syntax_warnings is True, preserving that diagnostic (the cache suppresses SyntaxWarning internally).
tests/unit/ast_cache_test.py — 51 unit tests (all green).
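
The persistence choices above (JSON + base64 + zlib instead of pickle, plus atomic writes) can be sketched roughly as follows. The function names `pack_entry`, `unpack_entry`, and `atomic_write` and the record fields `hash` and `payload` are hypothetical and chosen for this example; the actual on-disk format in ast_cache.py may differ.

```python
import base64
import json
import os
import tempfile
import zlib
from typing import Tuple


def pack_entry(ast_json: str, content_hash: str) -> str:
    """Compress the encoded AST and wrap it in a plain JSON record."""
    compressed = zlib.compress(ast_json.encode("utf-8"))
    return json.dumps({
        "hash": content_hash,
        "payload": base64.b64encode(compressed).decode("ascii"),
    })


def unpack_entry(text: str) -> Tuple[str, str]:
    """Inverse of pack_entry. Loading only parses data; no code is executed."""
    record = json.loads(text)
    raw = zlib.decompress(base64.b64decode(record["payload"]))
    return record["hash"], raw.decode("utf-8")


def atomic_write(path: str, text: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)  # readers see the old file or the new one
    except BaseException:
        try:
            os.unlink(tmp)  # do not leave a partial temp file behind
        except OSError:
            pass
        raise
```

A corrupted or tampered entry in this scheme fails in json.loads, base64.b64decode, or zlib.decompress with an ordinary exception that a caller can treat as a cache miss, whereas unpickling an attacker-controlled file can run arbitrary code.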

Compatibility

This branch was rebased onto the latest main, so it integrates cleanly on top of the recent CLI work (--stats, --debug, exclude pre-pass, syntax-warning handling, absolute file_path). All of those upstream behaviours are preserved; the cache only adds an optional fast path. The output AST JSON is byte-for-byte identical to the non-cached path.
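
The identical-output requirement applies in particular to the chunk-aware level, where the result is reassembled from cached pieces. The following is a simplified sketch of per-function / per-class reuse, not the PR's implementation. The names `chunk_key`, `encode_module`, and `encode_direct` are hypothetical, ast.dump stands in for AstEncoder, and the source is assumed to use "\n" line endings. Because ast.dump omits line numbers by default, the chunk text alone is a sufficient key here; an encoder that emits positions would also need the position in the key, or would need to re-base positions on reuse.

```python
import ast
import hashlib
import json

CHUNK_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def chunk_key(lines, node) -> str:
    """Hash the full source lines of a top-level def/class, decorators included."""
    start = min([d.lineno for d in node.decorator_list] + [node.lineno])
    text = "\n".join(lines[start - 1 : node.end_lineno])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def encode_module(source: str, chunks: dict, stats: dict) -> str:
    """Encode a module, reusing cached encodings of unchanged defs and classes."""
    tree = ast.parse(source)  # cheap; the encode step is what gets skipped
    lines = source.split("\n")
    parts = []
    for node in tree.body:
        if isinstance(node, CHUNK_TYPES):
            key = chunk_key(lines, node)
            if key not in chunks:
                chunks[key] = ast.dump(node)  # stand-in for the slow encoder
                stats["encoded"] += 1
            parts.append(chunks[key])
        else:
            parts.append(ast.dump(node))  # other statements: encode directly
    return json.dumps(parts)


def encode_direct(source: str) -> str:
    """Non-cached reference path used to check that outputs match."""
    return json.dumps([ast.dump(node) for node in ast.parse(source).body])
```

With a shared `chunks` dictionary, editing one class in a file re-encodes only that class, and the reassembled output can be compared against `encode_direct` to confirm the two paths agree.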

Testing

tests/unit/ast_cache_test.py: 51 passed.
AST JSON produced via the cache verified identical to the direct ast.parse + AstEncoder path.
Pre-existing failures unrelated to this change (stale compiled _rust_core vs. new rules-TOML schema, missing bs4, and an existing absolute-vs-relative file_path assertion in test_get_asts.py) reproduce identically on main and are out of scope here.

Three-level (L1 in-memory mtime / L2 disk content-hash / L3 chunk-aware) incremental AST cache that skips the pure-Python json.dumps(AstEncoder) bottleneck across runs and on partial file changes.

- src/pyspector/ast_cache.py: cache implementation (JSON+base64 persistence, no pickle/code-exec on load)
- src/pyspector/_ast_encode.py: shared AST->JSON encoder (single source of truth, eliminates encoder drift between cli.py and the cache)
- src/pyspector/cli.py: wire the cache into get_python_file_asts
- tests/unit/ast_cache_test.py: unit tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ParzivalHack

Hey @daniplatform, great idea on reducing the workload of AstEncoder, and in my tests this PR also increases PySpector's average scanning speed by 41.90%, so really a great addition. I require no edits. Merging :D

ParzivalHack added the enhancement (New feature or request) and Test (Label for every issue related to tests and testing in general) labels Jun 4, 2026

ParzivalHack approved these changes Jun 4, 2026


ParzivalHack merged commit fce70f6 into ParzivalHack:main Jun 4, 2026
1 of 3 checks passed

ParzivalHack merged 1 commit into ParzivalHack:main from daniplatform:caching-AST
