feat: incremental AST cache to eliminate redundant AstEncoder work #59
Merged
Three-level (L1 in-memory mtime / L2 disk content-hash / L3 chunk-aware) incremental AST cache that skips the pure-Python json.dumps(AstEncoder) bottleneck across runs and on partial file changes.

- src/pyspector/ast_cache.py: cache implementation (JSON+base64 persistence, no pickle/code-exec on load)
- src/pyspector/_ast_encode.py: shared AST->JSON encoder (single source of truth, eliminates encoder drift between cli.py and the cache)
- src/pyspector/cli.py: wire the cache into get_python_file_asts
- tests/unit/ast_cache_test.py: unit tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
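To make the first two levels concrete, here is a minimal sketch of an mtime-guarded in-memory layer in front of a content-hash layer. It is not the code in `ast_cache.py`: the class `TwoLevelAstCache`, the `dump_tree` placeholder encoder, and the use of a plain dict in place of the on-disk store are all assumptions made for illustration, and the chunk-aware L3 is left out.

```python
import ast
import hashlib
import json
import os
import tempfile


class TwoLevelAstCache:
    """Hypothetical sketch: L1 is keyed by (path, mtime), L2 by content hash."""

    def __init__(self, encode):
        self._encode = encode   # stand-in for the expensive AST -> JSON step
        self._l1 = {}           # path -> (mtime_ns, encoded JSON)
        self._l2 = {}           # sha256(content) -> encoded JSON (dict stands in for disk)
        self.encode_calls = 0   # counts how often the expensive step actually ran

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        hit = self._l1.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]       # L1 hit: no read, no hash, no encode

        with open(path, "rb") as fh:
            source = fh.read()
        digest = hashlib.sha256(source).hexdigest()
        encoded = self._l2.get(digest)
        if encoded is None:     # miss on both levels: do the real work once
            self.encode_calls += 1
            encoded = self._encode(ast.parse(source))
            self._l2[digest] = encoded
        self._l1[path] = (mtime, encoded)
        return encoded


def dump_tree(tree):
    # Cheap placeholder for json.dumps(tree, cls=AstEncoder); not PySpector's schema.
    return json.dumps(ast.dump(tree))
```

With this shape, a file whose mtime changed but whose bytes did not (for example after a `touch` or a fresh checkout) misses L1 but still hits the content-hash layer, so the encoder is not re-run.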
ParzivalHack (Owner) approved these changes on Jun 4, 2026 and left a comment:
Hey @daniplatform, great idea on reducing the workload of AstEncoder, and in my tests this PR also increases PySpector's average scanning speed by 41.90%, so really a great addition. I require no edits. Merging :D
Summary
Adds an incremental AST cache that eliminates redundant `AstEncoder` work: the pure-Python `json.dumps(ast_tree, cls=AstEncoder)` step, which is O(N nodes) and dominates parsing cost (`ast.parse` itself is C and negligible).

Three-level hierarchy:

- L1, in-memory with an `mtime` guard: zero work on a hit within a single process run.
- L2, on disk, keyed by content hash.
- L3, chunk-aware.

Changes

- `src/pyspector/ast_cache.py`: cache implementation. Persistence is JSON + base64 + zlib, deliberately not pickle (unpickling can execute arbitrary code on load, which is unsafe for cache files living in an untrusted repo directory). Atomic writes, LRU eviction, graceful degradation to L1-only when the cache dir can't be created.
- `src/pyspector/_ast_encode.py`: shared `AST → JSON` encoder extracted as the single source of truth for the schema consumed by the Rust core, eliminating encoder drift between `cli.py` and the cache.
- `src/pyspector/cli.py`: wires the cache into `get_python_file_asts` via a new optional `cache` parameter, and re-exports `AstEncoder` from `_ast_encode` (so `from pyspector.cli import AstEncoder` keeps working). The cache is bypassed when `enable_syntax_warnings` is True, preserving that diagnostic (the cache suppresses `SyntaxWarning` internally).
- `tests/unit/ast_cache_test.py`: 51 unit tests (all green).

Compatibility

This branch was rebased onto the latest `main`, so it integrates cleanly on top of the recent CLI work (`--stats`, `--debug`, `exclude` pre-pass, syntax-warning handling, absolute `file_path`). All of those upstream behaviours are preserved; the cache only adds an optional fast path. The output AST JSON is byte-for-byte identical to the non-cached path.

Testing

- `tests/unit/ast_cache_test.py`: 51 passed.
- […] `ast.parse + AstEncoder` path.
- […] (`_rust_core` vs. new rules-TOML schema, missing `bs4`, and an existing absolute-vs-relative `file_path` assertion in `test_get_asts.py`) reproduce identically on `main` and are out of scope here.
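For readers unfamiliar with the persistence approach described under Changes, the following is a rough sketch of how a JSON + base64 + zlib entry could be written atomically and read back without pickle. The function names `save_entry` and `load_entry`, the one-file-per-key layout, and the record fields are hypothetical and not taken from `ast_cache.py`; LRU eviction is omitted.

```python
import base64
import json
import os
import tempfile
import zlib


def save_entry(cache_dir, key, ast_json):
    """Write one cache entry atomically (hypothetical file layout)."""
    # Compress the JSON text, then base64 it so the record stays plain JSON.
    payload = base64.b64encode(zlib.compress(ast_json.encode("utf-8"))).decode("ascii")
    record = json.dumps({"key": key, "payload": payload})
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never observes a half-written entry.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(record)
        os.replace(tmp_path, os.path.join(cache_dir, key + ".json"))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_entry(cache_dir, key):
    """Return the cached JSON text, or None if the entry is missing or corrupt."""
    try:
        with open(os.path.join(cache_dir, key + ".json"), encoding="utf-8") as fh:
            record = json.load(fh)      # parsing JSON runs no code from the file
        if record.get("key") != key:
            return None
        raw = base64.b64decode(record["payload"], validate=True)
        return zlib.decompress(raw).decode("utf-8")
    except (OSError, ValueError, KeyError, AttributeError, TypeError, zlib.error):
        return None                     # treat any damage as a cache miss
```

Treating every read failure as a miss means a tampered or truncated cache file only costs a re-encode; it cannot change scan results or execute anything on load.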