perf: cache file reads and AST parses during discovery pass by KRRT7 · Pull Request #2135 · codeflash-ai/codeflash

KRRT7 · 2026-05-07T08:30:24Z

Parent PR

#2132 — perf: targeted performance improvements for E2E pipeline hot path

Summary

Adds a discovery-scoped discovery_cache() context manager that caches file contents and parsed ASTs for the duration of a single discovery run
read_file_cached() ensures each file is read from disk at most once
parse_ast_cached() ensures each file's AST is parsed at most once
Removes dead code: _find_all_functions_via_language_support() (never called; had a pre-existing type error)

Changes

get_functions_to_optimize(): wraps body with discovery_cache()
find_all_functions_in_file(): uses read_file_cached
inspect_top_level_functions_or_methods(): uses parse_ast_cached
get_all_replay_test_functions(): uses parse_ast_cached
JS/TS helpers: use read_file_cached
New test file: tests/test_discovery_cache.py (12 tests)

Stack

perf: prefilter files by path before read_text() in discovery #2134 (prefilter) → main
This PR → perf/discovery-prefilter
perf: read JS/TS files once during discovery export checks #2136 (js-single-read) → this branch

Moves path-based checks (test file detection, ignore paths, submodule paths, site-packages, files outside module-root) so they run BEFORE read_text() is called in get_all_files_and_functions() and get_functions_within_lines(). This avoids unnecessary file I/O for files that filter_functions() would discard anyway. Also fixes a pre-existing mypy error in _find_all_functions_via_language_support, where discover_functions was called with the wrong argument order.

Signature changes (backward-compatible; all new params are optional):
- get_all_files_and_functions: added tests_root, module_root
- get_functions_within_lines: added tests_root, ignore_paths, module_root
- get_functions_within_git_diff: added tests_root, ignore_paths, module_root
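A rough sketch of the prefilter idea follows, assuming every path check can be expressed with pure `pathlib` comparisons that never touch the file's contents. `rejected_by_path` and `read_candidate_sources` are hypothetical names, and test-file detection is simplified to "lies under tests_root" (skipped when tests_root equals module_root). Codeflash's real predicates are likely more involved.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Sequence


def _under_any(path: Path, roots: Iterable[Path]) -> bool:
    """True if `path` lies inside any of `roots` (pure path comparison, no I/O)."""
    return any(path.is_relative_to(root) for root in roots)


def rejected_by_path(
    path: Path,
    *,
    module_root: Path,
    tests_root: Path | None = None,
    ignore_paths: Sequence[Path] = (),
    submodule_paths: Sequence[Path] = (),
) -> bool:
    """True if the file would be filtered out anyway, judged from its path alone."""
    if not path.is_relative_to(module_root):
        return True  # outside module_root
    if "site-packages" in path.parts:
        return True  # installed third-party code
    if tests_root is not None and tests_root != module_root and path.is_relative_to(tests_root):
        return True  # simplified test-file detection
    return _under_any(path, ignore_paths) or _under_any(path, submodule_paths)


def read_candidate_sources(paths: Iterable[Path], **filters) -> dict[Path, str]:
    """Read only the files that survive the cheap path checks."""
    sources: dict[Path, str] = {}
    for path in paths:
        if rejected_by_path(path, **filters):
            continue  # rejected before any read_text() call
        sources[path] = path.read_text(encoding="utf-8")
    return sources
```

Because rejected paths never reach `read_text()`, a rejected file does not even need to exist on disk. The ordering only pays off when the path checks are cheaper than a read, which holds for in-memory `pathlib` comparisons like these.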

Introduces a discovery-scoped cache (`discovery_cache()` context manager) that ensures each file is read from disk at most once and parsed into an AST at most once within a single discovery run.

Key changes:
- `read_file_cached()`: returns cached file content when `discovery_cache()` is active; falls back to a normal read otherwise
- `parse_ast_cached()`: returns a cached `ast.Module` when the cache is active
- `get_functions_to_optimize()`: wraps its body with `discovery_cache()`
- `find_all_functions_in_file()`: uses `read_file_cached`
- `inspect_top_level_functions_or_methods()`: uses `parse_ast_cached`
- `get_all_replay_test_functions()`: uses `parse_ast_cached`
- JS/TS export helpers: use `read_file_cached`
- Removed dead code: `_find_all_functions_via_language_support()` (never called; had a type error passing a Path as the source str)

Signature changes: none. All public function signatures are unchanged. The cache is transparent: outside `discovery_cache()`, all functions behave exactly as before (direct reads and parses).

The _is_js_ts_function_exported and _is_js_ts_function_exists_but_not_exported helpers each read the file from disk independently, causing up to 2 extra reads per file during get_functions_to_optimize. This adds an optional `source` parameter to both functions so callers can pass pre-read content; the main call site now reads the file once and passes it to both helpers. Also fixes a pre-existing mypy error in _find_all_functions_via_language_support, where discover_functions was called with the wrong argument order.

Signatures changed:
- _is_js_ts_function_exported(file_path, function_name, source=None)
- _is_js_ts_function_exists_but_not_exported(file_path, function_name, source=None)
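The optional-`source` pattern can be sketched as follows. `is_function_exported`, `function_exists_but_not_exported`, and `check_export_status` are hypothetical stand-ins for the real helpers, and the regex-based export detection is a deliberately crude assumption, not how codeflash actually analyzes JS/TS. The point is only the signature shape: read once at the call site, then pass `source` down so neither helper touches the disk.

```python
from __future__ import annotations

import re
from pathlib import Path


def _resolve_source(file_path: Path, source: str | None) -> str:
    """Use the caller's pre-read text when given; otherwise fall back to a disk read."""
    return source if source is not None else file_path.read_text(encoding="utf-8")


def is_function_exported(file_path: Path, function_name: str, source: str | None = None) -> bool:
    """Crude regex check for `export function f`, `export const f`, or `export { f }`."""
    text = _resolve_source(file_path, source)
    name = re.escape(function_name)
    patterns = (
        rf"\bexport\s+(?:default\s+)?(?:async\s+)?function\s*\*?\s*{name}\b",
        rf"\bexport\s+(?:const|let|var)\s+{name}\b",
        rf"\bexport\s*\{{[^}}]*\b{name}\b[^}}]*\}}",
    )
    return any(re.search(pattern, text) for pattern in patterns)


def function_exists_but_not_exported(file_path: Path, function_name: str, source: str | None = None) -> bool:
    """True if `function_name` is defined in the file but not exported (crude regex check)."""
    text = _resolve_source(file_path, source)
    name = re.escape(function_name)
    defined = re.search(rf"\bfunction\s*\*?\s*{name}\s*\(|\b(?:const|let|var)\s+{name}\s*=", text)
    # Pass the already-read text along so this path adds no extra disk read.
    return defined is not None and not is_function_exported(file_path, function_name, source=text)


def check_export_status(file_path: Path, function_name: str) -> tuple[bool, bool]:
    """Hypothetical call site: one read, shared by both helpers."""
    source = file_path.read_text(encoding="utf-8")
    return (
        is_function_exported(file_path, function_name, source=source),
        function_exists_but_not_exported(file_path, function_name, source=source),
    )
```

Keeping `source=None` as the default leaves existing callers working unchanged, which matches the backward-compatible shape of the signatures listed above.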

The --file path was calling file.read_text() directly even though find_all_functions_in_file() already primed the discovery cache. Now uses read_file_cached() to hit the cache with zero disk I/O.

The callers BLOB column was fetched and JSON-decoded for every row during FunctionRanker initialization, but the resulting data was never accessed — FunctionRanker.load_function_stats() discards it as `_callers`. This eliminates O(n) JSON parsing at ranking startup. Also fixes all pre-existing mypy errors in this file by declaring pstats.Stats attributes that the type stubs don't expose.
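The ProfileStats/FunctionRanker code isn't shown here. The following sketch only illustrates the general fix of not selecting or decoding a column nobody reads, using an in-memory SQLite table. The `pstats` table name and its columns are hypothetical, and the loaders are not codeflash's functions.

```python
from __future__ import annotations

import json
import sqlite3


def load_stats_eager(conn: sqlite3.Connection) -> dict[tuple[str, int, str], tuple[int, int]]:
    """Before: fetches and JSON-decodes `callers` for every row, then discards it."""
    stats = {}
    for file, line, func, ncalls, total_ns, callers_blob in conn.execute(
        "SELECT file, line, func, ncalls, total_ns, callers FROM pstats"
    ):
        _callers = json.loads(callers_blob)  # per-row decode whose result is never used
        stats[(file, line, func)] = (ncalls, total_ns)
    return stats


def load_stats_lean(conn: sqlite3.Connection) -> dict[tuple[str, int, str], tuple[int, int]]:
    """After: the unused column is neither fetched nor decoded."""
    return {
        (file, line, func): (ncalls, total_ns)
        for file, line, func, ncalls, total_ns in conn.execute(
            "SELECT file, line, func, ncalls, total_ns FROM pstats"
        )
    }
```

Both loaders return the same mapping, but the lean one skips one `json.loads` call per row. Dropping the column from the `SELECT` also avoids copying the blob out of SQLite at all.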

aseembits93 · 2026-05-08T00:23:30Z


 class TestFiles(BaseModel):
    test_files: list[TestFile]
+    _seen_paths: set[Path] = PrivateAttr(default_factory=set)


wasn't this merged already?

The merge-base changed after approval.


KRRT7 requested a review from misrasaurabh1 as a code owner May 7, 2026 08:30

This was referenced May 7, 2026

perf: read JS/TS files once during discovery export checks #2136

Merged

perf: targeted performance improvements for E2E pipeline #2132

Draft

perf: skip unused callers JSON decode in ProfileStats #2137

Merged

KRRT7 force-pushed the perf/discovery-prefilter branch from 5c1cc1e to 828496b Compare May 7, 2026 16:51

KRRT7 force-pushed the perf/discovery-ast-cache branch from dcc5bea to 9164a8d Compare May 7, 2026 16:51

KRRT7 mentioned this pull request May 7, 2026

perf: prefilter files by path before read_text() in discovery #2134

Open

KRRT7 force-pushed the perf/discovery-prefilter branch from 828496b to 36074a3 Compare May 7, 2026 17:30

KRRT7 requested a review from aseembits93 as a code owner May 7, 2026 17:30

KRRT7 force-pushed the perf/discovery-ast-cache branch from 9164a8d to 120ea4b Compare May 7, 2026 17:30

KRRT7 force-pushed the perf/discovery-prefilter branch from 36074a3 to 6bbf072 Compare May 7, 2026 17:42

KRRT7 force-pushed the perf/discovery-ast-cache branch from 120ea4b to beac23d Compare May 7, 2026 17:42

KRRT7 added 5 commits May 7, 2026 17:46

fix: use read_file_cached instead of raw disk read for JS/TS source

39d49b1


KRRT7 force-pushed the perf/discovery-prefilter branch from 6bbf072 to 6527780 Compare May 7, 2026 22:47

KRRT7 force-pushed the perf/discovery-ast-cache branch from beac23d to 93763a3 Compare May 7, 2026 22:47

KRRT7 marked this pull request as draft May 7, 2026 22:53

KRRT7 marked this pull request as ready for review May 8, 2026 00:19

KRRT7 changed the base branch from perf/discovery-prefilter to main May 8, 2026 00:19

Merge branch 'main' into perf/discovery-ast-cache

8123d52

aseembits93 reviewed May 8, 2026


aseembits93 previously approved these changes May 8, 2026


aseembits93 and others added 2 commits May 7, 2026 17:28

Merge pull request #2136 from codeflash-ai/perf/discovery-js-read

af2912f

perf: read JS/TS files once during discovery export checks

Merge branch 'main' into perf/discovery-ast-cache

54eb842

KRRT7 enabled auto-merge May 8, 2026 00:35

Merge pull request #2137 from codeflash-ai/cf-2132-skip-callers-json-…

d906b88

…decode perf: skip unused callers JSON decode in ProfileStats

KRRT7 wants to merge 9 commits into main from perf/discovery-ast-cache
