feat: parse/plan LRU cache for repeated queries #504
- GraphForge(cache_size=N) enables an N-entry LRU plan cache
- Cache key: whitespace-normalised query string
- Cache stores post-optimisation operator list; executor re-uses it
directly, skipping parse + plan + optimise on subsequent calls
- cache_size=0 (default) disables caching with zero overhead
- clear_cache() empties the cache and resets hit/miss counters
- cache_info() returns {"size", "hits", "misses", "capacity"}
- Thread-safe: all cache mutations guarded by threading.Lock
- UNION queries and multi-statement scripts bypass the cache
- 16 unit tests: disabled mode, hits/misses, whitespace normalisation,
parameterised queries, LRU eviction, clear, thread-safety, benchmark
- Benchmark confirms >25% wall-clock reduction at 500 repeated reads
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
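A minimal sketch of the cache behaviour listed above, assuming a plain in-process LRU keyed on the normalised query string. `PlanCache` and its methods are hypothetical names, not GraphForge's code. It uses `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` make recency updates and eviction O(1). The review below mentions `self._cache_order.pop(0)`, which suggests the PR tracks recency in a list instead, where each pop from the front is O(n). With capacity 0 this sketch neither locks nor counts, which matches the later note that all counters stay 0 when `cache_size=0`.

```python
from __future__ import annotations

import threading
from collections import OrderedDict
from collections.abc import Hashable
from typing import Any


class PlanCache:
    """Hypothetical LRU plan cache; illustrative only, not GraphForge's code."""

    def __init__(self, capacity: int = 0) -> None:
        self._capacity = capacity
        self._entries: OrderedDict[Hashable, Any] = OrderedDict()
        self._lock = threading.Lock()
        self._hits = 0
        self._misses = 0

    def get(self, key: Hashable) -> Any | None:
        if self._capacity == 0:
            return None  # disabled: no lock taken, no counters touched
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                self._hits += 1
                return self._entries[key]
            self._misses += 1
            return None  # a stored plan must therefore never be None

    def put(self, key: Hashable, plan: Any) -> None:
        if self._capacity == 0:
            return
        with self._lock:
            self._entries[key] = plan
            self._entries.move_to_end(key)
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()
            self._hits = 0
            self._misses = 0

    def info(self) -> dict[str, int]:
        with self._lock:
            return {
                "size": len(self._entries),
                "hits": self._hits,
                "misses": self._misses,
                "capacity": self._capacity,
            }


if __name__ == "__main__":
    cache = PlanCache(capacity=2)
    cache.put("MATCH (a) RETURN a", ["ScanA", "ProjectA"])
    cache.put("MATCH (b) RETURN b", ["ScanB", "ProjectB"])
    cache.get("MATCH (a) RETURN a")             # hit; the 'a' plan becomes most recent
    cache.put("MATCH (c) RETURN c", ["ScanC"])  # evicts the 'b' plan
    print(cache.info())  # {'size': 2, 'hits': 1, 'misses': 0, 'capacity': 2}
```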
Walkthrough
Adds an optional LRU plan cache to GraphForge via a new …

Changes
Query Plan Caching
Sequence Diagram
```mermaid
sequenceDiagram
    actor User
    participant API as GraphForge API
    participant Cache as Plan Cache
    participant Parser as Query Parser
    participant Planner as Operator Planner
    participant Executor as Plan Executor
    User->>API: execute(query, parameters)
    API->>API: normalize_query(query)
    API->>Cache: lookup(cache_key)
    alt Cache Hit
        Cache-->>API: cached_operators
        API->>Executor: execute(cached_operators, parameters)
        Executor-->>User: results
    else Cache Miss
        Cache-->>API: miss
        API->>Parser: parse(query)
        Parser->>Planner: plan(ast)
        Planner-->>API: operators
        API->>Cache: store(cache_key, operators)
        Cache-->>API: stored (evict if needed)
        API->>Executor: execute(operators, parameters)
        Executor-->>User: results
    end
```
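The hit/miss flow in the diagram can be prototyped with the standard library's `functools.lru_cache` wrapped around the parse+plan step. This is a swapped-in technique for illustration, not what the PR implements; `ToyEngine` and its stub planner are hypothetical. Two differences from the described API are worth noting: `lru_cache(maxsize=0)` still counts every call as a miss (the PR keeps counters at 0 when `cache_size=0`), and its `cache_info()` returns a named tuple (`hits`, `misses`, `maxsize`, `currsize`) rather than a dict.

```python
from __future__ import annotations

from functools import lru_cache


class ToyEngine:
    """Toy model of the hit/miss flow in the diagram (hypothetical, not GraphForge)."""

    def __init__(self, cache_size: int) -> None:
        self.parse_calls = 0
        # Memoise the parse+plan step; the cache key is the normalised query string.
        self._plan = lru_cache(maxsize=cache_size)(self._parse_and_plan)

    @staticmethod
    def normalise(query: str) -> str:
        return " ".join(query.split())

    def _parse_and_plan(self, normalised: str) -> tuple[str, ...]:
        # Stand-in for Parser -> Planner -> optimiser; counts how often it runs.
        self.parse_calls += 1
        return tuple(normalised.split(" "))  # immutable stand-in "operator list"

    def execute(self, query: str, parameters: dict | None = None) -> dict:
        operators = self._plan(self.normalise(query))  # lookup, or plan + store on miss
        # Parameters are bound per call and never become part of the cache key.
        return {"operators": operators, "parameters": parameters or {}}

    def cache_info(self):
        return self._plan.cache_info()

    def clear_cache(self) -> None:
        self._plan.cache_clear()
```

Two calls that differ only in whitespace and parameter values share one plan, so `parse_calls` stays at 1 after both.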
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report
❌ Patch coverage is …
```diff
@@            Coverage Diff             @@
##             main     #504      +/-   ##
==========================================
+ Coverage   88.07%   88.10%   +0.03%
==========================================
  Files          40       40
  Lines       14676    14730      +54
  Branches     3472     3481       +9
==========================================
+ Hits        12926    12978      +52
- Misses       1145     1146       +1
- Partials      605      606       +1
```
Flags with carried forward coverage won't be shown.
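The Codecov figures are internally consistent: in each column hits + misses + partials equals the line count (12926 + 1145 + 605 = 14676; 12978 + 1146 + 606 = 14730). The sketch below recomputes the percentages, assuming Codecov's usual formula (partials count as not covered) and its default display of two decimals rounded down; `codecov_pct` is a hypothetical helper.

```python
import math


def codecov_pct(hits: int, misses: int, partials: int) -> float:
    """Line coverage as Codecov reports it: hits / (hits + misses + partials).

    Assumes Codecov's default display settings (2 decimals, rounded down).
    """
    raw = 100 * hits / (hits + misses + partials)
    return math.floor(raw * 100) / 100


base = codecov_pct(12926, 1145, 605)  # main
head = codecov_pct(12978, 1146, 606)  # #504
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 88.07% -> 88.10% (+0.03%)
```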
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/graphforge/api.py (1)
196-204: 💤 Low value
Move `import threading` to module-level imports.
The `threading` import inside `__init__` works but is unconventional. Module-level imports improve readability and make dependencies explicit.
♻️ Suggested fix
Add `import threading` to the imports section at the top of the file (around lines 6-11), then remove lines 197-198 from `__init__`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graphforge/api.py` around lines 196 - 204, Move the inline import out of the __init__ and add it to the module-level imports: remove the "import threading" statement inside the GraphForgeClient (or relevant class) __init__ and add "import threading" at the top of src/graphforge/api.py alongside the other imports, leaving the existing attributes (_cache_lock, _cache_size, _cache, _cache_order, _cache_hits, _cache_misses) and their initialization unchanged so that _cache_lock continues to be initialized with threading.Lock() in __init__.
tests/unit/api/test_cache.py (2)
75-86: 💤 Low value
Consider strengthening the assertion.
The assertion `hits >= 1` is correct but could be more precise. After two different CREATE queries and two identical MATCH queries (with different parameters), the expected count is exactly 1 hit.
♻️ Suggested fix
```diff
-        # Second MATCH call hits the cache
-        assert gf.cache_info()["hits"] >= 1
+        # Second MATCH call hits the cache (same query string, different params)
+        assert gf.cache_info()["hits"] == 1
+        assert gf.cache_info()["misses"] == 3  # 2 CREATEs + 1 first MATCH
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_cache.py` around lines 75 - 86, Update the test_parameterised_query_cached_once to assert the exact expected cache hit count: after creating two nodes and executing the same parameterised MATCH twice (via GraphForge instance gf and method execute), the cache should record exactly one hit, so replace the loosened assertion (gf.cache_info()["hits"] >= 1) with a strict equality check (gf.cache_info()["hits"] == 1) to make the expectation precise; keep references to GraphForge, test_parameterised_query_cached_once, gf.execute and gf.cache_info unchanged.
1-6: ⚡ Quick win
Consider adding a test for UNION query cache bypass.
The PR objectives specify that UNION queries bypass the cache, but there's no explicit test validating this behavior.
💡 Suggested test
```python
class TestCacheUnionBypass:
    """UNION queries should not be cached."""

    def test_union_query_not_cached(self):
        gf = GraphForge(cache_size=8)
        gf.execute("CREATE (:A {v: 1})")
        gf.execute("CREATE (:B {v: 2})")
        union_q = "MATCH (a:A) RETURN a.v AS v UNION MATCH (b:B) RETURN b.v AS v"
        gf.execute(union_q)
        gf.execute(union_q)
        # UNION queries bypass cache, so both calls should be misses
        # (only the CREATEs + 2 UNION calls = 4 misses, 0 hits)
        info = gf.cache_info()
        assert info["hits"] == 0
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_cache.py` around lines 1 - 6, Add a unit test that verifies UNION queries bypass the plan cache: create a new test class TestCacheUnionBypass with a method test_union_query_not_cached that instantiates GraphForge(cache_size=8), runs two CREATE statements to seed :A and :B nodes, executes the UNION query string "MATCH (a:A) RETURN a.v AS v UNION MATCH (b:B) RETURN b.v AS v" twice, then calls gf.cache_info() and asserts that cache_info()["hits"] == 0 (optionally assert misses increased accordingly); reference GraphForge, gf.execute, and gf.cache_info in the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 827758fe-952b-4032-be75-d13ad431b53f
📒 Files selected for processing (2)
src/graphforge/api.py
tests/unit/api/test_cache.py
```python
import time

from graphforge.api import GraphForge


class TestCacheBenchmark:
    """>25% wall-clock reduction for 500 repeated reads."""

    def test_cache_reduces_overhead(self):
        gf_uncached = GraphForge(cache_size=0)
        gf_cached = GraphForge(cache_size=128)
        # Seed both instances with identical data
        for gf in (gf_uncached, gf_cached):
            for i in range(20):
                gf.execute(f"CREATE (:Tool {{name: 't{i}', cat: 'c{i % 3}'}})")

        q = "MATCH (t:Tool) WHERE t.cat = $cat RETURN t.name AS name"
        n = 500

        # Uncached: parse + plan + optimise on every call
        t0 = time.perf_counter()
        for i in range(n):
            gf_uncached.execute(q, parameters={"cat": f"c{i % 3}"})
        uncached_s = time.perf_counter() - t0

        # Cached: one miss for q, then n - 1 hits reusing the stored plan
        t0 = time.perf_counter()
        for i in range(n):
            gf_cached.execute(q, parameters={"cat": f"c{i % 3}"})
        cached_s = time.perf_counter() - t0

        assert cached_s < uncached_s * 0.75, (
            f"Cache should be at least 25% faster: cached={cached_s:.3f}s "
            f"uncached={uncached_s:.3f}s"
        )
```
Move benchmark test out of unit tests or mark as slow.
This test runs 1000 query executions and exceeds the < 1ms guideline for unit tests. Benchmark tests should either live in a separate benchmarks/ directory or be marked with @pytest.mark.slow to exclude from fast CI runs.
The static analysis SQL injection warning on line 171 is a false positive — gf.execute() runs Cypher queries (not SQL), and the input is controlled test data.
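💡 Suggested fix
```diff
+import pytest
+
+
 class TestCacheBenchmark:
     """>50% wall-clock reduction for 1 000 repeated reads."""

+    @pytest.mark.slow
     def test_cache_reduces_overhead(self):
```
Then register the marker and deselect slow tests by default (registering the marker alone does not skip anything; the `addopts` line does). Run them explicitly with `pytest -m slow`.
```ini
# pytest.ini (or the equivalent [tool.pytest.ini_options] table in pyproject.toml)
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
addopts = -m "not slow"
```
As per coding guidelines: `tests/unit/**/*.py`: "each test isolated and < 1ms execution time".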
🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 171-171: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/api/test_cache.py` around lines 163 - 189, The benchmark test in
TestCacheBenchmark.test_cache_reduces_overhead performs many query executions
and must not live in fast unit tests — either move the whole class/file to a
benchmarks/ directory or mark the test with `@pytest.mark.slow` (import pytest) so
CI can skip it by default; also add the slow marker configuration to
pytest.ini/pyproject.toml (markers = slow: marks tests as slow). Additionally
suppress the static-analysis false positive on the gf.execute(...) call (inside
test_cache_reduces_overhead) by adding an inline suppression comment appropriate
for your linter (e.g., "# nosec" or the linter-specific ignore) next to the
gf.execute invocation to indicate it's safe Cypher test data.
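Independent of where the benchmark lives, a ratio assertion on two single-shot `perf_counter` loops is sensitive to noise on shared CI runners. One common mitigation, offered here as a sketch rather than what the PR does, is to take the minimum of several repeated batches with the standard library's `timeit.repeat`; `best_of` is a hypothetical helper.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Seconds per call, taken from the fastest of `repeat` batches of `number` calls.

    The minimum is less sensitive to scheduler and GC noise than a single
    wall-clock measurement, which matters for ratio assertions on shared runners.
    """
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number
```

In the benchmark above it could replace each timing loop, e.g. `best_of(lambda: gf_cached.execute(q, parameters={"cat": "c0"}))`, keeping the same `cached < uncached * 0.75` comparison.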
- Document the ~0.2ms constant parse+plan overhead the cache eliminates
- State where it helps (graphs <500 nodes, agent tool-registry workloads) and where it does not (graphs >1000 nodes, executor cost dominates)
- Add correctness note: cached plans use stale optimizer statistics after large writes; results are always correct but plan order may be suboptimal
- Add note to cache_info() that all counters are 0 when cache_size=0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/graphforge/api.py (1)
388-404: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
UNION and multi-statement queries are not truly bypassing cache.
Current flow does cache lookup before parsing, so UNION/scripts still incur lock/lookup and increment misses, which violates the stated bypass behavior and skews `cache_info()`.
Suggested flow change
```diff
-        # Cache lookup (single-statement, non-script queries only)
-        cache_key = self._normalise_query(query)
-        cached = self._cache_get(cache_key)
-        if cached is not None:
-            return self.executor.execute(cached, parameters=parameters)
-
         # Parse query (may return a single CypherQuery or a list for multi-statement scripts)
         ast = self.parser.parse(query)
+        from graphforge.ast.query import UnionQuery
 
         # Multi-statement script: execute each query sequentially, return last result
         if isinstance(ast, list):
             results: list[dict] = []
             for single_ast in ast:
                 results = self._execute_single_ast(single_ast, parameters=parameters)
             return results
 
+        # UNION queries bypass cache
+        if isinstance(ast, UnionQuery):
+            return self._execute_single_ast(ast, parameters=parameters)
+
+        # Cache lookup for single non-UNION statement
+        cache_key = self._normalise_query(query)
+        cached = self._cache_get(cache_key)
+        if cached is not None:
+            return self.executor.execute(cached, parameters=parameters)
+
         return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graphforge/api.py` around lines 388 - 404, Move the cache lookup to after parsing so scripts and UNIONs truly bypass cache: call self.parser.parse(query) first (using parser.parse and ast variable), then if ast is a list (multi-statement) execute each via _execute_single_ast and return without calling _normalise_query or _cache_get; for single-statement ASTs only, perform cache logic—compute cache_key via _normalise_query and call _cache_get—and skip that cache step if the query is a UNION (e.g., detect via ast metadata if available or as a fallback use 'UNION' in query.upper()), then proceed to either executor.execute(cached, ...) or _execute_single_ast(...) as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/graphforge/api.py`:
- Around line 213-219: The constructor currently assigns self._cache_size
directly which allows invalid values (e.g., negative or non-int) and can break
LRU eviction logic that uses self._cache_order.pop(0); validate the incoming
cache_size in the __init__ (or the class constructor where self._cache_size is
set) to ensure it's an int and >= 0, raising TypeError for non-ints or
ValueError for negative values (or coerce to 0 if you prefer), and add a short
docstring/inline comment documenting the invariant so subsequent code using
self._cache, self._cache_order, and self._cache_lock can assume a non-negative
integer capacity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48373c0b-050e-4669-b022-cc15b478ae67
📒 Files selected for processing (1)
src/graphforge/api.py
```python
@staticmethod
def _normalise_query(query: str) -> str:
    """Collapse all whitespace runs to a single space for cache keying."""
    return " ".join(query.split())
```
Cache key normalization can alias semantically different queries.
`" ".join(query.split())` collapses whitespace inside string literals too, so queries containing `'A  B'` and `'A B'` map to the same cache key and can reuse the wrong planned constants (incorrect results).
Suggested safe mitigation
```diff
     @staticmethod
     def _normalise_query(query: str) -> str:
-        """Collapse all whitespace runs to a single space for cache keying."""
-        return " ".join(query.split())
+        """Conservative normalization that preserves literal semantics."""
+        return query.strip()
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/graphforge/api.py` around lines 296 - 299, The current _normalise_query
function collapses all whitespace including inside string literals, which can
alias distinct queries (e.g., 'A  B' vs 'A B'); update _normalise_query to only
collapse runs of whitespace that occur outside quoted literals (track
single/double-quote state and escape sequences or use a query-aware
parser/tokenizer if available) so string literal contents are preserved exactly
while normalising external whitespace for cache keys; locate and replace the
implementation in the _normalise_query function to perform quote-aware
whitespace collapsing.
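For reference, the quote-aware option requested here (which the author later declined as not worth the complexity) fits in about twenty lines. This is a hypothetical sketch, not GraphForge code. It assumes Cypher's quoting rules of single- or double-quoted strings with backslash escapes plus backtick-quoted identifiers, and it does not understand comments. Outside quotes it behaves exactly like `" ".join(query.split())`, so formatting-insensitive hits are kept.

```python
def normalise_query(query: str) -> str:
    """Collapse whitespace runs outside quotes; copy quoted text verbatim.

    Handles '...' and "..." strings (with backslash escapes) and `...`
    identifiers. Cypher comments are NOT handled: a quote inside a comment
    would throw the quote tracking off.
    """
    out: list[str] = []
    quote = ""            # active quote character, "" when outside quotes
    escaped = False       # previous char was a backslash inside a string
    pending_space = False
    for ch in query:
        if quote:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\" and quote != "`":
                escaped = True
            elif ch == quote:
                quote = ""
            continue
        if ch.isspace():
            pending_space = bool(out)  # ignore leading whitespace
            continue
        if pending_space:
            out.append(" ")  # one space stands in for the whole run
            pending_space = False
        if ch in "'\"`":
            quote = ch
        out.append(ch)
    return "".join(out)  # trailing whitespace is never flushed
```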
- Raise TypeError for non-int values (including float, bool, str)
- Raise ValueError for negative integers
- Bool is explicitly rejected despite being an int subclass: True/False as a cache size is semantically wrong
- Add 6 validation tests covering negative, float, str, bool, zero, positive

Skipped: quote-aware whitespace normalisation (theoretical aliasing with internal string literal whitespace, never occurs in real Cypher; complexity not justified); parse-before-cache reorder (would force parse on every cache hit, defeating the cache's purpose).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
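The validation rules in this commit can be expressed as a standalone helper, sketched below under the hypothetical name `validate_cache_size`; the checks mirror the ones shown in the next review's diff. The `bool` test has to come first because `bool` subclasses `int`. The specific values in `CASES` are illustrative picks for the six categories the commit lists, arranged the way a `@pytest.mark.parametrize` table would be.

```python
def validate_cache_size(cache_size: object) -> int:
    """Return cache_size unchanged if it is a non-negative int, else raise.

    bool is rejected explicitly: isinstance(True, int) is True, but
    cache_size=True is almost certainly a mistake.
    """
    if isinstance(cache_size, bool) or not isinstance(cache_size, int):
        raise TypeError(f"cache_size must be an int, got {type(cache_size).__name__}")
    if cache_size < 0:
        raise ValueError(f"cache_size must be >= 0, got {cache_size}")
    return cache_size


# (input, expected exception or None): negative, float, str, bool, zero, positive
CASES = [(-1, ValueError), (1.5, TypeError), ("8", TypeError),
         (True, TypeError), (0, None), (128, None)]
for value, expected in CASES:
    try:
        result = validate_cache_size(value)
    except (TypeError, ValueError) as exc:
        assert expected is not None and isinstance(exc, expected), (value, exc)
    else:
        assert expected is None and result == value, value
```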
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/graphforge/api.py (2)
187-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Validate `cache_size` before any backend initialization side effects.
`cache_size` is validated at Line 213, but backend setup can already run at Lines 187-202. With an invalid `cache_size`, construction fails after opening/loading persistence, which is avoidable side-effect work.
💡 Suggested fix
```diff
-        # Initialize storage backend
-        self.backend: SQLiteBackend | None
-        if path:
+        if not isinstance(cache_size, int) or isinstance(cache_size, bool):
+            raise TypeError(f"cache_size must be an int, got {type(cache_size).__name__}")
+        if cache_size < 0:
+            raise ValueError(f"cache_size must be >= 0, got {cache_size}")
+        self._cache_size = cache_size  # invariant: non-negative int
+
+        # Initialize storage backend
+        self.backend: SQLiteBackend | None
+        if path:
             # Use SQLite for persistence
             self.backend = SQLiteBackend(Path(path))
             self.graph = self._load_graph_from_backend()
@@
-        if not isinstance(cache_size, int) or isinstance(cache_size, bool):
-            raise TypeError(f"cache_size must be an int, got {type(cache_size).__name__}")
-        if cache_size < 0:
-            raise ValueError(f"cache_size must be >= 0, got {cache_size}")
-        self._cache_size = cache_size  # invariant: non-negative int
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graphforge/api.py` around lines 187 - 217, The cache_size parameter is validated after backend initialization has already started, which means side effects like opening and loading the SQLite database occur before the validation check. This is inefficient because if cache_size is invalid, the constructor fails after unnecessary work. Move the cache_size validation logic (the isinstance and value range checks that raise TypeError and ValueError) to the beginning of the initialization block, before any backend setup code in the conditional that checks the path parameter, so invalid cache_size is caught immediately without triggering database operations.
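392-408: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
UNION and multi-statement queries are not truly bypassing cache accounting.
At Line 394, every query does `_cache_get` before parse. UNION/script queries are then not inserted (Lines 446-448 / script branch), so they accumulate misses and lock traffic instead of bypassing cache behavior.
💡 Suggested fix
```diff
     def execute(self, query: str, parameters: dict[str, Any] | None = None) -> list[dict]:
@@
-        # Cache lookup (single-statement, non-script queries only)
-        cache_key = self._normalise_query(query)
-        cached = self._cache_get(cache_key)
-        if cached is not None:
-            return self.executor.execute(cached, parameters=parameters)
+        # Fast bypass for known non-cacheable forms
+        likely_script = ";" in query
+        likely_union = "UNION" in query.upper()
+        cache_key: str | None = None
+        if not (likely_script or likely_union):
+            cache_key = self._normalise_query(query)
+            cached = self._cache_get(cache_key)
+            if cached is not None:
+                return self.executor.execute(cached, parameters=parameters)
@@
-        return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)
+        return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)
```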
def execute(self, query: str, parameters: dict[str, Any] | None = None) -> list[dict]: @@ - # Cache lookup (single-statement, non-script queries only) - cache_key = self._normalise_query(query) - cached = self._cache_get(cache_key) - if cached is not None: - return self.executor.execute(cached, parameters=parameters) + # Fast bypass for known non-cacheable forms + likely_script = ";" in query + likely_union = "UNION" in query.upper() + cache_key: str | None = None + if not (likely_script or likely_union): + cache_key = self._normalise_query(query) + cached = self._cache_get(cache_key) + if cached is not None: + return self.executor.execute(cached, parameters=parameters) @@ - return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters) + return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)Also applies to: 446-448
🧹 Nitpick comments (2)
tests/unit/api/test_cache.py (2)
14-37: ⚡ Quick winParametrize cache_size validation cases to reduce duplicated test logic.
These methods validate the same behavior pattern with different inputs and can be collapsed into `@pytest.mark.parametrize` for maintainability.
As per coding guidelines, "Use pytest parametrization (`@pytest.mark.parametrize`) when testing the same logic with different inputs to avoid code duplication".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_cache.py` around lines 14 - 37, Replace the duplicated cache_size tests with a single parametrized test using pytest.mark.parametrize to cover invalid and valid inputs; consolidate the negative/TypeError/ValueError cases into a parametrize that passes (input, expected_exception, expected_match) for GraphForge(cache_size=...) and keep a separate parametrize for accepted values that asserts gf._cache_size equals the provided value; target the existing test functions (test_negative_cache_size_raises, test_non_int_cache_size_raises, test_float_cache_size_raises, test_bool_cache_size_raises, test_zero_cache_size_accepted, test_positive_cache_size_accepted) and the GraphForge constructor to avoid changing behavior.
64-220: ⚡ Quick win
Add explicit tests for UNION and multi-statement cache bypass behavior.
Given the new cache control flow, add assertions that these queries do not affect cache entries/hit-miss accounting (true bypass semantics), so regressions are caught.
As per coding guidelines, "Aim for 100% code coverage on new code, with 90% minimum coverage on changed files (patch coverage) and 85% minimum total project coverage".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_cache.py` around lines 64 - 220, Tests are missing explicit assertions that UNION queries and multi-statement queries bypass the cache without changing cache entries or hit/miss counts; add new unit tests that call GraphForge.execute with a UNION query and with multiple statements in one execute call and then assert via gf.cache_info() that "hits", "misses" and "size" have not changed (or remain consistent with prior state), and that repeated bypassing does not increment hits/misses; locate uses of GraphForge.execute, gf.cache_info(), and clear_cache() in the existing test classes (e.g., TestCacheHitsAndMisses and TestCacheClear) and add assertions around those symbols to validate true bypass semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a9349f3c-9050-4bab-93b0-9b4cc83087ab
📒 Files selected for processing (2)
src/graphforge/api.py
tests/unit/api/test_cache.py
Closes #464

Summary
- `GraphForge(cache_size=N)` adds an N-entry LRU plan cache keyed on the normalised query string
- `cache_size=0` (default) is a strict no-op: zero overhead, zero behaviour change
- `clear_cache()` / `cache_info()` for manual control and introspection
- `threading.Lock` on all cache mutations

Design notes
- … `" ".join(query.split())`) ensures trivial formatting differences don't defeat the cache
- … `parameters` vary between calls

Test plan
- `cache_size=0` identical behaviour to uncached (no regression)
- `clear_cache()` resets size, hits, misses to 0
- `make pre-push` green (87.21% total coverage)

🤖 Generated with Claude Code