feat: parse/plan LRU cache for repeated queries by DecisionNerd · Pull Request #504 · DecisionNerd/graphforge

DecisionNerd · 2026-05-06T00:54:15Z

Closes #464

Summary

GraphForge(cache_size=N) adds an N-entry LRU plan cache keyed on the normalised query string
Cache stores the post-optimisation operator list; repeated calls skip parse + plan + optimise entirely
cache_size=0 (default) is a strict no-op — zero overhead, zero behaviour change
clear_cache() / cache_info() for manual control and introspection
Thread-safe via threading.Lock on all cache mutations
UNION queries and multi-statement scripts bypass the cache (they are inherently heterogeneous)

Design notes

Whitespace normalisation (" ".join(query.split())) ensures trivial formatting differences don't defeat the cache
Plans are stateless — the same plan works for any graph state; only parameters vary between calls
LRU eviction implemented with a list-based order tracker (no external dependencies)

Test plan

🤖 Generated with Claude Code

^{Need help on this PR? Tag @codesmith with what you need.}

Let Codesmith autofix CI failures and bot reviews

Summary by CodeRabbit

New Features
- Added plan-level query caching with configurable cache size to improve query execution performance; UNION branches are excluded from caching.
- Added cache management APIs to clear the cache and retrieve cache statistics.
Documentation
- Updated constructor docs with caching behavior and usage examples.
Tests
- Added comprehensive tests covering cache correctness, eviction (LRU), normalization, concurrency, and performance.

- GraphForge(cache_size=N) enables an N-entry LRU plan cache - Cache key: whitespace-normalised query string - Cache stores post-optimisation operator list; executor re-uses it directly, skipping parse + plan + optimise on subsequent calls - cache_size=0 (default) disables caching with zero overhead - clear_cache() empties the cache and resets hit/miss counters - cache_info() returns {"size", "hits", "misses", "capacity"} - Thread-safe: all cache mutations guarded by threading.Lock - UNION queries and multi-statement scripts bypass the cache - 16 unit tests: disabled mode, hits/misses, whitespace normalisation, parameterised queries, LRU eviction, clear, thread-safety, benchmark - Benchmark confirms >25% wall-clock reduction at 500 repeated reads Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-06T00:54:25Z

Walkthrough

Adds an optional LRU plan cache to GraphForge via a new cache_size constructor arg. Queries are normalized and cached as operator plans; execute() checks cache pre-parse and short-circuits on hit. New public methods clear_cache() and cache_info() and comprehensive unit tests included. (50 words)

Changes

Query Plan Caching

Layer / File(s)	Summary
API surface / Constructor `src/graphforge/api.py`	`GraphForge.__init__` gains `cache_size: int = 0`; docstring updated to describe cache semantics. New internal fields `_cache_size`, `_cache`, `_cache_order`, `_cache_hits`, `_cache_misses`, `_cache_lock` added and validated.
Query normalization `src/graphforge/api.py`	Added `_normalise_query(query: str) -> str` that produces normalized (whitespace-collapsed) cache keys.
Cache primitives `src/graphforge/api.py`	Added `_cache_get(key)` and `_cache_put(key, operators)` implementing a thread-safe LRU policy, hit/miss accounting, and eviction logic.
Execution flow integration `src/graphforge/api.py`	`execute()` now computes a `cache_key` and performs a pre-parse cache lookup; on hit, executes cached operator plan. `_execute_single_ast()` signature extended to accept `cache_key`, and computed `operators` are stored in the cache (UNION branches are excluded from caching).
Public cache API `src/graphforge/api.py`	Added `clear_cache()` to reset storage and counters, and `cache_info()` returning current size, hits, misses, and capacity.
Tests & Benchmark `tests/unit/api/test_cache.py`	New unit tests cover cache_size validation, disabled/default behavior, hits/misses, query normalization, distinct keys, LRU eviction, clear_cache, thread-safety, and a micro-benchmark comparing cached vs uncached execution.

Sequence Diagram

sequenceDiagram
    actor User
    participant API as GraphForge API
    participant Cache as Plan Cache
    participant Parser as Query Parser
    participant Planner as Operator Planner
    participant Executor as Plan Executor

    User->>API: execute(query, parameters)
    API->>API: normalize_query(query)
    API->>Cache: lookup(cache_key)
    alt Cache Hit
        Cache-->>API: cached_operators
        API->>Executor: execute(cached_operators, parameters)
        Executor-->>User: results
    else Cache Miss
        Cache-->>API: miss
        API->>Parser: parse(query)
        Parser->>Planner: plan(ast)
        Planner-->>API: operators
        API->>Cache: store(cache_key, operators)
        Cache-->>API: stored (evict if needed)
        API->>Executor: execute(operators, parameters)
        Executor-->>User: results
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 23.53% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding an LRU parse/plan cache for repeated queries, which matches the primary objective of the PR.
Description check	✅ Passed	The PR description covers key requirements (cache_size parameter, LRU behavior, thread-safety, clear_cache/cache_info APIs) and references linked issue `#464`, but does not follow the template structure with checkboxes for testing, code quality, and compliance items.
Linked Issues check	✅ Passed	The PR comprehensively meets all acceptance criteria from `#464`: cache_size=0 is a no-op, LRU eviction is implemented, cache_info() and clear_cache() are provided, thread-safety is ensured, and tests validate all requirements including the >25% performance improvement benchmark.
Out of Scope Changes check	✅ Passed	All changes directly support the cache feature: API modifications (cache_size parameter, clear_cache/cache_info methods), cache implementation in execute() and single-statement execution, and comprehensive test coverage—no out-of-scope changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/464-parse-plan-lru-cache

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-06T00:55:53Z

Codecov Report

❌ Patch coverage is 94.02985% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.10%. Comparing base (52d16f3) to head (a610ed5).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #504      +/-   ##
==========================================
+ Coverage   88.07%   88.10%   +0.03%     
==========================================
  Files          40       40              
  Lines       14676    14730      +54     
  Branches     3472     3481       +9     
==========================================
+ Hits        12926    12978      +52     
- Misses       1145     1146       +1     
- Partials      605      606       +1

Flag	Coverage Δ
full-coverage	`88.10% <94.02%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
parser	`95.49% <ø> (ø)`
planner	`83.07% <ø> (ø)`
executor	`83.63% <ø> (ø)`
storage	`91.25% <ø> (ø)`
ast	`98.22% <ø> (ø)`
types	`94.75% <ø> (ø)`

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52d16f3...a610ed5. Read the comment docs.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

src/graphforge/api.py (1)

196-204: 💤 Low value

Move import threading to module-level imports.

The threading import inside __init__ works but is unconventional. Module-level imports improve readability and make dependencies explicit.
♻️ Suggested fix

Add to the imports section at the top of the file (around line 6-11):
import threading
Then remove line 197-198 from __init__.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/graphforge/api.py` around lines 196 - 204, Move the inline import out of
the __init__ and add it to the module-level imports: remove the "import
threading" statement inside the GraphForgeClient (or relevant class) __init__
and add "import threading" at the top of src/graphforge/api.py alongside the
other imports, leaving the existing attributes (_cache_lock, _cache_size,
_cache, _cache_order, _cache_hits, _cache_misses) and their initialization
unchanged so that _cache_lock continues to be initialized with threading.Lock()
in __init__.

tests/unit/api/test_cache.py (2)

75-86: 💤 Low value

Consider strengthening the assertion.

The assertion hits >= 1 is correct but could be more precise. After two different CREATE queries and two identical MATCH queries (with different parameters), the expected count is exactly 1 hit.

♻️ Suggested fix

-        # Second MATCH call hits the cache
-        assert gf.cache_info()["hits"] >= 1
+        # Second MATCH call hits the cache (same query string, different params)
+        assert gf.cache_info()["hits"] == 1
+        assert gf.cache_info()["misses"] == 3  # 2 CREATEs + 1 first MATCH

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/test_cache.py` around lines 75 - 86, Update the
test_parameterised_query_cached_once to assert the exact expected cache hit
count: after creating two nodes and executing the same parameterised MATCH twice
(via GraphForge instance gf and method execute), the cache should record exactly
one hit, so replace the loosened assertion (gf.cache_info()["hits"] >= 1) with a
strict equality check (gf.cache_info()["hits"] == 1) to make the expectation
precise; keep references to GraphForge, test_parameterised_query_cached_once,
gf.execute and gf.cache_info unchanged.

1-6: ⚡ Quick win

Consider adding test for UNION query cache bypass.

The PR objectives specify that UNION queries bypass the cache, but there's no explicit test validating this behavior.

💡 Suggested test

class TestCacheUnionBypass:
    """UNION queries should not be cached."""

    def test_union_query_not_cached(self):
        gf = GraphForge(cache_size=8)
        gf.execute("CREATE (:A {v: 1})")
        gf.execute("CREATE (:B {v: 2})")
        
        union_q = "MATCH (a:A) RETURN a.v AS v UNION MATCH (b:B) RETURN b.v AS v"
        gf.execute(union_q)
        gf.execute(union_q)
        
        # UNION queries bypass cache, so both calls should be misses
        # (only the CREATEs + 2 UNION calls = 4 misses, 0 hits)
        info = gf.cache_info()
        assert info["hits"] == 0

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/test_cache.py` around lines 1 - 6, Add a unit test that
verifies UNION queries bypass the plan cache: create a new test class
TestCacheUnionBypass with a method test_union_query_not_cached that instantiates
GraphForge(cache_size=8), runs two CREATE statements to seed :A and :B nodes,
executes the UNION query string "MATCH (a:A) RETURN a.v AS v UNION MATCH (b:B)
RETURN b.v AS v" twice, then calls gf.cache_info() and asserts that
cache_info()["hits"] == 0 (optionally assert misses increased accordingly);
reference GraphForge, gf.execute, and gf.cache_info in the test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/api/test_cache.py`:
- Around line 163-189: The benchmark test in
TestCacheBenchmark.test_cache_reduces_overhead performs many query executions
and must not live in fast unit tests — either move the whole class/file to a
benchmarks/ directory or mark the test with `@pytest.mark.slow` (import pytest) so
CI can skip it by default; also add the slow marker configuration to
pytest.ini/pyproject.toml (markers = slow: marks tests as slow). Additionally
suppress the static-analysis false positive on the gf.execute(...) call (inside
test_cache_reduces_overhead) by adding an inline suppression comment appropriate
for your linter (e.g., "# nosec" or the linter-specific ignore) next to the
gf.execute invocation to indicate it's safe Cypher test data.

---

Nitpick comments:
In `@src/graphforge/api.py`:
- Around line 196-204: Move the inline import out of the __init__ and add it to
the module-level imports: remove the "import threading" statement inside the
GraphForgeClient (or relevant class) __init__ and add "import threading" at the
top of src/graphforge/api.py alongside the other imports, leaving the existing
attributes (_cache_lock, _cache_size, _cache, _cache_order, _cache_hits,
_cache_misses) and their initialization unchanged so that _cache_lock continues
to be initialized with threading.Lock() in __init__.

In `@tests/unit/api/test_cache.py`:
- Around line 75-86: Update the test_parameterised_query_cached_once to assert
the exact expected cache hit count: after creating two nodes and executing the
same parameterised MATCH twice (via GraphForge instance gf and method execute),
the cache should record exactly one hit, so replace the loosened assertion
(gf.cache_info()["hits"] >= 1) with a strict equality check
(gf.cache_info()["hits"] == 1) to make the expectation precise; keep references
to GraphForge, test_parameterised_query_cached_once, gf.execute and
gf.cache_info unchanged.
- Around line 1-6: Add a unit test that verifies UNION queries bypass the plan
cache: create a new test class TestCacheUnionBypass with a method
test_union_query_not_cached that instantiates GraphForge(cache_size=8), runs two
CREATE statements to seed :A and :B nodes, executes the UNION query string
"MATCH (a:A) RETURN a.v AS v UNION MATCH (b:B) RETURN b.v AS v" twice, then
calls gf.cache_info() and asserts that cache_info()["hits"] == 0 (optionally
assert misses increased accordingly); reference GraphForge, gf.execute, and
gf.cache_info in the test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 827758fe-952b-4032-be75-d13ad431b53f

📥 Commits

Reviewing files that changed from the base of the PR and between 52d16f3 and b7d94dc.

📒 Files selected for processing (2)

src/graphforge/api.py
tests/unit/api/test_cache.py

coderabbitai · 2026-05-06T00:58:39Z

+class TestCacheBenchmark:
+    """>50% wall-clock reduction for 1 000 repeated reads."""
+
+    def test_cache_reduces_overhead(self):
+        gf_uncached = GraphForge(cache_size=0)
+        gf_cached = GraphForge(cache_size=128)
+        for gf in (gf_uncached, gf_cached):
+            for i in range(20):
+                gf.execute(f"CREATE (:Tool {{name: 't{i}', cat: 'c{i % 3}'}})")
+
+        q = "MATCH (t:Tool) WHERE t.cat = $cat RETURN t.name AS name"
+        n = 500
+
+        t0 = time.perf_counter()
+        for i in range(n):
+            gf_uncached.execute(q, parameters={"cat": f"c{i % 3}"})
+        uncached_s = time.perf_counter() - t0
+
+        t0 = time.perf_counter()
+        for i in range(n):
+            gf_cached.execute(q, parameters={"cat": f"c{i % 3}"})
+        cached_s = time.perf_counter() - t0
+
+        assert cached_s < uncached_s * 0.75, (
+            f"Cache should be at least 25% faster: cached={cached_s:.3f}s "
+            f"uncached={uncached_s:.3f}s"
+        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move benchmark test out of unit tests or mark as slow.

This test runs 1000 query executions and exceeds the < 1ms guideline for unit tests. Benchmark tests should either live in a separate benchmarks/ directory or be marked with @pytest.mark.slow to exclude from fast CI runs.

The static analysis SQL injection warning on line 171 is a false positive — gf.execute() runs Cypher queries (not SQL), and the input is controlled test data.

💡 Suggested fix

+import pytest + + class TestCacheBenchmark: """>50% wall-clock reduction for 1 000 repeated reads.""" + `@pytest.mark.slow` def test_cache_reduces_overhead(self):

Then configure pytest to skip slow tests by default:

# pytest.ini or pyproject.toml markers = slow: marks tests as slow (deselect with '-m "not slow"')

As per coding guidelines: tests/unit/**/*.py: "each test isolated and < 1ms execution time".

🧰 Tools

🪛 OpenGrep (1.20.0)

[ERROR] 171-171: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_cache.py` around lines 163 - 189, The benchmark test in TestCacheBenchmark.test_cache_reduces_overhead performs many query executions and must not live in fast unit tests — either move the whole class/file to a benchmarks/ directory or mark the test with `@pytest.mark.slow` (import pytest) so CI can skip it by default; also add the slow marker configuration to pytest.ini/pyproject.toml (markers = slow: marks tests as slow). Additionally suppress the static-analysis false positive on the gf.execute(...) call (inside test_cache_reduces_overhead) by adding an inline suppression comment appropriate for your linter (e.g., "# nosec" or the linter-specific ignore) next to the gf.execute invocation to indicate it's safe Cypher test data.

- Document the ~0.2ms constant parse+plan overhead the cache eliminates - State where it helps (graphs <500 nodes, agent tool-registry workloads) and where it does not (graphs >1000 nodes, executor cost dominates) - Add correctness note: cached plans use stale optimizer statistics after large writes; results are always correct but plan order may be suboptimal - Add note to cache_info() that all counters are 0 when cache_size=0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/graphforge/api.py (1)

388-404: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

UNION and multi-statement queries are not truly bypassing cache.

Current flow does cache lookup before parsing, so UNION/scripts still incur lock/lookup and increment misses, which violates the stated bypass behavior and skews cache_info().

Suggested flow change

-        # Cache lookup (single-statement, non-script queries only)
-        cache_key = self._normalise_query(query)
-        cached = self._cache_get(cache_key)
-        if cached is not None:
-            return self.executor.execute(cached, parameters=parameters)
-
         # Parse query (may return a single CypherQuery or a list for multi-statement scripts)
         ast = self.parser.parse(query)
+        from graphforge.ast.query import UnionQuery
 
         # Multi-statement script: execute each query sequentially, return last result
         if isinstance(ast, list):
             results: list[dict] = []
             for single_ast in ast:
                 results = self._execute_single_ast(single_ast, parameters=parameters)
             return results
 
+        # UNION queries bypass cache
+        if isinstance(ast, UnionQuery):
+            return self._execute_single_ast(ast, parameters=parameters)
+
+        # Cache lookup for single non-UNION statement
+        cache_key = self._normalise_query(query)
+        cached = self._cache_get(cache_key)
+        if cached is not None:
+            return self.executor.execute(cached, parameters=parameters)
+
         return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/graphforge/api.py` around lines 388 - 404, Move the cache lookup to after
parsing so scripts and UNIONs truly bypass cache: call self.parser.parse(query)
first (using parser.parse and ast variable), then if ast is a list
(multi-statement) execute each via _execute_single_ast and return without
calling _normalise_query or _cache_get; for single-statement ASTs only, perform
cache logic—compute cache_key via _normalise_query and call _cache_get—and skip
that cache step if the query is a UNION (e.g., detect via ast metadata if
available or as a fallback use 'UNION' in query.upper()), then proceed to either
executor.execute(cached, ...) or _execute_single_ast(...) as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/graphforge/api.py`:
- Around line 213-219: The constructor currently assigns self._cache_size
directly which allows invalid values (e.g., negative or non-int) and can break
LRU eviction logic that uses self._cache_order.pop(0); validate the incoming
cache_size in the __init__ (or the class constructor where self._cache_size is
set) to ensure it's an int and >= 0, raising TypeError for non-ints or
ValueError for negative values (or coerce to 0 if you prefer), and add a short
docstring/inline comment documenting the invariant so subsequent code using
self._cache, self._cache_order, and self._cache_lock can assume a non-negative
integer capacity.
- Around line 296-299: The current _normalise_query function collapses all
whitespace including inside string literals, which can alias distinct queries
(e.g., 'A  B' vs 'A B'); update _normalise_query to only collapse runs of
whitespace that occur outside quoted literals (track single/double-quote state
and escape sequences or use a query-aware parser/tokenizer if available) so
string literal contents are preserved exactly while normalising external
whitespace for cache keys; locate and replace the implementation in the
_normalise_query function to perform quote-aware whitespace collapsing.

---

Outside diff comments:
In `@src/graphforge/api.py`:
- Around line 388-404: Move the cache lookup to after parsing so scripts and
UNIONs truly bypass cache: call self.parser.parse(query) first (using
parser.parse and ast variable), then if ast is a list (multi-statement) execute
each via _execute_single_ast and return without calling _normalise_query or
_cache_get; for single-statement ASTs only, perform cache logic—compute
cache_key via _normalise_query and call _cache_get—and skip that cache step if
the query is a UNION (e.g., detect via ast metadata if available or as a
fallback use 'UNION' in query.upper()), then proceed to either
executor.execute(cached, ...) or _execute_single_ast(...) as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48373c0b-050e-4669-b022-cc15b478ae67

📥 Commits

Reviewing files that changed from the base of the PR and between b7d94dc and 5125f1d.

📒 Files selected for processing (1)

src/graphforge/api.py

coderabbitai · 2026-05-06T01:07:00Z

+    def _normalise_query(query: str) -> str:
+        """Collapse all whitespace runs to a single space for cache keying."""
+        return " ".join(query.split())
+


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Cache key normalization can alias semantically different queries.

" ".join(query.split()) collapses whitespace inside string literals too, so queries like 'A B' and 'A B' map to the same cache key and can reuse the wrong planned constants (incorrect results).

Suggested safe mitigation

`@staticmethod` def _normalise_query(query: str) -> str: - """Collapse all whitespace runs to a single space for cache keying.""" - return " ".join(query.split()) + """Conservative normalization that preserves literal semantics.""" + return query.strip()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graphforge/api.py` around lines 296 - 299, The current _normalise_query function collapses all whitespace including inside string literals, which can alias distinct queries (e.g., 'A B' vs 'A B'); update _normalise_query to only collapse runs of whitespace that occur outside quoted literals (track single/double-quote state and escape sequences or use a query-aware parser/tokenizer if available) so string literal contents are preserved exactly while normalising external whitespace for cache keys; locate and replace the implementation in the _normalise_query function to perform quote-aware whitespace collapsing.

- Raise TypeError for non-int values (including float, bool, str) - Raise ValueError for negative integers - Bool is explicitly rejected despite being int subclass — True/False as a cache size is semantically wrong - Add 6 validation tests covering negative, float, str, bool, zero, positive Skipped: quote-aware whitespace normalisation (theoretical aliasing with internal string literal whitespace, never occurs in real Cypher; complexity not justified); parse-before-cache reorder (would force parse on every cache hit, defeating the cache's purpose). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/graphforge/api.py (2)

187-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate cache_size before any backend initialization side effects.

cache_size is validated at Line 213, but backend setup can already run at Lines 187-202. With invalid cache_size, construction fails after opening/loading persistence, which is avoidable side-effect work.

💡 Suggested fix

-        # Initialize storage backend
-        self.backend: SQLiteBackend | None
-        if path:
+        if not isinstance(cache_size, int) or isinstance(cache_size, bool):
+            raise TypeError(f"cache_size must be an int, got {type(cache_size).__name__}")
+        if cache_size < 0:
+            raise ValueError(f"cache_size must be >= 0, got {cache_size}")
+        self._cache_size = cache_size  # invariant: non-negative int
+
+        # Initialize storage backend
+        self.backend: SQLiteBackend | None
+        if path:
             # Use SQLite for persistence
             self.backend = SQLiteBackend(Path(path))
             self.graph = self._load_graph_from_backend()
@@
-        if not isinstance(cache_size, int) or isinstance(cache_size, bool):
-            raise TypeError(f"cache_size must be an int, got {type(cache_size).__name__}")
-        if cache_size < 0:
-            raise ValueError(f"cache_size must be >= 0, got {cache_size}")
-        self._cache_size = cache_size  # invariant: non-negative int

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/graphforge/api.py` around lines 187 - 217, The cache_size parameter is
validated after backend initialization has already started, which means side
effects like opening and loading the SQLite database occur before the validation
check. This is inefficient because if cache_size is invalid, the constructor
fails after unnecessary work. Move the cache_size validation logic (the
isinstance and value range checks that raise TypeError and ValueError) to the
beginning of the initialization block, before any backend setup code in the
conditional that checks the path parameter, so invalid cache_size is caught
immediately without triggering database operations.

392-408: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

UNION and multi-statement queries are not truly bypassing cache accounting.

At Line 394, every query does _cache_get before parse. UNION/script queries are then not inserted (Lines 446-448 / script branch), so they accumulate misses and lock traffic instead of bypassing cache behavior.

💡 Suggested fix

     def execute(self, query: str, parameters: dict[str, Any] | None = None) -> list[dict]:
@@
-        # Cache lookup (single-statement, non-script queries only)
-        cache_key = self._normalise_query(query)
-        cached = self._cache_get(cache_key)
-        if cached is not None:
-            return self.executor.execute(cached, parameters=parameters)
+        # Fast bypass for known non-cacheable forms
+        likely_script = ";" in query
+        likely_union = "UNION" in query.upper()
+        cache_key: str | None = None
+        if not (likely_script or likely_union):
+            cache_key = self._normalise_query(query)
+            cached = self._cache_get(cache_key)
+            if cached is not None:
+                return self.executor.execute(cached, parameters=parameters)
@@
-        return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)
+        return self._execute_single_ast(ast, cache_key=cache_key, parameters=parameters)

Also applies to: 446-448

🧹 Nitpick comments (2)

tests/unit/api/test_cache.py (2)
14-37: ⚡ Quick win

Parametrize cache_size validation cases to reduce duplicated test logic.

These methods validate the same behavior pattern with different inputs and can be collapsed into @pytest.mark.parametrize for maintainability.

As per coding guidelines, "Use pytest parametrization (@pytest.mark.parametrize) when testing the same logic with different inputs to avoid code duplication".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/test_cache.py` around lines 14 - 37, Replace the duplicated
cache_size tests with a single parametrized test using pytest.mark.parametrize
to cover invalid and valid inputs; consolidate the negative/TypeError/ValueError
cases into a parametrize that passes (input, expected_exception, expected_match)
for GraphForge(cache_size=...) and keep a separate parametrize for accepted
values that asserts gf._cache_size equals the provided value; target the
existing test functions (test_negative_cache_size_raises,
test_non_int_cache_size_raises, test_float_cache_size_raises,
test_bool_cache_size_raises, test_zero_cache_size_accepted,
test_positive_cache_size_accepted) and the GraphForge constructor to avoid
changing behavior.
64-220: ⚡ Quick win

Add explicit tests for UNION and multi-statement cache bypass behavior.

Given the new cache control flow, add assertions that these queries do not affect cache entries/hit-miss accounting (true bypass semantics), so regressions are caught.

As per coding guidelines, "Aim for 100% code coverage on new code, with 90% minimum coverage on changed files (patch coverage) and 85% minimum total project coverage".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/test_cache.py` around lines 64 - 220, Tests are missing
explicit assertions that UNION queries and multi-statement queries bypass the
cache without changing cache entries or hit/miss counts; add new unit tests that
call GraphForge.execute with a UNION query and with multiple statements in one
execute call and then assert via gf.cache_info() that "hits", "misses" and
"size" have not changed (or remain consistent with prior state), and that
repeated bypassing does not increment hits/misses; locate uses of
GraphForge.execute, gf.cache_info(), and clear_cache() in the existing test
classes (e.g., TestCacheHitsAndMisses and TestCacheClear) and add assertions
around those symbols to validate true bypass semantics.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/graphforge/api.py`:
- Around line 187-217: The cache_size parameter is validated after backend
initialization has already started, which means side effects like opening and
loading the SQLite database occur before the validation check. This is
inefficient because if cache_size is invalid, the constructor fails after
unnecessary work. Move the cache_size validation logic (the isinstance and value
range checks that raise TypeError and ValueError) to the beginning of the
initialization block, before any backend setup code in the conditional that
checks the path parameter, so invalid cache_size is caught immediately without
triggering database operations.

---

Nitpick comments:
In `@tests/unit/api/test_cache.py`:
- Around line 14-37: Replace the duplicated cache_size tests with a single
parametrized test using pytest.mark.parametrize to cover invalid and valid
inputs; consolidate the negative/TypeError/ValueError cases into a parametrize
that passes (input, expected_exception, expected_match) for
GraphForge(cache_size=...) and keep a separate parametrize for accepted values
that asserts gf._cache_size equals the provided value; target the existing test
functions (test_negative_cache_size_raises, test_non_int_cache_size_raises,
test_float_cache_size_raises, test_bool_cache_size_raises,
test_zero_cache_size_accepted, test_positive_cache_size_accepted) and the
GraphForge constructor to avoid changing behavior.
- Around line 64-220: Tests are missing explicit assertions that UNION queries
and multi-statement queries bypass the cache without changing cache entries or
hit/miss counts; add new unit tests that call GraphForge.execute with a UNION
query and with multiple statements in one execute call and then assert via
gf.cache_info() that "hits", "misses" and "size" have not changed (or remain
consistent with prior state), and that repeated bypassing does not increment
hits/misses; locate uses of GraphForge.execute, gf.cache_info(), and
clear_cache() in the existing test classes (e.g., TestCacheHitsAndMisses and
TestCacheClear) and add assertions around those symbols to validate true bypass
semantics.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a9349f3c-9050-4bab-93b0-9b4cc83087ab

📥 Commits

Reviewing files that changed from the base of the PR and between 5125f1d and a610ed5.

📒 Files selected for processing (2)

src/graphforge/api.py
tests/unit/api/test_cache.py

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

DecisionNerd merged commit 938c148 into main May 6, 2026
34 checks passed

DecisionNerd deleted the feature/464-parse-plan-lru-cache branch May 6, 2026 01:25

This was referenced May 6, 2026

Release v0.3.10 — Analytics Integration: NetworkX/igraph Export & Parse Cache #448

Closed

release: v0.3.10 — Analytics Integration #508

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: parse/plan LRU cache for repeated queries#504

feat: parse/plan LRU cache for repeated queries#504
DecisionNerd merged 3 commits into
mainfrom
feature/464-parse-plan-lru-cache

DecisionNerd commented May 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 6, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design notes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading

codecov Bot commented May 6, 2026 •

edited

Loading