fix: variable reuse across WITH boundary no longer crashes by DecisionNerd · Pull Request #495 · DecisionNerd/graphforge

DecisionNerd · 2026-05-05T18:53:34Z

Closes #482

Summary

- `MATCH (a)... WITH count(...) AS shared MATCH (a)...` raised `KeyError: 'a'` because two optimizer bugs treated the second `MATCH (a)` as a duplicate of the first
- Root cause 1: `_redundant_traversal_elimination_pass` didn't reset `seen_signatures` on `Aggregate`, so the second `ScanNodes('a')` was eliminated
- Root cause 2: `_get_bound_variables_after_op` had no `Aggregate` case, so filter pushdown believed pre-aggregate variables (like `a`) were still in scope and pushed predicates past the boundary
- Fix: treat `Aggregate` as a scope boundary in both passes (same as `With`)
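
To make root cause 1 concrete, here is a minimal sketch of a deduplication pass that clears its seen-signature set at scope boundaries. It is not the code in `optimizer.py`: the operator classes, the `BOUNDARY_OPS` tuple, and the `eliminate_redundant_scans` function are hypothetical stand-ins, and only `With` and `Aggregate` are modelled as boundaries here.

```python
from dataclasses import dataclass


# Hypothetical, simplified plan operators (illustration only).
@dataclass(frozen=True)
class ScanNodes:
    variable: str
    label: str


@dataclass(frozen=True)
class With:
    items: tuple


@dataclass(frozen=True)
class Aggregate:
    aliases: tuple


# Operators assumed to start a new scope segment in this sketch.
BOUNDARY_OPS = (With, Aggregate)


def eliminate_redundant_scans(ops):
    """Drop duplicate scans within a segment, never across a boundary."""
    seen_signatures = set()
    kept = []
    for op in ops:
        if isinstance(op, BOUNDARY_OPS):
            # Variables bound before the boundary are out of scope after it,
            # so an identical scan later on is not a duplicate.
            seen_signatures.clear()
            kept.append(op)
            continue
        if isinstance(op, ScanNodes):
            signature = (op.variable, op.label)
            if signature in seen_signatures:
                continue  # true duplicate inside the same segment
            seen_signatures.add(signature)
        kept.append(op)
    return kept


plan = [
    ScanNodes("a", "Node"),
    ScanNodes("a", "Node"),   # duplicate within the first segment
    Aggregate(("shared",)),   # scope boundary
    ScanNodes("a", "Node"),   # must survive: 'a' is rebound here
]
optimized = eliminate_redundant_scans(plan)
```

In this sketch the in-segment duplicate is removed while the scan after the `Aggregate` is preserved, which is the behaviour the two unit tests in the test plan check for.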

Test plan

- `test_scan_after_aggregate_not_eliminated` (unit): second `ScanNodes` preserved
- `test_scan_before_aggregate_still_deduped_within_segment` (unit): duplicates within a segment still removed
- `test_variable_reuse_no_keyerror` (integration): exact repro from issue, no crash
- `test_jaccard_pattern_variable_reuse` (integration): 3-WITH Jaccard query returns correct result (1.0)
- `make pre-push` green (87.08% total coverage)

🤖 Generated with Claude Code


Summary by CodeRabbit

Bug Fixes
- Prevented errors when reusing variable names across WITH/aggregation boundaries and ensured correct deduplication behavior across those boundaries.
Tests
- Added integration tests validating variable reuse across multiple WITH boundaries.
- Added unit tests verifying deduplication behavior around aggregation boundaries.

Two optimizer bugs caused KeyError when the same variable name was reused after a WITH-aggregation boundary:

1. _redundant_traversal_elimination_pass did not reset seen_signatures on Aggregate, so the second MATCH (a:Node {…}) was eliminated as a duplicate of the pre-aggregate one.
2. _get_bound_variables_after_op had no Aggregate case, so filter pushdown kept stale variables in scope and pushed predicates past the boundary.

Fix: treat Aggregate as a scope boundary in both passes (same as With).

Also adds four tests: two unit (optimizer), two integration (Jaccard query).

Closes #482

coderabbitai · 2026-05-05T18:53:46Z

Caution

Review failed

The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 42967eb and 8cc7cb9.

📒 Files selected for processing (1)

tests/integration/test_with_clause.py

Walkthrough

Query optimizer now treats Aggregate as a scope/pipeline boundary: bound-variable tracking is reset to aggregation aliases after an Aggregate, and redundant-traversal-elimination clears its seen-signature state at Aggregate boundaries to avoid cross-boundary deduplication.

Changes

Aggregate as Scope / Pipeline Boundary

| Layer / File(s) | Summary |
|---|---|
| Data Shape / Binding Model `src/graphforge/optimizer/optimizer.py` | `_get_bound_variables_after_op` adds an `Aggregate` case: the new bound-variable set becomes the aggregation return aliases (or underlying variable names when no alias is present). |
| Core Optimization Logic `src/graphforge/optimizer/optimizer.py` | `_redundant_traversal_elimination_pass` now treats `Aggregate` as a pipeline boundary (in addition to `With`, `Union`, `Subquery`), resetting the seen-signature state when an `Aggregate` is encountered. |
| Unit Tests `tests/unit/optimizer/test_redundant_elimination.py` | Adds `TestAggregateBoundary` with tests ensuring scans after an `Aggregate` are not deduplicated across the boundary while duplicates within the same pre-aggregate segment remain deduplicated. |
| Integration Tests `tests/integration/test_with_clause.py` | Adds `TestVariableReuseAcrossWithBoundary` verifying that reusing variable names across `WITH` boundaries does not raise `KeyError` and that a multi-`WITH` Jaccard query returns the expected numeric result. |
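
The binding-model change can be illustrated with a small sketch of how the in-scope variable set might be recomputed after each operator. This is an assumption-laden illustration rather than the actual `_get_bound_variables_after_op`: the `ReturnItem`, `ScanNodes`, and `Aggregate` classes and the `bound_variables_after` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified operators (illustration only).
@dataclass(frozen=True)
class ReturnItem:
    expression: str               # e.g. "count(DISTINCT common)" or "shared"
    alias: Optional[str] = None   # e.g. "shared"


@dataclass(frozen=True)
class ScanNodes:
    variable: str


@dataclass(frozen=True)
class Aggregate:
    items: tuple


def bound_variables_after(op, bound):
    """Return the set of variables in scope once `op` has run."""
    if isinstance(op, ScanNodes):
        return bound | {op.variable}
    if isinstance(op, Aggregate):
        # Scope boundary: only the aggregation outputs survive, named by
        # alias when present, otherwise by the underlying name.
        return {item.alias or item.expression for item in op.items}
    return set(bound)


pre_aggregate = [ScanNodes("a"), ScanNodes("common"), ScanNodes("b")]
aggregate = Aggregate((ReturnItem("count(DISTINCT common)", "shared"),))

bound = set()
for op in pre_aggregate:
    bound = bound_variables_after(op, bound)
after_aggregate = bound_variables_after(aggregate, bound)
after_rescan = bound_variables_after(ScanNodes("a"), after_aggregate)
```

With the `Aggregate` case in place, `a` is absent from `after_aggregate`, so a filter on `a` has no reason to be pushed above the boundary, and `a` only reappears once the second scan rebinds it.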

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

DecisionNerd/graphforge#184: Related changes to redundant-traversal-elimination logic and pipeline boundary handling involving Aggregate.
DecisionNerd/graphforge#495: Overlaps on the same optimizer changes treating Aggregate as a scope/pipeline boundary.
DecisionNerd/graphforge#329: Also modifies optimizer behavior to treat Aggregate as a pipeline/scope boundary.

Suggested labels

bug, tests

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely describes the main bug fix: preventing crashes when variables are reused across WITH boundaries, which is the core objective of this PR. |
| Description check | ✅ Passed | The description provides a comprehensive summary of the bugs fixed, root causes identified, and tests added. It includes linked issue reference, test coverage verification, and pre-push checks confirmation. |
| Linked Issues check | ✅ Passed | The PR fully addresses all acceptance criteria from `#482`: fixes the KeyError crash by treating Aggregate as a scope boundary, enables proper variable reuse per openCypher semantics, and includes comprehensive unit and integration tests covering the repro case and Jaccard query. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing issue `#482`: optimizer fixes in QueryOptimizer for Aggregate boundary handling, and test additions for both unit and integration coverage. No extraneous changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


codecov · 2026-05-05T18:55:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.02%. Comparing base (b55591b) to head (8cc7cb9).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   88.01%   88.02%   +0.01%     
==========================================
  Files          40       40              
  Lines       14449    14471      +22     
  Branches     3430     3434       +4     
==========================================
+ Hits        12717    12738      +21     
- Misses       1141     1142       +1     
  Partials      591      591
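
As a quick sanity check on the figures above, the percentages can be recomputed from the hit and line counts in the diff. The sketch below assumes coverage is simply hits divided by total lines (hits + misses + partials), which matches the numbers shown; `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff: base (main) and head (#495).
base = coverage_pct(12717, 14449)
head = coverage_pct(12738, 14471)
delta = head - base

summary = f"{base:.2f}% -> {head:.2f}% ({delta:+.2f}%)"
print(summary)  # 88.01% -> 88.02% (+0.01%)
```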

| Flag | Coverage Δ | |
|---|---|---|
| full-coverage | `88.02% <100.00%> (+0.01%)` | ⬆️ |

| Components | Coverage Δ |
|---|---|
| parser | `95.67% <ø> (ø)` |
| planner | `82.89% <ø> (ø)` |
| executor | `83.63% <ø> (ø)` |
| storage | `91.25% <ø> (ø)` |
| ast | `98.20% <ø> (ø)` |
| types | `94.75% <ø> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b55591b...8cc7cb9.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_with_clause.py`:
- Around line 420-447: The test_variable_reuse_no_keyerror creates a GraphForge
without cleanup and should use the tmp_path fixture and close the DB to match
other tests: add a tmp_path parameter to the test signature, instantiate
GraphForge using a fresh path from tmp_path (or otherwise ensure per-test
isolation), and call db.close() at the end of the test (reference symbols:
test_variable_reuse_no_keyerror, GraphForge, db.close). Ensure any other similar
tests in this file follow the same pattern for consistency.



📥 Commits

Reviewing files that changed from the base of the PR and between b55591b and 42967eb.

📒 Files selected for processing (3)

src/graphforge/optimizer/optimizer.py
tests/integration/test_with_clause.py
tests/unit/optimizer/test_redundant_elimination.py

coderabbitai · 2026-05-05T18:58:19Z

+    def test_variable_reuse_no_keyerror(self):
+        """Re-using variable 'a' after WITH no longer raises KeyError (#482)."""
+        db = GraphForge()
+        db.execute("""
+            CREATE (:Node {id: '107'}),
+                   (:Node {id: '1684'}),
+                   (:Node {id: 'x'})
+        """)
+        db.execute("""
+            MATCH (a:Node {id: '107'}), (b:Node {id: 'x'})
+            MERGE (a)-[:CONNECTED_TO]->(b)
+        """)
+        db.execute("""
+            MATCH (a:Node {id: '1684'}), (b:Node {id: 'x'})
+            MERGE (a)-[:CONNECTED_TO]->(b)
+        """)
+
+        results = db.execute("""
+            MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(common)-[:CONNECTED_TO]-(b:Node {id: '1684'})
+            WITH count(DISTINCT common) AS shared
+            MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(na)
+            WITH shared, count(DISTINCT na) AS deg_a
+            RETURN shared, deg_a
+        """)
+
+        assert len(results) == 1
+        assert results[0]["shared"].value == 1
+        assert results[0]["deg_a"].value == 1


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add tmp_path fixture and cleanup for test isolation consistency.

Other tests in this file use the tmp_path fixture and call db.close(). These tests create an in-memory GraphForge() without cleanup, which is inconsistent with the file's patterns and may cause resource leaks if GraphForge() holds file handles or connections.

Suggested fix for both tests

-    def test_variable_reuse_no_keyerror(self):
+    def test_variable_reuse_no_keyerror(self, tmp_path):
         """Re-using variable 'a' after WITH no longer raises KeyError (#482)."""
-        db = GraphForge()
+        db = GraphForge(tmp_path / "test.db")
         ...
         assert results[0]["deg_a"].value == 1
+
+        db.close()

-    def test_jaccard_pattern_variable_reuse(self):
+    def test_jaccard_pattern_variable_reuse(self, tmp_path):
         """Multi-WITH query with variable reuse computes correct Jaccard coefficient (#482)."""
-        db = GraphForge()
+        db = GraphForge(tmp_path / "test.db")
         ...
         assert abs(results[0]["jaccard"].value - 1.0) < 1e-9
+
+        db.close()

As per coding guidelines, "Use fresh fixtures in tests (avoid shared mutable state) to ensure test isolation".


* docs: analytics guide, installation extras, KG/agent doc fixes (#457 #456 #455 #454 #453 #473 #503)

  - Add docs/guide/analytics-integration.md covering to_dicts/to_dataframe/to_networkx/to_igraph/to_json/from_json with choosing-between table
  - Update docs/getting-started/installation.md with all optional extras (pandas, networkx, igraph, analytics, zstandard) and bump to v0.3.10
  - Update examples/05_migration_from_networkx.py with working to_networkx() replacing the "future feature" placeholder
  - Update docs/use-cases/knowledge-graph-construction.md: add add_graph_documents() section, CREATE vs MERGE idempotency warning, fix shortestPath to raise NotImplementedError with BFS workaround
  - Update research docs to mark resolved: FP-1/FP-5/FP-6 in llm-workflows and network-analysis, FP-2/FP-4/FP-5/FP-6 in kg-construction, Engine Bug 1/2 in agent-grounding (PRs #494 #495); update pass/fail matrix for S4 and S10 (20 PASS / 5 PARTIAL / 4 FAIL)

  Closes #457, #456, #455, #454, #453, #473, #503

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: zero-base rebuild of docs publishing pipeline (#506)

  - Move docs deps from optional-dependencies to [dependency-groups]
  - Rewrite CI workflow: uv sync --group docs (lock-file reproducible), caching, src/** trigger
  - Remove dead version.provider: mike from mkdocs.yml
  - Add docs-serve, docs-build, docs-clean Makefile targets
  - Pin pygments<2.20 in docs group (2.20.0 breaks pymdownx title=None handling)
  - Drop pygments>=2.20.0 constraint (Lua ReDoS CVE not relevant to graphforge)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>


DecisionNerd mentioned this pull request May 5, 2026

Release v0.3.10 — Analytics Integration: NetworkX/igraph Export & Parse Cache #448

Closed

29 tasks

coderabbitai Bot reviewed May 5, 2026


test: add db.close() to variable reuse integration tests for consistency

8cc7cb9

DecisionNerd merged commit 79726d9 into main May 5, 2026
28 of 29 checks passed

DecisionNerd deleted the fix/482-variable-reuse-with-boundary branch May 5, 2026 19:14

DecisionNerd mentioned this pull request May 6, 2026

docs: analytics guide, installation extras, and use-case doc fixes #505

Merged

3 tasks

DecisionNerd mentioned this pull request May 6, 2026

release: v0.3.10 — Analytics Integration #508

Merged

5 tasks

DecisionNerd merged 2 commits into main from fix/482-variable-reuse-with-boundary
