fix: variable reuse across WITH boundary no longer crashes #495

Merged
DecisionNerd merged 2 commits into main from fix/482-variable-reuse-with-boundary on May 5, 2026
Conversation

DecisionNerd (Owner) commented May 5, 2026

Closes #482

Summary

  • MATCH (a)... WITH count(...) AS shared MATCH (a)... raised KeyError: 'a' because two optimizer bugs treated the second MATCH (a) as a duplicate of the first
  • Root cause 1: _redundant_traversal_elimination_pass didn't reset seen_signatures on Aggregate, so the second ScanNodes('a') was eliminated
  • Root cause 2: _get_bound_variables_after_op had no Aggregate case, so filter pushdown believed pre-aggregate variables (like a) were still in scope and pushed predicates past the boundary
  • Fix: treat Aggregate as a scope boundary in both passes (same as With)
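
The first root cause can be illustrated with a minimal sketch of a deduplication pass that forgets its seen signatures at every scope boundary. The operator classes, the `SCOPE_BOUNDARIES` tuple, and `eliminate_redundant_scans` below are simplified, hypothetical stand-ins; the real `_redundant_traversal_elimination_pass` in GraphForge may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical, simplified operators; not the actual GraphForge classes.
@dataclass(frozen=True)
class ScanNodes:
    variable: str
    label: Optional[str] = None


@dataclass(frozen=True)
class Aggregate:
    aliases: Tuple[str, ...]


@dataclass(frozen=True)
class With:
    variables: Tuple[str, ...]


# Operators after which previously bound variables are out of scope.
SCOPE_BOUNDARIES = (With, Aggregate)


def eliminate_redundant_scans(ops):
    """Drop repeated scans within a segment; reset state at each boundary."""
    seen_signatures = set()
    kept = []
    for op in ops:
        if isinstance(op, SCOPE_BOUNDARIES):
            # An identical scan after the boundary is a new binding,
            # not a duplicate, so the seen set must be cleared here.
            seen_signatures.clear()
            kept.append(op)
        elif isinstance(op, ScanNodes):
            signature = (op.variable, op.label)
            if signature not in seen_signatures:
                seen_signatures.add(signature)
                kept.append(op)
        else:
            kept.append(op)
    return kept


plan = [
    ScanNodes("a", "Node"),
    ScanNodes("a", "Node"),          # duplicate within the segment: removed
    Aggregate(aliases=("shared",)),
    ScanNodes("a", "Node"),          # after the boundary: preserved
]
optimized = eliminate_redundant_scans(plan)
```

Leaving `Aggregate` out of `SCOPE_BOUNDARIES` in this sketch reproduces the shape of the bug: the final scan is dropped and nothing binds `a` after the aggregation.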

Test plan

  • test_scan_after_aggregate_not_eliminated — unit: second ScanNodes preserved
  • test_scan_before_aggregate_still_deduped_within_segment — duplicates within a segment still removed
  • test_variable_reuse_no_keyerror — integration: exact repro from issue, no crash
  • test_jaccard_pattern_variable_reuse — integration: 3-WITH Jaccard query returns correct result (1.0)
  • make pre-push green (87.08% total coverage)
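
To see why 1.0 is a plausible expected value, here is a plain-Python sketch of the Jaccard coefficient, |N(a) ∩ N(b)| / |N(a) ∪ N(b)|. It assumes the Jaccard test uses the same three-node graph as the issue repro (nodes '107' and '1684', each connected only to 'x'); the test's actual data and query are not shown in this PR text, and the `jaccard` helper is illustrative rather than part of GraphForge.

```python
# Undirected adjacency for the assumed graph: '107' and '1684' both link to 'x'.
neighbors = {
    "107": {"x"},
    "1684": {"x"},
    "x": {"107", "1684"},
}


def jaccard(graph, a, b):
    """Shared neighbours divided by the union of both neighbour sets."""
    shared = len(graph[a] & graph[b])
    union = len(graph[a] | graph[b])
    return shared / union if union else 0.0


result = jaccard(neighbors, "107", "1684")  # 1 shared / 1 in union = 1.0
```

In Cypher terms the union size corresponds to `deg_a + deg_b - shared`, which is why a query of this kind needs several WITH stages and ends up re-matching `a` and `b` after each aggregation.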

🤖 Generated with Claude Code


Summary by CodeRabbit

  • Bug Fixes

    • Prevented errors when reusing variable names across WITH/aggregation boundaries and ensured correct deduplication behavior across those boundaries.
  • Tests

    • Added integration tests validating variable reuse across multiple WITH boundaries.
    • Added unit tests verifying deduplication behavior around aggregation boundaries.

Two optimizer bugs caused KeyError when the same variable name was reused
after a WITH-aggregation boundary:

1. _redundant_traversal_elimination_pass did not reset seen_signatures on
   Aggregate, so the second MATCH (a:Node {…}) was eliminated as a duplicate
   of the pre-aggregate one.
2. _get_bound_variables_after_op had no Aggregate case, so filter pushdown
   kept stale variables in scope and pushed predicates past the boundary.

Fix: treat Aggregate as a scope boundary in both passes (same as With).
Also adds four tests: two unit (optimizer) and two integration (the issue repro and the Jaccard query).

Closes #482
coderabbitai Bot (Contributor) commented May 5, 2026

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between 42967eb and 8cc7cb9.

📒 Files selected for processing (1)
  • tests/integration/test_with_clause.py

Walkthrough

Query optimizer now treats Aggregate as a scope/pipeline boundary: bound-variable tracking is reset to aggregation aliases after an Aggregate, and redundant-traversal-elimination clears its seen-signature state at Aggregate boundaries to avoid cross-boundary deduplication.
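
The bound-variable half of the change can be sketched as follows. This is an assumption-laden illustration, not the code in `optimizer.py`: `ReturnItem`, the simplified `ScanNodes` and `Aggregate` classes, and `bound_variables_after` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


# Hypothetical, simplified plan nodes; not the actual GraphForge classes.
@dataclass(frozen=True)
class ReturnItem:
    expression: str               # e.g. "count(DISTINCT common)" or "shared"
    alias: Optional[str] = None


@dataclass(frozen=True)
class ScanNodes:
    variable: str


@dataclass(frozen=True)
class Aggregate:
    return_items: Tuple[ReturnItem, ...]


def bound_variables_after(op, bound: Set[str]) -> Set[str]:
    """Return the variables in scope once `op` has run."""
    if isinstance(op, ScanNodes):
        return bound | {op.variable}
    if isinstance(op, Aggregate):
        # Scope reset: only the projected names survive. Use the alias when
        # there is one, otherwise the underlying variable name.
        return {item.alias or item.expression for item in op.return_items}
    return set(bound)


bound: Set[str] = set()
for op in [ScanNodes("a"), ScanNodes("common"), ScanNodes("b")]:
    bound = bound_variables_after(op, bound)

aggregate = Aggregate((ReturnItem("count(DISTINCT common)", "shared"),))
after_aggregate = bound_variables_after(aggregate, bound)
after_rescan = bound_variables_after(ScanNodes("a"), after_aggregate)
```

Because `a` is absent from `after_aggregate`, a pushdown pass that consults this set has no grounds to move a predicate on the second `a` above the aggregation.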

Changes

Aggregate as Scope / Pipeline Boundary

| Layer | File(s) | Summary |
|---|---|---|
| Data Shape / Binding Model | src/graphforge/optimizer/optimizer.py | _get_bound_variables_after_op adds an Aggregate case: the new bound-variable set becomes the aggregation return aliases (or underlying variable names when no alias is present). |
| Core Optimization Logic | src/graphforge/optimizer/optimizer.py | _redundant_traversal_elimination_pass now treats Aggregate as a pipeline boundary (in addition to With, Union, Subquery), resetting the seen-signature state when an Aggregate is encountered. |
| Unit Tests | tests/unit/optimizer/test_redundant_elimination.py | Adds TestAggregateBoundary with tests ensuring scans after an Aggregate are not deduplicated across the boundary while duplicates within the same pre-aggregate segment remain deduplicated. |
| Integration Tests | tests/integration/test_with_clause.py | Adds TestVariableReuseAcrossWithBoundary verifying that reusing variable names across WITH boundaries does not raise KeyError and that a multi-WITH Jaccard query returns the expected numeric result. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug, tests

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely describes the main bug fix: preventing crashes when variables are reused across WITH boundaries, which is the core objective of this PR. |
| Description check | ✅ Passed | The description provides a comprehensive summary of the bugs fixed, root causes identified, and tests added. It includes linked issue reference, test coverage verification, and pre-push checks confirmation. |
| Linked Issues check | ✅ Passed | The PR fully addresses all acceptance criteria from #482: fixes the KeyError crash by treating Aggregate as a scope boundary, enables proper variable reuse per openCypher semantics, and includes comprehensive unit and integration tests covering the repro case and Jaccard query. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing issue #482: optimizer fixes in QueryOptimizer for Aggregate boundary handling, and test additions for both unit and integration coverage. No extraneous changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.02%. Comparing base (b55591b) to head (8cc7cb9).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   88.01%   88.02%   +0.01%     
==========================================
  Files          40       40              
  Lines       14449    14471      +22     
  Branches     3430     3434       +4     
==========================================
+ Hits        12717    12738      +21     
- Misses       1141     1142       +1     
  Partials      591      591              
| Flag | Coverage Δ |
|---|---|
| full-coverage | 88.02% <100.00%> (+0.01%) ⬆️ |

| Components | Coverage Δ |
|---|---|
| parser | 95.67% <ø> (ø) |
| planner | 82.89% <ø> (ø) |
| executor | 83.63% <ø> (ø) |
| storage | 91.25% <ø> (ø) |
| ast | 98.20% <ø> (ø) |
| types | 94.75% <ø> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b55591b...8cc7cb9.

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_with_clause.py`:
- Around line 420-447: The test_variable_reuse_no_keyerror creates a GraphForge
without cleanup and should use the tmp_path fixture and close the DB to match
other tests: add a tmp_path parameter to the test signature, instantiate
GraphForge using a fresh path from tmp_path (or otherwise ensure per-test
isolation), and call db.close() at the end of the test (reference symbols:
test_variable_reuse_no_keyerror, GraphForge, db.close). Ensure any other similar
tests in this file follow the same pattern for consistency.

📥 Commits

Reviewing files that changed from the base of the PR and between b55591b and 42967eb.

📒 Files selected for processing (3)
  • src/graphforge/optimizer/optimizer.py
  • tests/integration/test_with_clause.py
  • tests/unit/optimizer/test_redundant_elimination.py

Comment on lines +420 to +447
    def test_variable_reuse_no_keyerror(self):
        """Re-using variable 'a' after WITH no longer raises KeyError (#482)."""
        db = GraphForge()
        db.execute("""
            CREATE (:Node {id: '107'}),
                   (:Node {id: '1684'}),
                   (:Node {id: 'x'})
        """)
        db.execute("""
            MATCH (a:Node {id: '107'}), (b:Node {id: 'x'})
            MERGE (a)-[:CONNECTED_TO]->(b)
        """)
        db.execute("""
            MATCH (a:Node {id: '1684'}), (b:Node {id: 'x'})
            MERGE (a)-[:CONNECTED_TO]->(b)
        """)

        results = db.execute("""
            MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(common)-[:CONNECTED_TO]-(b:Node {id: '1684'})
            WITH count(DISTINCT common) AS shared
            MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(na)
            WITH shared, count(DISTINCT na) AS deg_a
            RETURN shared, deg_a
        """)

        assert len(results) == 1
        assert results[0]["shared"].value == 1
        assert results[0]["deg_a"].value == 1

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add tmp_path fixture and cleanup for test isolation consistency.

Other tests in this file use the tmp_path fixture and call db.close(). These tests create an in-memory GraphForge() without cleanup, which is inconsistent with the file's patterns and may cause resource leaks if GraphForge() holds file handles or connections.

Suggested fix for both tests
-    def test_variable_reuse_no_keyerror(self):
+    def test_variable_reuse_no_keyerror(self, tmp_path):
         """Re-using variable 'a' after WITH no longer raises KeyError (`#482`)."""
-        db = GraphForge()
+        db = GraphForge(tmp_path / "test.db")
         ...
         assert results[0]["deg_a"].value == 1
+
+        db.close()

-    def test_jaccard_pattern_variable_reuse(self):
+    def test_jaccard_pattern_variable_reuse(self, tmp_path):
         """Multi-WITH query with variable reuse computes correct Jaccard coefficient (`#482`)."""
-        db = GraphForge()
+        db = GraphForge(tmp_path / "test.db")
         ...
         assert abs(results[0]["jaccard"].value - 1.0) < 1e-9
+
+        db.close()

As per coding guidelines, "Use fresh fixtures in tests (avoid shared mutable state) to ensure test isolation".
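
One caveat on the suggested fix: a `db.close()` placed after the assertions is skipped whenever an assertion fails. A minimal sketch of a more robust variant uses `contextlib.closing` from the standard library. `FakeDB` is a hypothetical stand-in with a `close()` method; whether GraphForge itself supports context-manager use is not stated in this thread.

```python
from contextlib import closing


class FakeDB:
    """Hypothetical stand-in for a database handle that must be closed."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


# Passing case: close() runs when the with-block exits normally.
passing = FakeDB("test.db")
with closing(passing):
    assert passing.path == "test.db"

# Failing case: close() still runs even though the body raises.
failing = FakeDB("other.db")
try:
    with closing(failing):
        raise AssertionError("simulated test failure")
except AssertionError:
    pass
```

In a pytest suite the same guarantee is commonly obtained with a fixture that yields the handle and closes it afterwards, which also keeps the cleanup out of each test body.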

@DecisionNerd DecisionNerd merged commit 79726d9 into main May 5, 2026
28 of 29 checks passed
@DecisionNerd DecisionNerd deleted the fix/482-variable-reuse-with-boundary branch May 5, 2026 19:14
DecisionNerd added a commit that referenced this pull request May 6, 2026
* docs: analytics guide, installation extras, KG/agent doc fixes (#457 #456 #455 #454 #453 #473 #503)

- Add docs/guide/analytics-integration.md covering to_dicts/to_dataframe/
  to_networkx/to_igraph/to_json/from_json with choosing-between table
- Update docs/getting-started/installation.md with all optional extras
  (pandas, networkx, igraph, analytics, zstandard) and bump to v0.3.10
- Update examples/05_migration_from_networkx.py with working to_networkx()
  replacing the "future feature" placeholder
- Update docs/use-cases/knowledge-graph-construction.md: add
  add_graph_documents() section, CREATE vs MERGE idempotency warning,
  fix shortestPath to raise NotImplementedError with BFS workaround
- Update research docs to mark resolved: FP-1/FP-5/FP-6 in llm-workflows
  and network-analysis, FP-2/FP-4/FP-5/FP-6 in kg-construction,
  Engine Bug 1/2 in agent-grounding (PRs #494 #495); update pass/fail
  matrix for S4 and S10 (20 PASS / 5 PARTIAL / 4 FAIL)

Closes #457, #456, #455, #454, #453, #473, #503

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: zero-base rebuild of docs publishing pipeline (#506)

- Move docs deps from optional-dependencies to [dependency-groups]
- Rewrite CI workflow: uv sync --group docs (lock-file reproducible), caching, src/** trigger
- Remove dead version.provider: mike from mkdocs.yml
- Add docs-serve, docs-build, docs-clean Makefile targets
- Pin pygments<2.20 in docs group (2.20.0 breaks pymdownx title=None handling)
- Drop pygments>=2.20.0 constraint (Lua ReDoS CVE not relevant to graphforge)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request May 6, 2026
…456 #455 #454 #453 #473 #503) (#505)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

fix: variable reuse across WITH boundary raises KeyError instead of UndefinedVariable
