fix: variable reuse across WITH boundary no longer crashes#495
Conversation
Two optimizer bugs caused KeyError when the same variable name was reused
after a WITH-aggregation boundary:
1. _redundant_traversal_elimination_pass did not reset seen_signatures on
Aggregate, so the second MATCH (a:Node {…}) was eliminated as a duplicate
of the pre-aggregate one.
2. _get_bound_variables_after_op had no Aggregate case, so filter pushdown
kept stale variables in scope and pushed predicates past the boundary.
Fix: treat Aggregate as a scope boundary in both passes (same as With).
Also adds four tests: two unit (optimizer), two integration (Jaccard query).
Closes #482
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughQuery optimizer now treats ChangesAggregate as Scope / Pipeline Boundary
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #495 +/- ##
==========================================
+ Coverage 88.01% 88.02% +0.01%
==========================================
Files 40 40
Lines 14449 14471 +22
Branches 3430 3434 +4
==========================================
+ Hits 12717 12738 +21
- Misses 1141 1142 +1
Partials 591 591
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/test_with_clause.py`:
- Around line 420-447: The test_variable_reuse_no_keyerror creates a GraphForge
without cleanup and should use the tmp_path fixture and close the DB to match
other tests: add a tmp_path parameter to the test signature, instantiate
GraphForge using a fresh path from tmp_path (or otherwise ensure per-test
isolation), and call db.close() at the end of the test (reference symbols:
test_variable_reuse_no_keyerror, GraphForge, db.close). Ensure any other similar
tests in this file follow the same pattern for consistency.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e8aa6c7f-dd4f-45ad-b8f4-4f8f342dc127
📒 Files selected for processing (3)
src/graphforge/optimizer/optimizer.pytests/integration/test_with_clause.pytests/unit/optimizer/test_redundant_elimination.py
| def test_variable_reuse_no_keyerror(self): | ||
| """Re-using variable 'a' after WITH no longer raises KeyError (#482).""" | ||
| db = GraphForge() | ||
| db.execute(""" | ||
| CREATE (:Node {id: '107'}), | ||
| (:Node {id: '1684'}), | ||
| (:Node {id: 'x'}) | ||
| """) | ||
| db.execute(""" | ||
| MATCH (a:Node {id: '107'}), (b:Node {id: 'x'}) | ||
| MERGE (a)-[:CONNECTED_TO]->(b) | ||
| """) | ||
| db.execute(""" | ||
| MATCH (a:Node {id: '1684'}), (b:Node {id: 'x'}) | ||
| MERGE (a)-[:CONNECTED_TO]->(b) | ||
| """) | ||
|
|
||
| results = db.execute(""" | ||
| MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(common)-[:CONNECTED_TO]-(b:Node {id: '1684'}) | ||
| WITH count(DISTINCT common) AS shared | ||
| MATCH (a:Node {id: '107'})-[:CONNECTED_TO]-(na) | ||
| WITH shared, count(DISTINCT na) AS deg_a | ||
| RETURN shared, deg_a | ||
| """) | ||
|
|
||
| assert len(results) == 1 | ||
| assert results[0]["shared"].value == 1 | ||
| assert results[0]["deg_a"].value == 1 |
There was a problem hiding this comment.
Add tmp_path fixture and cleanup for test isolation consistency.
Other tests in this file use the tmp_path fixture and call db.close(). These tests create an in-memory GraphForge() without cleanup, which is inconsistent with the file's patterns and may cause resource leaks if GraphForge() holds file handles or connections.
Suggested fix for both tests
- def test_variable_reuse_no_keyerror(self):
+ def test_variable_reuse_no_keyerror(self, tmp_path):
"""Re-using variable 'a' after WITH no longer raises KeyError (`#482`)."""
- db = GraphForge()
+ db = GraphForge(tmp_path / "test.db")
...
assert results[0]["deg_a"].value == 1
+
+ db.close()
- def test_jaccard_pattern_variable_reuse(self):
+ def test_jaccard_pattern_variable_reuse(self, tmp_path):
"""Multi-WITH query with variable reuse computes correct Jaccard coefficient (`#482`)."""
- db = GraphForge()
+ db = GraphForge(tmp_path / "test.db")
...
assert abs(results[0]["jaccard"].value - 1.0) < 1e-9
+
+ db.close()As per coding guidelines, "Use fresh fixtures in tests (avoid shared mutable state) to ensure test isolation".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/test_with_clause.py` around lines 420 - 447, The
test_variable_reuse_no_keyerror creates a GraphForge without cleanup and should
use the tmp_path fixture and close the DB to match other tests: add a tmp_path
parameter to the test signature, instantiate GraphForge using a fresh path from
tmp_path (or otherwise ensure per-test isolation), and call db.close() at the
end of the test (reference symbols: test_variable_reuse_no_keyerror, GraphForge,
db.close). Ensure any other similar tests in this file follow the same pattern
for consistency.
* docs: analytics guide, installation extras, KG/agent doc fixes (#457 #456 #455 #454 #453 #473 #503) - Add docs/guide/analytics-integration.md covering to_dicts/to_dataframe/ to_networkx/to_igraph/to_json/from_json with choosing-between table - Update docs/getting-started/installation.md with all optional extras (pandas, networkx, igraph, analytics, zstandard) and bump to v0.3.10 - Update examples/05_migration_from_networkx.py with working to_networkx() replacing the "future feature" placeholder - Update docs/use-cases/knowledge-graph-construction.md: add add_graph_documents() section, CREATE vs MERGE idempotency warning, fix shortestPath to raise NotImplementedError with BFS workaround - Update research docs to mark resolved: FP-1/FP-5/FP-6 in llm-workflows and network-analysis, FP-2/FP-4/FP-5/FP-6 in kg-construction, Engine Bug 1/2 in agent-grounding (PRs #494 #495); update pass/fail matrix for S4 and S10 (20 PASS / 5 PARTIAL / 4 FAIL) Closes #457, #456, #455, #454, #453, #473, #503 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: zero-base rebuild of docs publishing pipeline (#506) - Move docs deps from optional-dependencies to [dependency-groups] - Rewrite CI workflow: uv sync --group docs (lock-file reproducible), caching, src/** trigger - Remove dead version.provider: mike from mkdocs.yml - Add docs-serve, docs-build, docs-clean Makefile targets - Pin pygments<2.20 in docs group (2.20.0 breaks pymdownx title=None handling) - Drop pygments>=2.20.0 constraint (Lua ReDoS CVE not relevant to graphforge) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…456 #455 #454 #453 #473 #503) (#505) - Add docs/guide/analytics-integration.md covering to_dicts/to_dataframe/ to_networkx/to_igraph/to_json/from_json with choosing-between table - Update docs/getting-started/installation.md with all optional extras (pandas, networkx, igraph, analytics, zstandard) and bump to v0.3.10 - Update examples/05_migration_from_networkx.py with working to_networkx() replacing the "future feature" placeholder - Update docs/use-cases/knowledge-graph-construction.md: add add_graph_documents() section, CREATE vs MERGE idempotency warning, fix shortestPath to raise NotImplementedError with BFS workaround - Update research docs to mark resolved: FP-1/FP-5/FP-6 in llm-workflows and network-analysis, FP-2/FP-4/FP-5/FP-6 in kg-construction, Engine Bug 1/2 in agent-grounding (PRs #494 #495); update pass/fail matrix for S4 and S10 (20 PASS / 5 PARTIAL / 4 FAIL) Closes #457, #456, #455, #454, #453, #473, #503 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #482
Summary
MATCH (a)... WITH count(...) AS shared MATCH (a)...raisedKeyError: 'a'because two optimizer bugs treated the secondMATCH (a)as a duplicate of the first_redundant_traversal_elimination_passdidn't resetseen_signaturesonAggregate, so the secondScanNodes('a')was eliminated_get_bound_variables_after_ophad noAggregatecase, so filter pushdown believed pre-aggregate variables (likea) were still in scope and pushed predicates past the boundaryAggregateas a scope boundary in both passes (same asWith)Test plan
test_scan_after_aggregate_not_eliminated— unit: second ScanNodes preservedtest_scan_before_aggregate_still_deduped_within_segment— duplicates within a segment still removedtest_variable_reuse_no_keyerror— integration: exact repro from issue, no crashtest_jaccard_pattern_variable_reuse— integration: 3-WITH Jaccard query returns correct result (1.0)make pre-pushgreen (87.08% total coverage)🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.Summary by CodeRabbit
Bug Fixes
Tests