fix: WITH clause variable passing in aggregation queries#67
Conversation
## Problem
Queries using WITH clause with aggregation failed with `KeyError` when trying
to access grouping variables in subsequent clauses. This prevented common
aggregation patterns from working.
### Example Query That Failed
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
```
Error: `KeyError: 'p'` - The variable `p` from the WITH clause was not
accessible in the RETURN clause.
## Root Cause
The executor's `_execute_aggregate` method had two issues when binding
grouping variables for WITH clauses:
1. **Object equality comparison**: Used `item.expression == expr` which
doesn't work for comparing AST nodes (two Variable objects with the same
name are different objects).
2. **Variable name extraction**: When no alias was provided, fell back to
`f"col_{i}"` instead of extracting the variable name from Variable
expressions.
## Solution
### 1. Added Expression Comparison Helper
Added `_expressions_match()` method to semantically compare AST expressions:
- Compares Variable nodes by name
- Compares PropertyAccess nodes by variable and property
- Compares Literal nodes by value
- Compares FunctionCall nodes by name and arguments
- Handles all expression types properly
### 2. Fixed Variable Binding Logic
Updated grouping variable binding to:
- Use semantic expression matching instead of object equality
- Extract variable name from Variable expressions when no alias provided
- Only fall back to `col_{i}` for complex expressions without aliases
## Changes
### Core Fix
- **File**: `src/graphforge/executor/executor.py`
- Added `_expressions_match()` helper method
- Updated `_execute_aggregate()` to use semantic matching
- Fixed variable name extraction for grouping expressions
## Testing
### Unit Tests (11 tests)
- Single and multiple grouping variables
- Aggregation with filtering (WHERE after WITH)
- Various aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT)
- Edge cases (no results, only aggregates, chained WITH)
### Integration Tests (8 tests)
- Top N by count pattern
- Multi-level aggregation
- Graph analytics patterns
- Time series aggregation
- Complex WITH patterns with DISTINCT, LIMIT
- Nested property access
**Total: 19 new tests, all passing**
## Examples
### Before (Failed)
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
-- KeyError: 'p'
```
### After (Works)
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
-- Returns: Alice: 2 movies, Bob: 1 movies
```
### Additional Patterns Now Working
```cypher
-- Multiple grouping variables
MATCH (p:Person)
WITH p.age as age, count(p) as person_count
RETURN age, person_count
-- Aggregation with filtering
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
WHERE movie_count > 1
RETURN p.name, movie_count
-- Chained aggregations
MATCH (e:Employee)
WITH e.dept as dept, avg(e.salary) as avg_salary
WITH dept, avg_salary
WHERE avg_salary > 100000
RETURN dept, avg_salary
```
## Verification
- ✅ All 19 new tests pass
- ✅ All 1037 total tests pass (backward compatibility verified)
- ✅ 95.42% coverage (meets threshold)
- ✅ Issue reproduction test case now works
- ✅ No regressions in existing functionality
## Impact
Fixes a **HIGH priority bug** that blocked common graph analytics workflows:
- Top N by count queries
- Aggregation with post-filter
- Multi-level aggregations
- Graph analytics patterns
This is critical for dataset integration and real-world query patterns.
## Related
- Discovered during dataset integration testing (v0.2.1)
- Closes #60
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThis PR introduces semantic expression matching to the query executor's WITH clause aggregation handling. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
==========================================
+ Coverage 93.82% 93.89% +0.06%
==========================================
Files 15 15
Lines 2009 2032 +23
Branches 501 511 +10
==========================================
+ Hits 1885 1908 +23
Misses 44 44
Partials 80 80
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
|
- Add 16 unit tests in test_expressions_match.py directly testing the helper - Tests cover all expression types: Variable, PropertyAccess, Literal, FunctionCall - Tests cover edge cases: different names, properties, arg counts - Tests ensure case-insensitive function names work correctly - Improves executor.py coverage from 54% to 92.43% (targeting patch coverage) Related to #60
- Add check-patch-coverage target to Makefile - Checks coverage of changed files only (compared to origin/main) - Requires 90% coverage for modified source files - Mirrors GitHub's codecov/patch check locally - Skips gracefully when no source files changed - Integrated into pre-push workflow This helps catch patch coverage issues before pushing to GitHub.
Version Bump: - Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock - Add comprehensive v0.2.1 changelog entry Documentation Updates: - Update README.md with dataset loading examples and quickstart - Add "Load Real-World Datasets" section to main README - Update docs/index.md with dataset features and examples - Complete rewrite of docs/datasets/snap.md: - Mark as available in v0.2.1 (5 datasets) - Add detailed dataset table with stats - Add comprehensive usage examples and query patterns - Document download/caching behavior - Add performance tips for large datasets - Update docs/datasets/overview.md: - Reorganize to show SNAP as "Available Now" - Mark other sources as "Coming Soon" - List all 5 available SNAP datasets - Update docs/getting-started/quickstart.md: - Add "Load a Dataset" section with examples - Add dataset browsing examples - Update navigation links Release Contents (v0.2.1): - Dataset loading infrastructure with caching (#68) - CSV loader for edge-list datasets (#69) - 5 SNAP datasets available - MERGE ON CREATE SET syntax (#65) - MERGE ON MATCH SET syntax (#66) - WITH clause variable passing fix (#67) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: prepare v0.2.1 release with dataset documentation Version Bump: - Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock - Add comprehensive v0.2.1 changelog entry Documentation Updates: - Update README.md with dataset loading examples and quickstart - Add "Load Real-World Datasets" section to main README - Update docs/index.md with dataset features and examples - Complete rewrite of docs/datasets/snap.md: - Mark as available in v0.2.1 (5 datasets) - Add detailed dataset table with stats - Add comprehensive usage examples and query patterns - Document download/caching behavior - Add performance tips for large datasets - Update docs/datasets/overview.md: - Reorganize to show SNAP as "Available Now" - Mark other sources as "Coming Soon" - List all 5 available SNAP datasets - Update docs/getting-started/quickstart.md: - Add "Load a Dataset" section with examples - Add dataset browsing examples - Update navigation links Release Contents (v0.2.1): - Dataset loading infrastructure with caching (#68) - CSV loader for edge-list datasets (#69) - 5 SNAP datasets available - MERGE ON CREATE SET syntax (#65) - MERGE ON MATCH SET syntax (#66) - WITH clause variable passing fix (#67) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * chore: update uv.lock after version bump --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Problem
Queries using WITH clause with aggregation failed with
KeyErrorwhen trying to access grouping variables in subsequent clauses. This prevented common aggregation patterns from working.Example Query That Failed
Error:
KeyError: 'p'- The variablepfrom the WITH clause was not accessible in the RETURN clause.Root Cause
The executor's
_execute_aggregatemethod had two issues when binding grouping variables for WITH clauses:Object equality comparison: Used
item.expression == exprwhich doesn't work for comparing AST nodes (two Variable objects with the same name are different objects).Variable name extraction: When no alias was provided, fell back to
f"col_{i}"instead of extracting the variable name from Variable expressions.Solution
1. Added Expression Comparison Helper
Added
_expressions_match()method to semantically compare AST expressions:2. Fixed Variable Binding Logic
Updated grouping variable binding to:
col_{i}for complex expressions without aliasesChanges
Core Fix
File:
src/graphforge/executor/executor.py_expressions_match()helper method (45 lines)_execute_aggregate()to use semantic matchingTesting
✅ Test Coverage
📊 Test Categories
Unit Tests:
Integration Tests:
Examples
Before (Failed) ❌
After (Works) ✅
Additional Patterns Now Working
Multiple Grouping Variables
Aggregation with Post-Filter
Chained Aggregations
Graph Analytics Pattern
Verification
Impact
Fixes a HIGH priority bug that blocked common graph analytics workflows:
This is critical for:
Related
🤖 Generated with Claude Code
Summary by CodeRabbit
Improvements
Tests