Skip to content

fix: WITH clause variable passing in aggregation queries#67

Merged
DecisionNerd merged 3 commits into
mainfrom
fix/with-clause-variable-passing
Feb 3, 2026
Merged

fix: WITH clause variable passing in aggregation queries#67
DecisionNerd merged 3 commits into
mainfrom
fix/with-clause-variable-passing

Conversation

@DecisionNerd
Copy link
Copy Markdown
Owner

@DecisionNerd DecisionNerd commented Feb 3, 2026

Problem

Queries using WITH clause with aggregation failed with KeyError when trying to access grouping variables in subsequent clauses. This prevented common aggregation patterns from working.

Example Query That Failed

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC

Error: KeyError: 'p' - The variable p from the WITH clause was not accessible in the RETURN clause.

Root Cause

The executor's _execute_aggregate method had two issues when binding grouping variables for WITH clauses:

  1. Object equality comparison: Used item.expression == expr which doesn't work for comparing AST nodes (two Variable objects with the same name are different objects).

  2. Variable name extraction: When no alias was provided, fell back to f"col_{i}" instead of extracting the variable name from Variable expressions.

Solution

1. Added Expression Comparison Helper

Added _expressions_match() method to semantically compare AST expressions:

  • ✅ Compares Variable nodes by name
  • ✅ Compares PropertyAccess nodes by variable and property
  • ✅ Compares Literal nodes by value
  • ✅ Compares FunctionCall nodes by name and arguments
  • ✅ Handles all expression types properly

2. Fixed Variable Binding Logic

Updated grouping variable binding to:

  • ✅ Use semantic expression matching instead of object equality
  • ✅ Extract variable name from Variable expressions when no alias provided
  • ✅ Only fall back to col_{i} for complex expressions without aliases

Changes

Core Fix

File: src/graphforge/executor/executor.py

  • Added _expressions_match() helper method (45 lines)
  • Updated _execute_aggregate() to use semantic matching
  • Fixed variable name extraction for grouping expressions

Testing

✅ Test Coverage

Category Tests Status
Unit Tests 11 ✅ All passing
Integration Tests 8 ✅ All passing
Total 19 ✅ All passing

📊 Test Categories

Unit Tests:

  • Single and multiple grouping variables
  • Aggregation with filtering (WHERE after WITH)
  • Various aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT)
  • Edge cases (no results, only aggregates, chained WITH)
  • Property access in aggregation

Integration Tests:

  • Top N by count pattern
  • Multi-level aggregation
  • Graph analytics patterns (citation networks)
  • Time series aggregation
  • Complex WITH patterns with DISTINCT, LIMIT
  • Nested property access

Examples

Before (Failed) ❌

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
-- KeyError: 'p'

After (Works) ✅

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
-- Returns: Alice: 2 movies, Bob: 1 movies

Additional Patterns Now Working

Multiple Grouping Variables

MATCH (p:Person)
WITH p.age as age, count(p) as person_count
RETURN age, person_count

Aggregation with Post-Filter

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
WHERE movie_count > 1
RETURN p.name, movie_count

Chained Aggregations

MATCH (e:Employee)
WITH e.dept as dept, avg(e.salary) as avg_salary
WITH dept, avg_salary
WHERE avg_salary > 100000
RETURN dept, avg_salary

Graph Analytics Pattern

MATCH (p:Paper)<-[:CITES]-(citing)
WITH p, count(citing) as citation_count
RETURN p.title, citation_count
ORDER BY citation_count DESC

Verification

  • ✅ All 19 new tests pass
  • ✅ All 1037 total tests pass (backward compatibility verified)
  • ✅ 95.42% coverage (meets threshold)
  • ✅ Issue reproduction test case now works
  • ✅ No regressions in existing functionality

Impact

Fixes a HIGH priority bug that blocked common graph analytics workflows:

  • ✅ Top N by count queries (find most connected nodes)
  • ✅ Aggregation with post-filter (filter aggregated results)
  • ✅ Multi-level aggregations (aggregate then filter, then aggregate again)
  • ✅ Graph analytics patterns (PageRank-like queries, citation counts)
  • ✅ Time series analysis (aggregate by time buckets)

This is critical for:

  • Dataset integration and real-world query patterns
  • Basic graph analytics workflows
  • Common aggregation use cases

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Improvements

    • Enhanced aggregation query handling to recognize semantically equivalent expressions in WITH clauses, enabling more flexible aggregation patterns.
  • Tests

    • Added comprehensive test coverage for aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT), WITH clause patterns, multi-level aggregations, and edge cases.

## Problem

Queries using WITH clause with aggregation failed with `KeyError` when trying
to access grouping variables in subsequent clauses. This prevented common
aggregation patterns from working.

### Example Query That Failed
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
```

Error: `KeyError: 'p'` - The variable `p` from the WITH clause was not
accessible in the RETURN clause.

## Root Cause

The executor's `_execute_aggregate` method had two issues when binding
grouping variables for WITH clauses:

1. **Object equality comparison**: Used `item.expression == expr` which
   doesn't work for comparing AST nodes (two Variable objects with the same
   name are different objects).

2. **Variable name extraction**: When no alias was provided, fell back to
   `f"col_{i}"` instead of extracting the variable name from Variable
   expressions.

## Solution

### 1. Added Expression Comparison Helper
Added `_expressions_match()` method to semantically compare AST expressions:
- Compares Variable nodes by name
- Compares PropertyAccess nodes by variable and property
- Compares Literal nodes by value
- Compares FunctionCall nodes by name and arguments
- Handles all expression types properly

### 2. Fixed Variable Binding Logic
Updated grouping variable binding to:
- Use semantic expression matching instead of object equality
- Extract variable name from Variable expressions when no alias provided
- Only fall back to `col_{i}` for complex expressions without aliases

## Changes

### Core Fix
- **File**: `src/graphforge/executor/executor.py`
- Added `_expressions_match()` helper method
- Updated `_execute_aggregate()` to use semantic matching
- Fixed variable name extraction for grouping expressions

## Testing

### Unit Tests (11 tests)
- Single and multiple grouping variables
- Aggregation with filtering (WHERE after WITH)
- Various aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT)
- Edge cases (no results, only aggregates, chained WITH)

### Integration Tests (8 tests)
- Top N by count pattern
- Multi-level aggregation
- Graph analytics patterns
- Time series aggregation
- Complex WITH patterns with DISTINCT, LIMIT
- Nested property access

**Total: 19 new tests, all passing**

## Examples

### Before (Failed)
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
-- KeyError: 'p'
```

### After (Works)
```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
-- Returns: Alice: 2 movies, Bob: 1 movies
```

### Additional Patterns Now Working
```cypher
-- Multiple grouping variables
MATCH (p:Person)
WITH p.age as age, count(p) as person_count
RETURN age, person_count

-- Aggregation with filtering
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
WHERE movie_count > 1
RETURN p.name, movie_count

-- Chained aggregations
MATCH (e:Employee)
WITH e.dept as dept, avg(e.salary) as avg_salary
WITH dept, avg_salary
WHERE avg_salary > 100000
RETURN dept, avg_salary
```

## Verification

- ✅ All 19 new tests pass
- ✅ All 1037 total tests pass (backward compatibility verified)
- ✅ 95.42% coverage (meets threshold)
- ✅ Issue reproduction test case now works
- ✅ No regressions in existing functionality

## Impact

Fixes a **HIGH priority bug** that blocked common graph analytics workflows:
- Top N by count queries
- Aggregation with post-filter
- Multi-level aggregations
- Graph analytics patterns

This is critical for dataset integration and real-world query patterns.

## Related

- Discovered during dataset integration testing (v0.2.1)
- Closes #60

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DecisionNerd DecisionNerd added bug Something isn't working enhancement New feature or request executor Changes to query executor labels Feb 3, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 3, 2026

Warning

Rate limit exceeded

@DecisionNerd has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

This PR introduces semantic expression matching to the query executor's WITH clause aggregation handling. A new _expressions_match helper method enables recognition of structurally different but semantically equivalent AST expressions, applied across multiple aggregation execution paths. Comprehensive unit and integration tests validate the implementation.

Changes

Cohort / File(s) Summary
Core Logic
src/graphforge/executor/executor.py
Added _expressions_match() helper method for semantic comparison of AST expressions (Variable, PropertyAccess, Literal, FunctionCall). Replaced direct equality checks with semantic matching in _execute_aggregate, _compute_aggregates_for_group, and WITH aggregation binding paths.
Unit Tests
tests/unit/executor/test_expressions_match.py, tests/unit/executor/test_expression_matching.py, tests/unit/executor/test_with_aggregation.py
Three test modules covering: (1) _expressions_match helper across all expression types and nested comparisons, (2) expression matching within aggregation and ordering contexts, (3) WITH clause aggregation behavior including grouping, filtering, multiple aggregate functions, and edge cases.
Integration Tests
tests/integration/test_with_aggregation_real.py
Real-world aggregation test suite covering Top-N patterns, post-aggregation filtering, chained aggregations, graph analytics, time-series aggregations, DISTINCT with aggregation, and nested property access scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested labels

tests, release:patch

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: WITH clause variable passing in aggregation queries' accurately and clearly summarizes the main bug fix without vagueness or off-topic references.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering problem statement, root cause analysis, solution details, changes made, testing evidence, verification results, and impact assessment.
Linked Issues check ✅ Passed The code changes in executor.py and comprehensive test suites directly address all requirements from issue #60: semantic expression matching, variable binding in WITH clauses, and support for common aggregation patterns.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the WITH clause aggregation issue: core fix in executor.py, unit tests for expression matching, and integration tests for aggregation patterns.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/with-clause-variable-passing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.89%. Comparing base (b2a9897) to head (b3ea839).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #67      +/-   ##
==========================================
+ Coverage   93.82%   93.89%   +0.06%     
==========================================
  Files          15       15              
  Lines        2009     2032      +23     
  Branches      501      511      +10     
==========================================
+ Hits         1885     1908      +23     
  Misses         44       44              
  Partials       80       80              
Flag Coverage Δ
full-coverage 93.89% <84.61%> (+0.06%) ⬆️
unittests 73.52% <80.76%> (+10.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
parser 95.31% <ø> (ø)
planner 94.20% <ø> (ø)
executor 89.87% <84.61%> (+0.26%) ⬆️
storage 99.50% <ø> (ø)
ast 100.00% <ø> (ø)
types 98.42% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2a9897...b3ea839. Read the comment docs.

- Add 16 unit tests in test_expressions_match.py directly testing the helper
- Tests cover all expression types: Variable, PropertyAccess, Literal, FunctionCall
- Tests cover edge cases: different names, properties, arg counts
- Tests ensure case-insensitive function names work correctly
- Improves executor.py coverage from 54% to 92.43% (targeting patch coverage)

Related to #60
- Add check-patch-coverage target to Makefile
- Checks coverage of changed files only (compared to origin/main)
- Requires 90% coverage for modified source files
- Mirrors GitHub's codecov/patch check locally
- Skips gracefully when no source files changed
- Integrated into pre-push workflow

This helps catch patch coverage issues before pushing to GitHub.
@DecisionNerd DecisionNerd merged commit a2f540a into main Feb 3, 2026
19 checks passed
@DecisionNerd DecisionNerd deleted the fix/with-clause-variable-passing branch February 3, 2026 21:24
DecisionNerd added a commit that referenced this pull request Feb 4, 2026
Version Bump:
- Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock
- Add comprehensive v0.2.1 changelog entry

Documentation Updates:
- Update README.md with dataset loading examples and quickstart
- Add "Load Real-World Datasets" section to main README
- Update docs/index.md with dataset features and examples
- Complete rewrite of docs/datasets/snap.md:
  - Mark as available in v0.2.1 (5 datasets)
  - Add detailed dataset table with stats
  - Add comprehensive usage examples and query patterns
  - Document download/caching behavior
  - Add performance tips for large datasets
- Update docs/datasets/overview.md:
  - Reorganize to show SNAP as "Available Now"
  - Mark other sources as "Coming Soon"
  - List all 5 available SNAP datasets
- Update docs/getting-started/quickstart.md:
  - Add "Load a Dataset" section with examples
  - Add dataset browsing examples
  - Update navigation links

Release Contents (v0.2.1):
- Dataset loading infrastructure with caching (#68)
- CSV loader for edge-list datasets (#69)
- 5 SNAP datasets available
- MERGE ON CREATE SET syntax (#65)
- MERGE ON MATCH SET syntax (#66)
- WITH clause variable passing fix (#67)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request Feb 4, 2026
* docs: prepare v0.2.1 release with dataset documentation

Version Bump:
- Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock
- Add comprehensive v0.2.1 changelog entry

Documentation Updates:
- Update README.md with dataset loading examples and quickstart
- Add "Load Real-World Datasets" section to main README
- Update docs/index.md with dataset features and examples
- Complete rewrite of docs/datasets/snap.md:
  - Mark as available in v0.2.1 (5 datasets)
  - Add detailed dataset table with stats
  - Add comprehensive usage examples and query patterns
  - Document download/caching behavior
  - Add performance tips for large datasets
- Update docs/datasets/overview.md:
  - Reorganize to show SNAP as "Available Now"
  - Mark other sources as "Coming Soon"
  - List all 5 available SNAP datasets
- Update docs/getting-started/quickstart.md:
  - Add "Load a Dataset" section with examples
  - Add dataset browsing examples
  - Update navigation links

Release Contents (v0.2.1):
- Dataset loading infrastructure with caching (#68)
- CSV loader for edge-list datasets (#69)
- 5 SNAP datasets available
- MERGE ON CREATE SET syntax (#65)
- MERGE ON MATCH SET syntax (#66)
- WITH clause variable passing fix (#67)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update uv.lock after version bump

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request executor Changes to query executor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix WITH clause variable passing in aggregation queries

1 participant