fix: WITH clause variable passing in aggregation queries by DecisionNerd · Pull Request #67 · DecisionNerd/graphforge

DecisionNerd · 2026-02-03T21:02:48Z

Problem

Queries using WITH clause with aggregation failed with KeyError when trying to access grouping variables in subsequent clauses. This prevented common aggregation patterns from working.

Example Query That Failed

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC

Error: KeyError: 'p' - The variable p from the WITH clause was not accessible in the RETURN clause.

Root Cause

The executor's _execute_aggregate method had two issues when binding grouping variables for WITH clauses:

Object equality comparison: Used item.expression == expr which doesn't work for comparing AST nodes (two Variable objects with the same name are different objects).
Variable name extraction: When no alias was provided, fell back to f"col_{i}" instead of extracting the variable name from Variable expressions.

Solution

1. Added Expression Comparison Helper

Added _expressions_match() method to semantically compare AST expressions:

✅ Compares Variable nodes by name
✅ Compares PropertyAccess nodes by variable and property
✅ Compares Literal nodes by value
✅ Compares FunctionCall nodes by name and arguments
✅ Handles all expression types properly

2. Fixed Variable Binding Logic

Updated grouping variable binding to:

✅ Use semantic expression matching instead of object equality
✅ Extract variable name from Variable expressions when no alias provided
✅ Only fall back to col_{i} for complex expressions without aliases

Changes

Core Fix

File: src/graphforge/executor/executor.py

Added _expressions_match() helper method (45 lines)
Updated _execute_aggregate() to use semantic matching
Fixed variable name extraction for grouping expressions

Testing

✅ Test Coverage

Category	Tests	Status
Unit Tests	11	✅ All passing
Integration Tests	8	✅ All passing
Total	19	✅ All passing

📊 Test Categories

Unit Tests:

Single and multiple grouping variables
Aggregation with filtering (WHERE after WITH)
Various aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT)
Edge cases (no results, only aggregates, chained WITH)
Property access in aggregation

Integration Tests:

Top N by count pattern
Multi-level aggregation
Graph analytics patterns (citation networks)
Time series aggregation
Complex WITH patterns with DISTINCT, LIMIT
Nested property access

Examples

Before (Failed) ❌

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
-- KeyError: 'p'

After (Works) ✅

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
RETURN p.name as actor, movie_count
ORDER BY movie_count DESC
-- Returns: Alice: 2 movies, Bob: 1 movies

Additional Patterns Now Working

Multiple Grouping Variables

MATCH (p:Person)
WITH p.age as age, count(p) as person_count
RETURN age, person_count

Aggregation with Post-Filter

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as movie_count
WHERE movie_count > 1
RETURN p.name, movie_count

Chained Aggregations

MATCH (e:Employee)
WITH e.dept as dept, avg(e.salary) as avg_salary
WITH dept, avg_salary
WHERE avg_salary > 100000
RETURN dept, avg_salary

Graph Analytics Pattern

MATCH (p:Paper)<-[:CITES]-(citing)
WITH p, count(citing) as citation_count
RETURN p.title, citation_count
ORDER BY citation_count DESC

Verification

✅ All 19 new tests pass
✅ All 1037 total tests pass (backward compatibility verified)
✅ 95.42% coverage (meets threshold)
✅ Issue reproduction test case now works
✅ No regressions in existing functionality

Impact

Fixes a HIGH priority bug that blocked common graph analytics workflows:

✅ Top N by count queries (find most connected nodes)
✅ Aggregation with post-filter (filter aggregated results)
✅ Multi-level aggregations (aggregate then filter, then aggregate again)
✅ Graph analytics patterns (PageRank-like queries, citation counts)
✅ Time series analysis (aggregate by time buckets)

This is critical for:

Dataset integration and real-world query patterns
Basic graph analytics workflows
Common aggregation use cases

Summary by CodeRabbit

Improvements
- Enhanced aggregation query handling to recognize semantically equivalent expressions in WITH clauses, enabling more flexible aggregation patterns.
Tests
- Added comprehensive test coverage for aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT), WITH clause patterns, multi-level aggregations, and edge cases.

## Problem Queries using WITH clause with aggregation failed with `KeyError` when trying to access grouping variables in subsequent clauses. This prevented common aggregation patterns from working. ### Example Query That Failed ```cypher MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, count(m) as movie_count RETURN p.name as actor, movie_count ORDER BY movie_count DESC ``` Error: `KeyError: 'p'` - The variable `p` from the WITH clause was not accessible in the RETURN clause. ## Root Cause The executor's `_execute_aggregate` method had two issues when binding grouping variables for WITH clauses: 1. **Object equality comparison**: Used `item.expression == expr` which doesn't work for comparing AST nodes (two Variable objects with the same name are different objects). 2. **Variable name extraction**: When no alias was provided, fell back to `f"col_{i}"` instead of extracting the variable name from Variable expressions. ## Solution ### 1. Added Expression Comparison Helper Added `_expressions_match()` method to semantically compare AST expressions: - Compares Variable nodes by name - Compares PropertyAccess nodes by variable and property - Compares Literal nodes by value - Compares FunctionCall nodes by name and arguments - Handles all expression types properly ### 2. Fixed Variable Binding Logic Updated grouping variable binding to: - Use semantic expression matching instead of object equality - Extract variable name from Variable expressions when no alias provided - Only fall back to `col_{i}` for complex expressions without aliases ## Changes ### Core Fix - **File**: `src/graphforge/executor/executor.py` - Added `_expressions_match()` helper method - Updated `_execute_aggregate()` to use semantic matching - Fixed variable name extraction for grouping expressions ## Testing ### Unit Tests (11 tests) - Single and multiple grouping variables - Aggregation with filtering (WHERE after WITH) - Various aggregation functions (COUNT, SUM, AVG, MIN, MAX, COLLECT) - Edge cases (no results, only aggregates, chained WITH) ### Integration Tests (8 tests) - Top N by count pattern - Multi-level aggregation - Graph analytics patterns - Time series aggregation - Complex WITH patterns with DISTINCT, LIMIT - Nested property access **Total: 19 new tests, all passing** ## Examples ### Before (Failed) ```cypher MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, count(m) as movie_count RETURN p.name as actor, movie_count -- KeyError: 'p' ``` ### After (Works) ```cypher MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, count(m) as movie_count RETURN p.name as actor, movie_count ORDER BY movie_count DESC -- Returns: Alice: 2 movies, Bob: 1 movies ``` ### Additional Patterns Now Working ```cypher -- Multiple grouping variables MATCH (p:Person) WITH p.age as age, count(p) as person_count RETURN age, person_count -- Aggregation with filtering MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, count(m) as movie_count WHERE movie_count > 1 RETURN p.name, movie_count -- Chained aggregations MATCH (e:Employee) WITH e.dept as dept, avg(e.salary) as avg_salary WITH dept, avg_salary WHERE avg_salary > 100000 RETURN dept, avg_salary ``` ## Verification - ✅ All 19 new tests pass - ✅ All 1037 total tests pass (backward compatibility verified) - ✅ 95.42% coverage (meets threshold) - ✅ Issue reproduction test case now works - ✅ No regressions in existing functionality ## Impact Fixes a **HIGH priority bug** that blocked common graph analytics workflows: - Top N by count queries - Aggregation with post-filter - Multi-level aggregations - Graph analytics patterns This is critical for dataset integration and real-world query patterns. ## Related - Discovered during dataset integration testing (v0.2.1) - Closes #60 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-02-03T21:02:56Z

Warning

Rate limit exceeded

@DecisionNerd has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

This PR introduces semantic expression matching to the query executor's WITH clause aggregation handling. A new _expressions_match helper method enables recognition of structurally different but semantically equivalent AST expressions, applied across multiple aggregation execution paths. Comprehensive unit and integration tests validate the implementation.

Changes

Cohort / File(s)	Summary
Core Logic `src/graphforge/executor/executor.py`	Added `_expressions_match()` helper method for semantic comparison of AST expressions (Variable, PropertyAccess, Literal, FunctionCall). Replaced direct equality checks with semantic matching in `_execute_aggregate`, `_compute_aggregates_for_group`, and WITH aggregation binding paths.
Unit Tests `tests/unit/executor/test_expressions_match.py`, `tests/unit/executor/test_expression_matching.py`, `tests/unit/executor/test_with_aggregation.py`	Three test modules covering: (1) `_expressions_match` helper across all expression types and nested comparisons, (2) expression matching within aggregation and ordering contexts, (3) WITH clause aggregation behavior including grouping, filtering, multiple aggregate functions, and edge cases.
Integration Tests `tests/integration/test_with_aggregation_real.py`	Real-world aggregation test suite covering Top-N patterns, post-aggregation filtering, chained aggregations, graph analytics, time-series aggregations, DISTINCT with aggregation, and nested property access scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

test: improve test coverage to 93.76% (+4.94%) #37: Adds extensive aggregation and WITH clause tests that validate the same executor behavior being fixed by this PR's semantic expression matching.

Suggested labels

tests, release:patch

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: WITH clause variable passing in aggregation queries' accurately and clearly summarizes the main bug fix without vagueness or off-topic references.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering problem statement, root cause analysis, solution details, changes made, testing evidence, verification results, and impact assessment.
Linked Issues check	✅ Passed	The code changes in executor.py and comprehensive test suites directly address all requirements from issue `#60`: semantic expression matching, variable binding in WITH clauses, and support for common aggregation patterns.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the WITH clause aggregation issue: core fix in executor.py, unit tests for expression matching, and integration tests for aggregation patterns.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/with-clause-variable-passing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-02-03T21:05:06Z

Codecov Report

❌ Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.89%. Comparing base (b2a9897) to head (b3ea839).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #67      +/-   ##
==========================================
+ Coverage   93.82%   93.89%   +0.06%     
==========================================
  Files          15       15              
  Lines        2009     2032      +23     
  Branches      501      511      +10     
==========================================
+ Hits         1885     1908      +23     
  Misses         44       44              
  Partials       80       80

Flag	Coverage Δ
full-coverage	`93.89% <84.61%> (+0.06%)`	⬆️
unittests	`73.52% <80.76%> (+10.40%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
parser	`95.31% <ø> (ø)`
planner	`94.20% <ø> (ø)`
executor	`89.87% <84.61%> (+0.26%)`	⬆️
storage	`99.50% <ø> (ø)`
ast	`100.00% <ø> (ø)`
types	`98.42% <ø> (ø)`

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2a9897...b3ea839. Read the comment docs.

- Add 16 unit tests in test_expressions_match.py directly testing the helper - Tests cover all expression types: Variable, PropertyAccess, Literal, FunctionCall - Tests cover edge cases: different names, properties, arg counts - Tests ensure case-insensitive function names work correctly - Improves executor.py coverage from 54% to 92.43% (targeting patch coverage) Related to #60

- Add check-patch-coverage target to Makefile - Checks coverage of changed files only (compared to origin/main) - Requires 90% coverage for modified source files - Mirrors GitHub's codecov/patch check locally - Skips gracefully when no source files changed - Integrated into pre-push workflow This helps catch patch coverage issues before pushing to GitHub.

Version Bump: - Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock - Add comprehensive v0.2.1 changelog entry Documentation Updates: - Update README.md with dataset loading examples and quickstart - Add "Load Real-World Datasets" section to main README - Update docs/index.md with dataset features and examples - Complete rewrite of docs/datasets/snap.md: - Mark as available in v0.2.1 (5 datasets) - Add detailed dataset table with stats - Add comprehensive usage examples and query patterns - Document download/caching behavior - Add performance tips for large datasets - Update docs/datasets/overview.md: - Reorganize to show SNAP as "Available Now" - Mark other sources as "Coming Soon" - List all 5 available SNAP datasets - Update docs/getting-started/quickstart.md: - Add "Load a Dataset" section with examples - Add dataset browsing examples - Update navigation links Release Contents (v0.2.1): - Dataset loading infrastructure with caching (#68) - CSV loader for edge-list datasets (#69) - 5 SNAP datasets available - MERGE ON CREATE SET syntax (#65) - MERGE ON MATCH SET syntax (#66) - WITH clause variable passing fix (#67) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: prepare v0.2.1 release with dataset documentation Version Bump: - Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock - Add comprehensive v0.2.1 changelog entry Documentation Updates: - Update README.md with dataset loading examples and quickstart - Add "Load Real-World Datasets" section to main README - Update docs/index.md with dataset features and examples - Complete rewrite of docs/datasets/snap.md: - Mark as available in v0.2.1 (5 datasets) - Add detailed dataset table with stats - Add comprehensive usage examples and query patterns - Document download/caching behavior - Add performance tips for large datasets - Update docs/datasets/overview.md: - Reorganize to show SNAP as "Available Now" - Mark other sources as "Coming Soon" - List all 5 available SNAP datasets - Update docs/getting-started/quickstart.md: - Add "Load a Dataset" section with examples - Add dataset browsing examples - Update navigation links Release Contents (v0.2.1): - Dataset loading infrastructure with caching (#68) - CSV loader for edge-list datasets (#69) - 5 SNAP datasets available - MERGE ON CREATE SET syntax (#65) - MERGE ON MATCH SET syntax (#66) - WITH clause variable passing fix (#67) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * chore: update uv.lock after version bump --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

DecisionNerd added bug Something isn't working enhancement New feature or request executor Changes to query executor labels Feb 3, 2026

DecisionNerd added 2 commits February 3, 2026 14:14

DecisionNerd merged commit a2f540a into main Feb 3, 2026
19 checks passed

DecisionNerd deleted the fix/with-clause-variable-passing branch February 3, 2026 21:24

DecisionNerd mentioned this pull request Feb 4, 2026

docs: prepare v0.2.1 release with dataset documentation #71

Merged

coderabbitai Bot mentioned this pull request Feb 8, 2026

feat: implement v0.3.0 features - 29% TCK coverage achieved #104

Merged

coderabbitai Bot mentioned this pull request Feb 16, 2026

feat: implement aggregate pushdown in query optimizer #185

Merged

coderabbitai Bot mentioned this pull request Mar 30, 2026

fix: aggregation in composite expressions and ORDER BY (#328) #329

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: WITH clause variable passing in aggregation queries#67

fix: WITH clause variable passing in aggregation queries#67
DecisionNerd merged 3 commits into
mainfrom
fix/with-clause-variable-passing

DecisionNerd commented Feb 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Feb 3, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

codecov Bot commented Feb 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented Feb 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Example Query That Failed

Root Cause

Solution

1. Added Expression Comparison Helper

2. Fixed Variable Binding Logic

Changes

Core Fix

Testing

✅ Test Coverage

📊 Test Categories

Examples

Before (Failed) ❌

After (Works) ✅

Additional Patterns Now Working

Multiple Grouping Variables

Aggregation with Post-Filter

Chained Aggregations

Graph Analytics Pattern

Verification

Impact

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

codecov Bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented Feb 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Feb 3, 2026 •

edited

Loading

codecov Bot commented Feb 3, 2026 •

edited

Loading