Skip to content

Add comprehensive WITH clause integration tests#5

Merged
DecisionNerd merged 1 commit into
mainfrom
feature/with-clause-integration-tests
Feb 1, 2026
Merged

Add comprehensive WITH clause integration tests#5
DecisionNerd merged 1 commit into
mainfrom
feature/with-clause-integration-tests

Conversation

@DecisionNerd
Copy link
Copy Markdown
Owner

@DecisionNerd DecisionNerd commented Feb 1, 2026

Summary

Adds 17 comprehensive integration tests for the WITH clause, establishing a test baseline and exposing implementation gaps.

Motivation

The WITH clause was implemented in v0.1.1 but has zero integration tests. This PR adds comprehensive test coverage to:

  1. Document expected behavior
  2. Identify bugs and missing features
  3. Establish a baseline for future WITH clause work

Test Coverage

✅ Test Categories (17 tests total)

  • Basic projection (3 tests)

    • Simple variable projection
    • Variable renaming with AS
    • Variable scoping
  • Filtering (2 tests)

    • WHERE after WITH
    • Filtering on computed values
  • Aggregation (2 tests)

    • COUNT with GROUP BY
    • Filtering after aggregation
  • Ordering (2 tests)

    • ORDER BY on intermediate results
    • ORDER BY + LIMIT
  • Pagination (3 tests)

    • LIMIT only
    • SKIP only
    • SKIP + LIMIT pagination
  • Chaining (2 tests)

    • Multiple WITH clauses
    • WITH between MATCH clauses
  • Edge cases (3 tests)

    • Empty result sets
    • Null value preservation
    • DISTINCT functionality

Known Issues Revealed

These tests intentionally expose bugs in the current implementation:

🐛 15/17 tests currently fail:

  1. Column alias bug - Results use col_0, col_1 instead of actual aliases

    • Expected: {"name": "Alice"}
    • Actual: {"col_0": "Alice"}
    • Same regression that was just fixed for RETURN clause
  2. DISTINCT not supported - Parser doesn't recognize DISTINCT in WITH

    WITH DISTINCT p.city AS city  -- Parser error
  3. Aggregation functions fail - count(*), sum(), etc. cause errors

    TypeError: Cannot evaluate expression type: FunctionCall
    
  4. Parser errors with complex patterns - Multiple MATCH+WITH+CREATE fails

    MATCH (p:Person)
    WITH p
    CREATE (p)-[:KNOWS]->(q)  -- Parser error

Value

This PR provides:

  • ✅ Comprehensive test documentation for WITH clause behavior
  • ✅ Clear baseline showing what works vs. what doesn't
  • ✅ Actionable bug list for future work
  • ✅ Safety net for refactoring and fixes

Test Results

17 tests collected
2 passed (12%)
15 failed (88%)

Expected behavior - These failures document real bugs that need fixing.

Next Steps

After this PR merges, follow-up work will:

  1. Fix column alias preservation in WITH clause
  2. Add DISTINCT support to WITH
  3. Fix aggregation function evaluation in WITH
  4. Improve parser to handle complex multi-part queries

Checklist

  • Tests added for new functionality
  • CHANGELOG updated
  • Tests run locally (intentionally failing to expose bugs)
  • Commit message follows conventions
  • Small, focused PR (<300 lines)

Note: This PR is intended to document and expose bugs, not fix them. The test failures are expected and valuable.

Summary by CodeRabbit

  • Tests
    • Added comprehensive integration test suite for WITH clause functionality, covering projection, filtering, aggregation, ordering, pagination, chaining, and edge cases.

✏️ Tip: You can customize this high-level summary in your review settings.

Add 17 integration tests covering all WITH clause features:

**Test Coverage:**
- Basic projection and variable renaming (3 tests)
- WHERE filtering on intermediate results (2 tests)
- Aggregation with GROUP BY semantics (2 tests)
- ORDER BY with sorting (2 tests)
- SKIP/LIMIT pagination (3 tests)
- Multi-part query chaining (2 tests)
- Edge cases (empty results, nulls, DISTINCT) (3 tests)

**Known Issues Revealed:**
These tests expose several bugs in the current implementation:
1. Column aliases not preserved (returns 'col_0' instead of 'name')
2. DISTINCT keyword not supported in WITH clause
3. Aggregation functions (count, sum, etc.) fail in WITH
4. Parser errors with multiple MATCH+WITH+CREATE patterns

15/17 tests currently fail, documenting gaps that need to be fixed.
This establishes a comprehensive test baseline for WITH clause work.

Closes #N/A (initial test baseline)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 1, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'python', 'ignore', 'tools', 'file_types', 'comments', 'thresholds', 'integrations', 'notifications'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

Adds a comprehensive integration test suite for the WITH clause in GraphForge, covering basic behavior, filtering, aggregation, ordering, pagination, chaining, and edge cases across multiple test classes and scenarios.

Changes

Cohort / File(s) Summary
WITH Clause Integration Tests
tests/integration/test_with_clause.py
New test module with multiple test classes validating WITH clause functionality including projection, renaming, filtering, aggregation, ordering, pagination, chaining across multiple WITH clauses, and edge cases like empty results and null value preservation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding comprehensive integration tests for the WITH clause.
Description check ✅ Passed The description is comprehensive and well-structured, covering motivation, test categories, known issues, and next steps, though it doesn't follow the exact template format.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/with-clause-integration-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@DecisionNerd DecisionNerd added the release:patch Bug fix, requires patch version bump label Feb 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/integration/test_with_clause.py`:
- Around line 18-40: The test creates a GraphForge and calls db.close() manually
which can leak resources if assertions fail; refactor to use a pytest fixture
(e.g., db) that constructs GraphForge(tmp_path / f"test_{uuid.uuid4().hex}.db"),
yields the instance, and calls database.close() after yield, then update
test_with_simple_projection to accept that db fixture and remove the explicit
GraphForge(...) and db.close() calls so cleanup always runs; add the fixture in
conftest.py or the module to centralize tmp_path usage and avoid hardcoded DB
names.
🧹 Nitpick comments (2)
tests/integration/test_with_clause.py (2)

245-246: Avoid f-string interpolation in execute() calls.

While safe here with loop counters, using f-strings in db.execute() demonstrates a pattern that could lead to injection vulnerabilities if used with external input. Consider using parameterized queries or batch operations instead.

♻️ Alternative approach
-for i in range(5):
-    db.execute(f"CREATE (p:Person {{id: {i}}})")
+for i in range(5):
+    db.create_node(labels=["Person"], id=i)

Or if the query syntax must be tested:

# At minimum, add a comment explaining why f-strings are safe here
for i in range(5):
    # Safe: i is an integer from range(), not external input
    db.execute(f"CREATE (p:Person {{id: {i}}})")

1-8: Consider adding xfail markers for expected failures.

The PR objectives document that 15 of 17 tests are expected to fail due to known bugs (alias preservation, DISTINCT support, aggregation evaluation, parser issues). Adding @pytest.mark.xfail decorators would:

  1. Clearly document which failures are expected vs. new regressions
  2. Allow the test suite to pass while still running these tests
  3. Automatically detect when bugs are fixed (tests will fail with "XPASS")
📝 Example of documenting expected failures
class TestWithBasics:
    """Test basic WITH clause functionality."""

    `@pytest.mark.xfail`(reason="Known bug: aliases returned as col_0 instead of provided aliases")
    def test_with_simple_projection(self, tmp_path):
        """WITH should pass through selected variables."""
        # ...
    
    `@pytest.mark.xfail`(reason="Known bug: alias preservation")
    def test_with_renaming_variables(self, tmp_path):
        """WITH should allow renaming variables."""
        # ...

class TestWithAggregation:
    """Test WITH clause with aggregation."""

    `@pytest.mark.xfail`(reason="Known bug: Cannot evaluate FunctionCall expressions")
    def test_with_count_aggregation(self, tmp_path):
        """WITH should support count aggregation."""
        # ...

This provides clear documentation without changing test assertions.

Comment on lines +18 to +40
def test_with_simple_projection(self, tmp_path):
"""WITH should pass through selected variables."""
db = GraphForge(tmp_path / "test.db")

# Create test data
db.execute("CREATE (p:Person {name: 'Alice', age: 30})")
db.execute("CREATE (p:Person {name: 'Bob', age: 25})")

# Query with WITH
results = db.execute("""
MATCH (p:Person)
WITH p.name AS name, p.age AS age
RETURN name, age
ORDER BY name
""")

assert len(results) == 2
assert results[0]["name"].value == "Alice"
assert results[0]["age"].value == 30
assert results[1]["name"].value == "Bob"
assert results[1]["age"].value == 25

db.close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use fixtures or context managers for resource cleanup.

The pattern of calling db.close() at the end of each test has a critical flaw: if any assertion fails, close() won't be executed, potentially leaving resources open or causing state pollution between tests.

♻️ Recommended fix using pytest fixture

Create a fixture in conftest.py or at the module level:

`@pytest.fixture`
def db(tmp_path):
    """Provide a GraphForge instance with automatic cleanup."""
    database = GraphForge(tmp_path / f"test_{uuid.uuid4().hex}.db")
    yield database
    database.close()

Then update tests to use the fixture:

-def test_with_simple_projection(self, tmp_path):
+def test_with_simple_projection(self, db):
     """WITH should pass through selected variables."""
-    db = GraphForge(tmp_path / "test.db")
-
     # Create test data
     db.execute("CREATE (p:Person {name: 'Alice', age: 30})")
     ...
-    
-    db.close()

This ensures cleanup happens even when assertions fail and avoids conflicts from hardcoded database names.

🧰 Tools
🪛 GitHub Actions: Test Suite

[error] 35-35: KeyError: 'name' in test_with_simple_projection results

🤖 Prompt for AI Agents
In `@tests/integration/test_with_clause.py` around lines 18 - 40, The test creates
a GraphForge and calls db.close() manually which can leak resources if
assertions fail; refactor to use a pytest fixture (e.g., db) that constructs
GraphForge(tmp_path / f"test_{uuid.uuid4().hex}.db"), yields the instance, and
calls database.close() after yield, then update test_with_simple_projection to
accept that db fixture and remove the explicit GraphForge(...) and db.close()
calls so cleanup always runs; add the fixture in conftest.py or the module to
centralize tmp_path usage and avoid hardcoded DB names.

Comment on lines +107 to +130
def test_with_where_on_computed_value(self, tmp_path):
"""WITH should allow filtering on computed values."""
db = GraphForge(tmp_path / "test.db")

db.execute("CREATE (p:Person {name: 'Alice', score: 95})")
db.execute("CREATE (p:Person {name: 'Bob', score: 75})")
db.execute("CREATE (p:Person {name: 'Charlie', score: 85})")

# Compute value, then filter
results = db.execute("""
MATCH (p:Person)
WITH p.name AS name, p.score AS score
WHERE score >= 85
RETURN name, score
ORDER BY score DESC
""")

assert len(results) == 2
assert results[0]["name"].value == "Alice"
assert results[0]["score"].value == 95
assert results[1]["name"].value == "Charlie"
assert results[1]["score"].value == 85

db.close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test name doesn't match implementation.

The test is named test_with_where_on_computed_value and the docstring claims it tests "filtering on computed values," but the query doesn't compute anything—it just passes through p.score AS score. A computed value would involve an expression like p.score * 2 or p.score + p.bonus.

🔧 Proposed fix options

Option 1: Rename the test to match current behavior

-def test_with_where_on_computed_value(self, tmp_path):
-    """WITH should allow filtering on computed values."""
+def test_with_where_on_projected_value(self, tmp_path):
+    """WITH should allow filtering on projected property values."""

Option 2: Update the query to actually compute a value

     results = db.execute("""
         MATCH (p:Person)
-        WITH p.name AS name, p.score AS score
-        WHERE score >= 85
-        RETURN name, score
-        ORDER BY score DESC
+        WITH p.name AS name, p.score * 2 AS doubled_score
+        WHERE doubled_score >= 170
+        RETURN name, doubled_score
+        ORDER BY doubled_score DESC
     """)
     
     assert len(results) == 2
     assert results[0]["name"].value == "Alice"
-    assert results[0]["score"].value == 95
+    assert results[0]["doubled_score"].value == 190
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_with_where_on_computed_value(self, tmp_path):
"""WITH should allow filtering on computed values."""
db = GraphForge(tmp_path / "test.db")
db.execute("CREATE (p:Person {name: 'Alice', score: 95})")
db.execute("CREATE (p:Person {name: 'Bob', score: 75})")
db.execute("CREATE (p:Person {name: 'Charlie', score: 85})")
# Compute value, then filter
results = db.execute("""
MATCH (p:Person)
WITH p.name AS name, p.score AS score
WHERE score >= 85
RETURN name, score
ORDER BY score DESC
""")
assert len(results) == 2
assert results[0]["name"].value == "Alice"
assert results[0]["score"].value == 95
assert results[1]["name"].value == "Charlie"
assert results[1]["score"].value == 85
db.close()
def test_with_where_on_projected_value(self, tmp_path):
"""WITH should allow filtering on projected property values."""
db = GraphForge(tmp_path / "test.db")
db.execute("CREATE (p:Person {name: 'Alice', score: 95})")
db.execute("CREATE (p:Person {name: 'Bob', score: 75})")
db.execute("CREATE (p:Person {name: 'Charlie', score: 85})")
# Compute value, then filter
results = db.execute("""
MATCH (p:Person)
WITH p.name AS name, p.score AS score
WHERE score >= 85
RETURN name, score
ORDER BY score DESC
""")
assert len(results) == 2
assert results[0]["name"].value == "Alice"
assert results[0]["score"].value == 95
assert results[1]["name"].value == "Charlie"
assert results[1]["score"].value == 85
db.close()
🧰 Tools
🪛 GitHub Actions: Test Suite

[error] 125-125: KeyError: 'name' in test_with_where_on_computed_value results

@DecisionNerd DecisionNerd merged commit 1fd1978 into main Feb 1, 2026
9 of 23 checks passed
@github-actions github-actions Bot added the release:pending Merged but not yet released label Feb 1, 2026
@DecisionNerd DecisionNerd deleted the feature/with-clause-integration-tests branch February 1, 2026 05:55
DecisionNerd added a commit that referenced this pull request Feb 8, 2026
…TCH (#103)

**Phase 1 (Tasks 1-3) completed: Foundation for v0.3.0 TCK features**

## Changes

### 1. Tree-Based Operator Structure (Task #1)
- Add Union operator for UNION/UNION ALL query combination
- Add Subquery operator for EXISTS/COUNT subquery expressions
- Add OptionalExpandEdges operator for left outer join semantics
- Update executor to handle nested operator dispatch
- Maintain backward compatibility with flat operator lists

### 2. Left Outer Join Primitive (Task #2)
- Implement OptionalExpandEdges operator
- Add _execute_optional_expand() method with NULL preservation
- Handle left join semantics: preserve rows with NULL when no matches

### 3. OPTIONAL MATCH Grammar & AST (Task #3)
- Add OptionalMatchClause to ast/clause.py
- Add OPTIONAL MATCH grammar rule to cypher.lark
- Add parser transformer for optional_match_clause
- Update planner to convert OptionalMatchClause → OptionalExpandEdges
- Add _plan_optional_match() method to QueryPlanner

## Testing
- Add 6 integration tests for OPTIONAL MATCH
- 4/6 tests passing (2 require IS NULL support - Task #5)
- All existing 1736 tests still pass
- Coverage: 91.96% (above 85% threshold)

## Architecture Impact
- Executor now supports nested operator trees (enables UNION, subqueries)
- Planner can emit OptionalExpandEdges alongside ExpandEdges
- Parser recognizes OPTIONAL MATCH in query grammar
- Foundation ready for Phase 2 features (list comprehensions, EXISTS)

## Next Steps (Task #5)
- Add IS NULL / IS NOT NULL expression support
- Handle NULL property access gracefully
- Complete NULL propagation in WHERE/aggregation
- Target: ~150 additional TCK scenarios from OPTIONAL MATCH

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request Feb 8, 2026
**Task #5 completed: Full OPTIONAL MATCH end-to-end implementation**

## Changes

### 1. IS NULL / IS NOT NULL Support
- Add `IS NULL` and `IS NOT NULL` grammar rules to cypher.lark
- Represent as UnaryOp in AST (distinct from `= NULL`)
- Implement IS NULL evaluation in evaluator (returns true/false, not NULL)
- Handle proper NULL comparison semantics:
  - `x IS NULL` → true if x is NULL (not NULL itself)
  - `x = NULL` → NULL (standard ternary logic)

### 2. NULL Property Access
- Handle property access on NULL values gracefully
- Return NULL when accessing property on NULL (e.g., `f.name` when `f` is NULL)
- Prevents TypeError in OPTIONAL MATCH scenarios

### 3. Test Improvements
- Fix test expectations for NULL handling
- Add comprehensive IS NULL / IS NOT NULL tests
- All 6 OPTIONAL MATCH integration tests pass
- All 731 unit + integration tests pass

## NULL Semantics

**IS NULL (unary operator)**:
- `NULL IS NULL` → true
- `5 IS NULL` → false
- Returns CypherBool, never NULL

**= NULL (binary operator)**:
- `NULL = NULL` → NULL (ternary logic)
- `5 = NULL` → NULL
- Returns NULL for any NULL operand

**Property Access**:
- `node.property` when node is NULL → NULL
- `node.missing_property` → NULL

## Testing
- 731 tests pass (725 unit, 6 new OPTIONAL MATCH)
- 13 skipped (existing grammar limitations)
- Coverage: 91.96%

## Impact
- OPTIONAL MATCH fully functional with NULL handling
- IS NULL / IS NOT NULL work in WHERE clauses
- NULL values propagate correctly through expressions
- Ready for Phase 2 features (UNION, list comprehensions)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request Feb 9, 2026
* feat: implement v0.3.0 foundation with tree operators and OPTIONAL MATCH (#103)

**Phase 1 (Tasks 1-3) completed: Foundation for v0.3.0 TCK features**

## Changes

### 1. Tree-Based Operator Structure (Task #1)
- Add Union operator for UNION/UNION ALL query combination
- Add Subquery operator for EXISTS/COUNT subquery expressions
- Add OptionalExpandEdges operator for left outer join semantics
- Update executor to handle nested operator dispatch
- Maintain backward compatibility with flat operator lists

### 2. Left Outer Join Primitive (Task #2)
- Implement OptionalExpandEdges operator
- Add _execute_optional_expand() method with NULL preservation
- Handle left join semantics: preserve rows with NULL when no matches

### 3. OPTIONAL MATCH Grammar & AST (Task #3)
- Add OptionalMatchClause to ast/clause.py
- Add OPTIONAL MATCH grammar rule to cypher.lark
- Add parser transformer for optional_match_clause
- Update planner to convert OptionalMatchClause → OptionalExpandEdges
- Add _plan_optional_match() method to QueryPlanner

## Testing
- Add 6 integration tests for OPTIONAL MATCH
- 4/6 tests passing (2 require IS NULL support - Task #5)
- All existing 1736 tests still pass
- Coverage: 91.96% (above 85% threshold)

## Architecture Impact
- Executor now supports nested operator trees (enables UNION, subqueries)
- Planner can emit OptionalExpandEdges alongside ExpandEdges
- Parser recognizes OPTIONAL MATCH in query grammar
- Foundation ready for Phase 2 features (list comprehensions, EXISTS)

## Next Steps (Task #5)
- Add IS NULL / IS NOT NULL expression support
- Handle NULL property access gracefully
- Complete NULL propagation in WHERE/aggregation
- Target: ~150 additional TCK scenarios from OPTIONAL MATCH

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: complete OPTIONAL MATCH with IS NULL support (#103)

**Task #5 completed: Full OPTIONAL MATCH end-to-end implementation**

## Changes

### 1. IS NULL / IS NOT NULL Support
- Add `IS NULL` and `IS NOT NULL` grammar rules to cypher.lark
- Represent as UnaryOp in AST (distinct from `= NULL`)
- Implement IS NULL evaluation in evaluator (returns true/false, not NULL)
- Handle proper NULL comparison semantics:
  - `x IS NULL` → true if x is NULL (not NULL itself)
  - `x = NULL` → NULL (standard ternary logic)

### 2. NULL Property Access
- Handle property access on NULL values gracefully
- Return NULL when accessing property on NULL (e.g., `f.name` when `f` is NULL)
- Prevents TypeError in OPTIONAL MATCH scenarios

### 3. Test Improvements
- Fix test expectations for NULL handling
- Add comprehensive IS NULL / IS NOT NULL tests
- All 6 OPTIONAL MATCH integration tests pass
- All 731 unit + integration tests pass

## NULL Semantics

**IS NULL (unary operator)**:
- `NULL IS NULL` → true
- `5 IS NULL` → false
- Returns CypherBool, never NULL

**= NULL (binary operator)**:
- `NULL = NULL` → NULL (ternary logic)
- `5 = NULL` → NULL
- Returns NULL for any NULL operand

**Property Access**:
- `node.property` when node is NULL → NULL
- `node.missing_property` → NULL

## Testing
- 731 tests pass (725 unit, 6 new OPTIONAL MATCH)
- 13 skipped (existing grammar limitations)
- Coverage: 91.96%

## Impact
- OPTIONAL MATCH fully functional with NULL handling
- IS NULL / IS NOT NULL work in WHERE clauses
- NULL values propagate correctly through expressions
- Ready for Phase 2 features (UNION, list comprehensions)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement UNION and UNION ALL operators (#103)

**Task #4 completed: Full UNION/UNION ALL implementation**

## Changes

### 1. UNION Grammar
- Add `union_query` grammar rule to cypher.lark
- Support UNION and UNION ALL keywords
- Allow multiple query branches to be combined

### 2. Parser Support
- Add `union_query` transformer to parser.py
- Return dict with type='union', branches, and all flag
- Handle `union_distinct` and `union_all` transformers

### 3. API Integration
- Update GraphForge.execute() to detect UNION queries
- Plan each branch separately
- Create Union operator with all branches
- Execute Union operator and return combined results

### 4. Executor Enhancement
- Update execute() to recognize Union as RETURN-producing operator
- Prevent empty result check from failing on UNION queries
- Union._execute_union() already handles deduplication

## Features

**UNION (deduplicates)**:
```cypher
MATCH (p:Person) RETURN p.name
UNION
MATCH (c:Company) RETURN c.name
```

**UNION ALL (preserves duplicates)**:
```cypher
MATCH (p:Person) RETURN p.name
UNION ALL
MATCH (p:Person) RETURN p.name
```

**Multiple branches**:
```cypher
MATCH (p:Person) RETURN p.name
UNION
MATCH (c:Company) RETURN c.name
UNION
MATCH (d:Department) RETURN d.name
```

## Testing
- 9 comprehensive integration tests
- All 740 tests pass (731 previous + 9 new)
- Tests cover: basic UNION, UNION ALL, deduplication, multiple branches,
  ORDER BY, LIMIT, empty results, edge cases

## Impact
- UNION/UNION ALL fully functional
- ~30 TCK scenarios expected to pass
- Foundation supports nested query execution
- Ready for Phase 2 completion (list comprehensions, subqueries)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement list comprehensions (#94)

Add support for list comprehension syntax: [x IN list WHERE x > 5 | x * 2]

## Changes
- Add ListComprehension AST node to ast/expression.py
- Extend grammar to support comprehension syntax in cypher.lark
- Add parser transformer for list_comprehension
- Implement evaluation logic in evaluator.py
- Create 12 comprehensive integration tests

## Features
- Basic iteration: [x IN [1,2,3]]
- Filtering with WHERE: [x IN list WHERE x > 5]
- Transformation with |: [x IN list | x * 2]
- Combined filter and map: [x IN list WHERE x > 5 | x * 2]
- Nested comprehensions
- NULL handling in filters

## Test Coverage
- 12 new integration tests (all passing)
- Tests cover basic iteration, filtering, mapping, edge cases
- 752 total integration tests passing
- 91.96% code coverage maintained

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement EXISTS and COUNT subquery expressions (#94)

Add support for subquery expressions in WHERE and RETURN clauses.

## Changes
- Add SubqueryExpression AST node to ast/expression.py
- Extend grammar to support EXISTS and COUNT subqueries
- Add parser transformers for exists_expr and count_expr
- Implement subquery evaluation in evaluator.py
- Update executor to pass itself and planner for subquery execution
- Create 13 comprehensive integration tests

## Features
### EXISTS Subquery
- Returns boolean indicating if subquery matches any rows
- Can be used in WHERE clause for filtering
- Can be used in RETURN clause for computed columns
- Supports correlated subqueries (references outer variables)

### COUNT Subquery
- Returns integer count of rows matched by subquery
- Can be used in WHERE/WITH clause with comparisons
- Can be used in RETURN clause for aggregations
- Supports correlated subqueries

## Examples
```cypher
-- EXISTS in WHERE
MATCH (p:Person)
WHERE EXISTS { MATCH (p)-[:KNOWS]->() }
RETURN p.name

-- COUNT in RETURN
MATCH (p:Person)
RETURN p.name, COUNT { MATCH (p)-[:KNOWS]->() } AS friends

-- COUNT in WITH for filtering
MATCH (p:Person)
WITH p, COUNT { MATCH (p)-[:KNOWS]->() } AS friend_count
WHERE friend_count > 2
RETURN p.name
```

## Architecture
- Subqueries execute in isolated context with outer bindings
- Planner and executor passed through evaluation chain
- Supports nested query execution via operator pipeline

## Test Coverage
- 13 new integration tests (all passing)
- 765 total integration tests passing
- Tests cover EXISTS/COUNT in various contexts
- Tests cover correlated subqueries and edge cases

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement variable-length path patterns (#94)

Add support for variable-length relationship traversal like -[:KNOWS*1..3]->

## Changes
- Extend RelationshipPattern AST to include min_hops and max_hops
- Add grammar support for *, *1..3, *..3, *1.. syntax
- Add ExpandVariableLength operator for recursive traversal
- Implement depth-first search with cycle detection in executor
- Update planner to emit ExpandVariableLength for var-length patterns
- Create 2 integration tests

## Features
### Variable-Length Syntax
- `*` - 1 or more hops (unbounded)
- `*1..3` - min 1, max 3 hops
- `*..3` - min 1, max 3 hops
- `*2..` - min 2, unbounded hops

### Implementation
- Depth-first search traversal
- Cycle detection to prevent infinite loops
- Edge list binding for paths
- Type and direction filtering

## Examples
```cypher
-- Find all friends (direct and indirect)
MATCH (p:Person {name: 'Alice'})-[:KNOWS*]->(f)
RETURN f.name

-- Find friends within 2 hops
MATCH (p)-[:KNOWS*1..2]->(f)
RETURN f.name

-- Find all ancestors
MATCH (p)-[:PARENT*]->(ancestor)
RETURN ancestor.name
```

## Test Coverage
- 2 new integration tests (all passing)
- 767 total integration tests passing
- Tests cover unbounded and bounded patterns
- Tests verify cycle detection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: add comprehensive v0.3.0 release notes (#94)

- Document all 8 major features implemented
- OPTIONAL MATCH, UNION, list comprehensions
- EXISTS/COUNT subqueries, variable-length paths
- IS NULL operators, tree-based architecture
- Include syntax examples and use cases
- Implementation details and test coverage
- Architecture improvements and migration guide
- 767 passing integration tests documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: preserve column names for property access in RETURN (#94)

- Add PropertyAccess to executor imports
- Generate dotted notation (e.g., 'p.name') for property access
- Previously used generic 'col_N' names
- Update test to expect proper column names
- Fixes TCK result format expectations

This improves TCK compatibility by matching expected result column names.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add toBoolean() type conversion function (#94)

- Add toBoolean to TYPE_FUNCTIONS set
- Implement toBoolean in _evaluate_type_function
- Support string-to-boolean conversion ("true"/"false" case-insensitive)
- Add toboolean to FUNCTION_NAME grammar terminal
- Returns null for invalid inputs

Completes type conversion function suite for TCK compatibility.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement list quantifier functions (all, any, none, single) (#94)

- Add QuantifierExpression AST node
- Add quantifier_expr grammar rules (ALL, ANY, NONE, SINGLE)
- Implement parser transformers for all quantifiers
- Add evaluation logic in evaluator
- Fix empty list parsing bug (filtered None values)
- Implement correct semantics:
  * ALL: true if all items satisfy (vacuous truth for empty list)
  * ANY: true if at least one item satisfies
  * NONE: true if no items satisfy
  * SINGLE: true if exactly one item satisfies

Expected TCK impact: ~100-150 scenarios

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: support multi-clause queries with UNWIND (#94)

- Add reading_only_query grammar rule for multiple reading clauses
- Support patterns: MATCH + UNWIND + RETURN, UNWIND + MATCH + RETURN
- Add WHERE clause support in reading-only queries
- Fix planner to respect clause order for reading clauses
- Previously MATCH always executed before UNWIND
- Now UNWIND and MATCH execute in declaration order
- Flatten nested reading_clause lists in parser

This enables variable dependencies between UNWIND and MATCH:
- UNWIND ['Alice', 'Bob'] AS name MATCH (p {name: name})
- MATCH (p) UNWIND p.tags AS tag RETURN p, tag

Expected TCK impact: ~10-15 scenarios

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: add parallel testing documentation (#94)

- Document pytest -n flag for parallel execution
- TCK tests run 4x faster with -n auto (54s vs 3.5min)
- Add examples for auto CPU detection and manual worker count

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: add comprehensive end-to-end dataset workflow tests (#94)

- Add TestDatasetDiscovery for dataset listing and filtering
- Add TestSmallDatasetLoading for collaboration network queries
- Add TestComplexQueries for v0.3.0 feature patterns
- Add TestDataExportWorkflow for JSON export/import
- Add TestRealWorldUseCases for typical query patterns

Tests demonstrate:
- Dataset discovery and metadata access
- Loading SNAP datasets and querying
- OPTIONAL MATCH, UNION, EXISTS, quantifier usage
- JSON graph export/import roundtrip
- Common patterns: degree queries, paths, aggregation

12 tests passing, 5 skipped (awaiting size() function support)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: resolve type checking and linting issues (#94)

- Add ExpandVariableLength import to executor.py top-level imports
- Fix type annotations for list comprehension items variable
- Fix type annotations for variable-length path stack
- Update node ID set type to handle str | int
- Fix import ordering in test_end_to_end_workflows.py
- Update test_unknown_type_function_raises_error to use truly unknown function
- All pre-push checks passing: format, lint, type-check, tests, coverage

All 1,790 tests passing with 91.43% coverage.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: resolve critical bugs in executor and parser (#103)

**Executor Fixes:**
- Fix _execute_union to pass correct op_index for each branch operator
  instead of hardcoded 0, fixing aggregation finalization logic
- Fix _execute_variable_expand cycle detection by properly tracking
  visited nodes per path (initialize with src_node.id, add next_node.id
  when pushing to stack)

**Parser Fixes:**
- Replace union_query dict return with proper UnionQuery AST node
- Add validation to reject mixed UNION/UNION ALL in same query
- Update api.py to handle UnionQuery instead of dict
- Fix list_comprehension disambiguation by adding named sub-rules
  (comp_where_clause, comp_map_clause) to grammar
- Add transformers that return tagged tuples to distinguish WHERE vs MAP

**Test Improvements:**
- Add @pytest.mark.integration to all integration test classes
- Add @pytest.mark.slow to dataset tests loading snap-ca-grqc
- Fix test_quantifier_pattern assertion to validate both Alice and Carol
  are present (not just one) and Bob is excluded
- Rename test_union_different_column_count to test_union_different_value_types
  to accurately describe what it tests
- Add test_union_mismatched_column_count to document current behavior
  with different column counts

All tests passing, no type errors, linting clean.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: critical executor, evaluator, and parser bugs (#103)

**Evaluator Fixes:**
- Fix subquery operator execution to pass correct op_index (use enumerate)
- Implement proper three-valued logic for quantifiers (ALL/ANY/NONE/SINGLE)
  tracking satisfied_count, false_count, and null_count for openCypher semantics
- Add null-check for executor.planner before calling plan() in subqueries
- ALL: False if any False, True if empty or all True, else NULL
- ANY: True if any True, NULL if any NULL but no True, else False
- NONE: False if any True, NULL if any NULL but no True, else True
- SINGLE: True if exactly one True and no NULLs, else False/NULL

**Executor Fixes:**
- Fix _execute_optional_expand to preserve rows with NULL bindings when
  src_var is missing or not a NodeRef (proper OPTIONAL semantics)
- Fix _execute_variable_expand to bind raw list of EdgeRef objects instead
  of wrapping in CypherList (EdgeRef is not a CypherValue)
- Add OptionalScanNodes operator for single-node OPTIONAL MATCH with
  LEFT JOIN semantics preserving rows with NULL bindings
- Implement _execute_optional_scan method with proper NULL handling

**Parser/Grammar Fixes:**
- Fix read_query ambiguity by splitting into three alternatives:
  match + optional_match+ + where?, match + where?, optional_match + where?
- Add 'size' to FUNCTION_NAME token regex for size() function support

**AST Improvements:**
- Replace UnionQuery dataclass with Pydantic BaseModel (frozen=True)
- Add field validators for branches (non-empty, all CypherQuery)
- Add model validator to ensure at least two branches in UNION

**Planner Improvements:**
- Update single-node OPTIONAL MATCH to use OptionalScanNodes instead of
  ScanNodes for proper NULL preservation
- Add OptionalScanNodes to operators.py with Pydantic validation

**Test Improvements:**
- Add @pytest.mark.integration to all test classes in test_union.py

All tests passing, linting clean, type checking passing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement size() function and add coverage tests (#103)

**size() Function Implementation:**
- Added size() function for lists and strings in evaluator.py
- Returns length of CypherList or CypherString
- Raises TypeError for invalid argument types
- Works in WHERE clauses, RETURN, and property access

**Coverage Improvements:**
- Added tests for subquery planner null-check error paths
- Added integration tests for size() function (3 tests)
- Tests cover node property lists, WHERE clause filtering, string properties

All new tests passing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: format test file

* fix: simplify size() implementation per linting

* fix: address code review issues (#103)

- Add QuantifierExpression and UnaryOp to list/dict literal evaluation
- Update max_hops validator to allow 0 (non-negative instead of positive)
- Add model validator to ensure max_hops >= min_hops cross-field validation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request May 31, 2026
- RemoveType: add constraint cascade (doc.constraints.retain) to fix
  orphaned constraints bug (finding #1)
- apply(): add step contiguity validation — doc.version must match
  steps[0].from_version and each step's from must equal prev's to;
  returns NoMigrationPath on mismatch (finding #5)
- migrate_to(): guard against doc.version != handle.version() before
  planning (finding #2)

Deferred: AddProperty default_json limitation (#3 — intentional for
this milestone) and silent Unknown degradation (#4 — intentional
forward-compatibility design)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request May 31, 2026
#669)

* feat(ontology): migration engine — plan and apply versioned transforms (#563)

- error.rs: OntologyError::NoMigrationPath { from, to }
- migration.rs: TransformKind enum (RenameType, RenameProperty, AddProperty,
  RemoveProperty, AddType, RemoveType, Unknown), MigrationStep struct,
  MigrationEngine::plan() (BFS shortest-path), MigrationEngine::apply()
  (mutates OntologyDoc then re-compiles via OntologyCompiler)
- handle.rs: OntologyHandle::migrate_to(target_version, doc) convenience method
- lib.rs: re-export MigrationEngine, MigrationStep, TransformKind
- 11 new tests (77 total): plan same/single/multi/no-path/shortest, apply
  rename-type/property, add/remove property, version update, chain

Closes #563

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review on migration engine

- RemoveType: add constraint cascade (doc.constraints.retain) to fix
  orphaned constraints bug (finding #1)
- apply(): add step contiguity validation — doc.version must match
  steps[0].from_version and each step's from must equal prev's to;
  returns NoMigrationPath on mismatch (finding #5)
- migrate_to(): guard against doc.version != handle.version() before
  planning (finding #2)

Deferred: AddProperty default_json limitation (#3 — intentional for
this milestone) and silent Unknown degradation (#4 — intentional
forward-compatibility design)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release:patch Bug fix, requires patch version bump release:pending Merged but not yet released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant