Skip to content

feat: implement MERGE ON CREATE SET syntax#65

Merged
DecisionNerd merged 3 commits into
mainfrom
feature/merge-on-create-set
Feb 3, 2026
Merged

feat: implement MERGE ON CREATE SET syntax#65
DecisionNerd merged 3 commits into
mainfrom
feature/merge-on-create-set

Conversation

@DecisionNerd
Copy link
Copy Markdown
Owner

@DecisionNerd DecisionNerd commented Feb 3, 2026

Summary

Implements MERGE ... ON CREATE SET syntax to support Neo4j example datasets.

This is Phase 1a of the v0.3.0 roadmap and is the critical blocker for loading all 5 Neo4j datasets.

Changes

🔧 Grammar

  • Extended Lark grammar with merge_action and on_create_clause rules
  • Reuses existing set_clause rule for consistency
  • Case-insensitive keywords (ON CREATE SET)
  • Extensible structure for future ON MATCH SET support

📦 AST

  • Added on_create field to MergeClause dataclass
  • Type: SetClause | None for optional ON CREATE clause

🔄 Parser

  • Added transformer methods for new grammar rules
  • Handles optional ON CREATE clause in MERGE
  • Maintains backward compatibility

📋 Planner

  • Updated Merge operator with on_create field
  • Planner passes on_create from AST to operator

⚙️ Executor

  • Enhanced _execute_merge() to track create vs match state
  • Conditionally executes SET operations only when creating
  • Added _execute_set_items() helper for shared SET logic
  • Refactored _execute_set() to use new helper

Testing

✅ Test Coverage

Category Tests Status
Parser 10 ✅ All passing
Executor 14 ✅ All passing
Integration 14 ✅ All passing
Total 38 ✅ All passing

📊 Test Categories

Parser Tests:

  • Single and multiple property assignments
  • Case-insensitive keywords
  • Neo4j Movie Graph patterns
  • Backward compatibility
  • Edge cases (null values, expressions)

Executor Tests:

  • ON CREATE executes when creating new nodes
  • ON CREATE does NOT execute when matching existing nodes
  • Multiple properties, expressions, null values
  • Backward compatibility verification
  • Neo4j real-world patterns

Integration Tests:

  • Complete Neo4j Movie Graph workflow
  • Idempotency testing
  • Complex queries (WITH, WHERE, aggregation)
  • Performance testing (bulk operations)
  • Edge cases (multiple labels, no variables)

Examples

Single Property

MERGE (n:Person {id: 1}) ON CREATE SET n.created = true

Multiple Properties

MERGE (n:Person {id: 1})
ON CREATE SET n.created = true, n.timestamp = 123

Neo4j Movie Graph Pattern

MERGE (TheMatrix:Movie {title:'The Matrix'})
ON CREATE SET TheMatrix.released=1999,
              TheMatrix.tagline='Welcome to the Real World'

Verification

  • ✅ All 38 new tests pass
  • ✅ All 800 existing tests pass (backward compatibility verified)
  • ✅ Manual testing scenarios verified
  • ✅ Grammar correctly parses syntax
  • ✅ Executor correctly implements conditional logic
  • ✅ No regressions in existing functionality

Impact

🎯 Unblocks Neo4j Datasets

This PR unblocks loading of all 5 Neo4j example datasets:

  • neo4j-movie-graph (170 nodes, 250 edges)
  • neo4j-northwind (1K nodes, 3K edges)
  • neo4j-game-of-thrones (800 nodes, 3K edges)
  • neo4j-fincen-files (500 nodes, 1.5K edges)
  • neo4j-twitter (2K nodes, 8K edges)

All of these datasets use the MERGE ... ON CREATE SET pattern extensively.

Related

Notes

  • This PR implements ONLY ON CREATE SET, not ON MATCH SET
  • ON MATCH SET will be implemented in follow-up PR (Implement MERGE ... ON MATCH SET syntax #58)
  • Grammar is structured to easily add ON MATCH in the future
  • Maintains full backward compatibility with existing MERGE behavior

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for ON CREATE SET clause in MERGE statements, allowing properties to be set exclusively when nodes are newly created during merge operations.
    • Enables conditional property updates that execute only upon node creation, providing more efficient graph modifications.
  • Tests

    • Comprehensive unit and integration test coverage added for MERGE ON CREATE SET functionality across multiple scenarios and edge cases.

## Summary

Implements `MERGE ... ON CREATE SET` syntax to support Neo4j example datasets.
This is Phase 1a of the v0.3.0 roadmap and is the critical blocker for loading
all 5 Neo4j datasets.

## Changes

### Grammar
- Extended Lark grammar with `merge_action` and `on_create_clause` rules
- Reuses existing `set_clause` rule for consistency
- Case-insensitive keywords (ON CREATE SET)
- Extensible structure for future ON MATCH SET support

### AST
- Added `on_create` field to `MergeClause` dataclass
- Type: `SetClause | None` for optional ON CREATE clause

### Parser
- Added `merge_clause()` transformer to handle optional ON CREATE
- Added `merge_action()` transformer for action clauses
- Added `on_create_clause()` transformer to wrap SET clause

### Planner
- Updated `Merge` operator with `on_create` field
- Planner passes `on_create` from AST to operator

### Executor
- Enhanced `_execute_merge()` to track create vs match state
- Conditionally executes SET operations only when creating
- Added `_execute_set_items()` helper for shared SET logic
- Refactored `_execute_set()` to use new helper

## Testing

### Parser Tests (10 tests)
- Single and multiple property assignments
- Case-insensitive keywords
- Neo4j Movie Graph patterns
- Backward compatibility (MERGE without ON CREATE)
- Edge cases (null values, expressions)

### Executor Tests (14 tests)
- ON CREATE executes when creating new nodes
- ON CREATE does NOT execute when matching existing nodes
- Multiple properties, expressions, null values
- Backward compatibility verification
- Neo4j real-world patterns

### Integration Tests (14 tests)
- Complete Neo4j Movie Graph workflow
- Idempotency testing
- Complex queries (WITH, WHERE, aggregation)
- Performance testing (bulk operations)
- Edge cases (multiple labels, no variables)

**Total: 38 new tests, all passing**

## Examples

```cypher
-- Single property
MERGE (n:Person {id: 1}) ON CREATE SET n.created = true

-- Multiple properties
MERGE (n:Person {id: 1})
ON CREATE SET n.created = true, n.timestamp = 123

-- Neo4j Movie Graph pattern
MERGE (TheMatrix:Movie {title:'The Matrix'})
ON CREATE SET TheMatrix.released=1999,
              TheMatrix.tagline='Welcome to the Real World'
```

## Verification

- ✅ All 38 new tests pass
- ✅ All 800 existing tests pass (backward compatibility)
- ✅ Manual testing scenarios verified
- ✅ Grammar correctly parses syntax
- ✅ Executor correctly implements conditional logic

## Impact

Unblocks loading of all 5 Neo4j example datasets:
- neo4j-movie-graph
- neo4j-northwind
- neo4j-game-of-thrones
- neo4j-fincen-files
- neo4j-twitter

## Related

- Part of v0.3.0 roadmap (Phase 1a)
- Closes #57

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DecisionNerd DecisionNerd added enhancement New feature or request parser Changes to Cypher parser labels Feb 3, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 3, 2026

Warning

Rate limit exceeded

@DecisionNerd has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Implements MERGE ... ON CREATE SET syntax across the full stack: grammar parsing, AST representation, query planning, and execution. The executor tracks node/edge creation and conditionally executes ON CREATE SET expressions only when elements are newly created, not when matching existing elements.

Changes

Cohort / File(s) Summary
Grammar and Parser
src/graphforge/parser/cypher.lark, src/graphforge/parser/parser.py
Extended MERGE clause grammar to support optional merge_action elements with on_create_clause rule. Added parser transformer methods merge_action and on_create_clause to recognize ON CREATE SET syntax and build corresponding AST nodes.
AST Representation
src/graphforge/ast/clause.py
Added optional on_create field of type `SetClause
Query Planning
src/graphforge/planner/operators.py, src/graphforge/planner/planner.py
Added on_create field to Merge operator and updated planner to pass on_create from parsed MergeClause through to the operator instance.
Execution Logic
src/graphforge/executor/executor.py
Refactored SET handling into dedicated _execute_set_items helper method. Enhanced MERGE execution to track whether patterns resulted in node/edge creation (was_created flag) and conditionally execute on_create items only when creation occurred.
Parser Unit Tests
tests/unit/parser/test_merge_on_create.py
Comprehensive unit tests validating MERGE ON CREATE SET syntax parsing, including single/multiple properties, expressions, case-insensitivity, and backward compatibility without ON CREATE.
Executor Unit Tests
tests/unit/executor/test_merge_on_create.py
Extensive unit tests for ON CREATE SET execution behavior: creation-time evaluation, non-execution on match, expression handling, idempotency, and backward compatibility with existing MERGE operations.
Integration Tests
tests/integration/test_merge_on_create_real.py
Real-world integration test scenarios covering movie graph patterns, repeated MERGE idempotency, interaction with pre-existing nodes, aggregations, edge cases, and bulk operations.

Sequence Diagram

sequenceDiagram
    participant Parser as Parser
    participant AST as AST Layer
    participant Planner as Query Planner
    participant Executor as Executor
    participant Graph as Graph Storage

    Parser->>AST: Parse MERGE ON CREATE SET<br/>Build MergeClause(patterns, on_create)
    AST->>Planner: Pass MergeClause with on_create
    Planner->>Planner: Thread on_create to<br/>Merge operator
    Planner->>Executor: Execute Merge operator<br/>(patterns, on_create)
    
    Executor->>Graph: Match/find patterns
    alt Patterns match existing
        Executor->>Executor: was_created = False
        Executor->>Graph: Return matched nodes
    else Patterns create new
        Executor->>Graph: Create new nodes/edges
        Executor->>Executor: was_created = True
        Executor->>Executor: Execute _execute_set_items<br/>on on_create items
        Executor->>Graph: Set properties from<br/>on_create SetClause
    end
    
    Executor->>Executor: Return merged elements
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related Issues

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: implement MERGE ON CREATE SET syntax' clearly and concisely describes the primary change - implementing support for the MERGE ON CREATE SET Cypher syntax.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all key sections: summary, changes by component, testing results, examples, verification, and impact.
Linked Issues check ✅ Passed All coding requirements from issue #57 are met: grammar extended for ON CREATE SET, parser updated, AST modified with on_create field, planner carries on_create, executor implements conditional SET execution on create.
Out of Scope Changes check ✅ Passed All changes directly support implementing MERGE ON CREATE SET syntax across grammar, parser, AST, planner, and executor. Tests comprehensively cover the new feature without introducing unrelated changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/merge-on-create-set

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.74%. Comparing base (0e302ec) to head (ccfb5ff).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   93.57%   93.74%   +0.16%     
==========================================
  Files          15       15              
  Lines        1978     1998      +20     
  Branches      494      498       +4     
==========================================
+ Hits         1851     1873      +22     
  Misses         45       45              
+ Partials       82       80       -2     
Flag Coverage Δ
full-coverage 93.74% <90.32%> (+0.16%) ⬆️
unittests 62.96% <87.09%> (+10.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
parser 95.52% <84.61%> (-0.50%) ⬇️
planner 94.18% <100.00%> (+0.02%) ⬆️
executor 89.35% <93.33%> (+0.54%) ⬆️
storage 99.50% <ø> (ø)
ast 100.00% <100.00%> (ø)
types 98.42% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e302ec...ccfb5ff. Read the comment docs.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/graphforge/executor/executor.py (1)

1193-1258: ⚠️ Potential issue | 🟠 Major

Bug: was_created tracking is incorrect for multi-pattern MERGE.

The was_created flag is overwritten for each pattern in the loop. If an earlier pattern creates a node but a later pattern matches an existing node, was_created will be False, causing ON CREATE SET to not execute even though elements were created.

Example:

MERGE (a:A), (b:B) ON CREATE SET a.x = 1

If a is created but b already exists, was_created ends up False because Line 1247 overwrites the True from Line 1251.

🐛 Proposed fix: track creation with OR logic
             # Track whether we created anything (for ON CREATE SET)
-            was_created = False
+            any_created = False

             # Process each pattern
             for pattern in op.patterns:
                 if not pattern:
                     continue

                 # Handle simple node pattern: MERGE (n:Person {name: 'Alice'})
                 if len(pattern) == 1 and isinstance(pattern[0], NodePattern):
                     node_pattern = pattern[0]

                     # Try to find existing node
                     found_node = None
                     # ... (existing matching logic) ...

                     # Bind found node or create new one
                     if found_node:
-                        was_created = False
                         if node_pattern.variable:
                             new_ctx.bindings[node_pattern.variable] = found_node
                     else:
-                        was_created = True
+                        any_created = True
                         node = self._create_node_from_pattern(node_pattern, new_ctx)
                         if node_pattern.variable:
                             new_ctx.bindings[node_pattern.variable] = node

             # Execute conditional SET if we created something
-            if was_created and op.on_create:
+            if any_created and op.on_create:
                 self._execute_set_items(op.on_create.items, new_ctx)
🤖 Fix all issues with AI agents
In `@tests/integration/test_merge_on_create_real.py`:
- Around line 235-245: The test test_merge_on_create_no_variable currently runs
a plain MERGE and doesn't exercise the ON CREATE SET case; update the test
(function test_merge_on_create_no_variable) to execute a MERGE with an ON CREATE
SET clause that references a non-existent pattern variable (e.g., use gf.execute
to run a MERGE (:Node {id: 1}) ON CREATE SET n.someProp = 1) so the
parser/engine behavior is exercised, then query with gf.execute("MATCH (n:Node
{id: 1}) RETURN count(n) as count, n.someProp as someProp") and assert the node
count is 1 and that someProp is not set (or assert the appropriate error if the
project expects an error), otherwise remove the test if the grammar forbids ON
CREATE SET without a variable; update the assertion to match the chosen expected
behavior.
🧹 Nitpick comments (1)
tests/integration/test_merge_on_create_real.py (1)

192-200: Misleading test name: not testing "empty set".

The test name test_merge_on_create_empty_set suggests testing an empty ON CREATE SET clause, but the test actually sets n.flag = true. Consider renaming to reflect actual behavior (e.g., test_merge_on_create_basic or test_merge_on_create_single_property).

✏️ Suggested rename
-    def test_merge_on_create_empty_set(self):
-        """Test MERGE with ON CREATE SET but empty properties (should parse but do nothing extra)."""
+    def test_merge_on_create_single_property(self):
+        """Test basic MERGE with ON CREATE SET setting a single property."""

Comment thread tests/integration/test_merge_on_create_real.py
- Update test to use MERGE with ON CREATE SET referencing non-existent variable
- Verify node is created but property is NOT set (variable doesn't exist)
- Add detailed docstring explaining expected behavior
- Assert both node creation and property absence
@DecisionNerd DecisionNerd merged commit 65bffcc into main Feb 3, 2026
19 checks passed
@DecisionNerd DecisionNerd deleted the feature/merge-on-create-set branch February 3, 2026 20:38
DecisionNerd added a commit that referenced this pull request Feb 4, 2026
Version Bump:
- Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock
- Add comprehensive v0.2.1 changelog entry

Documentation Updates:
- Update README.md with dataset loading examples and quickstart
- Add "Load Real-World Datasets" section to main README
- Update docs/index.md with dataset features and examples
- Complete rewrite of docs/datasets/snap.md:
  - Mark as available in v0.2.1 (5 datasets)
  - Add detailed dataset table with stats
  - Add comprehensive usage examples and query patterns
  - Document download/caching behavior
  - Add performance tips for large datasets
- Update docs/datasets/overview.md:
  - Reorganize to show SNAP as "Available Now"
  - Mark other sources as "Coming Soon"
  - List all 5 available SNAP datasets
- Update docs/getting-started/quickstart.md:
  - Add "Load a Dataset" section with examples
  - Add dataset browsing examples
  - Update navigation links

Release Contents (v0.2.1):
- Dataset loading infrastructure with caching (#68)
- CSV loader for edge-list datasets (#69)
- 5 SNAP datasets available
- MERGE ON CREATE SET syntax (#65)
- MERGE ON MATCH SET syntax (#66)
- WITH clause variable passing fix (#67)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
DecisionNerd added a commit that referenced this pull request Feb 4, 2026
* docs: prepare v0.2.1 release with dataset documentation

Version Bump:
- Bump version to 0.2.1 in pyproject.toml, __init__.py, and uv.lock
- Add comprehensive v0.2.1 changelog entry

Documentation Updates:
- Update README.md with dataset loading examples and quickstart
- Add "Load Real-World Datasets" section to main README
- Update docs/index.md with dataset features and examples
- Complete rewrite of docs/datasets/snap.md:
  - Mark as available in v0.2.1 (5 datasets)
  - Add detailed dataset table with stats
  - Add comprehensive usage examples and query patterns
  - Document download/caching behavior
  - Add performance tips for large datasets
- Update docs/datasets/overview.md:
  - Reorganize to show SNAP as "Available Now"
  - Mark other sources as "Coming Soon"
  - List all 5 available SNAP datasets
- Update docs/getting-started/quickstart.md:
  - Add "Load a Dataset" section with examples
  - Add dataset browsing examples
  - Update navigation links

Release Contents (v0.2.1):
- Dataset loading infrastructure with caching (#68)
- CSV loader for edge-list datasets (#69)
- 5 SNAP datasets available
- MERGE ON CREATE SET syntax (#65)
- MERGE ON MATCH SET syntax (#66)
- WITH clause variable passing fix (#67)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: update uv.lock after version bump

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request parser Changes to Cypher parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement MERGE ... ON CREATE SET syntax

1 participant