feat: implement MERGE ON CREATE SET syntax #65
## Summary
Implements `MERGE ... ON CREATE SET` syntax to support Neo4j example datasets.
This is Phase 1a of the v0.3.0 roadmap and is the critical blocker for loading
all 5 Neo4j datasets.
## Changes
### Grammar
- Extended Lark grammar with `merge_action` and `on_create_clause` rules
- Reuses existing `set_clause` rule for consistency
- Case-insensitive keywords (ON CREATE SET)
- Extensible structure for future ON MATCH SET support
### AST
- Added `on_create` field to `MergeClause` dataclass
- Type: `SetClause | None` for optional ON CREATE clause
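As a rough illustration of the AST change, the sketch below shows a `MergeClause` dataclass carrying an optional `on_create` field. Only `MergeClause`, `SetClause`, `on_create`, and the `SetClause | None` type come from this PR; `SetItem` and its field names are hypothetical stand-ins, and the real classes may look different.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SetItem:
    """Hypothetical single assignment such as n.created = true."""
    variable: str
    property: str
    value: Any


@dataclass
class SetClause:
    """Assumed shape: a list of assignments."""
    items: list[SetItem] = field(default_factory=list)


@dataclass
class MergeClause:
    patterns: list[Any]
    # None keeps plain MERGE (no ON CREATE) backward compatible.
    on_create: SetClause | None = None


plain = MergeClause(patterns=["(n:Person {id: 1})"])
with_action = MergeClause(
    patterns=["(n:Person {id: 1})"],
    on_create=SetClause([SetItem("n", "created", True)]),
)
```

Defaulting `on_create` to `None` means existing code that builds `MergeClause(patterns)` keeps working unchanged.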
### Parser
- Added `merge_clause()` transformer to handle optional ON CREATE
- Added `merge_action()` transformer for action clauses
- Added `on_create_clause()` transformer to wrap SET clause
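The three transformer callbacks can be pictured roughly as follows. This is a dependency-free sketch that only imitates the convention of passing each rule's children as a list; the `OnCreate` wrapper and the class name `MergeCallbacks` are hypothetical, and the actual transformer in this PR may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class SetClause:
    items: list[Any]


@dataclass
class OnCreate:
    """Hypothetical wrapper marking a SET clause as an ON CREATE action."""
    set_clause: SetClause


@dataclass
class MergeClause:
    patterns: list[Any]
    on_create: SetClause | None = None


class MergeCallbacks:
    """Callbacks named after the grammar rules; each receives the rule's children."""

    def on_create_clause(self, children: list[Any]) -> OnCreate:
        # The only child is the already-transformed SET clause.
        return OnCreate(children[0])

    def merge_action(self, children: list[Any]) -> Any:
        # Pass-through today; a future ON MATCH action would also arrive here.
        return children[0]

    def merge_clause(self, children: list[Any]) -> MergeClause:
        patterns = [c for c in children if not isinstance(c, OnCreate)]
        on_create = next(
            (c.set_clause for c in children if isinstance(c, OnCreate)), None
        )
        return MergeClause(patterns, on_create)


cb = MergeCallbacks()
action = cb.merge_action([cb.on_create_clause([SetClause(["n.created = true"])])])
merged = cb.merge_clause(["(n:Person {id: 1})", action])
plain = cb.merge_clause(["(n:Person {id: 1})"])
```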
### Planner
- Updated `Merge` operator with `on_create` field
- Planner passes `on_create` from AST to operator
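The planner change amounts to threading one field through unchanged. A minimal sketch, assuming a `Merge` operator dataclass and a hypothetical `plan_merge` helper (the real planner method name is not given in this PR):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class MergeClause:
    """AST node (assumed shape)."""
    patterns: list[Any]
    on_create: Any | None = None


@dataclass
class Merge:
    """Logical operator (assumed shape)."""
    patterns: list[Any]
    on_create: Any | None = None


def plan_merge(clause: MergeClause) -> Merge:
    # Hypothetical helper: copy on_create from the AST node to the operator.
    return Merge(patterns=clause.patterns, on_create=clause.on_create)


op = plan_merge(MergeClause(["(n:Person {id: 1})"], on_create="SET n.created = true"))
```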
### Executor
- Enhanced `_execute_merge()` to track create vs match state
- Conditionally executes SET operations only when creating
- Added `_execute_set_items()` helper for shared SET logic
- Refactored `_execute_set()` to use new helper
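The executor behaviour described above can be sketched with a toy in-memory graph. Everything here is illustrative: `ToyExecutor`, `Node`, and the tuple form of SET items are assumptions, and only the idea of a shared `_execute_set_items()` helper plus a created-versus-matched flag is taken from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    label: str
    props: dict[str, Any] = field(default_factory=dict)


class ToyExecutor:
    def __init__(self) -> None:
        self.nodes: list[Node] = []

    def _execute_set_items(self, items, bindings) -> None:
        # Shared helper: apply (variable, property, value) assignments.
        for variable, prop, value in items:
            bindings[variable].props[prop] = value

    def execute_set(self, items, bindings) -> None:
        # Plain SET reuses the same helper.
        self._execute_set_items(items, bindings)

    def execute_merge(self, variable, label, key_props, on_create=None) -> Node:
        found = next(
            (
                n
                for n in self.nodes
                if n.label == label
                and all(n.props.get(k) == v for k, v in key_props.items())
            ),
            None,
        )
        was_created = found is None
        node = found
        if was_created:
            node = Node(label, dict(key_props))
            self.nodes.append(node)
        # ON CREATE SET runs only when the node was newly created.
        if was_created and on_create:
            self._execute_set_items(on_create, {variable: node})
        return node


ex = ToyExecutor()
first = ex.execute_merge("n", "Person", {"id": 1}, on_create=[("n", "created", True)])
second = ex.execute_merge("n", "Person", {"id": 1}, on_create=[("n", "created", False)])
```

The second call matches the existing node, so its `on_create` items are skipped and `created` stays `True`.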
## Testing
### Parser Tests (10 tests)
- Single and multiple property assignments
- Case-insensitive keywords
- Neo4j Movie Graph patterns
- Backward compatibility (MERGE without ON CREATE)
- Edge cases (null values, expressions)
### Executor Tests (14 tests)
- ON CREATE executes when creating new nodes
- ON CREATE does NOT execute when matching existing nodes
- Multiple properties, expressions, null values
- Backward compatibility verification
- Neo4j real-world patterns
### Integration Tests (14 tests)
- Complete Neo4j Movie Graph workflow
- Idempotency testing
- Complex queries (WITH, WHERE, aggregation)
- Performance testing (bulk operations)
- Edge cases (multiple labels, no variables)
**Total: 38 new tests, all passing**
## Examples
```cypher
// Single property
MERGE (n:Person {id: 1}) ON CREATE SET n.created = true

// Multiple properties
MERGE (n:Person {id: 1})
ON CREATE SET n.created = true, n.timestamp = 123

// Neo4j Movie Graph pattern
MERGE (TheMatrix:Movie {title:'The Matrix'})
ON CREATE SET TheMatrix.released=1999,
              TheMatrix.tagline='Welcome to the Real World'
```
## Verification
- ✅ All 38 new tests pass
- ✅ All 800 existing tests pass (backward compatibility)
- ✅ Manual testing scenarios verified
- ✅ Grammar correctly parses syntax
- ✅ Executor correctly implements conditional logic
## Impact
Unblocks loading of all 5 Neo4j example datasets:
- neo4j-movie-graph
- neo4j-northwind
- neo4j-game-of-thrones
- neo4j-fincen-files
- neo4j-twitter
## Related
- Part of v0.3.0 roadmap (Phase 1a)
- Closes #57
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Walkthrough

Implements `MERGE ... ON CREATE SET` syntax across the full stack: grammar parsing, AST representation, query planning, and execution. The executor tracks node/edge creation and conditionally executes ON CREATE SET expressions only when elements are newly created, not when matching existing elements.

## Sequence Diagram

sequenceDiagram
participant Parser as Parser
participant AST as AST Layer
participant Planner as Query Planner
participant Executor as Executor
participant Graph as Graph Storage
Parser->>AST: Parse MERGE ON CREATE SET<br/>Build MergeClause(patterns, on_create)
AST->>Planner: Pass MergeClause with on_create
Planner->>Planner: Thread on_create to<br/>Merge operator
Planner->>Executor: Execute Merge operator<br/>(patterns, on_create)
Executor->>Graph: Match/find patterns
alt Patterns match existing
Executor->>Executor: was_created = False
Executor->>Graph: Return matched nodes
else Patterns create new
Executor->>Graph: Create new nodes/edges
Executor->>Executor: was_created = True
Executor->>Executor: Execute _execute_set_items<br/>on on_create items
Executor->>Graph: Set properties from<br/>on_create SetClause
end
Executor->>Executor: Return merged elements
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ✅ 5 passed
## Codecov Report

❌ Patch coverage is …

| Coverage Diff | main | #65 | +/- |
|---|---|---|---|
| Coverage | 93.57% | 93.74% | +0.16% |
| Files | 15 | 15 | |
| Lines | 1978 | 1998 | +20 |
| Branches | 494 | 498 | +4 |
| Hits | 1851 | 1873 | +22 |
| Misses | 45 | 45 | |
| Partials | 82 | 80 | -2 |
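The coverage percentages follow from hits divided by lines. The short check below recomputes them from the counts in the report; truncating to two decimals is an assumption made here because it reproduces the reported figures, not a statement about how Codecov is configured for this repository.

```python
import math


def truncate(value: float, digits: int = 2) -> float:
    """Drop digits beyond the given precision without rounding up."""
    scale = 10 ** digits
    return math.floor(value * scale) / scale


base = 1851 / 1978 * 100  # hits / lines on main
head = 1873 / 1998 * 100  # hits / lines on this PR

print(truncate(base), truncate(head), truncate(head - base))
# prints: 93.57 93.74 0.16
```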
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/graphforge/executor/executor.py (1)
1193-1258: ⚠️ Potential issue | 🟠 Major

Bug: `was_created` tracking is incorrect for multi-pattern MERGE.

The `was_created` flag is overwritten for each pattern in the loop. If an earlier pattern creates a node but a later pattern matches an existing node, `was_created` will be `False`, so `ON CREATE SET` does not execute even though elements were created.

Example: `MERGE (a:A), (b:B) ON CREATE SET a.x = 1`

If `a` is created but `b` already exists, `was_created` ends up `False` because Line 1247 overwrites the `True` from Line 1251.

🐛 Proposed fix: track creation with OR logic

         # Track whether we created anything (for ON CREATE SET)
    -    was_created = False
    +    any_created = False

         # Process each pattern
         for pattern in op.patterns:
             if not pattern:
                 continue

             # Handle simple node pattern: MERGE (n:Person {name: 'Alice'})
             if len(pattern) == 1 and isinstance(pattern[0], NodePattern):
                 node_pattern = pattern[0]

                 # Try to find existing node
                 found_node = None
                 # ... (existing matching logic) ...

                 # Bind found node or create new one
                 if found_node:
    -                was_created = False
                     if node_pattern.variable:
                         new_ctx.bindings[node_pattern.variable] = found_node
                 else:
    -                was_created = True
    +                any_created = True
                     node = self._create_node_from_pattern(node_pattern, new_ctx)
                     if node_pattern.variable:
                         new_ctx.bindings[node_pattern.variable] = node

         # Execute conditional SET if we created something
    -    if was_created and op.on_create:
    +    if any_created and op.on_create:
             self._execute_set_items(op.on_create.items, new_ctx)
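To make the difference concrete, the following standalone sketch reduces the loop to its flag handling. The function names and the list of booleans (one per pattern, `True` meaning "created") are hypothetical simplifications; they are not code from `executor.py`.

```python
def fires_with_overwrite(created_flags: list[bool]) -> bool:
    """Flag is reassigned per pattern, so only the last pattern counts."""
    was_created = False
    for created in created_flags:
        was_created = created
    return was_created


def fires_with_or(created_flags: list[bool]) -> bool:
    """Flag is only ever raised, so any created pattern counts."""
    any_created = False
    for created in created_flags:
        any_created = any_created or created
    return any_created


# MERGE (a:A), (b:B): a is created, b already exists.
flags = [True, False]
print(fires_with_overwrite(flags))  # False: ON CREATE SET is skipped
print(fires_with_or(flags))         # True: ON CREATE SET runs
```

Both versions agree for single-pattern MERGE, which is why the issue only shows up with multiple patterns.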
🤖 Fix all issues with AI agents
In `@tests/integration/test_merge_on_create_real.py`:
- Around line 235-245: The test test_merge_on_create_no_variable currently runs
a plain MERGE and doesn't exercise the ON CREATE SET case; update the test
(function test_merge_on_create_no_variable) to execute a MERGE with an ON CREATE
SET clause that references a non-existent pattern variable (e.g., use gf.execute
to run a MERGE (:Node {id: 1}) ON CREATE SET n.someProp = 1) so the
parser/engine behavior is exercised, then query with gf.execute("MATCH (n:Node
{id: 1}) RETURN count(n) as count, n.someProp as someProp") and assert the node
count is 1 and that someProp is not set (or assert the appropriate error if the
project expects an error), otherwise remove the test if the grammar forbids ON
CREATE SET without a variable; update the assertion to match the chosen expected
behavior.
🧹 Nitpick comments (1)
tests/integration/test_merge_on_create_real.py (1)
192-200: Misleading test name: not testing "empty set".

The test name `test_merge_on_create_empty_set` suggests testing an empty ON CREATE SET clause, but the test actually sets `n.flag = true`. Consider renaming to reflect actual behavior (e.g., `test_merge_on_create_basic` or `test_merge_on_create_single_property`).

✏️ Suggested rename

    -    def test_merge_on_create_empty_set(self):
    -        """Test MERGE with ON CREATE SET but empty properties (should parse but do nothing extra)."""
    +    def test_merge_on_create_single_property(self):
    +        """Test basic MERGE with ON CREATE SET setting a single property."""
- Update test to use MERGE with ON CREATE SET referencing non-existent variable
- Verify node is created but property is NOT set (variable doesn't exist)
- Add detailed docstring explaining expected behavior
- Assert both node creation and property absence
docs: prepare v0.2.1 release with dataset documentation

Version Bump:
- Bump version to 0.2.1 in `pyproject.toml`, `__init__.py`, and `uv.lock`
- Add comprehensive v0.2.1 changelog entry

Documentation Updates:
- Update README.md with dataset loading examples and quickstart
- Add "Load Real-World Datasets" section to main README
- Update docs/index.md with dataset features and examples
- Complete rewrite of docs/datasets/snap.md:
  - Mark as available in v0.2.1 (5 datasets)
  - Add detailed dataset table with stats
  - Add comprehensive usage examples and query patterns
  - Document download/caching behavior
  - Add performance tips for large datasets
- Update docs/datasets/overview.md:
  - Reorganize to show SNAP as "Available Now"
  - Mark other sources as "Coming Soon"
  - List all 5 available SNAP datasets
- Update docs/getting-started/quickstart.md:
  - Add "Load a Dataset" section with examples
  - Add dataset browsing examples
  - Update navigation links

Release Contents (v0.2.1):
- Dataset loading infrastructure with caching (#68)
- CSV loader for edge-list datasets (#69)
- 5 SNAP datasets available
- MERGE ON CREATE SET syntax (#65)
- MERGE ON MATCH SET syntax (#66)
- WITH clause variable passing fix (#67)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore: update uv.lock after version bump

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Unblocks Neo4j Datasets
This PR unblocks loading of all 5 Neo4j example datasets:
- `neo4j-movie-graph` (170 nodes, 250 edges)
- `neo4j-northwind` (1K nodes, 3K edges)
- `neo4j-game-of-thrones` (800 nodes, 3K edges)
- `neo4j-fincen-files` (500 nodes, 1.5K edges)
- `neo4j-twitter` (2K nodes, 8K edges)

All of these datasets use the `MERGE ... ON CREATE SET` pattern extensively.
## Notes
- This PR implements only `ON CREATE SET`, not `ON MATCH SET`
- `ON MATCH SET` will be implemented in a follow-up PR (Implement MERGE ... ON MATCH SET syntax #58)
- The grammar structure leaves room for `ON MATCH` in the future

🤖 Generated with Claude Code