fix: implement label disjunction for MATCH patterns#159
Conversation
Implement label disjunction syntax (:Label1|Label2) to match nodes with any of the specified labels. This completes the primary missing feature from issue #148. ## Changes ### Grammar (cypher.lark) - Updated labels rule to support disjunction: `:Person|Company` - Labels now parsed as `label_conjunction ("|" label_conjunction)*` - Maintains backward compatibility with conjunction: `:Person:Employee` ### AST (pattern.py) - Changed NodePattern.labels from `list[str]` to `list[list[str]]` - Labels now represent disjunction of conjunctions: * `[['Person']]` - must have Person * `[['Person', 'Employee']]` - must have both (AND) * `[['Person'], ['Company']]` - must have Person OR Company ### Parser (parser.py) - Added transformers for label_disjunction and label_conjunction - Updated node_pattern transformer to handle nested label format ### Planner (operators.py) - Updated ScanNodes and OptionalScanNodes to accept new label format - Changed labels field to `list[list[str]] | None` ### Executor (executor.py) - Added `_node_matches_labels()` helper for label matching logic - Updated scan operations to handle label disjunction (OR between groups) - Updated CREATE to flatten label groups when creating nodes - Updated MERGE to collect candidates from all label groups ## Tests Added comprehensive integration tests (12 tests): - Basic disjunction matching - Conjunction within groups - Disjunction of conjunctions - Multiple patterns with disjunction - WHERE clause compatibility - Edge cases (single label, properties, etc.) Updated 5 parser unit tests to expect new label format. All 182 parser tests pass. ## Verification Other issue #148 items confirmed working: ✅ Complex pattern combinations ✅ Optional property matching ✅ Bidirectional patterns ✅ Self-relationship patterns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WalkthroughParser, AST, planner, executor, and tests updated so node labels are represented and matched as disjunctions of conjunctions (List[List[str]]); grammar, transformer, executor matching, MERGE/CREATE semantics, planner operator types, and tests were adjusted accordingly. Changes
Sequence DiagramsequenceDiagram
participant Parser as Parser
participant Transformer as Transformer
participant Planner as Planner
participant Executor as Executor
participant Graph as Graph
Parser->>Transformer: parse label_disjunction / label_conjunction
Transformer->>Planner: emit NodePattern (labels: List[List[str]])
Planner->>Executor: emit operators (Scan / Merge / Create) with label_spec
Executor->>Executor: _node_matches_labels(node, label_spec)
Executor->>Graph: query candidates (per-group or flattened)
Graph-->>Executor: return candidate nodes
Executor->>Executor: apply per-group conjunction filters and union results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #159 +/- ##
==========================================
+ Coverage 88.99% 89.00% +0.01%
==========================================
Files 32 32
Lines 5033 5066 +33
Branches 1314 1326 +12
==========================================
+ Hits 4479 4509 +30
- Misses 317 319 +2
- Partials 237 238 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/graphforge/ast/pattern.py (1)
56-63:⚠️ Potential issue | 🔴 Critical
validate_labelsvalidator is broken for the newlist[list[str]]type — only validates the first label in each group.The validator has three critical issues:
- Type annotation is wrong:
v: list[str]should belist[list[str]]to match the field definition on line 38.- Incomplete validation: The loop
for label in v:iterates over groups, but only checkslabel[0].isalpha(), validating only the first label in each group. Invalid labels like[["Person", "1Employee"]]silently pass.- Misleading error message: Would print
f"Label must start with a letter: {['Person', 'Employee']}"instead of individual label names.🐛 Proposed fix
`@field_validator`("labels") `@classmethod` - def validate_labels(cls, v: list[str]) -> list[str]: - """Validate label names.""" - for label in v: - if not label or not label[0].isalpha(): - raise ValueError(f"Label must start with a letter: {label}") + def validate_labels(cls, v: list[list[str]]) -> list[list[str]]: + """Validate label names in all label groups.""" + for group in v: + if not group: + raise ValueError("Label group cannot be empty") + for label in group: + if not label or not label[0].isalpha(): + raise ValueError(f"Label must start with a letter: {label}") return v
🤖 Fix all issues with AI agents
In `@src/graphforge/executor/executor.py`:
- Around line 208-226: Import NodeRef for type checking (use from __future__
import annotations and typing.TYPE_CHECKING with "from graphforge.types.graph
import NodeRef" inside the TYPE_CHECKING block) and update _node_matches_labels
to guard against CypherNull by checking the node is not an instance of
CypherNull (or node is not None / not a null sentinel used by your code) before
accessing node.labels; keep the signature using the NodeRef forward reference if
desired, and ensure callers from _execute_optional_scan that may pass
bound-but-null variables are handled safely by returning False when node is
null.
- Around line 1580-1585: The CREATE node label handling currently flattens
disjunctive label groups; detect and reject disjunctions by checking
node_pattern.labels and raising an error if there is more than one label group
(i.e., len(node_pattern.labels) > 1) when processing CREATE patterns in the
executor (the block that builds labels from node_pattern.labels and assigns to
labels); replace the silent flattening with a clear exception (with a message
like "Disjunctive labels (using '|') are not allowed in CREATE patterns") so
planning/execution fails for constructs like CREATE (n:Person|Company).
In `@tests/integration/test_label_disjunction.py`:
- Around line 121-122: Replace the Unicode multiplication sign in the comment
lines "Person × (Product, Company) = 2" and "Company × (Product, Company) = 2"
with ASCII "x" to satisfy RUF003; update those comment strings to "Person x
(Product, Company) = 2" and "Company x (Product, Company) = 2" respectively so
the tests/integration/test_label_disjunction.py comments use only ASCII
characters.
- Around line 12-14: Tests in test_label_disjunction.py repeatedly construct
GraphForge inline; extract a pytest fixture named gf that returns a fresh
GraphForge() and make each test (e.g., test_disjunction_matches_first_label)
accept gf as a parameter instead of creating gf = GraphForge() inside the test,
removing the inline instantiation to ensure isolation and reduce boilerplate;
add the fixture near the top of the file and update all test function signatures
to use the gf fixture.
🧹 Nitpick comments (3)
tests/unit/parser/test_parser.py (1)
77-83: Consider adding parser unit tests for the new disjunction syntax.This file tests conjunction (
(n:Person:Employee)) but not disjunction ((n:Person|Company)) or disjunction-of-conjunctions ((n:Person:Employee|Company:Startup)). While integration tests cover execution, parser-level unit tests would validate the AST structure directly.src/graphforge/executor/executor.py (1)
275-292: Duplicated disjunction scan logic across_execute_scan,_execute_optional_scan, and_execute_merge.The same pattern — iterate label groups, scan by first label, filter for conjunction, union into a set — appears three times (lines 275–292, 347–364, and 1853–1869). Extract a helper method to reduce duplication.
♻️ Suggested helper
def _get_nodes_by_label_spec(self, label_spec: list[list[str]]) -> list: """Collect nodes matching a label specification (disjunction of conjunctions).""" all_nodes: set = set() for label_group in label_spec: group_nodes = self.graph.get_nodes_by_label(label_group[0]) if len(label_group) > 1: group_nodes = [ node for node in group_nodes if all(label in node.labels for label in label_group) ] all_nodes.update(group_nodes) return list(all_nodes)Then replace the three inline blocks with
nodes = self._get_nodes_by_label_spec(op.labels).Also applies to: 347-364
src/graphforge/parser/parser.py (1)
385-396: Defensive fallback for flat label list is likely dead code.Lines 390–392 handle a "flat list of strings" format that the comment acknowledges "shouldn't happen anymore" since the grammar now always produces nested lists via
label_disjunction → label_conjunction. This branch is effectively unreachable.Consider removing it to avoid confusion, or at minimum adding a warning/log so you'd notice if it ever triggers unexpectedly. Keeping unreachable code paths that silently coerce data can mask bugs.
Also note: if
itemwere an empty list[], it would bypass the first branch (item and ...is falsy) and hit theall(isinstance(x, str) for x in item)check, which is vacuouslyTrue, producinglabels = [[]](a disjunction containing one empty conjunction group). This is unlikely given the grammar, but worth keeping in mind.♻️ Suggested simplification
elif isinstance(item, list): - # Check if it's the new nested list format (label groups) - if item and isinstance(item[0], list): - # It's a list of label groups (disjunction of conjunctions) - labels = item - elif all(isinstance(x, str) for x in item): - # It's a flat list of strings (old format, shouldn't happen anymore) - labels = [item] # Wrap in a list to make it consistent + # label_disjunction always returns list[list[str]] + labels = item
Implemented label disjunction syntax (e.g., `:Person|Company`) for MATCH patterns, completing the final missing piece of issue #148. Changes: - Grammar: Added label disjunction syntax with `|` operator to cypher.lark - AST: Changed NodePattern.labels from list[str] to list[list[str]] to support disjunction of conjunctions - Parser: Updated label parsing to handle both single labels and disjunction - Planner: Updated ScanNodes and OptionalScanNodes operators for new format - Executor: Implemented _node_matches_labels() with OR logic across groups and AND logic within groups Tests: - Added 12 comprehensive integration tests in test_label_disjunction.py - Updated 40+ existing tests to use new label format - All patterns work: :Person|Company, :Person:Employee|Company:Startup - Supports single labels, conjunctions, and disjunctions Coverage: 92.24% total (above 85% target), pattern.py at 100% Issue #148 is now complete with all 5 edge cases addressed: 1. ✅ Label disjunction - DONE (this commit) 2. ✅ Complex pattern combinations - Already working 3. ✅ Optional property matching - Already working 4. ✅ Bidirectional patterns - Already working 5. ✅ Self-relationship patterns - Already working Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Improved type safety, null handling, and validation:
1. **Type Safety (executor.py)**:
- Use TYPE_CHECKING for NodeRef import to avoid circular dependencies
- Add proper type hints with forward references
- Remove redundant NodeRef imports within functions
2. **Null Safety (executor.py:209-228)**:
- Add CypherNull handling in _node_matches_labels
- Return False for null nodes (from OPTIONAL MATCH)
- Prevents AttributeError when accessing node.labels on null
3. **CREATE Validation (executor.py:1580-1590)**:
- Reject disjunctive labels in CREATE patterns
- Raise clear ValueError: "Disjunctive labels (using '|') are not allowed
in CREATE patterns"
- Prevents ambiguous behavior like CREATE (:Person|Company)
4. **Test Improvements (test_label_disjunction.py)**:
- Add pytest fixture for GraphForge to ensure test isolation
- Remove repetitive gf = GraphForge() boilerplate
- Add test for CREATE disjunction rejection
All tests passing (2,674 passed), coverage maintained at 92.22%.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/graphforge/executor/executor.py`:
- Around line 1868-1885: The MERGE node-pattern handling must reject disjunctive
labels early: add a validation at the start of the MERGE node-pattern block that
inspects node_pattern.labels and raises a ValueError if it contains a
disjunction (e.g., more than one label-group or any label_group with multiple
labels), mirroring the CREATE validation; reference the same error text used by
_create_node_from_pattern (or use a clear "disjunctive labels not allowed in
MERGE" message) so the check fails fast rather than deferring to the CREATE
fallback.
In `@tests/integration/test_label_disjunction.py`:
- Around line 177-180: Add a new integration test that mirrors
test_create_rejects_disjunctive_labels but for MERGE: create a test function
named test_merge_rejects_disjunctive_labels that calls gf.execute("MERGE
(:Person|Company {name: 'Alice'})") and asserts it raises ValueError with a
message matching "Disjunctive labels.*not allowed in MERGE" (use pytest.raises
and a regex match), so the executor behavior is documented and failures are
explicit.
- Around line 177-180: In test_create_rejects_disjunctive_labels change the
pytest.raises regex argument to a raw string to avoid accidental escape-sequence
issues: update the match from "Disjunctive labels.*not allowed in CREATE" to
r"Disjunctive labels.*not allowed in CREATE" in the with pytest.raises(...)
block that wraps gf.execute("CREATE (:Person|Company {name: 'Alice'})").
🧹 Nitpick comments (1)
src/graphforge/executor/executor.py (1)
284-301: Extract duplicated label-group scan logic into a helper.This block (scan by first label, filter conjunction, union across groups) is repeated identically in
_execute_optional_scan(lines 356–373) and_execute_merge(lines 1869–1885). A small helper would eliminate the triplication.♻️ Suggested helper
+ def _collect_nodes_by_label_spec(self, label_spec: list[list[str]]) -> list[NodeRef]: + """Collect nodes matching a label specification (disjunction of conjunctions).""" + all_nodes: set[NodeRef] = set() + for label_group in label_spec: + group_nodes = self.graph.get_nodes_by_label(label_group[0]) + if len(label_group) > 1: + group_nodes = [ + node for node in group_nodes + if all(label in node.labels for label in label_group) + ] + all_nodes.update(group_nodes) + return list(all_nodes)Then replace the three duplicated blocks with
nodes = self._collect_nodes_by_label_spec(op.labels).
Added validation to MERGE patterns to match CREATE behavior:
1. **MERGE Validation (executor.py:1865-1872)**:
- Check for disjunctive labels at start of MERGE node-pattern block
- Raise ValueError if len(node_pattern.labels) > 1
- Clear error: "Disjunctive labels (using '|') are not allowed in MERGE
patterns"
- Fails fast before attempting match/create logic
2. **Test Coverage (test_label_disjunction.py)**:
- Added test_merge_rejects_disjunctive_labels to verify MERGE rejection
- Fixed test_create_rejects_disjunctive_labels to use raw string (r"...")
- Both tests verify clear error messages for user feedback
Rationale:
- MERGE with disjunction (e.g., MERGE (:Person|Company)) is ambiguous
- Should MERGE create one node with both labels or match either?
- Rejecting early provides clear semantics and better error messages
- Consistent with CREATE behavior (both reject disjunctions)
All 2,675 tests passing, 47 MERGE tests verified, coverage at 92.23%.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/graphforge/executor/executor.py (1)
323-392:⚠️ Potential issue | 🟠 Major
_execute_optional_scandoes not bindpath_var, unlike_execute_scan.
OptionalScanNodeshas apath_varfield (operators.py line 74), but_execute_optional_scannever binds it. Meanwhile,_execute_scanconsistently bindspath_varwhen nodes are bound (lines 259–278, 314–318). Without this binding, OPTIONAL MATCH with named paths will silently drop the path binding, breaking queries likeOPTIONAL MATCH (n)-[r]-(m) AS path RETURN path.
🧹 Nitpick comments (2)
src/graphforge/executor/executor.py (2)
356-373: Duplicated disjunction-of-conjunctions scan logic.Lines 356–373 are nearly identical to lines 284–301 in
_execute_scan. Consider extracting a shared helper (e.g.,_collect_nodes_by_label_spec(label_spec) -> list[NodeRef]) to reduce duplication. The same pattern also appears in the MERGE path (lines 1876–1892).♻️ Sketch of extracted helper
def _collect_nodes_by_label_spec(self, label_spec: list[list[str]]) -> list["NodeRef"]: """Collect nodes matching a label spec (disjunction of conjunctions).""" all_nodes: set[NodeRef] = set() for label_group in label_spec: if not label_group: continue group_nodes = self.graph.get_nodes_by_label(label_group[0]) if len(label_group) > 1: group_nodes = [ n for n in group_nodes if all(lbl in n.labels for lbl in label_group) ] all_nodes.update(group_nodes) return list(all_nodes)
1865-1892: MERGE validation is correct; disjunction collection below is now dead logic.The early validation at line 1866 rejects
len(labels) > 1, so the loop at lines 1878–1890 will always iterate exactly once. The disjunction collection code (set-based aggregation across groups) is unreachable for its intended purpose. This isn't a bug, but it's misleading — a simpler single-group extraction would better communicate the intent post-validation.♻️ Simplified post-validation label handling
# Try to find existing node found_node = None if node_pattern.labels: - # Collect candidate nodes from all label groups (disjunction) - all_candidates = set() - for label_group in node_pattern.labels: - # Get nodes by first label in the group - group_candidates = self.graph.get_nodes_by_label(label_group[0]) - - # Filter to nodes with ALL labels in this group (conjunction) - if len(label_group) > 1: - group_candidates = [ - node - for node in group_candidates - if all(label in node.labels for label in label_group) - ] - - all_candidates.update(group_candidates) - - candidates = list(all_candidates) + # Single conjunction group (disjunction rejected above) + label_group = node_pattern.labels[0] + candidates = self.graph.get_nodes_by_label(label_group[0]) + if len(label_group) > 1: + candidates = [ + node for node in candidates + if all(label in node.labels for label in label_group) + ]
Closes #148
Summary
Implemented label disjunction (
:Label1|Label2) syntax for MATCH patterns, which was the primary missing feature from issue #148.Changes
Syntax Support
:Person|Company- matches nodes with Person OR Company label:Person:Employee- matches nodes with BOTH labels (existing):Person:Employee|Company:Startup- matches (Person AND Employee) OR (Company AND Startup)Implementation
Grammar (
cypher.lark):|separator":label_disjunction"where disjunction islabel_conjunction ("|" label_conjunction)*AST (
pattern.py):NodePattern.labelsfromlist[str]tolist[list[str]]Executor (
executor.py):_node_matches_labels()helper for matching logicOperators (
operators.py):ScanNodesandOptionalScanNodeslabel field typesTests
Added 12 comprehensive integration tests covering:
Updated 5 parser unit tests for new label format.
Results: All 182 parser tests pass ✅
Verification
Confirmed other issue #148 items already working:
Examples
Breaking Change Note
NodePattern.labels. Any code directly accessing this field will need updates. However, the query execution is fully backward compatible - existing queries continue to work.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Behavior Changes
Tests