
fix: [cypher] label disjunction (n:A|B) returned 0 rows #4221

Merged
robfrank merged 5 commits into main from fix/4211-cypher-label-disjunction
May 13, 2026

@robfrank
Collaborator

Summary

  • Closes #4211 (Node label disjunction patterns may match no rows): MATCH (n:A|B) returned 0 rows even when nodes with both labels exist.
  • Root cause: the optimizer always called Labels.getCompositeTypeName(["A","B"]) → "A~B", treating multiple labels as AND. NodeByLabelScan then short-circuited on existsType("A~B") → false.
  • LogicalNode now carries the labelDisjunction flag propagated from NodePattern. IndexSelectionRule.createAnchorOperator() routes disjunction patterns to a new NodeByLabelDisjunctionScan operator that walks each VertexType once and includes types matching any disjunction label.
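
The AND-versus-OR distinction described above can be illustrated with a toy model. This is only a sketch: `compositeTypeName` is a stand-in that assumes the composite name is the labels joined with `~` (as the `["A","B"]` → `"A~B"` example suggests), and the schema is a plain set of type names rather than the real ArcadeDB schema.

```java
import java.util.List;
import java.util.Set;

final class LabelMatchSketch {
  // Stand-in for Labels.getCompositeTypeName: joins labels into one AND-style type name.
  // Assumption: the real method may normalize labels differently.
  static String compositeTypeName(List<String> labels) {
    return String.join("~", labels);
  }

  // Conjunction lookup: the single composite type must exist in the schema.
  static boolean conjunctionHasType(Set<String> schemaTypes, List<String> labels) {
    return schemaTypes.contains(compositeTypeName(labels));
  }

  // Disjunction lookup: it is enough that any one label exists as a type.
  static boolean disjunctionHasType(Set<String> schemaTypes, List<String> labels) {
    return labels.stream().anyMatch(schemaTypes::contains);
  }
}
```

With a schema holding only `A` and `B`, the conjunction lookup for `["A","B"]` finds nothing because `A~B` was never created, while the disjunction lookup succeeds.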

Files

  • engine/src/main/java/com/arcadedb/query/opencypher/optimizer/plan/LogicalNode.java: labelDisjunction field + getter
  • engine/src/main/java/com/arcadedb/query/opencypher/optimizer/plan/LogicalPlan.java — propagates flag from NodePattern
  • engine/src/main/java/com/arcadedb/query/opencypher/executor/operators/NodeByLabelDisjunctionScan.java — new operator
  • engine/src/main/java/com/arcadedb/query/opencypher/optimizer/rules/IndexSelectionRule.java — disjunction routing in createAnchorOperator
  • engine/src/test/java/com/arcadedb/query/opencypher/CypherLabelDisjunctionTest.java — 8 regression tests

Test plan

  • CypherLabelDisjunctionTest — 8/8 pass (A|B, A|C, B|C, A|B|C, count variants, single-label sanity, disjunction + WHERE)
  • CypherMultiLabelTest, CypherMultiLabelPreExistingTypeTest, CypherLabelFilteringTest, CypherLabelCheckInWhereTest, CypherTest — 40/40 pass (no regressions)
  • CountEdgesOptimizationTest, CypherInlinePropertyFilterTest, CypherRangeIndexTest, CypherCountSubqueryTest, CypherPatternPredicateTest, CypherPolymorphicEdgeTraversalTest, CypherRelationDirectionTest — 50/50 pass
  • CypherCallYieldWithVariablesTest, CollectDistinctTest, CypherCaseTest, CypherExistsTest, CypherMissingFunctionsTest, CypherCountNonExistingLabelTest — 82/82 pass

@codacy-production

codacy-production Bot commented May 12, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 26 complexity

Metric Results
Complexity 26



Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for label disjunction node patterns in Cypher queries, such as (n:A|B). A new physical operator, NodeByLabelDisjunctionScan, has been implemented to scan multiple vertex types and return matching nodes. The logical plan and index selection rules were updated to identify disjunctions and apply the new operator accordingly. Comprehensive regression tests were also added in CypherLabelDisjunctionTest. I have no feedback to provide as no review comments were submitted.

@codecov

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 67.50000% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.00%. Comparing base (89e0d16) to head (4e632de).
⚠️ Report is 10 commits behind head on main.

Files with missing lines | Patch % | Lines
...executor/operators/NodeByLabelDisjunctionScan.java | 60.93% | 17 Missing and 8 partials ⚠️
...opencypher/optimizer/rules/IndexSelectionRule.java | 80.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4221      +/-   ##
==========================================
+ Coverage   64.62%   65.00%   +0.38%     
==========================================
  Files        1643     1644       +1     
  Lines      126818   126894      +76     
  Branches    27149    27167      +18     
==========================================
+ Hits        81955    82490     +535     
+ Misses      33418    32844     -574     
- Partials    11445    11560     +115     


@robfrank
Collaborator Author

@claude review this

@claude

claude Bot commented May 12, 2026

Claude finished @robfrank's task in 2m 42s


Code Review

  • Gather context and read changed files
  • Review NodeByLabelDisjunctionScan.java (new operator)
  • Review LogicalNode.java and LogicalPlan.java (flag propagation)
  • Review IndexSelectionRule.java (routing logic)
  • Review CypherLabelDisjunctionTest.java (test coverage)
  • Summarize findings

Summary

The fix is correct and the root cause analysis is accurate. The optimizer path was always folding ["A","B"] into "A~B" via Labels.getCompositeTypeName, then short-circuiting on existsType("A~B") → false. The new NodeByLabelDisjunctionScan operator correctly mirrors the logic already working in the legacy MatchNodeStep path.


Issues Found

1. NodeByLabelDisjunctionScan is missing predicate pushdown (whereFilter)

NodeByLabelScan supports an optional inline BooleanExpression whereFilter that evaluates predicates during the scan rather than in a separate filter step. NodeByLabelDisjunctionScan has no equivalent, so a query like MATCH (n:A|B) WHERE n.prop = X always pays for a separate filter step. Not a bug, but a performance gap vs. the single-label scan.

NodeByLabelScan.java:49 — shows the existing pattern to follow.

2. LogicalNode.toString() shows conjunction-style labels for disjunctions

// LogicalNode.java:82
sb.append(":").append(String.join(":", labels));

This always renders labels as (n:A:B) even when labelDisjunction=true. The explain output from NodeByLabelDisjunctionScan.explain() correctly uses |, but any diagnostic that prints a LogicalNode directly (e.g. plan logging) will be misleading. A one-line fix:

sb.append(":").append(String.join(labelDisjunction ? "|" : ":", labels));
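
A small self-contained sketch of that rendering choice follows. `render` and its parameters are illustrative names, not the actual `LogicalNode.toString()` body; only the `String.join` separator switch is taken from the suggestion above.

```java
import java.util.List;

final class LabelRenderSketch {
  // Renders a node pattern; the separator depends on whether labels are OR-ed or AND-ed.
  static String render(String variable, List<String> labels, boolean labelDisjunction) {
    StringBuilder sb = new StringBuilder("(").append(variable);
    if (!labels.isEmpty())
      sb.append(":").append(String.join(labelDisjunction ? "|" : ":", labels));
    return sb.append(")").toString();
  }
}
```

For labels `A` and `B` this yields `(n:A|B)` when the flag is set and `(n:A:B)` otherwise.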


3. The && labels.size() > 1 guard in IndexSelectionRule is technically redundant

// IndexSelectionRule.java:111
if (anchor.getNode().isLabelDisjunction() && labels.size() > 1) {

If isLabelDisjunction() is true with only 1 label, Labels.getCompositeTypeName(["A"]) returns "A" and the normal NodeByLabelScan path handles it correctly — so this guard doesn't mask a bug. But the && suggests a case where isLabelDisjunction() could be true with 1 label that needs special handling; in practice there is none, and the comment could be clearer about this being a defensive guard only.


Observations (Not Blocking)

4. iterateType(type.getName(), false) is correct but worth a comment

NodeByLabelDisjunctionScan uses false for the polymorphic flag, while NodeByLabelScan uses true. Both are correct: the disjunction scan already walks every concrete VertexType in the schema and checks instanceOf(), so subtypes are naturally included via their own direct iterator without double-counting. A short inline comment would make this intent explicit for future readers.

This is consistent with how MatchNodeStep handles the same case (line 481 uses false too).

5. Test coverage gaps

The 8 regression tests are well-targeted. Two scenarios worth adding:

  • Type inheritance: MATCH (n:ParentType|OtherType) where ChildType extends ParentType — confirms subtypes are included correctly via instanceOf().
  • Conjunction regression: MATCH (n:A:B) (AND semantics) with a node of type A and a node of type B — confirms the conjunction path was not broken. The existing test suite runs this (per the PR description), but having it in this file makes the contract explicit alongside the disjunction tests.

6. Coverage report discrepancy

Codecov shows 0% diff coverage on NodeByLabelDisjunctionScan.java (62 lines uncovered). The report notes it is 1 commit behind head, which likely explains it — the 8 test cases do exercise this operator. Worth verifying that CI re-runs the coverage report on the final commit before merge.


Verdict

The fix is clean, well-scoped, and consistent with the existing codebase patterns. Item 2 (LogicalNode.toString()) is the only change I'd call out for correctness of diagnostics. Items 1 and 5 are worth tracking as follow-ups but are not blockers.

The optimizer treated multiple labels as AND conjunction unconditionally,
calling Labels.getCompositeTypeName which returned "A~B" — a type that
does not exist in the schema, so NodeByLabelScan returned 0 rows.

LogicalNode now carries the labelDisjunction flag from NodePattern, and
IndexSelectionRule routes disjunction patterns to a new
NodeByLabelDisjunctionScan operator that iterates each VertexType once
and includes types that are instanceOf any disjunction label.

Closes #4211
robfrank force-pushed the fix/4211-cypher-label-disjunction branch from feccdee to b0b8d6e on May 13, 2026 07:13
@claude

claude Bot commented May 13, 2026

Code Review

Overview

This PR adds optimizer-path support for Cypher label disjunction patterns ((n:A|B)) via a new NodeByLabelDisjunctionScan operator. The approach is architecturally correct: propagate the labelDisjunction flag from the AST (NodePattern) through the logical plan (LogicalNode) and route it to the new physical operator in IndexSelectionRule.


Critical: New operator is dead code - planner gate not updated

CypherExecutionPlanner.shouldUseOptimizer() at line 244 unconditionally falls back to the legacy path for all multi-label nodes:

// Multi-label nodes not yet supported in optimizer
if (node.getLabels().size() > 1)
    return false;

This check does not distinguish between conjunction (n:A:B) and disjunction (n:A|B). As a result, NodeByLabelDisjunctionScan is never invoked - every (n:A|B) query falls through to the legacy MatchNodeStep, which already handles disjunctions correctly at lines 456-486.

The tests pass, but they exercise the legacy path - not the new operator. Codecov confirms this: 0% coverage on NodeByLabelDisjunctionScan.java (62 lines uncovered).

The planner gate needs to be updated to allow disjunctions through:

// Multi-label conjunction (n:A:B) not yet supported in optimizer.
// Disjunction (n:A|B) is handled by NodeByLabelDisjunctionScan.
if (node.getLabels().size() > 1 && !node.isLabelDisjunction())
    return false;

Without this change, the PR's stated root cause (optimizer calling getCompositeTypeName(["A","B"]) → "A~B") does not describe the actual failing path, since the optimizer was never invoked for multi-label nodes.


Test coverage gaps

  • Inheritance/polymorphism with disjunction: If Dog extends Animal, does MATCH (n:Animal|Pet) return Dog nodes? The buildMatchingIterators() loop uses type.instanceOf(label) which correctly traverses the type hierarchy, but this is untested. Without a test, a future refactor could silently break it.

  • Disjunction in non-anchor position: MATCH (a:X)-[:E]->(n:A|B) is not tested.

  • No test actually exercises NodeByLabelDisjunctionScan: because of the planner gate issue above, all current tests go through MatchNodeStep.


Code quality notes

NodeByLabelDisjunctionScan.java

  • The iterateType(type.getName(), false) choice (non-polymorphic) is correct. Because the outer loop already visits every schema type including subtypes (via instanceOf()), using polymorphic=true would produce duplicates - subtype records would appear once from the parent bucket and once from the subtype bucket. Good call.

  • The close() no-op is fine since these iterators hold no external resources, but a one-line comment saying why would prevent future maintainers from guessing.

  • Minor naming: inside the anonymous ResultSet, iteratorQueue is actually an Iterator<Iterator<Identifiable>>, not a queue. A name like typeIteratorCursor or labelIteratorCursor would be clearer.

  • The operator does not support inline WHERE pre-filtering (unlike NodeByLabelScan which accepts a whereFilter). Once the planner gate is fixed and the optimizer handles disjunctions, a WHERE clause like MATCH (n:A|B) WHERE n.id > 1 would need the filter applied as a separate step. This is functionally correct but worth documenting.

IndexSelectionRule.java

The three comments that were removed ("// FULL SCAN", "// INDEX SEEK (equality)", "// RANGE SCAN - pass predicates for runtime parameter resolution") served as navigation markers for the branching logic. Consider keeping them since the code is routing between meaningfully different physical strategies.

LogicalNode.java

The two-constructor approach (3-arg delegating to 4-arg) is clean and backward-compatible. Using final on labelDisjunction is consistent with the rest of the class.


Summary

Severity | Issue
Critical | CypherExecutionPlanner.java:244 still excludes all multi-label nodes - NodeByLabelDisjunctionScan is never reached
High | 0% test coverage on the new operator (confirmed by Codecov) because the planner gate prevents it from being invoked
Medium | No test for polymorphic inheritance with disjunction (e.g., subtype records via `MATCH (n:Animal
Low | Removed routing comments from IndexSelectionRule.java reduce navigability
Low | iteratorQueue variable name is misleading in the anonymous ResultSet

The architecture and operator implementation are sound. The missing piece is wiring the planner to allow disjunctions through to the optimizer, and adding a test that actually exercises the new operator end-to-end.

Code review revealed the new operator was dead code: the planner gate at
CypherExecutionPlanner.shouldUseOptimizer falls back to the legacy path
for every multi-label node, so NodeByLabelDisjunctionScan was never
invoked. The disjunction tests passed via the legacy MatchNodeStep
(already fixed for issue #4105).

The gate now allows disjunction patterns through to the optimizer when
the node has no incident relationships, since ExpandAll/ExpandInto carry
a single targetLabel and cannot represent OR semantics in non-anchor
position.

Adds a subtype/inheritance regression test (Animal|Pet matches A:Dog
node via instanceOf hierarchy traversal). Renames the internal cursor
variable for clarity.
@robfrank
Collaborator Author

The critical finding is correct. Verified with a probe (throw from NodeByLabelDisjunctionScan.execute()): the original tests passed without the operator ever being invoked - the legacy MatchNodeStep was handling the disjunction (already fixed for issue #4105 in commit d621269).

Fixed in 4845bd6:

  • CypherExecutionPlanner.shouldUseOptimizer now lets disjunction nodes through to the optimizer when the path has no incident relationships. Target-side disjunction ((a)-[:R]->(n:A|B)) stays on the legacy path because ExpandAll/ExpandInto carry a single targetLabel and can't represent OR semantics. Documented in the gate comment.
  • Probe-verified: 8/9 tests now invoke NodeByLabelDisjunctionScan.execute (the single-label sanity test correctly does not).
  • Added the subtype/inheritance regression test ((:Animal:Dog) and (:Pet) both match MATCH (n:Animal|Pet) via instanceOf hierarchy).
  • Renamed iteratorQueue → typeIteratorCursor.
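
The gate behavior described in the first bullet can be sketched as a standalone predicate. The `Node` class and the `relationshipCount` parameter are hypothetical stand-ins for the real pattern classes; the actual `CypherExecutionPlanner.shouldUseOptimizer` checks more conditions than shown here.

```java
import java.util.List;

final class PlannerGateSketch {
  // Hypothetical stand-in for a node pattern: its labels and whether they are OR-ed.
  static final class Node {
    final List<String> labels;
    final boolean labelDisjunction;

    Node(List<String> labels, boolean labelDisjunction) {
      this.labels = labels;
      this.labelDisjunction = labelDisjunction;
    }
  }

  // Returns true when the path may go to the optimizer, false for the legacy path.
  static boolean shouldUseOptimizer(List<Node> nodes, int relationshipCount) {
    for (Node node : nodes) {
      if (node.labels.size() <= 1)
        continue; // zero or one label: no multi-label restriction applies
      if (!node.labelDisjunction)
        return false; // conjunction (n:A:B) stays on the legacy path
      if (relationshipCount > 0)
        return false; // disjunction with incident relationships stays on the legacy path
    }
    return true;
  }
}
```

Under these assumptions a lone `(n:A|B)` reaches the optimizer, while `(n:A:B)` and `(n:A|B)-[:REL]->(m)` both fall back.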

Skipped the target-side disjunction test: it exposes a pre-existing gap in ExpandAll.targetLabel that's outside the scope of #4211 and would also fail on legacy. Worth a separate issue.

Skipped restoring the // FULL SCAN / // INDEX SEEK markers in IndexSelectionRule: per CLAUDE.md, comments should explain WHY, not WHAT - the branching on anchor.useIndex()/anchor.isRangeScan() is already self-describing.

The commit message on the original commit (b0b8d6e) overstated the root cause; the new commit explains the actual wiring.

@claude

claude Bot commented May 13, 2026

Code Review - PR #4221: fix: [cypher] label disjunction (n:A|B) returned 0 rows

Overview

This PR fixes a real, reproducible bug: MATCH (n:A|B) returning 0 rows when nodes exist. The root cause is clear and well-diagnosed - the optimizer was calling Labels.getCompositeTypeName(["A","B"]) to produce "A~B", which never exists as a schema type, causing NodeByLabelScan to short-circuit on existsType("A~B") == false. The fix is architecturally sound: propagate the labelDisjunction flag from AST to the logical plan, then route disjunction anchors to a new physical operator that walks each matching VertexType individually.

Overall this is a clean, well-structured fix. A few things worth addressing before merging:


Issues

1. Missing WHERE predicate pushdown (moderate)

NodeByLabelScan accepts an optional BooleanExpression whereFilter that is evaluated inline during scanning (predicate pushdown). NodeByLabelDisjunctionScan has no equivalent. For queries like:

MATCH (n:A|B) WHERE n.status = 'active'

all matching nodes must be fully materialized before the filter step runs, rather than filtering during the scan. For large datasets this will be noticeably slower than the single-label case. At minimum this should be documented as a known limitation, ideally with a follow-up ticket.

2. buildMatchingIterators opens all type iterators eagerly

if (typeIteratorCursor == null)
    typeIteratorCursor = buildMatchingIterators(context).iterator();

buildMatchingIterators calls context.getDatabase().iterateType(...) for every matching VertexType immediately. For schemas with many subtypes, this opens many iterators before any records are consumed. A lazy approach - opening each type's iterator only when the previous one is exhausted - would reduce resource contention on large schemas.
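
One way the lazy approach could look is sketched below. `LazyChainedIterator` is a hypothetical helper, not part of the PR or of ArcadeDB; each `Supplier` stands in for a deferred `iterateType(...)` call.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

final class LazyChainedIterator<T> implements Iterator<T> {
  private final Iterator<Supplier<Iterator<T>>> sources;
  private Iterator<T> current = null;

  LazyChainedIterator(List<Supplier<Iterator<T>>> sources) {
    this.sources = sources.iterator();
  }

  @Override
  public boolean hasNext() {
    // Advance to the next source only when the current one is exhausted.
    while (current == null || !current.hasNext()) {
      if (!sources.hasNext())
        return false;
      current = sources.next().get(); // the underlying iterator is opened only here
    }
    return true;
  }

  @Override
  public T next() {
    if (!hasNext())
      throw new NoSuchElementException();
    return current.next();
  }
}
```

The trade-off is an extra level of indirection per type; whether that pays off depends on how expensive opening an iterator actually is in the engine.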

3. getLabels() returns the mutable internal list

public List<String> getLabels() {
    return labels;
}

This exposes the internal List<String>, allowing callers to mutate operator state. Should return Collections.unmodifiableList(labels). (Same concern exists in NodeByLabelScan for getLabel(), but that returns an immutable String.)

4. Test gap - relationship pattern fallback

The shouldUseOptimizer() guard falls back to the legacy path when path.getRelationshipCount() > 0. There is no test verifying that MATCH (n:A|B)-[:REL]->(m) returns correct results (rather than 0 rows) via the legacy path. This is the most common real-world pattern for label disjunction and should have coverage.


Minor observations

  • Comment removal in IndexSelectionRule: The PR removes the // RANGE SCAN, // INDEX SEEK (equality), and // FULL SCAN inline comments. These were useful orientation markers. Not a blocker, but reconsider keeping them.

  • iterateType(type.getName(), false) vs true: NodeByLabelScan uses true (include subtypes). The new operator uses false (direct type only) but covers subtypes by explicitly iterating each subtype found via type.instanceOf(label). This is correct and avoids duplicates - the break on first label match ensures each type is added at most once. The logic is sound but subtle; a brief comment would help the next reader.

  • Single-label disjunction guard: The condition labels.size() > 1 in createAnchorOperator means a parser-produced single-label disjunction (impossible today, but defensive) would silently fall through to NodeByLabelScan. This is fine but worth noting.


Test coverage

Eight tests covering the core cases (A|B, A|C, B|C, A|B|C, count, WHERE filter, subtypes) are solid. The regression test suite run described in the PR description is thorough. The main gap is the relationship-pattern fallback case noted above.


Summary

The fix is correct, the operator pattern follows existing conventions, and test coverage is good. The two things I'd want addressed before merge are: (1) a test for the relationship-pattern fallback path, and (2) a follow-up ticket (or at least a // TODO comment) for WHERE predicate pushdown in NodeByLabelDisjunctionScan.

Adds regression test for MATCH (n:A|B)-[:REL]->(m) routing through the
legacy MatchNodeStep (planner gate falls back when relationships are
incident on a disjunction node). Documents the polymorphic=false choice
in NodeByLabelDisjunctionScan.buildMatchingIterators.
@robfrank
Collaborator Author

Per-item evaluation:

#1 WHERE predicate pushdown - Skipped. Pushdown is a perf optimization, not a correctness gap, and the existing tests show WHERE works as a separate filter step. Worth a follow-up issue but out of scope for fixing #4211.

#2 Eager iterator construction - Skipped. LocalDatabase.iterateType just wraps each bucket's iterator in a MultiIterator under the read lock; the page reads themselves stay lazy inside Bucket.iterator(). Eager wrapper construction is cheap. Lazy type-iterator opening would be a micro-optimization with minimal practical benefit (typical disjunctions are 2-3 labels).

#3 getLabels() mutable list - Skipped per CLAUDE.md guidance against defensive validation for scenarios that can't happen. The getter is only consumed within the engine; no caller mutates it.

#4 Relationship-pattern fallback test - Added in 159204b: labelDisjunctionAnchorWithRelationshipFallsBackToLegacy exercises MATCH (n:A|B)-[:REL]->(m) and verifies the legacy MatchNodeStep returns the expected rows.

Minor: polymorphic=false reasoning - Added a Javadoc explaining why iterateType(..., false) is correct here (outer schema-types loop already enumerates every subtype; using true would duplicate records that match through both their own type and a parent type).
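
The duplication argument can be reproduced with a toy schema. Everything in this sketch is hypothetical (an in-memory parent map and string records); it only mimics the shape of the outer schema-types loop and of a polymorphic versus non-polymorphic `iterateType`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DisjunctionScanSketch {
  private final Map<String, String> parentOf = new LinkedHashMap<>();      // type -> parent (null for roots)
  private final Map<String, List<String>> records = new LinkedHashMap<>(); // type -> records stored directly

  void addType(String name, String parent, String... rows) {
    parentOf.put(name, parent);
    records.put(name, new ArrayList<>(Arrays.asList(rows)));
  }

  // Walks the parent chain, so a type is an instance of itself and of every ancestor.
  boolean instanceOf(String type, String label) {
    for (String t = type; t != null; t = parentOf.get(t))
      if (t.equals(label))
        return true;
    return false;
  }

  // polymorphic=true also returns the records of every subtype.
  List<String> iterateType(String type, boolean polymorphic) {
    List<String> out = new ArrayList<>(records.get(type));
    if (polymorphic)
      for (String candidate : parentOf.keySet())
        if (!candidate.equals(type) && instanceOf(candidate, type))
          out.addAll(records.get(candidate));
    return out;
  }

  // Outer loop over every schema type; each matching type contributes at most once.
  List<String> scan(List<String> labels, boolean polymorphic) {
    List<String> out = new ArrayList<>();
    for (String type : parentOf.keySet())
      for (String label : labels)
        if (instanceOf(type, label)) {
          out.addAll(iterateType(type, polymorphic));
          break;
        }
    return out;
  }
}
```

With `Dog` as a child of `Animal`, scanning `Animal|Pet` non-polymorphically returns each record once, whereas the polymorphic variant returns the `Dog` record twice: once through `Animal` and once through `Dog` itself.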

Minor: removed inline comments in IndexSelectionRule - Keeping them removed per CLAUDE.md: comments should state WHY, not WHAT; the branching on anchor.useIndex()/anchor.isRangeScan() is already self-describing.

@claude

claude Bot commented May 13, 2026

Code Review - fix: [cypher] label disjunction (n:A|B) returned 0 rows

Overview

Good, targeted fix for a clear bug. The root cause (optimizer collapsing A|B labels into the composite AND-type name A~B) is correctly diagnosed, and the solution introduces a dedicated NodeByLabelDisjunctionScan operator rather than hacking around it in the existing scan path. The fallback to the legacy MatchNodeStep path for patterns with relationships is a safe conservative choice.


Code Quality

Strengths

  • buildMatchingIterators correctly passes polymorphic=false to iterateType - the outer loop already enumerates every subtype, so using true would produce duplicate rows. The Javadoc explains this well.
  • LogicalNode is kept immutable; the flag is threaded cleanly from NodePattern → LogicalNode → IndexSelectionRule.
  • explain() output follows the existing pattern in NodeByLabelScan.
  • Good inline comments in shouldUseOptimizer explaining why disjunction with relationships falls back.

Issues

1. getLabels() exposes the internal mutable list

// NodeByLabelDisjunctionScan.java
public List<String> getLabels() {
    return labels;
}

Callers can modify the internal labels field. Either wrap in Collections.unmodifiableList(labels) in the constructor or return List.copyOf(labels) from the getter, consistent with how other immutable value objects are written in this codebase.
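
A minimal sketch of the constructor-side variant, using a hypothetical holder class rather than the real operator. `List.copyOf` (Java 10+) takes an immutable snapshot, so the getter can hand out the field directly.

```java
import java.util.List;

final class LabelsHolderSketch {
  private final List<String> labels;

  LabelsHolderSketch(List<String> labels) {
    // Immutable snapshot: later changes to the caller's list are not visible here,
    // and attempts to mutate the returned list throw UnsupportedOperationException.
    this.labels = List.copyOf(labels);
  }

  List<String> getLabels() {
    return labels;
  }
}
```

Copying once in the constructor avoids allocating a new list on every `getLabels()` call.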

2. No predicate pushdown support

NodeByLabelScan accepts an optional whereFilter (BooleanExpression) so that WHERE predicates can be evaluated inline during the scan, avoiding a separate FilterOperator pass. NodeByLabelDisjunctionScan has no such facility. The test labelDisjunctionWithPropertyFilter passes today because FilterOperator sits above the scan, but it means every row is materialised before being filtered. For large datasets with selective WHERE clauses on (n:A|B) patterns, this is a measurable regression vs. the single-label path.

This does not block the PR, but is worth a TODO comment or follow-up issue.

3. Duplicate-label edge case produces duplicate rows

buildMatchingIterators iterates each matching type and breaks on the first label match:

for (final String label : labels) {
    if (type.instanceOf(label)) {
        iterators.add(iter);
        break;          // correct: avoids duplicate for a type matching multiple labels
    }
}

The break correctly avoids adding the same type twice when a single type matches multiple disjunction labels (e.g., a subtype of both A and B). However, if the labels list itself contains duplicates - (n:A|A) - no deduplication happens on the label list and the type is added once per unique label it matches. The grammar/parser should prevent this, but a defensive Set<String> for labels in the constructor would make the operator robust.

4. Overly conservative optimizer fallback

if (path.getRelationshipCount() > 0)
    return false;

This applies to the whole path whenever any node in the path has a multi-label disjunction. For the pattern (n:A|B)-[:REL]->(m:C) the anchor n:A|B could use NodeByLabelDisjunctionScan followed by a normal ExpandAll to m:C, since the expansion target has a single label and does not need OR semantics. The restriction is safe and the PR description explains it clearly, but it means this common pattern still uses the legacy path. A follow-up issue to relax the check for anchor-side disjunction only would be worthwhile.


Test Coverage

  • 8 regression tests cover the primary cases (A|B, A|C, B|C, A|B|C, count, single-label sanity, WHERE filter, relationship fallback). Good breadth.
  • The test labelDisjunctionMatchesSubtypeInstances creates (:Animal:Dog {id:10}) (conjunction/AND) and then queries (n:Animal|Pet). The test intent is to verify instanceOf semantics in the disjunction scan. However, it is unclear whether the composite type Animal~Dog actually has instanceOf("Animal") return true in ArcadeDB without an explicit inheritance declaration. If the composite type is created ad-hoc (no EXTENDS Animal in the schema), the test may be relying on implicit behaviour that could change. Consider replacing with an explicit schema setup:
database.command("sql", "CREATE VERTEX TYPE Animal");
database.command("sql", "CREATE VERTEX TYPE Dog EXTENDS Animal");
database.command("opencypher", "CREATE (:Dog {id: 10})");
  • No test covers the case where a disjunction label does not exist in the schema at all ((n:Nonexistent|B)). The scan should return only nodes of type B - worth adding a short test.
  • No test covers (n:A|B) where a subtype of A also extends B - the break path in buildMatchingIterators prevents double-counting, but this is worth an explicit test.

Minor Nits

  • Unused import: java.util.Locale is imported but only used in explain() via String.format(Locale.US, ...) - that's fine, just making sure it stays if the format line is ever refactored.
  • The removed // RANGE SCAN, // INDEX SEEK (equality), and // FULL SCAN comments in IndexSelectionRule were useful orientation markers. Their removal is fine since the section now has a longer block comment above each branch, but the disjunction block itself could use one line noting why index seeks are skipped (e.g., // Index seek is only valid for single-label anchors; disjunction requires per-type scan).

Summary

The fix is correct and the approach is sound. The main actionable items before merge are:

  1. Wrap getLabels() return in an unmodifiable view (defensive).
  2. Consider adding a test for a non-existent label in the disjunction.
  3. Clarify the labelDisjunctionMatchesSubtypeInstances test setup so the inheritance relationship is explicit.

Items 2 (predicate pushdown) and 4 (relaxing the relationship fallback) are good follow-up issues but do not block correctness.

Documents the deliberate scope decisions raised in code review: no
inline WHERE pushdown, eager iterator construction is cheap due to
MultiIterator lazy page reads, and getLabels does not defensively copy.
@claude

claude Bot commented May 13, 2026

Code Review

Overview

This PR correctly fixes a long-standing bug where MATCH (n:A|B) returned 0 rows. The root cause - the optimizer folding A|B labels into the composite conjunction name A~B then failing the existsType check - is cleanly addressed. The fix is minimal and well-scoped: a new labelDisjunction flag propagated from NodePattern to LogicalNode, a new NodeByLabelDisjunctionScan physical operator, and gating in shouldUseOptimizer().


Strengths

  • Correct deduplication: Using polymorphic=false in buildMatchingIterators and breaking after the first matching label means each concrete type's iterator is added exactly once - no duplicate rows when a subtype satisfies multiple disjunction labels.
  • Backward-compatible: The constructor chain in LogicalNode defaults labelDisjunction=false, so no existing callers change behavior.
  • Consistent style: The ResultSet anonymous-class structure, the double cast (Iterator<Identifiable>) (Object), and the empty close() all mirror NodeByLabelScan. Good.
  • Relationship fallback guarded correctly: The path.getRelationshipCount() > 0 guard in shouldUseOptimizer() prevents routing disjunction anchors through ExpandAll, which cannot represent OR semantics on the target side.
  • Good test coverage: 9 regression tests covering the main cross products of labels, count variants, WHERE pushdown, and the relationship-fallback path.

Issues

1. fetchMore does not eagerly set finished when the last iterator empties during a full batch

In NodeByLabelScan.fetchMore, after filling the buffer the code does:

if (!iterator.hasNext()) {
  finished = true;
}

NodeByLabelDisjunctionScan only sets finished = true inside the !typeIteratorCursor.hasNext() branch, so if the very last type iterator is drained exactly when buffer.size() == n, the finished flag stays false. The next hasNext() call triggers a redundant fetchMore that immediately returns an empty buffer and sets finished. This is one extra method call and a buffer clear per query that reads to the end - minor, but worth aligning with NodeByLabelScan's pattern.

2. labelDisjunctionMatchesSubtypeInstances does not actually test instanceOf with a schema hierarchy

The test creates (:Animal:Dog {id: 10}) - that is a multi-label conjunction vertex, not a subtype relationship. instanceOf("Animal") returns true here because the vertex directly has that label, not because Dog extends Animal. The polymorphic=false + outer-loop deduplication logic for a real subtype hierarchy (where DogType.instanceOf("Animal") is true via the schema parent chain) is never exercised. Consider adding a variant that uses database.getSchema().createVertexType("Dog").addSuperType("Animal") to verify the inheritance path.

3. Cost estimation for disjunction is inaccurate

estimatedCardinality passed to NodeByLabelDisjunctionScan comes from anchor.getEstimatedCardinality(), which is computed for a single-label/conjunction scan. For (n:A|B) the true upper bound is |A| + |B|. This matters when the optimizer compares the disjunction anchor against other candidate anchors in a join: an underestimated cardinality might cause the optimizer to prefer the disjunction side as the driving node when it should be the probing side. Worth tracking as a known limitation even if not fixed here.
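
As a rough illustration of the |A| + |B| bound, the sketch below sums per-label counts. The `countByLabel` map is a hypothetical stand-in for whatever statistics the optimizer has; this is not how ArcadeDB currently computes `estimatedCardinality`.

```java
import java.util.List;
import java.util.Map;

final class DisjunctionCardinalitySketch {
  // Upper bound for a disjunction scan: the sum of the per-label counts.
  // Labels unknown to the statistics contribute 0.
  static long upperBound(Map<String, Long> countByLabel, List<String> labels) {
    long total = 0;
    for (String label : labels)
      total += countByLabel.getOrDefault(label, 0L);
    return total;
  }
}
```

The sum is only an upper bound: records whose type matches more than one label are counted once per label here, but the scan returns them once.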

4. No WHERE pushdown for NodeByLabelDisjunctionScan

The Javadoc already acknowledges this, so it's clearly a known gap. It just means that MATCH (n:A|B) WHERE n.active = true performs a full disjunction scan and then a separate filter step, rather than pushing the filter into the scan. Fine to leave as a follow-up, but worth calling out in the PR description for tracking.


Minor / Style

  • IndexSelectionRule removes the // RANGE SCAN, // INDEX SEEK (equality), // FULL SCAN inline section labels. CLAUDE.md guidance says to omit comments that explain what the code does, so these removals are fine - just worth confirming the intent was cleanup rather than accidental deletion.
  • getLabels() returns the internal list without a defensive copy (documented in the comment). This is acceptable, but callers must treat the returned list as read-only.

Summary

The fix is correct and the test suite is solid. Two things worth addressing before merging:

  1. Align the finished flag behavior in fetchMore with NodeByLabelScan (trivial).
  2. Add a test that exercises instanceOf through a real schema subtype hierarchy, not just a multi-label conjunction vertex.

The cost estimation and WHERE pushdown gaps are reasonable deferred items.

- fetchMore now sets finished eagerly when the last type iterator drains
  in the same call that fills the buffer, matching NodeByLabelScan's
  pattern and avoiding one redundant fetchMore round-trip per query.
- Adds a regression test using a real schema parent chain (Dog extends
  Animal via addSuperType) to exercise type.instanceOf walking the
  schema hierarchy, distinct from the existing multi-label conjunction
  test (:Animal:Dog).
- Documents the cost-estimation gap for disjunctions as a known
  limitation (estimatedCardinality reflects only the first label).
@robfrank
Collaborator Author

Per-item evaluation (4e632de):

#1 fetchMore finished-flag eagerness - Fixed. After the fill loop, the operator now checks (currentIterator drained) && (typeIteratorCursor drained) and sets finished = true in the same call, mirroring NodeByLabelScan's post-loop check. Eliminates one redundant fetchMore round-trip per full-read query.

#2 Subtype-hierarchy coverage - Fixed. Added labelDisjunctionMatchesSchemaSubtype: creates Dog as a child of Animal via schema.getOrCreateVertexType("Dog").addSuperType("Animal"), then verifies MATCH (n:Animal|Pet) returns Dog records. This exercises the type.instanceOf(label) walk through the parent chain, which the previous (:Animal:Dog) multi-label test could not isolate.

#3 Cost estimation gap - Documented in the operator Javadoc as a known limitation: estimatedCardinality reflects the first-label cost only, so join-order ranking with disjunction anchors can be biased. Worth a follow-up issue.

#4 WHERE pushdown - Already documented in the operator Javadoc; flagged in PR description for follow-up.

@codacy-production

codacy-production Bot commented May 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 29 complexity

| Metric | Results |
| --- | --- |
| Complexity | 29 |

🟢 Coverage 78.75% diff coverage · -7.11% coverage variation

| Metric | Results |
| --- | --- |
| Coverage variation | -7.11% |
| Diff coverage | 78.75% |

Coverage variation details
| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (89e0d16) | 126818 | 93429 | 73.67% |
| Head commit (4e632de) | 158572 (+31754) | 105542 (+12113) | 66.56% (-7.11%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#4221) | 80 | 63 | 78.75% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@claude

claude Bot commented May 13, 2026

Code Review: PR #4221 - fix: [cypher] label disjunction (n:A|B) returned 0 rows

Summary

This PR fixes issue #4211 where MATCH (n:A|B) returned 0 rows via the optimizer path. The root cause was IndexSelectionRule.createAnchorOperator() calling Labels.getCompositeTypeName(["A","B"]) which returns "A~B" - a composite type that doesn't exist in the schema - causing NodeByLabelScan to short-circuit on existsType("A~B") == false.

The fix introduces a new NodeByLabelDisjunctionScan operator and routes simple multi-label disjunction patterns to it in the optimizer planner.
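
To make the root cause concrete, here is a small sketch assuming composite names are formed by joining labels with "~", as the "A~B" example above suggests. `CompositeLookupSketch` and its methods are hypothetical and only model the lookup; they are not the real `Labels.getCompositeTypeName` or `existsType` implementations.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the two lookups; not ArcadeDB code.
final class CompositeLookupSketch {
  // Assumption: mirrors the "A~B" naming described above.
  static String compositeTypeName(final List<String> labels) {
    return String.join("~", labels);
  }

  // AND semantics: one lookup of the composite type name.
  static boolean conjunctionTypeExists(final Set<String> schemaTypes, final List<String> labels) {
    return schemaTypes.contains(compositeTypeName(labels));
  }

  // OR semantics: it is enough for any single label to exist as a type.
  static boolean disjunctionTypeExists(final Set<String> schemaTypes, final List<String> labels) {
    for (final String label : labels)
      if (schemaTypes.contains(label))
        return true;
    return false;
  }

  public static void main(final String[] args) {
    final Set<String> schemaTypes = Set.of("A", "B");
    final List<String> labels = List.of("A", "B");
    System.out.println(conjunctionTypeExists(schemaTypes, labels)); // false: no "A~B" type
    System.out.println(disjunctionTypeExists(schemaTypes, labels)); // true
  }
}
```

With a schema holding only A and B, the composite lookup fails even though both labels exist, which is the short-circuit that produced 0 rows.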


What Works Well

  • Correct root cause analysis - The composite-type lookup bug is well identified and the fix is targeted.
  • Correct scope limitation - The planner correctly falls back to the legacy path when a disjunction node has incident relationships (path.getRelationshipCount() > 0), since ExpandAll/ExpandInto carry a single targetLabel and cannot represent OR semantics.
  • Schema inheritance - buildMatchingIterators uses type.instanceOf(label) (not simple name equality), so composite types like Animal~Dog correctly match a query for Animal.
  • polymorphic=false choice - The outer schema loop already enumerates all subtypes; using polymorphic=true would produce duplicates for types matching through both their own name and a parent name.
  • Buffer management - fetchMore() correctly handles type boundary crossings, exhausted iterators, and the finished flag.
  • Thorough Javadoc - NodeByLabelDisjunctionScan explicitly documents known limitations (cardinality underestimation, no inline WHERE pushdown) as follow-up items.
  • Good test variety - Tests cover two-label, three-label, non-exhaustive, schema inheritance, multi-label composite type, relationship fallback, single-label sanity, and WHERE filter cases.
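
The instanceOf and polymorphic=false points can be illustrated with a toy schema. This is a sketch under assumptions: `DisjunctionTypeMatchSketch` is a hypothetical model in which each type lists its direct parents, and it stands in for the schema loop rather than reproducing buildMatchingIterators.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema model: type name -> direct parent type names.
final class DisjunctionTypeMatchSketch {
  private final Map<String, List<String>> parents = new LinkedHashMap<>();

  void addType(final String name, final String... superTypes) {
    parents.put(name, List.of(superTypes));
  }

  // True when the type is the label itself or inherits from it through the parent chain.
  boolean instanceOf(final String type, final String label) {
    if (type.equals(label))
      return true;
    for (final String parent : parents.getOrDefault(type, List.of()))
      if (instanceOf(parent, label))
        return true;
    return false;
  }

  // Visits every type once; a type matching several labels is still listed a single time.
  List<String> matchingTypes(final List<String> labels) {
    final List<String> result = new ArrayList<>();
    for (final String type : parents.keySet())
      for (final String label : labels)
        if (instanceOf(type, label)) {
          result.add(type);
          break;
        }
    return result;
  }

  public static void main(final String[] args) {
    final DisjunctionTypeMatchSketch schema = new DisjunctionTypeMatchSketch();
    schema.addType("Animal");
    schema.addType("Dog", "Animal");
    schema.addType("Pet");
    schema.addType("Rock");
    System.out.println(schema.matchingTypes(List.of("Animal", "Pet")));
  }
}
```

Because the outer loop already lists Dog separately from Animal, scanning each listed type non-polymorphically returns Dog records once; a polymorphic scan of Animal would return them a second time.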

Issues

Major

M1: Cardinality underestimation in AnchorSelector for disjunction anchors

AnchorSelector.evaluateNode() calls node.getFirstLabel() and queries statistics only for the first label. For (n:A|B), if A has 10 records and B has 1000 records, the estimated cardinality is 10 but the actual row count is ~1010. This causes incorrect join-order decisions when a disjunction node competes against other anchors in complex queries. While this is acknowledged in the Javadoc as a known limitation, it should be tracked as a follow-up GitHub issue so that suboptimal plans are not produced silently.
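
A rough sketch of the gap, using the numbers from the example above. `DisjunctionCardinalitySketch` and the per-label count map are hypothetical; the real statistics lookup in AnchorSelector is not shown in this PR discussion, and the sum is only an upper bound when types overlap.

```java
import java.util.List;
import java.util.Map;

// Hypothetical estimator comparison; not the AnchorSelector implementation.
final class DisjunctionCardinalitySketch {
  // Behaviour described in M1: only the first label's count is used.
  static long firstLabelEstimate(final Map<String, Long> countsByLabel, final List<String> labels) {
    return countsByLabel.getOrDefault(labels.get(0), 0L);
  }

  // Suggested follow-up: sum the counts of every label in the disjunction (an upper bound).
  static long disjunctionUpperBound(final Map<String, Long> countsByLabel, final List<String> labels) {
    long total = 0;
    for (final String label : labels)
      total += countsByLabel.getOrDefault(label, 0L);
    return total;
  }

  public static void main(final String[] args) {
    final Map<String, Long> counts = Map.of("A", 10L, "B", 1000L);
    final List<String> labels = List.of("A", "B");
    System.out.println(firstLabelEstimate(counts, labels));    // 10
    System.out.println(disjunctionUpperBound(counts, labels)); // 1010
  }
}
```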

M2: No test verifying the optimizer path is actually taken

Commit history shows the new operator was dead code in the first commit (the planner gate was not updated). The tests still passed because they were testing the legacy MatchNodeStep path. Consider adding a test that verifies NodeByLabelDisjunctionScan appears in the EXPLAIN output, which would catch regressions where the operator becomes dead code again:

@Test
public void testOptimizerPlanUsesDisjunctionScan() {
    // EXPLAIN returns the execution plan as a string
    try (ResultSet rs = database.query("cypher", "EXPLAIN MATCH (n:A|B) RETURN n")) {
        String plan = rs.next().getProperty("plan").toString();
        assertThat(plan).contains("NodeByLabelDisjunctionScan");
    }
}

Minor

m1: ResultSet is never closed in CypherLabelDisjunctionTest

None of the @Test methods call rs.close(). Other tests in the same package (e.g., CypherCountNonExistingLabelTest) explicitly close the result set. Since ResultSet implements AutoCloseable, prefer try-with-resources:

try (ResultSet rs = database.query("cypher", "MATCH (n:A|B) RETURN n")) {
    // assertions
}

m2: Queries run outside transactions

All queries in CypherLabelDisjunctionTest run outside a transaction, whereas existing tests in the same package wrap their queries in database.transaction(...).

m3: Stale Javadoc on CypherOptimizer.createAnchorOperator()

The @return tag still says "physical operator (NodeIndexSeek or NodeByLabelScan)", but the method can now also return NodeByLabelDisjunctionScan or NodeIndexRangeScan. Worth updating for future maintainers.

m4: Ordering assumption in labelDisjunctionWithPropertyFilter

assertThat(ids).containsExactly(2);

containsExactly implies ordering matters. Since there's only one result this works, but containsExactlyInAnyOrder(2) (or simply containsOnly(2)) would be more semantically precise for a query without ORDER BY.

m5: getLabels() returns internal list without defensive copy

NodeByLabelDisjunctionScan.getLabels() returns the labels field directly. The Javadoc notes this is intentional, but it is fragile for future callers. Since the list is small (typically 2-3 labels), a Collections.unmodifiableList() wrapper would add safety at negligible cost.
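
A sketch of the suggested wrapper, assuming the operator stores its labels in a plain List<String>. `LabelsHolderSketch` is a hypothetical stand-in for the operator; only `Collections.unmodifiableList` is a real JDK call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder standing in for the operator's labels field.
final class LabelsHolderSketch {
  private final List<String> labels;

  LabelsHolderSketch(final List<String> labels) {
    this.labels = new ArrayList<>(labels); // private copy owned by the holder
  }

  // Read-only view: no copying on each call, and callers cannot mutate internal state.
  List<String> getLabels() {
    return Collections.unmodifiableList(labels);
  }

  public static void main(final String[] args) {
    final LabelsHolderSketch holder = new LabelsHolderSketch(List.of("A", "B"));
    try {
      holder.getLabels().add("C");
    } catch (final UnsupportedOperationException e) {
      System.out.println("mutation rejected");
    }
    System.out.println(holder.getLabels());
  }
}
```

The wrapper is a view rather than a copy, so the cost per call is a single small allocation regardless of list size.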


Summary Recommendation

The bug fix is correct and the design is sound. The main actionable items are:

  1. File a follow-up issue for cardinality estimation (sum record counts across all disjunction labels, not just the first).
  2. Add an EXPLAIN-based test to guard against the operator silently becoming dead code again.
  3. Add rs.close() / try-with-resources in CypherLabelDisjunctionTest.
  4. Update the stale @return Javadoc on createAnchorOperator().

The PR is otherwise ready to merge - the fix is correct, the scope is appropriately narrow, and the test coverage is good for the functional cases.

@robfrank robfrank merged commit 5df3b6f into main May 13, 2026
22 of 24 checks passed
@robfrank robfrank added this to the 26.6.1 milestone May 13, 2026