
fix(opencypher): MATCH WHERE ID(n) = <expr> falls back to full scan when expr is dynamic #3865

Merged: lvca merged 2 commits into ArcadeData:main from ExtReMLapin:opt_match_expr on Apr 15, 2026



@ExtReMLapin (Contributor)

Fixes #3864

In short, in this query:

UNWIND $batch AS BatchEntry
MATCH (b:CHUNK) WHERE ID(b) = BatchEntry.destRID
CREATE (p:CHUNK_EMBEDDING {vector: BatchEntry.vector})
CREATE (p)-[:embb]->(b)  

The MATCH (b:CHUNK) WHERE ID(b) = BatchEntry.destRID clause performs a full scan of CHUNK for every batch entry instead of a direct lookup by RID.
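A rough cost model shows why this hurts in a batch: when the ID is not extracted, each UNWIND row re-reads the whole type, so work grows as rows × CHUNK count, while a direct lookup grows only with the number of rows. This is a toy in-memory sketch, not ArcadeDB code; the class ScanVsLookup and its methods are hypothetical, and a map keyed by RID string stands in for storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not ArcadeDB): RID string -> vertex payload.
final class ScanVsLookup {
  // Number of records examined; a stand-in for read cost.
  static long examined = 0;

  // Fallback path: read every CHUNK record and evaluate ID(b) = target on it.
  // A generic WHERE filter does not stop at the first hit.
  static List<String> matchByScan(Map<String, String> chunks, String targetRid) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : chunks.entrySet()) {
      examined++;
      if (e.getKey().equals(targetRid)) out.add(e.getValue());
    }
    return out;
  }

  // Optimized path: evaluate the dynamic expression, then fetch by RID.
  static List<String> matchById(Map<String, String> chunks, String targetRid) {
    examined++;
    String vertex = chunks.get(targetRid);
    return vertex == null ? List.of() : List.of(vertex);
  }

  // Simulates UNWIND $batch ... MATCH: one match per batch entry.
  static long runBatch(Map<String, String> chunks, List<String> destRids, boolean useLookup) {
    examined = 0;
    for (String rid : destRids) {
      if (useLookup) matchById(chunks, rid);
      else matchByScan(chunks, rid);
    }
    return examined;
  }
}
```

With 10 batch entries against 1,000 CHUNK records, the scan path examines 10,000 records and the lookup path examines 10.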

@ExtReMLapin (Contributor, Author)

Gemini wrote the patch and Claude checked it.


@gemini-code-assist (bot) left a review


Code Review

This pull request implements dynamic ID filtering within MatchNodeStep to optimize OpenCypher queries where the node ID is specified in a WHERE clause. It introduces logic to extract ID filters from boolean expressions and refactors the ExpressionEvaluator into a class member for reuse. Review feedback highlights a potential NullPointerException when a vertex is not found by its RID and suggests optimizing the AST traversal by pre-analyzing the filter expression.
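To make "extracting ID filters from boolean expressions" concrete, here is a minimal sketch over a hypothetical toy WHERE-clause AST. The records And, Or, Eq, IdOf, PropertyOf, Literal and the method findIdExpression are illustrative assumptions, not ArcadeDB's classes. The walk descends only into AND conjunctions: under OR the ID equality is not mandatory, so a direct lookup could drop rows that the other branch would match.

```java
import java.util.Optional;

final class IdFilterExtraction {
  // Hypothetical, minimal WHERE-clause AST; not ArcadeDB's classes.
  interface Expr {}
  record And(Expr left, Expr right) implements Expr {}
  record Or(Expr left, Expr right) implements Expr {}
  record Eq(Expr left, Expr right) implements Expr {}
  record IdOf(String variable) implements Expr {}
  record PropertyOf(String variable, String property) implements Expr {}
  record Literal(Object value) implements Expr {}

  // Returns the expression compared against ID(variable) when every row that
  // passes the WHERE clause must satisfy that equality; otherwise empty.
  static Optional<Expr> findIdExpression(Expr where, String variable) {
    if (where instanceof And conj) {
      Optional<Expr> left = findIdExpression(conj.left(), variable);
      return left.isPresent() ? left : findIdExpression(conj.right(), variable);
    }
    if (where instanceof Eq eq) {
      // Accept both ID(b) = expr and expr = ID(b).
      if (eq.left() instanceof IdOf l && l.variable().equals(variable)) return Optional.of(eq.right());
      if (eq.right() instanceof IdOf r && r.variable().equals(variable)) return Optional.of(eq.left());
    }
    // Or, literals, and other node kinds yield no mandatory ID filter.
    return Optional.empty();
  }
}
```

When an expression is found, the step would evaluate it against the current input row and fetch that single vertex; presumably the full WHERE predicate should still be applied to the fetched vertex, since other conjuncts may reject it.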

Comment on lines +309 to 312
final RID rid = new RID(context.getDatabase(), effectiveIdFilter);
final Identifiable vertex = context.getDatabase().lookupByRID(rid, true);
// Return single-element iterator for the matched vertex
return List.of(vertex).iterator();

Priority: high

The List.of(vertex) call will throw a NullPointerException if the RID is valid but the vertex does not exist in the database (i.e., lookupByRID returns null). Since List.of does not allow null elements, you should check for null and return an empty iterator instead.

        final RID rid = new RID(context.getDatabase(), effectiveIdFilter);
        final Identifiable vertex = context.getDatabase().lookupByRID(rid, true);
        if (vertex == null)
          return Collections.emptyIterator();
        // Return single-element iterator for the matched vertex
        return List.of(vertex).iterator();

// Check for dynamic ID filter from WHERE clause if static idFilter is not present
String effectiveIdFilter = this.idFilter;
if ((effectiveIdFilter == null || effectiveIdFilter.isEmpty()) && whereFilter != null) {
effectiveIdFilter = extractDynamicIdFilter(whereFilter, currentInputResult);

Priority: medium

The extractDynamicIdFilter method is called for every input row and performs a recursive traversal of the whereFilter AST. Since the structure of the whereFilter is constant for the duration of this execution step, this traversal is redundant.

It would be more efficient to pre-analyze the whereFilter once (e.g., in the constructor) to identify the Expression that provides the ID value, and then simply evaluate that expression here. Additionally, consider supporting elementId() in addition to id() for broader Cypher compatibility.
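A minimal sketch of the suggested pre-analysis, assuming a generic step where W is the WHERE-clause AST type and R is the input-row type. CachedIdFilterStep, ridFor, hasIdFilter and the analyzer function are hypothetical names, not MatchNodeStep's actual API. The recursive AST walk runs once in the constructor; each input row then only evaluates the cached expression.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: W = WHERE-clause AST type, R = input-row type.
final class CachedIdFilterStep<W, R> {
  // The ID-providing expression, located once when the step is built.
  private final Optional<Function<R, String>> idExpression;

  CachedIdFilterStep(W whereFilter, Function<W, Optional<Function<R, String>>> analyzer) {
    // The AST traversal happens here, a single time per step.
    this.idExpression = analyzer.apply(whereFilter);
  }

  // Per input row: no traversal, only evaluation of the cached expression.
  Optional<String> ridFor(R row) {
    return idExpression.map(expr -> expr.apply(row));
  }

  boolean hasIdFilter() {
    return idExpression.isPresent();
  }
}
```

Supporting elementId() as well, as the reviewer suggests, would only change what the analyzer recognizes, not the per-row path.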

@codacy-production (bot) commented Apr 15, 2026

Up to standards ✅

🟢 Issues: 0

Results: 0 new issues


@ExtReMLapin (Contributor, Author)

According to Claude (I have no tokens left to follow up), there may be other cases this PR does not fix, for example:

1. Inline property maps instead of a WHERE clause:

   UNWIND $batch AS BatchEntry
   MATCH (b:CHUNK {someKey: BatchEntry.value})  // inline props, not WHERE clause

2. MergeStep.findNode() / findAllNodes(), which always perform a full scan:

   MERGE (n:CHUNK {name: $value})
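Both remaining cases are property-equality matches rather than ID matches. One way to avoid the scan, assuming an equality index on the property exists, is to probe the index with one inline property and re-check the rest on the few candidates it returns. The sketch models vertices as property maps; PropertyMatch, scan, buildIndex and probe are hypothetical and say nothing about how ArcadeDB's MergeStep is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: vertices as property maps; not ArcadeDB's storage API.
final class PropertyMatch {
  // Full scan: every vertex is compared against all inline properties.
  static List<Map<String, Object>> scan(List<Map<String, Object>> vertices, Map<String, Object> inlineProps) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> v : vertices)
      if (v.entrySet().containsAll(inlineProps.entrySet())) out.add(v);
    return out;
  }

  // Assumed single-property equality index: value -> vertices holding it.
  static Map<Object, List<Map<String, Object>>> buildIndex(List<Map<String, Object>> vertices, String key) {
    Map<Object, List<Map<String, Object>>> index = new HashMap<>();
    for (Map<String, Object> v : vertices)
      if (v.containsKey(key)) index.computeIfAbsent(v.get(key), k -> new ArrayList<>()).add(v);
    return index;
  }

  // Index probe on one key, then the remaining inline properties are re-checked.
  static List<Map<String, Object>> probe(Map<Object, List<Map<String, Object>>> index, String key,
                                         Map<String, Object> inlineProps) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> v : index.getOrDefault(inlineProps.get(key), List.of()))
      if (v.entrySet().containsAll(inlineProps.entrySet())) out.add(v);
    return out;
  }
}
```

For MERGE, an empty probe result would be the signal to create the node; a non-empty one, to bind the existing matches.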

lvca self-requested a review on April 15, 2026 17:13
lvca added this to the 26.4.1 milestone on Apr 15, 2026
lvca merged commit eb0faca into ArcadeData:main on Apr 15, 2026
13 of 16 checks passed
@lvca (Member) commented Apr 15, 2026

It actually makes sense! Merged, thanks!! I'm going to write some test cases to avoid future regressions.

@codecov (bot) commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 38.82353% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.94%. Comparing base (973cb52) to head (801337d).
⚠️ Report is 5 commits behind head on main.

Files with missing lines | Patch % | Lines
...edb/query/opencypher/executor/steps/MergeStep.java | 16.66% | 32 Missing and 8 partials ⚠️
...query/opencypher/executor/steps/MatchNodeStep.java | 67.56% | 5 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3865      +/-   ##
==========================================
+ Coverage   64.66%   64.94%   +0.28%     
==========================================
  Files        1579     1579              
  Lines      116503   116618     +115     
  Branches    24707    24749      +42     
==========================================
+ Hits        75335    75742     +407     
+ Misses      30871    30504     -367     
- Partials    10297    10372      +75     



Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cypher Batch creation is slow with vector indexes

2 participants