Skip to content

fix: Phase 7 — pattern predicates, var-length exact, OPTIONAL MATCH correctness#376

Merged
DecisionNerd merged 6 commits into
mainfrom
fix/306-302-pattern-predicates-triadic
Apr 30, 2026
Merged

fix: Phase 7 — pattern predicates, var-length exact, OPTIONAL MATCH correctness#376
DecisionNerd merged 6 commits into
mainfrom
fix/306-302-pattern-predicates-triadic

Conversation

@DecisionNerd
Copy link
Copy Markdown
Owner

@DecisionNerd DecisionNerd commented Apr 30, 2026

Summary

  • Implements pattern predicates in WHERE clauses (WHERE (n)-[:R]->() as boolean existence checks) — fixes issues fix: filter node patterns in WHERE clause #306 and fix: triadic friend-of-friend pattern matching #302
  • Adds var_length_exact grammar support (*2 for fixed-count traversals)
  • Adds ApplyOptional operator for atomically left-joining multi-hop OPTIONAL MATCH patterns
  • Fixes _null_fill in optional expand to preserve pre-bound variables (triadic regressions)
  • Fixes var-length OPTIONAL MATCH bound-dst filtering and null propagation
  • Inlines dst label filters into OptionalExpandEdges.dst_labels to prevent post-filter from dropping null rows

TCK Impact

~+13 net tests fixed in pattern/optional/triadic categories. Zero new regressions introduced. All 4,257 unit + integration tests pass.

Known Deferred (tracked in #309)

  • var_length_optional_relationships: edge-tracking cycle detection needed (B→LOOP→B returns B once instead of twice)
  • optional_match_and_bound_nodes: unbound src OPTIONAL MATCH needs ApplyOptional wrapping
  • optionally_matching_relationship_with_a_relationship_that_is_already_bound: bound-edge lookup not yet implemented
  • EXISTS { bare_pattern } (no MATCH keyword): grammar alternative needed

Closes #306
Closes #302

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Pattern-based existence predicates in WHERE (node–relationship–node patterns)
    • Exact var-length hops (e.g., *2)
    • Apply-optional operator to run inner subplans per outer row
    • Enhanced OPTIONAL MATCH: improved null-filling, multi-hop handling, and destination-label filtering
  • Behavior / Validation

    • Stricter validation and scoping for pattern predicates in projections, SET, and RETURN
  • Tests

    • Normalization now treats plain Python lists as order-insensitive for comparisons

…orrectness (#306, #302)

- Add PatternPredicate AST node and grammar rule for WHERE (n)-[:R]->() as boolean existence checks
- Add var_length_exact (*N) grammar for fixed-count var-length patterns
- Add _evaluate_pattern_predicate and _var_length_reachable in evaluator for pattern predicate evaluation
- Add UndefinedVariable validation for named vars in pattern predicates
- Add UnexpectedSyntax validation blocking pattern predicates in RETURN/WITH/SET contexts
- Add ApplyOptional operator for atomic multi-hop OPTIONAL MATCH left-join semantics
- Add optional flag to ExpandVariableLength for null-src propagation in OPTIONAL context
- Add dst_labels field to OptionalExpandEdges to inline label filter (prevents post-filter from dropping null rows)
- Fix _null_fill to preserve pre-bound variables (triadic friend-of-friend regression fix)
- Fix _execute_variable_expand to respect bound dst_var (only yield paths to specific target node)
- Fix var-length OPTIONAL MATCH to null-fill when no paths found (not drop the row)
- Fix conftest raw Python list handling in comparable value helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

Walkthrough

Adds a PatternPredicate AST node and end-to-end support: parser + transformer, evaluator traversal (including var-length BFS, direction/type/label checks), planner/operator changes for OPTIONAL semantics (ApplyOptional, null-filling), and executor adaptations for optional/variable-length expansion and ApplyOptional execution.

Changes

Cohort / File(s) Summary
AST Model
src/graphforge/ast/expression.py
Adds immutable PatternPredicate(parts: list[Any]) AST node representing alternating node/relationship pattern segments.
Parser Grammar & Transformer
src/graphforge/parser/cypher.lark, src/graphforge/parser/parser.py
Grammar accepts pattern_predicate and exact var-length *N; transformer builds PatternPredicate and maps *N to equal min/max hop range.
Evaluator
src/graphforge/executor/evaluator.py
Adds evaluation for PatternPredicate: resolves source binding, returns False/Null for unbound/NULL, traverses hops with direction/type/label filters, supports var-length BFS (_var_length_reachable), and returns CypherBool/CypherNull.
Executor Operators & Traversal
src/graphforge/executor/executor.py
Adds _execute_apply_optional and dispatch; refactors OPTIONAL/variable-length expansion to buffer emissions, propagate CypherNull for optional contexts, constrain to pre-bound destinations, and centralize null-filling.
Planner Operators
src/graphforge/planner/operators.py
ExpandVariableLength gains optional; OptionalExpandEdges adds null_fill_vars and dst_labels; new ApplyOptional operator added with non-empty-subplan validator.
Planner Logic & Validation
src/graphforge/planner/planner.py
Reworks OPTIONAL MATCH planning to emit source scans only when unbound, uses ApplyOptional for multi-hop null-fill, folds dst label filters appropriately, and tightens compile-time validation (forbids PatternPredicate in WITH/RETURN/SET RHS; enforces pattern-variable scoping and boundness).
Test Utilities
tests/tck/conftest.py
Normalization extended to handle raw Python list for order-insensitive comparisons.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Parser
    participant Planner
    participant Evaluator
    participant Executor
    participant Graph

    Client->>Parser: Parse query with WHERE pattern predicate
    Parser->>Planner: Emit PatternPredicate AST
    Planner->>Evaluator: Include PatternPredicate in plan/WHERE
    Evaluator->>Executor: Request predicate evaluation with ExecutionContext
    Executor->>Graph: Resolve source variable binding
    Graph-->>Executor: Return node / NULL / unbound
    alt source resolved and not NULL
        Executor->>Graph: Traverse hops (direction/type/var-length)
        Graph-->>Executor: Return candidate paths
        Executor->>Evaluator: Apply dst label / pre-bound dst checks
        alt matching full path found
            Evaluator-->>Client: CypherBool(True)
        else
            Evaluator-->>Client: CypherBool(False)
        end
    else source NULL
        Evaluator-->>Client: CypherNull()
    else source unbound
        Evaluator-->>Client: CypherBool(False)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #306 — Implements pattern predicate parsing/evaluation and scoping required to filter node/relationship patterns in WHERE.
  • #302 — Provides pattern existence checks used by triadic friend-of-friend queries and negative pattern predicates.

Possibly related PRs

  • #287 — Adds new WHERE predicate AST nodes and matching parser/evaluator handling (similar AST+evaluator changes).
  • #374 — Modifies OPTIONAL MATCH handling in planner/executor (related ApplyOptional and optional expansion logic).
  • #369 — Related planner validation changes for pattern predicate variable binding and scoping.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly and concisely summarizes the PR's main contributions: pattern predicates, var-length exact syntax, and OPTIONAL MATCH fixes.
Description check ✅ Passed Description covers key changes, includes issue references (#306, #302), TCK impact, and known deferred items; fulfills required template sections appropriately.
Linked Issues check ✅ Passed Changes fully implement pattern predicates (#306) and triadic pattern matching fixes (#302): AST model added, parser/planner/executor updated, new operators introduced, and nullable/scope validation implemented.
Out of Scope Changes check ✅ Passed All changes directly support pattern predicates, var-length exact, and OPTIONAL MATCH correctness. Test normalization change in conftest.py is scoped to enable list-order-insensitive TCK validation.
Docstring Coverage ✅ Passed Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/306-302-pattern-predicates-triadic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/graphforge/planner/planner.py (1)

1226-1263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Track only newly introduced vars for ApplyOptional, and include path_var.

introduced_vars currently appends every rel_var/dst_var even when that name was already bound in the outer scope, while path_var is never added before the ApplyOptional is built. On a failed multi-hop optional, that can both clobber outer bindings and leave the optional path variable missing entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1226 - 1263, Only
newly-introduced variables should be passed to ApplyOptional and the path_var
must be included if it was bound by this pattern: when creating introduced_vars,
check self._type_context.has_variable(var) before appending rel_var or dst_var
(bind/validate as you do, but only add to introduced_vars when it was not
previously bound), and ensure path_var is added to introduced_vars (or added to
the list right after you bind it with
self._type_context.bind_variable(path_var)) so ApplyOptional(subplan=subplan,
null_fill_vars=introduced_vars) contains the newly introduced rel/dst vars and
the path_var.
🧹 Nitpick comments (1)
src/graphforge/ast/expression.py (1)

486-497: Consider using tuple for parts to fully enforce immutability and prevent unintended mutations.

While frozen=True prevents attribute reassignment, the parts list field can still be mutated in place. The guideline requires immutable frozen models for AST nodes. Switching to parts: tuple[Any, ...] would fully enforce immutability and be more explicit. Note that other collection fields in nearby AST nodes (e.g., FunctionCall.args, CaseExpression.when_clauses) currently use list, so this change would diverge from the existing pattern—consider aligning similar collection fields across AST nodes if pursuing this refactoring.

♻️ Suggested adjustment
-    parts: list[Any] = Field(..., description="Pattern parts (alternating NodePattern/RelationshipPattern)")
+    parts: tuple[Any, ...] = Field(..., description="Pattern parts (alternating NodePattern/RelationshipPattern)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/ast/expression.py` around lines 486 - 497, PatternPredicate
currently declares parts as a mutable list which can be mutated even though
model_config sets frozen=True; change the field type on PatternPredicate (the
parts attribute) from list[Any] to an immutable tuple type (tuple[Any, ...]) and
update its Field description accordingly, then update any code that constructs
PatternPredicate instances to pass a tuple (and consider adjusting related AST
nodes like FunctionCall.args and CaseExpression.when_clauses for consistency if
you choose to standardize on tuples) so the model is truly immutable at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/executor/evaluator.py`:
- Around line 1705-1741: The current DFS uses a node-based `visited` set (stack
initialized with `(start, 0, {start.id})` and checks `if nxt_id not in visited`)
which prevents revisiting the same node via different edges and breaks cyclic
variable-length path matches; change the visitation tracking to be
path/edge-based instead: replace the node-id `visited` set with a set of edge
ids (or edge tuples) and update the checks and pushes accordingly (use `edge.id`
or `(edge.src.id, edge.dst.id)` when checking/adding instead of `nxt_id`, and
propagate `visited | {edge_id}` when pushing to `stack`), leaving the rest of
the traversal logic (edge selection via `executor.graph.get_outgoing_edges` /
`get_incoming_edges`, direction handling, and depth checks) intact so branches
can revisit nodes via different edges while still preventing infinite edge-reuse
loops.
- Around line 1652-1659: The var-length branch currently extends next_nodes with
the raw results of _var_length_reachable(executor, node, rel_pattern.types,
dir_val, min_h, max_h) without applying the destination label and
bound-destination checks used in the fixed-hop path; update the var-length
handling so you filter the reachable candidates through the same
destination-constraint logic (the same predicates applied in the fixed-hop
branch around the fixed-hop path handling) before calling next_nodes.extend,
i.e. call _var_length_reachable, then for each candidate apply the destination
label checks and bound-destination validation (using the same helper/condition
used in the fixed-hop code) and only append those that pass.
- Around line 1606-1608: The current logic in the pattern predicate evaluator
treats an unbound source variable the same as an explicit NULL; change it so an
unbound variable (i.e., src_var not in ctx.bindings) returns CypherBool(False)
while an explicitly bound NULL (src_node is None or isinstance(src_node,
CypherNull)) still returns CypherNull(); update the block that reads src_node =
ctx.get(src_var) if src_var in ctx.bindings else None to first check membership
in ctx.bindings and return CypherBool(False) for unbound, and only after
confirming bound, treat None/CypherNull as CypherNull() (adjust analogous logic
for any dst/target handling if present), referencing src_var, src_node,
ctx.bindings, CypherNull and CypherBool in evaluator.py.

In `@src/graphforge/parser/parser.py`:
- Around line 1119-1127: The transformer now emits PatternPredicate from
pattern_predicate, but _parse_rel_parts still filters only certain expression
classes and thus drops embedded predicates; update _parse_rel_parts to recognize
graphforge.ast.expression.PatternPredicate (in addition to existing
node/relationship expression types) when harvesting pattern_where and to attach
such a PatternPredicate into the RelationshipPattern parts (or predicate slot)
so constructs like ()-[:R WHERE (a)-[:S]->()]->() are preserved; locate
_parse_rel_parts and add PatternPredicate to its accepted-types check and the
code path that appends/assigns parts/predicate to RelationshipPattern.

In `@src/graphforge/planner/planner.py`:
- Around line 1312-1319: Replace the shallow isinstance check over
with_clause.items that only catches top-level PatternPredicate with a call to
the recursive validator: remove the current loop that checks
isinstance(item.expression, PatternPredicate) and instead call
_validate_no_pattern_predicate(with_clause) (the existing recursive validator
used by _validate_return_clause()) so nested pattern predicates (e.g. WITH NOT
((n)-[:R]->()) AS ok) are rejected as well.
- Around line 1190-1198: The current code flattens dst_pattern.labels into
flat_dst_labels which collapses OR-of-ANDs into a single list and causes
patterns like (:B|C) to be treated as (:B:C); instead preserve the disjunctive
groups and pass a list-of-conjunctions to OptionalExpandEdges and ScanNodes.
Replace the flat_dst_labels logic with code that keeps each conj as a list (e.g.
for each conj in dst_pattern.labels use conj if it's already a list else [conj])
and set inline_dst_labels to that list-of-lists when dst_was_prebound is false
(otherwise keep it as []). Ensure you update references to inline_dst_labels and
remove flat_dst_labels so the executor still receives the OR-of-ANDs structure
it expects.
- Around line 1154-1165: The planner currently only folds relationship
properties into combined_where and never applies destination-node constraints,
causing OPTIONAL MATCH to drop dst_pattern.properties and label filters for
newly introduced destination variables; update the block around combined_where
(and the analogous block at the other occurrence) to also handle
dst_pattern.properties: if dst_pattern.properties exists ensure dst_var is
created (generate one if None), build dst_prop_pred via
self._properties_to_predicate(dst_var, dst_pattern.properties), and then combine
that predicate with the existing combined_where using the same BinaryOp("AND")
logic so both rel and destination node predicates are enforced for single-hop,
multi-hop and var-length optional planning paths (also ensure any label
filtering for the destination var is applied consistently regardless of
is_single_hop).
- Around line 1111-1123: When src_var is already bound the code currently skips
type checks and label/property predicates; update the branch where
self._type_context.has_variable(src_var) is True to still call
self._type_context.validate_compatible(src_var, VariableType.NODE) and, if
src_pattern.labels or src_pattern.properties exist, build the same predicate via
self._properties_to_predicate(src_var, src_pattern.properties) (and include
label checks as done for unbound vars) and append a Filter(predicate=...) so
existing variables are validated against the pattern rather than silently
expanding (affecting OPTIONAL MATCH null-filling semantics); keep existing
binding logic only in the unbound branch but mirror validation/predicate
application for the bound case.

---

Outside diff comments:
In `@src/graphforge/planner/planner.py`:
- Around line 1226-1263: Only newly-introduced variables should be passed to
ApplyOptional and the path_var must be included if it was bound by this pattern:
when creating introduced_vars, check self._type_context.has_variable(var) before
appending rel_var or dst_var (bind/validate as you do, but only add to
introduced_vars when it was not previously bound), and ensure path_var is added
to introduced_vars (or added to the list right after you bind it with
self._type_context.bind_variable(path_var)) so ApplyOptional(subplan=subplan,
null_fill_vars=introduced_vars) contains the newly introduced rel/dst vars and
the path_var.

---

Nitpick comments:
In `@src/graphforge/ast/expression.py`:
- Around line 486-497: PatternPredicate currently declares parts as a mutable
list which can be mutated even though model_config sets frozen=True; change the
field type on PatternPredicate (the parts attribute) from list[Any] to an
immutable tuple type (tuple[Any, ...]) and update its Field description
accordingly, then update any code that constructs PatternPredicate instances to
pass a tuple (and consider adjusting related AST nodes like FunctionCall.args
and CaseExpression.when_clauses for consistency if you choose to standardize on
tuples) so the model is truly immutable at runtime.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 318014d4-390e-40bd-9b68-ac50186fc2b1

📥 Commits

Reviewing files that changed from the base of the PR and between 696ac8a and abe9f88.

📒 Files selected for processing (8)
  • src/graphforge/ast/expression.py
  • src/graphforge/executor/evaluator.py
  • src/graphforge/executor/executor.py
  • src/graphforge/parser/cypher.lark
  • src/graphforge/parser/parser.py
  • src/graphforge/planner/operators.py
  • src/graphforge/planner/planner.py
  • tests/tck/conftest.py

Comment thread src/graphforge/executor/evaluator.py Outdated
Comment on lines +1652 to +1659
if rel_pattern.min_hops is not None or rel_pattern.max_hops is not None:
min_h = rel_pattern.min_hops if rel_pattern.min_hops is not None else 1
max_h = rel_pattern.max_hops # None = unbounded
reachable = _var_length_reachable(
executor, node, rel_pattern.types, dir_val, min_h, max_h
)
next_nodes.extend(reachable)
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply destination constraints to var-length candidates before appending

Lines 1652-1659 append _var_length_reachable(...) results directly, skipping the destination label and bound-destination checks used in the fixed-hop path (Lines 1671-1682). This can produce false positives for constrained predicates.

Suggested fix
             if rel_pattern.min_hops is not None or rel_pattern.max_hops is not None:
                 min_h = rel_pattern.min_hops if rel_pattern.min_hops is not None else 1
                 max_h = rel_pattern.max_hops  # None = unbounded
                 reachable = _var_length_reachable(
                     executor, node, rel_pattern.types, dir_val, min_h, max_h
                 )
-                next_nodes.extend(reachable)
+                for dst_node in reachable:
+                    if dst_pattern.labels:
+                        node_labels = executor.graph.get_node_labels(dst_node.id)
+                        if not all(
+                            lbl in node_labels
+                            for lbl_group in dst_pattern.labels
+                            for lbl in lbl_group
+                        ):
+                            continue
+                    if dst_pattern.variable and dst_pattern.variable in ctx.bindings:
+                        bound_dst = ctx.bindings[dst_pattern.variable]
+                        if isinstance(bound_dst, NodeRef) and bound_dst.id != dst_node.id:
+                            continue
+                    next_nodes.append(dst_node)
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/evaluator.py` around lines 1652 - 1659, The
var-length branch currently extends next_nodes with the raw results of
_var_length_reachable(executor, node, rel_pattern.types, dir_val, min_h, max_h)
without applying the destination label and bound-destination checks used in the
fixed-hop path; update the var-length handling so you filter the reachable
candidates through the same destination-constraint logic (the same predicates
applied in the fixed-hop branch around the fixed-hop path handling) before
calling next_nodes.extend, i.e. call _var_length_reachable, then for each
candidate apply the destination label checks and bound-destination validation
(using the same helper/condition used in the fixed-hop code) and only append
those that pass.

Comment on lines +1705 to +1741
stack = [(start, 0, {start.id})]
while stack:
node, depth, visited = stack.pop()
if min_hops <= depth:
result.append(node)
if max_hops is not None and depth >= max_hops:
continue

if direction == "OUT":
edges = executor.graph.get_outgoing_edges(node.id)
elif direction == "IN":
edges = executor.graph.get_incoming_edges(node.id)
else:
outgoing = executor.graph.get_outgoing_edges(node.id)
incoming = executor.graph.get_incoming_edges(node.id)
seen_ids = set()
edges = []
for e in outgoing + incoming:
if e.id not in seen_ids:
edges.append(e)
seen_ids.add(e.id)

if edge_types:
edges = [e for e in edges if e.type in edge_types]

for edge in edges:
if direction == "IN":
nxt_id = edge.src.id
elif direction == "OUT":
nxt_id = edge.dst.id
else:
nxt_id = edge.dst.id if edge.src.id == node.id else edge.src.id

if nxt_id not in visited:
nxt_node = executor.graph.get_node(nxt_id)
if nxt_node:
stack.append((nxt_node, depth + 1, visited | {nxt_id}))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Node-based visited tracking can miss valid cyclic var-length paths

The traversal state uses visited node ids (Line 1705 / Line 1738), which blocks revisiting a node via different edges. That can incorrectly fail existence checks for cyclic patterns at exact/deeper hop counts.

A safer approach is to track path edge usage (or path state) instead of global node-visited pruning for each branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/evaluator.py` around lines 1705 - 1741, The current
DFS uses a node-based `visited` set (stack initialized with `(start, 0,
{start.id})` and checks `if nxt_id not in visited`) which prevents revisiting
the same node via different edges and breaks cyclic variable-length path
matches; change the visitation tracking to be path/edge-based instead: replace
the node-id `visited` set with a set of edge ids (or edge tuples) and update the
checks and pushes accordingly (use `edge.id` or `(edge.src.id, edge.dst.id)`
when checking/adding instead of `nxt_id`, and propagate `visited | {edge_id}`
when pushing to `stack`), leaving the rest of the traversal logic (edge
selection via `executor.graph.get_outgoing_edges` / `get_incoming_edges`,
direction handling, and depth checks) intact so branches can revisit nodes via
different edges while still preventing infinite edge-reuse loops.

Comment on lines +1119 to +1127
def pattern_predicate(self, items):
"""Transform pattern predicate expression.

Syntax: (n)-[:REL]->() used as a boolean existence check in WHERE.
Items alternate: NodePattern, RelationshipPattern, NodePattern, ...
"""
from graphforge.ast.expression import PatternPredicate

return PatternPredicate(parts=list(items))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Thread PatternPredicate through relationship-pattern predicate parsing.

This transformer now emits PatternPredicate, but _parse_rel_parts() still only recognizes a fixed set of expression classes when harvesting pattern_where. A predicate like ()-[:R WHERE (a)-[:S]->()]->() will be parsed and then silently dropped instead of being attached to the RelationshipPattern. Please add PatternPredicate to that path before shipping this feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/parser/parser.py` around lines 1119 - 1127, The transformer
now emits PatternPredicate from pattern_predicate, but _parse_rel_parts still
filters only certain expression classes and thus drops embedded predicates;
update _parse_rel_parts to recognize graphforge.ast.expression.PatternPredicate
(in addition to existing node/relationship expression types) when harvesting
pattern_where and to attach such a PatternPredicate into the RelationshipPattern
parts (or predicate slot) so constructs like ()-[:R WHERE (a)-[:S]->()]->() are
preserved; locate _parse_rel_parts and add PatternPredicate to its
accepted-types check and the code path that appends/assigns parts/predicate to
RelationshipPattern.

Comment thread src/graphforge/planner/planner.py
Comment thread src/graphforge/planner/planner.py Outdated
Comment thread src/graphforge/planner/planner.py Outdated
Comment thread src/graphforge/planner/planner.py Outdated
DecisionNerd and others added 3 commits April 30, 2026 08:29
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…306/#302)

- Only add newly-introduced vars to ApplyOptional null_fill_vars
  (pre-bound dst/rel were incorrectly included, causing spurious nulls)
- Include path_var in null_fill_vars before building ApplyOptional
  (was bound after ApplyOptional creation, so never null-filled on miss)
- Unbound src in pattern predicate returns CypherBool(False) not CypherNull
  (correct semantics: unbound var → pattern cannot exist → False)
- WITH clause pattern predicate check uses recursive _validate_no_pattern_predicate
  instead of shallow isinstance (catches nested pattern predicates in expressions)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (4)
src/graphforge/planner/planner.py (4)

1154-1170: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Destination node property predicates are still not planned in OPTIONAL MATCH.

This block only combines relationship-side predicates; dst_pattern.properties are not incorporated, so destination inline node predicates can be dropped in optional plans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1154 - 1170, The code
currently only folds relationship properties into combined_where but omits
destination node inline predicates (dst_pattern.properties), causing them to be
lost for OPTIONAL MATCH; update the same block in planner.py so that if
dst_pattern.properties exists you generate/ensure a destination variable (e.g.,
dst_var similar to rel_var or via self._generate_anonymous_variable()), call
self._properties_to_predicate(dst_var, dst_pattern.properties) to get a
dst_prop_pred, and then AND it into combined_where (using the same BinaryOp
op="AND" pattern used for rel_prop_pred) so both rel_prop_pred and dst_prop_pred
are preserved in combined_where when hop_idx == 0 and is_single_hop.

1111-1123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate pre-bound src_var against source pattern constraints.

When src_var is already in scope, this branch skips type compatibility and source label/property predicates, so OPTIONAL MATCH (a:Label {…})-... can expand from non-matching a instead of null-filling the optional pattern.

Suggested fix
-                # Emit ScanNodes only if src is not already bound
-                if not self._type_context.has_variable(src_var):
+                # Always validate type compatibility for src
+                self._type_context.validate_compatible(src_var, VariableType.NODE)
+                if not self._type_context.has_variable(src_var):
                     operators.append(
                         ScanNodes(
                             variable=src_var,
                             labels=src_pattern.labels if src_pattern.labels else None,
                         )
                     )
-                    self._type_context.validate_compatible(src_var, VariableType.NODE)
                     self._type_context.bind_variable(src_var, VariableType.NODE)
-                    if src_pattern.properties:
-                        predicate = self._properties_to_predicate(src_var, src_pattern.properties)
-                        operators.append(Filter(predicate=predicate))
+                elif src_pattern.labels:
+                    operators.append(ScanNodes(variable=src_var, labels=src_pattern.labels))
+                if src_pattern.properties:
+                    predicate = self._properties_to_predicate(src_var, src_pattern.properties)
+                    operators.append(Filter(predicate=predicate))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1111 - 1123, When src_var is
already bound (i.e., self._type_context.has_variable(src_var) is True) ensure
you still validate it against the source pattern: call
self._type_context.validate_compatible(src_var, VariableType.NODE) to enforce
node type, check that the existing variable's labels include src_pattern.labels
and if not raise or handle a mismatch, and if src_pattern.properties exists
build the predicate via self._properties_to_predicate(src_var,
src_pattern.properties) and append a Filter(predicate=...) so pre-bound
variables are filtered by the pattern’s labels/properties just like newly
scanned variables; use the same helpers (_type_context, validate_compatible,
_properties_to_predicate, Filter) and do not skip these checks when has_variable
returns True.

2279-2285: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use recursive pattern-predicate validation in RETURN items.

This top-level isinstance(..., PatternPredicate) check still misses nested cases; _validate_no_pattern_predicate() already exists for recursive traversal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 2279 - 2285, The current
top-level isinstance check on return_clause.items misses nested PatternPredicate
cases; instead call the recursive validator _validate_no_pattern_predicate on
each item's expression (e.g., inside the loop over return_clause.items call
self._validate_no_pattern_predicate(item.expression) or the equivalent validator
function) so nested pattern predicates are discovered and a SyntaxError is
raised consistently for RETURN projections; remove the plain isinstance(...,
_PPType) check and use _validate_no_pattern_predicate for each item.expression.

1199-1205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t flatten disjunctive destination label groups.

Flattening dst_pattern.labels changes OR-of-AND semantics (e.g. (:B|C) becomes (:B:C) downstream).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1199 - 1205, The current
logic flattens dst_pattern.labels into flat_dst_labels which destroys OR-of-AND
semantics; instead preserve the original list-of-conjunctions structure and stop
merging inner lists. Replace the flat_dst_labels construction with using
dst_pattern.labels (or [] if None) directly and set inline_dst_labels =
dst_pattern.labels if not dst_was_prebound else []; update any downstream uses
that expect flat lists to handle the conjunction-list shape or convert there
with an explicit deliberate transform rather than here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/executor/executor.py`:
- Around line 3403-3406: The loop over op.subplan passes a constant op_index of
0 to each sub-operator, causing position-dependent operators (e.g.,
_execute_aggregate which checks op_index < total_ops - 1) to behave incorrectly;
update the loop in Executor._execute_operator where it iterates "for sub_op in
op.subplan" to use enumerate and pass the actual index and total_ops (e.g.,
replace the constant 0 with the enumerated index and keep len(op.subplan) as
total_ops) so each sub-operator receives its correct op_index (affecting
ApplyOptional subplans and any operator relying on position).

In `@src/graphforge/planner/planner.py`:
- Around line 1155-1156: The absorbed WHERE (where_predicate) is only being
passed into combined_where for single-hop (combined_where = where_predicate if
(hop_idx == 0 and is_single_hop) else None) so var-length and multi-hop
OptionalExpandEdges lose the predicate; fix by either gating absorption at the
caller (stop absorbing into where_predicate unless caller confirms single-hop)
or by threading combined_where (or an equivalent predicate) into the var-length
and multi-hop branches and into the OptionalExpandEdges construction so the
predicate is applied for all branches; update usages in _plan_simple_query, the
OptionalExpandEdges creation sites, and any logic that checks
hop_idx/is_single_hop to ensure predicates aren’t silently dropped.

---

Duplicate comments:
In `@src/graphforge/planner/planner.py`:
- Around line 1154-1170: The code currently only folds relationship properties
into combined_where but omits destination node inline predicates
(dst_pattern.properties), causing them to be lost for OPTIONAL MATCH; update the
same block in planner.py so that if dst_pattern.properties exists you
generate/ensure a destination variable (e.g., dst_var similar to rel_var or via
self._generate_anonymous_variable()), call
self._properties_to_predicate(dst_var, dst_pattern.properties) to get a
dst_prop_pred, and then AND it into combined_where (using the same BinaryOp
op="AND" pattern used for rel_prop_pred) so both rel_prop_pred and dst_prop_pred
are preserved in combined_where when hop_idx == 0 and is_single_hop.
- Around line 1111-1123: When src_var is already bound (i.e.,
self._type_context.has_variable(src_var) is True) ensure you still validate it
against the source pattern: call self._type_context.validate_compatible(src_var,
VariableType.NODE) to enforce node type, check that the existing variable's
labels include src_pattern.labels and if not raise or handle a mismatch, and if
src_pattern.properties exists build the predicate via
self._properties_to_predicate(src_var, src_pattern.properties) and append a
Filter(predicate=...) so pre-bound variables are filtered by the pattern’s
labels/properties just like newly scanned variables; use the same helpers
(_type_context, validate_compatible, _properties_to_predicate, Filter) and do
not skip these checks when has_variable returns True.
- Around line 2279-2285: The current top-level isinstance check on
return_clause.items misses nested PatternPredicate cases; instead call the
recursive validator _validate_no_pattern_predicate on each item's expression
(e.g., inside the loop over return_clause.items call
self._validate_no_pattern_predicate(item.expression) or the equivalent validator
function) so nested pattern predicates are discovered and a SyntaxError is
raised consistently for RETURN projections; remove the plain isinstance(...,
_PPType) check and use _validate_no_pattern_predicate for each item.expression.
- Around line 1199-1205: The current logic flattens dst_pattern.labels into
flat_dst_labels which destroys OR-of-AND semantics; instead preserve the
original list-of-conjunctions structure and stop merging inner lists. Replace
the flat_dst_labels construction with using dst_pattern.labels (or [] if None)
directly and set inline_dst_labels = dst_pattern.labels if not dst_was_prebound
else []; update any downstream uses that expect flat lists to handle the
conjunction-list shape or convert there with an explicit deliberate transform
rather than here.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 52976807-6243-4eda-8d9a-5364a242fe97

📥 Commits

Reviewing files that changed from the base of the PR and between abe9f88 and e600acf.

📒 Files selected for processing (4)
  • src/graphforge/ast/expression.py
  • src/graphforge/executor/evaluator.py
  • src/graphforge/executor/executor.py
  • src/graphforge/planner/planner.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/graphforge/executor/evaluator.py
  • src/graphforge/ast/expression.py

Comment thread src/graphforge/executor/executor.py Outdated
Comment thread src/graphforge/planner/planner.py Outdated
- executor: use enumerate in ApplyOptional subplan loop so each sub-op
  receives its correct op_index (fixes Aggregate position detection)
- planner: pre-bound src_var in OPTIONAL MATCH now validates type and
  applies label/property filters (previously skipped entirely)
- planner: change OptionalExpandEdges.dst_labels to list[list[str]] to
  preserve disjunction-of-conjunctions semantics; fix executor check to
  use _node_matches_labels instead of naive all()
- planner: thread absorbed WHERE predicate into var-length and multi-hop
  OPTIONAL MATCH branches (previously only applied to single-hop)
- planner: fold dst_pattern.properties into combined_where for single-hop
  OptionalExpandEdges; emit as Filter for var-length/multi-hop so dst_var
  is bound before evaluation
- planner: use recursive _validate_no_pattern_predicate for RETURN clause
  (catches nested PatternPredicates, not just top-level)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/graphforge/executor/executor.py (1)

3286-3338: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply dst_labels before accepting a pre-bound destination.

This branch filters by edge reachability when dst_var is already bound, but it never validates op.dst_labels against that bound node. A pattern like (a)-[:R]->(b:Label) will therefore match any reachable pre-bound b, even when b does not satisfy the inline destination labels.

🐛 Proposed fix
             if dst_bound:
                 from graphforge.types.graph import NodeRef as _NodeRef

                 bound_dst = ctx.get(op.dst_var)
                 if isinstance(bound_dst, _NodeRef):
+                    if op.dst_labels and not self._node_matches_labels(bound_dst, op.dst_labels):
+                        result.append(_null_fill(ctx))
+                        continue
                     bound_dst_id = bound_dst.id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/executor.py` around lines 3286 - 3338, When dst_var
is pre-bound (the branch starting with dst_bound = op.dst_var in ctx.bindings),
you must enforce op.dst_labels against the bound_dst before accepting matches:
after obtaining bound_dst = ctx.get(op.dst_var) and confirming it's a NodeRef,
check that the node's labels include all labels in op.dst_labels (if any); if
the bound node does not satisfy op.dst_labels, treat it as no match (append the
same _null_fill(ctx) and skip further edge/path handling), otherwise proceed
with the existing matching_edges, where_predicate evaluation, CypherPath
binding, and result population. Ensure you reference op.dst_var, op.dst_labels,
ctx.get, bound_dst, _null_fill, edges, op.direction, src_node,
evaluate_expression, ExecutionContext and CypherPath when locating and updating
the code.
🧹 Nitpick comments (2)
src/graphforge/planner/operators.py (1)

706-709: ⚡ Quick win

Reject empty ApplyOptional.subplan at construction.

An empty subplan currently becomes a no-op branch in _execute_apply_optional, so a malformed plan would silently drop OPTIONAL null-filling instead of failing fast.

♻️ Proposed fix
-    subplan: list[Any] = Field(..., description="Inner operators to execute per outer row")
+    subplan: list[Any] = Field(
+        ..., min_length=1, description="Inner operators to execute per outer row"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/operators.py` around lines 706 - 709, ApplyOptional
currently allows an empty subplan which makes _execute_apply_optional silently
skip null-filling; add validation to reject empty subplan at construction by
enforcing that ApplyOptional.subplan is non-empty (e.g., a pydantic `@validator`
or __init__ check) and raise a clear ValueError if len(subplan) == 0; update any
tests that construct ApplyOptional to provide at least one inner operator.
Ensure the check references the ApplyOptional.subplan field so malformed plans
fail fast.
src/graphforge/planner/planner.py (1)

1296-1307: 💤 Low value

dst_was_prebound_check may incorrectly identify anonymous destinations as pre-bound.

When dst_pattern.variable is None (anonymous node), the check dst_pattern.variable not in introduced_vars[-1:] evaluates to True (since None is never in the list), causing the code to treat anonymous destinations as pre-bound and emit a redundant ScanNodes for label filtering.

Consider guarding with dst_pattern.variable is not None:

 dst_was_prebound_check = (
     dst_pattern.variable is not None
-    and dst_pattern.variable not in introduced_vars[-1:]
+    and dst_pattern.variable not in introduced_vars
 )

The simplified check not in introduced_vars (without slicing) is clearer and handles the case where the variable was introduced in an earlier hop of the same pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1296 - 1307, The
dst_was_prebound_check logic can misclassify anonymous destinations as
pre-bound; change the guard so you only consider a destination pre-bound when
dst_pattern.variable is not None and that variable truly wasn't introduced
earlier (use dst_pattern.variable is not None and dst_pattern.variable not in
introduced_vars rather than the current slice check). Update the condition
around ScanNodes creation (referencing dst_was_prebound_check,
dst_pattern.variable, introduced_vars, ScanNodes, OptionalExpandEdges) so
anonymous nodes are skipped and pre-bound detection correctly checks the full
introduced_vars history; keep the same append behavior to operators or subplan
depending on is_single_hop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/executor/executor.py`:
- Around line 1111-1142: When short-circuiting OPTIONAL var-length hops (in the
src_node is CypherNull branch and the "Bound dst is null" branch), preserve
existing bindings from ctx.bindings and only inject CypherNull for those
variables that were not already bound: copy ctx.bindings into null_ctx.bindings,
then if op.dst_var not already in ctx.bindings bind op.dst_var -> CypherNull(),
if op.edge_var exists and op.edge_var not already in ctx.bindings bind
op.edge_var -> CypherNull(), and if op.path_var exists bind op.path_var ->
CypherNull(); return that null_ctx. This ensures ExecutionContext, CypherNull,
op.dst_var, op.edge_var, op.path_var and ctx.bindings behave consistently with
the later found_rows/optional fallback.

---

Outside diff comments:
In `@src/graphforge/executor/executor.py`:
- Around line 3286-3338: When dst_var is pre-bound (the branch starting with
dst_bound = op.dst_var in ctx.bindings), you must enforce op.dst_labels against
the bound_dst before accepting matches: after obtaining bound_dst =
ctx.get(op.dst_var) and confirming it's a NodeRef, check that the node's labels
include all labels in op.dst_labels (if any); if the bound node does not satisfy
op.dst_labels, treat it as no match (append the same _null_fill(ctx) and skip
further edge/path handling), otherwise proceed with the existing matching_edges,
where_predicate evaluation, CypherPath binding, and result population. Ensure
you reference op.dst_var, op.dst_labels, ctx.get, bound_dst, _null_fill, edges,
op.direction, src_node, evaluate_expression, ExecutionContext and CypherPath
when locating and updating the code.

---

Nitpick comments:
In `@src/graphforge/planner/operators.py`:
- Around line 706-709: ApplyOptional currently allows an empty subplan which
makes _execute_apply_optional silently skip null-filling; add validation to
reject empty subplan at construction by enforcing that ApplyOptional.subplan is
non-empty (e.g., a pydantic `@validator` or __init__ check) and raise a clear
ValueError if len(subplan) == 0; update any tests that construct ApplyOptional
to provide at least one inner operator. Ensure the check references the
ApplyOptional.subplan field so malformed plans fail fast.

In `@src/graphforge/planner/planner.py`:
- Around line 1296-1307: The dst_was_prebound_check logic can misclassify
anonymous destinations as pre-bound; change the guard so you only consider a
destination pre-bound when dst_pattern.variable is not None and that variable
truly wasn't introduced earlier (use dst_pattern.variable is not None and
dst_pattern.variable not in introduced_vars rather than the current slice
check). Update the condition around ScanNodes creation (referencing
dst_was_prebound_check, dst_pattern.variable, introduced_vars, ScanNodes,
OptionalExpandEdges) so anonymous nodes are skipped and pre-bound detection
correctly checks the full introduced_vars history; keep the same append behavior
to operators or subplan depending on is_single_hop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d934b35-2a07-4aa8-ac5a-cca8842b44fe

📥 Commits

Reviewing files that changed from the base of the PR and between e600acf and e70964f.

📒 Files selected for processing (3)
  • src/graphforge/executor/executor.py
  • src/graphforge/planner/operators.py
  • src/graphforge/planner/planner.py

Comment thread src/graphforge/executor/executor.py
- executor: guard null-ctx bindings in var-length null-src and bound-null-dst
  branches — only inject CypherNull for vars not already in ctx.bindings
- executor: enforce op.dst_labels against pre-bound dst node in
  _execute_optional_expand before filtering edges; null-fill if label check fails
- operators: add @field_validator on ApplyOptional.subplan to reject empty
  subplans at construction time (fail-fast, prevents silent null-fill skip)
- planner: fix dst_was_prebound_check slice bug — was checking only the last
  element of introduced_vars; now checks the full list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/executor/executor.py`:
- Around line 1255-1262: The null-fill block creates a copy ExecutionContext
(null_ctx) but always overwrites op.path_var with CypherNull(), which clobbers
outer bindings; change the logic to only call null_ctx.bind(op.path_var,
CypherNull()) if op.path_var is not already present in ctx.bindings (same check
used for op.dst_var and op.edge_var). Update the identical pattern elsewhere in
the file (the other null-fill block) so both places preserve pre-bound path_var
before binding CypherNull().
- Around line 1168-1170: The code binds var-length relationship results as a raw
Python list (edge_path) via new_ctx.bind(op.edge_var, edge_path); change this to
wrap the list in the CypherList wrapper so downstream expressions see a proper
Cypher list (e.g., new_ctx.bind(op.edge_var, CypherList(edge_path))). Also audit
places in executor functions where query results are bound (search for
new_ctx.bind usage and op.edge_var handling in src/graphforge/executor/*.py) and
ensure all scalar and collection results are wrapped in the appropriate
CypherValue subclasses (CypherString, CypherInt, CypherFloat, CypherBool,
CypherNull, CypherList, CypherMap) before binding.
- Around line 1130-1145: The code currently only propagates prior CypherNull
bindings when op.optional is true, but when op.dst_var is bound to CypherNull
and op.optional is false you must drop that row so the DFS/ExpandEdges logic
doesn't treat it as an unbound node; inside the dst_bound handling in
executor.py (the block that inspects ctx.get(op.dst_var) and sets
bound_dst_node) add an explicit branch for elif isinstance(_bound_dst,
CypherNull) and not getattr(op, "optional", False): simply continue to skip
emitting/processing this context (i.e., drop the NULL-constrained row) so
bound_dst_node remains not set but the path is not explored.

In `@src/graphforge/planner/planner.py`:
- Around line 1205-1211: Optional variable-length relationship patterns drop
inline predicates when properties are folded into combined_where because the
predicate argument uses combined_where or rel_pattern.predicate as a fallback;
update the predicate construction (where this call is made for optional=True in
planner.py) to AND the two when both exist: if combined_where is not None and
rel_pattern.predicate is not None create a _BinaryOp(op="AND",
left=rel_pattern.predicate, right=combined_where), otherwise use combined_where
or rel_pattern.predicate; ensure you change the predicate= argument accordingly
so inline predicates are preserved for optional variable-length expansions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1c5907aa-021a-4cda-82e6-30e864dcdebb

📥 Commits

Reviewing files that changed from the base of the PR and between e70964f and 55650d3.

📒 Files selected for processing (3)
  • src/graphforge/executor/executor.py
  • src/graphforge/planner/operators.py
  • src/graphforge/planner/planner.py

Comment on lines +1130 to +1145
dst_bound = op.dst_var in ctx.bindings
bound_dst_node = None
if dst_bound:
_bound_dst = ctx.get(op.dst_var)
if isinstance(_bound_dst, _NodeRefCls):
bound_dst_node = _bound_dst
elif isinstance(_bound_dst, CypherNull) and getattr(op, "optional", False):
# Bound dst is null from prior optional — propagate null
null_ctx = ExecutionContext()
null_ctx.bindings = dict(ctx.bindings)
if op.edge_var and op.edge_var not in ctx.bindings:
null_ctx.bind(op.edge_var, CypherNull())
if op.path_var and op.path_var not in ctx.bindings:
null_ctx.bind(op.path_var, CypherNull())
result.append(null_ctx)
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop NULL-constrained rows in regular var-length MATCH.

When op.dst_var is already bound to CypherNull() and op.optional is false, this falls through with bound_dst_node = None, so the DFS can emit paths and overwrite the existing NULL binding. ExpandEdges treats that case as "no match"; this path should too.

Suggested fix
             if dst_bound:
                 _bound_dst = ctx.get(op.dst_var)
                 if isinstance(_bound_dst, _NodeRefCls):
                     bound_dst_node = _bound_dst
-                elif isinstance(_bound_dst, CypherNull) and getattr(op, "optional", False):
-                    # Bound dst is null from prior optional — propagate null
-                    null_ctx = ExecutionContext()
-                    null_ctx.bindings = dict(ctx.bindings)
-                    if op.edge_var and op.edge_var not in ctx.bindings:
-                        null_ctx.bind(op.edge_var, CypherNull())
-                    if op.path_var and op.path_var not in ctx.bindings:
-                        null_ctx.bind(op.path_var, CypherNull())
-                    result.append(null_ctx)
+                elif isinstance(_bound_dst, CypherNull):
+                    if getattr(op, "optional", False):
+                        # Bound dst is null from prior optional — propagate null
+                        null_ctx = ExecutionContext()
+                        null_ctx.bindings = dict(ctx.bindings)
+                        if op.edge_var and op.edge_var not in ctx.bindings:
+                            null_ctx.bind(op.edge_var, CypherNull())
+                        if op.path_var and op.path_var not in ctx.bindings:
+                            null_ctx.bind(op.path_var, CypherNull())
+                        result.append(null_ctx)
+                    # A regular MATCH cannot satisfy a NULL-constrained destination.
                     continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/executor.py` around lines 1130 - 1145, The code
currently only propagates prior CypherNull bindings when op.optional is true,
but when op.dst_var is bound to CypherNull and op.optional is false you must
drop that row so the DFS/ExpandEdges logic doesn't treat it as an unbound node;
inside the dst_bound handling in executor.py (the block that inspects
ctx.get(op.dst_var) and sets bound_dst_node) add an explicit branch for elif
isinstance(_bound_dst, CypherNull) and not getattr(op, "optional", False):
simply continue to skip emitting/processing this context (i.e., drop the
NULL-constrained row) so bound_dst_node remains not set but the path is not
explored.

Comment on lines +1168 to +1170
# Bind edge list if variable provided
if op.edge_var:
new_ctx.bind(op.edge_var, edge_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap var-length relationship bindings in CypherList.

edge_path is being stored as a raw Python list here, so the relationship variable for [*] paths has a different runtime type than other executor outputs and can bypass Cypher list semantics in downstream expressions.

Suggested fix
                         # Bind edge list if variable provided
                         if op.edge_var:
-                            new_ctx.bind(op.edge_var, edge_path)
+                            new_ctx.bind(op.edge_var, CypherList(list(edge_path)))
Based on learnings: Applies to `src/graphforge/executor/**/*.py` : Wrap all query results in `CypherValue` type wrappers (CypherString, CypherInt, CypherFloat, CypherBool, CypherNull, CypherList, CypherMap).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/executor.py` around lines 1168 - 1170, The code binds
var-length relationship results as a raw Python list (edge_path) via
new_ctx.bind(op.edge_var, edge_path); change this to wrap the list in the
CypherList wrapper so downstream expressions see a proper Cypher list (e.g.,
new_ctx.bind(op.edge_var, CypherList(edge_path))). Also audit places in executor
functions where query results are bound (search for new_ctx.bind usage and
op.edge_var handling in src/graphforge/executor/*.py) and ensure all scalar and
collection results are wrapped in the appropriate CypherValue subclasses
(CypherString, CypherInt, CypherFloat, CypherBool, CypherNull, CypherList,
CypherMap) before binding.

Comment on lines +1255 to +1262
null_ctx = ExecutionContext()
null_ctx.bindings = dict(ctx.bindings)
if op.dst_var not in ctx.bindings:
null_ctx.bind(op.dst_var, CypherNull())
if op.edge_var and op.edge_var not in ctx.bindings:
null_ctx.bind(op.edge_var, CypherNull())
if op.path_var:
null_ctx.bind(op.path_var, CypherNull())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve pre-bound path_var on OPTIONAL null-fill.

Both null-fill paths still overwrite an already-bound path_var with CypherNull(). That loses outer-scope bindings even though these helpers are otherwise preserving pre-bound vars.

Suggested fix
-                if op.path_var:
+                if op.path_var and op.path_var not in ctx.bindings:
                     null_ctx.bind(op.path_var, CypherNull())
-            if op.path_var:
+            if op.path_var and op.path_var not in base_ctx.bindings:
                 null_ctx.bind(op.path_var, CypherNull())

Also applies to: 3249-3250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/executor/executor.py` around lines 1255 - 1262, The null-fill
block creates a copy ExecutionContext (null_ctx) but always overwrites
op.path_var with CypherNull(), which clobbers outer bindings; change the logic
to only call null_ctx.bind(op.path_var, CypherNull()) if op.path_var is not
already present in ctx.bindings (same check used for op.dst_var and
op.edge_var). Update the identical pattern elsewhere in the file (the other
null-fill block) so both places preserve pre-bound path_var before binding
CypherNull().

Comment on lines +1205 to +1211
predicate=(
combined_where
if combined_where is not None
else rel_pattern.predicate
),
optional=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if RelationshipPattern can have both properties and predicate
rg -n "class RelationshipPattern" -A 30 src/graphforge/ast/pattern.py | head -50

Repository: DecisionNerd/graphforge

Length of output: 1660


🏁 Script executed:

cd src/graphforge/planner && sed -n '1160,1220p' planner.py | cat -n

Repository: DecisionNerd/graphforge

Length of output: 3981


🏁 Script executed:

cd src/graphforge/planner && sed -n '975,995p' planner.py | cat -n

Repository: DecisionNerd/graphforge

Length of output: 1481


🏁 Script executed:

cd src/graphforge/planner && sed -n '940,1000p' planner.py | grep -A 5 -B 5 "properties"

Repository: DecisionNerd/graphforge

Length of output: 49


🏁 Script executed:

cd src/graphforge/planner && sed -n '900,980p' planner.py | cat -n

Repository: DecisionNerd/graphforge

Length of output: 4739


🏁 Script executed:

cd src/graphforge/planner && sed -n '945,1010p' planner.py | cat -n

Repository: DecisionNerd/graphforge

Length of output: 4284


Fix inline relationship predicates being dropped in optional variable-length patterns.

When a relationship pattern has both inline properties and an inline predicate (e.g., -[r:TYPE {prop: value} WHERE r.x > 5]->), the predicate is silently dropped in OPTIONAL MATCH variable-length expansions. The properties are folded into combined_where, but rel_pattern.predicate is only used as a fallback when combined_where is None—dropping the predicate when properties exist.

In regular MATCH, properties and predicates are handled independently (properties via Filter, predicates passed to the operator). For consistency and correctness, they should also be ANDed together in the optional path:

predicate=(
    _BinaryOp(op="AND", left=rel_pattern.predicate, right=combined_where)
    if combined_where is not None and rel_pattern.predicate is not None
    else combined_where or rel_pattern.predicate
),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/graphforge/planner/planner.py` around lines 1205 - 1211, Optional
variable-length relationship patterns drop inline predicates when properties are
folded into combined_where because the predicate argument uses combined_where or
rel_pattern.predicate as a fallback; update the predicate construction (where
this call is made for optional=True in planner.py) to AND the two when both
exist: if combined_where is not None and rel_pattern.predicate is not None
create a _BinaryOp(op="AND", left=rel_pattern.predicate, right=combined_where),
otherwise use combined_where or rel_pattern.predicate; ensure you change the
predicate= argument accordingly so inline predicates are preserved for optional
variable-length expansions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: filter node patterns in WHERE clause fix: triadic friend-of-friend pattern matching

1 participant