fix: Phase 4 compile-time validations (#370)#371
Conversation
- Add VariableAlreadyBound detection for MERGE standalone node/relationship reuse - Add NoSingleRelationshipType detection for MERGE relationship patterns - Add UndefinedVariable detection for SET/DELETE/RETURN clause variables - Add ColumnNameConflict detection for duplicate RETURN aliases - Add NoVariablesInScope detection for RETURN * with only anonymous variables - Add InvalidArgumentType for WHERE (n) with bound node variable - Add scope-aware free-variable collection (_collect_free_variables) to avoid false UndefinedVariable errors inside quantifiers/comprehensions/reduce - Bind path variable after multi-hop variable-length expansion loops - Propagate OPTIONAL MATCH bound variables to type context - Propagate CALL subquery RETURN variables to outer type context - Allow CREATE + MERGE in a single query via grammar rule extension - Fix ON CREATE/MATCH SET validation ordering (bind pattern vars first) All 276 planner unit tests, 2418 unit tests, 4256+ total tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdds parser support for CREATE...MERGE write queries and implements extensive compile-time planner validations (MERGE structural checks, RETURN/SET/DELETE undefined-variable and duplicate-column checks, CALL subquery alias binding, WHERE pattern predicate rejection) plus a TypeContext accessor; tests updated accordingly. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/graphforge/planner/planner.py`:
- Around line 1018-1020: The new binds for OPTIONAL MATCH skip compatibility
checks and can overwrite existing types; before calling
self._type_context.bind_variable for node_pattern.variable,
relationship_pattern.variable, and the destination variable (e.g.,
destination.variable), first call self._type_context.validate_compatible(var,
<appropriate VariableType>) to ensure existing type is compatible (use
VariableType.NODE for node binds and destination binds,
VariableType.RELATIONSHIP for relationship binds), and only then call
bind_variable; this preserves existing VariableTypeConflict behavior instead of
silently rebinding.
- Around line 2029-2038: _validate_set_clause_variables delegates RHS checking
to _validate_expr_variables_in_scope(rhs) but that helper only handles a subset
of expression node types, so complex RHS forms like list comprehensions or CASE
expressions bypass undefined-variable checks; update
_validate_expr_variables_in_scope to recursively traverse and validate all
expression AST node kinds used in SET RHS (at minimum handle list/comprehension
nodes like list comprehensions ([x IN y | ...] / Comprehension), CASE/WHEN nodes
(CaseExpression/ConditionalExpression), and other composite constructors such as
list/map/tuple literals and nested property/call expressions), calling the
existing variable-existence check when encountering Variable nodes so patterns
like SET n.v = [x IN missing | x] and SET n.v = CASE WHEN missing THEN ... are
caught by _validate_set_clause_variables().
- Around line 2100-2161: _collect_free_variables is missing Variable owners
stored as strings and never inspects PatternComprehension.pattern; add explicit
handling for PropertyAccess (check for isinstance(expr, PropertyAccess) and
traverse its owner: if owner is a string, convert or wrap it as Variable before
recursing, otherwise recurse into owner) and for PatternComprehension traverse
expr.pattern as well as expr.filter_expr/map_expr, and when traversing a pattern
extract any names it binds and include them in locally_bound (so bound anchors
in the pattern are not reported as free) before recursing into the map/filter
expressions; update the branches for PatternComprehension and add a new branch
for PropertyAccess in _collect_free_variables to perform these traversals.
- Around line 269-288: The code currently exports subquery aliases into
projected_vars always as VariableType.SCALAR; instead capture each alias's
actual inferred type from the subquery's type context before you restore the
outer context. While iterating ReturnClause items in the inner subquery (using
clause.query, ReturnClause, Variable), look up the alias/expression type from
the current self._type_context (e.g. via whatever lookup/get_binding method you
have on the type context) and append (alias, inferred_type) to projected_vars;
only then restore saved_type_context and rebind each var_name_inner with that
inferred_type using _type_context.bind_variable instead of hard-coding
VariableType.SCALAR.
- Around line 2204-2209: The current branch rejects any bound variable used as a
parenthesized WHERE predicate by unconditionally raising an InvalidArgumentType
SyntaxError; change it to only reject when the bound variable is entity-typed (a
Node), and allow it when its resolved type is Boolean (or other scalar boolean).
Locate the block that raises the SyntaxError (the code referencing inner.name)
and replace the unconditional raise with a type check on the resolved/inferred
type of inner (e.g., inner.type / inner.resolved_type / inner.inferred_type or
by calling the planner's existing is_entity/is_node helper); if that type
indicates a Node/Entity then raise the SyntaxError (and include the actual type
in the message), otherwise accept boolean/scalar types and continue processing
the predicate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b8749acb-345f-4d17-a3ea-bc505aaf0903
📒 Files selected for processing (8)
src/graphforge/parser/cypher.larksrc/graphforge/planner/planner.pysrc/graphforge/planner/types.pytests/integration/test_merge_on_create_real.pytests/unit/planner/test_planner_branch_coverage.pytests/unit/planner/test_planner_edge_cases.pytests/unit/planner/test_syntax_error_validation.pytests/unit/planner/test_type_tracking.py
…ations - Fix mypy: rename projected_type loop var to avoid Literal narrowing conflict - CRab #3: _validate_expr_variables_in_scope now recursively traverses CASE, list/pattern comprehensions, quantifiers, reduce, filter, extract expressions, respecting locally-bound variables from comprehension loop vars - CRab #4: _collect_free_variables now handles PropertyAccess string owners (wraps expr.variable as Variable) and PatternComprehension.pattern to extract anchor variables as locally-bound before traversing filter/map expressions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ariables Pattern AST nodes come as dicts with 'parts' key rather than raw lists; extract pattern_parts correctly from the dict before iterating anchors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/graphforge/planner/planner.py`:
- Around line 2260-2263: The PatternComprehension branch in the _check closure
doesn't add pattern-bound variables to the local binding set (lb), causing false
UndefinedVariable errors; update the PatternComprehension handling to extract
variable names from the pattern elements (inspect e.pattern nodes for
NodePattern and RelationshipPattern variables) and temporarily add those names
to lb before calling _check on e.filter_expr and e.map_expr, then restore lb
after; also import NodePattern and RelationshipPattern from
graphforge.ast.pattern as suggested; this mirrors how _collect_free_variables
gathers pattern-bound variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4177cae5-3cbc-4ee2-a109-8726354870e6
📒 Files selected for processing (1)
src/graphforge/planner/planner.py
| elif isinstance(e, PatternComprehension): | ||
| if e.filter_expr is not None: | ||
| _check(e.filter_expr, lb) | ||
| _check(e.map_expr, lb) |
There was a problem hiding this comment.
PatternComprehension handling is inconsistent with _collect_free_variables.
_collect_free_variables (lines 2154-2173) correctly extracts pattern-bound variables (node/relationship variables from the pattern) and adds them to locally_bound. However, this method does not, which will cause false positive UndefinedVariable errors for valid queries like:
MATCH (n) SET n.friends = [(n)-[:FRIEND]->(f) | f.name]Here, f is bound by the pattern but would be incorrectly flagged as undefined.
Proposed fix
elif isinstance(e, PatternComprehension):
+ # Collect variable names bound by the pattern (pattern anchors)
+ pattern_bound: set[str] = set()
+ raw_pattern = e.pattern
+ if isinstance(raw_pattern, dict):
+ pattern_parts_inner: list[Any] = list(
+ raw_pattern.get("parts", raw_pattern.get("elements", [])) or []
+ )
+ elif isinstance(raw_pattern, list):
+ pattern_parts_inner = raw_pattern
+ else:
+ pattern_parts_inner = [raw_pattern]
+ for part_inner in pattern_parts_inner:
+ if isinstance(part_inner, (NodePattern, RelationshipPattern)):
+ if part_inner.variable:
+ pattern_bound.add(part_inner.variable)
+ new_lb = lb | pattern_bound
if e.filter_expr is not None:
- _check(e.filter_expr, lb)
- _check(e.map_expr, lb)
+ _check(e.filter_expr, new_lb)
+ _check(e.map_expr, new_lb)Note: You'll need to add NodePattern, RelationshipPattern imports from graphforge.ast.pattern at the top of the _check closure or in the outer function imports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/graphforge/planner/planner.py` around lines 2260 - 2263, The
PatternComprehension branch in the _check closure doesn't add pattern-bound
variables to the local binding set (lb), causing false UndefinedVariable errors;
update the PatternComprehension handling to extract variable names from the
pattern elements (inspect e.pattern nodes for NodePattern and
RelationshipPattern variables) and temporarily add those names to lb before
calling _check on e.filter_expr and e.map_expr, then restore lb after; also
import NodePattern and RelationshipPattern from graphforge.ast.pattern as
suggested; this mirrors how _collect_free_variables gathers pattern-bound
variables.
EXISTS/COUNT subqueries have their own scope — variables bound inside
(e.g. f in COUNT { MATCH (p)-[:KNOWS]->(f) WHERE f.age > 28 }) are not
free in the outer query. Guard both _collect_free_variables and
_validate_expr_variables_in_scope to return early on SubqueryExpression.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
VariableAlreadyBounddetection for MERGE standalone node/relationship reuseNoSingleRelationshipTypedetection for MERGE without exactly one relationship typeUndefinedVariabledetection for SET, DELETE, and RETURN clause variablesColumnNameConflictdetection for duplicate RETURN aliasesNoVariablesInScopedetection forRETURN *with only anonymous variablesInvalidArgumentTypeforWHERE (n)with a bound node (not a valid boolean predicate)_collect_free_variablesto avoid false-positive errors in quantifiers, comprehensions, and reduce expressionsCREATE … MERGE …in a single query via grammar rule extensionCloses #370
TCK Impact
Test plan
fail_whenscenarios pass🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes