
fix: Phase 4 compile-time validations (#370) #371

Merged
DecisionNerd merged 4 commits into main from fix/370-compile-time-validations
Apr 28, 2026

Conversation

@DecisionNerd
Owner

@DecisionNerd DecisionNerd commented Apr 28, 2026

Summary

  • Add VariableAlreadyBound detection for MERGE standalone node/relationship reuse
  • Add NoSingleRelationshipType detection for MERGE without exactly one relationship type
  • Add UndefinedVariable detection for SET, DELETE, and RETURN clause variables
  • Add ColumnNameConflict detection for duplicate RETURN aliases
  • Add NoVariablesInScope detection for RETURN * with only anonymous variables
  • Add InvalidArgumentType for WHERE (n) with a bound node (not a valid boolean predicate)
  • Implement scope-aware _collect_free_variables to avoid false-positive errors in quantifiers, comprehensions, and reduce expressions
  • Propagate OPTIONAL MATCH and CALL subquery bound variables into type context
  • Allow CREATE … MERGE … in a single query via grammar rule extension
  • Fix ON CREATE/MATCH SET validation order (bind pattern variables before validating SET expressions)
  • Bind path variable after multi-hop variable-length expansion

Closes #370
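
To illustrate the scope-aware free-variable collection mentioned in the summary, here is a minimal sketch on a toy AST. The node classes (`Var`, `Prop`, `ListComp`) and the function name `collect_free_variables` are hypothetical stand-ins, not the actual graphforge classes or the real `_collect_free_variables` signature. The point is only that a comprehension's loop variable is treated as locally bound while its body is traversed.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy AST; the real graphforge node classes are not shown in this PR.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Prop:
    owner: "Expr"
    key: str


@dataclass(frozen=True)
class ListComp:
    loop_var: str        # x in [x IN source | projection]
    source: "Expr"
    projection: "Expr"


Expr = Union[Var, Prop, ListComp]


def collect_free_variables(
    expr: Expr, locally_bound: frozenset[str] = frozenset()
) -> set[str]:
    """Return variable names that are not bound by an enclosing comprehension."""
    if isinstance(expr, Var):
        return set() if expr.name in locally_bound else {expr.name}
    if isinstance(expr, Prop):
        return collect_free_variables(expr.owner, locally_bound)
    if isinstance(expr, ListComp):
        # The source is evaluated in the outer scope; only the projection sees loop_var.
        free = collect_free_variables(expr.source, locally_bound)
        free |= collect_free_variables(
            expr.projection, locally_bound | {expr.loop_var}
        )
        return free
    raise TypeError(f"unsupported expression node: {expr!r}")


# RETURN [x IN n.scores | x.value], with n bound by an earlier MATCH
expr = ListComp("x", Prop(Var("n"), "scores"), Prop(Var("x"), "value"))
undefined = collect_free_variables(expr) - {"n"}
print(undefined)
```

Without the `locally_bound` set, `x` would be reported as undefined even though the comprehension itself introduces it, which is the false positive the summary describes.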

TCK Impact

  • Before: 3,635 / 3,885 (93.6%)
  • After: 3,647 / 3,885 (93.9%) — +12 scenarios
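
The percentages above can be rechecked from the raw scenario counts. This is only a small arithmetic sketch; `pass_rate` is a hypothetical helper, not part of the project.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


TOTAL = 3885
before = pass_rate(3635, TOTAL)
after = pass_rate(3647, TOTAL)
gained = 3647 - 3635

print(before, after, gained)
```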

Test plan

  • All 276 planner unit tests pass
  • All 2418 unit tests pass
  • All integration tests pass
  • 14 target TCK fail_when scenarios pass
  • No regressions in existing passing tests

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Write queries can now use one or more CREATE clauses followed by one or more MERGE clauses.
  • Bug Fixes

    • Stricter compile-time validation to catch undefined variables in SET, DELETE, and RETURN.
    • Improved MERGE pattern and variable validation; invalid patterns now raise clear errors.
    • Rejected invalid boolean predicates in WHERE (e.g., bare entity references) and improved scope handling for subqueries and multi-hop patterns.
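
As a rough illustration of the duplicate-column check on RETURN, the sketch below flags any column name that appears more than once. The `ColumnNameConflict` class and `check_return_columns` function are assumptions made for this example. The PR's tests indicate that the planner raises `SyntaxError` for these cases, so the toy error subclasses it, but the real error construction is not shown here.

```python
from collections import Counter


class ColumnNameConflict(SyntaxError):
    """Hypothetical error type for this sketch; not the planner's actual class."""


def check_return_columns(columns: list[str]) -> None:
    """Raise if any RETURN column name (alias or expression text) repeats."""
    duplicates = sorted(
        name for name, count in Counter(columns).items() if count > 1
    )
    if duplicates:
        raise ColumnNameConflict(
            f"ColumnNameConflict: duplicate column name(s) {duplicates}"
        )


# RETURN n.name AS x, n.age AS y  -> accepted
check_return_columns(["x", "y"])

# RETURN n.name AS x, n.age AS x  -> rejected
try:
    check_return_columns(["x", "x"])
except ColumnNameConflict as err:
    print(err.msg)
```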

- Add VariableAlreadyBound detection for MERGE standalone node/relationship reuse
- Add NoSingleRelationshipType detection for MERGE relationship patterns
- Add UndefinedVariable detection for SET/DELETE/RETURN clause variables
- Add ColumnNameConflict detection for duplicate RETURN aliases
- Add NoVariablesInScope detection for RETURN * with only anonymous variables
- Add InvalidArgumentType for WHERE (n) with bound node variable
- Add scope-aware free-variable collection (_collect_free_variables) to avoid
  false UndefinedVariable errors inside quantifiers/comprehensions/reduce
- Bind path variable after multi-hop variable-length expansion loops
- Propagate OPTIONAL MATCH bound variables to type context
- Propagate CALL subquery RETURN variables to outer type context
- Allow CREATE + MERGE in a single query via grammar rule extension
- Fix ON CREATE/MATCH SET validation ordering (bind pattern vars first)

All 276 planner unit tests, 2418 unit tests, 4256+ total tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4947ef2b-8813-4c18-bb9e-6f243c234158

📥 Commits

Reviewing files that changed from the base of the PR and between 8777dfa and a48280d.

📒 Files selected for processing (1)
  • src/graphforge/planner/planner.py

Walkthrough

Adds parser support for CREATE...MERGE write queries and implements extensive compile-time planner validations (MERGE structural checks, RETURN/SET/DELETE undefined-variable and duplicate-column checks, CALL subquery alias binding, WHERE pattern predicate rejection) plus a TypeContext accessor; tests updated accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Grammar extension: src/graphforge/parser/cypher.lark | Added a new write_query alternative to accept one-or-more CREATE clauses followed by one-or-more MERGE clauses, preserving existing optional continuations (SET, RETURN, ORDER BY, SKIP, LIMIT). |
| Planner validation logic: src/graphforge/planner/planner.py | Added compile-time validations: _validate_merge_patterns and MERGE binding rules, validations for SET/DELETE undefined variables, stricter RETURN checks (RETURN * with empty scope, duplicate column names, undefined free-vars while respecting local bindings), CALL subquery alias propagation, immediate binding of MATCH path/node/rel variables, expanded _validate_pattern_types, and rejection of WHERE (n) as non-boolean. |
| Type context accessor: src/graphforge/planner/types.py | Introduced TypeContext.bound_variables() returning the set of currently bound variable names. |
| Integration test: tests/integration/test_merge_on_create_real.py | Changed test_merge_on_create_no_variable to expect a SyntaxError (UndefinedVariable) when ON CREATE SET references an undefined variable. |
| Unit tests: tests/unit/planner/test_planner_branch_coverage.py, tests/unit/planner/test_planner_edge_cases.py, tests/unit/planner/test_syntax_error_validation.py, tests/unit/planner/test_type_tracking.py | Adjusted tests to align with stricter scope/type validations: replaced unbound returns with literals/bound vars, changed expectations to assert SyntaxError for undefined variables, and relaxed one MERGE error assertion to accept multiple related error types. |
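
To make the `TypeContext.bound_variables()` accessor and the RETURN * check concrete, here is a minimal sketch. Only the names `TypeContext`, `bind_variable`, `bound_variables`, and the `VariableType` members appear in this PR. The internal dictionary, the `validate_return_star` helper, and the error message are assumptions for illustration.

```python
from enum import Enum, auto


class VariableType(Enum):
    NODE = auto()
    RELATIONSHIP = auto()
    SCALAR = auto()


class TypeContext:
    """Toy stand-in for the planner's type context (internals assumed)."""

    def __init__(self) -> None:
        self._bindings: dict[str, VariableType] = {}

    def bind_variable(self, name: str, var_type: VariableType) -> None:
        self._bindings[name] = var_type

    def bound_variables(self) -> set[str]:
        # Return a copy so callers cannot mutate the context by accident.
        return set(self._bindings)


def validate_return_star(ctx: TypeContext) -> None:
    """Hypothetical helper: RETURN * needs at least one named variable in scope."""
    if not ctx.bound_variables():
        raise SyntaxError("NoVariablesInScope: RETURN * with no named variables")


ctx = TypeContext()
# MATCH () RETURN *  -> only an anonymous node, so nothing is bound
try:
    validate_return_star(ctx)
except SyntaxError as err:
    print(err.msg)

# MATCH (n) RETURN *  -> n is bound, so the check passes
ctx.bind_variable("n", VariableType.NODE)
validate_return_star(ctx)
```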

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #370: This PR implements the compile-time validations (MERGE structure, RETURN/SET/DELETE undefined-variable checks, WHERE node predicate) described in the issue.
  • #258: Overlaps in planner validation and SyntaxError gap fixes for undefined variables and scope checking.
  • #368: MERGE structural validation and variable-binding behavior align with pattern-based write-query validation goals.

Possibly related PRs

  • #66: Related MERGE handling changes and ON CREATE/ON MATCH flow that intersect with these planner validations.
  • #183: Modifies the same write_query grammar to allow CREATE before MERGE; closely matches the parser change here.
  • #369: Overlapping compile-time planner validations for pattern/variable scoping and undefined-variable detection.

Suggested labels

parser

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'fix: Phase 4 compile-time validations (#370)' clearly and specifically summarizes the main changes: implementing compile-time validations as part of Phase 4, addressing issue #370. |
| Description check | ✅ Passed | The PR description follows the template structure with Summary section addressing all objectives, TCK impact metrics, and test plan confirmation. All critical information is present and well-organized. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all coding objectives from #370: MERGE structural validations [VariableAlreadyBound, NoSingleRelationshipType], RETURN validations [UndefinedVariable, ColumnNameConflict, NoVariablesInScope], SET/DELETE/MERGE ON CREATE/MATCH validations, WHERE predicate validation, scope-aware variable collection, grammar extension, and variable binding order corrections. |
| Out of Scope Changes check | ✅ Passed | All code changes directly support the #370 objectives: grammar rule extension for CREATE…MERGE, planner validations for MERGE/RETURN/SET/DELETE/WHERE, TypeContext accessor for bound variables, and test updates to verify new validations. No extraneous changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.32% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/planner/planner.py`:
- Around line 1018-1020: The new binds for OPTIONAL MATCH skip compatibility
checks and can overwrite existing types; before calling
self._type_context.bind_variable for node_pattern.variable,
relationship_pattern.variable, and the destination variable (e.g.,
destination.variable), first call self._type_context.validate_compatible(var,
<appropriate VariableType>) to ensure existing type is compatible (use
VariableType.NODE for node binds and destination binds,
VariableType.RELATIONSHIP for relationship binds), and only then call
bind_variable; this preserves existing VariableTypeConflict behavior instead of
silently rebinding.
- Around line 2029-2038: _validate_set_clause_variables delegates RHS checking
to _validate_expr_variables_in_scope(rhs) but that helper only handles a subset
of expression node types, so complex RHS forms like list comprehensions or CASE
expressions bypass undefined-variable checks; update
_validate_expr_variables_in_scope to recursively traverse and validate all
expression AST node kinds used in SET RHS (at minimum handle list/comprehension
nodes like list comprehensions ([x IN y | ...] / Comprehension), CASE/WHEN nodes
(CaseExpression/ConditionalExpression), and other composite constructors such as
list/map/tuple literals and nested property/call expressions), calling the
existing variable-existence check when encountering Variable nodes so patterns
like SET n.v = [x IN missing | x] and SET n.v = CASE WHEN missing THEN ... are
caught by _validate_set_clause_variables().
- Around line 2100-2161: _collect_free_variables is missing Variable owners
stored as strings and never inspects PatternComprehension.pattern; add explicit
handling for PropertyAccess (check for isinstance(expr, PropertyAccess) and
traverse its owner: if owner is a string, convert or wrap it as Variable before
recursing, otherwise recurse into owner) and for PatternComprehension traverse
expr.pattern as well as expr.filter_expr/map_expr, and when traversing a pattern
extract any names it binds and include them in locally_bound (so bound anchors
in the pattern are not reported as free) before recursing into the map/filter
expressions; update the branches for PatternComprehension and add a new branch
for PropertyAccess in _collect_free_variables to perform these traversals.
- Around line 269-288: The code currently exports subquery aliases into
projected_vars always as VariableType.SCALAR; instead capture each alias's
actual inferred type from the subquery's type context before you restore the
outer context. While iterating ReturnClause items in the inner subquery (using
clause.query, ReturnClause, Variable), look up the alias/expression type from
the current self._type_context (e.g. via whatever lookup/get_binding method you
have on the type context) and append (alias, inferred_type) to projected_vars;
only then restore saved_type_context and rebind each var_name_inner with that
inferred_type using _type_context.bind_variable instead of hard-coding
VariableType.SCALAR.
- Around line 2204-2209: The current branch rejects any bound variable used as a
parenthesized WHERE predicate by unconditionally raising an InvalidArgumentType
SyntaxError; change it to only reject when the bound variable is entity-typed (a
Node), and allow it when its resolved type is Boolean (or other scalar boolean).
Locate the block that raises the SyntaxError (the code referencing inner.name)
and replace the unconditional raise with a type check on the resolved/inferred
type of inner (e.g., inner.type / inner.resolved_type / inner.inferred_type or
by calling the planner's existing is_entity/is_node helper); if that type
indicates a Node/Entity then raise the SyntaxError (and include the actual type
in the message), otherwise accept boolean/scalar types and continue processing
the predicate.
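
Two of the findings above (validate before rebinding, and reject a WHERE variable only when it is entity-typed) can be sketched together on a toy type context. The method names `validate_compatible` and `bind_variable` and the `VariableTypeConflict` name come from the review comments. Everything else, including `get_type`, `bind_checked`, `validate_where_variable`, and the decision to treat relationships as entities too, is an assumption for illustration rather than the planner's actual code.

```python
from enum import Enum, auto
from typing import Optional


class VariableType(Enum):
    NODE = auto()
    RELATIONSHIP = auto()
    SCALAR = auto()


class VariableTypeConflict(SyntaxError):
    """Toy error type; the real planner's error construction is not shown."""


class TypeContext:
    def __init__(self) -> None:
        self._bindings: dict[str, VariableType] = {}

    def validate_compatible(self, name: str, var_type: VariableType) -> None:
        existing = self._bindings.get(name)
        if existing is not None and existing is not var_type:
            raise VariableTypeConflict(
                f"VariableTypeConflict: {name} is {existing.name}, not {var_type.name}"
            )

    def bind_variable(self, name: str, var_type: VariableType) -> None:
        self._bindings[name] = var_type

    def get_type(self, name: str) -> Optional[VariableType]:  # hypothetical accessor
        return self._bindings.get(name)


def bind_checked(ctx: TypeContext, name: str, var_type: VariableType) -> None:
    """Validate first so an existing binding is never silently overwritten."""
    ctx.validate_compatible(name, var_type)
    ctx.bind_variable(name, var_type)


def validate_where_variable(ctx: TypeContext, name: str) -> None:
    """Reject WHERE (name) only when name resolves to an entity type."""
    var_type = ctx.get_type(name)
    if var_type in (VariableType.NODE, VariableType.RELATIONSHIP):
        raise SyntaxError(
            f"InvalidArgumentType: {name} is a {var_type.name}, not a boolean predicate"
        )


ctx = TypeContext()
bind_checked(ctx, "n", VariableType.NODE)
bind_checked(ctx, "flag", VariableType.SCALAR)
validate_where_variable(ctx, "flag")  # accepted: scalar may hold a boolean
```

In this sketch, rebinding `n` as a relationship or using it as a bare WHERE predicate raises, while a scalar such as `flag` passes through.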

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b8749acb-345f-4d17-a3ea-bc505aaf0903

📥 Commits

Reviewing files that changed from the base of the PR and between 432f079 and 9feb644.

📒 Files selected for processing (8)
  • src/graphforge/parser/cypher.lark
  • src/graphforge/planner/planner.py
  • src/graphforge/planner/types.py
  • tests/integration/test_merge_on_create_real.py
  • tests/unit/planner/test_planner_branch_coverage.py
  • tests/unit/planner/test_planner_edge_cases.py
  • tests/unit/planner/test_syntax_error_validation.py
  • tests/unit/planner/test_type_tracking.py

DecisionNerd and others added 2 commits April 28, 2026 10:22
…ations

- Fix mypy: rename projected_type loop var to avoid Literal narrowing conflict
- CRab #3: _validate_expr_variables_in_scope now recursively traverses CASE,
  list/pattern comprehensions, quantifiers, reduce, filter, extract expressions,
  respecting locally-bound variables from comprehension loop vars
- CRab #4: _collect_free_variables now handles PropertyAccess string owners
  (wraps expr.variable as Variable) and PatternComprehension.pattern to extract
  anchor variables as locally-bound before traversing filter/map expressions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ariables

Pattern AST nodes come as dicts with 'parts' key rather than raw lists;
extract pattern_parts correctly from the dict before iterating anchors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/graphforge/planner/planner.py`:
- Around line 2260-2263: The PatternComprehension branch in the _check closure
doesn't add pattern-bound variables to the local binding set (lb), causing false
UndefinedVariable errors; update the PatternComprehension handling to extract
variable names from the pattern elements (inspect e.pattern nodes for
NodePattern and RelationshipPattern variables) and temporarily add those names
to lb before calling _check on e.filter_expr and e.map_expr, then restore lb
after; also import NodePattern and RelationshipPattern from
graphforge.ast.pattern as suggested; this mirrors how _collect_free_variables
gathers pattern-bound variables.
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4177cae5-3cbc-4ee2-a109-8726354870e6

📥 Commits

Reviewing files that changed from the base of the PR and between 9feb644 and 8777dfa.

📒 Files selected for processing (1)
  • src/graphforge/planner/planner.py

Comment on lines +2260 to +2263
elif isinstance(e, PatternComprehension):
    if e.filter_expr is not None:
        _check(e.filter_expr, lb)
    _check(e.map_expr, lb)

⚠️ Potential issue | 🟠 Major

PatternComprehension handling is inconsistent with _collect_free_variables.

_collect_free_variables (lines 2154-2173) correctly extracts pattern-bound variables (node/relationship variables from the pattern) and adds them to locally_bound. However, the PatternComprehension branch of the _check closure does not, which will cause false-positive UndefinedVariable errors for valid queries like:

MATCH (n) SET n.friends = [(n)-[:FRIEND]->(f) | f.name]

Here, f is bound by the pattern but would be incorrectly flagged as undefined.

Proposed fix
             elif isinstance(e, PatternComprehension):
+                # Collect variable names bound by the pattern (pattern anchors)
+                pattern_bound: set[str] = set()
+                raw_pattern = e.pattern
+                if isinstance(raw_pattern, dict):
+                    pattern_parts_inner: list[Any] = list(
+                        raw_pattern.get("parts", raw_pattern.get("elements", [])) or []
+                    )
+                elif isinstance(raw_pattern, list):
+                    pattern_parts_inner = raw_pattern
+                else:
+                    pattern_parts_inner = [raw_pattern]
+                for part_inner in pattern_parts_inner:
+                    if isinstance(part_inner, (NodePattern, RelationshipPattern)):
+                        if part_inner.variable:
+                            pattern_bound.add(part_inner.variable)
+                new_lb = lb | pattern_bound
                 if e.filter_expr is not None:
-                    _check(e.filter_expr, lb)
-                _check(e.map_expr, lb)
+                    _check(e.filter_expr, new_lb)
+                _check(e.map_expr, new_lb)

Note: You'll need to add NodePattern, RelationshipPattern imports from graphforge.ast.pattern at the top of the _check closure or in the outer function imports.
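
The effect of the proposed fix can be tried in isolation with the sketch below. `NodePattern` and `RelationshipPattern` here are simplified stand-ins for the classes in graphforge.ast.pattern, and the flat `parts` list is an assumption about the pattern shape. `pattern_bound_variables` is a hypothetical helper that mirrors the extraction logic in the diff.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Simplified stand-ins; the real classes carry more fields (labels, types, ...).
@dataclass
class NodePattern:
    variable: Optional[str] = None


@dataclass
class RelationshipPattern:
    variable: Optional[str] = None


def pattern_bound_variables(raw_pattern: Any) -> set[str]:
    """Collect variable names introduced by a pattern comprehension's pattern."""
    if isinstance(raw_pattern, dict):
        parts = list(raw_pattern.get("parts", raw_pattern.get("elements", [])) or [])
    elif isinstance(raw_pattern, list):
        parts = raw_pattern
    else:
        parts = [raw_pattern]
    return {
        part.variable
        for part in parts
        if isinstance(part, (NodePattern, RelationshipPattern)) and part.variable
    }


# [(n)-[:FRIEND]->(f) | f.name], with n bound by the enclosing MATCH
pattern = {"parts": [NodePattern("n"), RelationshipPattern(), NodePattern("f")]}
outer_scope = {"n"}
visible = outer_scope | pattern_bound_variables(pattern)
print(sorted(visible))
```

Building a new set with `|` instead of mutating the outer one means there is nothing to restore after the comprehension has been checked.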


EXISTS/COUNT subqueries have their own scope — variables bound inside
(e.g. f in COUNT { MATCH (p)-[:KNOWS]->(f) WHERE f.age > 28 }) are not
free in the outer query. Guard both _collect_free_variables and
_validate_expr_variables_in_scope to return early on SubqueryExpression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
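
The early-return guard described in this commit can be sketched on a toy AST. Only the name `SubqueryExpression` comes from the commit message. The other node classes, the field names, and the `free_variables` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical toy nodes; field names are assumptions for this sketch.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class SubqueryExpression:
    kind: str   # "EXISTS" or "COUNT"
    body: Any   # inner query, opaque to the outer scope check


def free_variables(expr: Any) -> set[str]:
    """Collect outer-scope variable references, skipping subquery bodies."""
    if isinstance(expr, SubqueryExpression):
        # The subquery has its own scope, so nothing inside it is free out here.
        return set()
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, BinaryOp):
        return free_variables(expr.left) | free_variables(expr.right)
    return set()


# WHERE p.age > 30 AND COUNT { MATCH (p)-[:KNOWS]->(f) WHERE f.age > 28 } > 1
count_subquery = SubqueryExpression("COUNT", BinaryOp(">", Var("f"), Literal(28)))
predicate = BinaryOp(
    "AND",
    BinaryOp(">", Var("p"), Literal(30)),
    BinaryOp(">", count_subquery, Literal(1)),
)
print(free_variables(predicate))
```

One consequence of returning early in this sketch is that references inside the subquery body are not inspected by this pass at all, so they would need to be validated wherever the subquery itself is planned.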

Development

Successfully merging this pull request may close these issues.

fix: additional compile-time validations — MERGE structural, RETURN scope, SET/DELETE UndefinedVariable, WHERE node predicate

1 participant