feat(gf-cypher): Pratt expression parser — all Expr variants #655

Merged: DecisionNerd merged 3 commits into main from feature/650-pratt-expr-parser on May 29, 2026

DecisionNerd (Owner) commented May 29, 2026

Summary

  • Implements parse_expr(ts, min_bp) in crates/gf-cypher/src/parser/expr.rs (~500 lines)
  • Full 11-level binding power table: OR → XOR → AND → NOT → comparisons → +/- → */% → ^ → postfix .[]
  • All Expr AST variants produced: Literal, Var, Param, BinaryOp, UnaryOp, FunctionCall, Property, List, Map, Case, ListComprehension, IsNull, InList, StringOp, RegexMatch, Parenthesized
  • Compound lexer tokens handled as single-token operators: IS NULL, IS NOT NULL, NOT IN, STARTS WITH, ENDS WITH, CONTAINS
  • COUNT(*), COUNT(DISTINCT x), keyword-named functions (exists, any, none, single, filter, extract, reduce, shortestPath)
  • Simple and searched CASE expressions
  • List comprehensions with optional WHERE filter and | projection
  • Subscript [idx] and slice [lo..hi] via synthetic _subscript/_slice FunctionCall nodes
  • eat_ident: error span now references the actual bad token (CodeRabbit finding)
  • parse_map_literal: duplicate keys are now a ParseError (CodeRabbit finding)
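To make the binding-power idea concrete, here is a minimal sketch of a Pratt loop over a toy token type. It is not the gf-cypher implementation: `Tok`, `Expr`, `show`, and the slice-plus-cursor signature are hypothetical stand-ins, and the loop condition is the textbook one rather than something confirmed by this PR. Only the (left, right) numbers are taken from the binding power table in this PR.

```rust
// Toy Pratt parser illustrating binding powers.
// Tok, Expr, show, and this parse_expr signature are assumptions for illustration.

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Int(i64),
    Or,
    And,
    Plus,
    Star,
    Caret,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Binary(Box<Expr>, Tok, Box<Expr>),
}

/// (left, right) binding powers, copied from the table in this PR.
fn infix_binding_power(tok: &Tok) -> Option<(u8, u8)> {
    match tok {
        Tok::Or => Some((10, 11)),
        Tok::And => Some((30, 31)),
        Tok::Plus => Some((50, 51)),
        Tok::Star => Some((60, 61)),
        Tok::Caret => Some((70, 70)), // equal powers make the operator right-associative
        Tok::Int(_) => None,
    }
}

fn parse_expr(toks: &[Tok], pos: &mut usize, min_bp: u8) -> Result<Expr, String> {
    // Prefix position: this toy grammar only has integer atoms.
    let mut lhs = match toks.get(*pos) {
        Some(Tok::Int(n)) => {
            *pos += 1;
            Expr::Int(*n)
        }
        other => return Err(format!("expected integer, found {:?}", other)),
    };

    // Infix loop: keep consuming operators that bind at least as tightly as min_bp.
    while let Some(op) = toks.get(*pos) {
        let Some((l_bp, r_bp)) = infix_binding_power(op) else { break };
        if l_bp < min_bp {
            break;
        }
        let op = op.clone();
        *pos += 1;
        let rhs = parse_expr(toks, pos, r_bp)?;
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
    }
    Ok(lhs)
}

/// Render the tree as an S-expression so grouping is visible.
fn show(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Binary(l, op, r) => format!("({:?} {} {})", op, show(l), show(r)),
    }
}
```

With these numbers, the tokens for `1 + 2 * 3` render as `(Plus 1 (Star 2 3))`, and `2 ^ 3 ^ 4` renders as `(Caret 2 (Caret 3 4))`, because the recursive call for `^` passes a minimum of 70 and the next `^` (left power 70) is not below it.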

Test plan

Closes #650. Part of #554.

🤖 Generated with Claude Code

Note

Implement Pratt expression parser for all Expr variants in gf-cypher

  • Adds a full Pratt parser in crates/gf-cypher/src/parser/expr.rs that parses all Cypher expression forms into gf_ast::Expr nodes with accurate spans.
  • Supports literals, unary/binary ops, logical/comparison/arithmetic operators with correct precedence and associativity, IS [NOT] NULL, IN/NOT IN, string predicates, property access, subscript/slice, function calls (including COUNT(*) and DISTINCT), list literals, list comprehensions, map literals, and both simple and searched CASE expressions.
  • Subscript and slice access are encoded as FunctionCall nodes with synthetic names _subscript and _slice.
  • Keywords are accepted as property and map keys via a permissive eat_ident helper.
  • Exposes parse_expr through the parser module's public API via crates/gf-cypher/src/parser/mod.rs.
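The synthetic-name encoding for subscript and slice access can be pictured roughly as follows. This is a sketch under stated assumptions: the `Expr` enum is a cut-down stand-in for `gf_ast::Expr`, and the argument order and the use of a `Null` placeholder for an omitted slice bound are guesses, not something this PR confirms.

```rust
// Hypothetical, reduced Expr type; the real gf_ast::Expr has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    Null,
    FunctionCall { name: String, args: Vec<Expr> },
}

/// `base[index]` encoded as a call to the synthetic function `_subscript`.
/// Argument order (base first) is an assumption.
fn subscript(base: Expr, index: Expr) -> Expr {
    Expr::FunctionCall {
        name: "_subscript".to_string(),
        args: vec![base, index],
    }
}

/// `base[lo..hi]` encoded as a call to the synthetic function `_slice`.
/// Using Null for a missing bound is an assumption made for this sketch.
fn slice(base: Expr, lo: Option<Expr>, hi: Option<Expr>) -> Expr {
    Expr::FunctionCall {
        name: "_slice".to_string(),
        args: vec![base, lo.unwrap_or(Expr::Null), hi.unwrap_or(Expr::Null)],
    }
}

/// How a later pass might tell synthetic accesses apart from user-written calls.
fn is_synthetic_access(e: &Expr) -> bool {
    matches!(e, Expr::FunctionCall { name, .. } if name == "_subscript" || name == "_slice")
}
```

The leading underscore keeps the synthetic names visually distinct from ordinary function names, so a planner or a future dedicated AST node can pick these calls out by name.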

Macroscope summarized 8d35a45.

Summary by CodeRabbit

  • New Features
    • Added comprehensive Cypher expression parsing: literals, parameters, variables, function calls (DISTINCT, count(*)), logical/arithmetic/comparison operators, string predicates, IS NULL/IN semantics, property access, indexing/slicing, CASE, list/map literals and comprehensions.
    • Added parser token-stream API with lookahead/consumption helpers and span-aware error reporting.
  • Tests
    • Added unit tests for precedence, predicates, property/subscript/slice handling, function calls, literals, maps/lists, CASE, comprehensions, and error cases.

Implements parse_expr(ts, min_bp) in crates/gf-cypher/src/parser/expr.rs.

Binding power table:
  OR(10/11), XOR(20/21), AND(30/31), NOT prefix(35),
  comparisons(40/41), IS NULL/NOT IN/STARTS WITH/etc(40/40),
  +/-(50/51), */%(60/61), ^(70/70 right-assoc),
  ./[(80/81 postfix)

Produces all Expr AST variants:
  Literal, Var, Param, BinaryOp, UnaryOp, FunctionCall,
  Property, List, Map, Case, ListComprehension,
  IsNull, InList, StringOp, RegexMatch, Parenthesized

- COUNT(*), COUNT(DISTINCT x), keyword-named functions (exists,
  any, none, single, filter, extract, reduce, shortestPath)
- Simple and searched CASE expressions
- List comprehensions with optional WHERE filter and | projection
- Subscript [idx] and slice [lo..hi] encoded as synthetic _subscript
  and _slice FunctionCall nodes until AST gains a dedicated Slice node
- eat_ident: error span now references the actual bad token
- parse_map_literal: duplicate keys are now a ParseError
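
Duplicate-key detection in a map literal can be sketched as below. The `ParseError` shape, the `build_map` name, and the pre-parsed entry tuples are hypothetical; the real `parse_map_literal` works on a token stream and its error type may differ.

```rust
use std::collections::HashSet;

// Hypothetical error type for this sketch; the real ParseError may carry more detail.
#[derive(Debug, PartialEq)]
struct ParseError {
    message: String,
    span: (usize, usize),
}

/// Each entry is (key text, key span, value); values stay as strings for brevity.
/// Returns the entries in source order, or an error pointing at the repeated key.
fn build_map(
    entries: Vec<(String, (usize, usize), String)>,
) -> Result<Vec<(String, String)>, ParseError> {
    let mut seen = HashSet::new();
    let mut out = Vec::with_capacity(entries.len());
    for (key, span, value) in entries {
        // HashSet::insert returns false when the key was already present.
        if !seen.insert(key.clone()) {
            return Err(ParseError {
                message: format!("duplicate map key `{}`", key),
                span,
            });
        }
        out.push((key, value));
    }
    Ok(out)
}
```

Reporting the span of the second occurrence, as this sketch does, points the user at the key they most likely need to rename or delete.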

49 tests passing (11 TokenStream + 38 expression).
Part of #554. Depends on #650 scaffolding (merged in #654).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
coderabbitai Bot (Contributor) commented May 29, 2026

Walkthrough

Adds a Pratt-style Cypher expression parser and TokenStream scaffolding: precedence table, prefix/infix/postfix handlers (operators, property access, subscripts/slices), function calls (count(*)/DISTINCT), lists/maps/comprehensions, CASE, helper utilities, and comprehensive unit tests.

Changes

Expression Parser Implementation

| Layer / File(s) | Summary |
|---|---|
| TokenStream infrastructure and APIs (crates/gf-cypher/src/parser/mod.rs) | Introduces TokenStream<'input> that eagerly collects lexer tokens into byte-offset triples with lookahead/consume helpers (peek, peek_n, advance, eat, eat_if), position/span helpers (current_pos, current_span, span_from), and error constructors (err, err_at). Includes unit tests for token collection, lookahead, advancing, eat success/mismatch/EOF, eat_if, empty detection, lexer error propagation, and span behavior. |
| Pratt parser structure and operator precedence (crates/gf-cypher/src/parser/expr.rs, lines 1–120) | Defines InfixOp classification and infix_binding_power table and implements pub fn parse_expr(ts: &mut TokenStream, min_bp: u8) -> Result<Expr, ParseError> as a precedence-climbing Pratt loop. |
| Prefix and infix expression parsing (crates/gf-cypher/src/parser/expr.rs, lines 121–446) | Implements parse_prefix for literals, params, variables, keyword-named function calls, unary NOT/-, parenthesized expressions, list literals/comprehensions, map literals, and CASE; implements parse_infix for binary logical/arithmetic/comparison ops, regex =~, IS NULL/IS NOT NULL, IN/NOT IN, string predicates (STARTS WITH/ENDS WITH/CONTAINS), property access (.Property), and bracket indexing/slicing (synthesized into _subscript / _slice). |
| Complex expression types and handlers (crates/gf-cypher/src/parser/expr.rs, lines 447–612) | Parses function-call argument lists (handles count(*), DISTINCT), list literals and list comprehensions (with optional WHERE and projection), map literals with duplicate-key detection, and simple/searched CASE expressions with multiple WHEN/THEN and optional ELSE. |
| Parsing helpers and comprehensive expression tests (crates/gf-cypher/src/parser/expr.rs, lines 613–1191) | Adds helpers (eat_ident, operator→AST mapping, keyword name/normalization) and a comprehensive unit test suite validating precedence levels, associativity, predicate operators, property access, subscripts/slices, function calls (count(*), DISTINCT), collection literals, CASE, list comprehensions, and parenthesized expressions. |
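
The lookahead and consumption helpers named in the TokenStream row could look roughly like this. It is a simplified sketch: the real `TokenStream<'input>` collects tokens from the lexer and returns `ParseError`, whereas this version takes a prebuilt vector, uses a hypothetical `Tok` enum, and reports errors as `String`.

```rust
// Hypothetical token type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    LParen,
    RParen,
}

struct TokenStream {
    tokens: Vec<(usize, Tok, usize)>, // (start, token, end) byte offsets
    pos: usize,
}

impl TokenStream {
    fn new(tokens: Vec<(usize, Tok, usize)>) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Look at the current token without consuming it.
    fn peek(&self) -> Option<&Tok> {
        self.peek_n(0)
    }

    /// Look n tokens ahead of the current position.
    fn peek_n(&self, n: usize) -> Option<&Tok> {
        self.tokens.get(self.pos + n).map(|(_, t, _)| t)
    }

    /// Consume and return the current (start, token, end) triple, if any.
    fn advance(&mut self) -> Option<(usize, Tok, usize)> {
        let triple = self.tokens.get(self.pos).cloned();
        if triple.is_some() {
            self.pos += 1;
        }
        triple
    }

    /// Consume the current token only if it equals `expected`.
    fn eat_if(&mut self, expected: &Tok) -> bool {
        if self.peek() == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Like eat_if, but a mismatch (or end of input) is an error.
    fn eat(&mut self, expected: &Tok) -> Result<(), String> {
        if self.eat_if(expected) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", expected, self.peek()))
        }
    }
}
```

Collecting tokens eagerly, as the summary describes, makes `peek_n` a plain index lookup, which is convenient for a parser that sometimes needs more than one token of lookahead.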

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description is comprehensive, including summary, test plan, closing issues, and implementation details. Provided description lacks some template sections but core content about changes is present and clear. | Consider completing the optional sections like 'Type of Change', 'Related Issues', 'Testing' details, and 'Checklist' from the template for future consistency and clarity. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: implementing a Pratt expression parser that handles all Expr variants. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements all requirements from issue #650: full binding power table (11 levels), all Expr AST variants, compound token handling, prefix/infix/postfix operators, special cases (COUNT(*), DISTINCT, CASE forms, list comprehensions), and 49 passing unit tests covering all exit criteria. |
| Out of Scope Changes check | ✅ Passed | All changes are strictly scoped to implementing the Pratt expression parser per #650: expr.rs adds parse_expr and helpers (~1191 lines), mod.rs adds TokenStream scaffolding and module exports (~400 lines). No unrelated modifications detected. |


codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.09%. Comparing base (12565bd) to head (8d35a45).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #655   +/-   ##
=======================================
  Coverage   84.09%   84.09%           
=======================================
  Files          49       49           
  Lines       12921    12921           
  Branches     3628     3628           
=======================================
  Hits        10866    10866           
  Misses       1270     1270           
  Partials      785      785           
| Flag | Coverage Δ |
|---|---|
| full-coverage | 84.09% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| parser | 92.93% <ø> (ø) |
| planner | 79.90% <ø> (ø) |
| executor | 75.46% <ø> (ø) |
| storage | 98.68% <ø> (ø) |
| ast | 97.51% <ø> (ø) |
| types | 90.66% <ø> (ø) |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Comment thread crates/gf-cypher/src/parser/expr.rs
Add NOT, AND, OR, XOR, OPTIONAL, ANY, NONE, SINGLE, FILTER, EXTRACT,
REDUCE, SHORTESTPATH, ALLSHORTESTPATHS to tok_as_keyword_str so that
property accesses like n.not or n.any parse correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
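
A permissive identifier helper of the kind this commit describes might be sketched as follows. The `Tok` enum, the slice-plus-cursor signature, and the lowercase spellings returned for keywords are assumptions for illustration; only the names `eat_ident` and `tok_as_keyword_str` come from this PR, and the real helpers operate on `TokenStream` and return `ParseError`.

```rust
// Hypothetical token type with a handful of keyword tokens.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Not,
    And,
    Or,
    Any,
    Dot,
    Plus,
}

/// Map a keyword token to the text it should contribute when used as a name.
/// Returning lowercase text is an assumption made for this sketch.
fn tok_as_keyword_str(tok: &Tok) -> Option<&'static str> {
    match tok {
        Tok::Not => Some("not"),
        Tok::And => Some("and"),
        Tok::Or => Some("or"),
        Tok::Any => Some("any"),
        _ => None,
    }
}

/// Accept a plain identifier or any keyword that tok_as_keyword_str knows about.
fn eat_ident(toks: &[Tok], pos: &mut usize) -> Result<String, String> {
    let name = match toks.get(*pos) {
        Some(Tok::Ident(s)) => s.clone(),
        Some(other) => match tok_as_keyword_str(other) {
            Some(kw) => kw.to_string(),
            // The error names the actual offending token.
            None => return Err(format!("expected identifier, found {:?}", other)),
        },
        None => return Err("expected identifier, found end of input".to_string()),
    };
    *pos += 1;
    Ok(name)
}
```

Calling `eat_ident` right after a `.` lets an input such as `n.not` yield the property name `not`, while a token with no keyword spelling, such as `+`, still produces an error and leaves the cursor untouched.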
// Prefix (nud)
// ---------------------------------------------------------------------------

fn parse_prefix(ts: &mut TokenStream) -> Result<Expr, ParseError> {
🟢 Low parser/expr.rs:126

Atomic expressions use ts.span_from(start) after ts.advance(), which measures from the expression start to the next token's start. For 42 + 1, the literal 42 gets span (0, 3) instead of (0, 2), incorrectly including whitespace before +. Consider capturing the token's end position (r) from the triple before discarding it.

Evidence trail:
crates/gf-cypher/src/parser/expr.rs lines 126-154 (parse_prefix function with advance() + span_from pattern), crates/gf-cypher/src/parser/mod.rs lines 72-80 (advance() returns (usize, Tok, usize) triple), crates/gf-cypher/src/parser/mod.rs lines 124-130 (current_pos() returns start of current token at self.pos), crates/gf-cypher/src/parser/mod.rs lines 138-141 (span_from uses current_pos() as end), crates/gf-cypher/src/parser/mod.rs lines 26-29 (tokens are Vec<(usize, Tok, usize)> where third element is token end position)

Atomic literal/param prefixes now capture (l, _, r) from advance() and
use Span::new(l, r) directly, so spans don't bleed into trailing
whitespace before the next token.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
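
The span issue and its fix can be reproduced with a small model of the token stream. This is a sketch, not the project's code: tokens are plain string slices, `input_len` is a hypothetical field used as the end-of-input position, and the three free functions are invented for the comparison. The token offsets are those of `42 + 1` from the review comment.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    fn new(start: usize, end: usize) -> Self {
        Span { start, end }
    }
}

// Simplified stand-in for the real TokenStream.
struct TokenStream {
    tokens: Vec<(usize, &'static str, usize)>, // (start, text, end)
    pos: usize,
    input_len: usize, // assumption: position reported once tokens run out
}

impl TokenStream {
    /// Start offset of the current token, or the input length at end of input.
    fn current_pos(&self) -> usize {
        self.tokens.get(self.pos).map(|t| t.0).unwrap_or(self.input_len)
    }

    /// Span from `start` up to the START of the current token.
    fn span_from(&self, start: usize) -> Span {
        Span::new(start, self.current_pos())
    }

    fn advance(&mut self) -> Option<(usize, &'static str, usize)> {
        let triple = self.tokens.get(self.pos).copied();
        if triple.is_some() {
            self.pos += 1;
        }
        triple
    }
}

/// Token layout for the input `42 + 1`.
fn stream_for_42_plus_1() -> TokenStream {
    TokenStream {
        tokens: vec![(0, "42", 2), (3, "+", 4), (5, "1", 6)],
        pos: 0,
        input_len: 6,
    }
}

/// Old pattern: advance, then measure up to the next token's start.
fn span_before_fix(ts: &mut TokenStream) -> Span {
    let start = ts.current_pos();
    ts.advance();
    ts.span_from(start)
}

/// New pattern: take both ends from the consumed token itself.
fn span_after_fix(ts: &mut TokenStream) -> Option<Span> {
    let (l, _, r) = ts.advance()?;
    Some(Span::new(l, r))
}
```

For the literal `42`, the old pattern yields the span (0, 3), which includes the space before `+`, while the new pattern yields (0, 2).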
@DecisionNerd DecisionNerd merged commit 1a4f1cc into main May 29, 2026
42 checks passed
@DecisionNerd DecisionNerd deleted the feature/650-pratt-expr-parser branch May 29, 2026 20:52
Development

Successfully merging this pull request may close these issues.

parser: Pratt expression parser — all Expr variants (src/parser/expr.rs)

1 participant