feat(gf-cypher): Pratt expression parser — all Expr variants#655
Implements parse_expr(ts, min_bp) in crates/gf-cypher/src/parser/expr.rs.

Binding power table: OR(10/11), XOR(20/21), AND(30/31), NOT prefix(35), comparisons(40/41), IS NULL/NOT IN/STARTS WITH/etc(40/40), +/-(50/51), */%(60/61), ^(70/70 right-assoc), ./[(80/81 postfix)

Produces all Expr AST variants: Literal, Var, Param, BinaryOp, UnaryOp, FunctionCall, Property, List, Map, Case, ListComprehension, IsNull, InList, StringOp, RegexMatch, Parenthesized

- COUNT(*), COUNT(DISTINCT x), keyword-named functions (exists, any, none, single, filter, extract, reduce, shortestPath)
- Simple and searched CASE expressions
- List comprehensions with optional WHERE filter and | projection
- Subscript [idx] and slice [lo..hi] encoded as synthetic _subscript and _slice FunctionCall nodes until AST gains a dedicated Slice node
- eat_ident: error span now references the actual bad token
- parse_map_literal: duplicate keys are now a ParseError

49 tests passing (11 TokenStream + 38 expression).

Part of #554. Depends on #650 scaffolding (merged in #654).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough
Adds a Pratt-style Cypher expression parser and TokenStream scaffolding: precedence table, prefix/infix/postfix handlers (operators, property access, subscripts/slices), function calls (count(*)/DISTINCT), lists/maps/comprehensions, CASE, helper utilities, and comprehensive unit tests.
Changes
Expression Parser Implementation
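For readers unfamiliar with the technique, here is a minimal sketch of a Pratt loop driven by a subset of the binding powers quoted in the description. The `Tok`, `Expr`, and `Parser` types are hypothetical toy stand-ins, not the gf-cypher or gf_ast types, and the break rule (`l_bp < min_bp`) is an assumption chosen so that the 70/70 entry for `^` comes out right-associative as the description states.

```rust
// Toy tokens and AST for illustration only; these are not the gf-cypher types.
#[derive(Debug, Clone, PartialEq)]
enum Tok { Num(i64), Or, And, Not, Plus, Star, Caret, LParen, RParen }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Unary(&'static str, Box<Expr>),
    Binary(&'static str, Box<Expr>, Box<Expr>),
}

// Prefix binding power for NOT, as listed in the PR description.
const NOT_BP: u8 = 35;

// (left, right, name) for a subset of the infix operators in the PR's table.
// left < right yields left associativity; equal powers recurse to the right
// under the break rule used in parse_expr below (an assumption of this sketch).
fn infix_bp(tok: &Tok) -> Option<(u8, u8, &'static str)> {
    match tok {
        Tok::Or => Some((10, 11, "OR")),
        Tok::And => Some((30, 31, "AND")),
        Tok::Plus => Some((50, 51, "+")),
        Tok::Star => Some((60, 61, "*")),
        Tok::Caret => Some((70, 70, "^")),
        _ => None,
    }
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn new(toks: Vec<Tok>) -> Self { Parser { toks, pos: 0 } }

    // Consume and return the current token, if any.
    fn bump(&mut self) -> Option<Tok> {
        let tok = self.toks.get(self.pos).cloned();
        if tok.is_some() { self.pos += 1; }
        tok
    }

    fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        // Prefix position: literal, NOT, or parenthesized expression.
        let mut lhs = match self.bump() {
            Some(Tok::Num(n)) => Expr::Num(n),
            Some(Tok::Not) => Expr::Unary("NOT", Box::new(self.parse_expr(NOT_BP)?)),
            Some(Tok::LParen) => {
                let inner = self.parse_expr(0)?;
                match self.bump() {
                    Some(Tok::RParen) => inner,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("unexpected token {:?}", other)),
        };
        // Infix loop: keep folding operators that bind at least as tightly as min_bp.
        loop {
            let (l_bp, r_bp, name) = match self.toks.get(self.pos).and_then(infix_bp) {
                Some(entry) => entry,
                None => break,
            };
            if l_bp < min_bp { break; }
            self.pos += 1; // consume the operator
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::Binary(name, Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }
}
```

With this table, `1 + 2 * 3` groups as `1 + (2 * 3)`, `2 ^ 3 ^ 4` groups as `2 ^ (3 ^ 4)`, and `NOT 1 AND 2` groups as `(NOT 1) AND 2`, because AND's left power (30) is below the NOT prefix power (35).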
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #655   +/-   ##
=======================================
  Coverage   84.09%   84.09%
=======================================
  Files          49       49
  Lines       12921    12921
  Branches     3628     3628
=======================================
  Hits        10866    10866
  Misses       1270     1270
  Partials      785      785
Add NOT, AND, OR, XOR, OPTIONAL, ANY, NONE, SINGLE, FILTER, EXTRACT, REDUCE, SHORTESTPATH, ALLSHORTESTPATHS to tok_as_keyword_str so that property accesses like n.not or n.any parse correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
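The idea behind this commit can be sketched as follows: after a `.`, a keyword token is mapped back to a string so it can serve as a property name. The `Tok` enum and `property_name` helper below are hypothetical, only a few of the listed keywords are shown, and returning lowercase text is an assumption; the real `tok_as_keyword_str` may have a different signature and may preserve source casing.

```rust
// Hypothetical token type; the real lexer's token enum is not shown in this PR page.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    Not,
    And,
    Or,
    Xor,
    Any,
}

// Stand-in for tok_as_keyword_str: keyword tokens that may double as names.
// Assumption: the keyword is rendered in lowercase.
fn tok_as_keyword_str(tok: &Tok) -> Option<&'static str> {
    match tok {
        Tok::Not => Some("not"),
        Tok::And => Some("and"),
        Tok::Or => Some("or"),
        Tok::Xor => Some("xor"),
        Tok::Any => Some("any"),
        _ => None,
    }
}

// Hypothetical helper: the token following `.` is accepted as a property name
// if it is a plain identifier or a keyword covered by tok_as_keyword_str.
fn property_name(tok: &Tok) -> Option<String> {
    match tok {
        Tok::Ident(name) => Some(name.clone()),
        other => tok_as_keyword_str(other).map(|s| s.to_string()),
    }
}
```

Under these assumptions, `property_name(&Tok::Not)` yields `"not"`, so `n.not` can become a property access instead of a parse error, while punctuation such as `Tok::Dot` is still rejected.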
// Prefix (nud)
// ---------------------------------------------------------------------------

fn parse_prefix(ts: &mut TokenStream) -> Result<Expr, ParseError> {
🟢 Low parser/expr.rs:126
Atomic expressions use ts.span_from(start) after ts.advance(), which measures from the expression start to the next token's start. For 42 + 1, the literal 42 gets span (0, 3) instead of (0, 2), incorrectly including whitespace before +. Consider capturing the token's end position (r) from the triple before discarding it.
Evidence trail:
crates/gf-cypher/src/parser/expr.rs lines 126-154 (parse_prefix function with advance() + span_from pattern), crates/gf-cypher/src/parser/mod.rs lines 72-80 (advance() returns (usize, Tok, usize) triple), crates/gf-cypher/src/parser/mod.rs lines 124-130 (current_pos() returns start of current token at self.pos), crates/gf-cypher/src/parser/mod.rs lines 138-141 (span_from uses current_pos() as end), crates/gf-cypher/src/parser/mod.rs lines 26-29 (tokens are Vec<(usize, Tok, usize)> where third element is token end position)
Atomic literal/param prefixes now capture (l, _, r) from advance() and use Span::new(l, r) directly, so spans don't bleed into trailing whitespace before the next token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
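A small sketch of the difference between the two span computations, using the `42 + 1` example from the review comment. The `TokenStream` here is a simplified stand-in built from the behavior described in the evidence trail (tokens stored as `(start, tok, end)` triples, `current_pos()` returning the start of the current token, `span_from` ending at `current_pos()`); the field names, the `src_len` fallback, and the two `literal_span_*` helpers are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { start: usize, end: usize }

impl Span {
    fn new(start: usize, end: usize) -> Self { Span { start, end } }
}

// (start offset, token text, end offset); token text stands in for the real Tok.
type Triple = (usize, &'static str, usize);

struct TokenStream { tokens: Vec<Triple>, pos: usize, src_len: usize }

impl TokenStream {
    // Start of the current token, or end of input when exhausted (assumed fallback).
    fn current_pos(&self) -> usize {
        self.tokens.get(self.pos).map(|t| t.0).unwrap_or(self.src_len)
    }

    // Consume the current token and return its triple.
    fn advance(&mut self) -> Option<Triple> {
        let tok = self.tokens.get(self.pos).copied();
        if tok.is_some() { self.pos += 1; }
        tok
    }

    fn span_from(&self, start: usize) -> Span {
        Span::new(start, self.current_pos())
    }
}

// Old pattern: the span ends at the start of the NEXT token.
fn literal_span_before(ts: &mut TokenStream) -> Option<Span> {
    let start = ts.current_pos();
    ts.advance()?;
    Some(ts.span_from(start))
}

// New pattern: the span ends at the consumed token's own end offset.
fn literal_span_after(ts: &mut TokenStream) -> Option<Span> {
    let (l, _, r) = ts.advance()?;
    Some(Span::new(l, r))
}

// Tokens for the source text "42 + 1".
fn sample_stream() -> TokenStream {
    TokenStream { tokens: vec![(0, "42", 2), (3, "+", 4), (5, "1", 6)], pos: 0, src_len: 6 }
}
```

On the sample stream, the old pattern gives `(0, 3)` for `42` and the new one gives `(0, 2)`, matching the numbers in the review comment.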
Summary
- `parse_expr(ts, min_bp)` in `crates/gf-cypher/src/parser/expr.rs` (~500 lines)
- `Expr` AST variants produced: `Literal`, `Var`, `Param`, `BinaryOp`, `UnaryOp`, `FunctionCall`, `Property`, `List`, `Map`, `Case`, `ListComprehension`, `IsNull`, `InList`, `StringOp`, `RegexMatch`, `Parenthesized`
- `IS NULL`, `IS NOT NULL`, `NOT IN`, `STARTS WITH`, `ENDS WITH`, `CONTAINS`
- `COUNT(*)`, `COUNT(DISTINCT x)`, keyword-named functions (`exists`, `any`, `none`, `single`, `filter`, `extract`, `reduce`, `shortestPath`)
- List comprehensions with optional `WHERE` filter and `|` projection
- Subscript `[idx]` and slice `[lo..hi]` via synthetic `_subscript`/`_slice` `FunctionCall` nodes
- `eat_ident`: error span now references the actual bad token (CodeRabbit finding)
- `parse_map_literal`: duplicate keys are now a `ParseError` (CodeRabbit finding)

Test plan
- `cargo clippy -D warnings` clean
- `cargo fmt --check` clean

Closes #650. Part of #554.
🤖 Generated with Claude Code
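The duplicate-key rule for map literals could look roughly like the sketch below. This is not the PR's `parse_map_literal`; the `ParseError` shape, the message text, and the `check_unique_keys` helper are hypothetical, and the real error presumably also carries a span.

```rust
use std::collections::HashSet;

// Hypothetical error type; the real ParseError is assumed to carry more context.
#[derive(Debug, PartialEq)]
struct ParseError { message: String }

// Reject a map literal whose keys repeat, reporting the first repeated key.
fn check_unique_keys(keys: &[&str]) -> Result<(), ParseError> {
    let mut seen = HashSet::new();
    for key in keys {
        // HashSet::insert returns false when the value was already present.
        if !seen.insert(*key) {
            return Err(ParseError { message: format!("duplicate map key `{}`", key) });
        }
    }
    Ok(())
}
```

For example, the keys of `{a: 1, b: 2}` pass, while the keys of `{a: 1, b: 2, a: 3}` produce an error naming `a`.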
Note
Implement Pratt expression parser for all `Expr` variants in gf-cypher
- `crates/gf-cypher/src/parser/expr.rs` parses all Cypher expression forms into `gf_ast::Expr` nodes with accurate spans.
- Covers `IS [NOT] NULL`, `IN`/`NOT IN`, string predicates, property access, subscript/slice, function calls (including `COUNT(*)` and `DISTINCT`), list literals, list comprehensions, map literals, and both simple and searched `CASE` expressions.
- Subscript and slice are encoded as `FunctionCall` nodes with synthetic names `_subscript` and `_slice`.
- Error spans in the `eat_ident` helper now reference the actual bad token.
- `parse_expr` is available through the `parser` module's public API via `crates/gf-cypher/src/parser/mod.rs`.

Macroscope summarized 8d35a45.
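To make the synthetic-node encoding concrete, here is a minimal sketch of how `list[idx]` and `list[lo..hi]` could be represented as function calls. The `Expr` enum is a reduced hypothetical stand-in for `gf_ast::Expr`, and the argument order (base first, then index or bounds) is an assumption; open-ended slices are not covered.

```rust
// Reduced stand-in for the AST; only the variants needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    FunctionCall { name: String, args: Vec<Expr> },
}

// `base[index]` becomes a call to the synthetic `_subscript` function.
fn subscript(base: Expr, index: Expr) -> Expr {
    Expr::FunctionCall { name: "_subscript".to_string(), args: vec![base, index] }
}

// `base[lo..hi]` becomes a call to the synthetic `_slice` function.
fn slice(base: Expr, lo: Expr, hi: Expr) -> Expr {
    Expr::FunctionCall { name: "_slice".to_string(), args: vec![base, lo, hi] }
}

// Later passes can recognize the synthetic nodes by their reserved names.
fn is_synthetic_index(expr: &Expr) -> bool {
    matches!(expr, Expr::FunctionCall { name, .. } if name == "_subscript" || name == "_slice")
}
```

A downstream pass can match on the reserved names to treat these nodes specially until a dedicated slice node exists in the AST.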