Skip to content

feat(ast): Token enum, ParseError, and full AST node types#635

Merged
DecisionNerd merged 2 commits into
mainfrom
feature/552-553-ast-token-types
May 27, 2026
Merged

feat(ast): Token enum, ParseError, and full AST node types#635
DecisionNerd merged 2 commits into
mainfrom
feature/552-553-ast-token-types

Conversation

@DecisionNerd
Copy link
Copy Markdown
Owner

@DecisionNerd DecisionNerd commented May 27, 2026

Summary

  • Implements the frozen Token ABI (rust: define Token, Span, and ParseError types (parser ABI freeze) #552): Token enum with all openCypher token classes, Keyword enum, ParseError/ParseErrorKind with structured span info — all Clone + Debug + PartialEq + Serialize + Deserialize
  • Implements full AST node types (rust: define AST node types in gf-ast #553): AstQuery, AstClause (12 variants), NodePattern, RelPattern, PathPattern, Expr (15 variants), BinaryOp, FunctionCall, CaseExpr, ListComprehension, PatternComprehension, and all supporting types
  • 47 Rust unit tests (span extraction, JSON round-trips, all major node types)
  • gf-cypher::parse() updated to return ParseError (not GfError) aligned with the frozen ABI
  • Fixes pre-existing CI failures that came in with the rust-core merge:
    • api_v05.py: missing directory check, whitespace-only query validation, mypy arg-type error
    • api_steps.py: RUF012 mutable class defaults
    • conftest.py: ruff format

Test plan

  • cargo test --package gf-ast — 47 tests pass
  • cargo clippy --workspace -- -D warnings — clean
  • uv run ruff check . — clean
  • uv run mypy src/graphforge — clean
  • uv run pytest tests/unit — 3672 pass

Closes #552, #553

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Full OpenCypher AST model and token types for richer query representation and tooling
    • Structured parse errors with kinds and source-location reporting
  • Bug Fixes / Behavior

    • Parser API updated: parse now returns parser errors (stubbed "not implemented" until further work) and exposes AST/token types
    • Python API: persistent path validation, edge ID coercion to int, improved empty-query handling
  • Tests

    • Extensive AST and JSON round-trip tests added

Review Change Stack

Note

Add full AST node types, Token enum, and ParseError to gf-ast crate

  • Introduces a complete Cypher AST in ast.rs, including AstQuery, all clause types (MatchClause, ReturnClause, MergeClause, etc.), expression types (Expr, Literal, BinaryOp, etc.), and path/pattern nodes.
  • Adds token.rs with a spanned Token enum covering keywords, literals, punctuation, trivia, and EOF, plus Keyword and helper methods span()/is_trivia().
  • Adds parse_error.rs with ParseError and ParseErrorKind, implementing Display and std::error::Error.
  • Updates lib.rs to re-export all new types; removes the old placeholder AstQuery struct.
  • Updates gf-cypher's parse() to return Result<AstQuery, ParseError> (stub that always returns UnexpectedEof) instead of GfError.
  • Risk: the old AstQuery { source, span } struct is removed from the public API, which is a breaking change for any existing consumers.

Macroscope summarized 6944056.

gf-ast now contains:
- Token enum with all openCypher token classes (keywords, literals,
  operators, punctuation, trivia) — frozen lexer ABI
- Keyword enum covering all reserved words
- ParseError / ParseErrorKind with structured error kinds and span info
- Complete AST: AstQuery, AstClause (12 variants), NodePattern,
  RelPattern, PathPattern, Expr (15 variants), plus all supporting
  structs (BinaryOp, FunctionCall, CaseExpr, ListComprehension, etc.)
- All types are Clone + Debug + PartialEq + Serialize + Deserialize
- 47 unit tests covering span extraction, JSON round-trips, and all
  major node types

gf-cypher updated: parse() now returns ParseError (not GfError) to
align with the frozen ABI; stub returns UnexpectedEof until #554 lands.

Also fixes pre-existing CI failures on main:
- api_v05.py: add directory check in __init__; fix whitespace-only
  query validation in find(); fix mypy arg-type error in add_edges
- tests/features/steps/api_steps.py: fix RUF012 mutable class defaults
- Reformat conftest.py, api_steps.py to satisfy ruff format

Closes #552, #553

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6ae9bc5c-9f30-4f9d-8a47-6ea1d5f526de

📥 Commits

Reviewing files that changed from the base of the PR and between 9992c02 and 6944056.

📒 Files selected for processing (1)
  • crates/gf-ast/src/tests.rs

Walkthrough

This PR introduces a complete OpenCypher AST module (gf-ast) with span-rich, JSON-serializable data structures for Cypher parsing and interop. It defines AstQuery (dialect + ordered clauses), 12 AstClause variants, detailed pattern and expression trees, along with Token/Keyword and ParseError types for lexer/parser ABI contracts. Comprehensive round-trip tests validate all node types and clauses. Updates to gf-cypher integrate the new error contract; formatting refactors improve test infrastructure and Python API logic.

Changes

AST & Parser Contract Implementation

Layer / File(s) Summary
Development dependencies and module setup
crates/gf-ast/Cargo.toml, crates/gf-ast/src/lib.rs
Added serde_json dev-dependency and declared ast, parse_error, token submodules with broad re-exports of AST/token/error types via pub use.
AstQuery container and clause enum
crates/gf-ast/src/ast.rs
Defined AstQuery wrapper holding dialect, ordered top-level clauses, and full-query span. Introduced AstClause enum with 12 variants and AstClause::span() accessor.
Core read and projection clauses
crates/gf-ast/src/ast.rs
Implemented MatchClause, WhereClause, WithClause, ReturnClause with ReturnItem projection structs; added OrderByClause, SortItem, and SortOrder for ordering with span tracking.
Mutation and creation clauses
crates/gf-ast/src/ast.rs
Implemented CreateClause, MergeClause, SetClause (SetItem), RemoveClause (RemoveItem), DeleteClause, UnwindClause, and CallClause with span-carrying variants.
Pattern and path representations
crates/gf-ast/src/ast.rs
Defined PathPattern/PathElement, NodePattern, RelPattern, and Direction enum (Incoming/Outgoing/Undirected) with span fields.
Expression types and operators
crates/gf-ast/src/ast.rs
Implemented recursive Expr enum with literals, vars, property access, unary/binary ops, function calls, list/map literals, CASE, list/pattern comprehensions; added Expr::span() and Literal::span().
Token and Keyword lexer ABI
crates/gf-ast/src/token.rs
Defined Keyword enum (non-exhaustive) and Token enum where variants carry Span; added Token::span() and Token::is_trivia().
ParseError infrastructure
crates/gf-ast/src/parse_error.rs
Defined ParseErrorKind enum and ParseError struct with kind, span, message; implemented Display, std::error::Error, and ParseError::new.
Parser stub update and exports
crates/gf-cypher/src/lib.rs
Updated parse() to return Result<AstQuery, ParseError> using a stubbed UnexpectedEof ParseError and adjusted public re-exports to expose AstQuery, ParseError, and Token.
Span and error formatting tests
crates/gf-ast/src/tests.rs
Added unit tests verifying Span formatting/defaults, Token span extraction and trivia classification, and ParseError Display/Error conformance.
AST node JSON round-trip tests
crates/gf-ast/src/tests.rs
Comprehensive serde_json round-trip tests for AstQuery, clause variants, expressions, literals, patterns, and enum defaults.
Formatting and test infrastructure updates
crates/gf-core/src/lib.rs, crates/gf-core/tests/bdd/api_steps.rs, tests/features/conftest.py, tests/features/steps/api_steps.py, tests/features/test_api_bdd.py
Reformatted Rust struct initializers and BDD step definitions into multiline forms; removed unused imports; refactored _Ctx to per-instance init.
Python GraphForge API validation updates
src/graphforge/api_v05.py
Changed GraphForge.__init__ to reject non-directory persistent paths; add_edges coerces src_id/dst_id to int and skips invalid rows; find() treats empty/whitespace query as missing unless vector provided; plus import/export reformatting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #553: Implements the OpenCypher AST node types, enums, and patterns (AstQuery, AstClause variants, NodePattern/RelPattern/Direction, Expr tree) requested by this issue.
  • #554: Adds Token/Span/ParseError contract that the LALRPOP-based Cypher parser will depend on.

Possibly related PRs

  • DecisionNerd/graphforge#137: Related AST model and unit-test additions validating AST node/pattern/expression serialization and behavior.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main changes: introduction of Token enum, ParseError, and AST node types for the AST module.
Linked Issues check ✅ Passed The code changes fully implement the requirements from #552: Token enum with all openCypher tokens, ParseError with structured span info, and all required derives (Clone, Debug, PartialEq, Serialize, Deserialize). Issue #553 AST node types are also comprehensively implemented with 47 unit tests.
Out of Scope Changes check ✅ Passed While the PR includes some formatting/refactoring changes to Python files (api_v05.py, api_steps.py, conftest.py) and minor Rust formatting, all changes are directly related to fixing CI failures mentioned in the PR description and aligning the parse() function return type with the new ParseError ABI.
Description check ✅ Passed PR description comprehensively covers objectives, changes, test plan, and closing issues, following the template structure with detailed implementation summary.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/552-553-ast-token-types

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.09%. Comparing base (f9a3752) to head (6944056).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #635   +/-   ##
=======================================
  Coverage   84.08%   84.09%           
=======================================
  Files          49       49           
  Lines       12915    12921    +6     
  Branches     3627     3628    +1     
=======================================
+ Hits        10860    10866    +6     
  Misses       1270     1270           
  Partials      785      785           
Flag Coverage Δ
full-coverage 84.09% <96.15%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
parser 92.93% <ø> (ø)
planner 79.90% <ø> (ø)
executor 75.46% <ø> (ø)
storage 98.68% <ø> (ø)
ast 97.51% <ø> (ø)
types 90.66% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9a3752...6944056. Read the comment docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/gf-ast/src/tests.rs (1)

7-7: 💤 Low value

Optional: Remove redundant crate import.

In Rust 2018+, you can use serde_json::to_string directly without use serde_json;. The import is harmless but unnecessary.

♻️ Proposed cleanup
 use crate::{
     ast::*, parse_error::*, token::*, Span,
 };
-use serde_json;

Then use serde_json::to_string and serde_json::from_str throughout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-ast/src/tests.rs` at line 7, The tests include an unnecessary crate
import "use serde_json;"—remove that redundant use statement and update call
sites (e.g., any existing to_string/from_str usage) to call the functions with
the full path serde_json::to_string and serde_json::from_str where needed; this
cleans up imports while keeping behavior the same.
crates/gf-cypher/src/lib.rs (1)

15-23: 💤 Low value

Remove redundant import alias.

Span is already re-exported from gf_core at line 7. The alias Span as AstSpan on line 16 is unnecessary since gf_ast::Span is the same type.

♻️ Proposed simplification
 pub fn parse(_input: &str) -> Result<AstQuery, ParseError> {
-    use gf_ast::{ParseErrorKind, Span as AstSpan};
+    use gf_ast::ParseErrorKind;
     Err(ParseError::new(
         ParseErrorKind::UnexpectedEof {
             expected: vec!["statement".to_owned()],
         },
-        AstSpan::new(0, 0),
+        Span::new(0, 0),
         "parser not yet implemented — lands in M9 issue `#554`",
     ))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-cypher/src/lib.rs` around lines 15 - 23, Remove the unnecessary
alias for Span in the parse function import and update its usage: replace the
import gf_ast::{ParseErrorKind, Span as AstSpan} with gf_ast::{ParseErrorKind,
Span} and change the call AstSpan::new(0, 0) to Span::new(0, 0) so the code uses
the re-exported Span type directly in function parse.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gf-ast/src/parse_error.rs`:
- Around line 33-42: The file contains a duplicated doc comment for the
ParseError type (the two identical blocks mentioning `ParseError`,
`gf_cypher::parse`, and the `span` and `kind` fields); remove the second
duplicated comment block so only one doc comment describing
`ParseError`/`gf_cypher::parse`/`span` and `kind` remains (locate the duplicate
by searching for the repeated text mentioning `ParseError` and
`gf_cypher::parse` and delete the latter occurrence).

---

Nitpick comments:
In `@crates/gf-ast/src/tests.rs`:
- Line 7: The tests include an unnecessary crate import "use serde_json;"—remove
that redundant use statement and update call sites (e.g., any existing
to_string/from_str usage) to call the functions with the full path
serde_json::to_string and serde_json::from_str where needed; this cleans up
imports while keeping behavior the same.

In `@crates/gf-cypher/src/lib.rs`:
- Around line 15-23: Remove the unnecessary alias for Span in the parse function
import and update its usage: replace the import gf_ast::{ParseErrorKind, Span as
AstSpan} with gf_ast::{ParseErrorKind, Span} and change the call AstSpan::new(0,
0) to Span::new(0, 0) so the code uses the re-exported Span type directly in
function parse.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: abc6ca59-b5b5-447d-873b-06319790515a

📥 Commits

Reviewing files that changed from the base of the PR and between f9a3752 and 9992c02.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (14)
  • crates/gf-ast/Cargo.toml
  • crates/gf-ast/src/ast.rs
  • crates/gf-ast/src/lib.rs
  • crates/gf-ast/src/parse_error.rs
  • crates/gf-ast/src/tests.rs
  • crates/gf-ast/src/token.rs
  • crates/gf-core/src/lib.rs
  • crates/gf-core/tests/bdd/api_steps.rs
  • crates/gf-cypher/src/lib.rs
  • crates/gf-exec/src/lib.rs
  • src/graphforge/api_v05.py
  • tests/features/conftest.py
  • tests/features/steps/api_steps.py
  • tests/features/test_api_bdd.py
💤 Files with no reviewable changes (1)
  • tests/features/test_api_bdd.py

Comment on lines +33 to +42
/// A structured parse error produced by `gf-cypher`.
///
/// `ParseError` is the error type in the `Result` returned by
/// `gf_cypher::parse`. The differential test harness uses the `span` and
/// `kind` fields to assert that both parsers flag the same source location.
/// A structured parse error produced by `gf-cypher`.
///
/// `ParseError` is the error type in the `Result` returned by
/// `gf_cypher::parse`. The differential test harness uses the `span` and
/// `kind` fields to assert that both parsers flag the same source location.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicate doc comment.

Lines 33–37 and 38–42 are identical. Remove the second block (lines 38–42).

📝 Proposed fix
 /// A structured parse error produced by `gf-cypher`.
 ///
 /// `ParseError` is the error type in the `Result` returned by
 /// `gf_cypher::parse`. The differential test harness uses the `span` and
 /// `kind` fields to assert that both parsers flag the same source location.
-/// A structured parse error produced by `gf-cypher`.
-///
-/// `ParseError` is the error type in the `Result` returned by
-/// `gf_cypher::parse`. The differential test harness uses the `span` and
-/// `kind` fields to assert that both parsers flag the same source location.
 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// A structured parse error produced by `gf-cypher`.
///
/// `ParseError` is the error type in the `Result` returned by
/// `gf_cypher::parse`. The differential test harness uses the `span` and
/// `kind` fields to assert that both parsers flag the same source location.
/// A structured parse error produced by `gf-cypher`.
///
/// `ParseError` is the error type in the `Result` returned by
/// `gf_cypher::parse`. The differential test harness uses the `span` and
/// `kind` fields to assert that both parsers flag the same source location.
/// A structured parse error produced by `gf-cypher`.
///
/// `ParseError` is the error type in the `Result` returned by
/// `gf_cypher::parse`. The differential test harness uses the `span` and
/// `kind` fields to assert that both parsers flag the same source location.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-ast/src/parse_error.rs` around lines 33 - 42, The file contains a
duplicated doc comment for the ParseError type (the two identical blocks
mentioning `ParseError`, `gf_cypher::parse`, and the `span` and `kind` fields);
remove the second duplicated comment block so only one doc comment describing
`ParseError`/`gf_cypher::parse`/`span` and `kind` remains (locate the duplicate
by searching for the repeated text mentioning `ParseError` and
`gf_cypher::parse` and delete the latter occurrence).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rust: define Token, Span, and ParseError types (parser ABI freeze)

1 participant