
Opencypher refactoring #3367

Merged
lvca merged 3 commits into main from opencypher-refactoring
Feb 7, 2026
@lvca lvca commented Feb 7, 2026

No description provided.

@lvca lvca added this to the 26.2.1 milestone Feb 7, 2026
@lvca lvca self-assigned this Feb 7, 2026
@lvca lvca marked this pull request as ready for review February 7, 2026 21:25
@lvca lvca merged commit b06b4c1 into main Feb 7, 2026
8 of 10 checks passed
@gemini-code-assist

Summary of Changes

Hello @lvca, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the OpenCypher query parsing and Abstract Syntax Tree (AST) processing within the engine. The changes aim to improve the modularity, readability, and extensibility of the query language implementation. By introducing a more structured approach to clause dispatching, statement construction, and expression parsing, the codebase becomes easier to maintain and extend. Additionally, a new AST rewriting framework is established to enable compile-time optimizations, and enhanced semantic analysis tools are integrated to provide more accurate validation and user-friendly error messages.

Highlights

  • Modular Parser Architecture: Introduced new classes like ClauseDispatcher, ClauseHandler, StatementBuilder, ExpressionTypeDetector, and ExpressionPrecedence to modularize and streamline the Cypher query parsing process, replacing large, complex if/else blocks with more maintainable and extensible patterns.
  • AST Rewriting Framework: Implemented a robust Abstract Syntax Tree (AST) rewriting pipeline, including a base ExpressionRewriter and specialized rewriters such as ComparisonNormalizer, ConstantFolder, and BooleanSimplifier, to perform compile-time optimizations and canonicalization of expressions.
  • Enhanced Semantic Analysis: Added FunctionValidator for comprehensive function argument validation, VariableScopeTracker for improved variable scope management and error suggestions, and a structured SemanticError class for clearer error reporting.
  • Code Utility Consolidation: Created ParserUtils to centralize common parsing utility functions, such as string manipulation and value parsing, reducing code duplication across the parser components.


Changelog
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/ClauseDispatcher.java
    • Added new class to implement a strategy pattern for dispatching Cypher clause contexts to appropriate handlers.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/ClauseHandler.java
    • Added new functional interface defining the contract for handling different Cypher clause types.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/CypherASTBuilder.java
    • Refactored visitSingleQuery to utilize StatementBuilder and ClauseDispatcher for cleaner clause processing.
    • Introduced AST_REWRITER (a CompositeRewriter) to enable post-parsing AST optimizations.
    • Delegated common utility methods like parseValueString, extractLabels, stripBackticks, and decodeStringLiteral to the new ParserUtils class.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/CypherExpressionBuilder.java
    • Refactored parseExpression to use ExpressionTypeDetector for a more organized and precedence-aware parsing logic.
    • Changed visibility of findTopLevelExpressionX methods to package-private for use by ExpressionTypeDetector.
    • Delegated findOperatorOutsideParentheses to ParserUtils.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/ExpressionPrecedence.java
    • Added new enum to formally document the operator precedence hierarchy in Cypher expressions.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/ExpressionTypeDetector.java
    • Added new class to centralize and categorize the logic for detecting and parsing various Cypher expression types based on precedence.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/FunctionValidator.java
    • Added new class to validate Cypher function calls, including argument counts and types, and to maintain a registry of built-in function signatures.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/ParserUtils.java
    • Added new utility class to consolidate common parsing helper methods, such as string escaping, label extraction, and value parsing.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/SemanticError.java
    • Added new class to provide a structured representation for semantic errors in Cypher queries, including error type, context, and suggestions.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/StatementBuilder.java
    • Added new class implementing a builder pattern to accumulate and construct Cypher statement components during parsing.
  • engine/src/main/java/com/arcadedb/query/opencypher/parser/VariableScopeTracker.java
    • Added new class to track variable scope and definitions across Cypher query clauses, aiding in the detection and suggestion for undefined variables.
  • engine/src/main/java/com/arcadedb/query/opencypher/rewriter/BooleanSimplifier.java
    • Added new ExpressionRewriter subclass to simplify boolean expressions using algebraic identities.
  • engine/src/main/java/com/arcadedb/query/opencypher/rewriter/ComparisonNormalizer.java
    • Added new ExpressionRewriter subclass to normalize comparison expressions, placing variables/properties on the left for better optimizer matching.
  • engine/src/main/java/com/arcadedb/query/opencypher/rewriter/CompositeRewriter.java
    • Added new ExpressionRewriter subclass that allows chaining multiple rewrite rules to be applied sequentially.
  • engine/src/main/java/com/arcadedb/query/opencypher/rewriter/ConstantFolder.java
    • Added new ExpressionRewriter subclass to fold constant expressions into literal values at rewrite time, reducing runtime computation.
  • engine/src/main/java/com/arcadedb/query/opencypher/rewriter/ExpressionRewriter.java
    • Added new abstract base class providing a visitor pattern for traversing and transforming expression trees in the AST.
  • engine/src/test/java/com/arcadedb/query/opencypher/parser/ParserUtilsTest.java
    • Added unit tests for the ParserUtils class.
  • engine/src/test/java/com/arcadedb/query/opencypher/parser/StatementBuilderTest.java
    • Added unit tests for the StatementBuilder class.
  • engine/src/test/java/com/arcadedb/query/opencypher/rewriter/ComparisonNormalizerTest.java
    • Added unit tests for the ComparisonNormalizer class.
  • engine/src/test/java/com/arcadedb/query/opencypher/rewriter/CompositeRewriterTest.java
    • Added unit tests for the CompositeRewriter class.
  • engine/src/test/java/com/arcadedb/query/opencypher/rewriter/ConstantFolderTest.java
    • Added unit tests for the ConstantFolder class.
Activity
  • The pull request introduces a comprehensive refactoring of the OpenCypher parser, moving towards a more modular and extensible design.
  • New core components for clause dispatching, statement building, and expression parsing have been added, replacing previous monolithic logic.
  • An Abstract Syntax Tree (AST) rewriting pipeline has been established, incorporating initial optimization rules like comparison normalization and constant folding.
  • Semantic analysis capabilities have been enhanced with new classes for function validation and variable scope tracking, designed to provide richer error feedback.
  • Extensive unit tests have been added for the newly introduced utility classes and rewriters, ensuring the correctness and reliability of the refactored components.

@lvca lvca deleted the opencypher-refactoring branch February 7, 2026 21:25

mergify bot commented Feb 7, 2026

🧪 CI Insights

Here's what we observed from your CI run for a936031.

🟢 All jobs passed!

But CI Insights is watching 👀

claude bot commented Feb 7, 2026

Comprehensive PR Review: OpenCypher Parser Refactoring

This is a substantial refactoring of the OpenCypher parser components with excellent architectural improvements. The code is well-structured, documented, and includes comprehensive test coverage.


1. Code Quality & ArcadeDB Conventions ✅

Strengths:

  • Excellent separation of concerns: The refactoring breaks down the monolithic parser into focused components (ClauseDispatcher, StatementBuilder, ExpressionTypeDetector, etc.)
  • Proper coding style: Follows ArcadeDB conventions (final keywords, single-statement if-blocks without braces, clear naming)
  • Copyright headers: All files have proper Apache 2.0 headers with SPDX identifiers
  • Good JavaDoc: Comprehensive documentation with examples and rationale

Minor Issues:

  • Most classes are package-private - verify this is intentional for classes like ExpressionPrecedence that might benefit from being public for documentation purposes

2. Potential Bugs ⚠️

Medium Priority:

  1. BooleanSimplifier incomplete (BooleanSimplifier.java:149-170):

    private boolean isWrappedTrue(final BooleanExpression expr) {
        // Note: BooleanWrapperExpression doesn't expose inner expression yet
        // For now, return false - this will be implemented when wrapper API is available
        return false;
    }
    • The simplifier cannot detect wrapped boolean literals, rendering AND/OR identity optimizations ineffective
    • Recommendation: Either implement the wrapper introspection or remove this rewriter until the API is available
  2. ConstantFolder limited boolean folding (ConstantFolder.java:75-105):

    final Object foldedValue = foldComparison(leftValue, rightValue, comp.getOperator());
    if (foldedValue != null) {
        // Note: LiteralExpression implements Expression, not BooleanExpression
        // We need to wrap it in something that implements BooleanExpression
        // For now, keep the original comparison expression
        return comp;
    }
    • Comparison folding is computed but not applied due to type mismatch
    • This means 5 > 3 won't fold to true at parse time
    • Recommendation: Introduce a BooleanLiteralExpression or wrapper to enable this optimization
  3. Integer overflow detection (CypherExpressionBuilder.java:2141-2142):

    • The overflow check could throw for malformed numbers like "123abc" which aren't actually overflows
    • Recommendation: Catch NumberFormatException more precisely or validate before parsing
  4. AST Rewriter not integrated (CypherASTBuilder.java:139-145):

    // NOTE: AST rewriting infrastructure is in place (AST_REWRITER defined above)
    // Full integration would require walking the statement tree and rewriting all expressions.
    // This is deferred to avoid breaking existing functionality.
    • The rewriter is defined but never applied
    • Tests exist for rewriters but they're not exercised in real queries
    • Recommendation: Add a feature flag or configuration to enable rewriting once tested
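To make the ConstantFolder recommendation concrete, here is a minimal sketch of how a literal node that is itself a boolean expression would let a comparison such as 5 > 3 fold to true. All types below (Expr, BoolExpr, IntLiteral, Variable, Comparison, BoolLiteral, ConstantFoldingSketch) are hypothetical stand-ins invented for illustration; they are not ArcadeDB's AST classes, and the real hierarchy may need a different shape.

```java
// Hypothetical, simplified AST types for illustration only (not ArcadeDB's real classes).
interface Expr {}

interface BoolExpr extends Expr {}

record IntLiteral(long value) implements Expr {}

record Variable(String name) implements Expr {}

// A comparison such as 5 > 3 or n > 3.
record Comparison(Expr left, String operator, Expr right) implements BoolExpr {}

// The missing piece described in the review: a literal that is also a boolean expression.
record BoolLiteral(boolean value) implements BoolExpr {}

final class ConstantFoldingSketch {
  private ConstantFoldingSketch() {
  }

  // Folds a comparison of two integer literals into a BoolLiteral.
  // Anything else (variables, unknown operators) is returned unchanged.
  static BoolExpr fold(final BoolExpr expr) {
    if (expr instanceof Comparison comp
        && comp.left() instanceof IntLiteral l
        && comp.right() instanceof IntLiteral r) {
      switch (comp.operator()) {
        case ">":
          return new BoolLiteral(l.value() > r.value());
        case ">=":
          return new BoolLiteral(l.value() >= r.value());
        case "<":
          return new BoolLiteral(l.value() < r.value());
        case "<=":
          return new BoolLiteral(l.value() <= r.value());
        case "=":
          return new BoolLiteral(l.value() == r.value());
        case "<>":
          return new BoolLiteral(l.value() != r.value());
      }
    }
    return expr;
  }
}
```

Because the folded result has the same static type as the input, the caller no longer has to fall back to returning the original comparison.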

3. Performance Considerations ✅

Strengths:

  • Efficient dispatch pattern: ClauseDispatcher uses Map-based strategy pattern (O(1) lookup)
  • Fast path optimizations: ParserUtils.decodeStringLiteral() checks for backslash before allocating StringBuilder
  • Reusable components: Expression type detection is centralized, avoiding duplicate traversals

Potential Improvements:

  • ExpressionTypeDetector recursive searches: Methods like findFunctionInvocationRecursive() traverse the entire parse tree. Consider caching results or limiting depth for very large expressions
  • Levenshtein distance in VariableScopeTracker costs O(n·m) per candidate name (quadratic in identifier length), which is acceptable because it only runs on the error path
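For readers unfamiliar with the technique, the following is a rough sketch of how a Levenshtein-based "did you mean" suggestion can work. The class name SuggestionSketch, the method names, and the maxDistance threshold are assumptions made for this example; the actual VariableScopeTracker code is not shown in this conversation and may differ.

```java
import java.util.Collection;

// Hypothetical helper for illustration; not the VariableScopeTracker implementation.
final class SuggestionSketch {
  private SuggestionSketch() {
  }

  // Classic two-row dynamic programming edit distance: O(a.length() * b.length()) time.
  static int levenshtein(final String a, final String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++)
      prev[j] = j;

    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        final int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      final int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  // Returns the closest defined name within maxDistance edits, or null if none qualifies.
  static String suggest(final String unknown, final Collection<String> defined, final int maxDistance) {
    String best = null;
    int bestDistance = maxDistance + 1;
    for (final String candidate : defined) {
      final int distance = levenshtein(unknown, candidate);
      if (distance < bestDistance) {
        bestDistance = distance;
        best = candidate;
      }
    }
    return best;
  }
}
```

Since the search only happens after a variable lookup has already failed, the quadratic cost per candidate does not affect successful queries.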

4. Security Concerns ✅

No significant security issues found. The code properly:

  • Handles null inputs defensively
  • Validates numeric literals for overflow
  • Uses immutable AST nodes to prevent tampering
  • Doesn't expose sensitive information in error messages
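On the numeric literal point (see also the "123abc" concern under Potential Bugs), one way to keep overflow and malformed input apart is to validate the digits first and only then check the range. This is a sketch under assumptions: IntegerLiteralSketch, Kind, and classify are hypothetical names, only decimal literals are handled, and the code in CypherExpressionBuilder may be organized quite differently.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; handles decimal integer literals only.
final class IntegerLiteralSketch {
  enum Kind { VALID, OVERFLOW, MALFORMED }

  private IntegerLiteralSketch() {
  }

  static Kind classify(final String text) {
    // Reject anything that is not an optional minus sign followed by digits,
    // so that input such as "123abc" is never reported as an overflow.
    if (text == null || !text.matches("-?[0-9]+"))
      return Kind.MALFORMED;

    // A BigInteger fits in a signed 64-bit long exactly when bitLength() <= 63.
    final BigInteger value = new BigInteger(text);
    return value.bitLength() <= 63 ? Kind.VALID : Kind.OVERFLOW;
  }
}
```

With the syntax check done up front, the range check never has to interpret a NumberFormatException.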

5. Test Coverage ✅⚠️

Excellent coverage for utility classes:

  • ParserUtilsTest: 11 tests covering backticks, property parsing, string decoding, operator finding
  • StatementBuilderTest: 10 tests covering all builder methods and clause ordering
  • Rewriter tests: Comprehensive coverage for ComparisonNormalizer, ConstantFolder, CompositeRewriter

Missing coverage:

  • No tests for new parser classes: ClauseDispatcher, ClauseHandler, ExpressionTypeDetector, ExpressionPrecedence
  • No tests for: SemanticError, VariableScopeTracker, FunctionValidator
  • No integration tests: The refactored components aren't tested together in real Cypher query scenarios

Recommendations:

  1. Add integration tests that parse complete Cypher queries to ensure components work together
  2. Test error cases: malformed queries, undefined variables, type mismatches
  3. Test VariableScopeTracker with complex scope scenarios (WITH clauses, subqueries)

6. Architecture & Design Patterns ✅✅

Excellent architectural decisions:

  1. Strategy Pattern (ClauseDispatcher): Eliminates cascading if/else with clean dispatch table
  2. Builder Pattern (StatementBuilder): Reduces local variable clutter and makes clause ordering explicit
  3. Visitor Pattern (ExpressionRewriter): Provides extensible AST transformation framework
  4. Composite Pattern (CompositeRewriter): Allows chaining multiple rewrite rules
  5. Separation of Concerns: Clear division between parsing, type detection, utilities, and statement building
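The Map-based dispatch idea behind the Strategy pattern item can be sketched roughly as follows. The names ClauseHandlerSketch and ClauseDispatcherSketch, the string clause keys, and the handler signature are all assumptions for this example; the PR's ClauseDispatcher works on parser clause contexts, whose types are not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical handler contract: process one clause and record the result in 'out'.
@FunctionalInterface
interface ClauseHandlerSketch {
  void handle(String clauseText, List<String> out);
}

// Hypothetical dispatcher: a lookup table replaces a cascading if/else chain.
final class ClauseDispatcherSketch {
  private final Map<String, ClauseHandlerSketch> handlers = new HashMap<>();

  void register(final String clauseType, final ClauseHandlerSketch handler) {
    handlers.put(clauseType, handler);
  }

  // Returns false when no handler is registered, so the caller can report an unsupported clause.
  boolean dispatch(final String clauseType, final String clauseText, final List<String> out) {
    final ClauseHandlerSketch handler = handlers.get(clauseType);
    if (handler == null)
      return false;
    handler.handle(clauseText, out);
    return true;
  }
}
```

Adding support for a new clause then means registering one more handler rather than editing the dispatch logic itself.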

Design strengths:

  • Immutable AST: Rewriters create new nodes instead of mutating, preventing side effects
  • Clear documentation: Each class has purpose statement and usage examples
  • Extensibility: New rewrite rules can be added without modifying existing code
  • Type safety: Uses enums for operators, error types, precedence levels
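As an illustration of the immutable rewriting and rule chaining described above, here is a small sketch of a comparison normalizer that moves the property to the left-hand side and a helper that applies rules in sequence. Operand, Literal, Property, Cmp, and RewriterSketch are hypothetical names for this example only, and the operator strings are assumed; the real ComparisonNormalizer and CompositeRewriter operate on ArcadeDB's own AST classes.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified AST types; records are immutable, so rewrites must build new nodes.
interface Operand {}

record Literal(Object value) implements Operand {}

record Property(String name) implements Operand {}

record Cmp(Operand left, String op, Operand right) {}

final class RewriterSketch {
  private RewriterSketch() {
  }

  // Mirrors an operator so that swapping the operands preserves the meaning.
  static String flip(final String op) {
    switch (op) {
      case "<":
        return ">";
      case "<=":
        return ">=";
      case ">":
        return "<";
      case ">=":
        return "<=";
      default:
        return op; // "=" and "<>" are symmetric
    }
  }

  // Rewrites "5 < n.age" into "n.age > 5"; other shapes are returned unchanged.
  static Cmp normalize(final Cmp c) {
    if (c.left() instanceof Literal && c.right() instanceof Property)
      return new Cmp(c.right(), flip(c.op()), c.left());
    return c;
  }

  // Applies each rule to the result of the previous one, in list order.
  static Cmp applyAll(final Cmp input, final List<UnaryOperator<Cmp>> rules) {
    Cmp current = input;
    for (final UnaryOperator<Cmp> rule : rules)
      current = rule.apply(current);
    return current;
  }
}
```

Because each rule returns either the same node or a fresh one, the original tree stays valid for any code still holding a reference to it.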

Summary Recommendations

Must Fix Before Merge:

  1. ✅ Complete or remove BooleanSimplifier (currently non-functional)
  2. ✅ Fix ConstantFolder boolean folding or document limitation
  3. ✅ Add integration tests for refactored components

Should Fix:

  4. ⚠️ Fix integer overflow detection in tryParseLiteral
  5. ⚠️ Add missing unit tests (VariableScopeTracker, FunctionValidator, SemanticError)
  6. ⚠️ Verify package visibility is correct for utility classes

Nice to Have:

  7. 📋 Document when AST rewriter integration will be completed
  8. 📋 Extract duplicated label parsing logic
  9. 📋 Consider caching in recursive expression searches for performance


Overall Assessment ⭐⭐⭐⭐ (4/5 stars)

This is a high-quality refactoring that significantly improves code organization, maintainability, and extensibility. The separation of concerns is excellent, design patterns are well-applied, and documentation is thorough.

The main concerns are:

  • Incomplete implementations (BooleanSimplifier, ConstantFolder boolean folding)
  • Missing test coverage for some components
  • AST rewriter infrastructure exists but isn't used

Once the incomplete implementations are addressed and test coverage is improved, this PR will be ready to merge. The architectural improvements are solid and will make future development much easier.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a major and impressive refactoring of the OpenCypher parser. The introduction of patterns like Strategy (ClauseDispatcher), Builder (StatementBuilder), and a dedicated utility class (ParserUtils) significantly improves the structure, readability, and maintainability of the code. Breaking down the monolithic CypherASTBuilder and CypherExpressionBuilder into smaller, more focused components is an excellent move. The addition of an AST rewriter framework with constant folding and normalization rules is a great foundation for future query optimizations. The new unit tests for the refactored components are also a very welcome addition. I've found a few minor issues, but overall this is a high-quality contribution.

Comment on lines +116 to +120

    if (value.startsWith("'") && value.endsWith("'"))
      return value.substring(1, value.length() - 1);

    if (value.startsWith("\"") && value.endsWith("\""))
      return value.substring(1, value.length() - 1);

high

The parseValueString method does not decode escape sequences (like \n, \t) within string literals. It only strips the outer quotes. This will lead to incorrect string values when they are parsed from contexts that don't handle unescaping beforehand, such as the fallback parser in CypherASTBuilder. To ensure correctness, decodeStringLiteral should be called on the content of quoted strings.

Suggested change

    - if (value.startsWith("'") && value.endsWith("'"))
    -   return value.substring(1, value.length() - 1);
    - if (value.startsWith("\"") && value.endsWith("\""))
    -   return value.substring(1, value.length() - 1);
    + if (value.startsWith("'") && value.endsWith("'"))
    +   return decodeStringLiteral(value.substring(1, value.length() - 1));
    + if (value.startsWith("\"") && value.endsWith("\""))
    +   return decodeStringLiteral(value.substring(1, value.length() - 1));
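To show what decoding the quoted content could look like, here is a minimal sketch, assuming a small subset of escape sequences. EscapeSketch, decode, and parseQuoted are hypothetical names and this is not the ParserUtils implementation; in particular, escapes such as \uXXXX are left out. The sketch includes the fast path mentioned elsewhere in this conversation (no allocation when there is no backslash) and, as an extra precaution of its own, requires at least two characters before stripping quotes.

```java
// Hypothetical helper for illustration; covers only a subset of escape sequences.
final class EscapeSketch {
  private EscapeSketch() {
  }

  static String decode(final String raw) {
    // Fast path: nothing to decode, so return the input without allocating a builder.
    if (raw.indexOf('\\') < 0)
      return raw;

    final StringBuilder sb = new StringBuilder(raw.length());
    for (int i = 0; i < raw.length(); i++) {
      final char c = raw.charAt(i);
      if (c != '\\' || i + 1 >= raw.length()) {
        sb.append(c);
        continue;
      }
      final char next = raw.charAt(++i);
      switch (next) {
        case 'n':
          sb.append('\n');
          break;
        case 't':
          sb.append('\t');
          break;
        case 'r':
          sb.append('\r');
          break;
        case '\\':
        case '\'':
        case '"':
          sb.append(next);
          break;
        default:
          sb.append('\\').append(next); // unknown escape: keep it verbatim
      }
    }
    return sb.toString();
  }

  // Strips matching outer quotes and decodes the content; other input is returned as-is.
  static String parseQuoted(final String value) {
    if (value.length() >= 2) {
      final char first = value.charAt(0);
      final char last = value.charAt(value.length() - 1);
      if ((first == '\'' || first == '"') && first == last)
        return decode(value.substring(1, value.length() - 1));
    }
    return value;
  }
}
```

For example, the eight-character input 'a\nb' with a literal backslash comes back as three characters: a, a newline, and b.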

Comment on lines +48 to +50

    final List<Expression> args = new java.util.ArrayList<>();
    args.add(new com.arcadedb.query.opencypher.ast.StarExpression());
    return new com.arcadedb.query.opencypher.ast.FunctionCallExpression("count", args, false);

medium

The code uses fully qualified class names for java.util.ArrayList and AST classes like StarExpression and FunctionCallExpression. This makes the code unnecessarily verbose. Please add the corresponding import statements at the top of the file and use the simple class names instead.

Suggested change

    - final List<Expression> args = new java.util.ArrayList<>();
    - args.add(new com.arcadedb.query.opencypher.ast.StarExpression());
    - return new com.arcadedb.query.opencypher.ast.FunctionCallExpression("count", args, false);
    + final List<Expression> args = new ArrayList<>();
    + args.add(new StarExpression());
    + return new FunctionCallExpression("count", args, false);

Comment on lines +149 to +157

    private boolean isWrappedTrue(final BooleanExpression expr) {
      if (expr instanceof BooleanWrapperExpression wrapper) {
        // Check if wrapper contains literal true
        // Note: BooleanWrapperExpression doesn't expose inner expression yet
        // For now, return false - this will be implemented when wrapper API is available
        return false;
      }
      return false;
    }

medium

The isWrappedTrue method (and similarly isWrappedFalse) will never work as intended. Its parameter expr has the type BooleanExpression, but the instanceof check is for BooleanWrapperExpression, which implements Expression, not BooleanExpression. Due to this type mismatch in the AST hierarchy, the check expr instanceof BooleanWrapperExpression will always evaluate to false, and the boolean simplification logic for literals will never be triggered. This part of the rewriter needs to be revisited to correctly handle boolean literals within logical expressions.
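For context, this is roughly what the AND/OR identity simplification could look like once boolean literals are visible to the rewriter. The sketch uses hypothetical types (BoolNode, BoolConst, Pred, And, Or, BooleanSimplifierSketch) in which the literal is directly a boolean node; how ArcadeDB would expose the wrapped literal is an open question in this review, so none of this reflects the actual API.

```java
// Hypothetical, simplified boolean AST for illustration only.
interface BoolNode {}

record BoolConst(boolean value) implements BoolNode {}

record Pred(String text) implements BoolNode {}

record And(BoolNode left, BoolNode right) implements BoolNode {}

record Or(BoolNode left, BoolNode right) implements BoolNode {}

final class BooleanSimplifierSketch {
  private BooleanSimplifierSketch() {
  }

  // Applies the identities x AND true = x, x AND false = false,
  // x OR false = x, x OR true = true, bottom-up.
  static BoolNode simplify(final BoolNode node) {
    if (node instanceof And and) {
      final BoolNode l = simplify(and.left());
      final BoolNode r = simplify(and.right());
      if (isConst(l, false) || isConst(r, false))
        return new BoolConst(false);
      if (isConst(l, true))
        return r;
      if (isConst(r, true))
        return l;
      return new And(l, r);
    }
    if (node instanceof Or or) {
      final BoolNode l = simplify(or.left());
      final BoolNode r = simplify(or.right());
      if (isConst(l, true) || isConst(r, true))
        return new BoolConst(true);
      if (isConst(l, false))
        return r;
      if (isConst(r, false))
        return l;
      return new Or(l, r);
    }
    return node;
  }

  private static boolean isConst(final BoolNode node, final boolean expected) {
    return node instanceof BoolConst c && c.value() == expected;
  }
}
```

One caveat worth checking before adopting such rules: dropping an operand also drops any runtime error its evaluation might have raised, so the real rewriter may need to be more conservative than this sketch.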

@codacy-production

Coverage summary from Codacy

Coverage variation    Diff coverage
-11.66%               48.78%

Coverage variation details
                                    Coverable lines    Covered lines    Coverage
Common ancestor commit (95820cc)    79964              55743            69.71%
Head commit (a936031)               110218 (+30254)    63982 (+8239)    58.05% (-11.66%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
                        Coverable lines    Covered lines    Diff coverage
Pull request (#3367)    939                458              48.78%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@codecov

codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 42.17252% with 543 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.77%. Comparing base (95820cc) to head (a936031).
⚠️ Report is 2 commits behind head on main.

Files with missing lines                               Patch %  Lines
...edb/query/opencypher/parser/FunctionValidator.java  0.00%    99 Missing ⚠️
.../query/opencypher/parser/VariableScopeTracker.java  0.00%    92 Missing ⚠️
.../query/opencypher/rewriter/ExpressionRewriter.java  21.00%   67 Missing and 12 partials ⚠️
...dedb/query/opencypher/rewriter/ConstantFolder.java  32.71%   46 Missing and 26 partials ⚠️
...rcadedb/query/opencypher/parser/SemanticError.java  0.00%    64 Missing ⚠️
...b/query/opencypher/rewriter/BooleanSimplifier.java  2.27%    43 Missing ⚠️
.../query/opencypher/parser/ExpressionPrecedence.java  0.00%    32 Missing ⚠️
.../arcadedb/query/opencypher/parser/ParserUtils.java  80.18%   12 Missing and 9 partials ⚠️
...uery/opencypher/parser/ExpressionTypeDetector.java  84.78%   6 Missing and 8 partials ⚠️
...dedb/query/opencypher/parser/ClauseDispatcher.java  84.37%   9 Missing and 1 partial ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3367      +/-   ##
==========================================
- Coverage   61.03%   60.77%   -0.26%     
==========================================
  Files        1164     1177      +13     
  Lines       79964    80654     +690     
  Branches    16036    16166     +130     
==========================================
+ Hits        48805    49021     +216     
- Misses      24248    24678     +430     
- Partials     6911     6955      +44     
