feat(cosmosdb): improve SQL parser compatibility for all 13 query feature areas #62

Merged
rebelice merged 3 commits into main from vk/6cd7-improve-cosmosdb
Mar 18, 2026
@h3n4l h3n4l commented Mar 18, 2026

Summary

  • Rewrite CosmosDB ANTLR4 grammar to support all 13 query feature areas that were failing with syntax errors (BYT-9043)
  • Merge split scalar_expression / scalar_expression_in_where into a single unified scalar_expression rule
  • Add 18 new test SQL examples covering all feature areas

Grammar Changes

Lexer (CosmosDBLexer.g4):

  • Add missing keyword tokens: IN, BETWEEN, TOP, VALUE, ORDER, BY, GROUP, OFFSET, LIMIT, ASC, DESC, EXISTS, LIKE, HAVING, JOIN
  • Add != operator (NOT_EQUAL_OPERATOR)
  • Fix IDENTIFIER rule to allow leading underscore (for system properties like _ts, _etag, _rid)
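The leading-underscore fix can be illustrated outside the grammar. The following is a minimal Go sketch, assuming the identifier shape is "letter or underscore, then letters, digits, or underscores"; the exact character classes in CosmosDBLexer.g4 are not shown in this description, so the pattern and the helper name `isIdentifier` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern approximates an identifier rule that accepts a leading
// underscore. The real character classes in CosmosDBLexer.g4 are not quoted
// in the PR description, so this pattern is an assumption.
var identifierPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// isIdentifier reports whether s matches the assumed identifier shape.
func isIdentifier(s string) bool {
	return identifierPattern.MatchString(s)
}

func main() {
	// System properties such as _ts, _etag, and _rid start with an underscore.
	for _, name := range []string{"_ts", "_etag", "_rid", "name", "9lives"} {
		fmt.Printf("%-7s %v\n", name, isIdentifier(name))
	}
}
```

Under this assumed pattern, `_ts`, `_etag`, and `_rid` are accepted, while a name starting with a digit is still rejected.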

Parser (CosmosDBParser.g4):

  • Unify scalar_expression and scalar_expression_in_where into single scalar_expression (resolves the TODO in the grammar)
  • Add SELECT TOP N, SELECT VALUE, SELECT DISTINCT VALUE support
  • Add ORDER BY clause with ASC/DESC
  • Add GROUP BY and HAVING clauses
  • Add OFFSET ... LIMIT ... pagination
  • Add IN, BETWEEN, LIKE, NOT expression support
  • Add EXISTS subquery and scalar subquery support
  • Add JOIN clause for intra-document joins
  • Fix object_constant_field_pair to use COLON_SYMBOL instead of COMMA_SYMBOL
  • Add identifier rule allowing keywords as property names/aliases
  • Support empty function argument lists and COUNT(*)

13 Feature Areas Now Supported

  1. SELECT TOP N
  2. WHERE with !=, IN, BETWEEN, underscore fields
  3. Functions and arithmetic in SELECT projections
  4. ORDER BY
  5. Aggregation functions (COUNT, SUM, AVG, MIN, MAX)
  6. GROUP BY
  7. String functions (UPPER, LOWER, LENGTH)
  8. Math functions (ROUND)
  9. Type checking functions (IS_STRING, IS_NUMBER, IS_DEFINED)
  10. SELECT DISTINCT VALUE
  11. SELECT VALUE
  12. OFFSET ... LIMIT ... pagination
  13. Geo-spatial functions with JSON object literals

Test plan

  • All 5 existing parser tests pass
  • All 18 new test SQL examples parse without errors (23 total)
  • make build generates parser successfully
  • make test passes

🤖 Generated with Claude Code

h3n4l and others added 2 commits March 18, 2026 15:52
…w clauses

Add missing lexer tokens: !=, IN, BETWEEN, TOP, VALUE, ORDER, BY,
GROUP, OFFSET, LIMIT, ASC, DESC, EXISTS, LIKE, HAVING, JOIN.

Fix IDENTIFIER to allow leading underscore (for _ts, _etag, etc.).

Merge scalar_expression and scalar_expression_in_where into a single
unified scalar_expression rule. Add TOP, VALUE, ORDER BY, GROUP BY,
OFFSET LIMIT, HAVING, JOIN, IN, BETWEEN, LIKE, EXISTS, NOT, and
subquery support. Fix object_constant_field_pair to use COLON_SYMBOL.

Resolves all 13 failing query feature areas from BYT-9043.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Covers: SELECT TOP, WHERE operators (!=, IN, BETWEEN, _ts fields),
functions in SELECT, ORDER BY, aggregation, GROUP BY, string/math/
type-check functions, DISTINCT VALUE, VALUE keyword, OFFSET LIMIT,
geospatial with JSON objects, and NOT EQUAL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 18, 2026 08:18
Copilot AI left a comment

Pull request overview

This PR expands the Cosmos DB SQL ANTLR grammar and adds example queries to validate parsing of additional query constructs.

Changes:

  • Extended SELECT grammar to support TOP, VALUE, GROUP BY/HAVING, ORDER BY, OFFSET ... LIMIT ..., and JOIN.
  • Unified/expanded scalar_expression to cover more operators and expression forms (e.g., IN, BETWEEN, LIKE, EXISTS, object/array creation).
  • Added a suite of example .sql files that are exercised by parser_test.go.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cosmosdb/CosmosDBParser.g4 | Adds/updates core parser rules for new clauses and expressions. |
| cosmosdb/CosmosDBLexer.g4 | Adds new keyword/operator tokens and broadens identifier start chars (e.g., _). |
| cosmosdb/cosmosdb_lexer.go | Regenerated lexer output reflecting lexer grammar changes. |
| cosmosdb/cosmosdbparser_visitor.go | Regenerated visitor interface for new/removed parser rules. |
| cosmosdb/cosmosdbparser_listener.go | Regenerated listener interface for new/removed parser rules. |
| cosmosdb/cosmosdbparser_base_visitor.go | Regenerated base visitor implementations. |
| cosmosdb/cosmosdbparser_base_listener.go | Regenerated base listener implementations. |
| cosmosdb/examples/aggregation.sql | Example for aggregation COUNT. |
| cosmosdb/examples/between_expression.sql | Example for BETWEEN. |
| cosmosdb/examples/distinct_value.sql | Example for DISTINCT VALUE. |
| cosmosdb/examples/geospatial.sql | Example for geospatial function + object/array literals. |
| cosmosdb/examples/group_by.sql | Example for GROUP BY with aggregation. |
| cosmosdb/examples/in_expression.sql | Example for IN (...). |
| cosmosdb/examples/math_functions.sql | Example for numeric + conversion functions. |
| cosmosdb/examples/not_equal.sql | Example for !=. |
| cosmosdb/examples/offset_limit.sql | Example for ORDER BY ... OFFSET ... LIMIT .... |
| cosmosdb/examples/order_by.sql | Example for ORDER BY. |
| cosmosdb/examples/select_functions.sql | Example for scalar functions and arithmetic in projection. |
| cosmosdb/examples/select_top.sql | Example for TOP. |
| cosmosdb/examples/string_functions.sql | Example for string functions. |
| cosmosdb/examples/type_check_functions.sql | Example for type-check functions. |
| cosmosdb/examples/underscore_fields.sql | Example for system properties like _ts, _etag, _rid. |
| cosmosdb/examples/value_count.sql | Example for SELECT VALUE COUNT(1). |
| cosmosdb/examples/value_keyword.sql | Example for SELECT VALUE projection. |
| cosmosdb/examples/where_operators.sql | Example for WHERE with != and _ts filtering. |

Split binary_operator into precedence-tiered rules (multiplicative,
additive, shift, comparison) and reorder scalar_expression alternatives
so AND/OR have lower precedence than comparison operators. Previously
AND/OR bound tighter than =, >, < which caused incorrect parse trees.
Also fix STRINGTONUMBER to StringToNumber for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
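
The effect of the precedence fix can be shown with a small stand-alone sketch. The Go code below uses precedence climbing over whitespace-separated tokens, which is a different technique from ANTLR's alternative ordering but produces the same grouping. The tier names follow the commit message; the numeric levels, the `group` helper, and the sample queries are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// precedence maps each binary operator to a binding strength; a higher
// number binds tighter. AND/OR sit below the comparison tier, matching the
// behavior described in the commit message. The numbers are illustrative.
var precedence = map[string]int{
	"OR":  1,
	"AND": 2,
	"=":   3, "!=": 3, "<": 3, ">": 3, "<=": 3, ">=": 3, // comparison
	"<<": 4, ">>": 4, // shift
	"+": 5, "-": 5, // additive
	"*": 6, "/": 6, "%": 6, // multiplicative
}

type parser struct {
	tokens []string
	pos    int
}

// peek returns the next token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// parseExpr reads one operand, then keeps consuming operators whose
// precedence is at least minPrec. The right-hand side is parsed with a
// higher minimum, which makes every operator left-associative.
func (p *parser) parseExpr(minPrec int) string {
	left := p.peek()
	if left == "" {
		return ""
	}
	p.pos++
	for {
		op := strings.ToUpper(p.peek())
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			return left
		}
		p.pos++
		right := p.parseExpr(prec + 1)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// group returns expr with explicit parentheses showing how it is grouped.
// Tokens must be separated by whitespace.
func group(expr string) string {
	p := &parser{tokens: strings.Fields(expr)}
	return p.parseExpr(1)
}

func main() {
	fmt.Println(group("c.a = 1 AND c.b > 2 OR c.c = 3"))
	// prints: (((c.a = 1) AND (c.b > 2)) OR (c.c = 3))
	fmt.Println(group("c.x + 1 * 2 > 3"))
	// prints: ((c.x + (1 * 2)) > 3)
}
```

With AND/OR bound tighter than the comparison operators, as before the fix, the first expression would instead group `1 AND c.b` together, which is the incorrect parse tree the commit describes.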
@rebelice rebelice merged commit 6c474df into main Mar 18, 2026
5 checks passed
@rebelice rebelice deleted the vk/6cd7-improve-cosmosdb branch March 18, 2026 08:47