Skip to content

[Parser] Complete JSON/JSONB operator support in expressions #175

@ajitpratap0

Description

@ajitpratap0

Feature Description

Complete support for PostgreSQL JSON and JSONB operators in expression parsing. The tokenizer already recognizes these operators, but parser integration is needed to handle them in expressions.

SQL Standard

PostgreSQL-specific operators (not in SQL standard, though SQL:2016 added JSON support)

Current Status

From codebase analysis:

  • Tokenizer: Already recognizes JSON operators (->, ->>, #>, #>>, @>, <@, ?, ?|, ?&)
  • Parser: Limited or no expression support for JSON operators
  • AST: May need JSON-specific expression nodes

Example SQL

Basic JSON Operators

-- JSON field access (returns JSON)
SELECT data -> 'name' FROM users;
SELECT data -> 'address' -> 'city' FROM users;

-- JSON field access (returns text)
SELECT data ->> 'email' FROM users;
SELECT data -> 'address' ->> 'zipcode' FROM users;

-- JSON path extraction (returns JSON)
SELECT data #> '{address,city}' FROM users;

-- JSON path extraction (returns text)
SELECT data #>> '{address,city}' FROM users;

JSON Containment and Existence

-- Contains operator (does left contain right?)
SELECT * FROM products WHERE tags @> '["electronics", "sale"]';
SELECT * FROM users WHERE data @> '{"role": "admin"}';

-- Contained by operator (is left contained in right?)
SELECT * FROM settings WHERE config <@ '{"enabled": true, "debug": false}';

-- Key existence
SELECT * FROM users WHERE data ? 'email';  -- Does key exist?
SELECT * FROM users WHERE data ?| array['phone', 'mobile'];  -- Any key exists?
SELECT * FROM users WHERE data ?& array['email', 'name'];  -- All keys exist?

Complex Expressions

-- JSON operators in WHERE clauses
SELECT * FROM orders 
WHERE metadata ->> 'status' = 'shipped'
  AND metadata -> 'shipping' ->> 'carrier' = 'USPS';

-- JSON operators in SELECT list
SELECT 
    id,
    data ->> 'name' as name,
    data -> 'address' ->> 'city' as city,
    (data @> '{"verified": true}') as is_verified
FROM users;

-- JSON operators with functions
SELECT 
    id,
    jsonb_array_length(data -> 'items') as item_count,
    data #>> '{items,0,name}' as first_item_name
FROM orders;

-- Chained JSON operations
SELECT 
    data -> 'user' -> 'profile' ->> 'email' as email,
    data #> '{user,settings,notifications}' as notification_prefs
FROM events;

Advanced Use Cases

-- JSON in JOIN conditions
SELECT o.id, u.name
FROM orders o
JOIN users u ON o.data ->> 'user_id' = u.id::text;

-- JSON in aggregations
SELECT 
    data ->> 'category' as category,
    COUNT(*) as count,
    array_agg(data ->> 'name') as names
FROM products
WHERE data @> '{"in_stock": true}'
GROUP BY data ->> 'category';

-- JSON with CASE expressions
SELECT 
    id,
    CASE 
        WHEN data @> '{"type": "premium"}' THEN 'Premium'
        WHEN data ? 'discount' THEN 'Discounted'
        ELSE 'Standard'
    END as product_tier
FROM products;

Implementation Notes

Parser Changes Required

  1. Tokenizer (already complete):

    • ✅ All JSON operators tokenized
    • Token types: TOKEN_ARROW, TOKEN_DOUBLE_ARROW, TOKEN_HASH_GT, TOKEN_HASH_DOUBLE_GT, etc.
  2. AST Nodes (pkg/sql/ast/):

    • Extend BinaryExpression to handle JSON operators
    • Or create specific JSONAccessExpression node:
      • Object Expression - the JSON/JSONB column
      • Operator string - the JSON operator (->, ->>, #>, etc.)
      • Path Expression - the key/path (string, int, or array)
    • May need JSONContainmentExpression for @>, <@
    • May need JSONExistenceExpression for ?, ?|, ?&
  3. Parser Logic (pkg/sql/parser/):

    • In parseExpression() and parseBinaryExpression():
      • Recognize JSON operator tokens
      • Handle operator precedence (JSON operators should be high precedence)
      • Support chaining: data -> 'user' -> 'profile' ->> 'email'
    • In parsePrimaryExpression():
      • Handle JSON path arrays: '{address,city}'
      • Handle array literals for ?| and ?& operators
  4. Operator Precedence:

    • JSON access operators (->, ->>, #>, #>>) should be left-associative and high precedence
    • JSON containment/existence operators (@>, <@, ?, ?|, ?&) should be comparison-level precedence

Testing Requirements

  • Parse single-level JSON access (->)
  • Parse chained JSON access (-> -> ->>)
  • Parse JSON path access (#>, #>>)
  • Parse JSON containment operators (@>, <@)
  • Parse JSON existence operators (?, ?|, ?&)
  • Parse JSON operators in WHERE clauses
  • Parse JSON operators in SELECT list
  • Parse JSON operators in JOIN conditions
  • Test operator precedence and associativity
  • Test with complex nested expressions
  • Formatter support to render JSON operators correctly

Compatibility

  • PostgreSQL: ✅ Full support for JSON/JSONB operators
  • MySQL: ⚠️ Partial support (JSON_EXTRACT function, not operators)
  • SQL Server: ⚠️ Different syntax (JSON_VALUE, JSON_QUERY functions)
  • Oracle: ⚠️ Different syntax (JSON_VALUE, JSON_QUERY functions)
  • SQLite: ⚠️ JSON functions only (json_extract), not operators

JSON Operator Reference

Operator Description Example Returns
-> Get JSON field by key data -> 'name' JSON/JSONB
->> Get JSON field as text data ->> 'name' text
#> Get JSON at path data #> '{a,b}' JSON/JSONB
#>> Get JSON at path as text data #>> '{a,b}' text
@> Contains data @> '{"a":1}' boolean
<@ Contained by '{"a":1}' <@ data boolean
? Key exists data ? 'key' boolean
`? ` Any key exists `data ?
?& All keys exist data ?& array['a','b'] boolean

Priority

Medium-High - PostgreSQL heavily uses JSON/JSONB types, and operators are essential for practical JSON querying

Related Issues

  • Part of PostgreSQL enhancement roadmap
  • Tokenizer already supports these operators (partial work done)
  • Related to expression parsing enhancements

Dependencies

  • Requires tokenizer JSON operator support (✅ already complete)
  • May require AST node enhancements
  • May require operator precedence adjustments

Acceptance Criteria

  • Parser successfully handles all JSON/JSONB operators
  • AST correctly represents JSON expressions
  • Support for chained JSON operations
  • Correct operator precedence and associativity
  • Test coverage for all JSON operators
  • Test coverage for complex nested JSON expressions
  • Documentation updated with examples
  • Formatter preserves JSON operators correctly
  • Clear error messages for invalid JSON operator usage
  • Dialect-specific notes for cross-database compatibility

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions