feat(query): Enhance JSON Parsing with Decimal Support and Extended Syntax #18252

Merged: 2 commits into databendlabs:main from feat-parse-json on Jun 30, 2025

Conversation

b41sh (Member) commented on Jun 25, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces significant enhancements to Databend's parse_json function, focusing on improved numerical precision and expanded syntax support for greater flexibility in handling diverse JSON formats.

Key Changes:

  1. Decimal Support for Numbers:

    • Addresses the previous limitation where large numbers in JSON (e.g., 99999999999999999999) would be parsed as floating-point numbers (e.g., 1e20), resulting in a loss of precision.
    • parse_json now supports parsing numbers as Decimal data types, preserving the original precision of large integers.
    • Supports up to 78 digits of precision. Numbers exceeding this limit will still be converted to floating-point representation.
  2. Extended JSON Syntax Support:

    • Improves compatibility with a wider range of JSON formats by relaxing certain parsing restrictions.

    • The following extended syntax features are now supported:

      • Empty Array Elements: Arrays can now contain empty elements (e.g., [1,2,,4]), which will be parsed as null values (e.g., [1,2,null,4]).
      • Positive Number Prefix: Numbers are now allowed to have a + prefix (e.g., +123), which will be parsed as the corresponding positive number (e.g., 123).
      • Multiple Leading Zeros: Numbers can now have multiple leading zeros (e.g., 00001), which will be parsed as the equivalent integer (e.g., 1).
      • Missing Digits Around Decimal Point: Numbers can now have missing digits before or after the decimal point (e.g., .1 will be parsed as 0.1, and 1. will be parsed as 1).
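
The number-handling rules above can be illustrated with a small Python sketch. This is not Databend's implementation (which is written in Rust); the function name `parse_lenient_number`, the regular expression, and the way digits are counted against the 78-digit limit are all assumptions made for illustration. Exponent notation and the empty-array-element rule are left out to keep the example short.

```python
import re
from decimal import Decimal

# Precision limit stated in the PR description.
MAX_DECIMAL_DIGITS = 78

# Optional sign, then either digits with an optional fractional part
# ("1", "1.", "1.5") or a bare fractional part (".5"). A lone "." is rejected.
_NUMBER = re.compile(r"([+-]?)(?:(\d+)(?:\.(\d*))?|\.(\d+))")


def parse_lenient_number(text):
    """Parse a relaxed-syntax number; hypothetical helper, not a Databend API.

    Returns a Decimal when the digits fit the assumed limit, else a float.
    """
    match = _NUMBER.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    sign, int_part, frac_a, frac_b = match.groups()

    # Multiple leading zeros collapse: "00001" -> "1"; a missing integer
    # part becomes "0": ".1" -> "0.1".
    int_digits = (int_part or "").lstrip("0") or "0"
    # A missing fractional part is simply dropped: "1." -> "1".
    frac_digits = frac_a or frac_b or ""

    # A "+" prefix is accepted and discarded; only "-" changes the value.
    literal = ("-" if sign == "-" else "") + int_digits
    if frac_digits:
        literal += "." + frac_digits

    # Assumption: every remaining digit counts toward the 78-digit limit.
    if len(int_digits) + len(frac_digits) > MAX_DECIMAL_DIGITS:
        return float(literal)  # beyond the limit, fall back to floating point
    return Decimal(literal)    # exact: no digits are lost
```

With this sketch, `parse_lenient_number("99999999999999999999")` keeps all twenty digits as a `Decimal`, whereas `float("99999999999999999999")` rounds to `1e20`, which is the precision loss the PR describes.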

Benefits:

  • Increased Data Accuracy: Preserves the precision of large numbers when parsing JSON data.
  • Improved Compatibility: Handles a wider variety of JSON formats, reducing parsing errors and simplifying data ingestion.
  • Enhanced User Experience: Provides a more flexible and forgiving JSON parsing experience.
  • fixes: #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

github-actions bot added the pr-feature label ("this PR introduces a new feature to the codebase") on Jun 25, 2025
b41sh force-pushed the feat-parse-json branch from 4c3df1b to 8bc8eea on June 27, 2025 03:38
b41sh marked this pull request as ready for review on June 27, 2025 05:54
b41sh requested a review from sundy-li on June 27, 2025 05:54
b41sh force-pushed the feat-parse-json branch from 7eb43e8 to 8e3f480 on June 27, 2025 08:42
b41sh merged commit beca0a1 into databendlabs:main on Jun 30, 2025
86 checks passed