[fix](be) Preserve Variant predefined decimal precision #63322
run buildall
/review
Pull request overview
This PR fixes precision loss for Variant typed decimal extraction by preserving high-precision JSON numeric tokens as exact text during parsing, and ensuring typed-path casts keep Variant’s nullable/array nullability semantics. It also adds targeted unit/regression coverage for high-precision decimals in both scalar and array paths.
Changes:
- Preserve raw JSON numeric tokens (via simdjson on-demand parsing) and materialize precision-sensitive numbers as strings for later exact decimal casts.
- Preserve Variant nullability (including nested array nullability) when converting typed paths to storage types.
- Add unit + regression tests covering high-precision decimals in Variant scalar and array typed paths.
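For readers who want to see why the raw token has to survive until the decimal cast, here is a minimal sketch. It is not the code in this PR: `parse_scaled` and `scaled_via_double` are hypothetical helpers, and the sketch assumes a decimal small enough to fit a 64-bit scaled integer. It contrasts casting from the exact text with casting after a detour through `double`.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: parse plain decimal text (no exponent) into an integer
// scaled by 10^scale. Returns nullopt on malformed input, on overflow, or when
// the text has more fractional digits than `scale` (no rounding is attempted).
std::optional<long long> parse_scaled(std::string_view text, int scale) {
    assert(scale >= 0);
    std::size_t i = 0;
    bool negative = false;
    if (!text.empty() && (text[0] == '-' || text[0] == '+')) {
        negative = (text[0] == '-');
        i = 1;
    }
    long long value = 0;
    int frac_digits = 0;
    bool seen_point = false;
    bool seen_digit = false;
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '.') {
            if (seen_point) return std::nullopt;
            seen_point = true;
            continue;
        }
        if (c < '0' || c > '9') return std::nullopt;
        if (seen_point) {
            if (frac_digits == scale) return std::nullopt;
            ++frac_digits;
        }
        const int digit = c - '0';
        if (value > (LLONG_MAX - digit) / 10) return std::nullopt;
        value = value * 10 + digit;
        seen_digit = true;
    }
    if (!seen_digit) return std::nullopt;
    // Pad missing fractional digits so the result is always scaled by 10^scale.
    for (; frac_digits < scale; ++frac_digits) {
        if (value > LLONG_MAX / 10) return std::nullopt;
        value *= 10;
    }
    return negative ? -value : value;
}

// Hypothetical lossy path: the token is first rounded to the nearest double.
long long scaled_via_double(std::string_view text, int scale) {
    const double parsed = std::strtod(std::string(text).c_str(), nullptr);
    return std::llround(parsed * std::pow(10.0, scale));
}
```

With the token `1234567890.123456789` and scale 9, the text path yields the exact scaled integer, while the `double` path cannot, because a 19-digit odd integer is not representable in a 53-bit mantissa.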
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| regression-test/suites/variant_p0/predefine/test_variant_high_precision_decimal.groovy | New regression test validating high-precision decimal preservation for Variant typed paths (scalar + array). |
| be/test/core/jsonb/json_parser_test.cpp | Unit tests asserting high-precision numeric tokens are preserved as string fields (including inside arrays). |
| be/test/core/column/column_variant_test.cpp | Unit test validating typed-path conversion + serialization preserves decimal text/precision for scalars and arrays. |
| be/src/util/json/simd_json_parser.h | Refactors SimdJSONParser to build an internal node tree from simdjson on-demand and exposes raw number tokens. |
| be/src/util/json/json_parser.h | Adds heuristic to preserve precision-sensitive doubles as strings based on raw token digits. |
| be/src/exprs/function/cast/cast_to_decimal.h | Enhances string→decimal cast to accept nullable string columns and propagate nulls. |
| be/src/core/column/column_variant.cpp | Preserves Variant nullability when casting typed paths to declared storage types (incl. arrays). |
Comments suppressed due to low confidence (1)
be/src/util/json/json_parser.h:30
- json_parser.h introduces should_preserve_json_number_as_string(std::string_view ...) but the header does not include <string_view>. Relying on transitive includes is brittle; add #include <string_view> to ensure this header compiles standalone.
#include <parallel_hashmap/phmap.h>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>
I found one blocking correctness regression. The PR addresses typed Variant decimal precision for the tested path, but the parser change currently applies to all Variant subcolumn extraction and changes some JSON numbers into strings when the path is not declared as a typed decimal path.
Critical checkpoints:
- Goal/test: The PR fixes the covered typed decimal path case and adds BE/regression coverage for that case, but misses untyped Variant numeric semantics.
- Scope/focus: The implementation is broader than the stated goal because generic JSON parsing now changes inferred Field type based on token precision.
- Concurrency/lifecycle: No new shared concurrency or non-obvious lifecycle issue identified in the reviewed paths.
- Compatibility/behavior: Behavior changes for existing Variant data: high-precision JSON numeric tokens on untyped paths can become strings. This is user-visible and not covered by compatibility/tests.
- Parallel paths: Doc-value/jsonb writing still treats doubles as doubles; the subcolumn path is the one affected by the string conversion.
- Tests: Added tests cover typed decimal conversion, but no negative/regression coverage proves ordinary untyped numbers remain numeric.
- Performance/memory/observability: No additional blocking issue found beyond the correctness problem above.
User focus: No additional user-provided review focus was specified.
ecba866 to 9ccf680
run buildall
/review
I found one additional correctness issue beyond the existing review threads.
Critical checkpoint conclusions:
- Goal/test coverage: The PR aims to preserve high-precision JSON numbers for Variant decimal typed paths and adds BE/regression coverage, but scientific-notation decimal tokens remain uncovered and can still lose precision.
- Scope/focus: The change is mostly focused on Variant JSON parsing and typed-path decimal conversion, though the simdjson wrapper rewrite is broader than the local fix.
- Concurrency/lifecycle: No new shared mutable runtime state or lock-order concern found in the changed path. Parser-owned node lifetimes are local to each parse call.
- Configuration/compatibility: No new config or storage-format change found. Existing typed path schema metadata is used to drive parsing.
- Parallel paths: Typed decimal preservation is wired into normal Variant ingest, but the heuristic misses exponent notation on the same path.
- Error handling/memory safety: No ignored Status issue found in the reviewed changes. The parser rewrite adds extra allocations, but I did not find a separate blocking memory-safety issue.
- Data correctness: Blocking issue below: some valid decimal typed-path inputs still round through double before decimal conversion.
- Observability: No additional observability requirement identified for this local parsing/casting fix.
User focus: No additional user-provided review focus was specified.
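To make the exponent-notation concern concrete, the following is a rough sketch of a digit-counting check that treats the mantissa of a scientific-notation token the same way as a plain decimal. The function names and the threshold are assumptions for illustration, not the heuristic in be/src/util/json/json_parser.h.

```cpp
#include <cassert>
#include <limits>
#include <string_view>

// Hypothetical helper: count significant digits in the mantissa of a JSON
// number token. Leading zeros are skipped; the exponent part is ignored.
int count_significant_digits(std::string_view token) {
    assert(!token.empty() && "caller passes a non-empty JSON number token");
    int count = 0;
    bool leading = true;
    for (const char c : token) {
        if (c == 'e' || c == 'E') break;    // exponent digits do not add precision
        if (c < '0' || c > '9') continue;   // sign or decimal point
        if (leading && c == '0') continue;  // leading zeros are not significant
        leading = false;
        ++count;
    }
    return count;
}

// Conservative assumption: anything beyond digits10 (15 for IEEE double) may
// not survive a round trip through double, so keep the raw text instead.
bool exceeds_double_precision(std::string_view token) {
    return count_significant_digits(token) > std::numeric_limits<double>::digits10;
}
```

Under this sketch `1.2345678901234567890e10` is flagged just like `0.12345678901234567890`, while `1e300` is not. The check is deliberately conservative: some 16- and 17-digit tokens do round-trip through double, but it does not try to prove that.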
9ccf680 to 832ca14
run buildall
/review
I found one additional correctness issue. The existing review threads already cover the raw-number heuristic/scientific-notation precision cases and the missing self-contained <cassert> include, so I did not duplicate those.
Critical checkpoint conclusions:
- Goal/test: the PR aims to preserve high-precision decimal JSON tokens for Variant predefined decimal paths; it adds non-doc-mode unit/regression coverage, but doc-mode predefined typed paths are not covered and still lose precision.
- Scope: the change is focused, but the fix is incomplete across the parallel doc-mode write path.
- Concurrency/lifecycle: no new shared mutable concurrency or special lifecycle issue found in the reviewed change.
- Config/compatibility: no new external config or storage-format compatibility change found.
- Parallel paths: doc-mode Variant materialization is a parallel path and needs the same decimal-preservation setup.
- Tests: add a doc-mode predefined decimal regression case, preferably under variant_p0/doc_mode/predefine, to prove the fix.
- Observability/transactions/data writes: no additional logging or transaction/persistence concern found beyond the data-correctness issue below.
- Performance: the parser now materializes/copies the JSON tree to retain raw tokens; this is worth watching on insert hot paths, but I did not mark it blocking without benchmark evidence.
User focus: no additional user-provided review focus was supplied.
832ca14 to 43b1a14
/review
run buildall
I completed a full pass over the PR diff and the existing inline review context. I am requesting changes based on the already-open inline threads rather than adding duplicate comments.
Summary opinion: the PR is targeted at preserving high-precision JSON decimal tokens for predefined Variant decimal paths, and the current non-doc-mode path is closer to the goal. However, the implementation is still incomplete for doc-mode predefined Variant columns, where the parser is configured as OnlyDocValueColumn and the decimal raw-number preserve paths are still not installed. be/src/util/json/simd_json_parser.h also still uses assert() without including <cassert>, so the header remains non-self-contained. I did not find additional distinct issues beyond the existing threads.
Critical checkpoint conclusions:
- Goal and tests: the non-doc-mode typed decimal case has regression/unit coverage, but doc-mode typed decimal coverage is still missing and the doc-mode path does not accomplish the precision-preservation goal.
- Scope/focus: the functional change is mostly focused, though the simdjson wrapper refactor is larger than the minimal parser hook; no separate blocking issue found beyond the existing header/self-containment concern.
- Concurrency: no new shared mutable cross-thread state or lock interaction was introduced in the reviewed paths.
- Lifecycle/static initialization: no new static initialization dependency was found. Parser element lifetimes are local to parsing/traversal and copied into the new tree before the padded input is destroyed.
- Configuration: no new configuration items were added.
- Compatibility: no storage-format or FE-BE protocol change was found. The remaining compatibility risk is behavioral inconsistency for doc-mode predefined decimal Variant paths.
- Parallel paths: doc-mode is the parallel insert/materialization path that still needs the same decimal preserve-path setup.
- Conditional checks: the typed-path preservation condition is now path-gated; no additional distinct condition issue found.
- Test coverage/results: tests cover non-doc mode, arrays, glob matching, and scientific notation, but not doc-mode predefined decimals.
- Observability: no new observability requirement identified for this parser/cast fix.
- Transactions/persistence/data writes: no new transaction or persistence metadata path was introduced.
- Performance: the JSON parser wrapper now materializes a copied node tree, but I did not identify a concrete correctness blocker from this beyond the existing issues.
- User focus: no additional user-provided review focus was specified.
43b1a14 to 7885a6c
run buildall
/review
I found one additional correctness issue in the nullable string-to-decimal cast path. Existing inline threads already cover the remaining blockers around doc-mode typed decimal preservation and parser/header concerns, so I did not duplicate those comments.
Critical checkpoint conclusions:
- Goal/test: The PR aims to preserve exact JSON numeric text for predefined decimal Variant typed paths and adds unit/regression coverage, but doc-mode typed paths remain covered by an existing unresolved thread and nullable decimal conversion needs more coverage.
- Scope: The change is mostly focused, but the parser replacement adds broad allocation/copying behavior; no separate blocker found beyond the submitted issue and existing comments.
- Concurrency/lifecycle: No new shared concurrent state or non-obvious lifecycle risk identified; parser result references are copied into the owned tree before traversal.
- Config/compatibility: No new config or storage format incompatibility identified.
- Parallel paths: Non-doc-mode path is configured; doc-mode path remains an existing unresolved review concern.
- Error handling/data correctness: The new strict nullable cast path can misparse rows because skipped null rows do not advance string offsets in the decimal serde.
- Tests: Added tests cover high-precision predefined decimals in non-doc mode, arrays, globs, and exponent cases; missing coverage remains for doc mode (already raised) and nullable/missing typed decimal rows.
- Observability/performance: No new observability requirement identified; parser memory overhead is higher but not raised as a blocking issue.
User focus: No additional user-provided review focus was specified.
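The offset problem in the nullable cast path can be illustrated with a simplified model. This is a sketch under assumptions: `StringColumn` is a hypothetical stand-in for an offsets-based string column, and `read_with_dense_cursor` reproduces the general class of bug (a read position that only advances on non-null rows), not the exact serde code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flat string column: row i occupies chars[offsets[i-1], offsets[i]).
// Null rows still own a (possibly empty) slot, so offsets has one entry per row.
struct StringColumn {
    std::string chars;
    std::vector<std::size_t> offsets;
};

std::string row_text(const StringColumn& col, std::size_t row) {
    const std::size_t begin = (row == 0) ? 0 : col.offsets[row - 1];
    return col.chars.substr(begin, col.offsets[row] - begin);
}

// Reads each non-null row by its own row index, so null rows cannot shift data.
std::vector<std::optional<std::string>> read_by_row_index(
        const StringColumn& col, const std::vector<std::uint8_t>& null_map) {
    assert(col.offsets.size() == null_map.size());
    std::vector<std::optional<std::string>> out;
    for (std::size_t row = 0; row < null_map.size(); ++row) {
        if (null_map[row]) {
            out.push_back(std::nullopt);
        } else {
            out.push_back(row_text(col, row));
        }
    }
    return out;
}

// Illustrative bug: the cursor is not advanced when a null row is skipped,
// so every later non-null row reads the slot of an earlier row.
std::vector<std::optional<std::string>> read_with_dense_cursor(
        const StringColumn& col, const std::vector<std::uint8_t>& null_map) {
    std::vector<std::optional<std::string>> out;
    std::size_t cursor = 0;
    for (std::size_t row = 0; row < null_map.size(); ++row) {
        if (null_map[row]) {
            out.push_back(std::nullopt);
            continue;  // cursor stays behind
        }
        out.push_back(row_text(col, cursor++));
    }
    return out;
}

// Sample rows: "1.5", NULL, "2.25".
StringColumn sample_column() { return StringColumn{"1.52.25", {3, 3, 7}}; }
std::vector<std::uint8_t> sample_null_map() { return {0, 1, 0}; }
```

With the sample rows, the row-indexed reader returns `2.25` for the third row, while the dense-cursor reader hands the decimal parser the empty slot of the null row. A regression with a null or missing typed decimal row between two non-null rows would catch this.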
TPC-H: Total hot run time: 31288 ms
TPC-H: Total hot run time: 31294 ms
TPC-H: Total hot run time: 31205 ms
7885a6c to 85fcf56
run buildall
/review
TPC-H: Total hot run time: 30897 ms
The PR still has a blocking correctness issue. The main goal is to preserve predefined Variant decimal precision during JSON parsing/materialization; the added regression coverage covers scalar and array-of-scalar typed decimal paths, but not typed decimal paths inside nested arrays serialized through JSONB.
Critical checkpoint conclusions:
- Goal/test coverage: the goal is only partially achieved; decimal precision can still be lost in the nested-array JSONB serialization path, and there is no regression for that path.
- Scope/focus: the core change is focused on JSON parsing/materialization, but the raw-number preservation decision is applied only to the scalar Field path.
- Concurrency/lifecycle/config compatibility: no new concurrency, lifecycle, persistent format, or dynamic config issue found in the reviewed diff.
- Parallel paths: existing review threads already cover doc-mode/raw-number, big-integer, nullable, and parser hot-path concerns; I did not repeat them.
- Data correctness: blocking issue below can round high-precision typed decimals before materialization when the value is inside an array containing objects.
- Performance/observability: no additional distinct blocking issue beyond already-open raw parser allocation comments.
- User focus: no additional user-provided focus was specified.
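One way to picture the nested-array gap is a traversal that carries the typed path through arrays and objects, so the preservation decision is made at every numeric leaf and not only on the scalar path. The sketch below uses a hypothetical `Node` tree and assumes that arrays do not add a path segment and that nested keys are joined with a dot. Neither assumption is taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical parsed JSON node that keeps the raw text of numeric tokens.
struct Node {
    enum class Kind { Number, Array, Object };
    Kind kind = Kind::Number;
    std::string raw;                // Number: raw token text
    std::vector<std::string> keys;  // Object: field names, parallel to children
    std::vector<Node> children;     // Array items or Object values
};

using PreservedTokens = std::vector<std::pair<std::string, std::string>>;

// Collect (path, raw token) for every numeric leaf on a declared decimal path.
void collect_preserved(const Node& node, const std::string& path,
                       const std::unordered_set<std::string>& decimal_paths,
                       PreservedTokens& out) {
    switch (node.kind) {
    case Node::Kind::Number:
        if (decimal_paths.count(path) > 0) out.emplace_back(path, node.raw);
        break;
    case Node::Kind::Array:
        // Assumption: array elements share the path of the array itself.
        for (const Node& item : node.children) {
            collect_preserved(item, path, decimal_paths, out);
        }
        break;
    case Node::Kind::Object:
        assert(node.keys.size() == node.children.size());
        for (std::size_t i = 0; i < node.keys.size(); ++i) {
            const std::string child = path.empty() ? node.keys[i] : path + "." + node.keys[i];
            collect_preserved(node.children[i], child, decimal_paths, out);
        }
        break;
    }
}

// Builds {"items": [{"price": 0.12345678901234567890, "qty": 2}]} and returns
// the tokens that would be kept as text when "items.price" is a decimal path.
PreservedTokens demo_nested_array() {
    Node price;
    price.raw = "0.12345678901234567890";
    Node qty;
    qty.raw = "2";
    Node element;
    element.kind = Node::Kind::Object;
    element.keys = {"price", "qty"};
    element.children = {price, qty};
    Node items;
    items.kind = Node::Kind::Array;
    items.children = {element};
    Node root;
    root.kind = Node::Kind::Object;
    root.keys = {"items"};
    root.children = {items};

    PreservedTokens out;
    collect_preserved(root, "", {"items.price"}, out);
    return out;
}
```

In this model the decimal inside the array of objects is found with its full 20-digit text, and the untyped `qty` leaf is left alone. A regression for the JSONB path would assert the same outcome end to end.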
9aa9fa9 to 67a3b9f
run buildall
/review
I did not add duplicate inline comments because the blocking issues I found are already covered by existing review threads. The PR still needs changes before it is safe to merge.
Critical checkpoint conclusions:
- Goal/test proof: the goal is to preserve high-precision JSON numeric tokens for predefined Variant decimal paths. The added tests cover several scalar, array, glob, insert, and staged stream-load cases, but existing review threads identify untested failing cases such as doc mode, nullable direct Variant load, decimal paths inside array-of-object JSONB serialization, unrelated oversized integers, and Decimal256 big integers.
- Scope/focus: the change is not yet small/focused enough for the hot parser path because it replaces the simdjson DOM wrapper with an eager copied Node tree for all parses, which is already called out in the existing performance/memory thread.
- Concurrency: no new shared mutable concurrency path was found in the changed parser/config code. The glob regex cache uses the existing mutex-protected cache path.
- Lifecycle/static initialization: no new cross-TU static initialization dependency was found. The parser Node tree lifetime is local to the parser instance and consumed before the next parse.
- Configuration: no new user config item is added. Existing session/config variables in tests are relevant to the feature.
- Compatibility: no storage format or FE-BE protocol change was found. Behavior compatibility remains a concern for Variant parsing edge cases already raised in the existing threads.
- Parallel paths: doc-mode Variant parsing and direct stream-load/serde paths are not fully covered by the current preservation setup, as already noted in existing threads.
- Conditional checks: the new raw-number preservation conditions are path-dependent, but existing comments show distinct paths where the condition is bypassed or causes fallback/loss.
- Test coverage: coverage is improved, but missing negative/end-to-end coverage remains for the existing blocking cases.
- Test results: I reviewed the added test assertions; no additional result-file issue was found.
- Observability: no new observability requirement was identified.
- Transaction/persistence/data writes: no EditLog or transaction protocol changes were found. Data correctness is still blocked by the existing Variant materialization and parse fallback issues.
- FE/BE variable passing: not applicable.
- Performance: the eager copied JSON tree and untracked allocations are already raised as a blocking performance/memory concern.
- Other issues: no additional distinct issue beyond the existing inline review context was found.
User focus: no additional user-provided review focus was present, and I found no extra focus-specific issue.
67a3b9f to 65ddea5
/review
run buildall
Requesting changes based on the unresolved blocking issues already raised in the existing inline review threads. I did not add new inline comments because the review context already covers the distinct blocking defects I found, and duplicating those threads is explicitly disallowed.
Critical checkpoint conclusions:
- Goal/test proof: the PR aims to preserve high-precision JSON numeric tokens for predefined Variant decimal paths and adds unit/regression coverage, but the current implementation still misses important ingestion/materialization paths covered by existing threads, including doc-mode predefined decimal paths and decimal paths serialized through nested array JSONB.
- Scope/clarity: the change is focused on Variant JSON parsing, but it introduces a full ondemand Node tree on the hot path and extra fallback behavior for oversized integers; existing comments identify correctness and performance risks that need resolution.
- Concurrency/lifecycle: no new concurrency or cross-thread lifecycle issue was found in this pass. Parser-owned Node lifetime appears local to parse traversal, though the added eager tree allocations are a performance/memory-accounting concern already raised.
- Configuration: no new config item is added. Existing Variant doc-mode configuration path is still not applying decimal preservation, as already commented.
- Compatibility/storage format: no serialized format change was identified, but behavior compatibility for untyped or unrelated oversized numeric Variant paths remains blocked by the existing comments.
- Parallel code paths: doc-mode Variant and JSONB nested-array paths remain parallel paths that do not get equivalent raw decimal preservation, already covered by existing threads.
- Conditional checks: the new big-integer invalid/fallback handling still has known edge cases with unrelated paths; no additional distinct condition issue was found.
- Test coverage: tests were added, but existing comments identify missing regressions for doc mode, nested array-of-object decimal paths, nullable/missing typed decimal casting, and unrelated oversized integers.
- Observability: no additional observability requirement was identified.
- Transaction/persistence/data writes: no direct transaction or edit-log change was found; ingestion correctness remains the main data correctness risk.
- FE/BE protocol: no FE/BE protocol change was found.
- Performance: the eager recursive JSON tree allocation remains a blocking performance/memory-accounting concern already raised.
Focus points: the focus file states no additional user-provided review focus, and I found no extra focus-specific issue beyond the existing review threads.
65ddea5 to 624b378
run buildall
/review
This PR still needs changes before approval. I did not add duplicate inline comments because the blocking concerns are already present in the existing review threads.
Critical checkpoint conclusions:
- Goal/test: The goal is to preserve precision for predefined Variant decimal paths, and the new regression/unit coverage targets direct insert, staged stream load, stream load, arrays, glob paths, exponent notation, and Decimal256 integers. However, already-open threads show important paths remain uncovered or incorrect, including doc-mode predefined decimals, decimal paths inside array-of-object JSONB serialization, nullable/missing decimal casts, and unrelated oversized integers during raw-preservation parsing.
- Scope/focus: The change is focused on Variant JSON parsing, but it replaces the hot parser representation with a fully materialized Node tree and copies strings/numbers for all parses, which is broader than necessary and is already called out as a performance/memory-accounting blocker.
- Concurrency/lifecycle: No new shared concurrency or lock ordering was introduced. Parser-owned Node lifetimes are bounded by parse traversal and released after parse; no static initialization concern found.
- Configuration/compatibility: No new config item or storage-format incompatibility was introduced. FE/BE protocol compatibility is not affected.
- Parallel paths: A blocking existing thread notes doc-mode Variant parsing is not configured with decimal preserve paths, so the non-doc and doc-mode paths are inconsistent.
- Special conditions/error handling: Existing comments identify lossy or failing numeric handling for specific cases, including BigInteger conversion before preservation and JSONB serialization of nested arrays. These are correctness issues, not just style issues.
- Test coverage/results: The added tests improve coverage, but the existing review context identifies missing regressions for doc mode, nullable Variant stream-load consistency, nullable/missing typed decimal casts, array-of-object decimal paths, escaped-key behavior, and oversized unrelated integer coexistence. No test results were provided here.
- Observability: No new observability appears necessary for this parser-level precision fix.
- Transactions/persistence/data writes: The change affects load/write materialization of Variant columns; correctness must hold across direct insert and stream load paths. Existing blockers show some write paths can still store rounded or missing values.
- Performance: Existing hot-path concern remains significant because ordinary Variant ingestion now builds a recursive copied tree before traversal.
User focus: No additional user-provided review focus was supplied.
Please address the existing inline threads before requesting another approval pass.
624b378 to 04ef206
run buildall
/review
I found one additional blocking gap beyond the existing inline threads. The PR goal is to preserve raw JSON numeric tokens for predefined Variant decimal typed paths, and the scalar-root INSERT/string path is covered by the new config and tests, but a normal stream-load row where the Variant field is a JSON object still bypasses that config.
Critical checkpoint conclusions:
- Goal: only partially accomplished because direct object stream-load remains lossy.
- Scope/focus: the change is focused, but the config is not propagated to all parallel ingestion paths.
- Concurrency/lifecycle: no new concurrency or lifecycle risk was found beyond existing parser allocation/lifetime threads.
- Configuration/compatibility: no config item or storage-format compatibility change is introduced; FE/BE protocol is unchanged.
- Test coverage: covers INSERT, staged string stream-load, and string-valued direct stream-load, but misses direct object stream-load into the Variant column.
- Existing threads: already cover doc-mode, nested JSONB array, nullable, big-integer, and performance concerns.
- User focus: no additional user focus was provided in review_focus.txt.
04ef206 to 28faf36
run buildall
/review
I found one additional blocking issue distinct from the existing inline threads.
Critical checkpoint conclusions:
- Goal/test coverage: the PR improves predefined Variant decimal preservation and adds regression coverage for inserts and stream load, but it regresses the generic Variant JSON deserialization path used by casts, which is not covered by the new test.
- Scope/focus: the parser changes are focused on decimal preservation, but changing DataTypeVariantSerDe::deserialize_one_cell_from_json() to always store a scalar string is broader than the precision fix and changes existing query semantics.
- Concurrency/lifecycle: no new concurrent shared state or non-trivial static lifecycle issue found beyond already-open parser allocation/lifetime discussions.
- Configuration/compatibility: no new config item; the serde change is a user-visible behavioral compatibility regression for CAST(... AS VARIANT).
- Parallel paths: stream-load/direct-load paths are considered in tests, but the cast path that shares the same serde was missed.
- Data correctness: decimal typed paths are better preserved in the covered paths, but Variant subpath data produced by casts can become inaccessible because JSON is stored as a scalar string.
- Error handling/memory/observability: no separate new issue found in these areas beyond the existing review threads.
- User focus: no additional user-provided review focus was specified.
variant_util::parse_json_to_variant(column, json_ref, nullptr, parse_config));
auto& variant = assert_cast<ColumnVariant&>(column);
Field field = Field::create_field<TYPE_STRING>(String(slice.data, slice.size));
VariantMap object;
This serde is also used by the generic cast path, not only by stream load. cast_from_string_to_generic() calls serde->deserialize_one_cell_from_json() for CAST(string AS VARIANT), and existing suites rely on cast('{"a":1}' as variant)['a'] exposing subpaths. With this change the same input is always inserted as a root TYPE_STRING, so query-time casts no longer parse the JSON object and subpath access like c1['a'] returns missing/null instead of 1. This is distinct from the direct Variant stream-load concern because it affects expression evaluation without the later storage-layer parse_and_materialize_variant_columns() pass. Please keep the old parse behavior for casts/generic serde callers and restrict raw-root preservation to the load path that can materialize later, and add a regression for cast('{"a":1}' as variant)['a'].
28faf36 to de9b9d8
run buildall
/review
Review conclusion: I did not find any additional distinct inline issue beyond the existing review threads.
This is not an approval recommendation because the existing inline threads already document blocking correctness/performance concerns that remain part of the review context, including cast/generic serde behavior, direct object stream-load materialization, doc-mode decimal preservation, nullable Variant load consistency, JSONB nested-array decimal precision, big-integer handling, and raw-preservation overhead.
Critical checkpoints:
- Goal/test proof: The PR aims to preserve predefined Variant decimal precision. The new tests cover several insert and stream-load paths, but existing review threads identify uncovered/incorrect paths that still need resolution.
- Scope/focus: The current GitHub PR diff is focused on Variant JSON parsing/loading and related tests.
- Concurrency/lifecycle: No new concurrency-sensitive path was identified. Parser-owned Node lifetime is bounded by JSONDataParser::parse() traversal.
- Configuration: No new config item was added. Existing session variables are used in regression coverage.
- Compatibility/protocol/storage: No FE/BE protocol change was identified. Existing threads cover behavior compatibility risks for cast/generic serde and doc-mode Variant storage.
- Parallel paths: I checked stream-load, generic serde/cast, doc-mode/non-doc-mode, arrays, and direct object vs string-staged Variant load paths; the distinct concerns I found were already present in existing threads, so I did not duplicate them.
- Conditional checks: The new raw-staging and decimal-path matching checks are localized; existing comments already cover the problematic branches.
- Test coverage/results: New regression and unit tests exist, but existing threads point out missing negative/edge coverage for several important paths. I found no additional unique missing test beyond those.
- Observability: No new observability requirement beyond normal error propagation was identified.
- Transactions/data writes: The changes affect ingest/materialization but do not introduce new transaction metadata paths. Existing data correctness concerns are covered by prior threads.
- Performance: Existing threads already cover the major hot-path allocation/copy regression risk.
User focus: no additional user-provided review focus was specified.
### What problem does this PR solve?
Issue Number: N/A (CIR-20240)
Related PR: N/A
Problem Summary: Variant predefined decimal typed paths could lose precision because JSON numeric values were materialized through binary numeric values before the later decimal cast. This change keeps raw number tokens in the simdjson-based Variant parser and, only when TabletSchema predefined Variant decimal typed paths are configured through ParseConfig, materializes matching numeric tokens as strings for the decimal cast. Untyped Variant numeric paths keep their existing numeric behavior.
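The path gating described above can be pictured roughly as follows. This is a simplified sketch, not the ParseConfig wiring in the PR: `materialize_number`, `NumberField`, and the flat set of path strings are hypothetical, and glob or array paths are left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <string_view>
#include <unordered_set>
#include <variant>

// Hypothetical field value: either a binary double or the preserved raw text.
using NumberField = std::variant<double, std::string>;

// Keep the raw token as text only when the path is a declared decimal typed
// path; every other path keeps the existing numeric (double) behavior.
NumberField materialize_number(std::string_view path, std::string_view raw_token,
                               const std::unordered_set<std::string>& decimal_paths) {
    assert(!raw_token.empty());
    if (decimal_paths.count(std::string(path)) > 0) {
        return std::string(raw_token);  // exact text for the later decimal cast
    }
    return std::strtod(std::string(raw_token).c_str(), nullptr);
}

// Sample configuration used for illustration: only "price" is a decimal path.
const std::unordered_set<std::string>& sample_decimal_paths() {
    static const std::unordered_set<std::string> paths{"price"};
    return paths;
}
```

In this model a token on `price` stays a string with all of its digits, while the same parser still produces a double for an untyped path such as `qty`.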
### Release note
Fix Variant predefined decimal typed path precision loss for high precision and scientific notation JSON numbers.
### Check List (For Author)
- Test:
- BE unit test: ./run-be-ut.sh --run --filter='JsonParserTest.*'
- Regression test: ./run-regression-test.sh --conf tmp/regression-conf.auto.groovy --run -d variant_p0/predefine -s test_variant_high_precision_decimal
- Build: BUILD_TYPE=ASAN USE_MEM_TRACKER=ON ./build.sh --be
- Format: PATH=/tmp/doris-codex-clang:/mnt/disk1/claude-max/bin:/mnt/disk1/claude-max/.local/bin:/mnt/disk1/claude-max/.npm-global/bin:/mnt/disk1/claude-max/.opencode/bin:/mnt/disk1/claude-max/.nvm/versions/node/v24.14.1/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin build-support/check-format.sh, git diff --check
- Static analysis: Not run to completion; clang-tidy was attempted but blocked by local toolchain/pre-existing header issues.
- Behavior changed: Yes. Variant predefined decimal typed paths preserve matching JSON numeric tokens for decimal materialization.
- Does this need documentation: No
de9b9d8 to 38f5b5d
run buildall
/review
### What problem does this PR solve?
Issue Number: N/A (CIR-20240)
Related PR: N/A
Problem Summary: Variant predefined decimal typed paths could lose numeric precision because matching JSON numbers were parsed through the default numeric DOM representation before decimal cast. This PR preserves raw numeric tokens only when materializing Variant columns with predefined decimal typed paths, including array and glob decimal paths, while leaving ordinary untyped Variant parsing on the existing DOM parser path.
### Release note
Fix Variant predefined decimal typed path precision loss for high precision and scientific notation JSON numbers.
### Check List (For Author)
- ./run-be-ut.sh --run --filter='JsonParserTest.*'
- ./run-regression-test.sh --conf tmp/regression-conf.auto.groovy --run -d variant_p0/predefine -s test_variant_high_precision_decimal
- BUILD_TYPE=ASAN USE_MEM_TRACKER=ON ./build.sh --be --fe
- build-support/clang-format.sh, build-support/check-format.sh
- build-support/run-clang-tidy.sh attempted, but clang-tidy was not found in the local environment.