[fix](variant) return raw string for element_at on scalar-string variant#64103
Open
csun5285 wants to merge 1 commit into
Open
[fix](variant) return raw string for element_at on scalar-string variant#64103csun5285 wants to merge 1 commit into
csun5285 wants to merge 1 commit into
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
c9c907e to
f5b09c1
Compare
Contributor
Author
|
run buildall |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
TPC-H: Total hot run time: 28634 ms |
Contributor
TPC-DS: Total hot run time: 168660 ms |
When extracting a string property from a scalar-string variant (the shape produced by `cast(text as variant)`), `element_at` goes through the simdjson document path and stored `simdjson::to_json_string(value)` for the extracted value. For a JSON string that representation keeps the surrounding double quotes (e.g. `"2026-05-20 18:40:02"`), which leaked into the result and made scalar-string variants inconsistent with the structured-subcolumn path, which returns the string unquoted. This also broke downstream string ops, e.g. `substring(v['k'], 1, 10)` consumed the leading quote. Add a dedicated `string` branch in `_write_data_to_column` that extracts the raw, unescaped value via `value.get_string()`. number/array/object values keep their JSON-text representation through `to_json_string` as before. Add BE unit test `extract_string_from_scalar_root` and regression assertions covering string/substring/escaped/number/array/object extraction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f5b09c1 to
feaa5f6
Compare
Contributor
Author
|
run buildall |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
TPC-H: Total hot run time: 29557 ms |
Contributor
TPC-DS: Total hot run time: 169761 ms |
Contributor
Author
|
/review |
Contributor
There was a problem hiding this comment.
Code review summary: no blocking issues found.
Critical checkpoint conclusions:
- Goal/test proof: The PR fixes scalar-string-root variant extraction so JSON string values returned by
element_atare raw/unescaped instead of JSON-quoted. The changed code accomplishes this by handlingsimdjson::ondemand::json_type::stringseparately, and coverage includes BE unit tests plus regression assertions and the updated SQL expected output. - Scope/focus: The implementation is small and focused; non-string scalar/object/array behavior remains on the existing
to_json_stringpath. - Concurrency/lifecycle: No new shared state, locks, threads, static initialization, or lifecycle-sensitive ownership was introduced.
- Configuration/compatibility: No new configuration, storage format, or FE-BE protocol compatibility concerns.
- Parallel paths: The change targets the scalar-string-root simdjson document path. Structured subcolumn extraction is intentionally unchanged and now matches the scalar-string behavior for strings.
- Conditional checks/error handling: The new type branch is specific to confirmed simdjson string values; existing parse/extract error behavior is preserved.
- Test coverage/results: Added tests cover string, substring downstream use, escaped string content, number, array, and object cases. Existing CI reports compile, formatter, BE UT, P0 regression, and related checks passing.
- Observability/transactions/data writes: No new observability need and no transaction, persistence, or data-write path changes.
- Performance/memory: The change avoids the extra JSON-token representation for strings and copies directly into
ColumnString; no significant new allocation or hot-path regression identified.
User focus: No additional user-provided review focus was supplied.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)