[fix](be) Compare JSON numeric values by value#63396
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
1 similar comment
|
/review |
|
run buildall |
TPC-H: Total hot run time: 30868 ms |
TPC-DS: Total hot run time: 169460 ms |
| decimal.value / scale_multiplier == integer_value; | ||
| } | ||
|
|
||
| inline wide::Int256 power_of_five(uint32_t exponent) { |
| const auto divisor = wide::Int256(1) << divisor_exponent; | ||
| return significand % divisor == 0 && value == significand / divisor; | ||
| } | ||
| if (exponent >= 255) { |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
### What problem does this PR solve? Issue Number: None Related PR: apache#63396 Problem Summary: Optimize the JSON numeric comparison helper by replacing per-call power-of-five multiplication with a precomputed table, document the Int256 shift boundary, and refresh the Presto JSON function expected results affected by numeric equality semantics. ### Release note None ### Check List (For Author) - Test: Compile check / Format check - build-support/clang-format.sh - git diff --check - ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/core/jsonb/jsonb_document_test.cpp.o src/util/CMakeFiles/Util.dir/jsonb_document.cpp.o - Behavior changed: No - Does this need documentation: No
Issue Number: None
Related PR: None
Problem Summary: json_contains compared JSONB numeric values by their stored binary categories. As a result, semantically equal numbers such as integer 1 and floating-point 1.0, or decimals with different scales, were not treated as contained. This changes JsonbValue::contains to compare integer, floating-point, and decimal JSONB values by numeric value, while keeping exact integer and decimal comparisons to avoid precision-related false matches.
Fix json_contains to match semantically equal JSON numeric values across numeric storage categories.
- Test:
- Build: ./build.sh --be --fe
- Unit Test: ./run-be-ut.sh --run --filter=JsonbDocumentTest.contains_numeric_equality
- Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-29000.groovy --run -d query_p0/sql_functions/json_functions -s test_json_contains_numeric_equality -forceGenOut
- Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-29000.groovy --run -d query_p0/sql_functions/json_functions -s test_json_contains_numeric_equality
- Behavior changed: Yes (json_contains now treats semantically equal JSON numeric values across integer, floating-point, and decimal JSONB representations as contained)
- Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#63396 Problem Summary: Optimize the JSON numeric comparison helper by replacing per-call power-of-five multiplication with a precomputed table, document the Int256 shift boundary, and refresh the Presto JSON function expected results affected by numeric equality semantics. ### Release note None ### Check List (For Author) - Test: Compile check / Format check - build-support/clang-format.sh - git diff --check - ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/core/jsonb/jsonb_document_test.cpp.o src/util/CMakeFiles/Util.dir/jsonb_document.cpp.o - Behavior changed: No - Does this need documentation: No
378b5b6 to
c9db681
Compare
|
/review |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoint conclusions:
- Goal/test: The PR implements value-based JSON numeric equality for JSONB contains and adds BE/regression coverage for common int/double/decimal cases. The main goal is covered for normal valid inputs.
- Scope: The change is focused and localized to JSONB numeric comparison plus tests.
- Concurrency/lifecycle: No new concurrency, locking, background thread, or non-trivial lifecycle management is introduced.
- Configuration/compatibility: No new config or storage-format change. Existing serialized JSONB decimals are reinterpreted with new comparison semantics; invalid scale metadata now matters in a new path.
- Parallel paths: Array/object contains delegate to the same JsonbValue::contains path, so the behavior is applied consistently.
- Error handling/memory safety: One issue found: release builds can read past the powers-of-five lookup table for a decimal scale outside MAX_DECIMALV3_SCALE.
- Tests: Tests cover ordinary valid numeric equality but do not cover boundary/malformed decimal scale handling.
- Observability/performance: No new observability requirement. Existing review threads already cover the decimal multiplier performance and shift-limit comment concerns, so I did not repeat them.
User focus: No additional user-provided review focus was present.
### What problem does this PR solve? Issue Number: None Related PR: apache#63396 Problem Summary: Validate JSONB decimal scale metadata before using it in numeric contains comparison so malformed or incompatible serialized decimals cannot index past the powers-of-five lookup table in release builds. Add BE unit coverage for invalid decimal scale metadata. ### Release note None ### Check List (For Author) - Test: Unit Test / Format check - ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/core/jsonb/jsonb_document_test.cpp.o - build-support/clang-format.sh - build-support/check-format.sh - git diff --check - Behavior changed: No - Does this need documentation: No
|
/review |
|
run buildall |
There was a problem hiding this comment.
Automated review summary: I found no additional blocking issues beyond the existing inline review context. The prior concerns about power-of-five lookup performance, the Int256 shift-boundary explanation, and release-path decimal scale validation appear addressed in the current head.
Critical checkpoint conclusions:
- Goal and tests: The change implements value-based numeric comparison for
json_contains/JsonbValue::containsacross integer, floating-point, and decimal JSONB storage categories. BE unit coverage and a regression case were added, and affected Presto expected outputs were refreshed. - Scope: The implementation is focused on JSONB numeric comparison helpers plus related tests/results.
- Concurrency: No shared mutable state or runtime concurrency path was introduced; the inline lookup table is immutable.
- Lifecycle/static initialization: No non-trivial cross-TU lifecycle dependency was found; the added table is an inline constexpr value in the header.
- Configuration/compatibility/persistence/data writes: No config, protocol, storage-format, EditLog, transaction, or data-write path changes were introduced.
- Parallel paths: The reviewed
json_containspath reachesJsonbValue::contains; recursive array/object containment continues to reuse that path. - Error handling and invariants: Decimal scale is now validated on the release path before table indexing. Numeric boundary checks avoid the reviewed out-of-range casts/shifts.
- Test coverage/results: Tests cover core numeric category equality and invalid decimal scale metadata. I did not rerun the build/tests in this review; residual risk is mainly additional extreme numeric boundary combinations not covered by the added cases.
- Observability/performance: No new observability appears necessary for this local comparison helper. The precomputed table addresses the previously raised obvious performance concern.
- User focus: No additional user-provided review focus was specified.
TPC-H: Total hot run time: 31325 ms |
TPC-DS: Total hot run time: 169323 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: json_contains compared JSONB numeric values by their stored binary categories. As a result, semantically equal numbers such as integer 1 and floating-point 1.0, or decimals with different scales, were not treated as contained. This changes JsonbValue::contains to compare integer, floating-point, and decimal JSONB values by numeric value, while keeping exact integer and decimal comparisons to avoid precision-related false matches.
Release note
Fix json_contains to match semantically equal JSON numeric values across numeric storage categories.
Check List (For Author)