[fix](be) Validate TDigest sizes in quantile state #63662
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: SQL users can pass base64 strings to quantile_state_from_base64(), which decodes the payload into QuantileState::deserialize(). QuantileState::is_valid() previously checked only the outer TDigest serialized length, while TDigest::unserialize() trusted nested vector count fields before resizing and copying from the input buffer. A malformed TDigest payload could therefore pass validation and drive out-of-bounds reads or oversized allocations during deserialization.
This change validates the serialized TDigest layout against the available buffer before QuantileState accepts it, including the total length and all nested vector count fields. It also tightens QuantileState length checks for explicit and TDigest payloads, and adds a BE unit test that starts from a real TDigest-backed QuantileState, corrupts one nested count after base64 roundtrip, and verifies validation and deserialization reject the payload.
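The structural check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the layout (`[uint32 total_len][uint32 n_a][n_a * 8 bytes][uint32 n_b][n_b * 8 bytes]`), the helper names `read_u32` and `skip_counted`, and the function `sketch_layout_is_valid` are all hypothetical, and host byte order is assumed for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout, NOT the real Doris TDigest format:
//   [uint32 total_len][uint32 n_a][n_a * 8 bytes][uint32 n_b][n_b * 8 bytes]
// total_len counts every byte of the payload, including itself.

// Reads a uint32 (host byte order, an assumption of this sketch) if at least
// four bytes remain, and advances *ptr on success.
static bool read_u32(const uint8_t** ptr, const uint8_t* end, uint32_t* out) {
    if (static_cast<size_t>(end - *ptr) < sizeof(uint32_t)) return false;
    std::memcpy(out, *ptr, sizeof(uint32_t));
    *ptr += sizeof(uint32_t);
    return true;
}

// Skips one counted vector after checking its count against the bytes left.
static bool skip_counted(const uint8_t** ptr, const uint8_t* end, size_t elem_size) {
    uint32_t count = 0;
    if (!read_u32(ptr, end, &count)) return false;
    const size_t remaining = static_cast<size_t>(end - *ptr);
    // Divide rather than multiply so a huge count cannot overflow size_t.
    if (count > remaining / elem_size) return false;
    *ptr += static_cast<size_t>(count) * elem_size;
    return true;
}

// Returns true only if the declared total length and both nested counts
// describe exactly the bytes in [data, data + size).
bool sketch_layout_is_valid(const uint8_t* data, size_t size) {
    assert(data != nullptr || size == 0);
    const uint8_t* ptr = data;
    const uint8_t* end = data + size;
    uint32_t total_len = 0;
    if (!read_u32(&ptr, end, &total_len)) return false;
    if (total_len != size) return false;
    if (!skip_counted(&ptr, end, 8)) return false;
    if (!skip_counted(&ptr, end, 8)) return false;
    return ptr == end;  // trailing bytes are also treated as malformed
}
```

The point of the sketch is that nothing is resized or copied until every count has been compared with the remaining buffer, so a corrupted inner count fails validation instead of driving an allocation or a read past the end.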
### Release note
Malformed TDigest quantile state payloads are now rejected during deserialization.
### Check List (For Author)
- Test:
- Unit Test: `./run-be-ut.sh --run --filter=function_quantile_state_test.rejects_tdigest_base64_with_corrupted_inner_count -j 90`
- Unit Test: `./run-be-ut.sh --run --filter=function_quantile_state_test.* -j 90`
- Static Analysis: `build-support/run-clang-tidy.sh`
- Code Style: `python3 build-support/run_clang_format.py --clang-format-executable /mnt/disk6/common/ldb_toolchain_toucan/bin/clang-format --inplace false ...`
- Behavior changed: Yes. Malformed TDigest-backed quantile state payloads are rejected instead of being accepted for deserialization.
- Does this need documentation: No
run buildall
/review
I found one remaining malformed-input path in the validation being changed.
Critical checkpoint conclusions:
- Goal/test: The PR aims to reject malformed TDigest-backed quantile state payloads before deserialization and adds a focused BE unit test for a corrupted TDigest count. The TDigest count case is covered, but the same hardened `QuantileState::is_valid()` path still accepts an invalid EXPLICIT state with zero elements.
- Scope/focus: The change is small and focused on serialization validation.
- Concurrency/lifecycle: No new concurrency, locks, static initialization, or non-obvious lifecycle ownership changes were introduced.
- Config/compatibility: No config item or serialized-format change was introduced; this only rejects malformed existing payloads.
- Parallel paths: `QuantileState::deserialize()` now calls the new TDigest validator, but explicit-state validation remains incomplete in the same function.
- Special checks/error handling: The added size checks improve bounds safety, but one validation condition is still missing.
- Tests: Added TDigest malformed-count coverage. I could not run the focused test in this runner because BE UT configuration failed before build: `thirdparty/installed/bin/protoc: No such file or directory`.
- Observability/transactions/data writes/protocol variables: Not applicable.
- Performance: The structural validation is linear in the serialized vector metadata and does not add meaningful overhead beyond deserialization.
User focus: No additional user-provided review focus was supplied.
    }
    uint16_t num_explicits = decode_fixed16_le(ptr);
    ptr += sizeof(uint16_t);
    const size_t remaining_size = static_cast<size_t>(end - ptr);
This validation still accepts a malformed EXPLICIT state with num_explicits == 0. That payload is not produced by normal QuantileState construction, but quantile_state_from_base64() can supply it directly: is_valid() returns true when the remaining size is zero, then deserialize() does _explicit_data.resize(0) and evaluates &_explicit_data[0] in the subsequent memcpy, which is undefined for an empty vector even with a zero byte count. Since this PR is hardening untrusted serialized input in QuantileState::is_valid(), please reject zero explicit counts here (or the minimum valid explicit count, if that invariant should be >= 2) before accepting the payload.
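A minimal sketch of the requested check, under stated assumptions: the payload shape (`[uint16 count][count * sizeof(double)]`), the function name `sketch_parse_explicit`, and the use of host byte order are hypothetical and do not reproduce the actual `QuantileState` code. It rejects a zero count before touching the vector and copies through `data()` rather than `&v[0]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical explicit payload: [uint16 count][count * sizeof(double)].
// Host byte order is assumed here; real code would decode little-endian
// explicitly (the snippet above uses decode_fixed16_le for that).
bool sketch_parse_explicit(const uint8_t* ptr, const uint8_t* end,
                           std::vector<double>* out) {
    assert(out != nullptr);
    if (static_cast<size_t>(end - ptr) < sizeof(uint16_t)) return false;
    uint16_t count = 0;
    std::memcpy(&count, ptr, sizeof(count));
    ptr += sizeof(count);

    // Reject the degenerate case up front. If the real invariant is a
    // larger minimum (for example >= 2), compare against that instead.
    if (count == 0) return false;

    // The payload must contain exactly `count` doubles, no more and no less.
    const size_t remaining = static_cast<size_t>(end - ptr);
    if (remaining != count * sizeof(double)) return false;

    out->resize(count);
    // data() is valid here because count > 0 was checked above.
    std::memcpy(out->data(), ptr, remaining);
    return true;
}
```

Because the zero-count payload is rejected during validation, the copy step never has to reason about an empty destination vector.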
TPC-H: Total hot run time: 31607 ms
TPC-DS: Total hot run time: 172739 ms