[fix](be) Fix varbinary literal construction #64089
Merged
BiteTheDDDDt merged 1 commit on Jun 4, 2026
Contributor
Author
run buildall
Contributor
Author
/review
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: VARBINARY values in expression Fields are stored as StringView, but create_texpr_literal_node<TYPE_VARBINARY> interpreted the input pointer as std::string. When TopN pushdown builds a binary expression from a VARBINARY runtime value, the StringView layout can be read as a std::string and ASAN may report an oversized allocation while copying the literal. Read VARBINARY literal data as StringView, preserve the exact byte payload when creating TVarBinaryLiteral, and add coverage for embedded null bytes.
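The failure mode and the fix described in this summary can be sketched in isolation. This is a minimal illustration, not the Doris code: `BinaryView` is a hypothetical stand-in for `StringView` (its real layout is not reproduced here), and `make_literal_payload` is an illustrative name rather than the actual helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning view; NOT Doris's actual StringView layout.
struct BinaryView {
    const char* data;
    std::size_t size;
};

// The value arrives type-erased, so the helper must cast back to the type
// that was actually stored. Casting to the stored view type and copying
// exactly `size` bytes keeps the payload intact, including any '\0' bytes.
inline std::string make_literal_payload(const void* raw) {
    assert(raw != nullptr);
    const auto* view = static_cast<const BinaryView*>(raw);
    return std::string(view->data, view->size);
}

// Incorrect variant, shown only as a comment because it is undefined behavior:
//   const auto* s = reinterpret_cast<const std::string*>(raw);
//   std::string copy = *s;  // interprets the view's bytes as std::string
//                           // internals, so the "size" read may be garbage
```

Under these assumptions, the commented-out variant is the kind of mismatch that can make a copy request an implausibly large allocation, which is consistent with the ASAN report mentioned above.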
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST
- build-support/check-format.sh
- git diff --check
- Behavior changed: No
- Does this need documentation: No
ab0ea31 to b65a93c
Contributor
TPC-H: Total hot run time: 29117 ms
Contributor
TPC-DS: Total hot run time: 169778 ms
Contributor
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal and tests: The PR fixes VARBINARY literal construction by treating VARBINARY Field/raw values as StringView and copying the exact byte range into TVarBinaryLiteral. The added BE unit coverage exercises Field -> TExprNode -> VLiteral -> ColumnVarbinary round-trip with embedded null bytes and both inline/long StringView payloads.
- Scope: The change is small and focused on literal construction plus targeted test coverage. The unrelated style-only literal suffix edits are harmless.
- Concurrency and lifecycle: No new concurrency, locking, static initialization, or non-trivial lifecycle behavior is introduced. The StringView payload is copied into the thrift string before the source Field/StringView can go out of scope.
- Configuration and compatibility: No new config items, storage format changes, thrift schema changes, or FE/BE protocol shape changes are introduced. Existing VARBINARY_LITERAL thrift payload remains a string value.
- Parallel paths: Both create_texpr_node_from(Field, ...) and create_texpr_node_from(const void*, ...) now support TYPE_VARBINARY consistently with PrimitiveTypeTraits<TYPE_VARBINARY>::CppType and ColumnVarbinary storage.
- Error handling and memory safety: No Status is newly ignored in production code. The fix removes the prior invalid std::string reinterpretation and avoids bogus allocations by constructing std::string(data, size).
- Data correctness: Embedded null bytes are preserved by size-aware copying; no query visibility, transaction, delete bitmap, or persistence behavior is affected.
- Observability: No new observability is needed for this local literal-construction fix.
- Test results: I verified `git diff --check` passed. `build-support/check-format.sh` could not run because the runner clang-format is not version 16. `./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST` could not complete in this checkout because gensrc failed with missing `thirdparty/installed/bin/protoc`.
User focus: No additional user-provided review focus was present.
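The "Data correctness" point above, that embedded null bytes survive only with size-aware copying, can be shown with a small sketch. It uses `std::string_view` as a stand-in for the view type; the helper names are illustrative and the short and long payload sizes are arbitrary choices, not Doris's actual inline threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Size-aware copy: keeps every byte, including embedded '\0'.
inline std::string copy_exact(std::string_view bytes) {
    return std::string(bytes.data(), bytes.size());
}

// C-string copy: stops at the first '\0', so binary payloads get truncated.
inline std::string copy_c_string(const char* bytes) {
    assert(bytes != nullptr);
    return std::string(bytes);
}

// Builds an n-byte payload of 'x' with '\0' at every odd index.
inline std::string make_payload(std::size_t n) {
    std::string payload(n, 'x');
    for (std::size_t i = 1; i < n; i += 2) {
        payload[i] = '\0';
    }
    return payload;
}

// True when a view over the payload copies back to identical bytes.
inline bool round_trips(const std::string& payload) {
    std::string_view view(payload.data(), payload.size());
    return copy_exact(view) == payload;
}
```

Calling `round_trips` on both a short and a long payload mirrors the inline and long cases the review mentions, while `copy_c_string` shows what a length-unaware copy would lose.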
Contributor
BE Regression && UT Coverage Report
Increment line coverage
Increment coverage report
Contributor
skip buildall
hello-stephen approved these changes on Jun 4, 2026
Contributor
PR approved by at least one committer and no changes requested.
Contributor
PR approved by anyone and no changes requested.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
`create_texpr_literal_node<TYPE_VARBINARY>` treated the input pointer as `std::string*`, but Doris `Field` stores `TYPE_VARBINARY` values as `StringView`. When TopN predicate conversion builds a VARBINARY literal from a `Field`, the helper reinterprets a `StringView*` as a `std::string*`, which can make `std::string` assignment read a bogus size and request a huge allocation under ASAN.

This PR reads VARBINARY literal input as `StringView`, copies the exact byte range into the thrift literal, and adds VARBINARY coverage for `create_texpr_node_from(Field, TYPE_VARBINARY, ...)` and `VLiteral` round trip. It also wires the `const void*` helper for `TYPE_VARBINARY`.
### Release note
None
### Check List (For Author)
- `./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST`
- `build-support/check-format.sh`
- `git diff --check upstream/master...HEAD`
- `build-support/run-clang-tidy.sh --base upstream/master --build-dir be/ut_build_ASAN` selected only the 3 changed files with line-level filtering, but exited non-zero because existing translation-unit analysis errors are emitted from `vcompound_pred.h`, `vexpr.cpp`, `jni-util.h`, and pre-existing `vexpr_test.cpp` checks. No new changed-line clang-tidy diagnostics remain.
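The wiring of the `const void*` helper for `TYPE_VARBINARY` mentioned in the problem summary can be pictured as a type-tag dispatch. The sketch below is an assumption-laden illustration: `LiteralType`, `literal_bytes_from`, and the use of `std::string_view` in place of `StringView` are all hypothetical and do not reflect the real Doris signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical type tags, for illustration only.
enum class LiteralType { Int32, VarBinary };

// Each tag dictates which concrete type sits behind the type-erased pointer.
// A missing or mismatched case is where a wrong reinterpretation would creep in.
inline std::string literal_bytes_from(const void* raw, LiteralType type) {
    assert(raw != nullptr);
    switch (type) {
    case LiteralType::Int32:
        return std::to_string(*static_cast<const std::int32_t*>(raw));
    case LiteralType::VarBinary: {
        // The stored value is a view, so copy by explicit length.
        const auto* view = static_cast<const std::string_view*>(raw);
        return std::string(view->data(), view->size());
    }
    }
    throw std::invalid_argument("unsupported literal type");
}
```

Keeping the cast next to the tag that justifies it makes the pairing between tag and C++ type easy to audit in one place.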