[refactor](be) Replace std::unique_ptr with DorisUniqueBufferPtr in SingleValueDataString #62374
…ingleValueDataString

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: `SingleValueDataString` uses `std::unique_ptr<char[]>` with `new[]` for large string storage, which bypasses the Doris memory tracking allocator. This replaces it with `DorisUniqueBufferPtr<char>` so that allocations go through `Allocator` and are properly tracked.

### Release note

None

### Check List (For Author)

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No
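The idea in the problem summary can be illustrated with a minimal sketch. It assumes a hypothetical `TrackingAllocator` and `TrackedBuffer` (neither is the Doris `Allocator` nor `DorisUniqueBufferPtr`, whose real interfaces are not shown in this PR page) and only demonstrates why a buffer owned through an allocator-aware deleter shows up in memory accounting while a plain `new[]` does not.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical stand-in for a memory-tracking allocator (assumption, not Doris code).
struct TrackingAllocator {
    static inline std::atomic<std::size_t> tracked_bytes{0};

    static void* alloc(std::size_t n) {
        void* p = std::malloc(n);
        if (p == nullptr) throw std::bad_alloc();
        tracked_bytes += n;  // every allocation is accounted for
        return p;
    }

    static void free(void* p, std::size_t n) noexcept {
        std::free(p);
        tracked_bytes -= n;  // and released from the accounting on free
    }
};

// Stateful deleter: remembers the byte count so the tracker can be updated.
struct TrackedDeleter {
    std::size_t bytes = 0;
    void operator()(char* p) const noexcept {
        if (p != nullptr) TrackingAllocator::free(p, bytes);
    }
};

// Hypothetical analogue of an allocator-backed unique buffer pointer.
using TrackedBuffer = std::unique_ptr<char, TrackedDeleter>;

inline TrackedBuffer make_tracked_buffer(std::size_t n) {
    return TrackedBuffer(static_cast<char*>(TrackingAllocator::alloc(n)),
                         TrackedDeleter{n});
}
```

A buffer created with `make_tracked_buffer(128)` raises `tracked_bytes` by 128 until it goes out of scope, whereas `std::unique_ptr<char[]>(new char[128])` leaves the counter untouched, which is the gap this PR describes.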
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

/review

run buildall

No issues found in this review. The allocator swap in `SingleValueDataString` preserves the serialized layout via the added `static_assert`, keeps ownership semantics local to the aggregate state, and the new unit tests cover small/large string mutation, reset, and serialize/deserialize paths. Residual risk: I did not run BE unit tests in this runner, so this conclusion is based on code inspection only.
Pull request overview
Refactors BE aggregate min/max string state (SingleValueDataString) to allocate large-string storage via Doris’s tracked allocator (instead of new[]), and adds unit tests to validate the behavior/serialization paths.
Changes:
- Replace `std::unique_ptr<char[]>` with `DorisUniqueBufferPtr<char>` for `SingleValueDataString::large_data` allocations.
- Update reset/allocation sites to use the new buffer type.
- Add a new GTest suite covering small/large strings, reset, comparisons, and write/read round-trips.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/exprs/aggregate/aggregate_function_min_max.h | Switch large-string storage to allocator-tracked buffer + add a size static_assert. |
| be/test/exprs/aggregate/aggregate_function_min_max_test.cpp | New unit tests exercising SingleValueDataString behavior for small/large values and serialization. |
Comments suppressed due to low confidence (1)
be/src/exprs/aggregate/aggregate_function_min_max.h:364
- Switching `large_data` from `std::unique_ptr<char[]>` to `DorisUniqueBufferPtr<char>` likely increases the size of the pointer member (stateful deleter), which reduces `MAX_SMALL_STRING_SIZE` and can cause more strings to spill to heap allocations than before. If keeping the previous small-string inline threshold matters for performance, consider an alternative that keeps `large_data` pointer-sized (e.g. store a raw `char*` and free via `Allocator::free(ptr, capacity)` in destructor/move/reset, using the existing `capacity` field) while still routing allocations through the tracked allocator.
DorisUniqueBufferPtr<char> large_data;
public:
static constexpr Int32 AUTOMATIC_STORAGE_SIZE = 64;
static constexpr Int32 MAX_SMALL_STRING_SIZE =
AUTOMATIC_STORAGE_SIZE - sizeof(size) - sizeof(capacity) - sizeof(large_data);
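The size concern in the suppressed comment can be checked with a small sketch. It assumes that `size` and `capacity` are 32-bit fields and uses a hypothetical `SizedDeleter` in place of the real `DorisUniqueBufferPtr` deleter, so the numbers are illustrative only and do not describe the actual Doris layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical stateful deleter that carries the allocation size, as an
// allocator-aware deleter might. Not the actual DorisUniqueBufferPtr deleter.
struct SizedDeleter {
    std::size_t bytes = 0;
    void operator()(char* p) const noexcept { std::free(p); }
};

// Inline capacity left in a 64-byte state after two assumed 32-bit fields
// (size, capacity) and the pointer member of type Ptr.
template <typename Ptr>
constexpr std::int32_t max_small_string_size() {
    constexpr std::int32_t automatic_storage_size = 64;
    return automatic_storage_size -
           static_cast<std::int32_t>(sizeof(std::int32_t) * 2 + sizeof(Ptr));
}

using StatelessPtr = std::unique_ptr<char[]>;             // empty default deleter
using StatefulPtr = std::unique_ptr<char, SizedDeleter>;  // pointer + size

// A unique_ptr with a non-empty deleter must be larger than a raw pointer.
static_assert(sizeof(StatefulPtr) > sizeof(char*),
              "stateful deleter enlarges the pointer member");

// On mainstream standard libraries the stateless unique_ptr is pointer-sized,
// so the stateful variant leaves fewer inline bytes for small strings.
static_assert(max_small_string_size<StatefulPtr>() <
                      max_small_string_size<StatelessPtr>(),
              "inline small-string threshold shrinks");
```

Under these assumptions, a typical 64-bit platform would leave 48 inline bytes with the stateless pointer and 40 with the stateful one. A `static_assert` of this kind is also one way to make such a layout change explicit at compile time.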
#include "core/column/column_array.h"
#include "core/column/column_fixed_length_object.h"
#include "core/column/column_string.h"
#include "core/custom_allocator.h"
#include "core/data_type/data_type.h"
`core/custom_allocator.h` uses `std::map`/`std::vector`/`std::unique_ptr` but doesn’t include the corresponding STL headers itself; by adding this include here, `aggregate_function_min_max.h` now relies on transitive includes for those types. Add the missing standard headers (e.g. `<map>`) before including `core/custom_allocator.h` (or otherwise ensure the needed STL headers are directly included) to avoid brittle compilation depending on unrelated includes.
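As a generic illustration of the include hygiene this comment asks for, the sketch below shows a header-style unit that directly includes every standard header it uses. `BufferRegistry` is a hypothetical class invented for the example and has no counterpart in `core/custom_allocator.h`.

```cpp
// Hypothetical header-style sketch: each standard type used below has its
// own direct include, so compilation does not depend on transitive includes.
#include <cstddef>  // std::size_t
#include <map>      // std::map
#include <memory>   // std::unique_ptr, std::make_unique
#include <string>   // std::string
#include <vector>   // std::vector

class BufferRegistry {
public:
    // Allocate a buffer of `bytes` bytes and record it under `owner`.
    void add(const std::string& owner, std::size_t bytes) {
        _buffers[owner].push_back(std::make_unique<char[]>(bytes));
    }

    // Number of buffers recorded for `owner` (0 if the owner is unknown).
    std::size_t count(const std::string& owner) const {
        auto it = _buffers.find(owner);
        return it == _buffers.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<std::unique_ptr<char[]>>> _buffers;
};
```

A header written this way keeps compiling when an unrelated include is removed elsewhere, which is the brittleness the reviewer is pointing at.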
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
…ingleValueDataString (apache#62374)

Issue Number: close #xxx

Problem Summary: `SingleValueDataString` uses `std::unique_ptr<char[]>` with `new[]` for large string storage, which bypasses the Doris memory tracking allocator. This replaces it with `DorisUniqueBufferPtr<char>` so that allocations go through `Allocator` and are properly tracked.

Release note: None

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No
…ingleValueDataString (#62374) (#62408)

Pick #62374

Problem Summary: `SingleValueDataString` uses `std::unique_ptr<char[]>` with `new[]` for large string storage, which bypasses the Doris memory tracking allocator. This replaces it with `DorisUniqueBufferPtr<char>` so that allocations go through `Allocator` and are properly tracked.

Release note: None

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No