Skip to content

[refactor](be) Replace std::unique_ptr with DorisUniqueBufferPtr in SingleValueDataString#62374

Merged
yiguolei merged 1 commit intoapache:masterfrom
mrhhsg:replace_unique_ptr
Apr 12, 2026
Merged

[refactor](be) Replace std::unique_ptr with DorisUniqueBufferPtr in SingleValueDataString#62374
yiguolei merged 1 commit intoapache:masterfrom
mrhhsg:replace_unique_ptr

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented Apr 11, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: SingleValueDataString uses std::unique_ptr<char[]> with new[] for large string storage, which bypasses the Doris memory tracking allocator. This replaces it with DorisUniqueBufferPtr so that allocations go through Allocator and are properly tracked.

Release note

None

Check List (For Author)

  • Test: Unit Test (added SingleValueDataStringTest)
  • Behavior changed: No
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…ingleValueDataString

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: SingleValueDataString uses std::unique_ptr<char[]> with new[] for large string
storage, which bypasses the Doris memory tracking allocator. This replaces it with
DorisUniqueBufferPtr<char> so that allocations go through Allocator and are properly tracked.

### Release note

None

### Check List (For Author)

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented Apr 11, 2026

/review

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented Apr 11, 2026

run buildall

@github-actions
Copy link
Copy Markdown
Contributor

No issues found in this review. The allocator swap in SingleValueDataString preserves the serialized layout via the added static_assert, keeps ownership semantics local to the aggregate state, and the new unit tests cover small/large string mutation, reset, and serialize/deserialize paths. Residual risk: I did not run BE unit tests in this runner, so this conclusion is based on code inspection only.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors BE aggregate min/max string state (SingleValueDataString) to allocate large-string storage via Doris’s tracked allocator (instead of new[]), and adds unit tests to validate the behavior/serialization paths.

Changes:

  • Replace std::unique_ptr<char[]> with DorisUniqueBufferPtr<char> for SingleValueDataString::large_data allocations.
  • Update reset/allocation sites to use the new buffer type.
  • Add a new GTest suite covering small/large strings, reset, comparisons, and write/read round-trips.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
be/src/exprs/aggregate/aggregate_function_min_max.h Switch large-string storage to allocator-tracked buffer + add a size static_assert.
be/test/exprs/aggregate/aggregate_function_min_max_test.cpp New unit tests exercising SingleValueDataString behavior for small/large values and serialization.
Comments suppressed due to low confidence (1)

be/src/exprs/aggregate/aggregate_function_min_max.h:364

  • Switching large_data from std::unique_ptr<char[]> to DorisUniqueBufferPtr<char> likely increases the size of the pointer member (stateful deleter), which reduces MAX_SMALL_STRING_SIZE and can cause more strings to spill to heap allocations than before. If keeping the previous small-string inline threshold matters for performance, consider an alternative that keeps large_data pointer-sized (e.g. store a raw char* and free via Allocator::free(ptr, capacity) in destructor/move/reset, using the existing capacity field) while still routing allocations through the tracked allocator.
    DorisUniqueBufferPtr<char> large_data;

public:
    static constexpr Int32 AUTOMATIC_STORAGE_SIZE = 64;
    static constexpr Int32 MAX_SMALL_STRING_SIZE =
            AUTOMATIC_STORAGE_SIZE - sizeof(size) - sizeof(capacity) - sizeof(large_data);


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 37 to 41
#include "core/column/column_array.h"
#include "core/column/column_fixed_length_object.h"
#include "core/column/column_string.h"
#include "core/custom_allocator.h"
#include "core/data_type/data_type.h"
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core/custom_allocator.h uses std::map/std::vector/std::unique_ptr but doesn’t include the corresponding STL headers itself; by adding this include here, aggregate_function_min_max.h now relies on transitive includes for those types. Add the missing standard headers (e.g. <map>) before including core/custom_allocator.h (or otherwise ensure the needed STL headers are directly included) to avoid brittle compilation depending on unrelated includes.

Copilot uses AI. Check for mistakes.
@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.04% (20147/37984)
Line Coverage 36.59% (189464/517827)
Region Coverage 32.88% (147229/447793)
Branch Coverage 33.99% (64423/189523)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.51% (27345/37198)
Line Coverage 57.22% (295397/516243)
Region Coverage 54.44% (246015/451925)
Branch Coverage 56.10% (106656/190105)

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit bcfcef3 into apache:master Apr 12, 2026
36 of 39 checks passed
@mrhhsg mrhhsg deleted the replace_unique_ptr branch April 12, 2026 08:53
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Apr 12, 2026
…ingleValueDataString (apache#62374)

Issue Number: close #xxx

Problem Summary: SingleValueDataString uses std::unique_ptr<char[]> with
new[] for large string storage, which bypasses the Doris memory tracking
allocator. This replaces it with DorisUniqueBufferPtr<char> so that
allocations go through Allocator and are properly tracked.

None

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
yiguolei pushed a commit that referenced this pull request Apr 13, 2026
…ingleValueDataString (#62374) (#62408)

Pick #62374

Problem Summary: SingleValueDataString uses std::unique_ptr<char[]> with
new[] for large string storage, which bypasses the Doris memory tracking
allocator. This replaces it with DorisUniqueBufferPtr<char> so that
allocations go through Allocator and are properly tracked.

None

- Test: Unit Test (added SingleValueDataStringTest)
- Behavior changed: No
- Does this need documentation: No

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change. - [ ] No code files have been
changed. - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.0-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants