@github-actions

Cherry-picked from #56545

related issue #56087  

### What problem does this PR solve?
This PR addresses a critical issue where Apache Doris may encounter a
fatal `OutOfMemoryError` during the construction of execution profiles.

The root cause is unbounded memory growth in the `StringBuilder` used
by the recursive methods `RuntimeProfile.prettyPrint` and
`RuntimeProfile.printChildCounters`. When the number of counters is
extremely large or the counter hierarchy is deeply nested (sometimes
even circular), `StringBuilder` attempts to expand its internal
array beyond Java's maximum array length (bounded by
`Integer.MAX_VALUE`), leading to:

`java.lang.OutOfMemoryError: Required array length 2147483638+34 is too large`
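
One way to read the numbers in this message, assuming they are the builder's current length and the minimum extra length requested by the append, is that their sum no longer fits in a Java `int`. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and are not part of Doris or the JDK.

```java
// Hypothetical helper, for illustration only: not part of Doris or the JDK.
final class ArrayLengthOverflowSketch {

    // Returns true when currentLength + growth cannot be a valid Java array length.
    // The sum is computed as a long so that it does not silently wrap around.
    static boolean exceedsArrayLimit(int currentLength, int growth) {
        long required = (long) currentLength + growth;
        return required > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Values taken from the error message above (interpretation is an assumption).
        System.out.println(exceedsArrayLimit(2147483638, 34));
        // A small buffer growing by the same amount is unaffected.
        System.out.println(exceedsArrayLimit(100, 34));
    }
}
```

With plain `int` arithmetic the same sum would wrap to a negative number, which is why a bounded builder has to check the remaining room before appending rather than after.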

This not only crashes the current thread but can also impact the
stability of the entire Frontend process.

To resolve this, this PR introduces a safe alternative:
`SafeStringBuilder`, which allows profile strings to be constructed up
to a safe upper limit. Once the limit is reached, the content is
truncated gracefully and marked with a `[TRUNCATED]` flag. Early-exit
checks are also added to avoid unnecessary computation once truncation
occurs.

## Add SafeStringBuilder to avoid OOM in profile building

### 1. Background
During execution, Doris collects and prints execution profiles
(execution stats and counters). The printing is implemented recursively
via:

`RuntimeProfile.printChildCounters(String prefix, String counterName,
StringBuilder builder)`

This can trigger deep recursion, particularly if there are: 
- A large number of counters
- Complex or circular child counter relationships 

This unbounded recursion causes the `StringBuilder` to allocate more and
more memory, until Java refuses the allocation and throws
`OutOfMemoryError`. The error stack trace typically points to:
`java.lang.StringBuilder.append → newCapacity → hugeLength →
OutOfMemoryError`.

This PR addresses both the root cause (unsafe memory growth) and its
consequences (crash).

### 2. Key Code Changes
#### 2.1 Introduced Class: SafeStringBuilder.java
A new utility class `SafeStringBuilder` has been introduced to replace
the original `StringBuilder` in profile building code. It provides the
following features:
- Enforces a maximum capacity limit (default: `Integer.MAX_VALUE - 16`)
- Automatically truncates appended content once the limit is reached
- Appends `[TRUNCATED]` to the final output if truncation occurs
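
The three features above could be sketched roughly as follows. This is a minimal illustration and not the class added by this PR: the name `SafeStringBuilderSketch`, its constructor, and the exact truncation policy are assumptions; only `isTruncated()` and the `[TRUNCATED]` marker come from the description.

```java
// Hypothetical sketch; the real SafeStringBuilder in this PR may differ.
final class SafeStringBuilderSketch {
    private static final String TRUNCATED_MARK = "[TRUNCATED]";

    private final StringBuilder delegate = new StringBuilder();
    private final int maxLength;       // upper bound on stored characters
    private boolean truncated = false; // set once content has been cut off

    SafeStringBuilderSketch(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        this.maxLength = maxLength;
    }

    // Appends as much of the text as still fits; later appends become no-ops.
    SafeStringBuilderSketch append(String s) {
        if (truncated) {
            return this;
        }
        String text = String.valueOf(s); // null becomes "null", as in StringBuilder
        int remaining = maxLength - delegate.length();
        if (text.length() <= remaining) {
            delegate.append(text);
        } else {
            delegate.append(text, 0, remaining);
            truncated = true;
        }
        return this;
    }

    boolean isTruncated() {
        return truncated;
    }

    // The marker is added only to the rendered output, not to the stored content.
    @Override
    public String toString() {
        return truncated ? delegate + TRUNCATED_MARK : delegate.toString();
    }
}
```

For example, with a limit of 5, appending `"abc"` and then `"defgh"` stores `abcde`, sets the truncated flag, and renders as `abcde[TRUNCATED]`. Checking the remaining room before each append keeps the underlying `StringBuilder` from ever being asked to grow past the limit.
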
#### 2.2 Refactored Methods
The following key methods have been updated to use `SafeStringBuilder`
instead of `StringBuilder`:
- `Profile.getProfileByLevel()`
- `Profile.getChangedSessionVars(SafeStringBuilder builder)`
- `Profile.getExecutionProfileContent(SafeStringBuilder builder)`
- `Profile.getOnStorageProfile(SafeStringBuilder builder)`
- `RuntimeProfile.prettyPrint(SafeStringBuilder builder, String prefix)`
- `RuntimeProfile.printChildCounters(String prefix, String counterName,
SafeStringBuilder builder)`
- `SummaryProfile.prettyPrint(SafeStringBuilder builder)`
#### 2.3 Early Exit After Truncation
To avoid unnecessary computation or memory usage after truncation
occurs, additional early-exit checks have been added to all major
profile building methods. Specifically:
- Before any recursive calls (e.g., in `RuntimeProfile.prettyPrint`),
`builder.isTruncated()` is checked, and further traversal is skipped if
true.
- Each stage in profile generation (session variables, execution
profile, etc.) now checks if truncation has already occurred, and exits
early if so.
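
The early-exit pattern described above might look like the sketch below. It is illustrative only: `BoundedBuilder` and `ProfileNodeSketch` are hypothetical stand-ins for `SafeStringBuilder` and `RuntimeProfile`, and the real methods print counters and info strings that are omitted here.

```java
// Hypothetical stand-in for SafeStringBuilder, reduced to what the traversal needs.
final class BoundedBuilder {
    private final StringBuilder delegate = new StringBuilder();
    private final int maxLength;
    private boolean truncated = false;

    BoundedBuilder(int maxLength) {
        this.maxLength = maxLength;
    }

    void append(String text) {
        if (truncated) {
            return;
        }
        int remaining = maxLength - delegate.length();
        if (text.length() > remaining) {
            delegate.append(text, 0, remaining);
            truncated = true;
        } else {
            delegate.append(text);
        }
    }

    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return truncated ? delegate + "[TRUNCATED]" : delegate.toString();
    }
}

// Hypothetical stand-in for a profile tree node.
final class ProfileNodeSketch {
    private final String name;
    private final ProfileNodeSketch[] children;

    ProfileNodeSketch(String name, ProfileNodeSketch... children) {
        this.name = name;
        this.children = children;
    }

    void prettyPrint(BoundedBuilder builder, String prefix) {
        if (builder.isTruncated()) {
            return; // nothing more can be recorded, so skip this subtree
        }
        builder.append(prefix + name + "\n");
        for (ProfileNodeSketch child : children) {
            if (builder.isTruncated()) {
                return; // check before each recursive call, as described above
            }
            child.prettyPrint(builder, prefix + "  ");
        }
    }
}
```

Once the flag is set, the remaining subtrees are never visited, so a very large hierarchy stops costing traversal time as soon as the output is full.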

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [x] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Sep 30, 2025
@hello-stephen

run buildall

@yiguolei yiguolei merged commit 883d141 into branch-4.0 Sep 30, 2025
20 of 24 checks passed