Skip to content

branch-4.0: [fix](retention) Limit param count to 32 to avoid BE heap overflow #64521#65036

Merged
morningman merged 1 commit into
apache:branch-4.0from
924060929:cherry-pick-64521-branch-4.0
Jul 1, 2026
Merged

branch-4.0: [fix](retention) Limit param count to 32 to avoid BE heap overflow #64521#65036
morningman merged 1 commit into
apache:branch-4.0from
924060929:cherry-pick-64521-branch-4.0

Conversation

@924060929

@924060929 924060929 commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Cherry-pick from master #64521

…pache#64521)

Problem Summary:

`retention()` keeps its aggregate state in a fixed-size
`RetentionState::events[32]` array and serializes it into a single
`int64` bitmap, so it can represent at most 32 events. However neither
FE nor BE limited the number of arguments:

- `AggregateFunctionRetention::add()` iterates over all argument columns
and calls `RetentionState::set(i)` (`events[i] = 1`) without any bound
check, so with more than 32 params it writes past the end of `events[]`
(heap out-of-bounds write).
- `RetentionState::insert_result_into()` reads `events[]` for the actual
argument count, so it reads past the array as well.

With a query calling `retention()` with 102 boolean arguments, this
corrupted the heap and later crashed BE in the streaming aggregation
output path:

```
[E-3113]string column length is too large: total_length=4295003729, element_number=4064
...
doris::vectorized::ColumnStr<unsigned int>::insert_many_strings -> memcpy   (SIGSEGV)
doris::pipeline::StreamingAggLocalState::_get_results_with_serialized_key
```

This PR enforces the 32-event limit on both sides:

- **FE** (`Retention.checkLegalityBeforeTypeCoercion`): reject `> 32`
params with a clear `AnalysisException` at planning time.
- **BE** (`AggregateFunctionRetention` constructor): throw
`INVALID_ARGUMENT` when more than `RetentionState::MAX_EVENTS` argument
types are given, as a backstop for any path that reaches BE (e.g. an
older FE during a rolling upgrade, or a direct BE call).

Fix BE crash when `retention()` is called with more than 32 conditions;
it now reports a clear error instead of corrupting memory.
@924060929 924060929 requested a review from morningman as a code owner June 30, 2026 09:40
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929

Copy link
Copy Markdown
Contributor Author

run buildall

@924060929

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review summary for PR #65036

I reviewed the changed retention aggregate paths and found no blocking issues to raise as inline comments.

Critical checkpoints:

  • Retention argument limit: FE now rejects more than 32 retention conditions, matching the fixed-size BE state and int64 bitmap representation.
  • BE backstop: aggregate construction now rejects more than 32 arguments before add()/insert_result_into() can index past RetentionState::events, including nullable factory creation paths.
  • Boundary behavior: existing <=32-argument behavior is preserved; the new BE unit test covers 32 accepted and 33 rejected.
  • Regression coverage: the new SQL regression covers the default Nereids analysis path and verifies the user-facing >32-argument error.
  • AGENTS checkpoint: I checked the updated checkLegalityBeforeTypeCoercion path. The direct children count is equivalent for this arity guard, so I did not raise a functional issue.
  • Existing review context: there were no existing inline threads to duplicate.

Validation performed: exact PR file list checked through GitHub API, local base/head diff reviewed, and git diff --check passed for the changed files. I did not run the full BE or regression suites in this review runner.

Subagent conclusions: optimizer-rewrite and tests-session-config both reported no initial candidates, then both returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the same current ledger and empty proposed final comment set.

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.52% (19561/36550)
Line Coverage 36.59% (183062/500296)
Region Coverage 33.16% (142352/429286)
Branch Coverage 34.01% (61522/180899)

@morningman morningman merged commit 5a4b2c9 into apache:branch-4.0 Jul 1, 2026
29 of 33 checks passed
@924060929 924060929 deleted the cherry-pick-64521-branch-4.0 branch July 1, 2026 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants