branch-4.0: [fix](retention) Limit param count to 32 to avoid BE heap overflow #64521#65036
Conversation
…pache#64521) Problem Summary: `retention()` keeps its aggregate state in a fixed-size `RetentionState::events[32]` array and serializes it into a single `int64` bitmap, so it can represent at most 32 events. However neither FE nor BE limited the number of arguments: - `AggregateFunctionRetention::add()` iterates over all argument columns and calls `RetentionState::set(i)` (`events[i] = 1`) without any bound check, so with more than 32 params it writes past the end of `events[]` (heap out-of-bounds write). - `RetentionState::insert_result_into()` reads `events[]` for the actual argument count, so it reads past the array as well. With a query calling `retention()` with 102 boolean arguments, this corrupted the heap and later crashed BE in the streaming aggregation output path: ``` [E-3113]string column length is too large: total_length=4295003729, element_number=4064 ... doris::vectorized::ColumnStr<unsigned int>::insert_many_strings -> memcpy (SIGSEGV) doris::pipeline::StreamingAggLocalState::_get_results_with_serialized_key ``` This PR enforces the 32-event limit on both sides: - **FE** (`Retention.checkLegalityBeforeTypeCoercion`): reject `> 32` params with a clear `AnalysisException` at planning time. - **BE** (`AggregateFunctionRetention` constructor): throw `INVALID_ARGUMENT` when more than `RetentionState::MAX_EVENTS` argument types are given, as a backstop for any path that reaches BE (e.g. an older FE during a rolling upgrade, or a direct BE call). Fix BE crash when `retention()` is called with more than 32 conditions; it now reports a clear error instead of corrupting memory.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Automated review summary for PR #65036
I reviewed the changed retention aggregate paths and found no blocking issues to raise as inline comments.
Critical checkpoints:
- Retention argument limit: FE now rejects more than 32 retention conditions, matching the fixed-size BE state and int64 bitmap representation.
- BE backstop: aggregate construction now rejects more than 32 arguments before add()/insert_result_into() can index past RetentionState::events, including nullable factory creation paths.
- Boundary behavior: existing <=32-argument behavior is preserved; the new BE unit test covers 32 accepted and 33 rejected.
- Regression coverage: the new SQL regression covers the default Nereids analysis path and verifies the user-facing >32-argument error.
- AGENTS checkpoint: I checked the updated checkLegalityBeforeTypeCoercion path. The direct children count is equivalent for this arity guard, so I did not raise a functional issue.
- Existing review context: there were no existing inline threads to duplicate.
Validation performed: exact PR file list checked through GitHub API, local base/head diff reviewed, and git diff --check passed for the changed files. I did not run the full BE or regression suites in this review runner.
Subagent conclusions: optimizer-rewrite and tests-session-config both reported no initial candidates, then both returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the same current ledger and empty proposed final comment set.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Cherry-pick from master #64521