[opt](function) optimize repeat and trim string functions by Mryange · Pull Request #63784 · apache/doris

Mryange · 2026-05-28T04:09:48Z

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

This PR optimizes two hot BE string function families that spend significant time on repeated buffer growth and high-overhead character lookup.

repeat
- Replace the old per-row append loop with a shared two-pass execution path for both vector repeat counts and constant repeat counts.
- Precompute res_offsets and total output size before writing results.
- Write directly into ColumnString::Chars with StringOP::fast_repeat().
trim_in / ltrim_in / rtrim_in
- Reuse SIMD-assisted ASCII symbol search for small trim sets.
- Add reverse symbol search helpers for right trim.
- Use a fixed-size UTF-8 small-set lookup path for common cases while preserving fallback paths for larger trim strings.
Coverage and observability
- Add a larger repeat('a', 256) correctness case.
- Add find_symbols unit coverage for empty ranges, embedded NUL bytes, boundary lengths, and cross-check cases.
- Add old/new microbenchmarks for both repeat and trim paths.

Benchmark results

Local Release benchmark results on this machine:

repeat

Case	Old	New	Speedup
RepeatVector 4096/16/8	86184 ns	42878 ns	2.01x
RepeatVector 4096/16/64	411330 ns	146498 ns	2.81x
RepeatVector 1024/128/16	84985 ns	39620 ns	2.15x
RepeatVector 4096/0/64	222362 ns	7382 ns	30.12x
RepeatConst 4096/16/8	127848 ns	59495 ns	2.15x
RepeatConst 4096/16/64	764912 ns	205294 ns	3.73x
RepeatConst 1024/128/16	158499 ns	78090 ns	2.03x
RepeatConst 4096/0/64	395137 ns	7107 ns	55.60x

trim

Case	Old	New	Speedup
ASCII, 4 trim chars, 65536 rows	2398.526 us	997.839 us	2.40x
ASCII, 8 trim chars, 65536 rows	2452.269 us	872.865 us	2.81x
ASCII, no match, 65536 rows	318.894 us	309.079 us	1.03x
UTF-8, 2 trim chars, 65536 rows	7832.364 us	6799.569 us	1.15x

Note:
The medium and large trim cases reproduced the expected improvement direction on this machine. One small-input case (BM_TrimInAscii/1024) was slightly slower at 0.81x, while the 4096-row and 65536-row cases remained clearly faster.

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-05-28T04:09:53Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Mryange · 2026-05-28T04:14:34Z

/review

github-actions

I found two blocking correctness issues in the BE string-function changes.

Critical checkpoint conclusions:

Goal/test proof: The PR targets repeat and trim_in performance. The added tests cover normal repeat and symbol-search cases, but they do not cover repeat output-size overflow or the scalar all-trim tail in the new runtime reverse search under sanitizers.
Scope/focus: The effective GitHub PR diff is focused on BE string functions and benchmarks.
Concurrency: No new shared mutable state or locking path is introduced.
Lifecycle/static initialization: No new cross-TU static initialization dependency was found in the changed code.
Config/compatibility/protocol/storage: No new config, FE-BE protocol, storage-format, or rolling-upgrade compatibility concern was found.
Parallel paths: repeat vector/const paths were unified; trim left/right ASCII and UTF-8 paths were reviewed. The new reverse runtime symbol path has a distinct scalar-tail bug.
Error handling/memory safety: repeat now computes output sizes in uint32_t before checking, which can wrap and lead to under-allocation followed by out-of-bounds writes. The new SSE2 runtime reverse search can decrement before begin on all-trim scalar tails.
Test coverage: Existing tests are insufficient for these edge cases; add overflow/large-size checks for repeat and sanitizer-safe all-trim scalar-tail coverage for runtime SearchSymbols.
Observability/performance: The intended performance direction is reasonable, but correctness must be fixed before merging.
User focus: No additional user-provided review focus was specified.

github-actions · 2026-05-28T04:21:08Z

+            const auto str_ref = source_column.get_data_at(i);
+            const uint32_t size = str_ref.size;
+            total_size += size * repeat;
+            res_offsets[i] = total_size;


This pre-size pass can overflow before ColumnString::check_chars_length() sees the real output size. Both total_size and size are uint32_t, and size * repeat is evaluated in 32-bit unsigned arithmetic, so a large but valid input such as one row with size=65536 and repeat=65536 wraps to zero. The later res_data.resize(total_size) then allocates too little and the second pass writes the full repeated payload through fast_repeat, causing memory corruption instead of raising STRING_OVERFLOW_IN_VEC_ENGINE. Please compute in size_t/checked arithmetic and call check_chars_length() before narrowing into ColumnString::Offset.

github-actions · 2026-05-28T04:21:08Z

+
+    --pos;
+    for (; pos >= begin; --pos)
+        if (maybe_negate<positive>(is_in(*pos, symbols, num_chars))) return pos;


The scalar tail has undefined behavior when every remaining byte matches the trim set. For example, find_last_not_symbols_or_null(s.data(), s.data() + 4, SearchSymbols(" ")) on four spaces reaches pos == begin, the loop body does not return, and the --pos in the loop expression decrements before the start of the array; the next pos >= begin comparison is also outside the valid range. This is the runtime SSE2 path used by the new ASCII rtrim optimization. Please use the same for (const char* p = pos; p != begin;) { --p; ... } pattern used in the SSE4.2 tail or otherwise avoid forming begin - 1.

Mryange · 2026-05-28T04:32:52Z

/review

github-actions

Review completed for PR 63784. I found one blocking correctness issue in the new runtime reverse-symbol SSE2 helper.

Critical checkpoints:

Goal/test coverage: the PR targets BE string-function performance for repeat and trim_in/ltrim_in/rtrim_in; it adds BE unit coverage and benchmarks, and the updated tests now cover the previously reported repeat overflow and scalar-tail cases.
Scope/focus: the BE changes are focused on the stated string-function hot paths plus related tests/benchmarks.
Concurrency/lifecycle/config/compatibility/persistence/data writes: not materially involved.
Parallel paths: runtime SearchSymbols reverse search is added alongside existing template helpers; the new runtime SSE2 path still needs the bounds-safe loop condition used by the SSE4.2 path.
Memory/data correctness: one undefined-behavior risk remains in pointer arithmetic before bounds are checked. Existing review threads already covered the scalar-tail UB and repeat overflow; I did not duplicate those.
Observability: no new observability needed for this local function optimization.
User focus: no additional user-provided review focus was supplied.

github-actions · 2026-05-28T04:38:06Z

+#if defined(__SSE2__)
+    const auto needles = mm_is_in_prepare(symbols, num_chars);
+    for (; pos - 16 >= begin; pos -= 16) {
+        __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos - 16));


This loop forms pos - 16 before proving that there are 16 bytes left in the range. For any runtime SearchSymbols call on a string shorter than 16 bytes (for example trim_in('abc', ' ') through the new ASCII rtrim path), pos is end and pos - 16 is outside the array before the comparison is evaluated, which is undefined behavior even if the loop body is skipped. This is distinct from the scalar-tail issue already raised: the UB happens in the SIMD loop condition before reaching the tail. Please use the same safe form as the SSE4.2 helper, e.g. static_cast<size_t>(pos - begin) >= 16, before computing pos - 16.

Suggested change

__m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos - 16));

for (; static_cast<size_t>(pos - begin) >= 16; pos -= 16) {

upd

f9fb404

github-actions Bot requested changes May 28, 2026

View reviewed changes

upd

eec872f

github-actions Bot requested changes May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[opt](function) optimize repeat and trim string functions#63784

[opt](function) optimize repeat and trim string functions#63784
Mryange wants to merge 2 commits into
apache:masterfrom
Mryange:opt-string-func-5.28

Mryange commented May 28, 2026

Uh oh!

hello-stephen commented May 28, 2026

Uh oh!

Mryange commented May 28, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 28, 2026

Uh oh!

github-actions Bot May 28, 2026

Uh oh!

Mryange commented May 28, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	__m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos - 16));
	for (; static_cast<size_t>(pos - begin) >= 16; pos -= 16) {

Conversation

Mryange commented May 28, 2026

What problem does this PR solve?

Benchmark results

repeat

trim

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented May 28, 2026

Uh oh!

Mryange commented May 28, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Mryange commented May 28, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants