[opt](function) speed up multi_search_all_positions#64012
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Review result: request changes. The PR goal is clear and the implementation is focused, but the new MultiStringSearcher forms out-of-range pointers on valid inputs before its bounds checks reject them. This is a correctness/memory-safety issue and should be fixed before merge.
Critical checkpoint conclusions:
- Goal/test proof: The optimization targets constant-needle multi_search_all_positions and existing tests cover basic behavior, but the new long-needle/short-haystack bounds cases are not covered.
- Scope: The change is small and focused.
- Concurrency: No new shared state or concurrent lifecycle is introduced.
- Lifecycle/static initialization: No special lifecycle or cross-TU static initialization issue found.
- Configuration/compatibility/protocol: No config, storage format, or FE-BE protocol compatibility change found.
- Parallel paths: Dynamic-needle path is intentionally unchanged; constant-needle path is the modified path.
- Special checks: The new batching and bounds checks need adjustment to avoid pointer UB.
- Test coverage: Please add regression or unit coverage for long constant needles with haystacks shorter than the needle / candidate near row end.
- Observability: No additional observability appears necessary for this function-level optimization.
- Transactions/persistence/data writes: Not applicable.
- Performance: The algorithmic direction is reasonable; no separate blocking performance issue found.
User focus: No additional user-provided review focus was supplied.
| return; | ||
| } | ||
|
|
||
| for (const uint8_t* pos = haystack + _step - sizeof(Ngram); |
There was a problem hiding this comment.
This can create an out-of-range pointer before the loop condition has a chance to fail. For example, with a 254-byte non-fallback needle _step becomes 253; a 10-byte haystack passes the 2-byte check above, then haystack + _step - sizeof(Ngram) forms haystack + 251, which is undefined behavior even if the loop never runs. Please either guard haystack_end - haystack < _step before this expression or rewrite the scan using integer offsets. The same issue exists a few lines below at pos < haystack + _hash[cell].off - 1; compare offsets like (pos - haystack) < _hash[cell].off - 1 instead of forming a potentially out-of-range pointer.
| const auto& needle = _needles[needle_index]; | ||
| // The hash table only proves that one 2-byte ngram matched. Full memcmp is | ||
| // required to discard false positives from duplicate ngrams and collisions. | ||
| if (match + needle.size <= haystack_end && |
There was a problem hiding this comment.
This bounds check still performs match + needle.size before verifying that the result is within the haystack object. For a long needle and a candidate near the end of a short row, that pointer arithmetic can go far past one-past-end and is undefined behavior. Please use a subtraction-style check such as haystack_end - match >= needle.size before memcmp, and add coverage for long constant needles against shorter haystacks / near-end candidates.
|
/review |
There was a problem hiding this comment.
Review completed for PR 64012 at head 4aecb95d4e95909e2986907cab05d202ffbfdb25.
I did not find additional blocking issues beyond the two existing inline review threads in .opencode-review.M6EuG5/pr_review_threads.md. Those threads covered out-of-range pointer arithmetic in MultiStringSearcher; the later head diff changes the loop and final bounds check to offset/subtraction-style logic, so I did not re-submit the same issues.
Critical checkpoint conclusions:
- Goal and proof: The PR optimizes constant-needle
multi_search_all_positionsby batching many needles through a new ngram-basedMultiStringSearcher; the benchmark cross-checks old and new results, and existing regression coverage for the function remains relevant, but no new regression/unit case was added for the new edge cases. - Scope: The production change is focused on the constant-needle path plus a reusable searcher and benchmark.
- Concurrency: No shared mutable state across threads is introduced;
MultiStringSearcheris per execution and not used concurrently. - Lifecycle/static initialization: No non-trivial static/global lifecycle dependency was introduced.
- Configuration: No new configuration items.
- Compatibility/storage/protocol: No storage format, function signature, or FE-BE protocol compatibility change found.
- Parallel code paths: The vector-needle path remains on the old per-row search path; this is consistent with the optimization targeting constant needles.
- Conditional checks: The special fallback handling for very short/long needles is clear and uses the existing single-string searcher path.
- Test coverage: Existing SQL tests cover basic semantics; no additional issue found here, though adding targeted long-needle/short-haystack coverage would be useful given the fixed edge cases.
- Observability: No new observability is needed for this hot scalar-function path.
- Transactions/persistence/data writes: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: The batching approach avoids redundant row scans per needle on constant arrays; no additional obvious hot-path regression found.
User focus: No additional user-provided review focus was specified.
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 29260 ms |
TPC-DS: Total hot run time: 170342 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
multi_search_all_positionsused to scan the same haystack once for every constant needle. This is expensive when the needle array is large.This PR adds
MultiStringSearcherfor the constant-needle path. It batches needles into a 2-byte ngram hash table, scans each haystack once per batch, and verifies full needle matches before writing the result. Short or unsupported needles still use the existing single-string searcher fallback, and the dynamic-needle path is unchanged.Benchmark result for 4096 rows, 1024-byte haystacks, and 64 constant needles:
The benchmark host had high load and CPU scaling enabled, so the numbers are intended as relative performance evidence.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)