[opt](function) speed up multi_search_all_positions #64012

Open
Mryange wants to merge 3 commits into apache:master from Mryange:opt-multi_search_all_positions
@Mryange
Contributor

@Mryange Mryange commented Jun 2, 2026

What problem does this PR solve?

multi_search_all_positions used to scan the same haystack once for every constant needle. This is expensive when the needle array is large.

This PR adds MultiStringSearcher for the constant-needle path. It batches needles into a 2-byte ngram hash table, scans each haystack once per batch, and verifies full needle matches before writing the result. Short or unsupported needles still use the existing single-string searcher fallback, and the dynamic-needle path is unchanged.
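
The batching idea described above can be illustrated with a deliberately simplified sketch. It is not the PR's MultiStringSearcher: it indexes each needle only by its first 2-byte ngram and advances one byte at a time, whereas the PR batches needles into a hash table and scans with a stride. The function name `first_positions`, the use of `std::unordered_map`, and the result convention (1-based position of the first match, 0 for no match) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

// Simplified sketch, not the PR's MultiStringSearcher: index each needle by its
// first 2-byte ngram, scan the haystack once, and verify every candidate in full.
// Result convention (an assumption of this sketch): 1-based byte position of the
// first match per needle, 0 when the needle does not occur.
std::vector<size_t> first_positions(std::string_view haystack,
                                    const std::vector<std::string_view>& needles) {
    std::vector<size_t> result(needles.size(), 0);
    std::unordered_map<uint16_t, std::vector<size_t>> by_ngram;

    // Packs the two bytes at s[off], s[off + 1] into one 16-bit key.
    auto ngram_at = [](std::string_view s, size_t off) {
        return static_cast<uint16_t>(static_cast<uint8_t>(s[off]) |
                                     (static_cast<uint8_t>(s[off + 1]) << 8));
    };

    for (size_t i = 0; i < needles.size(); ++i) {
        if (needles[i].size() < 2) {
            // Too short for a 2-byte ngram: fall back to a plain single-needle search.
            const size_t pos = haystack.find(needles[i]);
            result[i] = (pos == std::string_view::npos) ? 0 : pos + 1;
        } else {
            by_ngram[ngram_at(needles[i], 0)].push_back(i);
        }
    }

    // One pass over the haystack, shared by all indexed needles.
    for (size_t off = 0; off + 2 <= haystack.size(); ++off) {
        const auto it = by_ngram.find(ngram_at(haystack, off));
        if (it == by_ngram.end()) continue;
        for (size_t i : it->second) {
            // The ngram hit is only a candidate; compare the whole needle.
            // compare() clamps the substring to the haystack, so a needle that
            // would run past the end simply compares unequal.
            if (result[i] == 0 &&
                haystack.compare(off, needles[i].size(), needles[i]) == 0) {
                result[i] = off + 1;
            }
        }
    }
    return result;
}
```

The point of the sketch is the cost shape: the haystack is walked once regardless of how many needles share the table, and the full comparison runs only where a 2-byte candidate was found.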

Benchmark result for 4096 rows, 1024-byte haystacks, and 64 constant needles:

Case            Old CPU time   New CPU time   Speedup
NoHit           37.23 ms       0.84 ms        44.4x
RareHit         46.20 ms       0.94 ms        49.0x
LastNeedleHit   46.21 ms       1.18 ms        39.2x

The benchmark host had high load and CPU scaling enabled, so the numbers are intended as relative performance evidence.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Jun 2, 2026

/review

Contributor

@github-actions github-actions Bot left a comment


Review result: request changes. The PR goal is clear and the implementation is focused, but the new MultiStringSearcher forms out-of-range pointers on valid inputs before its bounds checks reject them. This is a correctness/memory-safety issue and should be fixed before merge.

Critical checkpoint conclusions:

  • Goal/test proof: The optimization targets constant-needle multi_search_all_positions and existing tests cover basic behavior, but the new long-needle/short-haystack bounds cases are not covered.
  • Scope: The change is small and focused.
  • Concurrency: No new shared state or concurrent lifecycle is introduced.
  • Lifecycle/static initialization: No special lifecycle or cross-TU static initialization issue found.
  • Configuration/compatibility/protocol: No config, storage format, or FE-BE protocol compatibility change found.
  • Parallel paths: Dynamic-needle path is intentionally unchanged; constant-needle path is the modified path.
  • Special checks: The new batching and bounds checks need adjustment to avoid pointer UB.
  • Test coverage: Please add regression or unit coverage for long constant needles with haystacks shorter than the needle / candidate near row end.
  • Observability: No additional observability appears necessary for this function-level optimization.
  • Transactions/persistence/data writes: Not applicable.
  • Performance: The algorithmic direction is reasonable; no separate blocking performance issue found.

User focus: No additional user-provided review focus was supplied.

return;
}

for (const uint8_t* pos = haystack + _step - sizeof(Ngram);

This can create an out-of-range pointer before the loop condition has a chance to fail. For example, with a 254-byte non-fallback needle _step becomes 253; a 10-byte haystack passes the 2-byte check above, then haystack + _step - sizeof(Ngram) forms haystack + 251, which is undefined behavior even if the loop never runs. Please either guard haystack_end - haystack < _step before this expression or rewrite the scan using integer offsets. The same issue exists a few lines below at pos < haystack + _hash[cell].off - 1; compare offsets like (pos - haystack) < _hash[cell].off - 1 instead of forming a potentially out-of-range pointer.
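
The offset-based rewrite suggested in this comment could look roughly like the sketch below. It is a hypothetical stand-in for the scan loop, not code from the PR: `ngram_probe_offsets` and `kNgramSize` are illustrative names, and `kNgramSize` is assumed to equal `sizeof(Ngram)`. The function only reports which offsets would be probed, which is enough to show that every bound is checked on integers before any pointer is derived.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the scan loop: collects the offsets at which a
// 2-byte ngram would be read, starting at step - 2 and advancing by step.
// All arithmetic is on size_t offsets, so no out-of-range pointer is formed.
std::vector<size_t> ngram_probe_offsets(size_t haystack_size, size_t step) {
    constexpr size_t kNgramSize = 2;  // assumed to equal sizeof(Ngram) in the PR
    std::vector<size_t> offsets;
    if (step < kNgramSize || haystack_size < step) {
        return offsets;  // haystack shorter than one stride: nothing to probe
    }
    for (size_t off = step - kNgramSize; off + kNgramSize <= haystack_size; off += step) {
        offsets.push_back(off);  // a caller would read haystack[off], haystack[off + 1]
    }
    return offsets;
}
```

With the values from the comment (a 10-byte haystack and a step of 253), the early return fires and the expression `haystack + 251` is never evaluated.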

const auto& needle = _needles[needle_index];
// The hash table only proves that one 2-byte ngram matched. Full memcmp is
// required to discard false positives from duplicate ngrams and collisions.
if (match + needle.size <= haystack_end &&

This bounds check still performs match + needle.size before verifying that the result is within the haystack object. For a long needle and a candidate near the end of a short row, that pointer arithmetic can go far past one-past-end and is undefined behavior. Please use a subtraction-style check such as haystack_end - match >= needle.size before memcmp, and add coverage for long constant needles against shorter haystacks / near-end candidates.
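
A subtraction-style check of the kind requested here might be sketched as follows. The helper `matches_at` and its parameter names are hypothetical and do not come from the PR; the sketch assumes `match` and `haystack_end` point into (or one past the end of) the same buffer, with `match <= haystack_end`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from the PR: returns true when `needle`
// occurs at `match` without forming or reading a pointer past `haystack_end`.
// Precondition: match and haystack_end belong to the same buffer.
inline bool matches_at(const uint8_t* match, const uint8_t* haystack_end,
                       const uint8_t* needle, size_t needle_size) {
    assert(match <= haystack_end);
    // The difference is non-negative under the precondition, so the cast is safe.
    const auto remaining = static_cast<size_t>(haystack_end - match);
    // Compare sizes first; memcmp runs only when the needle fits in the tail.
    return remaining >= needle_size && std::memcmp(match, needle, needle_size) == 0;
}
```

Unlike `match + needle.size <= haystack_end`, the subtraction stays inside the haystack object no matter how long the needle is.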

@Mryange
Contributor Author

Mryange commented Jun 2, 2026

/review

Contributor

@github-actions github-actions Bot left a comment


Review completed for PR 64012 at head 4aecb95d4e95909e2986907cab05d202ffbfdb25.

I did not find additional blocking issues beyond the two existing inline review threads in .opencode-review.M6EuG5/pr_review_threads.md. Those threads covered out-of-range pointer arithmetic in MultiStringSearcher; the later head diff changes the loop and final bounds check to offset/subtraction-style logic, so I did not re-submit the same issues.

Critical checkpoint conclusions:

  • Goal and proof: The PR optimizes constant-needle multi_search_all_positions by batching many needles through a new ngram-based MultiStringSearcher; the benchmark cross-checks old and new results, and existing regression coverage for the function remains relevant, but no new regression/unit case was added for the new edge cases.
  • Scope: The production change is focused on the constant-needle path plus a reusable searcher and benchmark.
  • Concurrency: No shared mutable state across threads is introduced; MultiStringSearcher is per execution and not used concurrently.
  • Lifecycle/static initialization: No non-trivial static/global lifecycle dependency was introduced.
  • Configuration: No new configuration items.
  • Compatibility/storage/protocol: No storage format, function signature, or FE-BE protocol compatibility change found.
  • Parallel code paths: The vector-needle path remains on the old per-row search path; this is consistent with the optimization targeting constant needles.
  • Conditional checks: The special fallback handling for very short/long needles is clear and uses the existing single-string searcher path.
  • Test coverage: Existing SQL tests cover basic semantics; no additional issue found here, though adding targeted long-needle/short-haystack coverage would be useful given the fixed edge cases.
  • Observability: No new observability is needed for this hot scalar-function path.
  • Transactions/persistence/data writes: Not applicable.
  • FE-BE variable passing: Not applicable.
  • Performance: The batching approach avoids redundant row scans per needle on constant arrays; no additional obvious hot-path regression found.

User focus: No additional user-provided review focus was specified.

@Mryange
Contributor Author

Mryange commented Jun 2, 2026

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/93) 🎉

Increment coverage report
Complete coverage report

Category            Coverage
Function Coverage   54.11% (21121/39030)
Line Coverage       37.67% (200636/532660)
Region Coverage     33.80% (157941/467308)
Branch Coverage     34.75% (68950/198392)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.55% (87/93) 🎉

Increment coverage report
Complete coverage report

Category            Coverage
Function Coverage   73.92% (28252/38220)
Line Coverage       57.86% (307378/531284)
Region Coverage     54.54% (257303/471729)
Branch Coverage     56.06% (111638/199138)

@Mryange
Contributor Author

Mryange commented Jun 2, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29260 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d16bc5fbe342e6ac3cf142a9f51e045f56f8bb96, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17813	4164	4217	4164
q2	q3	10814	1406	840	840
q4	4684	469	349	349
q5	7635	876	605	605
q6	187	177	141	141
q7	817	938	647	647
q8	9381	1667	1682	1667
q9	5841	4559	4548	4548
q10	6742	1845	1535	1535
q11	431	268	253	253
q12	624	426	302	302
q13	18232	3411	2748	2748
q14	269	261	243	243
q15	q16	818	769	716	716
q17	978	942	1067	942
q18	6895	5675	5534	5534
q19	1337	1252	1092	1092
q20	498	397	268	268
q21	5933	2650	2353	2353
q22	432	375	313	313
Total cold run time: 100361 ms
Total hot run time: 29260 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4543	4434	4375	4375
q2	q3	4555	4994	4370	4370
q4	2072	2252	1388	1388
q5	4470	4306	4316	4306
q6	232	173	131	131
q7	1735	1951	1867	1867
q8	2761	2391	2273	2273
q9	8203	8429	8020	8020
q10	4838	4751	4391	4391
q11	585	431	398	398
q12	742	787	539	539
q13	3362	3681	2991	2991
q14	304	305	271	271
q15	q16	712	720	665	665
q17	1371	1335	1518	1335
q18	7833	7277	7437	7277
q19	1181	1093	1114	1093
q20	2234	2256	1955	1955
q21	5313	4647	4505	4505
q22	556	459	407	407
Total cold run time: 57602 ms
Total hot run time: 52557 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170342 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d16bc5fbe342e6ac3cf142a9f51e045f56f8bb96, data reload: false

query5	4341	631	494	494
query6	447	203	205	203
query7	4836	544	315	315
query8	383	227	204	204
query9	8769	4074	4079	4074
query10	466	317	259	259
query11	5930	2327	2193	2193
query12	156	101	98	98
query13	1385	615	435	435
query14	6465	5463	5087	5087
query14_1	4418	4459	4463	4459
query15	212	201	178	178
query16	1003	438	394	394
query17	947	691	575	575
query18	2453	479	349	349
query19	194	181	144	144
query20	119	108	102	102
query21	214	137	119	119
query22	13656	13605	13469	13469
query23	17503	16470	16283	16283
query23_1	16334	16430	16280	16280
query24	7521	1789	1337	1337
query24_1	1303	1319	1336	1319
query25	543	446	384	384
query26	1279	326	160	160
query27	2696	540	341	341
query28	4492	2039	2031	2031
query29	1116	628	474	474
query30	316	237	189	189
query31	1127	1080	970	970
query32	103	61	60	60
query33	529	320	250	250
query34	1189	1178	664	664
query35	798	795	685	685
query36	1407	1372	1270	1270
query37	151	109	92	92
query38	3226	3167	3071	3071
query39	946	926	892	892
query39_1	879	888	889	888
query40	223	120	104	104
query41	66	63	62	62
query42	96	94	96	94
query43	332	330	293	293
query44	
query45	198	188	179	179
query46	1096	1198	729	729
query47	2404	2332	2225	2225
query48	392	410	282	282
query49	642	469	366	366
query50	975	353	270	270
query51	4323	4388	4281	4281
query52	91	92	83	83
query53	255	277	195	195
query54	280	235	222	222
query55	84	78	73	73
query56	258	253	239	239
query57	1441	1403	1333	1333
query58	269	236	227	227
query59	1619	1750	1489	1489
query60	306	264	232	232
query61	182	183	178	178
query62	697	677	586	586
query63	246	193	189	189
query64	2635	857	668	668
query65	
query66	1821	481	359	359
query67	29806	29797	29603	29603
query68	
query69	427	314	271	271
query70	983	991	978	978
query71	315	232	207	207
query72	3178	2675	2407	2407
query73	869	741	416	416
query74	5153	4954	4823	4823
query75	2673	2579	2253	2253
query76	2296	1165	759	759
query77	353	385	294	294
query78	12559	12498	11897	11897
query79	1424	1072	763	763
query80	1283	483	435	435
query81	524	291	252	252
query82	600	157	119	119
query83	361	280	250	250
query84	313	146	108	108
query85	928	544	444	444
query86	440	317	284	284
query87	3427	3345	3231	3231
query88	3621	2742	2740	2740
query89	439	385	335	335
query90	1988	184	188	184
query91	183	172	134	134
query92	70	61	56	56
query93	1465	1419	910	910
query94	730	359	324	324
query95	677	468	361	361
query96	1011	813	332	332
query97	2678	2713	2558	2558
query98	212	212	204	204
query99	1197	1183	1037	1037
Total cold run time: 253380 ms
Total hot run time: 170342 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.55% (87/93) 🎉

Increment coverage report
Complete coverage report

Category            Coverage
Function Coverage   73.84% (28221/38221)
Line Coverage       57.81% (307168/531348)
Region Coverage     54.71% (258104/471783)
Branch Coverage     56.11% (111746/199154)

2 participants