[opt](function) speed up count_substrings with constant pattern #64121

Open
Mryange wants to merge 1 commit into apache:master from Mryange:opt-count-substrings-searcher
@Mryange
Contributor

@Mryange Mryange commented Jun 4, 2026

What problem does this PR solve?

count_substrings(str, pattern) previously scanned each row byte by byte and compared the pattern at every candidate offset with memcmp_small_allow_overflow15. This is expensive when the pattern is constant across a block, especially for long strings or rare matches, because no work is shared between rows.
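
For illustration, the per-offset scan described above can be sketched roughly as follows. This is a minimal stand-in, not the Doris source: the function name count_substrings_naive is hypothetical, std::memcmp stands in for memcmp_small_allow_overflow15, and returning 0 for an empty pattern is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for the old per-row path: try the pattern at every
// byte offset and jump past each match so that matches never overlap.
inline std::size_t count_substrings_naive(std::string_view str, std::string_view pattern) {
    if (pattern.empty() || pattern.size() > str.size()) {
        return 0; // assumed convention: an empty pattern yields zero matches
    }
    std::size_t count = 0;
    std::size_t pos = 0;
    const std::size_t last = str.size() - pattern.size(); // last offset a match can start at
    while (pos <= last) {
        // std::memcmp is used here in place of memcmp_small_allow_overflow15.
        if (std::memcmp(str.data() + pos, pattern.data(), pattern.size()) == 0) {
            ++count;
            pos += pattern.size(); // non-overlapping: resume after the match
        } else {
            ++pos; // no match here: advance a single byte
        }
    }
    return count;
}
```

Every row pays for a comparison at nearly every offset, which is why long strings with rare matches are the worst case for this shape of loop.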

Root cause: the existing implementation did not reuse a prebuilt string searcher for constant patterns, so every row still used the naive per-offset comparison path.

This change builds one ASCIICaseSensitiveStringSearcher per block when pattern is constant and uses it to count non-overlapping matches. The non-constant pattern path is unchanged. A BE benchmark was added for count_substrings(str, const_pattern) to compare the old naive path, StringSearch, direct searcher, and the actual function path.
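
A rough sketch of the constant-pattern idea, under stated assumptions: std::boyer_moore_horspool_searcher from the C++17 standard library is swapped in for ASCIICaseSensitiveStringSearcher (whose interface is not shown in this PR description), and the function name count_substrings_const_pattern and the vector-of-strings "block" are hypothetical. The point is only that the searcher is built once and reused for every row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical block-level helper: one searcher per block, reused per row.
// `pattern` must outlive the call because the searcher keeps iterators into it.
inline std::vector<std::size_t> count_substrings_const_pattern(
        const std::vector<std::string>& rows, std::string_view pattern) {
    std::vector<std::size_t> counts(rows.size(), 0);
    if (pattern.empty()) {
        return counts; // assumed convention: an empty pattern yields zero matches
    }
    // Preprocessing happens once here, not once per row.
    // Stand-in for ASCIICaseSensitiveStringSearcher (assumption).
    const std::boyer_moore_horspool_searcher searcher(pattern.begin(), pattern.end());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        const std::string_view row = rows[i];
        auto it = row.begin();
        const auto end = row.end();
        while ((it = std::search(it, end, searcher)) != end) {
            ++counts[i];
            it += pattern.size(); // non-overlapping: resume after the match
        }
    }
    return counts;
}
```

For rows {"aaaa", "abcabc", "", "xabcx"} and pattern "abc", this sketch yields the counts 0, 2, 0, 1.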

Case                 Naive      FunctionConstPattern   Speedup
SmallFrequent        1298 us    446 us                 2.9x
MediumRare           2753 us    278 us                 9.9x
LongNoFirstByte      21689 us   1231 us                17.6x
LongFalseFirstByte   21026 us   1215 us                17.3x
LongRare             21810 us   1265 us                17.2x
LongFrequent         20672 us   1435 us                14.4x
LongNeedle           20962 us   1331 us                15.7x
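
The Speedup column is consistent with Naive divided by FunctionConstPattern (for example, 1298 / 446 ≈ 2.9). A harness of the kind described, one that checks that both paths agree before timing them, might look like the sketch below. It is not the benchmark added under be/benchmark: the names run_case, count_naive, and count_with_searcher are hypothetical, std::boyer_moore_horspool_searcher again stands in for the Doris searcher, and plain std::chrono timing replaces whatever benchmark framework the PR uses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

using Rows = std::vector<std::string>;

// Per-offset comparison, counting non-overlapping matches.
inline std::size_t count_naive(std::string_view s, std::string_view p) {
    if (p.empty()) return 0;
    std::size_t n = 0;
    for (std::size_t pos = 0; pos + p.size() <= s.size();) {
        if (s.compare(pos, p.size(), p) == 0) { ++n; pos += p.size(); } else { ++pos; }
    }
    return n;
}

// Same count, but driven by a prebuilt searcher; pattern_len must be > 0.
template <class Searcher>
std::size_t count_with_searcher(std::string_view s, std::size_t pattern_len, const Searcher& searcher) {
    std::size_t n = 0;
    auto it = s.begin();
    while ((it = std::search(it, s.end(), searcher)) != s.end()) { ++n; it += pattern_len; }
    return n;
}

// Wall-clock time of one call to f, in microseconds.
template <class F>
double time_us(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

struct CaseResult { bool equal; double naive_us; double searcher_us; };

// Validate equivalence first; only time the two paths if they agree.
inline CaseResult run_case(const Rows& rows, std::string_view pattern) {
    if (pattern.empty()) return {true, 0.0, 0.0};
    const std::boyer_moore_horspool_searcher searcher(pattern.begin(), pattern.end());
    std::vector<std::size_t> a(rows.size()), b(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        a[i] = count_naive(rows[i], pattern);
        b[i] = count_with_searcher(rows[i], pattern.size(), searcher);
    }
    if (a != b) return {false, 0.0, 0.0};
    const double naive = time_us([&] {
        for (std::size_t i = 0; i < rows.size(); ++i) a[i] = count_naive(rows[i], pattern);
    });
    const double fast = time_us([&] {
        for (std::size_t i = 0; i < rows.size(); ++i)
            b[i] = count_with_searcher(rows[i], pattern.size(), searcher);
    });
    return {true, naive, fast};
}
```

A single timed pass like this is noisy; it only shows the validate-then-time ordering, not a way to reproduce the figures in the table.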

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Jun 4, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking findings in the actual GitHub PR diff.

Critical checkpoint conclusions:

  • Goal and tests: The PR optimizes count_substrings for constant patterns and adds a benchmark that cross-checks the new searcher path against the old naive path and the function result. Existing regression coverage already includes constant-pattern and start-position cases.
  • Scope: The production code change is small and focused on the pattern_const path; benchmark additions are isolated under be/benchmark.
  • Concurrency: No shared mutable state or thread-safety issue found in the changed function path.
  • Lifecycle/static initialization: No new cross-TU static initialization dependency found; benchmark constants are local to the included benchmark translation unit.
  • Configuration/compatibility/persistence: No config, storage-format, FE/BE protocol, or persistence changes in the actual PR diff.
  • Parallel paths: Non-constant-pattern behavior is preserved; both 2-argument and 3-argument constant-pattern paths were updated consistently.
  • Special conditions: Empty pattern and out-of-range start positions are handled consistently with the existing behavior.
  • Test coverage: Existing regression tests cover count_substrings; the new benchmark also validates result equivalence before timing. I did not run tests in this review environment.
  • Observability: Not applicable for this local scalar function optimization.
  • Performance: The change reuses a prebuilt searcher for constant patterns and removes repeated byte-by-byte scans in that path; no obvious regression found during static review.
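
The special conditions mentioned in the checkpoints (empty pattern, out-of-range start position) can be made concrete with a small sketch of a 3-argument variant. The exact semantics are not spelled out in this conversation, so the following are assumptions rather than Doris's documented behavior: the start position is 1-based and counted in bytes, and an empty pattern or a start position outside the string yields 0. The name count_substrings_from is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical 3-argument variant. Assumed semantics (not confirmed by the PR):
// start_pos is a 1-based byte position; empty pattern or out-of-range start -> 0.
inline std::size_t count_substrings_from(std::string_view str, std::string_view pattern,
                                         std::int64_t start_pos) {
    if (pattern.empty()) {
        return 0; // assumed: nothing to count
    }
    if (start_pos < 1 || static_cast<std::uint64_t>(start_pos) > str.size()) {
        return 0; // assumed: out-of-range start positions count nothing
    }
    std::size_t count = 0;
    std::size_t pos = static_cast<std::size_t>(start_pos - 1); // convert to 0-based offset
    while ((pos = str.find(pattern, pos)) != std::string_view::npos) {
        ++count;
        pos += pattern.size(); // non-overlapping: resume after the match
    }
    return count;
}
```

Under these assumptions, searching "abcabc" for "abc" gives 2 from position 1, 1 from position 2, and 0 from position 5 or from any position outside 1..6.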

User focus: No additional user-provided review focus was specified.

@Mryange
Contributor Author

Mryange commented Jun 5, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 28187 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b0d3c666f12c729f1b267b2e8d84e12665d93a88, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17621	3917	3882	3882
q2	q3	10747	1373	773	773
q4	4683	465	345	345
q5	7521	867	594	594
q6	180	168	135	135
q7	783	825	638	638
q8	9698	1621	1449	1449
q9	6796	4493	4468	4468
q10	6833	1807	1512	1512
q11	434	262	251	251
q12	652	420	286	286
q13	18106	3406	2756	2756
q14	286	264	246	246
q15	q16	832	785	714	714
q17	1372	1105	750	750
q18	6723	5738	5535	5535
q19	1391	1185	1061	1061
q20	513	397	268	268
q21	5908	2636	2222	2222
q22	425	361	302	302
Total cold run time: 101504 ms
Total hot run time: 28187 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4313	4223	4247	4223
q2	q3	4535	4997	4324	4324
q4	2097	2208	1362	1362
q5	4413	4381	4485	4381
q6	241	179	133	133
q7	1839	1700	2071	1700
q8	2615	2256	2090	2090
q9	8043	7835	7884	7835
q10	4854	4754	4260	4260
q11	579	439	386	386
q12	739	767	556	556
q13	3390	3505	2951	2951
q14	303	340	302	302
q15	q16	715	769	647	647
q17	1356	1322	1293	1293
q18	7979	7444	7247	7247
q19	1089	1103	1095	1095
q20	2193	2240	1938	1938
q21	5236	4563	4430	4430
q22	532	455	432	432
Total cold run time: 57061 ms
Total hot run time: 51585 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169030 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b0d3c666f12c729f1b267b2e8d84e12665d93a88, data reload: false

query5	4347	645	470	470
query6	443	202	186	186
query7	4912	590	301	301
query8	364	212	202	202
query9	8772	4049	4023	4023
query10	440	312	262	262
query11	5950	2360	2186	2186
query12	156	107	103	103
query13	1263	604	464	464
query14	6378	5419	5086	5086
query14_1	4410	4403	4417	4403
query15	207	199	180	180
query16	1044	456	353	353
query17	1132	710	609	609
query18	2532	510	383	383
query19	205	175	144	144
query20	113	104	107	104
query21	212	139	118	118
query22	13622	13559	13334	13334
query23	17295	16581	16161	16161
query23_1	16334	16365	16229	16229
query24	7680	1771	1292	1292
query24_1	1279	1311	1301	1301
query25	541	452	399	399
query26	1335	316	166	166
query27	2619	566	342	342
query28	4473	2062	2098	2062
query29	1195	597	486	486
query30	300	234	205	205
query31	1118	1068	961	961
query32	108	58	61	58
query33	537	310	244	244
query34	1180	1128	663	663
query35	760	787	673	673
query36	1389	1429	1226	1226
query37	144	99	87	87
query38	3215	3139	3067	3067
query39	933	936	906	906
query39_1	894	890	877	877
query40	224	124	101	101
query41	64	61	61	61
query42	96	93	93	93
query43	319	320	283	283
query44	
query45	201	186	179	179
query46	1115	1222	754	754
query47	2368	2377	2264	2264
query48	386	401	300	300
query49	625	462	346	346
query50	1027	370	268	268
query51	4383	4351	4237	4237
query52	89	89	76	76
query53	242	263	190	190
query54	274	219	209	209
query55	84	79	71	71
query56	248	238	241	238
query57	1443	1411	1327	1327
query58	252	219	229	219
query59	1586	1640	1397	1397
query60	316	263	238	238
query61	184	177	211	177
query62	686	669	580	580
query63	231	181	180	180
query64	2578	796	626	626
query65	
query66	1780	471	342	342
query67	29742	29754	29532	29532
query68	
query69	434	310	259	259
query70	904	937	945	937
query71	308	224	209	209
query72	3065	2829	2402	2402
query73	842	799	443	443
query74	5089	4970	4767	4767
query75	2661	2596	2263	2263
query76	2330	1134	850	850
query77	352	373	282	282
query78	12449	12249	11886	11886
query79	1466	1025	743	743
query80	815	468	400	400
query81	489	280	245	245
query82	566	151	117	117
query83	344	277	246	246
query84	258	138	114	114
query85	923	528	434	434
query86	408	294	300	294
query87	3400	3425	3183	3183
query88	3641	2746	2697	2697
query89	436	371	324	324
query90	1762	179	187	179
query91	181	167	144	144
query92	64	62	53	53
query93	1462	1434	859	859
query94	610	364	317	317
query95	672	456	363	363
query96	1051	792	356	356
query97	2711	2691	2574	2574
query98	212	209	208	208
query99	1177	1159	1029	1029
Total cold run time: 251816 ms
Total hot run time: 169030 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (63/63) 🎉

Increment coverage report
Complete coverage report

Category            Coverage
Function Coverage   71.89% (27503/38256)
Line Coverage       55.46% (294630/531261)
Region Coverage     52.26% (246073/470871)
Branch Coverage     53.36% (106279/199183)
