[opt](group-commit) Skip createLocation in group commit stream load sink #63561

Open
liaoxin01 wants to merge 1 commit into apache:master from liaoxin01:opt-groupcommit-skip-sink-location-master
Conversation

@liaoxin01
Contributor

Summary

The BE-side GroupCommitBlockSinkOperatorX::init does not consume TOlapTableSink.location or slave_location (it only reads tuple_id / schema / db_id / table_id / partition / group_commit_mode / load_id / max_filter_ratio). However, FE still runs createLocation, which iterates O(partitions * indexes * tablets * replicas) and, for every replica, takes the CloudSystemInfoService RW read lock via CloudReplica.getCurrentClusterId.

Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in a real production incident), CAS contention on the RW lock's state cache line saturated all FE CPUs, and the cluster could not recover even after scaling out (more cores = more CAS contenders = worse contention).
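To make the per-request cost concrete, here is a minimal sketch of the enumeration shape described above. It is not Doris code: `LocationCostSketch`, `countReadLockAcquisitions`, and the plain `ReentrantReadWriteLock` are hypothetical stand-ins for createLocation and the CloudSystemInfoService lock. Only the 3000-partition figure comes from the incident; the index, tablet, and replica counts are assumed for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the FE enumeration; none of these names are Doris APIs.
final class LocationCostSketch {
    // Stands in for the CloudSystemInfoService RW lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    // Counts read-lock acquisitions made by one request that walks every replica.
    static long countReadLockAcquisitions(int partitions, int indexes, int tablets, int replicas) {
        long acquisitions = 0;
        for (int p = 0; p < partitions; p++) {
            for (int i = 0; i < indexes; i++) {
                for (int t = 0; t < tablets; t++) {
                    for (int r = 0; r < replicas; r++) {
                        // Each acquisition updates the lock's shared state word,
                        // which is what concurrent requests contend on.
                        LOCK.readLock().lock();
                        try {
                            acquisitions++;
                        } finally {
                            LOCK.readLock().unlock();
                        }
                    }
                }
            }
        }
        return acquisitions;
    }

    public static void main(String[] args) {
        // 3000 partitions (from the incident); 1 index, 4 tablets, 3 replicas are assumed.
        System.out.println(countReadLockAcquisitions(3000, 1, 4, 3)); // prints 36000
    }
}
```

With these assumed sizes a single request takes the read lock 36000 times, and every concurrent stream load repeats the same walk.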

Change

  • Introduce a protected initLocationParams(TOlapTableSink) hook on OlapTableSink. Default behavior delegates to createLocation, so non-group-commit sinks are unaffected.
  • Route both init(...) overloads in OlapTableSink through the hook.
  • GroupCommitBlockSink overrides the hook to return empty placeholder TOlapTableLocationParam objects. TOlapTableSink.location is a required thrift field, so we still set non-null placeholders, but no tablet/replica enumeration happens.
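The hook pattern in the three bullets above can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `BaseSinkSketch`, `GroupCommitSinkSketch`, and `LocationParamSketch` are hypothetical stand-ins for OlapTableSink, GroupCommitBlockSink, and TOlapTableLocationParam, and the `int tabletCount` parameter replaces the real thrift and catalog arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for TOlapTableLocationParam.
final class LocationParamSketch {
    final List<Long> tablets = new ArrayList<>();
}

// Hypothetical stand-in for OlapTableSink.
class BaseSinkSketch {
    int enumerated = 0; // number of tablets visited, for illustration only

    // Expensive path: walks every tablet (the real code also walks replicas).
    protected List<LocationParamSketch> createLocation(int tabletCount) {
        LocationParamSketch location = new LocationParamSketch();
        LocationParamSketch slaveLocation = new LocationParamSketch();
        for (long id = 0; id < tabletCount; id++) {
            location.tablets.add(id);
            enumerated++;
        }
        return Arrays.asList(location, slaveLocation);
    }

    // The hook: the default delegates to createLocation, so existing sinks are unchanged.
    protected List<LocationParamSketch> initLocationParams(int tabletCount) {
        return createLocation(tabletCount);
    }

    // Every init path goes through the hook rather than calling createLocation directly.
    List<LocationParamSketch> init(int tabletCount) {
        return initLocationParams(tabletCount);
    }
}

// Hypothetical stand-in for GroupCommitBlockSink.
class GroupCommitSinkSketch extends BaseSinkSketch {
    @Override
    protected List<LocationParamSketch> initLocationParams(int tabletCount) {
        // Non-null placeholders satisfy the required field without any enumeration.
        return Arrays.asList(new LocationParamSketch(), new LocationParamSketch());
    }
}
```

Keeping the default in the base class means only the group-commit subclass opts out of the enumeration; every other sink keeps the behavior it had before.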

Effect on the group-commit path:

  • Per-request FE CPU: O(partitions * indexes * tablets * replicas) → O(1)
  • CloudSystemInfoService RW lock acquisitions: hundreds of concurrent CAS spinners → 0

Test plan

  • Added GroupCommitBlockSinkTest covering:
    • initLocationParams returns 2 placeholders with empty tablet lists (verifies the override is what runs, not createLocation).
    • parseGroupCommit parses async_mode / sync_mode / off_mode (case-insensitive) and returns null for unknown values.
  • Existing regression tests for stream load with group_commit=true still pass.
  • Manual high-concurrency stream load run on a wide-partition table to confirm FE CPU is no longer dominated by CloudSystemInfoService lock contention.
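The parsing contract that the test plan describes for parseGroupCommit can be sketched as below. This is a hypothetical re-implementation for illustration only: `GroupCommitModeSketch` stands in for whatever type the real method returns, and treating a null input as unknown is an assumption the PR text does not state.

```java
import java.util.Locale;

// Hypothetical stand-in for the real group commit mode type.
enum GroupCommitModeSketch { ASYNC_MODE, SYNC_MODE, OFF_MODE }

final class ParseGroupCommitSketch {
    // Case-insensitive parse; returns null for unknown values.
    static GroupCommitModeSketch parseGroupCommit(String value) {
        if (value == null) {
            return null; // assumption: null input is treated like an unknown value
        }
        switch (value.toLowerCase(Locale.ROOT)) {
            case "async_mode":
                return GroupCommitModeSketch.ASYNC_MODE;
            case "sync_mode":
                return GroupCommitModeSketch.SYNC_MODE;
            case "off_mode":
                return GroupCommitModeSketch.OFF_MODE;
            default:
                return null;
        }
    }
}
```

Lower-casing with `Locale.ROOT` keeps the comparison independent of the JVM's default locale.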

Copilot AI review requested due to automatic review settings May 23, 2026 14:43
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Contributor Author

/review

@liaoxin01
Contributor Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

Skips the expensive createLocation tablet/replica enumeration for the group-commit stream load sink, which under cloud mode contends on CloudSystemInfoService's RW lock and dominates FE CPU on wide-partition tables. BE's GroupCommitBlockSinkOperatorX::init does not consume location/slave_location, so empty placeholders satisfy the required thrift field at O(1) cost.

Changes:

  • Add a protected initLocationParams(TOlapTableSink) hook on OlapTableSink (default delegates to createLocation) and route both init overloads through it.
  • Override initLocationParams in GroupCommitBlockSink to return two empty placeholder TOlapTableLocationParam objects.
  • Add GroupCommitBlockSinkTest covering the override and parseGroupCommit parsing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java	Introduces initLocationParams hook and routes both init overloads through it.
fe/fe-core/src/main/java/org/apache/doris/planner/GroupCommitBlockSink.java	Overrides the hook to return empty placeholder location params, skipping replica enumeration.
fe/fe-core/src/test/java/org/apache/doris/planner/GroupCommitBlockSinkTest.java	New unit tests for the override behavior and parseGroupCommit.

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test coverage: The change avoids FE tablet replica location enumeration for GroupCommitBlockSink while preserving default OlapTableSink behavior. The added unit test covers the override contract and group_commit parsing. I could not run it locally because thirdparty/installed/bin/protoc is absent in this runner, and FE build instructions require stopping before Maven when protoc is missing.
  • Scope/focus: The implementation is small and focused: a protected hook plus a GroupCommitBlockSink override.
  • Concurrency/locking: The changed FE code does not introduce new shared mutable state or locks. It removes the high-contention createLocation path from the per-request group commit block sink initialization.
  • Lifecycle/static initialization: No new lifecycle-sensitive objects or static initialization dependencies were added.
  • Configuration/compatibility: No config items or storage/protocol-incompatible fields were added. Required thrift location fields are still populated with non-null placeholders.
  • Parallel code paths: Both relevant OlapTableSink init overloads now route through the hook; non-group-commit and RemoteOlapTableSink paths retain createLocation via the default implementation.
  • Error handling/data correctness: BE GroupCommitBlockSinkOperatorX::init only consumes schema, partition, ids, group_commit_mode, load_id, and max_filter_ratio; the real internal group-commit insert plan still goes through the normal OLAP sink planning path. I did not find a data visibility or transaction correctness regression from the skipped placeholder locations.
  • Observability/performance: The change addresses the intended hot path without adding noisy logs. Existing group commit BE logging remains available.

User focus: No additional user-provided review focus was supplied.

@liaoxin01 liaoxin01 force-pushed the opt-groupcommit-skip-sink-location-master branch from 4372498 to cd29627 Compare May 23, 2026 14:59
@liaoxin01
Contributor Author

run buildall

The BE-side GroupCommitBlockSinkOperatorX::init does not consume
TOlapTableSink.location or slave_location (it only reads tuple_id,
schema, db_id, table_id, partition, group_commit_mode, load_id and
max_filter_ratio). However, FE still ran createLocation, which iterates
O(partitions * indexes * tablets * replicas) and, for every replica,
takes the CloudSystemInfoService RW read lock via
CloudReplica.getCurrentClusterId. Under high-concurrency group commit
stream load on wide-partition tables (3000+ partitions in one
production incident), CAS contention on the RW lock's state cache line
saturated all FE CPUs and the cluster could not recover even after
scaling.

Introduce an initLocationParams hook on OlapTableSink so subclasses
can override how location params are populated. Both init(...)
overloads now route through this hook. GroupCommitBlockSink overrides
it to return empty placeholder params (location is a required thrift
field, but its contents are unused on BE for the group commit path).

Add GroupCommitBlockSinkTest to lock in the contract.
@liaoxin01 liaoxin01 force-pushed the opt-groupcommit-skip-sink-location-master branch from cd29627 to 683edaa Compare May 25, 2026 03:23
@liaoxin01
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31551 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 683edaa65cf3384ab904b6bbed2d66d9b9db8dd4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17677	4030	4022	4022
q2	q3	10792	1369	867	867
q4	4688	481	356	356
q5	7583	2277	2129	2129
q6	245	179	139	139
q7	955	788	640	640
q8	9399	1928	1631	1631
q9	5233	4998	4922	4922
q10	6391	2226	1901	1901
q11	423	275	241	241
q12	638	439	302	302
q13	18151	3297	2816	2816
q14	268	259	238	238
q15	q16	826	776	722	722
q17	962	969	1024	969
q18	7066	5627	5586	5586
q19	1383	1268	1142	1142
q20	515	408	282	282
q21	6078	2643	2346	2346
q22	446	365	300	300
Total cold run time: 99719 ms
Total hot run time: 31551 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4341	4265	4251	4251
q2	q3	4509	4952	4345	4345
q4	2117	2240	1431	1431
q5	4487	4327	4378	4327
q6	273	314	166	166
q7	2197	1942	1689	1689
q8	2631	2224	2231	2224
q9	8315	7967	8036	7967
q10	4844	4783	4335	4335
q11	595	436	391	391
q12	784	770	567	567
q13	3293	3639	2922	2922
q14	296	318	291	291
q15	q16	704	756	651	651
q17	1375	1382	1489	1382
q18	7901	7470	7422	7422
q19	1159	1129	1124	1124
q20	2226	2231	1965	1965
q21	5344	4632	4562	4562
q22	520	475	397	397
Total cold run time: 57911 ms
Total hot run time: 52409 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 31678 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 683edaa65cf3384ab904b6bbed2d66d9b9db8dd4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17908	4110	4008	4008
q2	q3	10784	1403	835	835
q4	4678	472	350	350
q5	7630	2250	2083	2083
q6	239	174	139	139
q7	928	782	633	633
q8	9461	1725	1559	1559
q9	6384	4934	4960	4934
q10	6484	2243	1875	1875
q11	435	272	251	251
q12	690	422	297	297
q13	18220	3435	2796	2796
q14	275	259	238	238
q15	q16	818	768	703	703
q17	994	958	896	896
q18	6796	5712	5621	5621
q19	1241	1350	1196	1196
q20	540	406	263	263
q21	5944	2680	2683	2680
q22	451	366	321	321
Total cold run time: 100900 ms
Total hot run time: 31678 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4840	4755	4910	4755
q2	q3	4929	5288	4744	4744
q4	2124	2219	1434	1434
q5	5014	4820	4742	4742
q6	237	183	135	135
q7	1886	1753	1584	1584
q8	2362	1974	1947	1947
q9	7473	7434	7452	7434
q10	4741	4696	4246	4246
q11	544	390	363	363
q12	738	750	541	541
q13	3013	3439	2855	2855
q14	276	287	250	250
q15	q16	681	706	611	611
q17	1297	1274	1262	1262
q18	7388	6787	6858	6787
q19	1114	1130	1081	1081
q20	2236	2235	1947	1947
q21	5305	4629	4479	4479
q22	514	449	401	401
Total cold run time: 56712 ms
Total hot run time: 51598 ms

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 25, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

TPC-DS: Total hot run time: 173573 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 683edaa65cf3384ab904b6bbed2d66d9b9db8dd4, data reload: false

query5	4348	663	524	524
query6	323	232	205	205
query7	4243	541	303	303
query8	334	260	229	229
query9	8831	4147	4127	4127
query10	445	345	306	306
query11	5788	2490	2238	2238
query12	210	127	124	124
query13	1281	612	419	419
query14	6144	5516	5598	5516
query14_1	4512	4525	4476	4476
query15	210	205	188	188
query16	1037	446	434	434
query17	1123	718	607	607
query18	2483	485	352	352
query19	220	208	164	164
query20	136	137	131	131
query21	225	145	120	120
query22	13633	13624	13381	13381
query23	17443	16662	16474	16474
query23_1	16708	16549	16522	16522
query24	7520	1790	1328	1328
query24_1	1351	1337	1329	1329
query25	587	495	442	442
query26	1332	323	177	177
query27	2713	541	353	353
query28	4499	2047	2021	2021
query29	1022	648	511	511
query30	317	248	211	211
query31	1141	1105	963	963
query32	90	76	74	74
query33	587	373	288	288
query34	1179	1107	656	656
query35	784	812	700	700
query36	1422	1393	1274	1274
query37	153	104	87	87
query38	3232	3190	3078	3078
query39	921	928	899	899
query39_1	871	896	893	893
query40	231	146	124	124
query41	67	64	62	62
query42	109	107	109	107
query43	329	333	299	299
query44	
query45	214	204	200	200
query46	1092	1212	747	747
query47	2405	2372	2248	2248
query48	398	439	308	308
query49	626	495	381	381
query50	987	348	248	248
query51	4388	4355	4339	4339
query52	104	105	94	94
query53	250	286	205	205
query54	311	271	287	271
query55	92	91	87	87
query56	302	309	290	290
query57	1435	1400	1361	1361
query58	301	267	265	265
query59	1638	1697	1457	1457
query60	334	329	308	308
query61	170	158	166	158
query62	702	660	594	594
query63	243	207	205	205
query64	2424	831	634	634
query65	
query66	1721	489	366	366
query67	30317	30005	29796	29796
query68	
query69	474	343	302	302
query70	1052	985	993	985
query71	306	282	261	261
query72	3255	2731	2400	2400
query73	887	730	452	452
query74	5131	4986	4880	4880
query75	2696	2625	2304	2304
query76	2310	1208	778	778
query77	424	410	350	350
query78	12440	12309	11904	11904
query79	1501	1081	728	728
query80	1192	550	460	460
query81	522	291	250	250
query82	1356	159	122	122
query83	361	279	252	252
query84	261	140	115	115
query85	949	549	453	453
query86	447	351	342	342
query87	3471	3414	3264	3264
query88	3695	2802	2769	2769
query89	456	392	341	341
query90	1767	183	182	182
query91	182	175	146	146
query92	78	78	76	76
query93	1469	1451	887	887
query94	639	366	298	298
query95	685	462	373	373
query96	1015	831	378	378
query97	2760	2752	2592	2592
query98	235	226	225	225
query99	1162	1150	1024	1024
Total cold run time: 256776 ms
Total hot run time: 173573 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (8/8) 🎉
Increment coverage report
Complete coverage report

@liaoxin01
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR aims to avoid expensive tablet replica location enumeration for group-commit stream-load sinks. The change accomplishes this by routing OlapTableSink location initialization through an overridable hook and returning empty required thrift placeholders for GroupCommitBlockSink. The added unit test verifies the override returns empty placeholders without touching the table/tuple mocks.
  • Scope and clarity: The change is small and focused on the group-commit sink path; default OlapTableSink behavior remains unchanged.
  • Concurrency: No new shared mutable state or locking is introduced. The optimization removes work that previously took backend-location related locks during createLocation.
  • Lifecycle/static initialization: No special lifecycle or static initialization concerns found.
  • Configuration/compatibility: No new configuration or storage/protocol-incompatible field is introduced. The required thrift location field is still set; only its contents are empty on the group-commit sink path that BE does not consume.
  • Parallel code paths: Both modified OlapTableSink.init(...) overloads now use the hook, so the normal insert and stream-load initialization paths remain consistent.
  • Test coverage: GroupCommitBlockSinkTest covers the new hook behavior and parse helper. I did not run tests in this review environment.
  • Observability: No additional observability seems necessary for this local FE planning CPU optimization.
  • Transactions/data writes: The PR does not change transaction commit/publish semantics or BE load queue behavior; schema, partition, db/table ids, load id, group commit mode, and max filter ratio are still populated.
  • Performance: The intended O(tablets * replicas) location enumeration is skipped only for the BE group-commit sink path where code inspection confirms it is unused.

User focus: no additional user-provided review focus was supplied.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (8/8) 🎉
Increment coverage report
Complete coverage report

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.x dev/4.0.x dev/4.1.x reviewed

4 participants