[fix](cloud) modify primary cluster bes to be in CloudReplica to reduce memory #59932

mymeiyi · 2026-01-15T11:05:13Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-01-15T11:05:22Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot

Pull request overview

This PR reduces memory usage in CloudReplica by storing the primary backend ID for each cluster as a single `Long` instead of a `List<Long>`. The change reflects the fact that each replica maps to only one backend per cluster, even though the previous data structure allowed for several.

Changes:

- Refactored CloudReplica to use a single `Long` instead of a `List<Long>` for the primary cluster backend mapping
- Made the Replica class abstract, with abstract methods getBackendId() and checkVersionCatchUp()
- Moved method implementations from Replica to LocalReplica to support the abstract class pattern
- Added migration logic via the GsonPostProcessable interface to keep deserialization backward compatible
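The migration step in the list above can be pictured with a minimal sketch. This is not the code from the PR: `PostProcessable` and `CloudReplicaSketch` are hypothetical stand-ins, and taking the first list element as the surviving backend is an assumption based on the statement that each replica maps to only one backend per cluster.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the post-deserialization hook named in the review.
interface PostProcessable {
    void gsonPostProcess();
}

// Hypothetical, simplified model of the two fields discussed in this PR.
class CloudReplicaSketch implements PostProcessable {
    // Legacy shape: a list per cluster, only populated when old metadata is read.
    ConcurrentHashMap<String, List<Long>> primaryClusterToBackends = null;
    // New shape: a single backend id per cluster.
    ConcurrentHashMap<String, Long> primaryClusterToBackend = new ConcurrentHashMap<>();

    @Override
    public void gsonPostProcess() {
        if (primaryClusterToBackends == null) {
            return; // new-format metadata, nothing to migrate
        }
        for (Map.Entry<String, List<Long>> entry : primaryClusterToBackends.entrySet()) {
            List<Long> backends = entry.getValue();
            if (backends == null || backends.isEmpty()) {
                continue; // no backend recorded for this cluster
            }
            // Assumption: the first element is the one backend that matters.
            Long first = backends.get(0);
            if (first != null) {
                primaryClusterToBackend.putIfAbsent(entry.getKey(), first);
            }
        }
        primaryClusterToBackends = null; // drop the legacy map so it can be collected
    }
}
```

Calling `gsonPostProcess()` a second time is a no-op in this sketch, because the legacy field is already null.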

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
CloudReplica.java	Replaced `primaryClusterToBackends` (Map<String, List<Long>>) with `primaryClusterToBackend` (Map<String, Long>) to reduce memory; added gsonPostProcess migration logic; updated all usages
Replica.java	Converted to abstract class; made getBackendId() and checkVersionCatchUp() abstract methods
LocalReplica.java	Implemented checkVersionCatchUp() abstract method with the original logic from Replica class
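The abstract class pattern described in the table can be sketched as follows. All class names, fields, and the catch-up rules here are assumptions for illustration and are much simpler than the real Doris classes; in particular, the cloud variant returning `true` unconditionally is a placeholder, not a statement about the actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical base class: subclasses decide how backends and versions are handled.
abstract class AbstractReplicaDemo {
    protected final long version;

    AbstractReplicaDemo(long version) {
        this.version = version;
    }

    abstract long getBackendId();

    abstract boolean checkVersionCatchUp(long expectedVersion);
}

// Hypothetical local replica: pinned to one backend, compares versions directly.
class LocalReplicaDemo extends AbstractReplicaDemo {
    private final long backendId;

    LocalReplicaDemo(long version, long backendId) {
        super(version);
        this.backendId = backendId;
    }

    @Override
    long getBackendId() {
        return backendId;
    }

    @Override
    boolean checkVersionCatchUp(long expectedVersion) {
        return version >= expectedVersion; // simplified rule, an assumption
    }
}

// Hypothetical cloud replica: one backend id per cluster name.
class CloudReplicaDemo extends AbstractReplicaDemo {
    private final Map<String, Long> primaryClusterToBackend = new ConcurrentHashMap<>();
    private final String currentCluster;

    CloudReplicaDemo(long version, String currentCluster) {
        super(version);
        this.currentCluster = currentCluster;
    }

    void assign(String cluster, long backendId) {
        primaryClusterToBackend.put(cluster, backendId);
    }

    @Override
    long getBackendId() {
        // -1 signals "no backend recorded for this cluster" in this sketch.
        return primaryClusterToBackend.getOrDefault(currentCluster, -1L);
    }

    @Override
    boolean checkVersionCatchUp(long expectedVersion) {
        return true; // placeholder behavior, an assumption
    }
}
```

Keeping both methods abstract in the base class forces every replica type to state its own rule instead of inheriting a default meant for local storage.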

fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java

mymeiyi · 2026-01-15T11:20:37Z

run buildall

doris-robot · 2026-01-15T11:40:40Z

TPC-H: Total hot run time: 31453 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit af821cd369b978aa929897fb8fedff37242d27c0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	17618	4228	4041	4041
q2	2054	376	273	273
q3	10110	1302	699	699
q4	10234	764	306	306
q5	8379	2144	1800	1800
q6	213	167	139	139
q7	949	815	662	662
q8	9285	1433	1093	1093
q9	4940	4670	4519	4519
q10	6783	1797	1471	1471
q11	514	310	276	276
q12	756	734	623	623
q13	17772	3838	3054	3054
q14	289	298	281	281
q15	589	501	506	501
q16	701	679	625	625
q17	656	772	540	540
q18	6777	6404	6437	6404
q19	1248	989	630	630
q20	384	373	241	241
q21	3111	2572	2309	2309
q22	1063	996	966	966
Total cold run time: 104425 ms
Total hot run time: 31453 ms
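The totals in the table above are consistent with summing the first timing column for the cold total and summing the smaller of the two following columns for the hot total. That reading is inferred from the numbers, not taken from the benchmark scripts. A minimal sketch, with the hypothetical class name `TpchTotals`:

```java
// Hypothetical helper that mirrors how the reported totals appear to be derived.
class TpchTotals {
    // Each row holds {cold, hot1, hot2} in milliseconds.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += row[0];
        }
        return sum;
    }

    // The best hot time of a query is taken as the smaller of its two hot runs.
    static long totalBestHot(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += Math.min(row[1], row[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // q14, q15 and q18 from Round 1.
        long[][] rows = {
            {289, 298, 281},
            {589, 501, 506},
            {6777, 6404, 6437},
        };
        System.out.println(totalCold(rows));    // prints 7655
        System.out.println(totalBestHot(rows)); // prints 7186
    }
}
```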

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	4079	4014	4007	4007
q2	330	427	318	318
q3	2102	2614	2213	2213
q4	1329	1728	1319	1319
q5	4065	3978	4071	3978
q6	212	167	128	128
q7	1865	1800	1690	1690
q8	2462	2529	2288	2288
q9	6699	6647	6640	6640
q10	2319	2504	2077	2077
q11	527	474	438	438
q12	663	686	545	545
q13	3340	3795	3093	3093
q14	280	287	259	259
q15	535	501	495	495
q16	620	641	603	603
q17	1102	1226	1280	1226
q18	7511	7214	7288	7214
q19	850	847	818	818
q20	1878	1949	1790	1790
q21	4579	4258	4162	4162
q22	1092	1013	1005	1005
Total cold run time: 48439 ms
Total hot run time: 46306 ms

doris-robot · 2026-01-15T11:51:13Z

TPC-DS: Total hot run time: 173602 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit af821cd369b978aa929897fb8fedff37242d27c0, data reload: false

query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
query5	4955	623	496	496
query6	319	235	225	225
query7	3917	471	255	255
query8	360	255	239	239
query9	7175	2872	2897	2872
query10	450	386	351	351
query11	15297	15076	14860	14860
query12	175	112	111	111
query13	1024	463	369	369
query14	4582	2980	2782	2782
query14_1	2665	2646	2627	2627
query15	190	189	172	172
query16	964	497	477	477
query17	829	645	547	547
query18	2226	423	331	331
query19	222	220	192	192
query20	116	112	112	112
query21	213	131	109	109
query22	4003	4077	4075	4075
query23	15891	15524	15413	15413
query23_1	15420	15496	15458	15458
query24	6235	1521	1161	1161
query24_1	1151	1138	1177	1138
query25	516	431	396	396
query26	1137	265	144	144
query27	2746	440	268	268
query28	4434	2123	2106	2106
query29	759	515	440	440
query30	311	243	212	212
query31	770	626	562	562
query32	83	77	74	74
query33	554	338	307	307
query34	878	864	529	529
query35	733	752	666	666
query36	882	907	843	843
query37	132	99	90	90
query38	2747	2787	2681	2681
query39	791	783	730	730
query39_1	730	721	709	709
query40	222	133	119	119
query41	65	60	59	59
query42	106	97	100	97
query43	461	473	405	405
query44	1314	729	730	729
query45	184	177	185	177
query46	831	946	561	561
query47	1448	1505	1391	1391
query48	307	309	232	232
query49	591	417	335	335
query50	621	261	209	209
query51	3815	3795	3722	3722
query52	102	109	94	94
query53	286	325	271	271
query54	298	276	265	265
query55	84	78	76	76
query56	298	304	308	304
query57	1037	992	900	900
query58	280	282	262	262
query59	2051	2146	2062	2062
query60	341	344	314	314
query61	154	158	153	153
query62	377	358	305	305
query63	297	260	270	260
query64	4928	1288	995	995
query65	3798	3814	3771	3771
query66	1435	422	321	321
query67	15568	15631	15419	15419
query68	2445	1077	778	778
query69	452	360	325	325
query70	992	953	900	900
query71	326	316	289	289
query72	5372	2761	3622	2761
query73	600	718	325	325
query74	8725	8735	8506	8506
query75	2742	2806	2464	2464
query76	2262	1058	647	647
query77	363	395	315	315
query78	9832	9984	9194	9194
query79	1064	904	565	565
query80	712	616	526	526
query81	506	262	233	233
query82	1079	145	115	115
query83	391	274	253	253
query84	308	132	103	103
query85	986	592	448	448
query86	383	325	296	296
query87	2872	2848	2750	2750
query88	3473	2585	2578	2578
query89	396	347	322	322
query90	1691	163	163	163
query91	174	160	135	135
query92	82	72	73	72
query93	918	868	526	526
query94	453	327	278	278
query95	582	343	316	316
query96	633	493	229	229
query97	2330	2338	2345	2338
query98	212	202	199	199
query99	583	584	497	497
Total cold run time: 235741 ms
Total hot run time: 173602 ms

doris-robot · 2026-01-15T11:56:13Z

ClickBench: Total hot run time: 26.74 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit af821cd369b978aa929897fb8fedff37242d27c0, data reload: false

query	cold (s)	hot1 (s)	hot2 (s)
query1	0.05	0.04	0.05
query2	0.10	0.05	0.04
query3	0.25	0.08	0.09
query4	1.60	0.12	0.11
query5	0.28	0.26	0.26
query6	1.14	0.66	0.64
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.56	0.50	0.49
query10	0.55	0.54	0.55
query11	0.15	0.09	0.10
query12	0.14	0.10	0.10
query13	0.61	0.59	0.58
query14	0.94	0.95	0.94
query15	0.79	0.78	0.78
query16	0.39	0.40	0.40
query17	1.05	1.05	1.01
query18	0.22	0.22	0.21
query19	1.96	1.83	1.76
query20	0.02	0.01	0.02
query21	15.48	0.28	0.14
query22	5.28	0.06	0.04
query23	16.11	0.27	0.11
query24	1.05	0.59	0.23
query25	0.08	0.08	0.05
query26	0.14	0.15	0.13
query27	0.09	0.05	0.08
query28	3.16	1.08	0.89
query29	12.55	3.97	3.22
query30	0.28	0.14	0.12
query31	2.82	0.61	0.38
query32	3.24	0.57	0.47
query33	2.97	3.06	3.02
query34	15.93	5.05	4.41
query35	4.47	4.50	4.49
query36	0.64	0.50	0.49
query37	0.11	0.06	0.06
query38	0.08	0.05	0.03
query39	0.05	0.03	0.04
query40	0.17	0.14	0.13
query41	0.09	0.03	0.03
query42	0.04	0.04	0.03
query43	0.04	0.04	0.04
Total cold run time: 95.75 s
Total hot run time: 26.74 s

hello-stephen · 2026-01-15T12:24:00Z

FE UT Coverage Report

Increment line coverage 26.67% (8/30) 🎉
Increment coverage report
Complete coverage report

dataroaring · 2026-01-15T17:09:29Z

fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java

-            = new ConcurrentHashMap<String, List<Long>>();
+    private ConcurrentHashMap<String, List<Long>> primaryClusterToBackends = null;
+    @SerializedName(value = "be")
+    private ConcurrentHashMap<String, Long> primaryClusterToBackend = new ConcurrentHashMap<>();


Does the value type have to be `Long`, or would `long` work too?
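For context on this comment: Java type arguments must be reference types, so a map declared with a primitive `long` value does not compile and the boxed `Long` is required. The sketch below uses a hypothetical `BoxedValueDemo` class and a made-up `-1` sentinel to show the boxing and the null check that a missing key calls for.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; names and the -1 sentinel are assumptions.
class BoxedValueDemo {
    // A primitive type argument is rejected by the compiler:
    //   ConcurrentHashMap<String, long> bad;   // does not compile
    static final ConcurrentHashMap<String, Long> clusterToBackend = new ConcurrentHashMap<>();

    // Returns the backend id for a cluster, or -1 when none is recorded.
    static long backendFor(String cluster) {
        Long backend = clusterToBackend.get(cluster); // null when the key is absent
        return backend == null ? -1L : backend;       // unbox only after the null check
    }

    public static void main(String[] args) {
        clusterToBackend.put("cluster_a", 10001L);    // the long literal is autoboxed to Long
        System.out.println(backendFor("cluster_a"));  // prints 10001
        System.out.println(backendFor("missing"));    // prints -1
    }
}
```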

mymeiyi · 2026-01-16T06:23:15Z

run cloud_p0

hello-stephen · 2026-01-16T08:09:22Z

FE Regression Coverage Report

Increment line coverage 36.67% (11/30) 🎉
Increment coverage report
Complete coverage report

deardeng

LGTM

github-actions · 2026-01-20T11:16:39Z

PR approved by anyone and no changes requested.

[fix](cloud) modify primary cluster bes to be in CloudReplica to reduce memory

af821cd

mymeiyi requested a review from gavinchou as a code owner January 15, 2026 11:05

Copilot AI review requested due to automatic review settings January 15, 2026 11:05

mymeiyi requested review from dataroaring and w41ter as code owners January 15, 2026 11:05

Copilot started reviewing on behalf of mymeiyi January 15, 2026 11:06

Copilot AI reviewed Jan 15, 2026

dataroaring reviewed Jan 15, 2026

deardeng approved these changes Jan 20, 2026

github-actions bot added the reviewed label Jan 20, 2026

5 participants