[improvement](merge-cloud) Add a delay for relocating replica's backend when backend is dead. #33901

Closed
yujun777 wants to merge 2 commits into apache:master from yujun777:delay-relocate-replica-backend

Conversation

@yujun777
Contributor

When a BE shuts down, its tablets are quickly migrated to other backends. Add a short delay before relocating them, to avoid excessive migration.
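
As a rough illustration of the idea, and not the code in this PR, a grace period could be tracked per backend as sketched below. The class name `RelocationDelayPolicy`, the method `shouldRelocate`, and the `delayMillis` parameter are all hypothetical; they are not Doris APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the PR's actual code: decides whether replicas on a
// backend may be relocated, based on how long the backend has been dead.
final class RelocationDelayPolicy {
    private final long delayMillis;
    // backendId -> time (ms) at which the backend was first observed dead
    private final Map<Long, Long> deadSince = new HashMap<>();

    RelocationDelayPolicy(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Returns true only once the backend has been continuously dead for at
    // least delayMillis. A backend that comes back alive resets the timer.
    boolean shouldRelocate(long backendId, boolean alive, long nowMillis) {
        if (alive) {
            deadSince.remove(backendId);
            return false;
        }
        long firstDead = deadSince.computeIfAbsent(backendId, id -> nowMillis);
        return nowMillis - firstDead >= delayMillis;
    }
}
```

With such a policy, a backend that restarts within the delay window never triggers relocation, because its dead timer is cleared as soon as it reports alive again.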

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@yujun777
Contributor Author

run buildall

@yujun777 changed the title from "[improvement](merge-cloud) delay relocate replica backend" to "[improvement](merge-cloud) Add a delay for relocating replica's backend when backend is dead." on Apr 19, 2024
@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38555 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e5b8cb3e4e8e9924740b555f77abb96765daed19, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17622	4351	4254	4254
q2	2027	192	188	188
q3	10448	1153	1208	1153
q4	10191	759	863	759
q5	7499	2678	2603	2603
q6	220	137	138	137
q7	961	535	535	535
q8	9226	2052	2038	2038
q9	7447	6614	6579	6579
q10	8458	3583	3547	3547
q11	451	232	234	232
q12	386	230	221	221
q13	17771	2964	2954	2954
q14	263	237	240	237
q15	530	488	467	467
q16	518	380	376	376
q17	957	669	656	656
q18	7367	6858	6832	6832
q19	1600	1541	1539	1539
q20	650	319	317	317
q21	3548	2618	2838	2618
q22	368	313	316	313
Total cold run time: 108508 ms
Total hot run time: 38555 ms
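
The two totals are consistent with the per-query rows: the cold total equals the sum of the first timing column, and the hot total equals the sum of the last column, which in every row is the smaller of the two middle (hot) runs. The sketch below reproduces that arithmetic for the first three rows (q1 to q3) only. The class `BenchTotals` is hypothetical and is not the bot's actual script.

```java
// Hypothetical helper that mirrors how the totals appear to be derived.
final class BenchTotals {
    // Each row holds {cold, hot1, hot2} timings in milliseconds.
    static long coldTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // Sums the better (smaller) of the two hot runs for each query.
    static long hotTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    // First three rows of Round 1 (q1, q2, q3), copied from the table above.
    static long[][] sampleRows() {
        return new long[][] {
            {17622, 4351, 4254},
            {2027, 192, 188},
            {10448, 1153, 1208},
        };
    }
}
```

Applied to all 22 rows of Round 1, the same two sums give 108508 ms and 38555 ms, matching the reported totals.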

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4230	4233	4207	4207
q2	370	267	283	267
q3	2955	2772	2761	2761
q4	1853	1516	1568	1516
q5	5258	5277	5282	5277
q6	214	124	125	124
q7	1818	1418	1415	1415
q8	3189	3326	3385	3326
q9	8509	8473	8503	8473
q10	3847	3702	3659	3659
q11	582	485	487	485
q12	762	602	625	602
q13	17692	2953	2950	2950
q14	297	270	281	270
q15	512	467	475	467
q16	483	432	424	424
q17	1778	1479	1477	1477
q18	7458	7405	7288	7288
q19	1642	1534	1559	1534
q20	1928	1746	1732	1732
q21	5016	4846	4834	4834
q22	534	472	463	463
Total cold run time: 70927 ms
Total hot run time: 53551 ms

@yujun777
Contributor Author

This change would cause queries to be retried many times, which keeps the FE busy.

@yujun777 closed this on Apr 19, 2024
@doris-robot

TPC-DS: Total hot run time: 184787 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e5b8cb3e4e8e9924740b555f77abb96765daed19, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	906	377	361	361
query2	7240	2438	2404	2404
query3	6651	209	215	209
query4	23675	21393	21404	21393
query5	4160	418	412	412
query6	271	175	173	173
query7	4585	293	281	281
query8	240	197	185	185
query9	8565	2316	2321	2316
query10	573	245	238	238
query11	14667	14189	14075	14075
query12	144	89	91	89
query13	1639	363	369	363
query14	10012	8025	7900	7900
query15	258	186	188	186
query16	8188	266	259	259
query17	2061	610	561	561
query18	2013	283	274	274
query19	342	160	161	160
query20	91	86	86	86
query21	208	123	164	123
query22	4978	4840	4789	4789
query23	34041	33151	33022	33022
query24	11862	3037	2982	2982
query25	642	367	370	367
query26	1721	150	151	150
query27	3019	303	315	303
query28	7418	2004	1998	1998
query29	1017	602	590	590
query30	282	175	170	170
query31	934	728	760	728
query32	87	58	54	54
query33	745	263	239	239
query34	1081	473	468	468
query35	834	703	691	691
query36	1030	908	919	908
query37	283	71	68	68
query38	3373	3154	3189	3154
query39	1574	1530	1522	1522
query40	268	131	126	126
query41	47	43	46	43
query42	104	99	97	97
query43	567	523	530	523
query44	1197	735	748	735
query45	279	277	271	271
query46	1081	720	742	720
query47	1946	1865	1836	1836
query48	351	294	294	294
query49	1131	407	379	379
query50	754	383	375	375
query51	6806	6632	6591	6591
query52	97	89	87	87
query53	354	276	274	274
query54	306	229	228	228
query55	76	72	73	72
query56	244	225	219	219
query57	1211	1113	1116	1113
query58	222	195	190	190
query59	3333	2984	3103	2984
query60	242	227	226	226
query61	91	86	87	86
query62	663	446	433	433
query63	305	283	279	279
query64	6371	3918	3804	3804
query65	3101	3098	3048	3048
query66	1398	340	332	332
query67	15393	15041	15125	15041
query68	6745	550	540	540
query69	533	310	299	299
query70	1294	1187	1178	1178
query71	1515	1256	1258	1256
query72	6444	2660	2444	2444
query73	726	315	321	315
query74	6866	6492	6456	6456
query75	3962	2640	2626	2626
query76	4924	992	958	958
query77	640	258	266	258
query78	10961	10179	10181	10179
query79	4024	530	517	517
query80	1128	420	434	420
query81	519	245	238	238
query82	1308	90	101	90
query83	212	165	164	164
query84	264	96	84	84
query85	1473	263	258	258
query86	486	297	305	297
query87	3422	3291	3316	3291
query88	4744	2310	2325	2310
query89	480	373	370	370
query90	2007	177	179	177
query91	123	96	95	95
query92	57	45	46	45
query93	4879	518	507	507
query94	1158	177	175	175
query95	394	305	289	289
query96	610	267	259	259
query97	3144	2936	2955	2936
query98	240	223	218	218
query99	1263	881	904	881
Total cold run time: 295893 ms
Total hot run time: 184787 ms

@doris-robot

ClickBench: Total hot run time: 30.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e5b8cb3e4e8e9924740b555f77abb96765daed19, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.04
query2	0.07	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.10	0.08
query5	0.50	0.51	0.51
query6	1.47	0.71	0.71
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.54	0.49	0.49
query10	0.53	0.55	0.54
query11	0.16	0.11	0.12
query12	0.14	0.12	0.12
query13	0.61	0.58	0.58
query14	0.75	0.77	0.77
query15	0.82	0.80	0.80
query16	0.36	0.37	0.36
query17	0.94	0.95	1.00
query18	0.20	0.25	0.24
query19	1.80	1.81	1.71
query20	0.01	0.01	0.01
query21	15.42	0.66	0.66
query22	4.06	7.27	2.02
query23	18.22	1.40	1.37
query24	1.65	0.31	0.22
query25	0.15	0.10	0.08
query26	0.25	0.16	0.16
query27	0.08	0.08	0.08
query28	13.30	0.99	0.97
query29	12.59	3.25	3.22
query30	0.26	0.06	0.06
query31	2.87	0.38	0.37
query32	3.29	0.46	0.47
query33	2.87	2.80	2.80
query34	17.05	4.43	4.39
query35	4.46	4.46	4.49
query36	0.65	0.47	0.47
query37	0.20	0.16	0.15
query38	0.16	0.14	0.14
query39	0.04	0.03	0.03
query40	0.17	0.15	0.14
query41	0.10	0.05	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.03
Total cold run time: 108.85 s
Total hot run time: 30.51 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit e5b8cb3e4e8e9924740b555f77abb96765daed19 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.4 seconds inserted 10000000 Rows, about 746K ops/s
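
The reported rates appear consistent with bytes divided by seconds and then by 2^20, truncated to an integer, so "MB/s" here seems to mean MiB/s (the parquet load works out to roughly 25.7, reported as 25). The sketch below reproduces the four figures under that assumption. The class `LoadThroughput` is hypothetical and is not the bot's actual script.

```java
// Hypothetical helper; assumes MB means 2^20 bytes and results are truncated.
final class LoadThroughput {
    // Integer division truncates, matching the reported whole-number rates.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1024L * 1024L);
    }

    // Thousands of rows per second, truncated to a whole number.
    static long kiloRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000.0);
    }
}
```

For example, `LoadThroughput.mibPerSecond(2358488459L, 19L)` corresponds to the JSON stream load line, and `LoadThroughput.kiloRowsPerSecond(10000000L, 13.4)` to the insert-into-select line.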
