[enhancement] Optimize the retry policy of backend request meta service #50957

Merged: 1 commit, May 16, 2025

@luwei16 (Contributor) commented May 15, 2025

  1. Distinguish between request timeouts, connection timeouts, and business errors (transaction conflicts), and apply different retry configurations for each.
  2. Reduce the number of retry attempts.
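
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `RpcOutcome` categories, `RetryConfig`, `example_policy`, `call_with_retry`, and every attempt count and backoff value are hypothetical names and placeholder numbers chosen for the example. Each failure category gets its own small attempt budget, and any category without a configuration is not retried.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>

// Hypothetical failure categories; illustrative names, not Doris identifiers.
enum class RpcOutcome { kOk, kRequestTimeout, kConnectTimeout, kTxnConflict, kFatal };

// Hypothetical per-category retry settings.
struct RetryConfig {
    int max_attempts;                   // tries allowed while failing in this category
    std::chrono::milliseconds backoff;  // fixed pause before the next try
};

using RetryPolicy = std::map<RpcOutcome, RetryConfig>;

// Placeholder values for illustration only; they are not taken from the PR.
inline RetryPolicy example_policy() {
    using std::chrono::milliseconds;
    return {
        {RpcOutcome::kRequestTimeout, {2, milliseconds(100)}},
        {RpcOutcome::kConnectTimeout, {3, milliseconds(50)}},
        {RpcOutcome::kTxnConflict, {5, milliseconds(20)}},
    };
}

struct RetryResult {
    RpcOutcome last;  // outcome of the final call
    int attempts;     // number of times rpc() was invoked
};

// Calls `rpc` until it succeeds, fails with a category that has no retry
// config, or uses up the budget of the category it just failed with.
// Each category keeps its own failure counter.
inline RetryResult call_with_retry(
        const std::function<RpcOutcome()>& rpc, const RetryPolicy& policy,
        const std::function<void(std::chrono::milliseconds)>& sleep_fn) {
    std::map<RpcOutcome, int> failures;
    int attempts = 0;
    while (true) {
        RpcOutcome outcome = rpc();
        ++attempts;
        if (outcome == RpcOutcome::kOk) return {outcome, attempts};
        auto it = policy.find(outcome);
        if (it == policy.end()) return {outcome, attempts};  // not retryable
        assert(it->second.max_attempts >= 1);
        if (++failures[outcome] >= it->second.max_attempts) return {outcome, attempts};
        sleep_fn(it->second.backoff);
    }
}
```

Passing the sleep function in as a parameter keeps the loop testable without real delays; a caller would typically supply a wrapper around `std::this_thread::sleep_for`.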

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@luwei16 (Contributor, Author) commented May 15, 2025

run buildall

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the labels "approved" (indicates a PR has been approved by one committer) and "reviewed" on May 15, 2025
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 33681 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6e5804b7dcf8ff963f61dc26c1ba2d49e3d7d926, data reload: false

------ Round 1 ----------------------------------
q1	26120	5050	5223	5050
q2	2082	278	192	192
q3	10390	1261	689	689
q4	10223	1008	528	528
q5	7521	2380	2346	2346
q6	181	163	132	132
q7	917	738	622	622
q8	9310	1268	1074	1074
q9	6899	5047	5150	5047
q10	6895	2322	1897	1897
q11	523	301	268	268
q12	348	352	214	214
q13	17772	3640	3065	3065
q14	227	242	206	206
q15	544	479	489	479
q16	406	432	368	368
q17	609	864	357	357
q18	7570	7179	6995	6995
q19	1214	937	567	567
q20	326	334	225	225
q21	3922	3169	2397	2397
q22	1062	1019	963	963
Total cold run time: 115061 ms
Total hot run time: 33681 ms
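
The report does not label its columns. One reading that is consistent with the numbers, offered here as an inference rather than a documented format, is: cold run, first hot run, second hot run, and the faster of the two hot runs. Under that assumption the two totals can be recomputed from the rows, as in this small sketch (the `Row`, `Totals`, `round1_rows`, and `summarize` names are made up for the example):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One benchmark row: a cold run followed by two hot runs, in milliseconds.
// This column interpretation is an assumption inferred from the data.
struct Row {
    const char* query;
    int cold_ms;
    int hot1_ms;
    int hot2_ms;
};

struct Totals {
    long cold_ms;
    long hot_ms;
};

// Round 1 rows copied from the report (first three numeric columns).
inline std::vector<Row> round1_rows() {
    return {
        {"q1", 26120, 5050, 5223},  {"q2", 2082, 278, 192},
        {"q3", 10390, 1261, 689},   {"q4", 10223, 1008, 528},
        {"q5", 7521, 2380, 2346},   {"q6", 181, 163, 132},
        {"q7", 917, 738, 622},      {"q8", 9310, 1268, 1074},
        {"q9", 6899, 5047, 5150},   {"q10", 6895, 2322, 1897},
        {"q11", 523, 301, 268},     {"q12", 348, 352, 214},
        {"q13", 17772, 3640, 3065}, {"q14", 227, 242, 206},
        {"q15", 544, 479, 489},     {"q16", 406, 432, 368},
        {"q17", 609, 864, 357},     {"q18", 7570, 7179, 6995},
        {"q19", 1214, 937, 567},    {"q20", 326, 334, 225},
        {"q21", 3922, 3169, 2397},  {"q22", 1062, 1019, 963},
    };
}

// Sums the cold runs and, per query, the faster of the two hot runs.
inline Totals summarize(const std::vector<Row>& rows) {
    Totals t{0, 0};
    for (const Row& r : rows) {
        assert(r.cold_ms >= 0 && r.hot1_ms >= 0 && r.hot2_ms >= 0);
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}
```

Calling `summarize(round1_rows())` yields 115061 ms cold and 33681 ms hot, matching the totals reported for Round 1.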

----- Round 2, with runtime_filter_mode=off -----
q1	5053	5050	5022	5022
q2	238	327	225	225
q3	2137	2677	2249	2249
q4	1352	1759	1347	1347
q5	4454	4390	4372	4372
q6	221	172	125	125
q7	2038	1917	1793	1793
q8	2606	2673	2564	2564
q9	7330	7277	7043	7043
q10	3052	3143	2782	2782
q11	591	522	491	491
q12	662	788	618	618
q13	3539	3951	3258	3258
q14	280	307	290	290
q15	543	520	503	503
q16	463	482	436	436
q17	1143	1566	1373	1373
q18	7609	7509	7356	7356
q19	815	854	985	854
q20	2015	2044	1905	1905
q21	4898	4529	4477	4477
q22	1112	1075	1015	1015
Total cold run time: 52151 ms
Total hot run time: 50098 ms

@doris-robot

TPC-DS: Total hot run time: 193632 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6e5804b7dcf8ff963f61dc26c1ba2d49e3d7d926, data reload: false

query1	1403	1103	1034	1034
query2	6239	1897	1891	1891
query3	11011	4639	4490	4490
query4	55100	24764	23325	23325
query5	5001	505	462	462
query6	335	224	195	195
query7	4876	506	297	297
query8	309	261	269	261
query9	5266	2629	2622	2622
query10	462	338	265	265
query11	14953	15035	15128	15035
query12	171	108	112	108
query13	1030	515	407	407
query14	10120	6448	6406	6406
query15	213	202	174	174
query16	7119	659	507	507
query17	1091	746	597	597
query18	1547	441	319	319
query19	200	199	172	172
query20	128	129	124	124
query21	222	132	112	112
query22	4316	4454	4442	4442
query23	34347	33597	33658	33597
query24	6590	2462	2439	2439
query25	470	488	402	402
query26	701	281	152	152
query27	2194	514	349	349
query28	3108	2155	2147	2147
query29	606	569	437	437
query30	274	220	196	196
query31	870	892	799	799
query32	78	68	60	60
query33	462	376	333	333
query34	782	876	552	552
query35	782	814	752	752
query36	963	1044	891	891
query37	112	98	73	73
query38	4311	4290	4223	4223
query39	1536	1501	1449	1449
query40	204	123	114	114
query41	59	56	50	50
query42	131	115	107	107
query43	505	531	506	506
query44	1309	823	828	823
query45	184	186	185	185
query46	852	1136	655	655
query47	1851	1870	1810	1810
query48	393	431	359	359
query49	670	514	433	433
query50	690	701	422	422
query51	4248	4236	4217	4217
query52	111	109	100	100
query53	235	260	184	184
query54	589	570	525	525
query55	86	87	85	85
query56	296	330	305	305
query57	1183	1180	1158	1158
query58	275	263	280	263
query59	2738	2825	2708	2708
query60	333	340	326	326
query61	128	121	124	121
query62	739	737	643	643
query63	223	183	184	183
query64	1462	1010	698	698
query65	4289	4233	4217	4217
query66	705	402	307	307
query67	16221	15768	15248	15248
query68	8099	872	519	519
query69	561	303	268	268
query70	1181	1091	1106	1091
query71	493	316	314	314
query72	5423	4891	4884	4884
query73	1247	655	350	350
query74	9277	9117	8975	8975
query75	3802	3213	2698	2698
query76	4231	1199	752	752
query77	631	372	290	290
query78	10207	10290	9398	9398
query79	2508	866	593	593
query80	620	511	452	452
query81	505	249	227	227
query82	410	124	94	94
query83	382	271	227	227
query84	298	110	94	94
query85	801	356	310	310
query86	395	299	302	299
query87	4421	4504	4380	4380
query88	3397	2315	2276	2276
query89	415	328	288	288
query90	1833	211	215	211
query91	209	143	118	118
query92	70	59	55	55
query93	1899	963	593	593
query94	673	403	309	309
query95	403	296	285	285
query96	497	572	283	283
query97	2728	2793	2647	2647
query98	262	215	215	215
query99	1447	1395	1289	1289
Total cold run time: 299332 ms
Total hot run time: 193632 ms

@doris-robot

ClickBench: Total hot run time: 28.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6e5804b7dcf8ff963f61dc26c1ba2d49e3d7d926, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.10	0.11
query3	0.26	0.20	0.19
query4	1.59	0.18	0.19
query5	0.47	0.45	0.45
query6	1.16	0.70	0.69
query7	0.02	0.02	0.02
query8	0.03	0.03	0.04
query9	0.61	0.53	0.53
query10	0.58	0.57	0.58
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.63	0.60	0.61
query14	0.81	0.82	0.82
query15	0.90	0.89	0.88
query16	0.39	0.39	0.40
query17	1.04	1.02	1.05
query18	0.22	0.21	0.21
query19	1.92	1.86	1.85
query20	0.01	0.01	0.01
query21	15.39	0.86	0.55
query22	0.78	1.18	0.61
query23	15.01	1.36	0.65
query24	7.33	1.35	0.29
query25	0.30	0.21	0.07
query26	0.60	0.16	0.13
query27	0.05	0.05	0.05
query28	9.72	0.98	0.45
query29	12.58	4.03	3.40
query30	0.25	0.09	0.07
query31	2.83	0.61	0.40
query32	3.23	0.59	0.49
query33	3.12	3.10	3.14
query34	15.79	5.20	4.52
query35	4.59	4.58	4.52
query36	0.67	0.52	0.49
query37	0.08	0.06	0.07
query38	0.06	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.15	0.12
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.82 s
Total hot run time: 28.9 s

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.83% (14874/26640)
Line Coverage 44.63% (131807/295309)
Region Coverage 43.68% (66260/151679)
Branch Coverage 38.30% (33958/88664)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.56% (20867/26227)
Line Coverage 72.80% (214985/295320)
Region Coverage 70.94% (126425/178221)
Branch Coverage 64.65% (65485/101286)

@dataroaring (Contributor) left a comment

LGTM

@dataroaring dataroaring merged commit 5ee113f into apache:master May 16, 2025
25 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request May 16, 2025
…ce (#50957)

dataroaring pushed a commit that referenced this pull request May 17, 2025
…t meta service #50957 (#50987)

Cherry-picked from #50957

Co-authored-by: Luwei <luwei@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…ce (apache#50957)

Labels: approved, dev/3.0.6-merged, p0_b, reviewed
5 participants