[opt](routine-load) enhance auto resume to keep routine load stable #32239

Merged
merged 1 commit into from Mar 18, 2024

Contributor

@HHoflittlefish777 HHoflittlefish777 commented Mar 14, 2024

Proposed changes

Auto resume can recover from several transient failures and keep routine load jobs stable.

Examples of such failures:

  1. A BE restart, or a BE shutdown for an upgrade
  2. An RPC timeout
  3. A commit failure caused by an unforeseen error

Auto-resuming in these situations saves users from having to resume the job manually, which improves the user experience.
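As an illustration only, the idea can be sketched as follows. This is not the code changed in this PR: the class, enum, and function names are hypothetical, and the cap on resume attempts (`max_attempts`) is an assumption added so the sketch cannot loop forever on a failure that is not actually transient.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class PauseCause(Enum):
    """Hypothetical pause causes; these are not Doris's real error codes."""
    BE_UNAVAILABLE = auto()  # BE restarted or shut down for an upgrade
    RPC_TIMEOUT = auto()
    COMMIT_FAILED = auto()   # commit failed for an unforeseen error
    USER_PAUSED = auto()     # paused on purpose by the user


# Causes this sketch treats as transient and therefore worth retrying.
TRANSIENT_CAUSES = frozenset({
    PauseCause.BE_UNAVAILABLE,
    PauseCause.RPC_TIMEOUT,
    PauseCause.COMMIT_FAILED,
})


@dataclass
class RoutineLoadJob:
    """Minimal stand-in for a routine load job (hypothetical)."""
    state: str = "RUNNING"
    pause_cause: Optional[PauseCause] = None
    auto_resume_count: int = 0

    def pause(self, cause: PauseCause) -> None:
        self.state = "PAUSED"
        self.pause_cause = cause


def should_auto_resume(job: RoutineLoadJob, max_attempts: int = 3) -> bool:
    """Resume only paused jobs whose cause is transient and under the cap."""
    return (
        job.state == "PAUSED"
        and job.pause_cause in TRANSIENT_CAUSES
        and job.auto_resume_count < max_attempts
    )


def auto_resume(job: RoutineLoadJob, max_attempts: int = 3) -> bool:
    """Resume the job if allowed; return True when it was resumed."""
    if not should_auto_resume(job, max_attempts):
        return False
    job.state = "RUNNING"
    job.pause_cause = None
    job.auto_resume_count += 1
    return True
```

In this sketch a job paused by the user stays paused, while a job paused by one of the three failure cases above is put back into the running state without manual intervention.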

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@HHoflittlefish777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38480 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5b985eb4d2d409f25c6ee9136dc84dbcf8a65cbd, data reload: false

------ Round 1 ----------------------------------
q1	17737	4262	4157	4157
q2	2021	157	155	155
q3	10660	1127	907	907
q4	7777	744	730	730
q5	7491	2755	2719	2719
q6	186	126	127	126
q7	1148	828	815	815
q8	9414	2018	1990	1990
q9	7097	6437	6350	6350
q10	8514	3522	3641	3522
q11	438	221	217	217
q12	627	313	300	300
q13	17808	2855	2845	2845
q14	275	243	256	243
q15	493	458	449	449
q16	508	390	384	384
q17	955	538	604	538
q18	7324	6550	6479	6479
q19	3828	1395	1431	1395
q20	544	285	276	276
q21	6240	3571	3607	3571
q22	346	312	331	312
Total cold run time: 111431 ms
Total hot run time: 38480 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4166	4134	4089	4089
q2	325	224	223	223
q3	2972	2796	2812	2796
q4	1849	1534	1583	1534
q5	5219	5217	5247	5217
q6	192	123	116	116
q7	2190	1865	1865	1865
q8	3183	3298	3258	3258
q9	8620	8594	8564	8564
q10	3791	3778	3732	3732
q11	545	465	464	464
q12	749	562	552	552
q13	16921	2837	2863	2837
q14	269	245	255	245
q15	486	446	449	446
q16	448	418	406	406
q17	1747	1489	1445	1445
q18	7600	7146	7101	7101
q19	1591	1543	1528	1528
q20	1897	1684	1715	1684
q21	4940	4712	4790	4712
q22	530	468	456	456
Total cold run time: 70230 ms
Total hot run time: 53270 ms
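The report does not label its columns. Judging from the numbers alone, each row appears to hold the query name, one cold run, two hot runs, and the faster of the two hot runs, all in milliseconds. That reading is an inference, not something the bot states. A small sketch that checks it against the Round 1 rows above (the function name `summarize` is made up for this example):

```python
# Round 1 rows copied from the report above.
ROUND_1 = """\
q1 17737 4262 4157 4157
q2 2021 157 155 155
q3 10660 1127 907 907
q4 7777 744 730 730
q5 7491 2755 2719 2719
q6 186 126 127 126
q7 1148 828 815 815
q8 9414 2018 1990 1990
q9 7097 6437 6350 6350
q10 8514 3522 3641 3522
q11 438 221 217 217
q12 627 313 300 300
q13 17808 2855 2845 2845
q14 275 243 256 243
q15 493 458 449 449
q16 508 390 384 384
q17 955 538 604 538
q18 7324 6550 6479 6479
q19 3828 1395 1431 1395
q20 544 285 276 276
q21 6240 3571 3607 3571
q22 346 312 331 312
"""


def summarize(report: str) -> tuple:
    """Return (cold_total, hot_total) in ms under the assumed column layout."""
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        fields = line.split()  # works for tab- or space-separated rows
        if not fields:
            continue
        query = fields[0]
        cold, hot1, hot2, best = (int(x) for x in fields[1:5])
        # Assumed layout: the last column is the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"{query}: last column is not min(hot1, hot2)")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND_1)
print(f"Total cold run time: {cold_total} ms")
print(f"Total hot run time: {hot_total} ms")
```

Under this reading the script reproduces the reported Round 1 totals of 111431 ms (cold) and 38480 ms (hot).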

@HHoflittlefish777 HHoflittlefish777 changed the title from "[opt](routine-load) support auto resume when get partition failed" to "[opt](routine-load) enhance auto resume to keep routine load stable" Mar 15, 2024
@HHoflittlefish777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38222 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4db900540fdd2910815d11d1eb2d49b6cb1ddebd, data reload: false

------ Round 1 ----------------------------------
q1	17663	4340	4138	4138
q2	2025	156	148	148
q3	10682	1062	899	899
q4	7439	736	705	705
q5	7473	2694	2567	2567
q6	187	121	124	121
q7	1187	815	813	813
q8	9336	2024	2011	2011
q9	7131	6415	6417	6415
q10	8446	3524	3643	3524
q11	426	226	221	221
q12	629	289	286	286
q13	17777	2811	2838	2811
q14	281	252	243	243
q15	496	454	466	454
q16	496	392	390	390
q17	961	564	549	549
q18	7245	6535	6392	6392
q19	1872	1495	1406	1406
q20	562	294	275	275
q21	6257	3610	3554	3554
q22	360	300	314	300
Total cold run time: 108931 ms
Total hot run time: 38222 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4470	4121	4086	4086
q2	324	230	224	224
q3	2985	2811	2827	2811
q4	1841	1538	1516	1516
q5	5223	5245	5248	5245
q6	192	115	117	115
q7	2250	1847	1859	1847
q8	3151	3317	3297	3297
q9	8538	8570	8584	8570
q10	3683	3764	3674	3674
q11	524	466	450	450
q12	727	543	548	543
q13	16872	2850	2857	2850
q14	287	248	257	248
q15	477	435	438	435
q16	456	419	418	418
q17	1751	1466	1473	1466
q18	7553	7159	7213	7159
q19	1581	1576	1563	1563
q20	1885	1717	1741	1717
q21	4787	4684	4613	4613
q22	504	453	459	453
Total cold run time: 70061 ms
Total hot run time: 53300 ms

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Mar 16, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@feifeifeimoon feifeifeimoon left a comment

LGTM


6 participants