[optimize](cooldown)Reduce the number of calls to the pick_cooldown_rowset #27091

Merged
merged 1 commit into from
Dec 28, 2023

xingyingone
Contributor

…owset

Proposed changes

Issue Number: close #27055

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
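The description above does not spell out the change. Judging from the title and the diff fragments quoted later in this conversation, one plausible reading is that the rowset is now picked once during the "does this tablet need cooldown?" check and then handed to the cooldown step, rather than being picked a second time there. The sketch below illustrates that idea only; `Tablet`, `Rowset`, `pick_calls`, and the `*_old` / `*_new` method names are hypothetical stand-ins and not the Doris implementation.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only; not the Doris classes.
struct Rowset { int id = 0; };
using RowsetSharedPtr = std::shared_ptr<Rowset>;

struct Tablet {
    int pick_calls = 0;  // counts how often the selection step runs

    // Stand-in for a potentially expensive selection step.
    RowsetSharedPtr pick_cooldown_rowset() {
        ++pick_calls;
        return std::make_shared<Rowset>();
    }

    // Old shape: the check answers yes/no and throws the picked rowset away,
    // so the cooldown step has to pick again.
    bool need_cooldown_old() { return pick_cooldown_rowset() != nullptr; }
    bool cooldown_old() {
        RowsetSharedPtr rowset = pick_cooldown_rowset();  // second call
        return rowset != nullptr;
    }

    // New shape: the check returns the picked rowset (nullptr means "no"),
    // and the cooldown step receives it as an argument.
    RowsetSharedPtr need_cooldown_new() { return pick_cooldown_rowset(); }
    bool cooldown_new(RowsetSharedPtr rowset) { return rowset != nullptr; }
};

int calls_old() {
    Tablet t;
    if (t.need_cooldown_old()) {
        t.cooldown_old();
    }
    return t.pick_calls;
}

int calls_new() {
    Tablet t;
    if (RowsetSharedPtr rowset = t.need_cooldown_new()) {
        t.cooldown_new(std::move(rowset));
    }
    return t.pick_calls;
}
```

In this sketch the old flow runs the selection twice per tablet and the new flow runs it once.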

@xingyingone
Contributor Author

run buildall

@xingyingone xingyingone changed the title [optimize](cooldown)Reduce the number of calls to the pick cooldown r… [optimize](cooldown)Reduce the number of calls to the pick_cooldown_rowset Nov 16, 2023
Contributor

@alexxing662 alexxing662 left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xingyingone
Contributor Author

run buildall

@xingyingone
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xingyingone
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xingyingone
Contributor Author

run COMPILE

@xingyingone
Contributor Author

run COMPILE (DORIS_COMPILE)

@xingyingone
Contributor Author

run compile(doris_compile)

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.75% (8410/22883)
Line Coverage: 29.27% (68402/233664)
Region Coverage: 27.87% (35359/126877)
Branch Coverage: 24.62% (18065/73368)
Coverage Report: http://coverage.selectdb-in.cc/coverage/02ed13f97d64cd77994102ee65cdbe030c832908_02ed13f97d64cd77994102ee65cdbe030c832908/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.76 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17098446632 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 02ed13f97d64cd77994102ee65cdbe030c832908, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4948	4697	4638	4638
q2	359	156	159	156
q3	2029	1947	1932	1932
q4	1390	1253	1193	1193
q5	4019	3969	4044	3969
q6	255	133	132	132
q7	1425	887	883	883
q8	2756	2791	2782	2782
q9	9752	9698	9707	9698
q10	3465	3499	3524	3499
q11	381	251	249	249
q12	439	295	298	295
q13	4577	3858	3813	3813
q14	319	282	278	278
q15	590	534	529	529
q16	674	585	586	585
q17	1136	949	974	949
q18	7840	7269	7420	7269
q19	1672	1699	1699	1699
q20	519	317	303	303
q21	4386	4001	3985	3985
q22	472	375	379	375
Total cold run time: 53403 ms
Total hot run time: 49211 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4587	4600	4573	4573
q2	351	272	262	262
q3	4037	4017	4030	4017
q4	2723	2694	2689	2689
q5	9692	9741	9667	9667
q6	245	125	126	125
q7	2616	2304	2257	2257
q8	4402	4436	4376	4376
q9	13247	13162	13133	13133
q10	4099	4163	4182	4163
q11	851	677	714	677
q12	969	799	817	799
q13	4314	3581	3578	3578
q14	384	348	343	343
q15	582	510	515	510
q16	733	676	690	676
q17	3861	3920	3859	3859
q18	9650	9094	8939	8939
q19	1811	1767	1773	1767
q20	2373	2061	2045	2045
q21	9034	8705	8695	8695
q22	902	761	804	761
Total cold run time: 81463 ms
Total hot run time: 77911 ms

- task.work_function = [tablet, task_size = tablets.size(), this]() {
-     Status st = tablet->cooldown();
+ RowsetSharedPtr rowset = rowsets[index++];
+ task.work_function = [tablet, rowset, task_size = tablets.size(), this]() {
Contributor

It seems we could use std::move here to avoid two shared_ptr copies.
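To make the suggestion concrete, here is a minimal sketch of the difference between capturing a `std::shared_ptr` by copy and moving it into the lambda with an init-capture. `Rowset` is a hypothetical placeholder type and the two helper functions exist only for this illustration; the reference count is read through `use_count()`, which is reliable here because the sketch is single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical placeholder type for illustration only; not the Doris class.
struct Rowset { int id = 0; };
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Capture by copy: the local handle and the lambda's copy both own the
// object, so the capture costs one atomic reference-count increment.
long count_after_copy_capture() {
    RowsetSharedPtr rowset = std::make_shared<Rowset>();
    auto task = [rowset]() { return rowset.use_count(); };
    return task();  // local `rowset` + captured copy
}

// Init-capture with std::move: ownership is transferred into the lambda,
// the local handle is left empty, and no extra reference is taken.
long count_after_move_capture() {
    RowsetSharedPtr rowset = std::make_shared<Rowset>();
    auto task = [rowset = std::move(rowset)]() { return rowset.use_count(); };
    return task();  // only the capture owns the object
}
```

Applied to the quoted diff, the same pattern would presumably mean moving the element out of `rowsets` and then moving the local into the capture list. That is only safe if `rowsets[index]` is not read again afterwards, which cannot be confirmed from the fragment shown here.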

@@ -2057,7 +2057,7 @@ Status Tablet::cooldown() {

  if (_cooldown_replica_id == replica_id()) {
      // this replica is cooldown replica
-     RETURN_IF_ERROR(_cooldown_data());
+     RETURN_IF_ERROR(_cooldown_data(rowset));
Contributor

Suggested change
- RETURN_IF_ERROR(_cooldown_data(rowset));
+ RETURN_IF_ERROR(_cooldown_data(std::move(rowset)));

RowsetSharedPtr old_rowset = NULL;

if (rowset) {
RowsetId rowset_id = rowset->rowset_id();
Contributor

Could we use const auto& here?
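As a rough illustration of what `const auto&` buys, the sketch below counts copy-constructions of a hypothetical `RowsetId`. It assumes the getter returns a const reference; if the real `rowset_id()` returns by value, binding it to `const auto&` only extends the temporary's lifetime and saves nothing. All names are stand-ins for this example.

```cpp
#include <cassert>
#include <cstdint>

// Global counter used only by this sketch.
int g_rowset_id_copies = 0;

// Hypothetical ID type that records each copy-construction.
struct RowsetId {
    int64_t hi = 0;
    int64_t lo = 0;
    RowsetId() = default;
    RowsetId(const RowsetId& other) : hi(other.hi), lo(other.lo) {
        ++g_rowset_id_copies;
    }
    RowsetId& operator=(const RowsetId&) = default;
};

// Hypothetical owner whose getter is assumed to return a const reference.
struct Rowset {
    RowsetId id;
    const RowsetId& rowset_id() const { return id; }
};

int copies_with_value(const Rowset& rowset) {
    g_rowset_id_copies = 0;
    RowsetId rowset_id = rowset.rowset_id();  // copy-constructs a new object
    (void)rowset_id;
    return g_rowset_id_copies;
}

int copies_with_const_ref(const Rowset& rowset) {
    g_rowset_id_copies = 0;
    const auto& rowset_id = rowset.rowset_id();  // binds to the existing object
    (void)rowset_id;
    return g_rowset_id_copies;
}
```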

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xingyingone
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.74% (8413/22896)
Line Coverage: 29.27% (68440/233820)
Region Coverage: 27.86% (35374/126977)
Branch Coverage: 24.62% (18075/73420)
Coverage Report: http://coverage.selectdb-in.cc/coverage/517aa4ad0c82c1fc051a231d5f3bca8fd306781a_517aa4ad0c82c1fc051a231d5f3bca8fd306781a/report/index.html

  int64_t id = storage_policy_id();
  if (id <= 0) {
      VLOG_DEBUG << "tablet does not need cooldown, tablet id: " << tablet_id();
-     return false;
+     return NULL;
Contributor

Please use nullptr instead of NULL.

Contributor Author

ok
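For context on why `nullptr` is usually preferred, here is a small sketch. `nullptr` has its own type (`std::nullptr_t`) and never converts to an integer, so it cannot accidentally select an integer overload; `NULL` is an implementation-defined null pointer constant and may do exactly that. `Rowset`, `which`, and `pick_rowset` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical placeholder type for illustration only.
struct Rowset { int id = 0; };
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Two overloads that make the difference visible. Passing NULL here could
// pick the int overload, depending on how the implementation defines NULL.
std::string which(int) { return "int"; }
std::string which(const RowsetSharedPtr&) { return "pointer"; }

// Hypothetical picker: an empty shared_ptr signals "nothing to cool down".
RowsetSharedPtr pick_rowset(long storage_policy_id) {
    if (storage_policy_id <= 0) {
        return nullptr;  // converts to an empty shared_ptr, never to an integer
    }
    return std::make_shared<Rowset>();
}
```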

@xingyingone
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xingyingone
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.73% (8411/22898)
Line Coverage: 29.25% (68413/233889)
Region Coverage: 27.84% (35366/127055)
Branch Coverage: 24.58% (18060/73470)
Coverage Report: http://coverage.selectdb-in.cc/coverage/42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c_42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4959	4720	4660	4660
q2	359	155	158	155
q3	2060	2007	1952	1952
q4	1392	1208	1176	1176
q5	4010	3999	4011	3999
q6	249	131	135	131
q7	1397	867	877	867
q8	2772	2809	2789	2789
q9	9780	9831	9692	9692
q10	3503	3548	3539	3539
q11	369	237	240	237
q12	439	285	291	285
q13	4609	3810	3840	3810
q14	315	294	283	283
q15	588	528	515	515
q16	659	592	575	575
q17	1142	966	967	966
q18	7887	7436	7512	7436
q19	1684	1676	1681	1676
q20	539	324	327	324
q21	4387	3972	3996	3972
q22	471	389	370	370
Total cold run time: 53570 ms
Total hot run time: 49409 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4587	4571	4602	4571
q2	353	241	232	232
q3	4028	4015	4009	4009
q4	2715	2730	2708	2708
q5	9650	9635	9686	9635
q6	243	118	125	118
q7	2588	2285	2314	2285
q8	4481	4448	4462	4448
q9	13248	13206	13149	13149
q10	4090	4180	4191	4180
q11	780	694	635	635
q12	980	812	801	801
q13	4315	3631	3562	3562
q14	374	359	364	359
q15	576	523	510	510
q16	737	673	661	661
q17	3867	3881	3855	3855
q18	9559	9143	9033	9033
q19	1833	1807	1795	1795
q20	2421	2057	2041	2041
q21	8878	8564	8781	8564
q22	914	808	791	791
Total cold run time: 81217 ms
Total hot run time: 77942 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.09 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.8 seconds inserted 10000000 Rows, about 335K ops/s
storage size: 17098657228 Bytes

@ByteYue
Contributor

ByteYue commented Nov 18, 2023

LGTM

@xingyingone
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4915	4665	4686	4665
q2	380	150	172	150
q3	2037	1926	1917	1917
q4	1370	1230	1221	1221
q5	3944	3944	4019	3944
q6	248	130	133	130
q7	1432	884	885	884
q8	2750	2778	2751	2751
q9	12274	12816	9597	9597
q10	10269	3535	3537	3535
q11	404	258	249	249
q12	449	302	303	302
q13	4574	3837	3805	3805
q14	315	289	291	289
q15	598	538	526	526
q16	675	587	589	587
q17	1137	987	946	946
q18	7839	7252	7345	7252
q19	1699	1675	1655	1655
q20	568	320	317	317
q21	4351	3955	4021	3955
q22	468	371	368	368
Total cold run time: 62696 ms
Total hot run time: 49045 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4589	4574	4594	4574
q2	338	224	242	224
q3	4023	4015	4021	4015
q4	2710	2687	2682	2682
q5	9727	9687	9697	9687
q6	239	123	127	123
q7	2997	2463	2488	2463
q8	4435	4495	4469	4469
q9	13213	13069	13197	13069
q10	4108	4221	4175	4175
q11	764	671	656	656
q12	975	801	816	801
q13	4331	3612	3585	3585
q14	383	360	352	352
q15	585	514	522	514
q16	740	664	680	664
q17	3848	3920	3865	3865
q18	9473	9036	8795	8795
q19	1838	1735	1771	1735
q20	2381	2066	2071	2066
q21	8735	8560	8708	8560
q22	936	859	819	819
Total cold run time: 81368 ms
Total hot run time: 77893 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.52% (8446/23125)
Line Coverage: 28.86% (68656/237926)
Region Coverage: 27.82% (35503/127629)
Branch Coverage: 24.56% (18096/73682)
Coverage Report: http://coverage.selectdb-in.cc/coverage/42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c_42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.86 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17099607428 Bytes

@xingyingone
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.53% (8447/23125)
Line Coverage: 28.86% (68687/237961)
Region Coverage: 27.82% (35518/127648)
Branch Coverage: 24.58% (18117/73696)
Coverage Report: http://coverage.selectdb-in.cc/coverage/42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c_42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 42ed4c5b31ecbe8e765bf889cbd04bb83e31d39c, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4926	4688	4676	4676
q2	370	147	156	147
q3	2036	1894	1891	1891
q4	1382	1269	1237	1237
q5	3974	3984	3971	3971
q6	246	135	132	132
q7	1401	878	892	878
q8	2736	2783	2752	2752
q9	53259	10513	9546	9546
q10	10274	3524	3533	3524
q11	377	246	248	246
q12	1167	297	293	293
q13	4588	3819	3816	3816
q14	317	279	297	279
q15	589	533	528	528
q16	674	591	583	583
q17	1155	994	923	923
q18	7737	7361	7254	7254
q19	1725	1679	1667	1667
q20	567	298	307	298
q21	6892	4002	3971	3971
q22	478	376	378	376
Total cold run time: 106870 ms
Total hot run time: 48988 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4625	4565	4561	4561
q2	337	251	267	251
q3	4026	3992	3977	3977
q4	2698	2706	2700	2700
q5	9684	9668	9547	9547
q6	249	127	125	125
q7	2998	2539	2517	2517
q8	4449	4411	4426	4411
q9	13212	13111	13180	13111
q10	4095	4190	4205	4190
q11	765	686	670	670
q12	987	831	812	812
q13	4312	3590	3556	3556
q14	380	356	349	349
q15	587	520	535	520
q16	743	689	710	689
q17	3954	3871	3923	3871
q18	9594	9089	9107	9089
q19	1822	1760	1772	1760
q20	2382	2067	2070	2067
q21	8733	8745	8580	8580
q22	914	827	782	782
Total cold run time: 81546 ms
Total hot run time: 78135 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.07 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.4 seconds inserted 10000000 Rows, about 352K ops/s
storage size: 17099502313 Bytes

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit fd90c3a into apache:master Dec 28, 2023
27 of 28 checks passed
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 28, 2023
Contributor

PR approved by at least one committer and no changes requested.

hello-stephen pushed a commit to hello-stephen/doris that referenced this pull request Dec 28, 2023
…owset (apache#27091)

Co-authored-by: xingying01 <xingying01@corp.netease.com>
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
…owset (apache#27091)

Co-authored-by: xingying01 <xingying01@corp.netease.com>
Labels
approved Indicates a PR has been approved by one committer. reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] optimize for cooldown
5 participants