[Refactor] Refactor workload group metric #46640

Merged
wangbo merged 1 commit into apache:master from wangbo:0108_refactor_metric
Jan 13, 2025
wangbo (Contributor) commented Jan 8, 2025

What problem does this PR solve?

  1. Refactor the workload group metric code for robustness.
  2. Use a new IO metric format that exposes the path and id as dimensions (labels) instead of defining a separate metric per path.

before:

workload_group_local_scan_bytes_spill_data_dir_0{workload_group="normal"} 17442781285
workload_group_local_scan_bytes_local_data_dir_0{workload_group="normal"} 17442781285

after:

doris_be_workload_group_mem_used_bytes{id="1",workload_group="normal"} 7931776
doris_be_workload_group_total_local_scan_bytes{id="1",workload_group="normal"} 7603881651
doris_be_workload_group_cpu_time_sec{id="1",workload_group="normal"} 121
doris_be_workload_group_remote_scan_bytes{id="1",workload_group="normal"} 0
doris_be_workload_group_local_scan_bytes{id="1",path="local_data_dir_1",workload_group="normal"} 7605062168
doris_be_workload_group_local_scan_bytes{id="1",path="spill_data_dir_1",workload_group="normal"} 7605063263
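
To illustrate the new format, here is a minimal sketch of how one sample line with label dimensions could be rendered. The function `format_sample` is hypothetical and is not part of Doris; it ignores label-value escaping and sorts labels by key only because `std::map` does so.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: renders name{k1="v1",k2="v2"} value.
// Labels come out sorted by key because std::map is ordered.
// Escaping of quotes or backslashes in label values is NOT handled.
std::string format_sample(const std::string& name,
                          const std::map<std::string, std::string>& labels,
                          uint64_t value) {
    assert(!name.empty());
    std::string out = name;
    if (!labels.empty()) {
        out += '{';
        bool first = true;
        for (const auto& [key, val] : labels) {
            if (!first) out += ',';
            out += key + "=\"" + val + "\"";
            first = false;
        }
        out += '}';
    }
    out += ' ' + std::to_string(value);
    return out;
}
```

With this shape, adding another data directory only adds another `path` label value under the same metric name, rather than a new metric name per directory.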

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Thearas (Contributor) commented Jan 8, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@@ -36,35 +43,23 @@ WorkloadGroupMetrics::WorkloadGroupMetrics(WorkloadGroup* wg) {
    _entity = DorisMetrics::instance()->metric_registry()->register_entity(
            "workload_group." + std::to_string(wg->id()), {{"workload_group", wg->name()}});
Contributor:

For a workload group that is re-created under the same name, will wg->id() here be the same?

Contributor Author:

The id is auto-incremented each time, so it never repeats.
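
The point about ids can be sketched as follows. This is only an illustration: where and how Doris actually assigns workload group ids is not shown in this thread, and `WorkloadGroupIdGenerator` and `entity_name` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical id source: each call yields a fresh, increasing id,
// regardless of whether a workload group name is being reused.
class WorkloadGroupIdGenerator {
public:
    uint64_t next() { return _next_id.fetch_add(1, std::memory_order_relaxed); }

private:
    std::atomic<uint64_t> _next_id {1};
};

// Mirrors the entity name expression in the diff: "workload_group." + id.
std::string entity_name(uint64_t wg_id) {
    return "workload_group." + std::to_string(wg_id);
}
```

Because the entity name is derived from the id rather than from the group name, dropping and re-creating a group called "normal" would produce a different entity name each time under this assumption.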

    std::shared_ptr<MetricEntity> io_entity =
            DorisMetrics::instance()->metric_registry()->register_entity(
                    data_dir_metric_name,
                    {{"workload_group", wg->name()}, {"path", data_dir.metric_name}});
Contributor:

For a workload group that is re-created under the same name, will this produce an entity with the same name and labels?

Contributor Author:

The dimensions (labels) will certainly repeat, since the group names are identical. However, the data_dir_metric_name used to register the entity will not repeat, because data_dir_metric_name contains the workload group id.
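
A minimal sketch of that distinction, assuming a registry that looks entities up by name only. `SketchRegistry`, `SketchEntity`, and the naming scheme in `data_dir_entity_name` are all hypothetical; the real format of data_dir_metric_name is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Labels = std::map<std::string, std::string>;

struct SketchEntity {
    std::string name;
    Labels labels;
};

// Hypothetical registry keyed by entity name; labels play no part in lookup.
class SketchRegistry {
public:
    std::shared_ptr<SketchEntity> register_entity(const std::string& name, Labels labels) {
        auto it = _entities.find(name);
        if (it != _entities.end()) return it->second;  // same name: reuse the entity
        auto entity = std::make_shared<SketchEntity>(SketchEntity{name, std::move(labels)});
        _entities.emplace(name, entity);
        return entity;
    }
    std::size_t size() const { return _entities.size(); }

private:
    std::map<std::string, std::shared_ptr<SketchEntity>> _entities;
};

// Assumed naming scheme: the workload group id is embedded in the entity name.
std::string data_dir_entity_name(uint64_t wg_id, const std::string& path) {
    return "workload_group." + std::to_string(wg_id) + "." + path;
}
```

Two groups named "normal" with ids 1 and 2 would then register two distinct entities even though their label sets are identical.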

std::atomic<uint64_t> _memory_used {0};

std::shared_ptr<MetricEntity> _entity {nullptr};
std::vector<std::shared_ptr<MetricEntity>> _io_entity_list;
Contributor:

Why shared_ptr? Maybe unique_ptr is enough.

Contributor Author:

register_entity returns shared_ptr
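
The ownership argument can be sketched like this, assuming the registry keeps its own reference to each entity it hands out. `OwningRegistry`, `IoEntity`, and `GroupMetricsHolder` are stand-ins, not the Doris classes.

```cpp
#include <memory>
#include <string>
#include <vector>

struct IoEntity {
    std::string name;
};

// Stand-in registry that retains a reference to everything it returns,
// so ownership is shared between the registry and the caller.
class OwningRegistry {
public:
    std::shared_ptr<IoEntity> register_entity(const std::string& name) {
        auto entity = std::make_shared<IoEntity>(IoEntity{name});
        _owned.push_back(entity);  // registry keeps one reference
        return entity;             // caller receives another
    }

private:
    std::vector<std::shared_ptr<IoEntity>> _owned;
};

// Holder analogous to the _io_entity_list member in the diff.
struct GroupMetricsHolder {
    std::vector<std::shared_ptr<IoEntity>> io_entity_list;
};
```

A `std::shared_ptr` cannot be converted into a `std::unique_ptr`, so when the registration call returns a shared pointer the holder has to store shared pointers as well.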

wangbo (Contributor Author) commented Jan 9, 2025

run buildall

doris-robot commented:

TeamCity be ut coverage result:
Function Coverage: 38.86% (10128/26061)
Line Coverage: 29.92% (85711/286467)
Region Coverage: 29.01% (43725/150712)
Branch Coverage: 25.55% (22319/87342)
Coverage Report: http://coverage.selectdb-in.cc/coverage/273e822e18f8b5dc4acaed6600c2a5a89944fce1_273e822e18f8b5dc4acaed6600c2a5a89944fce1/report/index.html

doris-robot commented:

TPC-H: Total hot run time: 32886 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 273e822e18f8b5dc4acaed6600c2a5a89944fce1, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17586	6165	6051	6051
q2	2060	299	172	172
q3	10420	1230	763	763
q4	10205	882	443	443
q5	7497	2181	2004	2004
q6	199	179	147	147
q7	903	739	621	621
q8	9223	1370	1162	1162
q9	5247	4996	4954	4954
q10	6764	2263	1881	1881
q11	487	276	266	266
q12	353	373	225	225
q13	17791	3631	3101	3101
q14	241	233	207	207
q15	563	518	508	508
q16	621	618	582	582
q17	561	843	330	330
q18	6836	6454	6469	6454
q19	2035	963	563	563
q20	305	319	185	185
q21	2774	2171	1962	1962
q22	368	337	305	305
Total cold run time: 103039 ms
Total hot run time: 32886 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	6308	6244	6251	6244
q2	230	326	235	235
q3	2270	2611	2306	2306
q4	1420	1850	1365	1365
q5	4339	4727	4802	4727
q6	194	174	141	141
q7	2078	1967	1812	1812
q8	2625	2855	2704	2704
q9	7279	7148	7250	7148
q10	3067	3401	2783	2783
q11	588	500	513	500
q12	701	780	619	619
q13	3503	3867	3241	3241
q14	276	324	264	264
q15	569	520	510	510
q16	683	704	636	636
q17	1194	1714	1255	1255
q18	7723	7441	7354	7354
q19	871	1167	1092	1092
q20	1982	2069	1910	1910
q21	5706	5056	4810	4810
q22	624	602	589	589
Total cold run time: 54230 ms
Total hot run time: 52245 ms

doris-robot commented:

TPC-DS: Total hot run time: 194693 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 273e822e18f8b5dc4acaed6600c2a5a89944fce1, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1302	935	917	917
query2	6407	2422	2422	2422
query3	11046	4666	4771	4666
query4	32945	23465	23105	23105
query5	3698	625	469	469
query6	282	198	196	196
query7	3985	483	302	302
query8	303	255	234	234
query9	9492	2645	2643	2643
query10	454	293	243	243
query11	17921	15356	15180	15180
query12	163	114	104	104
query13	1556	536	387	387
query14	10451	7684	7046	7046
query15	227	201	174	174
query16	8482	589	455	455
query17	1603	778	587	587
query18	1994	404	315	315
query19	217	190	158	158
query20	123	112	108	108
query21	205	122	101	101
query22	4549	4507	4523	4507
query23	34037	33282	34005	33282
query24	6559	2313	2343	2313
query25	509	487	413	413
query26	763	277	160	160
query27	2081	460	332	332
query28	5708	2504	2476	2476
query29	594	591	453	453
query30	213	184	154	154
query31	949	909	829	829
query32	75	65	59	59
query33	496	408	292	292
query34	754	829	522	522
query35	797	829	763	763
query36	1015	1044	968	968
query37	116	135	81	81
query38	4244	4108	3989	3989
query39	1476	1443	1451	1443
query40	207	122	108	108
query41	46	45	42	42
query42	127	108	103	103
query43	518	539	510	510
query44	1341	815	814	814
query45	179	180	181	180
query46	882	1045	643	643
query47	1866	1940	1859	1859
query48	403	407	321	321
query49	711	463	401	401
query50	641	692	402	402
query51	7083	6890	6870	6870
query52	103	102	93	93
query53	228	255	185	185
query54	486	495	421	421
query55	78	80	78	78
query56	248	261	253	253
query57	1185	1200	1189	1189
query58	242	245	235	235
query59	3149	3310	3124	3124
query60	276	265	261	261
query61	116	133	108	108
query62	875	773	750	750
query63	232	199	201	199
query64	3068	1029	659	659
query65	3304	3241	3218	3218
query66	737	406	312	312
query67	16361	15657	15316	15316
query68	8810	692	510	510
query69	488	290	253	253
query70	1198	1148	1042	1042
query71	432	278	245	245
query72	6549	3886	3901	3886
query73	645	746	355	355
query74	9841	8852	8976	8852
query75	4498	3168	2648	2648
query76	4677	1157	754	754
query77	850	362	268	268
query78	9961	9875	9475	9475
query79	5203	789	579	579
query80	730	511	428	428
query81	467	269	222	222
query82	232	149	122	122
query83	191	165	150	150
query84	286	90	72	72
query85	746	365	307	307
query86	354	331	300	300
query87	4642	4270	4233	4233
query88	3737	2165	2140	2140
query89	437	323	284	284
query90	2104	185	183	183
query91	138	135	105	105
query92	66	55	52	52
query93	3054	858	522	522
query94	688	406	297	297
query95	327	263	246	246
query96	482	607	278	278
query97	2875	2931	2794	2794
query98	213	208	201	201
query99	1610	1528	1388	1388
Total cold run time: 299014 ms
Total hot run time: 194693 ms

doris-robot commented:

ClickBench: Total hot run time: 31.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 273e822e18f8b5dc4acaed6600c2a5a89944fce1, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.05	0.03
query2	0.07	0.04	0.03
query3	0.23	0.08	0.06
query4	1.61	0.11	0.11
query5	0.43	0.43	0.40
query6	1.15	0.66	0.65
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.60	0.52	0.49
query10	0.57	0.56	0.55
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	2.75	2.73	2.88
query15	0.90	0.82	0.82
query16	0.39	0.38	0.39
query17	1.01	1.06	1.01
query18	0.23	0.20	0.20
query19	1.94	1.78	1.99
query20	0.02	0.00	0.02
query21	15.38	0.88	0.59
query22	0.75	0.76	0.55
query23	15.42	1.44	0.53
query24	3.30	1.17	1.70
query25	0.26	0.07	0.18
query26	0.18	0.14	0.14
query27	0.05	0.06	0.05
query28	14.19	1.47	1.05
query29	12.59	3.96	3.22
query30	0.25	0.10	0.06
query31	2.81	0.59	0.40
query32	3.25	0.55	0.46
query33	3.13	3.27	3.11
query34	16.77	5.12	4.51
query35	4.55	4.50	4.50
query36	0.64	0.48	0.48
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.18	0.13	0.14
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.95 s
Total hot run time: 31.31 s

wangbo force-pushed the 0108_refactor_metric branch from 273e822 to 36b4a3a on January 9, 2025 07:34
wangbo (Contributor Author) commented Jan 9, 2025

run buildall

doris-robot commented:

TPC-H: Total hot run time: 32216 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 36b4a3a80669e67e937bdb57714731f845eaa002, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17593	6097	6060	6060
q2	2041	302	185	185
q3	10406	1258	706	706
q4	10215	851	426	426
q5	7532	2148	1922	1922
q6	205	183	148	148
q7	885	742	611	611
q8	9248	1333	1111	1111
q9	5215	4960	4843	4843
q10	6738	2301	1863	1863
q11	476	286	266	266
q12	336	357	215	215
q13	17773	3704	2938	2938
q14	239	253	210	210
q15	551	515	482	482
q16	626	612	583	583
q17	540	840	322	322
q18	6995	6444	6340	6340
q19	1216	942	534	534
q20	299	310	195	195
q21	2802	2115	1954	1954
q22	357	327	302	302
Total cold run time: 102288 ms
Total hot run time: 32216 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	6142	6189	6213	6189
q2	237	326	241	241
q3	2227	2619	2342	2342
q4	1445	1799	1340	1340
q5	4331	4735	4629	4629
q6	180	175	145	145
q7	2070	1927	1826	1826
q8	2613	2807	2677	2677
q9	7202	7279	7147	7147
q10	3022	3337	2847	2847
q11	599	513	505	505
q12	644	782	607	607
q13	3363	3787	3184	3184
q14	291	317	291	291
q15	577	503	505	503
q16	663	687	641	641
q17	1227	1702	1256	1256
q18	7645	7533	7110	7110
q19	733	1118	1006	1006
q20	1922	1954	1820	1820
q21	5437	4976	4786	4786
q22	602	598	575	575
Total cold run time: 53172 ms
Total hot run time: 51667 ms

doris-robot commented:

TPC-DS: Total hot run time: 187492 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 36b4a3a80669e67e937bdb57714731f845eaa002, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	969	388	384	384
query2	6525	2270	2351	2270
query3	6719	223	215	215
query4	33628	23671	23592	23592
query5	4367	612	452	452
query6	294	191	183	183
query7	4624	499	305	305
query8	305	240	231	231
query9	9673	2713	2699	2699
query10	459	314	248	248
query11	18076	15017	15135	15017
query12	160	114	108	108
query13	1644	511	397	397
query14	10150	6562	7569	6562
query15	263	189	190	189
query16	8834	610	458	458
query17	1648	756	570	570
query18	2119	414	302	302
query19	246	176	152	152
query20	116	106	108	106
query21	207	129	103	103
query22	4166	4262	3966	3966
query23	34033	33084	32867	32867
query24	6361	2243	2268	2243
query25	500	438	370	370
query26	1180	267	148	148
query27	2067	465	338	338
query28	4926	2433	2400	2400
query29	744	526	426	426
query30	231	178	152	152
query31	953	892	800	800
query32	98	62	57	57
query33	508	330	291	291
query34	754	836	510	510
query35	814	815	715	715
query36	987	1046	914	914
query37	132	97	82	82
query38	4222	3971	4015	3971
query39	1471	1429	1407	1407
query40	198	109	97	97
query41	50	47	49	47
query42	121	102	101	101
query43	503	522	490	490
query44	1261	821	817	817
query45	174	169	167	167
query46	849	1045	637	637
query47	1816	1841	1772	1772
query48	378	393	321	321
query49	780	462	388	388
query50	618	658	390	390
query51	6872	6872	6761	6761
query52	103	98	90	90
query53	215	243	180	180
query54	472	481	425	425
query55	82	80	79	79
query56	241	245	231	231
query57	1141	1122	1101	1101
query58	231	220	234	220
query59	3044	3160	3023	3023
query60	288	262	250	250
query61	108	106	109	106
query62	825	737	728	728
query63	224	190	187	187
query64	4289	1008	618	618
query65	3271	3155	3191	3155
query66	1068	409	304	304
query67	15782	15690	15302	15302
query68	9004	695	519	519
query69	465	285	247	247
query70	1207	1161	1147	1147
query71	431	277	251	251
query72	6246	3844	3821	3821
query73	662	751	357	357
query74	10250	9038	9025	9025
query75	4426	3132	2628	2628
query76	3886	1171	767	767
query77	786	371	272	272
query78	10011	10091	9289	9289
query79	3101	810	599	599
query80	573	512	435	435
query81	469	260	228	228
query82	632	147	116	116
query83	160	174	146	146
query84	237	95	68	68
query85	787	339	302	302
query86	430	322	291	291
query87	4447	4480	4290	4290
query88	4538	2183	2155	2155
query89	397	330	292	292
query90	1909	240	186	186
query91	128	132	107	107
query92	66	55	50	50
query93	1400	880	537	537
query94	660	386	283	283
query95	342	262	253	253
query96	481	604	286	286
query97	2871	2944	2808	2808
query98	227	203	192	192
query99	1433	1493	1358	1358
Total cold run time: 292839 ms
Total hot run time: 187492 ms

doris-robot commented:

ClickBench: Total hot run time: 31.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 36b4a3a80669e67e937bdb57714731f845eaa002, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.02	0.03
query2	0.06	0.04	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.10
query5	0.42	0.39	0.39
query6	1.18	0.64	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.51	0.50
query10	0.55	0.57	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.10
query13	0.61	0.60	0.59
query14	2.84	2.83	2.71
query15	0.89	0.82	0.82
query16	0.36	0.36	0.39
query17	1.08	1.00	1.06
query18	0.23	0.19	0.21
query19	1.86	1.78	1.93
query20	0.01	0.01	0.02
query21	15.36	0.89	0.57
query22	0.75	0.74	0.71
query23	15.28	1.43	0.53
query24	2.98	1.18	1.64
query25	0.13	0.20	0.19
query26	0.30	0.15	0.13
query27	0.07	0.05	0.04
query28	14.03	1.51	1.04
query29	12.57	4.03	3.35
query30	0.25	0.09	0.06
query31	2.83	0.59	0.39
query32	3.23	0.54	0.45
query33	3.06	3.09	3.20
query34	16.65	5.07	4.53
query35	4.55	4.49	4.52
query36	0.63	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.01
query43	0.03	0.03	0.03
Total cold run time: 106.04 s
Total hot run time: 31.53 s

doris-robot commented:

TeamCity be ut coverage result:
Function Coverage: 38.88% (10130/26057)
Line Coverage: 29.93% (85708/286346)
Region Coverage: 29.03% (43733/150656)
Branch Coverage: 25.57% (22320/87298)
Coverage Report: http://coverage.selectdb-in.cc/coverage/36b4a3a80669e67e937bdb57714731f845eaa002_36b4a3a80669e67e937bdb57714731f845eaa002/report/index.html

zhiqiang-hhhh (Contributor) left a comment:

LGTM

github-actions bot (Contributor) commented Jan 9, 2025

PR approved by anyone and no changes requested.

github-actions bot (Contributor) commented:

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jan 13, 2025
wangbo added the dev/2.1.x, dev/3.1.x, and usercase (Important user case type label) labels on Jan 13, 2025
wangbo merged commit 7a0dd23 into apache:master on Jan 13, 2025
7 checks passed
wangbo pushed a commit to wangbo/incubator-doris that referenced this pull request Jan 13, 2025
wangbo pushed a commit to wangbo/incubator-doris that referenced this pull request Jan 13, 2025
wangbo pushed a commit to wangbo/incubator-doris that referenced this pull request Jan 13, 2025
wangbo pushed a commit that referenced this pull request Jan 13, 2025
wangbo pushed a commit to wangbo/incubator-doris that referenced this pull request Jan 13, 2025
wangbo pushed a commit to wangbo/incubator-doris that referenced this pull request Jan 13, 2025
wangbo pushed a commit that referenced this pull request Jan 14, 2025
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.8-merged, dev/3.0.4-merged, reviewed, usercase (Important user case type label)

6 participants