[opt](memory) Optimize mem tracker accuracy #32039

Merged: 2 commits merged on Mar 29, 2024

@xinyiZzz (Contributor) commented Mar 10, 2024

Proposed changes

  1. By default, memory may no longer be tracked in the orphan tracker; it must be tracked in a specific tracker (query, load, cache, etc.), otherwise a DCHECK fails.
  2. If a mem tracker's value is not 0 when the query ends, a warning is logged; turning this into a DCHECK is left as a TODO.
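The two rules can be pictured with a small, self-contained model. Everything in it (MemTracker, orphan_tracker, g_allow_orphan_tracking, t_tracker, track_alloc, check_balance_at_query_end) is hypothetical and deliberately simplified; it is not the Doris BE API, and it uses the standard assert in place of DCHECK.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical, simplified tracker: counts bytes charged to one owner.
class MemTracker {
public:
    explicit MemTracker(std::string label) : label_(std::move(label)) {}
    void consume(int64_t bytes) { consumption_ += bytes; }
    void release(int64_t bytes) { consumption_ -= bytes; }
    int64_t consumption() const { return consumption_; }
    const std::string& label() const { return label_; }

private:
    std::string label_;
    int64_t consumption_ = 0;
};

// Catch-all tracker for allocations made outside any query/load/cache scope.
MemTracker& orphan_tracker() {
    static MemTracker tracker("Orphan");
    return tracker;
}

// Hypothetical switch; false models the new default described in item 1.
bool g_allow_orphan_tracking = false;

// Tracker the current thread charges; nullptr means none is attached.
thread_local MemTracker* t_tracker = nullptr;

// Item 1: with orphan tracking disallowed, an allocation made while no
// specific tracker is attached trips a debug-only assertion (standing in
// for DCHECK). Release builds (NDEBUG) fall through to the orphan tracker.
void track_alloc(int64_t bytes) {
    if (t_tracker == nullptr) {
        assert(g_allow_orphan_tracking && "allocation not tracked by a specific tracker");
        orphan_tracker().consume(bytes);
        return;
    }
    t_tracker->consume(bytes);
}

void track_free(int64_t bytes) {
    MemTracker& target = (t_tracker != nullptr) ? *t_tracker : orphan_tracker();
    target.release(bytes);
}

// Item 2: a non-zero balance when the query ends is only logged for now;
// promoting it to a DCHECK is left as a TODO.
bool check_balance_at_query_end(const MemTracker& tracker) {
    if (tracker.consumption() != 0) {
        std::cerr << "WARNING: mem tracker " << tracker.label() << " ends with "
                  << tracker.consumption() << " bytes still tracked\n";
        return false;
    }
    return true;
}
```

In a debug build, calling track_alloc with no tracker attached and orphan tracking disabled aborts, which is the failure mode item 1 describes; a query whose allocations and frees do not cancel out is reported by the end-of-query check instead of aborting.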

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@xinyiZzz marked this pull request as draft March 10, 2024 19:07
@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from b303117 to d08b3ad March 13, 2024 13:12
@xinyiZzz marked this pull request as ready for review March 13, 2024 13:13
@xinyiZzz (Contributor Author):

run buildall

@github-actions bot left a comment:

clang-tidy made some suggestions

be/src/runtime/fragment_mgr.cpp (outdated, resolved)
be/src/runtime/fragment_mgr.cpp (outdated, resolved)
be/src/runtime/thread_context.cpp (resolved)
be/src/runtime/thread_context.h (resolved)
be/src/common/config.cpp (outdated, resolved)
be/src/common/config.cpp (outdated, resolved)
@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from d08b3ad to ea1c824 March 13, 2024 15:18
@xinyiZzz (Contributor Author):

run buildall

@github-actions bot left a comment:

clang-tidy made some suggestions

be/src/runtime/thread_context.cpp (resolved)
@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from 6f1eb15 to c063ba1 March 17, 2024 10:29
@xinyiZzz (Contributor Author):

run buildall

@github-actions bot left a comment:

clang-tidy made some suggestions

be/src/runtime/group_commit_mgr.cpp (resolved)
be/src/runtime/memory/cache_policy.h (outdated, resolved)
be/src/runtime/memory/cache_policy.h (outdated, resolved)
be/src/runtime/memory/lru_cache_value_base.h (outdated, resolved)
be/src/runtime/thread_context.cpp (resolved)
@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from c063ba1 to 474d2e6 March 17, 2024 14:53
@github-actions bot left a comment:

clang-tidy made some suggestions

be/src/runtime/thread_context.cpp (resolved)
@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from 474d2e6 to 63c76dc March 17, 2024 15:47
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.22% (8732/24790)
Line Coverage: 27.04% (71533/264562)
Region Coverage: 26.25% (37046/141134)
Branch Coverage: 23.15% (18940/81812)
Coverage Report: http://coverage.selectdb-in.cc/coverage/19e87b450f680b0f65a1f5ee5fb510a5b857a440_19e87b450f680b0f65a1f5ee5fb510a5b857a440/report/index.html

@doris-robot

TPC-H: Total hot run time: 38398 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 19e87b450f680b0f65a1f5ee5fb510a5b857a440, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17643	4613	4266	4266
q2	2512	171	156	156
q3	11480	1112	1183	1112
q4	10400	801	823	801
q5	7556	3087	3075	3075
q6	207	125	130	125
q7	1061	611	607	607
q8	9498	2037	2042	2037
q9	7387	6802	6694	6694
q10	9435	3461	3578	3461
q11	426	224	215	215
q12	376	213	198	198
q13	17812	2872	2925	2872
q14	223	195	198	195
q15	509	457	456	456
q16	514	369	359	359
q17	958	566	587	566
q18	7447	6569	6377	6377
q19	2154	1444	1460	1444
q20	555	263	262	262
q21	3554	2821	3103	2821
q22	358	299	306	299
Total cold run time: 112065 ms
Total hot run time: 38398 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4164	4092	4066	4066
q2	341	233	227	227
q3	2996	2867	2866	2866
q4	1869	1576	1552	1552
q5	5327	5363	5506	5363
q6	192	116	119	116
q7	2253	1853	1894	1853
q8	3193	3309	3289	3289
q9	8686	8673	8653	8653
q10	3789	3800	3783	3783
q11	545	431	445	431
q12	713	548	533	533
q13	16911	2855	2847	2847
q14	288	250	259	250
q15	493	456	461	456
q16	469	435	417	417
q17	1736	1527	1481	1481
q18	7370	7272	7117	7117
q19	1625	1554	1548	1548
q20	1929	1739	1723	1723
q21	4884	4615	4874	4615
q22	518	432	429	429
Total cold run time: 70291 ms
Total hot run time: 53615 ms
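The per-query rows carry no header in the bot output. Reading each row as a cold run, two warm runs, and the faster of the two warm runs (an interpretation inferred from the numbers, not stated by the bot), the reported totals can be reproduced: total cold time is the sum of the first column and total hot time is the sum of the last. The sketch below checks this against Round 1 of the first TPC-H comment; QueryTiming, Totals, summarize and kTpchRound1 are illustrative names, not part of the Doris tpch-tools scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed column layout of each row: cold run, hot run 1, hot run 2 (ms).
// The printed fourth column is taken to be min(hot run 1, hot run 2).
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

struct Totals {
    int64_t cold_ms = 0;  // sum of cold runs
    int64_t hot_ms = 0;   // sum of best hot runs
};

// Best hot run: the faster of the two warm runs.
int64_t best_hot_ms(const QueryTiming& t) { return std::min(t.hot1_ms, t.hot2_ms); }

Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals totals;
    for (const QueryTiming& r : rows) {
        totals.cold_ms += r.cold_ms;
        totals.hot_ms += best_hot_ms(r);
    }
    return totals;
}

// Round 1 of the first TPC-H sf100 comment (q1..q22), copied from the table.
const std::vector<QueryTiming> kTpchRound1 = {
    {17643, 4613, 4266}, {2512, 171, 156},   {11480, 1112, 1183}, {10400, 801, 823},
    {7556, 3087, 3075},  {207, 125, 130},    {1061, 611, 607},    {9498, 2037, 2042},
    {7387, 6802, 6694},  {9435, 3461, 3578}, {426, 224, 215},     {376, 213, 198},
    {17812, 2872, 2925}, {223, 195, 198},    {509, 457, 456},     {514, 369, 359},
    {958, 566, 587},     {7447, 6569, 6377}, {2154, 1444, 1460},  {555, 263, 262},
    {3554, 2821, 3103},  {358, 299, 306},
};
```

summarize(kTpchRound1) yields 112065 ms cold and 38398 ms hot, matching "Total cold run time" and "Total hot run time" for Round 1.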

@doris-robot

TPC-DS: Total hot run time: 187317 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 19e87b450f680b0f65a1f5ee5fb510a5b857a440, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	934	382	356	356
query2	7477	2017	2006	2006
query3	6709	211	216	211
query4	31817	21336	21372	21336
query5	4367	500	408	408
query6	271	183	170	170
query7	4618	289	296	289
query8	231	166	167	166
query9	9628	2380	2369	2369
query10	562	251	262	251
query11	17306	14343	14318	14318
query12	135	93	87	87
query13	1648	435	448	435
query14	12053	11560	11609	11560
query15	298	199	198	198
query16	8145	295	263	263
query17	2066	578	566	566
query18	1877	305	295	295
query19	340	156	154	154
query20	91	88	85	85
query21	205	131	129	129
query22	5009	4796	4740	4740
query23	33374	32954	32748	32748
query24	10743	2923	2790	2790
query25	611	363	369	363
query26	1473	153	155	153
query27	2921	348	355	348
query28	7723	1942	1905	1905
query29	896	621	615	615
query30	301	150	146	146
query31	965	728	729	728
query32	91	58	53	53
query33	780	259	247	247
query34	1045	478	491	478
query35	839	606	603	603
query36	1025	886	884	884
query37	121	77	77	77
query38	3591	3385	3387	3385
query39	1483	1421	1382	1382
query40	214	112	110	110
query41	47	49	45	45
query42	103	92	97	92
query43	496	449	451	449
query44	1210	722	752	722
query45	278	265	263	263
query46	1104	709	699	699
query47	1924	1812	1842	1812
query48	447	359	351	351
query49	1147	340	332	332
query50	757	375	372	372
query51	6911	6730	6790	6730
query52	102	90	93	90
query53	340	275	277	275
query54	313	248	251	248
query55	85	83	80	80
query56	272	230	234	230
query57	1217	1136	1131	1131
query58	248	208	210	208
query59	2832	2664	2560	2560
query60	293	259	259	259
query61	114	116	114	114
query62	676	465	445	445
query63	307	271	279	271
query64	6270	4124	4138	4124
query65	3151	3043	3043	3043
query66	1441	381	371	371
query67	15534	14727	14662	14662
query68	7053	545	535	535
query69	630	382	368	368
query70	1264	1218	1210	1210
query71	526	288	280	280
query72	6559	2826	2570	2570
query73	717	326	336	326
query74	7754	6619	6712	6619
query75	3916	2897	2858	2858
query76	5112	891	873	873
query77	640	261	262	261
query78	10890	10093	10133	10093
query79	10682	525	521	521
query80	1868	385	368	368
query81	542	213	210	210
query82	887	198	193	193
query83	222	147	143	143
query84	291	79	83	79
query85	1517	316	316	316
query86	477	307	277	277
query87	3773	3562	3502	3502
query88	5124	2400	2408	2400
query89	529	363	375	363
query90	2003	174	176	174
query91	171	135	141	135
query92	63	47	47	47
query93	7451	501	483	483
query94	1212	180	174	174
query95	431	326	330	326
query96	599	276	276	276
query97	3099	2865	2900	2865
query98	234	210	202	202
query99	1246	885	890	885
Total cold run time: 319084 ms
Total hot run time: 187317 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 19e87b450f680b0f65a1f5ee5fb510a5b857a440 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.3 seconds inserted 10000000 Rows, about 469K ops/s
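The throughput figures appear to be bytes divided by seconds, expressed in units of 2^20 bytes and truncated to an integer, with insert throughput given in thousands of rows per second. That reading is an assumption, but it reproduces every figure in both load-test comments. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "MB/s" in the load-test comments means 2^20 bytes per second,
// and the printed figure is truncated (not rounded) to an integer.
int64_t mib_per_second(int64_t bytes, double seconds) {
    const double bytes_per_second = static_cast<double>(bytes) / seconds;
    return static_cast<int64_t>(bytes_per_second / (1024.0 * 1024.0));
}

// Rows per second in thousands ("K ops/s"), also truncated.
int64_t k_rows_per_second(int64_t rows, double seconds) {
    return static_cast<int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, mib_per_second(2358488459, 18) gives 124; with 10^6-byte megabytes the same load would come out at 131, so the 2^20 reading is the one consistent with the bot output.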

@xinyiZzz force-pushed the 20240305_fix_mem_tracker_dcheck branch from 19e87b4 to 95419b4 March 28, 2024 02:30
@xinyiZzz (Contributor Author):

run buildall

@xinyiZzz (Contributor Author):

run buildall

@doris-robot

TPC-H: Total hot run time: 38344 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cfd6a72eaba9516cd62df0378bba8f0ae8cbe924, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18312	4720	4212	4212
q2	3107	166	178	166
q3	11161	1129	1184	1129
q4	10547	809	803	803
q5	7855	3079	3042	3042
q6	210	125	124	124
q7	1058	610	582	582
q8	9530	2072	2013	2013
q9	7356	6642	6632	6632
q10	8420	3464	3533	3464
q11	428	220	213	213
q12	367	197	193	193
q13	17793	2846	2867	2846
q14	236	215	202	202
q15	519	461	462	461
q16	486	376	378	376
q17	959	647	574	574
q18	7185	6537	6442	6442
q19	2209	1398	1449	1398
q20	556	261	269	261
q21	3531	2910	2985	2910
q22	344	308	301	301
Total cold run time: 112169 ms
Total hot run time: 38344 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4156	4084	4050	4050
q2	329	226	228	226
q3	2972	2889	2820	2820
q4	1848	1545	1551	1545
q5	5305	5388	5340	5340
q6	197	114	116	114
q7	2254	1892	1898	1892
q8	3176	3283	3301	3283
q9	8706	8684	8732	8684
q10	3779	3772	3756	3756
q11	536	443	434	434
q12	725	531	531	531
q13	16910	2886	2819	2819
q14	284	258	269	258
q15	511	466	454	454
q16	489	432	423	423
q17	1744	1504	1454	1454
q18	7451	7228	7163	7163
q19	1629	1525	1530	1525
q20	1909	1731	1727	1727
q21	4840	4676	4674	4674
q22	524	453	460	453
Total cold run time: 70274 ms
Total hot run time: 53625 ms

@doris-robot

TPC-DS: Total hot run time: 181516 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cfd6a72eaba9516cd62df0378bba8f0ae8cbe924, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	927	375	356	356
query2	6535	1966	1893	1893
query3	6710	211	210	210
query4	31822	21338	21322	21322
query5	4273	411	402	402
query6	267	184	182	182
query7	4641	295	296	295
query8	236	165	168	165
query9	9243	2292	2276	2276
query10	576	233	248	233
query11	17246	14269	14253	14253
query12	133	100	85	85
query13	1629	423	414	414
query14	10008	7602	8024	7602
query15	266	203	195	195
query16	8240	264	259	259
query17	1995	597	546	546
query18	2126	288	292	288
query19	358	153	154	153
query20	95	88	89	88
query21	207	128	129	128
query22	5122	4841	4824	4824
query23	33682	32564	32867	32564
query24	11659	2913	2866	2866
query25	625	384	396	384
query26	1718	167	164	164
query27	2954	359	356	356
query28	7657	1892	1851	1851
query29	1020	639	631	631
query30	300	161	157	157
query31	962	758	729	729
query32	95	61	60	60
query33	777	253	251	251
query34	1053	497	493	493
query35	848	612	621	612
query36	1015	899	899	899
query37	117	62	66	62
query38	3560	3430	3440	3430
query39	1501	1503	1425	1425
query40	282	108	108	108
query41	53	44	45	44
query42	104	96	96	96
query43	482	455	464	455
query44	1143	749	728	728
query45	278	256	252	252
query46	1092	716	705	705
query47	1956	1879	1862	1862
query48	441	356	349	349
query49	1183	324	330	324
query50	764	364	366	364
query51	6857	6682	6560	6560
query52	102	94	93	93
query53	350	277	273	273
query54	289	239	238	238
query55	86	80	76	76
query56	244	219	215	215
query57	1236	1134	1139	1134
query58	224	201	199	199
query59	2870	2700	2435	2435
query60	257	236	253	236
query61	95	92	111	92
query62	681	448	448	448
query63	301	274	274	274
query64	6384	4032	4036	4032
query65	3086	3058	3026	3026
query66	1106	360	357	357
query67	15643	14802	14801	14801
query68	5551	514	524	514
query69	583	379	376	376
query70	1175	1172	1133	1133
query71	445	273	263	263
query72	6298	2832	2690	2690
query73	708	321	321	321
query74	6851	6480	6489	6480
query75	2973	2205	2188	2188
query76	3474	881	926	881
query77	527	273	264	264
query78	10721	10113	10206	10113
query79	8034	524	522	522
query80	1253	400	390	390
query81	513	221	226	221
query82	513	86	92	86
query83	205	151	148	148
query84	285	86	85	85
query85	1243	318	319	318
query86	369	307	303	303
query87	3764	3518	3549	3518
query88	4815	2321	2305	2305
query89	491	363	363	363
query90	2156	174	173	173
query91	175	134	138	134
query92	61	50	46	46
query93	6499	494	494	494
query94	1330	174	179	174
query95	446	328	325	325
query96	619	273	276	273
query97	2671	2456	2477	2456
query98	239	219	201	201
query99	1171	879	908	879
Total cold run time: 305945 ms
Total hot run time: 181516 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit cfd6a72eaba9516cd62df0378bba8f0ae8cbe924 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       14.4 seconds inserted 10000000 Rows, about 694K ops/s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.60% (8835/24820)
Line Coverage: 27.33% (72439/265068)
Region Coverage: 26.54% (37523/141384)
Branch Coverage: 23.35% (19131/81940)
Coverage Report: http://coverage.selectdb-in.cc/coverage/cfd6a72eaba9516cd62df0378bba8f0ae8cbe924_cfd6a72eaba9516cd62df0378bba8f0ae8cbe924/report/index.html

@@ -327,6 +327,8 @@ struct IteratorItem {

Status RowIdStorageReader::read_by_rowids(const PMultiGetRequest& request,
                                          PMultiGetResponse* response) {
    SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER(
Contributor:

Why not use SCOPED_ATTACH_TASK here?

Contributor Author:

It should use attach; I will change it later.
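In the diff, the handler only switches the thread's mem tracker limiter, while the reviewer suggests attaching the task. One way to picture the difference, assuming that "attach" binds the whole task context and not just the tracker, is the hypothetical model below. ThreadContext, MemTrackerLimiter, ScopedSwitchTracker and ScopedAttachTask here are illustrative stand-ins, not the classes in Doris's thread_context.h.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins; not Doris's ThreadContext or MemTrackerLimiter.
struct MemTrackerLimiter {
    std::string label;
};

struct ThreadContext {
    MemTrackerLimiter* tracker = nullptr;  // where allocations are charged
    std::string task_id;                   // task the thread is working for
};

thread_local ThreadContext t_ctx;

// "Switch": swaps only the tracker for the scope; the task identity is untouched.
class ScopedSwitchTracker {
public:
    explicit ScopedSwitchTracker(MemTrackerLimiter* tracker) : prev_(t_ctx.tracker) {
        t_ctx.tracker = tracker;
    }
    ~ScopedSwitchTracker() { t_ctx.tracker = prev_; }
    ScopedSwitchTracker(const ScopedSwitchTracker&) = delete;
    ScopedSwitchTracker& operator=(const ScopedSwitchTracker&) = delete;

private:
    MemTrackerLimiter* prev_;
};

// "Attach": binds the tracker and the task identity together for the scope,
// then restores the whole previous context on exit.
class ScopedAttachTask {
public:
    ScopedAttachTask(MemTrackerLimiter* tracker, std::string task_id) : prev_(t_ctx) {
        t_ctx.tracker = tracker;
        t_ctx.task_id = std::move(task_id);
    }
    ~ScopedAttachTask() { t_ctx = prev_; }
    ScopedAttachTask(const ScopedAttachTask&) = delete;
    ScopedAttachTask& operator=(const ScopedAttachTask&) = delete;

private:
    ThreadContext prev_;
};
```

In this model, code running under a bare switch still has no idea which task it belongs to; only allocation charging changes. Under attach, anything that later asks the thread for its task gets an answer, and both fields are restored together when the scope ends.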

@github-actions bot added the approved label (indicates a PR has been approved by one committer) Mar 28, 2024

Contributor:

PR approved by at least one committer and no changes requested.

Contributor:

PR approved by anyone and no changes requested.

@wangbo (Contributor) left a comment:

LGTM

Labels: approved, dev/2.1.3-merged, reviewed
Projects: None yet
5 participants