
[Feature](Compaction) Improve Compaction Profiling and Logging #50950


Merged


Yukang-Lian
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #42227

Problem Summary:

Issue Number: close #41753
Add detailed parameters to the compaction status.

In cloud_tablet.cpp, add:

  • last cumulative failure time
  • last base failure time
  • last full failure time
  • last cumulative success time
  • last base success time
  • last full success time
  • last cumulative schedule time
  • last base schedule time
  • last full schedule time
  • last cumulative status
  • last base status
  • last full status

In tablet.cpp, add:

  • last cumulative schedule time
  • last full schedule time
  • last cumulative status
  • last full status
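The description lists the new fields but not how they are tracked. The block below is a minimal sketch, assuming one record per compaction type that is updated when a compaction is scheduled and when it finishes. `CompactionTracker`, `CompactionType`, `on_schedule`, `on_finish` and `status_fields` are hypothetical names and are not the code in tablet.cpp or cloud_tablet.cpp. Times are kept as raw epoch milliseconds (0 meaning "never") rather than formatted timestamps.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical sketch: these names are illustrative, not the Doris API.
enum class CompactionType { Cumulative = 0, Base = 1, Full = 2 };

// Label used to build keys such as "last base status".
inline const char* type_label(CompactionType t) {
    switch (t) {
        case CompactionType::Cumulative: return "cumulative";
        case CompactionType::Base: return "base";
        case CompactionType::Full: return "full";
    }
    return "unknown";
}

// Per-type bookkeeping; a value of 0 means the event has not happened yet.
struct CompactionRecord {
    int64_t last_schedule_ms = 0;
    int64_t last_success_ms = 0;
    int64_t last_failure_ms = 0;
    std::string last_status;
};

class CompactionTracker {
public:
    // Called when a compaction of type t is scheduled (assumed hook).
    void on_schedule(CompactionType t, int64_t now_ms) { rec(t).last_schedule_ms = now_ms; }

    // Called when a compaction finishes; records success or failure time and the status text.
    void on_finish(CompactionType t, int64_t now_ms, bool ok, const std::string& status) {
        CompactionRecord& r = rec(t);
        if (ok) {
            r.last_success_ms = now_ms;
        } else {
            r.last_failure_ms = now_ms;
        }
        r.last_status = status;
    }

    // Flattens all records into "last <type> <field>" keys, 4 keys per type.
    std::map<std::string, std::string> status_fields() const {
        std::map<std::string, std::string> out;
        for (CompactionType t : {CompactionType::Cumulative, CompactionType::Base,
                                 CompactionType::Full}) {
            const CompactionRecord& r = records_[static_cast<size_t>(t)];
            const std::string prefix = std::string("last ") + type_label(t) + " ";
            out[prefix + "schedule time"] = std::to_string(r.last_schedule_ms);
            out[prefix + "success time"] = std::to_string(r.last_success_ms);
            out[prefix + "failure time"] = std::to_string(r.last_failure_ms);
            out[prefix + "status"] = r.last_status;
        }
        return out;
    }

private:
    CompactionRecord& rec(CompactionType t) {
        assert(static_cast<size_t>(t) < records_.size());
        return records_[static_cast<size_t>(t)];
    }
    std::array<CompactionRecord, 3> records_{};
};
```

Keeping one record per type keeps the key names uniform across cumulative, base and full compaction, so supporting another type would only mean extending the enum and `type_label`.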

Co-authored-by: yoruet <1559650411@qq.com>

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented May 15, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and what differs before and after the optimization.

@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run buildall

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 33849 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e088da1289627faf91fbe4fea1f753b9c2e8315e, data reload: false

------ Round 1 ----------------------------------
q1	26388	5072	4996	4996
q2	2077	303	186	186
q3	10379	1247	692	692
q4	10224	1011	529	529
q5	7524	2434	2329	2329
q6	182	163	132	132
q7	916	747	623	623
q8	9324	1308	1119	1119
q9	6927	5108	5229	5108
q10	6876	2305	1882	1882
q11	502	288	266	266
q12	347	345	218	218
q13	17799	3652	3055	3055
q14	229	236	212	212
q15	520	488	488	488
q16	423	439	366	366
q17	614	870	356	356
q18	7729	7221	7172	7172
q19	1363	949	537	537
q20	333	335	224	224
q21	3743	2594	2404	2404
q22	1032	1012	955	955
Total cold run time: 115451 ms
Total hot run time: 33849 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5118	5102	5113	5102
q2	241	332	228	228
q3	2134	2641	2288	2288
q4	1365	1798	1385	1385
q5	4494	4449	4442	4442
q6	222	163	128	128
q7	2049	1892	1698	1698
q8	2594	2438	2659	2438
q9	7263	7083	7140	7083
q10	3068	3207	2755	2755
q11	581	507	481	481
q12	652	817	653	653
q13	4273	3819	3296	3296
q14	278	299	273	273
q15	522	492	470	470
q16	436	497	442	442
q17	1192	1532	1361	1361
q18	7687	7616	7496	7496
q19	838	801	790	790
q20	1961	2046	1811	1811
q21	4746	4279	4296	4279
q22	1088	1008	1016	1008
Total cold run time: 52802 ms
Total hot run time: 49907 ms

@doris-robot

TPC-DS: Total hot run time: 185882 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e088da1289627faf91fbe4fea1f753b9c2e8315e, data reload: false

query1	1014	478	479	478
query2	6552	1932	1899	1899
query3	6743	219	218	218
query4	26071	23850	24120	23850
query5	4339	672	457	457
query6	322	197	180	180
query7	4633	495	300	300
query8	296	256	244	244
query9	8633	2617	2636	2617
query10	515	330	273	273
query11	15309	15112	14842	14842
query12	168	111	118	111
query13	1679	545	420	420
query14	8857	6198	6115	6115
query15	210	195	166	166
query16	7130	602	448	448
query17	1038	704	554	554
query18	1956	381	285	285
query19	188	191	185	185
query20	119	115	122	115
query21	215	127	114	114
query22	4076	4129	4031	4031
query23	33988	33272	32964	32964
query24	8446	2364	2412	2364
query25	523	443	387	387
query26	1252	278	154	154
query27	2744	510	335	335
query28	4347	2109	2094	2094
query29	779	543	429	429
query30	285	217	188	188
query31	910	859	757	757
query32	81	66	64	64
query33	547	372	327	327
query34	812	839	524	524
query35	762	820	718	718
query36	945	983	899	899
query37	109	103	75	75
query38	4129	4021	4081	4021
query39	1475	1444	1393	1393
query40	211	121	105	105
query41	66	53	54	53
query42	121	109	105	105
query43	520	513	505	505
query44	1358	811	820	811
query45	178	169	167	167
query46	832	999	629	629
query47	1766	1798	1739	1739
query48	379	437	321	321
query49	743	518	422	422
query50	671	667	400	400
query51	4213	4150	4104	4104
query52	111	109	105	105
query53	233	267	186	186
query54	592	585	507	507
query55	84	83	83	83
query56	322	296	270	270
query57	1133	1172	1091	1091
query58	277	248	246	246
query59	2606	2732	2546	2546
query60	348	329	307	307
query61	129	122	121	121
query62	788	743	668	668
query63	237	193	186	186
query64	4400	1012	668	668
query65	4321	4248	4282	4248
query66	1153	434	337	337
query67	15833	15489	15348	15348
query68	7811	894	537	537
query69	497	316	279	279
query70	1186	1087	1062	1062
query71	413	333	311	311
query72	5856	4733	4852	4733
query73	678	647	352	352
query74	8985	9067	8611	8611
query75	3196	3205	2687	2687
query76	3183	1195	777	777
query77	490	378	285	285
query78	10015	10232	9289	9289
query79	2669	820	578	578
query80	624	513	433	433
query81	532	254	221	221
query82	460	129	95	95
query83	246	259	233	233
query84	276	113	94	94
query85	798	362	305	305
query86	376	278	300	278
query87	4435	4385	4390	4385
query88	3891	2253	2258	2253
query89	388	323	284	284
query90	1887	205	201	201
query91	139	142	111	111
query92	74	65	60	60
query93	2155	947	573	573
query94	715	400	302	302
query95	387	288	291	288
query96	498	569	285	285
query97	2709	2716	2649	2649
query98	233	208	201	201
query99	1343	1392	1257	1257
Total cold run time: 272967 ms
Total hot run time: 185882 ms

@doris-robot

ClickBench: Total hot run time: 28.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e088da1289627faf91fbe4fea1f753b9c2e8315e, data reload: false

query1	0.04	0.04	0.02
query2	0.12	0.10	0.11
query3	0.25	0.19	0.20
query4	1.59	0.20	0.11
query5	0.43	0.42	0.42
query6	1.17	0.67	0.66
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.58	0.52	0.53
query10	0.57	0.58	0.56
query11	0.17	0.11	0.11
query12	0.15	0.11	0.13
query13	0.61	0.60	0.60
query14	0.78	0.79	0.81
query15	0.88	0.85	0.87
query16	0.40	0.39	0.39
query17	1.02	1.01	1.06
query18	0.22	0.21	0.21
query19	1.91	1.78	1.83
query20	0.01	0.00	0.01
query21	15.40	0.90	0.53
query22	0.78	1.28	0.74
query23	14.77	1.38	0.63
query24	6.55	2.37	0.46
query25	0.41	0.25	0.09
query26	0.63	0.16	0.13
query27	0.06	0.05	0.04
query28	9.44	0.94	0.44
query29	12.59	3.98	3.32
query30	0.25	0.08	0.06
query31	2.82	0.61	0.42
query32	3.43	0.56	0.47
query33	3.06	3.06	3.05
query34	15.68	5.08	4.44
query35	4.56	4.51	4.50
query36	0.68	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 102.57 s
Total hot run time: 28.58 s

Contributor

@dataroaring dataroaring left a comment


LGTM

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 16, 2025
@doris-robot

BE UT Coverage Report

Increment line coverage 2.68% (8/298) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.82% (14886/26670)
Line Coverage 44.61% (131913/295703)
Region Coverage 43.75% (66412/151816)
Branch Coverage 38.35% (34028/88732)

@Yukang-Lian
Collaborator Author

run external

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 38.93% (116/298) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.28% (20816/26257)
Line Coverage 72.46% (214285/295714)
Region Coverage 70.69% (126100/178372)
Branch Coverage 64.43% (65311/101364)

@dataroaring dataroaring merged commit c1ef8c2 into apache:master May 18, 2025
24 of 26 checks passed
github-actions bot pushed a commit that referenced this pull request May 20, 2025
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request May 21, 2025
…#50950)

dataroaring pushed a commit that referenced this pull request May 22, 2025
…on Profiling and Logging #50950" (#51125)

Pick #50950 

Co-authored-by: yoruet <1559650411@qq.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…#50950)

Labels
approved Indicates a PR has been approved by one committer. dev/3.0.6-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Improve Compaction Profiling and Logging
7 participants