
[fix](compaction) Fix data race when fetching base compaction score from cloud tablet #38006


@TangSiyang2001 (Collaborator) commented Jul 17, 2024

Proposed changes

When calculating the base compaction score of a cloud tablet, a data race may cause a rowset meta to become nullptr, leading to a crash.

This PR therefore adds a read lock to protect this path.
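The C++17 sketch below illustrates the kind of fix described above. It is not the actual Doris code. `TabletSketch`, `RowsetMetaSketch`, `add_rowset`, `replace_rowsets` and `base_compaction_score` are hypothetical names. The scoring rule (total segments of rowsets that end below the cumulative point) is also a simplifying assumption. What the sketch is meant to show is the locking pattern: writers that replace rowset metadata take an exclusive lock on a `std::shared_mutex`, and the score calculation takes a shared (read) lock. As a result, the score calculation cannot observe the container in the middle of an update.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical, simplified stand-in for a rowset's metadata (not the Doris RowsetMeta class).
struct RowsetMetaSketch {
    int64_t start_version;
    int64_t end_version;
    int64_t num_segments;
};

using RowsetMetaPtr = std::shared_ptr<RowsetMetaSketch>;

// Hypothetical tablet whose rowset metadata is guarded by a reader-writer lock.
class TabletSketch {
public:
    // Writer path: registers a new rowset under an exclusive lock.
    void add_rowset(RowsetMetaPtr rs) {
        const int64_t key = rs->start_version;
        std::unique_lock<std::shared_mutex> wlock(meta_lock_);
        rowsets_[key] = std::move(rs);
    }

    // Writer path: replaces every rowset whose start version lies in [start, end]
    // (assumes start <= end) with one merged rowset, as a compaction commit might.
    void replace_rowsets(int64_t start, int64_t end, RowsetMetaPtr merged) {
        const int64_t key = merged->start_version;
        std::unique_lock<std::shared_mutex> wlock(meta_lock_);
        rowsets_.erase(rowsets_.lower_bound(start), rowsets_.upper_bound(end));
        rowsets_[key] = std::move(merged);
    }

    // Reader path: the score is computed while holding a shared (read) lock, so no
    // writer can erase or swap entries halfway through the iteration.
    // Simplified scoring rule (an assumption): total segments of rowsets that end
    // below the cumulative point.
    int64_t base_compaction_score(int64_t cumulative_point) const {
        std::shared_lock<std::shared_mutex> rlock(meta_lock_);
        int64_t score = 0;
        for (const auto& entry : rowsets_) {
            const RowsetMetaPtr& meta = entry.second;
            assert(meta != nullptr);  // assumes writers never store a null meta
            if (meta->end_version < cumulative_point) {
                score += meta->num_segments;
            }
        }
        return score;
    }

private:
    mutable std::shared_mutex meta_lock_;       // mutable so const readers can lock it
    std::map<int64_t, RowsetMetaPtr> rowsets_;  // keyed by start version
};
```

Because readers take only a shared lock, several score computations can run in parallel. They block only while a writer holds the exclusive lock, which keeps the extra cost of the protection small for this read-mostly path.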

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

@TangSiyang2001 (Collaborator, Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39785 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 32a46a4c8cfc43742303368905e106c7b0d1b1cc, data reload: false

------ Round 1 ----------------------------------
q1	18077	4457	4335	4335
q2	2490	199	188	188
q3	11066	1162	1135	1135
q4	10769	796	751	751
q5	8691	2726	2786	2726
q6	222	139	137	137
q7	968	595	585	585
q8	9225	2044	2054	2044
q9	8730	6555	6538	6538
q10	8637	3769	3772	3769
q11	457	248	251	248
q12	391	232	222	222
q13	17888	2966	2951	2951
q14	274	232	240	232
q15	539	471	495	471
q16	500	380	382	380
q17	954	639	669	639
q18	8058	7611	7389	7389
q19	8657	1389	1332	1332
q20	679	322	346	322
q21	4891	3107	3213	3107
q22	353	289	284	284
Total cold run time: 122516 ms
Total hot run time: 39785 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4403	4276	4225	4225
q2	372	276	279	276
q3	2968	2785	2779	2779
q4	1858	1590	1636	1590
q5	5279	5314	5286	5286
q6	215	128	128	128
q7	2078	1757	1672	1672
q8	3194	3308	3320	3308
q9	8408	8342	8432	8342
q10	3867	3714	3718	3714
q11	564	464	484	464
q12	781	654	589	589
q13	17764	2985	2986	2985
q14	307	286	279	279
q15	515	494	472	472
q16	470	404	412	404
q17	1767	1451	1450	1450
q18	7584	7425	7249	7249
q19	1654	1547	1655	1547
q20	1984	1783	1784	1783
q21	4956	4632	4659	4632
q22	579	487	521	487
Total cold run time: 71567 ms
Total hot run time: 53661 ms

@doris-robot

TPC-DS: Total hot run time: 172729 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 32a46a4c8cfc43742303368905e106c7b0d1b1cc, data reload: false

query1	900	373	368	368
query2	6468	1929	1857	1857
query3	6670	210	218	210
query4	28447	17372	17195	17195
query5	4186	489	474	474
query6	284	170	160	160
query7	4588	285	288	285
query8	238	185	188	185
query9	8750	2430	2424	2424
query10	449	285	270	270
query11	10569	10162	9986	9986
query12	133	83	88	83
query13	1646	367	371	367
query14	10539	7691	7722	7691
query15	230	167	173	167
query16	7807	349	305	305
query17	1776	536	517	517
query18	1764	273	279	273
query19	199	168	150	150
query20	90	82	81	81
query21	209	126	121	121
query22	4367	4171	4138	4138
query23	33762	33195	33157	33157
query24	12097	2912	2848	2848
query25	652	367	369	367
query26	1777	148	150	148
query27	2937	268	267	267
query28	7349	2034	2013	2013
query29	1041	642	611	611
query30	289	166	148	148
query31	992	738	736	736
query32	97	52	52	52
query33	768	305	289	289
query34	984	489	499	489
query35	682	571	573	571
query36	1107	924	952	924
query37	282	83	84	83
query38	2885	2765	2801	2765
query39	853	827	834	827
query40	281	121	120	120
query41	49	44	46	44
query42	114	98	105	98
query43	517	476	451	451
query44	1250	754	733	733
query45	188	163	161	161
query46	1091	755	703	703
query47	1872	1779	1738	1738
query48	363	293	291	291
query49	1205	413	427	413
query50	769	405	410	405
query51	6887	6864	6835	6835
query52	107	100	96	96
query53	356	304	295	295
query54	994	443	446	443
query55	79	73	77	73
query56	289	265	278	265
query57	1186	1074	1086	1074
query58	264	237	265	237
query59	2857	2541	2559	2541
query60	309	306	280	280
query61	94	91	93	91
query62	840	650	650	650
query63	318	290	282	282
query64	10478	2237	1650	1650
query65	3214	3116	3134	3116
query66	1348	337	381	337
query67	15533	14989	14838	14838
query68	5827	546	559	546
query69	657	415	352	352
query70	1151	1166	1113	1113
query71	469	281	279	279
query72	7824	5432	5544	5432
query73	774	325	323	323
query74	6311	5621	5723	5621
query75	3868	2689	2698	2689
query76	3587	1048	932	932
query77	679	314	311	311
query78	9910	9419	8972	8972
query79	3413	538	517	517
query80	2116	471	472	471
query81	574	226	227	226
query82	1454	146	138	138
query83	329	169	169	169
query84	276	91	85	85
query85	1598	369	392	369
query86	484	309	304	304
query87	3320	3120	3130	3120
query88	4317	2402	2391	2391
query89	481	386	386	386
query90	1839	194	188	188
query91	133	99	103	99
query92	66	54	50	50
query93	4755	508	500	500
query94	1230	211	214	211
query95	417	323	316	316
query96	595	279	271	271
query97	3194	3042	3015	3015
query98	225	200	198	198
query99	1706	1289	1273	1273
Total cold run time: 295294 ms
Total hot run time: 172729 ms

@doris-robot

ClickBench: Total hot run time: 30.64 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 32a46a4c8cfc43742303368905e106c7b0d1b1cc, data reload: false

query1	0.05	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.69	0.07	0.07
query5	0.50	0.53	0.48
query6	1.15	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.56	0.49	0.48
query10	0.54	0.54	0.53
query11	0.14	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.59	0.59
query14	0.77	0.78	0.77
query15	0.87	0.81	0.82
query16	0.37	0.35	0.37
query17	0.97	1.04	1.04
query18	0.21	0.22	0.22
query19	1.79	1.84	1.69
query20	0.02	0.01	0.01
query21	15.40	0.74	0.66
query22	4.12	7.86	1.79
query23	18.27	1.32	1.33
query24	2.18	0.23	0.23
query25	0.16	0.09	0.08
query26	0.30	0.22	0.21
query27	0.46	0.23	0.22
query28	13.24	1.03	1.01
query29	12.64	3.38	3.33
query30	0.26	0.06	0.06
query31	2.86	0.38	0.39
query32	3.26	0.48	0.47
query33	2.89	2.94	2.90
query34	17.19	4.33	4.35
query35	4.48	4.45	4.47
query36	0.65	0.48	0.48
query37	0.20	0.15	0.15
query38	0.16	0.15	0.15
query39	0.04	0.04	0.03
query40	0.15	0.13	0.11
query41	0.10	0.05	0.05
query42	0.06	0.04	0.05
query43	0.05	0.03	0.04
Total cold run time: 109.86 s
Total hot run time: 30.64 s

Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment


LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 18, 2024
@TangSiyang2001 TangSiyang2001 changed the title [fix](compaction) fix data race when fetching base compaction score from cloud tablet [fix](compaction) Fix data race when fetching base compaction score from cloud tablet Jul 18, 2024
@dataroaring dataroaring merged commit 8940da1 into apache:master Jul 18, 2024
31 of 33 checks passed
dataroaring pushed a commit that referenced this pull request Jul 19, 2024
…rom cloud tablet (#38006)

## Proposed changes

When calculating the base compaction score of a cloud tablet, a data race
may cause a rowset meta to become nullptr, leading to a crash.

This PR therefore adds a read lock to protect this path.
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.1-merged reviewed

4 participants