
[cherry-pick](branch-3.0) Pick "[Fix](core) Fix null ptr introduced by #42949 (#46074)" #48346

Merged
dataroaring merged 1 commit into apache:branch-3.0 from Yukang-Lian:Pick-46074-3.0
Feb 26, 2025

Conversation

@Yukang-Lian
Collaborator

@Yukang-Lian Yukang-Lian commented Feb 26, 2025

Pick #46074

PR #42949 changed rowset ID initialization so that a rowset ID that failed during serialization is replaced with a random ID. Generating a random ID depends on the storage engine, which has not yet been initialized when rowset IDs are initialized, so the process core dumps. This PR fixes the issue by always substituting MAX_ROWSET_ID-1 for the failed rowset ID. This is safe because the rowset ID generator never produces an ID that large, so any rowset whose ID equals MAX_ROWSET_ID-1 can be treated as a rowset that failed initialization and should be recovered automatically from other replicas.
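The idea can be illustrated with a minimal, standalone C++ sketch. It is not the code from this PR: `kMaxRowsetId`, `kFailedRowsetId`, `parse_rowset_id_or_sentinel`, and `is_failed_rowset_id` are hypothetical names, and the value given to `kMaxRowsetId` is only a placeholder assumption for the real `MAX_ROWSET_ID` constant. The point it shows is that the fallback is a compile-time constant, so the failure path does not touch any object that may still be uninitialized.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for MAX_ROWSET_ID; the value here is an assumption
// made for illustration, not taken from the Doris source.
constexpr std::int64_t kMaxRowsetId = std::int64_t{1} << 56;

// Sentinel marking a rowset whose ID could not be initialized.
constexpr std::int64_t kFailedRowsetId = kMaxRowsetId - 1;

// Parses a decimal rowset ID. On any failure (empty input, trailing garbage,
// overflow) it returns the fixed sentinel instead of asking another component
// for a random ID, so it is safe to call before that component exists.
inline std::int64_t parse_rowset_id_or_sentinel(std::string_view text) {
    if (text.empty()) {
        return kFailedRowsetId;
    }
    std::int64_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return kFailedRowsetId;
    }
    return value;
}

// Callers can recognize failed rowsets by comparing against the sentinel.
inline bool is_failed_rowset_id(std::int64_t id) {
    return id == kFailedRowsetId;
}
```

With this shape, a caller that loads rowset metadata at startup can check `is_failed_rowset_id(id)` and leave such rowsets to replica-based recovery, as the description above suggests.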

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Yukang-Lian
Collaborator Author

run buildall

@Thearas
Contributor

Thearas commented Feb 26, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian Yukang-Lian changed the title from [cherry-pick](branch-3.0) Pick "[Fix](core) Fix null ptr introduced by #42949" (#46074) to [cherry-pick](branch-3.0) Pick "[Fix](core) Fix null ptr introduced by #42949 (#46074)" on Feb 26, 2025
@doris-robot

TPC-H: Total hot run time: 40545 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2d1e0fee22855e11248153bcc7e14e87df15294f, data reload: false

------ Round 1 ----------------------------------
q1	17721	6811	6679	6679
q2	2080	174	162	162
q3	10674	1105	1172	1105
q4	10558	736	709	709
q5	7757	2926	2833	2833
q6	226	135	133	133
q7	984	617	606	606
q8	9347	2016	2075	2016
q9	6681	6476	6451	6451
q10	6991	2307	2305	2305
q11	460	258	261	258
q12	404	219	211	211
q13	17772	2998	3010	2998
q14	257	206	221	206
q15	502	462	472	462
q16	666	599	577	577
q17	995	615	610	610
q18	7344	6859	6748	6748
q19	1404	1154	1057	1057
q20	477	209	204	204
q21	4172	3375	3231	3231
q22	1094	1012	984	984
Total cold run time: 108566 ms
Total hot run time: 40545 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6704	6685	6678	6678
q2	331	237	232	232
q3	2955	2812	2984	2812
q4	2015	1836	1808	1808
q5	5741	5735	5817	5735
q6	210	130	126	126
q7	2221	1830	1828	1828
q8	3424	3519	3550	3519
q9	8921	8941	8935	8935
q10	3602	3547	3552	3547
q11	600	511	488	488
q12	800	609	596	596
q13	9187	3216	3151	3151
q14	305	283	274	274
q15	521	465	470	465
q16	681	657	635	635
q17	1885	1614	1645	1614
q18	8396	7836	7831	7831
q19	1680	1652	1476	1476
q20	2079	1863	1905	1863
q21	5724	5408	5371	5371
q22	1156	1067	1024	1024
Total cold run time: 69138 ms
Total hot run time: 60008 ms

@doris-robot

TPC-DS: Total hot run time: 197031 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2d1e0fee22855e11248153bcc7e14e87df15294f, data reload: false

query1	1327	963	934	934
query2	6946	2025	1980	1980
query3	10887	4614	4656	4614
query4	32743	23341	23242	23242
query5	3582	476	456	456
query6	275	188	183	183
query7	3988	323	315	315
query8	312	234	234	234
query9	9319	2582	2588	2582
query10	474	278	258	258
query11	17959	15241	15490	15241
query12	155	107	106	106
query13	1544	416	404	404
query14	9567	7310	6555	6555
query15	249	180	188	180
query16	7888	483	508	483
query17	1588	583	566	566
query18	1983	334	317	317
query19	227	168	173	168
query20	126	117	115	115
query21	213	109	111	109
query22	4808	4683	4702	4683
query23	34876	33877	35075	33877
query24	11216	2950	2911	2911
query25	669	416	433	416
query26	1213	179	178	178
query27	2335	359	354	354
query28	7560	2473	2409	2409
query29	872	482	472	472
query30	276	169	184	169
query31	1064	837	827	827
query32	98	59	59	59
query33	789	304	312	304
query34	1120	525	534	525
query35	876	750	722	722
query36	1133	957	991	957
query37	141	73	74	73
query38	4191	4064	4163	4064
query39	1507	1505	1643	1505
query40	212	99	101	99
query41	51	47	50	47
query42	121	103	102	102
query43	525	497	499	497
query44	1319	809	828	809
query45	189	174	171	171
query46	1201	767	745	745
query47	2006	1891	1873	1873
query48	487	392	393	392
query49	1039	411	404	404
query50	864	435	425	425
query51	7472	7250	7225	7225
query52	104	96	91	91
query53	269	195	201	195
query54	1222	485	475	475
query55	78	84	82	82
query56	273	252	252	252
query57	1259	1082	1093	1082
query58	227	206	204	204
query59	3222	2806	2891	2806
query60	299	257	244	244
query61	107	116	107	107
query62	831	667	664	664
query63	233	187	195	187
query64	4037	692	630	630
query65	3280	3162	3197	3162
query66	731	301	306	301
query67	15857	15791	15555	15555
query68	4594	583	573	573
query69	443	265	264	264
query70	1109	1120	1133	1120
query71	324	251	269	251
query72	6332	4089	4030	4030
query73	753	356	352	352
query74	10469	8993	8962	8962
query75	3387	2666	2693	2666
query76	2753	1077	1020	1020
query77	393	291	277	277
query78	10534	9598	9606	9598
query79	2384	618	607	607
query80	1136	462	436	436
query81	562	245	241	241
query82	752	94	89	89
query83	224	149	144	144
query84	240	80	79	79
query85	1459	303	292	292
query86	481	300	268	268
query87	4365	4307	4357	4307
query88	4268	2399	2378	2378
query89	424	304	294	294
query90	1882	186	188	186
query91	186	150	146	146
query92	65	51	49	49
query93	2313	542	551	542
query94	724	301	300	300
query95	351	265	262	262
query96	613	273	295	273
query97	3298	3158	3199	3158
query98	226	213	211	211
query99	1578	1283	1292	1283
Total cold run time: 302233 ms
Total hot run time: 197031 ms

@doris-robot

ClickBench: Total hot run time: 32.98 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2d1e0fee22855e11248153bcc7e14e87df15294f, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.63	0.11	0.10
query5	0.51	0.52	0.50
query6	1.14	0.73	0.72
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.50	0.50
query10	0.54	0.55	0.57
query11	0.15	0.09	0.11
query12	0.14	0.11	0.12
query13	0.60	0.59	0.59
query14	2.75	2.83	2.75
query15	0.89	0.82	0.83
query16	0.37	0.38	0.37
query17	0.99	1.06	1.05
query18	0.23	0.22	0.22
query19	1.98	1.86	2.00
query20	0.02	0.01	0.01
query21	15.36	0.58	0.58
query22	2.93	2.31	1.80
query23	17.01	0.96	0.88
query24	2.96	1.51	1.68
query25	0.23	0.13	0.06
query26	0.51	0.14	0.13
query27	0.04	0.03	0.04
query28	9.39	0.54	0.51
query29	12.57	3.39	3.37
query30	0.25	0.06	0.06
query31	2.86	0.38	0.39
query32	3.24	0.47	0.47
query33	3.00	3.01	3.01
query34	16.92	4.54	4.51
query35	4.55	4.50	4.56
query36	0.71	0.47	0.50
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.92 s
Total hot run time: 32.98 s

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment


LGTM

@dataroaring dataroaring merged commit 935c6dc into apache:branch-3.0 Feb 26, 2025
26 of 28 checks passed

5 participants