
[fix](tool) Fix meta_tool coredump and dead loop issues #61509

Open
wenzhenghu wants to merge 4 commits into apache:master from HYDCP:fix_meta_tool
Conversation

@wenzhenghu
Contributor

@wenzhenghu wenzhenghu commented Mar 19, 2026

What problem does this PR solve?

Related issue: #61447

Proposed changes

This PR fixes several bugs in the offline tool meta_tool: coredumps caused by uninitialized ExecEnv components, and a dead loop (infinite loop) when loading tablet meta.

  1. Fix coredump in offline commands:

    • Lightweight commands (show_meta, batch_delete_meta, show_segment_footer, show_segment_data): these do not require a full StorageEngine environment but still rely on basic components such as TabletSchemaCache, TabletColumnObjectPool, and MemTracker. This PR introduces a lightweight initialization function, init_common_components(), which they call to prevent coredumps caused by null pointer dereferences.
    • Engine-level commands (get_meta, load_meta, delete_meta): these initialize a StorageEngine and a DataDir instance to read/write RocksDB. The PR ensures that they load configuration from be.conf (via DORIS_HOME) and also call init_common_components(), so that both the global settings (e.g., memory limits, thread pools) and the ExecEnv components are set up before the engine starts.
  2. Fix dead loop in load_meta:
    TabletMetaManager::load_json_meta previously called infile.getline() inside a while(!infile.eof()) loop. If the file cannot be opened or a read error occurs, failbit is set but eofbit is not, so the loop condition never becomes false and the loop never terminates. The fix adds an is_open() check and reads the file content in one pass using std::istreambuf_iterator.
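
The load_meta fix described in item 2 can be sketched as follows. This is a minimal illustration of the pattern, not the code in this PR: read_file_to_string is a hypothetical helper name, and the real load_json_meta returns a Doris Status rather than a bool.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not the actual Doris function): reads the whole file at
// `path` into `*content`. Returns false if the file cannot be opened.
bool read_file_to_string(const std::string& path, std::string* content) {
    std::ifstream infile(path, std::ios::binary);
    if (!infile.is_open()) {
        // With the old pattern `while (!infile.eof()) { infile.getline(...); }`
        // a failed open sets failbit but never eofbit, so the loop would spin
        // forever. Bailing out here avoids that.
        return false;
    }
    // Read up to end-of-stream in a single pass. The iterator becomes equal to
    // the end iterator on EOF or on a stream error, so this always terminates.
    content->assign(std::istreambuf_iterator<char>(infile),
                    std::istreambuf_iterator<char>());
    return true;
}
```

A caller would check the return value before parsing the JSON, so a wrong path is reported as an error instead of hanging the tool.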

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26839 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b9b5b77eea754700f0d5066a8a8d1d76578a0f1, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17602	4531	4297	4297
q2	q3	10636	774	517	517
q4	4714	350	252	252
q5	7781	1208	1014	1014
q6	189	173	148	148
q7	794	860	671	671
q8	10351	1476	1340	1340
q9	5770	4736	4673	4673
q10	6330	1944	1655	1655
q11	500	262	239	239
q12	754	588	465	465
q13	18052	2971	2176	2176
q14	231	233	213	213
q15	q16	739	747	675	675
q17	752	852	432	432
q18	5903	5449	5169	5169
q19	1400	980	627	627
q20	535	486	378	378
q21	4573	2091	1603	1603
q22	387	352	295	295
Total cold run time: 97993 ms
Total hot run time: 26839 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4779	4539	4593	4539
q2	q3	3894	4328	3838	3838
q4	892	1231	814	814
q5	4071	4399	4375	4375
q6	198	168	139	139
q7	1754	1635	1540	1540
q8	2512	2709	2591	2591
q9	7778	7350	7187	7187
q10	3818	3944	3629	3629
q11	531	461	440	440
q12	517	620	450	450
q13	2842	3251	2386	2386
q14	284	315	286	286
q15	q16	726	743	756	743
q17	1172	1384	1381	1381
q18	7183	6725	6602	6602
q19	851	882	929	882
q20	2075	2147	2044	2044
q21	4036	3541	3271	3271
q22	489	433	380	380
Total cold run time: 50402 ms
Total hot run time: 47517 ms

@doris-robot

TPC-DS: Total hot run time: 168316 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b9b5b77eea754700f0d5066a8a8d1d76578a0f1, data reload: false

query5	4328	637	493	493
query6	329	230	203	203
query7	4210	473	257	257
query8	334	243	231	231
query9	8710	2693	2680	2680
query10	529	405	345	345
query11	6990	5101	4884	4884
query12	186	133	130	130
query13	1274	459	359	359
query14	5771	3729	3452	3452
query14_1	2861	2859	2898	2859
query15	209	191	181	181
query16	992	447	452	447
query17	916	743	639	639
query18	2454	472	365	365
query19	223	219	196	196
query20	136	132	130	130
query21	221	134	113	113
query22	13256	13921	14464	13921
query23	16306	15902	15549	15549
query23_1	15785	15707	15600	15600
query24	7345	1613	1229	1229
query24_1	1245	1259	1242	1242
query25	584	521	408	408
query26	1246	259	154	154
query27	2777	474	294	294
query28	4498	1843	1843	1843
query29	858	563	519	519
query30	295	230	190	190
query31	1007	945	872	872
query32	80	71	67	67
query33	520	342	285	285
query34	884	866	523	523
query35	639	676	595	595
query36	1079	1083	981	981
query37	140	96	87	87
query38	2950	2939	2873	2873
query39	853	842	803	803
query39_1	807	781	802	781
query40	234	152	136	136
query41	63	60	59	59
query42	254	254	256	254
query43	252	265	234	234
query44	
query45	198	188	182	182
query46	885	983	606	606
query47	2454	2112	2045	2045
query48	303	321	235	235
query49	632	472	378	378
query50	681	276	220	220
query51	4063	4069	4018	4018
query52	260	274	262	262
query53	296	336	285	285
query54	299	269	262	262
query55	90	90	82	82
query56	324	342	305	305
query57	1940	1839	1658	1658
query58	282	276	277	276
query59	2775	2934	2767	2767
query60	347	333	326	326
query61	181	151	144	144
query62	627	585	538	538
query63	311	281	281	281
query64	5110	1283	1004	1004
query65	
query66	1474	449	365	365
query67	24197	24237	24208	24208
query68	
query69	397	299	288	288
query70	944	911	968	911
query71	343	308	299	299
query72	2826	2731	2514	2514
query73	555	554	339	339
query74	9652	9521	9394	9394
query75	2873	2735	2497	2497
query76	2302	1056	706	706
query77	372	387	315	315
query78	11015	11131	10446	10446
query79	1128	767	573	573
query80	1320	627	547	547
query81	557	259	228	228
query82	1006	153	120	120
query83	332	259	252	252
query84	300	117	95	95
query85	930	540	459	459
query86	414	319	300	300
query87	3117	3139	2993	2993
query88	3598	2661	2653	2653
query89	442	386	356	356
query90	2022	174	173	173
query91	171	165	143	143
query92	80	77	71	71
query93	979	862	498	498
query94	638	336	295	295
query95	582	337	321	321
query96	644	525	235	235
query97	2491	2490	2391	2391
query98	229	220	237	220
query99	1010	1007	914	914
Total cold run time: 249582 ms
Total hot run time: 168316 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.69% (19775/37532)
Line Coverage 36.20% (184587/509869)
Region Coverage 32.45% (142885/440262)
Branch Coverage 33.65% (62510/185791)

@wenzhenghu
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27112 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9ec572e1ba93d899ba5da63bf7bb71b22baf22a5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17592	4470	4289	4289
q2	q3	10651	776	517	517
q4	4678	358	279	279
q5	7571	1241	1028	1028
q6	182	175	151	151
q7	781	856	679	679
q8	9307	1493	1422	1422
q9	4737	4686	4769	4686
q10	6239	1913	1626	1626
q11	474	264	259	259
q12	700	573	470	470
q13	18033	2929	2172	2172
q14	227	235	213	213
q15	q16	746	738	670	670
q17	746	835	491	491
q18	6110	5460	5307	5307
q19	1222	993	613	613
q20	542	495	375	375
q21	4672	1875	1551	1551
q22	488	337	314	314
Total cold run time: 95698 ms
Total hot run time: 27112 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4801	4619	4594	4594
q2	q3	3879	4344	3815	3815
q4	890	1210	818	818
q5	4078	4395	4336	4336
q6	191	196	152	152
q7	1737	1650	1540	1540
q8	2500	2699	2561	2561
q9	7794	7524	7430	7430
q10	3809	4013	3595	3595
q11	516	432	454	432
q12	494	609	464	464
q13	2787	3286	2357	2357
q14	297	301	268	268
q15	q16	730	758	706	706
q17	1153	1317	1386	1317
q18	7148	6746	6696	6696
q19	1009	922	938	922
q20	2079	2177	2000	2000
q21	3952	3578	3463	3463
q22	450	417	376	376
Total cold run time: 50294 ms
Total hot run time: 47842 ms

@doris-robot

TPC-DS: Total hot run time: 169489 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9ec572e1ba93d899ba5da63bf7bb71b22baf22a5, data reload: false

query5	4397	682	554	554
query6	350	239	220	220
query7	4208	490	271	271
query8	351	258	228	228
query9	8744	2743	2777	2743
query10	511	405	368	368
query11	6975	5112	4902	4902
query12	190	135	130	130
query13	1279	475	351	351
query14	5724	3851	3551	3551
query14_1	2964	2931	2921	2921
query15	204	199	182	182
query16	1034	480	452	452
query17	1159	752	662	662
query18	2460	463	364	364
query19	221	223	201	201
query20	141	130	130	130
query21	216	142	112	112
query22	13205	14222	14711	14222
query23	16162	16032	15624	15624
query23_1	15800	15496	15456	15456
query24	7342	1636	1243	1243
query24_1	1245	1250	1236	1236
query25	559	479	412	412
query26	1241	259	152	152
query27	2772	493	302	302
query28	4447	1845	1834	1834
query29	837	585	486	486
query30	303	220	195	195
query31	1044	952	902	902
query32	81	77	71	71
query33	530	349	284	284
query34	953	875	536	536
query35	655	680	593	593
query36	1093	1148	991	991
query37	142	102	84	84
query38	2965	2967	2913	2913
query39	871	837	818	818
query39_1	789	791	791	791
query40	237	159	140	140
query41	64	62	60	60
query42	262	254	258	254
query43	260	263	222	222
query44	
query45	200	193	184	184
query46	884	990	618	618
query47	2100	2550	2037	2037
query48	318	317	230	230
query49	637	478	395	395
query50	710	286	221	221
query51	4123	4034	4086	4034
query52	269	269	257	257
query53	295	342	293	293
query54	318	286	275	275
query55	94	85	81	81
query56	335	323	320	320
query57	1924	1758	1800	1758
query58	298	290	282	282
query59	2815	2933	2768	2768
query60	345	352	340	340
query61	167	162	159	159
query62	627	591	538	538
query63	325	291	277	277
query64	5063	1306	1035	1035
query65	
query66	1466	481	380	380
query67	24298	24314	24220	24220
query68	
query69	411	320	296	296
query70	987	955	981	955
query71	348	318	309	309
query72	2904	2732	2489	2489
query73	550	553	321	321
query74	9616	9538	9376	9376
query75	2871	2757	2494	2494
query76	2295	1025	703	703
query77	364	380	322	322
query78	10867	11082	10459	10459
query79	1104	770	582	582
query80	729	654	593	593
query81	487	267	226	226
query82	1343	160	120	120
query83	378	277	243	243
query84	257	127	99	99
query85	868	520	475	475
query86	378	302	304	302
query87	3160	3143	3063	3063
query88	3588	2674	2659	2659
query89	440	375	348	348
query90	1988	191	184	184
query91	168	167	142	142
query92	75	78	74	74
query93	934	864	502	502
query94	465	338	313	313
query95	583	416	325	325
query96	646	524	232	232
query97	2484	2494	2403	2403
query98	242	228	225	225
query99	1008	947	909	909
Total cold run time: 249352 ms
Total hot run time: 169489 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.74% (19793/37532)
Line Coverage 36.27% (184926/509864)
Region Coverage 32.52% (143181/440262)
Branch Coverage 33.69% (62600/185793)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.60% (26303/36735)
Line Coverage 54.41% (276506/508150)
Region Coverage 51.70% (229716/444308)
Branch Coverage 53.07% (98849/186275)

