
[fix](filecache) pass tablet_id through FileReaderOptions instead of parsing from path #61683

Open
deardeng wants to merge 3 commits into apache:master from deardeng:fix-extract-tablet-id


@deardeng
Collaborator

CachedRemoteFileReader::_execute_remote_read previously parsed tablet_id from file paths at runtime via extract_tablet_id(). This breaks when enable_packed_file (small file merging) is enabled because packed file paths don't follow the expected data/{tablet_id}/... format.
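To show why packed files break the old approach, here is a minimal C++ sketch of path-based extraction. `extract_tablet_id_from_path` is a hypothetical stand-in written for illustration, not the actual `extract_tablet_id()` from Doris, and `packed/merged_0001.dat` is a made-up path; the real packed-file naming scheme may differ. The only point is that a path without the data/{tablet_id}/... layout yields no tablet id.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical illustration only; the real Doris helper may differ.
// Accepts a relative path of the form "data/{tablet_id}/..." and returns
// the id, or std::nullopt when the path does not follow that layout.
std::optional<std::int64_t> extract_tablet_id_from_path(std::string_view path) {
    constexpr std::string_view kPrefix = "data/";
    if (path.substr(0, kPrefix.size()) != kPrefix) return std::nullopt;

    const std::string_view rest = path.substr(kPrefix.size());
    const auto slash = rest.find('/');
    // Require a non-empty id segment that is followed by '/'.
    if (slash == std::string_view::npos || slash == 0) return std::nullopt;

    const std::string_view digits = rest.substr(0, slash);
    if (digits.size() > 18) return std::nullopt;  // stay within int64_t range

    std::int64_t id = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') return std::nullopt;  // not a numeric tablet id
        id = id * 10 + (c - '0');
    }
    return id;
}
```

With this helper, `data/10005/rowset_0.dat` yields `10005`, while a packed path such as the hypothetical `packed/merged_0001.dat` yields `std::nullopt`, so a reader that depends on it has no tablet id for the remote read.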

Fix: store tablet_id from FileReaderOptions at construction time and use it directly, eliminating runtime path parsing. Propagate tablet_id through all code paths: Segment, InvertedIndexFileReader, FSIndexInput::open, DownloadFileMeta (warmup/preheating), and beta_rowset consistency checks.
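The fix captures the tablet id once, from the reader options, at construction time. The following is a simplified model under stated assumptions: `FileReaderOptionsSketch`, `CachedRemoteFileReaderSketch`, `open_segment_file`, and the `-1` "unknown" sentinel are hypothetical names chosen for illustration, not the actual Doris declarations.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for FileReaderOptions; only the field relevant here.
struct FileReaderOptionsSketch {
    std::int64_t tablet_id = -1;  // assumed sentinel: -1 means "not provided"
};

// Hypothetical stand-in for CachedRemoteFileReader.
class CachedRemoteFileReaderSketch {
public:
    CachedRemoteFileReaderSketch(std::string path, const FileReaderOptionsSketch& opts)
            : _path(std::move(path)), _tablet_id(opts.tablet_id) {}

    // Used on the remote-read path. The file path is never parsed, so the
    // answer is the same whether or not the file lives inside a packed file.
    std::optional<std::int64_t> tablet_id_for_remote_read() const {
        if (_tablet_id < 0) return std::nullopt;
        return _tablet_id;
    }

    const std::string& path() const { return _path; }

private:
    std::string _path;
    std::int64_t _tablet_id;  // captured once, at construction time
};

// Hypothetical caller (for example a segment or index reader) that already
// knows its tablet and forwards it when opening the file.
CachedRemoteFileReaderSketch open_segment_file(const std::string& path,
                                               std::int64_t tablet_id) {
    FileReaderOptionsSketch opts;
    opts.tablet_id = tablet_id;
    return CachedRemoteFileReaderSketch(path, opts);
}
```

The design choice is that each caller that knows its tablet (Segment, InvertedIndexFileReader, FSIndexInput::open, warmup download metadata) passes it down explicitly, so correctness no longer depends on how files are laid out in storage.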

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 24, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@deardeng
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26700 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1bd1c3a33837ca310853a554aec46b481223d74d, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17656	4442	4310	4310
q2
q3	10630	781	527	527
q4	4685	364	249	249
q5	7574	1223	1034	1034
q6	177	173	151	151
q7	802	852	690	690
q8	9334	1513	1406	1406
q9	5284	4825	4737	4737
q10	6322	1937	1653	1653
q11	471	246	240	240
q12	771	587	489	489
q13	18060	2704	1928	1928
q14	228	234	213	213
q15
q16	741	722	668	668
q17	737	855	448	448
q18	5905	5430	5286	5286
q19	1139	972	607	607
q20	528	486	378	378
q21	4443	1823	1403	1403
q22	541	388	283	283
Total cold run time: 96028 ms
Total hot run time: 26700 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4749	4530	4579	4530
q2
q3	3894	4456	3878	3878
q4	954	1208	809	809
q5	4095	4447	4349	4349
q6	186	172	138	138
q7	1788	1625	1534	1534
q8	2464	2698	2583	2583
q9	7579	7229	7317	7229
q10	3831	4013	3661	3661
q11	527	473	419	419
q12	483	601	478	478
q13	2551	2962	2042	2042
q14	288	313	288	288
q15
q16	733	772	745	745
q17	1192	1343	1401	1343
q18	7199	6815	6802	6802
q19	904	919	899	899
q20	2088	2162	2008	2008
q21	4069	3517	3271	3271
q22	451	417	399	399
Total cold run time: 50025 ms
Total hot run time: 47405 ms
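The bot does not label its columns. Judging from the numbers, each timed row reads: query, cold run, first hot run, second hot run, best hot run, where the last column is the smaller of the two hot runs. "Total cold run time" is then the sum of the first timing column and "Total hot run time" the sum of the last, with untimed queries such as q2 and q15 contributing nothing. That reading is an inference, not documented bot behaviour; the C++ sketch below (`QueryTiming`, `best_hot_ms`, `total_cold_ms`, `total_hot_ms` are illustrative names) simply encodes it.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One timed row of the report, in milliseconds. Column meaning is inferred
// from the published numbers, not taken from the bot's documentation.
struct QueryTiming {
    std::int64_t cold_ms;
    std::int64_t hot1_ms;
    std::int64_t hot2_ms;
};

// Best hot run for one query: the faster of the two hot runs.
std::int64_t best_hot_ms(const QueryTiming& t) {
    return std::min(t.hot1_ms, t.hot2_ms);
}

// Sum of the cold-run column over all timed queries.
std::int64_t total_cold_ms(const std::vector<QueryTiming>& rows) {
    std::int64_t sum = 0;
    for (const auto& r : rows) sum += r.cold_ms;
    return sum;
}

// Sum of the per-query best hot runs over all timed queries.
std::int64_t total_hot_ms(const std::vector<QueryTiming>& rows) {
    std::int64_t sum = 0;
    for (const auto& r : rows) sum += best_hot_ms(r);
    return sum;
}
```

Fed the twenty timed Round 1 rows of the first TPC-H report, these functions reproduce the reported totals of 96028 ms (cold) and 26700 ms (hot).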

@doris-robot

TPC-DS: Total hot run time: 169861 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1bd1c3a33837ca310853a554aec46b481223d74d, data reload: false

query5	4329	646	522	522
query6	319	227	205	205
query7	4218	468	261	261
query8	343	230	233	230
query9	8704	2744	2749	2744
query10	511	381	334	334
query11	7004	5090	4902	4902
query12	184	133	125	125
query13	1277	480	342	342
query14	5795	3769	3461	3461
query14_1	2907	2868	2834	2834
query15	205	191	180	180
query16	977	487	460	460
query17	953	766	598	598
query18	2430	445	347	347
query19	212	208	180	180
query20	137	125	128	125
query21	208	135	111	111
query22	13375	14367	14718	14367
query23	16687	16354	16130	16130
query23_1	16462	15809	15713	15713
query24	7139	1611	1218	1218
query24_1	1240	1217	1220	1217
query25	563	467	452	452
query26	1230	266	147	147
query27	2774	487	295	295
query28	4502	1872	1840	1840
query29	857	573	480	480
query30	292	228	185	185
query31	989	936	868	868
query32	80	69	68	68
query33	508	338	287	287
query34	882	872	526	526
query35	640	699	592	592
query36	1111	1107	986	986
query37	136	97	87	87
query38	2955	2894	2940	2894
query39	848	835	796	796
query39_1	797	786	797	786
query40	233	151	136	136
query41	62	58	60	58
query42	263	270	265	265
query43	249	252	226	226
query44	
query45	206	187	185	185
query46	875	976	610	610
query47	2096	2186	2061	2061
query48	314	317	228	228
query49	618	473	421	421
query50	679	279	216	216
query51	4062	4048	4007	4007
query52	268	270	260	260
query53	288	351	290	290
query54	314	279	271	271
query55	95	86	83	83
query56	330	314	325	314
query57	1848	1820	1705	1705
query58	292	287	272	272
query59	2791	2960	2768	2768
query60	347	355	347	347
query61	158	156	157	156
query62	640	613	538	538
query63	311	280	276	276
query64	5117	1288	1017	1017
query65	
query66	1453	460	353	353
query67	24173	24361	24175	24175
query68	
query69	412	309	288	288
query70	996	983	899	899
query71	358	318	309	309
query72	3077	2944	2723	2723
query73	551	555	320	320
query74	9641	9574	9436	9436
query75	2899	2770	2510	2510
query76	2306	1070	704	704
query77	385	404	330	330
query78	10962	11084	10477	10477
query79	2942	800	583	583
query80	1807	642	571	571
query81	546	259	224	224
query82	990	158	119	119
query83	342	265	254	254
query84	292	116	98	98
query85	918	500	446	446
query86	416	325	297	297
query87	3195	3166	2985	2985
query88	3540	2668	2666	2666
query89	432	377	347	347
query90	2015	189	185	185
query91	169	168	142	142
query92	78	74	70	70
query93	1211	876	497	497
query94	651	317	291	291
query95	584	411	338	338
query96	655	530	235	235
query97	2434	2469	2406	2406
query98	248	223	228	223
query99	1007	999	914	914
Total cold run time: 252211 ms
Total hot run time: 169861 ms

@deardeng force-pushed the fix-extract-tablet-id branch from b387dc0 to 7c91d8c on March 25, 2026 03:35
@deardeng
Collaborator Author

run buildall

@deardeng force-pushed the fix-extract-tablet-id branch from 7c91d8c to ca2ccb5 on March 25, 2026 03:49
@deardeng
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26994 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ca2ccb5a1f75338bb57af7aa9264ec7d62cb9877, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17640	4488	4312	4312
q2
q3	10641	806	574	574
q4	4678	361	257	257
q5	7574	1233	1018	1018
q6	182	176	148	148
q7	807	863	693	693
q8	9306	1481	1408	1408
q9	4994	4762	4838	4762
q10	6284	1963	1668	1668
q11	458	250	237	237
q12	716	598	479	479
q13	18050	2724	1961	1961
q14	245	237	219	219
q15
q16	758	768	688	688
q17	753	858	446	446
q18	6186	5588	5296	5296
q19	1118	1005	674	674
q20	546	510	395	395
q21	4417	1862	1458	1458
q22	520	406	301	301
Total cold run time: 95873 ms
Total hot run time: 26994 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4841	4646	4605	4605
q2
q3	3941	4384	3849	3849
q4	912	1243	808	808
q5	4131	4430	4414	4414
q6	194	184	148	148
q7	1800	1685	1572	1572
q8	2549	2761	2826	2761
q9	7628	7610	7541	7541
q10	3773	4028	3631	3631
q11	516	440	501	440
q12	570	628	472	472
q13	2499	2899	2084	2084
q14	301	315	325	315
q15
q16	745	745	721	721
q17	1183	1366	1364	1364
q18	7023	6923	6740	6740
q19	939	903	980	903
q20	2136	2414	1991	1991
q21	4047	3655	3442	3442
q22	509	435	396	396
Total cold run time: 50237 ms
Total hot run time: 48197 ms

@doris-robot

TPC-DS: Total hot run time: 169654 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ca2ccb5a1f75338bb57af7aa9264ec7d62cb9877, data reload: false

query5	4326	651	540	540
query6	355	243	210	210
query7	4234	479	285	285
query8	357	250	237	237
query9	8742	2729	2716	2716
query10	511	404	355	355
query11	7025	5148	4887	4887
query12	185	132	122	122
query13	1286	447	345	345
query14	5815	3750	3449	3449
query14_1	2859	2823	2855	2823
query15	208	201	179	179
query16	981	464	485	464
query17	1085	707	601	601
query18	2443	457	340	340
query19	213	208	188	188
query20	139	125	126	125
query21	216	133	113	113
query22	13244	14198	14861	14198
query23	16899	16481	15911	15911
query23_1	15901	15736	15721	15721
query24	7161	1616	1214	1214
query24_1	1211	1235	1243	1235
query25	594	508	436	436
query26	1247	268	151	151
query27	2766	491	301	301
query28	4484	1838	1833	1833
query29	882	599	512	512
query30	305	221	193	193
query31	1025	959	888	888
query32	82	76	73	73
query33	538	355	313	313
query34	899	862	523	523
query35	656	697	629	629
query36	1062	1135	1013	1013
query37	128	95	88	88
query38	2974	2921	2883	2883
query39	868	821	822	821
query39_1	797	787	809	787
query40	234	158	141	141
query41	69	69	64	64
query42	268	263	257	257
query43	253	251	220	220
query44	
query45	209	192	186	186
query46	911	992	605	605
query47	2110	2129	2081	2081
query48	309	322	241	241
query49	645	486	420	420
query50	687	282	224	224
query51	4035	4127	4058	4058
query52	267	270	261	261
query53	297	342	304	304
query54	331	297	289	289
query55	105	95	88	88
query56	345	367	344	344
query57	2009	1708	1647	1647
query58	287	275	272	272
query59	2792	2960	2736	2736
query60	347	339	327	327
query61	159	155	153	153
query62	630	607	554	554
query63	308	286	276	276
query64	5131	1285	1006	1006
query65	
query66	1471	457	355	355
query67	24412	24316	24269	24269
query68	
query69	430	317	290	290
query70	1017	994	927	927
query71	362	314	296	296
query72	3001	2788	2481	2481
query73	550	551	320	320
query74	9718	9636	9457	9457
query75	2907	2762	2482	2482
query76	2302	1044	690	690
query77	379	380	322	322
query78	11078	11106	10506	10506
query79	2322	778	582	582
query80	1679	636	567	567
query81	558	267	223	223
query82	1025	155	122	122
query83	371	280	248	248
query84	256	128	97	97
query85	904	504	443	443
query86	441	297	296	296
query87	3168	3117	3062	3062
query88	3551	2655	2636	2636
query89	439	373	348	348
query90	1959	186	174	174
query91	168	160	138	138
query92	77	74	76	74
query93	1010	864	516	516
query94	641	336	280	280
query95	612	416	340	340
query96	668	518	235	235
query97	2473	2480	2417	2417
query98	243	232	225	225
query99	1026	1001	914	914
Total cold run time: 252685 ms
Total hot run time: 169654 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.79% (19860/37618)
Line Coverage 36.31% (185583/511108)
Region Coverage 32.56% (143757/441511)
Branch Coverage 33.77% (62945/186378)

@deardeng
Collaborator Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.27% (23307/36835)
Line Coverage 46.62% (237566/509555)
Region Coverage 43.65% (194522/445637)
Branch Coverage 44.96% (84052/186942)

@deardeng force-pushed the fix-extract-tablet-id branch from 43d21c6 to a6c6a36 on March 25, 2026 07:27
@deardeng
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26719 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a6c6a36bd273eaa23e903208f8f50f4530287436, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17645	4416	4277	4277
q2
q3	10632	791	530	530
q4	4684	371	246	246
q5	7683	1234	1032	1032
q6	182	170	145	145
q7	824	845	665	665
q8	10562	1528	1360	1360
q9	5535	4826	4688	4688
q10	6316	1944	1670	1670
q11	475	248	251	248
q12	689	591	475	475
q13	18049	2722	1951	1951
q14	231	234	215	215
q15
q16	729	745	670	670
q17	731	888	425	425
q18	6293	5501	5199	5199
q19	1103	990	633	633
q20	540	485	378	378
q21	4403	2034	1614	1614
q22	412	364	298	298
Total cold run time: 97718 ms
Total hot run time: 26719 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4677	4612	4566	4566
q2	q3	3950	4463	3826	3826
q4	900	1206	811	811
q5	4205	4526	4372	4372
q6	183	175	147	147
q7	1766	1700	1529	1529
q8	2826	2745	2589	2589
q9	7687	7528	7555	7528
q10	3757	3990	3664	3664
q11	512	435	424	424
q12	510	597	526	526
q13	2545	2964	1982	1982
q14	276	289	268	268
q15
q16	767	924	717	717
q17	1204	1489	1401	1401
q18	7212	6995	6805	6805
q19	929	909	927	909
q20	2105	2174	2007	2007
q21	4024	3689	3382	3382
q22	478	422	399	399
Total cold run time: 50513 ms
Total hot run time: 47852 ms

@doris-robot

TPC-DS: Total hot run time: 169228 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a6c6a36bd273eaa23e903208f8f50f4530287436, data reload: false

query5	4337	630	520	520
query6	332	226	209	209
query7	4202	468	259	259
query8	350	233	219	219
query9	8677	2692	2684	2684
query10	515	393	341	341
query11	6952	5105	4896	4896
query12	207	132	132	132
query13	1284	471	352	352
query14	5707	3701	3509	3509
query14_1	2839	2823	2830	2823
query15	208	195	177	177
query16	988	403	466	403
query17	1124	725	668	668
query18	2432	443	338	338
query19	214	203	178	178
query20	147	129	125	125
query21	215	138	109	109
query22	13276	14001	14595	14001
query23	16816	16290	16125	16125
query23_1	16073	15971	15728	15728
query24	7217	1577	1209	1209
query24_1	1215	1224	1228	1224
query25	537	470	403	403
query26	1240	261	144	144
query27	2786	476	329	329
query28	4454	1832	1822	1822
query29	851	563	520	520
query30	295	231	197	197
query31	1024	925	876	876
query32	80	69	68	68
query33	509	342	277	277
query34	893	858	517	517
query35	636	680	595	595
query36	1056	1109	983	983
query37	138	98	78	78
query38	2933	2944	2829	2829
query39	856	839	802	802
query39_1	822	802	788	788
query40	232	152	135	135
query41	74	70	58	58
query42	257	259	267	259
query43	240	247	220	220
query44	
query45	196	186	179	179
query46	865	973	604	604
query47	2161	2136	2565	2136
query48	312	322	225	225
query49	633	447	415	415
query50	681	323	217	217
query51	4045	4055	4011	4011
query52	267	265	255	255
query53	287	332	284	284
query54	322	285	278	278
query55	92	84	81	81
query56	311	314	310	310
query57	1923	1840	1786	1786
query58	287	269	270	269
query59	2791	2947	2767	2767
query60	344	338	329	329
query61	151	156	150	150
query62	618	598	551	551
query63	312	280	273	273
query64	5133	1282	1028	1028
query65	
query66	1470	456	364	364
query67	24306	24339	24148	24148
query68	
query69	407	310	278	278
query70	918	948	912	912
query71	337	303	296	296
query72	2965	2870	2668	2668
query73	547	541	318	318
query74	9641	9761	9416	9416
query75	2862	2785	2481	2481
query76	2315	1043	675	675
query77	382	410	323	323
query78	10922	11058	10448	10448
query79	3124	735	559	559
query80	1729	645	563	563
query81	573	256	222	222
query82	974	149	116	116
query83	332	262	246	246
query84	255	124	105	105
query85	928	496	465	465
query86	504	332	297	297
query87	3130	3143	2983	2983
query88	3546	2650	2629	2629
query89	434	368	343	343
query90	2081	180	172	172
query91	173	165	143	143
query92	87	75	71	71
query93	2171	805	500	500
query94	649	313	286	286
query95	595	337	327	327
query96	646	505	229	229
query97	2441	2461	2391	2391
query98	242	223	219	219
query99	953	1000	918	918
Total cold run time: 253585 ms
Total hot run time: 169228 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.82% (19872/37620)
Line Coverage 36.32% (185668/511156)
Region Coverage 32.56% (143791/441561)
Branch Coverage 33.78% (62958/186403)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.46% (26324/36837)
Line Coverage 54.32% (276801/509603)
Region Coverage 51.37% (228940/445687)
Branch Coverage 52.95% (98992/186967)

@deardeng
Collaborator Author

run p0

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.46% (26323/36837)
Line Coverage 54.31% (276781/509603)
Region Coverage 51.36% (228908/445687)
Branch Coverage 52.94% (98981/186967)
