[feature](cloud) Support segment id list #65190

Draft
mymeiyi wants to merge 2 commits into apache:master from mymeiyi:seg-list
@mymeiyi mymeiyi commented Jul 3, 2026

No description provided.

mymeiyi added 2 commits July 3, 2026 10:47
Issue Number: None

Related PR: None

Problem Summary: Cloud rowsets historically assumed that physical segment file names were continuous from 0 to num_segments - 1. The partial update append, cleanup, warmup, cache recycle, snapshot, binlog, reader, delete bitmap, collection statistics, and index compaction paths all depended on that implicit naming and could confuse a segment's position in the rowset with its real physical segment id.

This change adds rowset segment_ids and next_segment_id metadata behind enable_segment_list and preserves legacy continuous ids when the list is absent. It introduces shared segment id helpers and the RowsetSegmentMetaView/RowsetSegmentView abstractions, and migrates the affected BE and Cloud paths to use the rowset position for position-indexed metadata and the real segment id for physical files, cache keys, delete bitmap keys, and RowLocation-facing code.

It also removes unsafe convenience APIs that encoded the assumption that position equals segment id, clarifies cooldown upload naming and the merged next segment id calculation, and documents RowsetSegmentView as a non-owning view that must not cross async boundaries without an owning RowsetSharedPtr or copied values.
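The distinction between rowset position and physical segment id described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the struct RowsetSegmentsSketch and the helpers segment_id_at and next_id_to_allocate are hypothetical names, and only the fields segment_ids and next_segment_id and the legacy 0 to num_segments - 1 fallback are taken from the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for rowset metadata (assumption:
// this is NOT the real Doris RowsetMeta class).
struct RowsetSegmentsSketch {
    int64_t num_segments = 0;
    // Empty list means a legacy rowset whose physical ids are 0..num_segments-1.
    std::vector<int64_t> segment_ids;
    // Only meaningful when segment_ids is present.
    int64_t next_segment_id = 0;
};

// Map a position inside the rowset to the physical segment id that should be
// used for file names, cache keys, and delete bitmap keys.
inline int64_t segment_id_at(const RowsetSegmentsSketch& rs, int64_t pos) {
    assert(pos >= 0 && pos < rs.num_segments);
    if (rs.segment_ids.empty()) {
        return pos; // legacy rowset: position and id coincide
    }
    return rs.segment_ids[static_cast<std::size_t>(pos)];
}

// Id that a newly appended segment would receive under this sketch.
inline int64_t next_id_to_allocate(const RowsetSegmentsSketch& rs) {
    if (rs.segment_ids.empty()) {
        return rs.num_segments; // legacy rowset: ids are contiguous
    }
    return rs.next_segment_id;
}
```

In this sketch, a rowset with segment_ids {0, 2, 5} has three positions (0, 1, 2) but position 1 resolves to physical id 2, so code that indexes position-based metadata must keep using the position, while code that builds file paths or cache keys must call a helper like segment_id_at.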

Release note: None

- Test: Manual test
    - Ran git diff --cached --check
    - Attempted build-support/clang-format.sh for changed C++ files, but it failed because llvm@16 is not installed
    - Did not run build or unit tests per request
- Behavior changed: Yes. When enable_segment_list is enabled for cloud rowsets, new writes can persist non-contiguous segment file ids while reads remain compatible with legacy rowsets.
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Segment-list rowsets are only written for cloud rowsets, while the binlog paths covered here are local-mode only. The existing binlog code intentionally uses contiguous segment indexes as file ids. Add comments around these paths to prevent future changes from incorrectly applying segment-list mapping to local binlog files.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran git diff --check
- Behavior changed: No
- Does this need documentation: No
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mymeiyi commented Jul 3, 2026

run buildall

@hello-stephen

TPC-H: Total hot run time: 29991 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fc8bc2304f448ebc5982b25641f0309c466f8a4d, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17746	4052	4047	4047
q2	2013	334	220	220
q3	10416	1449	858	858
q4	4682	468	335	335
q5	7541	946	579	579
q6	183	177	140	140
q7	797	846	624	624
q8	9351	1616	1662	1616
q9	5682	4411	4438	4411
q10	6796	1790	1544	1544
q11	511	347	311	311
q12	706	554	442	442
q13	18154	3428	2797	2797
q14	275	264	249	249
q15	q16	794	787	718	718
q17	1069	1047	1084	1047
q18	7065	5711	5555	5555
q19	1250	1293	1131	1131
q20	792	678	563	563
q21	5782	2737	2503	2503
q22	441	362	301	301
Total cold run time: 102046 ms
Total hot run time: 29991 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4367	4341	4313	4313
q2	294	316	212	212
q3	4624	4985	4410	4410
q4	2064	2154	1367	1367
q5	4447	4335	4339	4335
q6	235	172	132	132
q7	1967	2077	1654	1654
q8	2610	2252	2200	2200
q9	8095	8095	7864	7864
q10	4716	4679	4293	4293
q11	587	436	378	378
q12	792	755	540	540
q13	3338	3532	2904	2904
q14	303	312	268	268
q15	q16	723	757	640	640
q17	1360	1381	1356	1356
q18	8189	7487	7297	7297
q19	1192	1147	1138	1138
q20	2217	2209	1932	1932
q21	5288	4561	4442	4442
q22	523	472	397	397
Total cold run time: 57931 ms
Total hot run time: 52072 ms

@hello-stephen

TPC-DS: Total hot run time: 173687 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fc8bc2304f448ebc5982b25641f0309c466f8a4d, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4319	634	481	481
query6	466	219	200	200
query7	4844	641	345	345
query8	340	192	178	178
query9	8742	4091	4113	4091
query10	497	357	316	316
query11	5989	2357	2139	2139
query12	156	101	110	101
query13	1268	612	384	384
query14	6295	5334	4976	4976
query14_1	4324	4303	4253	4253
query15	216	202	180	180
query16	1012	494	476	476
query17	960	752	595	595
query18	2468	511	354	354
query19	222	194	156	156
query20	113	112	111	111
query21	234	161	132	132
query22	13734	13553	13552	13552
query23	17447	16448	16096	16096
query23_1	16333	16319	16277	16277
query24	7752	1763	1303	1303
query24_1	1351	1300	1300	1300
query25	573	486	391	391
query26	1325	367	210	210
query27	2600	601	391	391
query28	4476	2112	2015	2015
query29	1048	615	498	498
query30	336	276	234	234
query31	1125	1086	984	984
query32	108	61	59	59
query33	505	314	236	236
query34	1203	1158	673	673
query35	748	796	679	679
query36	1420	1381	1253	1253
query37	147	112	89	89
query38	1862	1703	1647	1647
query39	924	917	889	889
query39_1	900	912	867	867
query40	237	160	139	139
query41	66	62	66	62
query42	95	93	91	91
query43	325	321	281	281
query44	1458	763	776	763
query45	202	188	178	178
query46	1156	1225	744	744
query47	2368	2367	2213	2213
query48	415	421	298	298
query49	595	438	318	318
query50	1160	438	333	333
query51	4430	4462	4437	4437
query52	90	86	74	74
query53	259	285	211	211
query54	276	230	219	219
query55	77	71	67	67
query56	296	292	279	279
query57	1409	1416	1315	1315
query58	285	246	263	246
query59	1564	1593	1414	1414
query60	301	275	263	263
query61	154	150	151	150
query62	708	652	584	584
query63	247	204	208	204
query64	2578	755	589	589
query65	4860	4739	4767	4739
query66	1806	511	384	384
query67	29631	29587	29337	29337
query68	3576	1587	968	968
query69	412	314	270	270
query70	1045	1003	967	967
query71	359	318	289	289
query72	2946	2640	2366	2366
query73	876	757	449	449
query74	5098	5015	4755	4755
query75	2608	2612	2211	2211
query76	2305	1209	774	774
query77	356	362	290	290
query78	12480	12473	11861	11861
query79	1417	1177	744	744
query80	647	560	464	464
query81	457	322	274	274
query82	572	160	127	127
query83	396	323	295	295
query84	320	176	127	127
query85	927	572	499	499
query86	362	293	295	293
query87	1847	1817	1767	1767
query88	3829	2840	2813	2813
query89	451	412	349	349
query90	1940	204	203	203
query91	217	187	159	159
query92	61	62	57	57
query93	1543	1560	968	968
query94	558	352	336	336
query95	780	581	475	475
query96	1047	818	356	356
query97	2700	2715	2553	2553
query98	219	206	199	199
query99	1176	1135	1034	1034
Total cold run time: 259421 ms
Total hot run time: 173687 ms

@hello-stephen

ClickBench: Total hot run time: 25.36 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fc8bc2304f448ebc5982b25641f0309c466f8a4d, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.00	0.00	0.01
query2	0.09	0.05	0.05
query3	0.26	0.14	0.14
query4	1.60	0.13	0.14
query5	0.24	0.22	0.22
query6	1.23	1.07	1.05
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.37	0.32	0.33
query10	0.54	0.57	0.55
query11	0.20	0.13	0.15
query12	0.18	0.15	0.14
query13	0.47	0.49	0.49
query14	1.02	0.99	1.01
query15	0.62	0.60	0.60
query16	0.32	0.31	0.33
query17	1.11	1.09	1.09
query18	0.22	0.21	0.21
query19	1.98	2.00	1.96
query20	0.01	0.01	0.01
query21	15.44	0.21	0.14
query22	5.04	0.05	0.05
query23	16.08	0.31	0.13
query24	2.90	0.42	0.33
query25	0.11	0.06	0.05
query26	0.72	0.20	0.17
query27	0.05	0.04	0.05
query28	3.57	0.93	0.53
query29	12.52	4.46	3.51
query30	0.28	0.15	0.17
query31	2.77	0.61	0.32
query32	3.23	0.60	0.49
query33	3.24	3.20	3.25
query34	15.60	4.25	3.51
query35	3.59	3.52	3.55
query36	0.55	0.43	0.41
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.03	0.03
query40	0.17	0.17	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.78 s
Total hot run time: 25.36 s

@hello-stephen

FE Regression Coverage Report

Increment line coverage 0.00% (0/91) 🎉
Increment coverage report
Complete coverage report
