[refactor](storage type) remove deepcopy from typeinfo #61856

Merged
yiguolei merged 2 commits into apache:master from yiguolei:f2
Mar 31, 2026

@yiguolei
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 30, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yiguolei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26925 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7e52365a1c6dcd8b5130fd2a7767fbea672011ce, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17643	4445	4301	4301
q2	q3	10637	789	536	536
q4	4681	362	249	249
q5	7740	1260	1002	1002
q6	179	173	146	146
q7	796	844	679	679
q8	9903	1510	1349	1349
q9	5370	4731	4757	4731
q10	6331	1932	1647	1647
q11	475	260	247	247
q12	750	599	463	463
q13	18045	2718	1947	1947
q14	225	234	214	214
q15	q16	753	733	663	663
q17	758	832	491	491
q18	5957	5469	5247	5247
q19	1363	993	632	632
q20	540	476	374	374
q21	4444	1836	1729	1729
q22	455	362	278	278
Total cold run time: 97045 ms
Total hot run time: 26925 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4847	4636	4593	4593
q2	q3	3913	4359	3818	3818
q4	892	1226	815	815
q5	4072	4419	4299	4299
q6	182	170	144	144
q7	1775	1661	1577	1577
q8	2502	2780	2610	2610
q9	7561	7486	7366	7366
q10	3920	4112	3604	3604
q11	512	429	412	412
q12	512	577	440	440
q13	2494	2982	2107	2107
q14	282	293	316	293
q15	q16	763	757	736	736
q17	1222	1388	1423	1388
q18	7255	6867	6765	6765
q19	942	1010	930	930
q20	2077	2202	2014	2014
q21	4011	3464	3559	3464
q22	501	436	397	397
Total cold run time: 50235 ms
Total hot run time: 47772 ms

@doris-robot

TPC-DS: Total hot run time: 169217 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7e52365a1c6dcd8b5130fd2a7767fbea672011ce, data reload: false

query5	4340	638	507	507
query6	333	225	207	207
query7	4246	462	273	273
query8	349	250	236	236
query9	8758	2720	2715	2715
query10	537	399	342	342
query11	6952	5109	4854	4854
query12	195	128	128	128
query13	1279	456	350	350
query14	5767	3698	3520	3520
query14_1	2891	3089	2819	2819
query15	208	196	175	175
query16	997	478	466	466
query17	1113	735	636	636
query18	2457	475	364	364
query19	228	214	192	192
query20	136	131	126	126
query21	216	138	115	115
query22	13308	14116	14514	14116
query23	16861	16320	16039	16039
query23_1	16156	15939	15682	15682
query24	7176	1609	1228	1228
query24_1	1229	1257	1251	1251
query25	607	450	401	401
query26	1246	264	148	148
query27	2765	471	298	298
query28	4488	1836	1848	1836
query29	845	557	471	471
query30	297	222	188	188
query31	1015	947	871	871
query32	86	72	67	67
query33	514	333	285	285
query34	893	863	519	519
query35	627	686	589	589
query36	1102	1154	1004	1004
query37	132	95	79	79
query38	2936	2922	2838	2838
query39	859	839	804	804
query39_1	789	802	812	802
query40	234	147	136	136
query41	67	59	62	59
query42	261	252	248	248
query43	242	242	227	227
query44	
query45	196	188	181	181
query46	869	972	602	602
query47	2109	2142	2039	2039
query48	307	313	225	225
query49	647	465	421	421
query50	680	271	215	215
query51	4160	4118	4064	4064
query52	260	266	250	250
query53	291	334	283	283
query54	306	267	260	260
query55	89	84	84	84
query56	308	314	321	314
query57	1932	1756	1738	1738
query58	288	272	266	266
query59	2796	2981	2753	2753
query60	337	331	329	329
query61	158	154	162	154
query62	625	587	549	549
query63	311	274	274	274
query64	5096	1297	1000	1000
query65	
query66	1466	461	367	367
query67	24267	24352	24196	24196
query68	
query69	402	313	287	287
query70	979	985	896	896
query71	340	314	303	303
query72	2878	2711	2469	2469
query73	534	548	320	320
query74	9701	9607	9408	9408
query75	2845	2752	2472	2472
query76	2310	1048	666	666
query77	368	387	312	312
query78	10964	11126	10522	10522
query79	2516	768	599	599
query80	1793	619	545	545
query81	549	265	224	224
query82	1017	149	118	118
query83	335	271	242	242
query84	251	122	118	118
query85	915	505	458	458
query86	419	306	297	297
query87	3148	3142	3038	3038
query88	3554	2651	2664	2651
query89	429	365	348	348
query90	2009	175	176	175
query91	171	166	139	139
query92	77	75	72	72
query93	1072	837	512	512
query94	657	307	284	284
query95	586	395	318	318
query96	638	525	228	228
query97	2438	2481	2379	2379
query98	229	221	222	221
query99	993	1019	921	921
Total cold run time: 252220 ms
Total hot run time: 169217 ms

@yiguolei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 346899620d13c02258d644ccdb5e4a9469f61389, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17641	4430	4291	4291
q2	q3	10642	834	540	540
q4	4713	354	246	246
q5	7874	1222	1016	1016
q6	178	172	144	144
q7	794	849	669	669
q8	10015	1489	1344	1344
q9	5393	4719	4726	4719
q10	6325	1953	1633	1633
q11	498	261	244	244
q12	758	577	456	456
q13	18062	2686	1990	1990
q14	224	237	212	212
q15	q16	739	727	682	682
q17	754	866	418	418
q18	6080	5378	5157	5157
q19	1190	984	635	635
q20	546	485	378	378
q21	4530	1833	1708	1708
q22	436	366	255	255
Total cold run time: 97392 ms
Total hot run time: 26737 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4722	4602	4750	4602
q2	q3	3904	4370	3908	3908
q4	877	1202	813	813
q5	4076	4349	4376	4349
q6	180	176	145	145
q7	1758	1666	1592	1592
q8	2537	2763	2600	2600
q9	7592	7388	7409	7388
q10	3777	3976	3560	3560
q11	523	487	439	439
q12	511	618	471	471
q13	2433	3058	2115	2115
q14	287	316	287	287
q15	q16	758	764	745	745
q17	1184	1320	1338	1320
q18	7276	6928	6631	6631
q19	909	913	911	911
q20	2101	2119	1993	1993
q21	3956	3488	3302	3302
q22	471	438	385	385
Total cold run time: 49832 ms
Total hot run time: 47556 ms

@doris-robot

TPC-DS: Total hot run time: 168444 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 346899620d13c02258d644ccdb5e4a9469f61389, data reload: false

query5	4330	621	508	508
query6	327	232	198	198
query7	4220	482	261	261
query8	341	233	225	225
query9	8687	2732	2714	2714
query10	514	376	336	336
query11	7056	5069	4886	4886
query12	175	127	117	117
query13	1289	439	344	344
query14	5739	3657	3412	3412
query14_1	2817	2793	2826	2793
query15	211	189	178	178
query16	967	462	430	430
query17	866	690	586	586
query18	2425	434	337	337
query19	205	214	197	197
query20	129	122	124	122
query21	207	135	106	106
query22	13169	13877	14969	13877
query23	16737	16104	15952	15952
query23_1	16143	15848	15703	15703
query24	7176	1653	1200	1200
query24_1	1215	1223	1220	1220
query25	532	458	393	393
query26	1235	258	152	152
query27	2798	480	294	294
query28	4524	1842	1848	1842
query29	814	558	480	480
query30	288	225	190	190
query31	997	936	893	893
query32	83	68	67	67
query33	505	336	281	281
query34	913	847	505	505
query35	662	684	603	603
query36	1096	1121	1067	1067
query37	139	94	82	82
query38	2907	2907	2900	2900
query39	855	817	808	808
query39_1	793	798	793	793
query40	228	153	132	132
query41	61	60	58	58
query42	258	250	250	250
query43	255	248	228	228
query44	
query45	196	183	186	183
query46	881	983	612	612
query47	2100	2125	2056	2056
query48	314	319	241	241
query49	656	491	401	401
query50	739	299	224	224
query51	4072	4026	3979	3979
query52	261	267	257	257
query53	300	339	287	287
query54	309	283	274	274
query55	97	89	92	89
query56	329	329	328	328
query57	1934	1763	1566	1566
query58	294	279	277	277
query59	2820	2960	2742	2742
query60	351	352	330	330
query61	179	171	195	171
query62	616	610	512	512
query63	311	286	276	276
query64	5098	1357	1079	1079
query65	
query66	1485	474	370	370
query67	24200	24257	24193	24193
query68	
query69	426	330	295	295
query70	977	1007	914	914
query71	334	305	289	289
query72	2931	2684	2513	2513
query73	540	540	320	320
query74	9650	9535	9385	9385
query75	2870	2790	2451	2451
query76	2268	1015	662	662
query77	377	370	306	306
query78	10880	11102	10456	10456
query79	2782	734	588	588
query80	1765	622	549	549
query81	546	252	218	218
query82	974	157	120	120
query83	326	264	240	240
query84	303	116	99	99
query85	900	502	447	447
query86	410	308	333	308
query87	3140	3150	2964	2964
query88	3595	2670	2664	2664
query89	419	361	360	360
query90	2018	179	177	177
query91	171	162	134	134
query92	80	73	67	67
query93	1189	837	505	505
query94	641	314	302	302
query95	580	404	313	313
query96	649	510	228	228
query97	2462	2476	2392	2392
query98	241	213	217	213
query99	1007	1001	932	932
Total cold run time: 252467 ms
Total hot run time: 168444 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.90% (19951/37714)
Line Coverage 36.45% (187218/513608)
Region Coverage 32.69% (145139/443989)
Branch Coverage 33.86% (63623/187879)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.46% (27126/36927)
Line Coverage 57.05% (292139/512054)
Region Coverage 54.38% (243674/448102)
Branch Coverage 56.02% (105566/188445)

@zclllyybb requested a review from Copilot on March 30, 2026 11:51
@zclllyybb
Contributor

/review

@github-actions
Contributor

Code Review Summary

PR Goal

Remove the deep_copy virtual method from TypeInfo and all its subclasses (ScalarTypeInfo, ArrayTypeInfo, MapTypeInfo, StructTypeInfo), along with the corresponding FieldTypeTraits::deep_copy static methods. The two callers are replaced with direct/inlined alternatives.

Critical Checkpoint Conclusions

1. Does the code accomplish the goal? Is there a test that proves it?
Yes. The deep_copy virtual method is fully removed from TypeInfo and all subclasses. No callers remain in be/src/ or be/test/. Tests are updated to remove deep_copy test coverage (which is correct since the function no longer exists). The test is renamed from copy_and_equal to cmp_and_minmax to accurately reflect remaining coverage.

2. Is this modification as small, clear, and focused as possible?
Yes. The PR touches only the files necessary: the type hierarchy (types.h), the two callers (bloom_filter_index_writer, indexed_column_writer), and the test file.

3. Concurrency concerns?
Not applicable — no concurrency changes.

4. Lifecycle management?
Not applicable — no lifecycle changes.

5. Configuration items added?
None.

6. Incompatible changes (function symbols, storage formats)?
No. deep_copy was an internal virtual method, not part of any serialization or storage format. No compatibility concerns.

7. Functionally parallel code paths?
The PrimaryKeyBloomFilterIndexWriterImpl had its _type_info removed (correct). The template BloomFilterIndexWriterImpl (anonymous namespace, same .cpp file) has pre-existing dead _type_info and _arena members that were never used even before this PR. While cleaning them up here would be ideal for consistency, this is pre-existing dead code and not a blocking issue.

8. Special conditional checks?
The new inline Slice copy in PrimaryKeyBloomFilterIndexWriterImpl::add_values checks v->size > 0 before allocating, setting data = nullptr for empty slices. This is correct and matches the behavior of the removed FieldTypeTraits<CHAR>::deep_copy.

9. Test coverage?
The removed tests tested deep_copy specifically, which no longer exists. Remaining tests for cmp, set_to_min, set_to_max are preserved. The ArrayTypeTest deep_copy test is also correctly removed. Adequate.

10. Observability?
Not applicable — no observability changes needed for a dead-code removal.

11. Transaction/persistence?
Not applicable.

12. Data writes and modifications?
Not applicable — no data path changes beyond the two callers which are functionally equivalent (verified).

13. FE-BE variable passing?
Not applicable.

14. Performance analysis?
The IndexedColumnWriter change is a minor performance improvement: the old code deep-copied the first value into a buffer, then encoded from that buffer. The new code encodes directly from the source value, eliminating one redundant copy and the _arena + _first_value (faststring) allocations. The PrimaryKeyBloomFilterIndexWriterImpl change is neutral — it replaces one copy mechanism with an equivalent inline one.

15. Other issues?

  • Minor (pre-existing, non-blocking): BloomFilterIndexWriterImpl (template class in anonymous namespace, bloom_filter_index_writer.cpp:64-174) still has unused _type_info and _arena members. The _type_info is stored but never read. The _arena is never allocated from, so _arena.used_size() in size() always returns 0. Consider cleaning these up in a follow-up or in this PR for completeness.

Verdict

No blocking issues found. The refactoring is correct, complete (no remaining callers), and the replacement implementations are semantically equivalent to the originals. The pre-existing dead members in BloomFilterIndexWriterImpl are a minor cleanup opportunity.
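
The empty-slice handling noted under checkpoint 8 can be sketched in isolation. This is only an illustration under stated assumptions, not the Doris code: `ToySlice`, `ToyArena`, and `copy_slice` are hypothetical stand-ins for Doris's `Slice`, `Arena`, and the copy inlined into `add_values`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for a (pointer, length) view such as Doris's Slice.
struct ToySlice {
    char* data = nullptr;
    size_t size = 0;
};

// Hypothetical stand-in for an arena: it owns every chunk until destroyed.
class ToyArena {
public:
    char* alloc(size_t n) {
        _chunks.push_back(std::make_unique<char[]>(n));
        _used += n;
        return _chunks.back().get();
    }
    size_t used_size() const { return _used; }

private:
    std::vector<std::unique_ptr<char[]>> _chunks;
    size_t _used = 0;
};

// Copy `src` into arena-owned memory. Empty slices get a null data pointer
// and never touch the arena, mirroring the `v->size > 0` check in the review.
inline ToySlice copy_slice(const ToySlice& src, ToyArena& arena) {
    ToySlice dst;
    dst.size = src.size;
    if (src.size > 0) {
        assert(src.data != nullptr);
        dst.data = arena.alloc(src.size);
        std::memcpy(dst.data, src.data, src.size);
    } else {
        dst.data = nullptr;
    }
    return dst;
}
```

Calling `copy_slice` on a three-byte slice would grow the toy arena by three bytes, while calling it on an empty slice leaves the arena untouched and returns a null `data` pointer.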


Copilot AI left a comment

Pull request overview

Refactors BE storage type handling by removing TypeInfo::deep_copy and updating dependent components/tests to avoid relying on Arena-based deep copies.

Changes:

  • Removes deep_copy from the TypeInfo interface and deletes related implementations/traits.
  • Updates IndexedColumnWriter to store the first-value encoded key directly (replacing the prior deep-copied raw value buffer).
  • Updates tests and bloom-filter index writing paths to no longer call deep_copy.
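
The `IndexedColumnWriter` change listed above (encode the first value of a page up front instead of keeping a deep copy of the raw value) might look roughly like the following sketch. Everything here is an assumption for illustration: `encode_key` and `ToyPageWriter` are hypothetical names, and the big-endian encoding is a placeholder rather than the key encoder Doris actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical key encoder: big-endian bytes, so encoded keys compare
// bytewise in the same order as the unsigned integers they came from.
inline void encode_key(uint32_t v, std::string* out) {
    out->clear();
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((v >> shift) & 0xFF));
    }
}

// Hypothetical writer that encodes the first value of each page as soon as
// it arrives, so no raw copy of that value has to be kept around.
class ToyPageWriter {
public:
    void add(uint32_t v) {
        if (_count == 0) {
            encode_key(v, &_first_key);  // stash the encoded key at page start
        }
        ++_count;
    }

    // Hand back the encoded first key of the finished page and reset state.
    std::string finish_page() {
        assert(_count > 0);
        std::string key;
        key.swap(_first_key);
        _count = 0;
        return key;
    }

private:
    std::string _first_key;
    size_t _count = 0;
};
```

The design point is that the only per-page state is one `std::string` holding the already-encoded key, so there is no separate buffer or arena for a copied raw value.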

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
be/test/storage/storage_types_test.cpp Removes deep-copy assertions and adjusts type tests accordingly.
be/src/storage/types.h Removes TypeInfo::deep_copy and associated deep-copy helpers/implementations.
be/src/storage/index/indexed_column_writer.h Replaces faststring first-value buffer with std::string for encoded first-key storage.
be/src/storage/index/indexed_column_writer.cpp Encodes/stashes the first value key at page start and reuses it when building the value index.
be/src/storage/index/bloom_filter/bloom_filter_index_writer.h Simplifies PK bloom filter writer construction by removing stored TypeInfo*.
be/src/storage/index/bloom_filter/bloom_filter_index_writer.cpp Replaces TypeInfo::deep_copy usage with manual Slice copying into an Arena.
Comments suppressed due to low confidence (1)

be/test/storage/storage_types_test.cpp:146

  • TEST(ArrayTypeTest, copy_and_equal) no longer exercises any copy/equality behavior after removing the deep-copy assertions; it currently only validates the resolved array item TypeInfo. Rename the test to reflect its actual intent (e.g., type resolution) or add assertions that match the current test name.
TEST(ArrayTypeTest, copy_and_equal) {
    bool bool_array[3] = {true, false, true};
    bool null_signs[3] = {true, true, true};
    common_test_array<FieldType::OLAP_FIELD_TYPE_BOOL>(CollectionValue(bool_array, 3, null_signs));

Comment on lines 181 to +188
  Slice new_value;
- RETURN_IF_CATCH_EXCEPTION(_type_info->deep_copy(&new_value, v, _arena));
+ new_value.size = v->size;
+ if (v->size > 0) {
+     new_value.data = _arena.alloc(v->size);
+     memcpy(new_value.data, v->data, v->size);
+ } else {
+     new_value.data = nullptr;
+ }

Copilot AI Mar 30, 2026

PrimaryKeyBloomFilterIndexWriterImpl::add_values used to wrap the deep-copy path in RETURN_IF_CATCH_EXCEPTION(...), which converts allocation/oom exceptions (e.g., from Arena::alloc) into a returned Status. The new manual copy path allocates via _arena.alloc without this wrapper, so std::bad_alloc/tracker exceptions can escape and violate the method contract (“return error when memory is exhausted”), potentially crashing the writer on OOM. Wrap the allocation+memcpy logic (or the whole loop body) in RETURN_IF_CATCH_EXCEPTION and return the resulting error Status on exception.
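
The pattern this comment asks for, turning a thrown allocation failure into a returned error, can be sketched generically. This does not reproduce Doris's `Status` type or the definition of `RETURN_IF_CATCH_EXCEPTION`; `ToyStatus`, `catch_to_status`, and `risky_alloc` are hypothetical names used only to show the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <new>
#include <string>
#include <utility>

// Hypothetical minimal status type; a real Status carries more detail.
struct ToyStatus {
    bool ok = true;
    std::string msg;
};

// Run `fn` and translate exceptions into an error status so that they do
// not propagate out of a function whose contract is "return a status".
template <typename Fn>
ToyStatus catch_to_status(Fn&& fn) {
    ToyStatus status;
    try {
        std::forward<Fn>(fn)();
    } catch (const std::bad_alloc&) {
        status.ok = false;
        status.msg = "memory exhausted";
    } catch (const std::exception& e) {
        status.ok = false;
        status.msg = e.what();
    }
    return status;
}

// Hypothetical allocation that fails above `limit`, simulating an OOM.
inline char* risky_alloc(size_t n, size_t limit) {
    if (n > limit) {
        throw std::bad_alloc();
    }
    return new char[n];
}
```

Wrapping the allocation and `memcpy` in a helper like this keeps the failure on the status path: a request above the limit yields a status with `ok == false` rather than an escaping exception.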

@zclllyybb zclllyybb left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 30, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 598dfb9 into apache:master Mar 31, 2026
35 of 37 checks passed
yiguolei added a commit that referenced this pull request Mar 31, 2026
…1856 (#61913)

Cherry-picked from #61856

Co-authored-by: yiguolei <guolei@selectdb.com>

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.0-merged reviewed

7 participants