[refine](be) Refactor DataTypeNumberSerDe hierarchy: split base/final, sink if-constexpr branches#61996

Open
Mryange wants to merge 3 commits into apache:master from Mryange:refine-DataTypeNumberSerDe
Conversation

@Mryange
Contributor

@Mryange Mryange commented Apr 1, 2026

What problem does this PR solve?

Problem Summary:

DataTypeNumberSerDe<T> was a monolithic template class serving as both the base class for all number-like SerDe types and the concrete implementation for pure numeric types. It handled 17+ types via extensive if constexpr branches, making the code hard to read and maintain. The inheritance structure was also asymmetric with the DataType side.

DataType hierarchy (for reference)

The DataType side already has a clean two-level design:

IDataType
  └── DataTypeNumberBase<T>                  // shared base (non-final)
        ├── DataTypeNumber<T>  final         // pure numerics (Bool/Int8~Int128/Float/Double)
        ├── DataTypeDate  final
        ├── DataTypeDateTime  final
        ├── DataTypeDateV2  final
        ├── DataTypeDateTimeV2  final
        ├── DataTypeIPv4  final
        ├── DataTypeIPv6  final
        ├── DataTypeTimeV2  final
        └── DataTypeTimeStampTz  final

SerDe hierarchy — BEFORE

The old SerDe side was flat and asymmetric — DataTypeNumberSerDe<T> served as both the base and the final class, with massive if constexpr branches to dispatch type-specific logic:

DataTypeSerDe
  └── DataTypeNumberSerDe<T>        // monolithic: base + concrete all in one
        ├── DataTypeDateV2SerDe
        ├── DataTypeDateTimeV2SerDe
        ├── DataTypeDateSerDe        // DateTime v1 inherits from DateSerDe (!)
        │     └── DataTypeDateTimeSerDe
        ├── DataTypeIPv4SerDe
        ├── DataTypeIPv6SerDe
        ├── DataTypeTimeV2SerDe
        └── DataTypeTimeStampTzSerDe

Problems: (1) no separation between base and pure-numeric final class, (2) if constexpr branches for 10+ types scattered in base methods, (3) DateTime inheriting from Date (wrong relationship).
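
To make problem (2) concrete, here is a minimal sketch of what a monolithic `if constexpr` dispatch tends to look like. The names `MonolithicNumberSerDe`, `DateV2`, `DateTimeV2`, and `pb_type_name()` are hypothetical stand-ins chosen for illustration and are not the actual Doris classes or methods; only the UINT32/UINT64 mapping for DateV2/DateTimeV2 is taken from this PR description.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the column value types (not the real Doris classes).
struct DateV2 { std::uint32_t packed; };
struct DateTimeV2 { std::uint64_t packed; };

// A single template serves every type, so each method accumulates one
// compile-time branch per supported type.
template <typename T>
struct MonolithicNumberSerDe {
    std::string pb_type_name() const {
        if constexpr (std::is_same_v<T, DateV2>) {
            return "UINT32";
        } else if constexpr (std::is_same_v<T, DateTimeV2>) {
            return "UINT64";
        } else if constexpr (std::is_same_v<T, std::int64_t>) {
            return "INT64";
        } else if constexpr (std::is_same_v<T, double>) {
            return "DOUBLE";
        } else {
            return "UNSUPPORTED";
        }
    }
};
```

Every new type adds a branch to every such method, which is the readability cost the refactoring targets.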

SerDe hierarchy — AFTER

DataTypeSerDe
  └── DataTypeNumberSerDeBase<T>              // shared base (non-final)
        ├── DataTypeNumberSerDe<T>  final     // pure numerics only (Bool/Int/Float/Time)
        ├── DataTypeDateLikeV1SerDe<T>        // shared Date/DateTime v1 template
        │     ├── DataTypeDateSerDe  final
        │     └── DataTypeDateTimeSerDe  final
        ├── DataTypeDateV2SerDe
        ├── DataTypeDateTimeV2SerDe
        ├── DataTypeIPv4SerDe
        ├── DataTypeIPv6SerDe
        ├── DataTypeTimeV2SerDe
        └── DataTypeTimeStampTzSerDe

Now the SerDe hierarchy mirrors the DataType hierarchy: DataTypeNumberSerDeBase<T> ↔ DataTypeNumberBase<T>, DataTypeNumberSerDe<T> final ↔ DataTypeNumber<T> final, and Date/DateTime are siblings via a shared intermediate template.


This PR performs a multi-phase refactoring:

Phase 1 — Split base class and add thin final wrapper

  • Rename DataTypeNumberSerDe<T> → DataTypeNumberSerDeBase<T> (base class)
  • Add DataTypeNumberSerDe<T> final for pure numeric types (Bool/Int/Float/Double/Time), mirroring DataTypeNumber<T> on the DataType side
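
The base/final split in Phase 1 could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `SerDe`, `NumberSerDeBase`, `NumberSerDe`, and `value_size()` are hypothetical names, and the `static_assert` is just one possible way to restrict the final wrapper to pure numerics.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical root interface standing in for DataTypeSerDe.
struct SerDe {
    virtual ~SerDe() = default;
    virtual std::size_t value_size() const = 0;
};

// Shared, non-final base: logic common to all number-like SerDe types.
template <typename T>
class NumberSerDeBase : public SerDe {
public:
    std::size_t value_size() const override { return sizeof(T); }
};

// Thin final wrapper: adds no behavior, only marks the pure-numeric leaf.
template <typename T>
class NumberSerDe final : public NumberSerDeBase<T> {
    static_assert(std::is_arithmetic_v<T>,
                  "the final wrapper is meant for pure numeric types only");
};
```

Date-like or IP types would derive from `NumberSerDeBase` directly rather than from the final wrapper, which is what keeps the leaf class free of non-numeric branches.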

Phase 3 — Lift get_scale() to DataTypeSerDe base

  • Move get_scale() from DataTypeNumberSerDeBase<T> to DataTypeSerDe, making it available to all SerDe types (e.g. Decimal)
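
A rough sketch of what lifting `get_scale()` to the root class enables is shown below. The class names and the default return value of 0 are assumptions for illustration; the PR only states that `DataTypeSerDe` gains a default implementation.

```cpp
// Hypothetical root class standing in for DataTypeSerDe.
struct SerDe {
    virtual ~SerDe() = default;
    // Assumed default: types without a scale report 0.
    virtual int get_scale() const { return 0; }
};

// A scaled type overrides the default.
class DecimalSerDe final : public SerDe {
public:
    explicit DecimalSerDe(int scale) : scale_(scale) {}
    int get_scale() const override { return scale_; }

private:
    int scale_;
};

// An unscaled type simply inherits the default.
class BoolSerDe final : public SerDe {};

// Callers can query the scale through the root interface without
// knowing (or casting to) the concrete SerDe type.
inline int scale_of(const SerDe& serde) { return serde.get_scale(); }
```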

Phase 4 — Split Date/DateTime v1 SerDe into separate files

  • Create DataTypeDateLikeV1SerDe<T> shared template for Date/DateTime v1 common logic
  • Split into data_type_date_serde.h/.cpp and data_type_datetime_serde.h/.cpp
  • DataTypeDateSerDe and DataTypeDateTimeSerDe are now siblings (not parent-child)
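
The sibling arrangement in Phase 4 might look like the sketch below. All names here (`DateTag`, `DateTimeTag`, `DateLikeV1SerDe`, `pb_type()`, `kind()`) are hypothetical; the only detail taken from the PR is that the v1 date-like types share INT64 PB handling.

```cpp
#include <string>
#include <type_traits>

// Hypothetical tags distinguishing the two v1 types.
struct DateTag {};
struct DateTimeTag {};

// Shared intermediate template: logic common to Date v1 and DateTime v1.
template <typename Tag>
class DateLikeV1SerDe {
public:
    virtual ~DateLikeV1SerDe() = default;
    // Common behavior lives here once.
    std::string pb_type() const { return "INT64"; }
    // Type-specific behavior is left to each sibling.
    virtual std::string kind() const = 0;
};

class DateSerDe final : public DateLikeV1SerDe<DateTag> {
public:
    std::string kind() const override { return "date"; }
};

class DateTimeSerDe final : public DateLikeV1SerDe<DateTimeTag> {
public:
    std::string kind() const override { return "datetime"; }
};
```

Neither leaf derives from the other, so a change to Date-specific behavior cannot silently alter DateTime.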

Phase 2 — Sink if constexpr branches to subclasses

  • Move type-specific PB/Jsonb serialization logic from DataTypeNumberSerDeBase<T> into subclass overrides:
    • DataTypeDateLikeV1SerDe<T>: write/read PB (INT64), write/read Jsonb
    • DataTypeDateV2SerDe: write/read PB (UINT32), write/read Jsonb
    • DataTypeDateTimeV2SerDe: write/read PB (UINT64), write/read Jsonb
    • DataTypeTimeStampTzSerDe: write/read PB (UINT64), write/read Jsonb, read_column_from_arrow
    • DataTypeIPv4SerDe: write/read Jsonb
  • Remove corresponding dead branches from the base class

After refactoring, the base class DataTypeNumberSerDeBase<T> PB methods only handle pure numeric types (Bool, TinyInt–BigInt, LargeInt, Float, Double, Time/TimeV2).
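
As an illustration of "sinking" a branch, the sketch below keeps only the plain numeric path in the base and moves the DateV2-specific handling into an override. `PbValues`, `write_to_pb()`, and the class names are hypothetical simplifications; the real methods operate on Doris columns and protobuf messages.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the protobuf output.
struct PbValues {
    std::string type_id;
    std::vector<std::uint64_t> values;
};

// Base: only the generic numeric path remains, with no per-type branches.
template <typename T>
class NumberSerDeBase {
public:
    virtual ~NumberSerDeBase() = default;
    virtual void write_to_pb(const std::vector<T>& col, PbValues& out) const {
        out.type_id = "NUMERIC";
        for (T v : col) {
            out.values.push_back(static_cast<std::uint64_t>(v));
        }
    }
};

// Subclass: the DateV2-specific logic that used to be an if-constexpr
// branch in the base is now an ordinary override.
class DateV2SerDe final : public NumberSerDeBase<std::uint32_t> {
public:
    void write_to_pb(const std::vector<std::uint32_t>& col,
                     PbValues& out) const override {
        out.type_id = "UINT32";
        for (std::uint32_t v : col) {
            out.values.push_back(v);
        }
    }
};
```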

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Apr 1, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 29684 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 16ba36d3bc84e144c319149ee3f865a37d39fb4b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17928	3838	3782	3782
q2	q3	10666	891	616	616
q4	4678	470	363	363
q5	7443	1352	1150	1150
q6	196	178	141	141
q7	912	970	767	767
q8	9331	1401	1370	1370
q9	5610	5308	5293	5293
q10	6251	2044	1789	1789
q11	492	283	284	283
q12	867	688	507	507
q13	18041	2803	2173	2173
q14	282	286	256	256
q15	q16	863	859	781	781
q17	970	1168	853	853
q18	6465	5731	5604	5604
q19	1170	1203	1101	1101
q20	608	552	409	409
q21	4766	2494	2094	2094
q22	458	390	352	352
Total cold run time: 97997 ms
Total hot run time: 29684 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4681	4520	4661	4520
q2	q3	4628	4738	4163	4163
q4	2012	2079	1331	1331
q5	4916	4996	5357	4996
q6	210	192	157	157
q7	2258	1816	1602	1602
q8	3374	3125	3186	3125
q9	8255	8142	8379	8142
q10	4483	4505	4231	4231
q11	596	413	376	376
q12	650	705	484	484
q13	2719	3172	2342	2342
q14	298	305	273	273
q15	q16	797	771	674	674
q17	1317	1278	1259	1259
q18	8067	7147	7028	7028
q19	1144	1131	1120	1120
q20	2267	2243	1957	1957
q21	6125	5310	4861	4861
q22	551	500	405	405
Total cold run time: 59348 ms
Total hot run time: 53046 ms

@Mryange
Contributor Author

Mryange commented Apr 1, 2026

run buildall

@doris-robot

TPC-DS: Total hot run time: 179825 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 16ba36d3bc84e144c319149ee3f865a37d39fb4b, data reload: false

query5	4343	663	530	530
query6	347	228	215	215
query7	4232	550	321	321
query8	337	256	224	224
query9	8754	3920	3934	3920
query10	536	389	371	371
query11	6660	5472	5145	5145
query12	182	132	123	123
query13	1276	606	439	439
query14	5676	5146	4721	4721
query14_1	4109	4058	4076	4058
query15	207	201	179	179
query16	995	440	418	418
query17	949	753	628	628
query18	2433	488	372	372
query19	243	225	184	184
query20	137	137	139	137
query21	232	148	125	125
query22	14104	14948	14428	14428
query23	18124	17036	16649	16649
query23_1	16881	16643	16878	16643
query24	7463	1743	1369	1369
query24_1	1351	1346	1336	1336
query25	613	526	469	469
query26	1279	315	193	193
query27	2691	631	371	371
query28	4562	1916	1898	1898
query29	972	685	559	559
query30	302	244	199	199
query31	1102	1056	962	962
query32	88	74	73	73
query33	557	368	311	311
query34	1207	1172	671	671
query35	745	774	679	679
query36	1218	1217	1141	1141
query37	158	99	90	90
query38	3117	3024	2997	2997
query39	922	890	866	866
query39_1	842	841	826	826
query40	236	163	147	147
query41	67	65	64	64
query42	275	280	272	272
query43	316	372	275	275
query44	
query45	204	199	185	185
query46	1151	1212	787	787
query47	2325	2323	2199	2199
query48	393	428	313	313
query49	647	534	436	436
query50	724	292	221	221
query51	4278	4328	4239	4239
query52	280	281	272	272
query53	321	343	273	273
query54	323	303	282	282
query55	103	93	88	88
query56	334	331	317	317
query57	1782	1690	1664	1664
query58	301	276	277	276
query59	2890	2983	2715	2715
query60	327	330	315	315
query61	162	160	157	157
query62	702	621	559	559
query63	315	275	268	268
query64	5222	1490	1065	1065
query65	
query66	1475	470	370	370
query67	24290	24383	24185	24185
query68	
query69	438	343	300	300
query70	1022	985	1018	985
query71	376	335	315	315
query72	3021	2730	2434	2434
query73	783	807	430	430
query74	9910	9737	9542	9542
query75	3564	3368	2995	2995
query76	2391	1160	785	785
query77	406	441	340	340
query78	11465	11382	10754	10754
query79	1543	1117	856	856
query80	842	770	681	681
query81	453	278	236	236
query82	1347	158	124	124
query83	372	284	255	255
query84	310	140	120	120
query85	863	515	456	456
query86	381	342	313	313
query87	3304	3190	3065	3065
query88	3597	2738	2685	2685
query89	480	410	375	375
query90	1980	178	180	178
query91	179	171	133	133
query92	78	75	69	69
query93	912	887	507	507
query94	530	331	294	294
query95	663	473	334	334
query96	1073	724	336	336
query97	2711	2663	2564	2564
query98	255	235	231	231
query99	1056	1057	942	942
Total cold run time: 259292 ms
Total hot run time: 179825 ms

@doris-robot

TPC-H: Total hot run time: 29393 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aba9f743a996ccde838c2e66f891339fcd943b52, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17968	3824	3716	3716
q2	q3	10687	873	596	596
q4	4674	461	361	361
q5	7454	1341	1132	1132
q6	183	162	136	136
q7	908	941	776	776
q8	9504	1462	1394	1394
q9	6073	5408	5307	5307
q10	6291	2030	1741	1741
q11	508	273	274	273
q12	842	690	506	506
q13	18023	2781	2160	2160
q14	285	287	254	254
q15	q16	889	847	780	780
q17	1087	1211	828	828
q18	6330	5699	5635	5635
q19	1660	1314	1057	1057
q20	612	549	404	404
q21	4710	2429	1980	1980
q22	469	418	357	357
Total cold run time: 99157 ms
Total hot run time: 29393 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4503	4430	4353	4353
q2	q3	4635	4728	4157	4157
q4	2010	2295	1343	1343
q5	4897	4986	5183	4986
q6	208	170	133	133
q7	2041	1817	1620	1620
q8	3464	3106	3241	3106
q9	8519	8171	8183	8171
q10	4451	4468	4281	4281
q11	653	475	372	372
q12	653	709	488	488
q13	3067	3263	2471	2471
q14	298	300	274	274
q15	q16	753	789	720	720
q17	1323	1246	1251	1246
q18	7701	7037	6969	6969
q19	1157	1173	1151	1151
q20	2221	2189	1944	1944
q21	6448	5441	4792	4792
q22	539	494	421	421
Total cold run time: 59541 ms
Total hot run time: 52998 ms

@doris-robot

TPC-DS: Total hot run time: 180417 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aba9f743a996ccde838c2e66f891339fcd943b52, data reload: false

query5	4342	665	505	505
query6	341	230	209	209
query7	4287	586	352	352
query8	334	243	229	229
query9	8743	3835	3861	3835
query10	511	390	353	353
query11	6643	5493	5154	5154
query12	192	137	154	137
query13	1271	597	421	421
query14	5618	5151	4719	4719
query14_1	4089	4061	4118	4061
query15	207	195	178	178
query16	1071	443	426	426
query17	1124	787	638	638
query18	2478	502	360	360
query19	242	230	185	185
query20	139	129	127	127
query21	225	149	121	121
query22	13789	15332	15116	15116
query23	17944	17126	16616	16616
query23_1	16758	16718	16836	16718
query24	7578	1868	1386	1386
query24_1	1565	1538	1412	1412
query25	676	564	468	468
query26	1259	341	188	188
query27	2653	645	385	385
query28	4465	1878	1868	1868
query29	1025	694	579	579
query30	301	234	201	201
query31	1087	1044	944	944
query32	90	74	73	73
query33	559	383	313	313
query34	1244	1175	665	665
query35	765	778	680	680
query36	1278	1209	1055	1055
query37	152	100	86	86
query38	3133	3049	3040	3040
query39	944	911	860	860
query39_1	844	834	846	834
query40	242	166	149	149
query41	72	68	68	68
query42	280	277	273	273
query43	326	325	287	287
query44	
query45	217	201	200	200
query46	1229	1300	838	838
query47	2327	2350	2215	2215
query48	415	437	292	292
query49	642	550	424	424
query50	733	291	212	212
query51	4458	4267	4337	4267
query52	282	285	275	275
query53	328	345	272	272
query54	323	292	278	278
query55	102	98	88	88
query56	310	315	317	315
query57	1774	1582	1637	1582
query58	291	274	266	266
query59	2916	2984	2711	2711
query60	330	327	325	325
query61	156	151	161	151
query62	688	621	566	566
query63	316	272	263	263
query64	5348	1529	1096	1096
query65	
query66	1489	492	383	383
query67	24253	24407	24210	24210
query68	
query69	462	342	312	312
query70	1042	1049	974	974
query71	361	326	313	313
query72	3022	2714	2506	2506
query73	864	794	421	421
query74	9805	9846	9527	9527
query75	3578	3408	3003	3003
query76	2308	1155	760	760
query77	407	421	328	328
query78	11269	11452	10720	10720
query79	1536	1098	813	813
query80	1373	758	677	677
query81	521	283	234	234
query82	1260	159	119	119
query83	334	288	265	265
query84	274	150	120	120
query85	955	530	473	473
query86	465	344	287	287
query87	3294	3187	3118	3118
query88	3673	2691	2671	2671
query89	478	403	375	375
query90	2055	185	175	175
query91	183	187	148	148
query92	75	75	69	69
query93	937	910	505	505
query94	694	349	296	296
query95	674	458	332	332
query96	1057	832	324	324
query97	2700	2659	2547	2547
query98	238	226	226	226
query99	1062	1082	961	961
Total cold run time: 259978 ms
Total hot run time: 180417 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.91% (20049/37890)
Line Coverage 36.50% (188114/515407)
Region Coverage 32.69% (145677/445676)
Branch Coverage 33.89% (63878/188486)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.63% (27322/37109)
Line Coverage 57.26% (294259/513858)
Region Coverage 54.68% (245930/449789)
Branch Coverage 56.24% (106329/189052)

@zclllyybb zclllyybb requested a review from Copilot April 2, 2026 08:07
@zclllyybb
Contributor

/review


Copilot AI left a comment

Pull request overview

Refactors the BE numeric SerDe hierarchy to mirror the existing DataType hierarchy by introducing a shared DataTypeNumberSerDeBase<T> plus thin final wrappers for pure numerics, and by splitting Date/DateTime v1 SerDe into sibling implementations with shared templated logic. This also sinks several type-specific PB/Jsonb serialization branches into the relevant subclasses and lifts get_scale() to the DataTypeSerDe base.

Changes:

  • Introduce DataTypeNumberSerDeBase<T> and a thin final DataTypeNumberSerDe<T> for pure numerics; update get_serde() wiring accordingly.
  • Split Date/DateTime v1 SerDe into data_type_date_serde.* and new data_type_datetime_serde.* with shared DataTypeDateLikeV1SerDe<T>.
  • Move several PB/Jsonb serialization implementations from the number base SerDe into specialized SerDes (DateV2/DateTimeV2/TimeStampTz/IPv4, etc.) and add DataTypeSerDe::get_scale().

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

File Description
be/test/core/data_type_serde/data_type_serde_datetime_v1_test.cpp Updates includes and Date v1 SerDe construction after Date/DateTime v1 split.
be/test/core/data_type_serde/data_type_serde_datelike_batch_test.cpp Updates includes and switches from templated DateSerDe to concrete Date/DateTime v1 SerDes.
be/src/core/data_type/data_type_number.h Routes DataTypeNumber<T> SerDe creation through the new thin final numeric SerDe.
be/src/core/data_type/data_type_number_base.h Switches base numeric get_serde() to return DataTypeNumberSerDeBase<T>.
be/src/core/data_type/data_type_date.h Updates Date v1 SerDe include/type to DataTypeDateSerDe.
be/src/core/data_type/data_type_date_time.h Updates DateTime v1 SerDe include to the new data_type_datetime_serde.h.
be/src/core/data_type_serde/data_type_timestamptz_serde.h Changes inheritance to DataTypeNumberSerDeBase and adds PB/Jsonb + Arrow read overrides.
be/src/core/data_type_serde/data_type_timestamptz_serde.cpp Implements TimeStampTz PB/Jsonb and Arrow read logic previously handled in the monolithic base.
be/src/core/data_type_serde/data_type_time_serde.h Switches TimeV2 SerDe to inherit from DataTypeNumberSerDeBase.
be/src/core/data_type_serde/data_type_serde.h Adds DataTypeSerDe::get_scale() default implementation.
be/src/core/data_type_serde/data_type_serde.cpp Updates numeric binary deserialization dispatch to DataTypeNumberSerDeBase.
be/src/core/data_type_serde/data_type_number_serde.h Splits monolithic numeric SerDe into base + thin final numeric SerDe; removes PB branches for non-numerics.
be/src/core/data_type_serde/data_type_number_serde.cpp Renames implementations to DataTypeNumberSerDeBase and adds explicit instantiations.
be/src/core/data_type_serde/data_type_ipv6_serde.h Switches IPv6 SerDe inheritance to DataTypeNumberSerDeBase.
be/src/core/data_type_serde/data_type_ipv4_serde.h Switches IPv4 SerDe inheritance to DataTypeNumberSerDeBase and adds Jsonb cell IO overrides.
be/src/core/data_type_serde/data_type_ipv4_serde.cpp Implements IPv4 Jsonb cell IO overrides after removing base branches.
be/src/core/data_type_serde/data_type_datev2_serde.h Switches DateV2 SerDe inheritance to DataTypeNumberSerDeBase and declares PB/Jsonb overrides.
be/src/core/data_type_serde/data_type_datev2_serde.cpp Implements DateV2 PB/Jsonb overrides previously handled via base if constexpr branches.
be/src/core/data_type_serde/data_type_datetimev2_serde.h Switches DateTimeV2 SerDe inheritance to DataTypeNumberSerDeBase and declares PB/Jsonb overrides.
be/src/core/data_type_serde/data_type_datetimev2_serde.cpp Implements DateTimeV2 PB/Jsonb overrides previously handled via base if constexpr branches.
be/src/core/data_type_serde/data_type_datetime_serde.h New: declares DateTime v1 SerDe sibling class built on shared DataTypeDateLikeV1SerDe<T>.
be/src/core/data_type_serde/data_type_datetime_serde.cpp New: implements DateTime v1 JSON + Arrow read behavior (previously in combined Date/DateTime file).
be/src/core/data_type_serde/data_type_date_serde.h Introduces shared DataTypeDateLikeV1SerDe<T> template and concrete DataTypeDateSerDe.
be/src/core/data_type_serde/data_type_date_serde.cpp Moves shared Date/DateTime v1 logic here and adds PB/Jsonb overrides for v1 datelike types.
Comments suppressed due to low confidence (1)

be/src/core/data_type_serde/data_type_date_serde.cpp:589

  • write_column_to_pb() casts the VecDateTimeValue data buffer to int64_t* and passes it to protobuf. Even if VecDateTimeValue is trivially copyable, reinterpreting the object array as an int64_t array relies on layout/aliasing assumptions and is undefined behavior with strict-aliasing enabled. Prefer serializing by iterating and using an explicit underlying-int conversion (e.g., binary_cast/bit_cast or a dedicated accessor) rather than pointer reinterprets.

ptype->set_id(PGenericType::UINT32);
auto* values = result.mutable_uint32_value();
values->Reserve(row_count);
values->Add((uint32_t*)data.begin() + start, (uint32_t*)data.begin() + end);

Copilot AI Apr 2, 2026

write_column_to_pb() is adding values by casting the column’s DateV2Value buffer to uint32_t* and passing it to protobuf. Even though DateV2Value stores an underlying integer, reinterpreting the object array as a uint32_t array relies on object layout and can violate strict-aliasing rules (undefined behavior under optimization). Prefer filling the repeated field by iterating and pushing to_date_int_val() (or an equivalent accessor) per element, or using a safe bit-cast/memcpy-based conversion instead of pointer reinterprets.

Suggested change
- values->Add((uint32_t*)data.begin() + start, (uint32_t*)data.begin() + end);
+ for (int64_t i = start; i < end; ++i) {
+     values->Add(data[i].to_date_int_val());
+ }

ptype->set_id(PGenericType::UINT64);
auto* values = result.mutable_uint64_value();
values->Reserve(row_count);
values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + end);

Copilot AI Apr 2, 2026

write_column_to_pb() casts the DateV2Value<DateTimeV2ValueType> data buffer to uint64_t* and passes it to protobuf. This relies on the in-memory layout of DateV2Value matching a raw uint64_t array and can violate strict-aliasing rules (undefined behavior). Safer approach: append values by iterating and using to_date_int_val() (or a safe bit-cast/memcpy) to obtain the underlying uint64_t representation.

Suggested change
- values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + end);
+ for (int64_t i = start; i < end; ++i) {
+     values->Add(data[i].to_date_int_val());
+ }

ptype->set_id(PGenericType::UINT64);
auto* values = result.mutable_uint64_value();
values->Reserve(row_count);
values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + end);

Copilot AI Apr 2, 2026

write_column_to_pb() populates the protobuf repeated field by reinterpreting the TimestampTzValue object buffer as a uint64_t*. This assumes object layout compatibility and can trigger strict-aliasing UB under optimization. Consider appending the underlying integer via TimestampTzValue::to_date_int_val() (or a safe bit-cast/memcpy) per row instead of pointer casting.

Suggested change
- values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + end);
+ for (int64_t i = start; i < end; ++i) {
+     values->Add(binary_cast<TimestampTzValue, UInt64>(data[i]));
+ }
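
The review comments above all mention a "safe bit-cast/memcpy" alternative to pointer reinterpretation. A minimal, generic sketch of the memcpy variant is shown here. `PackedDate` and `to_uint32_values()` are hypothetical names, not Doris or protobuf APIs, and the sketch copies into a `std::vector` rather than a protobuf repeated field.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical value type wrapping a 32-bit packed representation.
struct PackedDate {
    std::uint32_t bits;
};

// Copies elements [start, end) out as raw integers without reading the
// objects through a pointer of an unrelated type.
inline std::vector<std::uint32_t> to_uint32_values(
        const std::vector<PackedDate>& data, std::size_t start, std::size_t end) {
    static_assert(sizeof(PackedDate) == sizeof(std::uint32_t),
                  "sizes must match for a bitwise copy");
    static_assert(std::is_trivially_copyable_v<PackedDate>,
                  "memcpy requires a trivially copyable type");
    std::vector<std::uint32_t> out;
    out.reserve(end - start);
    for (std::size_t i = start; i < end; ++i) {
        std::uint32_t v;
        std::memcpy(&v, &data[i], sizeof(v));  // well-defined bitwise copy
        out.push_back(v);
    }
    return out;
}
```

Compilers commonly reduce a fixed-size `memcpy` like this to a plain load, so the per-element form need not cost more than the pointer cast, though that should be measured for the real column types.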

5 participants