
[fix](datetime) Replace legacy from_date_str with cast function #61682

Open
zclllyybb wants to merge 3 commits into apache:master from zclllyybb:cascade/datetime-parsing-casttodatev2-d2f2db

@zclllyybb
Contributor

@zclllyybb zclllyybb commented Mar 24, 2026

Also splits out a separate `CastToTimestampTz` class.

It also replaces the `IS_STRICT` and `IS_DATETIME` boolean template parameters with the strongly-typed enums `DatelikeParseMode` and `DatelikeTargetType` across all datetime cast operations. This improves readability and type safety by making the template parameters self-documenting.
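To make the readability argument concrete, here is a minimal sketch. It assumes enum members named `STRICT`/`NON_STRICT` and `DATE`/`DATE_TIME` (the names that appear in the review comments on this PR); the `check_shape` function is hypothetical and only checks string length, standing in for real date parsing.

```cpp
#include <cstddef>
#include <string_view>

// Enum names follow the PR description; members and everything else are
// assumptions for illustration, not the actual Doris definitions.
enum class DatelikeParseMode { STRICT, NON_STRICT };
enum class DatelikeTargetType { DATE, DATE_TIME };

// Hypothetical toy parser. The old style would be
// template <bool IS_STRICT, bool IS_DATETIME>, called as check_shape<true, false>(s),
// which hides what each flag means at the call site.
template <DatelikeParseMode Mode, DatelikeTargetType Target>
constexpr bool check_shape(std::string_view s) {
    // Enum comparisons are constant expressions, so `if constexpr` selects a
    // branch at compile time, with no runtime cost compared with bool flags.
    if constexpr (Mode == DatelikeParseMode::STRICT) {
        // Strict: the text must have exactly the target's shape.
        constexpr std::size_t expected =
                Target == DatelikeTargetType::DATE_TIME ? 19   // "YYYY-MM-DD HH:MM:SS"
                                                        : 10;  // "YYYY-MM-DD"
        return s.size() == expected;
    } else {
        // Non-strict (toy rule): any input with a full date prefix is accepted.
        return s.size() >= 10;
    }
}
```

Because `DatelikeParseMode` and `DatelikeTargetType` are distinct scoped enum types, passing the two arguments in the wrong order is a compile error, whereas with two `bool` parameters a swapped call such as `<false, true>` still compiles and silently means something else.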
@zclllyybb
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zclllyybb
Contributor Author

/review

Contributor

@github-actions github-actions bot left a comment


Code Review Summary

This PR replaces legacy from_date_str member functions on VecDateTimeValue and DateV2Value<T> with static cast function calls (CastToDateOrDatetime, CastToDateV2, CastToDatetimeV2, CastToTimestampTz). It also refactors CastToDatetimeV2 internals to separate public API from private implementation details via _internal suffix methods, and extracts a new CastToTimestampTz wrapper in cast_to_timestamptz_impl.hpp.
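The split between a small public API, private `_internal` helpers, and a separate wrapper that reuses them can be sketched as follows. This is an illustration under assumptions: the `Sketch` suffix marks every struct as hypothetical, a toy `HH:MM:SS` parser stands in for real datetime parsing, and friendship is how the wrapper reaches the private core (a later review on this PR describes `CastToTimestampTz` as having friend access to `CastToDatetimeV2`'s private `_internal` methods).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical names throughout; this mirrors the described layout, not the
// actual Doris classes or signatures.
struct CastToTimestampTzSketch;  // defined after the class it is befriended by

struct CastToDatetimeV2Sketch {
    // Public API: callers never see the _internal helper below.
    static bool from_string(std::string_view s, int64_t& seconds) {
        return from_string_internal(s, seconds);
    }

private:
    // Shared core (toy: parses "HH:MM:SS" into seconds since midnight).
    static bool from_string_internal(std::string_view s, int64_t& seconds) {
        if (s.size() != 8 || s[2] != ':' || s[5] != ':') return false;
        int64_t parts[3] = {0, 0, 0};
        for (std::size_t i = 0; i < 3; ++i) {
            char hi = s[i * 3], lo = s[i * 3 + 1];
            if (hi < '0' || hi > '9' || lo < '0' || lo > '9') return false;
            parts[i] = (hi - '0') * 10 + (lo - '0');
        }
        if (parts[0] > 23 || parts[1] > 59 || parts[2] > 59) return false;
        seconds = parts[0] * 3600 + parts[1] * 60 + parts[2];
        return true;
    }

    // Only the wrapper gets access to the _internal helper.
    friend struct CastToTimestampTzSketch;
};

struct CastToTimestampTzSketch {
    // Reuses the private core, then shifts local time to UTC by the offset.
    static bool from_string(std::string_view s, int64_t offset_seconds, int64_t& utc_seconds) {
        int64_t local = 0;
        if (!CastToDatetimeV2Sketch::from_string_internal(s, local)) return false;
        utc_seconds = local - offset_seconds;  // toy: no day wrap-around
        return true;
    }
};
```

Keeping the core private means only the public entry points and the explicitly befriended wrapper can call it, so the time-zone-specific logic lives in its own type without widening `CastToDatetimeV2Sketch`'s public surface.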

Critical Checkpoint Conclusions

Goal & correctness: The PR achieves its goal of removing from_date_str by routing all callers through the unified cast functions. All 38 files are consistent in the replacement pattern. The from_date_str methods are fully removed with no remaining callers.

Focused & minimal: The change is relatively large (38 files, +572/-260) but mechanical in nature. Each call site follows a consistent pattern.

Concurrency: No concurrency issues introduced. The non-atomic static bool initialized pattern in partition_transformers.h predates this PR and was not introduced by it.

Lifecycle / static initialization: No new SIOF risks. The CastToTimestampTz forward declaration in cast_to_datetimev2_impl.hpp (line 85) and subsequent definition in cast_to_timestamptz_impl.hpp are well-structured.

Configuration items: None added.

Incompatible changes: The typo fix CastToTimstampTz → CastToTimestampTz is a symbol rename. Since the old symbol was only used internally and the new symbol is defined identically, this is safe.

Parallel code paths: All date type variants (DATE, DATETIME, DATEV2, DATETIMEV2, TIMESTAMPTZ) are handled. The ES scroll parser and runtime filter wrapper correctly dispatch to the appropriate cast function based on type.

Test coverage: Tests are updated consistently across 12 test files. No new tests added, but existing tests verify the new paths.

Observability: No observability changes needed for this refactoring.

Performance: No performance regression expected. The new code paths are functionally equivalent. Moving implementations from headers to .cpp files may slightly improve compilation time.

Issues Found

  1. [Minor] Whitespace issue in runtime_state.cpp — extra leading space on _nano_seconds = 0; line.

  2. [Design concern] ODR risk — Two different struct CastToTimestampTz definitions exist in the codebase (cast_to_timestamptz.h and cast_to_timestamptz_impl.hpp). They have the same name in the same namespace but different member functions and parameter types. While no translation unit currently includes both, this is a latent ODR violation that could cause undefined behavior if any future include chain pulls in both. Consider renaming the impl version (e.g., CastToTimestampTzImpl) to avoid confusion and prevent accidental ODR violations.

@doris-robot

TPC-H: Total hot run time: 26945 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 655e7aa97d6961878ec0da6ddffd929f26cc8bcb, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run
q1	17630	4458	4274	4274
q2
q3	10635	810	541	541
q4	4688	344	251	251
q5	7642	1234	1012	1012
q6	173	174	145	145
q7	793	853	679	679
q8	9892	1473	1355	1355
q9	5608	4823	4784	4784
q10	6317	1965	1675	1675
q11	456	247	244	244
q12	760	583	465	465
q13	18032	2712	1958	1958
q14	225	233	222	222
q15
q16	758	753	663	663
q17	746	864	423	423
q18	5961	5484	5297	5297
q19	1138	984	605	605
q20	525	493	369	369
q21	4539	1849	1712	1712
q22	395	367	271	271
Total cold run time: 96913 ms
Total hot run time: 26945 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run
q1	4758	4569	4579	4569
q2
q3	3880	4362	3857	3857
q4	893	1205	801	801
q5	4173	4552	4339	4339
q6	197	182	146	146
q7	1786	1679	1552	1552
q8	2606	2815	2593	2593
q9	7428	7507	7481	7481
q10	3746	4038	3564	3564
q11	505	445	418	418
q12	467	611	458	458
q13	2413	3220	2086	2086
q14	293	308	279	279
q15
q16	733	751	706	706
q17	1171	1395	1376	1376
q18	7457	6893	6690	6690
q19	910	895	918	895
q20	2102	2151	2036	2036
q21	3939	3573	3306	3306
q22	484	435	366	366
Total cold run time: 49941 ms
Total hot run time: 47518 ms

@doris-robot

TPC-DS: Total hot run time: 169719 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 655e7aa97d6961878ec0da6ddffd929f26cc8bcb, data reload: false

query	cold run	hot run 1	hot run 2	best hot run
query5	4313	633	506	506
query6	348	225	203	203
query7	4214	463	266	266
query8	337	253	219	219
query9	8686	2735	2744	2735
query10	470	390	369	369
query11	6980	5097	4871	4871
query12	187	134	126	126
query13	1284	466	384	384
query14	5758	3704	3454	3454
query14_1	2802	2765	2787	2765
query15	208	192	174	174
query16	1006	472	455	455
query17	1093	696	605	605
query18	2426	440	345	345
query19	239	212	184	184
query20	138	125	129	125
query21	213	131	106	106
query22	13250	14027	14495	14027
query23	17003	16445	16419	16419
query23_1	16231	16390	15771	15771
query24	7129	1617	1217	1217
query24_1	1248	1238	1232	1232
query25	573	484	449	449
query26	1257	267	159	159
query27	2770	497	303	303
query28	4512	1837	1867	1837
query29	889	581	491	491
query30	300	227	196	196
query31	1021	955	874	874
query32	87	76	72	72
query33	517	342	301	301
query34	913	871	525	525
query35	651	682	603	603
query36	1108	1130	996	996
query37	141	102	88	88
query38	2939	2933	2849	2849
query39	864	833	812	812
query39_1	802	789	799	789
query40	236	162	145	145
query41	72	68	69	68
query42	269	267	255	255
query43	247	256	231	231
query44	
query45	210	189	236	189
query46	889	989	618	618
query47	2137	2124	2091	2091
query48	308	321	223	223
query49	625	478	384	384
query50	696	286	215	215
query51	4090	4104	3989	3989
query52	268	269	259	259
query53	298	335	282	282
query54	292	282	268	268
query55	90	86	83	83
query56	317	326	320	320
query57	1924	1814	1736	1736
query58	281	273	274	273
query59	2816	2938	2748	2748
query60	343	343	336	336
query61	153	145	147	145
query62	616	596	550	550
query63	314	277	276	276
query64	5013	1343	1043	1043
query65	
query66	1471	449	357	357
query67	24131	24307	24265	24265
query68	
query69	403	328	288	288
query70	999	1003	980	980
query71	347	314	307	307
query72	2866	2720	2472	2472
query73	545	548	321	321
query74	9624	9613	9486	9486
query75	2889	2761	2457	2457
query76	2285	1021	666	666
query77	369	392	314	314
query78	10984	11072	10513	10513
query79	3124	745	568	568
query80	1792	636	558	558
query81	549	258	222	222
query82	989	151	118	118
query83	350	264	245	245
query84	256	123	104	104
query85	929	501	455	455
query86	412	309	310	309
query87	3163	3104	3014	3014
query88	3575	2662	2643	2643
query89	430	386	344	344
query90	2015	184	178	178
query91	172	167	139	139
query92	81	76	71	71
query93	1220	848	498	498
query94	653	328	284	284
query95	596	411	325	325
query96	649	519	229	229
query97	2473	2511	2400	2400
query98	247	231	227	227
query99	1000	962	919	919
Total cold run time: 252889 ms
Total hot run time: 169719 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.79% (19858/37618)
Line Coverage 36.29% (185529/511240)
Region Coverage 32.54% (143606/441384)
Branch Coverage 33.75% (62885/186334)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.32% (27007/36835)
Line Coverage 56.82% (289605/509688)
Region Coverage 54.18% (241383/445524)
Branch Coverage 55.87% (104419/186902)

@zclllyybb
Contributor Author

/review

…pes in cast functions

Replace `IS_STRICT` and `IS_DATETIME` boolean template parameters with strongly-typed enums `DatelikeParseMode` and `DatelikeTargetType` across all datetime cast operations. This improves code readability and type safety by making template parameters self-documenting.
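Code that is itself still templated on a `bool` can forward to the enum-based functions with a ternary evaluated at compile time, the `IsDatetime ? DatelikeTargetType::DATE_TIME : DatelikeTargetType::DATE` pattern quoted in the review of this commit. A minimal sketch, in which `max_text_length` and `legacy_max_text_length` are hypothetical helpers and the enum members are assumed:

```cpp
// Hypothetical sketch; only the enum name and the ternary pattern come from the PR text.
enum class DatelikeTargetType { DATE, DATE_TIME };

// Enum-based core (toy: the text length each target expects).
template <DatelikeTargetType Target>
constexpr int max_text_length() {
    return Target == DatelikeTargetType::DATE_TIME ? 19   // "YYYY-MM-DD HH:MM:SS"
                                                   : 10;  // "YYYY-MM-DD"
}

// A caller still templated on a bool (e.g. code shared by DATE and DATETIME)
// picks the enum with a ternary inside the template argument list. The
// condition is a constant expression, so the choice happens at compile time.
template <bool IsDatetime>
constexpr int legacy_max_text_length() {
    return max_text_length<IsDatetime ? DatelikeTargetType::DATE_TIME
                                      : DatelikeTargetType::DATE>();
}
```

This lets the enum migration proceed inward from the cast functions without forcing every surrounding template to change its own parameters in the same commit.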
Contributor

@github-actions github-actions bot left a comment


Code Review Summary

This is a well-executed refactoring PR that replaces legacy from_date_str methods with strongly-typed cast functions and replaces boolean template parameters (IS_STRICT, IS_DATETIME) with DatelikeParseMode and DatelikeTargetType enums. The new CastToTimestampTz class is cleanly factored out. Overall the changes are mechanical and correct, with one inconsistency noted below.

Critical Checkpoints (Part 1.3)

  • Goal / correctness: The goal is to replace legacy from_date_str with cast functions and use strongly-typed enums. The code accomplishes this goal. Existing regression tests cover basic paths but no new dedicated tests for the enum-based API were added.
  • Minimal and focused: Yes. The PR is large (46 files) but purely mechanical — each change follows a consistent pattern.
  • Concurrency: Not applicable. No concurrent data structures or threading changes.
  • Lifecycle / static init: Not applicable. No new static variables or lifecycle changes.
  • Configuration: Not applicable. No new config items.
  • Incompatible changes: No. This is an internal API refactoring with no user-visible or wire-format changes.
  • Parallel code paths: One inconsistency found (see inline comment) — from_decimal_batch in data_type_date_or_datetime_serde.cpp uses STRICT where datev2/datetimev2 serdes use NON_STRICT. This preserves the old behavior (the old code also used <true> in the non-strict path) but is inconsistent with the v2 serde files where the PR corrected this.
  • Special conditional checks: The ternary pattern IsDatetime ? DatelikeTargetType::DATE_TIME : DatelikeTargetType::DATE used in templates is correct and consistent.
  • Test coverage: Existing tests are mechanically adapted. No new test cases specifically for the enum-based API, but the refactoring is behavior-preserving so existing coverage is adequate.
  • Observability: Not applicable. No new features requiring observability.
  • Transaction / persistence: Not applicable.
  • Data writes: Not applicable.
  • FE-BE variables: Not applicable.
  • Performance: The from_float for CastToDateOrDatetime now takes an additional to_scale parameter that is explicitly discarded via static_cast<void>(to_scale) — no performance impact. The from_decimal_batch NON_STRICT change for v2 types is actually a minor performance improvement (avoids wasteful error-status writes).
  • Other issues: The typo fix CastToTimstampTz → CastToTimestampTz is a nice cleanup. The mutable_utc_dt() accessor added to TimestampTzValue is appropriate for the CastToTimestampTz integration.
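The static_cast<void>(to_scale) point is about interface unification: once every cast helper exposes the same `from_float(value, to_scale, out)` shape, generic code can call any of them, and a type without fractional seconds simply ignores the scale. A sketch under assumptions; `DateOrDatetimeSketch`, `DatetimeV2Sketch`, `cast_from_float`, and the microsecond encoding are hypothetical, not the Doris signatures.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: both helpers share one from_float signature.
struct DateOrDatetimeSketch {
    // Whole seconds only, so the scale has nothing to control here.
    static bool from_float(double v, uint32_t to_scale, int64_t& out_seconds) {
        static_cast<void>(to_scale);  // explicitly unused; avoids -Wunused-parameter
        if (!std::isfinite(v) || std::fabs(v) > 1e15) return false;
        out_seconds = static_cast<int64_t>(v);  // truncates toward zero
        return true;
    }
};

struct DatetimeV2Sketch {
    // Microsecond ticks, keeping at most to_scale fractional digits.
    static bool from_float(double v, uint32_t to_scale, int64_t& out_micros) {
        if (!std::isfinite(v) || std::fabs(v) > 1e12 || to_scale > 6) return false;
        int64_t micros = std::llround(v * 1e6);
        int64_t unit = 1;
        for (uint32_t i = to_scale; i < 6; ++i) unit *= 10;  // 10^(6 - to_scale)
        out_micros = micros / unit * unit;  // drop digits beyond to_scale
        return true;
    }
};

// Compiles against either helper because the signatures now match.
template <typename Cast>
bool cast_from_float(double v, uint32_t to_scale, int64_t& out) {
    return Cast::from_float(v, to_scale, out);
}
```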

@zclllyybb zclllyybb force-pushed the cascade/datetime-parsing-casttodatev2-d2f2db branch from 8ec75dc to 35ecbcb Compare March 25, 2026 05:29
@zclllyybb zclllyybb force-pushed the cascade/datetime-parsing-casttodatev2-d2f2db branch from 35ecbcb to a4a5960 Compare March 25, 2026 05:31
@zclllyybb
Contributor Author

/review

@github-actions
Contributor

Code Review — No Blocking Issues Found

Reviewed the full diff (47 files, +1349/-425). This is a well-executed mechanical refactoring with several embedded bug fixes.

Critical Checkpoint Conclusions

1. Goal & Correctness: The PR replaces legacy from_date_str methods and boolean template parameters (IS_STRICT, IS_DATETIME) with strongly-typed enums (DatelikeParseMode, DatelikeTargetType) and centralized cast functions. All replacement sites are correct — the new code preserves the original dispatch logic for every caller.

2. Modification Scope: Focused on the enum migration and API consolidation. Also fixes several pre-existing bugs:

  • from_decimal_batch for DateV1/DateTimeV1 had IS_STRICT=true in the template but params.is_strict=false at runtime — now correctly uses NON_STRICT
  • from_decimal slow-dispatch overload had TimeValue::TimeType& parameter type (wrong for Date/DateTime context) — now correctly uses VecDateTimeValue&
  • function_test_util.cpp called from_string_strict_mode<false> with is_strict=false — template/runtime mismatch fixed to both use STRICT
  • FieldTypeTraits<TIMESTAMPTZ>::from_string defaulted to non-strict — now correctly uses strict mode for OLAP field parsing
  • Typo fix: CastToTimstampTz → CastToTimestampTz

3. Concurrency: N/A — all changes are in stateless static functions.

4. Lifecycle Management: N/A.

5. Configuration Items: None added.

6. Incompatible Changes: from_date_str removed from VecDateTimeValue and DateV2Value<T> public API. This is internal BE API only.

7. Parallel Code Paths: Enum migration applied uniformly across all datelike types (Date, DateTime, DateV2, DateTimeV2, TimeV2, TimestampTz). No paths missed.

8. Special Conditional Checks: DCHECK(IsStrict == params.is_strict) guards added to catch template/runtime parameter mismatches in debug builds — appropriate use of assertions.

9. Test Coverage: New data_type_serde_datelike_batch_test.cpp (454 lines) covers from_string, from_float, from_decimal batch operations for all types including mixed valid/invalid rows. The mixed-batch tests (datev1_from_decimal_batch_mixed, datetimev1_from_decimal_batch_mixed) serve as regression tests for the IS_STRICT bug fix. Existing test files updated to use new API.

10. Observability: N/A for this refactoring.

11. Performance: Zero runtime cost — enum-to-bool conversions are constexpr. The static_cast<void>(to_scale) in CastToDateOrDatetime::from_float is a no-op to suppress unused parameter warning after interface unification.

12. Other Observations:

  • CastToTimestampTz properly extracted into its own file with friend access to CastToDatetimeV2's private _internal methods
  • The _read_column_from_arrow change correctly hardcodes DATE_TIME since VecDateTimeValue() defaults to TIME_DATETIME, matching old runtime dispatch behavior
  • read_date_text_impl / read_datetime_text_impl correctly use DATE / DATE_TIME respectively, matching the old from_date_str runtime type check
  • New Int64 overloads in io_helper.cpp properly use binary_cast round-trip
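A binary_cast round-trip reinterprets a value's bytes as a same-sized integer and back without changing them. The sketch below assumes a hypothetical `binary_cast_sketch` built on `std::memcpy` (C++20 code could use `std::bit_cast`) and a toy 8-byte date-time layout; Doris's own `binary_cast` and its value layouts are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical bit-preserving cast between same-size trivially copyable types.
template <typename From, typename To>
To binary_cast_sketch(const From& from) {
    static_assert(sizeof(From) == sizeof(To), "sizes must match");
    static_assert(std::is_trivially_copyable_v<From> && std::is_trivially_copyable_v<To>);
    To to;
    std::memcpy(&to, &from, sizeof(To));  // copies bytes; no value conversion
    return to;
}

// Toy 8-byte date-time value with no padding; not the real Doris layout.
struct ToyDateTime {
    uint16_t year;
    uint8_t month, day, hour, minute, second;
    uint8_t pad = 0;
};
static_assert(sizeof(ToyDateTime) == 8, "toy layout must fit in an int64_t");
```

Converting a value to `int64_t` and back with the same cast must reproduce every field, which is the property the io_helper.cpp overloads rely on.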

LGTM.
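The from_decimal_batch fix, the DCHECK(IsStrict == params.is_strict) guard, and the mixed valid/invalid batch tests fit together as follows. In this hedged sketch, plain `assert` stands in for `DCHECK`, and `CastParams`, `valid_yyyymmdd`, and the batch signature are hypothetical. In this model, STRICT stops at the first bad row while NON_STRICT turns bad rows into nulls and keeps going, so a STRICT instantiation serving a non-strict cast would fail the whole batch instead of producing nulls.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Enum and field names mirror the review text; the rest is invented.
enum class DatelikeParseMode { STRICT, NON_STRICT };

struct CastParams {  // hypothetical runtime parameters
    bool is_strict = false;
};

// Toy "decimal -> date" rule: accept YYYYMMDD-looking integers only.
inline bool valid_yyyymmdd(int64_t v) {
    int64_t m = v / 100 % 100, d = v % 100;
    return v >= 10000101 && v <= 99991231 && m >= 1 && m <= 12 && d >= 1 && d <= 31;
}

template <DatelikeParseMode Mode>
bool from_decimal_batch(const std::vector<int64_t>& in, const CastParams& params,
                        std::vector<std::optional<int64_t>>& out) {
    constexpr bool kIsStrict = Mode == DatelikeParseMode::STRICT;
    // Debug-only guard: the compile-time mode must agree with the runtime flag.
    assert(kIsStrict == params.is_strict);
    (void)params;  // otherwise unused when NDEBUG disables assert
    out.clear();
    for (int64_t v : in) {
        if (valid_yyyymmdd(v)) {
            out.emplace_back(v);
        } else if constexpr (kIsStrict) {
            return false;                    // strict: first bad row fails the batch
        } else {
            out.emplace_back(std::nullopt);  // non-strict: bad row becomes null
        }
    }
    return true;
}
```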

@zclllyybb
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26914 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a4a59600771c3f9a9e56c66531570af4078551ce, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run
q1	17612	4582	4290	4290
q2
q3	10641	795	530	530
q4	4679	352	247	247
q5	7554	1218	1036	1036
q6	176	178	148	148
q7	804	855	698	698
q8	9331	1496	1377	1377
q9	4935	4771	4772	4771
q10	6241	1915	1698	1698
q11	487	257	244	244
q12	694	596	478	478
q13	18060	2708	1953	1953
q14	234	235	212	212
q15
q16	739	739	703	703
q17	747	884	427	427
q18	6176	5579	5293	5293
q19	1106	986	649	649
q20	539	492	395	395
q21	4489	1858	1452	1452
q22	542	378	313	313
Total cold run time: 95786 ms
Total hot run time: 26914 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run
q1	4797	4696	4582	4582
q2
q3	3914	4356	3866	3866
q4	888	1175	794	794
q5	4144	4426	4392	4392
q6	183	177	140	140
q7	1816	1702	1507	1507
q8	2512	2750	2588	2588
q9	7751	7413	7528	7413
q10	3885	4021	3609	3609
q11	528	426	417	417
q12	506	616	470	470
q13	2449	2974	2111	2111
q14	276	331	282	282
q15
q16	713	754	709	709
q17	1156	1331	1314	1314
q18	7054	6797	6777	6777
q19	936	939	929	929
q20	2124	2227	2039	2039
q21	4097	3457	3346	3346
q22	501	422	391	391
Total cold run time: 50230 ms
Total hot run time: 47676 ms

@doris-robot

TPC-DS: Total hot run time: 169177 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a4a59600771c3f9a9e56c66531570af4078551ce, data reload: false

query	cold run	hot run 1	hot run 2	best hot run
query5	4327	664	526	526
query6	340	235	213	213
query7	4209	483	267	267
query8	354	250	226	226
query9	8722	2762	2759	2759
query10	518	406	351	351
query11	6998	5092	4882	4882
query12	191	128	128	128
query13	1279	466	354	354
query14	5783	3768	3419	3419
query14_1	2876	2785	2811	2785
query15	203	194	184	184
query16	991	472	480	472
query17	903	750	632	632
query18	2457	463	362	362
query19	220	221	185	185
query20	134	125	132	125
query21	215	137	114	114
query22	13214	14108	14848	14108
query23	16831	16353	15950	15950
query23_1	15943	15741	15662	15662
query24	7113	1614	1219	1219
query24_1	1237	1245	1222	1222
query25	564	467	406	406
query26	1247	258	147	147
query27	2792	484	293	293
query28	4471	1825	1835	1825
query29	849	559	473	473
query30	299	230	188	188
query31	1007	950	869	869
query32	84	76	70	70
query33	517	341	292	292
query34	910	882	531	531
query35	648	675	621	621
query36	1079	1164	930	930
query37	127	94	86	86
query38	2942	2933	2871	2871
query39	848	844	813	813
query39_1	798	784	789	784
query40	231	151	138	138
query41	63	60	60	60
query42	263	261	257	257
query43	249	245	224	224
query44	
query45	197	191	185	185
query46	889	1002	613	613
query47	2149	2143	2023	2023
query48	327	327	246	246
query49	641	462	387	387
query50	688	283	221	221
query51	4077	4071	3983	3983
query52	262	270	258	258
query53	287	340	289	289
query54	311	304	286	286
query55	92	86	85	85
query56	315	325	306	306
query57	1941	1790	1789	1789
query58	291	277	278	277
query59	2826	2965	2738	2738
query60	341	348	329	329
query61	157	149	156	149
query62	631	597	540	540
query63	309	285	273	273
query64	5135	1288	1044	1044
query65	
query66	1461	456	375	375
query67	24324	24370	24230	24230
query68	
query69	404	312	284	284
query70	989	1009	967	967
query71	356	307	330	307
query72	2855	2733	2519	2519
query73	532	547	311	311
query74	9646	9600	9421	9421
query75	2849	2777	2470	2470
query76	2280	1028	685	685
query77	366	389	319	319
query78	10899	11088	10477	10477
query79	1065	844	568	568
query80	794	629	567	567
query81	491	268	222	222
query82	1308	159	121	121
query83	340	269	252	252
query84	292	124	109	109
query85	902	499	452	452
query86	386	318	274	274
query87	3153	3117	3020	3020
query88	3584	2687	2674	2674
query89	426	367	346	346
query90	1927	184	177	177
query91	170	163	136	136
query92	75	78	71	71
query93	911	837	532	532
query94	463	312	296	296
query95	604	350	331	331
query96	658	520	225	225
query97	2470	2495	2388	2388
query98	245	227	215	215
query99	1002	1012	925	925
Total cold run time: 249594 ms
Total hot run time: 169177 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.86% (19887/37624)
Line Coverage 36.37% (185933/511241)
Region Coverage 32.60% (144007/441690)
Branch Coverage 33.81% (63014/186401)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.81% (26455/36841)
Line Coverage 54.66% (278578/509678)
Region Coverage 52.01% (231866/445816)
Branch Coverage 53.35% (99746/186965)


3 participants