[fix](variant) preserve TIMESTAMPTZ values in sparse path by csun5285 · Pull Request #63522 · apache/doris

csun5285 · 2026-05-22T06:09:48Z

Add the missing write_one_cell_to_binary override mirroring DataTypeDateTimeV2SerDe so the writer also emits the scale byte. Reader is already correct.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

DataTypeTimeStampTzSerDe inherited DataTypeNumberSerDe's default write_one_cell_to_binary, which emits [type:1][value:8]. The matching reader branch in DataTypeNumberSerDe<TYPE_TIMESTAMPTZ>::deserialize_binary_to_* skips a scale byte before reading the value, because it expects [type:1][scale:1][value:8]. This 1-byte layout mismatch shifted every read by one byte and left only the timezone-offset bits intact. As a result, CAST(var['ts'] AS string) on a variant typed path that was stored in the sparse column returned just "+08:00" (DORIS-25915).

Add the missing write_one_cell_to_binary override, mirroring DataTypeDateTimeV2SerDe, so the writer also emits the scale byte. The reader is already correct.

Tests:
- regression-test/suites/variant_p0/test_variant_timestamptz_sparse.groovy reproduces the Jira repro (more typed paths than variant_max_subcolumns_count, with variant_enable_typed_paths_to_sparse=true) and asserts that the read value contains the date portion.
- BE UT data_type_serde_timestamptz_test.cpp adds binary_roundtrip covering scale=0/3/6, checking the 10-byte layout and the roundtrip through both DataTypeSerDe::deserialize_binary_to_column and ::deserialize_binary_to_field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
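A minimal standalone sketch of the two cell layouts may make the mismatch easier to see. It is not Doris code: the tag value 0x2A, the helpers write_cell_legacy, write_cell_with_scale and read_cell_with_scale, and the plain uint64_t payload are hypothetical stand-ins. It shows that a reader expecting [type:1][scale:1][value:8] needs 10 bytes, so a 9-byte cell from the old writer is one byte short. This sketch rejects such a cell; a reader that does not check bounds would instead start the value one byte late and run one byte past the cell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical tag byte; the real Doris type id is not given in this PR.
constexpr uint8_t kTimestampTzTag = 0x2A;
constexpr size_t kValueSize = sizeof(uint64_t);

// Old (buggy) writer: [type:1][value:8] = 9 bytes, no scale byte.
void write_cell_legacy(std::string& chars, uint64_t value) {
    const size_t old_size = chars.size();
    chars.resize(old_size + 1 + kValueSize);
    char* p = &chars[old_size];
    p[0] = static_cast<char>(kTimestampTzTag);
    std::memcpy(p + 1, &value, kValueSize);
}

// Fixed writer: [type:1][scale:1][value:8] = 10 bytes, the shape the PR says
// DataTypeDateTimeV2SerDe already writes.
void write_cell_with_scale(std::string& chars, uint64_t value, uint8_t scale) {
    const size_t old_size = chars.size();
    chars.resize(old_size + 1 + 1 + kValueSize);
    char* p = &chars[old_size];
    p[0] = static_cast<char>(kTimestampTzTag);
    p[1] = static_cast<char>(scale);
    std::memcpy(p + 2, &value, kValueSize);
}

struct DecodedCell {
    uint8_t scale;
    uint64_t value;
    size_t consumed;  // bytes read from the cell
};

// Reader for the 10-byte layout. This sketch checks bounds first, so a
// 9-byte legacy cell is rejected rather than read one byte past its end.
DecodedCell read_cell_with_scale(const char* data, size_t size) {
    const size_t needed = 1 + 1 + kValueSize;
    if (size < needed || static_cast<uint8_t>(data[0]) != kTimestampTzTag) {
        throw std::runtime_error("not a [type][scale][value] TIMESTAMPTZ cell");
    }
    DecodedCell cell{};
    cell.scale = static_cast<uint8_t>(data[1]);
    std::memcpy(&cell.value, data + 2, kValueSize);
    cell.consumed = needed;
    return cell;
}
```

Round-tripping scales 0, 3 and 6 through write_cell_with_scale and read_cell_with_scale corresponds to what the new binary_roundtrip unit test is described as covering.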

csun5285 · 2026-05-22T06:09:57Z

run buildall

hello-stephen · 2026-05-22T06:10:19Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

- What problem was fixed (ideally with the specific error message), and how it was fixed.
- Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impact it might have.
- What features were added, and why they were added.
- Which code was refactored, and why.
- Which functions were optimized, and what the difference is before and after the optimization.

hello-stephen · 2026-05-22T07:41:40Z

TPC-H: Total hot run time: 31308 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 807ea93d508aa0cc3842a48ffe84893a4a26fab5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold_run	hot_run_1	hot_run_2	best_hot_run
q1	17658	3864	3868	3864
q2
q3	10787	1393	810	810
q4	4690	486	351	351
q5	7610	2292	2110	2110
q6	382	179	141	141
q7	932	787	646	646
q8	9574	1696	1656	1656
q9	7040	4912	4951	4912
q10	6446	2098	1814	1814
q11	432	268	243	243
q12	638	439	306	306
q13	18123	3487	2741	2741
q14	257	252	229	229
q15
q16	814	781	707	707
q17	1008	896	924	896
q18	6897	5686	5463	5463
q19	1174	1240	1241	1240
q20	553	424	291	291
q21	5898	2748	2565	2565
q22	454	374	323	323
Total cold run time: 101367 ms
Total hot run time: 31308 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold_run	hot_run_1	hot_run_2	best_hot_run
q1	4601	4526	4664	4526
q2
q3	4827	5117	4682	4682
q4	2186	2204	1451	1451
q5	4877	4662	4647	4647
q6	227	184	132	132
q7	1884	1730	1510	1510
q8	2296	1915	1915	1915
q9	7259	7309	7174	7174
q10	4499	4405	4008	4008
q11	532	383	357	357
q12	712	732	514	514
q13	2995	3351	2832	2832
q14	284	288	248	248
q15
q16	673	697	604	604
q17	1311	1252	1252	1252
q18	7404	6942	6870	6870
q19	1121	1113	1107	1107
q20	2224	2207	1920	1920
q21	5374	4647	4520	4520
q22	526	475	434	434
Total cold run time: 55812 ms
Total hot run time: 50703 ms
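The bot output does not label its columns. Reading them as cold run, first hot run, second hot run, and best hot run is an inference, but it is consistent with the reported totals: the sketch below re-adds the Round 1 numbers from the table and reproduces both 101367 ms and 31308 ms. The struct and function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One TPC-H row from Round 1: cold run and the two hot runs, in ms.
struct QueryRun {
    int64_t cold;
    int64_t hot1;
    int64_t hot2;
};

// Round 1 rows for q1, q3-q14, q16-q22 (q2 and q15 report no timings).
const std::vector<QueryRun> kRound1 = {
    {17658, 3864, 3868}, {10787, 1393, 810},  {4690, 486, 351},
    {7610, 2292, 2110},  {382, 179, 141},     {932, 787, 646},
    {9574, 1696, 1656},  {7040, 4912, 4951},  {6446, 2098, 1814},
    {432, 268, 243},     {638, 439, 306},     {18123, 3487, 2741},
    {257, 252, 229},     {814, 781, 707},     {1008, 896, 924},
    {6897, 5686, 5463},  {1174, 1240, 1241},  {553, 424, 291},
    {5898, 2748, 2565},  {454, 374, 323},
};

// Sum of the first (cold) run of every query.
int64_t total_cold(const std::vector<QueryRun>& runs) {
    int64_t sum = 0;
    for (const auto& r : runs) sum += r.cold;
    return sum;
}

// Sum of the faster of the two hot runs; the cold run is not considered
// (q19 is the row where the cold run, 1174 ms, beats both hot runs).
int64_t total_best_hot(const std::vector<QueryRun>& runs) {
    int64_t sum = 0;
    for (const auto& r : runs) sum += std::min(r.hot1, r.hot2);
    return sum;
}
```

The same rule also reproduces the Round 2 totals (55812 ms cold, 50703 ms hot).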

hello-stephen · 2026-05-22T07:52:32Z

TPC-DS: Total hot run time: 169416 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 807ea93d508aa0cc3842a48ffe84893a4a26fab5, data reload: false

query	cold_run	hot_run_1	hot_run_2	best_hot_run
query5	4333	656	515	515
query6	328	217	201	201
query7	4226	589	295	295
query8	328	234	221	221
query9	8853	3964	3975	3964
query10	451	345	300	300
query11	5793	2381	2181	2181
query12	181	126	122	122
query13	1277	598	416	416
query14	5968	5360	5028	5028
query14_1	4342	4340	4314	4314
query15	216	206	185	185
query16	1041	459	428	428
query17	935	739	590	590
query18	2437	494	374	374
query19	233	215	169	169
query20	136	132	132	132
query21	219	139	121	121
query22	13627	13558	13335	13335
query23	17281	16301	16029	16029
query23_1	16115	16277	16234	16234
query24	7552	1789	1308	1308
query24_1	1308	1307	1322	1307
query25	582	505	443	443
query26	1318	314	172	172
query27	2743	556	341	341
query28	4540	1958	1954	1954
query29	991	640	522	522
query30	307	244	204	204
query31	1124	1062	933	933
query32	90	78	76	76
query33	562	362	310	310
query34	1209	1171	659	659
query35	760	784	675	675
query36	1357	1338	1271	1271
query37	156	113	91	91
query38	3206	3148	3037	3037
query39	941	921	898	898
query39_1	885	882	868	868
query40	233	148	129	129
query41	74	70	67	67
query42	114	121	114	114
query43	326	323	287	287
query44	
query45	214	204	200	200
query46	1062	1220	734	734
query47	2315	2377	2226	2226
query48	412	402	314	314
query49	651	530	376	376
query50	977	338	250	250
query51	4379	4328	4199	4199
query52	102	103	93	93
query53	252	272	204	204
query54	318	276	263	263
query55	91	86	86	86
query56	286	334	295	295
query57	1439	1407	1330	1330
query58	288	270	267	267
query59	1558	1672	1412	1412
query60	320	311	318	311
query61	165	161	164	161
query62	666	632	551	551
query63	253	198	204	198
query64	2415	800	640	640
query65	
query66	1718	487	355	355
query67	30076	29951	29843	29843
query68	
query69	472	342	308	308
query70	1032	999	977	977
query71	301	271	269	269
query72	3041	2780	2432	2432
query73	819	763	419	419
query74	5061	4949	4732	4732
query75	2678	2598	2264	2264
query76	2307	1115	757	757
query77	397	423	341	341
query78	12298	12068	11637	11637
query79	1434	1083	729	729
query80	668	550	457	457
query81	458	275	242	242
query82	1367	156	122	122
query83	355	274	247	247
query84	312	140	113	113
query85	900	551	446	446
query86	401	334	305	305
query87	3413	3428	3221	3221
query88	3520	2648	2657	2648
query89	433	387	334	334
query90	1987	187	182	182
query91	178	172	141	141
query92	77	73	72	72
query93	1495	1429	818	818
query94	557	357	305	305
query95	662	472	357	357
query96	1047	795	333	333
query97	2703	2693	2572	2572
query98	231	225	231	225
query99	1137	1091	986	986
Total cold run time: 253396 ms
Total hot run time: 169416 ms

hello-stephen · 2026-05-22T08:36:32Z

BE UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.66% (20756/38678)
Line Coverage	37.26% (196639/527778)
Region Coverage	33.59% (154157/458908)
Branch Coverage	34.58% (67123/194094)

hello-stephen · 2026-05-22T10:35:31Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.75% (27940/37884)
Line Coverage	57.64% (303425/526432)
Region Coverage	54.72% (253524/463337)
Branch Coverage	56.31% (109709/194821)

eldenmoon · 2026-05-25T02:48:22Z

/review

github-actions

I found a blocking compatibility issue in the sparse VARIANT TIMESTAMPTZ serialization fix. The new writer emits the missing scale byte for future rows, but existing persisted sparse values written by the old code still have the 9-byte layout and the read path does not have a compatibility branch, so those rowsets can be misread or fail the existing size check after upgrade.

Critical checkpoints:

Goal/test: the PR addresses new sparse TIMESTAMPTZ writes and adds BE/regression coverage for the new 10-byte layout, but does not prove old persisted sparse data remains readable.
Scope/focus: the production change is small and mostly focused; the runtime-filter rename is mechanical.
Concurrency/lifecycle: no new concurrency or special lifecycle risks identified.
Configuration: no new configuration items.
Compatibility/storage format: blocking issue found. This changes a persisted sparse-column value layout without reader compatibility for values already written by the old layout.
Parallel paths: datetimev2 already uses the scale byte; TIMESTAMPTZ writer is brought in line, but legacy TIMESTAMPTZ bytes still need handling.
Tests: new tests cover the fixed layout and new writes; missing legacy-read coverage. The regression test also violates Doris regression test standards by using def tableName, manual assertTrue, and dropping the table at the end.
Observability/transactions/data writes/FE-BE variables/performance: no additional issues found.
User focus: no additional user-provided review focus was specified.

github-actions · 2026-05-25T02:54:00Z

+    const auto sc = static_cast<uint8_t>(_scale);
+
+    const size_t old_size = chars.size();
+    const size_t new_size = old_size + sizeof(uint8_t) + sizeof(uint8_t) + data_ref.size;


This fixes the encoding for newly written sparse TIMESTAMPTZ values, but it also changes a persisted sparse-column value layout from the old [type][8-byte value] bytes to [type][scale][8-byte value] without any reader compatibility for rowsets already written by the old code. The variant sparse path stores these bytes in ColumnString (ColumnVariant::serialize_to_binary_column / SparseColumnMergeIterator::_serialize_nullable_column_to_sparse), and ColumnVariant::deserialize_from_binary_column later calls DataTypeSerDe::deserialize_binary_to_field followed by CHECK_EQ(end - start_data, data_ref.size). For an existing 9-byte TIMESTAMPTZ value, the current reader consumes a scale byte plus 8 value bytes (10 bytes total), so after this change old persisted sparse values can read past the StringRef and/or trip the size check during upgrade. Please add a compatibility read path for the legacy 9-byte TIMESTAMPTZ encoding, and add a test that deserializes those old bytes.
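One possible shape for the requested compatibility path is to dispatch on the stored cell length, since the review notes that the caller already knows data_ref.size and checks it. The sketch below is a hypothetical standalone function, not Doris code: decode_timestamptz_cell, the 0x2A tag, and the fallback_scale parameter are assumptions. It also assumes that 9 and 10 bytes are the only lengths a TIMESTAMPTZ sparse cell can have; if that does not hold, a length check alone cannot tell the two layouts apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical tag byte standing in for the real TIMESTAMPTZ type id.
constexpr uint8_t kTimestampTzTag = 0x2A;
constexpr size_t kLegacyCellSize = 1 + 8;       // [type][value:8], old writer
constexpr size_t kCurrentCellSize = 1 + 1 + 8;  // [type][scale][value:8], fixed writer

struct TimestampTzCell {
    uint8_t scale;
    uint64_t value;
    size_t consumed;  // bytes consumed from the cell
};

// Decodes one sparse cell whose exact length (cell_size) is known, accepting
// both layouts. Legacy cells carry no scale, so the caller must supply one
// via the hypothetical fallback_scale parameter (e.g. from the schema).
TimestampTzCell decode_timestamptz_cell(const char* data, size_t cell_size,
                                        uint8_t fallback_scale) {
    if (cell_size == 0 || static_cast<uint8_t>(data[0]) != kTimestampTzTag) {
        throw std::runtime_error("not a TIMESTAMPTZ cell");
    }
    TimestampTzCell cell{};
    if (cell_size == kCurrentCellSize) {
        cell.scale = static_cast<uint8_t>(data[1]);
        std::memcpy(&cell.value, data + 2, sizeof(cell.value));
    } else if (cell_size == kLegacyCellSize) {
        cell.scale = fallback_scale;
        std::memcpy(&cell.value, data + 1, sizeof(cell.value));
    } else {
        throw std::runtime_error("unexpected TIMESTAMPTZ cell size");
    }
    // Always equals the stored length, so a consumed-vs-size check passes.
    cell.consumed = cell_size;
    return cell;
}
```

Because consumed matches the stored length in both branches, a check of the form CHECK_EQ(end - start_data, data_ref.size) would hold for old and new cells alike. A legacy-read test could feed the function hand-built 9-byte buffers.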

github-actions · 2026-05-25T02:54:00Z

+suite("test_variant_timestamptz_sparse", "p0"){
+    sql " set time_zone = '+08:00' "
+
+    def tableName = "test_variant_timestamptz_sparse_repro"


This new regression test does not follow the Doris regression-test rules in AGENTS.md: simple table names should be hardcoded instead of def tableName, deterministic checks should use qt_*/.out instead of manual assertTrue, and tests should not drop tables at the end (drop before use only, to preserve the environment for debugging). Please rewrite this to hardcode test_variant_timestamptz_sparse_repro, express the checked values through ordered qt_* output, and remove the final drop.

eldenmoon · 2026-05-25T03:07:36Z

Need to forbid creating the new type in FE schema templates if it is not yet fully adapted.

github-actions Bot requested changes May 25, 2026


eldenmoon added the dev/4.1.x label May 25, 2026

csun5285 wants to merge 1 commit into apache:master from csun5285:fix/DORIS-25915-variant-timestamptz-sparse

csun5285 commented May 22, 2026 •

edited

Loading

Uh oh!

csun5285 commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

eldenmoon commented May 25, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 25, 2026

Uh oh!

github-actions Bot May 25, 2026

Uh oh!

eldenmoon commented May 25, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

csun5285 commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

csun5285 commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

Uh oh!

hello-stephen commented May 22, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented May 22, 2026

BE Regression && UT Coverage Report

Uh oh!

eldenmoon commented May 25, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

eldenmoon commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

csun5285 commented May 22, 2026 •

edited

Loading

eldenmoon commented May 25, 2026 •

edited

Loading