[fix](be) Fix timestamptz group_array state serde #63827

Merged: zclllyybb merged 2 commits into apache:master from Mryange:fix-timestamptz-group_array on May 29, 2026

Mryange (Contributor) commented on May 28, 2026

What problem does this PR solve?

Problem Summary:

Fix collect_list/group_array on nested TIMESTAMPTZ values when complex aggregate state is serialized through JSON. This keeps the existing state format for compatibility, provides a UTC timezone during serde, and adds regression coverage for the nested group_array case.
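
The idea of providing a fixed UTC timezone during serde can be illustrated with a minimal Python sketch. This is not the Doris BE code (which is C++); the helper names `serialize_state` and `deserialize_state`, the JSON layout, and the `plus_eight` offset are assumptions made for illustration only. It shows that rendering nested instants in one fixed zone gives a stable text form and that the round trip preserves the instant.

```python
import json
from datetime import datetime, timedelta, timezone

def serialize_state(rows, tz=timezone.utc):
    # Render every nested instant in one fixed zone so the text form does not
    # depend on the zone attached to the input values (hypothetical helper).
    return json.dumps([[ts.astimezone(tz).isoformat() for ts in arr] for arr in rows])

def deserialize_state(payload):
    # Parse the offset-qualified text back into timezone-aware instants.
    return [[datetime.fromisoformat(text) for text in arr] for arr in json.loads(payload)]

# Fixed-offset stand-in for a non-UTC session timezone (assumption).
plus_eight = timezone(timedelta(hours=8))
rows = [[datetime(2024, 1, 1, 20, 0, 0, 123456, tzinfo=plus_eight)]]

payload = serialize_state(rows)
restored = deserialize_state(payload)
```

Here `payload` carries the value as `2024-01-01T12:00:00.123456+00:00`, and `restored` compares equal to `rows` because timezone-aware datetimes are compared by instant rather than by wall-clock text.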

Release note

None


Mryange (Contributor Author) commented on May 28, 2026

/review

github-actions Bot (Contributor) left a comment

I found one regression-test standards issue that should be fixed before merge.

Critical checkpoint conclusions:

  • Goal/test coverage: The code change addresses aggregate-state serialization for nested TIMESTAMPTZ and adds a regression case that exercises group_array over ARRAY<TIMESTAMPTZ(6)>.
  • Scope/focus: The actual PR diff is small and focused.
  • Concurrency: No concurrency-sensitive code path was changed.
  • Lifecycle: No static/global lifecycle issue was found.
  • Config items: No config item was added or changed.
  • Compatibility: Aggregate state serialization remains paired write/read logic; no persistent format change was identified in this patch.
  • Parallel paths: The change applies to the complex-type collect-list serialization path used by the added nested TIMESTAMPTZ test.
  • Conditional checks: No problematic new conditional check was found.
  • Test coverage/results: The new test covers the crash path, but it violates the regression-test convention by dropping the test table after execution.
  • Observability: Not applicable.
  • Transaction/persistence/data write: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: The added UTC timezone object is local and lightweight; no performance blocker found.

User focus: No additional user-provided review focus was present.

GROUP BY grp
"""

sql "DROP TABLE IF EXISTS tz_group_array_crash"

Please remove this cleanup. Doris regression-test standards require dropping tables before use, not after, so the table remains available for debugging when the case fails. This test already does DROP TABLE IF EXISTS tz_group_array_crash before creating it at the start of the added case.
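
The drop-before-use convention described in this comment can be sketched as follows. The real case is written in the Doris Groovy regression DSL; this Python version uses an in-memory SQLite database purely as a stand-in, and the column layout (`grp`, `v`) and the `run_case` helper are hypothetical.

```python
import sqlite3

def run_case(conn):
    # Drop any leftover table from a previous run *before* creating it.
    conn.execute("DROP TABLE IF EXISTS tz_group_array_crash")
    conn.execute("CREATE TABLE tz_group_array_crash (grp INTEGER, v TEXT)")
    conn.executemany(
        "INSERT INTO tz_group_array_crash VALUES (?, ?)",
        [(1, "a"), (1, "b")],
    )
    rows = conn.execute(
        "SELECT grp, COUNT(*) FROM tz_group_array_crash GROUP BY grp"
    ).fetchall()
    # Deliberately no trailing DROP: the table stays available for debugging.
    return rows

conn = sqlite3.connect(":memory:")
first = run_case(conn)
second = run_case(conn)  # Rerunning is safe because of the leading DROP.
leftover = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'tz_group_array_crash'"
).fetchone()
```

Because cleanup happens at the start, a rerun never collides with stale data, and after a failure the table is still there to inspect.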

Mryange (Contributor Author) commented on May 28, 2026

/review

github-actions Bot (Contributor) left a comment

Code review completed for PR #63827.

No new blocking issues found. The previous inline comment about dropping the regression table after the test is already addressed in the current head diff, so I did not duplicate it.

Critical checkpoint conclusions:

  • Goal and proof: The PR targets nested TIMESTAMPTZ values in collect_list/group_array aggregate state JSON serde by providing a deterministic UTC timezone. The added regression case covers ARRAY<TIMESTAMPTZ(6)> through group_array and validates the canonical UTC result.
  • Scope: The change is small and focused on the complex collect-list aggregate state serialization path plus one regression case.
  • Concurrency and lifecycle: No new shared state, locking, threads, or non-trivial lifecycle management are introduced. The timezone object is local to each serialization/deserialization call and is only used synchronously through FormatOptions.
  • Configuration and compatibility: No configuration items are added. The aggregate state binary framing is unchanged; only the text representation for nested TIMESTAMPTZ is made deterministic with UTC.
  • Parallel paths: Direct TIMESTAMPTZ collect uses the non-complex path, while nested TIMESTAMPTZ uses the modified complex serde path. The array_agg path shares this collect-list data implementation, so it benefits as well.
  • Error handling and data correctness: Existing Status checks in the serialize/deserialize path are preserved. Using UTC keeps TIMESTAMPTZ aggregate-state round trips independent of BE session/local timezone while preserving the instant.
  • Tests: Regression output was added for the new nested TIMESTAMPTZ group_array case. The query uses array_sort around group_array output, which makes the aggregate result deterministic enough for the expected output.
  • Observability, transactions, persistence, and memory: No new observability, transaction, persistence, or significant memory-accounting concerns apply to this small serde fix.
  • User focus: No additional user-provided review focus was specified.
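
The point about array_sort in the Tests checkpoint can be illustrated with a small Python sketch. It assumes, for illustration only, that partial aggregate states may be merged in any arrival order; `group_array` below is a hypothetical stand-in for the merge step, not the Doris implementation, and the values are made-up ISO strings.

```python
def group_array(partials):
    # Concatenate partial states in whatever order they arrive (hypothetical merge).
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

# Each element is itself an array, mirroring group_array over an ARRAY column.
partials = [
    [["2024-01-02T00:00:00+00:00"], ["2024-01-01T00:00:00+00:00"]],
    [["2024-01-03T00:00:00+00:00"]],
]

in_order = group_array(partials)
reversed_arrival = group_array(partials[::-1])

# Sorting the collected array removes the dependence on arrival order.
stable_a = sorted(in_order)
stable_b = sorted(reversed_arrival)
```

The two unsorted results differ, while the sorted ones are identical, which is why a regression query can compare against a fixed expected output once the collected array is sorted.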

Mryange (Contributor Author) commented on May 28, 2026

run buildall

hello-stephen (Contributor) commented

TPC-H: Total hot run time: 31663 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 513a1ee8a852f1d918906ebc3a1a994d96b0730d, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17707	4017	4076	4017
q2	
q3	10765	1455	814	814
q4	4690	478	351	351
q5	7582	2297	2106	2106
q6	235	178	137	137
q7	994	797	662	662
q8	9389	1671	1637	1637
q9	5113	4964	4954	4954
q10	6381	2172	1857	1857
q11	436	265	244	244
q12	637	431	299	299
q13	18100	3332	2771	2771
q14	266	263	237	237
q15	
q16	828	785	709	709
q17	968	1049	898	898
q18	7024	5725	5552	5552
q19	1515	1240	1055	1055
q20	672	441	309	309
q21	6119	2763	2740	2740
q22	464	405	314	314
Total cold run time: 99885 ms
Total hot run time: 31663 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5064	4922	4669	4669
q2	
q3	4980	5308	4595	4595
q4	2209	2200	1387	1387
q5	5002	4693	4636	4636
q6	251	196	137	137
q7	1931	1781	1553	1553
q8	2385	2121	2176	2121
q9	7996	7557	7448	7448
q10	4698	4668	4203	4203
q11	536	381	351	351
q12	721	729	523	523
q13	3048	3412	2804	2804
q14	278	279	250	250
q15	
q16	701	693	606	606
q17	1286	1247	1248	1247
q18	7305	6723	6691	6691
q19	1112	1105	1114	1105
q20	2227	2218	1960	1960
q21	5252	4584	4473	4473
q22	532	445	412	412
Total cold run time: 57514 ms
Total hot run time: 51171 ms

hello-stephen (Contributor) commented

TPC-DS: Total hot run time: 171963 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 513a1ee8a852f1d918906ebc3a1a994d96b0730d, data reload: false

query5	4315	669	505	505
query6	344	216	205	205
query7	4223	538	307	307
query8	336	239	225	225
query9	8821	4049	4019	4019
query10	437	348	307	307
query11	5807	2430	2238	2238
query12	177	126	122	122
query13	1282	628	449	449
query14	6255	5504	5180	5180
query14_1	4486	4485	4417	4417
query15	211	209	185	185
query16	1015	452	446	446
query17	1139	695	585	585
query18	2717	474	356	356
query19	224	198	165	165
query20	139	129	129	129
query21	213	137	117	117
query22	13731	13770	13387	13387
query23	17449	16505	16178	16178
query23_1	16375	16394	16525	16394
query24	7488	1773	1295	1295
query24_1	1323	1308	1329	1308
query25	539	474	420	420
query26	1321	315	175	175
query27	2688	569	349	349
query28	4407	1980	1966	1966
query29	971	623	528	528
query30	309	240	204	204
query31	1166	1090	968	968
query32	89	81	77	77
query33	544	365	309	309
query34	1199	1115	658	658
query35	791	796	715	715
query36	1408	1415	1261	1261
query37	159	109	96	96
query38	3239	3193	3116	3116
query39	953	907	899	899
query39_1	888	888	879	879
query40	241	156	132	132
query41	74	70	68	68
query42	113	110	113	110
query43	343	332	301	301
query44	
query45	216	209	201	201
query46	1083	1189	762	762
query47	2333	2356	2203	2203
query48	407	426	316	316
query49	664	517	400	400
query50	959	367	259	259
query51	4392	4287	4313	4287
query52	108	108	97	97
query53	272	288	207	207
query54	342	285	301	285
query55	99	95	88	88
query56	312	329	323	323
query57	1464	1420	1326	1326
query58	316	287	279	279
query59	1675	1702	1508	1508
query60	341	334	329	329
query61	186	181	183	181
query62	716	661	593	593
query63	246	205	213	205
query64	2415	796	635	635
query65	
query66	1663	476	351	351
query67	29904	29794	29644	29644
query68	
query69	475	351	317	317
query70	1018	1014	995	995
query71	312	292	264	264
query72	3189	2694	2418	2418
query73	881	731	444	444
query74	5165	4938	4831	4831
query75	2718	2624	2275	2275
query76	2316	1192	773	773
query77	406	409	344	344
query78	12489	12424	11782	11782
query79	1437	1041	741	741
query80	644	529	461	461
query81	463	283	241	241
query82	1395	156	121	121
query83	354	277	258	258
query84	255	144	111	111
query85	872	532	457	457
query86	404	328	300	300
query87	3436	3382	3254	3254
query88	3607	2716	2712	2712
query89	434	394	348	348
query90	1963	180	181	180
query91	178	170	139	139
query92	80	78	75	75
query93	1450	1423	861	861
query94	545	340	304	304
query95	690	482	354	354
query96	1024	784	360	360
query97	2759	2770	2623	2623
query98	233	228	233	228
query99	1183	1154	1027	1027
Total cold run time: 255453 ms
Total hot run time: 171963 ms

hello-stephen (Contributor) commented

BE Regression && UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉


Category Coverage
Function Coverage 73.85% (28101/38054)
Line Coverage 57.82% (305520/528422)
Region Coverage 54.92% (255497/465237)
Branch Coverage 56.43% (110386/195621)

@zclllyybb zclllyybb merged commit 8731600 into apache:master May 29, 2026
31 of 32 checks passed