[fix](be) Fix timestamptz group_array state serde#63827
Conversation
|
/review |
There was a problem hiding this comment.
I found one regression-test standards issue that should be fixed before merge.
Critical checkpoint conclusions:
- Goal/test coverage: The code change addresses aggregate-state serialization for nested TIMESTAMPTZ and adds a regression case that exercises
group_arrayoverARRAY<TIMESTAMPTZ(6)>. - Scope/focus: The actual PR diff is small and focused.
- Concurrency: No concurrency-sensitive code path was changed.
- Lifecycle: No static/global lifecycle issue was found.
- Config items: No config item was added or changed.
- Compatibility: Aggregate state serialization remains paired write/read logic; no persistent format change was identified in this patch.
- Parallel paths: The change applies to the complex-type collect-list serialization path used by the added nested TIMESTAMPTZ test.
- Conditional checks: No problematic new conditional check was found.
- Test coverage/results: The new test covers the crash path, but it violates the regression-test convention by dropping the test table after execution.
- Observability: Not applicable.
- Transaction/persistence/data write: Not applicable.
- FE/BE variable passing: Not applicable.
- Performance: The added UTC timezone object is local and lightweight; no performance blocker found.
User focus: No additional user-provided review focus was present.
| GROUP BY grp | ||
| """ | ||
|
|
||
| sql "DROP TABLE IF EXISTS tz_group_array_crash" |
There was a problem hiding this comment.
Please remove this cleanup. Doris regression-test standards require dropping tables before use, not after, so the table remains available for debugging when the case fails. This test already does DROP TABLE IF EXISTS tz_group_array_crash before creating it at the start of the added case.
|
/review |
There was a problem hiding this comment.
Code review completed for PR #63827.
No new blocking issues found. The previous inline comment about dropping the regression table after the test is already addressed in the current head diff, so I did not duplicate it.
Critical checkpoint conclusions:
- Goal and proof: The PR targets nested TIMESTAMPTZ values in collect_list/group_array aggregate state JSON serde by providing a deterministic UTC timezone. The added regression case covers ARRAY<TIMESTAMPTZ(6)> through group_array and validates the canonical UTC result.
- Scope: The change is small and focused on the complex collect-list aggregate state serialization path plus one regression case.
- Concurrency and lifecycle: No new shared state, locking, threads, or non-trivial lifecycle management are introduced. The timezone object is local to each serialization/deserialization call and is only used synchronously through FormatOptions.
- Configuration and compatibility: No configuration items are added. The aggregate state binary framing is unchanged; only the text representation for nested TIMESTAMPTZ is made deterministic with UTC.
- Parallel paths: Direct TIMESTAMPTZ collect uses the non-complex path, while nested TIMESTAMPTZ uses the modified complex serde path. The array_agg path shares this collect-list data implementation, so it benefits as well.
- Error handling and data correctness: Existing Status checks in the serialize/deserialize path are preserved. Using UTC keeps TIMESTAMPTZ aggregate-state round trips independent of BE session/local timezone while preserving the instant.
- Tests: Regression output was added for the new nested TIMESTAMPTZ group_array case. The query uses array_sort around group_array output, which makes the aggregate result deterministic enough for the expected output.
- Observability, transactions, persistence, and memory: No new observability, transaction, persistence, or significant memory-accounting concerns apply to this small serde fix.
- User focus: No additional user-provided review focus was specified.
|
run buildall |
TPC-H: Total hot run time: 31663 ms |
TPC-DS: Total hot run time: 171963 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Problem Summary:
Fix collect_list/group_array on nested TIMESTAMPTZ values when complex aggregate state is serialized through JSON. This keeps the existing state format for compatibility, provides a UTC timezone during serde, and adds regression coverage for the nested group_array case.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)