[fix](variant) preserve TIMESTAMPTZ values in sparse path#63522
csun5285 wants to merge 1 commit into
DataTypeTimeStampTzSerDe inherited DataTypeNumberSerDe's default write_one_cell_to_binary, which emits [type:1][value:8]. The matching reader branch in DataTypeNumberSerDe<TYPE_TIMESTAMPTZ>::deserialize_binary_to_* skips a scale byte before reading the value, expecting [type:1][scale:1][value:8]. The 1-byte layout mismatch shifted every read by one byte, leaving only the timezone-offset bits intact, so CAST(var['ts'] AS string) on a variant typed path that fell to sparse returned just "+08:00" (DORIS-25915).

Add the missing write_one_cell_to_binary override mirroring DataTypeDateTimeV2SerDe so the writer also emits the scale byte. Reader is already correct.

Tests:
- regression-test/suites/variant_p0/test_variant_timestamptz_sparse.groovy reproduces the Jira repro (typed paths > variant_max_subcolumns_count with variant_enable_typed_paths_to_sparse=true) and asserts the read value contains the date portion.
- BE UT data_type_serde_timestamptz_test.cpp adds binary_roundtrip covering scale=0/3/6, checking the 10-byte layout and roundtrip via both DataTypeSerDe::deserialize_binary_to_column and ::deserialize_binary_to_field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
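The layout mismatch described above can be illustrated with a minimal standalone sketch. This is not the Doris code: the function names, the `kTimestampTzTag` value, and the use of `std::vector<uint8_t>` as the cell buffer are all assumptions made for illustration, and the sketch adds a bounds check so that it stays memory-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical type tag; the real tag value used by Doris is not shown in this PR.
constexpr uint8_t kTimestampTzTag = 0x2A;

// Old (inherited) writer layout: [type:1][value:8], 9 bytes in total.
std::vector<uint8_t> write_without_scale(uint64_t value) {
    std::vector<uint8_t> out(1 + sizeof(value));
    out[0] = kTimestampTzTag;
    std::memcpy(out.data() + 1, &value, sizeof(value));
    return out;
}

// Fixed writer layout: [type:1][scale:1][value:8], 10 bytes in total.
std::vector<uint8_t> write_with_scale(uint64_t value, uint8_t scale) {
    std::vector<uint8_t> out(2 + sizeof(value));
    out[0] = kTimestampTzTag;
    out[1] = scale;
    std::memcpy(out.data() + 2, &value, sizeof(value));
    return out;
}

// Reader that expects [type:1][scale:1][value:8].
// Returns false when the cell is too short for that layout.
bool read_expecting_scale(const std::vector<uint8_t>& cell,
                          uint8_t* scale, uint64_t* value) {
    if (cell.size() < 2 + sizeof(uint64_t)) {
        return false;  // a 9-byte cell cannot hold a scale byte plus 8 value bytes
    }
    *scale = cell[1];
    std::memcpy(value, cell.data() + 2, sizeof(uint64_t));
    return true;
}
```

With these helpers, a cell produced by `write_with_scale` round-trips through `read_expecting_scale`, while a cell produced by `write_without_scale` is one byte short of what the reader expects. That one-byte difference is the offset behind the shifted reads.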
run buildall
TPC-H: Total hot run time: 31308 ms
TPC-DS: Total hot run time: 169416 ms
/review
I found a blocking compatibility issue in the sparse VARIANT TIMESTAMPTZ serialization fix. The new writer emits the missing scale byte for future rows, but existing persisted sparse values written by the old code still have the 9-byte layout and the read path does not have a compatibility branch, so those rowsets can be misread or fail the existing size check after upgrade.
Critical checkpoints:
- Goal/test: the PR addresses new sparse TIMESTAMPTZ writes and adds BE/regression coverage for the new 10-byte layout, but does not prove old persisted sparse data remains readable.
- Scope/focus: the production change is small and mostly focused; the runtime-filter rename is mechanical.
- Concurrency/lifecycle: no new concurrency or special lifecycle risks identified.
- Configuration: no new configuration items.
- Compatibility/storage format: blocking issue found. This changes a persisted sparse-column value layout without reader compatibility for values already written by the old layout.
- Parallel paths: datetimev2 already uses the scale byte; TIMESTAMPTZ writer is brought in line, but legacy TIMESTAMPTZ bytes still need handling.
- Tests: new tests cover the fixed layout and new writes; missing legacy-read coverage. The regression test also violates Doris regression test standards by using def tableName, manual assertTrue, and dropping the table at the end.
- Observability/transactions/data writes/FE-BE variables/performance: no additional issues found.
- User focus: no additional user-provided review focus was specified.
    const auto sc = static_cast<uint8_t>(_scale);
    ...
    const size_t old_size = chars.size();
    const size_t new_size = old_size + sizeof(uint8_t) + sizeof(uint8_t) + data_ref.size;
This fixes the encoding for newly written sparse TIMESTAMPTZ values, but it also changes a persisted sparse-column value layout from the old [type][8-byte value] bytes to [type][scale][8-byte value] without any reader compatibility for rowsets already written by the old code. The variant sparse path stores these bytes in ColumnString (ColumnVariant::serialize_to_binary_column / SparseColumnMergeIterator::_serialize_nullable_column_to_sparse), and ColumnVariant::deserialize_from_binary_column later calls DataTypeSerDe::deserialize_binary_to_field followed by CHECK_EQ(end - start_data, data_ref.size). For an existing 9-byte TIMESTAMPTZ value, the current reader consumes a scale byte plus 8 value bytes (10 bytes total), so after this change old persisted sparse values can read past the StringRef and/or trip the size check during upgrade. Please add a compatibility read path for the legacy 9-byte TIMESTAMPTZ encoding, and add a test that deserializes those old bytes.
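One possible shape for the requested compatibility read path is sketched below. It is a standalone illustration under stated assumptions, not a patch against Doris: it assumes the reader knows the cell's total length (as the caller does through data_ref.size), that the legacy and current encodings can be told apart purely by that length, and that the scale for legacy cells can be supplied from elsewhere, for example the column's declared scale. `DecodedTimestampTz`, `decode_timestamptz_cell`, and `fallback_scale` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Result of decoding one sparse TIMESTAMPTZ cell (hypothetical struct).
struct DecodedTimestampTz {
    uint8_t scale;
    uint64_t value;
    bool legacy;  // true when the cell used the old 9-byte layout
};

constexpr size_t kLegacySize = 1 + sizeof(uint64_t);       // [type][value]
constexpr size_t kCurrentSize = 1 + 1 + sizeof(uint64_t);  // [type][scale][value]

// Decodes a cell of known length. Legacy cells carry no scale byte, so the
// caller passes fallback_scale (an assumption: e.g. the column's declared scale).
// Returns false for any length that matches neither layout.
bool decode_timestamptz_cell(const uint8_t* data, size_t size,
                             uint8_t fallback_scale, DecodedTimestampTz* out) {
    if (size == kCurrentSize) {
        out->scale = data[1];
        std::memcpy(&out->value, data + 2, sizeof(out->value));
        out->legacy = false;
        return true;
    }
    if (size == kLegacySize) {
        out->scale = fallback_scale;
        std::memcpy(&out->value, data + 1, sizeof(out->value));
        out->legacy = true;
        return true;
    }
    return false;  // unknown layout: never read past the provided buffer
}
```

A legacy-read test along the lines requested would build a 9-byte buffer by hand, pass it through the decoder, and check that the 8 value bytes come back unchanged and that exactly 9 bytes are accounted for, so the subsequent size check in the caller still holds.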
    suite("test_variant_timestamptz_sparse", "p0") {
        sql " set time_zone = '+08:00' "
        ...
        def tableName = "test_variant_timestamptz_sparse_repro"
This new regression test does not follow the Doris regression-test rules in AGENTS.md: simple table names should be hardcoded instead of def tableName, deterministic checks should use qt_*/.out instead of manual assertTrue, and tests should not drop tables at the end (drop before use only, to preserve the environment for debugging). Please rewrite this to hardcode test_variant_timestamptz_sparse_repro, express the checked values through ordered qt_* output, and remove the final drop.
need to forbid new type creation in schema template in FE if the new type is not fully adapted
Add the missing write_one_cell_to_binary override mirroring DataTypeDateTimeV2SerDe so the writer also emits the scale byte. Reader is already correct.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)