DateTime64(3) turns into DateTime64(3, UTC) with native parquet reader v3 #87469

@Selfeer

Company or project name

No response

Describe what's wrong

I have a parquet file with:

physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=milliseconds, is_from_converted_type=false, force_set_converted_type=false)

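The inferred column type differs depending on whether input_format_parquet_use_native_reader_v3 is enabled. With the setting on, the column is reported with an explicit 'UTC' timezone even though the file declares isAdjustedToUTC=false.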

input_format_parquet_use_native_reader_v3=0

DESCRIBE TABLE file('standalone_timestamp.parquet', 'Parquet')
SETTINGS input_format_parquet_use_native_reader_v3 = 0
   ┌─name─┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ ts   │ Nullable(DateTime64(3)) │              │                    │         │                  │                │
   └──────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

input_format_parquet_use_native_reader_v3=1

DESCRIBE TABLE file('standalone_timestamp.parquet', 'Parquet')
SETTINGS input_format_parquet_use_native_reader_v3 = 1
   ┌─name─┬─type───────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ ts   │ Nullable(DateTime64(3, 'UTC')) │              │                    │         │                  │                │
   └──────┴────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
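To illustrate why the added 'UTC' matters, here is a minimal stdlib-only sketch of the two readings of the same INT64 millisecond value. It is not ClickHouse's decoding logic; the helper names `decode_local` and `decode_utc` are hypothetical, and the raw value is simply the first row of the repro file (2023-01-01 12:00:00) expressed as milliseconds since the epoch.

```python
import datetime

UTC = datetime.timezone.utc
EPOCH_NAIVE = datetime.datetime(1970, 1, 1)
EPOCH_UTC = datetime.datetime(1970, 1, 1, tzinfo=UTC)


def decode_local(ms):
    """Read the value as a timezone-less wall-clock time (isAdjustedToUTC=false)."""
    return EPOCH_NAIVE + datetime.timedelta(milliseconds=ms)


def decode_utc(ms):
    """Read the value as an instant on the UTC timeline (isAdjustedToUTC=true)."""
    return EPOCH_UTC + datetime.timedelta(milliseconds=ms)


raw_ms = 1672574400000  # 2023-01-01 12:00:00 as milliseconds since 1970-01-01

naive = decode_local(raw_ms)
aware = decode_utc(raw_ms)

# A hypothetical reader sitting in UTC+3 renders the two values differently.
plus3 = datetime.timezone(datetime.timedelta(hours=3))
shifted = aware.astimezone(plus3)

print(naive)    # 2023-01-01 12:00:00
print(shifted)  # 2023-01-01 15:00:00+03:00
```

The timezone-less value stays at 12:00 wherever it is displayed, while the UTC-tagged value shifts with the viewer's timezone, so the two inferred types are not interchangeable.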

Does it reproduce on the most recent release?

Yes

How to reproduce

ClickHouse version: 25.8.4.13
Settings: input_format_parquet_use_native_reader_v3=1

I could not produce the file using ClickHouse alone, but the following pyarrow script generates a repro file:

import pyarrow as pa
import pyarrow.parquet as pq
import datetime
import os

OUT_PATH = "test_decimal/standalone_timestamp.parquet"
os.makedirs(os.path.dirname(OUT_PATH), exist_ok=True)

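# Millisecond timestamp without a timezone; this is written as isAdjustedToUTC=false.
timestamp_type = pa.timestamp('ms', tz=None)

# Two naive datetimes plus a null, so the column is nullable.
values = [datetime.datetime(2023, 1, 1, 12, 0, 0), None, datetime.datetime(2023, 1, 2, 13, 0, 0)]
arr = pa.array(values, type=timestamp_type)
table = pa.Table.from_arrays([arr], schema=pa.schema([pa.field("ts", timestamp_type, nullable=True)]))

# Write uncompressed and without dictionary encoding to keep the file minimal.
pq.write_table(table, OUT_PATH, version="1.0", compression=None, use_dictionary=False)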

Expected behavior

No response

Error message and/or stacktrace

No response

Additional context

No response

Labels

comp-formats: Input/output formats (CSV/JSON/Parquet/ORC/Arrow/Protobuf/etc.).
potential bug: To be reviewed by developers and confirmed/rejected.
