Describe what's wrong
When writing Parquet from ClickHouse, using either FORMAT Parquet or insert into table function file('...', Parquet), where the output contains an array/list column, the resulting schema has a LIST field (correct), containing a single repeated field named list (correct), containing a single field named item (NOT correct; it must be named element).
Relevant excerpt from the Parquet specification:
<list-repetition> group <name> (LIST) {
  repeated group list {
    <element-repetition> <element-type> element;
  }
}
The outer-most level must be a group annotated with LIST that contains a single field named list. The repetition of this level must be either optional or required and determines whether the list is nullable.
The middle level, named list, must be a repeated group with a single field named element.
The element field encodes the list's element type and repetition. Element repetition must be required or optional.
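The three rules above can be sketched as a small checker. This is only an illustration: `Node`, `list_violations`, and `int_list` are hypothetical names, and the `Node` class is a simplified stand-in for a Parquet schema node, not any library's API. The repetition values used for the example column are assumptions, and the sketch checks only the rules quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a Parquet schema node."""
    name: str
    repetition: str                    # "required", "optional" or "repeated"
    annotation: Optional[str] = None   # logical type annotation, e.g. "LIST"
    children: List["Node"] = field(default_factory=list)  # empty for leaf fields


def list_violations(outer: Node) -> List[str]:
    """Describe how `outer` deviates from the three-level LIST layout."""
    problems: List[str] = []

    # Outer level: LIST-annotated group, optional or required, one field.
    if outer.annotation != "LIST":
        problems.append(f"{outer.name}: missing LIST annotation")
    if outer.repetition not in ("optional", "required"):
        problems.append(f"{outer.name}: repetition must be optional or required")
    if len(outer.children) != 1:
        problems.append(f"{outer.name}: must contain exactly one field")
        return problems

    # Middle level: repeated group named "list" with a single field.
    middle = outer.children[0]
    if middle.name != "list":
        problems.append(f"{outer.name}: middle group is named {middle.name!r}, expected 'list'")
    if middle.repetition != "repeated":
        problems.append(f"{outer.name}.{middle.name}: must be repeated")
    if len(middle.children) != 1:
        problems.append(f"{outer.name}.{middle.name}: must contain exactly one field")
        return problems

    # Inner level: field named "element", optional or required.
    element = middle.children[0]
    if element.name != "element":
        problems.append(
            f"{outer.name}.{middle.name}: field is named {element.name!r}, expected 'element'"
        )
    if element.repetition not in ("optional", "required"):
        problems.append(f"{outer.name}.{middle.name}.{element.name}: must be optional or required")
    return problems


def int_list(inner_name: str) -> Node:
    """Build an array_of_num column whose innermost field has the given name."""
    inner = Node(inner_name, "optional")  # repetition chosen arbitrarily (assumption)
    middle = Node("list", "repeated", children=[inner])
    return Node("array_of_num", "optional", "LIST", [middle])


print(list_violations(int_list("item")))     # reports the inner field name
print(list_violations(int_list("element")))  # reports nothing
```

With the layout described in this issue (inner field named item), the checker flags only the innermost name; the outer and middle levels pass.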
See also Arrow issue apache/arrow#29781, which mentions the use of item rather than element.
This behavior is consistent across all values of output_format_parquet_version (1.0, 2.4, 2.6, 2.latest).
Does it reproduce on recent release?
Yes.
Also with 23.5.1.570
How to reproduce
Create a Parquet file with an array/list
% clickhouse local -q "select 1 as num, 'foo' as word, [1, 2, 3] as array_of_num FORMAT Parquet" > bug.parquet
The excerpt and rules quoted above come from the Lists section of the Parquet documentation: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists
In the file produced by the command above, the nesting is array_of_num -> list -> item.
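One way to check this nesting programmatically is to look at the dotted leaf-column paths of the file. The sketch below is a heuristic, not a full schema validator: `nonconforming_list_paths` and `column_paths` are hypothetical helper names, and `column_paths` assumes that pyarrow is installed and that bug.parquet was produced as described in this issue. A struct field that happens to be named list would be misreported.

```python
from typing import Iterable, List


def nonconforming_list_paths(paths: Iterable[str]) -> List[str]:
    """Return dotted column paths where the field under a 'list' group is not 'element'."""
    bad = []
    for path in paths:
        parts = path.split(".")
        # Start at index 1: the top-level column name is never the middle group.
        for i in range(1, len(parts) - 1):
            if parts[i] == "list" and parts[i + 1] != "element":
                bad.append(path)
                break
    return bad


def column_paths(parquet_file: str) -> List[str]:
    """Read the leaf column paths of a Parquet file (assumes pyarrow is available)."""
    import pyarrow.parquet as pq  # imported lazily so the pure check works without it

    schema = pq.ParquetFile(parquet_file).schema
    return [schema.column(i).path for i in range(len(schema))]


# Paths matching the nesting reported in this issue.
print(nonconforming_list_paths(["num", "word", "array_of_num.list.item"]))
# -> ['array_of_num.list.item']
```

With pyarrow installed, `nonconforming_list_paths(column_paths("bug.parquet"))` would apply the same check to the reproduction file.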
Expected behavior
The field nested within list should be named element.
Additional context
Parquet files that deviate from the spec in this way are not correctly understood by Google BigQuery.