Workflow [PR], commit [d5dfc4d] Summary: ❌
assert (
    "col_x2D1\tNullable(Date32)\t\t\t\t\t\n"
As I remember, the ClickHouse columns in this parquet file were named like col-1, weren't they?
Did you try to create with col-1,... names? Will that work?
Hm, this file is first read with the ordinary s3 table function, and it returned the column names as you see in the test:
ClickHouse/tests/integration/test_storage_delta/test.py
Lines 3394 to 3410 in a929851
So the parquet file has columns named col_x2D1, not col-1. Though I now checked your test and added there
DESCRIBE TABLE icebergS3(s3_conn, filename='field_ids_struct_test', SETTINGS iceberg_metadata_table_uuid = '149ecc15-7afc-4311-86b3-3a4c8d4ec08e');, and it indeed has the column names you mentioned. I guess that is because they are named this way in the Iceberg metadata. But this test uses only the parquet file.
Also, the point is that even though I used this parquet file to insert the data
ClickHouse/tests/integration/test_storage_delta/test.py
Lines 3412 to 3413 in a929851
the resulting parquet file will not have columns named the same way. Only the Delta Lake metadata will contain
col_x2D1, while the resulting parquet file's columns will have names of the form col-{random-uuid} (I can add a check for this in the test to show it explicitly). This is because of columnMapping.mode = name, which means the parquet file does not contain the actual column names but randomly generated ones instead.
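To make the name-mapping behavior above concrete, here is a minimal hypothetical sketch (not the actual Delta Lake metadata layout or ClickHouse code): with columnMapping.mode = name, the table metadata maps each logical column name, such as col_x2D1, to a randomly generated physical name that is what actually appears inside the parquet file.

```python
import uuid

def make_physical_name() -> str:
    # With columnMapping.mode = "name", physical parquet column names are
    # randomly generated, e.g. "col-<uuid>"; the logical name never appears
    # in the parquet file itself.
    return f"col-{uuid.uuid4()}"

# Simplified stand-in for the per-column mapping Delta stores in its metadata.
column_mapping = {
    "col_x2D1": {"physicalName": make_physical_name()},
}

def physical_name(logical: str) -> str:
    """Resolve a logical column name to the name used inside the parquet file."""
    return column_mapping[logical]["physicalName"]

name = physical_name("col_x2D1")
print(name.startswith("col-"))
```

This is why reading the parquet file directly shows col-{random-uuid} names, while only the Delta Lake metadata knows about col_x2D1.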
format_settings.parquet.allow_missing_columns = true;
}

static void checkTypesAndNestedTypesEqual(DataTypePtr type1, DataTypePtr type2, const std::string & column_name)
Should we also check complex types recursively?
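The recursion the reviewer is asking about can be sketched as follows. This is a hypothetical illustration, not ClickHouse's actual DataTypePtr API: types are modeled here as (name, children) pairs, and equality descends into nested elements, since a top-level name check alone would miss a mismatch buried inside a Tuple/Array/Nullable.

```python
# Hypothetical model: a type is a ("Name", [child_types...]) pair.
def types_equal(t1, t2) -> bool:
    """Compare two types recursively, including all nested element types."""
    name1, children1 = t1
    name2, children2 = t2
    if name1 != name2 or len(children1) != len(children2):
        return False
    # Recurse into nested types (Array/Tuple/Nullable elements, etc.).
    return all(types_equal(c1, c2) for c1, c2 in zip(children1, children2))

array_of_dates = ("Array", [("Nullable", [("Date32", [])])])

print(types_equal(array_of_dates, ("Array", [("Nullable", [("Date32", [])])])))  # True
print(types_equal(array_of_dates, ("Array", [("Nullable", [("Date", [])])])))    # False
```

The second comparison fails only because of the innermost element type, which is exactly the case a non-recursive check would accept.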
Is this the fix for #86204?
I was fixing a different issue, but most likely it also fixes your issue; I need to check.
Cherry pick #86064 to 25.8: DeltaLake: fix reading subcolumns with non-default column mapping mode
Backport #86064 to 25.8: DeltaLake: fix reading subcolumns with non-default column mapping mode
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix reading subcolumns with non-default column mapping mode in storage DeltaLake.
Documentation entry for user-facing changes