Closed
Description
From: https://stackoverflow.com/questions/53214288/merging-parquet-files-pandas-meta-in-schema-mismatch
I am trying to merge multiple Parquet files into one. Their schemas are identical field-wise, but my ParquetWriter complains that they are not. After some investigation, I found that the pandas metadata in the schemas differs, which causes this error.
Sample:
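import pyarrow.parquet as pq
# files, MESS_DIR and COMPRESSED_FILE are defined elsewhere in the script
pq_tables = []
writer = None
for file_ in files:
    pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
    pq_tables.append(pq_table)
    # Create the writer lazily, using the schema of the first file
    if writer is None:
        writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema, use_deprecated_int96_timestamps=True)
    # Raises for files whose pandas metadata differs from the first file's
    writer.write_table(table=pq_table)
if writer is not None:
    writer.close()

The error: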
Traceback (most recent call last):
  File "{PATH_TO}/main.py", line 68, in lambda_handler
    writer.write_table(table=pq_table)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:

Environment: Python 3.6.3, OSX 10.14
Reporter: Micah Williamson
Assignee: Krisztian Szucs / @kszucs
Related issues:
- [Python] ParquetWriter.write_table doesn't support coerce_timestamps or allow_truncated_timestamps (is duplicated by)
Note: This issue was originally created as ARROW-3728. Please see the migration documentation for further details.