I have the following csv file (note that `col_a` contains a negative zero value): ...and process it via:
The output parquet file is then loaded into S3 and queried via AWS Athena (i.e. PrestoDB / Hive).
Any query that touches `col_a` fails with the following error:

As a sanity check, I transformed the csv file to parquet using an AWS Glue Spark Job and I was able to query the output parquet file successfully.
As such, it appears as though the pyarrow writer is producing an invalid parquet file when a column contains at least one instance of 0.0, at least one instance of -0.0, and no other values.
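For background (an editorial note, not part of the original report): negative zero compares equal to positive zero under IEEE 754 but carries a different bit pattern, which makes it a classic corner case for computed column statistics such as parquet min/max. The stdlib-only sketch below illustrates why a naive statistics computation over a column of only 0.0 and -0.0 is order-dependent; whether this is the exact mechanism behind the reported failure is an assumption:

```python
import math
import struct

# 0.0 and -0.0 compare equal under IEEE 754 comparison rules...
assert 0.0 == -0.0

# ...but their byte-level representations differ in the sign bit.
assert struct.pack("<d", 0.0) != struct.pack("<d", -0.0)

# Because the two compare equal, a naive min() keeps whichever value it
# sees first, so the sign of a computed "min" statistic depends on the
# order the values were scanned in:
sign_a = math.copysign(1.0, min([0.0, -0.0]))   # first element wins: +1.0
sign_b = math.copysign(1.0, min([-0.0, 0.0]))   # first element wins: -1.0
```

A writer that records such an order-dependent value in the file's metadata can emit statistics that a strict reader (such as the Presto engine behind Athena) refuses to accept.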
Reporter: Bob Briody
Assignee: Wes McKinney / @wesm
PRs and other links:
Note: This issue was originally created as ARROW-5562. Please see the migration documentation for further details.