[C++][Parquet] parquet writer does not handle negative zero correctly #22006

Closed
asfimport opened this issue Jun 11, 2019 · 2 comments

Comments

@asfimport

I have the following CSV file (note that col_a contains a negative zero value):

col_a,col_b
0.0,0.0
-0.0,0.0

...and process it via:

from pyarrow import csv, parquet
in_csv = 'in.csv'
table = csv.read_csv(in_csv)
parquet.write_to_dataset(table, root_path='./')

 

The output Parquet file is then loaded into S3 and queried via AWS Athena (i.e. PrestoDB / Hive).

Any query that touches col_a fails with the following error:

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split {{REDACTED}} (offset=0, length=593): low must be less than or equal to high

 

As a sanity check, I converted the CSV file to Parquet using an AWS Glue Spark job, and I was able to query the resulting Parquet file successfully.

This suggests that the pyarrow writer produces an invalid Parquet file when a column contains at least one instance of 0.0, at least one instance of -0.0, and no other values.
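
One possible explanation for the "low must be less than or equal to high" error, offered here as an assumption rather than something confirmed in this report, is that the column statistics end up with min = 0.0 and max = -0.0. The two zeros compare equal under `<`, but a reader that uses a total order (Java's `Double.compare`, for instance, places -0.0 before 0.0) would then see min > max. The sketch below reproduces that situation in plain Python. The helpers `first_min_last_max`, `total_order_key`, and `normalize_zero_stats` are hypothetical names for illustration, not pyarrow or Parquet APIs, and the scan is a stand-in for whatever the writer actually does.

```python
import math


def first_min_last_max(values):
    """Pick the first smallest and the last largest element using only `<`.

    Hypothetical stand-in for a statistics scan; not the actual writer code.
    """
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:          # strictly smaller: keeps the first smallest
            lo = v
        if not (v < hi):    # not smaller: keeps the last largest
            hi = v
    return lo, hi


def total_order_key(x):
    """Sort key that places -0.0 before 0.0 (NaN is not handled here)."""
    return (x, math.copysign(1.0, x))


def normalize_zero_stats(lo, hi):
    """Force a zero min to -0.0 and a zero max to +0.0."""
    if lo == 0.0:
        lo = -0.0
    if hi == 0.0:
        hi = 0.0
    return lo, hi


col_a = [0.0, -0.0]  # the col_a values from the CSV file
lo, hi = first_min_last_max(col_a)

# The zeros compare equal, so `<` alone cannot order them.
zeros_equal = (0.0 == -0.0)

# Under a total order the raw statistics are inverted (min > max).
inverted = total_order_key(lo) > total_order_key(hi)

# After normalization the pair is valid under both orderings.
fixed_lo, fixed_hi = normalize_zero_stats(lo, hi)
fixed_ok = total_order_key(fixed_lo) <= total_order_key(fixed_hi)

print(zeros_equal, inverted, fixed_ok)
```

Writing -0.0 for a zero minimum and +0.0 for a zero maximum matches the guidance in the Parquet format specification for floating-point statistics. Whether the eventual fix takes this approach is not stated in this thread.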

 

Reporter: Bob Briody
Assignee: Wes McKinney / @wesm

PRs and other links:

Note: This issue was originally created as ARROW-5562. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
Odd issue. I added it to 0.15.0 in case someone can take a look.

@asfimport

Ben Kietzman / @bkietz:
Issue resolved by pull request 5375
#5375

@asfimport asfimport added this to the 0.15.0 milestone Jan 11, 2023