I have the following csv file (note that `col_a` contains a negative zero value): ...and process it via:
The output parquet file is then loaded into S3 and queried via AWS Athena (i.e. PrestoDB / Hive).
Any query that touches `col_a` fails with the following error:

As a sanity check, I transformed the csv file to parquet using an AWS Glue Spark Job and I was able to query the output parquet file successfully.
As such, it appears as though the pyarrow writer is producing an invalid parquet file when a column contains at least one instance of 0.0, at least one instance of -0.0, and no other values.
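For background (an editorial note, not part of the original report): negative zero compares equal to positive zero under IEEE 754 but carries a different bit pattern, which makes it a classic corner case for computed column statistics such as parquet min/max. The stdlib-only sketch below illustrates why a naive statistics computation over a column of only 0.0 and -0.0 is order-dependent; whether this is the exact mechanism behind the reported failure is an assumption:

```python
import math
import struct

# 0.0 and -0.0 compare equal under IEEE 754 comparison rules...
assert 0.0 == -0.0

# ...but their byte-level representations differ in the sign bit.
assert struct.pack("<d", 0.0) != struct.pack("<d", -0.0)

# Because the two compare equal, a naive min() keeps whichever value it
# sees first, so the sign of a computed "min" statistic depends on the
# order the values were scanned in:
sign_a = math.copysign(1.0, min([0.0, -0.0]))   # first element wins: +1.0
sign_b = math.copysign(1.0, min([-0.0, 0.0]))   # first element wins: -1.0
```

A writer that records such an order-dependent value in the file's metadata can emit statistics that a strict reader (such as the Presto engine behind Athena) refuses to accept.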
Reporter: Bob Briody
Assignee: Wes McKinney / @wesm
PRs and other links:
Note: This issue was originally created as ARROW-5562. Please see the migration documentation for further details.