When initialising an array with only NaT values, the row-group statistics are corrupt: reading them either returns random values or raises integer-out-of-bounds exceptions.
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
buf = pa.BufferOutputStream()
pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
buf = io.BytesIO(buf.getvalue().to_pybytes())
parquet_file = pq.ParquetFile(buf)

# Asserting the behaviour is difficult since it is random and the state is
# ill defined. After a few iterations an exception is raised.
while True:
    parquet_file.metadata.row_group(0).column(0).statistics.max
Uwe Korn / @xhochy:
The problem here is that parquet_file.metadata.row_group(0).column(0).statistics.has_min_max is False, so .max should never be accessed. Instead of returning undefined data, we should raise an exception.
Reporter: Florian Jetter / @fjetter
Assignee: Uwe Korn / @xhochy
PRs and other links:
Note: This issue was originally created as ARROW-6339. Please see the migration documentation for further details.