Joris Van den Bossche / @jorisvandenbossche: @dargueta Regarding the logical type, I think this is expected: INT96 is only a physical type in the parquet format, and there is no timestamp-like logical type that uses INT96 as physical type.
The usage of INT96 for timestamps only stems from a convention in some of the parquet implementations (I think Hive and Impala, but I'm not very familiar with them), and therefore arrow has the option to write them, for compatibility with those systems. But note that this type is actually deprecated in the parquet format.
Deepak Majeti / @majetideepak:
The comments above are correct! The INT96 type is deprecated and its statistics are disabled by default. The timestamp byte layout in INT96 is big endian and does not comply with the standard sort orders in the spec.
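To make the "convention" in the comments above concrete: as we understand it, an INT96 timestamp in the Impala/Hive style is not a single integer but two fields packed into 12 bytes, a Julian day number and the nanoseconds elapsed within that day. The sketch below shows only that arithmetic and deliberately ignores how the two fields are laid out in bytes. The helper names `to_int96_parts` and `from_int96_parts` are hypothetical and not part of pyarrow; the constant 2440588 is the Julian day number of the Unix epoch (1970-01-01).

```python
from datetime import datetime, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 1_000_000_000
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_int96_parts(ts: datetime) -> tuple[int, int]:
    """Hypothetical helper: split an aware datetime into (julian_day, nanos_of_day)."""
    delta = ts - _EPOCH
    # Total nanoseconds since the epoch (datetime only has microsecond precision).
    nanos = ((delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds) * 1_000
    # divmod floors, so pre-1970 values still get 0 <= nanos_of_day < NANOS_PER_DAY.
    days, nanos_of_day = divmod(nanos, NANOS_PER_DAY)
    return JULIAN_DAY_OF_UNIX_EPOCH + days, nanos_of_day


def from_int96_parts(julian_day: int, nanos_of_day: int) -> int:
    """Hypothetical helper: recombine the two fields into nanoseconds since the epoch."""
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


if __name__ == "__main__":
    # Noon on 2000-01-01 UTC: Julian day 2451545, 12 hours into the day.
    print(to_int96_parts(datetime(2000, 1, 1, 12, tzinfo=timezone.utc)))
```

Running it prints `(2451545, 43200000000000)`. Because the value is a (day, time-of-day) pair rather than one plain integer, a reader needs to know this convention to interpret it, which is one way to see why a writer would leave the min/max statistics off for this type.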
Run the following code:
Examining the `int64.parq` file, we see that the column metadata includes an object type of `TIMESTAMP_MICROS` and also gives some stats. All is well.

However, if we look at `int96.parq`, it appears that that metadata is lost: no object type, and no column stats. This is a bit confusing, since the metadata for the exact same data can look different depending on an unrelated flag being set or cleared.
Environment:
PyArrow: 0.12.1
Python: 2.7.15, 3.7.2
Pandas: 0.24.2
Reporter: Diego Argueta / @dargueta
Note: This issue was originally created as ARROW-4967. Please see the migration documentation for further details.