Writing a PyArrow table that contains nanosecond-resolution timestamps to Parquet with both coerce_timestamps and use_deprecated_int96_timestamps=True causes the Arrow library to segfault. The crash does not occur if the timestamp resolution is left uncoerced, or if 96-bit (INT96) timestamps are not requested.
To Reproduce:
```python
import datetime

import pyarrow
from pyarrow import parquet

schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('ns')),
])

data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        coerce_timestamps='us',  # 'ms' works too
                        use_deprecated_int96_timestamps=True)
```
See attached file for the crash report.
Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
Reporter: Diego Argueta / @dargueta
Assignee: Joshua Storck / @joshuastorck
Related issues:
Original Issue Attachments:
Externally tracked issue: #1498
Note: This issue was originally created as ARROW-2020. Please see the migration documentation for further details.