[Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps #18001

@asfimport

Description

If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet with the `coerce_timestamps` option and `use_deprecated_int96_timestamps=True`, the Arrow library segfaults.

The crash doesn't happen if you don't coerce the timestamp resolution or if you don't use 96-bit timestamps.

To Reproduce:

```python
import datetime

import pyarrow
from pyarrow import parquet

schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('ns')),
])

data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        coerce_timestamps='us',  # 'ms' triggers the crash as well
                        use_deprecated_int96_timestamps=True)
```
See attached file for the crash report.

 

Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
Reporter: Diego Argueta / @dargueta
Assignee: Joshua Storck / @joshuastorck

Externally tracked issue: #1498

Note: This issue was originally created as ARROW-2020. Please see the migration documentation for further details.
