[Python] date64 arrays do not round-trip through pandas conversion #38050

@spenczar

Describe the bug, including details regarding any error messages, version, and platform.

In PyArrow, Date64Array values do not maintain precision when being loaded from pandas by pa.array.

For example, let's make a date64 array value, and convert it to a pandas Series, taking care to avoid using datetime objects:

import pyarrow as pa
import pandas as pd
date64_array = pa.array([1, 2, 3], pa.date64())
date64_pd = date64_array.to_pandas(date_as_object=False)

# Now load it back in:
date64_roundtripped = pa.array(date64_pd, pa.date64())

# It ought to be unchanged, but it's not; this assertion fails:
assert date64_roundtripped == date64_array

If one prints pc.subtract(date64_roundtripped, date64_array), with pyarrow.compute imported as pc, you can see that they are different:

<pyarrow.lib.DurationArray object at 0x10537f160>
[
  -1,
  -2,
  -3
]

Note that this does not occur for date32:

import pyarrow as pa
import pandas as pd
date32_array = pa.array([1, 2, 3], pa.date32())
date32_pd = date32_array.to_pandas(date_as_object=False)

# Now load it back in (from the pandas Series, not the pandas module):
date32_roundtripped = pa.array(date32_pd, pa.date32())

# just fine:
assert date32_roundtripped == date32_array
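
One way to see why date32 would be unaffected: a date32 value is a count of whole days since the UNIX epoch, so it has no sub-day component that a day-granular conversion could discard. The sketch below models that with plain integers rather than PyArrow itself. The helper names are made up for illustration, and it assumes the round trip passes through nanosecond timestamps, as the datetime64[ns] dtype suggests.

```python
NS_PER_DAY = 86_400 * 1_000_000_000  # nanoseconds in one day


def date32_to_ns(days):
    """Hypothetical helper: whole days since epoch -> nanoseconds since epoch."""
    return days * NS_PER_DAY


def ns_to_date32(ns):
    """Hypothetical helper: nanoseconds since epoch -> whole days (floored)."""
    return ns // NS_PER_DAY


date32_values = [1, 2, 3]
roundtripped = [ns_to_date32(date32_to_ns(d)) for d in date32_values]

# Flooring to whole days cannot change a value that is already a whole day.
print(roundtripped == date32_values)
```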

It appears to me that date64_pd is just fine. It prints as this:

0   1970-01-01 00:00:00.001
1   1970-01-01 00:00:00.002
2   1970-01-01 00:00:00.003
dtype: datetime64[ns]
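
Those timestamps are what the raw values 1, 2, 3 should decode to if date64 is read as milliseconds since the UNIX epoch. A quick check of that reading using only the standard library (no PyArrow or pandas involved; `ms_to_datetime` is a name invented for this sketch):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ms_to_datetime(ms):
    """Interpret an integer as milliseconds since the UNIX epoch."""
    return EPOCH + timedelta(milliseconds=ms)


decoded = [ms_to_datetime(ms) for ms in [1, 2, 3]]
for dt in decoded:
    # Millisecond precision, space-separated like the pandas output
    print(dt.isoformat(sep=" ", timespec="milliseconds"))
```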

One hint at what's going on is to use pa.Array.from_pandas. That actually returns a TimestampArray:

In [31]: pa.Array.from_pandas(date64_pd)
Out[31]:
<pyarrow.lib.TimestampArray object at 0x12d434d60>
[
  1970-01-01 00:00:00.001000000,
  1970-01-01 00:00:00.002000000,
  1970-01-01 00:00:00.003000000
]

The issue might be that conversion from TimestampArray to Date64 array drops precision, maybe.
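
That guess is at least consistent with the numbers above. As a rough model, assuming (not confirmed here) that the timestamp-to-date64 step floors each value to a whole day, the raw values 1, 2, 3 ms would all become 0, and subtracting the originals gives exactly the -1, -2, -3 shown earlier. The function name below is hypothetical, and this is plain integer arithmetic, not PyArrow's actual code path.

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def floor_to_day_ms(ms):
    """Hypothetical model: drop the sub-day part of a millisecond timestamp."""
    return (ms // MS_PER_DAY) * MS_PER_DAY


original = [1, 2, 3]  # raw date64 values from the example
roundtripped = [floor_to_day_ms(ms) for ms in original]
diff = [r - o for r, o in zip(roundtripped, original)]

print(roundtripped)  # [0, 0, 0]
print(diff)          # [-1, -2, -3]
```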

Component(s)

Python
