Description
Describe the bug, including details regarding any error messages, version, and platform.
In PyArrow, Date64Array values do not maintain precision when loaded from pandas via pa.array.
For example, let's make a Date64Array and convert it to a pandas Series, taking care to avoid datetime objects:
import pyarrow as pa
import pandas as pd
date64_array = pa.array([1, 2, 3], pa.date64())
date64_pd = date64_array.to_pandas(date_as_object=False)
# Now load it back in:
date64_roundtripped = pa.array(date64_pd, pa.date64())
# It ought to be unchanged - but it's not, this assertion fails:
assert date64_roundtripped == date64_array
If one prints pc.subtract(date64_roundtripped, date64_array) (where pc is pyarrow.compute), you can see that they are different:
<pyarrow.lib.DurationArray object at 0x10537f160>
[
-1,
-2,
-3
]
Note that this does not occur for date32:
import pyarrow as pa
import pandas as pd
date32_array = pa.array([1, 2, 3], pa.date32())
date32_pd = date32_array.to_pandas(date_as_object=False)
date32_roundtripped = pa.array(date32_pd, pa.date32())
# just fine:
assert date32_roundtripped == date32_array
It appears to me that date64_pd is just fine. It prints like this:
0 1970-01-01 00:00:00.001
1 1970-01-01 00:00:00.002
2 1970-01-01 00:00:00.003
dtype: datetime64[ns]
One hint at what's going on comes from pa.Array.from_pandas, which actually returns a TimestampArray:
In [31]: pa.Array.from_pandas(date64_pd)
Out[31]:
<pyarrow.lib.TimestampArray object at 0x12d434d60>
[
1970-01-01 00:00:00.001000000,
1970-01-01 00:00:00.002000000,
1970-01-01 00:00:00.003000000
]
The issue might be that the conversion from TimestampArray to Date64Array drops precision.
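The differences printed earlier (-1, -2, -3) imply that every round-tripped value is 0. One possibility, which is only an assumption and not something confirmed in this report, is that the conversion floors each millisecond value to a whole day. The plain-Python sketch below shows that such flooring would reproduce the observed differences; floor_to_day_ms is a hypothetical helper written for illustration, not a PyArrow function.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def floor_to_day_ms(ms_since_epoch: int) -> int:
    """Hypothetical helper: drop the sub-day part of a millisecond value."""
    return ms_since_epoch - ms_since_epoch % MS_PER_DAY


# The raw millisecond values used to build date64_array in the report.
original = [1, 2, 3]

# ASSUMPTION: the lossy conversion behaves like flooring to a day boundary.
roundtripped = [floor_to_day_ms(v) for v in original]

# Mirrors pc.subtract(date64_roundtripped, date64_array) from the report.
differences = [r - o for r, o in zip(roundtripped, original)]

print(roundtripped)  # [0, 0, 0]
print(differences)   # [-1, -2, -3]
```

Other kinds of truncation would also map 1, 2, and 3 ms to 0, so this only shows the hypothesis is consistent with the output, not that it is the cause.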
Component(s)
Python