[Python] Timestamp unit change not done in from_pandas() conversion #17688
Comments
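Bryan Cutler / @BryanCutler:
import pandas as pd
import pyarrow as pa
from datetime import datetime

s = pd.Series([datetime.now()])  # pandas stores these timestamps with the 'ns' unit

# tz-aware case: the requested 'us' unit is ignored (as reported)
s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
print(arr.type)

# naive case: the 'us' unit is applied, but print(arr) raises an OverflowError (as reported)
arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
print(arr.type)
print(arr)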
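The OverflowError from the final print(arr) is consistent with Arrow keeping the raw nanosecond integers and only relabelling them as microseconds. That mechanism is an assumption, not something established in this thread. The stdlib-only sketch below just works through the arithmetic with a fixed date in place of datetime.now(); the helper names `ns_since_epoch` and `reinterpret_ns_as_us` are hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def ns_since_epoch(t: datetime.datetime) -> int:
    """Nanoseconds since the Unix epoch for a naive datetime (pandas' 'ns' unit)."""
    micros = (t - EPOCH) // datetime.timedelta(microseconds=1)
    return micros * 1000


def reinterpret_ns_as_us(ns: int) -> datetime.datetime:
    """Hypothetical helper: treat a nanosecond count as if it were microseconds.

    This mimics the suspected bug: the value is relabelled but never divided by 1000.
    """
    return EPOCH + datetime.timedelta(microseconds=ns)


t = datetime.datetime(2017, 10, 23, 12, 0, 0)
ns = ns_since_epoch(t)
print(ns)  # 1508760000000000000

# Correct conversion: divide by 1000 before treating the value as microseconds.
print(EPOCH + datetime.timedelta(microseconds=ns // 1000))  # 2017-10-23 12:00:00

# Suspected buggy conversion: ~1.5e18 microseconds is roughly 47,800 years
# after 1970, far beyond datetime.MAXYEAR (9999), so Python cannot build it.
try:
    reinterpret_ns_as_us(ns)
except OverflowError as exc:
    print("OverflowError:", exc)
```

Dividing by 1000 first lands back on the original timestamp, while skipping that step pushes the value tens of thousands of years past datetime.MAXYEAR.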
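Bryan Cutler / @BryanCutler:
import pandas as pd
import pyarrow as pa
import datetime

# Round-trip a date through pandas (pa.Column comes from the pyarrow 0.7 dev build used here)
arr = pa.array([datetime.date(2017, 10, 23)])
c = pa.Column.from_array("d", arr)
s = c.to_pandas()
print(s)
# 0   2017-10-23
# Name: d, dtype: datetime64[ns]

# Coerce the datetime64[ns] Series back to date32; printing the result fails
result = pa.Array.from_pandas(s, type=pa.date32())
print(result)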
"""
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 28, in array_format
values.append(value_format(x, 0))
File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 49, in value_format
return repr(x)
File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
ValueError: year is out of range
"""

This is a little more troublesome because I can't find a decent workaround. Should I open another JIRA for this?
When calling Array.from_pandas with a pandas.Series of timestamps in the 'ns' unit and specifying a target type with the 'us' unit, the unit coercion does not work correctly. When the Series has timestamps with a timezone, the requested unit is ignored. When the Series has no timezone, the unit is applied, but printing the resulting array raises an OverflowError. A workaround is to change the values manually with astype before the conversion.
Reporter: Bryan Cutler / @BryanCutler
Assignee: Wes McKinney / @wesm
Related issues:
Note: This issue was originally created as ARROW-1680. Please see the migration documentation for further details.