[Python] Table.to_pandas converts Arrow date32[day] to pandas datetime64[ns] #20514

Closed
asfimport opened this issue Nov 29, 2018 · 4 comments
@asfimport

asfimport commented Nov 29, 2018

This issue was raised here:

wesm/feather#359

I explored this minimally against Arrow master:

https://gist.github.com/wesm/2ebe0ca2461d1ecfba6185777238ad1f

While converting to Python datetime.date objects is pretty memory-wasteful, it might be better to preserve the intent of the data type when converting to pandas data structures. It also allows the data to round trip successfully.
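To make the round-trip point concrete, here is a minimal standard-library sketch of what a date32[day] value represents: a count of days since 1970-01-01. This is not pyarrow's implementation, and the helper names days_to_date and date_to_days are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# date32[day] stores whole days elapsed since the UNIX epoch.
EPOCH = date(1970, 1, 1)


def days_to_date(days):
    """Map a day count to datetime.date, keeping nulls as None (hypothetical helper)."""
    return None if days is None else EPOCH + timedelta(days=days)


def date_to_days(value):
    """Inverse mapping: datetime.date back to a day count (hypothetical helper)."""
    return None if value is None else (value - EPOCH).days


stored = [11323, None, 11324]
as_dates = [days_to_date(d) for d in stored]
round_tripped = [date_to_days(v) for v in as_dates]
```

Because the values stay calendar dates instead of becoming nanosecond timestamps, converting back yields the original day counts and the date type is not lost along the way.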

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-3899. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
We can minimize memory use by passing the date32[day] values through a hash table, so that repeated dates share a single Python object. This should be easier with the new hashing machinery.
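The hash-table idea can be sketched in plain Python, assuming the goal is to create one datetime.date object per distinct day value and reuse it for every repeat. The function name convert_with_cache is hypothetical, and a dict stands in for Arrow's C++ hashing machinery, which this sketch does not use.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def convert_with_cache(day_values):
    """Convert day counts to datetime.date, sharing one object per distinct value.

    Hypothetical helper: a dict plays the role of the hash table.
    """
    cache = {}
    out = []
    for days in day_values:
        if days is None:
            out.append(None)  # nulls pass through unchanged
            continue
        obj = cache.get(days)
        if obj is None:
            obj = EPOCH + timedelta(days=days)
            cache[days] = obj
        out.append(obj)
    return out


values = [11323, 11324, 11323, None, 11323]
converted = convert_with_cache(values)
distinct_objects = {id(v) for v in converted if v is not None}
```

With this approach the number of allocated date objects scales with the number of distinct dates rather than the length of the column, which helps most when a column contains many repeated dates.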

@asfimport

Uwe Korn / @xhochy:
We currently have the date_as_object parameter on the conversion to pandas, and it defaults to False. Although I would like True to be the default, this would be a heavy breaking change. We should add a DeprecationWarning in the next release announcing that the default will change, and then change it a release later.

@asfimport

Wes McKinney / @wesm:
Oh right. Sounds reasonable. This is a bit tedious, as we'll only want to warn when there is actually a date field.
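One way the conditional warning could look is sketched below. This is an illustration only, not the code that went into pyarrow: the helper maybe_warn_date_default is hypothetical, field types are represented as plain strings such as "date32[day]", and a None sentinel is assumed as the way to detect that the caller did not pass date_as_object explicitly.

```python
import warnings


def maybe_warn_date_default(field_types, date_as_object=None):
    """Warn only when a date field is present and no explicit choice was made.

    Hypothetical helper: field_types is a list of type names as strings,
    and date_as_object=None means "caller relied on the default".
    Returns True if a warning was issued.
    """
    has_date = any(t.startswith("date") for t in field_types)
    if has_date and date_as_object is None:
        warnings.warn(
            "The default for date_as_object will change in a future release; "
            "pass it explicitly to keep the current behavior.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False


# Record warnings so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_no_dates = maybe_warn_date_default(["int64", "string"])
    warned_with_date = maybe_warn_date_default(["int64", "date32[day]"])
    warned_explicit = maybe_warn_date_default(["date32[day]"], date_as_object=False)
```

Checking the schema first keeps users whose tables contain no date columns from seeing a warning that does not apply to them.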

@asfimport

Wes McKinney / @wesm:
This was resolved in 0.12. On master now:

In [4]: arr = pa.array([date(2001, 1, 1), None, date(2001, 1, 2)])

In [5]: arr
Out[5]:
<pyarrow.lib.Date32Array object at 0x7f033ceb7ae8>
[
  11323,
  null,
  11324
]

In [6]: arr.to_pandas()
Out[6]:
array([datetime.date(2001, 1, 1), None, datetime.date(2001, 1, 2)],
      dtype=object)
