I've seen that pandas sparse Series were not supported in pyarrow because they were planned to be deprecated. In pandas 1.0.1 a stable version of the sparse array was released and, as far as I know, it is no longer planned to be deprecated. Are you planning to support sparse Series in future versions of pyarrow?
For example, when converting a pandas DataFrame with sparse columns to a pyarrow Table, we could densify the sparse arrays (since Arrow's columnar format has no support for sparse arrays), but I am not sure this is what users would expect.
Support for conversion to one of the sparse tensors in pyarrow could indeed be added.
Michael Novitsky: @jorisvandenbossche Hi Joris, we are dealing with data that is sparse by nature (it contains many NaNs), and we currently run into memory problems with a big DataFrame. We can't use scipy sparse matrices, since they only compress zeros and not NaNs, and we want the data to stay sparse through the whole flow: DataFrame -> pyarrow -> Plasma store.
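A small illustration of the NaN-versus-zero point, using made-up data: a pandas `SparseArray` with `fill_value=np.nan` stores only the non-NaN entries, whereas a zero-compressing scheme (the model scipy.sparse follows) stores every entry that is not equal to zero, and NaN is not equal to zero.

```python
import numpy as np
import pandas as pd

dense = np.array([np.nan, np.nan, 3.0, np.nan, 0.0, np.nan])

# NaN as the fill value: only the non-NaN entries are stored,
# including the explicit 0.0, which is a real value here.
nan_sparse = pd.arrays.SparseArray(dense, fill_value=np.nan)
print(nan_sparse.sp_values)  # [3. 0.]
print(nan_sparse.density)    # fraction of stored entries: 2 of 6

# Zero compression keeps every entry != 0; all four NaNs are kept and the 0.0 is dropped.
print(np.count_nonzero(dense))  # 5
```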
Regarding "Support for conversion to one of the sparse tensors in pyarrow could indeed be added": can you please point me to the part of the code where this conversion happens?
Joris Van den Bossche / @jorisvandenbossche:
With the current Arrow data types, we don't really have support for sparse data, so there is no direct way to support conversion from/to pandas sparse Series (except for converting to dense).
There has been some discussion in the past about extending the Arrow spec to sparse/compressed data (e.g. RLE), but no one has yet started on a full proposal.
Environment: Ubuntu 16/18
Reporter: Michael Novitsky
Note: This issue was originally created as ARROW-8679. Please see the migration documentation for further details.