[Python] Feather format cannot store/retrieve lists correctly? #25005

Closed
asfimport opened this issue May 19, 2020 · 2 comments

asfimport commented May 19, 2020

I'm seeing very strange behavior when I store a Pandas DataFrame in the Feather format and then read it back. A simplified example:

>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
   scalar  array
0       1    [1]
1       2    [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
   scalar                  array
0       1                   [16]
1       2  [1045468844972122628]

As you can see, the retrieved data is incorrect: the list values come back corrupted. I originally tried the feather-format package directly (without going through Pandas), and that didn't work either.

By varying the DataFrame being stored, I can also get different but still incorrect behavior, e.g. a list that comes back larger than the one stored, an error saying the file size is incorrect, or simply a segmentation fault.

This is my first time using Feather/Arrow BTW.

Environment: Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
Reporter: Farzad Abdolhosseini

Related issues:

Note: This issue was originally created as ARROW-8868. Please see the migration documentation for further details.

Joris Van den Bossche / @jorisvandenbossche:
[~farzadab] thanks for opening the issue! That's clearly buggy, and I suppose it is the same issue as the one I opened for structs (it's also a nested data type): ARROW-8860

A temporary workaround is to disable compression, as I do in the linked issue (compression="uncompressed"). The only problem is that df.to_feather passes this keyword through only in pandas master, which is not yet released.

Joris Van den Bossche / @jorisvandenbossche:
I'm going to close this as a duplicate of ARROW-8860; we can continue the discussion there.
