[Python] Canonical fixed-shape tensor extension array/type is not picklable. #35599

Closed
clarkzinzow opened this issue May 15, 2023 · 2 comments · Fixed by #35933

@clarkzinzow

clarkzinzow commented May 15, 2023

Describe the bug, including details regarding any error messages, version, and platform.

The fixed-shape tensor extension type does not appear to be picklable. Given that pickling Arrow data is supported in general and is used in Python-centric systems such as Ray, supporting pickling for canonical extension types/arrays seems reasonable.

Reproduction

Extension Type

import pickle
import pyarrow as pa

pickle.loads(pickle.dumps(pa.fixed_shape_tensor(pa.int64(), (2, 2))))

raises the error:

KeyError                                  Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()

KeyError: 'extension<arrow.fixed_shape_tensor>'

Extension Array

tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage = pa.array(arr, pa.list_(pa.int32(), 4))
tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
pickle.loads(pickle.dumps(tensor_array))

raises essentially the same error:

KeyError                                  Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()

KeyError: 'extension<arrow.fixed_shape_tensor>'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 pickle.loads(pickle.dumps(tensor_array))

File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4800, in pyarrow.lib.type_for_alias()

ValueError: No type alias for extension<arrow.fixed_shape_tensor>

Environment

  • pyarrow 12.0.0
  • Python 3.9
  • macOS

Possible Solution

It seems like we might be able to implement __reduce__ on FixedShapeTensorType such that it uses the __arrow_ext_serialize__ serialization protocol? E.g.

def __reduce__(self):
    return type(self).__arrow_ext_deserialize__, (self.storage_type, self.__arrow_ext_serialize__())

Component(s)

Python

@AlenkaF
Member

AlenkaF commented May 16, 2023

Thank you for the issue @clarkzinzow!
You are correct. I think it is reasonable to implement a __reduce__ method as you suggested.

Are you interested in making a PR with the proposed solution and a test for it?

@jorisvandenbossche
Member

It seems like we might be able to implement __reduce__ on FixedShapeTensorType such that it uses the __arrow_ext_serialize__ serialization protocol?

We might even be able to put this on the base ExtensionType class, so that every extension type implementation automatically has this implemented (since this should be generic).
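To illustrate the idea, here is a minimal pure-Python sketch of a generic reducer on a base class. It does not use pyarrow: the `ExtensionType` and `FixedShapeTensorType` classes below are hypothetical stand-ins, and the tuple storage type and JSON payload are assumptions made only for this example. What it shows is that a single `__reduce__` on the base class is enough for any subclass that implements the serialize/deserialize pair.

```python
import json
import math
import pickle


class ExtensionType:
    """Stand-in base class (hypothetical, not pyarrow.ExtensionType)."""

    def __init__(self, storage_type):
        self.storage_type = storage_type

    def __arrow_ext_serialize__(self):
        raise NotImplementedError

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        raise NotImplementedError

    def __reduce__(self):
        # Generic reducer: rebuild through the subclass's own deserializer.
        return (
            type(self).__arrow_ext_deserialize__,
            (self.storage_type, self.__arrow_ext_serialize__()),
        )

    def __eq__(self, other):
        return (
            type(self) is type(other)
            and self.storage_type == other.storage_type
            and self.__arrow_ext_serialize__() == other.__arrow_ext_serialize__()
        )


class FixedShapeTensorType(ExtensionType):
    """Stand-in subclass; it defines no __reduce__ of its own."""

    def __init__(self, value_type, shape):
        self.shape = tuple(shape)
        # Assumed storage stand-in: (value type name, flattened list size).
        super().__init__((value_type, math.prod(self.shape)))

    def __arrow_ext_serialize__(self):
        # Assumed payload format for this sketch: the shape as JSON bytes.
        return json.dumps({"shape": list(self.shape)}).encode()

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        value_type, _list_size = storage_type
        return cls(value_type, json.loads(serialized)["shape"])


tensor_type = FixedShapeTensorType("int64", (2, 2))
restored = pickle.loads(pickle.dumps(tensor_type))
```

In this sketch `restored` compares equal to `tensor_type`, because pickle stores a reference to the subclass's deserializer together with the storage type and the serialized bytes, then calls it on load.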

The current error occurs because the extension type falls back to the reducer of the base DataType class, which essentially pickles the string repr of the type (many type subclasses override this reducer):

def __reduce__(self):
    return type_for_alias, (str(self),)

@jorisvandenbossche jorisvandenbossche added this to the 13.0.0 milestone May 16, 2023
@AlenkaF AlenkaF self-assigned this Jun 6, 2023
moeyensj added a commit to B612-Asteroid-Institute/adam_core that referenced this issue Jun 6, 2023
jorisvandenbossche pushed a commit that referenced this issue Jun 13, 2023
…is not picklable. (#35933)

This PR adds `__reduce__` method to the `FixedShapeTensorType`.
* Closes: #35599

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>