Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33801: [Python] Further expose C++ Extension Types in Python #34469

Closed

Conversation

sjperkins
Copy link
Contributor

@sjperkins sjperkins commented Mar 6, 2023

Rationale for this change

Explained in greater detail in #33997, but my summary follows:

  • C++ Extension Types currently are not well-exposed in the Python Layer
  • In particular, generic ExtensionArray and ExtensionScalar types are associated with C++ Extensions exposed through BaseExtensionType, rather than hand-coded Python types (which would be laborious to generate).

What changes are included in this PR?

  • When BaseExtensionType.__arrow_ext_class__() and BaseExtensionType.__arrow_ext_scalar_class__() are called, associated Python classes are created on the fly and cached for future calls.
  • BaseExtensionType.__arrow_ext_serialize__() and BaseExtensionType.__arrow_ext_deserialize__() defer to CExtensionType.Serialize() and CExtensionType.Deserialize() respectively.

Are these changes tested?

  • __arrow_ext_class__() and __arrow_ext_scalar_class__() are tested
  • __arrow_ext_serialize__() and __arrow_ext_deserialize__() are not tested yet. I'd appreciate feedback on whether the general approach is useful going forward before adding tests.

Are there any user-facing changes?

  • I don't believe so.

@sjperkins sjperkins requested a review from AlenkaF as a code owner March 6, 2023 14:41
@github-actions
Copy link

github-actions bot commented Mar 6, 2023

array = mod._make_uuid_array()
assert array.type == uuid_type
assert isinstance(array, array_cls)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prior to this change, only isinstance(array, pyarrow.ExtensionArray) would have held.

array_cls = uuid_type.__arrow_ext_class__()
scalar_cls = uuid_type.__arrow_ext_scalar_class__()
assert array_cls.__name__ == "ExtensionArray(uuid)"
assert scalar_cls.__name__ == "ExtensionScalar(uuid)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auto-generated class naming schema is similar to that of BaseExtensionType. i.e. BaseExtensionType(extension<uuid>).


storage = pa.array([b'0onmlkjihgfedcba']*4, pa.binary(16))
ext_array = pa.ExtensionArray.from_storage(uuid_type, storage)
assert isinstance(ext_array, array_cls)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From memory, before to this change, pa.ExtensionArray.from_storage would have returned a pyarrow.ExtensionArray.

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
@sjperkins
Copy link
Contributor Author

Would it be possible to re-open this issue? I'm awaiting review from Arrow maintainers here.

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Yes of course, the cleaning was done exactly to surface the prs that still needed attention

@amol- amol- reopened this Mar 30, 2023
@sjperkins
Copy link
Contributor Author

Closing for reasons discussed in #34483 (comment)

@sjperkins sjperkins closed this Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] Custom Python type/array subclasses for ExtensionTypes implemented in C++
2 participants