-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33801: [Python] Further expose C++ Extension Types in Python #34469
GH-33801: [Python] Further expose C++ Extension Types in Python #34469
Conversation
array = mod._make_uuid_array() | ||
assert array.type == uuid_type | ||
assert isinstance(array, array_cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prior to this change, only isinstance(array, pyarrow.ExtensionArray)
would have held.
array_cls = uuid_type.__arrow_ext_class__() | ||
scalar_cls = uuid_type.__arrow_ext_scalar_class__() | ||
assert array_cls.__name__ == "ExtensionArray(uuid)" | ||
assert scalar_cls.__name__ == "ExtensionScalar(uuid)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The auto-generated class naming schema is similar to that of BaseExtensionType. i.e. BaseExtensionType(extension<uuid>)
.
|
||
storage = pa.array([b'0onmlkjihgfedcba']*4, pa.binary(16)) | ||
ext_array = pa.ExtensionArray.from_storage(uuid_type, storage) | ||
assert isinstance(ext_array, array_cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From memory, before to this change, pa.ExtensionArray.from_storage
would have returned a pyarrow.ExtensionArray
.
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
Would it be possible to re-open this issue? I'm awaiting review from Arrow maintainers here. |
Yes of course, the cleaning was done exactly to surface the prs that still needed attention |
Closing for reasons discussed in #34483 (comment) |
Rationale for this change
Explained in greater detail in #33997, but my summary follows:
ExtensionArray
andExtensionScalar
types are associated with C++ Extensions exposed throughBaseExtensionType
, rather than hand-coded Python types (which would be laborious to generate).What changes are included in this PR?
BaseExtensionType.__arrow_ext_class__()
andBaseExtensionType.__arrow_ext_scalar_class__()
are called, associated Python classes are created on the fly and cached for future calls.BaseExtensionType.__arrow_ext_serialize__()
andBaseExtensionType.__arrow_ext_deserialize__()
defer toCExtensionType.Serialize()
andCExtensionType.Deserialize()
respectively.Are these changes tested?
__arrow_ext_class__()
and__arrow_ext_scalar_class__()
are tested__arrow_ext_serialize__()
and__arrow_ext_deserialize__()
are not tested yet. I'd appreciate feedback on whether the general approach is useful going forward before adding tests.Are there any user-facing changes?