
Add methods for the Arrow PyCapsule Protocol to DataFrame/Column interchange protocol objects #342

protocol/dataframe_protocol.py (64 additions, 0 deletions)
@@ -350,6 +350,41 @@ def get_buffers(self) -> ColumnBuffers:
"""
pass

def __arrow_c_schema__(self) -> object:
"""
Export the data type of the Column as an Arrow C schema PyCapsule.

Returns
-------
PyCapsule
"""
pass
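
For illustration, a producer whose column data is backed by pyarrow could implement this by delegating to pyarrow's own capsule support. A minimal sketch, assuming pyarrow >= 14; the ``_pa_array`` attribute is hypothetical:

    import pyarrow as pa

    def __arrow_c_schema__(self) -> object:
        # pyarrow DataType objects implement the PyCapsule protocol
        # themselves (pyarrow >= 14), so exporting the ArrowSchema
        # capsule can be delegated to the column's Arrow type.
        return self._pa_array.type.__arrow_c_schema__()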

def __arrow_c_array__(
self, requested_schema: Optional[object] = None
) -> Tuple[object, object]:
"""
Export the Column as an Arrow C array and schema PyCapsule.

If the Column consists of multiple chunks, this method should raise
an error.
Member: Is this not supported at all? Or if this is supported only at the dataframe level (since the same restriction isn't mentioned there), should this say why and/or refer to that support at dataframe level?

Member: Would be useful to add a section:

Raises
------


Parameters
----------
requested_schema : PyCapsule, default None
The schema to which the column should be cast, passed as a
PyCapsule containing a C ArrowSchema representation of the
requested schema.
If None, the array will be returned as-is, with a type matching the
one returned by ``__arrow_c_schema__()``.

Member: nit: "casted" -> "cast" (also further down)

Returns
-------
Tuple[PyCapsule, PyCapsule]
A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
respectively.
"""
pass
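
On the consumer side, recent pyarrow releases (>= 14) can ingest any object exposing ``__arrow_c_array__`` directly. A usage sketch, where ``col`` stands for a hypothetical object implementing this protocol:

    import pyarrow as pa

    # pa.array() recognizes objects implementing __arrow_c_array__ and
    # imports them, zero-copy where the producer allows it.
    arr = pa.array(col)

    # The raw capsules can also be requested explicitly:
    schema_capsule, array_capsule = col.__arrow_c_array__()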

# def get_children(self) -> Iterable[Column]:
# """
@@ -490,3 +525,32 @@ def get_chunks(self, n_chunks: Optional[int] = None) -> Iterable["DataFrame"]:
same way.
"""
pass

def __arrow_c_schema__(self) -> object:
"""
Export the schema of the DataFrame as an Arrow C schema PyCapsule.

Member: typo in DataFrame

Returns
-------
PyCapsule
"""
pass
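
As a usage sketch: ``pa.schema()`` in pyarrow >= 14 accepts any object implementing ``__arrow_c_schema__``, so a hypothetical ``df`` exposing this method can be consumed directly:

    import pyarrow as pa

    # pa.schema() consumes the __arrow_c_schema__ capsule and returns
    # a pyarrow.Schema describing all columns of the dataframe.
    schema = pa.schema(df)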

def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object:

Member: Yet more conflicting terminology? Is "stream" supposed to mean "dataframe" here, rather than CUDA stream? If so, won't that conflict with device support later, and/or be confused with DLPack stream support?

Collaborator: How does one differentiate between a DataFrame and a struct column here (assuming those will be supported in the future)?

"""
Export the DataFrame as an Arrow C stream PyCapsule.

Parameters
----------
requested_schema : PyCapsule, default None
The schema to which the dataframe should be cast, passed as a
PyCapsule containing a C ArrowSchema representation of the
requested schema.
If None, the data will be returned as-is, with a schema matching the
one returned by ``__arrow_c_schema__()``.

Returns
-------
PyCapsule
"""
pass
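
A minimal producer sketch, assuming the DataFrame's chunks can be materialized as pyarrow record batches; the ``_pa_schema`` attribute and ``_to_arrow_batches()`` helper are hypothetical, and pyarrow >= 14 is assumed:

    import pyarrow as pa

    def __arrow_c_stream__(self, requested_schema=None):
        # pa.RecordBatchReader implements __arrow_c_stream__ itself, so
        # exporting the ArrowArrayStream capsule can be delegated to it.
        reader = pa.RecordBatchReader.from_batches(
            self._pa_schema, self._to_arrow_batches()
        )
        return reader.__arrow_c_stream__(requested_schema)

A consumer can then round-trip the whole dataframe with ``pa.table(df)``, which recognizes the stream capsule.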