
[C++][Python] DLPack on FixedShapeTensorArray/FixedShapeTensorScalar #38868

Open
AlenkaF opened this issue Nov 23, 2023 · 10 comments

AlenkaF (Member) commented Nov 23, 2023

Describe the enhancement requested

With the work in #33984, support for exporting via DLPack will be added to Arrays. It would be great to do something similar for the FixedShapeTensorArray (and Tensor) classes.

I would like to open a discussion to decide how to approach the implementation of the tensor protocol in the case of FixedShapeTensorArray.

If we follow the same logic for FixedShapeTensorArray as in #38472, the dimensionality of the object would change, since a tensor extension type array is an array of tensors and not a single tensor. This is not necessarily the right approach. Maybe it would make more sense to apply DLPack support to FixedShapeTensorScalar.
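
To make the dimensionality question concrete, here is a minimal sketch using the existing pyarrow fixed shape tensor APIs; the two `__dlpack__` options in the comments are hypothetical, they are exactly what this issue is about:

```python
import numpy as np
import pyarrow as pa

# An array of four 2x3 tensors, built from one contiguous (4, 2, 3) block.
data = np.arange(24, dtype=np.float32).reshape(4, 2, 3)
tensor_array = pa.FixedShapeTensorArray.from_numpy_ndarray(data)
print(len(tensor_array))  # 4: semantically a 1-D array of 4 tensors

# Hypothetical option A: __dlpack__ on the array exports the whole storage,
# so a consumer would see a 3-D tensor of shape (4, 2, 3).
# Hypothetical option B: __dlpack__ on a FixedShapeTensorScalar exports a
# single element, so a consumer would see a 2-D tensor of shape (2, 3).
```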

I will be happy to receive feedback on this!
cc @jorisvandenbossche @rok

Component(s)

C++, Python

rok (Member) commented Nov 23, 2023

> Maybe it would make more sense to apply DLPack support to FixedShapeTensorScalar.

Would it make sense to present contiguous buffers of ChunkedArray as DLPack tensors?

AlenkaF (Member Author) commented Nov 27, 2023

Not sure I understand. Do you mean a ChunkedArray where the chunks are FixedShapeTensorArray objects? Or do you mean a general Array to be presented as a DLPack tensor?

rok (Member) commented Nov 27, 2023

The first option. I'm not sure the second is an option - I'm assuming non-contiguous (chunked) arrays can't be represented as DLPack tensors?

AlenkaF (Member Author) commented Nov 27, 2023

> The first option. I'm not sure the second is an option - I'm assuming non-contiguous (chunked) arrays can't be represented as DLPack tensors?

Ah, maybe the issue description isn't clear.

What I meant was: are we talking about chunks of tensors or chunks of primitive arrays? To make it clearer, let's just talk about tensor extension arrays vs. primitive arrays, since chunking can't be supported in any case: DLPack needs a contiguous buffer.

So for the second part: for primitive arrays (not chunked), this is already being worked on in #33984. What I would really like to see is DLPack support for FixedShapeTensorArray as well (or nested arrays in general?).

This is the case I had in mind when opening this issue, and I think the question is: would it be correct to present the whole tensor extension storage array as a DLPack tensor, or would it be more correct to represent a specific tensor in the FixedShapeTensorArray (a list element) as a DLPack tensor?

rok (Member) commented Nov 27, 2023

> This is the case I had in mind when opening this issue, and I think the question is: would it be correct to present the whole tensor extension storage array as a DLPack tensor, or would it be more correct to represent a specific tensor in the FixedShapeTensorArray (a list element) as a DLPack tensor?

If Array-sized DLPack tensors can easily be sliced into Scalar-sized tensors (and all other things being equal), I'd be slightly in favor of Array-sized tensors, just because that would mean keeping fewer objects around :)

jorisvandenbossche (Member) commented

The reason I am hesitant to add a direct FixedShapeTensorArray.__dlpack__ is that it would change the dimensionality of the returned object.

All our arrays (including FixedShapeTensorArray) are semantically 1D arrays, and all the primitive arrays that will support __dlpack__ with #38472 also return a 1D tensor (for example, np.from_dlpack(pa_arr) will be 1D).
Similarly, Array.to_numpy() also always gives you a 1D numpy array (including for FixedShapeTensorArray).

I think it would be inconsistent if methods like to_numpy(), np.asarray(..), or np.from_dlpack(..) started to return an object of a different dimensionality depending on the data type.
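
As an illustration of the consistency argument (the np.from_dlpack call assumes the #38472 support is in place):

```python
import numpy as np
import pyarrow as pa

arr = pa.array([1.0, 2.0, 3.0], type=pa.float32())

# Array.to_numpy() always returns a 1-D result for a pyarrow Array ...
assert arr.to_numpy().ndim == 1

# ... and with #38472, np.from_dlpack(arr) is 1-D as well.
assert np.from_dlpack(arr).ndim == 1

# A FixedShapeTensorArray.__dlpack__ returning an nD tensor would break
# this pattern: the dimensionality of the result would suddenly depend
# on the column's data type.
```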


An alternative way to expose the full nD array backing the FixedShapeTensorArray would be to implement a to_tensor() method: a specific API for the fixed shape tensor type, meant exactly to get the underlying nD array in a zero-copy way.
If we then add DLPack support to our Tensor class (which we can do anyway, I think), this would still expose the data of FixedShapeTensorArray through DLPack.

One might argue that a downside of this is that you then need to do an additional to_tensor() call, which you don't need for other types, before being able to convert the data to some tensor library with from_dlpack. However, I think the user / library will always need to be aware that they are handling a fixed shape tensor array: 1) DLPack only works for certain dtypes, so generally you have to check that anyway, and 2) exactly because it returns a different dimensionality (and is a very specific data type), you are typically already handling this type specifically compared to other columns.
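
Put together, the proposed route could look roughly like this (both to_tensor() and Tensor.__dlpack__ are hypothetical at this point; they are exactly what this issue proposes):

```python
import numpy as np
import pyarrow as pa

data = np.arange(24, dtype=np.float32).reshape(4, 2, 3)
tensor_array = pa.FixedShapeTensorArray.from_numpy_ndarray(data)

# Step 1 (proposed): zero-copy conversion to the nD Tensor backing the
# extension array.
tensor = tensor_array.to_tensor()  # pa.Tensor of shape (4, 2, 3)

# Step 2 (proposed): once pa.Tensor implements __dlpack__, any DLPack
# consumer can import it without a copy.
result = np.from_dlpack(tensor)    # ndarray of shape (4, 2, 3)
```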

pitrou (Member) commented Dec 6, 2023

> If we then add DLPack support to our Tensor class (which we can do anyway, I think), this would still expose the data of FixedShapeTensorArray through DLPack.

Then we should probably also support to_tensor() on regular primitive arrays? For example, a 100-entry float32 array would return a 1-D float32 tensor with shape (100,).
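
In other words, something like this (to_tensor() on a primitive array is hypothetical here):

```python
import pyarrow as pa

arr = pa.array(range(100), type=pa.float32())

# Hypothetical: to_tensor() on a primitive array would yield a 1-D
# Tensor of shape (100,), mirroring the nD case for fixed shape tensors.
tensor = arr.to_tensor()
assert tensor.shape == (100,)
```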

Also, fortunately DLPack doesn't support nulls, because neither does our Tensor class :-)

AlenkaF (Member Author) commented Dec 6, 2023

I think going through our Tensor class would be a good idea. To summarize, we would be adding:

  • to_tensor on primitive arrays, returning a 1-dimensional tensor.
  • to_tensor on FixedShapeTensorArray (or nested arrays in general?), returning an n-dimensional tensor.

And then we would add DLPack support to the Tensor class 👍

And this will all have to be done in C++.

pitrou (Member) commented Dec 6, 2023

> to_tensor on FixedShapeTensorArray (or nested arrays in general?), returning an n-dimensional tensor.

On any FixedSizeList array probably?
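
A FixedSizeList array has the same contiguous child layout, so the same hypothetical to_tensor() would apply there; a minimal sketch:

```python
import pyarrow as pa

values = pa.array([1, 2, 3, 4, 5, 6, 7, 8], type=pa.float32())
fsl = pa.FixedSizeListArray.from_arrays(values, 2)  # 4 lists of size 2

# The flat child buffer is contiguous, so a hypothetical fsl.to_tensor()
# could expose it zero-copy as a tensor of shape (4, 2) (given no nulls).
```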

> And this will all have to be done in C++.

:-P

jorisvandenbossche (Member) commented

> And this will all have to be done in C++.

Note that for the more minimal scope of FixedShapeTensorArray, this conversion already exists in C++, so here it's just a matter of adding Python bindings for it in a to_tensor() method.

(I would argue that this is more important than implementing a generic Array -> Tensor conversion, because primitive arrays already have __dlpack__ support directly.)
