[C++] Add a Tensor logical value type with constant shape, implemented using ExtensionType #15483
Comments
Wes McKinney / @wesm: |
Christian Hudon / @chrish42: For me, there are two cases I'd need Arrow to support:
|
Wes McKinney / @wesm: |
Christian Hudon / @chrish42: Is the proposed approach sound for this case? What would be the next step in terms of code? Is there an example of another type implemented as an ExtensionType that I could have a look at? |
Joris Van den Bossche / @jorisvandenbossche: For implementing it in C++, the best sources are probably the test extension types: https://github.com/apache/arrow/blob/master/cpp/src/arrow/testing/extension_type.h and https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type_test.cc |
Christian Hudon / @chrish42: |
Christopher Osborn: |
Rok Mihevc / @rok: |
Bryan Cutler / @BryanCutler: https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/array/tensor.py We would love to help out with this effort and contribute what we have to Arrow, if it fits the bill! |
Rok Mihevc / @rok: |
Bryan Cutler / @BryanCutler: |
Christian Hudon / @chrish42: |
Rok Mihevc / @rok: As this is for the case where all tensors in the array have the same shape, I propose we store the data in a single Tensor. Is there a good reason not to do that? I assume we should support non-contiguous tensors, so I'll add that. Any comments at this point? @chrish42 - feel free to jump in any time. |
I believe that if a single pyarrow.Tensor is serialized in the backing binary array, that array will have a length of 1. Placing it in a Table or RecordBatch would then restrict the other columns to 1 row as well. |
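The row-count concern above can be illustrated with a minimal sketch. It uses only the Python standard library rather than Arrow itself; `pack_tensor` and `same_length` are hypothetical helpers, and plain lists of `bytes` stand in for a binary storage array.

```python
import struct


def pack_tensor(values):
    """Serialize a flat list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def same_length(*columns):
    """True if all columns have the same number of rows."""
    return len({len(column) for column in columns}) == 1


# Three tensors of shape (2,), one per logical row, plus an ordinary column.
tensors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = ["a", "b", "c"]

# Option A: every tensor serialized into one blob.
# The stand-in binary column then has a single entry.
single_blob_column = [pack_tensor([v for t in tensors for v in t])]

# Option B: one blob per tensor, so the column has one entry per row.
per_row_column = [pack_tensor(t) for t in tensors]
```

With these stand-ins, `same_length(labels, single_blob_column)` is false while `same_length(labels, per_row_column)` is true, which is the mismatch a Table or RecordBatch would reject.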
Bryan Cutler / @BryanCutler: |
Rok Mihevc / @rok: I've also started working on the C++ for this; I'll report back soon. |
Bryan Cutler / @BryanCutler: |
Wenbing Bai: |
Rok Mihevc / @rok: |
Todd Farmer / @toddfarmer: |
Apache Arrow JIRA Bot: |
> [ARROW-1614](https://issues.apache.org/jira/browse/ARROW-1614): In an Arrow table, we would like to add support for a column that has values cells each containing a tensor value, with all tensors having the same dimensions. These would be stored as a binary value, plus some metadata to store type and shape/strides.
>
> * Closes: #15483
>
> Lead-authored-by: Rok Mihevc <rok@mihevc.org>
> Co-authored-by: Rok <rok@mihevc.org>
> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
> Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
In an Arrow table, we would like to add support for a column whose cells each contain a tensor value, with all tensors having the same shape. Each tensor would be stored as a binary value, with metadata recording the element type and the shape/strides.
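As a rough sketch of the layout described above (binary cells plus shared type and shape metadata), the following uses only the Python standard library. It is not Arrow's API: `TensorColumnMeta`, `encode_cell`, and `decode_cell` are hypothetical names, and the assumptions are little-endian elements and row-major (C-contiguous) cells.

```python
import math
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorColumnMeta:
    """Hypothetical column-level metadata shared by every cell."""

    fmt: str      # struct format code of one element, e.g. "f" for float32
    shape: tuple  # shape common to all tensors in the column

    @property
    def itemsize(self):
        return struct.calcsize("<" + self.fmt)

    @property
    def cell_nbytes(self):
        """Size in bytes of one serialized tensor."""
        return self.itemsize * math.prod(self.shape)

    def row_major_strides(self):
        """Byte strides of a C-contiguous tensor with this shape."""
        strides = []
        step = self.itemsize
        for dim in reversed(self.shape):
            strides.append(step)
            step *= dim
        return tuple(reversed(strides))


def encode_cell(meta, values):
    """Pack one tensor, given as a flat row-major list, into bytes."""
    count = math.prod(meta.shape)
    if len(values) != count:
        raise ValueError(f"expected {count} elements, got {len(values)}")
    return struct.pack(f"<{count}{meta.fmt}", *values)


def decode_cell(meta, blob):
    """Unpack one binary cell back into a flat row-major list."""
    count = math.prod(meta.shape)
    return list(struct.unpack(f"<{count}{meta.fmt}", blob))


# A column of four float32 tensors, each of shape (2, 3).
meta = TensorColumnMeta(fmt="f", shape=(2, 3))
column = [
    encode_cell(meta, [float(6 * row + i) for i in range(6)])
    for row in range(4)
]
```

Because the shape and element type live in the column metadata, each cell only carries raw bytes, and every cell has the same size (`meta.cell_nbytes`).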
Reporter: Wes McKinney / @wesm
Assignee: Rok Mihevc / @rok
Watchers: Rok Mihevc / @rok
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-1614. Please see the migration documentation for further details.