GH-15483: [C++] Add a Fixed Shape Tensor canonical ExtensionType #8510
rok commented Oct 22, 2020
- Closes: [C++] Add a Tensor logical value type with constant shape, implemented using ExtensionType #15483
Currently, only the shape is stored. Is this enough? Doesn't that assume a fixed row-major order? |
I think we either assume that or also store strides / dimension order. I am not sure how dimension order changes are done in other frameworks (TF, pytorch, etc.) but I would assume they don't reorder tensors in memory. So I would go for storing strides. |
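To make the shape-versus-strides question concrete, here is a minimal sketch of how byte strides follow from a shape under the row-major assumption, and how a dimension reorder can be expressed by permuting strides instead of moving data. The helper names `row_major_strides` and `permuted_strides` are hypothetical and are not part of Arrow.

```python
def row_major_strides(shape, itemsize):
    """Byte strides for a C-contiguous (row-major) tensor of the given shape."""
    strides = []
    step = itemsize
    # Walk from the innermost dimension outward; each outer step spans
    # the full extent of every dimension inside it.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permuted_strides(strides, order):
    """Strides of a view with reordered axes; the underlying buffer is untouched."""
    return tuple(strides[axis] for axis in order)


base = row_major_strides((2, 3, 4), itemsize=8)
print(base)                               # (96, 32, 8)
print(permuted_strides(base, (2, 0, 1)))  # (8, 96, 32)
```

If only the shape is stored, a reader can reconstruct just the first result; the permuted layout is recoverable only when strides or the dimension order are stored as well.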
d4608a9 to 356c300
b5a8643 to a5b19d7
In the context of testing metadata equality across multiple parquet files in a dataset, equality on shape and strides may be a very strict requirement. Would relaxing the equality requirement to only compare the number of tensor dimensions negatively impact the design? |
Good point. By tensor dimensions you mean shape, right? |
I was thinking even looser:
    def __eq__(self, other):
        return len(self.shape) == len(other.shape)
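A self-contained sketch of the relaxed comparison suggested above, for readers who want to try it. `LooseTensorType` is a hypothetical stand-in class, not the Arrow extension type, and nothing here is meant to describe what the merged code does.

```python
class LooseTensorType:
    """Hypothetical stand-in for a tensor extension type; not the Arrow class."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __eq__(self, other):
        if not isinstance(other, LooseTensorType):
            return NotImplemented
        # Relaxed rule: only the number of dimensions has to match.
        return len(self.shape) == len(other.shape)

    def __hash__(self):
        # Keep hashing consistent with the relaxed equality.
        return hash(len(self.shape))


print(LooseTensorType((2, 3)) == LooseTensorType((4, 5)))     # True
print(LooseTensorType((2, 3)) == LooseTensorType((2, 3, 4)))  # False
```

The trade-off is visible in the first comparison: shapes (2, 3) and (4, 5) compare equal even though they hold 6 and 20 elements per tensor, so a consumer relying on this equality still has to check the element count before reading values.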
Done. |
@jorisvandenbossche @sjperkins @pitrou is there interest to get this in? |
Currently we don't ship any standard extension types. I recommend discussing this on the mailing-list. |
fyi, the ray project created its own Tensor type: |
Indeed I think having a built-in Tensor value type (implemented using extension arrays) in Arrow/pyarrow would be better than having third party projects rolling their own. |
@wesm would there be interest in folding the Pandas side of these third-party extensions into Pandas also? |
That will be something to discuss in the pandas project. |
@rok, you are awesome! 👍 |
The failing CI is unrelated? (it seems the R failures are being worked on, and the C++ failures are related to LLVM update #34768)
Great news! |
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
They seem unrelated indeed and I don't think they obscure any new problems as the change was fairly minimal. |
Merged after 2.5 years ;) Thanks @rok! |
Thanks for all the input and reviews everyone, very happy to see this merged! @jorisvandenbossche now let's talk about strides @ #34797 :D |
Benchmark runs are scheduled for baseline = 81c828e and contender = a84a39b. a84a39b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
### Rationale for this change
The fixed shape tensor canonical extension type is implemented in C++ (#8510), so we can add Python bindings for it.
### What changes are included in this PR?
Bindings for the fixed shape tensor canonical extension type.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* Closes: #34882
Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
apache#8510)
> [ARROW-1614](https://issues.apache.org/jira/browse/ARROW-1614): In an Arrow table, we would like to add support for a column whose cells each contain a tensor value, with all tensors having the same dimensions. These would be stored as a binary value, plus some metadata to store type and shape/strides.
* Closes: apache#15483
Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
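The storage idea in the ARROW-1614 description (one flat value per cell plus metadata for the shape) can be illustrated in plain Python. This is only a sketch under stated assumptions: nested lists stand in for tensors, the helpers `pack_tensors` and `unpack_tensor` are hypothetical, and the JSON layout with a `"shape"` key is chosen for illustration and is not taken from this PR.

```python
import json
from math import prod


def _flatten(value):
    """Flatten a nested list in row-major order."""
    if isinstance(value, list):
        return [x for item in value for x in _flatten(item)]
    return [value]


def pack_tensors(tensors, shape):
    """Store each tensor as one flat row; the shape lives in shared metadata."""
    size = prod(shape)
    rows = []
    for tensor in tensors:
        flat = _flatten(tensor)
        if len(flat) != size:
            raise ValueError(f"expected {size} values, got {len(flat)}")
        rows.append(flat)
    metadata = json.dumps({"shape": list(shape)})  # assumed layout
    return rows, metadata


def unpack_tensor(row, metadata):
    """Rebuild the nested tensor from a flat row using the stored shape."""
    shape = json.loads(metadata)["shape"]
    out = list(row)
    # Re-chunk from the innermost dimension outward.
    for dim in reversed(shape[1:]):
        out = [out[i:i + dim] for i in range(0, len(out), dim)]
    return out


rows, meta = pack_tensors([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], (2, 2))
print(rows)                          # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(unpack_tensor(rows[1], meta))  # [[5, 6], [7, 8]]
```

Because every row has the same length, the column can be read without per-cell shape information; the metadata is stored once for the whole column.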
We started a mailing list discussion about potential |