
GH-37484: [Python] Add a FixedSizeTensorScalar class #37533

Merged (45 commits) on Feb 8, 2024

@rok (Member) commented Sep 4, 2023

Rationale for this change

When working with FixedSizeTensorArray we want to access individual tensors. This would be enabled by adding:

import pyarrow as pa

class FixedSizeTensorScalar(pa.ExtensionScalar):
    def to_numpy_ndarray(self): ...

See #37484.

What changes are included in this PR?

This adds FixedSizeTensorScalar and tests for it.

Are there any user-facing changes?

Yes: indexing with FixedSizeTensorArray[i] returns a FixedSizeTensorScalar instead of a generic ExtensionScalar.

github-actions bot commented Sep 4, 2023

⚠️ GitHub issue #37484 has been automatically assigned in GitHub to PR creator.

@alippai (Contributor) commented Sep 4, 2023

Would the numpy array api or https://data-apis.org/array-api/latest/purpose_and_scope.html add any value here?

@rok (Member, Author) commented Sep 5, 2023

@alippai This PR would effectively implement a __pos__(i) method, which per the array API should return an array, whereas pyarrow typically returns a scalar. I'm not sure how to reconcile this. We should probably have a broader discussion on the mailing list about adopting the array API.

@AlenkaF (Member) left a comment

Thanks for working on this! I added two suggestions; otherwise the Python part LGTM.

python/pyarrow/array.pxi (outdated, resolved)
python/pyarrow/array.pxi (outdated, resolved)
python/pyarrow/types.pxi (outdated, resolved)

@AlenkaF (Member) commented Sep 19, 2023

> Would the numpy array api or https://data-apis.org/array-api/latest/purpose_and_scope.html add any value here?

I am very much hoping we can implement DLPack in Arrow: #33984. Especially for the new tensor arrays, it would be very beneficial!

> @alippai This PR would effectively implement a __pos__(i) method, which per the array API should return an array, whereas pyarrow typically returns a scalar. I'm not sure how to reconcile this. We should probably have a broader discussion on the mailing list about adopting the array API.

+1

@AlenkaF (Member) commented Sep 19, 2023

One more thing: can the change in the C++ code (the GetTensor feature) also be reflected in the PR title and description? I am not sure how we want to rename it, so I haven't made any changes.

github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed the "awaiting committer review" and "awaiting changes" labels on Sep 24, 2023
@AlenkaF added this to the 15.0.0 milestone on Nov 16, 2023

@jorisvandenbossche (Member) left a comment

I haven't looked at this in detail yet, but I added some quick drive-by comments. Thanks for working on this!

Can you also add some tests for the new Scalar class?

Currently, for the Python bindings, you added a get_tensor(i) method on the array class. Wouldn't it make sense to (also or instead) add a to_tensor() method on the scalar class, since this gets a Tensor for a single element (scalar) of the array?

cpp/src/arrow/extension/fixed_shape_tensor.cc (outdated, resolved)
cpp/src/arrow/extension/fixed_shape_tensor.cc (outdated, resolved)
python/pyarrow/array.pxi (outdated, resolved)

github-actions bot added the "awaiting changes" label and removed "awaiting change review" on Dec 1, 2023
github-actions bot added the "awaiting change review" label and removed "awaiting changes" on Dec 2, 2023
github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed "awaiting change review" and "awaiting changes" on Dec 2, 2023
github-actions bot added the "awaiting changes" label and removed "awaiting committer review" on Feb 7, 2024
github-actions bot added the "awaiting change review" label and removed "awaiting changes" on Feb 7, 2024

@rok (Member, Author) commented Feb 8, 2024

> Thanks a lot for the update. This is really getting good, just a couple comments on specific points.

Thanks for the helpful review @pitrou, I'm happy to see this moving forward! I've addressed your points, please let me know if more changes are needed.

python/pyarrow/array.pxi (resolved)
shape = obj.shape[1:]
size = obj.size / obj.shape[0]
shape = np.take(obj.shape, permutation)
values = np.ravel(obj, order="K")

Review comment (Member):

as_strided can be a later PR if desired. The docstring addition is good for now!
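The snippet above derives the per-tensor shape from a permutation and flattens with `order="K"`. A NumPy-only sketch of that idea, assuming positive strides and a leading "batch" axis that is outermost in memory; `tensor_layout` is a hypothetical helper, and the actual pyarrow code may differ in its details:

```python
import numpy as np


def tensor_layout(obj):
    """Hypothetical helper: physical per-tensor shape, permutation, values.

    The first axis of ``obj`` indexes tensors and must be outermost in
    memory; the remaining axes may be in any memory order (e.g. transposed).
    """
    # Axis indices sorted from largest to smallest stride (outermost first).
    order = np.argsort(-np.array(obj.strides), kind="stable")
    if order[0] != 0:
        raise ValueError("first axis must be the outermost in memory")
    permutation = order[1:] - 1                   # drop the batch axis
    shape = np.take(obj.shape[1:], permutation)   # physical per-tensor shape
    values = np.ravel(obj, order="K")             # elements in memory order
    return shape, permutation, values


base = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
logical = base.transpose(0, 2, 1)   # shape (2, 4, 3), same memory as base
shape, permutation, values = tensor_layout(logical)
print(shape, permutation)  # [3 4] [1 0]
```

Each logical tensor is then recovered as `values[i * n:(i + 1) * n].reshape(shape).transpose(permutation)` with `n = shape.prod()`; an `as_strided` view would be an alternative that avoids materializing the flat order.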

@pitrou (Member) commented Feb 8, 2024

@github-actions crossbow submit -g python -g wheel

github-actions bot added the "awaiting change review" label and removed "awaiting changes" on Feb 8, 2024

@pitrou (Member) commented Feb 8, 2024

It would be nice to add a docs section on tensors here later:
https://arrow.apache.org/docs/python/

github-actions bot commented Feb 8, 2024

Revision: bf2ca0e

Submitted crossbow builds: ursacomputing/crossbow @ actions-2a16c8cab9

| Task | CI service |
| --- | --- |
| test-conda-python-3.10 | GitHub Actions |
| test-conda-python-3.10-cython2 | GitHub Actions |
| test-conda-python-3.10-hdfs-2.9.2 | GitHub Actions |
| test-conda-python-3.10-hdfs-3.2.1 | GitHub Actions |
| test-conda-python-3.10-pandas-latest | GitHub Actions |
| test-conda-python-3.10-pandas-nightly | GitHub Actions |
| test-conda-python-3.10-spark-v3.5.0 | GitHub Actions |
| test-conda-python-3.10-substrait | GitHub Actions |
| test-conda-python-3.11 | GitHub Actions |
| test-conda-python-3.11-dask-latest | GitHub Actions |
| test-conda-python-3.11-dask-upstream_devel | GitHub Actions |
| test-conda-python-3.11-hypothesis | GitHub Actions |
| test-conda-python-3.11-pandas-upstream_devel | GitHub Actions |
| test-conda-python-3.11-spark-master | GitHub Actions |
| test-conda-python-3.12 | GitHub Actions |
| test-conda-python-3.8 | GitHub Actions |
| test-conda-python-3.8-pandas-1.0 | GitHub Actions |
| test-conda-python-3.8-spark-v3.5.0 | GitHub Actions |
| test-conda-python-3.9 | GitHub Actions |
| test-conda-python-3.9-pandas-latest | GitHub Actions |
| test-cuda-python | GitHub Actions |
| test-debian-11-python-3 | Azure |
| test-fedora-38-python-3 | Azure |
| test-ubuntu-20.04-python-3 | Azure |
| test-ubuntu-22.04-python-3 | GitHub Actions |
| wheel-macos-big-sur-cp310-arm64 | GitHub Actions |
| wheel-macos-big-sur-cp311-arm64 | GitHub Actions |
| wheel-macos-big-sur-cp312-arm64 | GitHub Actions |
| wheel-macos-big-sur-cp38-arm64 | GitHub Actions |
| wheel-macos-big-sur-cp39-arm64 | GitHub Actions |
| wheel-macos-catalina-cp310-amd64 | GitHub Actions |
| wheel-macos-catalina-cp311-amd64 | GitHub Actions |
| wheel-macos-catalina-cp312-amd64 | GitHub Actions |
| wheel-macos-catalina-cp38-amd64 | GitHub Actions |
| wheel-macos-catalina-cp39-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp310-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp310-arm64 | GitHub Actions |
| wheel-manylinux-2-28-cp311-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp311-arm64 | GitHub Actions |
| wheel-manylinux-2-28-cp312-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp312-arm64 | GitHub Actions |
| wheel-manylinux-2-28-cp38-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp38-arm64 | GitHub Actions |
| wheel-manylinux-2-28-cp39-amd64 | GitHub Actions |
| wheel-manylinux-2-28-cp39-arm64 | GitHub Actions |
| wheel-manylinux-2014-cp310-amd64 | GitHub Actions |
| wheel-manylinux-2014-cp310-arm64 | GitHub Actions |
| wheel-manylinux-2014-cp311-amd64 | GitHub Actions |
| wheel-manylinux-2014-cp311-arm64 | GitHub Actions |
| wheel-manylinux-2014-cp312-amd64 | GitHub Actions |
| wheel-manylinux-2014-cp312-arm64 | GitHub Actions |
| wheel-manylinux-2014-cp38-amd64 | GitHub Actions |
| wheel-manylinux-2014-cp38-arm64 | GitHub Actions |
| wheel-manylinux-2014-cp39-amd64 | GitHub Actions |
| wheel-manylinux-2014-cp39-arm64 | GitHub Actions |
| wheel-windows-cp310-amd64 | GitHub Actions |
| wheel-windows-cp311-amd64 | GitHub Actions |
| wheel-windows-cp312-amd64 | GitHub Actions |
| wheel-windows-cp38-amd64 | GitHub Actions |
| wheel-windows-cp39-amd64 | GitHub Actions |

@pitrou merged commit 026188e into apache:main on Feb 8, 2024 (33 of 34 checks passed)
@pitrou removed the "awaiting change review" label on Feb 8, 2024

@rok (Member, Author) commented Feb 8, 2024

Added an issue for the docs: #39998
Thanks for your reviews @pitrou @jorisvandenbossche @AlenkaF @alippai !

github-actions bot added the "awaiting changes" label on Feb 8, 2024
and the rest of the dimensions will match the permuted shape of the fixed
shape tensor.

The conversion is zero-copy.

Review comment (Member):

Small nit: this only holds if the conversion to NumPy is zero-copy (i.e. primitive numeric data without nulls).

Reply (Member, Author):

Good point, added to VariableShapeTensor PR 8ca3bf7

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 026188e.

There were 9 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024

### Rationale for this change

When working with `FixedSizeTensorArray` we want to access individual tensors. This would be enabled by adding:
```python
import pyarrow as pa

class FixedSizeTensorScalar(pa.ExtensionScalar):
    def to_numpy_ndarray(self): ...
```

See apache#37484.

### What changes are included in this PR?

This adds `FixedSizeTensorScalar` and tests for it.

### Are there any user-facing changes?

Yes: indexing with `FixedSizeTensorArray[i]` returns a `FixedSizeTensorScalar` instead of a generic `ExtensionScalar`.

* Closes: apache#37484

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024

Successfully merging this pull request may close these issues.

[Python] Add a FixedSizeTensorScalar class
6 participants