Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34796: [C++] Add FromTensor, ToTensor and strides methods to FixedShapeTensorArray #34797

Merged
merged 5 commits into from
Apr 11, 2023

Conversation

rok
Copy link
Member

@rok rok commented Mar 30, 2023

Rationale for this change

We want to enable converting Tensors to FixedShapeTensorArrays and the other way around.

What changes are included in this PR?

This adds FromTensor, ToTensor to FixedShapeTensorArrays and strides method to FixedShapeTensorType.

Are these changes tested?

Yes.

Are there any user-facing changes?

This adds FromTensor, ToTensor and strides are user facing methods.

@github-actions
Copy link

@rok rok force-pushed the from_tensor_to_tensor branch 2 times, most recently from 5a62709 to f72e7a2 Compare March 30, 2023 12:07
@rok rok changed the title GH-34796: [C++] Add a Fixed Shape Tensor canonical ExtensionType GH-34796: [C++] Add FromTensor, ToTensor and strides methods to FixedShapeTensorArray Mar 30, 2023
@rok rok marked this pull request as ready for review April 4, 2023 13:27
@jorisvandenbossche
Copy link
Member

In my last review of #8510 before splitting it (#8510 (review)), I still had some comments/questions on the FromTensor/ToTensor implementation. I am assuming that nothing fundamentally changed here (just split off what was at that point in that PR?).
Can you take a look at those comments and respond to them? Especially the comments about the resulting shape of FromTensor (#8510 (comment), #8510 (comment); you answered then that this was resolved because of permutation shape_, but that's no longer the case). It might be that you addressed those comments, but that's hard to see if you don't explicitly explain what you changed.

Comment on lines 305 to 306
auto cell_shapes = std::vector<std::vector<int64_t>>{{3, 4}, {3, 4}, {4, 3}, {4, 3}};
auto permutations = std::vector<std::vector<int64_t>>{{0, 1}, {1, 0}, {0, 1}, {1, 0}};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I assume this is the equivalent place compared to where I commented before on the previous tests: for the second test case (shape {3, 3, 4}, cell_shape {3, 4} and permutation of {1, 0}), shouldn't the expected fixed_shape_tensor type's shape be {4, 3}?
The tensor itself has shape {3, 3, 4} (so {3, 4} for the individual tensor element), but has a permutation, so {4, 3} ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed ToTensor/FromTensor to permute original shape to cell_shape and back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should permute also dim_names? My original thought wast to keep the original shape and only use permutation generating strides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec says "dim_names ... map to the physical layout (row-major)". So they should correspond to the shape/dimensions of the extension type (and so not necessarily the original tensor).
(so I assume that is a yes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I changed the FromTensor/ToTensor to permute the dim_names accordingly.

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 6, 2023
@rok
Copy link
Member Author

rok commented Apr 6, 2023

A change introduced late in #8510 (and then removed) was that ComputeRowMajorStrides is replaced with a general ComputeStrides to calculate strides of permuted tensors which should help matching pytorch behavior:

import torch

def print_strides(shape, permutation):
    permutation2 = [p - 1 for p in permutation[1:]]
    shape2 = shape[1:]
    x = torch.randn(shape).permute(permutation)
    y = torch.randn(shape2).permute(permutation2)
    strides = [z * 8 for z in x.stride()]
    strides2 = [z * 8 for z in y.stride()]
    
    print("[full tensor] shape:", shape, "permutation:", permutation, "strides:", strides)
    print("[cell tensor] shape:", shape2, "permutation:", permutation2, "strides:", strides2)

shapes = [
    (3, 3, 4),
    (3, 3, 4),
    (3, 4, 3),
    (3, 4, 3),
]
permutations = [
    (0, 1, 2),
    (0, 2, 1),
    (0, 1, 2),
    (0, 2, 1),
]

for shape, permutation in zip(shapes, permutations):
    print_strides(shape, permutation)
[full tensor] shape: (3, 3, 4) permutation: (0, 1, 2) strides: [96, 32, 8]
[cell tensor] shape: (3, 4) permutation: [0, 1] strides: [32, 8]
[full tensor] shape: (3, 3, 4) permutation: (0, 2, 1) strides: [96, 8, 32]
[cell tensor] shape: (3, 4) permutation: [1, 0] strides: [8, 32]
[full tensor] shape: (3, 4, 3) permutation: (0, 1, 2) strides: [96, 24, 8]
[cell tensor] shape: (4, 3) permutation: [0, 1] strides: [24, 8]
[full tensor] shape: (3, 4, 3) permutation: (0, 2, 1) strides: [96, 8, 24]
[cell tensor] shape: (4, 3) permutation: [1, 0] strides: [8, 24]

@rok rok added this to the 12.0.0 milestone Apr 6, 2023
///
/// This method will create a Tensor from a FixedShapeTensorArray, setting its
/// first dimension as length equal to the FixedShapeTensorArray's length and the
/// remaining dimensions as the FixedShapeTensorType's shape.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here to the docstring that this will automatically reshape the resulting Tensor according to the permutation metadata of the FixedShapeTensorArray?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 101 to 102
/// \brief Compute strides of FixedShapeTensorType
static Status ComputeStrides(const FixedWidthType& type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for having this public in addition to strides() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it's like this now is because both FixedShapeTensorType and FixedShapeTensorArray are using ComputeStrides, but ComputeStrides is a FixedShapeTensorType method. I'll see if I can put it into another namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to anonymous namespace.

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 7, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 7, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Apr 7, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, the converted shape and dim_names look correct to me now.

@ursabot
Copy link

ursabot commented Apr 12, 2023

Benchmark runs are scheduled for baseline = c40e658 and contender = aff876a. aff876a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.28% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] aff876a5 ec2-t3-xlarge-us-east-2
[Failed] aff876a5 test-mac-arm
[Finished] aff876a5 ursa-i9-9960x
[Finished] aff876a5 ursa-thinkcentre-m75q
[Finished] c40e658f ec2-t3-xlarge-us-east-2
[Failed] c40e658f test-mac-arm
[Finished] c40e658f ursa-i9-9960x
[Finished] c40e658f ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

paleolimbot pushed a commit that referenced this pull request Apr 17, 2023
)

### Rationale for this change

`std::reinterpret_pointer_cast` was introduced with FixedShapeTensor PR (#34797) but is not available on OSX (see #35143).

### What changes are included in this PR?

This change switches to using `internal::checked_pointer_cast`.

### Are these changes tested?

Change is tested in CI, but should also be verified on crossbow.

### Are there any user-facing changes?

No.
* Closes: #35143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
raulcd pushed a commit that referenced this pull request Apr 17, 2023
)

### Rationale for this change

`std::reinterpret_pointer_cast` was introduced with FixedShapeTensor PR (#34797) but is not available on OSX (see #35143).

### What changes are included in this PR?

This change switches to using `internal::checked_pointer_cast`.

### Are these changes tested?

Change is tested in CI, but should also be verified on crossbow.

### Are there any user-facing changes?

No.
* Closes: #35143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
… FixedShapeTensorArray (apache#34797)

### Rationale for this change

We want to enable converting Tensors to FixedShapeTensorArrays and the other way around.

### What changes are included in this PR?

This adds FromTensor, ToTensor to FixedShapeTensorArrays and strides method to FixedShapeTensorType.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

This adds FromTensor, ToTensor and strides are user facing methods.
* Closes: apache#34796

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
apache#35154)

### Rationale for this change

`std::reinterpret_pointer_cast` was introduced with FixedShapeTensor PR (apache#34797) but is not available on OSX (see apache#35143).

### What changes are included in this PR?

This change switches to using `internal::checked_pointer_cast`.

### Are these changes tested?

Change is tested in CI, but should also be verified on crossbow.

### Are there any user-facing changes?

No.
* Closes: apache#35143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
… FixedShapeTensorArray (apache#34797)

### Rationale for this change

We want to enable converting Tensors to FixedShapeTensorArrays and the other way around.

### What changes are included in this PR?

This adds FromTensor, ToTensor to FixedShapeTensorArrays and strides method to FixedShapeTensorType.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

This adds FromTensor, ToTensor and strides are user facing methods.
* Closes: apache#34796

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
apache#35154)

### Rationale for this change

`std::reinterpret_pointer_cast` was introduced with FixedShapeTensor PR (apache#34797) but is not available on OSX (see apache#35143).

### What changes are included in this PR?

This change switches to using `internal::checked_pointer_cast`.

### Are these changes tested?

Change is tested in CI, but should also be verified on crossbow.

### Are there any user-facing changes?

No.
* Closes: apache#35143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
… FixedShapeTensorArray (apache#34797)

### Rationale for this change

We want to enable converting Tensors to FixedShapeTensorArrays and the other way around.

### What changes are included in this PR?

This adds FromTensor, ToTensor to FixedShapeTensorArrays and strides method to FixedShapeTensorType.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

This adds FromTensor, ToTensor and strides are user facing methods.
* Closes: apache#34796

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
apache#35154)

### Rationale for this change

`std::reinterpret_pointer_cast` was introduced with FixedShapeTensor PR (apache#34797) but is not available on OSX (see apache#35143).

### What changes are included in this PR?

This change switches to using `internal::checked_pointer_cast`.

### Are these changes tested?

Change is tested in CI, but should also be verified on crossbow.

### Are there any user-facing changes?

No.
* Closes: apache#35143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add FromTensor, ToTensor and strides methods to FixedShapeTensorArray
3 participants