Skip to content

ARROW-6508: [C++] Add Tensor and SparseTensor factory function with validations#5862

Closed
mrkn wants to merge 19 commits intoapache:masterfrom
mrkn:ARROW-6508
Closed

ARROW-6508: [C++] Add Tensor and SparseTensor factory function with validations#5862
mrkn wants to merge 19 commits intoapache:masterfrom
mrkn:ARROW-6508

Conversation

@mrkn
Copy link
Copy Markdown
Member

@mrkn mrkn commented Nov 19, 2019

I'd like to add Make factory functions in Tensor and SparseTensor. These functions validate the given parameters.

The following validations run in Tensor::Make:

  • Data type (non-null and is_tensor_supported)
  • data pointer
  • the size of shape
  • elements of shape are all positive
  • the consistency of shape and strides
  • the consistency of strides and the size of data
  • the consistency of the size of dim_names and shape

The following validations run in SparseTensor::Make:

  • Data type
  • the consistency of the sparse index and shape
  • the consistency of the size of dim_names and shape

TODO:

  • Use Result<T> as return types.

@github-actions
Copy link
Copy Markdown

@mrkn mrkn force-pushed the ARROW-6508 branch 6 times, most recently from dd11db5 to 1997d1c Compare November 22, 2019 03:21
@wesm
Copy link
Copy Markdown
Member

wesm commented Nov 22, 2019

A couple of high level comments before I review the details:

  • I might suggest calling these MakeSafe or similar since the Make constructors on other classes cannot fail (see Table::Make) (cc @fsaintjacques @pitrou for thoughts)
  • Since you're adding new APIs, are these good candidates to use Result<T> instead of Status?

@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Nov 23, 2019

I might suggest calling these MakeSafe or similar since the Make constructors on other classes cannot fail (see Table::Make)

OK, I understand. Renaming to MakeSafe is good for me, but I'm concerning that this name is a new name in C++ API.

Since you're adding new APIs, are these good candidates to use Result<T> instead of Status?

Oh, I forgot to rewrite this to use Result<T>. I will do it.

Thanks!

@mrkn mrkn force-pushed the ARROW-6508 branch 5 times, most recently from 52f1d5f to 692c0dc Compare November 26, 2019 04:10
@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Nov 26, 2019

I finished rewriting by using Result<T>, renaming to MakeSafe, and rebasing it onto the latest master branch.

I search uses of Result<T> and I found that StructArray::Make performs validations and returns Result<T>.
It may be better to rename it to StructArray::MakeSafe likewise.

@mrkn mrkn marked this pull request as ready for review November 26, 2019 04:11
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Nov 26, 2019

I would rather have all those methods named Make personally. The fact that they return a Result hint that there may be some validation going on.

@fsaintjacques
Copy link
Copy Markdown
Contributor

The other (unsafe) Make functions also return Status and use an output parameter. The difference is whether they will logically validate (via Validate) the inputs.

We had this discussion previously, not sure if on the ML or in a PR review: Make|MakeUnsafe versus Make|MakeSafe.

@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Nov 27, 2019

I guess it is enough to see the return type to distinguish whether a Make function may perform validations and can be failed, too.

But we have to use the different names if we want to introduce two factory functions, which have the same argument types and the different return types, and the one performs validations and the other does not.

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Nov 27, 2019

Well, why do we need the unsafe APIs here? You're only checking the shape and strides, so it's gonna be fast (unless you have 1000-dimensional tensors?).

@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Nov 27, 2019

We don't need unsafe APIs in tensor and sparse tensor.

I just mentioned my anxiety about the potential issue of the naming convention that uses only Make. For example, we can consider adding a safe version of Table::Make:

Result<std::shared_ptr<Table>> Make(
    const std::shared_ptr<Schema>& schema,
    const std::vector<std::shared_ptr<ChunkedArray>>& columns,
    int64_t num_rows = -1
);

But we cannot add this because this function differs only the return type from the existing Table::Make.

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Nov 27, 2019

If we want to add a safe version of Table::Make, we can rename the unsafe version to Table::MakeUnsafe.

@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Nov 27, 2019

OK, I'll revert the commit to rename Make to MakeSafe.

I found the previous discussion thread in ML. The discussion hasn't been settled yet.

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! There's some potential for simplification in some places. Also, the tests would deserve developing a bit.

mrkn added 2 commits December 3, 2019 12:19
These static factory functions validate the given arguments and then
make a new Tensor object if the arguments are valid.
@mrkn mrkn requested a review from pitrou December 4, 2019 01:03
@mrkn
Copy link
Copy Markdown
Member Author

mrkn commented Dec 4, 2019

@pitrou I finished fixes for all of your comments. Could you please review this again?

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Will merge when green, thank you!

@pitrou pitrou closed this in 70aea01 Dec 4, 2019
nealrichardson pushed a commit to nealrichardson/arrow that referenced this pull request Dec 6, 2019
…alidations

I'd like to add `Make` factory functions in `Tensor` and `SparseTensor`.  These functions validate the given parameters.

The following validations run in `Tensor::Make`:

- Data type (non-null and `is_tensor_supported`)
- `data` pointer
- the size of `shape`
- elements of `shape` are all positive
- the consistency of `shape` and `strides`
- the consistency of `strides` and the size of `data`
- the consistency of the size of `dim_names` and `shape`

The following validations run in `SparseTensor::Make`:

- Data type
- the consistency of the sparse index and `shape`
- the consistency of the size of `dim_names` and `shape`

---

## TODO:

- [x] Use `Result<T>` as return types.

Closes apache#5862 from mrkn/ARROW-6508 and squashes the following commits:

40a723b <Kenta Murata> Let Tensor::Make support 0-dim tensors
e03f1f7 <Kenta Murata> Fix tests of Tensor and SparseTensor more appropriately
6130d25 <Kenta Murata> Simplify Tensor::Make and NumericTensor::Make declarations
195cd92 <Kenta Murata> Fix SparseTensorImpl::Make
29913bd <Kenta Murata> Remove a needless condition
4801c3f <Kenta Murata> Make ValidateShape inline
32ef792 <Kenta Murata> Use appropriate macros
16fd011 <Kenta Murata> Rename MakeSafe to Make
03ce831 <Kenta Murata> Fix error messages
0c69d60 <Kenta Murata> Refactoring
aa8017b <Kenta Murata> Rname to MakeSafe
9cbe5d8 <Kenta Murata> Use Result<T>
bf1a474 <Kenta Murata> SparseCOOIndex::Make and SparseCSRIndex::Make
1a2b5ca <Kenta Murata> Reject negative items in shape
a7ac21d <Kenta Murata> Add SparseTensorImpl<T>::Make
9a2bfaf <Kenta Murata> Add NumericTensor::Make functions
aec0ac5 <Kenta Murata> Remove Tensor::Make without strides
b9fc3c9 <Kenta Murata> Extract internal::ValidateTensorParameters
5aa0e63 <Kenta Murata> Add Tensor::Make functions

Authored-by: Kenta Murata <mrkn@mrkn.jp>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@mrkn mrkn deleted the ARROW-6508 branch January 23, 2020 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants