
TensorJoin kernel for CPU #2301

Merged
merged 9 commits into from
Oct 6, 2020
Conversation

@szalpal (Member) commented Sep 28, 2020

Why do we need this PR?

  • It adds a new feature: stacking or concatenating tensors

The concatenation operation creates a new tensor with values joined along an existing dimension, e.g.

arr0 = [[1, 2, 4, 2], [1, 1, 7, 6], [6, 8, 8, 4]]
shape = (3, 4)

arr1 = [[3, 8, 8, 6], [8, 1, 5, 7], [6, 2, 7, 5]]
shape = (3, 4)

concatenate([arr0, arr1], axis=1) ->
[[1, 2, 4, 2, 3, 8, 8, 6],
 [1, 1, 7, 6, 8, 1, 5, 7],
 [6, 8, 8, 4, 6, 2, 7, 5]]
shape = (3, 8)

Stacking, on the other hand, creates a new tensor with an added dimension at the position indicated by axis.

stack([arr0, arr1], axis=1) ->
[[[1, 2, 4, 2],
  [3, 8, 8, 6]],

 [[1, 1, 7, 6],
  [8, 1, 5, 7]],

 [[6, 8, 8, 4],
  [6, 2, 7, 5]]]
shape = (3, 2, 4)

Note that the memory layout is the same in both modes; only the output shape changes.
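The layout claim can be checked directly with NumPy (used here purely for illustration; the PR itself implements this as a DALI CPU kernel):

```python
# Illustration of the claim above using NumPy (not part of the PR):
# concatenating along axis=1 and stacking at axis=1 differ only in shape,
# not in the underlying (row-major) memory layout.
import numpy as np

arr0 = np.array([[1, 2, 4, 2], [1, 1, 7, 6], [6, 8, 8, 4]])
arr1 = np.array([[3, 8, 8, 6], [8, 1, 5, 7], [6, 2, 7, 5]])

cat = np.concatenate([arr0, arr1], axis=1)  # shape (3, 8)
stk = np.stack([arr0, arr1], axis=1)        # shape (3, 2, 4)

# Same flat data, different shape.
print(np.array_equal(cat.ravel(), stk.ravel()))  # True
```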

Signed-off-by: szalpal <mszolucha@nvidia.com>
@szalpal changed the title from "[WIP] TensorJoin kernel for CPU" to "TensorJoin kernel for CPU" Sep 28, 2020
@szalpal (Member Author) commented Sep 28, 2020

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1659611]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [1659611]: BUILD FAILED

void Run(KernelContext &ctx, const OutTensorCPU<T, output_dims> &out,
         span<const InTensorCPU<T, dims>> in) {
  if (in.size() != n_input_tensors_) {
Contributor:

How likely is that? Do we need to check it?

Member Author:

In the event the user provides an input to Run that has more tensors than were provided in Setup, we'll get a segfault. That's why I wanted to add this check.

Contributor:

I'd say that it is assumed that this won't happen in a well-written program. Should we make it an assert instead? (just a suggestion though)

Member Author:

Again, I agree that in a well-written program it wouldn't happen. But in case the program is not well-written, a segfault might occur, possibly in some really strange situation.

I guess that's an open question for us: how much error-checking we would like to have between Setup and Run calls (I don't remember discussing it before).

Contributor:

That's why we have asserts: to check logic errors. If we were to check every error this way, we would end up with extremely verbose code.
I think that assuming the same input for Setup and Run is OK, and an assert would suffice for error checking.
I think we have this assumption all around our code base.
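The contract being discussed — full validation in Setup, a debug-only assert in Run — can be sketched like this (a hypothetical Python illustration; the real kernel is C++ and the names here are invented for the sketch):

```python
# Hypothetical sketch of the Setup/Run contract discussed above (the real
# kernel is C++; the class and method names are illustrative only).

class TensorJoinKernelSketch:
    def setup(self, in_shapes):
        # Setup validates inputs and remembers how many tensors to expect.
        self.n_input_tensors = len(in_shapes)

    def run(self, inputs):
        # A logic error (different inputs passed to Setup and Run) is caught
        # by an assert rather than a user-facing runtime check.
        assert len(inputs) == self.n_input_tensors, \
            "Run must be called with the same inputs as Setup"
```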

Comment on lines +64 to +65
vector<vector<int>> arr = {{6, 8, 5, 1, 3, 5, 1, 6, 8, 3, 7, 5},
{4, 5, 1, 8, 4, 4, 1, 4, 1, 7, 6, 6}};
Contributor:

Suggested change
vector<vector<int>> arr = {{6, 8, 5, 1, 3, 5, 1, 6, 8, 3, 7, 5},
{4, 5, 1, 8, 4, 4, 1, 4, 1, 7, 6, 6}};
vector<vector<int>> arr = {{
100, 101, 102, 103,
104, 105, 106, 107,
108, 109, 110, 112,
}, {
200, 201, 202, 203,
204, 205, 206, 207,
208, 209, 210, 212,
}};

Comment on lines +131 to +132
vector<vector<int>> arr = {{100, 101, 102, 110, 111, 112, 120, 121, 122, 130, 131, 132},
{200, 201, 202, 210, 211, 212, 220, 221, 222, 230, 231, 232}};
Contributor:

Sorry for being nitpicky, but... can't you format these into 3x4 arrays? It would be so much easier to read.

KernelRequirements Setup(KernelContext &ctx, span<const TensorShape<dims>> in_shapes, int axis) {
  n_input_tensors_ = in_shapes.size();
  auto ndims = in_shapes[0].sample_dim();
  DALI_ENFORCE(axis >= -ndims + !new_axis && axis <= ndims - !new_axis,
@mzient (Contributor) Oct 5, 2020:

Suggested change
DALI_ENFORCE(axis >= -ndims + !new_axis && axis <= ndims - !new_axis,
DALI_ENFORCE(axis >= 0 && axis <= ndims - !new_axis

Member Author:

Done

Comment on lines 119 to 120
make_string("Incorrect axis. Actual: ", axis, ". Expected in [",
-ndims + !new_axis, ", ", ndims - !new_axis, "] interval (",
@mzient (Contributor) Oct 5, 2020:

Suggested change
make_string("Incorrect axis. Actual: ", axis, ". Expected in [",
-ndims + !new_axis, ", ", ndims - !new_axis, "] interval (",
make_string("Incorrect axis index: ", axis, ". Must be between 0 and ", ndims - !new_axis, "."));

Member Author:

Corrected

  make_string("Incorrect axis. Actual: ", axis, ". Expected in [",
              -ndims + !new_axis, ", ", ndims - !new_axis, "] interval (",
              new_axis ? "STACK" : "CONCAT", " mode)"));
  axis_ = axis >= 0 ? axis : ndims + axis + new_axis;
Contributor:

This kind of Python logic should not make its way to the kernels.

Member Author:

Done

}


int axis_, n_input_tensors_;
Contributor:

Suggested change
int axis_, n_input_tensors_;
int axis_ = -1, n_input_tensors_ = -1;

Member Author:

Done

}
///@}

static constexpr int output_dims = (dims == DynamicDimensions ? DynamicDimensions :
Contributor:

I'd move this definition to the top (line 109)

Member Author:

I've intentionally put it here, following the "put your definition closest to the point of use" policy.

Contributor:

Normally we don't write definitions in between member functions. IMHO it makes this definition hard to find. Anyway, not pushing.

KernelRequirements Setup(KernelContext &ctx, span<const TensorShape<dims>> in_shapes, int axis) {
  n_input_tensors_ = in_shapes.size();
  auto ndims = in_shapes[0].sample_dim();
  DALI_ENFORCE(axis >= -ndims + !new_axis && axis <= ndims - !new_axis,
Contributor:

As discussed in Slack, I believe that this kind of negative-indexing logic doesn't belong in the kernel layer. I'd make the kernel accept:
[0, ndims] for STACK
[0, ndims) for CONCAT
and leave negative indexing to the operator.

Member Author:

Done

axis_ = axis >= 0 ? axis : ndims + axis + new_axis;
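The operator-layer normalization agreed on in this thread — the kernel accepts only non-negative axes, and the operator translates negative ones — might look like this sketch (Python for brevity; the function name is hypothetical):

```python
def normalize_axis(axis, ndims, new_axis):
    # Valid normalized positions: [0, ndims] for STACK (new_axis=True),
    # because stacking inserts a dimension; [0, ndims) for CONCAT.
    span = ndims + 1 if new_axis else ndims
    if axis < 0:
        axis += span
    if not 0 <= axis < span:
        raise ValueError(f"Incorrect axis index: {axis}. "
                         f"Must be between 0 and {span - 1}.")
    return axis
```

For example, normalize_axis(-1, 2, new_axis=True) maps to 2 (a new trailing dimension), while normalize_axis(-1, 2, new_axis=False) maps to 1 (the last existing dimension).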

{
const auto &ref = in_shapes[0];
Contributor:

I'd remove this scope, and move this line to 117 (so that you can take auto ndims = ref.sample_dim())

Member Author:

I'd prefer to retain the scope; it gives the function better structure. This way it's apparent which part of the function does error checking and which does the actual processing.

for (int j = 0; j < ref.size(); j++) {
  if (!new_axis) {
    DALI_ENFORCE(in_shapes[i][j] == ref.shape[j] || j == axis_, make_string(
        "Number of samples in every dimension (except the concatenated one) "
Contributor:

Suggested change
"Number of samples in every dimension (except the concatenated one) "
"CONCAT: Number of samples in every dimension (except the concatenated one) "

Member Author:

IMHO, the error would be clearer if the mode info were at the end. Also, a sole CONCAT or STACK isn't entirely clear; it should be accompanied by "mode" to remove any ambiguity. Given that, I'd prefer to stay with what I wrote in the first place, if you don't mind?

" has dimension ", in_shapes[i][j]));
} else {
DALI_ENFORCE(in_shapes[i][j] == ref.shape[j], make_string(
"Number of samples in every dimension must be the same (STACK mode). "
Contributor:

Suggested change
"Number of samples in every dimension must be the same (STACK mode). "
"STACK: Number of samples in every dimension must be the same. "


TensorShape<> sh1 = {4, 10, 7, 8};
TensorShape<> sh2 = {4, 5, 14, 8};
TensorShape<> sh3 = {4, 5, 7, 16};
EXPECT_EQ(impl::DetermineShape<false>(make_span(shin), 0), sh0);
Contributor:

Suggested change
EXPECT_EQ(impl::DetermineShape<false>(make_span(shin), 0), sh0);
EXPECT_EQ(impl::DetermineShape<false>(make_span(shin), 0), {8, 5, 7, 8});

and so on would read a bit easier (just a suggestion)

Member Author:

I wanted to emphasise which reference data goes with which dimension.
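For reference, the shape logic these tests exercise can be sketched as follows (a Python illustration, not the actual DetermineShape implementation; the input shapes here are inferred from the expected outputs in the test):

```python
def determine_shape(in_shapes, axis, new_axis):
    # CONCAT: the extent along `axis` is the sum over all inputs.
    # STACK: a new dimension of size len(in_shapes) is inserted at `axis`.
    ref = list(in_shapes[0])
    if new_axis:
        return ref[:axis] + [len(in_shapes)] + ref[axis:]
    ref[axis] = sum(s[axis] for s in in_shapes)
    return ref

# Two inputs of shape (4, 5, 7, 8), concatenated along axis 1:
print(determine_shape([(4, 5, 7, 8), (4, 5, 7, 8)], 1, False))  # [4, 10, 7, 8]
```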



TEST(TensorJoinCpuTest, ConcatenateTensorsTest) {
using namespace std; // NOLINT
Contributor:

Suggested change
using namespace std; // NOLINT
using std::vector;

If you really must

Member Author:

Done



TEST(TensorJoinCpuTest, StackKernelTest) {
using namespace std; // NOLINT
Contributor:

Suggested change
using namespace std; // NOLINT
using std::vector;

Member Author:

Done

@szalpal (Member Author) commented Oct 6, 2020

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1679344]: BUILD STARTED



@mzient (Contributor) commented Oct 6, 2020

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1679489]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [1679344]: BUILD FAILED

@dali-automaton (Collaborator):

CI MESSAGE: [1679489]: BUILD FAILED

@dali-automaton (Collaborator):

CI MESSAGE: [1679489]: BUILD PASSED

@szalpal merged commit e30b577 into NVIDIA:master Oct 6, 2020