TensorJoin kernel for CPU #2301
Conversation
Signed-off-by: szalpal <mszolucha@nvidia.com>
!build
CI MESSAGE: [1659611]: BUILD STARTED
CI MESSAGE: [1659611]: BUILD FAILED
```cpp
*/
void Run(KernelContext &ctx, const OutTensorCPU<T, output_dims> &out,
         span<const InTensorCPU<T, dims>> in) {
  if (in.size() != n_input_tensors_) {
```
How likely is that? Do we need to check it?
In the event the user provides an input to Run that has more tensors than were provided in Setup, we'll get a segfault. This is why I wanted to add this check.
I'd say that it is assumed that this won't happen in a well-written program. Should we make it an assert instead? (just a suggestion though)
Again, I agree that in a well-written program it wouldn't happen. But if the program isn't well-written, a segfault might occur, possibly in some really strange situation.
I guess that's an open question for us: how much error-checking we'd like to have between the Setup and Run calls (I don't remember discussing it before).
That's why we have asserts: to check logic errors. If we were to check every error this way, we would end up with extremely verbose code.
I think that assuming the same input for Setup and Run is OK, and an assert would suffice for error checking.
I think we have this assumption all around our code base.
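A minimal sketch of the assert-based approach being discussed (hypothetical `JoinKernelSketch` type, not the actual kernel): the Setup/Run mismatch is treated as a caller-side logic error, caught in debug builds and compiled away in release builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel; only the Setup/Run contract matters.
struct JoinKernelSketch {
  std::size_t n_input_tensors_ = 0;

  void Setup(std::size_t n_inputs) {
    n_input_tensors_ = n_inputs;
  }

  // Returns the number of inputs it consumed (stand-in for the real work).
  std::size_t Run(const std::vector<int> &inputs) {
    // Passing a different number of tensors than Setup saw is a programmer
    // error, so an assert suffices; no user-facing runtime check is emitted.
    assert(inputs.size() == n_input_tensors_ &&
           "Run must receive the same number of inputs as Setup");
    return inputs.size();
  }
};
```

The names here are illustrative; the point is only that the mismatch is documented as a logic error rather than enforced at runtime.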
```cpp
vector<vector<int>> arr = {{6, 8, 5, 1, 3, 5, 1, 6, 8, 3, 7, 5},
                           {4, 5, 1, 8, 4, 4, 1, 4, 1, 7, 6, 6}};
```
Suggested change:
```cpp
vector<vector<int>> arr = {{
    100, 101, 102, 103,
    104, 105, 106, 107,
    108, 109, 110, 112,
}, {
    200, 201, 202, 203,
    204, 205, 206, 207,
    208, 209, 210, 212,
}};
```
Signed-off-by: szalpal <mszolucha@nvidia.com>
```cpp
vector<vector<int>> arr = {{100, 101, 102, 110, 111, 112, 120, 121, 122, 130, 131, 132},
                           {200, 201, 202, 210, 211, 212, 220, 221, 222, 230, 231, 232}};
```
Sorry for being nitpicky, but... can't you format these into 3x4 arrays? It would be so much easier to read.
```cpp
KernelRequirements Setup(KernelContext &ctx, span<const TensorShape<dims>> in_shapes, int axis) {
  n_input_tensors_ = in_shapes.size();
  auto ndims = in_shapes[0].sample_dim();
  DALI_ENFORCE(axis >= -ndims + !new_axis && axis <= ndims - !new_axis,
```
Suggested change:
```cpp
DALI_ENFORCE(axis >= 0 && axis <= ndims - !new_axis
```
Done
```cpp
make_string("Incorrect axis. Actual: ", axis, ". Expected in [",
            -ndims + !new_axis, ", ", ndims - !new_axis, "] interval (",
```
Suggested change:
```cpp
make_string("Incorrect axis index: ", axis, ". Must be between 0 and ", ndims - !new_axis, "."));
```
Corrected
```cpp
make_string("Incorrect axis. Actual: ", axis, ". Expected in [",
            -ndims + !new_axis, ", ", ndims - !new_axis, "] interval (",
            new_axis ? "STACK" : "CONCAT", " mode)"));
axis_ = axis >= 0 ? axis : ndims + axis + new_axis;
```
This kind of Python logic should not make its way to the kernels.
Done
```cpp
}

int axis_, n_input_tensors_;
```
Suggested change:
```cpp
int axis_ = -1, n_input_tensors_ = -1;
```
Done
```cpp
}
///@}

static constexpr int output_dims = (dims == DynamicDimensions ? DynamicDimensions :
```
I'd move this definition to the top (line 109)
I've intentionally put it here, following the "put your definition closest to the point of use" policy.
Normally we don't write definitions in between member functions. IMHO it makes this definition hard to find. Anyway, not pushing.
```cpp
KernelRequirements Setup(KernelContext &ctx, span<const TensorShape<dims>> in_shapes, int axis) {
  n_input_tensors_ = in_shapes.size();
  auto ndims = in_shapes[0].sample_dim();
  DALI_ENFORCE(axis >= -ndims + !new_axis && axis <= ndims - !new_axis,
```
As discussed on Slack, I believe that this kind of negative-indexing logic doesn't belong in the kernel layer. I'd make the kernel accept:
[0, ndims] for STACK
[0, ndims) for CONCAT
and leave negative indexing to the operator.
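A sketch of how the operator layer could resolve negative axes before calling the kernel (hypothetical `NormalizeAxis` helper, not DALI's actual code), so the kernel only ever sees an axis in [0, ndims] for STACK or [0, ndims) for CONCAT:

```cpp
#include <cassert>

// new_axis == true corresponds to STACK mode, false to CONCAT mode.
int NormalizeAxis(int axis, int ndims, bool new_axis) {
  if (axis < 0)
    axis += ndims + (new_axis ? 1 : 0);  // wrap Python-style negative indices
  // STACK allows axis == ndims (new trailing dimension); CONCAT does not.
  assert(axis >= 0 && axis <= ndims - (new_axis ? 0 : 1) && "axis out of range");
  return axis;
}
```

For example, with 3 input dimensions, axis -1 maps to 2 in CONCAT mode (last existing dimension) and to 3 in STACK mode (a new trailing dimension).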
Done
```cpp
axis_ = axis >= 0 ? axis : ndims + axis + new_axis;

{
  const auto &ref = in_shapes[0];
```
I'd remove this scope, and move this line to 117 (so that you can take auto ndims = ref.sample_dim()).
I'd prefer to retain the scope - it gives the function some better structure. This way it's apparent which part of the function is for error checking, and which is for the actual processing.
```cpp
for (int j = 0; j < ref.size(); j++) {
  if (!new_axis) {
    DALI_ENFORCE(in_shapes[i][j] == ref.shape[j] || j == axis_, make_string(
        "Number of samples in every dimension (except the concatenated one) "
```
Suggested change:
```cpp
"CONCAT: Number of samples in every dimension (except the concatenated one) "
```
IMHO, the error would be clearer if the mode info were at the end. Also, I believe a sole CONCAT or STACK isn't that clear; it should be accompanied by "mode" to remove any ambiguity. Given that, I'd prefer to stay with what I wrote in the first place, if you don't mind?
```cpp
        " has dimension ", in_shapes[i][j]));
  } else {
    DALI_ENFORCE(in_shapes[i][j] == ref.shape[j], make_string(
        "Number of samples in every dimension must be the same (STACK mode). "
```
Suggested change:
```cpp
"STACK: Number of samples in every dimension must be the same. "
```
```cpp
TensorShape<> sh1 = {4, 10, 7, 8};
TensorShape<> sh2 = {4, 5, 14, 8};
TensorShape<> sh3 = {4, 5, 7, 16};
EXPECT_EQ(impl::DetermineShape<false>(make_span(shin), 0), sh0);
```
Suggested change:
```cpp
EXPECT_EQ(impl::DetermineShape<false>(make_span(shin), 0), {8, 5, 7, 8});
```
and so on would read a bit easier (just a suggestion)
I wanted to emphasise which reference data goes with which dimension.
```cpp
TEST(TensorJoinCpuTest, ConcatenateTensorsTest) {
  using namespace std;  // NOLINT
```
Suggested change:
```cpp
using std::vector;
```
If you really must
Done
```cpp
TEST(TensorJoinCpuTest, StackKernelTest) {
  using namespace std;  // NOLINT
```
Suggested change:
```cpp
using std::vector;
```
Done
!build
CI MESSAGE: [1679344]: BUILD STARTED
!build
CI MESSAGE: [1679489]: BUILD STARTED
CI MESSAGE: [1679344]: BUILD FAILED
CI MESSAGE: [1679489]: BUILD FAILED
CI MESSAGE: [1679489]: BUILD PASSED
Why do we need this PR?
The concatenation operation creates a new tensor with values joined along some existing dimension. Stacking, on the other hand, creates a new tensor with an added dimension, which is where axis points. Note that the memory layout is the same in both modes; only the output shape changes.
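The shape semantics described above can be sketched as follows (hypothetical `JoinedShape` helper, not DALI's API; simplified to tensors of identical shape): CONCAT grows the extent at axis, while STACK inserts a new dimension of extent N there.

```cpp
#include <cassert>
#include <vector>

// Output shape when joining n_tensors tensors of identical shape along axis.
std::vector<int> JoinedShape(const std::vector<int> &in_shape, int n_tensors,
                             int axis, bool stack) {
  std::vector<int> out = in_shape;
  if (stack)
    out.insert(out.begin() + axis, n_tensors);  // STACK: new dimension of size N
  else
    out[axis] *= n_tensors;                     // CONCAT: extend existing dimension
  return out;
}
```

For example, joining two 3x4 tensors along axis 0 yields shape {6, 4} in CONCAT mode but {2, 3, 4} in STACK mode, while the flat memory contents are identical in both cases.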