Fix Transpose bugs - degenerate dims and non-uniform GPU #1817

klecki · 2020-03-12T17:03:26Z

cuTT, which is used as the backend for DALI Transpose,
cannot handle degenerate dimensions of extent 1,
so they are removed from the shape description
together with the corresponding entries of the permutation.

As the permutation describes transformation:
Destination dimension i <- source dimension perm[i]

If we remove dimension i from the shape,
we need to remove the entry with value i from perm,
and decrement the entries that are bigger.

It was done the other way round, removing the
entry at position i in perm.
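
To illustrate the difference, here is a minimal sketch on plain std::vector. Both helper names are hypothetical and are not DALI functions; the first follows the rule described above (remove by value, then decrement larger entries), the second reproduces the old behaviour (remove by position).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: drops extent-1 dimensions from `shape` and fixes `perm`
// by removing the entry whose *value* is the dropped dimension and
// decrementing every entry that referred to a later dimension.
void SqueezeByValue(std::vector<int>& shape, std::vector<int>& perm) {
  assert(shape.size() == perm.size());
  // Walk from the last dimension down so that lower indices stay valid.
  for (int d = static_cast<int>(shape.size()) - 1; d >= 0; --d) {
    if (shape[d] != 1) continue;
    shape.erase(shape.begin() + d);
    std::vector<int> new_perm;
    for (int p : perm) {
      if (p == d) continue;                   // refers to the removed source dim
      new_perm.push_back(p > d ? p - 1 : p);  // renumber later source dims
    }
    perm = new_perm;
  }
}

// The buggy variant: erases the entry at *position* d instead.
void SqueezeByPosition(std::vector<int>& shape, std::vector<int>& perm) {
  assert(shape.size() == perm.size());
  for (int d = static_cast<int>(shape.size()) - 1; d >= 0; --d) {
    if (shape[d] != 1) continue;
    shape.erase(shape.begin() + d);
    perm.erase(perm.begin() + d);
  }
}
```

For shape = {2, 1, 3} and perm = {2, 0, 1}, the value-based version yields shape = {2, 3}, perm = {1, 0}. The position-based version yields perm = {2, 1}, which refers to a dimension 2 that no longer exists.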

Additionally, cuTT cannot handle a permutation
with only one dimension - we now fall back
to a cudaMemcpyAsync.

In case of shape = {1, 1, ..., 1}, the preparation
step reduced it to empty shape -> this is now
special cased to always reduce to shape equal {1}
and perm = {0}, which fixes a bug in CPU Transpose
that didn't check for an empty case.
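
The two special cases above can be sketched on the CPU as follows. This is only an illustration: the helper names are hypothetical, and std::memcpy stands in for the cudaMemcpyAsync used on the GPU path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: number of elements described by `shape`.
int64_t Volume(const std::vector<int64_t>& shape) {
  int64_t v = 1;
  for (int64_t e : shape) v *= e;
  return v;
}

// A shape of all ones would squeeze to an empty shape, so collapse it to
// shape = {1}, perm = {0} instead. Returns true if the special case applied.
bool CollapseAllOnes(std::vector<int64_t>& shape, std::vector<int>& perm) {
  assert(shape.size() == perm.size());
  if (shape.empty() || Volume(shape) != 1) return false;
  shape = {1};
  perm = {0};
  return true;
}

// A one-dimensional "transpose" cannot reorder anything, so it is a plain copy.
// Returns false (and copies nothing) when the permutation has another rank.
bool CopyIfOneDim(float* dst, const float* src,
                  const std::vector<int64_t>& shape,
                  const std::vector<int>& perm) {
  if (perm.size() != 1) return false;
  std::memcpy(dst, src, static_cast<std::size_t>(Volume(shape)) * sizeof(float));
  return true;
}
```

In this sketch the all-ones case ends up on the one-dimensional copy path, since it is reduced to a single dimension.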

The non-uniform batch case for GPU was also fixed,
as it was not accessing the elements of the batch,
but only the first element.
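
As a rough CPU illustration of the non-uniform case (not the actual GPU code), each sample has to be transposed with its own shape and data. The Sample struct and both functions below are hypothetical stand-ins, restricted to 2-D for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sample of a batch: a row-major 2-D tensor.
struct Sample {
  int rows, cols;
  std::vector<int> data;  // rows * cols elements
};

// Transposes a single row-major rows x cols matrix into cols x rows.
Sample Transpose2D(const Sample& in) {
  assert(in.data.size() == static_cast<std::size_t>(in.rows) * in.cols);
  Sample out{in.cols, in.rows, std::vector<int>(in.data.size())};
  for (int r = 0; r < in.rows; ++r) {
    for (int c = 0; c < in.cols; ++c) {
      out.data[static_cast<std::size_t>(c) * in.rows + r] =
          in.data[static_cast<std::size_t>(r) * in.cols + c];
    }
  }
  return out;
}

// Non-uniform batch: sample i is processed with batch[i]'s own shape and
// data, rather than reusing the first sample for the whole batch.
std::vector<Sample> TransposeBatch(const std::vector<Sample>& batch) {
  std::vector<Sample> out;
  out.reserve(batch.size());
  for (std::size_t i = 0; i < batch.size(); ++i) {
    out.push_back(Transpose2D(batch[i]));
  }
  return out;
}
```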

GTest tests were extended with additional checks
for the shape & perm reduction (for degenerate dims = 1),
and for non-uniform batch shapes.

Additional docstrings were added to Permutation CPU Impl.

Docstring was extended.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

Why we need this PR?

Fixes several bugs in Transpose and adds test coverage for those cases.

What happened in this PR?

What solution was applied:
[ Tests for cases with degenerate dims = 1 were added, which made it possible to find the bugs.
PrepareArgs was adjusted to remove the perm entries whose value is the removed dimension dim, instead of removing the element at position dim in perm. Special cases for 1-dim were added.
The non-uniform batch case got new tests and the GPU variant was fixed (although it still has a sync). ]
Affected modules and functionalities:
[ Transpose Op ]
Key points relevant for the review:
[ PrepareArguments ]
Validation and testing:
[ GTest ]
Documentation (including examples):
[ Docstring for Transpose was extended with a description of how it actually works. I used TeX, please check if you're ok with this ]

JIRA TASK: [DALI-1310]


klecki · 2020-03-12T17:04:39Z

!build

dali-automaton · 2020-03-12T17:10:58Z

CI MESSAGE: [1185787]: BUILD STARTED

mzient · 2020-03-12T17:12:36Z

dali/operators/generic/transpose/transpose.cc

+for all valid coordinates:
+
+.. math ::
+
+  x_0 \in [0, 100)
+
+  x_1 \in [0, 200)
+
+  x_3 \in [0, 3)
+
+)code")


IMO this is excessive and makes help(Transpose) hard to read.

I'm not sure if we care about the help(anything) with all the :meth: stuff and other rst/sphinx related formatting.

I can maybe wrap it to one line (but with TeX it's going to need some additional spacing syntax, so it won't be much better) or remove it.
Do you have a suggestion for what to do here?

My suggestion is to remove the whole part starting at "for all valid coordinates". I'd say it's obvious and does little but clutter the documentation.

If you think so. Removed.

dali-automaton · 2020-03-12T17:26:19Z

CI MESSAGE: [1185787]: BUILD FAILED

mzient · 2020-03-12T17:29:41Z

dali/operators/generic/transpose/transpose_test.cc

+                                                       one_masks_reduced, uniform)));
+
+TEST(TransposeTest, PrepareArgumentsNoOnes) {
+  using dtype = SmallVector<int, 6>;


something more descriptive?

Suggested change

using dtype = SmallVector<int, 6>;

using array = SmallVector<int, 6>;

Suggested change

using dtype = SmallVector<int, 6>;

using intvec = SmallVector<int, 6>;

mzient · 2020-03-12T17:33:18Z

dali/operators/generic/transpose/transpose.h

+  tmp_shape.reserve(N - ones_pos.size());
+  // shape_idx holds index of original shape (already processed dimensions),
+  // ones_ids - holds indexes in array of degenerate (1-sized) dimensions
+  for (int shape_idx = 0, ones_idx = 0; shape_idx < N; shape_idx++) {


Isn't this loop equivalent to this?

for (auto extent : shape) if (extent != 1) tmp_shape.push_back(extent);

Yes it is, I initially wanted to take care of the permutation here as well. Will fix.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki · 2020-03-12T17:40:56Z

!build

dali-automaton · 2020-03-12T17:46:01Z

CI MESSAGE: [1185871]: BUILD STARTED

mzient · 2020-03-12T17:54:01Z

dali/operators/generic/transpose/transpose.h

+
+  // This will be oterwise reduced to empty shape, and we still want to have some
+  // notion of non-empty shape left
+  if (volume(shape) == 1) {


transpose aside, the code below seems to do the same in linear time and is (arguably) easier to understand:

if (volume(shape) == 1) {
  shape = {1};
  perm = {0};
  return;
}
SmallVector<int, kStatiShapeElements> coord_map;
SmallVector<ShapeT, kStatiShapeElements> tmp_shape;
int N = shape.size();
coord_map.resize(N);
for (int i = 0, skip = 0; i < N; i++) {
  if (shape[i] == 1)
    skip++;
  else
    tmp_shape.push_back(shape[i]);
  coord_map[i] = i - skip;
}
VecInt tmp_perm;
for (int i = 0; i < N; i++) {
  if (shape[perm[i]] == 1)
    continue;
  tmp_perm.push_back(coord_map[perm[i]]);
}
perm = std::move(tmp_perm);
shape = std::move(tmp_shape);

👍
adjusted a bit, and added some comments.
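
For reference, a self-contained sketch of the suggestion above, using std::vector in place of SmallVector so it compiles outside DALI. The function name and signature are assumptions made for this example and do not necessarily match the merged code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical standalone version of the shape/perm reduction.
void PrepareArgumentsSketch(std::vector<int64_t>& shape, std::vector<int>& perm) {
  assert(shape.size() == perm.size());
  const int N = static_cast<int>(shape.size());

  // All extents equal 1: keep one unit dimension instead of an empty shape.
  bool all_ones = N > 0;
  for (int64_t e : shape) all_ones = all_ones && (e == 1);
  if (all_ones) {
    shape = {1};
    perm = {0};
    return;
  }

  // coord_map[i] is the new index of surviving dimension i, i.e. i minus the
  // number of unit dimensions before it. Entries for dropped dims are unused.
  std::vector<int> coord_map(N);
  std::vector<int64_t> tmp_shape;
  for (int i = 0, skip = 0; i < N; i++) {
    if (shape[i] == 1)
      skip++;
    else
      tmp_shape.push_back(shape[i]);
    coord_map[i] = i - skip;
  }

  // Keep only the entries that refer to surviving dimensions, renumbered.
  std::vector<int> tmp_perm;
  for (int i = 0; i < N; i++) {
    if (shape[perm[i]] == 1) continue;
    tmp_perm.push_back(coord_map[perm[i]]);
  }
  perm = std::move(tmp_perm);
  shape = std::move(tmp_shape);
}
```

With shape = {1, 5, 1, 7} and perm = {3, 0, 1, 2}, this reduces to shape = {5, 7} and perm = {1, 0} in a single pass over each array.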

mzient · 2020-03-12T18:02:20Z

dali/operators/generic/transpose/transpose.h

+  tmp_perm.reserve(N - ones_pos.size());
+  for (int i = 0; i < N; i++) {
+    // this element was removed
+    if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) {


Suggested change

if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) {

if (shape[perm[i]] != 1) {

??

dali-automaton · 2020-03-12T19:35:14Z

CI MESSAGE: [1185871]: BUILD PASSED

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

mzient · 2020-03-13T10:13:17Z

dali/operators/generic/transpose/transpose.h

+
+  VecInt tmp_perm;
+  for (int i = 0; i < N; i++) {
+    // We need to skip those elements of permutation, that correspond to dimensions = 1.


Suggested change

// We need to skip those elements of permutation, that correspond to dimensions = 1.

// We need to skip the elements of permutation which correspond to dimensions with extent = 1.

dali-automaton · 2020-03-13T10:15:57Z

CI MESSAGE: [1187486]: BUILD STARTED

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

dali-automaton · 2020-03-13T10:21:05Z

CI MESSAGE: [1187486]: BUILD FAILED

klecki · 2020-03-13T10:28:14Z

!build

dali-automaton · 2020-03-13T10:30:44Z

CI MESSAGE: [1187507]: BUILD STARTED

dali-automaton · 2020-03-13T12:14:57Z

CI MESSAGE: [1187507]: BUILD PASSED

klecki requested review from jantonguirao, awolant, JanuszL and szalpal March 12, 2020 17:06

mzient reviewed Mar 12, 2020


klecki mentioned this pull request Mar 12, 2020

ops.Transpose maybe have a bug #1802

Closed

mzient reviewed Mar 12, 2020


Gcc nitpicking + rename

401ff80

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

mzient reviewed Mar 12, 2020


JanuszL mentioned this pull request Mar 12, 2020

More transpose tests #1804

Closed

JanuszL approved these changes Mar 12, 2020


Review fixes

7a17bb9

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

mzient reviewed Mar 13, 2020


mzient approved these changes Mar 13, 2020


Remove verbose coordinates from doc

98e440b

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki force-pushed the transpose-bug branch from d30a973 to 98e440b Compare March 13, 2020 10:16

Lint

18b3fed

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki merged commit 452b359 into NVIDIA:master Mar 13, 2020
