Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Transpose bugs - degenerate dims and non-uniform GPU #1817

Merged
merged 5 commits into from
Mar 13, 2020

Conversation

klecki
Copy link
Contributor

@klecki klecki commented Mar 12, 2020

cuTT which is used as backend for DALI Transpose
cannot handle degenerate dimensions equal 1,
so they are removed from shape description
together with the corresponding entries from permutation.

As the permutation describes transformation:
Destination dimension i <- source dimension perm[i]

If we remove dimension i from the shape,
we need to remove entry of a value i from perm,
and reduce the netries that are bigger.

It was done the other way round, removing the
entry at position i in perm.

Additionaly, cuTT cannot handle a permutation
with only one dimension - we now fall back
to a cudaMemcpyAsync.

In case of shape = {1, 1, ..., 1}, the preparation
step reduced it to empty shape -> this is now
special cased to always reduce to shape equal {1}
and perm = {0}, which fixes a bug in CPU Transpose
that didn't check for an empty case.

Non-uniform batch case for GPU was also fixed,
as it was not accesing the elements of the batch,
but only the first element.

GTest test were extended with additional checks
for the shape & perm reduction (for degenerate dims = 1),
and for non-unfiorm batch shapes.

Additional docstrings were added to Permutation CPU Impl.

Docstring was extended.

Signed-off-by: Krzysztof Lecki klecki@nvidia.com

Why we need this PR?

Fixes several bugs in Transpose and adds test coverage for those cases.

What happened in this PR?

  • What solution was applied:
    [ Tests for cases with degenerate dims = 1 were added, which allowed to find the bugs.
    The PrepareArgs was adjusted to remove the entries containing removed dimensions dim instead of removing elements at position dim in perm. Special cases for 1-dim was added.
    Non-uniform batch case got new tests and the GPU variant was fixed (although it still has sync). ]
  • Affected modules and functionalities:
    [ Transpose Op ]
  • Key points relevant for the review:
    [ PrepareArguments ]
  • Validation and testing:
    [ GTest ]
  • Documentation (including examples):
    [ Docstring for Transpose was extended with description how it actually works. I used TeX, please check if you're ok with this ]

JIRA TASK: [DALI-1310]

cuTT which is used as backend for DALI Transpose
cannot handle degenerate dimensions equal 1,
so they are removed from shape description
together with the corresponding entries from permutation.

As the permutation describes transformation:
Destination dimension i <- source dimension perm[i]

If we remove dimension `i` from the shape,
we need to remove entry of a value `i` from perm,
and reduce the netries that are bigger.

It was done the other way round, removing the
entry at position `i` in perm.

Additionaly, cuTT cannot handle a permutation
with only one dimension - we now fall back
to a cudaMemcpyAsync.

In case of shape = {1, 1, ..., 1}, the preparation
step reduced it to empty shape -> this is now
special cased to always reduce to shape equal {1}
and perm = {0}, which fixes a bug in CPU Transpose
that didn't check for an empty case.

Non-uniform batch case for GPU was also fixed,
as it was not accesing the elements of the batch,
but only the first element.

GTest test were extended with additional checks
for the shape & perm reduction (for degenerate dims = 1),
and for non-unfiorm batch shapes.

Additional docstrings were added to Permutation CPU Impl.

Docstring was extended.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Mar 12, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1185787]: BUILD STARTED

Comment on lines 78 to 88
for all valid coordinates:

.. math ::

x_0 \in [0, 100)

x_1 \in [0, 200)

x_3 \in [0, 3)

)code")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is excessive and makes help(Transpose) hard to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we care about the help(anything) with all the :meth: stuff and other rst/sphinx related formatting.

I can maybe wrap it to one line (but with TeX it's going to need some additional spacing syntax, so won't be much better) or remove it.
Do you have suggestion what to do here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to remove the whole part starting at "for all valid coordinates". I'd say it's obvious and does little but clutter the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think so. Removed.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1185787]: BUILD FAILED

one_masks_reduced, uniform)));

TEST(TransposeTest, PrepareArgumentsNoOnes) {
using dtype = SmallVector<int, 6>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something more descriptive?

Suggested change
using dtype = SmallVector<int, 6>;
using array = SmallVector<int, 6>;
Suggested change
using dtype = SmallVector<int, 6>;
using intvec = SmallVector<int, 6>;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tmp_shape.reserve(N - ones_pos.size());
// shape_idx holds index of original shape (already processed dimensions),
// ones_ids - holds indexes in array of degenerate (1-sized) dimensions
for (int shape_idx = 0, ones_idx = 0; shape_idx < N; shape_idx++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this loop equivalent to this?

for (auto extent : shape)
  if (extent != 1)
    tmp_shape.push_back(extent);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, I initially wanted to take care of the permutation here as well. Will fix.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Mar 12, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1185871]: BUILD STARTED


// This will be oterwise reduced to empty shape, and we still want to have some
// notion of non-empty shape left
if (volume(shape) == 1) {
Copy link
Contributor

@mzient mzient Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transpose aside, the code below seems to do the same in linear time and is (arguably) easier to understand:

  if (volume(shape) == 1) {
    shape = {1};
    perm = {0};
    return;
  }

  SmallVector<int, kStatiShapeElements> coord_map;
  SmallVector<ShapeT, kStatiShapeElements> tmp_shape;
  int N = shape.size();
  coord_map.resize(N);
  
  for (int i = 0, skip = 0; i < N; i++) {
    if (shape[i] == 1)
      skip++;
    else
      tmp_shape.push_back(shape[i]);
    coord_map[i] = i - skip;
  }

  VecInt tmp_perm;
  for (int i = 0; i < N; i++) {
    if (shape[perm[i]] == 1)
      continue;
    tmp_perm.push_back(coord_map[perm[i]]);
  }

  perm = std::move(tmp_perm);
  shape = std::move(tmp_shape);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
adjusted a bit, and added some comments.

tmp_perm.reserve(N - ones_pos.size());
for (int i = 0; i < N; i++) {
// this element was removed
if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) {
if (shape[perm[i]] != 1)) {

??

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1185871]: BUILD PASSED

@JanuszL JanuszL mentioned this pull request Mar 12, 2020
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

VecInt tmp_perm;
for (int i = 0; i < N; i++) {
// We need to skip those elements of permutation, that correspond to dimensions = 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We need to skip those elements of permutation, that correspond to dimensions = 1.
// We need to skip the elements of permutation which correspond to dimensions with extent = 1.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1187486]: BUILD STARTED

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1187486]: BUILD FAILED

@klecki
Copy link
Contributor Author

klecki commented Mar 13, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1187507]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1187507]: BUILD PASSED

@klecki klecki merged commit 452b359 into NVIDIA:master Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants