-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Transpose bugs - degenerate dims and non-uniform GPU #1817
Conversation
cuTT which is used as backend for DALI Transpose cannot handle degenerate dimensions equal 1, so they are removed from shape description together with the corresponding entries from permutation. As the permutation describes transformation: Destination dimension i <- source dimension perm[i] If we remove dimension `i` from the shape, we need to remove entry of a value `i` from perm, and reduce the netries that are bigger. It was done the other way round, removing the entry at position `i` in perm. Additionaly, cuTT cannot handle a permutation with only one dimension - we now fall back to a cudaMemcpyAsync. In case of shape = {1, 1, ..., 1}, the preparation step reduced it to empty shape -> this is now special cased to always reduce to shape equal {1} and perm = {0}, which fixes a bug in CPU Transpose that didn't check for an empty case. Non-uniform batch case for GPU was also fixed, as it was not accesing the elements of the batch, but only the first element. GTest test were extended with additional checks for the shape & perm reduction (for degenerate dims = 1), and for non-unfiorm batch shapes. Additional docstrings were added to Permutation CPU Impl. Docstring was extended. Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build |
CI MESSAGE: [1185787]: BUILD STARTED |
for all valid coordinates: | ||
|
||
.. math :: | ||
|
||
x_0 \in [0, 100) | ||
|
||
x_1 \in [0, 200) | ||
|
||
x_3 \in [0, 3) | ||
|
||
)code") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is excessive and makes help(Transpose) hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we care about the help(anything) with all the :meth: stuff and other rst/sphinx related formatting.
I can maybe wrap it to one line (but with TeX it's going to need some additional spacing syntax, so won't be much better) or remove it.
Do you have suggestion what to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is to remove the whole part starting at "for all valid coordinates". I'd say it's obvious and does little but clutter the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think so. Removed.
CI MESSAGE: [1185787]: BUILD FAILED |
one_masks_reduced, uniform))); | ||
|
||
TEST(TransposeTest, PrepareArgumentsNoOnes) { | ||
using dtype = SmallVector<int, 6>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something more descriptive?
using dtype = SmallVector<int, 6>; | |
using array = SmallVector<int, 6>; |
using dtype = SmallVector<int, 6>; | |
using intvec = SmallVector<int, 6>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tmp_shape.reserve(N - ones_pos.size()); | ||
// shape_idx holds index of original shape (already processed dimensions), | ||
// ones_ids - holds indexes in array of degenerate (1-sized) dimensions | ||
for (int shape_idx = 0, ones_idx = 0; shape_idx < N; shape_idx++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this loop equivalent to this?
for (auto extent : shape)
if (extent != 1)
tmp_shape.push_back(extent);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is, I initially wanted to take care of the permutation here as well. Will fix.
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build |
CI MESSAGE: [1185871]: BUILD STARTED |
|
||
// This will be oterwise reduced to empty shape, and we still want to have some | ||
// notion of non-empty shape left | ||
if (volume(shape) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transpose
aside, the code below seems to do the same in linear time and is (arguably) easier to understand:
if (volume(shape) == 1) {
shape = {1};
perm = {0};
return;
}
SmallVector<int, kStatiShapeElements> coord_map;
SmallVector<ShapeT, kStatiShapeElements> tmp_shape;
int N = shape.size();
coord_map.resize(N);
for (int i = 0, skip = 0; i < N; i++) {
if (shape[i] == 1)
skip++;
else
tmp_shape.push_back(shape[i]);
coord_map[i] = i - skip;
}
VecInt tmp_perm;
for (int i = 0; i < N; i++) {
if (shape[perm[i]] == 1)
continue;
tmp_perm.push_back(coord_map[perm[i]]);
}
perm = std::move(tmp_perm);
shape = std::move(tmp_shape);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
adjusted a bit, and added some comments.
tmp_perm.reserve(N - ones_pos.size()); | ||
for (int i = 0; i < N; i++) { | ||
// this element was removed | ||
if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!std::binary_search(ones_pos.begin(), ones_pos.end(), perm[i])) { | |
if (shape[perm[i]] != 1)) { |
??
CI MESSAGE: [1185871]: BUILD PASSED |
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
|
||
VecInt tmp_perm; | ||
for (int i = 0; i < N; i++) { | ||
// We need to skip those elements of permutation, that correspond to dimensions = 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We need to skip those elements of permutation, that correspond to dimensions = 1. | |
// We need to skip the elements of permutation which correspond to dimensions with extent = 1. |
CI MESSAGE: [1187486]: BUILD STARTED |
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
CI MESSAGE: [1187486]: BUILD FAILED |
!build |
CI MESSAGE: [1187507]: BUILD STARTED |
CI MESSAGE: [1187507]: BUILD PASSED |
cuTT which is used as backend for DALI Transpose
cannot handle degenerate dimensions equal 1,
so they are removed from shape description
together with the corresponding entries from permutation.
As the permutation describes transformation:
Destination dimension i <- source dimension perm[i]
If we remove dimension
i
from the shape,we need to remove entry of a value
i
from perm,and reduce the netries that are bigger.
It was done the other way round, removing the
entry at position
i
in perm.Additionaly, cuTT cannot handle a permutation
with only one dimension - we now fall back
to a cudaMemcpyAsync.
In case of shape = {1, 1, ..., 1}, the preparation
step reduced it to empty shape -> this is now
special cased to always reduce to shape equal {1}
and perm = {0}, which fixes a bug in CPU Transpose
that didn't check for an empty case.
Non-uniform batch case for GPU was also fixed,
as it was not accesing the elements of the batch,
but only the first element.
GTest test were extended with additional checks
for the shape & perm reduction (for degenerate dims = 1),
and for non-unfiorm batch shapes.
Additional docstrings were added to Permutation CPU Impl.
Docstring was extended.
Signed-off-by: Krzysztof Lecki klecki@nvidia.com
Why we need this PR?
Fixes several bugs in Transpose and adds test coverage for those cases.
What happened in this PR?
[ Tests for cases with degenerate dims = 1 were added, which allowed to find the bugs.
The PrepareArgs was adjusted to remove the entries containing removed dimensions
dim
instead of removing elements at positiondim
in perm. Special cases for 1-dim was added.Non-uniform batch case got new tests and the GPU variant was fixed (although it still has sync). ]
[ Transpose Op ]
[ PrepareArguments ]
[ GTest ]
[ Docstring for Transpose was extended with description how it actually works. I used TeX, please check if you're ok with this ]
JIRA TASK: [DALI-1310]