Update broadcasting shape simplification logic #4314
Conversation
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Force-pushed from 3bd5f07 to 0e1ea43
for (int k = 0; k < shapes.size(); k++) {
  auto &sh = *shapes[k];
  if ((sh[dim] != 1 && sh[dim + 1] == 1 && !all_ones(dim + 1)) ||
      (sh[dim] == 1 && !all_ones(dim) && sh[dim + 1] != 1)) {
    return false;
  }
Two issues: all_ones(dim + 1) and all_ones(dim) are not only called in the inner loop, but they also refer to different dimensions, so there's a big chance they'll be calculated multiple times (2 * shapes.size() times). I'd suggest replacing the function with an array, where the values would be stored once and then accessed as needed.
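For illustration, a minimal sketch of that suggestion, assuming the shapes were already expanded to a common rank ndim (the helper name and header path are mine, not taken from the actual patch):

#include <vector>

#include "dali/core/tensor_shape.h"  // assumed location of TensorShape<> and span

// Precompute, once per dimension, whether every shape has extent 1 there.
// The per-shape loop can then do a cheap array lookup instead of re-evaluating
// an all_ones(dim) helper up to 2 * shapes.size() times per dimension pair.
inline std::vector<bool> ComputeAllOnes(span<TensorShape<>*> shapes, int ndim) {
  std::vector<bool> all_ones(ndim, true);
  for (int d = 0; d < ndim; d++) {
    for (auto *sh : shapes) {
      if ((*sh)[d] != 1) {
        all_ones[d] = false;
        break;
      }
    }
  }
  return all_ones;
}

Calls like all_ones(dim + 1) in the snippet above then become lookups into the precomputed array; the same treatment would apply to the all_same helper discussed below.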
// 2. All extents on that dimension and the next are the same or one,
//    and the ones are either present or not present in both dimensions
auto can_merge_next = [=](int dim) {
  if (all_same(dim) && all_same(dim + 1))
Likewise, all_same could be an array.
@@ -274,33 +274,165 @@ void ExpandToNDims(TensorShape<> &sh, int ndim) {
   sh = shape_cat(TensorShape<>(std::vector<int64_t>(ndim - sh.sample_dim(), 1)), sh);
 }

-void SimplifyShapesForBroadcasting(TensorShape<>& lhs, TensorShape<> &rhs) {
+SmallVector<std::pair<int, int>, 5> SimplifiedShapeCollapseGroups(span<TensorShape<>*> shapes) {
I think that this function could be split into one that extracts the groups and one that collapses the shapes (we actually have the utilities for the latter).
Actually, this function was just meant to extract the groups (it didn't modify the shapes).
A question and some suggestions for can_merge_ones.
lhs = collapse_dims(lhs, group_dims);
rhs = collapse_dims(rhs, group_dims);

bool IsBroadcastingEnabled() {
This is a more general question: how much do we want to control this via the env, or should we maybe introduce something like nvidia.dali.experimental.enable_math_broadcasting()?
Most likely we will remove this as soon as we have full support (including ternary operators). This is just a dev switch for the time being, before we enable the feature.
I don't think we should keep this at all once the feature is good enough to be merged.
Removed
// Can collapse dimensions with ones
// when there is no transition from or to an 'odd' one
// (meaning that the rest of the extents are not all one
// in the two dimensions)
// Examples:
// 1. Can collapse even if we have a transition from 2 to 1,
//    because there are only ones on the second dimension
//    {2 1} -> {2}
//    {1 1} -> {1}
// 2. Can NOT collapse because there is a transition to 1
//    {2 1} -> {2 1}
//    {2 2} -> {2 2}
It would be clearer to me if the comment said that we can collapse two dimensions that are broadcast if the broadcasting happens only within one of the shapes, that is, case one ({A, B} op {1, 1}) but not case two ({A, 1} op {1, B}).
Moreover, sh[dim + 1] == 1 && !all_ones(dim + 1) could be expressed as is_broadcast(sh, dim + 1) or something similar.
We can't collapse if, for some shape, dim and dim + 1 give two different results for is_broadcast, am I right?
As a side note, we can always collapse a dimension that is all_ones, right? Those checks let this happen, right?
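For illustration, here is how the check could read with such a helper (a sketch, not the actual patch; is_broadcast is the reviewer's hypothetical name, and all_ones is assumed to be the precomputed per-dimension array suggested earlier):

// A dimension of a shape is "broadcast" when its extent is 1 while some other
// shape has a non-1 extent there.
auto is_broadcast = [&](const TensorShape<> &sh, int d) {
  return sh[d] == 1 && !all_ones[d];
};
// dim and dim + 1 can't be merged if, in any shape, one of them is broadcast
// while the other holds a real (non-1) extent. All-ones dimensions never
// block the merge, so they can always be collapsed into a neighbor.
for (int k = 0; k < shapes.size(); k++) {
  auto &sh = *shapes[k];
  if ((is_broadcast(sh, dim) && sh[dim + 1] != 1) ||
      (is_broadcast(sh, dim + 1) && sh[dim] != 1))
    return false;
}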
if (compatible_vol(volumes, i)) {
  int j = i + 1;
  for (; j < full_ndim;) {
    if (compatible_vol(volumes, j) && can_merge_next(j - 1)) {
Why do we need both compatible_vol and can_merge_next? Isn't can_merge_next enough?
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Force-pushed from 2a9a19d to 370f91d
if (modify_shapes) {
  for (int i = 0; i < n; i++) {
    *shapes[i] = outs[i];
  }
This should be outside the if (d < ndim) block, otherwise the shapes remain unsimplified in a totally degenerate case.
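A sketch of the suggested fix; the surrounding control flow is reconstructed from this comment rather than copied from the patch:

// Before (sketch): the write-back is guarded by d < ndim, so in the fully
// degenerate case nothing is copied out and the shapes stay unsimplified.
if (d < ndim) {
  // ... close the last collapse group ...
  if (modify_shapes) {
    for (int i = 0; i < n; i++)
      *shapes[i] = outs[i];
  }
}

// After (sketch): always copy the simplified shapes back.
if (d < ndim) {
  // ... close the last collapse group ...
}
if (modify_shapes) {
  for (int i = 0; i < n; i++)
    *shapes[i] = outs[i];
}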
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [6146986]: BUILD STARTED
!build
CI MESSAGE: [6148343]: BUILD STARTED
CI MESSAGE: [6148343]: BUILD FAILED
It would be nice to add some comments, like a mention that we skip if all shapes have 1 in that dimension. It also looks like it can collapse non-compatible dims, for example {10, 2} op {5, 4} to 20 op 20.
SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->size();
Please use sample_dim(), which is easier to differentiate from size() and has the semantic meaning; I find it highly confusing how we implemented size() in TensorShape and TensorListShape.

Suggested change:
-int ndim = shapes[0]->size();
+int ndim = shapes[0]->sample_dim();
I find it highly confusing that we have sample_dim in TensorShape.
Still, it's the dimension of this sample, rather than some arbitrary size. And when TensorShape and TensorListShape reside side by side in the code, the usage of size() is highly confusing.
if (static_cast<int>(shapes[i]->size()) > ndim)
  ndim = shapes[i]->size();
Suggested change:
-if (static_cast<int>(shapes[i]->size()) > ndim)
-  ndim = shapes[i]->size();
+if (static_cast<int>(shapes[i]->sample_dim()) > ndim)
+  ndim = shapes[i]->sample_dim();
auto get = [&](int shape, int dim) -> int64_t {
  auto &s = *shapes[shape];
  dim -= ndim - s.size();  // add leading unit dims
Suggested change:
-  dim -= ndim - s.size();  // add leading unit dims
+  dim -= ndim - s.sample_dim();  // add leading unit dims
};

auto should_skip = [&](int d) {
  for (int i = 0; i < n; i++)
I am not sure I'm the greatest fan of n being captured by reference by a lot of lambdas. At this point I was like: wait, where did this local variable come from?
What do you suggest we do instead?
num_operands, for example?
int group_start = 0;

auto can_collapse = [&](int d) {
Where is any kind of equality check on the matching dimensions? Or do we drop those, and assume we already checked it?
I think we assume just that - but it can be easily added.
We try to simplify shapes only after we've checked that they are compatible.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Shape compatibility is checked prior to the simplification step (in PropagateShapes, arithmetic.h).
CI MESSAGE: [6148343]: BUILD PASSED
!build
CI MESSAGE: [6169540]: BUILD STARTED
SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->sample_dim();
:(
CI MESSAGE: [6169540]: BUILD PASSED
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Category:
New feature / Refactoring
Description:
Shapes used in broadcasting can be simplified before execution; for example, {2, 3, 4} and {1, 1, 4} can be simplified to {6, 4} and {1, 4}.
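For illustration, a minimal self-contained sketch of this simplification for the two-shape case (plain std::vector instead of DALI's TensorShape<> and collapse utilities; the actual PR handles any number of shapes and extracts collapse groups instead):

#include <cstdint>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;

// Collapse adjacent dimensions whose broadcast pattern matches. Assumes the
// shapes have equal rank and were already validated as broadcast-compatible.
std::pair<Shape, Shape> SimplifyPair(const Shape &a, const Shape &b) {
  Shape sa, sb;
  int prev = -1;  // broadcast pattern of the currently open group
  for (size_t d = 0; d < a.size(); d++) {
    if (a[d] == 1 && b[d] == 1)
      continue;  // all-ones dims never block merging, so just drop them
    int pattern = a[d] == 1 ? 1 : b[d] == 1 ? 2 : 0;
    if (!sa.empty() && pattern == prev) {
      sa.back() *= a[d];  // same pattern as the open group: merge extents
      sb.back() *= b[d];
    } else {
      sa.push_back(a[d]);  // pattern changed: start a new group
      sb.push_back(b[d]);
      prev = pattern;
    }
  }
  if (sa.empty()) {  // fully degenerate case: every extent was 1
    sa = {1};
    sb = {1};
  }
  return {sa, sb};
}

// SimplifyPair({2, 3, 4}, {1, 1, 4}) yields {6, 4} and {1, 4}.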
Additional information:
Affected modules and functionalities:
broadcasting.* utils
Key points relevant for the review:
Can we make the algorithm in SimplifiedShapeCollapseGroups any simpler?
Tests:
ArithmeticOpsBroadcastingTest.SimplifyShapesForBroadcasting
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-3063