Update broadcasting shape simplification logic #4314
Conversation
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Force-pushed from 3bd5f07 to 0e1ea43
for (int k = 0; k < shapes.size(); k++) {
  auto &sh = *shapes[k];
  if ((sh[dim] != 1 && sh[dim + 1] == 1 && !all_ones(dim + 1)) ||
      (sh[dim] == 1 && !all_ones(dim) && sh[dim + 1] != 1)) {
    return false;
  }
Two issues: all_ones(dim + 1) and all_ones(dim) are not only called in the inner loop, but they also refer to different dimensions, so there's a big chance they'll be calculated multiple times (2 * shapes.size() times). I'd suggest replacing the function with an array, where the values would be stored once and then accessed as needed.
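For illustration, a minimal sketch of that suggestion, assuming the shapes were already expanded to a common rank ndim (the helper name and header path are mine, not taken from the actual patch):

#include <vector>

#include "dali/core/tensor_shape.h"  // assumed location of TensorShape<> and span

// Precompute, once per dimension, whether every shape has extent 1 there.
// The per-shape loop can then do a cheap array lookup instead of re-evaluating
// an all_ones(dim) helper up to 2 * shapes.size() times per dimension pair.
inline std::vector<bool> ComputeAllOnes(span<TensorShape<>*> shapes, int ndim) {
  std::vector<bool> all_ones(ndim, true);
  for (int d = 0; d < ndim; d++) {
    for (auto *sh : shapes) {
      if ((*sh)[d] != 1) {
        all_ones[d] = false;
        break;
      }
    }
  }
  return all_ones;
}

Calls like all_ones(dim + 1) in the snippet above then become lookups into the precomputed array; the same treatment would apply to the all_same helper discussed below.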
// 2. All extents on that dimension and the next are the same or one,
//    and the ones are either present or not present in both dimensions
auto can_merge_next = [=](int dim) {
  if (all_same(dim) && all_same(dim + 1))
Likewise, all_same could be an array.
@@ -274,33 +274,165 @@ void ExpandToNDims(TensorShape<> &sh, int ndim) {
   sh = shape_cat(TensorShape<>(std::vector<int64_t>(ndim - sh.sample_dim(), 1)), sh);
 }

-void SimplifyShapesForBroadcasting(TensorShape<>& lhs, TensorShape<> &rhs) {
+SmallVector<std::pair<int, int>, 5> SimplifiedShapeCollapseGroups(span<TensorShape<>*> shapes) {
I think that this function could be split into one that extracts the groups and one that collapses the shapes (we actually have the utilities for the latter).
Actually, this function was just meant to extract the groups (it didn't modify the shapes).
A question and some suggestions for can_merge_ones.
lhs = collapse_dims(lhs, group_dims);
rhs = collapse_dims(rhs, group_dims);

bool IsBroadcastingEnabled() {
This is a more general question: how much do we want to control this via the env, or should we maybe introduce something like nvidia.dali.experimental.enable_math_broadcasting()?
Most likely we will remove this as soon as we have full support (including ternary operators). This is just a dev switch for the time being, before we enable the feature.
I don't think we should keep this at all once the feature is good enough to be merged.
Removed
// Can collapse dimensions with ones
// when there is no transition from or to an 'odd' one
// (meaning that the rest of the extents are not all one
// in the two dimensions)
// Examples:
// 1. Can collapse even if we have a transition from 2 to 1,
//    because there are only ones on the second dimension
//    {2 1} -> {2}
//    {1 1} -> {1}
// 2. Can NOT collapse because there is a transition to 1
//    {2 1} -> {2 1}
//    {2 2} -> {2 2}
It would be clearer to me if the comment said that we can collapse two dimensions that are broadcast if the broadcasting happens only within one of the shapes, that is, case one ({A, B} op {1, 1}) but not case two ({A, 1} op {1, B}).
Moreover, sh[dim + 1] == 1 && !all_ones(dim + 1) could be expressed as is_broadcast(sh, dim + 1) or something similar.
We can't collapse if, for some shape, dim and dim + 1 give two different results for is_broadcast, am I right?
As a side note, we can always collapse a dimension that is all_ones, right? Those checks let this happen, right?
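For illustration, here is how the check could read with such a helper (a sketch, not the actual patch; is_broadcast is the reviewer's hypothetical name, and all_ones is assumed to be the precomputed per-dimension array suggested earlier):

// A dimension of a shape is "broadcast" when its extent is 1 while some other
// shape has a non-1 extent there.
auto is_broadcast = [&](const TensorShape<> &sh, int d) {
  return sh[d] == 1 && !all_ones[d];
};
// dim and dim + 1 can't be merged if, in any shape, one of them is broadcast
// while the other holds a real (non-1) extent. All-ones dimensions never
// block the merge, so they can always be collapsed into a neighbor.
for (int k = 0; k < shapes.size(); k++) {
  auto &sh = *shapes[k];
  if ((is_broadcast(sh, dim) && sh[dim + 1] != 1) ||
      (is_broadcast(sh, dim + 1) && sh[dim] != 1))
    return false;
}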
if (compatible_vol(volumes, i)) {
  int j = i + 1;
  for (; j < full_ndim;) {
    if (compatible_vol(volumes, j) && can_merge_next(j - 1)) {
Why do we need both compatible_vol and can_merge_next? Isn't can_merge_next enough?
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Force-pushed from 2a9a19d to 370f91d
if (modify_shapes) {
  for (int i = 0; i < n; i++) {
    *shapes[i] = outs[i];
  }
This should be outside the if (d < ndim) block, otherwise the shapes remain unsimplified in a totally degenerate case.
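A sketch of the suggested fix; the surrounding control flow is reconstructed from this comment rather than copied from the patch:

// Before (sketch): the write-back is guarded by d < ndim, so in the fully
// degenerate case nothing is copied out and the shapes stay unsimplified.
if (d < ndim) {
  // ... close the last collapse group ...
  if (modify_shapes) {
    for (int i = 0; i < n; i++)
      *shapes[i] = outs[i];
  }
}

// After (sketch): always copy the simplified shapes back.
if (d < ndim) {
  // ... close the last collapse group ...
}
if (modify_shapes) {
  for (int i = 0; i < n; i++)
    *shapes[i] = outs[i];
}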
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [6146986]: BUILD STARTED
!build
CI MESSAGE: [6148343]: BUILD STARTED
CI MESSAGE: [6148343]: BUILD FAILED
It would be nice to add some comments, like a mention that we skip if all shapes have 1 in that dimension. It also looks like it can collapse non-compatible dims, for example {10, 2} op {5, 4} to 20 op 20.
SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->size();
Please use sample_dim(), which is easier to differentiate from size() and has the semantic meaning; I find it highly confusing how we implemented size() in TensorShape and TensorListShape.

Suggested change:
-int ndim = shapes[0]->size();
+int ndim = shapes[0]->sample_dim();
I find it highly confusing that we have sample_dim in TensorShape.
Still, it's the dimension of this sample, rather than some arbitrary size. And when TensorShape and TensorListShape reside side by side in the code, the usage of size() is highly confusing.
if (static_cast<int>(shapes[i]->size()) > ndim)
  ndim = shapes[i]->size();
Suggested change:
-if (static_cast<int>(shapes[i]->size()) > ndim)
-  ndim = shapes[i]->size();
+if (static_cast<int>(shapes[i]->sample_dim()) > ndim)
+  ndim = shapes[i]->sample_dim();
auto get = [&](int shape, int dim) -> int64_t {
  auto &s = *shapes[shape];
  dim -= ndim - s.size();  // add leading unit dims
Suggested change:
-  dim -= ndim - s.size();  // add leading unit dims
+  dim -= ndim - s.sample_dim();  // add leading unit dims
};

auto should_skip = [&](int d) {
  for (int i = 0; i < n; i++)
I am not sure I'm the greatest fan of n being captured by reference by a lot of lambdas. At this point I was like: wait, where did this local variable come from?
What do you suggest we do instead?
num_operands, for example?
int group_start = 0;

auto can_collapse = [&](int d) {
Where is any kind of equality check on the matching dimensions? Or do we drop those, and assume we already checked it?
I think we assume just that - but it can be easily added.
We try to simplify shapes only after we've checked that they are compatible.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Shape compatibility is checked prior to the simplification step (in PropagateShapes, arithmetic.h).
CI MESSAGE: [6148343]: BUILD PASSED
!build
CI MESSAGE: [6169540]: BUILD STARTED
SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->sample_dim();
:(
CI MESSAGE: [6169540]: BUILD PASSED
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Category:
New feature / Refactoring
Description:
Shapes used in broadcasting can be simplified before execution; for example, {2, 3, 4} and {1, 1, 4} can be simplified to {6, 4} and {1, 4}.
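For illustration, a minimal self-contained sketch of this simplification for the two-shape case (plain std::vector instead of DALI's TensorShape<> and collapse utilities; the actual PR handles any number of shapes and extracts collapse groups instead):

#include <cstdint>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;

// Collapse adjacent dimensions whose broadcast pattern matches. Assumes the
// shapes have equal rank and were already validated as broadcast-compatible.
std::pair<Shape, Shape> SimplifyPair(const Shape &a, const Shape &b) {
  Shape sa, sb;
  int prev = -1;  // broadcast pattern of the currently open group
  for (size_t d = 0; d < a.size(); d++) {
    if (a[d] == 1 && b[d] == 1)
      continue;  // all-ones dims never block merging, so just drop them
    int pattern = a[d] == 1 ? 1 : b[d] == 1 ? 2 : 0;
    if (!sa.empty() && pattern == prev) {
      sa.back() *= a[d];  // same pattern as the open group: merge extents
      sb.back() *= b[d];
    } else {
      sa.push_back(a[d]);  // pattern changed: start a new group
      sb.push_back(b[d]);
      prev = pattern;
    }
  }
  if (sa.empty()) {  // fully degenerate case: every extent was 1
    sa = {1};
    sb = {1};
  }
  return {sa, sb};
}

// SimplifyPair({2, 3, 4}, {1, 1, 4}) yields {6, 4} and {1, 4}.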
Additional information:
Affected modules and functionalities:
broadcasting.* utils
Key points relevant for the review:
Can we make the algorithm in SimplifiedShapeCollapseGroups any simpler?
Tests:
ArithmeticOpsBroadcastingTest.SimplifyShapesForBroadcasting
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-3063