
Update broadcasting shape simplification logic #4314

Merged (7 commits) on Oct 13, 2022

Conversation

jantonguirao (Contributor):

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Category:

New feature / Refactoring

Description:

  • Extends the shape simplification utils for broadcasting to cover a few cases that were handled incorrectly
  • For example, collapsing adjacent ones:
    {2, 3, 4} and {1, 1, 4} can be simplified to {6, 4} and {1, 4} before execution
  • Added more test cases to cover some interesting examples
  • Extended the utilities to work with ternary operators (in fact, the utils now support any number of arguments)
  • Decoupled the selection of collapsible groups from the rest, so we can calculate the groups to be collapsed on the arguments and apply the same transformation to the output shape (this usage is part of the next PR; a sketch illustrating the idea follows this list)
  • Note: To see these utils in context, see the working prototype of arithmetic operator broadcasting: Support shape broadcasting in arithmetic operators (CPU & GPU) #4282
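Below is a minimal standalone sketch of the idea behind the decoupled design, with illustrative names (Shape, Group, and CollapseShape are stand-ins, not the actual DALI utils): the collapse groups are computed once, and the same groups can then be applied to every operand and, per the next PR, to the output shape.

#include <cstdint>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;
using Group = std::pair<int, int>;  // (first dim of the group, group length)

// Applies precomputed collapse groups to one shape by multiplying the
// extents inside each group; groups are assumed sorted and non-overlapping.
Shape CollapseShape(const Shape &sh, const std::vector<Group> &groups) {
  Shape out;
  size_t g = 0;
  for (size_t d = 0; d < sh.size();) {
    if (g < groups.size() && static_cast<int>(d) == groups[g].first) {
      int64_t vol = 1;
      for (int i = 0; i < groups[g].second; i++)
        vol *= sh[d + i];
      out.push_back(vol);
      d += groups[g].second;
      g++;
    } else {
      out.push_back(sh[d++]);
    }
  }
  return out;
}

With a single group {0, 2}, CollapseShape turns {2, 3, 4} into {6, 4} and {1, 1, 4} into {1, 4}, matching the example above.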

Additional information:

Affected modules and functionalities:

broadcasting.* utils

Key points relevant for the review:

Can we make the algorithm in SimplifiedShapeCollapseGroups any simpler?

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests: ArithmeticOpsBroadcastingTest.SimplifyShapesForBroadcasting
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other (comments in the code)
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3063

@jantonguirao marked this pull request as ready for review on October 6, 2022, 11:18
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Comment on lines 346 to 351:

for (int k = 0; k < shapes.size(); k++) {
  auto &sh = *shapes[k];
  if ((sh[dim] != 1 && sh[dim + 1] == 1 && !all_ones(dim + 1)) ||
      (sh[dim] == 1 && !all_ones(dim) && sh[dim + 1] != 1)) {
    return false;
  }
Contributor:

Two issues: all_ones(dim + 1) and all_ones(dim) are not only called in the inner loop, but they also refer to different dimensions, so there's a big chance they'll be calculated multiple times (2 * shapes.size() calls in total). I'd suggest replacing the function with an array, where the values would be stored once and then accessed as needed.
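A sketch of the suggested change, assuming the shapes span and ndim from the surrounding code and that all shapes have already been expanded to a common number of dimensions (all_ones_arr is an illustrative name):

std::vector<bool> all_ones_arr(ndim);
for (int d = 0; d < ndim; d++) {
  bool all_one = true;
  for (int k = 0; k < static_cast<int>(shapes.size()); k++)
    all_one = all_one && (*shapes[k])[d] == 1;  // extent 1 in every operand?
  all_ones_arr[d] = all_one;
}

The loop above would then read all_ones_arr[dim] and all_ones_arr[dim + 1] instead of re-running the all_ones scan over all shapes for every check.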

// 2. All extents on that dimension and the next are the same or one,
// and the ones are either present or not present in both dimensions
auto can_merge_next = [=](int dim) {
  if (all_same(dim) && all_same(dim + 1))
Contributor:

Likewise - all_same could be an array.

@@ -274,33 +274,165 @@ void ExpandToNDims(TensorShape<> &sh, int ndim) {
   sh = shape_cat(TensorShape<>(std::vector<int64_t>(ndim - sh.sample_dim(), 1)), sh);
 }

-void SimplifyShapesForBroadcasting(TensorShape<>& lhs, TensorShape<> &rhs) {
+SmallVector<std::pair<int, int>, 5> SimplifiedShapeCollapseGroups(span<TensorShape<>*> shapes) {
Contributor:

I think that this function could be split into one that extracts the groups and one that collapses the shapes (we actually have the utilities for the latter).

Contributor Author:

Actually, this function was just meant to extract the groups (it didn't modify the shapes).

@klecki (Contributor) left a comment:

A question and some suggestions for can_merge_ones.


lhs = collapse_dims(lhs, group_dims);
rhs = collapse_dims(rhs, group_dims);
bool IsBroadcastingEnabled() {
Contributor:

This is a more general question: how much do we want to control this via the env var, or should we maybe introduce something like nvidia.dali.experimental.enable_math_broadcasting()?

Contributor Author:

Most likely we will remove this as soon as we have full support (including ternary operators). This is just a dev switch for the time being, before we enable the feature.

Contributor:

I don't think we should keep this at all once the feature is good enough to be merged.

Contributor Author:

Removed

Comment on lines 333 to 344:
// Can collapse dimensions with ones
// when there is no transition from or to an 'odd' one
// (meaning that the rest of the extents are not all one
// in the two dimensions)
// Examples:
// 1. Can collapse even if we have transition from 2 to 1,
// because there are only ones on the second dimension
// {2 1} -> {2}
// {1 1} -> {1}
// 2. Can NOT collapse because there is a transition to 1
// {2 1} -> {2 1}
// {2 2} -> {2 2}
Contributor:

It would be clearer to me if the comment said that we can collapse two dimensions that are broadcast if the broadcasting happens only within one of the shapes, that is, case one ({A, B} op {1, 1}) but not case two ({A, 1} op {1, B}).

Moreover, sh[dim + 1] == 1 && !all_ones(dim + 1) could be considered is_broadcast(sh, dim + 1) or something similar.

We can't collapse if, for some shape, dim and dim + 1 have two different results for is_broadcast, am I right?

As a side note, we can always collapse a dimension that is all_ones, right? Those checks let this happen, right?
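A possible form of the suggested helper; the name is the reviewer's suggestion rather than an existing utility, and it assumes the all_ones predicate from the snippet under discussion:

auto is_broadcast = [&](const TensorShape<> &sh, int dim) {
  // a dimension is broadcast in this shape if its extent is 1
  // while some other operand has a larger extent there
  return sh[dim] == 1 && !all_ones(dim);
};

Under the reviewer's reading, collapsing would be forbidden when is_broadcast(sh, dim) != is_broadcast(sh, dim + 1) for some shape. Note, per the side question above, that a plain inequality would be slightly stricter than the original condition: a dimension where every operand has extent 1 is never treated as broadcast by these checks, so it can always be collapsed.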

if (compatible_vol(volumes, i)) {
  int j = i + 1;
  for (; j < full_ndim;) {
    if (compatible_vol(volumes, j) && can_merge_next(j - 1)) {
Contributor:

Why do we need both compatible_vol and can_merge_next? Isn't can_merge_next enough?

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Comment on lines 354 to 357:

if (modify_shapes) {
  for (int i = 0; i < n; i++) {
    *shapes[i] = outs[i];
  }
Contributor:

This should be outside the if (d < ndim) { block; otherwise the shapes remain unsimplified in a totally degenerate case.
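A minimal sketch of the control flow after this fix, using the names from the snippet above:

if (d < ndim) {
  // ...append the remaining (possibly collapsed) group to outs...
}
// write-back moved out of the if (d < ndim) branch, so the shapes are
// updated even in the degenerate case mentioned above
if (modify_shapes) {
  for (int i = 0; i < n; i++) {
    *shapes[i] = outs[i];
  }
}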

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [6146986]: BUILD STARTED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [6148343]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [6148343]: BUILD FAILED

@klecki (Contributor) left a comment:

It would be nice to add some comments, like a mention that we skip if all shapes have 1 in that dimension.

It also looks like it can collapse non-compatible dims, for example {10, 2} op {5, 4} to 20 op 20.

SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->size();
Contributor:

Please use sample_dim(), which is easier to differentiate from size() and has the semantic meaning; I find it highly confusing how we implemented size() in TensorShape and TensorListShape.

Suggested change:
- int ndim = shapes[0]->size();
+ int ndim = shapes[0]->sample_dim();

Contributor:

I find it highly confusing that we have sample_dim in TensorShape.

Contributor:

Still, it's the dimensionality of this sample rather than some arbitrary size. And when you get to TensorShape and TensorListShape residing side by side in the code, the usage of size() is highly confusing.

Comment on lines 284 to 285:

if (static_cast<int>(shapes[i]->size()) > ndim)
  ndim = shapes[i]->size();
Contributor:

Suggested change:
- if (static_cast<int>(shapes[i]->size()) > ndim)
-   ndim = shapes[i]->size();
+ if (static_cast<int>(shapes[i]->sample_dim()) > ndim)
+   ndim = shapes[i]->sample_dim();


auto get = [&](int shape, int dim) -> int64_t {
  auto &s = *shapes[shape];
  dim -= ndim - s.size();  // add leading unit dims
Contributor:

Suggested change:
- dim -= ndim - s.size();  // add leading unit dims
+ dim -= ndim - s.sample_dim();  // add leading unit dims

};

auto should_skip = [&](int d) {
  for (int i = 0; i < n; i++)
Contributor:

I'm not sure I'm the greatest fan of n being captured by reference by a lot of lambdas. At this point I was like: wait, where did this local variable come from?

Contributor Author:

What do you suggest we do instead?

Contributor:

num_operands, for example?


int group_start = 0;

auto can_collapse = [&](int d) {
Contributor:

Where is any kind of equality check on the matching dimensions? Or do we drop those, and assume we already checked it?

@mzient (Contributor), Oct 12, 2022:

I think we assume just that - but it can be easily added.

Contributor Author:

We try to simplify shapes only after we've checked that they are compatible.

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

> It would be nice to add some comments, like a mention that we skip if all shapes have 1 in that dimension.
>
> It also looks like it can collapse non-compatible dims, for example {10, 2} op {5, 4} to 20 op 20.

Shape compatibility is checked prior to the simplification step (in PropagateShapes, arithmetic.h)
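For context, a minimal standalone sketch of the kind of right-aligned compatibility check that runs before simplification (the real check lives in PropagateShapes, arithmetic.h; CanBroadcast here is only an illustration): aligned extents must be equal, or one of them must be 1.

#include <algorithm>
#include <cstdint>
#include <vector>

bool CanBroadcast(const std::vector<int64_t> &a, const std::vector<int64_t> &b) {
  int ndim = static_cast<int>(std::max(a.size(), b.size()));
  for (int d = 0; d < ndim; d++) {
    // align from the right; missing leading dimensions count as 1
    int64_t ea = d < static_cast<int>(a.size()) ? a[a.size() - 1 - d] : 1;
    int64_t eb = d < static_cast<int>(b.size()) ? b[b.size() - 1 - d] : 1;
    if (ea != eb && ea != 1 && eb != 1)
      return false;
  }
  return true;
}

With this in place, the example above never reaches the simplification step: CanBroadcast({10, 2}, {5, 4}) is false.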

@dali-automaton (Collaborator):

CI MESSAGE: [6148343]: BUILD PASSED

@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [6169540]: BUILD STARTED

SmallVector<TensorShape<>, 3> outs;
outs.resize(n);

int ndim = shapes[0]->sample_dim();
Contributor:

:(

@dali-automaton (Collaborator):

CI MESSAGE: [6169540]: BUILD PASSED
