
Avoid a couple FP subtractions. #8574

Merged: 1 commit into dealii:master on Aug 20, 2019

Conversation

bangerth
Member

By noting that the existing code performs dim subtractions of
terms that are each a product of two values, we can reorder
things in such a way that we first accumulate the products
(which is a dot product) and then subtract the result. This
should allow for some vectorization.

The performance gain is almost certainly completely negligible,
but it makes the code marginally easier to read. The reason
the indices involved here allow for this is that
'jacobian_pushed_forward_grads[i]' happens to be a
Tensor<3,dim> and 'shape_gradients[k][i]' is a
Tensor<1,dim>. So the types are such that their product
is in fact equivalent to the summation over the last index
that was written out before.
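
As a sketch of the pattern (an editorial illustration with hypothetical plain-double arrays, not the actual deal.II tensor types this patch touches):

#include <array>

// before: dim individual subtractions of products
template <int dim>
double subtract_termwise(double h,
                         const std::array<double, dim> &a,
                         const std::array<double, dim> &b)
{
  for (unsigned int d = 0; d < dim; ++d)
    h -= a[d] * b[d];
  return h;
}

// after: accumulate the dot product first, then subtract once;
// the accumulation loop is the part that can be vectorized
template <int dim>
double subtract_dot_product(double h,
                            const std::array<double, dim> &a,
                            const std::array<double, dim> &b)
{
  double dot = 0.;
  for (unsigned int d = 0; d < dim; ++d)
    dot += a[d] * b[d];
  return h - dot;
}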

@masterleinad
Member

/rebuild

@masterleinad
Member

This seems to produce quite different results...

@peterrum
Member

If I am not mistaken, the situation is as follows (we are working on a Tensor<2,dim>):

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          for (unsigned int j = 0; j < spacedim; ++j)
            for (unsigned int l = 0; l < spacedim; ++l)
              output_data.shape_hessians[k][i] -=
                mapping_data.jacobian_pushed_forward_grads[i][j][l] *
                output_data.shape_gradients[k][i][j][l];

which can be simplified to (as it currently is on master):

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          for (unsigned int j = 0; j < spacedim; ++j)
            output_data.shape_hessians[k][i] -=
              mapping_data.jacobian_pushed_forward_grads[i][j] *
              output_data.shape_gradients[k][i][j];

To simplify the code even further, we would need to write:

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          output_data.shape_hessians[k][i] -=
            scalar_product(mapping_data.jacobian_pushed_forward_grads[i],
                           output_data.shape_gradients[k][i]);

@bangerth
Member Author

@peterrum: No, that's not quite correct. We have the following data types:

  • mapping_data.jacobian_pushed_forward_grads is std::vector<Tensor<3, spacedim>>
  • output_data.shape_gradients is FiniteElementRelatedData::GradientVector which is Table<2, Tensor<1, spacedim>>

So I was pretty sure that

      for (unsigned int j=0; j<spacedim; ++j)
          output_data.shape_hessians[k][i] -=
            mapping_data.jacobian_pushed_forward_grads[i][j] *
            output_data.shape_gradients[k][i][j];

is the same as

          output_data.shape_hessians[k][i] -=
            mapping_data.jacobian_pushed_forward_grads[i] *
            output_data.shape_gradients[k][i];

The latter is simply the product of a Tensor<3,spacedim> with a Tensor<1,spacedim> that should include the contraction over the third index of the former and the only index of the latter; whereas the former code just writes this out.

I've been staring at this for the last few minutes, but I really can't see where my mind has gone wrong. I think I either need help, therapy, or both. Help?

@masterleinad
Member

      output_data.shape_gradients[k][i] * mapping_data.jacobian_pushed_forward_grads[i];

so the other way around seems to work...

@bangerth
Member Author

Hm. Are you suggesting that the contraction of a rank-3 tensor and a rank-1 tensor uses the wrong index on the first object?

@peterrum
Member

Sorry for my comments above; in some other places the shape_gradient is a Tensor<2,dim>...

I think I know why @masterleinad's suggestion works. What we would like to do is a contraction over index 0 of both tensors:

contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 0, 
         output_data.shape_gradients[k][i], 0)

However, if we write mapping_data.jacobian_pushed_forward_grads[i] * output_data.shape_gradients[k][i], then (according to https://www.dealii.org/developer/doxygen/deal.II/classTensor.html#a218fce880c45d5642dbf96b38b3c7af0) operator* contracts the last index of the first tensor with the first index of the second:

contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 2, 
         output_data.shape_gradients[k][i], 0)

If we switch the arguments of operator*, we get the desired contraction:

contract(dst, output_data.shape_gradients[k][i], 0,
         mapping_data.jacobian_pushed_forward_grads[i], 0)
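
To make the index bookkeeping explicit (an editorial summary of the above, with $A$ the rank-3 tensor jacobian_pushed_forward_grads[i] and $v$ the rank-1 tensor shape_gradients[k][i]):

$$(A\,v)_{jl} = \sum_{m} A_{jlm}\, v_m, \qquad (v\,A)_{jl} = \sum_{m} v_m\, A_{mjl}.$$

Only the second form performs the contraction over index 0 of $A$ that the original write-out loop computes, which is why swapping the operands of operator* fixes the result.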

@bangerth bangerth mentioned this pull request Aug 19, 2019
@peterrum
Member

@bangerth I have extended your test from PR #8599 on a new branch in my repo.

There you can see that these implementations are equivalent:

// v0
Tensor<2, dim> c;
for (unsigned int i = 0; i < dim; ++i)
  c += a[i] * b[i];

// v2
Tensor<2, dim> c = b * a;

// v3
contract(c, a, 1, b);

and these are equivalent:

// v1
Tensor<2, dim> c = a * b;

// v4
contract(c, a, 3, b);

This result seems to demonstrate that my statement about the contraction performed by operator* is correct. (Note: for some reason the contraction indices start at 1...)
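
A self-contained way to check these equivalences (an editorial sketch against the public deal.II Tensor interface, not @peterrum's actual test branch) would be:

#include <deal.II/base/tensor.h>
#include <iostream>

using namespace dealii;

int main()
{
  constexpr int dim = 3;

  // arbitrary, non-symmetric test data
  Tensor<3, dim> a;
  Tensor<1, dim> b;
  for (unsigned int i = 0; i < dim; ++i)
    {
      b[i] = 1.0 + i;
      for (unsigned int j = 0; j < dim; ++j)
        for (unsigned int l = 0; l < dim; ++l)
          a[i][j][l] = 1. * i + 10. * j + 100. * l;
    }

  // v0: explicit accumulation, contracting over the *first* index of a
  Tensor<2, dim> c0;
  for (unsigned int i = 0; i < dim; ++i)
    c0 += b[i] * a[i];

  // v2: operator* contracts the only index of b with the first index of a
  const Tensor<2, dim> c2 = b * a;

  // v1: operator* contracts the *last* index of a with b -- a different tensor
  const Tensor<2, dim> c1 = a * b;

  std::cout << (c0 - c2).norm() << std::endl; // 0: v0 and v2 agree
  std::cout << (c0 - c1).norm() << std::endl; // nonzero in general
}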

@bangerth
Member Author

Oh, doh. I really don't know what I was thinking...

@bangerth
Member Author

The current state seems to work on my system now...

@masterleinad masterleinad merged commit 10302af into dealii:master Aug 20, 2019
@bangerth bangerth deleted the contract branch August 20, 2019 21:53