
Avoid a couple FP subtractions. #8574

Merged: 1 commit into dealii:master on Aug 20, 2019

Conversation

bangerth
Member

By noting that the existing code performs dim subtractions of
terms that are each a product of two values, we can reorder
things in such a way that we first accumulate the products
(which is a dot product) and then subtract the result. This
should allow for some vectorization.

The performance gain is almost certainly completely negligible,
but it makes the code marginally easier to read. The reason
the indices involved here allow for this is that
'jacobian_pushed_forward_grads[i]' happens to be a
Tensor<3,dim> and 'shape_gradients[k][i]' is a
Tensor<1,dim>. So the types are such that their product
is in fact equivalent to the summation over the last index
that was written out before.
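
As a sketch of the pattern (an editorial illustration with hypothetical plain-double arrays, not the actual deal.II tensor types this patch touches):

#include <array>

// before: dim individual subtractions of products
template <int dim>
double subtract_termwise(double h,
                         const std::array<double, dim> &a,
                         const std::array<double, dim> &b)
{
  for (unsigned int d = 0; d < dim; ++d)
    h -= a[d] * b[d];
  return h;
}

// after: accumulate the dot product first, then subtract once;
// the accumulation loop is the part that can be vectorized
template <int dim>
double subtract_dot_product(double h,
                            const std::array<double, dim> &a,
                            const std::array<double, dim> &b)
{
  double dot = 0.;
  for (unsigned int d = 0; d < dim; ++d)
    dot += a[d] * b[d];
  return h - dot;
}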

@masterleinad
Member

/rebuild

@masterleinad
Member

This seems to produce quite different results...

@peterrum
Member

If I am not mistaken, the situation is as follows (we are working on a Tensor<2,dim>):

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          for (unsigned int j = 0; j < spacedim; ++j)
            for (unsigned int l = 0; l < spacedim; ++l)
              output_data.shape_hessians[k][i] -=
                mapping_data.jacobian_pushed_forward_grads[i][j][l] *
                output_data.shape_gradients[k][i][j][l];

which can be simplified to (as it currently is on master):

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          for (unsigned int j = 0; j < spacedim; ++j)
            output_data.shape_hessians[k][i] -=
              mapping_data.jacobian_pushed_forward_grads[i][j] *
              output_data.shape_gradients[k][i][j];

To simplify the code even further, we would need to write:

      for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
        for (unsigned int i = 0; i < quadrature.size(); ++i)
          output_data.shape_hessians[k][i] -=
            scalar_product(mapping_data.jacobian_pushed_forward_grads[i],
                           output_data.shape_gradients[k][i]);

@bangerth
Member Author

@peterrum: No, that's not quite correct. We have the following data types:

  • mapping_data.jacobian_pushed_forward_grads is std::vector<Tensor<3, spacedim>>
  • output_data.shape_gradients is FiniteElementRelatedData::GradientVector which is Table<2, Tensor<1, spacedim>>

So I was pretty sure that

      for (unsigned int j=0; j<spacedim; ++j)
          output_data.shape_hessians[k][i] -=
            mapping_data.jacobian_pushed_forward_grads[i][j] *
            output_data.shape_gradients[k][i][j];

is the same as

          output_data.shape_hessians[k][i] -=
            mapping_data.jacobian_pushed_forward_grads[i] *
            output_data.shape_gradients[k][i];

The latter is simply the product of a Tensor<3,spacedim> with a Tensor<1,spacedim> that should include the contraction over the third index of the former and the only index of the latter; whereas the former code just writes this out.

I've been staring at this for the last few minutes, but I really can't see where my mind has gone wrong. I think I either need help, therapy, or both. Help?

@masterleinad
Member

      output_data.shape_gradients[k][i] * mapping_data.jacobian_pushed_forward_grads[i];

so the other way around seems to work...

@bangerth
Member Author

Hm. Are you suggesting that the contraction of a rank-3 tensor and a rank-1 tensor uses the wrong index on the first object?

@peterrum
Member

Sorry for my comments above; in some other places the shape_gradient is a Tensor<2,dim>...

I think I know why @masterleinad's suggestion works. What we would like to do is a contraction over index 0 of both tensors:

contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 0, 
         output_data.shape_gradients[k][i], 0)

However, if we write mapping_data.jacobian_pushed_forward_grads[i] * output_data.shape_gradients[k][i], then (according to https://www.dealii.org/developer/doxygen/deal.II/classTensor.html#a218fce880c45d5642dbf96b38b3c7af0) operator* contracts the last index of the first tensor with the first index of the second:

contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 2, 
         output_data.shape_gradients[k][i], 0)

If we switch the arguments of operator*, we get the desired contraction:

contract(dst, output_data.shape_gradients[k][i], 0,
         mapping_data.jacobian_pushed_forward_grads[i], 0)
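
To make the index bookkeeping explicit (an editorial summary of the above, with $A$ the rank-3 tensor jacobian_pushed_forward_grads[i] and $v$ the rank-1 tensor shape_gradients[k][i]):

$$(A\,v)_{jl} = \sum_{m} A_{jlm}\, v_m, \qquad (v\,A)_{jl} = \sum_{m} v_m\, A_{mjl}.$$

Only the second form performs the contraction over index 0 of $A$ that the original write-out loop computes, which is why swapping the operands of operator* fixes the result.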

@bangerth bangerth mentioned this pull request Aug 19, 2019
@peterrum
Member

@bangerth I have extended your test from PR #8599 on a new branch in my repo.

There you can see that these implementations are equivalent:

// v0
Tensor<2, dim> c;
for (unsigned int i = 0; i < dim; ++i)
  c += a[i] * b[i];

// v2
Tensor<2, dim> c = b * a;

// v3
contract(c, a, 1, b);

and these are equivalent:

// v1
Tensor<2, dim> c = a * b;

// v4
contract(c, a, 3, b);

This result seems to demonstrate that my statement about the contraction performed by operator* is correct. (Note: for some reason the contraction indices start at 1...)
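
A self-contained way to check these equivalences (an editorial sketch against the public deal.II Tensor interface, not @peterrum's actual test branch) would be:

#include <deal.II/base/tensor.h>
#include <iostream>

using namespace dealii;

int main()
{
  constexpr int dim = 3;

  // arbitrary, non-symmetric test data
  Tensor<3, dim> a;
  Tensor<1, dim> b;
  for (unsigned int i = 0; i < dim; ++i)
    {
      b[i] = 1.0 + i;
      for (unsigned int j = 0; j < dim; ++j)
        for (unsigned int l = 0; l < dim; ++l)
          a[i][j][l] = 1. * i + 10. * j + 100. * l;
    }

  // v0: explicit accumulation, contracting over the *first* index of a
  Tensor<2, dim> c0;
  for (unsigned int i = 0; i < dim; ++i)
    c0 += b[i] * a[i];

  // v2: operator* contracts the only index of b with the first index of a
  const Tensor<2, dim> c2 = b * a;

  // v1: operator* contracts the *last* index of a with b -- a different tensor
  const Tensor<2, dim> c1 = a * b;

  std::cout << (c0 - c2).norm() << std::endl; // 0: v0 and v2 agree
  std::cout << (c0 - c1).norm() << std::endl; // nonzero in general
}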

@bangerth
Member Author

Oh, doh. I really don't know what I was thinking...

@bangerth
Member Author

The current state seems to work on my system now...

@masterleinad masterleinad merged commit 10302af into dealii:master Aug 20, 2019
@bangerth bangerth deleted the contract branch August 20, 2019 21:53