Avoid a couple FP subtractions. #8574
Conversation
/rebuild
This seems to produce quite different results...
If I am not wrong, the situation is the following. We are working on:

```cpp
for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
  for (unsigned int i = 0; i < quadrature.size(); ++i)
    for (unsigned int j = 0; j < spacedim; ++j)
      for (unsigned int l = 0; l < spacedim; ++l)
        output_data.shape_hessians[k][i] -=
          mapping_data.jacobian_pushed_forward_grads[i][j][l] *
          output_data.shape_gradients[k][i][j][l];
```

which can be simplified to (as it currently is on master):

```cpp
for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
  for (unsigned int i = 0; i < quadrature.size(); ++i)
    for (unsigned int j = 0; j < spacedim; ++j)
      output_data.shape_hessians[k][i] -=
        mapping_data.jacobian_pushed_forward_grads[i][j] *
        output_data.shape_gradients[k][i][j];
```

To simplify the code even further, we would need to write:

```cpp
for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
  for (unsigned int i = 0; i < quadrature.size(); ++i)
    output_data.shape_hessians[k][i] -=
      scalar_product(mapping_data.jacobian_pushed_forward_grads[i],
                     output_data.shape_gradients[k][i]);
```
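For orientation, here is a minimal, self-contained sketch of what the master version of the loop body does for a single shape function `k` and quadrature point `i`. The tensor ranks follow the commit message at the end of this thread, `dim == spacedim` is assumed, and the helper name is made up for illustration:

```cpp
#include <deal.II/base/tensor.h>

// Sketch only: the inner body of the master-version loop for one (k, i)
// pair. shape_hessian stands for shape_hessians[k][i] (rank 2), jpf_grad
// for jacobian_pushed_forward_grads[i] (rank 3), and shape_gradient for
// shape_gradients[k][i] (rank 1).
template <int dim>
void subtract_correction(dealii::Tensor<2, dim>       &shape_hessian,
                         const dealii::Tensor<3, dim> &jpf_grad,
                         const dealii::Tensor<1, dim> &shape_gradient)
{
  // Sum over the first index j of the rank-3 tensor, weighting the
  // rank-2 slice jpf_grad[j] by the j-th component of the gradient,
  // and subtract each product individually.
  for (unsigned int j = 0; j < dim; ++j)
    shape_hessian -= jpf_grad[j] * shape_gradient[j];
}
```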
@peterrum: No, that's not quite correct. We have the following data types: `mapping_data.jacobian_pushed_forward_grads[i]` is a `Tensor<3,dim>` and `output_data.shape_gradients[k][i]` is a `Tensor<1,dim>`. So I was pretty sure that the explicit loop over `j` is the same as the product of the two objects. The latter is simply the product of a rank-3 tensor with a rank-1 tensor. I've been staring at this for the last few minutes, but I really can't see where my mind has gone wrong. I think I either need help, therapy, or both. Help?
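For reference, deal.II's `operator*` between two tensors performs a single contraction over the last index of the left operand and the first index of the right operand. A sketch of what `a * b` therefore computes for the ranks involved here (the helper name is made up):

```cpp
#include <deal.II/base/tensor.h>

// What a * b computes for Tensor<3,dim> a and Tensor<1,dim> b:
// a single contraction over the *last* index of a, i.e.
//   c[i][j] = sum_l a[i][j][l] * b[l].
template <int dim>
dealii::Tensor<2, dim> product_by_hand(const dealii::Tensor<3, dim> &a,
                                       const dealii::Tensor<1, dim> &b)
{
  dealii::Tensor<2, dim> c;
  for (unsigned int i = 0; i < dim; ++i)
    for (unsigned int j = 0; j < dim; ++j)
      for (unsigned int l = 0; l < dim; ++l)
        c[i][j] += a[i][j][l] * b[l]; // contract a's last index with b
  return c;                           // the same value as a * b
}
```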
So the other way around seems to work...
Hm. Are you suggesting that the contraction of a rank-3 tensor and a rank-1 tensor uses the wrong index on the first object?
Sorry for my comments above; in some other places the […]. I think I know why @masterleinad's suggestion works. What we would like to do is a contraction over index 0 for both tensors:

```cpp
contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 0,
         output_data.shape_gradients[k][i], 0);
```

However, what the product in the current version actually does is

```cpp
contract(dst, mapping_data.jacobian_pushed_forward_grads[i], 2,
         output_data.shape_gradients[k][i], 0);
```

If we switch the arguments of the product, we get

```cpp
contract(dst, output_data.shape_gradients[k][i], 0,
         mapping_data.jacobian_pushed_forward_grads[i], 0);
```

which is exactly the contraction we want.
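The same convention explains why reversing the operands helps: with the rank-1 tensor on the left, its only index is contracted with the first index of the rank-3 tensor. A sketch (helper name made up):

```cpp
#include <deal.II/base/tensor.h>

// What b * a computes for Tensor<1,dim> b and Tensor<3,dim> a:
// a single contraction of b's index with the *first* index of a, i.e.
//   c[j][l] = sum_i b[i] * a[i][j][l],
// which is the contraction over index 0 we want here.
template <int dim>
dealii::Tensor<2, dim> reversed_product_by_hand(const dealii::Tensor<1, dim> &b,
                                                const dealii::Tensor<3, dim> &a)
{
  dealii::Tensor<2, dim> c;
  for (unsigned int i = 0; i < dim; ++i)
    c += b[i] * a[i]; // a[i] is the rank-2 slice along a's first index
  return c;           // the same value as b * a
}
```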
@bangerth I have extended your test from PR #8599 on a new branch in my repo. There you can see that these implementations are equivalent:

```cpp
// v0
Tensor<2, dim> c;
for (unsigned int i = 0; i < dim; ++i)
  c += a[i] * b[i];

// v2
Tensor<2, dim> c = b * a;

// v3
contract(c, a, 1, b);
```

and these are equivalent:

```cpp
// v1
Tensor<2, dim> c = a * b;

// v4
contract(c, a, 3, b);
```

This result seems to demonstrate that my statement above about the contraction of a rank-3 tensor with a rank-1 tensor is correct.
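A self-contained sketch of such an equivalence check, using arbitrary made-up values and only the `operator*` variants (the `v0`/`v1`/`v2` names follow the snippets above):

```cpp
#include <deal.II/base/tensor.h>
#include <iostream>

int main()
{
  constexpr int dim = 3;
  dealii::Tensor<3, dim> a;
  dealii::Tensor<1, dim> b;
  for (unsigned int i = 0; i < dim; ++i)
    {
      b[i] = 1.0 + i;
      for (unsigned int j = 0; j < dim; ++j)
        for (unsigned int l = 0; l < dim; ++l)
          a[i][j][l] = i + 2.0 * j + 3.0 * l;
    }

  // v0: explicit sum over the first index of a
  dealii::Tensor<2, dim> v0;
  for (unsigned int i = 0; i < dim; ++i)
    v0 += a[i] * b[i];

  // v2: rank-1 tensor on the left, contracts the first index of a
  const dealii::Tensor<2, dim> v2 = b * a;

  // v1: rank-3 tensor on the left, contracts the last index of a
  const dealii::Tensor<2, dim> v1 = a * b;

  std::cout << std::boolalpha
            << "v0 == v2: " << ((v0 - v2).norm() < 1e-12) << '\n'
            << "v0 == v1: " << ((v0 - v1).norm() < 1e-12) << '\n';
}
```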
Oh, doh. I really don't know what I was thinking...
The current state seems to work on my system now...
By noting that the existing code performs `dim` subtractions of terms that are each a product of two values, we can reorder things in such a way that we first accumulate the products (which is a dot product) and then subtract the result. This should allow for some vectorization. The performance gain is almost certainly completely negligible, but it makes the code marginally easier to read. The reason why the indices involved here allow for this is that `jacobian_pushed_forward_grads[i]` happens to be a `Tensor<3,dim>` and `shape_gradients[k][i]` is a `Tensor<1,dim>`. So the types are such that their product is in fact equivalent to the summation over the last index, as was written before.
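Putting the thread's conclusion together, a sketch of the reordered loop might look as follows, with the rank-1 gradient on the left of the product so that the contraction runs over the first index of the rank-3 tensor, as established above. The identifiers are taken from the snippets earlier in the thread, so this mirrors their surrounding context rather than being a standalone program:

```cpp
// Sketch of the reordering described above: accumulate the dim products
// in a single contraction and subtract the result once, instead of
// performing dim separate subtractions. With the Tensor<1,dim> on the
// left, operator* contracts its index with the first index of the
// Tensor<3,dim>, matching the original loop over j.
for (unsigned int k = 0; k < this->dofs_per_cell; ++k)
  for (unsigned int i = 0; i < quadrature.size(); ++i)
    output_data.shape_hessians[k][i] -=
      output_data.shape_gradients[k][i] *
      mapping_data.jacobian_pushed_forward_grads[i];
```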