New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize mapping computation in MappingQ #15035
Optimize mapping computation in MappingQ #15035
Conversation
/rebuild |
I have to fix the test failures before merging. |
Apart from the failing tests, I think this looks great. Can you please adjust the following Lines 172 to 174 in 0ed9f79
This is the main problem observed in #9867 - if we can avoid filling all these arrays, we gain a lot. We should only use that code for the higher derivatives, and even that part can be removed once we write some more higher order derivative evaluators. I think the code should simply skip the I consider the ability to skip these arrays a big progress, thanks for the initiative @bergbauer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me shortly summarize the PR. The content of MappingQ::fill_mapping_data_for_generic_points()
is moved to maybe_update_q_points_Jacobians_generic()
. The new function is supposed to be used in the other fill_*
functions (non-tensorproduct path). Is this right?
auto quadrature_dim = QProjector<dim>::project_to_face( | ||
ReferenceCells::get_hypercube<dim>(), quadrature, face_no); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new!? Where is subface_no
used!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think this is right. You are not using anywhere data_set
(and data
). Latter contains the actual support points. See
Lines 913 to 916 in 6288d4d
data.initialize_face(this->requires_update_flags(update_flags), | |
QProjector<dim>::project_to_all_faces( | |
ReferenceCells::get_hypercube<dim>(), quadrature[0]), | |
quadrature[0].size()); |
Line 219 in 6288d4d
initialize(update_flags, q, n_original_q_points); |
OK. It does not:
Line 84 in 6288d4d
MappingQ<dim, spacedim>::InternalData::initialize( |
dealii/include/deal.II/fe/mapping_q.h
Lines 394 to 554 in 6288d4d
/** | |
* Values of shape functions. Access by function @p shape. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<double> shape_values; | |
/** | |
* Values of shape function derivatives. Access by function @p derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<1, dim>> shape_derivatives; | |
/** | |
* Values of shape function second derivatives. Access by function @p | |
* second_derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<2, dim>> shape_second_derivatives; | |
/** | |
* Values of shape function third derivatives. Access by function @p | |
* second_derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<3, dim>> shape_third_derivatives; | |
/** | |
* Values of shape function fourth derivatives. Access by function @p | |
* second_derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<4, dim>> shape_fourth_derivatives; | |
/** | |
* Unit tangential vectors. Used for the computation of boundary forms and | |
* normal vectors. | |
* | |
* This array has `(dim-1) * GeometryInfo::faces_per_cell` entries. The | |
* first GeometryInfo::faces_per_cell contain the vectors in the first | |
* tangential direction for each face; the second set of | |
* GeometryInfo::faces_per_cell entries contain the vectors in the second | |
* tangential direction (only in 3d, since there we have 2 tangential | |
* directions per face), etc. | |
* | |
* Filled once. | |
*/ | |
std::array<std::vector<Tensor<1, dim>>, | |
GeometryInfo<dim>::faces_per_cell *(dim - 1)> | |
unit_tangentials; | |
/** | |
* The polynomial degree of the mapping. Since the objects here are also | |
* used (with minor adjustments) by MappingQ, we need to store this. | |
*/ | |
const unsigned int polynomial_degree; | |
/** | |
* Number of shape functions. If this is a Q1 mapping, then it is simply | |
* the number of vertices per cell. However, since also derived classes | |
* use this class (e.g. the Mapping_Q() class), the number of shape | |
* functions may also be different. | |
* | |
* In general, it is $(p+1)^\text{dim}$, where $p$ is the polynomial | |
* degree of the mapping. | |
*/ | |
const unsigned int n_shape_functions; | |
/* | |
* The default line support points. Is used in when the shape function | |
* values are computed. | |
* | |
* The number of quadrature points depends on the degree of this | |
* class, and it matches the number of degrees of freedom of an | |
* FE_Q<1>(this->degree). | |
*/ | |
QGaussLobatto<1> line_support_points; | |
/** | |
* For the fast tensor-product path of the MappingQ class, we choose SIMD | |
* vectors that are as wide as possible to minimize the number of | |
* arithmetic operations. However, we do not want to choose it wider than | |
* necessary, e.g., we avoid something like 8-wide AVX-512 when we only | |
* compute 3 components of a 3d computation. This is because the | |
* additional lanes would not do useful work, but a few operations on very | |
* wide vectors can already lead to a lower clock frequency of processors | |
* over long time spans (thousands of clock cycles). Hence, we choose | |
* 2-wide SIMD for 1D and 2d and 4-wide SIMD for 3d. Note that we do not | |
* immediately fall back to no SIMD for 1d because all architectures that | |
* support SIMD also support 128-bit vectors (and none is reported to | |
* reduce clock frequency for 128-bit SIMD). | |
*/ | |
using VectorizedArrayType = | |
VectorizedArray<double, | |
std::min<std::size_t>(VectorizedArray<double>::size(), | |
(dim <= 2 ? 2 : 4))>; | |
/** | |
* In case the quadrature rule given represents a tensor product | |
* we need to store the evaluations of the 1d polynomials at | |
* the 1d quadrature points. That is what this variable is for. | |
*/ | |
internal::MatrixFreeFunctions::ShapeInfo<VectorizedArrayType> shape_info; | |
/** | |
* In case the quadrature rule given represents a tensor product | |
* we need to store temporary data in this object. | |
*/ | |
mutable AlignedVector<VectorizedArrayType> scratch; | |
/** | |
* Indicates whether the given Quadrature object is a tensor product. | |
*/ | |
bool tensor_product_quadrature; | |
/** | |
* Tensors of covariant transformation at each of the quadrature points. | |
* The matrix stored is the Jacobian * G^{-1}, where G = Jacobian^{t} * | |
* Jacobian, is the first fundamental form of the map; if dim=spacedim | |
* then it reduces to the transpose of the inverse of the Jacobian matrix, | |
* which itself is stored in the @p contravariant field of this structure. | |
* | |
* Computed on each cell. | |
*/ | |
mutable AlignedVector<DerivativeForm<1, dim, spacedim>> covariant; | |
/** | |
* Tensors of contravariant transformation at each of the quadrature | |
* points. The contravariant matrix is the Jacobian of the transformation, | |
* i.e. $J_{ij}=dx_i/d\hat x_j$. | |
* | |
* Computed on each cell. | |
*/ | |
mutable AlignedVector<DerivativeForm<1, dim, spacedim>> contravariant; | |
/** | |
* Auxiliary vectors for internal use. | |
*/ | |
mutable std::vector<AlignedVector<Tensor<1, spacedim>>> aux; | |
/** | |
* Stores the support points of the mapping shape functions on the @p | |
* cell_of_current_support_points. | |
*/ | |
mutable std::vector<Point<spacedim>> mapping_support_points; | |
/** | |
* Stores the cell of which the @p mapping_support_points are stored. | |
*/ | |
mutable typename Triangulation<dim, spacedim>::cell_iterator | |
cell_of_current_support_points; | |
/** | |
* The determinant of the Jacobian in each quadrature point. Filled if | |
* #update_volume_elements. | |
*/ | |
mutable AlignedVector<double> volume_elements; |
It only stores the shape functions evaluated (used here
dealii/include/deal.II/fe/mapping_q_internal.h
Lines 1304 to 1314 in 6288d4d
if (update_flags & update_quadrature_points) | |
for (unsigned int point = 0; point < quadrature_points.size(); ++point) | |
{ | |
const double * shape = &data.shape(point + data_set, 0); | |
Point<spacedim> result = | |
(shape[0] * data.mapping_support_points[0]); | |
for (unsigned int k = 1; k < data.n_shape_functions; ++k) | |
for (unsigned int i = 0; i < spacedim; ++i) | |
result[i] += shape[k] * data.mapping_support_points[k][i]; | |
quadrature_points[point] = result; | |
} |
Note: this reminds me of horrible times when simplex/wedges/pyramids had to be made working for mapping. That the face quadratures are stored in a single vector and you need to use the offsets took me some time....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this comment, I think we should also remove the variables
dealii/include/deal.II/fe/mapping_q.h
Lines 394 to 414 in 6288d4d
/** | |
* Values of shape functions. Access by function @p shape. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<double> shape_values; | |
/** | |
* Values of shape function derivatives. Access by function @p derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<1, dim>> shape_derivatives; | |
/** | |
* Values of shape function second derivatives. Access by function @p | |
* second_derivative. | |
* | |
* Computed once. | |
*/ | |
AlignedVector<Tensor<2, dim>> shape_second_derivatives; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kronbichler Sounds good 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using the offset now to select the correct quadrature points. Unfortunately, we cannot remove shape_values
and shape_derivatives
yet because the codim1 case uses this information. This is something for a follow-up.
This is unfortunately still not correct for the subfaces, but I will look into this probably tomorrow. |
QProjector<dim>::project_to_oriented_face( | ||
ReferenceCells::get_hypercube<dim>(), | ||
quadrature, | ||
face_no, | ||
cell->face_orientation(face_no), | ||
cell->face_flip(face_no), | ||
cell->face_rotation(face_no)) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In agreement with the comment by @peterrum, I think we should not recompute the points here, but rather use the ones that get tabulated in the constructor of FEFaceValues
via project_to_all_faces
and project_to_all_subfaces
. This is the information we should "tabulate" instead of the evaluated shape functions. We might already want to do this in vectorized form to reduce the online cost, but that might be an improvement done later, because it is orthogonal to the present PR. Then we can skip the new code in QProjector
and instead use the data_set
variable as before - I believe it should then automatically work for all subfaces on adaptive meshes. Does that make sense to you @bergbauer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense if you have the same (non-tensor-product) quadrature on all faces of a cell which is not the case in my use case (where I have a different quadrature on every face and I (currently) call get_face_data()
and fill_fe_face_values()
for every face, see
dealii/include/deal.II/non_matching/mapping_info.h
Lines 896 to 904 in 24f2482
auto internal_mapping_data = | |
mapping->get_face_data(update_flags_mapping, | |
hp::QCollection<dim - 1>(quadrature)); | |
mapping->fill_fe_face_values(cell, | |
face_no, | |
hp::QCollection<dim - 1>(quadrature), | |
*internal_mapping_data, | |
mapping_data); |
FEFaceValues
tabulating the quadrature points makes sense so we should consider both options in this PR. Thanks for the hint!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to do the projection within MappingInfo
. Exploiting the fact that we have to do an interpolation within the face can be tackled later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now use the tabulated quadrature information for the faces like you suggested. I would suggest to keep the flexible functions in QProjector
which do the face orientation on the fly to use it like proposed in 72c6f6c .
Absolutely! @peterrum Quite easy for cells and nasty for the (sub-)faces. |
I have also applied changes to use the optimized functions to |
@bergbauer There are two failing tests:
|
source/fe/mapping_q.cc
Outdated
MappingQ<dim, spacedim>::fill_mapping_data_for_face_quadrature( | ||
const typename Triangulation<dim, spacedim>::cell_iterator &cell, | ||
const unsigned int face_no, | ||
const Quadrature<dim - 1> & face_quadrature, | ||
const UpdateFlags update_flags, | ||
internal::FEValuesImplementation::MappingRelatedData<dim, spacedim> | ||
&output_data) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you don't create a InternalData
within MappingInfo and fill this with the points? Are there particular reasons against this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that but the call to compute_mapping_support_points()
and internal::MappingQImplementation::do_fill_fe_face_values()
(currently) need to stay in MappingQ
because they are protected or need protected arguments. (I can make things public if we choose to do so)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that but the call to compute_mapping_support_points() and internal::MappingQImplementation::do_fill_fe_face_values() (currently) need to stay in MappingQ because they are protected or need protected arguments. (I can make things public if we choose to do so)
I don't understand what you mean...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InternalData
is now filled inside MappingInfo
and only the function call to the internal MappingQ function is realized in fill_mapping_data_for_face_quadrature()
. This function is protected similarly to the fill_fe_face_values()
function. fill_mapping_data_for_face_quadrature()
does the function call with the protected arguments from MappingQ and calls the internal function do_fill_fe_face_values()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I mostly have some minor comments. As a bigger question, I'm wondering about the code with the subfaces and the memory allocations for embedding the quadrature formula: Do we envision that this will be needed in the long run? I'm thinking of the possibility to use the tensor product to embed all faces/subfaces/orientations first (using existing evaluation capabilities of evaluation_kernels.h
), and only then use the unstructured evaluation of polynomials? I guess this is mostly orthogonal to this PR, but it seems that it might be worthwhile to think about what is less work here.
// compute shapes and derivatives for codim1 (for | ||
// do_transform_real_to_unit_cell_internal_codim1) | ||
if (dim == (spacedim - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess at some point we might want to replace those functions, but that is for another PR.
a101c96
to
6eb5ed1
Compare
2bd95b3
to
0d7fa7c
Compare
|
const Quadrature<2> mutation = internal::QProjector::mutate_quadrature( | ||
quadrature, face_orientation, face_flip, face_rotation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an assert that this only working for hypercubes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
const Quadrature<2> mutation = internal::QProjector::mutate_quadrature( | ||
quadrature, face_orientation, face_flip, face_rotation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
0d7fa7c
to
fec2c56
Compare
It got quite late yesterday evening 😄 It is public now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, apart from a few small pages.
fec2c56
to
e45d077
Compare
e45d077
to
12038f7
Compare
This PR proposes to use the vectorized and optimized functionality from
tensor_product_kernels.h
viainternal::evaluate_tensor_product_value_and_gradient()
to compute quadrature points and Jacobians forMappingQ
.Depends on #15102