Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize mapping computation in MappingQ #15035

Merged
merged 1 commit into from Apr 18, 2023

Conversation

bergbauer
Copy link
Contributor

@bergbauer bergbauer commented Apr 6, 2023

This PR proposes to use the vectorized and optimized functionality from tensor_product_kernels.h via internal::evaluate_tensor_product_value_and_gradient() to compute quadrature points and Jacobians for MappingQ.

Depends on #15102

@kronbichler
Copy link
Member

/rebuild

@bergbauer
Copy link
Contributor Author

I have to fix the test failures before merging.

@kronbichler
Copy link
Member

Apart from the failing tests, I think this looks great. Can you please adjust the following if statement:

// Only fill the big arrays on demand in case we cannot use the tensor
// product quadrature code path
if (dim == 1 || !tensor_product_quadrature || needs_higher_order_terms)

This is the main problem observed in #9867 - if we can avoid filling all these arrays, we gain a lot. We should only use that code for the higher derivatives, and even that part can be removed once we write some more higher order derivative evaluators. I think the code should simply skip the !tensor_product_quadrature case here. The test suite should then inform us if there is anything missing.

I consider the ability to skip these arrays a big progress, thanks for the initiative @bergbauer.

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me shortly summarize the PR. The content of MappingQ::fill_mapping_data_for_generic_points() is moved to maybe_update_q_points_Jacobians_generic(). The new function is supposed to be used in the other fill_* functions (non-tensorproduct path). Is this right?

include/deal.II/fe/mapping_q_internal.h Show resolved Hide resolved
include/deal.II/fe/mapping_q_internal.h Show resolved Hide resolved
Comment on lines 2069 to 2070
auto quadrature_dim = QProjector<dim>::project_to_face(
ReferenceCells::get_hypercube<dim>(), quadrature, face_no);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new!? Where is subface_no used!?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think this is right. You are not using anywhere data_set (and data). Latter contains the actual support points. See

data.initialize_face(this->requires_update_flags(update_flags),
QProjector<dim>::project_to_all_faces(
ReferenceCells::get_hypercube<dim>(), quadrature[0]),
quadrature[0].size());

initialize(update_flags, q, n_original_q_points);

OK. It does not:

MappingQ<dim, spacedim>::InternalData::initialize(

/**
* Values of shape functions. Access by function @p shape.
*
* Computed once.
*/
AlignedVector<double> shape_values;
/**
* Values of shape function derivatives. Access by function @p derivative.
*
* Computed once.
*/
AlignedVector<Tensor<1, dim>> shape_derivatives;
/**
* Values of shape function second derivatives. Access by function @p
* second_derivative.
*
* Computed once.
*/
AlignedVector<Tensor<2, dim>> shape_second_derivatives;
/**
* Values of shape function third derivatives. Access by function @p
* second_derivative.
*
* Computed once.
*/
AlignedVector<Tensor<3, dim>> shape_third_derivatives;
/**
* Values of shape function fourth derivatives. Access by function @p
* second_derivative.
*
* Computed once.
*/
AlignedVector<Tensor<4, dim>> shape_fourth_derivatives;
/**
* Unit tangential vectors. Used for the computation of boundary forms and
* normal vectors.
*
* This array has `(dim-1) * GeometryInfo::faces_per_cell` entries. The
* first GeometryInfo::faces_per_cell contain the vectors in the first
* tangential direction for each face; the second set of
* GeometryInfo::faces_per_cell entries contain the vectors in the second
* tangential direction (only in 3d, since there we have 2 tangential
* directions per face), etc.
*
* Filled once.
*/
std::array<std::vector<Tensor<1, dim>>,
GeometryInfo<dim>::faces_per_cell *(dim - 1)>
unit_tangentials;
/**
* The polynomial degree of the mapping. Since the objects here are also
* used (with minor adjustments) by MappingQ, we need to store this.
*/
const unsigned int polynomial_degree;
/**
* Number of shape functions. If this is a Q1 mapping, then it is simply
* the number of vertices per cell. However, since also derived classes
* use this class (e.g. the Mapping_Q() class), the number of shape
* functions may also be different.
*
* In general, it is $(p+1)^\text{dim}$, where $p$ is the polynomial
* degree of the mapping.
*/
const unsigned int n_shape_functions;
/*
* The default line support points. Is used in when the shape function
* values are computed.
*
* The number of quadrature points depends on the degree of this
* class, and it matches the number of degrees of freedom of an
* FE_Q<1>(this->degree).
*/
QGaussLobatto<1> line_support_points;
/**
* For the fast tensor-product path of the MappingQ class, we choose SIMD
* vectors that are as wide as possible to minimize the number of
* arithmetic operations. However, we do not want to choose it wider than
* necessary, e.g., we avoid something like 8-wide AVX-512 when we only
* compute 3 components of a 3d computation. This is because the
* additional lanes would not do useful work, but a few operations on very
* wide vectors can already lead to a lower clock frequency of processors
* over long time spans (thousands of clock cycles). Hence, we choose
* 2-wide SIMD for 1D and 2d and 4-wide SIMD for 3d. Note that we do not
* immediately fall back to no SIMD for 1d because all architectures that
* support SIMD also support 128-bit vectors (and none is reported to
* reduce clock frequency for 128-bit SIMD).
*/
using VectorizedArrayType =
VectorizedArray<double,
std::min<std::size_t>(VectorizedArray<double>::size(),
(dim <= 2 ? 2 : 4))>;
/**
* In case the quadrature rule given represents a tensor product
* we need to store the evaluations of the 1d polynomials at
* the 1d quadrature points. That is what this variable is for.
*/
internal::MatrixFreeFunctions::ShapeInfo<VectorizedArrayType> shape_info;
/**
* In case the quadrature rule given represents a tensor product
* we need to store temporary data in this object.
*/
mutable AlignedVector<VectorizedArrayType> scratch;
/**
* Indicates whether the given Quadrature object is a tensor product.
*/
bool tensor_product_quadrature;
/**
* Tensors of covariant transformation at each of the quadrature points.
* The matrix stored is the Jacobian * G^{-1}, where G = Jacobian^{t} *
* Jacobian, is the first fundamental form of the map; if dim=spacedim
* then it reduces to the transpose of the inverse of the Jacobian matrix,
* which itself is stored in the @p contravariant field of this structure.
*
* Computed on each cell.
*/
mutable AlignedVector<DerivativeForm<1, dim, spacedim>> covariant;
/**
* Tensors of contravariant transformation at each of the quadrature
* points. The contravariant matrix is the Jacobian of the transformation,
* i.e. $J_{ij}=dx_i/d\hat x_j$.
*
* Computed on each cell.
*/
mutable AlignedVector<DerivativeForm<1, dim, spacedim>> contravariant;
/**
* Auxiliary vectors for internal use.
*/
mutable std::vector<AlignedVector<Tensor<1, spacedim>>> aux;
/**
* Stores the support points of the mapping shape functions on the @p
* cell_of_current_support_points.
*/
mutable std::vector<Point<spacedim>> mapping_support_points;
/**
* Stores the cell of which the @p mapping_support_points are stored.
*/
mutable typename Triangulation<dim, spacedim>::cell_iterator
cell_of_current_support_points;
/**
* The determinant of the Jacobian in each quadrature point. Filled if
* #update_volume_elements.
*/
mutable AlignedVector<double> volume_elements;

It only stores the shape functions evaluated (used here

if (update_flags & update_quadrature_points)
for (unsigned int point = 0; point < quadrature_points.size(); ++point)
{
const double * shape = &data.shape(point + data_set, 0);
Point<spacedim> result =
(shape[0] * data.mapping_support_points[0]);
for (unsigned int k = 1; k < data.n_shape_functions; ++k)
for (unsigned int i = 0; i < spacedim; ++i)
result[i] += shape[k] * data.mapping_support_points[k][i];
quadrature_points[point] = result;
}
; but you need the offset!!!). But adding the quadrature points should not be a problem.

Note: this reminds me of horrible times when simplex/wedges/pyramids had to be made working for mapping. That the face quadratures are stored in a single vector and you need to use the offsets took me some time....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this comment, I think we should also remove the variables

/**
* Values of shape functions. Access by function @p shape.
*
* Computed once.
*/
AlignedVector<double> shape_values;
/**
* Values of shape function derivatives. Access by function @p derivative.
*
* Computed once.
*/
AlignedVector<Tensor<1, dim>> shape_derivatives;
/**
* Values of shape function second derivatives. Access by function @p
* second_derivative.
*
* Computed once.
*/
AlignedVector<Tensor<2, dim>> shape_second_derivatives;
to ensure that we really do not use the values, first or second derivatives in a tabulated form any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kronbichler Sounds good 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the offset now to select the correct quadrature points. Unfortunately, we cannot remove shape_values and shape_derivatives yet because the codim1 case uses this information. This is something for a follow-up.

@bergbauer
Copy link
Contributor Author

This is unfortunately still not correct for the subfaces, but I will look into this probably tomorrow.

Comment on lines 2071 to 2077
QProjector<dim>::project_to_oriented_face(
ReferenceCells::get_hypercube<dim>(),
quadrature,
face_no,
cell->face_orientation(face_no),
cell->face_flip(face_no),
cell->face_rotation(face_no)) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In agreement with the comment by @peterrum, I think we should not recompute the points here, but rather use the ones that get tabulated in the constructor of FEFaceValues via project_to_all_faces and project_to_all_subfaces. This is the information we should "tabulate" instead of the evaluated shape functions. We might already want to do this in vectorized form to reduce the online cost, but that might be an improvement done later, because it is orthogonal to the present PR. Then we can skip the new code in QProjector and instead use the data_set variable as before - I believe it should then automatically work for all subfaces on adaptive meshes. Does that make sense to you @bergbauer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense if you have the same (non-tensor-product) quadrature on all faces of a cell which is not the case in my use case (where I have a different quadrature on every face and I (currently) call get_face_data() and fill_fe_face_values() for every face, see

auto internal_mapping_data =
mapping->get_face_data(update_flags_mapping,
hp::QCollection<dim - 1>(quadrature));
mapping->fill_fe_face_values(cell,
face_no,
hp::QCollection<dim - 1>(quadrature),
*internal_mapping_data,
mapping_data);
). For FEFaceValues tabulating the quadrature points makes sense so we should consider both options in this PR. Thanks for the hint!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to do the projection within MappingInfo. Exploiting the fact that we have to do an interpolation within the face can be tackled later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now use the tabulated quadrature information for the faces like you suggested. I would suggest to keep the flexible functions in QProjector which do the face orientation on the fly to use it like proposed in 72c6f6c .

@bergbauer
Copy link
Contributor Author

Let me shortly summarize the PR. The content of MappingQ::fill_mapping_data_for_generic_points() is moved to maybe_update_q_points_Jacobians_generic(). The new function is supposed to be used in the other fill_* functions (non-tensorproduct path). Is this right?

Absolutely! @peterrum Quite easy for cells and nasty for the (sub-)faces.

@bergbauer
Copy link
Contributor Author

I have also applied changes to use the optimized functions to NonMatching::MappingInfo (that you see where my motivation comes from). This is ready for review now. @peterrum @kronbichler

@peterrum
Copy link
Member

@bergbauer There are two failing tests:

projectroot.tests.numerics.generalized_interpolation_03debug.numerics/generalized_interpolation_03.debug
projectroot.tests.numerics.generalized_interpolation_02debug.numerics/generalized_interpolation_02.debug

Comment on lines 1416 to 1422
MappingQ<dim, spacedim>::fill_mapping_data_for_face_quadrature(
const typename Triangulation<dim, spacedim>::cell_iterator &cell,
const unsigned int face_no,
const Quadrature<dim - 1> & face_quadrature,
const UpdateFlags update_flags,
internal::FEValuesImplementation::MappingRelatedData<dim, spacedim>
&output_data) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you don't create a InternalData within MappingInfo and fill this with the points? Are there particular reasons against this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that but the call to compute_mapping_support_points() and internal::MappingQImplementation::do_fill_fe_face_values() (currently) need to stay in MappingQ because they are protected or need protected arguments. (I can make things public if we choose to do so)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that but the call to compute_mapping_support_points() and internal::MappingQImplementation::do_fill_fe_face_values() (currently) need to stay in MappingQ because they are protected or need protected arguments. (I can make things public if we choose to do so)

I don't understand what you mean...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternalData is now filled inside MappingInfo and only the function call to the internal MappingQ function is realized in fill_mapping_data_for_face_quadrature(). This function is protected similarly to the fill_fe_face_values() function. fill_mapping_data_for_face_quadrature() does the function call with the protected arguments from MappingQ and calls the internal function do_fill_fe_face_values().

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I mostly have some minor comments. As a bigger question, I'm wondering about the code with the subfaces and the memory allocations for embedding the quadrature formula: Do we envision that this will be needed in the long run? I'm thinking of the possibility to use the tensor product to embed all faces/subfaces/orientations first (using existing evaluation capabilities of evaluation_kernels.h), and only then use the unstructured evaluation of polynomials? I guess this is mostly orthogonal to this PR, but it seems that it might be worthwhile to think about what is less work here.

include/deal.II/fe/mapping_q.h Outdated Show resolved Hide resolved
include/deal.II/fe/mapping_q_internal.h Show resolved Hide resolved
include/deal.II/non_matching/mapping_info.h Outdated Show resolved Hide resolved
Comment on lines +180 to +182
// compute shapes and derivatives for codim1 (for
// do_transform_real_to_unit_cell_internal_codim1)
if (dim == (spacedim - 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at some point we might want to replace those functions, but that is for another PR.

@bergbauer bergbauer force-pushed the optimize_mapping_computation branch from 2bd95b3 to 0d7fa7c Compare April 17, 2023 22:49
@peterrum
Copy link
Member

fill_mapping_data_for_face_quadrature seems to be protected!?

Comment on lines +476 to +466
const Quadrature<2> mutation = internal::QProjector::mutate_quadrature(
quadrature, face_orientation, face_flip, face_rotation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an assert that this only working for hypercubes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines +707 to +699
const Quadrature<2> mutation = internal::QProjector::mutate_quadrature(
quadrature, face_orientation, face_flip, face_rotation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@bergbauer bergbauer force-pushed the optimize_mapping_computation branch from 0d7fa7c to fec2c56 Compare April 18, 2023 08:16
@bergbauer
Copy link
Contributor Author

fill_mapping_data_for_face_quadrature seems to be protected!?

It got quite late yesterday evening 😄 It is public now.

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, apart from a few small pages.

include/deal.II/non_matching/mapping_info.h Outdated Show resolved Hide resolved
source/base/qprojector.cc Outdated Show resolved Hide resolved
source/base/qprojector.cc Show resolved Hide resolved
@bergbauer bergbauer force-pushed the optimize_mapping_computation branch from fec2c56 to e45d077 Compare April 18, 2023 11:59
@bergbauer bergbauer force-pushed the optimize_mapping_computation branch from e45d077 to 12038f7 Compare April 18, 2023 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants