Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce FEFacePointEvaluation with optimized FCL path #16359

Merged
merged 5 commits into from Jan 26, 2024

Conversation

bergbauer
Copy link
Contributor

This PR is unfortunately quite big because every change is connected.

Major changes:

  • introduces a new class FEFacePointEvaluation separating cell and face evaluation, mirroring the structure from FEEvaluation/FEFaceEvaluation.
  • a new function NonMatching::MappingInfo::reinit_faces() which stores face quadrature points and mapping data in a face-centric manner to be used together with the corresponding reinit() function of FEFacePointEvaluation
  • restructuring of evaluation_template_factory.h to be able to use interpolation from and to face DoFs
  • expanding the interface of FEFaceEvaluation to be able to work on the face DoFs
  • new functions for in-face interpolation in FEFaceEvaluation and FEFacePointEvaluation

The two new tests show how FEFacePointEvaluation should be used efficiently together with FEFaceEvaluation.

FYI @kronbichler @peterrum @jh66637 @mschreter @ritthaler

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First set of comments.

Comment on lines +278 to +281
FEFacePointEvaluation<1, dim, dim, double> fe_peval_m(mapping_info_faces,
fe,
true);
FEFacePointEvaluation<1, dim, dim, double> fe_peval_p(mapping_info_faces,
fe,
false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add somewhere an assert that this is not threadsafe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would you place the assert? Do we have a similar assert in FEEvaluation?

Comment on lines +2607 to +2620
/**
* Projects the values, the gradients, and the Hessians into the face DoFs of
* the current face using the internally stored cell DoFs.
*/
void
project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag);

/**
* Projects the values, the gradients, and the Hessians into the face DoFs of
* the current face using the cell DoFs provided via `values_array`.
*/
void
project_to_face(const VectorizedArrayType *values_array,
const EvaluationFlags::EvaluationFlags evaluation_flag);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there two versions of this function but only one of evaluate_in_face()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to evaluate it makes sense to offer the option to pass in the cell DoFs the user provides, for evaluate_in_face the face DoFs are at internal fields (where project_to_face has written them to).

Comment on lines +2607 to +2627
/**
* Projects the values, the gradients, and the Hessians into the face DoFs of
* the current face using the internally stored cell DoFs.
*/
void
project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag);

/**
* Projects the values, the gradients, and the Hessians into the face DoFs of
* the current face using the cell DoFs provided via `values_array`.
*/
void
project_to_face(const VectorizedArrayType *values_array,
const EvaluationFlags::EvaluationFlags evaluation_flag);

/**
* Evaluates the values, the gradients, and the Hessians in-face,
* interpolating into the face quadrature points.
*/
void
evaluate_in_face(const EvaluationFlags::EvaluationFlags evaluation_flag);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark these functions as internal? They will never work for simplices, will they? Most of the interface of FEEvaluation works for arbitrary cell types and the check for cell type is performed internally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the basis functions. I am not the simplex expert but I could imagine that this works for the values but not for the gradients in general (I only looked at the linear basis). Where can I find the implemented basis functions for FE_SimplexP?

Comment on lines +2608 to +2609
* Projects the values, the gradients, and the Hessians into the face DoFs of
* the current face using the internally stored cell DoFs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it is not clear in which order the data (components/values/gradients/...) are stored. Probably, we should not document it to be able to make changes later on. See also:

* @note How this data is internally represented is not of importance (and not
* exposed on purpose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FEFacePointEvaluation knows how to access it.

@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim,
* static_dofs_per_component, but the number depends on the actual element
* selected and is thus not static.
*/
const unsigned int dofs_per_component;
const unsigned int dofs_per_component_on_cell;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, we need to deprecate the old variable first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the name in FEEvaluation? Should we rename that one as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, dofs_per_component(this->data->dofs_per_component_on_cell)
, dofs_per_cell(this->data->dofs_per_component_on_cell * n_components_)

In FEEvaluation it is fine, because we only have cell dofs there. Consistent would be dofs_per_component_on_cell in both classes (and deprecation of the old name).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dofs_per_component_on_cell is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:

We have the two names you suggest here in ShapeInfo; in my opinion, the name dofs_per_component_on_face in ShapeInfo is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo. Overall, the dofs_per_component variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.

Comment on lines 1682 to 1687
: FEPointEvaluationBase<n_components_, dim, spacedim, Number>(
mapping,
fe,
update_flags,
first_selected_component)
{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation.

*/
template <int n_components_,
int dim,
int spacedim = dim,
typename Number = double>
class FEPointEvaluation
class FEPointEvaluationBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could introduce the base class in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be possible, I can do that if you want me to. It only makes sense after we have FEFacePointEvaluation. Do you mean without new FCL stuff but bringing the ECL path to FEFacePointEvaluation only?

Comment on lines +1754 to +1772
if (shape_info.element_type == MatrixFreeFunctions::tensor_none)
return evaluate_tensor_none(n_components,
evaluation_flag,
values_dofs,
fe_eval);
else
return evaluate_tensor<fe_degree, n_q_points_1d>(n_components,
evaluation_flag,
values_dofs,
fe_eval);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the split up needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is much clearer to me this way concerning templates and control flow.

const EvaluationFlags::EvaluationFlags evaluation_flag,
const Number *values_dofs,
FEEvaluationData<dim, Number, true> &fe_eval)
evaluate_tensor_none(const unsigned int n_components,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this private but this is internal anyway. Would you be tempted to use it in one of your codes otherwise? 😉


template <int fe_degree, int n_q_points_1d>
static bool
evaluate_tensor(const unsigned int n_components,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private?

@jh66637
Copy link
Contributor

jh66637 commented Jan 3, 2024

@bergbauer I tested this in #16299. I think something is not working for first_selected_component!=0. I am not sure but I think in do_integrate_in_face() first_selected_component is not considered?

Anyway, I like the new interface :)

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few random comments. I will post a larger comment to the overall structure in a separate note soon.

@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim,
* static_dofs_per_component, but the number depends on the actual element
* selected and is thus not static.
*/
const unsigned int dofs_per_component;
const unsigned int dofs_per_component_on_cell;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dofs_per_component_on_cell is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:

We have the two names you suggest here in ShapeInfo; in my opinion, the name dofs_per_component_on_face in ShapeInfo is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo. Overall, the dofs_per_component variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.

Comment on lines 2774 to 2782
const unsigned int dofs_per_component_on_face;

/**
* The number of degrees of freedom on the cell accumulated over all
* components in the current evaluation object. Usually close to
* static_dofs_per_cell = static_dofs_per_component*n_components, but the
* number depends on the actual element selected and is thus not static.
*/
const unsigned int dofs_per_face;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not choose these names, and choose something more specific to indicate how to interpret this number and avoid confusion with FiniteElement::dofs_per_face which has a different meaning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a second note, I think we should work towards having a smaller footprint in these FEEvaluation/FEFaceEvaluation classes in terms of the numbers of member variables, because the initialization overhead is already quite substantial. I would prefer if we would expose certain, less-used functionality through accessor functions in terms of this->data->....

Comment on lines 2085 to 2090
typename Number2,
typename Number,
int n_values = 1,
bool do_renumber = true>
bool do_renumber = true,
int stride = 1>
inline
#ifndef DEBUG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me that we should try to move the single-point evaluation facility, i.e., everything from line 2021 and onward in tensor_product_kernels.h to a separate file tensor_product_point_evaluate.h (or something similar), to more clearly document the different intents in the two cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. Should we split it before or after this PR?

@kronbichler
Copy link
Member

Can you quantify the implications of the suggested changes in the evaluation_template_face_factory class? By quantify I mean whether there are measurable changes in the compile time of the instantiations of that function and in the size of these files. I fear that creating separate instantiations for the project_to_face and evaluate_in_face functions could lead to considerable increases of the compiled library, because we essentially double the number of exposed functions. One could hope that a clever compiler might realize that the function body, doing the actual work inside the factory, are the same, so they should not get inlined and thus the now two places would call the same function. In either case, substantial cases like the present one might motivate the compiler to make different tradeoffs in terms of inlining and re-use, so one should evaluate some benchmark with face integrals and see the effect. On the positive side, I like that the tensor_none case has been factored out, because that one does not need the template, so that part of the code might actually lead to less code. An alternative might be to distinguish the cases project_to_face, evaluate_in_face and the overall evaluate with boolean parameters at run time. It might sound a bit counter-intuitive that run-time parameters might be preferable over compile-time switches as proposed in this PR, but without measuring I would not be sure which one to prefer.

I realize that the request to look at a benchmark might sound overly pessimistic and increase the load on you who already did a great job with all the changes in the pull request, but my concern is that these evaluator classes are sensitive in terms of the achieved performance, and being careful we destroy the favorable properties we enjoyed previously. I appreciate that we can extend the usability and I am very positive to eventually get things done, but we should also avoid repeating the mistakes from the past where too many things have been integrated in FEEvaluation and friends without benchmarking, so I later had to spend a lot of my own time to re-organize many things to restore at least some of the performance. Therefore, I would like to encourage us to take care now and discuss the preferred alterantive. Of course, another option we have is to accept some performance loss for generality in cases like this one (with the aim to not generate too much code), if we also work towards a leaner interface for the often-executed parts that is separate/more specialized.

On the positive side, I think splitting off the face evaluation part into FEFacePointEvaluation makes sense, because it allows to more clearly separate the use cases and have an implicit documentation by the new class. However, I did not see an up-to-date documentation and explanation of where the new class would be used (look at the class-level documentation, I think it is . I think the document is quite crucial, because one might be tempted to believe that FEPointEvaluation can cover all cases (which it does from a functional point of view), and the use case is primarily from a performance/optimization point of view, because the projection onto the face reduces the complexity of the operators, when combined with FEFaceEvaluation.

@bergbauer
Copy link
Contributor Author

Can you quantify the implications of the suggested changes in the evaluation_template_face_factory class?

Surprisingly the library size gets marginally smaller with this patch 358616 -> 358548 for release and 1288348 -> 1286472 for debug.
The compilation times seemed pretty equal to me.

but my concern is that these evaluator classes are sensitive in terms of the achieved performance, and being careful we destroy the favorable properties we enjoyed previously.

I absolutely understand your concerns! Let me run some benchmarks, then we can have a look at the numbers.

@kronbichler
Copy link
Member

Great, given the slightly reduced library size and compilation times I think we can move forward with this patch, thank you for measuring. I think we can postpone the step that splits tensor_product_kernels.h into two files, but it would be great if you could adjust the names of the variables in FEFaceEvaluation for now.

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Do you want to squash together some of the commits to make them self-contained?

@bergbauer
Copy link
Contributor Author

Do you want to squash together some of the commits to make them self-contained?

Sure.

@kronbichler I see a performance degradation of 13% for linear polynomials in 2D and 8% for linear and 3D. If we help the compiler to inline the high-level functions (like it was before, see last commit) which call templated functions we maintain the performance of master (except for linear and 2D, 5%) at the cost of a minimal increase in library size (-> 359056).

@kronbichler
Copy link
Member

Thanks for checking. The remaining difference is likely because the compiler is then not inlining the functions called within evaluate_in_face, project_to_face, etc., in order to keep a reasonable code size. I think we should limit the always inlining to release mode, as we do in

template <typename Number, typename Number2>
#ifndef DEBUG
DEAL_II_ALWAYS_INLINE
#endif
static void
do_forward(const unsigned int n_components,
or in
template <unsigned int direction, unsigned int d, bool skip_borders>
static inline DEAL_II_ALWAYS_INLINE_RELEASE void
interpolate_3D_face(
via
#ifdef DEBUG
# define DEAL_II_ALWAYS_INLINE_RELEASE
#else
# define DEAL_II_ALWAYS_INLINE_RELEASE DEAL_II_ALWAYS_INLINE
#endif
This should further limit the code size growth.

@kronbichler
Copy link
Member

/rebuild

@bergbauer
Copy link
Contributor Author

The remaining difference is likely because the compiler is then not inlining the functions called within evaluate_in_face, project_to_face, etc., in order to keep a reasonable code size.

That's exactly what the compiler is doing. It does not matter for anything more expensive than 2D p=1, 2D p=2 even gets sightly faster with the inline macros from the last commit. I think we have to see what the performance tests say now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants