Introduce FEFacePointEvaluation with optimized FCL path #16359

bergbauer · 2023-12-16T21:29:00Z

This PR is unfortunately quite big because every change is connected.

Major changes:

introduces a new class FEFacePointEvaluation separating cell and face evaluation, mirroring the structure from FEEvaluation/FEFaceEvaluation.
a new function NonMatching::MappingInfo::reinit_faces() which stores face quadrature points and mapping data in a face-centric manner to be used together with the corresponding reinit() function of FEFacePointEvaluation
restructuring of evaluation_template_factory.h to be able to use interpolation from and to face DoFs
expanding the interface of FEFaceEvaluation to be able to work on the face DoFs
new functions for in-face interpolation in FEFaceEvaluation and FEFacePointEvaluation

The two new tests show how FEFacePointEvaluation should be used efficiently together with FEFaceEvaluation.

FYI @kronbichler @peterrum @jh66637 @mschreter @ritthaler

peterrum

First set of comments.

peterrum · 2023-12-18T13:43:41Z

tests/non_matching/mapping_info_04.cc

+  FEFacePointEvaluation<1, dim, dim, double> fe_peval_m(mapping_info_faces,
+                                                        fe,
+                                                        true);
+  FEFacePointEvaluation<1, dim, dim, double> fe_peval_p(mapping_info_faces,
+                                                        fe,
+                                                        false);


Can we add somewhere an assert that this is not threadsafe?

Where would you place the assert? Do we have a similar assert in FEEvaluation?

peterrum · 2023-12-18T13:50:02Z

include/deal.II/matrix_free/fe_evaluation.h

+  /**
+   * Projects the values, the gradients, and the Hessians into the face DoFs of
+   * the current face using the internally stored cell DoFs.
+   */
+  void
+  project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag);
+
+  /**
+   * Projects the values, the gradients, and the Hessians into the face DoFs of
+   * the current face using the cell DoFs provided via `values_array`.
+   */
+  void
+  project_to_face(const VectorizedArrayType             *values_array,
+                  const EvaluationFlags::EvaluationFlags evaluation_flag);


Why are there two versions of this function but only one of evaluate_in_face()?

Similar to evaluate it makes sense to offer the option to pass in the cell DoFs the user provides, for evaluate_in_face the face DoFs are at internal fields (where project_to_face has written them to).

peterrum · 2023-12-18T13:53:34Z

include/deal.II/matrix_free/fe_evaluation.h

+  /**
+   * Projects the values, the gradients, and the Hessians into the face DoFs of
+   * the current face using the internally stored cell DoFs.
+   */
+  void
+  project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag);
+
+  /**
+   * Projects the values, the gradients, and the Hessians into the face DoFs of
+   * the current face using the cell DoFs provided via `values_array`.
+   */
+  void
+  project_to_face(const VectorizedArrayType             *values_array,
+                  const EvaluationFlags::EvaluationFlags evaluation_flag);
+
+  /**
+   * Evaluates the values, the gradients, and the Hessians in-face,
+   * interpolating into the face quadrature points.
+   */
+  void
+  evaluate_in_face(const EvaluationFlags::EvaluationFlags evaluation_flag);


Can we mark these functions as internal? They will never work for simplices, will they? Most of the interface of FEEvaluation works for arbitrary cell types and the check for cell type is performed internally.

This depends on the basis functions. I am not the simplex expert but I could imagine that this works for the values but not for the gradients in general (I only looked at the linear basis). Where can I find the implemented basis functions for FE_SimplexP?

peterrum · 2023-12-18T13:56:43Z

include/deal.II/matrix_free/fe_evaluation.h

+   * Projects the values, the gradients, and the Hessians into the face DoFs of
+   * the current face using the internally stored cell DoFs.


Also it is not clear in which order the data (components/values/gradients/...) are stored. Probably, we should not document it to be able to make changes later on. See also:

dealii/include/deal.II/grid/cell_id.h

Lines 68 to 69 in 5545c01

* @note How this data is internally represented is not of importance (and not

* exposed on purpose).

FEFacePointEvaluation knows how to access it.

peterrum · 2023-12-18T13:58:48Z

include/deal.II/matrix_free/fe_evaluation.h

@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim,
   * static_dofs_per_component, but the number depends on the actual element
   * selected and is thus not static.
   */
-  const unsigned int dofs_per_component;
+  const unsigned int dofs_per_component_on_cell;


As discussed in person, we need to deprecate the old variable first.

What is the name in FEEvaluation? Should we rename that one as well?

dealii/include/deal.II/matrix_free/fe_evaluation.h

Lines 7236 to 7237 in 4944495

, dofs_per_component(this->data->dofs_per_component_on_cell)

, dofs_per_cell(this->data->dofs_per_component_on_cell * n_components_)

In FEEvaluation it is fine, because we only have cell dofs there. Consistent would be dofs_per_component_on_cell in both classes (and deprecation of the old name).

dofs_per_component_on_cell is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:

We have the two names you suggest here in ShapeInfo; in my opinion, the name dofs_per_component_on_face in ShapeInfo is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo. Overall, the dofs_per_component variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.

peterrum · 2023-12-18T14:33:11Z

include/deal.II/matrix_free/fe_point_evaluation.h

+    : FEPointEvaluationBase<n_components_, dim, spacedim, Number>(
+        mapping,
+        fe,
+        update_flags,
+        first_selected_component)
+  {}


Implementation.

peterrum · 2023-12-18T14:35:47Z

include/deal.II/matrix_free/fe_point_evaluation.h

 */
 template <int n_components_,
          int dim,
          int spacedim    = dim,
          typename Number = double>
-class FEPointEvaluation
+class FEPointEvaluationBase


I think you could introduce the base class in a separate PR.

Should be possible, I can do that if you want me to. It only makes sense after we have FEFacePointEvaluation. Do you mean without new FCL stuff but bringing the ECL path to FEFacePointEvaluation only?

peterrum · 2023-12-18T14:37:28Z

include/deal.II/matrix_free/evaluation_kernels_face.h

+      if (shape_info.element_type == MatrixFreeFunctions::tensor_none)
+        return evaluate_tensor_none(n_components,
+                                    evaluation_flag,
+                                    values_dofs,
+                                    fe_eval);
+      else
+        return evaluate_tensor<fe_degree, n_q_points_1d>(n_components,
+                                                         evaluation_flag,
+                                                         values_dofs,
+                                                         fe_eval);


Why is the split up needed?

The code is much clearer to me this way concerning templates and control flow.

peterrum · 2023-12-18T14:38:35Z

include/deal.II/matrix_free/evaluation_kernels_face.h

-        const EvaluationFlags::EvaluationFlags evaluation_flag,
-        const Number                          *values_dofs,
-        FEEvaluationData<dim, Number, true>   &fe_eval)
+    evaluate_tensor_none(const unsigned int                     n_components,


We could make this private but this is internal anyway. Would you be tempted to use it in one of your codes otherwise? 😉

peterrum · 2023-12-18T14:39:17Z

include/deal.II/matrix_free/evaluation_kernels_face.h

+
+    template <int fe_degree, int n_q_points_1d>
+    static bool
+    evaluate_tensor(const unsigned int                     n_components,


jh66637 · 2024-01-03T11:01:24Z

@bergbauer I tested this in #16299. I think something is not working for first_selected_component!=0. I am not sure but I think in do_integrate_in_face() first_selected_component is not considered?

Anyway, I like the new interface :)

include/deal.II/matrix_free/fe_point_evaluation.h

kronbichler

A few random comments. I will post a larger comment to the overall structure in a separate note soon.

include/deal.II/matrix_free/evaluation_template_face_factory.templates.h

kronbichler · 2024-01-18T14:35:01Z

include/deal.II/matrix_free/fe_evaluation.h

@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim,
   * static_dofs_per_component, but the number depends on the actual element
   * selected and is thus not static.
   */
-  const unsigned int dofs_per_component;
+  const unsigned int dofs_per_component_on_cell;


dofs_per_component_on_cell is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:

We have the two names you suggest here in ShapeInfo; in my opinion, the name dofs_per_component_on_face in ShapeInfo is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo. Overall, the dofs_per_component variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.

kronbichler · 2024-01-18T14:36:30Z

include/deal.II/matrix_free/fe_evaluation.h

+  const unsigned int dofs_per_component_on_face;
+
+  /**
+   * The number of degrees of freedom on the cell accumulated over all
+   * components in the current evaluation object. Usually close to
+   * static_dofs_per_cell = static_dofs_per_component*n_components, but the
+   * number depends on the actual element selected and is thus not static.
+   */
+  const unsigned int dofs_per_face;


I would not choose these names, and choose something more specific to indicate how to interpret this number and avoid confusion with FiniteElement::dofs_per_face which has a different meaning.

As a second note, I think we should work towards having a smaller footprint in these FEEvaluation/FEFaceEvaluation classes in terms of the numbers of member variables, because the initialization overhead is already quite substantial. I would prefer if we would expose certain, less-used functionality through accessor functions in terms of this->data->....

kronbichler · 2024-01-18T14:42:55Z

include/deal.II/matrix_free/tensor_product_kernels.h

            typename Number2,
            typename Number,
            int  n_values    = 1,
-            bool do_renumber = true>
+            bool do_renumber = true,
+            int  stride      = 1>
  inline
 #ifndef DEBUG


This reminds me that we should try to move the single-point evaluation facility, i.e., everything from line 2021 and onward in tensor_product_kernels.h to a separate file tensor_product_point_evaluate.h (or something similar), to more clearly document the different intents in the two cases.

I can do that. Should we split it before or after this PR?

kronbichler · 2024-01-18T15:15:49Z

Can you quantify the implications of the suggested changes in the evaluation_template_face_factory class? By quantify I mean whether there are measurable changes in the compile time of the instantiations of that function and in the size of these files. I fear that creating separate instantiations for the project_to_face and evaluate_in_face functions could lead to considerable increases of the compiled library, because we essentially double the number of exposed functions. One could hope that a clever compiler might realize that the function body, doing the actual work inside the factory, are the same, so they should not get inlined and thus the now two places would call the same function. In either case, substantial cases like the present one might motivate the compiler to make different tradeoffs in terms of inlining and re-use, so one should evaluate some benchmark with face integrals and see the effect. On the positive side, I like that the tensor_none case has been factored out, because that one does not need the template, so that part of the code might actually lead to less code. An alternative might be to distinguish the cases project_to_face, evaluate_in_face and the overall evaluate with boolean parameters at run time. It might sound a bit counter-intuitive that run-time parameters might be preferable over compile-time switches as proposed in this PR, but without measuring I would not be sure which one to prefer.

I realize that the request to look at a benchmark might sound overly pessimistic and increase the load on you who already did a great job with all the changes in the pull request, but my concern is that these evaluator classes are sensitive in terms of the achieved performance, and being careful we destroy the favorable properties we enjoyed previously. I appreciate that we can extend the usability and I am very positive to eventually get things done, but we should also avoid repeating the mistakes from the past where too many things have been integrated in FEEvaluation and friends without benchmarking, so I later had to spend a lot of my own time to re-organize many things to restore at least some of the performance. Therefore, I would like to encourage us to take care now and discuss the preferred alterantive. Of course, another option we have is to accept some performance loss for generality in cases like this one (with the aim to not generate too much code), if we also work towards a leaner interface for the often-executed parts that is separate/more specialized.

On the positive side, I think splitting off the face evaluation part into FEFacePointEvaluation makes sense, because it allows to more clearly separate the use cases and have an implicit documentation by the new class. However, I did not see an up-to-date documentation and explanation of where the new class would be used (look at the class-level documentation, I think it is . I think the document is quite crucial, because one might be tempted to believe that FEPointEvaluation can cover all cases (which it does from a functional point of view), and the use case is primarily from a performance/optimization point of view, because the projection onto the face reduces the complexity of the operators, when combined with FEFaceEvaluation.

bergbauer · 2024-01-18T21:07:24Z

Can you quantify the implications of the suggested changes in the evaluation_template_face_factory class?

Surprisingly the library size gets marginally smaller with this patch 358616 -> 358548 for release and 1288348 -> 1286472 for debug.
The compilation times seemed pretty equal to me.

but my concern is that these evaluator classes are sensitive in terms of the achieved performance, and being careful we destroy the favorable properties we enjoyed previously.

I absolutely understand your concerns! Let me run some benchmarks, then we can have a look at the numbers.

kronbichler · 2024-01-21T08:03:32Z

Great, given the slightly reduced library size and compilation times I think we can move forward with this patch, thank you for measuring. I think we can postpone the step that splits tensor_product_kernels.h into two files, but it would be great if you could adjust the names of the variables in FEFaceEvaluation for now.

kronbichler

This looks good. Do you want to squash together some of the commits to make them self-contained?

bergbauer · 2024-01-22T18:51:31Z

Do you want to squash together some of the commits to make them self-contained?

Sure.

@kronbichler I see a performance degradation of 13% for linear polynomials in 2D and 8% for linear and 3D. If we help the compiler to inline the high-level functions (like it was before, see last commit) which call templated functions we maintain the performance of master (except for linear and 2D, 5%) at the cost of a minimal increase in library size (-> 359056).

kronbichler · 2024-01-23T07:45:48Z

Thanks for checking. The remaining difference is likely because the compiler is then not inlining the functions called within evaluate_in_face, project_to_face, etc., in order to keep a reasonable code size. I think we should limit the always inlining to release mode, as we do in

dealii/include/deal.II/matrix_free/evaluation_kernels.h

Lines 805 to 810 in c9c74dc

    
               template <typename Number, typename Number2> 
        
           #ifndef DEBUG 
        
               DEAL_II_ALWAYS_INLINE 
        
           #endif 
        
                 static void 
        
                 do_forward(const unsigned int            n_components,

or in

dealii/include/deal.II/matrix_free/evaluation_kernels_hanging_nodes.h

Lines 781 to 783 in c9c74dc

    
           template <unsigned int direction, unsigned int d, bool skip_borders> 
        
           static inline DEAL_II_ALWAYS_INLINE_RELEASE void 
        
           interpolate_3D_face(

via

dealii/include/deal.II/matrix_free/evaluation_kernels_hanging_nodes.h

Lines 31 to 35 in c9c74dc

    
           #ifdef DEBUG 
        
           #  define DEAL_II_ALWAYS_INLINE_RELEASE 
        
           #else 
        
           #  define DEAL_II_ALWAYS_INLINE_RELEASE DEAL_II_ALWAYS_INLINE 
        
           #endif

This should further limit the code size growth.

kronbichler · 2024-01-23T07:46:20Z

/rebuild

bergbauer · 2024-01-23T09:39:55Z

The remaining difference is likely because the compiler is then not inlining the functions called within evaluate_in_face, project_to_face, etc., in order to keep a reasonable code size.

That's exactly what the compiler is doing. It does not matter for anything more expensive than 2D p=1, 2D p=2 even gets sightly faster with the inline macros from the last commit. I think we have to see what the performance tests say now.

peterrum reviewed Dec 18, 2023

View reviewed changes

peterrum mentioned this pull request Dec 22, 2023

Add reinit_faces() for Quadrature<dim> #16380

Closed

jh66637 mentioned this pull request Dec 31, 2023

Add tutorial on Nitsche-type mortaring #16299

Merged

bergbauer commented Jan 3, 2024

View reviewed changes

include/deal.II/matrix_free/fe_point_evaluation.h Show resolved Hide resolved

kronbichler reviewed Jan 18, 2024

View reviewed changes

bergbauer force-pushed the fcl branch from bfaace5 to 518c2e8 Compare January 18, 2024 21:30

kronbichler approved these changes Jan 22, 2024

View reviewed changes

bergbauer and others added 3 commits January 22, 2024 20:19

FCL path

a7418f0

add test

bc1016c

Add dim component test and fix strides

1cf310c

bergbauer force-pushed the fcl branch from 5dc3c94 to 14c54e3 Compare January 22, 2024 19:21

kronbichler added Matrix-free ready to test labels Jan 23, 2024

bergbauer added 2 commits January 23, 2024 10:33

Address review comments

af5aa8a

Force inlining in release mode

b97f5e5

bergbauer force-pushed the fcl branch from 14c54e3 to b97f5e5 Compare January 23, 2024 09:34

kronbichler approved these changes Jan 23, 2024

View reviewed changes

jh66637 mentioned this pull request Jan 24, 2024

Add factory functions for FERemoteEvaluationCommunicator #16394

Merged

kronbichler merged commit fc1d748 into dealii:master Jan 26, 2024
15 checks passed

bergbauer mentioned this pull request Feb 5, 2024

NonMatching::MappingInfo: Remove quadrature projection #16596

Merged

bergbauer mentioned this pull request Mar 8, 2024

MatrixFree: Split tensor product points and single point kernels #16736

Merged

		* Projects the values, the gradients, and the Hessians into the face DoFs of
		* the current face using the internally stored cell DoFs.

	* @note How this data is internally represented is not of importance (and not
	* exposed on purpose).

	, dofs_per_component(this->data->dofs_per_component_on_cell)
	, dofs_per_cell(this->data->dofs_per_component_on_cell * n_components_)

Introduce FEFacePointEvaluation with optimized FCL path #16359

Introduce FEFacePointEvaluation with optimized FCL path #16359

Conversation

bergbauer commented Dec 16, 2023

peterrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jh66637 commented Jan 3, 2024

kronbichler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kronbichler commented Jan 18, 2024

bergbauer commented Jan 18, 2024

kronbichler commented Jan 21, 2024

kronbichler left a comment

Choose a reason for hiding this comment

bergbauer commented Jan 22, 2024

kronbichler commented Jan 23, 2024

kronbichler commented Jan 23, 2024

bergbauer commented Jan 23, 2024