New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce FEFacePointEvaluation with optimized FCL path #16359
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First set of comments.
FEFacePointEvaluation<1, dim, dim, double> fe_peval_m(mapping_info_faces, | ||
fe, | ||
true); | ||
FEFacePointEvaluation<1, dim, dim, double> fe_peval_p(mapping_info_faces, | ||
fe, | ||
false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add somewhere an assert that this is not threadsafe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would you place the assert? Do we have a similar assert in FEEvaluation
?
/** | ||
* Projects the values, the gradients, and the Hessians into the face DoFs of | ||
* the current face using the internally stored cell DoFs. | ||
*/ | ||
void | ||
project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag); | ||
|
||
/** | ||
* Projects the values, the gradients, and the Hessians into the face DoFs of | ||
* the current face using the cell DoFs provided via `values_array`. | ||
*/ | ||
void | ||
project_to_face(const VectorizedArrayType *values_array, | ||
const EvaluationFlags::EvaluationFlags evaluation_flag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there two versions of this function but only one of evaluate_in_face()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to evaluate
it makes sense to offer the option to pass in the cell DoFs the user provides, for evaluate_in_face
the face DoFs are at internal fields (where project_to_face
has written them to).
/** | ||
* Projects the values, the gradients, and the Hessians into the face DoFs of | ||
* the current face using the internally stored cell DoFs. | ||
*/ | ||
void | ||
project_to_face(const EvaluationFlags::EvaluationFlags evaluation_flag); | ||
|
||
/** | ||
* Projects the values, the gradients, and the Hessians into the face DoFs of | ||
* the current face using the cell DoFs provided via `values_array`. | ||
*/ | ||
void | ||
project_to_face(const VectorizedArrayType *values_array, | ||
const EvaluationFlags::EvaluationFlags evaluation_flag); | ||
|
||
/** | ||
* Evaluates the values, the gradients, and the Hessians in-face, | ||
* interpolating into the face quadrature points. | ||
*/ | ||
void | ||
evaluate_in_face(const EvaluationFlags::EvaluationFlags evaluation_flag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mark these functions as internal? They will never work for simplices, will they? Most of the interface of FEEvaluation
works for arbitrary cell types and the check for cell type is performed internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on the basis functions. I am not the simplex expert but I could imagine that this works for the values but not for the gradients in general (I only looked at the linear basis). Where can I find the implemented basis functions for FE_SimplexP
?
* Projects the values, the gradients, and the Hessians into the face DoFs of | ||
* the current face using the internally stored cell DoFs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it is not clear in which order the data (components/values/gradients/...) are stored. Probably, we should not document it to be able to make changes later on. See also:
dealii/include/deal.II/grid/cell_id.h
Lines 68 to 69 in 5545c01
* @note How this data is internally represented is not of importance (and not | |
* exposed on purpose). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FEFacePointEvaluation
knows how to access it.
@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim, | |||
* static_dofs_per_component, but the number depends on the actual element | |||
* selected and is thus not static. | |||
*/ | |||
const unsigned int dofs_per_component; | |||
const unsigned int dofs_per_component_on_cell; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in person, we need to deprecate the old variable first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the name in FEEvaluation
? Should we rename that one as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dealii/include/deal.II/matrix_free/fe_evaluation.h
Lines 7236 to 7237 in 4944495
, dofs_per_component(this->data->dofs_per_component_on_cell) | |
, dofs_per_cell(this->data->dofs_per_component_on_cell * n_components_) |
In FEEvaluation
it is fine, because we only have cell dofs there. Consistent would be dofs_per_component_on_cell
in both classes (and deprecation of the old name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dofs_per_component_on_cell
is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face
that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:
We have the two names you suggest here in ShapeInfo
; in my opinion, the name dofs_per_component_on_face
in ShapeInfo
is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo
. Overall, the dofs_per_component
variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face
or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face
analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.
: FEPointEvaluationBase<n_components_, dim, spacedim, Number>( | ||
mapping, | ||
fe, | ||
update_flags, | ||
first_selected_component) | ||
{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation.
*/ | ||
template <int n_components_, | ||
int dim, | ||
int spacedim = dim, | ||
typename Number = double> | ||
class FEPointEvaluation | ||
class FEPointEvaluationBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could introduce the base class in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be possible, I can do that if you want me to. It only makes sense after we have FEFacePointEvaluation
. Do you mean without new FCL stuff but bringing the ECL path to FEFacePointEvaluation
only?
if (shape_info.element_type == MatrixFreeFunctions::tensor_none) | ||
return evaluate_tensor_none(n_components, | ||
evaluation_flag, | ||
values_dofs, | ||
fe_eval); | ||
else | ||
return evaluate_tensor<fe_degree, n_q_points_1d>(n_components, | ||
evaluation_flag, | ||
values_dofs, | ||
fe_eval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the split up needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is much clearer to me this way concerning templates and control flow.
const EvaluationFlags::EvaluationFlags evaluation_flag, | ||
const Number *values_dofs, | ||
FEEvaluationData<dim, Number, true> &fe_eval) | ||
evaluate_tensor_none(const unsigned int n_components, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this private but this is internal anyway. Would you be tempted to use it in one of your codes otherwise? 😉
|
||
template <int fe_degree, int n_q_points_1d> | ||
static bool | ||
evaluate_tensor(const unsigned int n_components, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private?
@bergbauer I tested this in #16299. I think something is not working for Anyway, I like the new interface :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few random comments. I will post a larger comment to the overall structure in a separate note soon.
include/deal.II/matrix_free/evaluation_template_face_factory.templates.h
Show resolved
Hide resolved
@@ -2705,7 +2755,7 @@ class FEFaceEvaluation : public FEEvaluationAccess<dim, | |||
* static_dofs_per_component, but the number depends on the actual element | |||
* selected and is thus not static. | |||
*/ | |||
const unsigned int dofs_per_component; | |||
const unsigned int dofs_per_component_on_cell; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dofs_per_component_on_cell
is a rather long name, and for the evaluator there is not really a dofs_per_[component_on_]face
that is uniquely defined that would motivate using another name. The relevant information is the number of all unknowns on the cell, which is why the particular name was chosen. I see why the change was done but I would be opposed to it because the number of cell dofs and face dofs is not information on the same level of abstraction:
We have the two names you suggest here in ShapeInfo
; in my opinion, the name dofs_per_component_on_face
in ShapeInfo
is poorly chosen and somewhat misleading, so I would rather prefer to also adapt the name in ShapeInfo
. Overall, the dofs_per_component
variable is a non-internal variable and potentially often used, so I would only make a deprecation and change if we have a uniformly better naming scheme. I do not see that yet, but I might be missing something here.
Regarding the name for the faces, I would suggest something along dofs_per_component_projected_onto_face
or something along those lines that is clear how we use it. I think we need to avoid the FiniteElement::dofs_per_face
analogy here (which only counts degrees of freedom shared between cells, but not DG ones), because depending on the basis and finite element, the notion of which dofs are active for a particular element degree on a face may vary. In other words, I think we should use a name suggesting the tensor product language of referring to some projection/restriction/interpolation of the cell data onto the face.
const unsigned int dofs_per_component_on_face; | ||
|
||
/** | ||
* The number of degrees of freedom on the cell accumulated over all | ||
* components in the current evaluation object. Usually close to | ||
* static_dofs_per_cell = static_dofs_per_component*n_components, but the | ||
* number depends on the actual element selected and is thus not static. | ||
*/ | ||
const unsigned int dofs_per_face; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not choose these names, and choose something more specific to indicate how to interpret this number and avoid confusion with FiniteElement::dofs_per_face
which has a different meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a second note, I think we should work towards having a smaller footprint in these FEEvaluation
/FEFaceEvaluation
classes in terms of the numbers of member variables, because the initialization overhead is already quite substantial. I would prefer if we would expose certain, less-used functionality through accessor functions in terms of this->data->...
.
typename Number2, | ||
typename Number, | ||
int n_values = 1, | ||
bool do_renumber = true> | ||
bool do_renumber = true, | ||
int stride = 1> | ||
inline | ||
#ifndef DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me that we should try to move the single-point evaluation facility, i.e., everything from line 2021 and onward in tensor_product_kernels.h
to a separate file tensor_product_point_evaluate.h
(or something similar), to more clearly document the different intents in the two cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. Should we split it before or after this PR?
Can you quantify the implications of the suggested changes in the I realize that the request to look at a benchmark might sound overly pessimistic and increase the load on you who already did a great job with all the changes in the pull request, but my concern is that these evaluator classes are sensitive in terms of the achieved performance, and being careful we destroy the favorable properties we enjoyed previously. I appreciate that we can extend the usability and I am very positive to eventually get things done, but we should also avoid repeating the mistakes from the past where too many things have been integrated in FEEvaluation and friends without benchmarking, so I later had to spend a lot of my own time to re-organize many things to restore at least some of the performance. Therefore, I would like to encourage us to take care now and discuss the preferred alterantive. Of course, another option we have is to accept some performance loss for generality in cases like this one (with the aim to not generate too much code), if we also work towards a leaner interface for the often-executed parts that is separate/more specialized. On the positive side, I think splitting off the face evaluation part into |
Surprisingly the library size gets marginally smaller with this patch 358616 -> 358548 for release and 1288348 -> 1286472 for debug.
I absolutely understand your concerns! Let me run some benchmarks, then we can have a look at the numbers. |
Great, given the slightly reduced library size and compilation times I think we can move forward with this patch, thank you for measuring. I think we can postpone the step that splits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Do you want to squash together some of the commits to make them self-contained?
Sure. @kronbichler I see a performance degradation of 13% for linear polynomials in 2D and 8% for linear and 3D. If we help the compiler to inline the high-level functions (like it was before, see last commit) which call templated functions we maintain the performance of master (except for linear and 2D, 5%) at the cost of a minimal increase in library size (-> 359056). |
Thanks for checking. The remaining difference is likely because the compiler is then not inlining the functions called within dealii/include/deal.II/matrix_free/evaluation_kernels.h Lines 805 to 810 in c9c74dc
dealii/include/deal.II/matrix_free/evaluation_kernels_hanging_nodes.h Lines 781 to 783 in c9c74dc
dealii/include/deal.II/matrix_free/evaluation_kernels_hanging_nodes.h Lines 31 to 35 in c9c74dc
|
/rebuild |
That's exactly what the compiler is doing. It does not matter for anything more expensive than 2D p=1, 2D p=2 even gets sightly faster with the inline macros from the last commit. I think we have to see what the performance tests say now. |
This PR is unfortunately quite big because every change is connected.
Major changes:
FEFacePointEvaluation
separating cell and face evaluation, mirroring the structure fromFEEvaluation
/FEFaceEvaluation
.NonMatching::MappingInfo::reinit_faces()
which stores face quadrature points and mapping data in a face-centric manner to be used together with the correspondingreinit()
function ofFEFacePointEvaluation
evaluation_template_factory.h
to be able to use interpolation from and to face DoFsFEFaceEvaluation
to be able to work on the face DoFsFEFaceEvaluation
andFEFacePointEvaluation
The two new tests show how
FEFacePointEvaluation
should be used efficiently together withFEFaceEvaluation
.FYI @kronbichler @peterrum @jh66637 @mschreter @ritthaler