New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide more general factory class for FEEvaluation #10927
Conversation
include/deal.II/matrix_free/evaluation_template_factory.templates.h
Outdated
Show resolved
Hide resolved
c400519
to
42cac17
Compare
|
||
|
||
for (deal_II_dimension : DIMENSIONS; | ||
deal_II_scalar_vectorized : REAL_SCALARS_VECTORIZED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason why it gets expensive - we need to compile it for all VectorizedArray
types we have. We could again think of a shortcut to only use fast paths for some degrees, but I fear that we would need to be careful with describing this for the user to not give surprises when performance differs by a factor of 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with your suggestion. I guess I am one of the few who actually uses the VectorizedArrayType
template argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me once the tests are happy.
In a follow-up PR, we should clean up the internal interfaces because it is ugly to pass around that many arguments, especially in the gather_evaluate and integrate_scatter calls.
Indeed, we need to do something here!
if (fe_degree > -1) | ||
SelectEvaluator<dim, fe_degree, n_q_points_1d, VectorizedArrayType>:: | ||
evaluate(n_components, | ||
evaluation_flags, | ||
*this->data, | ||
const_cast<VectorizedArrayType *>(values_array), | ||
this->values_quad, | ||
this->gradients_quad, | ||
this->hessians_quad, | ||
this->scratch_data); | ||
else | ||
internal::FEEvaluationFactory<dim, Number, VectorizedArrayType>::evaluate( | ||
n_components, | ||
evaluation_flags, | ||
*this->data, | ||
const_cast<VectorizedArrayType *>(values_array), | ||
this->values_quad, | ||
this->gradients_quad, | ||
this->hessians_quad, | ||
this->scratch_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really nice. Duplication of the same arguments. The same in the following three functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but let us solve this with #10928 because that would reduce the duplication to 3 (?) arguments, at which point duplication is very likely the more readable version compared to adding a layer to collect the two directions (e.g. lambda).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree!
|
||
|
||
for (deal_II_dimension : DIMENSIONS; | ||
deal_II_scalar_vectorized : REAL_SCALARS_VECTORIZED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with your suggestion. I guess I am one of the few who actually uses the VectorizedArrayType
template argument.
template bool dealii::internal::FEFaceEvaluationFactory< | ||
deal_II_dimension, | ||
deal_II_scalar_vectorized::value_type, | ||
deal_II_scalar_vectorized>:: | ||
gather_evaluate<1>( | ||
const unsigned int, | ||
const deal_II_scalar_vectorized::value_type *, | ||
const MatrixFreeFunctions::ShapeInfo<deal_II_scalar_vectorized> &, | ||
const MatrixFreeFunctions::DoFInfo &, | ||
deal_II_scalar_vectorized *, | ||
deal_II_scalar_vectorized *, | ||
deal_II_scalar_vectorized *, | ||
const bool, | ||
const bool, | ||
const unsigned int, | ||
const unsigned int, | ||
const std::array<unsigned int, 1>, | ||
const std::array<unsigned int, 1>, | ||
const unsigned int, | ||
const MatrixFreeFunctions::DoFInfo::DoFAccessIndex, | ||
const std::array<unsigned int, 1>, | ||
const Table<2, unsigned int> &); | ||
|
||
template bool dealii::internal::FEFaceEvaluationFactory< | ||
deal_II_dimension, | ||
deal_II_scalar_vectorized::value_type, | ||
deal_II_scalar_vectorized>:: | ||
integrate_scatter<1>( | ||
const unsigned int, | ||
deal_II_scalar_vectorized::value_type *, | ||
const MatrixFreeFunctions::ShapeInfo<deal_II_scalar_vectorized> &, | ||
const MatrixFreeFunctions::DoFInfo &, | ||
deal_II_scalar_vectorized *, | ||
deal_II_scalar_vectorized *, | ||
deal_II_scalar_vectorized *, | ||
deal_II_scalar_vectorized *, | ||
const bool, | ||
const bool, | ||
const unsigned int, | ||
const unsigned int, | ||
const std::array<unsigned int, 1>, | ||
const std::array<unsigned int, 1>, | ||
const unsigned int, | ||
const MatrixFreeFunctions::DoFInfo::DoFAccessIndex, | ||
const std::array<unsigned int, 1>, | ||
const Table<2, unsigned int> &); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is ugly^^
By the way, I think this is a change worth a change-log entry, isn't it? |
42cac17
to
d4697e2
Compare
I added two new tests for the features that are new, and switched some existing test cases to use to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I would probably merge tomorrow if there are no objections.
73cd11c
to
e6d7131
Compare
I don't understand the internal changes of this PR, but I feel that it would be great to have a paragraph about this feature in the matrix-free documentation somewhere (especially how FE_EVAL_FACTORY_DEGREE_MAX works). Is there a way for the user to know if the current setup is in the "slow path"? |
Very good point. I added documentation for the parameter in FEEvaluation.
I guess we should provide a library function which prints the information to a stream given by the user. Let me try that in a follow-up PR - I open an issue for now to recall this and possibly extend the documentation. |
* templates), deal.II also supports usage of this class based on the | ||
* information in the element passed to the initialization. For this usage | ||
* model, set the template parameter for the polynomial degree to -1 and | ||
* choose an arbitrary number for the number of quadrature points. That code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arbitrary number for the number of quadrature points
The value is not actually used, is it? I guess we cannot force it to be -1
due to backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not used, but also it does not matter I believe because we replace the value here:
return EvaluatorType::template run<-1, 0>(args...); |
If I remember exactly we also did not say explicitly that one needed the combination
-1,0
before, so I would think it does not matter. Or was there anything particular you had in mind that works in one case but not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. No, I think it might give somehow to a user the wrong impression if n_quadrature_points
is not forced to -1
and he/she sets the values to >0
and gets the wrong (not expected) result.
But this is not that important for this PR. I am happy with the PR. I let @tjhei merge!
Personally, I would want this to be on by default on anything I work on: if I use matrix-free, I care about performance. A warning message when starting is not something that would bother me. I guess we don't have this in any other place in the code, though. |
@kronbichler I guess I am speaking on behalf of every |
This PR adds a factory class for the
evaluate
andintegrate
calls ofFEEvaluation
andFEFaceEvaluation
. As opposed toSelectEvaluator
which seems to rebuild the code every time the class is used, this places the code in a separate file to force separation. It could have been achieved with that class also with some work but I did not want to repeat everything forFEFaceEvaluation
- I have chosen a shorter approach with less code.Right now I use a
FE_EVAL_FACTORY_DEGREE_MAX
to let the user pre-compile different degrees. We could think of making that a cmake variable to more easily change things. However, the question that is still open is the range for the quadrature formulas, which is hardcoded right now.Whenever the code hits the pre-compiled part but with a degree not expanded, we fall back to the generic evaluate function that does not have the template (and is >2x slower).
Fixes #9794. But I must admit that we currently get a bigger library with this PR because code gets pre-compiled for all types of
VectorizedArray
.In a follow-up PR, we should clean up the internal interfaces because it is ugly to pass around that many arguments, especially in the
gather_evaluate
andintegrate_scatter
calls.