Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide more general factory class for FEEvaluation #10927

Merged
merged 9 commits into from Sep 22, 2020

Conversation

kronbichler
Copy link
Member

@kronbichler kronbichler commented Sep 17, 2020

This PR adds a factory class for the evaluate and integrate calls of FEEvaluation and FEFaceEvaluation. As opposed to SelectEvaluator which seems to rebuild the code every time the class is used, this places the code in a separate file to force separation. It could have been achieved with that class also with some work but I did not want to repeat everything for FEFaceEvaluation - I have chosen a shorter approach with less code.

Right now I use a FE_EVAL_FACTORY_DEGREE_MAX to let the user pre-compile different degrees. We could think of making that a cmake variable to more easily change things. However, the question that is still open is the range for the quadrature formulas, which is hardcoded right now.

Whenever the code hits the pre-compiled part but with a degree not expanded, we fall back to the generic evaluate function that does not have the template (and is >2x slower).

Fixes #9794. But I must admit that we currently get a bigger library with this PR because code gets pre-compiled for all types of VectorizedArray.

In a follow-up PR, we should clean up the internal interfaces because it is ugly to pass around that many arguments, especially in the gather_evaluate and integrate_scatter calls.



for (deal_II_dimension : DIMENSIONS;
deal_II_scalar_vectorized : REAL_SCALARS_VECTORIZED)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason why it gets expensive - we need to compile it for all VectorizedArray types we have. We could again think of a shortcut to only use fast paths for some degrees, but I fear that we would need to be careful with describing this for the user to not give surprises when performance differs by a factor of 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with your suggestion. I guess I am one of the few who actually uses the VectorizedArrayType template argument.

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me once the tests are happy.

In a follow-up PR, we should clean up the internal interfaces because it is ugly to pass around that many arguments, especially in the gather_evaluate and integrate_scatter calls.

Indeed, we need to do something here!

Comment on lines +7324 to +7425
if (fe_degree > -1)
SelectEvaluator<dim, fe_degree, n_q_points_1d, VectorizedArrayType>::
evaluate(n_components,
evaluation_flags,
*this->data,
const_cast<VectorizedArrayType *>(values_array),
this->values_quad,
this->gradients_quad,
this->hessians_quad,
this->scratch_data);
else
internal::FEEvaluationFactory<dim, Number, VectorizedArrayType>::evaluate(
n_components,
evaluation_flags,
*this->data,
const_cast<VectorizedArrayType *>(values_array),
this->values_quad,
this->gradients_quad,
this->hessians_quad,
this->scratch_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really nice. Duplication of the same arguments. The same in the following three functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but let us solve this with #10928 because that would reduce the duplication to 3 (?) arguments, at which point duplication is very likely the more readable version compared to adding a layer to collect the two directions (e.g. lambda).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!



for (deal_II_dimension : DIMENSIONS;
deal_II_scalar_vectorized : REAL_SCALARS_VECTORIZED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with your suggestion. I guess I am one of the few who actually uses the VectorizedArrayType template argument.

Comment on lines +36 to +81
template bool dealii::internal::FEFaceEvaluationFactory<
deal_II_dimension,
deal_II_scalar_vectorized::value_type,
deal_II_scalar_vectorized>::
gather_evaluate<1>(
const unsigned int,
const deal_II_scalar_vectorized::value_type *,
const MatrixFreeFunctions::ShapeInfo<deal_II_scalar_vectorized> &,
const MatrixFreeFunctions::DoFInfo &,
deal_II_scalar_vectorized *,
deal_II_scalar_vectorized *,
deal_II_scalar_vectorized *,
const bool,
const bool,
const unsigned int,
const unsigned int,
const std::array<unsigned int, 1>,
const std::array<unsigned int, 1>,
const unsigned int,
const MatrixFreeFunctions::DoFInfo::DoFAccessIndex,
const std::array<unsigned int, 1>,
const Table<2, unsigned int> &);

template bool dealii::internal::FEFaceEvaluationFactory<
deal_II_dimension,
deal_II_scalar_vectorized::value_type,
deal_II_scalar_vectorized>::
integrate_scatter<1>(
const unsigned int,
deal_II_scalar_vectorized::value_type *,
const MatrixFreeFunctions::ShapeInfo<deal_II_scalar_vectorized> &,
const MatrixFreeFunctions::DoFInfo &,
deal_II_scalar_vectorized *,
deal_II_scalar_vectorized *,
deal_II_scalar_vectorized *,
deal_II_scalar_vectorized *,
const bool,
const bool,
const unsigned int,
const unsigned int,
const std::array<unsigned int, 1>,
const std::array<unsigned int, 1>,
const unsigned int,
const MatrixFreeFunctions::DoFInfo::DoFAccessIndex,
const std::array<unsigned int, 1>,
const Table<2, unsigned int> &);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is ugly^^

@peterrum
Copy link
Member

By the way, I think this is a change worth a change-log entry, isn't it?

@kronbichler
Copy link
Member Author

I added two new tests for the features that are new, and switched some existing test cases to use to -1 template path. Should be ready from my point of view.

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I would probably merge tomorrow if there are no objections.

@tjhei
Copy link
Member

tjhei commented Sep 21, 2020

I don't understand the internal changes of this PR, but I feel that it would be great to have a paragraph about this feature in the matrix-free documentation somewhere (especially how FE_EVAL_FACTORY_DEGREE_MAX works).

Is there a way for the user to know if the current setup is in the "slow path"?

@kronbichler
Copy link
Member Author

I don't understand the internal changes of this PR, but I feel that it would be great to have a paragraph about this feature in the matrix-free documentation somewhere (especially how FE_EVAL_FACTORY_DEGREE_MAX works).

Very good point. I added documentation for the parameter in FEEvaluation.

Is there a way for the user to know if the current setup is in the "slow path"?

I guess we should provide a library function which prints the information to a stream given by the user. Let me try that in a follow-up PR - I open an issue for now to recall this and possibly extend the documentation.

* templates), deal.II also supports usage of this class based on the
* information in the element passed to the initialization. For this usage
* model, set the template parameter for the polynomial degree to -1 and
* choose an arbitrary number for the number of quadrature points. That code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arbitrary number for the number of quadrature points

The value is not actually used, is it? I guess we cannot force it to be -1 due to backwards compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not used, but also it does not matter I believe because we replace the value here:

return EvaluatorType::template run<-1, 0>(args...);

If I remember exactly we also did not say explicitly that one needed the combination -1,0 before, so I would think it does not matter. Or was there anything particular you had in mind that works in one case but not the other?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. No, I think it might give somehow to a user the wrong impression if n_quadrature_points is not forced to -1 and he/she sets the values to >0 and gets the wrong (not expected) result.

But this is not that important for this PR. I am happy with the PR. I let @tjhei merge!

@tjhei
Copy link
Member

tjhei commented Sep 22, 2020

I guess we should provide a library function which prints the information to a stream given by the user. Let me try that in a follow-up PR - I open an issue for now to recall this and possibly extend the documentation.

Personally, I would want this to be on by default on anything I work on: if I use matrix-free, I care about performance. A warning message when starting is not something that would bother me. I guess we don't have this in any other place in the code, though.

@tjhei tjhei merged commit bed997f into dealii:master Sep 22, 2020
@peterrum
Copy link
Member

@kronbichler I guess I am speaking on behalf of every MatrixFree user who has to deal with many FEEvaluation instances with different degrees and different number of quadrature points: this is a great addition (and not a minor one - although no one actually sees it)! Thank you for the lot of work of the last weeks to prepare this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pre-compiled polynomial degrees for FEEvaluation
3 participants