New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NonMatching::MappingInfo: precompute mapping data for an IteratorRange of cells/faces #14809
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great, thank you for the effort. My remarks mostly concern documentation and one interface change I am not happy with.
unit_points = | ||
mapping_info->get_unit_points(current_cell_index, current_face_number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this allocate memory in the operator=
in case the old size of unit_points
is already correct? Probably not, but it would be good to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I check this? We could get rid of unit_points
in FEPointEvaluation
entirely and always ask MappingInfo
(maybe in a follow-up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we could move all code into MappingInfo
(also the case where the user never sets up such an object explicitly, e.g. in particle codes that want to evaluate information once), that would be great. Regarding how to check it: I use a compiler/trace tool and monitor the allocations, first using toy codes, to see how the compiler reacts. Here I use this one:
#include <vector>
#include <iostream>
int main()
{
std::vector<unsigned int> data(5, 3);
std::vector<unsigned int> second(3, 6);
for (unsigned int i=0; i<10; ++i)
data = second;
std::cout << data.size() << " " << data.capacity() << " " << second.size() << " " << second.capacity() << std::endl;
}
Using valgrind as valgrind ./a.out
to check for memory errors, I now see
==269622== total heap usage: 4 allocs, 4 frees, 73,760 bytes allocated
This is less than 10 repetitions, so it can't allocate memory inside the loop (one might wonder if compilers might optimize it away, as it does not change the result, but at least gcc does not here). The reason for the loop is that I see more than 2 allocs, which I would expect for the two vectors - gcc counts the full program here. The cause is that std::cout <<
does another one to put the data to the output stream, and the last one is in the global program init
outside main (it is done inside call_init.part.0
and part of the program startup, outside the control of my program).
The output is also revealing, which is why I included it:
3 5 3 3
You see from the first two numbers that the vector data
kept its original capacity of 5, even though the length was changed to 3 after assignment from second
. This is a strong indication that it kept its memory region. You could output the address of &data[0]
to ensure it kept the same. You can also look at the source code of stl_vector.h
to see this (for me, the file is /usr/include/c++/12/bits/stl_vector.h
, but the implementation is in )/usr/include/c++/12/bits/vector.tcc
:
template<typename _Tp, typename _Alloc>
_GLIBCXX20_CONSTEXPR
vector<_Tp, _Alloc>&
vector<_Tp, _Alloc>::
operator=(const vector<_Tp, _Alloc>& __x)
{
...
const size_type __xlen = __x.size();
...
else if (size() >= __xlen)
{
std::_Destroy(std::copy(__x.begin(), __x.end(), begin()),
end(), _M_get_Tp_allocator());
}
...
From this, you see that we only call std::copy
(and destroy the elements that now got invalid due to reducing the size), but we did not deallocate/reallocate.
* Reinitialize the mapping information for the incoming vector of cells and | ||
* corresponding vector of quadratures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain n_unfilted_cells
(also above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an explanation for n_unfilted_cells
. Do you think an example code would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a short code example would be helpful.
unit_points_index.resize(n_cells + 1); | ||
unsigned int cell_index = 0; | ||
unsigned int n_unit_points = 0; | ||
for (const auto &cell : cell_iterator_range) | ||
{ | ||
unit_points_index[cell_index] = n_unit_points; | ||
n_unit_points += unit_points_vector[cell_index].size(); | ||
|
||
++cell_index; | ||
} | ||
unit_points_index[n_cells] = n_unit_points; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, I would prefer to loop over unit_points_vector
instead. Maybe like this, if you would like to keep the range-based for loop:
unit_points_index.resize(n_cells + 1); | |
unsigned int cell_index = 0; | |
unsigned int n_unit_points = 0; | |
for (const auto &cell : cell_iterator_range) | |
{ | |
unit_points_index[cell_index] = n_unit_points; | |
n_unit_points += unit_points_vector[cell_index].size(); | |
++cell_index; | |
} | |
unit_points_index[n_cells] = n_unit_points; | |
unit_points_index.reserve(n_cells + 1); | |
unit_points_index.resize(1); | |
unit_points_index[0] = 0; | |
for (const auto &unit_points : unit_points_vector) | |
unit_points_index.push_back(unit_points_index.back() + unit_points.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced it with code similar to your suggestion for the cellwise reinit_*
functions. For the faces I could not think of a better way because I need access to cell->face_indices()
through the CellIterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree for the faces.
template <int dim, int spacedim> | ||
template <typename Iterator> | ||
void | ||
MappingInfo<dim, spacedim>::reinit_surface( | ||
const IteratorRange<Iterator> & cell_iterator_range, | ||
const std::vector<ImmersedSurfaceQuadrature<dim>> &quadrature_vector, | ||
const unsigned int n_unfiltered_cells) | ||
{ | ||
do_cell_index_compression = | ||
n_unfiltered_cells != numbers::invalid_unsigned_int; | ||
|
||
if (update_flags_mapping & (update_JxW_values | update_normal_vectors)) | ||
update_flags_mapping |= update_covariant_transformation; | ||
|
||
const unsigned int n_cells = | ||
std::distance(cell_iterator_range.begin(), cell_iterator_range.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this and the following two functions are pretty similar, I wonder whether you see a way to implement the algorithm only once and use some unifying calls (e.g. transformation to arguments needed by other function) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two reinit_faces
are pretty similar indeed. I would suggest removing the reinit_faces
function that only takes vectors of unit points on faces. I do not use this function anymore since I implemented the function that takes quadratures and it is also not tested by the test in this PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest removing the reinit_faces function that only takes vectors of unit points on faces. I do not use this function anymore since I implemented the function that takes quadratures and it is also not tested by the test in this PR. What do you think?
I agree, that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
|
||
|
||
template <int dim, int spacedim> | ||
const std::vector<Point<dim>> & | ||
MappingInfo<dim, spacedim>::get_unit_points() const | ||
const std::vector<Point<dim>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed this in person already, but I am not happy about this code change because we might create (many) copies as a return value. I think it would be better if we could return an ArrayView<Point<dim>>
here. It breaks the interface, but so does the change you do, but I think the break is the least intrusive way while still ensuring good performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I see some slight decay in the performance of FEPointEvaluation
comparing this branch with master (even after this change). Do you think this is the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly yes. Memory allocations of many small arrays in inner loops are one of worst performance sinks. As said in the other post, we do not necessarily allocate memory in the assignment to a vector, but as far as I understand the code I think it needs to happen here, because we first must create a temporary std::vector
object to hold the temporary data. Return-value optimization, which is an often-said argument why pass-by-value should be OK, is likely not possible because we do not simply return a variable stored internally. Thus, I suggest we change this here and see how many places it affects in the tests of deal.II, which might give us an indication of breakage of user codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. In a follow-up could also try to reuse mapping across cells. Most of the infrastructure for that we should have in place!
@@ -883,7 +953,7 @@ FEPointEvaluation<n_components, dim, spacedim, Number>::evaluate( | |||
numbers::signaling_nan<gradient_type>()); | |||
} | |||
|
|||
if (unit_points.empty()) | |||
if (unit_points.size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this show it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @peterrum - just an interesting remark, compiler people have been working on detecting size() == 0
and turn it into something like empty()
for performance reasons. To give insights on why there might be a difference (apart from code reading arguments, which are also there): The latter checks where the addresses of begin()
and end()
are the same (one comparison, plus possibly loads on begin()
and end()
), whereas size()
first computes temp = end() - begin()
, and then we compare temp == 0
, so two instructions instead of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this because ArrayView
does not provide an empty()
function. If we decide to keep the local vector copy I will revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding empty()
to ArrayView
. We can do that in a follow-up PR!
@@ -57,27 +62,125 @@ namespace NonMatching | |||
MappingInfo(const Mapping<dim> &mapping, const UpdateFlags update_flags); | |||
|
|||
/** | |||
* Reinitialize the mapping information for the incoming cell and unit | |||
* Compute the mapping information for the incoming cell and unit | |||
* points. (needed to resolve ambiguity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you convert this to a sentence?
*/ | ||
enum class State | ||
{ | ||
none, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none, | |
invalid, |
{ | ||
Assert(state == State::single_cell, | ||
ExcMessage( | ||
"This mapping info is not reinitialized for a single cell")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This mapping info is not reinitialized for a single cell")); | |
"This mapping info is not reinitialized for a single cell!")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I have added it to every Assert in the two getter functions.
return mapping_data[cell_index_offset[cell_index] + face_number]; | ||
} | ||
else | ||
AssertThrow( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can convert most AssertThrow
instances to Assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change this I have to return something for this invalid case.
* It is possible to give an IteratorRange<FilteredIterator> to this | ||
* function and together with @p n_unfiltered_cells specified this object | ||
* can compress its storage while giving you access to the underlying data | ||
* with the "unfiltered" index. The default argument for | ||
* @p n_unfiltered_cells disables this built-in compression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get how this should work...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the test to show how to use this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are on a good path. I have some additional comments. Next to @peterrum's comments, I think we can get close to have the first version in.
/** | ||
* Number of quadrature points of the current cell/face. | ||
*/ | ||
unsigned int n_q_points; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this variable const
and use const_cast<unsigned int&>(n_q_points) = ...
in lines 896 and similar below? I would really like this variable to be seen as const
to the outside, in analogy to
dealii/include/deal.II/fe/fe_values.h
Line 2433 in 7331b14
const unsigned int n_quadrature_points; |
const unsigned int n_q_points; |
Since all these calls happen inside a non-static member function of the class, I do not think we can trigger undefined behavior when a compiler would completely eliminate the
const
variable without having a storage location for it (in which case static_cast<unsigned int&>
would fail with a compile error).
unsigned int current_cell_index = numbers::invalid_unsigned_int; | ||
unsigned int current_face_number = numbers::invalid_unsigned_int; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not assign a value to member variables per the coding conventions. Please do it in the constructor instead. While making the change, please add a one-line comment for doxygen.
@@ -45,6 +46,10 @@ namespace NonMatching | |||
class MappingInfo : public Subscriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember what exactly we discussed here, but I am a bit concerned about having this in the user-visible namespace, as MappingInfo
does not tell very much. I used it inside an internal namespace of MatrixFree
because I did not want to spell it out as MappingRelatedData
or maybe even find a different name, to avoid confusion with
dealii/include/deal.II/fe/mapping_related_data.h
Lines 39 to 52 in 7331b14
/** | |
* A class that stores all of the mapping related data used in | |
* dealii::FEValues, dealii::FEFaceValues, and dealii::FESubfaceValues | |
* objects. Objects of this kind will be given as <i>output</i> argument | |
* when dealii::FEValues::reinit() calls Mapping::fill_fe_values() for a | |
* given cell, face, or subface. | |
* | |
* The data herein will then be provided as <i>input</i> argument in the | |
* following call to FiniteElement::fill_fe_values(). | |
* | |
* @ingroup feaccess | |
*/ | |
template <int dim, int spacedim = dim> | |
class MappingRelatedData |
In any case, I see the desire to have the same name as
dealii/include/deal.II/matrix_free/mapping_info.h
Lines 48 to 55 in 7331b14
/** | |
* The class that stores all geometry-dependent data related with cell | |
* interiors for use in the matrix-free class. | |
* | |
* @ingroup matrixfree | |
*/ | |
template <int dim, typename Number, typename VectorizedArrayType> | |
struct MappingInfo |
MatrixFreeFunctions::MappingInfo
is also not good, but we might extend there as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the current naming is not optimal but we should discuss where to put this separately from this PR. This is also connected to the question in which namespace and folder FEPointEvaluation
belongs.
/** | ||
* The reference points specified at reinit(). | ||
*/ | ||
std::vector<Point<dim>> unit_points; | ||
ArrayView<const Point<dim>> unit_points; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this here to ArrayView
to avoid the copy of unit points. I can also address this in a follow-up, when we get rid of this class member as proposed in https://github.com/dealii/dealii/pull/14809/files#r1125725149 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good now, apart from the topics we need to address in a follow-up patch. Could you please squash your commits?
/rebuild |
…ctor of cells/faces
Squashed and ready from my side. Any final comments @peterrum ? |
@@ -883,7 +953,7 @@ FEPointEvaluation<n_components, dim, spacedim, Number>::evaluate( | |||
numbers::signaling_nan<gradient_type>()); | |||
} | |||
|
|||
if (unit_points.empty()) | |||
if (unit_points.size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding empty()
to ArrayView
. We can do that in a follow-up PR!
std::vector<Point<dim>>(unit_points.begin(), unit_points.end()); | ||
this->unit_points = unit_points; | ||
|
||
const_cast<unsigned int &>(n_q_points) = unit_points.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you do this. I do the same in FEFaceValues
to support mixed meshes. The reason for me to do that there was because n_q_points
was already a const and public member variable. I was not happy with the solution there, since it is very error prone on user side if one should decide to store n_q_points
in a variable. I would much more prefer if we could either introduce a getter function or, if we just need this to iterate over quadrature points, a function like quadrature_point_indices()
in FEEvaluation
(https://www.dealii.org/developer/doxygen/deal.II/classFEEvaluationData.html#a0eca800437c07c9f23ec6815e58f1432).
This PR extends the functionality of
NonMatching::MappingInfo
to precompute the mapping for a (filtered) IteratorRange of cells (and their faces).It also adds access to precomputed JxW and normal vectors in case
MappingInfo
was constructed and reinitialized with appropriate Quadrature and UpdateFlags.This enables writing
FEPointEvaluation
routines like this:which makes it very similar to
FEEvaluation
.