As revealed in #8065, the computation of all IndexSet objects held in internal::DoFHandlerImplementation::NumberCache is pretty expensive as soon as more than 100k MPI ranks are involved, due to the variable locally_owned_dofs_per_processor, both in terms of setup times and partly also memory consumption. To give an example, a vector holding a single simple IndexSet with one range takes 81 bytes (as indicated by IndexSet::memory_consumption()). In the case of 64-bit integers this likely takes at least 112 bytes in memory, because a vector is 16-byte aligned and we have 5 64-bit integers.
This raises the more general question of whether it is really a good idea to hold such a big data structure, containing information about all processors, at all. Of course, we have it in our API and I guess a few things rely on part of the information, but I would think that for all reasonable operations keeping e.g. an std::map<uint,IndexSet> would be enough, holding the index sets of only those processors which contribute locally. What do others think?
I did a short audit of the uses of locally_owned_dofs_per_processor and the related locally_owned_mg_dofs_per_processor that I would like to remove. Here are the use cases:
SparsityTools::distribute_sparsity_pattern: We use it to find out about the owner of certain rows. We could do this with some communication even if only the locally owned dofs of the owner are given. This is the two-phase index lookup Trilinos uses. I'd need to look up how to implement that.
AffineConstraints::is_consistent_in_parallel: This is a debug function and is expensive in the number of cores anyway. Nothing prevents us from simply populating a data structure that collects all index sets inside this function; it contains other, more expensive calls already.
DoFRenumbering::hierarchical: We simply use it to compute some offset.
The remaining uses are within dofs/number_cache.cc and dofs/dof_handler_policy.cc, so I guess in the implementation.
My strategy forward would be to:
Make the functions locally_owned_{mg_}dofs_per_processor and n_locally_owned_dofs_per_processor return a vector by value rather than by reference and compute it at the point of calling. State in the documentation that this is expensive and involves global communication. Maybe mark them as deprecated.
We often use this data structure or MPI_Allgather to compute some index offset of the locally owned range. We should introduce a new function types::global_dof_index compute_starting_index(const types::global_dof_index my_size, MPI_Comm comm); that does this. One could implement a simple strategy of O(log(n_proc))-deep local communications, or something like that may already exist; I have to check.