
Do not store IndexSet of all ranks in DoFHandler's NumberCache #8067

Closed
kronbichler opened this issue May 10, 2019 · 2 comments · Fixed by #8298


@kronbichler
Member

As revealed in #8065, the computation of all the IndexSet objects held in internal::DoFHandlerImplementation::NumberCache is pretty expensive as soon as more than 100k MPI ranks are involved, due to the variable locally_owned_dofs_per_processor, both in terms of setup time and partly also in terms of memory consumption. To give an example, each entry of this vector, a simple IndexSet with a single range, is worth 81 bytes (as indicated by IndexSet::memory_consumption()). In the case of 64-bit integers it likely takes at least 112 bytes in memory, because a vector is 16-byte aligned and we have five 64-bit integers.
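To put the scaling in perspective, here is a back-of-the-envelope calculation with the numbers from above (the totals are estimates, not measurements):

```cpp
#include <cstdio>

int main()
{
  const double bytes_per_set = 112.;    // one single-range IndexSet, 64-bit case
  const double n_ranks       = 100000.; // 100k MPI processes

  // Every rank stores one IndexSet per rank, so the cost per rank is
  // linear in n_ranks and the aggregate over all ranks is quadratic.
  std::printf("per rank:  %.1f MB\n", bytes_per_set * n_ranks / 1e6);
  std::printf("aggregate: %.1f TB\n", bytes_per_set * n_ranks * n_ranks / 1e12);
}
```

That is roughly 11 MB per DoFHandler on every rank, and on the order of a terabyte summed over the whole machine, for data that is mostly redundant.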

This raises the more general question of whether it is a good idea at all to hold such a big data structure containing information about all processors. Of course, it is part of our API and I guess a few things rely on parts of this information, but I would think that for all reasonable operations something like a std::map<unsigned int, IndexSet>, holding only the index sets of those processors that contribute locally, would be enough. What do others think?
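As a rough sketch of what such a slimmer cache could look like (the type alias and helper below are hypothetical, not existing deal.II API):

```cpp
#include <deal.II/base/index_set.h>
#include <deal.II/base/types.h>
#include <map>

// Only ranks whose owned DoFs are relevant to us appear as keys, so the
// size scales with the number of neighbors rather than with the total
// number of MPI ranks.
using RankToIndexSet = std::map<unsigned int, dealii::IndexSet>;

// Hypothetical lookup helper: return the index set of a rank we
// actually communicate with, or an empty set of the right size for
// everyone else.
inline dealii::IndexSet
owned_dofs_of(const RankToIndexSet                 &cache,
              const unsigned int                    rank,
              const dealii::types::global_dof_index n_global_dofs)
{
  const auto it = cache.find(rank);
  return it != cache.end() ? it->second : dealii::IndexSet(n_global_dofs);
}
```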

@bangerth
Member

That would seem reasonable. Have you audited where these objects are actually used?

@kronbichler
Member Author

I did a short audit of the uses of locally_owned_dofs_per_processor and the related locally_owned_mg_dofs_per_processor that I would like to remove. Here are the use cases:

  • SparsityTools::distribute_sparsity_pattern: We use it to find out the owner of certain rows. We could also do this with some communication if only the locally owned DoFs of each process were given; this is the two-phase index lookup that Trilinos uses (see the sketch after this list). I'd need to look up how to implement that.
  • AffineConstraints::is_consistent_in_parallel: This is a debug function and expensive in the number of cores anyway. Nothing prevents us from simply populating a data structure that collects all index sets inside this function; there are other, more expensive calls in it.
  • DoFRenumbering::hierarchical: We simply use it to compute some offset.
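The two-phase lookup mentioned in the first item could look roughly like the following self-contained sketch. None of this is existing deal.II or Trilinos API; it assumes each rank owns one contiguous range, and for brevity it uses a reduce-scatter (itself O(n_ranks) in memory) to count incoming messages, where a fully scalable version would use a nonblocking consensus exchange:

```cpp
#include <mpi.h>
#include <algorithm>
#include <array>
#include <vector>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, n_ranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);

  // Example ownership: rank r owns the contiguous range [200 r, 200 (r+1)).
  const long long n_local  = 200;
  const long long n_global = n_local * n_ranks;
  const long long my_begin = n_local * rank;
  const long long my_end   = my_begin + n_local;

  // Dictionary rank d is responsible for indices [d*block, (d+1)*block).
  const long long block = (n_global + n_ranks - 1) / n_ranks;

  // Phase 1: register our owned range with every dictionary rank it
  // touches. On the dictionary side, owner_in_block[i] will store the
  // owner of global index rank*block + i.
  std::vector<int> owner_in_block(block, -1);

  const int first_dict = static_cast<int>(my_begin / block);
  const int last_dict  = static_cast<int>((my_end - 1) / block);

  // Let every rank learn how many registration messages to expect.
  std::vector<int> outgoing(n_ranks, 0);
  for (int d = first_dict; d <= last_dict; ++d)
    outgoing[d] = 1;
  int n_incoming = 0;
  MPI_Reduce_scatter_block(outgoing.data(), &n_incoming, 1, MPI_INT,
                           MPI_SUM, MPI_COMM_WORLD);

  // Send the intersection [lo, hi) of our range with each dictionary
  // block; the source rank of the message identifies the owner.
  std::vector<std::array<long long, 2>> payloads;
  payloads.reserve(last_dict - first_dict + 1);
  std::vector<MPI_Request> requests(last_dict - first_dict + 1);
  for (int d = first_dict; d <= last_dict; ++d)
    {
      const long long lo = std::max(my_begin, block * d);
      const long long hi = std::min(my_end, block * (d + 1));
      payloads.push_back({lo, hi});
      MPI_Isend(payloads.back().data(), 2, MPI_LONG_LONG, d, 0,
                MPI_COMM_WORLD, &requests[d - first_dict]);
    }
  for (int k = 0; k < n_incoming; ++k)
    {
      long long range[2];
      MPI_Status status;
      MPI_Recv(range, 2, MPI_LONG_LONG, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
               &status);
      for (long long i = range[0]; i < range[1]; ++i)
        owner_in_block[i - block * rank] = status.MPI_SOURCE;
    }
  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);

  // Phase 2 (not shown): to find the owner of an arbitrary index i, send
  // a request to dictionary rank i / block, which answers with
  // owner_in_block[i % block], using the same sparse send/recv pattern.

  MPI_Finalize();
}
```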

The remaining uses are within dofs/number_cache.cc and dofs/dof_handler_policy.cc, so I guess in the implementation.

My strategy going forward would be to:

  • Make the functions locally_owned_{mg_}dofs_per_processor and n_locally_owned_dofs_per_processor return a vector by value rather than by reference and compute it at the point of calling (see the first sketch below). State in the documentation that this is expensive and involves global communication. Maybe mark them as deprecated.
  • We often use this data structure or MPI_Allgather to compute the index offset of the locally owned range. We should introduce a new function types::global_dof_index compute_starting_index(const types::global_dof_index my_size, MPI_Comm comm); that does this. One could implement a simple strategy of O(log(n_proc))-deep point-to-point communications, or something like this may already exist; I have to check (see the second sketch below).
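As a first sketch, the vector could be rebuilt on demand instead of being cached. This assumes each rank owns one contiguous range of DoFs (true right after DoF distribution on a parallel::distributed::Triangulation, but not after arbitrary renumbering), and the function name is made up for illustration:

```cpp
#include <deal.II/base/index_set.h>
#include <mpi.h>
#include <cstdint>
#include <vector>

std::vector<dealii::IndexSet>
gather_locally_owned_dofs(const std::uint64_t n_locally_owned,
                          const std::uint64_t n_global_dofs,
                          MPI_Comm            comm)
{
  int n_ranks;
  MPI_Comm_size(comm, &n_ranks);

  // The one global communication step that the documentation would have
  // to warn about: gather every rank's number of locally owned DoFs.
  std::vector<std::uint64_t> sizes(n_ranks);
  MPI_Allgather(&n_locally_owned, 1, MPI_UINT64_T, sizes.data(), 1,
                MPI_UINT64_T, comm);

  // Rebuild the per-processor index sets from the running offsets.
  std::vector<dealii::IndexSet> result;
  result.reserve(n_ranks);
  std::uint64_t offset = 0;
  for (int r = 0; r < n_ranks; ++r)
    {
      dealii::IndexSet is(n_global_dofs);
      is.add_range(offset, offset + sizes[r]);
      result.push_back(std::move(is));
      offset += sizes[r];
    }
  return result;
}
```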
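As a second sketch: MPI already offers the O(log(n_proc)) exclusive prefix sum as MPI_Exscan, so compute_starting_index could be as simple as the following (signature as proposed above, with a plain 64-bit integer standing in for types::global_dof_index):

```cpp
#include <mpi.h>
#include <cstdint>

std::uint64_t
compute_starting_index(const std::uint64_t my_size, MPI_Comm comm)
{
  // Exclusive scan: rank r receives the sum of my_size over ranks 0..r-1.
  std::uint64_t offset = 0;
  MPI_Exscan(&my_size, &offset, 1, MPI_UINT64_T, MPI_SUM, comm);

  // The MPI standard leaves the receive buffer on rank 0 undefined for
  // MPI_Exscan, so set it explicitly.
  int rank;
  MPI_Comm_rank(comm, &rank);
  if (rank == 0)
    offset = 0;
  return offset;
}
```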
