-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up DoFTools::extract_dofs #10883
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not build a std::set
first (see suggestion in #10882) so that you don't have to do the local-to-global translation twice?
// get the component association of each DoF and then select the ones | ||
// that match the given set of components | ||
std::vector<unsigned char> dofs_by_component(dof.n_locally_owned_dofs()); | ||
internal::get_component_association(dof, component_mask, dofs_by_component); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has to translate local to global indices in the index set.
selected_dofs.reserve(dof.n_locally_owned_dofs()); | ||
for (types::global_dof_index i = 0; i < dofs_by_component.size(); ++i) | ||
if (component_mask[dofs_by_component[i]] == true) | ||
selected_dofs.push_back(dof.locally_owned_dofs().nth_index_in_set(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here is the second time.
I am aware of the translations, but my reasoning is that |
Here are the numbers for the approach used in this PR:
Alternative directly collecting indices:
My benchmark runs in serial on a hypercube. In general, the index translation is typically cheap because we most often have a contiguous locally owned range. In case we do not have those, the translation might be more expensive. But we pay it at least once anyway for the check |
Here is the alternative implementation: const auto & fe_collection = dof.get_fe_collection();
std::vector<std::vector<bool>> local_selected_dofs;
Assert(fe_collection.n_components() < 256, ExcNotImplemented());
for (unsigned int f = 0; f < fe_collection.size(); ++f)
{
// First get the component for each shape function
const std::vector<unsigned char> local_component_association =
internal::get_local_component_association(fe_collection[f],
component_mask);
// Check which dofs were selected
std::vector<bool> this_selected_dofs(
fe_collection[f].n_dofs_per_cell());
for (unsigned int i = 0; i < fe_collection[f].n_dofs_per_cell(); ++i)
this_selected_dofs[i] =
component_mask[local_component_association[i]];
local_selected_dofs.emplace_back(std::move(this_selected_dofs));
}
// Then loop over all cells and do the work
std::vector<types::global_dof_index> selected_dofs;
selected_dofs.reserve(dof.n_locally_owned_dofs());
std::vector<types::global_dof_index> local_dof_indices;
for (const auto &c : dof.active_cell_iterators())
if (c->is_locally_owned())
{
const unsigned int fe_index = c->active_fe_index();
const unsigned int dofs_per_cell = c->get_fe().n_dofs_per_cell();
local_dof_indices.resize(dofs_per_cell);
c->get_dof_indices(local_dof_indices);
for (unsigned int i = 0; i < dofs_per_cell; ++i)
if (local_selected_dofs[fe_index][i] &&
dof.locally_owned_dofs().is_element(local_dof_indices[i]))
selected_dofs.push_back(local_dof_indices[i]);
}
std::sort(selected_dofs.begin(), selected_dofs.end());
IndexSet result(dof.n_locally_owned_dofs());
result.add_indices(selected_dofs.begin(),
std::unique(selected_dofs.begin(),
selected_dofs.end()));
return result;
|
If we really insisted on making things faster we could speed up the |
Ah, thanks for actually timing this -- not what I thought would come out of this, but good to learn! |
As reported on the mainling list, the function got really slow. It turned out that the function called IndexSet::add_index() for every index, which leads to frequent memory allocations and movements of indices that can lead to a quadratic complexity in the number of indices. This function was introduced in #9182. Luckily, we have a good implementation of that part almost there by an internal function, so we only need to go there.