Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up DoFTools::extract_dofs #10883

Merged
merged 1 commit into from
Sep 3, 2020
Merged

Conversation

kronbichler
Copy link
Member

As reported on the mainling list, the function got really slow. It turned out that the function called IndexSet::add_index() for every index, which leads to frequent memory allocations and movements of indices that can lead to a quadratic complexity in the number of indices. This function was introduced in #9182. Luckily, we have a good implementation of that part almost there by an internal function, so we only need to go there.

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not build a std::set first (see suggestion in #10882) so that you don't have to do the local-to-global translation twice?

// get the component association of each DoF and then select the ones
// that match the given set of components
std::vector<unsigned char> dofs_by_component(dof.n_locally_owned_dofs());
internal::get_component_association(dof, component_mask, dofs_by_component);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has to translate local to global indices in the index set.

selected_dofs.reserve(dof.n_locally_owned_dofs());
for (types::global_dof_index i = 0; i < dofs_by_component.size(); ++i)
if (component_mask[dofs_by_component[i]] == true)
selected_dofs.push_back(dof.locally_owned_dofs().nth_index_in_set(i));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here is the second time.

@kronbichler
Copy link
Member Author

Why not build a std::set first (see suggestion in #10882) so that you don't have to do the local-to-global translation twice?

I am aware of the translations, but my reasoning is that std::set is a slow data structure, so I'd rather avoid it. The alternative would have been to fill a vector of indices on all elements and then sort it. But when I sort things I possibly also get super-linear work (even though only a factor log). Let me run it through my timer.

@kronbichler
Copy link
Member Author

kronbichler commented Sep 3, 2020

Here are the numbers for the approach used in this PR:

2d
Number of indices: 16641 in 0.0030259 seconds 
Number of indices: 66049 in 0.0117698 seconds 
Number of indices: 263169 in 0.0484559 seconds 

3d
Number of indices: 35937 in 0.0098578 seconds 
Number of indices: 274625 in 0.0797143 seconds 
Number of indices: 2146689 in 0.59407 seconds 

Alternative directly collecting indices:

2d
Number of indices: 16641 in 0.00579847 seconds 
Number of indices: 66049 in 0.0168465 seconds 
Number of indices: 263169 in 0.067628 seconds 

3d
Number of indices: 35937 in 0.0170539 seconds 
Number of indices: 274625 in 0.134728 seconds 
Number of indices: 2146689 in 1.14217 seconds 

My benchmark runs in serial on a hypercube. In general, the index translation is typically cheap because we most often have a contiguous locally owned range. In case we do not have those, the translation might be more expensive. But we pay it at least once anyway for the check dof.locally_owned_dofs().is_element(local_dof_indices[i]). Well, in the latter implementation we could move the check past the local_selected_dof check which is the cheaper part (at least if we disregard the advanced stuff like branch prediction accuracy in either case). Unless we expect cases where we extract only a tiny fraction, the proposed function should be best.

@kronbichler
Copy link
Member Author

Here is the alternative implementation:

    const auto &                   fe_collection = dof.get_fe_collection();
    std::vector<std::vector<bool>> local_selected_dofs;
    Assert(fe_collection.n_components() < 256, ExcNotImplemented());
    for (unsigned int f = 0; f < fe_collection.size(); ++f)
      {
        // First get the component for each shape function
        const std::vector<unsigned char> local_component_association =
          internal::get_local_component_association(fe_collection[f],
                                                    component_mask);

        // Check which dofs were selected
        std::vector<bool> this_selected_dofs(
          fe_collection[f].n_dofs_per_cell());
        for (unsigned int i = 0; i < fe_collection[f].n_dofs_per_cell(); ++i)
          this_selected_dofs[i] =
            component_mask[local_component_association[i]];

        local_selected_dofs.emplace_back(std::move(this_selected_dofs));
      }

    // Then loop over all cells and do the work
    std::vector<types::global_dof_index> selected_dofs;
    selected_dofs.reserve(dof.n_locally_owned_dofs());
    std::vector<types::global_dof_index> local_dof_indices;
    for (const auto &c : dof.active_cell_iterators())
      if (c->is_locally_owned())
        {
          const unsigned int fe_index      = c->active_fe_index();
          const unsigned int dofs_per_cell = c->get_fe().n_dofs_per_cell();
          local_dof_indices.resize(dofs_per_cell);
          c->get_dof_indices(local_dof_indices);
          for (unsigned int i = 0; i < dofs_per_cell; ++i)
            if (local_selected_dofs[fe_index][i] &&
                dof.locally_owned_dofs().is_element(local_dof_indices[i]))
              selected_dofs.push_back(local_dof_indices[i]);
        }
    std::sort(selected_dofs.begin(), selected_dofs.end());
    IndexSet result(dof.n_locally_owned_dofs());
    result.add_indices(selected_dofs.begin(),
                       std::unique(selected_dofs.begin(),
                                   selected_dofs.end()));
    return result;

@kronbichler
Copy link
Member Author

If we really insisted on making things faster we could speed up the get_component_association function by hard-coding the case of a single index (not sure it is worth it) or teach the index set a function that combines is_element() and index_within_set in a single check/call, with types::invalid_dof_index returned for those indices not in the set.

@bangerth
Copy link
Member

bangerth commented Sep 3, 2020

Ah, thanks for actually timing this -- not what I thought would come out of this, but good to learn!

@bangerth bangerth merged commit d76df9c into dealii:master Sep 3, 2020
@kronbichler kronbichler deleted the fix_extract_dofs branch October 1, 2020 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants