Speed up DoFTools::extract_dofs #10883

kronbichler · 2020-09-03T11:54:34Z

As reported on the mainling list, the function got really slow. It turned out that the function called IndexSet::add_index() for every index, which leads to frequent memory allocations and movements of indices that can lead to a quadratic complexity in the number of indices. This function was introduced in #9182. Luckily, we have a good implementation of that part almost there by an internal function, so we only need to go there.

bangerth

Why not build a std::set first (see suggestion in #10882) so that you don't have to do the local-to-global translation twice?

bangerth · 2020-09-03T11:58:36Z

source/dofs/dof_tools.cc

+    // get the component association of each DoF and then select the ones
+    // that match the given set of components
+    std::vector<unsigned char> dofs_by_component(dof.n_locally_owned_dofs());
+    internal::get_component_association(dof, component_mask, dofs_by_component);


This function has to translate local to global indices in the index set.

bangerth · 2020-09-03T11:58:45Z

source/dofs/dof_tools.cc

+    selected_dofs.reserve(dof.n_locally_owned_dofs());
+    for (types::global_dof_index i = 0; i < dofs_by_component.size(); ++i)
+      if (component_mask[dofs_by_component[i]] == true)
+        selected_dofs.push_back(dof.locally_owned_dofs().nth_index_in_set(i));


and here is the second time.

kronbichler · 2020-09-03T12:03:22Z

Why not build a std::set first (see suggestion in #10882) so that you don't have to do the local-to-global translation twice?

I am aware of the translations, but my reasoning is that std::set is a slow data structure, so I'd rather avoid it. The alternative would have been to fill a vector of indices on all elements and then sort it. But when I sort things I possibly also get super-linear work (even though only a factor log). Let me run it through my timer.

kronbichler · 2020-09-03T12:17:05Z

Here are the numbers for the approach used in this PR:

2d
Number of indices: 16641 in 0.0030259 seconds 
Number of indices: 66049 in 0.0117698 seconds 
Number of indices: 263169 in 0.0484559 seconds 

3d
Number of indices: 35937 in 0.0098578 seconds 
Number of indices: 274625 in 0.0797143 seconds 
Number of indices: 2146689 in 0.59407 seconds

Alternative directly collecting indices:

2d
Number of indices: 16641 in 0.00579847 seconds 
Number of indices: 66049 in 0.0168465 seconds 
Number of indices: 263169 in 0.067628 seconds 

3d
Number of indices: 35937 in 0.0170539 seconds 
Number of indices: 274625 in 0.134728 seconds 
Number of indices: 2146689 in 1.14217 seconds

My benchmark runs in serial on a hypercube. In general, the index translation is typically cheap because we most often have a contiguous locally owned range. In case we do not have those, the translation might be more expensive. But we pay it at least once anyway for the check dof.locally_owned_dofs().is_element(local_dof_indices[i]). Well, in the latter implementation we could move the check past the local_selected_dof check which is the cheaper part (at least if we disregard the advanced stuff like branch prediction accuracy in either case). Unless we expect cases where we extract only a tiny fraction, the proposed function should be best.

kronbichler · 2020-09-03T12:18:19Z

Here is the alternative implementation:

    const auto &                   fe_collection = dof.get_fe_collection();
    std::vector<std::vector<bool>> local_selected_dofs;
    Assert(fe_collection.n_components() < 256, ExcNotImplemented());
    for (unsigned int f = 0; f < fe_collection.size(); ++f)
      {
        // First get the component for each shape function
        const std::vector<unsigned char> local_component_association =
          internal::get_local_component_association(fe_collection[f],
                                                    component_mask);

        // Check which dofs were selected
        std::vector<bool> this_selected_dofs(
          fe_collection[f].n_dofs_per_cell());
        for (unsigned int i = 0; i < fe_collection[f].n_dofs_per_cell(); ++i)
          this_selected_dofs[i] =
            component_mask[local_component_association[i]];

        local_selected_dofs.emplace_back(std::move(this_selected_dofs));
      }

    // Then loop over all cells and do the work
    std::vector<types::global_dof_index> selected_dofs;
    selected_dofs.reserve(dof.n_locally_owned_dofs());
    std::vector<types::global_dof_index> local_dof_indices;
    for (const auto &c : dof.active_cell_iterators())
      if (c->is_locally_owned())
        {
          const unsigned int fe_index      = c->active_fe_index();
          const unsigned int dofs_per_cell = c->get_fe().n_dofs_per_cell();
          local_dof_indices.resize(dofs_per_cell);
          c->get_dof_indices(local_dof_indices);
          for (unsigned int i = 0; i < dofs_per_cell; ++i)
            if (local_selected_dofs[fe_index][i] &&
                dof.locally_owned_dofs().is_element(local_dof_indices[i]))
              selected_dofs.push_back(local_dof_indices[i]);
        }
    std::sort(selected_dofs.begin(), selected_dofs.end());
    IndexSet result(dof.n_locally_owned_dofs());
    result.add_indices(selected_dofs.begin(),
                       std::unique(selected_dofs.begin(),
                                   selected_dofs.end()));
    return result;

kronbichler · 2020-09-03T12:55:55Z

If we really insisted on making things faster we could speed up the get_component_association function by hard-coding the case of a single index (not sure it is worth it) or teach the index set a function that combines is_element() and index_within_set in a single check/call, with types::invalid_dof_index returned for those indices not in the set.

bangerth · 2020-09-03T19:05:35Z

Ah, thanks for actually timing this -- not what I thought would come out of this, but good to learn!

Speed up DoFTools::extract_dofs

2a4b865

kronbichler added DoFs ready to test labels Sep 3, 2020

bangerth reviewed Sep 3, 2020

View reviewed changes

bangerth approved these changes Sep 3, 2020

View reviewed changes

bangerth merged commit d76df9c into dealii:master Sep 3, 2020

kronbichler deleted the fix_extract_dofs branch October 1, 2020 11:21

kronbichler mentioned this pull request Jun 11, 2021

Make DoFTools::extract_dofs() faster #10882

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up DoFTools::extract_dofs #10883

Speed up DoFTools::extract_dofs #10883

kronbichler commented Sep 3, 2020

bangerth left a comment

bangerth Sep 3, 2020

bangerth Sep 3, 2020

kronbichler commented Sep 3, 2020

kronbichler commented Sep 3, 2020 •

edited

kronbichler commented Sep 3, 2020

kronbichler commented Sep 3, 2020

bangerth commented Sep 3, 2020

Speed up DoFTools::extract_dofs #10883

Speed up DoFTools::extract_dofs #10883

Conversation

kronbichler commented Sep 3, 2020

bangerth left a comment

Choose a reason for hiding this comment

bangerth Sep 3, 2020

Choose a reason for hiding this comment

bangerth Sep 3, 2020

Choose a reason for hiding this comment

kronbichler commented Sep 3, 2020

kronbichler commented Sep 3, 2020 • edited

kronbichler commented Sep 3, 2020

kronbichler commented Sep 3, 2020

bangerth commented Sep 3, 2020

kronbichler commented Sep 3, 2020 •

edited