Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve IndexSet::add_indices() for sorted ranges with duplicates #15400

Closed
bangerth opened this issue Jun 20, 2023 · 3 comments
Closed

Improve IndexSet::add_indices() for sorted ranges with duplicates #15400

bangerth opened this issue Jun 20, 2023 · 3 comments

Comments

@bangerth
Copy link
Member

#15366 illustrates how expensive it can be to insert indices one by one, but I think we can make the process even cheaper. The current best paradigm is to use the following idiom:

  std::sort(additional_elements.begin(), additional_elements.end());
  additional_elements.erase(std::unique(additional_elements.begin(),
                                            additional_elements.end()),
                                additional_elements.end());

  needed_elements.add_indices(additional_elements.begin(),
                                  additional_elements.end());

It seems unnecessary to call the unique()/erase() function combination in the middle: IndexSet::add_indices() should simply be able to deal with duplicates, but I believe that currently it doesn't.

@bangerth bangerth added this to the Developer workshop 2023 milestone Jun 20, 2023
@bangerth
Copy link
Member Author

bangerth commented Jun 20, 2023

Specifically, this is how the function looks like:

template <typename ForwardIterator>
inline void
IndexSet::add_indices(const ForwardIterator &begin, const ForwardIterator &end)
{
  if (begin == end)
    return;

  // identify ranges in the given iterator range by checking whether some
  // indices happen to be consecutive. to avoid quadratic complexity when
  // calling add_range many times (as add_range() going into the middle of an
  // already existing range must shift entries around), we first collect a
  // vector of ranges.
  boost::container::small_vector<std::pair<size_type, size_type>, 200>
       tmp_ranges;
  bool ranges_are_sorted = true;
  for (ForwardIterator p = begin; p != end;)
    {
      // Starting with the current iterator 'p', find an iterator
      // 'q' so that the indices pointed to by the iterators in
      // the range [p,q) are consecutive. These indices then form
      // a range that is contiguous, and that can be added all
      // at once.
      const size_type begin_index = *p;
      size_type       end_index   = begin_index + 1;
      ForwardIterator q           = p;
      ++q;
      while ((q != end) && (static_cast<size_type>(*q) == end_index))             // This will fail for duplicate elements
        {
          ++end_index;
          ++q;
        }

      // Add this range:
      tmp_ranges.emplace_back(begin_index, end_index);

      // Then move on to the next element in the input range.
      // If the starting index of the next go-around of the for loop is less
      // than the end index of the one just identified, then we will have at
      // least one pair of ranges that are not sorted, and consequently the
      // whole collection of ranges is not sorted.
      p = q;
      if (p != end && static_cast<size_type>(*p) < end_index)
        ranges_are_sorted = false;
    }

  add_ranges_internal(tmp_ranges, ranges_are_sorted);
}

@kronbichler
Copy link
Member

@bangerth we have #15402 merged, is there more you plan to do now with this issue?

@bangerth
Copy link
Member Author

No. I thought that merging sets of ranges in add_ranges_internal() could be improved, but after spending 30 minutes looking at that function a couple of days ago, I believe that it's about as good as it gets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants