Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use map::find rather than insert to avoid creating temporary #13884

Merged
merged 1 commit into from Jun 1, 2022

Conversation

kronbichler
Copy link
Member

When looking at our performance tests I realized we create unnecessary temporaries when checking for the presence of an element in a map. It is better to use map::find() first and only insert when the search was unsuccessful.

This is a small performance improvement but not important for the 9.4 release.

@bangerth
Copy link
Member

bangerth commented Jun 1, 2022

I have a hard time seeing how this is better. If you call find(), you do a tree search and if you then do insert() you have to do another tree search. That can't be faster than a single insert() followed by a possible overwrite, can it? Or maybe I just don't understand what the expensive temporary is you are referring to?

I think if you wanted to address some cost, you should use lower_bound() to find a possible insertion points. If lower_point() points to an element that has the same key already, then you infer that the element already exists. If it doesn't exist, you use the returned lower bound as a hint for insertion.

(If I understand the code right, you actually want an "insert or overwrite" semantic. std::map actually has such a function: https://en.cppreference.com/w/cpp/container/map/insert_or_assign It's a shame that that only exists with C++17, but maybe that's a place where it's useful to do #if DEAL_II_WITH_CXX17.)

@kronbichler
Copy link
Member Author

I have a hard time seeing how this is better.

I understand your point, and indeed was the premise why we wrote the code like this originally. The problem is, at least for GNU STL implementation that I have on my machine (and that is also used by my clang), that insert() allocates a new element upon a call to insert(), both when the element already exists and when a new element is to be inserted. Since the map in the two use cases above has very high hit rates (99% is not unusual), we reduce the number of memory allocations, which is the most expensive part especially in parallel, by that hit rate. Now I would sure wish std::map::insert did a better job in the first place, but that is not in my power to change, and there may be reasons to do it like this. Either way, from what I see it is not the tree traversal that is expensive, but the memory allocations and copy operations. You can test this by this small code:

#include <map>
#include <iostream>

__attribute__((noinline))
void compute(std::map<unsigned int, unsigned int> &v1, unsigned int element)
{
  auto it = v1.insert(std::make_pair(element, 2 * element));
  if (it.second == false)
    it.first->second *= 3;
}

__attribute__((noinline))
void compute2(std::map<unsigned int, unsigned int> &v1, unsigned int element)
{
  auto it = v1.find(element);
  if (it != v1.end())
    it->second *= 3;
  else
    v1.insert(std::make_pair(element, 2 * element));
}

int main()
{
  std::map<unsigned int, unsigned int> map;
  compute2(map, 2);
  compute2(map, 3);
  for (unsigned int i=0; i<10; ++i)
    compute2(map, 2);
  compute2(map, 4);
  for (unsigned int i=0; i<4; ++i)
    compute2(map, 4);

  for (const auto it : map)
    std::cout << it.first << " " << it.second << std::endl;
  return 0;
}

If you run the code like this, you should get 3 calls to new, which corresponds to the 3 unique elements added with keys 2, 3, 4. But if you replace compute2 by compute, I see 17 memory allocations.

@kronbichler
Copy link
Member Author

(If I understand the code right, you actually want an "insert or overwrite" semantic. std::map actually has such a function: https://en.cppreference.com/w/cpp/container/map/insert_or_assign It's a shame that that only exists with C++17, but maybe that's a place where it's useful to do #if DEAL_II_WITH_CXX17.)

Unfortunately not, I rather need the semantic "insert or look up", which I use to detect duplicates in a list that later gets saved in a vector, so the key is the element I want to find similarities of, and the value in the map is the first such index that I can use to match the list of values with the position in the vector.

@bangerth
Copy link
Member

bangerth commented Jun 1, 2022

Ah, I see now. The key is the 99% hit rate. Thanks for the explanation, let's get this merged then.

@bangerth bangerth merged commit fc8a09f into dealii:master Jun 1, 2022
@bangerth
Copy link
Member

bangerth commented Jun 1, 2022

Oh, oops. I just violated the rule spelled out just a few minutes ago: Anything that isn't tagged 9.4 should wait. Should I revert?

@kronbichler
Copy link
Member Author

No, this should be rather safe (and if it is breaking tests, we can revert then.) I will keep an eye on the testers.

@kronbichler kronbichler deleted the simplify_access_to_map branch June 1, 2022 17:11
@kronbichler
Copy link
Member Author

kronbichler commented Jun 3, 2022

For reference, the benchmark timing_step_37/mpirun=32/threads=1, sub-test setup_matrix_free documented on https://dealii.org/performance_tests/results/ went from around 0.79 seconds on May 26 to 0.57 seconds on June 3. While I have done several small changes aiming at improving performance, this PR is by far the biggest contributor to that improvement.

@bangerth
Copy link
Member

bangerth commented Jun 3, 2022

Nice!

mkghadban pushed a commit to OpenFCST/dealii that referenced this pull request Sep 8, 2022
Use map::find rather than insert to avoid creating temporary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants