Use Abseil hash maps instead of std unordered containers for a huge speedup #4049

lrineau · 2019-07-02T13:06:48Z

Issue Details

Inspired by one of the cppcon talks I listen to during my free time, I have quickly tried the hash maps from Abseil instead of std::unordered_map.

My try was a modification of copy_face_graph:

diff --git a/BGL/include/CGAL/boost/graph/copy_face_graph.h b/BGL/include/CGAL/boost/graph/copy_face_graph.h
index 9a2cb628825..fd0b602f284 100644
--- a/BGL/include/CGAL/boost/graph/copy_face_graph.h
+++ b/BGL/include/CGAL/boost/graph/copy_face_graph.h
@@ -35,6 +35,7 @@
 #include <boost/unordered_map.hpp>
 #include <boost/utility/enable_if.hpp>
 #include <boost/function_output_iterator.hpp>
+#include <absl/container/flat_hash_map.h>
 
 namespace CGAL {
 
@@ -189,7 +190,7 @@ void copy_face_graph(const SourceMesh& sm, TargetMesh& tm,
   typedef typename boost::graph_traits<SourceMesh>::halfedge_descriptor sm_halfedge_descriptor;
   typedef typename boost::graph_traits<TargetMesh>::halfedge_descriptor tm_halfedge_descriptor;
 
-  boost::unordered_map<sm_halfedge_descriptor,
+  absl::flat_hash_map<sm_halfedge_descriptor,
                        tm_halfedge_descriptor> hash_map(num_halfedges(sm));
   copy_face_graph_impl(sm, tm,
                        boost::make_assoc_property_map(hash_map),

The results are stuning: now 3.2 seconds instead of 7.1 seconds

Source Code

The .cpp code: https://gist.github.com/2e9ea602a7b1e46cd34773611acac3c6

Here is the local patch to CMakeLists.txt (a non-clean hack):

diff --git a/BGL/examples/BGL_polyhedron_3/CMakeLists.txt b/BGL/examples/BGL_polyhedron_3/CMakeLists.txt
index ac3bd5256f2..a821195ff9c 100644
--- a/BGL/examples/BGL_polyhedron_3/CMakeLists.txt
+++ b/BGL/examples/BGL_polyhedron_3/CMakeLists.txt
@@ -67,6 +67,11 @@ create_single_source_cgal_program( "transform_iterator.cpp" )
 
 create_single_source_cgal_program( "copy_polyhedron.cpp" )
 
+add_subdirectory(googletest)
+add_subdirectory(abseil-cpp)
+
+target_link_libraries(copy_polyhedron PRIVATE absl::flat_hash_map)
+
 if(OpenMesh_FOUND)
   target_link_libraries( copy_polyhedron PRIVATE ${OPENMESH_LIBRARIES} )
 endif()

I had to checkout abseil and gtest:

$ git clone git@github.com:abseil/abseil-cpp.git
$ git clone git@github.com:google/googletest.git

I ran that modified copy_polyhedron.cpp on demo/Polyhedron/data/man.off, and as I said the speedup is huge: 3.2s after the patch to <CGAL/boost/graph/copy_face_graph.h>, instead of 7.1s.

Environment

Operating system (Windows/Mac/Linux, 32/64 bits):
Compiler: g++ 9.1.1 with -O3
Release or debug mode: Release
CGAL version: CGAL-5.0-dev, master version 44e4d15
Boost version: 1.69.0
Other libraries versions if used (Eigen, TBB, etc.):

The text was updated successfully, but these errors were encountered:

lrineau · 2019-07-02T13:08:36Z

https://github.com/abseil/abseil-cpp license is Apache License 2.0. It is very permissive.

MaelRL · 2019-07-02T13:21:28Z

https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/ had some rather detailed comparisons recently, which seem to yield similar observations.

sloriot · 2019-07-04T07:02:41Z

Shall @maxGimeno propose a global plan with the introduction of CGAL::unordered_map + the cmake machinery to get it everywhere?

mglisse · 2019-07-04T07:14:35Z

abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working 😞
But yes, anything that spends a lot of time in hashtable operations should be benchmarked with other implementations.
Note that in one extreme case (unordered_set was taking > 64G in this application) I had better results with the standard container than with abseil.

lrineau · 2019-07-04T07:27:16Z

@mglisse commented on Jul 4, 2019, 9:14 AM GMT+2:

abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working 😞

Hopefully CGAL uses CMake.

add_subdirectory(googletest)
add_subdirectory(abseil-cpp)

target_link_libraries(copy_polyhedron PRIVATE absl::flat_hash_map)

But yes, anything that spends a lot of time in hashtable operations should be benchmarked with other implementations.
Note that in one extreme case (unordered_set was taking > 64G in this application) I had better results with the standard container than with abseil.

Our opting for Abseil will be conditional to a macro. With it we will easily be able to build benchmarks. Even users should be made aware of that, so that they can tune.

mglisse · 2019-07-04T07:38:21Z

Hopefully CGAL uses CMake.

As a header-only library, it doesn't matter what CGAL uses (if it uses anything at all). Providing CGALConfig.cmake does help. But users of CGAL don't all use cmake.

lrineau · 2019-07-04T07:41:27Z

@mglisse commented on Jul 4, 2019, 9:38 AM GMT+2:

Hopefully CGAL uses CMake.

As a header-only library, it doesn't matter what CGAL uses (if it uses anything at all). Providing CGALConfig.cmake does help. But users of CGAL don't all use cmake.

We do not care. Abseil will be an optional dependency of CGAL, and our CMake users will get it almost for free. As for the others... they can do experiments and benchmarks using CMake, and them make the effort in their build system if the speedup is compelling.

Actually, I do not get your point. Do you have a suggestion about an alternative?

afabri · 2019-07-04T08:03:02Z

Just my two cents, it would be good to see a higher level CGAL algorithm than copy_face_graph() as a motivation.

lrineau · 2019-07-04T09:02:21Z

@mglisse commented on Jul 4, 2019, 9:14 AM GMT+2:

abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working 😞

Actually, can you elaborate? Because on my machine abseil seems to require only the pthread support, and no other dependency.

maxGimeno · 2019-07-04T09:12:04Z

@afabri it concerns everything related to unordered maps, not only copy_face_graph().

afabri · 2019-07-04T10:15:28Z

The questions is if it has measurable impact on Mesh_3, Constrained_triangulation_2, Arrangement_2......

maxGimeno · 2019-07-04T10:17:37Z

We will probably have to bench to be sure. But I have seen traces of unordered maps at least in Mesh_3 and Arrangements_on_surface_2, so there will probbaly be an impact.

lrineau · 2019-07-04T10:18:15Z

For at least one user of us, CGAL::PMP::polygon_soup_to_polygon_mesh was a big consumer of time of their process. And it is all about maps.

mglisse · 2019-07-04T12:17:25Z

Actually, can you elaborate? Because on my machine abseil seems to require only the pthread support, and no other dependency.

I didn't mean external dependencies. Using a trivial program:

#include <absl/container/flat_hash_set.h>
typedef absl::flat_hash_set<int> S;
int main(){ S s; }

Complete the command line with a minimal number of -l flags:
g++ test.cc -I .../include -L .../lib
Here I need -labsl_hashtablez_sampler -labsl_synchronization -labsl_time -labsl_base -labsl_spinlock_wait -labsl_time_zone -labsl_int128 -labsl_symbolize -labsl_stacktrace -labsl_malloc_internal -labsl_demangle_internal -labsl_debugging_internal -labsl_dynamic_annotations -pthread ... Why they didn't at least create a thin archive or linker script so that -labsl would work is an interesting question.

mglisse · 2019-07-04T12:37:18Z

Actually, I do not get your point.

Just that abseil is a bit inconvenient to use, not as easy as using more Boost stuff for instance (even the parts that require linking with a library).

I don't know yet if you plan to test for the presence of abseil or to include a copy of it.

Do you have a suggestion about an alternative?

Not particularly. Mael posted a link to a comparison between many implementations, some of which are header-only. They don't all have Google maintaining them though, and you may wish to use more stuff from abseil later.

lrineau · 2019-07-04T13:05:58Z

I would love to use the single-header implementation robin_hood.h, that is said to be faster by those benchmarks, but it is not compatible, for the moment, with our use of -frounding-math:

robin_hood.h:1901:59: error: ‘(8.0e+1 / 1.0e+2)’ is not a constant expression
 1901 |         static constexpr double factor = MaxLoadFactor100 / 100.0;
      |                                          ~~~~~~~~~~~~~~~~~^~~~~~~

lrineau · 2019-07-04T13:16:08Z

@lrineau commented on Jul 4, 2019, 3:05 PM GMT+2:

I would love to use the single-header implementation robin_hood.h, that is said to be faster by those benchmarks, but it is not compatible, for the moment, with our use of -frounding-math:
robin_hood.h:1901:59: error: ‘(8.0e+1 / 1.0e+2)’ is not a constant expression
 1901 |         static constexpr double factor = MaxLoadFactor100 / 100.0;
      |                                          ~~~~~~~~~~~~~~~~~^~~~~~~

I have artificially removed -frounding-math, to compile the test, and it is slower than using abseil: 3.8 seconds instead of 3.2 using abseil.

mglisse · 2019-07-04T15:28:16Z

our use of -frounding-math

I don't remember if you tried recently to remove it, to see what breaks. But that's not the right place to discuss it.

I guess constexpr is a context where gcc should ignore rounding modes...

it is slower than using abseil

The list also includes phmap, which is described as essentially a headerified version of abseil.

GilesBathgate · 2021-02-08T12:59:45Z

Hi @lrineau, I've been experimenting with replacing the map in Unique_hash_map with std::unordered_map but initial results showed this to be slower than the chained_map implementation. Probably I did something wrong or I am not understanding the use case of Unique_hash_map correctly:

https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23

Would this benefit from the abseil version of map?

mglisse · 2021-02-08T13:29:07Z

I've been experimenting with replacing the map in Unique_hash_map with std::unordered_map but initial results showed this to be slower than the chained_map implementation.

The whole point of this issue is that std::unordered_map is slow, what did you expect?

GilesBathgate · 2021-02-08T13:56:06Z

@mglisse 😂 Yes, But my reason for investigating it (prior to coming across this PR) was that in some cases, I've found anything other than chained_map to be faster, especially when the number of elements is small because chained_map doesn't lazily initialise minimum of 512 buckets.

For example here:

cgal/Nef_S2/include/CGAL/Nef_S2/SM_const_decorator.h

Line 392 in 74af3e7

CGAL::Unique_hash_map<SVertex_const_iterator,bool> visited(false);

where visited only contains approximately 6 to 10 elements. The chained_map construction dominates the CPU usage, making O(1) lookup irrelevant probably a std::vector with O(n) would be faster in this scenario. Another example where Unique_hash_map construction is slow is #5143

lrineau · 2021-02-08T14:21:15Z

Hi @lrineau, I've been experimenting with replacing the map in Unique_hash_map with std::unordered_map but initial results showed this to be slower than the chained_map implementation. Probably I did something wrong or I am not understanding the use case of Unique_hash_map correctly:

gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23_ (too large to embed)_

Would this benefit from the abseil version of map?

Hi Giles, I do not have the time to investigate on your issue. I hope others will, though.

The best way to know if the Abseil hash maps can benefit your code is to try, anyway.

GilesBathgate · 2021-02-08T14:53:37Z

@lrineau Thanks I've given it a try

https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23

@mglisse Abseil was 20% slower than std:unordered_map, for me though so I'm confused now.

GilesBathgate · 2021-02-11T12:08:13Z

@lrineau @mglisse I am getting good results with robin_hood.h I use robin_hood::unordered_node_map (The issue with -frounding math was resolved martinus/robin-hood-hashing#51)

Updated gist:
https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23

mglisse · 2021-02-11T13:40:01Z

where visited only contains approximately 6 to 10 elements

Did you try a flat_map for this one? (I know nothing about that code, and in particular not if the size of this container can ever be large) Such small maps are not a typical use case for hash maps, so many are probably not optimized for that case. The code you point to seems suboptimal, it calls visited[v] twice in a row, changing the code a bit, a (hash_)set would make more sense than a map (it probably used a map because there was no set in cgal), checking visited.insert(v).second.

Nice if robin hood works well, although if you want to change Unique_hash_map, which is used a lot in CGAL, and not just use a different map in one specific place, reviewers are likely to ask for various benchmarks, to make sure it doesn't regress some other use cases (bigger table, more expensive key, etc). But it seems quite possible that it is indeed faster all around.

GilesBathgate · 2021-02-11T14:04:25Z

@mglisse Yes, I tried a boost::flat_set and it was much faster I'll add a comment about that to #5379

Regarding Unique_hash_map if I understand correctly it's optimised for when the keys are known to be unique and therefore no collisions. That said robin_hood does seem to out-perform chained_map in my tests, especially in the object construction. Either way, I've found some that some cases require robin_hood::unordered_node_map instead of robin_hood::unordered_flat_map So it doesn't get the cache friendliness benefits. I was thinking of asking @martinus if there is a way to use robin_hood::* in a way that is optimal for unique hash maps.

martinus · 2021-02-11T14:50:53Z

If you know the maximum size of the hashmap, it is beneficial to call reserve() with that amount. That way you only get a single allocation. Not much else you can do as far as I see on a quick glance.

Also, as @mglisse said, the linked code looks suboptimal. E.g. unordered_flat_set might help, and I think the std::list could use a custom allocator so it doesn't malloc & free each time you put an element into it. A while ago I've created an allocator for that: https://gist.github.com/martinus/ee4cc9d82d4610647180301c47b030a2 see e.g. this benchmark: https://quick-bench.com/q/BLW3OTlodJ2c3TmQGJUPT_TAdGA

mglisse · 2021-02-11T15:07:34Z

the std::list could

... be wrapped in a std::queue, which could then use its default backend deque 😉
(I didn't profile that code, no idea what part is relevant to optimize)

GilesBathgate · 2021-02-11T15:24:46Z

If you know the maximum size of the hashmap, it is beneficial to call reserve() with that amount.

So this could be done:

-  CGAL::Unique_hash_map<SVertex_const_iterator,bool> visited(false);
+  robin_hood::unordered_flat_set<SVertex_const_iterator,Handle_hash_function> visited;
+  visited.reserve(number_of_svertices());

but unfortunately:

cgal/Nef_3/include/CGAL/Nef_3/Vertex.h

Lines 214 to 219 in 74af3e7

    
           Size_type number_of_svertices() const 
        
           /*{\Mop returns the number of vertices.}*/ 
        
           { Size_type n(0); 
        
             SVertex_const_iterator vit; 
        
             CGAL_forall_svertices(vit, *this) ++n; 
        
             return n; }

😞

GilesBathgate · 2021-02-11T15:50:06Z

@mglisse Presumably one can use return CGAL::iterator_distance(svertices_begin(), svertices_end()); (or just std::distance) in number_of_svertices?

mglisse · 2021-02-11T16:09:32Z

@mglisse Presumably one can use return CGAL::iterator_distance(svertices_begin(), svertices_end()); (or just std::distance) in number_of_svertices?

Probably, but why? It would perform exactly the same iteration.

GilesBathgate · 2021-02-11T16:16:16Z

Probably, but why? It would perform exactly the same iteration.

Because I thought it was a random access iterator, so last-first, it's not of course, so you are right.

GilesBathgate · 2021-02-11T17:00:40Z

I know nothing about that code, and in particular not if the size of this container can ever be large

@mglisse Yes it can be large if you have a cone with 10,000 sides, the sphere_map of the apex will have 10,000 svertices. so visited would need that many elements. robin_hood::unordered_flat_map with .reserve does seem to be a good option though.

lrineau added Pkg::BGL Pkg::PMP labels Jul 2, 2019

lrineau added this to the 5.0-beta milestone Jul 2, 2019

lrineau assigned sloriot, afabri and lrineau Jul 2, 2019

lrineau changed the title ~~Use Abseil hash maps instead of std unordered containers for a hugde speedup~~ Use Abseil hash maps instead of std unordered containers for a huge speedup Jul 4, 2019

MaelRL modified the milestones: 5.0-beta, 5.1-beta Jul 12, 2019

martinus mentioned this issue Oct 30, 2019

Make it work with -frounding-math martinus/robin-hood-hashing#51

Closed

sloriot modified the milestones: 5.1-beta, 5-2-beta Apr 7, 2020

MaelRL added this to the 5.3-beta milestone Dec 2, 2020

This was referenced Feb 11, 2021

~70% Performance speedup for polygon_mesh_to_nef_3 #5379

Closed

Proposed fix for unnecessary allocations in ID_support_handler. Issue #5143 #5144

Merged

Support for Unique hash map? martinus/robin-hood-hashing#112

Closed

GilesBathgate mentioned this issue Feb 20, 2021

Migration to faster hash maps - Handle_hash_map #5484

Closed

MaelRL modified the milestones: 5.3-beta, 5.4-beta Apr 12, 2021

sloriot modified the milestones: 5.4-beta, 5.5-beta1 Oct 7, 2021

MaelRL modified the milestones: 5.5-beta, 5.6-beta Mar 28, 2022

MaelRL modified the milestones: 5.6-beta, 5.7-beta Mar 31, 2023

sloriot modified the milestones: 6.0-beta, 6.1-beta Aug 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Abseil hash maps instead of std unordered containers for a huge speedup #4049

Use Abseil hash maps instead of std unordered containers for a huge speedup #4049

lrineau commented Jul 2, 2019

lrineau commented Jul 2, 2019

MaelRL commented Jul 2, 2019 •

edited

Loading

sloriot commented Jul 4, 2019

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019 •

edited

Loading

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019

afabri commented Jul 4, 2019

lrineau commented Jul 4, 2019 •

edited

Loading

maxGimeno commented Jul 4, 2019

afabri commented Jul 4, 2019

maxGimeno commented Jul 4, 2019

lrineau commented Jul 4, 2019

mglisse commented Jul 4, 2019

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019

lrineau commented Jul 4, 2019

mglisse commented Jul 4, 2019 •

edited by lrineau

Loading

GilesBathgate commented Feb 8, 2021

mglisse commented Feb 8, 2021

GilesBathgate commented Feb 8, 2021 •

edited

Loading

lrineau commented Feb 8, 2021

GilesBathgate commented Feb 8, 2021

GilesBathgate commented Feb 11, 2021

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

martinus commented Feb 11, 2021

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021 •

edited

Loading

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021 •

edited

Loading

Use Abseil hash maps instead of std unordered containers for a huge speedup #4049

Use Abseil hash maps instead of std unordered containers for a huge speedup #4049

Comments

lrineau commented Jul 2, 2019

Issue Details

Source Code

Environment

lrineau commented Jul 2, 2019

MaelRL commented Jul 2, 2019 • edited Loading

sloriot commented Jul 4, 2019

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019 • edited Loading

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019

afabri commented Jul 4, 2019

lrineau commented Jul 4, 2019 • edited Loading

maxGimeno commented Jul 4, 2019

afabri commented Jul 4, 2019

maxGimeno commented Jul 4, 2019

lrineau commented Jul 4, 2019

mglisse commented Jul 4, 2019

mglisse commented Jul 4, 2019

lrineau commented Jul 4, 2019

lrineau commented Jul 4, 2019

mglisse commented Jul 4, 2019 • edited by lrineau Loading

GilesBathgate commented Feb 8, 2021

mglisse commented Feb 8, 2021

GilesBathgate commented Feb 8, 2021 • edited Loading

lrineau commented Feb 8, 2021

GilesBathgate commented Feb 8, 2021

GilesBathgate commented Feb 11, 2021

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

martinus commented Feb 11, 2021

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021 • edited Loading

mglisse commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021

GilesBathgate commented Feb 11, 2021 • edited Loading

MaelRL commented Jul 2, 2019 •

edited

Loading

lrineau commented Jul 4, 2019 •

edited

Loading

lrineau commented Jul 4, 2019 •

edited

Loading

mglisse commented Jul 4, 2019 •

edited by lrineau

Loading

GilesBathgate commented Feb 8, 2021 •

edited

Loading

GilesBathgate commented Feb 11, 2021 •

edited

Loading

GilesBathgate commented Feb 11, 2021 •

edited

Loading