Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a good design #1

Open
ktprime opened this issue Feb 28, 2022 · 12 comments
Open

a good design #1

ktprime opened this issue Feb 28, 2022 · 12 comments

Comments

@renzibei
Copy link
Owner

Interesting hash map implemantion. I shall try to add your hash table to the bench recently.

I'm thinking of using a dedicated web page to show the benchmark results and plots because there's so much data to display.

By the way, your hash table has several variants. Which one do you want to add? I can't add too many hashtables because there are already many in the comparison now. Too many comparison objects can make the chart look too cluttered.

@ktprime
Copy link
Author

ktprime commented Feb 28, 2022

use the follow twos.
https://github.com/ktprime/emhash/blob/master/hash_table7.hpp and https://github.com/ktprime/emhash/blob/master/thirdparty/emilib/emilib2.hpp

I collect many third-payty flat hash-map implementions and benchmarsks, you can make and test it.

@renzibei
Copy link
Owner

renzibei commented Mar 19, 2022

@ktprime Hi, I tired to add the two hash tables you metioned, but encountered some problems.
You can try the to compile the emhash branch of hashtable-bench. The hash_table7.hpp has some compile errors. The emilib2.hpp can compile in x86 platforms but cannot succeed in arm64.

@ktprime
Copy link
Author

ktprime commented Mar 20, 2022

@renzibei
I fix compile issiue in file hash_table7.hpp by adding compile marco EMH_BUCKET_INDEX=2, but emilib2.hpp is x86 only(sse2).

your benchmark is some like martinus's code and more better.
I think 2-3 hashes(robin_hood_hash,std_hash,xxHash_xxh3) is enough for testing.

@renzibei
Copy link
Owner

@ktprime
I have updated the codes in benchmark with your latest commit, but still got some compile errors with hash_table7.hpp. You can check the emhash branch of hashtable-bench again.

@ktprime
Copy link
Author

ktprime commented Mar 20, 2022

add "ADD_DEFINITIONS(-EMH_BUCKET_INDEX=2)" in cmakelist.txt
or #define EMH_BUCKET_INDEX 2 in Map.h

#pragma once

#include "Hash.h"
#include "Allocator.h"
#define EMH_BUCKET_INDEX 2
#include "emhash/hash_table7.hpp"

static const char* MAP_NAME = "emhash7::hashmap";

template <class Key, class T>
using Map = emhash7::HashMap<Key, T, Hash<Key>, std::equal_to<>>;


@renzibei
Copy link
Owner

Got it.

From some early stage tests, it seems that hash_map7.hpp is faster than emlib2.hpp in most operations. So I think it should be enough to add only hash_map7.hpp to the benchmark.

@ktprime
Copy link
Author

ktprime commented Mar 20, 2022

Performance depends on cpu/compiler/test case/hash/data, emilib2 migrats from absl::flat_hash_map with some optimization, in my owner benchmark it's fast than absl and emhash7 on average.

emhash7 can set high load factor (0.999) without extreme performance deterioration
compared with other flat hash map implementation.

@renzibei
Copy link
Owner

Yeah they are good at different datasets. In general the performance of emilib2 is similar to absl::flat_hash_map in this benchmark. Ploting it would make the plot figure hard to read. So I think emhash7 has more value in comparing.

Here are the two figures comparing these three hash maps. The first is the finding success case and the second is the finding miss case.
Tested in linux, zen2

mask_split_bits_uint64_t,uint64_t ,avg_hit_find_without_rehash

mask_split_bits_uint64_t,uint64_t ,avg_miss_find_without_rehash

@ktprime
Copy link
Author

ktprime commented Mar 20, 2022

emhash focus optimization on find_hit/insertion/iteration/copy/clear performance.
the key&value and index(int) is in same entry(fast with small or very large datasets)
find missing is inefficient compared with other flat hash map because of large dataset and much more cache miss.

emhash6/emhash5 is faster on finding hit than emhash7.
emhash8 is quite faste if key/value is large(string, struct), and it's most fastest iteration than all other hash map.

@ktprime
Copy link
Author

ktprime commented Jun 18, 2022

@renzibei
Copy link
Owner

https://www.reddit.com/r/cpp/comments/ve3qbx/updating_map_benchmarks_send_your_hashmaps

Thank you for sharing this. Considering that fph-table uses a seed hash function, if we want to add the fph-table to that map_benchmarks, we need to either create a version of fph-table taking no-seed hash function or make some changes to the map_benchmarks to make it support the seed hash functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants