New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: Optimizing siphash implementation #18014
base: master
Are you sure you want to change the base?
Conversation
(the Travis failure isn't related, it's a bug in the s390x machines) |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
That's a very nice speed improvement! |
Where do we use variable-length SipHash? |
I just tested it and the speed improvement is good. Also the formatting is much prettier now. |
This review is more related to the interface rather than the crypto as I don't have much experience to review that.
src/bench/bench.cpp
Outdated
@@ -20,7 +20,15 @@ const std::function<void(const std::string&)> G_TEST_LOG_FUN{}; | |||
|
|||
void benchmark::ConsolePrinter::header() | |||
{ | |||
std::cout << "# Benchmark, evals, iterations, total, min, max, median" << std::endl; | |||
std::cout << "#" | |||
<< std::left << std::left << std::setw(47) << "Benchmark" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played a bit around with it and I find it better to change this to 16
. This will be much more friendly on smaller terminals and/or display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that doesn't work with tests like CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT
.
see the last paragraph in my post
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be dynamically calculated then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is limited by up to 47 chars of benchmark name, so as long as we don't add more names like CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT and longer then it will be fine.
(it can probably be adjustable but that will require iterating over all the tests before running them to determine the longest cell and I thought the 47 limit is more than reasonable)
src/bench/bench.cpp
Outdated
std::cout << std::setprecision(6); | ||
std::cout << state.m_name << ", " << state.m_num_evals << ", " << state.m_num_iters << ", " << total << ", " << front << ", " << back << ", " << median << std::endl; | ||
std::cout << std::setprecision(6) | ||
<< std::left << std::setw(48) << state.m_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set this to 17
then. Maybe a const for this would be a good idea
src/bench/bench.cpp
Outdated
std::cout << state.m_name << ", " << state.m_num_evals << ", " << state.m_num_iters << ", " << total << ", " << front << ", " << back << ", " << median << std::endl; | ||
std::cout << std::setprecision(6) | ||
<< std::left << std::setw(48) << state.m_name | ||
<< std::left << std::setw(12) << state.m_num_evals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The const idea also applies to all occurrences of 12
src/bench/crypto_hash.cpp
Outdated
} | ||
|
||
|
||
static void SipHash(benchmark::State& state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm not that much into crypto but SipHash
and SipHash_32b look very similar. Maybe you could include them into one function or make one general function which is called by the both. The only difference between them is the vector constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SipHash_32b
is calling a different SipHash implementation that is somewhat optimized to 32 bytes (256bit)
A quick search shows:
EDIT: We could also replace this Write+Finalize with a single invocation and gain a few more percentages (by not storing and checking tail, and improving inlining) but that's kinda lose future usability. |
nit: You have a few whitespace irregularities - running diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
index 9eeb8da16..037260939 100644
--- a/src/bench/crypto_hash.cpp
+++ b/src/bench/crypto_hash.cpp
@@ -90,7 +90,7 @@ static void SipHash(benchmark::State& state)
{
uint64_t hash = 0;
uint64_t k2 = 0;
- std::vector<uint8_t> in(BUFFER_SIZE,0);
+ std::vector<uint8_t> in(BUFFER_SIZE, 0);
while (state.KeepRunning())
hash = CSipHasher(hash, ++k2).Write(in.data(), in.size()).Finalize();
}
diff --git a/src/crypto/siphash.cpp b/src/crypto/siphash.cpp
index 0b62de998..cfc04c194 100644
--- a/src/crypto/siphash.cpp
+++ b/src/crypto/siphash.cpp
@@ -2,8 +2,8 @@
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
-#include <crypto/siphash.h>
#include <crypto/common.h>
+#include <crypto/siphash.h>
#include <algorithm>
@@ -50,7 +50,8 @@ CSipHasher& CSipHasher::Write(uint64_t data)
/// Load a uint64_t from 0 to 7 bytes.
-inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len) {
+inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len)
+{
assert(len < 8);
uint64_t out = 0;
for (size_t i = 0; i < len; ++i) {
@@ -85,7 +86,7 @@ CSipHasher& CSipHasher::Write(const unsigned char* data, size_t size)
auto i = needed;
while (i < len - left) {
- uint64_t mi = ReadLE64(data+i);
+ uint64_t mi = ReadLE64(data + i);
v3 ^= mi;
SIPROUND;
SIPROUND;
diff --git a/src/crypto/siphash.h b/src/crypto/siphash.h
index 698f9c310..ddf192e8a 100644
--- a/src/crypto/siphash.h
+++ b/src/crypto/siphash.h
@@ -14,7 +14,7 @@ class CSipHasher
{
private:
uint64_t v[4];
- size_t count; // total amount of bytes inputted.
+ size_t count; // total amount of bytes inputted.
uint64_t tail; // bytes that weren't processed yet.
public: |
Github-Pull: bitcoin#18014 Rebased-From: 81e0f14
Concept ACK
before change of formatting
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.28913, 3.10429e-08, 3.27799e-08, 3.11495e-08
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.33962, 3.10606e-08, 3.30034e-08, 3.12351e-08
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.55364, 3.12087e-08, 3.75397e-08, 3.16322e-08
after change of formatting and added benchmarks
((HEAD detached at 52aa1f380b))$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 7.58726 0.00214442 0.00219727 0.00217387
SipHash_32b 5 40000000 6.32856 3.11782e-08 3.25053e-08 3.1465e-08
SipHash_3b 5 40000000 4.3802 2.16468e-08 2.21054e-08 2.19273e-08
#Benchmark evals iterations total min max median
SipHash 5 700 7.82075 0.0021651 0.0024746 0.00218246
SipHash_32b 5 40000000 6.39722 3.12934e-08 3.3359e-08 3.16498e-08
SipHash_3b 5 40000000 4.6185 2.18849e-08 2.52748e-08 2.30488e-08
#Benchmark evals iterations total min max median
SipHash 5 700 8.12191 0.00215675 0.00248114 0.00232176
SipHash_32b 5 40000000 6.33754 3.12737e-08 3.24442e-08 3.16561e-08
SipHash_3b 5 40000000 4.3563 2.13873e-08 2.21485e-08 2.17682e-08
after change of algorithm implementation
(pr/18014)$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 2.0267 0.000571765 0.000590692 0.000579299
SipHash_32b 5 40000000 6.27794 3.10675e-08 3.209e-08 3.13121e-08
SipHash_3b 5 40000000 4.07212 2.01261e-08 2.06954e-08 2.02564e-08
#Benchmark evals iterations total min max median
SipHash 5 700 2.13875 0.000582848 0.000627614 0.000614234
SipHash_32b 5 40000000 6.27427 3.10577e-08 3.16413e-08 3.13393e-08
SipHash_3b 5 40000000 4.02383 1.9983e-08 2.05697e-08 2.00265e-08
#Benchmark evals iterations total min max median
SipHash 5 700 2.05042 0.000577004 0.00060258 0.000584264
SipHash_32b 5 40000000 6.32065 3.10846e-08 3.25604e-08 3.14578e-08
SipHash_3b 5 40000000 4.03686 1.96887e-08 2.08056e-08 2.02039e-08
Will look at the algorithm.
uint64_t tmp; | ||
int count; | ||
size_t count; // total amount of bytes inputted. | ||
uint64_t tail; // bytes that weren't processed yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps sort these entries L::16-18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you want to sort them? This order seems to make sense to me?
Sorry for the delay @elichai -- I still plan to review this. |
Updated benchmarks with the new benchmarking library: Before:
After:
|
Github-Pull: bitcoin#18014 Rebased-From: 81e0f14
Needs rebase. |
With the benchmarks adjusted to bytes
After:
|
utACK 19e28a4 |
this PR seems quite dead, but if it gets revived, please do the following
assert(len < 8); | ||
uint64_t out = 0; | ||
for (size_t i = 0; i < len; ++i) { | ||
out |= (uint64_t)data[i] << (i * 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please convert this to a c++11 functional cast
@@ -77,12 +107,14 @@ uint64_t CSipHasher::Finalize() const | |||
{ | |||
uint64_t v0 = v[0], v1 = v[1], v2 = v[2], v3 = v[3]; | |||
|
|||
uint64_t t = tmp | (((uint64_t)count) << 56); | |||
|
|||
uint64_t t = tail | (((uint64_t)count) << 56); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use a c++11 functional-cast
I wouldn't say it is dead. It compiles and (unit) tests fine on itself and on current master. All it needs is review. |
My comment on aliveness was just that it's been about 11 months since it got a review, not to say that there is anything problematic with the pr. Maybe since I commented some others will be notified and will do a review :)
For what it's worth, I don't see any major issues with the PR, although I did add a few more comments of things that likely should be changed
size_t len = size - needed; | ||
auto left = len & 0x07; | ||
|
||
auto i = needed; | ||
while (i < len - left) { | ||
uint64_t mi = ReadLE64(data + i); | ||
v3 ^= mi; | ||
SIPROUND; | ||
SIPROUND; | ||
v0 ^= mi; | ||
i += 8; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be a for loop
uint64_t tmp; | ||
int count; | ||
size_t count; // total amount of bytes inputted. | ||
uint64_t tail; // bytes that weren't processed yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you want to sort them? This order seems to make sense to me?
Hi,
This builds on #18013
Before anything I want to point out that we have 3 SipHash implementations
CSipHasher
,SipHashUint256
,SipHashUint256Extra
. this PR touches only the first one(not used in any hashmap AFAIK).I re-implemented the
CSipHasher
with performance up to 3X times faster for big strings (BUFFER_SIZE = 1000*1000
) and 5%-19% faster for small strings (3 bytes, because a minute of syncing showed me that 3 bytes siphash is something that happens quite often)Benchmarks against other siphash implementations can be found here: https://gist.github.com/elichai/abdebeeaee7e581bc74c75cb9487b3af (code: https://github.com/elichai/siphash-bench)
My implementation was inspired by the one in Rust's stdlib (https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs) which rust-bitcoin use in https://github.com/rust-bitcoin/bitcoin_hashes.
Before:
After:
Also made the benchmarks print a more readable output(https://gist.github.com/elichai/812c8866a69959404b480d968e080475),this is limited by up to 47 chars of benchmark name, so as long as we don't add more names like
CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT
and longer then it will be fine.(it can probably be adjustable but that will require iterating over all the tests before running them to determine the longest cell and I thought the 47 limit is more than reasonable)