lib: Optimizing siphash implementation #18014

Open
wants to merge 2 commits into
base: master

elichai
Contributor

@elichai elichai commented Jan 28, 2020

Hi,
This builds on #18013

Before anything else, I want to point out that we have three SipHash implementations: CSipHasher, SipHashUint256, and SipHashUint256Extra. This PR touches only the first one (not used in any hashmap, AFAIK).

I re-implemented CSipHasher. It is now up to 3x faster for big strings (BUFFER_SIZE = 1000*1000) and 5%-19% faster for small strings (3 bytes; a minute of syncing showed me that 3-byte SipHash calls happen quite often).

Benchmarks against other siphash implementations can be found here: https://gist.github.com/elichai/abdebeeaee7e581bc74c75cb9487b3af (code: https://github.com/elichai/siphash-bench)

My implementation was inspired by the one in Rust's stdlib (https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs), which rust-bitcoin uses in https://github.com/rust-bitcoin/bitcoin_hashes.

Before:

$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         4.20809     0.0011912   0.00122256  0.00120163  
SipHash_32b                                     5           40000000    4.1793      2.08632e-08 2.0948e-08  2.08949e-08 
SipHash_3b                                      5           40000000    3.18892     1.56861e-08 1.64617e-08 1.5749e-08  
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         4.24318     0.00120808  0.00121676  0.00121336  
SipHash_32b                                     5           40000000    4.23684     2.06753e-08 2.16015e-08 2.14555e-08 
SipHash_3b                                      5           40000000    3.15998     1.54582e-08 1.61558e-08 1.58555e-08 
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         4.2472      0.0012113   0.00121558  0.00121324  
SipHash_32b                                     5           40000000    4.20925     2.09789e-08 2.11288e-08 2.10327e-08 
SipHash_3b                                      5           40000000    3.10727     1.54352e-08 1.55982e-08 1.55463e-08 
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         4.37224     0.00124528  0.00125769  0.0012473   
SipHash_32b                                     5           40000000    4.26011     2.1214e-08  2.134e-08   2.13171e-08 
SipHash_3b                                      5           40000000    3.18842     1.59033e-08 1.59832e-08 1.59432e-08 

After:

$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         1.36254     0.000386656 0.000392219 0.000388635 
SipHash_32b                                     5           40000000    4.31286     2.13773e-08 2.17857e-08 2.16181e-08 
SipHash_3b                                      5           40000000    2.91375     1.44794e-08 1.46495e-08 1.45848e-08 
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         1.32683     0.000372232 0.000386258 0.000376842 
SipHash_32b                                     5           40000000    4.15533     2.069e-08   2.08661e-08 2.07693e-08 
SipHash_3b                                      5           40000000    2.77612     1.38154e-08 1.3988e-08  1.38665e-08 
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         1.36596     0.00038727  0.000392932 0.000391074 
SipHash_32b                                     5           40000000    4.27694     2.13219e-08 2.14471e-08 2.13672e-08 
SipHash_3b                                      5           40000000    2.75763     1.37529e-08 1.38244e-08 1.37862e-08 
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         1.34316     0.000376846 0.000386059 0.000385079 
SipHash_32b                                     5           40000000    4.23368     2.1066e-08  2.14124e-08 2.11283e-08 
SipHash_3b                                      5           40000000    2.81931     1.40299e-08 1.42123e-08 1.40787e-08 

I also made the benchmarks print more readable output (https://gist.github.com/elichai/812c8866a69959404b480d968e080475).
The benchmark name column is limited to 47 chars, so as long as we don't add names longer than CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT it will be fine.
(The width could probably be made adjustable, but that would require iterating over all the benchmarks before running them to determine the longest cell, and I thought the 47 limit was more than reasonable.)
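
For reference, the adjustable-width alternative mentioned above could look roughly like the sketch below. It is only an illustration: `NameColumnWidth` and `FormatRow` are hypothetical helpers, not functions from `bench.cpp`, and the sketch assumes all benchmark names are known before the first row is printed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: width of the name column is the longest name
// (or the "#Benchmark" header, if longer) plus one space of padding.
size_t NameColumnWidth(const std::vector<std::string>& names)
{
    size_t width = std::string("#Benchmark").size();
    for (const std::string& name : names) {
        width = std::max(width, name.size());
    }
    return width + 1;
}

// Hypothetical helper: left-align the name in a column of name_width
// characters, followed by the evals count in a 12-character column.
std::string FormatRow(const std::string& name, size_t name_width, uint64_t evals)
{
    assert(name.size() < name_width); // the name must fit with padding
    std::ostringstream out;
    out << std::left << std::setw(static_cast<int>(name_width)) << name
        << std::left << std::setw(12) << evals;
    return out.str();
}
```

With the 47-character name CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT in the list, `NameColumnWidth` returns 48, which matches the fixed width used for the data rows in this PR.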

@elichai elichai changed the title Optimizing siphash implementation lib: Optimizing siphash implementation Jan 28, 2020
@elichai
Contributor Author

elichai commented Jan 28, 2020

(the Travis failure isn't related, it's a bug in the s390x machines)

@MarcoFalke
Member

MarcoFalke commented Jan 29, 2020

@elichai The issue is upstream, see #18016

@DrahtBot
Contributor

DrahtBot commented Jan 29, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@laanwj
Member

laanwj commented Jan 29, 2020

That's a very nice speed improvement!

@sipa
Member

sipa commented Jan 29, 2020

Where do we use variable-length SipHash?

@emilengler
Contributor

emilengler commented Jan 29, 2020

I just tested it and the speed improvement is good. Also the formatting is much prettier now.
Will review the code now

Contributor

@emilengler emilengler left a comment

This review is about the interface rather than the crypto, as I don't have much experience reviewing that.

@@ -20,7 +20,15 @@ const std::function<void(const std::string&)> G_TEST_LOG_FUN{};

 void benchmark::ConsolePrinter::header()
 {
-    std::cout << "# Benchmark, evals, iterations, total, min, max, median" << std::endl;
+    std::cout << "#"
+              << std::left << std::left << std::setw(47) << "Benchmark"
Contributor

@emilengler emilengler Jan 29, 2020

I played around with it a bit and I think it would be better to change this to 16. That would be much friendlier on smaller terminals and/or displays.

Contributor Author

@elichai elichai Jan 29, 2020

Well, that doesn't work with benchmarks like CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT.
See the last paragraph in my post.

Contributor

@emilengler emilengler Jan 30, 2020

Couldn't this be dynamically calculated then?

Contributor Author

@elichai elichai Jan 30, 2020

> this is limited by up to 47 chars of benchmark name, so as long as we don't add more names like CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT and longer then it will be fine.
> (it can probably be adjustable but that will require iterating over all the tests before running them to determine the longest cell and I thought the 47 limit is more than reasonable)

-    std::cout << std::setprecision(6);
-    std::cout << state.m_name << ", " << state.m_num_evals << ", " << state.m_num_iters << ", " << total << ", " << front << ", " << back << ", " << median << std::endl;
+    std::cout << std::setprecision(6)
+              << std::left << std::setw(48) << state.m_name
Contributor

@emilengler emilengler Jan 29, 2020

Set this to 17 then. Maybe a const for this would be a good idea

-    std::cout << state.m_name << ", " << state.m_num_evals << ", " << state.m_num_iters << ", " << total << ", " << front << ", " << back << ", " << median << std::endl;
+    std::cout << std::setprecision(6)
+              << std::left << std::setw(48) << state.m_name
+              << std::left << std::setw(12) << state.m_num_evals
Contributor

@emilengler emilengler Jan 29, 2020

The const idea also applies to all occurrences of 12

}


static void SipHash(benchmark::State& state)
Contributor

@emilengler emilengler Jan 29, 2020

nit: I'm not that much into crypto, but SipHash and SipHash_32b look very similar. Maybe you could merge them into one function, or make one general function that both call. The only difference between them is the vector constructor.

Contributor Author

@elichai elichai Jan 29, 2020

SipHash_32b calls a different SipHash implementation that is somewhat optimized for 32-byte (256-bit) inputs.

@elichai
Contributor Author

elichai commented Jan 29, 2020

> Where do we use variable-length SipHash?

A quick search shows:

  1. GCSFilter::HashToRange, for BIP158 compact block filters (blockfilter.h).
  2. RelayAddress, I think as a PRNG (net_processing.h).
  3. ByteVectorHash, used as the hash for std::unordered_set<Element, ByteVectorHash> ElementSet in blockfilter.h.
  4. CConnman::GetDeterministicRandomizer, again for some kind of randomizer (net.cpp).

EDIT: We could also replace this Write+Finalize with a single invocation and gain a few more percent (by not storing and checking the tail, and by improving inlining), but that would cost some future usability.
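
To make the single-invocation idea concrete, here is a minimal sketch of a one-shot SipHash-2-4 that reads whole 8-byte words and never stores a tail between calls. It is not the code in this PR: `SipHash24`, `SipRound`, `Rotl` and `LoadLE` are names made up for the illustration, and the only thing it is checked against is the published SipHash-2-4 test vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

inline uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

// One SipRound over the four state words.
inline void SipRound(uint64_t& v0, uint64_t& v1, uint64_t& v2, uint64_t& v3)
{
    v0 += v1; v1 = Rotl(v1, 13); v1 ^= v0; v0 = Rotl(v0, 32);
    v2 += v3; v3 = Rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = Rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = Rotl(v1, 17); v1 ^= v2; v2 = Rotl(v2, 32);
}

// Load len (0 to 8) bytes as a little-endian integer, independent of host endianness.
inline uint64_t LoadLE(const unsigned char* data, size_t len)
{
    assert(len <= 8);
    uint64_t out = 0;
    for (size_t i = 0; i < len; ++i) out |= uint64_t{data[i]} << (i * 8);
    return out;
}

// Hypothetical one-shot SipHash-2-4: the whole message is given up front,
// so there is no tail/count state to keep between calls.
uint64_t SipHash24(uint64_t k0, uint64_t k1, const unsigned char* data, size_t size)
{
    uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    uint64_t v3 = 0x7465646279746573ULL ^ k1;

    // Absorb all full 8-byte words, two rounds per word.
    const size_t full = size - (size & 0x07);
    for (size_t i = 0; i < full; i += 8) {
        const uint64_t m = LoadLE(data + i, 8);
        v3 ^= m;
        SipRound(v0, v1, v2, v3);
        SipRound(v0, v1, v2, v3);
        v0 ^= m;
    }

    // Last word: the 0 to 7 leftover bytes plus the length (mod 256) in the top byte.
    const uint64_t last = LoadLE(data + full, size & 0x07) | (static_cast<uint64_t>(size) << 56);
    v3 ^= last;
    SipRound(v0, v1, v2, v3);
    SipRound(v0, v1, v2, v3);
    v0 ^= last;

    // Finalization: four rounds.
    v2 ^= 0xff;
    for (int i = 0; i < 4; ++i) SipRound(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}
```

The trade-off is the one described above: a function like this cannot be fed data incrementally, which is what the Write+Finalize interface allows.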

@Empact
Member

Empact commented Jan 30, 2020

nit: You have a few whitespace irregularities - running git-clang-format generates this diff

diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
index 9eeb8da16..037260939 100644
--- a/src/bench/crypto_hash.cpp
+++ b/src/bench/crypto_hash.cpp
@@ -90,7 +90,7 @@ static void SipHash(benchmark::State& state)
 {
     uint64_t hash = 0;
     uint64_t k2 = 0;
-    std::vector<uint8_t> in(BUFFER_SIZE,0);
+    std::vector<uint8_t> in(BUFFER_SIZE, 0);
     while (state.KeepRunning())
         hash = CSipHasher(hash, ++k2).Write(in.data(), in.size()).Finalize();
 }
diff --git a/src/crypto/siphash.cpp b/src/crypto/siphash.cpp
index 0b62de998..cfc04c194 100644
--- a/src/crypto/siphash.cpp
+++ b/src/crypto/siphash.cpp
@@ -2,8 +2,8 @@
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
-#include <crypto/siphash.h>
 #include <crypto/common.h>
+#include <crypto/siphash.h>
 
 #include <algorithm>
 
@@ -50,7 +50,8 @@ CSipHasher& CSipHasher::Write(uint64_t data)
 
 
 /// Load a uint64_t from 0 to 7 bytes.
-inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len) {
+inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len)
+{
     assert(len < 8);
     uint64_t out = 0;
     for (size_t i = 0; i < len; ++i) {
@@ -85,7 +86,7 @@ CSipHasher& CSipHasher::Write(const unsigned char* data, size_t size)
 
     auto i = needed;
     while (i < len - left) {
-        uint64_t mi = ReadLE64(data+i);
+        uint64_t mi = ReadLE64(data + i);
         v3 ^= mi;
         SIPROUND;
         SIPROUND;
diff --git a/src/crypto/siphash.h b/src/crypto/siphash.h
index 698f9c310..ddf192e8a 100644
--- a/src/crypto/siphash.h
+++ b/src/crypto/siphash.h
@@ -14,7 +14,7 @@ class CSipHasher
 {
 private:
     uint64_t v[4];
-    size_t count; // total amount of bytes inputted.
+    size_t count;  // total amount of bytes inputted.
     uint64_t tail; // bytes that weren't processed yet.
 
 public:

@elichai elichai requested a review from sipa Jan 30, 2020

@m2709 m2709 left a comment

.

@bitcoin bitcoin deleted a comment from m2709 Jan 31, 2020
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this issue Feb 9, 2020
@elichai
Contributor Author

elichai commented Feb 23, 2020

I decided to drop commit 2b32471 because it's more controversial than I thought (there might be software that relies on the current formatting and there's #18011 coming up)

Member

@jonatack jonatack left a comment

Concept ACK

before change of formatting

$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"

# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.28913, 3.10429e-08, 3.27799e-08, 3.11495e-08

# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.33962, 3.10606e-08, 3.30034e-08, 3.12351e-08

# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.55364, 3.12087e-08, 3.75397e-08, 3.16322e-08

after change of formatting and added benchmarks

((HEAD detached at 52aa1f380b))$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         7.58726     0.00214442  0.00219727  0.00217387  
SipHash_32b                                     5           40000000    6.32856     3.11782e-08 3.25053e-08 3.1465e-08  
SipHash_3b                                      5           40000000    4.3802      2.16468e-08 2.21054e-08 2.19273e-08 

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         7.82075     0.0021651   0.0024746   0.00218246  
SipHash_32b                                     5           40000000    6.39722     3.12934e-08 3.3359e-08  3.16498e-08 
SipHash_3b                                      5           40000000    4.6185      2.18849e-08 2.52748e-08 2.30488e-08 

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         8.12191     0.00215675  0.00248114  0.00232176  
SipHash_32b                                     5           40000000    6.33754     3.12737e-08 3.24442e-08 3.16561e-08 
SipHash_3b                                      5           40000000    4.3563      2.13873e-08 2.21485e-08 2.17682e-08

after change of algorithm implementation

(pr/18014)$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         2.0267      0.000571765 0.000590692 0.000579299 
SipHash_32b                                     5           40000000    6.27794     3.10675e-08 3.209e-08   3.13121e-08 
SipHash_3b                                      5           40000000    4.07212     2.01261e-08 2.06954e-08 2.02564e-08 

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         2.13875     0.000582848 0.000627614 0.000614234 
SipHash_32b                                     5           40000000    6.27427     3.10577e-08 3.16413e-08 3.13393e-08 
SipHash_3b                                      5           40000000    4.02383     1.9983e-08  2.05697e-08 2.00265e-08 

#Benchmark                                      evals       iterations  total       min         max         median      
SipHash                                         5           700         2.05042     0.000577004 0.00060258  0.000584264 
SipHash_32b                                     5           40000000    6.32065     3.10846e-08 3.25604e-08 3.14578e-08 
SipHash_3b                                      5           40000000    4.03686     1.96887e-08 2.08056e-08 2.02039e-08 

Will look at the algorithm.

-    uint64_t tmp;
-    int count;
+    size_t count; // total amount of bytes inputted.
+    uint64_t tail; // bytes that weren't processed yet.
Member

@jonatack jonatack Feb 23, 2020

perhaps sort these entries L::16-18

Contributor

@PastaPastaPasta PastaPastaPasta Feb 1, 2022

Why would you want to sort them? This order seems to make sense to me?

@jonatack
Member

jonatack commented Feb 23, 2020

> I decided to drop commit 2b32471 because it's more controversial than I thought (there might be software that relies on the current formatting and there's #18011 coming up)

Right, will be easier to review as more focused; it was 2 PRs in one before.

@jonatack
Member

jonatack commented Jul 6, 2020

Sorry for the delay @elichai -- I still plan to review this.

@elichai
Contributor Author

elichai commented Aug 2, 2020

Updated benchmarks with the new benchmarking library:

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        2,982,785.00 |              335.26 |    0.2% |   23,000,202.00 |    7,146,700.00 |  3.218 |   2,125,014.00 |    0.0% |      0.03 | `SipHash`
|               39.53 |       25,296,689.57 |    0.1% |          237.03 |           94.79 |  2.501 |           3.00 |    0.1% |      0.00 | `SipHash_32b`
|               27.87 |       35,879,242.25 |    0.1% |          239.02 |           66.89 |  3.573 |          17.00 |    0.0% |      0.00 | `SipHash_3b`

After:

$ sudo pyperf system tune
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          676,413.00 |            1,478.39 |    0.1% |    4,250,201.00 |    1,623,300.00 |  2.618 |     125,015.00 |    0.0% |      0.01 | `SipHash`
|               39.49 |       25,324,719.95 |    0.1% |          237.03 |           94.73 |  2.502 |           3.00 |    0.1% |      0.00 | `SipHash_32b`
|               23.75 |       42,108,786.61 |    0.2% |          193.02 |           57.00 |  3.387 |          16.00 |    0.0% |      0.00 | `SipHash_3b`

@jonatack
Member

jonatack commented Jan 15, 2021

Needs rebase.

@elichai
Contributor Author

elichai commented Mar 19, 2021

With the benchmarks adjusted to bytes
Before:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.79 |      559,246,583.00 |    0.2% |           17.00 |            4.29 |  3.967 |           2.13 |    0.0% |      0.02 | `SipHash`
|                1.24 |      806,638,658.38 |    0.1% |            7.44 |            2.98 |  2.497 |           0.09 |    0.2% |      0.00 | `SipHash_32b`
|               12.02 |       83,226,929.77 |    0.3% |           74.01 |           28.84 |  2.566 |           5.67 |    0.0% |      0.00 | `SipHash_3b`

After:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                0.69 |    1,452,369,540.91 |    0.4% |            4.25 |            1.63 |  2.610 |           0.13 |    0.0% |      0.01 | `SipHash`
|                1.24 |      804,669,432.11 |    0.1% |            7.44 |            2.98 |  2.493 |           0.09 |    0.1% |      0.00 | `SipHash_32b`
|                9.93 |      100,677,637.95 |    0.1% |           65.67 |           23.86 |  2.752 |           5.33 |    0.0% |      0.00 | `SipHash_3b`
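
As a quick cross-check of the ns/byte columns above, the speedups can be computed directly. `Speedup` is a throwaway helper written for this note, and the inputs are just the ns/byte values copied from the two tables.

```cpp
#include <cassert>

// How many times faster "after" is than "before" (both in ns/byte).
inline double Speedup(double before_ns, double after_ns)
{
    assert(after_ns > 0.0);
    return before_ns / after_ns;
}

// Values taken from the tables above:
//   SipHash      1.79 / 0.69  -> roughly 2.6x
//   SipHash_32b  1.24 / 1.24  -> unchanged (different implementation, not touched)
//   SipHash_3b  12.02 / 9.93  -> roughly 1.21x
```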

@sipa
Member

sipa commented Mar 20, 2021

utACK 19e28a4

Contributor

@PastaPastaPasta PastaPastaPasta left a comment

this PR seems quite dead, but if it gets revived, please do the following

assert(len < 8);
uint64_t out = 0;
for (size_t i = 0; i < len; ++i) {
out |= (uint64_t)data[i] << (i * 8);
Contributor

@PastaPastaPasta PastaPastaPasta Feb 1, 2022

please convert this to a c++11 functional cast
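
A sketch of what that could look like, assuming the rest of the function is as in the diff quoted earlier in this thread (the `return out;` line is not shown in the snippet above and is assumed here). `uint64_t{data[i]}` is the brace form; `uint64_t(data[i])` would be the parenthesized functional cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/// Load a uint64_t from 0 to 7 bytes (little-endian).
inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len)
{
    assert(len < 8);
    uint64_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        // Brace-style conversion instead of the C-style (uint64_t)data[i].
        out |= uint64_t{data[i]} << (i * 8);
    }
    return out; // assumed: the quoted snippet ends before the return
}
```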

@@ -77,12 +107,14 @@ uint64_t CSipHasher::Finalize() const
 {
     uint64_t v0 = v[0], v1 = v[1], v2 = v[2], v3 = v[3];
 
-    uint64_t t = tmp | (((uint64_t)count) << 56);
+    uint64_t t = tail | (((uint64_t)count) << 56);
Contributor

@PastaPastaPasta PastaPastaPasta Feb 1, 2022

please use a c++11 functional-cast

@MarcoFalke
Member

MarcoFalke commented Feb 1, 2022

> this PR seems quite dead

I wouldn't say it is dead. It compiles and passes the (unit) tests fine, both on its own and on current master. All it needs is review.

Contributor

@PastaPastaPasta PastaPastaPasta left a comment

My comment on aliveness was just that it's been about 11 months since it got a review, not to say that there is anything problematic with the pr. Maybe since I commented some others will be notified and will do a review :)

For what it's worth, I don't see any major issues with the PR, although I did add a few more comments on things that likely should be changed.

size_t len = size - needed;
auto left = len & 0x07;

auto i = needed;
while (i < len - left) {
uint64_t mi = ReadLE64(data + i);
v3 ^= mi;
SIPROUND;
SIPROUND;
v0 ^= mi;
i += 8;
}

Contributor

@PastaPastaPasta PastaPastaPasta Feb 1, 2022

This should really be a for loop
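
Roughly the shape I have in mind, as a standalone sketch rather than a patch against this PR. `ForEachWordLE` and `LoadLE64` are hypothetical names, and the `absorb` callback stands in for the `v3 ^= mi; SIPROUND; SIPROUND; v0 ^= mi;` body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable little-endian load of exactly 8 bytes.
inline uint64_t LoadLE64(const unsigned char* p)
{
    uint64_t out = 0;
    for (size_t i = 0; i < 8; ++i) out |= uint64_t{p[i]} << (i * 8);
    return out;
}

// Calls absorb(word) for every full 8-byte word of data and returns the
// offset of the first leftover (tail) byte.
template <typename F>
size_t ForEachWordLE(const unsigned char* data, size_t size, F absorb)
{
    assert(data != nullptr || size == 0);
    const size_t end = size - (size & 0x07); // largest multiple of 8 <= size
    for (size_t i = 0; i < end; i += 8) {
        absorb(LoadLE64(data + i));
    }
    return end;
}
```

The loop variable, bound and step are all in the `for` header, so the `i += 8` cannot be separated from the condition by later edits to the body.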

10 participants