-
Notifications
You must be signed in to change notification settings - Fork 35.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use std::unordered_set instead of set in blockfilter interface #14074
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
src/util.h
Outdated
public: | ||
ByteVectorHash(); | ||
size_t operator()(const std::vector<unsigned char>& input) const; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think util.h
is the best place for a generic utility data structure such as this, it's more for assorted operating system functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laanwj What would be the right place? A new file utiltypes.{h,cpp}
? Or util/types.{h,cpp}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe support/bytevectorhash.h
?
unless there's a strong need to group this with other things
@@ -0,0 +1,47 @@ | |||
// Copyright (c) 2016-2018 The Bitcoin Core developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK on moving siphash to a separate unit
aa37070
to
f460835
Compare
utACK f4608359e4e643c6e197df822e03226146e37a49 |
v[2] = 0x6c7967656e657261ULL ^ k0; | ||
v[3] = 0x7465646279746573ULL ^ k1; | ||
count = 0; | ||
tmp = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize count
and tmp
using default member initializers or in the member initializer list of the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a move-only commit. I'd rather not modify the contents of the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
@ryanofsky can you take a look here please? according to @jimpo this was your suggestion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimpo can you improve PR description of the advantages of this change (beside linking @ryanofsky suggestion)?
Do you have numbers to support this change?
Left some nits.
src/support/bytevectorhash.h
Outdated
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H | ||
|
||
/** | ||
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be used as the hash.. ?
src/support/bytevectorhash.h
Outdated
*/ | ||
class ByteVectorHash | ||
{ | ||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop private
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more clear to leave it explicit even if the default for class
is private.
@@ -15,6 +15,7 @@ | |||
|
|||
#include <amount.h> | |||
#include <coins.h> | |||
#include <crypto/siphash.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in SaltedTxidHasher
src/util.cpp
Outdated
@@ -6,6 +6,7 @@ | |||
#include <util.h> | |||
|
|||
#include <chainparamsbase.h> | |||
#include <crypto/siphash.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed?
src/support/bytevectorhash.h
Outdated
#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H | ||
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing includes <stdint.h>
and <vector>
.
src/support/bytevectorhash.h
Outdated
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random | ||
* instance of SipHash-2-4 is used. | ||
*/ | ||
class ByteVectorHash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps more places in the future. I left a suggestion below, but also think this change looks good as it is.
src/support/bytevectorhash.h
Outdated
|
||
/** | ||
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in | ||
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)
The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>
, when actually only the call operator is. I guess my concern that if someone wants to reuse this hash function on a similar type like Span<char>
, prevector
, or std::string
, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span
conversion.
I'd suggest:
- Mentioning in this comment that
operator()
overloads for new types could be added here in the future. - Changing the name of the class from
ByteVectorHash
to something likeRandomizedSipHash
to avoid tying it tostd::vector
. - Maybe replacing
std::vector
withSpan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.
7406129
to
cfabca8
Compare
utACK cfabca8749a132220e649d4563fb876afc21ceec |
Just one not comment: I always imagined the things in 'support' as low level features that are independently usable, and find it a bit strange to put something there that depends on |
@sipa I'd prefer to start creating more directory structure. How would you feel about a |
@jimpo Sounds great, but perhaps something to do independently? I'd prefer to not start a new convention, and then have it linger in a half finished state if there's unclarity about how to move forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final
specifier, and tweaking includes.
2068f08 scripted-diff: Move util files to separate directory. (Jim Posen) Pull request description: As discussed [here](#14074 (comment)), this establishes a `util/` directory to introduce more organizational structure and have a clear place for new util files. It's really not scary to review, it's just one big scripted diff. Tree-SHA512: 39cf15480d7d35e987b6088d52a857a2d5b1802e36c6b815eb42718d80cd95e669757af9bcc7c04426cd8523662cb1050b8da1e2377d3730672820ed298b894b
This is a move-only commit with the exception of changes to includes.
cfabca8
to
fef5adc
Compare
@sipa I moved bytevectorhash to util/ now. |
utACK fef5adc |
…terface fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen) 4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen) Pull request description: Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured. This moves `CSipHasher` to it's own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is. Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
partial bitcoin#15638, bitcoin#21966, bitcoin#16889, merge bitcoin#14555, bitcoin#20499, bitcoin#14074, bitcoin#17073: util refactoring
Use
std::unordered_set
(hash set) instead ofstd::set
(tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.This moves
CSipHasher
to it's own filecrypto/siphash.h
, so that it can be used in the libbitcoin_util library without includinghash.{h,cpp}
. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.