Use std::unordered_set instead of set in blockfilter interface #14074

Merged
merged 2 commits into from Nov 6, 2018

Conversation

@jimpo
Contributor

@jimpo jimpo commented Aug 26, 2018

Use std::unordered_set (hash set) instead of std::set (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

This moves CSipHasher to its own file crypto/siphash.h, so that it can be used in the libbitcoin_util library without including hash.{h,cpp}. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.
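For readers unfamiliar with the difference, here is a minimal sketch of swapping a tree set for a hash set over byte vectors. Everything named below (`Element`, `SeededByteVectorHash`, `CountDistinct`) is hypothetical and not taken from the PR. In particular, the hasher uses seeded FNV-1a purely as a stand-in so the example is self-contained; the PR's class uses a randomly keyed SipHash-2-4 via CSipHasher.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

using Element = std::vector<unsigned char>;

// Hypothetical stand-in for a byte vector hasher. ASSUMPTION: seeded FNV-1a
// replaces SipHash-2-4 here only to keep the sketch short; unlike SipHash it
// offers no protection against deliberately crafted collisions.
class SeededByteVectorHash final
{
private:
    std::uint64_t m_seed;

public:
    // Pick a per-instance seed so hash values differ between instances.
    SeededByteVectorHash() : m_seed(std::random_device{}()) {}

    std::size_t operator()(const Element& input) const
    {
        std::uint64_t h = 14695981039346656037ULL ^ m_seed; // FNV offset basis mixed with the seed
        for (unsigned char byte : input) {
            h ^= byte;
            h *= 1099511628211ULL; // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Tree set: keeps elements ordered, lookups cost O(log n) vector comparisons.
using OrderedElementSet = std::set<Element>;
// Hash set: no ordering, lookups are O(1) on average (hasher plus operator==).
using HashedElementSet = std::unordered_set<Element, SeededByteVectorHash>;

// Counts distinct elements; duplicates collapse just as they would in std::set.
inline std::size_t CountDistinct(const std::vector<Element>& elements)
{
    HashedElementSet unique(elements.begin(), elements.end());
    return unique.size();
}
```

Callers that relied on the sorted iteration order of `std::set` would need to sort explicitly after such a switch, since `std::unordered_set` iterates in an unspecified order.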

@DrahtBot
Contributor

@DrahtBot DrahtBot commented Aug 26, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14387 (Faster Input Deduplication Algorithm by JeremyRubin)
  • #14224 (Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift)
  • #14121 (Index for BIP 157 block filters by jimpo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/util.h Outdated
public:
ByteVectorHash();
size_t operator()(const std::vector<unsigned char>& input) const;
};
Member

@laanwj laanwj Aug 27, 2018

I don't think util.h is the best place for a generic utility data structure such as this; it's more for assorted operating system functions.

Contributor Author

@jimpo jimpo Aug 27, 2018

@laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?

Member

@laanwj laanwj Aug 28, 2018

or maybe support/bytevectorhash.h ?
unless there's a strong need to group this with other things

@@ -0,0 +1,47 @@
// Copyright (c) 2016-2018 The Bitcoin Core developers
Member

@laanwj laanwj Aug 27, 2018

ACK on moving siphash to a separate unit

@jimpo jimpo force-pushed the blockfilter-unordered-set branch from aa37070 to f460835 Aug 28, 2018
@laanwj
Member

@laanwj laanwj commented Aug 31, 2018

utACK f4608359e4e643c6e197df822e03226146e37a49
verified move-onlyness of extraction commit

v[2] = 0x6c7967656e657261ULL ^ k0;
v[3] = 0x7465646279746573ULL ^ k1;
count = 0;
tmp = 0;
Contributor

@practicalswift practicalswift Sep 10, 2018

Initialize count and tmp using default member initializers or in the member initializer list of the constructor?

Contributor Author

@jimpo jimpo Sep 10, 2018

This is a move-only commit. I'd rather not modify the contents of the code here.

Contributor

@practicalswift practicalswift Sep 10, 2018

Makes sense!
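
For reference, a rough sketch of what the default member initializer variant discussed in this thread could look like. `SipStateSketch` and its accessors are hypothetical and heavily trimmed; this is not the actual CSipHasher, and the member types are assumptions made for illustration. No such change was made in this PR, since the commit is move-only.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down state holder used only to illustrate the
// initialization style; it does not implement hashing.
class SipStateSketch
{
private:
    std::uint64_t v[4];
    std::uint64_t tmp{0};  // default member initializer replaces "tmp = 0;" in the constructor body
    std::uint8_t count{0}; // default member initializer replaces "count = 0;" in the constructor body

public:
    // The key-dependent state still has to be set in the constructor,
    // here through the member initializer list.
    SipStateSketch(std::uint64_t k0, std::uint64_t k1)
        : v{0x736f6d6570736575ULL ^ k0,
            0x646f72616e646f6dULL ^ k1,
            0x6c7967656e657261ULL ^ k0,
            0x7465646279746573ULL ^ k1}
    {
    }

    // Accessors exist only so the sketch can be inspected.
    std::uint64_t Tmp() const { return tmp; }
    std::uint8_t Count() const { return count; }
    std::uint64_t State(int i) const { return v[i]; }
};
```

With this style, any constructor added later picks up the zero values for `tmp` and `count` automatically.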

@laanwj
Member

@laanwj laanwj commented Sep 10, 2018

@ryanofsky can you take a look here please? according to @jimpo this was your suggestion

Member

@promag promag left a comment

@jimpo can you improve the PR description to explain the advantages of this change (besides linking @ryanofsky's suggestion)?

Do you have numbers to support this change?

Left some nits.

#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in
Member

@promag promag Oct 8, 2018

This may be used as the hash.. ?

*/
class ByteVectorHash
{
private:
Member

@promag promag Oct 8, 2018

Drop private.

Contributor Author

@jimpo jimpo Oct 9, 2018

I think it's clearer to leave it explicit even if the default for class is private.

@@ -15,6 +15,7 @@

#include <amount.h>
#include <coins.h>
#include <crypto/siphash.h>
Member

@promag promag Oct 8, 2018

Can be removed?

Contributor Author

@jimpo jimpo Oct 10, 2018

It's used in SaltedTxidHasher

src/util.cpp Outdated
@@ -6,6 +6,7 @@
#include <util.h>

#include <chainparamsbase.h>
#include <crypto/siphash.h>
Member

@promag promag Oct 8, 2018

Can be removed?

#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**
Member

@promag promag Oct 8, 2018

Missing includes <stdint.h> and <vector>.

* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
* instance of SipHash-2-4 is used.
*/
class ByteVectorHash
Member

@promag promag Oct 8, 2018

nit, final.

Contributor

@ryanofsky ryanofsky left a comment

utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps in more places in the future. I left a suggestion below, but also think this change looks good as it is.


/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
Contributor

@ryanofsky ryanofsky Oct 9, 2018

In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)

The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern is that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.

I'd suggest:

  1. Mentioning in this comment that operator() overloads for new types could be added here in the future.
  2. Changing the name of the class from ByteVectorHash to something like RandomizedSipHash to avoid tying it to std::vector.
  3. Maybe replacing std::vector with Span.
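
The overloading idea above can be sketched roughly as follows. All names here (`RandomizedByteHash`, `HashBytes`) are hypothetical, and seeded FNV-1a is only an assumed stand-in for the randomly keyed SipHash-2-4 that the real class uses. The point is just that one keyed hasher can serve several byte container types by funnelling each overload into a shared pointer-and-length helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical hasher with one overload per supported container type.
// ASSUMPTION: FNV-1a with a random seed stands in for keyed SipHash-2-4.
class RandomizedByteHash final
{
private:
    std::uint64_t m_seed;

    // Shared implementation: hashes a raw byte range.
    std::size_t HashBytes(const unsigned char* data, std::size_t len) const
    {
        std::uint64_t h = 14695981039346656037ULL ^ m_seed;
        for (std::size_t i = 0; i < len; ++i) {
            h ^= data[i];
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }

public:
    RandomizedByteHash() : m_seed(std::random_device{}()) {}

    // Overload for byte vectors.
    std::size_t operator()(const std::vector<unsigned char>& input) const
    {
        return HashBytes(input.data(), input.size());
    }

    // Overload for strings; new types only need another thin overload like this.
    std::size_t operator()(const std::string& input) const
    {
        return HashBytes(reinterpret_cast<const unsigned char*>(input.data()), input.size());
    }
};
```

The same class could then be passed as the hasher to `std::unordered_set<std::string, RandomizedByteHash>` or `std::unordered_set<std::vector<unsigned char>, RandomizedByteHash>` without copying or renaming it.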

Contributor Author

@jimpo jimpo Oct 10, 2018

I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.

@jimpo jimpo force-pushed the blockfilter-unordered-set branch 2 times, most recently from 7406129 to cfabca8 Oct 10, 2018
@etscrivner
Contributor

@etscrivner etscrivner commented Oct 10, 2018

utACK cfabca8749a132220e649d4563fb876afc21ceec

@sipa
Member

@sipa sipa commented Oct 11, 2018

Just one comment: I always imagined the things in 'support' as low-level features that are independently usable, and find it a bit strange to put something there that depends on random and crypto/siphash. Perhaps others have a different understanding, as I don't think it was ever written down anywhere what it was for, but I would just put it in src/.

@jimpo
Contributor Author

@jimpo jimpo commented Oct 11, 2018

@sipa I'd prefer to start creating more directory structure. How would you feel about a util/ directory instead and we can put it in there?

@sipa
Member

@sipa sipa commented Oct 12, 2018

@jimpo Sounds great, but perhaps something to do independently? I'd prefer not to start a new convention and then have it linger in a half-finished state if it's unclear how to move forward.

Contributor

@ryanofsky ryanofsky left a comment

utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.

laanwj added a commit that referenced this issue Nov 5, 2018
2068f08 scripted-diff: Move util files to separate directory. (Jim Posen)

Pull request description:

  As discussed [here](#14074 (comment)), this establishes a `util/` directory to introduce more organizational structure and have a clear place for new util files. It's really not scary to review, it's just one big scripted diff.

Tree-SHA512: 39cf15480d7d35e987b6088d52a857a2d5b1802e36c6b815eb42718d80cd95e669757af9bcc7c04426cd8523662cb1050b8da1e2377d3730672820ed298b894b
@jimpo jimpo force-pushed the blockfilter-unordered-set branch from cfabca8 to fef5adc Nov 5, 2018
@jimpo
Contributor Author

@jimpo jimpo commented Nov 5, 2018

@sipa I moved bytevectorhash to util/ now.

@laanwj
Member

@laanwj laanwj commented Nov 6, 2018

utACK fef5adc

@laanwj laanwj merged commit fef5adc into bitcoin:master Nov 6, 2018
2 checks passed
laanwj added a commit that referenced this issue Nov 6, 2018
…terface

fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen)
4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen)

Pull request description:

  Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

  This moves `CSipHasher` to its own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
@jimpo jimpo deleted the blockfilter-unordered-set branch Nov 6, 2018
kittywhiskers added a commit to kittywhiskers/dash that referenced this issue Jun 15, 2021
kittywhiskers added a commit to kittywhiskers/dash that referenced this issue Jun 16, 2021
kittywhiskers added a commit to kittywhiskers/dash that referenced this issue Jun 25, 2021
kittywhiskers added a commit to kittywhiskers/dash that referenced this issue Jun 26, 2021
kittywhiskers added a commit to kittywhiskers/dash that referenced this issue Jun 27, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
9 participants