
Use std::unordered_set instead of set in blockfilter interface #14074

Merged
merged 2 commits into from Nov 6, 2018

@jimpo
Contributor

commented Aug 26, 2018

Use std::unordered_set (hash set) instead of std::set (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

This moves CSipHasher to its own file crypto/siphash.h, so that it can be used in the libbitcoin_util library without including hash.{h,cpp}. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.
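
For readers unfamiliar with the difference, the swap can be sketched roughly as follows. This is only an illustration under stated assumptions: `ExampleByteHash`, `TreeElementSet`, `HashElementSet`, `Contains`, and `ExampleUsage` are hypothetical names, and the FNV-1a hash is a simple stand-in, not the SipHash-based hasher this PR adds. What it shows is that `std::unordered_set` over `std::vector<unsigned char>` needs an explicit hash functor, since the standard library has no `std::hash` specialization for that type, whereas `std::set` works with the built-in `operator<`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_set>
#include <vector>

using Element = std::vector<unsigned char>;

// Hypothetical stand-in hasher (64-bit FNV-1a). NOT the hash used in the PR,
// which wraps a randomly keyed SipHash-2-4.
struct ExampleByteHash {
    std::size_t operator()(const Element& input) const
    {
        std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
        for (unsigned char byte : input) {
            h ^= byte;
            h *= 1099511628211ULL; // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Tree set: ordered, O(log n) lookups, needs only operator< on Element.
using TreeElementSet = std::set<Element>;

// Hash set: unordered, average O(1) lookups, needs an explicit hash functor.
using HashElementSet = std::unordered_set<Element, ExampleByteHash>;

// Membership test that works unchanged for either container type.
template <typename Set>
bool Contains(const Set& set, const Element& element)
{
    return set.count(element) > 0;
}

// Small usage demonstration.
inline void ExampleUsage()
{
    HashElementSet elements;
    elements.insert(Element{0x01, 0x02});
    assert(Contains(elements, Element{0x01, 0x02}));
}
```

Call sites that only insert and test membership look the same for both containers; what changes is the iteration order and the requirement to supply a hasher.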

@DrahtBot

Contributor

commented Aug 26, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14387 (Faster Input Deduplication Algorithm by JeremyRubin)
  • #14224 (Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift)
  • #14121 (Index for BIP 157 block filters by jimpo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/util.h Outdated
public:
ByteVectorHash();
size_t operator()(const std::vector<unsigned char>& input) const;
};

@laanwj

laanwj Aug 27, 2018

Member

I don't think util.h is the best place for a generic utility data structure such as this; it's more for assorted operating system functions.

@jimpo

jimpo Aug 27, 2018

Author Contributor

@laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?

@laanwj

laanwj Aug 28, 2018

Member

or maybe support/bytevectorhash.h ?
unless there's a strong need to group this with other things

@@ -0,0 +1,47 @@
// Copyright (c) 2016-2018 The Bitcoin Core developers

@laanwj

laanwj Aug 27, 2018

Member

ACK on moving siphash to a separate unit

@jimpo jimpo force-pushed the jimpo:blockfilter-unordered-set branch Aug 28, 2018

@laanwj

Member

commented Aug 31, 2018

utACK f4608359e4e643c6e197df822e03226146e37a49
verified move-onlyness of extraction commit

src/support/bytevectorhash.cpp Outdated
v[2] = 0x6c7967656e657261ULL ^ k0;
v[3] = 0x7465646279746573ULL ^ k1;
count = 0;
tmp = 0;

@practicalswift

practicalswift Sep 10, 2018

Member

Initialize count and tmp using default member initializers or in the member initializer list of the constructor?
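
As a rough illustration of the two options named in this comment, the sketch below shows both forms side by side. The classes `HasherDefaultInit` and `HasherInitList` and their members are hypothetical, heavily simplified stand-ins, not the actual CSipHasher code under review.

```cpp
#include <cassert>
#include <cstdint>

// Option 1: default member initializers (C++11). Every constructor picks
// these values up unless it explicitly overrides them.
class HasherDefaultInit
{
private:
    std::uint64_t m_state;
    std::uint64_t m_tmp{0};
    int m_count{0};

public:
    explicit HasherDefaultInit(std::uint64_t key) : m_state(key) {}
    std::uint64_t State() const { return m_state; }
    std::uint64_t Tmp() const { return m_tmp; }
    int Count() const { return m_count; }
};

// Option 2: member initializer list. Each constructor spells out the
// initial values itself.
class HasherInitList
{
private:
    std::uint64_t m_state;
    std::uint64_t m_tmp;
    int m_count;

public:
    explicit HasherInitList(std::uint64_t key) : m_state(key), m_tmp(0), m_count(0) {}
    std::uint64_t State() const { return m_state; }
    std::uint64_t Tmp() const { return m_tmp; }
    int Count() const { return m_count; }
};

// Small usage demonstration: both forms produce the same initial state.
inline void ExampleUsage()
{
    HasherDefaultInit a(7);
    HasherInitList b(7);
    assert(a.Tmp() == b.Tmp());
    assert(a.Count() == b.Count());
}
```

Either form avoids assigning the members inside the constructor body; the first keeps the defaults next to the declarations, which helps when a class has several constructors.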

@jimpo

jimpo Sep 10, 2018

Author Contributor

This is a move-only commit. I'd rather not modify the contents of the code here.

@practicalswift

practicalswift Sep 10, 2018

Member

Makes sense!

@laanwj

Member

commented Sep 10, 2018

@ryanofsky can you take a look here please? according to @jimpo this was your suggestion

@promag
Member

left a comment

@jimpo can you improve the PR description to state the advantages of this change (besides linking @ryanofsky's suggestion)?

Do you have numbers to support this change?

Left some nits.

src/support/bytevectorhash.h Outdated
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in

@promag

promag Oct 8, 2018

Member

This may be used as the hash.. ?

src/support/bytevectorhash.h Outdated
*/
class ByteVectorHash
{
private:

@promag

promag Oct 8, 2018

Member

Drop private.

@jimpo

jimpo Oct 9, 2018

Author Contributor

I think it's more clear to leave it explicit even if the default for class is private.

@@ -15,6 +15,7 @@

#include <amount.h>
#include <coins.h>
#include <crypto/siphash.h>

@promag

promag Oct 8, 2018

Member

Can be removed?

@jimpo

jimpo Oct 10, 2018

Author Contributor

It's used in SaltedTxidHasher

src/util.cpp Outdated
@@ -6,6 +6,7 @@
#include <util.h>

#include <chainparamsbase.h>
#include <crypto/siphash.h>

@promag

promag Oct 8, 2018

Member

Can be removed?

src/support/bytevectorhash.h Outdated
#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**

@promag

promag Oct 8, 2018

Member

Missing includes <stdint.h> and <vector>.

src/support/bytevectorhash.h Outdated
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
* instance of SipHash-2-4 is used.
*/
class ByteVectorHash

@promag

promag Oct 8, 2018

Member

nit, final.

@ryanofsky
Contributor

left a comment

utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps in more places in the future. I left a suggestion below, but also think this change looks good as it is.

src/support/bytevectorhash.h Outdated

/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random

@ryanofsky

ryanofsky Oct 9, 2018

Contributor

In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)

The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern is that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.

I'd suggest:

  1. Mentioning in this comment that operator() overloads for new types could be added here in the future.
  2. Changing the name of the class from ByteVectorHash to something like RandomizedSipHash to avoid tying it to std::vector.
  3. Maybe replacing std::vector with Span.
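
The overloading idea in point 1 could be sketched along these lines. This is a minimal illustration under stated assumptions: `SaltedByteHash`, `HashBytes`, and `ExampleUsage` are hypothetical names, and a salted FNV-1a stands in for the randomly keyed SipHash-2-4 that the real class uses, so that the snippet compiles on its own without Bitcoin Core headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical hasher whose name is not tied to one container type.
// Salted FNV-1a is a stand-in here, NOT the SipHash-2-4 used in the PR.
class SaltedByteHash final
{
private:
    std::uint64_t m_salt;

    // Shared implementation: hashes a raw byte range with the per-instance salt.
    std::size_t HashBytes(const unsigned char* data, std::size_t len) const
    {
        std::uint64_t h = 14695981039346656037ULL ^ m_salt; // FNV offset basis, salted
        for (std::size_t i = 0; i < len; ++i) {
            h ^= data[i];
            h *= 1099511628211ULL; // FNV prime
        }
        return static_cast<std::size_t>(h);
    }

public:
    // Draws a random salt once per hasher instance.
    SaltedByteHash()
    {
        std::random_device rd;
        m_salt = (static_cast<std::uint64_t>(rd()) << 32) | rd();
    }

    // One thin overload per supported type; new types only need another overload.
    std::size_t operator()(const std::vector<unsigned char>& input) const
    {
        return HashBytes(input.data(), input.size());
    }

    std::size_t operator()(const std::string& input) const
    {
        return HashBytes(reinterpret_cast<const unsigned char*>(input.data()), input.size());
    }
};

// Small usage demonstration.
inline void ExampleUsage()
{
    SaltedByteHash hasher;
    const std::vector<unsigned char> bytes{'a', 'b', 'c'};
    const std::string text{"abc"};
    // Same bytes, same hasher instance: both overloads agree.
    assert(hasher(bytes) == hasher(text));
}
```

With this shape, `std::unordered_set<std::string, SaltedByteHash>` and `std::unordered_set<std::vector<unsigned char>, SaltedByteHash>` can share one class instead of a copied and renamed one per type.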

@jimpo

jimpo Oct 10, 2018

Author Contributor

I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.

@jimpo jimpo force-pushed the jimpo:blockfilter-unordered-set branch 2 times, most recently Oct 10, 2018

@etscrivner

Contributor

commented Oct 10, 2018

utACK cfabca8749a132220e649d4563fb876afc21ceec

@sipa

Member

commented Oct 11, 2018

Just one nit comment: I always imagined the things in 'support' as low level features that are independently usable, and find it a bit strange to put something there that depends on random and crypto/siphash. Perhaps others have a different understanding, as I don't think it was ever written down anywhere what it was for, but I would just put it in src/.

@jimpo

Contributor Author

commented Oct 11, 2018

@sipa I'd prefer to start creating more directory structure. How would you feel about a util/ directory instead and we can put it in there?

@sipa

Member

commented Oct 12, 2018

@jimpo Sounds great, but perhaps something to do independently? I'd prefer to not start a new convention, and then have it linger in a half finished state if there's unclarity about how to move forward.

@ryanofsky
Contributor

left a comment

utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.

laanwj added a commit that referenced this pull request Nov 5, 2018

Merge #14555: Move util files to directory
2068f08 scripted-diff: Move util files to separate directory. (Jim Posen)

Pull request description:

  As discussed [here](#14074 (comment)), this establishes a `util/` directory to introduce more organizational structure and have a clear place for new util files. It's really not scary to review, it's just one big scripted diff.

Tree-SHA512: 39cf15480d7d35e987b6088d52a857a2d5b1802e36c6b815eb42718d80cd95e669757af9bcc7c04426cd8523662cb1050b8da1e2377d3730672820ed298b894b

jimpo added some commits Aug 24, 2018

Extract CSipHasher to it's own file in crypto/ directory.
This is a move-only commit with the exception of changes to includes.

@jimpo jimpo force-pushed the jimpo:blockfilter-unordered-set branch to fef5adc Nov 5, 2018

@jimpo

Contributor Author

commented Nov 5, 2018

@sipa I moved bytevectorhash to util/ now.

@laanwj

Member

commented Nov 6, 2018

utACK fef5adc

@laanwj laanwj merged commit fef5adc into bitcoin:master Nov 6, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

laanwj added a commit that referenced this pull request Nov 6, 2018

Merge #14074: Use std::unordered_set instead of set in blockfilter interface

fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen)
4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen)

Pull request description:

  Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

  This moves `CSipHasher` to it's own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88

@jimpo jimpo deleted the jimpo:blockfilter-unordered-set branch Nov 6, 2018
