Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use std::unordered_set instead of set in blockfilter interface #14074

Merged
merged 2 commits into from
Nov 6, 2018

Conversation

jimpo
Copy link
Contributor

@jimpo jimpo commented Aug 26, 2018

Use std::unordered_set (hash set) instead of std::set (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

This moves CSipHasher to it's own file crypto/siphash.h, so that it can be used in the libbitcoin_util library without including hash.{h,cpp}. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 26, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14387 (Faster Input Deduplication Algorithm by JeremyRubin)
  • #14224 (Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift)
  • #14121 (Index for BIP 157 block filters by jimpo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/util.h Outdated
public:
ByteVectorHash();
size_t operator()(const std::vector<unsigned char>& input) const;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think util.h is the best place for a generic utility data structure such as this, it's more for assorted operating system functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe support/bytevectorhash.h ?
unless there's a strong need to group this with other things

@@ -0,0 +1,47 @@
// Copyright (c) 2016-2018 The Bitcoin Core developers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK on moving siphash to a separate unit

@laanwj
Copy link
Member

laanwj commented Aug 31, 2018

utACK f4608359e4e643c6e197df822e03226146e37a49
verified move-onlyness of extraction commit

v[2] = 0x6c7967656e657261ULL ^ k0;
v[3] = 0x7465646279746573ULL ^ k1;
count = 0;
tmp = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize count and tmp using default member initializers or in the member initializer list of the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a move-only commit. I'd rather not modify the contents of the code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

@laanwj
Copy link
Member

laanwj commented Sep 10, 2018

@ryanofsky can you take a look here please? according to @jimpo this was your suggestion

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimpo can you improve PR description of the advantages of this change (beside linking @ryanofsky suggestion)?

Do you have numbers to support this change?

Left some nits.

#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be used as the hash.. ?

*/
class ByteVectorHash
{
private:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more clear to leave it explicit even if the default for class is private.

@@ -15,6 +15,7 @@

#include <amount.h>
#include <coins.h>
#include <crypto/siphash.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in SaltedTxidHasher

src/util.cpp Outdated
@@ -6,6 +6,7 @@
#include <util.h>

#include <chainparamsbase.h>
#include <crypto/siphash.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed?

#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
#define BITCOIN_SUPPORT_BYTEVECTORHASH_H

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing includes <stdint.h> and <vector>.

* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
* instance of SipHash-2-4 is used.
*/
class ByteVectorHash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, final.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps more places in the future. I left a suggestion below, but also think this change looks good as it is.


/**
* Implementation of Hash named requirement for a byte vector. This may be used the hash function in
* std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)

The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.

I'd suggest:

  1. Mentioning in this comment that operator() overloads for new types could be added here in the future.
  2. Changing the name of the class from ByteVectorHash to something like RandomizedSipHash to avoid tying it to std::vector.
  3. Maybe replacing std::vector with Span.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.

@jimpo jimpo force-pushed the blockfilter-unordered-set branch 2 times, most recently from 7406129 to cfabca8 Compare October 10, 2018 18:22
@etscrivner
Copy link
Contributor

utACK cfabca8749a132220e649d4563fb876afc21ceec

@sipa
Copy link
Member

sipa commented Oct 11, 2018

Just one not comment: I always imagined the things in 'support' as low level features that are independently usable, and find it a bit strange to put something there that depends on random and crypto/siphash. Perhaps others have a different understanding, as I don't think it was ever written down somewhere what it was for, but I would just put it in src/.

@jimpo
Copy link
Contributor Author

jimpo commented Oct 11, 2018

@sipa I'd prefer to start creating more directory structure. How would you feel about a util/ directory instead and we can put it in there?

@sipa
Copy link
Member

sipa commented Oct 12, 2018

@jimpo Sounds great, but perhaps something to do independently? I'd prefer to not start a new convention, and then have it linger in a half finished state if there's unclarity about how to move forward.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.

laanwj added a commit that referenced this pull request Nov 5, 2018
2068f08 scripted-diff: Move util files to separate directory. (Jim Posen)

Pull request description:

  As discussed [here](#14074 (comment)), this establishes a `util/` directory to introduce more organizational structure and have a clear place for new util files. It's really not scary to review, it's just one big scripted diff.

Tree-SHA512: 39cf15480d7d35e987b6088d52a857a2d5b1802e36c6b815eb42718d80cd95e669757af9bcc7c04426cd8523662cb1050b8da1e2377d3730672820ed298b894b
@jimpo
Copy link
Contributor Author

jimpo commented Nov 5, 2018

@sipa I moved bytevectorhash to util/ now.

@laanwj
Copy link
Member

laanwj commented Nov 6, 2018

utACK fef5adc

@laanwj laanwj merged commit fef5adc into bitcoin:master Nov 6, 2018
laanwj added a commit that referenced this pull request Nov 6, 2018
…terface

fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen)
4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen)

Pull request description:

  Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

  This moves `CSipHasher` to it's own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
@jimpo jimpo deleted the blockfilter-unordered-set branch November 6, 2018 16:59
kwvg added a commit to kwvg/dash that referenced this pull request Jun 15, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 15, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 16, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 25, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 25, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 26, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 26, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 27, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 27, 2021
UdjinM6 added a commit to dashpay/dash that referenced this pull request Jun 27, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants