New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIP 158: Compact Block Filters for Light Clients #12254

Merged
merged 14 commits into from Aug 26, 2018

Conversation

@jimpo
Contributor

jimpo commented Jan 24, 2018

This implements the compact block filter construction in BIP 158. The code is not used anywhere in the Bitcoin Core code base yet. The next step towards BIP 157 support would be to create an indexing module similar to TxIndex that constructs the basic and extended filters for each validated block.

Filter Sizes

Here is a CSV of filter sizes for blocks in the main chain.

As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

filter_sizes

The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements.

The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios for blocks after height 150,000:

Stat Filter Type
Weighted Size Ratio Mean 0.0198
Size Ratio Mean 0.0224
Size Ratio Std Deviation 0.0202
Mean Element Size (bits) 21.145
Approx Theoretical Min Element Size (bits) 21.025

@fanquake fanquake added the Consensus label Jan 24, 2018

@MeshCollider

This comment has been minimized.

Member

MeshCollider commented Jan 24, 2018

Big Concept ACK, excited about this

@laanwj

This comment has been minimized.

Member

laanwj commented Jan 24, 2018

Should this be labeled consensus? This is a P2P change, right?

@jimpo

This comment has been minimized.

Contributor

jimpo commented Jan 24, 2018

@laanwj This is a data structure to be used in a P2P change. I first thought that it shouldn't be tagged "Consensus", but there's an argument to be made for it. It doesn't affect blockchain consensus, but it is kind of a softer P2P consensus change, where network clients (though not other full nodes) may disconnect/ban you if you serve incorrectly computed block filters. I'll let you make the call on the tag.

@sipa

This comment has been minimized.

Member

sipa commented Jan 24, 2018

Any fork that can be resolved by a P2P adaptor that speaks both protocols is not a consensus change.

@laanwj laanwj added P2P and removed Consensus labels Jan 24, 2018

@laanwj

This comment has been minimized.

Member

laanwj commented Jan 24, 2018

This is a data structure to be used in a P2P change.

Thanks for the explanation. With "consensus" we mean the blockchain consensus rules code. Banning\disconnecting is a P2P level issue. So changing the label to P2P.

@jonasschnelli

This comment has been minimized.

Member

jonasschnelli commented Jan 25, 2018

Great work @jimpo!
Big Concept ACK,... will help to get this done.

throw std::invalid_argument("N must be <2^32");
}
// Surface any errors decoding the filter on construction.

This comment has been minimized.

@jonasschnelli

jonasschnelli Feb 9, 2018

Member

Not sure, but I guess that comments belongs to L113?

This comment has been minimized.

@jimpo

jimpo Feb 9, 2018

Contributor

No, the idea is that the below lines fully decode the filter in the constructor so that any errors decoding get raised during construction rather than when it is first matched against. I'll elaborate on the comment.

@jonasschnelli

This comment has been minimized.

Member

jonasschnelli commented Feb 9, 2018

Reviewed and tested a bit... nice, clean PR!
I would wish we had more test vectors...

@Sjors

This comment has been minimized.

Member

Sjors commented Feb 9, 2018

Concept ACK. Would it useful to add some (hidden) RPC commands so other developers can test it?

@jimpo

This comment has been minimized.

Contributor

jimpo commented Feb 9, 2018

@jonasschnelli Thanks for reviewing. The test vectors were generated from a Go program I have that cross-validates against the btcsuite implementation. I can easily add any specific testnet blocks to the list of cases. The blocks were chosen to exercise certain edge cases (eg. empty filters, duplicate pushdatas, invalid output scripts), but the vectors aren't commented with which edges cases they exercise. I'll add the comments, because it seems worthwhile.

@Sjors I'd definitely like to see RPC commands to fetch specific filters and filter headers, but I think it makes more sense to do that after adding the filter index, so that the RPC handlers just have to look up a precomputed filter/header. (So basically, in a subsequent PR).

@jimpo jimpo force-pushed the jimpo:bip-158 branch 3 times, most recently from 69f0acd to 68777c5 Mar 11, 2018

@ryanofsky

(edited 2018-08-17)

utACK a23681b. Left minor comments (feel free to ignore). Overall code looks very good.

  • f154ded streams: Create VectorReader stream interface for vectors. (1/14)
  • bdb3419 streams: Unit test for VectorReader class. (2/14)
  • faaa4b8 streams: Implement BitStreamReader/Writer classes. (3/14)
  • e15e553 streams: Unit tests for BitStreamReader and BitStreamWriter. (4/14)
  • a4b03be blockfilter: Declare GCSFilter class for BIP 158 impl. (5/14)
  • 515af03 blockfilter: Implement GCSFilter constructors. (6/14)
  • 6c1262f blockfilter: Implement GCSFilter Match methods. (7/14)
  • e986211 blockfilter: Simple test for GCSFilter construction and Match. (8/14)
  • f7a5bdb blockfilter: Construction of basic block filters. (9/14)
  • c14e4d5 blockfilter: Serialization methods on BlockFilter. (10/14)
  • ab852f9 blockfilter: Additional helper methods to compute hash and header. (11/14)
  • 9cf17eb blockfilter: Unit test against BIP 158 test vectors. (12/14)
  • 2837e40 blockfilter: Optimization on compilers with int128 support. (13/14)
  • a23681b bench: Benchmark GCS filter creation and matching. (14/14)
@@ -138,6 +138,80 @@ class CVectorWriter
size_t nPos;
};
/* Minimal stream for reading from an existing vector by reference
*/
class CVectorReader

This comment has been minimized.

@ryanofsky

ryanofsky Mar 15, 2018

Contributor

In commit "streams: Create CVectorReader stream interface for vectors." (93f702b)

This is pretty similar to the VectorReader class @TheBlueMatt is adding here: TheBlueMatt@bb608a9 for master...TheBlueMatt:2018-02-miningserver

Your implementation is more general with support for deserialization in the constructor and more complete comments. But his has a pos() method and uses non-hungarian names which are recommended by the contrib guide. Anyway you may want to incorporate some of his changes.

This comment has been minimized.

@jimpo

jimpo Mar 19, 2018

Contributor

OK, I can bring that commit over instead or modify this one to remove the hungarian notation.

This comment has been minimized.

@sipa

sipa May 12, 2018

Member

Agree, it would be nice to align the two implementations.

This comment has been minimized.

@jimpo

jimpo May 14, 2018

Contributor

Done.

private:
const int nType;
const int nVersion;
const std::vector<unsigned char>& vchData;

This comment has been minimized.

@ryanofsky

ryanofsky Mar 15, 2018

Contributor

In commit "streams: Create CVectorReader stream interface for vectors." (93f702b)

It would be nice if this just had const unsigned char* and size_t members instead of a requiring a reference to an actual vector. That way the class could be used to efficiently deserialize from any memory location, and be compatible with other containers like std::string.

This comment has been minimized.

@jimpo

jimpo Mar 19, 2018

Contributor

I'd rather not deal with raw pointers because it leaves space for unsafe accesses. If generality is a concern, I'd prefer a templated approach with random access iterators.

This comment has been minimized.

@sipa

sipa May 12, 2018

Member

An alternative (which could be done later, not a blocker for this PR) is using the Span class that was introduced in #12886 and is being extended in #13062).

This comment has been minimized.

@ryanofsky

ryanofsky Aug 15, 2018

Contributor

Re: #12254 (comment)

I think it would be good to change VectorReader to SpanReader now that Span exists. It would be a simple change, and better describe what this class does, and make it more reusable.

@laanwj laanwj added this to Blockers in High-priority for review Mar 15, 2018

@Sjors

This comment has been minimized.

Member

Sjors commented Mar 15, 2018

@Sjors I'd definitely like to see RPC commands to fetch specific filters and filter headers, but I think it makes more sense to do that after adding the filter index, so that the RPC handlers just have to look up a precomputed filter/header. (So basically, in a subsequent PR).

Even a proof-of-concept PR for that would be useful for review.

@jimpo

This comment has been minimized.

Contributor

jimpo commented Mar 20, 2018

@Sjors Here is a branch that exposes an RPC for testing/playing around: https://github.com/jimpo/bitcoin/tree/bip158-rpc. Is not intended to be merged for reasons stated above.

@laanwj laanwj removed this from Blockers in High-priority for review Mar 22, 2018

private:
IStream& m_istream;
uint8_t m_buffer;
int m_offset;

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Comment for m_offset would be helpful. Maybe //!< Number of high order bits in m_buffer already returned by previous Read() calls.

public:
BitStreamReader(IStream& istream)
: m_istream(istream), m_buffer(0), m_offset(8) {}

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be nice to initialize m_buffer, m_offset above, where they are declared (see "Initialize all non-static class members where they are defined" guideline from https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md). Similarly in BitStreamWriter below.

This comment has been minimized.

@jimpo

jimpo Apr 12, 2018

Contributor

Didn't know about that guideline. Will do.

template <typename OStream>
class BitStreamWriter

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be nice to add a simple unit test writing values to a stream with BitStreamWriter, and then making sure same values are returned from BitStreamReader.

This comment has been minimized.

@jimpo

jimpo Apr 17, 2018

Contributor

Added in 5f67272.

public:
BitStreamWriter(OStream& ostream)

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be good to add destructor either asserting m_offset == 0, or calling Flush(). Discarding bits that have been written but not flushed seems less safe than you might want as default behavior.

private:
OStream& m_ostream;
uint8_t m_buffer;
int m_offset;

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would add = 0; //!< Number of high-order bits in m_buffer that have been written but not yet flushed to the stream.

* This implements a Golomb-coded set as defined in BIP 158. It is a
* compact, probabilistic data structure for testing set membership.
*/
class GCSFilter

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "blockfilter: Declare GCSFilter class for BIP 158 impl."

It seems cumbersome for this to be implemented as a class, since none of the class members can change after construction, and some of the stored state is redundant (m_F is derived from m_N and m_P), m_N is redundant with elements.size() and can be derived from m_encoded).

If this were a simple set of functions instead, like:

struct FilterParams { k0; k1; P; };

vector<char> BuildFilter(FilterParams, set<Elements>);

bool FilterContains(FilterParams, vector<char>, Element);

bool FilterContainsAny(FilterParams, vector<char>, set<Element>);

usage would be more obvious, and you could get rid of the current runtime throws and asserts checking for inconsistencies in the state.

This comment has been minimized.

@jimpo

jimpo Apr 17, 2018

Contributor

I originally tried something like that, but I preferred to make it an actual class. I see the ability to store derivable data in private fields and check data consistency before using the data as features of this approach.

constexpr int GCS_SER_VERSION = 0;
template <typename OStream>
static void GolombRiceEncode(BitStreamWriter<OStream>& bitwriter, uint8_t k, uint64_t n)

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "blockfilter: Implement GCSFilter constructors."

Here and other places, N seems like it should be 32 bits instead 64.

This comment has been minimized.

@jimpo

jimpo Apr 17, 2018

Contributor

Why do you say that? This function can encode 64-bit ints if it needs to. Also, it would have to immediately get cast to a uint64_t either way.

This comment has been minimized.

@sipa

sipa May 12, 2018

Member

I don't understand this comment either.

This comment has been minimized.

@ryanofsky

ryanofsky Aug 17, 2018

Contributor

RE: #12254 (comment)

I don't understand this comment either.

The comment was just attached in the wrong place. In 15d529d m_N was uint64_t (fixed now).

template <typename OStream>
static void GolombRiceEncode(BitStreamWriter<OStream>& bitwriter, uint8_t k, uint64_t n)
{
// Write quotient as unary-encoded: q 1's followed by one 0.

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "blockfilter: Implement GCSFilter constructors."

I wonder if the optimization below actually buys anything over a more direct:

for (int i = 0; i < q; ++i) bitwriter.Write(1, 1);
return false;
}
bool GCSFilter::MatchAny(const std::set<Element>& elements) const

This comment has been minimized.

@ryanofsky

ryanofsky Apr 2, 2018

Contributor

In commit "blockfilter: Implement GCSFilter Match methods."

Would suggest implementing Match and MatchAny in terms of a common

Match(const uint64_t* sorted_element_hashes, size_t size)

method to get rid of all the code duplication between the existing methods. No outside code would need to change, they could just call

Match(&query, 1);
Match(queries.data(), queries.size());

respectively.

This comment has been minimized.

@jimpo

jimpo Apr 17, 2018

Contributor

Good suggestion.

@practicalswift

This comment has been minimized.

Member

practicalswift commented Apr 15, 2018

Concept ACK

Nice work!

@braydonf

This comment has been minimized.

braydonf commented Apr 17, 2018

Has there been any work yet on using this to implement BIP 157? I've worked on indexing in the past, and could take a look at implementing it.

}
}
// Include all data pushes in output scripts.

This comment has been minimized.

@sipa

sipa Apr 17, 2018

Member

This does not seem up to date with the latest version of BIP158 (I'll review in full later).

This comment has been minimized.

@jimpo

jimpo Apr 17, 2018

Contributor

Fixed.

@jimpo jimpo force-pushed the jimpo:bip-158 branch from 68777c5 to 6bf7e0b Apr 17, 2018

@jimpo jimpo force-pushed the jimpo:bip-158 branch 2 times, most recently from 3e7bab0 to 6e4d41a May 1, 2018

@jimpo

This comment has been minimized.

Contributor

jimpo commented Aug 24, 2018

@ryanofsky Thank you for the review. I'm going to leave GCSElementSet typedef'ed to std::set in this PR and will open a separate PR after this is merged to change to unordered_set. Creating the custom hasher and putting it in an appropriate location requires shuffling some code around which is not relevant to this PR and I'd rather not add another reason to hold off on merging this.

@jimpo jimpo force-pushed the jimpo:bip-158 branch from 7c28333 to 2ba1533 Aug 24, 2018

@jimpo jimpo force-pushed the jimpo:bip-158 branch from 2ba1533 to 254c85b Aug 25, 2018

@laanwj

This comment has been minimized.

Member

laanwj commented Aug 26, 2018

Question for maintainers: While this PR was fun to review, I'm curious if there's a specific reason it wasn't merged previously? It has 4 other acks and doesn't touch existing functionality, and the new functionality is pretty well covered by test vectors.

didn't want to merge it last-minute for 0.17, but it can go in now IMO

@laanwj laanwj merged commit 254c85b into bitcoin:master Aug 26, 2018

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Aug 26, 2018

Merge #12254: BIP 158: Compact Block Filters for Light Clients
254c85b bench: Benchmark GCS filter creation and matching. (Jim Posen)
f33b717 blockfilter: Optimization on compilers with int128 support. (Jim Posen)
97b64d6 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen)
a4afb9c blockfilter: Additional helper methods to compute hash and header. (Jim Posen)
cd09c79 blockfilter: Serialization methods on BlockFilter. (Jim Posen)
c1855f6 blockfilter: Construction of basic block filters. (Jim Posen)
53e7874 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen)
558c536 blockfilter: Implement GCSFilter Match methods. (Jim Posen)
cf70b55 blockfilter: Implement GCSFilter constructors. (Jim Posen)
c454f0a blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen)
9b622dc streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen)
fe943f9 streams: Implement BitStreamReader/Writer classes. (Jim Posen)
87f2d9e streams: Unit test for VectorReader class. (Jim Posen)
947133d streams: Create VectorReader stream interface for vectors. (Jim Posen)

Pull request description:

  This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block.

  ### Filter Sizes

  [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain.

  As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

  ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png)

  The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements.

  The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*:

  | Stat | Filter Type |
  |-------|--------------|
  | Weighted Size Ratio Mean | 0.0198 |
  | Size Ratio Mean | 0.0224 |
  | Size Ratio Std Deviation | 0.0202 |
  | Mean Element Size (bits) | 21.145 |
  | Approx Theoretical Min Element Size (bits) | 21.025 |

Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
for (const CTransactionRef& tx : block.vtx) {
for (const CTxOut& txout : tx->vout) {
const CScript& script = txout.scriptPubKey;
if (script[0] == OP_RETURN) continue;

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Aug 26, 2018

Contributor

Isnt this an OOB read?

explicit BitStreamReader(IStream& istream) : m_istream(istream) {}
/** Read the specified number of bits from the stream. The data is returned
* in the nbits least signficant bits of a 64-bit uint.

This comment has been minimized.

@practicalswift

practicalswift Aug 27, 2018

Member

Post-merge nit: "signficant" should be "significant".

This misspelling was automatically identified by codespell.

Automatic codespell checking is introduced in PR #13954. It warns (note: warn only, no build failure) when a PR introduces spelling errors. Please review :-)

laanwj added a commit that referenced this pull request Aug 31, 2018

Merge #14073: blockfilter: Avoid out-of-bounds script access.
f055995 blockfilter: Omit empty scripts from filter contents. (Jim Posen)

Pull request description:

  Caught during review of #12254 by @TheBlueMatt. #12254 (comment)

Tree-SHA512: cfc9e3eeaba12a14fd3d2e1ccce1a1f89e8cf44cc340ceec05d2d5fa61d27ff64e355603f4ad2184ff73c0ed23dfdab6e2103bddc48f3b76cb13b88d428770ac

laanwj added a commit that referenced this pull request Nov 6, 2018

Merge #14074: Use std::unordered_set instead of set in blockfilter in…
…terface

fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen)
4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen)

Pull request description:

  Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

  This moves `CSipHasher` to it's own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment