BIP 158: Compact Block Filters for Light Clients #12254

Merged (14 commits) on Aug 26, 2018

Contributor

@jimpo jimpo commented Jan 24, 2018

This implements the compact block filter construction in BIP 158. The code is not used anywhere in the Bitcoin Core code base yet. The next step towards BIP 157 support would be to create an indexing module similar to TxIndex that constructs the basic and extended filters for each validated block.

Filter Sizes

Here (https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain.

As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

[Figure: filter_sizes, plot of the ratio of filter size to block size]

The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements.

The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios for blocks after height 150,000:

| Stat | Filter Type |
|------|-------------|
| Weighted Size Ratio Mean | 0.0198 |
| Size Ratio Mean | 0.0224 |
| Size Ratio Std Deviation | 0.0202 |
| Mean Element Size (bits) | 21.145 |
| Approx Theoretical Min Element Size (bits) | 21.025 |

@meshcollider
Contributor

Big Concept ACK, excited about this

@laanwj
Member

laanwj commented Jan 24, 2018

Should this be labeled consensus? This is a P2P change, right?

@jimpo
Contributor Author

jimpo commented Jan 24, 2018

@laanwj This is a data structure to be used in a P2P change. I first thought that it shouldn't be tagged "Consensus", but there's an argument to be made for it. It doesn't affect blockchain consensus, but it is kind of a softer P2P consensus change, where network clients (though not other full nodes) may disconnect/ban you if you serve incorrectly computed block filters. I'll let you make the call on the tag.

@sipa
Member

sipa commented Jan 24, 2018

Any fork that can be resolved by a P2P adaptor that speaks both protocols is not a consensus change.

@laanwj laanwj added P2P and removed Consensus labels Jan 24, 2018
@laanwj
Member

laanwj commented Jan 24, 2018

> This is a data structure to be used in a P2P change.

Thanks for the explanation. By "consensus" we mean the blockchain consensus rules code. Banning/disconnecting is a P2P-level issue, so I am changing the label to P2P.

@jonasschnelli
Contributor

Great work @jimpo!
Big Concept ACK,... will help to get this done.

throw std::invalid_argument("N must be <2^32");
}

// Surface any errors decoding the filter on construction.
Contributor

Not sure, but I guess that comment belongs at L113?

Contributor Author

@jimpo jimpo Feb 9, 2018

No, the idea is that the below lines fully decode the filter in the constructor so that any errors decoding get raised during construction rather than when it is first matched against. I'll elaborate on the comment.

@jonasschnelli
Contributor

Reviewed and tested a bit... nice, clean PR!
I wish we had more test vectors...

@Sjors
Member

Sjors commented Feb 9, 2018

Concept ACK. Would it be useful to add some (hidden) RPC commands so other developers can test it?

@jimpo
Contributor Author

jimpo commented Feb 9, 2018

@jonasschnelli Thanks for reviewing. The test vectors were generated from a Go program I have that cross-validates against the btcsuite implementation. I can easily add any specific testnet blocks to the list of cases. The blocks were chosen to exercise certain edge cases (e.g. empty filters, duplicate pushdatas, invalid output scripts), but the vectors aren't commented with which edge cases they exercise. I'll add the comments, because it seems worthwhile.

@Sjors I'd definitely like to see RPC commands to fetch specific filters and filter headers, but I think it makes more sense to do that after adding the filter index, so that the RPC handlers just have to look up a precomputed filter/header. (So basically, in a subsequent PR).

@jimpo jimpo force-pushed the bip-158 branch 3 times, most recently from 69f0acd to 68777c5 Compare March 12, 2018 16:01
Contributor

@ryanofsky ryanofsky left a comment

(edited 2018-08-17)

utACK a23681bd6382f44a1246bcdd430e6f54f71c1f99. Left minor comments (feel free to ignore). Overall code looks very good.

  • f154ded087a1a1f0c9748dd75ceb8531968b75f7 streams: Create VectorReader stream interface for vectors. (1/14)
  • bdb34199345ad2c6f98da5d3f303d0993f6c722c streams: Unit test for VectorReader class. (2/14)
  • faaa4b8432d194acb26ffcac07c379444da327ac streams: Implement BitStreamReader/Writer classes. (3/14)
  • e15e553b746e64f0eeda420e4c10eccfc747d69f streams: Unit tests for BitStreamReader and BitStreamWriter. (4/14)
  • a4b03be28699e62a92085e3b75f6a80fe0ad05ea blockfilter: Declare GCSFilter class for BIP 158 impl. (5/14)
  • 515af0398b8f2aab77645871198c69fea91a07ff blockfilter: Implement GCSFilter constructors. (6/14)
  • 6c1262fac124acc70599760798240027ddf58f8e blockfilter: Implement GCSFilter Match methods. (7/14)
  • e98621135aef92106c5b0a7f7b06877a275c19fb blockfilter: Simple test for GCSFilter construction and Match. (8/14)
  • f7a5bdb54ba597fe06bcb54e886c43436899ab48 blockfilter: Construction of basic block filters. (9/14)
  • c14e4d5a950ed40d96a8956e4622e1d9670da72d blockfilter: Serialization methods on BlockFilter. (10/14)
  • ab852f91602c268d894e8f882f7f8212fd39f9dd blockfilter: Additional helper methods to compute hash and header. (11/14)
  • 9cf17eb1518f9f7c96ceac4f8f91f13e8dd5c9c2 blockfilter: Unit test against BIP 158 test vectors. (12/14)
  • 2837e40ef94d9ae5133c0756ea9ec38d6591d730 blockfilter: Optimization on compilers with int128 support. (13/14)
  • a23681bd6382f44a1246bcdd430e6f54f71c1f99 bench: Benchmark GCS filter creation and matching. (14/14)

src/streams.h Outdated
@@ -138,6 +138,80 @@ class CVectorWriter
size_t nPos;
};

/* Minimal stream for reading from an existing vector by reference
*/
class CVectorReader
Contributor

@ryanofsky ryanofsky Mar 15, 2018

In commit "streams: Create CVectorReader stream interface for vectors." (93f702b08e413c5c025b155bfb62b721d27939f5)

This is pretty similar to the VectorReader class @TheBlueMatt is adding here: TheBlueMatt@bb608a9 for master...TheBlueMatt:2018-02-miningserver

Your implementation is more general, with support for deserialization in the constructor and more complete comments, but his has a pos() method and uses non-Hungarian names, which are recommended by the contrib guide. You may want to incorporate some of his changes.

Contributor Author

OK, I can bring that commit over instead or modify this one to remove the Hungarian notation.

Member

Agree, it would be nice to align the two implementations.

Contributor Author

Done.

src/streams.h Outdated
private:
const int nType;
const int nVersion;
const std::vector<unsigned char>& vchData;
Contributor

In commit "streams: Create CVectorReader stream interface for vectors." (93f702b08e413c5c025b155bfb62b721d27939f5)

It would be nice if this just had const unsigned char* and size_t members instead of requiring a reference to an actual vector. That way the class could be used to efficiently deserialize from any memory location, and be compatible with other containers like std::string.

Contributor Author

I'd rather not deal with raw pointers because it leaves space for unsafe accesses. If generality is a concern, I'd prefer a templated approach with random access iterators.

Member

An alternative (which could be done later, not a blocker for this PR) is using the Span class that was introduced in #12886 and is being extended in #13062.

Contributor

@ryanofsky ryanofsky Aug 15, 2018

Re: #12254 (comment)

I think it would be good to change VectorReader to SpanReader now that Span exists. It would be a simple change, and better describe what this class does, and make it more reusable.

@Sjors
Member

Sjors commented Mar 15, 2018

> @Sjors I'd definitely like to see RPC commands to fetch specific filters and filter headers, but I think it makes more sense to do that after adding the filter index, so that the RPC handlers just have to look up a precomputed filter/header. (So basically, in a subsequent PR).

Even a proof-of-concept PR for that would be useful for review.

@jimpo
Contributor Author

jimpo commented Mar 20, 2018

@Sjors Here is a branch that exposes an RPC for testing/playing around: https://github.com/jimpo/bitcoin/tree/bip158-rpc. It is not intended to be merged, for the reasons stated above.

src/streams.h Outdated
private:
IStream& m_istream;
uint8_t m_buffer;
int m_offset;
Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Comment for m_offset would be helpful. Maybe //!< Number of high order bits in m_buffer already returned by previous Read() calls.

src/streams.h Outdated

public:
BitStreamReader(IStream& istream)
: m_istream(istream), m_buffer(0), m_offset(8) {}
Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be nice to initialize m_buffer, m_offset above, where they are declared (see "Initialize all non-static class members where they are defined" guideline from https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md). Similarly in BitStreamWriter below.

Contributor Author

Didn't know about that guideline. Will do.


template <typename OStream>
class BitStreamWriter
Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be nice to add a simple unit test writing values to a stream with BitStreamWriter, and then making sure same values are returned from BitStreamReader.

Contributor Author

Added in 5f67272f5a2d8567faf7b139c0239bc2788ed8b1.
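
A round-trip check of the kind requested above might look like the following sketch. `SketchBitWriter` and `SketchBitReader` are hypothetical stand-ins, not the classes from this PR: they operate on a plain `std::vector<uint8_t>` rather than templated streams, and they process one bit at a time for clarity. The member meanings (reader `m_offset` starting at 8, a writer that flushes in its destructor) follow the review comments in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical writer: packs bits into bytes, most significant bit first.
class SketchBitWriter
{
    std::vector<uint8_t>& m_out;
    uint8_t m_buffer{0}; // pending bits, filled from the high-order end
    int m_offset{0};     // number of bits in m_buffer not yet flushed

public:
    explicit SketchBitWriter(std::vector<uint8_t>& out) : m_out(out) {}
    ~SketchBitWriter() { Flush(); } // never silently discard pending bits

    // Write the nbits least significant bits of data.
    void Write(uint64_t data, int nbits)
    {
        assert(nbits >= 0 && nbits <= 64);
        for (int i = nbits - 1; i >= 0; --i) {
            const unsigned bit = (data >> i) & 1;
            m_buffer = static_cast<uint8_t>(m_buffer | (bit << (7 - m_offset)));
            if (++m_offset == 8) Flush();
        }
    }

    // Emit a partially filled byte, padding the low-order bits with zeros.
    void Flush()
    {
        if (m_offset == 0) return;
        m_out.push_back(m_buffer);
        m_buffer = 0;
        m_offset = 0;
    }
};

// Hypothetical reader: mirrors the writer's bit order.
class SketchBitReader
{
    const std::vector<uint8_t>& m_in;
    size_t m_pos{0};     // index of the next unread byte
    uint8_t m_buffer{0};
    int m_offset{8};     // bits of m_buffer already consumed; 8 means empty

public:
    explicit SketchBitReader(const std::vector<uint8_t>& in) : m_in(in) {}

    // Return the next nbits bits in the low-order bits of the result.
    uint64_t Read(int nbits)
    {
        assert(nbits >= 0 && nbits <= 64);
        uint64_t data = 0;
        for (int i = 0; i < nbits; ++i) {
            if (m_offset == 8) {
                if (m_pos >= m_in.size()) throw std::out_of_range("read past end of bit stream");
                m_buffer = m_in[m_pos++];
                m_offset = 0;
            }
            data = (data << 1) | static_cast<uint64_t>((m_buffer >> (7 - m_offset)) & 1);
            ++m_offset;
        }
        return data;
    }
};
```

Writing a few fields of different widths, letting the writer go out of scope, and reading the same widths back should reproduce the original values.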

src/streams.h Outdated

public:
BitStreamWriter(OStream& ostream)
Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would be good to add a destructor either asserting m_offset == 0, or calling Flush(). Discarding bits that have been written but not flushed seems less safe than you might want as default behavior.

src/streams.h Outdated
private:
OStream& m_ostream;
uint8_t m_buffer;
int m_offset;
Contributor

In commit "streams: Implement BitStreamReader/Writer classes."

Would add = 0; //!< Number of high-order bits in m_buffer that have been written but not yet flushed to the stream.

* This implements a Golomb-coded set as defined in BIP 158. It is a
* compact, probabilistic data structure for testing set membership.
*/
class GCSFilter
Contributor

In commit "blockfilter: Declare GCSFilter class for BIP 158 impl."

It seems cumbersome for this to be implemented as a class, since none of the class members can change after construction, and some of the stored state is redundant (m_F is derived from m_N and m_P, and m_N is redundant with elements.size() and can be derived from m_encoded).

If this were a simple set of functions instead, like:

struct FilterParams { k0; k1; P; };

vector<char> BuildFilter(FilterParams, set<Elements>);

bool FilterContains(FilterParams, vector<char>, Element);

bool FilterContainsAny(FilterParams, vector<char>, set<Element>);

usage would be more obvious, and you could get rid of the current runtime throws and asserts checking for inconsistencies in the state.

Contributor Author

I originally tried something like that, but I preferred to make it an actual class. I see the ability to store derivable data in private fields and check data consistency before using the data as features of this approach.

constexpr int GCS_SER_VERSION = 0;

template <typename OStream>
static void GolombRiceEncode(BitStreamWriter<OStream>& bitwriter, uint8_t k, uint64_t n)
Contributor

In commit "blockfilter: Implement GCSFilter constructors."

Here and in other places, N seems like it should be 32 bits instead of 64.

Contributor Author

Why do you say that? This function can encode 64-bit ints if it needs to. Also, it would have to immediately get cast to a uint64_t either way.

Member

I don't understand this comment either.

Contributor

RE: #12254 (comment)

> I don't understand this comment either.

The comment was just attached in the wrong place. In 15d529dfc2b65f2badaa520799187aff103af2b6 m_N was uint64_t (fixed now).

template <typename OStream>
static void GolombRiceEncode(BitStreamWriter<OStream>& bitwriter, uint8_t k, uint64_t n)
{
// Write quotient as unary-encoded: q 1's followed by one 0.
Contributor

In commit "blockfilter: Implement GCSFilter constructors."

I wonder if the optimization below actually buys anything over a more direct:

for (int i = 0; i < q; ++i) bitwriter.Write(1, 1);
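
For reference, the direct form of the encoding under discussion can be sketched as below. This is an illustration under assumptions, not the PR's code: `Bits`, `GolombRiceEncodeSketch` and `GolombRiceDecodeSketch` are hypothetical names, and a `std::vector<bool>` stands in for `BitStreamWriter`. The quotient `x >> k` is written in unary (q ones followed by one zero) and the remainder in the low `k` bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::vector<bool>; // hypothetical bit container, one entry per bit

// Append x as: q = x >> k one-bits, a terminating zero bit,
// then the k low-order bits of x (most significant first).
void GolombRiceEncodeSketch(Bits& out, uint8_t k, uint64_t x)
{
    assert(k < 64);
    const uint64_t q = x >> k;
    for (uint64_t i = 0; i < q; ++i) out.push_back(true);
    out.push_back(false);
    for (int i = k - 1; i >= 0; --i) out.push_back(((x >> i) & 1) != 0);
}

// Decode one value starting at pos and advance pos past it.
// Bits::at throws std::out_of_range if the input is truncated.
uint64_t GolombRiceDecodeSketch(const Bits& in, size_t& pos, uint8_t k)
{
    assert(k < 64);
    uint64_t q = 0;
    while (in.at(pos++)) ++q; // count ones up to the terminating zero
    uint64_t r = 0;
    for (int i = 0; i < k; ++i) r = (r << 1) | (in.at(pos++) ? 1u : 0u);
    return (q << k) | r;
}
```

With `k = 3`, the value 37 has quotient 4 and remainder 5, so it is encoded as the eight bits `1111 0 101`.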

return false;
}

bool GCSFilter::MatchAny(const std::set<Element>& elements) const
Contributor

In commit "blockfilter: Implement GCSFilter Match methods."

Would suggest implementing Match and MatchAny in terms of a common

Match(const uint64_t* sorted_element_hashes, size_t size)

method to get rid of all the code duplication between the existing methods. No outside code would need to change, they could just call

Match(&query, 1);
Match(queries.data(), queries.size());

respectively.

Contributor Author

Good suggestion.
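
The shape of that refactoring might look like the sketch below. All names here are hypothetical, and two simplifications are assumed: the filter is represented as a vector of already-decoded deltas between successive sorted hashed values (rather than Golomb-Rice encoded bytes), and queries are passed as hashed values (the real methods would hash elements first). The shared routine walks both sorted sequences once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shared matching routine: returns true if any of the sorted query hashes
// equals one of the filter's values. Both sequences are ascending, so a
// single merge-style pass suffices.
bool MatchInternalSketch(const std::vector<uint64_t>& deltas,
                         const uint64_t* sorted_hashes, size_t size)
{
    uint64_t value = 0;
    size_t i = 0;
    for (uint64_t delta : deltas) {
        value += delta; // reconstruct the next filter value
        while (i < size && sorted_hashes[i] < value) ++i;
        if (i == size) return false; // every query is below the remaining values
        if (sorted_hashes[i] == value) return true;
    }
    return false;
}

// Single-element query: a one-element array is trivially sorted.
bool MatchSketch(const std::vector<uint64_t>& deltas, uint64_t query)
{
    return MatchInternalSketch(deltas, &query, 1);
}

// Multi-element query: sort a copy, then reuse the shared routine.
bool MatchAnySketch(const std::vector<uint64_t>& deltas, std::vector<uint64_t> queries)
{
    std::sort(queries.begin(), queries.end());
    return MatchInternalSketch(deltas, queries.data(), queries.size());
}
```

Both public entry points reduce to the same loop, which is the duplication the suggestion aims to remove.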

@practicalswift
Contributor

Concept ACK

Nice work!

@braydonf

Has there been any work yet on using this to implement BIP 157? I've worked on indexing in the past, and could take a look at implementing it.

}
}

// Include all data pushes in output scripts.
Member

This does not seem up to date with the latest version of BIP158 (I'll review in full later).

Contributor Author

@jimpo jimpo Apr 17, 2018

Fixed.

Member

@sipa sipa left a comment

Code review ACK.

I'm very skeptical about the usefulness of the extended filter, and don't think it should be implemented in Bitcoin Core until there is a clear use case, but that's perhaps a discussion to be had about the BIP itself.

@@ -509,12 +509,102 @@ class CDataStream
}
};

template <typename IStream>
Member

In commit "streams: Implement BitStreamReader/Writer classes.", any reason to not make these operate with a 64-bit (or even larger) buffer? That would both simplify the code (no need to loop in the read/write operations) and possibly improve performance (due to fewer read/flush calls to the underlying buffer).

Contributor Author

It could perhaps improve performance, but it'd be trickier to handle streams that are not aligned on 8-byte boundaries. Since the buffer size is an implementation detail, would you be OK leaving that optimization for a later PR?

Member

Yeah, let's look at this later. It would complicate the reader indeed. For the writer I think it's pretty straightforward.

Contributor

Agree with @jimpo: the underlying IStream and OStream should be the buffered implementations, not this specific algorithm.

// See: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
static uint64_t MapIntoRange(uint64_t x, uint64_t n)
{
// To perform the calculation on 64-bit numbers without losing the
Member

Perhaps add a note here to use unsigned __int128 on supported platforms; that should be significantly faster than doing 4 separate multiplications.

Contributor Author

Fixed in 07df986.
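
The two approaches being compared can be sketched as follows. The function names are hypothetical and this is not necessarily the code from the referenced commit. Both compute the high 64 bits of the 128-bit product `x * n`, which is the range reduction described in the linked post. The portable version uses four 32-bit partial products, and the other uses `unsigned __int128` where the compiler defines `__SIZEOF_INT128__` (GCC and Clang do on supported targets).

```cpp
#include <cassert>
#include <cstdint>

// Portable path: split both operands into 32-bit halves and combine the
// four partial products, keeping only the upper 64 bits of the result.
uint64_t MapIntoRangePortable(uint64_t x, uint64_t n)
{
    const uint64_t x_hi = x >> 32, x_lo = x & 0xFFFFFFFF;
    const uint64_t n_hi = n >> 32, n_lo = n & 0xFFFFFFFF;
    const uint64_t ac = x_hi * n_hi;
    const uint64_t ad = x_hi * n_lo;
    const uint64_t bc = x_lo * n_hi;
    const uint64_t bd = x_lo * n_lo;
    // Sum of the terms that land in bits 32..63; its carry feeds bit 64.
    const uint64_t mid34 = (bd >> 32) + (bc & 0xFFFFFFFF) + (ad & 0xFFFFFFFF);
    return ac + (bc >> 32) + (ad >> 32) + (mid34 >> 32);
}

// Uses a single 128-bit multiplication when available, otherwise falls
// back to the portable path.
uint64_t MapIntoRangeSketch(uint64_t x, uint64_t n)
{
#ifdef __SIZEOF_INT128__
    return static_cast<uint64_t>((static_cast<unsigned __int128>(x) * n) >> 64);
#else
    return MapIntoRangePortable(x, n);
#endif
}
```

For a uniformly distributed 64-bit `x`, the result lies in `[0, n)`; for example, the largest possible `x` with `n = 100` maps to 99.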


bool GCSFilter::MatchAny(const std::set<Element>& elements) const
{
const std::vector<uint64_t>&& queries = BuildHashedSet(elements);
Member

I don't think the reference type is useful here; copy elision should apply, and otherwise at worst a move will occur.

throw std::ios_base::failure("N must be <2^32");
}

// Verify that the encoded filter contains exactly N elements. If it has too much or too little
Member

Is this worth it? The filter will be decoded twice in practice due to this?

Contributor Author

I'll drop it if people prefer. I think having the constructor check its input makes for a better API. I'd also note that Core won't actually use these match methods unless it implements a light client mode -- they are just here for completeness and testing.

Member

I expect we'll start using it for rescanning pretty quickly, regardless of protocol implementations.

Do you have any numbers for how fast it is to iterate through a 10000 element set or so?

Contributor Author

@jimpo jimpo May 14, 2018

Benchmarks added in f1b341a. Decoding a 10,000 element filter and matching against one missing element takes about 20us, and grows linearly in the number of elements (as expected).

explicit BitStreamReader(IStream& istream) : m_istream(istream) {}

/** Read the specified number of bits from the stream. The data is returned
* in the nbits least signficant bits of a 64-bit uint.
Contributor

Post-merge nit: "signficant" should be "significant".

This misspelling was automatically identified by codespell.

Automatic codespell checking is introduced in PR #13954. It warns (note: warn only, no build failure) when a PR introduces spelling errors. Please review :-)

laanwj added a commit that referenced this pull request Aug 31, 2018
f055995 blockfilter: Omit empty scripts from filter contents. (Jim Posen)

Pull request description:

  Caught during review of #12254 by @TheBlueMatt. #12254 (comment)

Tree-SHA512: cfc9e3eeaba12a14fd3d2e1ccce1a1f89e8cf44cc340ceec05d2d5fa61d27ff64e355603f4ad2184ff73c0ed23dfdab6e2103bddc48f3b76cb13b88d428770ac
laanwj added a commit that referenced this pull request Nov 6, 2018
…terface

fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen)
4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen)

Pull request description:

  Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

  This moves `CSipHasher` to its own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jan 26, 2020
254c85b bench: Benchmark GCS filter creation and matching. (Jim Posen)
f33b717 blockfilter: Optimization on compilers with int128 support. (Jim Posen)
97b64d6 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen)
a4afb9c blockfilter: Additional helper methods to compute hash and header. (Jim Posen)
cd09c79 blockfilter: Serialization methods on BlockFilter. (Jim Posen)
c1855f6 blockfilter: Construction of basic block filters. (Jim Posen)
53e7874 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen)
558c536 blockfilter: Implement GCSFilter Match methods. (Jim Posen)
cf70b55 blockfilter: Implement GCSFilter constructors. (Jim Posen)
c454f0a blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen)
9b622dc streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen)
fe943f9 streams: Implement BitStreamReader/Writer classes. (Jim Posen)
87f2d9e streams: Unit test for VectorReader class. (Jim Posen)
947133d streams: Create VectorReader stream interface for vectors. (Jim Posen)

Pull request description:

  This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block.

  ### Filter Sizes

  [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain.

  As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

  ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png)

  The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements.

  The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*:

  | Stat | Filter Type |
  |-------|--------------|
  | Weighted Size Ratio Mean | 0.0198 |
  | Size Ratio Mean | 0.0224 |
  | Size Ratio Std Deviation | 0.0202 |
  | Mean Element Size (bits) | 21.145 |
  | Approx Theoretical Min Element Size (bits) | 21.025 |

Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 15, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 15, 2020
f055995 blockfilter: Omit empty scripts from filter contents. (Jim Posen)

Pull request description:

  Caught during review of bitcoin#12254 by @TheBlueMatt. bitcoin#12254 (comment)

Tree-SHA512: cfc9e3eeaba12a14fd3d2e1ccce1a1f89e8cf44cc340ceec05d2d5fa61d27ff64e355603f4ad2184ff73c0ed23dfdab6e2103bddc48f3b76cb13b88d428770ac
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 16, 2020
254c85b bench: Benchmark GCS filter creation and matching. (Jim Posen)
f33b717 blockfilter: Optimization on compilers with int128 support. (Jim Posen)
97b64d6 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen)
a4afb9c blockfilter: Additional helper methods to compute hash and header. (Jim Posen)
cd09c79 blockfilter: Serialization methods on BlockFilter. (Jim Posen)
c1855f6 blockfilter: Construction of basic block filters. (Jim Posen)
53e7874 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen)
558c536 blockfilter: Implement GCSFilter Match methods. (Jim Posen)
cf70b55 blockfilter: Implement GCSFilter constructors. (Jim Posen)
c454f0a blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen)
9b622dc streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen)
fe943f9 streams: Implement BitStreamReader/Writer classes. (Jim Posen)
87f2d9e streams: Unit test for VectorReader class. (Jim Posen)
947133d streams: Create VectorReader stream interface for vectors. (Jim Posen)

Pull request description:

  This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block.

  ### Filter Sizes

  [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain.

  As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

  ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png)

  The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements.

  The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*:

  | Stat | Filter Type |
  |-------|--------------|
  | Weighted Size Ratio Mean | 0.0198 |
  | Size Ratio Mean | 0.0224 |
  | Size Ratio Std Deviation | 0.0202 |
  | Mean Element Size (bits) | 21.145 |
  | Approx Theoretical Min Element Size (bits) | 21.025 |
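
To make the per-element sizes in the table above more concrete, here is a minimal Python sketch of Golomb-Rice coding over a sorted set of hashed values. It is not the code from this pull request: the function names are hypothetical, SHA-256 stands in for the keyed hash a real filter would use, and `P = 20` is an assumed parameter chosen only for illustration. Each codeword costs `(delta >> P) + 1 + P` bits, so the average element size depends on how the gaps between consecutive hashed values are distributed.

```python
import hashlib

P = 20  # assumed Golomb-Rice parameter, for illustration only


def hash_to_range(item: bytes, f: int) -> int:
    """Map an item to [0, f) by multiply-and-shift on a 64-bit hash.

    SHA-256 is a stand-in here; it is an assumption of this sketch.
    """
    h = int.from_bytes(hashlib.sha256(item).digest()[:8], "big")
    return (h * f) >> 64


def gcs_encode(values, p: int) -> str:
    """Encode the sorted, de-duplicated values as a string of '0'/'1' bits."""
    out = []
    prev = 0
    for v in sorted(set(values)):
        delta = v - prev
        q, r = delta >> p, delta & ((1 << p) - 1)
        out.append("1" * q + "0")        # quotient in unary, terminated by 0
        out.append(format(r, f"0{p}b"))  # remainder in exactly p bits
        prev = v
    return "".join(out)


def gcs_decode(bits: str, p: int) -> list:
    """Invert gcs_encode, returning the sorted values."""
    values, i, prev = [], 0, 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the terminating 0
        r = int(bits[i:i + p], 2)
        i += p
        prev += (q << p) | r
        values.append(prev)
    return values


def bits_per_element(items, p: int = P) -> float:
    """Average encoded size in bits for a set of byte-string items."""
    n = len(items)
    hashed = {hash_to_range(it, n << p) for it in items}
    return len(gcs_encode(hashed, p)) / len(hashed)
```

Calling `bits_per_element` on sets of different sizes gives a rough feel for the encoding cost alone; it ignores serialization overhead such as the element count and byte padding, so it should not be expected to reproduce the figures quoted above.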

Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 12, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 13, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 17, 2020
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 30, 2021
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jul 1, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021