-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new bip: compact client side filtering #609
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to address backward compatibility, at least briefly (eg, just say there isn't any).
We probably shouldn't copy the test vectors here. The external repo is good for that.
gcs_light_client.mediawiki
Outdated
Comments-URI: ??? | ||
Type: Standards Track | ||
Created: 05-24-2017 | ||
License: PD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please choose one of the recommended, or at least acceptable licenses (and also add a Copyright section below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunzo, we chose CCO-1.0
.
gcs_light_client.mediawiki
Outdated
37 as it provides a greater degree of privacy, utility, and also reduces the | ||
resources required for full-nodes to service this new light client mode | ||
compared to BIP 37. The light client mode described in this BIP can be seen as | ||
a "reversal"[1] of BIP 37: rather than the light clients sending filters to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1]
should be <ref>https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki</ref>
(and use <references/>
below) to make it auto-linkable. (suggestion, not mandatory)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Transmission of runs of 1's is omitted. | ||
* Two 1's in a row are denoted by a zero-length run of zero. | ||
|
||
As an example, consider the following sequence of bits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that k=4 in the example.
Also, I'm confused by the RLE scheme. How are runs of more than 2 1's represented? What would be the encoding of the below example with s/0{20}/0{15} 1 0{5}/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this is underspecified and also probably not entirely relevant here - the point of the Specification section is to describe what you do, not some alternative ways to encode the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted to move this to a "Background" appendix at the end of the document.
gcs_light_client.mediawiki
Outdated
Before we specify the details of our proposal, we'll first go over a few | ||
preliminaries which will aid in the understanding our proposal. | ||
|
||
By <code>[]byte</code> we refer to a slice (or array) of bytes. This value is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like "vector" is more appropriate than "array" (not just in a C++ sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified wording to say "vector" instead of array.
gcs_light_client.mediawiki
Outdated
</pre> | ||
|
||
RLE allows one to efficiently encode a data stream in a lossless manner. Due | ||
the the encoding of runs, RLE works best when encoding a set with a high degree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Due to the encoding
gcs_light_client.mediawiki
Outdated
</pre> | ||
|
||
[https://en.wikipedia.org/wiki/Golomb_coding Golomb Coding] encodes the two | ||
values (<code>q</code> and <code>m</code> for a given integer <code>n</code> as a two-tuple. The first value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: (q and r) for a given integer n
Also missing closing parenthesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
gcs_light_client.mediawiki
Outdated
c*m + r | ||
</pre> | ||
|
||
To aide in understanding we provide the following examples of using Golomb-Rice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: aid
. aide is a noun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
gcs_light_client.mediawiki
Outdated
|
||
In addition, to help optimize the algorithm, we use a fast range algorithm[7], | ||
multiplying the hashed value by F and taking only the top 64 bits. This fairly | ||
distributes the values over F without division and can be done with fewer cycles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't mention what strategy this algorithm uses fewer cycles than.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elaborated a bit more.
|
||
// Compute the difference between this value and the last one, divided | ||
// by P. This is our quotient. | ||
let quotient = (value - last_value - remainder) >> fp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to subtract the remainder? Just the shift is sufficient, no?
gcs_light_client.mediawiki
Outdated
|
||
The 6th service bit will now be dedicated to signaling support for the | ||
features described within this BIP: | ||
* <code>CFNodeCF = 1 << 6</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SFNodeCF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
message that allows light clients to ascertain which filters a node supports. | ||
|
||
The <code>getcftypes</code> message is an ''empty message'' whose command string is: | ||
<code>getcftypes</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioning the command string seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this spec very confusing...I could probably stumble my way through implementing it purely because of the test vectors, but I would certainly miss parts of it the first few times around. Also, there is way too much background information in the "Specification" section - it doesn't read as a specification, it reads as a general goal of what you want it to do.
* Transmission of runs of 1's is omitted. | ||
* Two 1's in a row are denoted by a zero-length run of zero. | ||
|
||
As an example, consider the following sequence of bits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this is underspecified and also probably not entirely relevant here - the point of the Specification section is to describe what you do, not some alternative ways to encode the data.
fact to leverage a more efficient encoding scheme based on the distribution of | ||
the length of a run. The [https://en.wikipedia.org/wiki/Geometric_distribution | ||
Geometric Distribution] represents the probabilities of a number of failures | ||
before the first success in a series of Bernoulli trials (yes/no experiments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph probably doesn't belong in the "Specification" section, motivation section seems much more appropriate, or maybe a "Background Math" section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, move the bulk of the initial section to a "Background Math" section at the end of the document.
needs to sync a basic Bitcoin wallet. In order to facilitate this use-case, for | ||
each transaction, normal filters contain: | ||
* The outpoints of each input within a transaction. | ||
* The data-pushes contained within the public key script of each output within the transaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think "public key script" is the way we normally refer to it....I'd say stick with either "scriptPubKey" or "the output script".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to use "output script".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No?
<code>new_bit_stream</code> The <code>bit_stream</code> has two functions that | ||
operate on it, <code>unary_encode(stream, n)</code> and | ||
<code>write_bits_big_endian(stream, n, k)</code> where <code>unary_encode(steam, | ||
n)</code> emits n (an integer) to the stream in unary, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically underspecified - "unary" is not a normative encoding, and can be 0-indexed, 1-indexed and can encode as 0s followed by a 1 or 1s followed by a 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to make more explicit.
gcs_light_client.mediawiki
Outdated
|
||
The following routine describes the compression process: | ||
<pre> | ||
gcs_compress(sorted_set, fp) -> []byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, sorted_set? No where else does anything reference that the set must be sorted? Sorted in what order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hashed_set_construct
function above this returns a sorted set. Update to point this out, and make it explicit in the section above gcs_compress
.
gcs_light_client.mediawiki
Outdated
This filter header chain can be utilized by light clients to gain a greater | ||
degree of security against bamboozling full-nodes during their initial chain | ||
sync. In addition to fetching all the bitcoin headers, light clients | ||
implementing this BIP should also fetch all the ''filter headers'' from ''each'' of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section may be clearer if you use RFC words (which you appear to already be doing) and be clear that you are. Also, what is the advantage of fetching the filter headerS from each peer? It would be less bandwidth and just as much cross-checking to only fetch the top block's filter header after you've synced the headers chain (which is probably a much better approach in any case as you can first make sure you're on the best chain, and then download a filters chain, protecting you from some types of DoS issues). You can then do a backwards sync of the filters. If there is a mismatch you do a binary search or otherwise to figure out which peer is lying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to make the usage of RFC works explicit. The section below assumes the client has fully synced, as is fetching from tip. As a result, they already have the entire header chain. The naive way would be just to fetch the entire filter from each peer. This algo avoids that and only fetches the filter in the case a bamboozlement attempt by a peer.
gcs_light_client.mediawiki
Outdated
bandwidth), a light client instead does the following: | ||
|
||
<pre> | ||
verify_from_tip(tip_block_hash: [32]byte): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know why this needs to be in the normative "Specification" section - clients can use this how they feel, and I think this is probably not the best approach anyway. You may wish to say something like clients SHOULD avoid syncing the full filter chain from all peers, but instead should only download from multiple peers when an inconsistency is detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the "Implementation Notes" section.
gcs_light_client.mediawiki
Outdated
// matches any relevant items. | ||
</pre> | ||
|
||
Light clients should persistently commit all filter headers to disk, as when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this should be normative, but if it is, I'd suggest using RFC words and use MAY here instead of SHOULD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to use MAY.
gcs_light_client.mediawiki
Outdated
positives, and the bandwidth used to download the filters themselves. The code along with a demo used for the paramter tuning can be found [here] | ||
|
||
|
||
=== Protocol Version Bump === |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe this is neccessary? There is a new service bit, which should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, all new extensions to the p2p protocol have been accompanied by a protocol version bump? (or at least ones that add new messages)
We define the concept of an abstract bit stream instantiated by the function: | ||
<code>new_bit_stream</code> The <code>bit_stream</code> has two functions that | ||
operate on it, <code>unary_encode(stream, n)</code> and | ||
<code>write_bits_big_endian(stream, n, k)</code> where <code>unary_encode(steam, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to much more seriously consider whether we want to, for the first time, introduce big endian at the purely-p2p layer - it is nice at a bit level, but it kinda sucks to have yet more endianness confusion in Bitcoin's P2P protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big endian just seemed natural initially. It's not as if we're using big-endian to encode a field that was previously encoded using little-endian. This is a new data-structure all together.
gcs_light_client.mediawiki
Outdated
achieve our desired false positive rate. | ||
|
||
In addition, to help optimize the algorithm, we use a fast range algorithm[7], | ||
multiplying the hashed value by F and taking only the top 64 bits. This fairly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 bits appears as a magic number here. I would note that the 64 bits comes from the SipHash output size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specified.
@luke-jr AFAICT other BIPs do contain test vectors in folders named the same as the BIP. For example BIP-00039 contains the word list used. Added a section for backwards compatability. @TheBlueMatt thanks for the thorough review! Admittedly the initial section reads more like a blog post. Following your suggestion I've adopted RFC words where applicable throughout, and created a new section ("Mathematical Background") that houses what use to be the opening portion of the specification. |
As far as I can tell the privacy improvement is never quantified. It would seem that the independent downloading of less than all blocks (which is ultimately the objective) is the only source of privacy loss. However, given that privacy is a primary objective, it would be helpful if this analysis was explicit. I am specifically interested in whether there is any privacy loss in the filtering aspect (as compared to the operation of a full node). |
@Roasbeef I see you're doing a lot of revising of stuff. Let me know when you'd like this merged. (others commenting should note that this is just a draft, and merging does not preclude acting on comments in future revisions) There's no hard rule against test vectors in the repo. My opinion on where they belong is an opinion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewed the P2P part again.
==== Peer to Peer Service Bit ==== | ||
|
||
To start, we reserve a currently unutilized service bit. This is required as | ||
light clients SHOULD ''preferentially'' peer to full-nodes that support the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SHOULD is maybe wrong here - some light clients will almost certainly have to only connect to nodes that have the bit set. You could replace with MAY preferentially, or only connect to, as required, or just drop it.
light clients SHOULD ''preferentially'' peer to full-nodes that support the | ||
features outlined in this BIP. | ||
|
||
The 6th service bit will now be dedicated to signaling support for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess terminology...its kinda the 7th bit.
needs to sync a basic Bitcoin wallet. In order to facilitate this use-case, for | ||
each transaction, normal filters contain: | ||
* The outpoints of each input within a transaction. | ||
* The data-pushes contained within the public key script of each output within the transaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No?
features outlined in this BIP. | ||
|
||
The 6th service bit will now be dedicated to signaling support for the | ||
features described within this BIP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hugely under-defined - can you set the BIP if you're not a full node (ie is NODE_NETWORK implied or is that separate and you can look for eg NODE_NETWORK_LIMITED & NODE_CF)? Do you have to have implemented both currently-defined filter types (No is better for maintainability, but that has huge useability implications)? What does this bit mean without NODE_NETWORK? Maybe you always have to return the filter chain hashes but dont need to return the filters themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It would be ideal if pruned nodes could set this and still be usable by light wallets (I might even suggest that clients SHOULD support using pruned nodes... many users want pruned nodes, and their own node should be the expected server)
|
||
The 6th service bit will now be dedicated to signaling support for the | ||
features described within this BIP: | ||
* <code>SFNodeCF = 1 << 6</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the SF prefix here? I think generally BIPs have used Bitcoin Core's NODE_* constant naming (and generally use all-caps constant naming).
|
||
==== Compact Filter Header Chain ==== | ||
|
||
As the filters described in this BIP ''are not'' consensus critical, meaning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest you strike this portion - you later say that clients should detect and ban nodes that dont have a normative filter for each block, so its a kind of consensus, even if not "consensus critical" per se.
header hashed with zero bytes). Clients can take this fact into account in | ||
order to save a round trip for <code>null</code> headers. | ||
|
||
Light clients implementing this proposal SHOULD utilize the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very strong. Why SHOULD? Hell, why even MAY? It seems very unrelated.
|
||
However, there exists a special case of a <code>null</code> filter. This this | ||
case an empty byte slice is transmitted rather than consuming | ||
<code>4-bytes</code> to encode the size of zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the special case here? You're already sending many bytes anyway, dropping the extra 4 just seems...over optimization?
|
||
== Implementation Notes == | ||
|
||
This filter header chain SHOULD be utilized by light clients to gain a greater |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe s/by light clients/in place of BIP 37/
degree of security against bamboozling full-nodes during their initial chain | ||
sync. In addition to fetching all the bitcoin headers, light clients | ||
implementing this BIP SHOULD also fetch all the ''filter headers'' from | ||
''each'' of their connected peers. With these headers, light clients SHOULD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say they SHOULD only fetch the top header after doing a headers sync from all of their peers except one, from which they build the whole header tree.
require mapping set items to prime representatives in the associated group | ||
which can be preemptively expensive. | ||
|
||
==== Matrix Based Probabilistic Set Datastructures ==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Datastructures" should probably be two words.
Closing this in favor of #636. Thanks to everyone who gave valuable review comments, especially @TheBlueMatt for his comments on the P2P aspect which led to notable improvements as proposed by @jimpo. |
This BIP describes a new light client node type for Bitcoin as well as the
modifications to current full-nodes required to support this new type of light
client. The light client mode described in this BIP is meant to supersede BIP
37 as it provides a greater degree of privacy, utility, and also reduces the
resources required for full-nodes to service this new light client mode
compared to BIP 37. The light client mode described in this BIP can be seen as
a "reversal" of BIP 37: rather than the light clients sending filters to
full-nodes, full-nodes send filters to light clients. Unlike BIP 37, we don't
utilize bloom filters. Instead, we utilize a compact filter (more efficient
than bloom filters) which leverages Golomb-Rice coding for compression.
Additionally, blocks are downloaded as a whole (from any source), rather than
directly from peers as fragments with merkle-branches proving their
authenticity.
We've tested the implementation of both a light client implementing this
BIP, as well as the serving full-node on Bitcoin's testnet over the past several months. Additionally, this new operating mode also serves as the basis for our light weight Lightning node.