New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloom filters #1795

Merged
merged 17 commits into from Jan 17, 2013

Conversation

Projects
None yet
9 participants
@TheBlueMatt
Contributor

TheBlueMatt commented Sep 6, 2012

Filter tx invs and add MSG_FILTERED_BLOCK to provide blocks as header+vector (not including tx itself)

@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Sep 8, 2012

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/bd93f61c2abe5b19e77030fa5859bc370c789a3a for binaries and test log.

@gavinandresen

This comment has been minimized.

Contributor

gavinandresen commented Sep 18, 2012

Very exciting.

Did you see this:
http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/

He has a really interesting looking C++11 headers-only implementation here at github that might give you inspiration (e.g. I think he uses boost to do the hashing; he seems to get a ton done in a very small number of lines of code).
https://github.com/mavam/libbf

@jgarzik

This comment has been minimized.

Contributor

jgarzik commented Sep 27, 2012

implementation ACK, I like it a lot. This seems to match the initial sketch from @mikehearn and myself on IRC.

@jgarzik

This comment has been minimized.

Contributor

jgarzik commented Oct 3, 2012

Poke, let's get some additional review on this.

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Oct 24, 2012

Hey Matt, could you rebase this on top of ultraprune? I'm ready to start merging the bitcoinj side but want to test this with the latest code. Will do a review now-ish.

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Oct 24, 2012

ACK on this. Most of my comments are purely cosmetic except for the unit test commit being in the wrong place (it wouldn't compile in the order given).

I guess we'll need a BIP? I'll start on one that documents what we've come up with.

@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Nov 2, 2012

Thanks mike for the good review (as usual). I updated the code to addresses most of the issues you mentioned.
Changes:

  • many additional comments
  • filteradd is now limited to 1MB, any larger and the node gets a Misbehaving(100).
  • fRelayTxn now defaults to false - this means we will no longer relay tx inv message before receiving the remote peer's version message, if this irks anyone it can be discussed.
    More changes to come (sipa's new merkle block stuff, more unit tests and a seed twiddle value for bloom filters)
@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Nov 2, 2012

@mikehearn there are a few comments which note what the protocol defines as spec (these are obviously up for discussion):

  • there is no formal definition for filter parameters for a filteradd command if no filter is loaded - it is up to the serving node (if the client doest care, why should the spec determine the parameters?)
  • as noted above, 1MB limit to filteradd
  • txes in MSG_FILTERED_BLOCK can be relayed even when a node already has seen it (the current code does this) however it MUST always forward txn that the node has not seen (I think the bip says this, but its not really clearly worded imho, I updated the BIP to make this more clear)
@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Nov 2, 2012

Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/1860b72e356f0d4279b25bf1ca562ac00511eded for binaries and test log.

This could happen for one of several reasons:

  1. It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
  2. It does not build on either Linux i386 or Win32 (via MinGW cross compile)
  3. The test suite fails on either Linux i386 or Win32
  4. The block test-cases failed (lookup the first bNN identifier which failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)
@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Nov 2, 2012

@BitcoinPullTester hey, I wasnt done yet!

@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Nov 3, 2012

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/5be9d251b5901399c6f70b99f1c2bddc4e14e7ec for binaries and test log.

@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Nov 12, 2012

Added the requested nTweak value and merged sipa's work on partial merkle tree representations. The BIP now needs rewritten and this is ready for second (third?) round reviews.

@sipa

This comment has been minimized.

Member

sipa commented Nov 18, 2012

Mostly ACK;

  • I'd prefer to have the Murmurhash3 implementation separate from CBloomFilter
  • The 0xFFFFFFFF/(n-1)_i seed value seems intended to result in large bit differences between the different hash function's seeds. Together with the tweak, however, the first and the last now get seeds tweak and tweak-1. I think something simpler like k1_i+k2*n+tweak is better (with k1 and k2 arbitrary large odd 32-bit integers).
  • I feel uneasy about the arbitrary filter parameters used for the implicitly created filter when sending filteradd without filterload. The server cannot be expected to make a reasonable guess how the client is going to use the filter, and the client still has to track how large the server-side filter grows anyway. I'd prefer to make this simply illegal (DoS 100): filteradd always requires an active filter. Maybe the actual filter data in filterload can be made optional: if it is omitted, it's assumed to be all zeroes (though that would mean the size has to be specified).
@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Nov 27, 2012

Automatic sanity-testing: FAILED MERGE, see http://jenkins.bluematt.me/pull-tester/21e780e05c7b18ab318e57bd9e24961a34378e2d for test log.

This pull does not merge cleanly onto current master

@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Dec 18, 2012

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/a5a72aa691296af6de92658ff998da120c4b34d8 for binaries and test log.

// Nodes must NEVER send a data item > 520 bytes (the max size for a script data object,
// and thus, the maximum size any matched object can have) in a filteradd message
if (vData.size() > 520)

This comment has been minimized.

@mikehearn

mikehearn Jan 4, 2013

Contributor

If vData is too large, it will still be added to the filter. There needs to be an else block here.

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Jan 7, 2013

Contributor

yep, fixed.

This comment has been minimized.

@Diapolo

Diapolo Jan 11, 2013

Wouldn't it be nice if that 520 was a constant somewhere or a define? So it could be used for the max size for a script data object and in here.

@mikehearn

View changes

src/main.cpp Outdated
if (pfrom->pfilter)
pfrom->pfilter->insert(vData);
else
pfrom->Misbehaving(100);

This comment has been minimized.

@mikehearn

mikehearn Jan 4, 2013

Contributor

Maybe a log message here would be appropriate?

This comment has been minimized.

@TheBlueMatt

TheBlueMatt Jan 7, 2013

Contributor

Misbehaving already logs info on who/how much, it would be kinda nice to pass it a why/where in code param, but that should be in a separate pull.

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Jan 9, 2013

Can you add a new feature that lets clients opt out of auto-adding? Sipa points out that it's only necessary for wallets that contain outputs which don't require predictable data in the corresponding inputs, which (for now) is most of them. This should resolve the excessive expansion issues.

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jan 9, 2013

@mikehearn Kind of an an unfortunate privacy loss for ones who do set it, and it still leaves the feature potentially useless for any who set it. (But I agree on having the option, if for nothing else because it makes the load-balancing lark I suggested viable)

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Jan 9, 2013

Well not really, you can still set the FP rate to whatever you want. If all
your outputs are pay-to-address then it's a no-op functionality-wise, the
auto-adding didn't buy you anything. For the case where you have
pay-to-pubkey or other scripts, the client has to refresh the filter every
so often .... it can be implemented in bcj later.

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jan 9, 2013

Setting that bit or not is still another anonymity set reduction, but I'm splitting hairs there ignore it. I guess I don't have a good feel for how fast the filter fails and needs to be reset. If it's too fast periodic refreshes won't be sufficient (E.g. will the size of the refresh plus the unwanted data end up being more than just not using the filter).

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Jan 9, 2013

I think we can explore different algorithms later. The exact choices will
depend on the users privacy preferences and device situation anyway.

On Wed, Jan 9, 2013 at 7:09 PM, Gregory Maxwell notifications@github.comwrote:

Setting that bit or not is still another anonymity set reduction, but I'm
splitting hairs there ignore it. I guess I don't have a good feel for how
fast the filter fails and needs to be reset. If it's too fast periodic
refreshes won't be sufficient (E.g. will the size of the refresh plus the
unwanted data end up being more than just not using the filter).


Reply to this email directly or view it on GitHubhttps://github.com//pull/1795#issuecomment-12058432.

@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Jan 11, 2013

Added a nFlags to let the peer pick how/when it wants the filter updated...also added a second two commits to make it possible to match an arbitrary script template as a discussion point (that second part is not yet tested).
Would like comments.

@gavinandresen

This comment has been minimized.

Contributor

gavinandresen commented Jan 16, 2013

Code review notes:

I don't like the entire implementation of CPartialMerkleTree being stuffed into main.h, it should be its own .cpp/.h

Message handling looks good from a potential vulnerability point of view.

NACK on the script.cpp / script.h changes -- they are unused (yes? I don't see where MatchesTemplate is used outside of script.cpp/.h) and I don't see unit tests for the new features like OP_OPCODE.

@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Jan 16, 2013

@gavinandresen Yes, after discussion I believe we are currently targeting skipping the 2 MatchesTemplate commits for 0.8 and maybe skipping those entirely depending on what we may need in the future.

@BitcoinPullTester

This comment has been minimized.

BitcoinPullTester commented Jan 16, 2013

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/e1a4f3778cb90ba9f0d4e736752f78dad1703caa for binaries and test log.

@TheBlueMatt

This comment has been minimized.

Contributor

TheBlueMatt commented Jan 16, 2013

@gavinandresen moved it to main.cpp, is that ok for you?

@gavinandresen

This comment has been minimized.

Contributor

gavinandresen commented Jan 16, 2013

ACK, looks good.

gavinandresen added a commit that referenced this pull request Jan 17, 2013

@gavinandresen gavinandresen merged commit 91f70a7 into bitcoin:master Jan 17, 2013

@rebroad

This comment has been minimized.

Contributor

rebroad commented Mar 27, 2013

So.. what was the BIP for these protocol changes? It would be nice if it could be referenced in the pull requests.

@@ -2815,6 +2969,10 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv)
vRecv >> pfrom->strSubVer;
if (!vRecv.empty())
vRecv >> pfrom->nStartingHeight;
if (!vRecv.empty())
vRecv >> pfrom->fRelayTxes; // set to true after we get the first filter* message

This comment has been minimized.

@rebroad

rebroad Mar 27, 2013

Contributor

Somehow this doesn't feel like the best way to implement this....

@mikehearn

This comment has been minimized.

Contributor

mikehearn commented Mar 27, 2013

https://en.bitcoin.it/wiki/BIP_0037 as was discussed on the mailing list. Regardless, this code was already merged and shipped months ago, it's a bit late to be commenting on it now. You can always open up new pull reqs if you want to change how the protocol works.

laudney pushed a commit to reddcoin-project/reddcoin that referenced this pull request Mar 19, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment