
graphene: first draft #973


Merged: 128 commits into BitcoinUnlimited:dev, Jul 23, 2018

Conversation

bissias
Contributor

@bissias bissias commented Feb 16, 2018

This commit provides a functional implementation of graphene blocks. All unit tests pass and the QA test grapheneblocks.py also passes. From log output it is evident that graphene blocks are requested, created, serialized, deserialized, and reconstructed. However there is still work to be done. My general approach has been to largely replicate the workflow of xthin blocks. Thus I have replicated large portions of the xthin code. It's not clear that this is preferable to sharing code between the two block types. There also remains the question of transaction ordering for graphene blocks. If it is possible to commit a canonical ordering before graphene, then we will want to change this patch accordingly.

The code will also require further optimization (primarily block size optimization) before being deployed to production. My aim is to initiate a review for the basic workflow now and continue to work on optimization while graphene is running on the testnet. The following optimization tasks remain.

  • Update Bloom filter constants for optimal symdiff
  • Employ numerical symdiff calculation for greater accuracy
  • Use smaller dirty hashes for IBLT
  • Consider options for compressing transaction ordering: smaller hashes or perhaps a different data structure

@sickpig
Collaborator

sickpig commented Feb 16, 2018

Hi @bissias,
I'm eager and excited to test and review the code you submitted. Just want to thank you in advance for the work you have done here!

@bissias
Contributor Author

bissias commented Feb 16, 2018 via email

@sickpig
Collaborator

sickpig commented Feb 16, 2018 via email

@sickpig
Collaborator

sickpig commented Feb 16, 2018

The Travis integration test failures are due to source code formatting problems; they are not actual failures. See https://travis-ci.org/BitcoinUnlimited/BitcoinUnlimited/jobs/342353084#L698 for more details.

To get your files automatically formatted, add this snippet to the .git/config file in your cloned repo:

[filter "bitcoin-clang-format"]
        clean = "contrib/devtools/clang-format.py format-stdout-if-wanted clang-format-3.8 %f"
        smudge = cat

and install clang-format-3.8 on your dev machine

(I'm assuming you are using a relatively recent Ubuntu Linux distro for your dev env)

@bissias
Contributor Author

bissias commented Feb 16, 2018 via email

@sickpig
Collaborator

sickpig commented Feb 16, 2018

@bissias
no need to open a new PR.

Just fix the formatting problems, commit the changes, and push. GitHub will automatically update the PR for you without losing the comments and reviews already made.

If you want, before pushing you can squash the commit containing the formatting fix into the previous one:

1) fix formatting issue in src/main.cpp
2) git add src/main.cpp 
3) git commit -m "fix format"
4) git rebase -i HEAD~2
5) change the first word on the second line from pick to fixup
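The steps above can also be run non-interactively. A sketch in a throwaway repo (file names and commit messages are illustrative, not from this PR), using GIT_SEQUENCE_EDITOR to make the "pick → fixup" edit on line 2 of the todo list:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email dev@example.com && git config user.name dev
echo base > a.txt && git add a.txt && git commit -q -m "base"
echo 'feature' > main.cpp && git add main.cpp && git commit -q -m "feature work"
echo 'feature (formatted)' > main.cpp && git add main.cpp && git commit -q -m "fix format"
# Non-interactive equivalent of step 5: turn the second "pick" into "fixup"
GIT_SEQUENCE_EDITOR='sed -i 2s/^pick/fixup/' git rebase -i HEAD~2
git log --oneline
```

After the rebase, the "fix format" commit is folded into "feature work" and disappears from the log.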

@awemany
Contributor

awemany commented Feb 16, 2018

@bissias : Alright, awesome!

Funny coincidence, but just a couple hours ago, I asked Gavin Andresen on the BU slack when and where (which code fork) implementations of the Graphene protocol are to be expected. He told me to ask Brian Levine - and before I could do that, here we are :) I assume you work in his group?

I am working on preconsensus in the form of weak blocks, as described by Peter Rizun in his paper.

One of the things that I want to do, but which would be kind-of overlapping with your work is transmitting weak blocks as delta blocks. What I do right now is to hijack @ptschip's thinblock code in a bad way and transmit delta blocks as a special variant of thin blocks. That does work (as tested on the giganetwork) and it does reduce peak bandwidth as expected. Preliminary results might be presented by Peter at the satoshisvision conference (if we get to further test it some more).

But yeah, so in terms of actual transmission of my deltablocks I really still need to do something proper :)

But I imagine that it should be very doable to also use the graphene protocol or a variant that would point to a weak block as the basis and then just transmit the delta that is not in the weak block?

If so, I'd like to take the opportunity to remark here that it would be great if graphene could provide such a hook and provisions to use it in a "deltablocks" scheme as well. That way, both could be combined in the longer run and even further reduce the burstiness of block transmission.

I guess I should also remark that I still consider the whole notion of weakblocks as experimental, and even though I am optimistic that this will deliver benefits, it is in experimental stages still. The proof for that is in the pudding.

What do you think about a delta scheme for graphene?

@bissias bissias force-pushed the graphene branch 2 times, most recently from 4c7bbf5 to 7c2b039 Compare February 16, 2018 21:18
@bissias
Contributor Author

bissias commented Feb 17, 2018

@awemany, that is a funny coincidence! Glad to finally have something to submit for discussion. And yes I'm working in Brian's group.

I'm only familiar with the basic idea of weak blocks, but my understanding is that you need data structures for both the weak and final blocks. The final block will point to a weak block and may additionally contain some new transactions.

From what you wrote, it sounds like you're considering using something like graphene for the final block? That seems reasonable in that final and graphene blocks share the same objective: communicating a set of transactions that are probably already on the recipient's machine. But what hooks did you have in mind exactly?

The core logic for graphene that performs set reconciliation is pretty small. It's the small details of the overall protocol that constitute the majority of the code. So one way I could enable reuse of the graphene code would be to factor out that core logic into a class that could be used outside of the graphene protocol.

@awemany
Contributor

awemany commented Feb 17, 2018

@bissias:

From what you wrote, it sounds like you're considering using something like graphene for the final block?

Kind of! Weak blocks are what Peter Rizun calls subchains in his paper: https://www.bitcoinunlimited.info/resources/subchains.pdf

The idea of weak blocks is to basically publish partial solutions of finding a block with reduced difficulty (but while searching for a strong block, of course, so the miners essentially do the same thing as before!). As it is found, it will be sent around and kept as an increment to the state of the chain by all participating nodes ("deltablocks"). It is meant as a non-sybil-able preconsensus scheme of what goes into the next strong block. It is assumed that participating nodes will refer to the last weak block when building their next block and enjoy easier propagation of their solutions as most of the block's entropy can be referred to by pointing to the weak block as a 'delta' (the incentive to use it). A side benefit is faster "fractional confirmations" on top of the 0-conf of transactions.

Now, the requirement is added that a node participating in weakblock propagation will build its next block (weak or strong) on top of the last weak block (or rather: longest chain), in the following way (currently; this should of course be adapted for things like a preferred ordering etc.):

  • the Coinbase is replaced with the miner's variant
  • the rest of the transactions are an extension of the weak block such that each transaction in the underlying block is in the new block at the same spot
  • more transactions follow after that

Assuming full propagation of such a weak block, a node can then send out any found solution (strong or weak) by referring to the weak block it is based upon, plus just the set of transactions on top of that.

In terms of graphene, if I understand the approach correctly, this would mean it looks like the mempool is reduced to just those txn that are not in the previous weak block and a single extra hash would be necessary in a "deltagraphene" message that refers to this last propagated weak block.

And this is what I am wondering about: How to make it so that one can say "these transactions are for sure in the transmitted block, see hash so and so here" and thus further reduce the amount of transactions that the graphene set reconciliation has to deal with: By a factor of weakblockrate / strongblockrate assuming no orphaning problems. Basically, what I like to have is a way to say "transmit this CBlock using graphene, but tell graphene that these transactions from that other block (except CB) are for sure in this block, and in the same order".

The advantage of weakblocks over just reducing the block time interval is that this purely an opt-in scheme that can also be gradually implemented by participating miners and does not need a chain fork nor a change of this fundamental variable and all the assumptions that have been built on top of it.

All other code stays the same and code that is ignorant of weak blocks will see the chain progress just like it did before.
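The extension rule above can be sketched in a few lines. This is a toy model with transactions as plain string ids standing in for real tx objects; the function names are illustrative, not actual code from either patch:

```python
def extend_weak_block(weak_block, my_coinbase, mempool):
    """Build the next (weak or strong) block on top of `weak_block`
    per the rules above: swap in our own coinbase, keep every other
    weak-block tx at the same position, then append new txs."""
    block = [my_coinbase] + weak_block[1:]             # coinbase replaced
    seen = set(block)
    block += [tx for tx in mempool if tx not in seen]  # more txs follow
    return block

def delta(new_block, weak_block):
    """What actually needs to travel once peers hold the weak block:
    the new coinbase plus only the txs after the shared prefix."""
    return [new_block[0]] + new_block[len(weak_block):]

weak = ["cb_weak", "tx1", "tx2"]
strong = extend_weak_block(weak, "cb_mine", ["tx1", "tx3", "tx4"])
# strong == ["cb_mine", "tx1", "tx2", "tx3", "tx4"]
# delta(strong, weak) == ["cb_mine", "tx3", "tx4"]
```

The point of the shared-prefix rule is visible here: everything except the coinbase and the tail can be referred to by a single hash of the weak block.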

self.nodes[0].generate(100)
self.sync_all()
self.nodes[0].generate(5)
self.sync_all()
Collaborator

why not just

self.nodes[0].generate(105)
self.sync_all()

Contributor Author

Done.

@@ -0,0 +1,97 @@
#!/usr/bin/env python3
# Copyright (c) 2014-2015 The Bitcoin Core developers
# Copyright (c) 2015-2016 The Bitcoin Unlimited developers
Collaborator

nit. copyright header should be

Copyright (c) 2018 The Bitcoin Unlimited developers

since this is a new file.

Contributor Author

Done.

src/graphene.cpp Outdated
@@ -0,0 +1,1514 @@
// Copyright (c) 2016-2017 The Bitcoin Unlimited developers
Collaborator

nit copyright header (same as qa/rpc-tests/grapheneblocks.py)

Contributor Author

Done.

src/graphene.cpp Outdated
{
LOCK(cs_orphancache);
LOCK(cs_xval);
if (!ReconstructBlock(pfrom, fXVal, missingCount, unnecessaryCount))
Collaborator

nit, why not LOCK2(cs_orphancache, cs_xval)?

Contributor Author

Done. I was copying xthin here :-D. But it compiles as you recommended so I changed it. In general, I would appreciate going over all the locks before merging with dev to make sure I got them right.

src/graphene.cpp Outdated
}
else
{
BOOST_FOREACH(const CTransaction &tx, block.vtx)
Collaborator

I know you inherited from thinblock.cpp, but we are trying to use c++11 for loop when introducing new code. e.g. something like:

-            BOOST_FOREACH(const CTransaction &tx, block.vtx)
+            for (const CTransaction &tx : block.vtx)

Contributor Author

Done.

src/graphene.h Outdated
@@ -0,0 +1,254 @@
// Copyright (c) 2016-2017 The Bitcoin Unlimited developers
Collaborator

nit 2018

Contributor Author

Done.

@@ -0,0 +1,266 @@
#include "iblt.h"
Collaborator

missing copyright header

Contributor Author

I have a question here. I lifted this code from Gavin's repo. It is distributed with a MIT license. I did make some edits though. How should I copyright this while also crediting the original author?

@sickpig (Collaborator) Feb 19, 2018

You should use the original copyright notice (the one present in Gavin's repo, plus any other copyright Gavin included, if any), then add yours and BU's if you want to (IANAL).

Contributor Author

There is no copyright in any single file, but the top level of the repository contains the following file. So I guess I should paste the contents into each iblt file and then include the BU copyright just above it?

https://github.com/gavinandresen/IBLT_Cplusplus/blob/master/LICENSE

Collaborator

Again IANAL, but this what I would do 😄

Contributor Author

Ok fair enough! I pushed license information for the IBLT code.

Contributor Author

What do you make of the wallet.py test failing due to KeyboardInterrupt in CI? This happened just after I committed the license changes to the IBLT code; it doesn't seem like it could be related. For me all unit tests pass in my dev env, at least when I run make check. Is there another test suite I should be running as well?

Collaborator

This is a spurious failure due to a Travis problem. Going to re-kick the failing job.

Contributor Author

Ah I see, intermittent failure. Looks green now. Thanks.

@@ -0,0 +1,100 @@
#ifndef CIblt_H
Collaborator

missing copyright header

Contributor Author

Same question as with iblt.cpp.

src/net.h Outdated
}

// BUIP055:
bool BitcoinCashCapable()
Collaborator

this method is not present anymore in the dev branch

Contributor Author

Removed.

src/protocol.h Outdated
// NODE_GRAPHENE means the node supports Graphene blocks
// If this is turned off then the node will not service graphene requests nor
// make graphene requests
NODE_GRAPHENE = (1 << 6),
Collaborator

In case we are going to port graphene to the BU legacy code base (BitcoinCore branch), I guess we need to check whether some other implementation is already using this bit to signal other services.

Contributor Author

I've made a note about this. I will be sure to double-check before merging.

Collaborator

great

@bissias
Contributor Author

bissias commented Feb 18, 2018

@awemany thanks for the link, that paper didn't immediately surface when I was looking up weak blocks. I'll be sure to take a close look. Anyhow, thanks also for the description. I think I'm getting the overall picture now (but tell me if not).

Here are my thoughts on weak block integration. In graphene, the sender creates both a Bloom filter (BF) and an IBLT containing the txs in the block. The receiver reconstructs the block by trying to pass every tx it knows about through the BF and placing all positives in its own IBLT. The difference between sender and receiver IBLTs allows the receiver to determine the final tx set. With a reference to a weak block, the receiver simply has a new source of transactions to pass through the BF on the way to constructing its IBLT. So I believe in this way weak blocks and graphene can complement each other nicely.

Some practical considerations:

  1. The current graphene implementation requires a vector of tx indices to indicate order. There has been some talk of instituting a canonical ordering for txs, which would alleviate this requirement. But as it stands, using graphene for the strong block, we would have to pass an order vector for the new transactions.
  2. From a coding standpoint, it's not clear to me which approach is better: update the graphene protocol to accommodate weak blocks or implement an entirely separate weak block protocol that leverages a distinct library containing core graphene logic. IMHO, the latter approach seems a little cleaner, but I defer to your judgement and the rest of the team. In general, I ended up replicating a large amount of xthin code in implementing graphene, and I imagine there will only be more and more new block types. So I wonder if it doesn't make sense to factor out a lot of the generic protocol logic.
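The sender/receiver flow described above can be illustrated end to end with a toy Bloom filter and IBLT. Everything here is a sketch: plain Python, 64-bit integers standing in for txids, and illustrative cell counts and parameters, not BU's CBloomFilter/CIblt classes or their tuned constants:

```python
import hashlib
import math
import random

def h(x, salt):
    """Salted 64-bit hash of an integer id."""
    data = salt.to_bytes(4, "big") + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

class Bloom:
    def __init__(self, n, fpr):
        self.m = max(8, math.ceil(-n * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = 0
    def add(self, x):
        for s in range(self.k):
            self.bits |= 1 << (h(x, s) % self.m)
    def __contains__(self, x):
        return all(self.bits >> (h(x, s) % self.m) & 1 for s in range(self.k))

class IBLT:
    """Minimal invertible Bloom lookup table: (count, idXor, checkXor) cells."""
    def __init__(self, m, k=3):
        self.m, self.k = m, k
        self.cells = [[0, 0, 0] for _ in range(m)]
    def _slots(self, x):
        return [h(x, 100 + s) % self.m for s in range(self.k)]
    def _bump(self, x, sign):
        for i in self._slots(x):
            c = self.cells[i]
            c[0] += sign; c[1] ^= x; c[2] ^= h(x, 999)
    def insert(self, x):
        self._bump(x, 1)
    def subtract(self, other):
        out = IBLT(self.m, self.k)
        for i in range(self.m):
            a, b = self.cells[i], other.cells[i]
            out.cells[i] = [a[0] - b[0], a[1] ^ b[1], a[2] ^ b[2]]
        return out
    def decode(self):
        """Peel pure cells; returns (only_in_self, only_in_other)."""
        ours, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for c in self.cells:
                if abs(c[0]) == 1 and h(c[1], 999) == c[2]:
                    x, sign = c[1], c[0]
                    (ours if sign == 1 else theirs).add(x)
                    self._bump(x, -sign)
                    progress = True
        return ours, theirs

random.seed(7)
mempool = {random.getrandbits(64) for _ in range(200)}
block = set(random.sample(sorted(mempool), 50))

# Sender: BF and IBLT built over the block's txs.
bf = Bloom(len(block), fpr=0.01)
for tx in block:
    bf.add(tx)
sender_iblt = IBLT(m=30)
for tx in block:
    sender_iblt.insert(tx)

# Receiver: pass every known tx through the BF, build its own IBLT, subtract.
candidates = {tx for tx in mempool if tx in bf}
recv_iblt = IBLT(m=30)
for tx in candidates:
    recv_iblt.insert(tx)
false_pos, missing = recv_iblt.subtract(sender_iblt).decode()
recovered = (candidates - false_pos) | missing
```

The IBLT subtraction only has to absorb the Bloom filter's false positives (plus any block txs the receiver lacks), which is why the combination stays small; a weak-block reference would simply add another source of candidate txs before the BF pass.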

@bissias
Contributor Author

bissias commented Feb 18, 2018

I just updated the single commit for this pull request with changes requested by @sickpig. Some details about the updates are below.

Completed tasks:

  1. License
  2. Use c++11 for loop
  3. Use LOCK2 where appropriate
  4. remove unused methods
  5. update stale comments

Some tasks noted but not yet completed:

  1. License for IBLT code
  2. Examine potential conflict with xthin in preferential timer
  3. Double-check version bits are unique before merging

src/net.cpp Outdated
if (strCommand == NetMsgType::GRAPHENETX)
LOG(GRAPHENE, "ReceiveMsgBytes GRAPHENETX\n");
if (strCommand == NetMsgType::GET_GRAPHENETX)
LOG(GRAPHENE, "ReceiveMsgBytes GET_GRAPHENETX\n");
@ptschip (Collaborator) Feb 21, 2018

This is a little inefficient. Better to use "else if" statements so we don't have to potentially evaluate every if statement, or use just one if statement with || connecting the conditions and a single generic LOG statement.

Contributor Author

Sounds good. I went with the "else if" option.

mi != mapOrphanTransactions.end(); ++mi)
vOrphanHashes.push_back((*mi).first);
}
BuildSeededBloomFilter(filterMemPool, vOrphanHashes, inv2.hash, pfrom);
Collaborator

Why create a seeded bloom filter here? This is what we need for an Xthin. It looks like you're possibly not creating a proper graphene block here?

Contributor Author

Whoops. That's a copy paste error. I've removed that call to BuildSeededBloomFilter and added a line like in 461.

@ptschip (Collaborator) left a comment

Looks like nice work, and not that much code really. But I haven't taken a deep dive yet. One question: I haven't taken it out for a test drive, but if I set up two nodes will this actually work as is?

@bissias
Contributor Author

bissias commented Feb 21, 2018

@ptschip thanks for taking a look and for the comments. I've committed the changes you mentioned.

The code should work as-is. However the only way I have actually run it is with the qa test: qa/rpc-tests/grapheneblocks.py. It's a close copy of the test thinblocks.py. There's enough log output to see the blocks being requested, created, sent, and received. The next step for me is to actually deploy some nodes on the test net.

@ptschip
Collaborator

ptschip commented Feb 21, 2018

@bissias btw, when you make requested changes, don't squash your commits; it makes it hard to know what you changed when we come back to review the changes. Squashing can be done at the end of the cycle, or intermittently if there are just too many commits. Thanks anyway, and I look forward to seeing it in action.

@bissias
Contributor Author

bissias commented Feb 21, 2018

@ptschip sorry about that. Makes sense to squash only after the commits get unruly. Will do in the future.

@bissias
Contributor Author

bissias commented Feb 22, 2018

Pushed a small update to fix reporting of in-memory graphene bytes.

@awemany
Contributor

awemany commented Feb 24, 2018

@bissias: Right now, I send weakblocks through my own logic but hijack some of the thinblocks code to implement deltablocks. So I think your second approach (a library to transmit using graphene) would be preferred by me :-)

More specifically, my main point is that it would be great if it is not closely tied to the mempool code: My weakblocks currently live outside the mempool. It is all in a state of flux and I might try integrating weakblocks with the mempool with some kind of subset or flagging system.

It would be great if the sets to be reconciled and the pools available on each side can be set explicitly instead of graphene figuring that out by itself (and thus tying itself to the rest of system more closely).

I am thinking something along the lines of this on the sending side (just as a general picture)
(gb: Graphene block)

  gb.setAvailable(<txn-set>);
  gb.setContained(block.vtx);
  for (CNodeSomething p : peers)  gb.transmit(p);

And vice-versa on the receiving side.

And then it would be great if it would be easy for me to add a single 32-byte hash to a sent graphene block (and maybe a corresponding boolean flag as well), so that when a graphene block is received, I can basically say 'Oh ok, this is a graphene block that refers to this weak block, so lets reconstruct it from that one'.

I don't care about the exact mechanism, whether I have to subclass the graphene block representation classes or need to set a couple configuration variables.

Does that make sense?

I am by no means in the position to make this an official request, but maybe you also agree that this would be great in terms of a decoupled architecture for the graphene logic :)

@bissias
Contributor Author

bissias commented Feb 26, 2018

@awemany I think I see what you mean about making graphene agnostic to tx sources. I guess I was imagining a slightly different abstraction that would fall entirely outside the protocol layer. The core logic would encapsulate set reconciliation code in a data structure like a GrapheneSet:

GrapheneSet(size_t receiverUniverseSize, const vector<T> &items) : itemOrder(null);

On the sender side:

set = GrapheneSet(nReceiverMemPoolTx, items);

And then on the receiver side:

GrapheneSet senderSet;
vRecv >> senderSet;
vector<uint64_t> blockTxs = senderSet.reconcile(txsFromReceiverMempool);

The GrapheneSet logic could be used for the implementation of the graphene and weak block protocols as well as other applications beyond block propagation such as mempool synchronization.

With all this being said, I also see a (separate?) opportunity to refactor and abstract the block propagation protocol logic. With graphene and weak blocks we will have five different block types: regular, thin, xthin, graphene, and weak, which share a lot of boilerplate logic. I don't have a strong opinion about whether or not such a refactoring should be performed (it's fairly high risk since it would touch existing block protocols). But my sense is that any abstraction of the protocol logic should remain separate from abstraction of set reconciliation logic.

Do you see what I mean here? Are you more advocating for abstraction of the set reconciliation or protocol logic (or both)?

If you're thinking about abstracting protocol logic, there might be an opportunity to create a general library with an interface compatible with existing block types as well as our own. A safe way forward would be to implement our new block types with this common interface and leave refactoring the old block types for future work.
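A minimal Python mock of the GrapheneSet interface sketched above (the names follow the comment, not actual BU code; serialization and the BF/IBLT internals are elided behind a plain set):

```python
class GrapheneSet:
    """Toy stand-in for the proposed set-reconciliation core. The real
    class would hold a Bloom filter and an IBLT sized from
    receiver_universe_size; a plain set stands in for both here."""
    def __init__(self, receiver_universe_size, items):
        self.receiver_universe_size = receiver_universe_size
        self._items = set(items)

    def reconcile(self, receiver_txs):
        """Recover the sender's item ids using the receiver's pool.
        In graphene proper, the intersection comes out of the BF pass and
        the IBLT subtraction yields the remainder; plain set operations
        model the same outcome."""
        pool = set(receiver_txs)
        return sorted(self._items & pool) + sorted(self._items - pool)

sender = GrapheneSet(receiver_universe_size=1000, items=[5, 9, 2])
got = sender.reconcile([2, 5, 7, 8, 9])
# got == [2, 5, 9]
```

Because the interface only takes explicit item sets, it stays agnostic to where the txs come from (mempool, weak block, or anything else).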

@awemany
Contributor

awemany commented Feb 27, 2018

@bissias : Yes that kind of interface makes sense for me and it seems easy enough to integrate that into the TBD weakblocks protocol messages. Great, thank you! I guess I'll eventually rebase onto your patch then, as soon as I am at the stage where I want to clean up my currently messy & hacky abuse of the thinblocks code.

And there we get to your second point: My own, personal two satoshis on this is that such a refactoring would be awesome to do and much riskier things have been done to the code.

This might be getting a bit OT now, but there's a couple places in the current block transmission code that are fairly closely tied to block validation logic. Because weak blocks take different paths at some points in the validation logic, if anyone wants to do such a refactoring, s/he should IMO also make sure to decouple that properly.

awemany and others added 3 commits July 21, 2018 09:05
And force it to be deserialized to zero at the moment.
And create an easy method to force the number of checksum bits to
a lower value for easier testing of IBLT failures giving wrong results.
// Check for Misbehaving and DOS
// If they make more than 20 requests in 10 minutes then disconnect them
{
LOCK(cs_vNodes);
Collaborator

I think this lock is unnecessary. pfrom should have a reference added by the caller, so there is no chance that it will be deleted.

@ptschip
Collaborator

ptschip commented Jul 23, 2018 via email

uint64_t nLocalGrapheneBlockBytes; // the bytes used in creating this graphene block, updated dynamically
int nSizeGrapheneBlock; // Original on-wire size of the block. Just used for reporting
int grapheneBlockWaitingForTxns; // if -1 then not currently waiting
CCriticalSection cs_grapheneadditionaltxs; // lock grapheneAdditionalTxs
Collaborator

why do these 2 structures need critical sections but not grapheneBlock, grapheneBlockHashes, grapheneMapHashOrderIndex, mapGrapheneMissingTx, ...

@ptschip
Collaborator

ptschip commented Jul 23, 2018 via email

src/iblt.h Outdated
CIblt();
CIblt(size_t _expectedNumEntries);
CIblt(const CIblt &other);
virtual ~CIblt();
Collaborator

Any reason to have a virtual destructor?

Contributor Author

TBH this code was originally copied from Gavin's repo, so I didn't actively choose virtual. Do you prefer that it be dropped?

Collaborator

yes, drop it

@gandrewstone gandrewstone merged commit 9de9016 into BitcoinUnlimited:dev Jul 23, 2018
self.stats['mempool_info_size'] = MEMPOOL_INFO_SIZE
self.stats['rank'] = self.extract_bytes(self.extract_stats(self.nodes[0])['rank'])
self.stats['filter'] = self.extract_bytes(self.extract_stats(self.nodes[0])['filter'])
self.stats['iblt'] = self.extract_bytes(self.extract_stats(self.nodes[0])['iblt'])
Contributor

This seems a bit repetitive. Wouldn't it be easier on the eyes to bind self.extract_stats(self.nodes[0]) to a shorthand name first and then extract the individual stats from it?

Also, this would avoid doing repetitive RPC calls which might create confusion down the road in unforeseen corner cases where a stats update is in progress.

nFlags = nFlagsIn;
}

void CBloomFilter::setupGuaranteeFPR(unsigned int nElements,
Contributor

Isn't this, except for the ceil, an almost-copy of the other constructor? Can this code duplication be avoided? Also, I noticed that in your new constructor, the nDesiredSize is not constrained when fSizeConstrained is specified as true. Is that intentional or a copy-and-paste error?

Contributor Author

Actually, I intentionally duplicated the setup method (modulo edits) in order to avoid having to make subtle changes to the existing CBloomFilter implementation. Originally @ptschip suggested subclassing CBloomFilter, but I needed access to private members, so we settled on this approach.

The fact that nDesiredSize can be arbitrarily large is not an accident. I need the Bloom filter to honor the requested false positive rate. I don't think that there is potential for data-blowup here because the entire vData vector is deserialized, so any filter that would be huge in memory would already be huge on the wire. With that being said, do we already avoid deserializing really huge messages into memory?
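For reference, the sizing math behind honoring a requested false positive rate (these are the standard textbook Bloom filter formulas, not code from this PR): for n elements at target rate p, the optimal filter uses m = -n·ln p/(ln 2)² bits and k = (m/n)·ln 2 hash functions, and capping m below that necessarily raises the effective rate above p:

```python
import math

def bloom_params(n, fpr):
    """Optimal filter size in bits and hash count for n elements at the
    target false positive rate fpr (standard formulas)."""
    m = math.ceil(-n * math.log(fpr) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k

def effective_fpr(n, m, k):
    """Expected false positive rate for a filter of m bits holding n
    elements with k hash functions: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

m, k = bloom_params(1000, 0.001)
```

This is why an unconstrained nDesiredSize is needed to guarantee the FPR: halving m while keeping k, for example, pushes the effective rate well above the requested one.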

Contributor

Except for the almost duplication, the issue I have regarding fSizeConstrained is that you take the flag in the constructor, but only seem to honor it partially. nDesiredSize is unlimited, but nHashFuncs is still being limited. Is that really intentional? Assuming you want an exact FPR, isn't the whole notion of having fSizeConstrained kind of counterproductive?

@@ -89,6 +96,15 @@ class CBloomFilter
unsigned int nTweak,
unsigned char nFlagsIn,
uint32_t nMaxFilterSize = SMALLEST_MAX_BLOOM_FILTER_SIZE);
/**
* Add the option to force the bloom filter setup to guarantee the false positive rate does not exceed nFPRate.
Contributor

I think documentation of the sort of "Add the option" is more like for a git commit message. I don't think it makes much sense to note in the code that this functionality has been added - just add it and document its proper use :)

Contributor Author

Ok but the new setup function already made you wonder why the code duplication was necessary ;-), so maybe it's worth leaving it in?

Contributor Author

Or maybe your issue is just with the comment being too casual? I'm happy to fix that also.

Contributor

@awemany awemany Jul 24, 2018

Yeah, this was just a nit about the comment.

@@ -461,7 +461,8 @@ static void addDebuggingOptions(AllowedArgs &allowedArgs, HelpMessageMode mode)
{
std::string debugCategories = "addrman, bench, blk, bloom, coindb, db, estimatefee, evict, http, lck, "
"libevent, mempool, mempoolrej, miner, net, parallel, partitioncheck, "
"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq";
"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq"
Contributor

This looks like it is missing a comma and a space.

Contributor

yup, should be:

"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq, "
"graphene"

It would seem the debug category at the end is now "zmqgraphene"

@@ -531,7 +569,15 @@ class CNode
return false;
}

void AddAddressKnown(const CAddress &_addr) { addrKnown.insert(_addr.GetKey()); }
Contributor

There's a regression here: This undoes the recent shadowing fix by @sickpig.

Contributor Author

Weird. I'm pretty sure that this was removed during a rebase with dev. I'll add it back in.

Contributor Author

Actually, it looks like this has already been added back in the merged version.

@ptschip
Collaborator

ptschip commented Jul 26, 2018 via email

@jtoomim
Contributor

jtoomim commented Jul 26, 2018

@bissias I'm playing around with a scheme for encoding transaction order efficiently. The rough idea is to start with a transaction list that's sorted by feerate, and compare that to the block's actual transaction order, and move transactions around to make the two orders equal. It uses some quick heuristics to decide which transactions to move in order to get the majority of transactions to have offsets of 0, allowing for efficient encoding. I've still got some bugs to fix, but the python draft I have right now is getting the encoding of the ordering of a 910 tx block template from the BTC network down to about 186 bytes, and 500 bytes for a 2000 tx template. There does appear to be some non-linearity, in which larger blocks take more bits per transaction to encode, but the efficiency might still be good enough for now. Having a canonical in-block transaction ordering would obviously be more efficient, but that's a fairly substantial hard fork change.

@sickpig
Collaborator

sickpig commented Jul 26, 2018

@jtoomim just wanted to mention that the bitcoin cash network will introduce a canonical txn ordering into its consensus rules in the Nov 2018 upgrade. This means that we won't need to communicate the txn order in a block to get it propagated.

@bissias
Contributor Author

bissias commented Jul 27, 2018

@jtoomim that's a very cool idea, thanks for posting. It looks like your numbers are well below the log(n!) ≈ n log(n) theoretical limit for arbitrary orderings. Do you have a writeup anywhere that I can take a look at? As @sickpig pointed out, a canonical ordering for BCH will likely obviate the need for passing ordering information in this implementation. However, I know that Dash is also working on a graphene implementation that could potentially benefit from your approach. In any case, I'd be interested to learn more.

@jtoomim
Contributor

jtoomim commented Jul 29, 2018

@bissias I was having some issues with getting the decoding to work properly with my older (and more ambitious) encoding scheme, so I made a version that uses a much simpler encoding scheme and uploaded that.

https://github.com/jtoomim/orderencode

The new version seems to make encodings that are about 25% bigger, or roundabout 600 bytes for a 2000 tx block, but it's still not too bad. If there is interest in greater efficiency, I can work on fixing the bugs and writing an explanation for the older version.

The new encoding works like this:

First, generate a list of offsets. If the transaction at index 5 in the block is at index 20 in the list sorted by feerate, then store a value of 15 into position 5 in the offsets list.

Second, identify repeating offsets and encode them as (count, offset) pairs. If your offset list is [0, 0, 0, 4, 1, 1, 1, -4], you can summarize that as [(3,0), (1,4), (3,1), (1,-4)] since you have three 0s, followed by one 4, etc.

Third, since most of the counts will have a value of 1 (i.e. a single transaction out of place, followed by one or more transactions from a different place), we can reencode the counts by generating a bitmap of which counts are equal to 1. Any count that is not equal to 1 is encoded separately in a list of residual counts.

Last, encode all of the residuals and the offset values as signed varints and serialize them. I didn't bother with this step in the python proof of concept.
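The steps above can be sketched in Python. This is a hypothetical illustration of the scheme as described, not the code from jtoomim/orderencode, and the final varint serialization step is likewise omitted:

```python
def encode_order(block_order, sorted_order):
    """Offset / run-length encoding of a block's tx order relative to
    the feerate-sorted order.

    block_order, sorted_order: lists of txids (same set, different order).
    Returns (is_one_bitmap, residual_counts, offsets); serializing these
    as signed varints is left out, as in the proof of concept.
    """
    pos_in_sorted = {tx: i for i, tx in enumerate(sorted_order)}
    # Step 1: offset = index in the feerate-sorted list minus index in the block.
    raw = [pos_in_sorted[tx] - i for i, tx in enumerate(block_order)]
    # Step 2: run-length encode repeated offsets as (count, offset) pairs.
    pairs = []
    for off in raw:
        if pairs and pairs[-1][1] == off:
            pairs[-1][0] += 1
        else:
            pairs.append([1, off])
    # Step 3: bitmap of which counts equal 1; other counts become residuals.
    is_one = [count == 1 for count, _ in pairs]
    residuals = [count for count, _ in pairs if count != 1]
    offsets = [off for _, off in pairs]
    return is_one, residuals, offsets
```

Since most transactions end up with an offset of 0 and most runs have a count of 1, the step-3 bitmap lets the common case cost a single bit per run before varint encoding.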

@bissias
Contributor Author

bissias commented Jul 29, 2018

Thanks for putting that repository together @jtoomim. I've been talking with Darren Tapp from Dash research, and I think this might be useful for their implementation. I will definitely let you know if they head down that path.

7 participants