
graphene: first draft #973

Merged
merged 128 commits into BitcoinUnlimited:dev on Jul 23, 2018

@bissias
Contributor

bissias commented Feb 16, 2018

This commit provides a functional implementation of graphene blocks. All unit tests pass, and the QA test grapheneblocks.py also passes. From the log output it is evident that graphene blocks are requested, created, serialized, deserialized, and reconstructed. However, there is still work to be done. My general approach has been to largely replicate the workflow of xthin blocks, so I have replicated large portions of the xthin code. It's not clear that this is preferable to sharing code between the two block types. There also remains the question of transaction ordering for graphene blocks. If it is possible to commit a canonical ordering before graphene, then we will want to change this patch accordingly.

The code will also require further optimization (primarily block size optimization) before being deployed to production. My aim is to initiate a review of the basic workflow now and continue to work on optimization while graphene is running on the testnet. The following optimization tasks remain.

  • Update Bloom filter constants for an optimal symmetric difference (symdiff)
  • Employ a numerical symdiff calculation for greater accuracy
  • Use smaller dirty hashes for IBLT
  • Consider options for compressing transaction ordering: smaller hashes or perhaps a different data structure
@sickpig


Collaborator

sickpig commented Feb 16, 2018

Hi @bissias,
I'm eager and excited to test and review the code you submitted. Just want to thank you in advance for the work you have done here!

@sickpig


Collaborator

sickpig commented Feb 16, 2018

The Travis integration test failures are due to source code formatting problems; they are not actual failures. See https://travis-ci.org/BitcoinUnlimited/BitcoinUnlimited/jobs/342353084#L698 for more details.

To get your files automatically formatted, add this snippet to the .git/config file in your cloned repo:

[filter "bitcoin-clang-format"]
        clean = "contrib/devtools/clang-format.py format-stdout-if-wanted clang-format-3.8 %f"
        smudge = cat

and install clang-format-3.8 on your dev machine.

(I'm assuming you are using a relatively recent Ubuntu Linux distro for your dev environment.)

@sickpig


Collaborator

sickpig commented Feb 16, 2018

@bissias
No need to open a new PR.

Just fix the formatting problems, commit the changes, and push. GitHub will automatically update the PR for you without losing the comments and reviews already made.

If you want, before pushing you can squash the commit containing the formatting fix into the previous one:

1) fix formatting issue in src/main.cpp
2) git add src/main.cpp 
3) git commit -m "fix format"
4) git rebase -i HEAD~2
5) change the first word on the second line from pick to fixup
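
For illustration, after step 4 the interactive rebase todo list would look something like this (commit hashes are made up):

    pick  1a2b3c4 graphene: first draft
    fixup 5d6e7f8 fix format   # was "pick"; fixup melds this commit into the one above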
@awemany


Contributor

awemany commented Feb 16, 2018

@bissias : Alright, awesome!

Funny coincidence, but just a couple of hours ago I asked Gavin Andresen on the BU slack when and where (in which code fork) implementations of the Graphene protocol were to be expected. He told me to ask Brian Levine, and before I could do that, here we are :) I assume you work in his group?

I am working on preconsensus in the form of weak blocks, as described by Peter Rizun in his paper.

One of the things that I want to do, but which would be kind of overlapping with your work, is transmitting weak blocks as delta blocks. What I do right now is hijack @ptschip's thinblock code in a bad way and transmit delta blocks as a special variant of thin blocks. That does work (as tested on the giganetwork), and it does reduce peak bandwidth as expected. Preliminary results might be presented by Peter at the satoshisvision conference (if we get to test it some more).

But yeah, so in terms of actual transmission of my deltablocks I really still need to do something proper :)

But I imagine that it should be very doable to also use the graphene protocol or a variant that would point to a weak block as the basis and then just transmit the delta that is not in the weak block?

If so, I'd like to take the opportunity to remark here that it would be great if graphene could provide such a hook and provisions to use it in a "deltablocks" scheme as well. That way, both could be combined in the longer run to even further reduce the burstiness of block transmission.

I should also remark that I still consider the whole notion of weak blocks experimental; even though I am optimistic that it will deliver benefits, the proof is in the pudding.

What do you think about a delta scheme for graphene?

@bissias


Contributor

bissias commented Feb 17, 2018

@awemany, that is a funny coincidence! Glad to finally have something to submit for discussion. And yes I'm working in Brian's group.

I'm only familiar with the basic idea of weak blocks, but my understanding is that you need data structures for both the weak and final blocks. The final block will point to a weak block and may additionally contain some new transactions.

From what you wrote, it sounds like you're considering using something like graphene for the final block? That seems reasonable in that final and graphene blocks share the same objective: communicating a set of transactions that are probably already on the recipient's machine. But what hooks did you have in mind exactly?

The core logic for graphene that performs set reconciliation is pretty small. It's the small details of the overall protocol that constitute the majority of the code. So one way I could enable reuse of the graphene code would be to factor out that core logic into a class that could be used outside of the graphene protocol.

@awemany


Contributor

awemany commented Feb 17, 2018

@bissias:

From what you wrote, it sounds like you're considering using something like graphene for the final block?

Kind of! Weak blocks are what Peter Rizun called subchains in his paper: https://www.bitcoinunlimited.info/resources/subchains.pdf

The idea of weak blocks is basically to publish partial solutions: blocks found at reduced difficulty (while searching for a strong block, of course, so the miners essentially do the same thing as before!). As one is found, it is sent around and kept by all participating nodes as an increment to the state of the chain ("deltablocks"). It is meant as a non-sybil-able preconsensus scheme for what goes into the next strong block. It is assumed that participating nodes will refer to the last weak block when building their next block, and will enjoy easier propagation of their solutions because most of the block's entropy can be referred to by pointing to the weak block as a 'delta' (the incentive to use it). A side benefit is faster "fractional confirmations" on top of the 0-conf of transactions.

Now, the requirement is added that a node participating in weakblock propagation will build its next block (weak or strong) on top of the last weak block (or rather: longest chain), in the following way (currently; this should of course be adapted for things like a preferred ordering etc.):

  • the Coinbase is replaced with the miner's variant
  • the rest of the transactions are an extension of the weak block such that each transaction in the underlying block is in the new block at the same spot
  • more transactions follow after that

Assuming full propagation of such a weak block, a node can then send out any found solution (strong or weak) by referring to the weak block it is based upon, plus just the set of transactions on top of that.

In terms of graphene, if I understand the approach correctly, this would mean that the mempool effectively reduces to just those txns that are not in the previous weak block, and a single extra hash would be necessary in a "deltagraphene" message that refers to this last propagated weak block.

And this is what I am wondering about: how to make it so that one can say "these transactions are for sure in the transmitted block, see hash so-and-so here", and thus further reduce the number of transactions that the graphene set reconciliation has to deal with: by a factor of weakblockrate / strongblockrate, assuming no orphaning problems. Basically, what I'd like to have is a way to say "transmit this CBlock using graphene, but tell graphene that these transactions from that other block (except the coinbase) are for sure in this block, and in the same order".

The advantage of weak blocks over just reducing the block time interval is that this is purely an opt-in scheme that can be gradually implemented by participating miners, and it needs neither a chain fork nor a change of this fundamental variable and all the assumptions that have been built on top of it.

All other code stays the same and code that is ignorant of weak blocks will see the chain progress just like it did before.

qa/rpc-tests/grapheneblocks.py Outdated
self.nodes[0].generate(100)
self.sync_all()
self.nodes[0].generate(5)
self.sync_all()


@sickpig

sickpig Feb 17, 2018

Collaborator

why not just

self.nodes[0].generate(105)
self.sync_all()


@bissias

bissias Feb 18, 2018

Contributor

Done.

qa/rpc-tests/grapheneblocks.py Outdated
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
# Copyright (c) 2014-2015 The Bitcoin Core developers
# Copyright (c) 2015-2016 The Bitcoin Unlimited developers


@sickpig

sickpig Feb 17, 2018

Collaborator

nit. copyright header should be

Copyright (c) 2018 The Bitcoin Unlimited developers

since this is a new file.


@bissias

bissias Feb 18, 2018

Contributor

Done.

src/graphene.cpp Outdated
@@ -0,0 +1,1514 @@
// Copyright (c) 2016-2017 The Bitcoin Unlimited developers


@sickpig

sickpig Feb 17, 2018

Collaborator

nit copyright header (same as qa/rpc-tests/grapheneblocks.py)


@bissias

bissias Feb 18, 2018

Contributor

Done.

src/graphene.cpp Outdated
{
LOCK(cs_orphancache);
LOCK(cs_xval);
if (!ReconstructBlock(pfrom, fXVal, missingCount, unnecessaryCount))


@sickpig

sickpig Feb 17, 2018

Collaborator

nit, why not LOCK2(cs_orphancache, cs_xval)?
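
That is, the suggested change would look like (sketch):

-            LOCK(cs_orphancache);
-            LOCK(cs_xval);
+            LOCK2(cs_orphancache, cs_xval);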


@bissias

bissias Feb 18, 2018

Contributor

Done. I was copying xthin here :-D. But it compiles as you recommended, so I changed it. In general, I would appreciate going over all the locks before merging with dev to make sure I got them right.

src/graphene.cpp Outdated
}
else
{
BOOST_FOREACH(const CTransaction &tx, block.vtx)


@sickpig

sickpig Feb 17, 2018

Collaborator

I know you inherited this from thinblock.cpp, but we are trying to use C++11 range-based for loops when introducing new code, e.g. something like:

-            BOOST_FOREACH(const CTransaction &tx, block.vtx)
+            for (const CTransaction &tx : block.vtx)


@bissias

bissias Feb 18, 2018

Contributor

Done.

src/graphene.h Outdated
@@ -0,0 +1,254 @@
// Copyright (c) 2016-2017 The Bitcoin Unlimited developers


@sickpig

sickpig Feb 17, 2018

Collaborator

nit 2018


@bissias

bissias Feb 18, 2018

Contributor

Done.

@@ -0,0 +1,266 @@
#include "iblt.h"


@sickpig

sickpig Feb 17, 2018

Collaborator

missing copyright header


@bissias

bissias Feb 18, 2018

Contributor

I have a question here. I lifted this code from Gavin's repo. It is distributed with a MIT license. I did make some edits though. How should I copyright this while also crediting the original author?


@sickpig

sickpig Feb 19, 2018

Collaborator

You should use the original copyright notice (the one present in Gavin's repo, plus any other copyright Gavin included, if any), then add yours and BU's if you want to (IANAL).


@bissias

bissias Feb 19, 2018

Contributor

There is no copyright notice in any single file, but the top level of the repository contains the following file. So I guess I should paste the contents into each iblt file and then include the BU copyright just above it?

https://github.com/gavinandresen/IBLT_Cplusplus/blob/master/LICENSE


@sickpig

sickpig Feb 19, 2018

Collaborator

Again, IANAL, but that's what I would do 😄


@bissias

bissias Feb 20, 2018

Contributor

Ok fair enough! I pushed license information for the IBLT code.


@bissias

bissias Feb 20, 2018

Contributor

What do you make of the wallet.py test failing due to a KeyboardInterrupt in CI? This happened just after I committed the license changes to the IBLT code; it doesn't seem like it could be related. All unit tests pass in my dev env, at least when I run make check. Is there another test suite I should be running as well?


@sickpig

sickpig Feb 20, 2018

Collaborator

This is a spurious failure due to a Travis problem. Going to re-kick the failing job.


@bissias

bissias Feb 20, 2018

Contributor

Ah I see, intermittent failure. Looks green now. Thanks.

@@ -0,0 +1,100 @@
#ifndef CIblt_H


@sickpig

sickpig Feb 17, 2018

Collaborator

missing copyright header


@bissias

bissias Feb 18, 2018

Contributor

Same question as with iblt.cpp.

src/net.h Outdated
}
// BUIP055:
bool BitcoinCashCapable()


@sickpig

sickpig Feb 17, 2018

Collaborator

this method is not present anymore in the dev branch


@bissias

bissias Feb 18, 2018

Contributor

Removed.

src/protocol.h Outdated
// NODE_GRAPHENE means the node supports Graphene blocks
// If this is turned off then the node will not service graphene requests nor
// make graphene requests
NODE_GRAPHENE = (1 << 6),


@sickpig

sickpig Feb 17, 2018

Collaborator

In case we port graphene to the BU legacy code base (BitcoinCore branch), I guess we need to check whether some other implementation is already using this bit to signal other services.


@bissias

bissias Feb 18, 2018

Contributor

I've made a note about this. I will be sure to double-check before merging.


@sickpig

sickpig Feb 19, 2018

Collaborator

great

@bissias


Contributor

bissias commented Feb 18, 2018

@awemany thanks for the link, that paper didn't immediately surface when I was looking up weak blocks. I'll be sure to take a close look. Anyhow, thanks also for the description. I think I'm getting the overall picture now (but tell me if not).

Here are my thoughts on weak block integration. In graphene, the sender creates both a Bloom filter (BF) and an IBLT containing the txs in the block. The receiver reconstructs the block by passing every tx it knows about through the BF and placing all positives in its own IBLT. The difference between the sender and receiver IBLTs allows the receiver to determine the final tx set. With a reference to a weak block, the receiver simply has a new source of transactions to pass through the BF on the way to constructing its IBLT. So I believe weak blocks and graphene can complement each other nicely.
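
As a rough sketch of that receiver-side flow (illustrative names, not the actual interfaces in this patch):

    // Pass every tx the receiver knows about (mempool, orphans, and weak
    // block txs, if any) through the sender's Bloom filter; the positives
    // become candidate members of the block.
    std::vector<uint256> candidates;
    for (const uint256 &hash : receiverTxHashes)
        if (senderBloom.contains(hash))
            candidates.push_back(hash);

    // Build a local IBLT over the candidate set.
    CIblt receiverIblt(nExpectedEntries);
    for (const uint256 &hash : candidates)
        receiverIblt.insert(hash.GetCheapHash(), std::vector<uint8_t>());

    // Subtracting the IBLTs isolates the small symmetric difference (BF
    // false positives on one side, txs the receiver lacks on the other);
    // peeling it yields the exact block tx set.
    CIblt diff = senderIblt - receiverIblt;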

Some practical considerations:

  1. The current graphene implementation requires a vector of tx indices to indicate order. There has been some talk of instituting a canonical ordering for txs, which would alleviate this requirement. But as it stands, if we used graphene for the strong block we would have to pass an order vector for the new transactions.
  2. From a coding standpoint, it's not clear to me which approach is better: update the graphene protocol to accommodate weak blocks, or implement an entirely separate weak block protocol that leverages a distinct library containing the core graphene logic. IMHO the latter approach seems a little cleaner, but I defer to your judgement and the rest of the team. In general, I ended up replicating a large amount of xthin code in implementing graphene, and I imagine there will only be more new block types. So I wonder if it doesn't make sense to factor out a lot of the generic protocol logic.

@sickpig sickpig referenced this pull request Feb 18, 2018

Merged

Update thinblock.cpp #977

@bissias


Contributor

bissias commented Feb 18, 2018

I just updated the single commit for this pull request with changes requested by @sickpig. Some details about the updates are below.

Completed tasks:

  1. License headers
  2. Use C++11 for loops
  3. Use LOCK2 where appropriate
  4. Remove unused methods
  5. Update stale comments

Some tasks noted but not yet completed:

  1. License for IBLT code
  2. Examine potential conflict with xthin in preferential timer
  3. Double-check version bits are unique before merging
src/net.cpp Outdated
if (strCommand == NetMsgType::GRAPHENETX)
LOG(GRAPHENE, "ReceiveMsgBytes GRAPHENETX\n");
if (strCommand == NetMsgType::GET_GRAPHENETX)
LOG(GRAPHENE, "ReceiveMsgBytes GET_GRAPHENETX\n");


@ptschip

ptschip Feb 21, 2018

Collaborator

This is a little bit inefficient... better to use "else if" statements so we don't have to potentially evaluate every if statement... or just one if statement with || connecting them, and then just one generic LOG statement.
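
That is, something like (sketch of the suggested shape):

    if (strCommand == NetMsgType::GRAPHENETX)
        LOG(GRAPHENE, "ReceiveMsgBytes GRAPHENETX\n");
    else if (strCommand == NetMsgType::GET_GRAPHENETX)
        LOG(GRAPHENE, "ReceiveMsgBytes GET_GRAPHENETX\n");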


@bissias

bissias Feb 21, 2018

Contributor

Sounds good. I went with the "else if" option.

src/requestManager.cpp Outdated
mi != mapOrphanTransactions.end(); ++mi)
vOrphanHashes.push_back((*mi).first);
}
BuildSeededBloomFilter(filterMemPool, vOrphanHashes, inv2.hash, pfrom);


@ptschip

ptschip Feb 21, 2018

Collaborator

Why create a seeded Bloom filter here? This is what we need for an xthin. It looks like you're possibly not creating a proper graphene block here?


@bissias

bissias Feb 21, 2018

Contributor

Whoops, that's a copy-paste error. I've removed that call to BuildSeededBloomFilter and added a line like the one at 461.

@ptschip

Looks like nice work, and not that much code really. But I haven't taken a deep dive yet. One question... I haven't taken it out for a test drive, but if I set up two nodes, will this actually work as-is?

@bissias


Contributor

bissias commented Feb 21, 2018

@ptschip thanks for taking a look and for the comments. I've committed the changes you mentioned.

The code should work as-is. However, the only way I have actually run it is with the QA test qa/rpc-tests/grapheneblocks.py. It's a close copy of the thinblocks.py test. There's enough log output to see the blocks being requested, created, sent, and received. The next step for me is to actually deploy some nodes on the testnet.

@ptschip


Collaborator

ptschip commented Feb 21, 2018

@bissias btw, when you make requested changes, don't squash your commits; it makes it hard to know what you changed when we come back to review. Squashing can be done at the end of the cycle, or intermittently if there are just too many commits. Thanks anyway, and I look forward to seeing it in action.

@bissias


Contributor

bissias commented Feb 21, 2018

@ptschip sorry about that. Makes sense to squash only after the commits get unruly. Will do in the future.

@bissias


Contributor

bissias commented Feb 22, 2018

Pushed a small update to fix reporting of in-memory graphene bytes.

@awemany


Contributor

awemany commented Feb 24, 2018

@bissias: Right now, I send weakblocks through my own logic but hijack some of the thinblocks code to implement deltablocks. So I think your second approach (a library to transmit using graphene) would be preferred by me :-)

More specifically, my main point is that it would be great if it were not closely tied to the mempool code: my weak blocks currently live outside the mempool. It is all in a state of flux, and I might try integrating weak blocks with the mempool via some kind of subset or flagging system.

It would be great if the sets to be reconciled and the pools available on each side can be set explicitly instead of graphene figuring that out by itself (and thus tying itself to the rest of system more closely).

I am thinking something along the lines of this on the sending side (just as a general picture)
(gb: Graphene block)

  gb.setAvailable(<txn-set>);
  gb.setContained(block.vtx);
  for (CNodeSomething p : peers)  gb.transmit(p);

And vice-versa on the receiving side.

And then it would be great if it would be easy for me to add a single 32-byte hash to a sent graphene block (and maybe a corresponding boolean flag as well), so that when a graphene block is received, I can basically say 'Oh ok, this is a graphene block that refers to this weak block, so lets reconstruct it from that one'.

I don't care about the exact mechanism, whether I have to subclass the graphene block representation classes or need to set a couple configuration variables.

Does that make sense?

I am by no means in the position to make this an official request, but maybe you also agree that this would be great in terms of a decoupled architecture for the graphene logic :)

@bissias


Contributor

bissias commented Feb 26, 2018

@awemany I think I see what you mean about making graphene agnostic to tx sources. I guess I was imagining a slightly different abstraction that would fall entirely outside the protocol layer. The core logic would encapsulate set reconciliation code in a data structure like a GrapheneSet:

GrapheneSet(size_t receiverUniverseSize, const vector<T> &items) : itemOrder(null);

On the sender side:

set = GrapheneSet(nReceiverMemPoolTx, items);

And then on the receiver side:

GrapheneSet senderSet;
vRecv >> senderSet;
vector<uint64_t> blockTxs = senderSet.reconcile(txsFromReceiverMempool);

The GrapheneSet logic could be used for the implementation of the graphene and weak block protocols as well as other applications beyond block propagation such as mempool synchronization.

With all this being said, I also see a (separate?) opportunity to refactor and abstract the block propagation protocol logic. With graphene and weak blocks we will have five different block types: regular, thin, xthin, graphene, and weak, which share a lot of boilerplate logic. I don't have a strong opinion about whether or not such a refactoring should be performed (it's fairly high risk since it would touch existing block protocols). But my sense is that any abstraction of the protocol logic should remain separate from abstraction of set reconciliation logic.

Do you see what I mean here? Are you more advocating for abstraction of the set reconciliation or protocol logic (or both)?

If you're thinking about abstracting protocol logic, there might be an opportunity to create a general library with an interface compatible with existing block types as well as our own. A safe way forward would be to implement our new block types with this common interface and leave refactoring the old block types for future work.

@awemany


Contributor

awemany commented Feb 27, 2018

@bissias : Yes, that kind of interface makes sense to me, and it seems easy enough to integrate into the TBD weakblocks protocol messages. Great, thank you! I guess I'll eventually rebase onto your patch then, as soon as I am at the stage where I want to clean up my currently messy & hacky abuse of the thinblocks code.

And there we get to your second point: my own personal two satoshis on this is that such a refactoring would be awesome to do, and much riskier things have been done to the code.

This might be getting a bit OT now, but there are a couple of places in the current block transmission code that are fairly closely tied to block validation logic. Because weak blocks take different paths at some points in the validation logic, anyone who wants to do such a refactoring should IMO also make sure to decouple that properly.

@bissias


Contributor

bissias commented Feb 27, 2018

Sure thing @awemany. I'll ping you on slack when I've pushed that interface change to this branch.

Regarding the larger refactoring, maybe it would make sense to loop in others on this discussion; although clearly relevant to this PR, I agree it probably deserves a broader audience. I currently have a very narrow perspective on the codebase and probably wouldn't venture down the refactoring path alone. I'm sure you and others will bring a richer perspective and have some thoughts on how best to proceed.

@ptschip


Collaborator

ptschip commented Mar 7, 2018

Needs a rebase onto current dev... net.h is conflicting.

@bissias


Contributor

bissias commented Mar 7, 2018

@ptschip I'm on it. Will rebase a little later today. I think it's something small; version bits perhaps.

@sickpig

just minor nits, will review further.

src/graphene_set.h Outdated
#define BITCOIN_GRAPHENE_SET_H
#include "bloom.h"
#include "boost/dynamic_bitset.hpp"


@sickpig

sickpig Mar 7, 2018

Collaborator

nit: separate boost includes from BU ones

src/graphene_set.h Outdated
#include <cmath>
#include <vector>
using namespace std;


@sickpig

sickpig Mar 7, 2018

Collaborator

we decided to avoid implicit use of the std namespace, so we are going to be explicit in any declaration involving an std type. So always use std::vector<size_t> v; rather than using namespace std; and later on vector<size_t> v.

I noticed that you use explicit declaration later on (e.g. 9af4597#diff-c18e3d2f8ec69646ca22a86494be587aR29)


@bissias

bissias Mar 7, 2018

Contributor

@sickpig will fix. I was wondering about the convention. I've been pretty sloppy about it up to this point.

src/net.h Outdated
@@ -392,6 +407,27 @@ class CNode
uint32_t nXthinBloomfilterSize; // The maximum xthin bloom filter size (in bytes) that our peer will accept.
// BUIP010 Xtreme Thinblocks: end section
// BUIPXXX Graphene blocks: begin section
CCriticalSection cs_mempoolsize;
CBlock grapheneBlock;


@ptschip

ptschip Mar 25, 2018

Collaborator

What is this critical section locking? I would assume everything underneath it, but there seem to be a lot of things there that may not be related. Not sure... If it doesn't lock everything underneath, then separate the items out; at a minimum, have the critical section first, then the items it locks, then an empty line.


@bissias

bissias Mar 26, 2018

Contributor

It locks only nGrapheneMemPoolTx, but I realize now that I didn't follow the naming conventions for locks. I renamed it to cs_ngraphenemempooltx, grouped it with nGrapheneMemPoolTx, and added a comment.

I'm also not certain that a lock is necessary here. I was following the lead of LoadFilter, which locks cs_filter, performs a delete, and then reallocates memory. However, in my case I'm just updating an integer, so perhaps the lock is overkill. My ambivalence is really a result of not knowing the broader codebase well. Is it the case that multiple block requests from the same node could be processed simultaneously? Thinking about it now, I figure not, and the cs_filter lock was just a precaution.

In general I think I need to do a lock audit. Perhaps at some point I can get with someone who knows the code a little better and talk through where there is concurrency and what objects are affected.

src/graphene.cpp Outdated
delete pGrapheneBlockIblt;
pGrapheneBlockIblt = NULL;
delete pGrapheneSet;
pGrapheneSet = NULL;


@sickpig

sickpig Mar 26, 2018

Collaborator

use nullptr


@bissias

bissias Mar 26, 2018

Contributor

Fixed.

@sickpig


Collaborator

sickpig commented May 5, 2018

@bissias this needs to be rebased.

bissias and others added some commits Jul 19, 2018

iblt: Fix vectors passed by reference
As they are changed during peeling. This is a bare patch from Andrew, made into
a commit with an extra comment each added by awemany.
iblt: Add a version field
And force it to be deserialized to zero at the moment.
iblt: Create separate function to calculate key checksum
And create an easy method to force the number of checksum bits to
a lower value for easier testing of IBLT failures giving wrong results.
// Check for Misbehaving and DOS
// If they make more than 20 requests in 10 minutes then disconnect them
{
LOCK(cs_vNodes);


@gandrewstone

gandrewstone Jul 23, 2018

Collaborator

I think this lock is unnecessary. pfrom should have a reference added by the caller, so there is no chance that it will be deleted.

@ptschip


Collaborator

ptschip commented Jul 23, 2018

uint64_t nLocalGrapheneBlockBytes; // the bytes used in creating this graphene block, updated dynamically
int nSizeGrapheneBlock; // Original on-wire size of the block. Just used for reporting
int grapheneBlockWaitingForTxns; // if -1 then not currently waiting
CCriticalSection cs_grapheneadditionaltxs; // lock grapheneAdditionalTxs


@gandrewstone

gandrewstone Jul 23, 2018

Collaborator

Why do these 2 structures need critical sections, but not grapheneBlock, grapheneBlockHashes, grapheneMapHashOrderIndex, mapGrapheneMissingTx, ...?

@ptschip


Collaborator

ptschip commented Jul 23, 2018

src/iblt.h Outdated
CIblt();
CIblt(size_t _expectedNumEntries);
CIblt(const CIblt &other);
virtual ~CIblt();


@gandrewstone

gandrewstone Jul 23, 2018

Collaborator

Any reason to have a virtual destructor?


@bissias

bissias Jul 23, 2018

Contributor

TBH this code was originally copied from Gavin's repo, so I didn't actively choose virtual. Do you prefer that it be dropped?


@gandrewstone

gandrewstone Jul 23, 2018

Collaborator

yes, drop it

bissias and others added some commits Jul 23, 2018

fix iblt reset/resize issue, document iblt and graphene_set APIs, replace double lookups in std containers with a single insert, don't calculate cheaphashes twice

@gandrewstone gandrewstone merged commit 9de9016 into BitcoinUnlimited:dev Jul 23, 2018

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed
self.stats['mempool_info_size'] = MEMPOOL_INFO_SIZE
self.stats['rank'] = self.extract_bytes(self.extract_stats(self.nodes[0])['rank'])
self.stats['filter'] = self.extract_bytes(self.extract_stats(self.nodes[0])['filter'])
self.stats['iblt'] = self.extract_bytes(self.extract_stats(self.nodes[0])['iblt'])


@awemany

awemany Jul 24, 2018

Contributor

This seems a bit repetitive. Wouldn't it be easier on the eyes to bind self.extract_stats(self.nodes[0]) to a shorthand name first and then extract the individual fields from it?

Also, this would avoid repetitive RPC calls, which might create confusion down the road in unforeseen corner cases where a stats update is in progress.

nFlags = nFlagsIn;
}
void CBloomFilter::setupGuaranteeFPR(unsigned int nElements,


@awemany

awemany Jul 24, 2018

Contributor

Isn't this, except for the ceil, an almost-copy of the other constructor? Can this code duplication be avoided? Also, I noticed that in your new constructor nDesiredSize is not constrained when fSizeConstrained is specified as true. Is that intentional or a copy-and-paste error?


@bissias

bissias Jul 24, 2018

Contributor

Actually, I intentionally duplicated the setup method (modulo edits) in order to avoid having to make subtle changes to the existing CBloomFilter implementation. Originally @ptschip suggested subclassing CBloomFilter, but I needed access to private members, so we settled on this approach.

The fact that nDesiredSize can be arbitrarily large is not an accident. I need the Bloom filter to honor the requested false positive rate. I don't think there is potential for data blowup here, because the entire vData vector is deserialized, so any filter that would be huge in memory would already be huge on the wire. That being said, do we already avoid deserializing really huge messages into memory?


@awemany

awemany Jul 24, 2018

Contributor

Except for the near-duplication, the issue I have regarding fSizeConstrained is that you take the flag in the constructor but only seem to honor it partially: nDesiredSize is unlimited, but nHashFuncs is still being limited. Is that really intentional? Assuming you want an exact FPR, isn't the whole notion of having fSizeConstrained kind of counterproductive?
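
For reference, the textbook Bloom filter sizing relations behind this trade-off (standard formulas, a sketch rather than code from this patch):

    // For n elements and a target false positive rate p:
    //   filter bits:    m = -n * ln(p) / (ln 2)^2
    //   hash functions: k = (m / n) * ln 2
    // Capping m or k after the fact raises the realized FPR above p, which
    // is why a guaranteed-FPR setup cannot also honor a hard size cap.
    unsigned int nBits = (unsigned int)ceil(-1.0 * nElements * log(nFPRate) / (log(2.0) * log(2.0)));
    unsigned int nHashFuncs = (unsigned int)round(((double)nBits / nElements) * log(2.0));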

@@ -89,6 +96,15 @@ class CBloomFilter
unsigned int nTweak,
unsigned char nFlagsIn,
uint32_t nMaxFilterSize = SMALLEST_MAX_BLOOM_FILTER_SIZE);
/**
* Add the option to force the bloom filter setup to guarantee the false positive rate does not exceed nFPRate.


@awemany

awemany Jul 24, 2018

Contributor

I think documentation of the sort "Add the option..." reads more like a git commit message. I don't think it makes much sense to note in the code that this functionality has been added - just add it and document its proper use :)


@bissias

bissias Jul 24, 2018

Contributor

Ok but the new setup function already made you wonder why the code duplication was necessary ;-), so maybe it's worth leaving it in?


@bissias

bissias Jul 24, 2018

Contributor

Or maybe your issue is just with the comment being too casual? I'm happy to fix that also.


@awemany

awemany Jul 24, 2018

Contributor

Yeah, this was just a nit about the comment.

@@ -461,7 +461,8 @@ static void addDebuggingOptions(AllowedArgs &allowedArgs, HelpMessageMode mode)
{
std::string debugCategories = "addrman, bench, blk, bloom, coindb, db, estimatefee, evict, http, lck, "
"libevent, mempool, mempoolrej, miner, net, parallel, partitioncheck, "
"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq";
"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq"


@awemany

awemany Jul 24, 2018

Contributor

This looks like it is missing a comma and a space.


@gubatron

gubatron Jul 26, 2018

Contributor

yup, should be:

"proxy, prune, rand, reindex, req, rpc, selectcoins, thin, tor, wallet, zmq, "
"graphene"

It would seem the debug category at the end is now "zmqgraphene"

@@ -531,7 +569,15 @@ class CNode
return false;
}
void AddAddressKnown(const CAddress &_addr) { addrKnown.insert(_addr.GetKey()); }


@awemany

awemany Jul 24, 2018

Contributor

There's a regression here: This undoes the recent shadowing fix by @sickpig.


@bissias

bissias Jul 24, 2018

Contributor

Weird. I'm pretty sure that this was removed during a rebase with dev. I'll add it back in.


@bissias

bissias Jul 24, 2018

Contributor

Actually, it looks like this has already been added back in the merged version.

@jtoomim


jtoomim commented Jul 26, 2018

@bissias I'm playing around with a scheme for encoding transaction order efficiently. The rough idea is to start with a transaction list sorted by feerate, compare that to the block's actual transaction order, and move transactions around to make the two orders equal. It uses some quick heuristics to decide which transactions to move so that the majority of transactions end up with offsets of 0, allowing for efficient encoding. I've still got some bugs to fix, but the Python draft I have right now gets the encoding of the ordering of a 910-tx block template from the BTC network down to about 186 bytes, and 500 bytes for a 2000-tx template. There does appear to be some non-linearity, in which larger blocks take more bits per transaction to encode, but the efficiency might still be good enough for now. Having a canonical in-block transaction ordering would obviously be more efficient, but that's a fairly substantial hard fork change.

@sickpig


Collaborator

sickpig commented Jul 26, 2018

@jtoomim just wanted to mention that the Bitcoin Cash network will introduce canonical txn ordering into the consensus rule set in the next upgrade, in Nov 2018. This means we won't need to communicate the txn order in a block to get it propagated.

@bissias


Contributor

bissias commented Jul 27, 2018

@jtoomim that's a very cool idea, thanks for posting. It looks like your numbers are well below the n log(n) theoretical limit for arbitrary orderings. Do you have a writeup anywhere that I can take a look at? As @sickpig pointed out, a canonical ordering for BCH will likely obviate the need for passing ordering information in this implementation. However, I know that Dash is also working on a graphene implementation that could potentially benefit from your approach. In any case, I'd be interested to learn more.

@jtoomim


jtoomim commented Jul 29, 2018

@bissias I was having some issues getting the decoding to work properly with my older (and more ambitious) encoding scheme, so I made a version that uses a much simpler encoding scheme and uploaded that.

https://github.com/jtoomim/orderencode

The new version seems to produce encodings that are about 25% bigger, or around 600 bytes for a 2000-tx block, but that's still not too bad. If there is interest in greater efficiency, I can work on fixing the bugs and writing an explanation for the older version.

The new encoding works like this:

First, generate a list of offsets. If the transaction at index 5 in the block is at index 20 in the list sorted by feerate, then store a value of 15 into position 5 in the offsets list.

Second, identify repeating offsets and encode them as (count, offset) pairs. If your offset list is [0, 0, 0, 4, 1, 1, 1, -4], you can summarize that as [(3,0), (1,4), (3,1), (1,-4)] since you have three 0s, followed by one 4, etc.

Third, since most of the counts will have a value of 1 (i.e. a single transaction out of place, followed by one or more transactions from a different place), we can reencode the counts by generating a bitmap of which counts are equal to 1. Any count that is not equal to 1 is encoded separately in a list of residual counts.

Last, encode all of the residuals and the offset values as signed varints and serialize them. I didn't bother with this step in the python proof of concept.
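
A toy sketch of the first three steps, using the example offsets above (my own illustration, not code from the linked repo):

    // Step 1 output: offsets between block order and feerate order.
    std::vector<int> offsets = {0, 0, 0, 4, 1, 1, 1, -4};

    // Step 2: run-length encode into (count, offset) pairs.
    std::vector<std::pair<int, int>> runs;
    for (int off : offsets)
    {
        if (!runs.empty() && runs.back().second == off)
            runs.back().first++;
        else
            runs.push_back({1, off});
    }
    // runs == {(3,0), (1,4), (3,1), (1,-4)}

    // Step 3: bitmap of which counts equal 1; the rest become residuals.
    std::vector<bool> countIsOne;
    std::vector<int> residualCounts;
    for (const auto &r : runs)
    {
        countIsOne.push_back(r.first == 1);
        if (r.first != 1)
            residualCounts.push_back(r.first);
    }
    // Step 4 would serialize the offsets and residuals as signed varints.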

@bissias


Contributor

bissias commented Jul 29, 2018

Thanks for putting that repository together @jtoomim. I've been talking with Darren Tapp from Dash research, and I think this might be useful for their implementation. I will definitely let you know if they head down that path.
