New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Headers-first synchronization #4468

Merged
merged 11 commits into from Oct 17, 2014

Conversation

Projects
None yet
@sipa
Member

sipa commented Jul 5, 2014

Here's a first (well, second) version of a fully-functional headers-first synchronization.

Many changes:

  • Do not use 'getblocks', but 'getheaders', and use it to build a headers tree.
  • Blocks are fetched in parallel from all available outbound peers, using a limited moving window. When one peer stalls the movement of the window, it is disconnected.
  • No more orphan blocks. At all. We only ever request a block for which we have verified the headers, and store it to disk immediately. This means that a disk-fill attack would require PoW.
  • Some new fields in getpeerinfo:
    • 'syncheight': the height up to which we've validated this peer's headers.
    • 'commonheight': the height up to which we have all blocks in common with this peer.
    • 'inflight': the heights we're currently requesting from this peer.
  • Require protocol version 31800 for every peer (released in december 2010).
  • No more syncnode (we sync from everyone we can, though limited to 1 during initial headers sync).
  • Introduce some extra named constants and comments.
  • Reindexing support for out-of-order blocks on disk.
@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 6, 2014

While syncing with this code from a pair of local peers:

2014-07-06 04:26:28 UpdateTip: new best=000000001ee1d3053357a374d6d9746e80d56a666f3827c92e80e0d1b3f2f2a1 height=1888 log2_work=42.883429 tx=1917 date=2009-01-26 03:
21:12 progress=0.000023
2014-07-06 04:26:28 nActualTimespan = 942027 before bounds
2014-07-06 04:26:28 GetNextWorkRequired RETARGET
2014-07-06 04:26:28 Params().TargetTimespan() = 1209600 nActualTimespan = 942027
2014-07-06 04:26:28 Before: 1a016164 0000000000000161640000000000000000000000000000000000000000000000
2014-07-06 04:26:28 After: 1a011337 000000000000011337c4f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c2
2014-07-06 04:26:28 ERROR: AcceptBlock() : prev block not found
2014-07-06 04:26:28 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-07-06 04:26:28 Misbehaving: 192.168.42.76 (0 -> 10)
2014-07-06 04:26:28 UpdateTip: new best=00000000bf3fc4c4ab6737df907f613b5aa86373d426b5e86a1009030852f129 height=1889 log2_work=42.884193 tx=1918 date=2009-01-26 03:26:22 progress=0.000023

(not the Misbehaving)

Also, it seems to only be pulling from one so far:

{
    "addr" : "192.168.42.76",
    "services" : "0000000000000001",
    "lastsend" : 1404621016,
    "lastrecv" : 1404621016,
    "bytessent" : 7131512,
    "bytesrecv" : 131870355,
    "conntime" : 1404620769,
    "pingtime" : 0.07961100,
    "version" : 70002,
    "subver" : "/Satoshi:0.9.99/",
    "inbound" : false,
    "startingheight" : 309423,
    "banscore" : 10,
    "syncheight" : 309424,
    "commonheight" : 113772,
    "inflight" : [
        113773,
        113774,
        113775,
        113776,
        113777,
        113778,
        113779,
        113780,
        113781,
        113782,
        113783,
        113784,
        113785,
        113786,
        113787,
        113788
    ]
},
{
    "addr" : "192.168.42.87",
    "services" : "0000000000000001",
    "lastsend" : 1404621015,
    "lastrecv" : 1404621009,
    "bytessent" : 1442,
    "bytesrecv" : 2342,
    "conntime" : 1404620769,
    "pingtime" : 0.04079200,
    "version" : 70002,
    "subver" : "/Satoshi:0.9.99/",
    "inbound" : false,
    "startingheight" : 309423,
    "banscore" : 0,
    "syncheight" : -1,
    "commonheight" : -1,
    "inflight" : [
    ]
}

It started pulling from it eventually, perhaps when a block came in?

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 6, 2014

2014-07-06 04:38:07 Leaving block file 0: CBlockFileInfo(blocks=119950, size=134216038, heights=0...309426, time=2009-01-03...2014-07-06)
2014-07-06 04:38:37 Leaving block file 1: CBlockFileInfo(blocks=11284, size=134206314, heights=119942...131232, time=2011-04-24...2011-06-16)

The first range looks wrong. :)

@sipa

This comment has been minimized.

Member

sipa commented Jul 6, 2014

@gmaxwell The 'prev block not found' should be fixed; there is a code path to fetch blocks directly (ignoring the headers-based fetching), in case we're very close to being synced (to avoid an extra roundtrip for newly inv'ed blocks)... but the time comparison used < instead of >.

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 6, 2014

3hr 46 minute resync over the network here. I'm a bit confused that the ping time to the two peers I was pulling from was still >10 seconds even when the sync was well into the cpu bound part (for their own part, the peers were fairly low on cpu utilization).

@sipa

This comment has been minimized.

Member

sipa commented Jul 6, 2014

4h21m here, from random peers, and default dbcache.

@laanwj

This comment has been minimized.

Member

laanwj commented Jul 7, 2014

Woohoo!
Testing...

map<uint256, pair<NodeId, list<QueuedBlock>::iterator> >::iterator itInFlight = mapBlocksInFlight.find(hash);
if (itInFlight != mapBlocksInFlight.end()) {
CNodeState *state = State(itInFlight->second.first);
state->vBlocksInFlight.erase(itInFlight->second.second);
state->nBlocksInFlight--;
if (itInFlight->second.first == nodeFrom)
state->nLastBlockReceive = GetTimeMicros();
state->nStallingSince = 0;

This comment has been minimized.

@rebroad

rebroad Jul 7, 2014

Contributor

Why wait until a complete block is received before marking it as not stalling? Wouldn't it be better to consider it as not stalling as long as a block is being downloaded (even if it takes a while on a slow connection)?

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

"Stalling" is already a rather strong condition: it means all blocks in
flight are from a single peer, and we can't ask any peer for other blocks
because they are outside the download window. It does not just mean that
we're not receiving anything from this peer, it means we can't download
anything from anyone because of this peer. The 2 second delay before
actually connecting is to give us some chance to act on blocks that were
already received while we were busy.

That said, this is certainly not perfect. We'll need some tracking of
peers' speed and latency to adjust window sizes, for example. I really just
want something that reasonably well in now.

CNodeState *state = State(nodeid);
assert(state != NULL);
// Make sure it's not listed somewhere already.
MarkBlockAsReceived(hash);
QueuedBlock newentry = {hash, GetTimeMicros(), state->nBlocksInFlight};
if (state->nBlocksInFlight == 0)
state->nLastBlockReceive = newentry.nTime; // Reset when a first request is sent.

This comment has been minimized.

@rebroad

rebroad Jul 7, 2014

Contributor

Are you trying to break #4431 ?!

This comment has been minimized.

@fanquake

fanquake Jul 7, 2014

Member

Sipa already mentioned that headers first would render #4431 almost obselete

This comment has been minimized.

@rebroad

rebroad Jul 7, 2014

Contributor

yes he did, but it's somewhat annoying when so newly introduced variables are removed by the same author in subsequent pulls.

This comment has been minimized.

@laanwj

laanwj Jul 7, 2014

Member

The node / block chain handling code is very much in flux at the moment, you'll have to live with that. I can understand annoyance if it is just some white-space reordering that broke your pull for the zillionth time, but these are difficult and important changes and it's not possible to know everything in advance.

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

I understand, but some temporary code was necessary to get the block
tracking code into the existing (non headers first) mechanism. I think I
mentioned that in the PR or commit that introduced it.

bool ProcessBlock(CValidationState &state, CNode* pfrom, CBlock* pblock, CDiskBlockPos *dbp)
{
// Check for duplicate

This comment has been minimized.

@rebroad

rebroad Jul 7, 2014

Contributor

If duplicate blocks aren't requested, then surely if we receive them this should be considered misbehaviour.

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

Good catch. I can't check the code now, but if there is no other logic for
punishing duplicates, that will need fixing.

This comment has been minimized.

@sipa

sipa Jul 8, 2014

Member

Fixed. Made it a level 20 dossable offense (in AcceptBlock, which already has cs_main).

This comment has been minimized.

@gmaxwell

gmaxwell Sep 4, 2014

Member

oh no, you can't currently punish blocks you didn't request, because when a node finds a block itself, it relays it without INVing. (IIRC)

This comment has been minimized.

@sipa

sipa Sep 4, 2014

Member

That only happens for just-mined blocks - peers shouldn't ever have those already?

This comment has been minimized.

@gmaxwell

gmaxwell Sep 4, 2014

Member

I'm responding to the comments, not the code— if it's only on duplicates and not just non-requested then I think its safe but only if we release note loudly. Right now I think eloipool users will end up GBT submitting blocks to multiple Bitcoinds which will then aggressively relay. It's bandwidth wasteful and it shouldn't be permitted. This is a borderline p2p protocol change, I think.

Though relaynodeclient should do an INV— it's local, this is going to add like... 1ms. And it would also let the client better track when it's the software providing the block. Same with p2pool. Though indeed, whitelisting also covers these cases.

This comment has been minimized.

@rebroad

rebroad Sep 4, 2014

Contributor

@sipa However, you do still need to make sure not to DoS a node for sending a block that you've requested. So far, this logic isn't in there. This happened to me today when I got the inv from the non-miner, but the miner sent the block directly and it arrived from them first. My node requested the block from the node that sent the inv, and then DoSed it when it received it - it was only doing what it was asked to, so shouldn't have been DoSed.

This comment has been minimized.

@sipa

sipa Sep 4, 2014

Member

@rebroad That's a good point. I'm not sure how to deal with that correctly.

  • Making an exception for allowing a duplicate block to be received if we actually requested it is possible, but requires extra logic.
  • Not DoS/disconnecting in case of any duplicate is easy, but doesn't really encourage low bandwidth operation.

Previously my position was that sending-without-inv was safe in case you know the peer does not have the data yet. Your case is a good counterexample for that, and perhaps the right solution is indeed just to argue for a protocol change that makes sending-without-inv illegal.

This comment has been minimized.

@gmaxwell

gmaxwell Sep 4, 2014

Member

Perhaps define that as of protocol X we'll never do that ourselves anymore, and enforce it. Later minimum support advances past protocol X. Seems like such a minor change to bother staging…

This comment has been minimized.

@rebroad

rebroad Sep 4, 2014

Contributor

@sipa more simply, miners can still send the block without it being requested, as long as they send the "inv" just before it (so that nodes don't think they're not receiving the block until it arrives). This way they'll very unlikely get banned since their invs will almost always be received before anyone else's.

The code can be changed easily to do what it did before, which is to record which node sent the block and compare if it was the node that it was requested from. Then just change the DoS logic over to the node that sent it unsolicted rather than the node that the block arrived from later.

The node that sent me the block without an inv was public.au.relay.mattcorallo.com:8335 - so maybe someone can submit a pull request to his software so that invs are sent immediatley before the block is sent.

@TheBlueMatt

@laanwj

This comment has been minimized.

Member

laanwj commented Jul 7, 2014

Synced in 3:43 here, and with lots of other things running on the same computer.

I do get this error when running with -checkblocks=0 -checklevel=4 afterwards (after shutting down with 'stop'):

2014-07-07 10:48:02 Verifying last 309612 blocks at level 4
2014-07-07 10:48:03 ERROR: ReadFromDisk : Deserialize or I/O error - CAutoFile::read : end of file
2014-07-07 10:48:03 ERROR: VerifyDB() : *** found bad undo data at 309606, hash=000000000000000013e7146f5a1f63a188935bedc491b8b5bf5ab823f6ec0d9c

Oh I get the same without any checklevel options. I'll keep this copy of the blocks data and database in case it's interesting for debuging.

@sipa

View changes

src/main.cpp Outdated
if (Params().NetworkID() == CBaseChainParams::REGTEST ||
chainActive.Tip()->GetBlockTime() > GetAdjustedTime() - Params().TargetSpacing() * 20) {
vToFetch.push_back(inv);
MarkBlockAsInFlight(pfrom->GetId(), inv.hash);

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

We already download every block that peers advertize. In PR changes it to
first ask for headers, and only asks for the block immediately if we're
close to being synced. Validating the header first would require an extra
roundtrip, which can hurt network propagation time.

@sipa

View changes

src/main.cpp Outdated
// to avoid an extra round-trip. Note that we must *first* ask for the headers, so by the
// time the block arrives, the header chain leading up to it is already validated. Not
// doing this will result in the received block being rejected as an orphan.
pfrom->PushMessage("getheaders", chainActive.GetLocator(pindexBestHeader), inv.hash);

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

We do, for initial sync (see SendMessages). This code is for dealing with
newly announced blocks so they can propagate.

This comment has been minimized.

@gmaxwell

gmaxwell Jul 7, 2014

Member

It does, under "// Start block sync" in SendMessages(). The code you're commenting on is for newly announced blocks. Not that it's relevant to an inv.hash, so it's not something we can do before even hearing something from the peer.

@@ -2624,93 +2672,25 @@ void CBlockIndex::BuildSkip()
pskip = pprev->GetAncestor(GetSkipHeight(nHeight));
}
void PushGetBlocks(CNode* pnode, CBlockIndex* pindexBegin, uint256 hashEnd)

This comment has been minimized.

@sipa

sipa Jul 7, 2014

Member

We use getheaders instead of getblocks. See SendMessages.

@sipa

This comment has been minimized.

Member

sipa commented Jul 8, 2014

Changed the code to use block-first-seen rather than header-first-seen to distinguish between equal-work branches.

@sipa

This comment has been minimized.

Member

sipa commented Jul 11, 2014

Rebased on top of #4496 and #4497 (which were already included here).

@sipa

This comment has been minimized.

Member

sipa commented Jul 11, 2014

Moved RPC changes to a separate commit (now the actual headers-first commit is a net negative in lines, while adding comments!).

@laanwj laanwj added this to the 0.10.0 milestone Jul 14, 2014

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 14, 2014

Needs rebase.

@sipa

This comment has been minimized.

Member

sipa commented Jul 14, 2014

Rebased.

@laanwj

This comment has been minimized.

Member

laanwj commented Jul 15, 2014

I retried my testing, now with a non-corrupting destination, and found no issues. I tickled it in various ways with -checklevel -checkblocks and it completed fine.

@rebroad

This comment has been minimized.

Contributor

rebroad commented Jul 16, 2014

Ok, I have a thought/question. With this pull, won't it effectively mean that the average height of the average node be less than without this pull? I.e. detrimental to the bitcoin network?

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 16, 2014

@rebroad I can't figure out why you'd think that. Once synchronized the heights of all nodes will be the best available to them, and this change makes the system synchronize much faster.

@rebroad

This comment has been minimized.

Contributor

rebroad commented Jul 17, 2014

It's not the headers first aspect that makes them sync faster but rather the use of concurrent block downloads. The getting of headers first and the downloading of blocks that aren't necessarily in the best chain will delay nodes getting up to date.

@laanwj

This comment has been minimized.

Member

laanwj commented Jul 17, 2014

I'm not convinced by your claim @rebroad. Let's look at the evidence here: with this code, new nodes get up to speed much faster. With a decent internet connection this is scarcely longer than a -reindex would take. I don't see one single case of a user reporting that this makes a node get up to date slower.

@gmaxwell

This comment has been minimized.

Member

gmaxwell commented Jul 17, 2014

@rebroad Part of the whole point is that it can use the headers to determine the best chain (with very high probability— only invalidated if the majority hashrate chain is invalidated) very fast, then it only downloads blocks in the best chain. New blocks at the tip are downloaded like they've always been.

@laanwj

This comment has been minimized.

Member

laanwj commented Jul 17, 2014

Right, this mostly avoids downloading blocks not on the best chain unlike before. No more orphans...

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 12, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 13, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 14, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 16, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 18, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 20, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 23, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 23, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 23, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 24, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 26, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 26, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 26, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 26, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 27, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 27, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Dec 30, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Jan 2, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Jan 12, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Jan 17, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Jan 20, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Jan 20, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Feb 4, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Feb 4, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Oct 6, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Oct 7, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Oct 11, 2017

rebroad added a commit to rebroad/bitcoin that referenced this pull request Oct 11, 2017

@apaulb apaulb referenced this pull request Jun 5, 2018

Closed

Fix network syncing issues #22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment