Headers-first synchronization #4468

Merged
merged 11 commits into from Oct 17, 2014

sipa
Member

@sipa sipa commented Jul 5, 2014

Here's a first (well, second) version of a fully-functional headers-first synchronization.

Many changes:

  • Do not use 'getblocks', but 'getheaders', and use it to build a headers tree.
  • Blocks are fetched in parallel from all available outbound peers, using a limited moving window. When one peer stalls the movement of the window, it is disconnected.
  • No more orphan blocks. At all. We only ever request a block for which we have verified the headers, and store it to disk immediately. This means that a disk-fill attack would require PoW.
  • Some new fields in getpeerinfo:
    • 'syncheight': the height up to which we've validated this peer's headers.
    • 'commonheight': the height up to which we have all blocks in common with this peer.
    • 'inflight': the heights we're currently requesting from this peer.
  • Require protocol version 31800 for every peer (released in December 2010).
  • No more syncnode (we sync from everyone we can, though limited to 1 during initial headers sync).
  • Introduce some extra named constants and comments.
  • Reindexing support for out-of-order blocks on disk.
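
The moving-window parallel fetch described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in this pull request: `Peer`, `AssignBlocks`, `kWindowSize` and `kMaxPerPeer` are hypothetical names and values, heights stand in for block hashes, and assignment is a plain greedy fill.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical limits, chosen for illustration only.
static const int kWindowSize = 8;          // how far past the last connected block we may request
static const std::size_t kMaxPerPeer = 3;  // maximum blocks in flight per peer

struct Peer {
    int id;
    int bestKnownHeight;     // highest header height this peer is known to have
    std::set<int> inFlight;  // heights currently requested from this peer
};

// Assign heights in (lastConnected, lastConnected + kWindowSize] that are neither
// stored nor already requested to the first peer with a free slot.
// Returns peer id -> heights newly requested from that peer.
std::map<int, std::vector<int>> AssignBlocks(std::vector<Peer>& peers,
                                             int lastConnected,
                                             const std::set<int>& alreadyHave)
{
    std::set<int> requested;
    for (const Peer& p : peers)
        requested.insert(p.inFlight.begin(), p.inFlight.end());

    std::map<int, std::vector<int>> result;
    const int windowEnd = lastConnected + kWindowSize;
    for (int h = lastConnected + 1; h <= windowEnd; ++h) {
        if (alreadyHave.count(h) || requested.count(h)) continue;
        for (Peer& p : peers) {
            if (p.inFlight.size() >= kMaxPerPeer || p.bestKnownHeight < h) continue;
            p.inFlight.insert(h);
            requested.insert(h);
            result[p.id].push_back(h);
            break;
        }
    }
    return result;
}
```

Because nothing beyond the window is ever requested, a peer that sits on the lowest outstanding height eventually prevents any further assignment, which is the situation in which the list above says the peer is disconnected.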

@gmaxwell
Contributor

gmaxwell commented Jul 6, 2014

While syncing with this code from a pair of local peers:

2014-07-06 04:26:28 UpdateTip: new best=000000001ee1d3053357a374d6d9746e80d56a666f3827c92e80e0d1b3f2f2a1 height=1888 log2_work=42.883429 tx=1917 date=2009-01-26 03:21:12 progress=0.000023
2014-07-06 04:26:28 nActualTimespan = 942027 before bounds
2014-07-06 04:26:28 GetNextWorkRequired RETARGET
2014-07-06 04:26:28 Params().TargetTimespan() = 1209600 nActualTimespan = 942027
2014-07-06 04:26:28 Before: 1a016164 0000000000000161640000000000000000000000000000000000000000000000
2014-07-06 04:26:28 After: 1a011337 000000000000011337c4f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c2
2014-07-06 04:26:28 ERROR: AcceptBlock() : prev block not found
2014-07-06 04:26:28 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-07-06 04:26:28 Misbehaving: 192.168.42.76 (0 -> 10)
2014-07-06 04:26:28 UpdateTip: new best=00000000bf3fc4c4ab6737df907f613b5aa86373d426b5e86a1009030852f129 height=1889 log2_work=42.884193 tx=1918 date=2009-01-26 03:26:22 progress=0.000023

(note the Misbehaving)

Also, it seems to only be pulling from one so far:

{
    "addr" : "192.168.42.76",
    "services" : "0000000000000001",
    "lastsend" : 1404621016,
    "lastrecv" : 1404621016,
    "bytessent" : 7131512,
    "bytesrecv" : 131870355,
    "conntime" : 1404620769,
    "pingtime" : 0.07961100,
    "version" : 70002,
    "subver" : "/Satoshi:0.9.99/",
    "inbound" : false,
    "startingheight" : 309423,
    "banscore" : 10,
    "syncheight" : 309424,
    "commonheight" : 113772,
    "inflight" : [
        113773,
        113774,
        113775,
        113776,
        113777,
        113778,
        113779,
        113780,
        113781,
        113782,
        113783,
        113784,
        113785,
        113786,
        113787,
        113788
    ]
},
{
    "addr" : "192.168.42.87",
    "services" : "0000000000000001",
    "lastsend" : 1404621015,
    "lastrecv" : 1404621009,
    "bytessent" : 1442,
    "bytesrecv" : 2342,
    "conntime" : 1404620769,
    "pingtime" : 0.04079200,
    "version" : 70002,
    "subver" : "/Satoshi:0.9.99/",
    "inbound" : false,
    "startingheight" : 309423,
    "banscore" : 0,
    "syncheight" : -1,
    "commonheight" : -1,
    "inflight" : [
    ]
}

It started pulling from it eventually, perhaps when a block came in?

@gmaxwell
Contributor

gmaxwell commented Jul 6, 2014

2014-07-06 04:38:07 Leaving block file 0: CBlockFileInfo(blocks=119950, size=134216038, heights=0...309426, time=2009-01-03...2014-07-06)
2014-07-06 04:38:37 Leaving block file 1: CBlockFileInfo(blocks=11284, size=134206314, heights=119942...131232, time=2011-04-24...2011-06-16)

The first range looks wrong. :)

@sipa
Member Author

sipa commented Jul 6, 2014

@gmaxwell The 'prev block not found' should be fixed; there is a code path to fetch blocks directly (ignoring the headers-based fetching), in case we're very close to being synced (to avoid an extra roundtrip for newly inv'ed blocks)... but the time comparison used < instead of >.

@gmaxwell
Contributor

gmaxwell commented Jul 6, 2014

3hr 46 minute resync over the network here. I'm a bit confused that the ping time to the two peers I was pulling from was still >10 seconds even when the sync was well into the cpu bound part (for their own part, the peers were fairly low on cpu utilization).

@sipa
Member Author

sipa commented Jul 6, 2014

4h21m here, from random peers, and default dbcache.

@laanwj
Member

laanwj commented Jul 7, 2014

Woohoo!
Testing...

map<uint256, pair<NodeId, list<QueuedBlock>::iterator> >::iterator itInFlight = mapBlocksInFlight.find(hash);
if (itInFlight != mapBlocksInFlight.end()) {
    CNodeState *state = State(itInFlight->second.first);
    state->vBlocksInFlight.erase(itInFlight->second.second);
    state->nBlocksInFlight--;
    if (itInFlight->second.first == nodeFrom)
        state->nLastBlockReceive = GetTimeMicros();
    state->nStallingSince = 0;
Contributor

Why wait until a complete block is received before marking it as not stalling? Wouldn't it be better to consider it as not stalling as long as a block is being downloaded (even if it takes a while on a slow connection)?

Member Author

"Stalling" is already a rather strong condition: it means all blocks in flight are from a single peer, and we can't ask any peer for other blocks because they are outside the download window. It does not just mean that we're not receiving anything from this peer; it means we can't download anything from anyone because of this peer. The 2-second delay before actually disconnecting is to give us some chance to act on blocks that were already received while we were busy.

That said, this is certainly not perfect. We'll need some tracking of peers' speed and latency to adjust window sizes, for example. I really just want something that works reasonably well for now.
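
The stalling rule described in this reply can be sketched as a small state machine. This is a hedged illustration only: `PeerState`, `MarkStalling`, `MarkBlockReceived`, `ShouldDisconnect` and `kStallTimeoutMicros` are hypothetical names; only `nStallingSince` is taken from the quoted diff, and the 2-second value comes from the reply above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-peer state; nStallingSince mirrors the field in the quoted diff.
struct PeerState {
    int64_t nStallingSince = 0;  // microseconds; 0 means "not stalling"
};

static const int64_t kStallTimeoutMicros = 2 * 1000000;  // 2-second grace period

// Called when no block can be requested from any peer because everything
// inside the download window is already in flight from this one peer.
void MarkStalling(PeerState& staller, int64_t nowMicros)
{
    if (staller.nStallingSince == 0)
        staller.nStallingSince = nowMicros;  // keep the earliest timestamp
}

// Called when a complete block arrives from the peer.
void MarkBlockReceived(PeerState& peer)
{
    peer.nStallingSince = 0;
}

// True once the peer has been blocking the window for longer than the grace period.
bool ShouldDisconnect(const PeerState& peer, int64_t nowMicros)
{
    return peer.nStallingSince != 0 &&
           nowMicros - peer.nStallingSince > kStallTimeoutMicros;
}
```

A slow peer that is not blocking the window never has `MarkStalling` called on it in this sketch, so it is never disconnected for being slow alone.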

@laanwj
Member

laanwj commented Jul 7, 2014

Synced in 3:43 here, and with lots of other things running on the same computer.

I do get this error when running with -checkblocks=0 -checklevel=4 afterwards (after shutting down with 'stop'):

2014-07-07 10:48:02 Verifying last 309612 blocks at level 4
2014-07-07 10:48:03 ERROR: ReadFromDisk : Deserialize or I/O error - CAutoFile::read : end of file
2014-07-07 10:48:03 ERROR: VerifyDB() : *** found bad undo data at 309606, hash=000000000000000013e7146f5a1f63a188935bedc491b8b5bf5ab823f6ec0d9c

Oh, I get the same without any checklevel options. I'll keep this copy of the blocks data and database in case it's interesting for debugging.

if (Params().NetworkID() == CBaseChainParams::REGTEST ||
    chainActive.Tip()->GetBlockTime() > GetAdjustedTime() - Params().TargetSpacing() * 20) {
    vToFetch.push_back(inv);
    MarkBlockAsInFlight(pfrom->GetId(), inv.hash);
Member Author

We already download every block that peers advertise. This PR changes it to first ask for headers, and to only ask for the block immediately if we're close to being synced. Validating the header first would require an extra roundtrip, which can hurt network propagation time.
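
The "close to being synced" test in the quoted diff can be isolated as a small predicate. This is a minimal sketch under assumptions: `ShouldFetchDirectly` and its parameters are hypothetical names, and the regtest special case from the diff is left out. The factor of 20 target spacings is taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the active tip is recent enough that an announced block
// should be requested directly instead of waiting for its header first.
// targetSpacing is the expected number of seconds between blocks
// (600 on Bitcoin mainnet).
bool ShouldFetchDirectly(int64_t tipTime, int64_t now, int64_t targetSpacing)
{
    // The tip must be newer than "now minus 20 block intervals".
    // Note the direction of the comparison: an earlier comment in this thread
    // reports a bug where < was used instead of >.
    return tipTime > now - targetSpacing * 20;
}
```

With a 600-second spacing, the direct path applies only when the tip is less than 12000 seconds (200 minutes) old.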

@sipa
Member Author

sipa commented Jul 8, 2014

Changed the code to use block-first-seen rather than header-first-seen to distinguish between equal-work branches.
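
One way to picture this tie-break is a comparator that orders candidate tips by cumulative work and, when work is equal, by the order in which the full block was first received. The sketch below is an assumption-laden illustration: `Candidate`, `chainWork`, `sequenceId` and `WorseThan` are hypothetical names, and cumulative work is shrunk to a 64-bit integer for brevity.

```cpp
#include <cassert>
#include <cstdint>

struct Candidate {
    uint64_t chainWork;   // cumulative work up to this block (simplified to 64 bits)
    uint32_t sequenceId;  // order in which the full block was first received; lower = earlier
};

// Returns true if a is a worse tip candidate than b.
bool WorseThan(const Candidate& a, const Candidate& b)
{
    if (a.chainWork != b.chainWork)
        return a.chainWork < b.chainWork;  // more work always wins
    return a.sequenceId > b.sequenceId;    // equal work: the later-received block loses
}
```

Keying the tie-break on block arrival rather than header arrival means a branch whose header was seen first but whose block data arrived later does not win over a branch that was fully received earlier.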

@sipa
Member Author

sipa commented Jul 11, 2014

Rebased on top of #4496 and #4497 (which were already included here).

@sipa
Member Author

sipa commented Jul 11, 2014

Moved RPC changes to a separate commit (now the actual headers-first commit is a net negative in lines, while adding comments!).

@laanwj laanwj added this to the 0.10.0 milestone Jul 14, 2014
@gmaxwell
Contributor

Needs rebase.

@sipa
Member Author

sipa commented Jul 14, 2014

Rebased.

@laanwj
Member

laanwj commented Jul 15, 2014

I retried my testing, now with a non-corrupting destination, and found no issues. I tickled it in various ways with -checklevel -checkblocks and it completed fine.

@rebroad
Contributor

rebroad commented Jul 16, 2014

Ok, I have a thought/question. With this pull, won't the average height of the average node effectively be lower than without it? I.e., won't it be detrimental to the bitcoin network?

@gmaxwell
Contributor

@rebroad I can't figure out why you'd think that. Once synchronized the heights of all nodes will be the best available to them, and this change makes the system synchronize much faster.

@rebroad
Contributor

rebroad commented Jul 17, 2014

It's not the headers first aspect that makes them sync faster but rather the use of concurrent block downloads. The getting of headers first and the downloading of blocks that aren't necessarily in the best chain will delay nodes getting up to date.

@laanwj
Member

laanwj commented Jul 17, 2014

I'm not convinced by your claim @rebroad. Let's look at the evidence here: with this code, new nodes get up to speed much faster. With a decent internet connection this is scarcely longer than a -reindex would take. I don't see one single case of a user reporting that this makes a node get up to date slower.

@gmaxwell
Contributor

@rebroad Part of the whole point is that it can use the headers to determine the best chain (with very high probability— only invalidated if the majority hashrate chain is invalidated) very fast, then it only downloads blocks in the best chain. New blocks at the tip are downloaded like they've always been.
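
The idea of picking the best chain from headers alone, and only then downloading blocks along it, can be sketched as follows. This is an illustrative simplification, not the PR's data structures: `Header`, `BestChain` and the string hashes are hypothetical, per-header work is a plain integer, and every non-genesis parent is assumed to be present in the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Header {
    std::string hash;
    std::string prev;  // empty for the genesis header
    uint64_t work;     // work contributed by this header alone
};

// Returns the hashes of the chain with the most cumulative work, genesis first.
// Assumes every header's parent (except genesis) is present in `headers`.
std::vector<std::string> BestChain(const std::vector<Header>& headers)
{
    std::unordered_map<std::string, const Header*> byHash;
    for (const Header& h : headers) byHash[h.hash] = &h;

    std::unordered_map<std::string, uint64_t> total;  // memoized cumulative work
    const Header* best = nullptr;
    uint64_t bestWork = 0;
    for (const Header& h : headers) {
        // Walk back to the nearest ancestor whose cumulative work is known.
        std::vector<const Header*> path;
        const Header* cur = &h;
        while (cur && !total.count(cur->hash)) {
            path.push_back(cur);
            cur = cur->prev.empty() ? nullptr : byHash.at(cur->prev);
        }
        uint64_t w = cur ? total[cur->hash] : 0;
        for (auto it = path.rbegin(); it != path.rend(); ++it) {
            w += (*it)->work;
            total[(*it)->hash] = w;
        }
        const uint64_t hw = total[h.hash];
        if (!best || hw > bestWork) { best = &h; bestWork = hw; }
    }

    // Walk from the best tip back to genesis, then reverse into download order.
    std::vector<std::string> chain;
    for (const Header* cur = best; cur;
         cur = cur->prev.empty() ? nullptr : byHash.at(cur->prev))
        chain.push_back(cur->hash);
    std::reverse(chain.begin(), chain.end());
    return chain;
}
```

The returned list is the set of blocks worth requesting; headers on lower-work branches never appear in it, so their blocks are never downloaded during initial sync in this sketch.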

@laanwj
Member

laanwj commented Jul 17, 2014

Right, this mostly avoids downloading blocks not on the best chain unlike before. No more orphans...

reddink pushed a commit to reddcoin-project/reddcoin that referenced this pull request Jul 11, 2020
Remember out-of-order block headers along with disk positions. This is
likely the simplest and least-impact way to make -reindex work with
headers first.

Based on top of bitcoin#4468.

(cherry picked from commit ad96e7c)

# Conflicts:
#	src/main.cpp
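
The approach named in this commit message, remembering out-of-order blocks together with their disk positions, might look roughly like the sketch below. It is a hypothetical reconstruction rather than the commit's code: `DiskBlock`, `ReindexOrder`, the string hashes and the `long` offsets are all illustrative assumptions.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

struct DiskBlock {
    std::string hash;
    std::string prev;
    long pos;  // hypothetical byte offset of the block in the block file
};

// Scans blocks in on-disk order and returns their offsets in an order in which
// each block's parent is processed before the block itself. `known` starts with
// the hashes that are already indexed.
std::vector<long> ReindexOrder(const std::vector<DiskBlock>& onDisk,
                               std::set<std::string> known)
{
    std::multimap<std::string, DiskBlock> waiting;  // parent hash -> blocks waiting for it
    std::vector<long> order;
    for (const DiskBlock& b : onDisk) {
        if (!known.count(b.prev)) {
            waiting.emplace(b.prev, b);  // parent not seen yet: remember the position
            continue;
        }
        std::deque<DiskBlock> queue;
        queue.push_back(b);
        while (!queue.empty()) {
            DiskBlock cur = queue.front();
            queue.pop_front();
            known.insert(cur.hash);
            order.push_back(cur.pos);
            // Any blocks that were waiting for this one can now be processed.
            auto range = waiting.equal_range(cur.hash);
            for (auto it = range.first; it != range.second; ++it)
                queue.push_back(it->second);
            waiting.erase(range.first, range.second);
        }
    }
    return order;
}
```

Blocks whose parent never shows up simply stay in `waiting` and are not returned.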
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021