Re-issue getblocks to the next suitable peer when the previously selected one disappears #1271

Closed
wants to merge 1 commit into
from

Projects

None yet

8 participants

@rebroad
rebroad commented May 12, 2012

Fixes Issue #1234 - re-issues getblocks to the next suitable peer when the previously selected one disappears.

edit@laanwj: clarified title, was "issue1234"

@sipa sipa and 1 other commented on an outdated diff May 12, 2012
.gitignore
@@ -20,3 +20,5 @@ qrc_*.cpp
*.pro.user
#mac specific
.DS_Store
+/build/
+/.git.freshlycloned/
@sipa
sipa May 12, 2012 Bitcoin member

That doesn't seem very necessary.

@rebroad
rebroad May 12, 2012

it's not.. I've removed this and made a new fixup commit now. Just need to squash(?) the commits together now.. @sipa, can you remind me the git command please?

(the /build/ line is needed though, right? without it, git status reports on the contents of the build directory...)

@sipa
Member
sipa commented May 12, 2012

git fetch upstream; git rebase -i upstream/master; (in the opened editor, move the line of the fixup commit up to below the gitognore commit, and change the 'pick' to 'fixup', then save, and force-push the resulting branch)

@sipa
Member
sipa commented May 12, 2012

with force-push i mean: git push -f origin

@rebroad
rebroad commented May 12, 2012

@sipa thankyou.. now 1 commit.

@jgarzik
Member
jgarzik commented May 13, 2012
  1. Thread safety of 'nAskedForBlocks' ? Accessed in both ProcessMessage() and CNode::CloseSocketDisconnect()

  2. fAskedForBlocks should be set to false, if found to be true in CNode::CloseSocketDisconnect()

@rebroad
rebroad commented May 13, 2012

@jgarzik 1) accessed in both, yes. thread safety - not needed from what I can tell, but please feel free to explain why you think it is. 2) No. Once asked for blocks, it's true. It never becomes false, since the past cannot be changed. Any new CNode, it's set to false on creation.

@sipa
Member
sipa commented May 13, 2012
  1. you modify the value from two threads (message handler thread and socket handler thread)
  2. Changing it to false when decrementing nAskedForBlocks would make it obvious that it cannot be decremented twice (even though that probably is already the case)
@gmaxwell
Member

Of course, this only helps if the peer is actually disconnected. If he just becomes slow/unresponsive, or he simply doesn't have the chain past a certain point ... we'll still be waiting. It's a small enough change that I guess it makes sense for now, though at some point I think we need to move to something like this: https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync

@rebroad
rebroad commented May 14, 2012

@gmaxwell there's certainly room for improvement, but this is a small (intentionally, to increase the change of it bring pulled) step towards making it get the blocks more quickly. This particular change has been tested in my fork for over a month, but I've also got other code that checks for stuck downloads (which then timeout's the askfor and asks elsewhere). Currently it's not ideal in that it often causes the same blocks to be downloaded from several nodes (as sometimes they do wake up again), so isn't as bandwidth efficient as I'd like it to be, and it also has various timeouts hardcoded, which is based on my internet connection. My eventual plan is to a patch that will determine the speed of the network and peers over time, and factor that knowledge into the block download process.

@sipa, I see what you're saying. it does get modified from those two places. It seems to work though. Are you saying it could end up being two different values within the two different threads? I've never noticed this happen during over a month of testing so far. I could move the nAskedForBlocks++ code from ProcessMessages() into the socket handler thread. It probably belongs there more than ProcessMessages() now anyway, since it's not part of the strCommend == "version" code any longer. I was going to do this as a later pull, since as it is it works, and is less of a change to keep it in ProcessMessages(), and move it elsewhere later.

@rebroad rebroad and 1 other commented on an outdated diff May 14, 2012
.gitignore
@@ -20,3 +20,4 @@ qrc_*.cpp
*.pro.user
#mac specific
.DS_Store
+/build/
@rebroad
rebroad May 14, 2012

out of interest, how come I needed to add this line, but no one else seems to need it?

@luke-jr
luke-jr May 18, 2012 Bitcoin member

My directory is so cluttered that I use "git status -uno" ;)

@TheBlueMatt TheBlueMatt and 1 other commented on an outdated diff May 20, 2012
@@ -2402,6 +2392,20 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv)
}
+ // Ask the first connected node for block updates
+ if (!pfrom->fClient && !pfrom->fOneShot &&
+ (pfrom->nVersion < NOBLKS_VERSION_START ||
+ pfrom->nVersion >= NOBLKS_VERSION_END) &&
+ (nAskedForBlocks < 1 || vNodes.size() <= 1))
+ {
+ nAskedForBlocks++;
+ pfrom->fAskedForBlocks = true;
+ printf("initial getblocks to %s\n", pfrom->addr.ToString().c_str());
+ pfrom->PushGetBlocks(pindexBest, uint256(0));
+ }
+
+
+ if (strCommand == "version") ;
@TheBlueMatt
TheBlueMatt May 20, 2012 collaborator

Why was this moved to outside of all the if statements, and why the addition of the if statement here? Rebase error?

@rebroad
rebroad May 21, 2012

Not a rebase error. It's a way for the if statement's elses so continue working. i.e. without the re-if, the unknown command code would become true. Strictly speaking this code, now that it's no longer strictly part of ProcessMessages() could be moved now to just after the line that calls that. I guess I wanted to keep the change minimal, by keeping the lines as close to their original location as possible.

@TheBlueMatt
TheBlueMatt May 21, 2012 collaborator

It would be simpler to move this block above the if "version" block, that way you wouldnt have to add another if statement. Also, correct me if I'm wrong, but the way its written here, instead of waiting for a new block to come in after being disconnected from the first node, you are now just waiting for any message. Though tx inv messages come in quite often, it may make more sense to move this block to the disconnect handler, where it would be called much quicker after a node disconnects (though I havent looked at locking implications of that). Wherever it ends up going, you need to make sure you have already exchanged version messages with the target node.

@rebroad
rebroad May 21, 2012

@TheBlueMatt it was originally above the if version block, but then it would miss an opportunity to fire off right after that version message, which would be an ideal time. It can't go into the disconnect handler, as then for which node would it be sending the getblocks?

@TheBlueMatt
TheBlueMatt May 21, 2012 collaborator

Firing off right after the version message is ideal when you want to fire off right after a new connection, but if you just dropped a connection and just want to fire off on any other random node right away, it shouldnt matter (because its very likely that you already have one or more connections that have already exchanged version messages). If you move it to the disconnect handler, you can just chose a random node from vNodes (given that the selected node has already passed the version message exchange stuff).

@rebroad
rebroad May 21, 2012

@TheBlueMatt In my experience, a significant number of the connected nodes can be stale, and non-responsive, therefore it makes far more sense to fire off the getblocks to a node that's still communicating. Picking a random node wouldn't achieve this, but picking a node that's sending messages would.

@TheBlueMatt
TheBlueMatt May 21, 2012 collaborator

Fair enough, though that probably also means you need to decrease your timeouts (also, maybe we should put those pong messages to good use...).
In any case, I would say throw that block at the end of ProcessMessage so that a. it cleans up the diff (no new if statements, and the if/elses become one block again and b. as you point out, let the if "version" block complete before running this block. Finally, make sure you have exchanged version messages with the target node before asking for blocks, and then Id say this looks good to me.

@sipa sipa and 1 other commented on an outdated diff Jun 12, 2012
src/net.cpp
@@ -526,7 +526,14 @@ void CNode::CloseSocketDisconnect()
fDisconnect = true;
if (hSocket != INVALID_SOCKET)
{
- printf("disconnecting node %s\n", addrName.c_str());
+ if (fDebug)
+ printf("%s ", DateTimeStrFormat("%x %H:%M:%S", GetTime()).c_str());
@sipa
sipa Jun 12, 2012 Bitcoin member

why print the time?

@rebroad
rebroad Jun 14, 2012

I have no idea at all! I don't know how this got in here! I'll remove it.

@TheBlueMatt
Collaborator

Note that though I think this pull is good and should be added to specifically fix #1234, it appears that the motivation for this patch is to fix an issue where some ISPs (specifically @rebroad's) are closing connections without RSTing them after a certain amount of time and I would kinda like to see a specific fix for that to fix the underlying issue here instead of just working around it.

@TheBlueMatt
Collaborator

For some reason really old pulls don't show up in github's API (the total pull count returned seems to be limited, but its not mentioned in the API Docs...), so pull tester won't test this or any old pulls (maybe it will if the total pull count decreases?). If anyone needs this, or any older pull explicitly tested, I can run it manually.

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/9543ca35cab6dfc53de84cb59dc4aedcc9253c09 for binaries and test log.

@BitcoinPullTester

Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/8ce572c8ad0960ff08a577f1ed2bf49ed0108ab0 for binaries and test log.

This could happen for one of several reasons:

  1. It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
  2. It does not build on either Linux i386 or Win32 (via MinGW cross compile)
  3. The test suite fails on either Linux i386 or Win32
  4. The block test-cases failed (lookup the first bNN identifier which failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)
@luke-jr
Member
luke-jr commented Nov 13, 2012

This needs to be rebased.

@luke-jr
Member
luke-jr commented Nov 25, 2012

Did you keep the branch name the same and push it to github? Looks like maybe you pushed it as "master" instead of "issue1234"

@rebroad
rebroad commented Nov 25, 2012

I did:

git fetch upstream master
git checkout issue1234
git rebase -i upstream/master
git push --force origin issue 1234

The rebase didn't require any manual intervention, which I was surprised by, so I'm wondering if I did something wrong before the push...

@luke-jr
Member
luke-jr commented Nov 25, 2012

Any errors?

@rebroad
rebroad commented Nov 25, 2012

none

@luke-jr
Member
luke-jr commented Nov 25, 2012

Is your remote actually named "upstream"? The default is "origin".

@sipa sipa referenced this pull request Nov 25, 2012
Closed

Block download management #2034

@rebroad
rebroad commented Dec 20, 2012

rebased as requested. tested too... this patch is still helping with stale connections. reducing the time block download is delayed by 7 minutes on average (based on a wait of 10 minutes for the next block, compared with a wait of 3 minutes for a stale node to timeout).

@rebroad
rebroad commented Dec 20, 2012

@TheBlueMatt just to clarify. This patch has nothing to do with ISPs that RST connections. It's needed for all ISPs for where any connection goes stale and eventually (after about 3 minutes in my last tests) times out.

@rebroad rebroad referenced this pull request Dec 20, 2012
Closed

Parallel block download #1326

@rebroad rebroad Fixes Issue #1234 - re-issues getblocks when current node providing t…
…hem disappears.

Conflicts:

	src/net.cpp

Conflicts:

	.gitignore
9fc4c03
@gavinandresen
Member

Closing this, mostly replaced by sipa's "make sure you always have a peer to sync to" patch.

@rebroad rebroad deleted the rebroad:issue1234 branch Sep 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment