-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request Manager fixes for out of order blocks , thinblocks and regtest workaround #312
Conversation
946e32a
to
e0d2d8a
Compare
I would like to see block download be more like bittorrent where blocks can be downloaded out-of-order like bittorrent file sections are pulled out of order. This is the purpose of the request manager. If subsystems (seems like most of them right now) can't handle out of order blocks, we could create a cache between the out of order download and the subsystem. And then we can work to remove the ordering requirement in the subsystems. This is important for several reasons:
So I think that the in-order block request part of this PR likely doesn't solve the problem, it just makes it less likely. But it also makes it MUCH less likely in a very controlled single machine regtest environment (because there are no network bandwidth constraints), so we may be getting a skewed idea of its efficacy in a real network. Instead, we need to ensure that blocks are presented to the system in order AFTER they are received. |
"So I think that the in-order block request part of this PR likely doesn't solve the problem, it just makes it less likely. But it also makes it MUCH less likely in a very controlled single machine regtest environment (because there are no network bandwidth constraints), so we may be getting a skewed idea of its efficacy in a real network. Instead, we need to ensure that blocks are presented to the system in order AFTER they are " You are correct it doesn't necessarily solve it 100% of the time, but , close enough to be very valuable IMO. It does fix the hangups during IBD, which is why it fixed the problems in regtest, because that is where we do a lot of IBD - we mine 100 blocks at a time and then sync...etc. If you run the regression tests with this patch you'll see how we don't get any of the 5 second delays (which are 30 secs on mainnet)...and then run the regression tests without and you'll get a lot hick ups, most obviously seen in the very first regression test but they also happen in other tests as well. But, there is another deeper issue with out of order blocks which can happen when two blocks are mined one just after the other, or lets say, during startup, where we are two blocks behind...then we request a block from different nodes, but the node giving us the first block disconnects and we end up getting the second block/header first. That's a different issue that needs a different solution. As you suggest , we need some sort of cache for that. I think in this particular case though we would need to cache the headers, not the blocks...(as long as we have correctly linked headers then Bitcoin will already cache any blocks on disk without invalidating them). But I think this other issue needs more discussion and a different PR for that. |
https://travis-ci.org/BitcoinUnlimited/BitcoinUnlimited/jobs/207134836#L1503 for some reason the last step of Is there anything in the code to justify the test to be so slow? |
no, i didn't add any unit tests ...
…On 02/03/2017 1:36 PM, Andrea Suisani wrote:
https://travis-ci.org/BitcoinUnlimited/BitcoinUnlimited/jobs/207134836#L1503
for some reason the last step of |make check| spent more than 10
minutes without producing any output, hence travis timed out.
Is there anything in the code to justify the test to be so slow?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#312 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AMRF0Mz8EtPStiRzhYNf-EMNbbHoVQoOks5rhzZ2gaJpZM4MGotP>.
|
c50f07b
to
260a754
Compare
once PR 333 is merged I can rebase this and take it out of WIP |
src/requestManager.cpp
Outdated
next = item.availableFrom.front(); // Grab the next location where we can find this object. | ||
item.availableFrom.pop_front(); | ||
if (next.node != NULL) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to fix indentation, at least do it right.
src/requestManager.cpp
Outdated
LogPrint("thin", "Requesting Thinblock %s from peer %s (%d)\n", inv2.hash.ToString(), pfrom->addrName.c_str(),pfrom->id); | ||
return true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congratulation, you broken the whole indentation. Which is bad in itself, but has the added benefit of making the diff completely impossible to review. Please fix.
src/requestManager.h
Outdated
|
||
// keeps track of the order of block requests so we can iterate through | ||
// the mapBlkInfo in the order that block INV or HEADERS were received. | ||
// This keeps blocks from returning out of order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that a problem ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of order blocks can and do cause unnecessary timeouts and re-requests during IBD. Imagine you've just request 8 large blocks from a node, with the youngest block last...now you have to wait for all 8 blocks to download before you can connect any of them...now what if that node is also a slow downloader ...now you're stuck, when you could have been doing useful work , until you get a timeout and end up re-requesting the blocks from another node. This doesn't solve all the problems to do with out of order blocks such as when two blocks are mined closely in time and inv's/headers show up out of order, however, it does solve the problem most often seen.
now that we're fixing style globally, i'm going to take this indentation fix out...it's not really part of the PR anyway. |
Use a vector of block hashes to keep track of when a block INV was received so we can request the blocks in the order that inventory was received. This will help blocks from being requested and arriving out of order. -Make sure the lockstack is updated correctly use ENTER and LEAVE critical sections to make sure the lockstack is updated correctly and we can properly detect potential deadlocks. -Add cs_vNodes lock on txReqLatency
This fixes a bug in the request manager when we try to request two thinblocks at the same time from the same peer. We can not currently handle to thinkblocks from the same peer concurrently and so what was happening was we would not request the thinblock but the request manager was removing the block source. By using a bool return we can check to see if the thinkblock was actually requested and then update the block source accordingly. This but would cause frequent hangups in during regression testing and was another cause of blocks being re-requested. - Remove regtest workaround for retry interval With the above fix for requesting thinblocks we no longer need the regtest workaround for requesting blocks.
@gandrewstone is this PR ready to be committed? If yes I would vote also for a backport to |
closing for now |
Several items here:
Blocks are now requested in Request manager in the correct order, or rather the order than INV's were received. This fixes some issues with regression tests and is more efficient at connecting blocks because they no longer have to be written to disk, then read from disk to be connected.
If two thinblocks are requested from the same peer, only one will actually be requested but the block source to the second will get deleted. This behavior would cause frequent hangups on regtest and cause block to be re-requested in the future. Instead we do not delete the block source if we we're unable to actually make the request.
With the two fixes above the "regtest" workaround is no longer needed for the block re-request intterval.