
Parallel Block Validation to prevent DDOS big block attack #93

Closed

wants to merge 22 commits into base: 0.12.1bu from

6 participants

Conversation
@ptschip
Collaborator

ptschip commented Sep 13, 2016

This is almost finished. It works well enough, but a little more polishing is needed.

I also need a good python regression test for this and some documentation on the design.

@ptschip ptschip changed the title from [WIP] Parallel Block Validation to prevent DDOS big block attack to Parallel Block Validation to prevent DDOS big block attack Oct 8, 2016

@ptschip

Collaborator

ptschip commented Oct 8, 2016

I'm taking this out of WIP. It's ready for code review. All python scripts are working (including pruning.py) with the exception of maxuploadtarget.py, which has been broken for some time. The unit tests all pass as well. There is just one feature update I need to make regarding how the script threads get interrupted, and I also need to add documentation and maybe expand on the "parallel.py" python script. I should have the final additions done by early next week.

@ptschip

Collaborator

ptschip commented Oct 11, 2016

Added documentation, found in /doc/bu-parallel-validation.md.

I just need to add one more feature for quitting the script queue threads and then extend the parallel.py test script.

@ptschip

Collaborator

ptschip commented Oct 12, 2016

Added the ability to quit the script threads, but I still need to be able to do a control.Wait() after forcing the QUIT on the script threads; hopefully that will take care of the conditional wait error I'm seeing at the end of the python script.

Then it's just a matter of getting a more extended python script to work for the case where we have multiple attack blocks to contend with.

@ptschip

Collaborator

ptschip commented Oct 13, 2016

OK, I believe this is fully functional and debugged now.

Just need to append an extra python test or two.

@deadalnix

Collaborator

deadalnix commented Oct 19, 2016

While this is certainly something very good to have, I would argue against merging it into 0.12bu. It is too large a change to get in there without increasing risk for a key release of BU.

@ptschip ptschip changed the title from Parallel Block Validation to prevent DDOS big block attack to [WIP] Parallel Block Validation to prevent DDOS big block attack Oct 20, 2016

Show outdated qa/rpc-tests/excessive.py
Show outdated qa/rpc-tests/p2p-acceptblock.py
@@ -91,6 +91,7 @@ def send_blocks_with_version(self, peer, numblocks, nVersionToUse):
block_time += 1
height += 1
tip = block.sha256
#tip = block.hash


@ftrader

ftrader Oct 23, 2016

Collaborator

dead code that can be removed?

Show outdated qa/rpc-tests/p2p-versionbits-warning.py
Show outdated qa/rpc-tests/p2p-versionbits-warning.py
Show outdated qa/rpc-tests/p2p-versionbits-warning.py
self.nodes[0].keypoolrefill(100)
for i in xrange(200):
    send_to[self.nodes[0].getnewaddress()] = Decimal("0.01")
self.nodes[0].sendmany("", send_to)

@ftrader

ftrader Oct 23, 2016

Collaborator

To me it looks like the 5 lines from this one upward are repeated identically a few times in the following sections.

Am I missing a reason why you're not using a loop over them?

Show outdated qa/rpc-tests/parallel.py
Show outdated src/init.cpp
@deadalnix

Collaborator

deadalnix commented Oct 23, 2016

On a more general note, we should really not have any sleep in there...
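For illustration, a minimal sketch of the sleep-free alternative this seems to point at: have the worker signal a condition variable and have the waiter block on it, instead of a sleep/poll loop. The names here (fDone, WorkerFinished, WaitForWorker) are hypothetical, not taken from the PR.

#include <condition_variable>
#include <mutex>

// Hypothetical shared state: the worker signals completion and the waiter
// blocks on a condition variable rather than sleeping in a polling loop.
std::mutex mtx;
std::condition_variable cv;
bool fDone = false;

void WorkerFinished()
{
    {
        std::lock_guard<std::mutex> lock(mtx);
        fDone = true;
    }
    cv.notify_all(); // wake anything blocked in WaitForWorker()
}

void WaitForWorker()
{
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return fDone; }); // no sleep, no busy-wait
}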

@deadalnix

Collaborator

deadalnix commented Oct 23, 2016

It looks like what you need is a thread pool and a finish flag.

@ptschip

Collaborator

ptschip commented Oct 25, 2016

@ftrader I commented and/or fixed the issues you highlighted.

@ptschip

Collaborator

ptschip commented Oct 25, 2016

@deadalnix I think what you're getting at is: why not put everything in a threadgroup? However, these threads are detached threads. We can't join them; we want them to run until they complete or are otherwise interrupted, without affecting the main processing thread (and we use a semaphore to make sure we don't launch any more until at least one has returned). And since we can't .join() them, there is no use in having a threadgroup at shutdown to make sure the threads have finished (and we can't add detached threads to a group anyway). So instead, I use mapBlockValidationThreads to track all currently running thread information, which is essentially a custom threadgroup for our detached threads.

But you do bring up an issue: there needs to be code in there on shutdown to make sure that all the PV threads, if any, have finished their work. I'll work on that next and update here when ready.
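As a rough sketch of the pattern described here (detached validation threads tracked in a map, with a semaphore capping how many run at once); the helper names and signatures below are illustrative only and do not reproduce the PR's actual code:

#include <boost/thread.hpp>
#include <map>

static const int MAX_PARALLEL_VALIDATIONS = 4;
boost::mutex cs_threadmap;
std::map<boost::thread::id, uint256> mapBlockValidationThreads; // thread id -> hash of block being validated
CSemaphore semPV(MAX_PARALLEL_VALIDATIONS); // at most 4 blocks validating in parallel

void HandleBlockMessageThread(CBlock block) // illustrative signature
{
    {
        boost::lock_guard<boost::mutex> lock(cs_threadmap);
        mapBlockValidationThreads[boost::this_thread::get_id()] = block.GetHash();
    }
    // ... validate the block, periodically checking whether it should self-terminate ...
    {
        boost::lock_guard<boost::mutex> lock(cs_threadmap);
        mapBlockValidationThreads.erase(boost::this_thread::get_id());
    }
    semPV.post(); // free a slot so the next block may start validating
}

void LaunchBlockValidation(const CBlock& block)
{
    semPV.wait(); // blocks here if the maximum number of validations is already running
    boost::thread t(HandleBlockMessageThread, block);
    t.detach(); // runs to completion or interruption, independent of the caller
}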

@deadalnix

Collaborator

deadalnix commented Oct 26, 2016

You have a set of threads that have a set of tasks to do. This is a textbook use case for a thread pool.

These threads each have their own block to process, plus a shared object so they can coordinate. When one of the threads successfully validates a block, it can set a flag in that object. On a regular basis (for instance, between each transaction; the exact policy does not matter much here) a worker checks the flag and abandons its work if it notices the flag is set. (Setting the flag needs to be done with a release atomic, reading with an acquire atomic.)

Interleaving the synchronization logic with the code doing the work is not only hard to understand and fragile, it is also almost impossible to get right.

You can set the maximum level of parallelism in the thread pool (currently 4, but I see no point in setting this in stone) and provide blocks to the thread pool to process.
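A compact sketch of the coordination described above: a shared flag written with a release store when one block wins and read with an acquire load between transactions. CValidationRace and the checkTx callback are made-up names for illustration; this is not code from the PR.

#include <atomic>
#include <functional>

struct CValidationRace {
    std::atomic<bool> fBlockWon{false};

    void MarkWinner() {
        fBlockWon.store(true, std::memory_order_release); // some thread finished its block
    }
    bool ShouldAbandon() const {
        return fBlockWon.load(std::memory_order_acquire); // cheap check between transactions
    }
};

// Worker validating one candidate block inside the pool:
bool ValidateCandidate(const CBlock& block, CValidationRace& race,
                       const std::function<bool(const CTransaction&)>& checkTx)
{
    for (const CTransaction& tx : block.vtx) {
        if (race.ShouldAbandon())
            return false; // another block already won the race; stop wasting work
        if (!checkTx(tx)) // the real per-transaction input/script checks go here
            return false;
    }
    race.MarkWinner();
    return true;
}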

@ptschip

Collaborator

ptschip commented Nov 4, 2016

@deadalnix Sorry for the long post, but with all the chatter going on in slack I thought I should respond to your concerns here.

I understand what you're asking for as far as wanting something more classical in the design of the threadgroup, and the way things are now doesn't mean we can't move in that direction. But this is how I had to build up the code. It was extremely, and I mean extremely, difficult to figure out how the parallelism should work with the locking, the UTXO and the scriptcheckqueues, and there was also the learning curve of a new part of the codebase. So rather than start from a perfect design, I started by spinning up a few threads and working with IBD. Not having any way to generate parallel blocks in large amounts, I needed to work that way so that I could actually see whether PV was even going to work and whether there would be any serious performance issues that would preclude us from going down this path and wasting a lot of time. So it took many weeks of incremental coding and testing to make it work, and it does work. I'm a very good and thorough tester; of course no one should trust the developer of his own code on that, but just to say I'm very confident in the workings of the code as it is.

But, yes, you are right, we can move to a better design and better encapsulation of the code to make it more readable and maintainable, but the functionality of the code will not change. It is complicated, as you say, particularly in the area of locking and locking orders, but that fact will not change by changing the thread model IMO.

As for the hard-coded 4 scriptcheckqueue thread groups: I agree with you, it would be good to have that configurable, but it requires some work and it's not critical that we make it configurable right now IMO, and it is not as straightforward a task as it sounds. I think it is work that can be done at a later date.

As for the concerns about consensus: certainly we are touching the part of the code that determines which chain we follow. Tom Harding has had a look there and found a problem which I've corrected, so we've had some experienced eyes on that. Really the issue there is around pindexMostWork and how we handle it during parallelism. You can easily find that part in ConnectBlock() in main.cpp. In the end, while there are many commits to the code on github, there really isn't that much code that has changed. Most of the significant changes, and the changes related to consensus, are found in ConnectBlock(). Really the difficult part there was ensuring that we have the correct pindexMostWork. You can read more about that in bu-parallel-validation.md found in the docs folder.

As for bugs, well yes, I would like to know if there are any bugs that need fixing. We need more people testing and reviewing the code, but our group is still small and everybody is working on their own projects. So far, though, Tom Harding and @ftrader have done some good review which has been very helpful. I would suggest, if you have time, that you at least compile and run the code; there is a good python test now, parallel.py, which you can run and which can be extended (I'm still working on it for the 4-block scenarios at the end of the script). Feedback on any specific bugs would be much appreciated.

@dgenr8

This is not a complete review.

Show outdated src/main.cpp
Show outdated src/main.cpp
// Once the first block has been disconnected and a re-org has begun then we need to terminate any
// currently running PV threads that are validating. They will likely have self terminated
// at this point anyway because the chain tip and UTXO base view will have changed but just
// to be sure we are not waiting on script threads to finish we can issue the termination here.

@dgenr8

dgenr8 Nov 6, 2016

Collaborator

This new code could be moved before fBlocksDisconnected = true and conditioned on !fBlocksDisconnected instead of a new boolean.
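If I read this right, the suggested shape is roughly the following (the surrounding loop is paraphrased from ActivateBestChainStep, the argument list is abbreviated, and the PV call is a placeholder, so treat this as a sketch rather than the actual diff):

// Disconnect blocks back to the fork point; terminate PV threads exactly once,
// on the first disconnect, reusing fBlocksDisconnected instead of a new boolean.
while (chainActive.Tip() && chainActive.Tip() != pindexFork) {
    if (!DisconnectTip(state)) // (argument list abbreviated)
        return false;
    if (!fBlocksDisconnected) {
        // First block of the re-org just came off the tip: stop any PV threads
        // still validating against the old tip (placeholder call).
        TerminateAllParallelValidationThreads();
    }
    fBlocksDisconnected = true;
}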

@ptschip

ptschip Nov 7, 2016

Collaborator

I put it after the first DisconnectTip() because if the disconnect fails it will return false, and therefore we don't want to kill the validation threads if there are any running. (Although it seems rather unlikely that a disconnect would fail, and if it did I think there would be a more serious problem there; I'm not sure it's even necessary to have the code return, perhaps an assert would be more appropriate.)

Also, I put it in that section because I thought it would be best to kill the PV threads as soon as the first block was disconnected. There may be other blocks to disconnect in the event of a larger re-org, so rather than waiting until all the blocks are disconnected, the PV threads are terminated immediately after the chaintip they are working on has been undone.

@dgenr8

dgenr8 Nov 7, 2016

Collaborator

Sorry, my only point here was that a new boolean isn't needed.

@ptschip

ptschip Nov 9, 2016

Collaborator

@dgenr8 Oh, ok...thanks, that makes sense. I'll update that.

Show outdated src/main.cpp
Show outdated src/main.cpp
@@ -2408,8 +2447,12 @@ static int64_t nTimeIndex = 0;
static int64_t nTimeCallbacks = 0;
static int64_t nTimeTotal = 0;
bool ConnectBlock(const CBlock& block, CValidationState& state, CBlockIndex* pindex, CCoinsViewCache& view, bool fJustCheck)
bool ConnectBlock(const CBlock& block, CValidationState& state, CBlockIndex* pindex, CCoinsViewCache& view, bool fJustCheck, bool fParallel)

@deadalnix

deadalnix Nov 10, 2016

Collaborator

A boolean parameter is usually a code smell. I know they are all over the codebase, but I don't think that's a good reason to pile more on.

@ptschip

ptschip Dec 10, 2016

Collaborator

It is needed here; we have to support both parallel and non-parallel modes of operation.

if (coins && !coins->IsPruned())
    return state.DoS(100, error("ConnectBlock(): tried to overwrite transaction"),
                     REJECT_INVALID, "bad-txns-BIP30");
}

@deadalnix

deadalnix Nov 10, 2016

Collaborator

The whole thing should go in its own diff.

Show outdated Hide outdated src/main.cpp
return false;
}

@deadalnix

deadalnix Nov 10, 2016

Collaborator

Revert

@deadalnix

deadalnix Nov 10, 2016

Collaborator

Also,

if (!foo) return false;
return true;
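Presumably the nit is that this pattern collapses to a single statement (my reading; the thread doesn't spell it out):

return foo;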
Show outdated src/main.cpp
@deadalnix

Collaborator

deadalnix commented Nov 11, 2016

Alright. There are all kinds of things that are wrong here. I think you have a good prototype, but I don't think this meets the standard of quality we should expect from code that runs a $10B network. I know that the code is fubared to begin with, but that's not a good reason to add more.

I think it is fair to say that this code grew organically rather than from principled software engineering. That's OK to get something up and running quickly, but that's it.

I'd like to paraphrase Romero here - it is about a guideline he put in place with Carmack and others when growing id Software - "When you find a problem with existing code, fix it. Do not continue to do what you are doing, do not work around it and do not open a task for it. If you don't, then your code will be built on faulty foundations."

That being said, here we go.

  • The code mixes synchronization with business logic. You need to come up with an architecture that allows a separation of concerns here. See the reason below in a).
  • The code uses 3 different synchronization techniques (mutexes, flagging, and boost thread interruption). At least 2 of them are redundant with each other (flags and boost thread facilities).
  • The flags are not used properly. First, they need to be set using a release atomic and read using an acquire atomic, and second, it is unclear what prevents the UTXO from being changed between the flag read and the UTXO use. It looks like the right thing to do here is to check the flag while holding a read lock on the UTXO.
  • Second, there are 2 different fQuit flags. For maximum confusion.
  • There seems to be no use of RWLock. Things like the UTXO would greatly benefit from using this.
  • There is a lot of manual locking and unlocking. This isn't exception safe. It is also an indicator of poor code structure. Note that boost thread termination uses exceptions. Scope guards are needed, and code using several mutexes needs to be physically separated.
  • There is a TON of fParallel checks. This is a code smell. Once again, it shows poor code structure - the same as the IsWitnessProgram checks scattered all over the codebase. If you need to specialize a bunch of behavior based on some check, there are various superior techniques. One would be to go OOP and use virtual methods. The check needs to be done once and virtual dispatch takes care of the rest. Another alternative is to use compile-time policies ( http://www.intopalo.com/blog/2014-03-28-policy-based-design/ ).

a) Synchronization logic cannot be tested the way regular code is. It is also notoriously error prone. It is important that it be reviewable independently from the application logic.

Last but not least, large changes that contain both refactoring and logic changes are the most risky. They also tend not to be a very efficient way for the contributor to work. In effect, nothing can be merged until EVERYTHING is in a good enough state to be merged. This contains code that is perfectly fine, which may become not fine in the future because of others' work. I would suggest you proceed as follows:
1/ Do small restructurings of the existing code to move toward getting the hooks you need in the right place. These changes need to be purely structural and must not change any application logic. This is often a good opportunity to add extra unit tests as the structure of the code improves.
2/ In parallel, start introducing the code for parallel validation. This does not need to be wired into existing logic outside of tests at first.

When 1/ and 2/ are sufficiently advanced, you should find yourself in a position where you can add the fParallel flag, check it early on in the validation logic, and choose the right policy accordingly. This should look like (pseudocode):

ValidateBlock<SingleThreadValidationPolicy>(chain, block);

changed into

// Various code to check if we want parallel validation
fParallel = ...;

if (fParallel) {
    ValidateBlock(chain, block, MultipleThreadValidationPolicy(4));
} else {
    ValidateBlock(chain, block, SingleThreadValidationPolicy());
}

And voilà!

I hope that helps.
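To make the compile-time policy suggestion concrete, here is a minimal, self-contained sketch of the idea; the policy names follow the pseudocode above, while the check plumbing (a plain vector of callables) is simplified and hypothetical rather than the PR's actual scriptcheckqueue machinery:

#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Each policy encapsulates "how the checks are executed"; the validation
// logic that builds the checks stays free of fParallel branches.
struct SingleThreadValidationPolicy {
    bool Run(std::vector<std::function<bool()>>& checks) const {
        for (auto& check : checks)
            if (!check()) return false;
        return true;
    }
};

struct MultipleThreadValidationPolicy {
    explicit MultipleThreadValidationPolicy(int n) : nThreads(n) {}
    int nThreads;

    bool Run(std::vector<std::function<bool()>>& checks) const {
        std::atomic<bool> fAllOk{true};
        std::vector<std::thread> workers;
        for (int t = 0; t < nThreads; ++t) {
            workers.emplace_back([&checks, &fAllOk, t, this] {
                // Striped partition of the checks across nThreads workers.
                for (size_t i = t; i < checks.size(); i += nThreads)
                    if (!checks[i]())
                        fAllOk.store(false, std::memory_order_release);
            });
        }
        for (std::thread& w : workers)
            w.join();
        return fAllOk.load(std::memory_order_acquire);
    }
};

// The fParallel decision is made once by the caller, which passes the chosen
// policy; the function itself contains no fParallel branches.
template <typename ValidationPolicy>
bool ValidateChecks(std::vector<std::function<bool()>>& checks, const ValidationPolicy& policy)
{
    return policy.Run(checks);
}

The caller then selects SingleThreadValidationPolicy or MultipleThreadValidationPolicy(4) exactly as in the pseudocode above.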

Show outdated src/checkqueue.h
Show outdated src/main.cpp
@@ -2541,16 +2632,35 @@ bool ConnectBlock(const CBlock& block, CValidationState& state, CBlockIndex* pin
if (!tx.IsCoinBase())
{
if (!view.HaveInputs(tx))
if (!viewTempCache.HaveInputs(tx)) {

@deadalnix

deadalnix Nov 14, 2016

Collaborator

Can you explain why view cannot be used?

Show outdated src/main.cpp
Show outdated src/main.cpp
Show outdated src/main.cpp
Show outdated src/main.cpp
Show outdated src/main.cpp
Show outdated src/checkqueue.h
Show outdated src/main.cpp
Show outdated src/main.cpp
Show outdated src/main.cpp

Peter Tschipper and others added some commits Sep 6, 2016

Parallel Block Validation to mitigate DDOS big block attack
- add 3 more script check queues
- use a wrapper function to call HandleBlockMessageThread
- Create semaphores to control and handle queue selection and throughput
- Update locking - we do not need to hold cs_main when checking inputs
- Flush the temporary view to the base view rather than re-running every txn with UpdateCoins
- unlock cs_main before waiting for script check threads to complete
- Python test for Parallel Validation

- Disable SENDHEADERS requests and always revert to INV
Seems that SENDHEADERS functionality really causes problems
even more so with parallel validation.  Turning it off here
completely, but we will still service nodes that want it.

- Only allow parallel block validation during IBD or receiving new blocks

- When mining new blocks or re-indexing or any other activity the blocks
are prevented from entering parallel validation: the cs_main locks are
not relinquished in those cases.

- Turn off Parallel Validation when doing IBD

- Ability to Quit scriptcheck threads

When  4 blocks are running in parallel and a
new, smaller block shows up, we need to be able to interrupt the
script threads that are currently validating for one of the blocks
so that we can free up a script check queue for the new smaller  block.

Documentation for Parallel Validation

Change some logprint messages from parallel to parallel_2

- Update the nBlockSequenceId after the block has advanced the tip

This is important for Parallel Validation.  By decreasing the sequence id
we are indicating that this block represents the current pindexMostWork. This
prevents the losing block from having the pindexMostWork point to it rather
than to the winning block.

- Continuously check for new blocks to connect

With Parallel Validation new blocks can be accepted while we are connecting at
the tip, therefore we need to check after we connect each block whether a potentially
longer, more-work chain now exists. If we don't do this then we can at times end
up with a temporarily unconnected block until the next block gets mined and
another attempt is made to connect the most work chain.

- Terminate any PV threads during a re-org

PV can only operate with blocks that will advance the current chaintip (fork1). Therefore,
if we need to re-org to another chain (fork2) then we have to kill any currently
running PV threads associated with the current chain tip on fork1. This is the
solution to the problem of having two forks being mined while there is an attack
block processing on fork1. If fork2 continues to be mined and eventually pulls
ahead of fork1 in proof of work, then a re-org to fork2 will be initiated causing
the PV threads on fork1 to be terminated and fork2 blocks then connected and fork2 then
becoming the chain active tip.

- Use Chain Work instead of nHeight to self terminate a PV thread

If the Chain Work has changed, either positively or negatively, from where it
was when we started the PV thread then we will exit the thread.  Previously
we were using Chain Height, which worked fine, but this is more understandable
from a coding perspective and also we added the feature to check for when
Chain Work has decreased from the starting point which would indicate that
a re-org was underway.
Move ZMQ notifications to ActivateBestChainStep
We must notify ZMQ after each tip is connected rather
than after ActivateBestChain otherwise we can miss a block
when there are several to connect together at once.
Don't relay block inventory unless the chain is nearly sync'd
This particularly seems to help sync issues on regtest where
we're mining many blocks all at the same time.
Simplify the selection of the scriptcheck_mutex and pScriptQueue
Abstract the code and make it more readable by returning both
values at the same time.

Small Pointer Optimization

Reorder where the control is acquired
Completely remove cs_main locks from checkinputs and sigs during PV
During the loop where we check inputs and signatures we can completely
remove the cs_main locks by making sure to add an internal
cs_main lock to ChainWorkHasChanged().  This function rarely will
get invoked and only if the thread is about to exit so there is no
problem here of adding any additional overhead.

Removing the cs_main locks here is the most important step in
completely removing any significant inter-dependency between
parallel validation threads.
Must maintain locking order between scoped lock and cs_main
If we are in PV mode then we must make sure the scoped lock is
unlocked before re-locking cs_main otherwise there is a potential
for a deadlock.  The locking order should be cs_main -> scoped lock.
However during PV we unlock cs_main before acquiring the scoped lock
and therefore we must unlock the scoped lock before re-acquiring
cs_main near the end of ConnectBlock().
Make sure locks are correct before returning after SequenceLocks check.
The scriptlock must be unlocked and cs_main must be locked before
returning after the call to SequenceLocks() fails.
Create SetLocks() used to set the locking order when returning from PV
If there is an error during PV then we have to make sure the locks
are set in the right order before returning.  cs_main must always
be locked when we return from ConnectBlock() as cs_main is recursive
but we must also ensure that the scriptlock is unlocked before
re-locking cs_main to avoid a potential deadlock.

By creating SetLocks() we can abstract away the setting of the locking
order and prevent any developer confusion or possible introduction of  errors
into the code if future changes are made in the ConnectBlock() function.
Consolidate and simplify the script check thread pool creation
In the past the code for creating the script check threads and pools
was all over the place in several files but is now consolidated in
parallel.cpp and parallel.h.  Also it is much easier to make any
changes to the number of scriptcheckqueues by just editing two
lines in parallel.cpp.
Move cs_blocksemaphore into globals.
Using it defined as a static within HandleBlockMessageThread is not
thread-safe until we get to C++11.
Change back to using a mutex for fQuit flag
It seems overkill to use atomic here for code that will
very rarely get executed.
Use only one semaphore for PV instead of two
This simplifies the code and makes it easier to understand. Also, there
is very little improvement if any to IBD by using a separate semaphore.
@gmaxwell

Contributor

gmaxwell commented Dec 17, 2016

This would significantly incentivize the creation of empty blocks, as they would always win races even if they arrived significantly later. Is this your intention?

@gandrewstone gandrewstone changed the title from [WIP] Parallel Block Validation to prevent DDOS big block attack to Parallel Block Validation to prevent DDOS big block attack Jan 8, 2017

@gandrewstone

Collaborator

gandrewstone commented Jan 9, 2017

closing, ptschip has a new branch and will open a new PR soon...
