Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus: Avoid data race in CBlock class #14803

Closed
wants to merge 2 commits into from

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Nov 25, 2018

Ref: #14058, #14072

CheckBlock can be invoked from multiple threads in parallel. To avoid a data race std::atomic<bool> is used.
All comments from #14072 are addressed.

As this PR introduces copy constructors for both CBlockHeader and CBlock classes, additional refactoring proposed for CBlock::GetBlockHeader().

@fanquake
Copy link
Member

@hebasto Can you also link to #14058 from the PR description, as that is the issue #14072 was solving.

@hebasto
Copy link
Member Author

hebasto commented Nov 25, 2018

@fanquake Done.

@hebasto hebasto changed the title consensus: Avoid data race in CBlock class [WIP] consensus: Avoid data race in CBlock class Nov 25, 2018
@hebasto hebasto changed the title [WIP] consensus: Avoid data race in CBlock class consensus: Avoid data race in CBlock class Nov 26, 2018
@gmaxwell
Copy link
Contributor

gmaxwell commented Nov 26, 2018

Thanks for working on this.

Simply making the check flag atomic doesn't really seem correct to me-- why would we be checking the same block object twice concurrently? Just making the test atomic prevents the that particular bit of UB but it doesn't mean that there isn't wasted work, or even the potential of a deadlock.

Can someone please summarize how the same block can get checked in parallel and why it makes sense for that to happen? 14058 simply states the test fail with thread sanitizing enabled, but don't explain the sequence of events that lead to the race. I want to make sure that we're not bandaging over a structural flaw that should be fixed with a lock someplace or some other change.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 26, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14829 (travis: Enable functional tests in the ThreadSanitizer (TSan) build job by practicalswift)
  • #10785 (Serialization improvements by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@promag
Copy link
Contributor

promag commented Nov 26, 2018

I agree with @gmaxwell, why make data structure thread safe?

Beside, CBlock::operator= doesn't look thread safe to me, can't m_checked change between load and store?

@maflcko
Copy link
Member

maflcko commented Nov 26, 2018

The race happens because we run 10 threads in parallel that check 1000 random blocks in sequence. With only about 100 blocks in total, it is unlikely that every thread picks blocks that have not yet been picked by another thread.
So I think there are at least two alternative solutions:

  • Assume that we only ever call ProcessNewBlock for each block once, so we'd rewrite the unit test to not pick a random block, but pop (remove and return) a random block from the list. But this assumption seems unlikely, since a block could arrive on the p2p and rpc interface at the same time, which then both call ProcessNewBlock.
  • Assume that ProcessNewBlock avoids checking the same block twice, which could be fixed by guarding the read-write of fChecked on a larger scope, instead of just fixing the symptom that the member variable is racy.
  • something else?

@promag
Copy link
Contributor

promag commented Nov 26, 2018

I think 2nd option is the most reasonable.

@sipa
Copy link
Member

sipa commented Nov 26, 2018

My preference is avoid adding synchronization primitives to structures that don't need it. Here it seems the case that the validation logic doesn't actually supported multi-threaded block validation, but the unit tests verify (a small) part of it regardless. That seems overkill.

@maflcko
Copy link
Member

maflcko commented Nov 26, 2018

If it doesn't support multi-threaded block validation, it should still be fixed to avoid multiple threads calling it at the same time, because that is a thing that can happen in practice with the rpc thread running along the "main thread".

@gmaxwell
Copy link
Contributor

It doesn't support multithreaded validation and there are lot of things that prevent that, which is why I was concerned. Why doesn't the lock on the block index or even cs main prevent concurrency here?

I'd missed that the original case was a unit test.

@maflcko
Copy link
Member

maflcko commented Nov 27, 2018

Why doesn't the lock on the block index or even cs main prevent concurrency here?

The cs main lock was moved down in this change "Harden against mistakes handling invalid blocks" #9765 by @sdaftuar

So unless there was a reason to not put it before CheckBlock, I suggest to move it up by one line.

@maflcko maflcko closed this Nov 29, 2018
laanwj added a commit that referenced this pull request Dec 1, 2018
c5ed6e7 Move CheckBlock() call to critical section (Hennadii Stepanov)

Pull request description:

  This is an alternative to #14803.

  Refs:
  - #14058
  - #14072
  - #14803 (comment) by @gmaxwell
  > It doesn't support multithreaded validation and there are lot of things that prevent that, which is why I was concerned. Why doesn't the lock on the block index or even cs main prevent concurrency here?

  - #14803 (comment) by @MarcoFalke

Tree-SHA512: 2152e97106e11da5763b2748234ecd2982daadab13a0da04215f4db60af802a44ab5700f32249137d122eb13fc2a02e0f2d561d364607d727d8c6ab879339afb
@hebasto hebasto deleted the 20181125-cblock-threadsafe branch December 1, 2018 09:47
Munkybooty pushed a commit to Munkybooty/dash that referenced this pull request Aug 17, 2021
…ction

c5ed6e7 Move CheckBlock() call to critical section (Hennadii Stepanov)

Pull request description:

  This is an alternative to bitcoin#14803.

  Refs:
  - bitcoin#14058
  - bitcoin#14072
  - bitcoin#14803 (comment) by @gmaxwell
  > It doesn't support multithreaded validation and there are lot of things that prevent that, which is why I was concerned. Why doesn't the lock on the block index or even cs main prevent concurrency here?

  - bitcoin#14803 (comment) by @MarcoFalke

Tree-SHA512: 2152e97106e11da5763b2748234ecd2982daadab13a0da04215f4db60af802a44ab5700f32249137d122eb13fc2a02e0f2d561d364607d727d8c6ab879339afb
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 5, 2021
…ction

c5ed6e7 Move CheckBlock() call to critical section (Hennadii Stepanov)

Pull request description:

  This is an alternative to bitcoin#14803.

  Refs:
  - bitcoin#14058
  - bitcoin#14072
  - bitcoin#14803 (comment) by @gmaxwell
  > It doesn't support multithreaded validation and there are lot of things that prevent that, which is why I was concerned. Why doesn't the lock on the block index or even cs main prevent concurrency here?

  - bitcoin#14803 (comment) by @MarcoFalke

Tree-SHA512: 2152e97106e11da5763b2748234ecd2982daadab13a0da04215f4db60af802a44ab5700f32249137d122eb13fc2a02e0f2d561d364607d727d8c6ab879339afb
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 6, 2021
…ction

c5ed6e7 Move CheckBlock() call to critical section (Hennadii Stepanov)

Pull request description:

  This is an alternative to bitcoin#14803.

  Refs:
  - bitcoin#14058
  - bitcoin#14072
  - bitcoin#14803 (comment) by @gmaxwell
  > It doesn't support multithreaded validation and there are lot of things that prevent that, which is why I was concerned. Why doesn't the lock on the block index or even cs main prevent concurrency here?

  - bitcoin#14803 (comment) by @MarcoFalke

Tree-SHA512: 2152e97106e11da5763b2748234ecd2982daadab13a0da04215f4db60af802a44ab5700f32249137d122eb13fc2a02e0f2d561d364607d727d8c6ab879339afb
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants