Skip to content

Conversation

ken2812221
Copy link
Contributor

@ken2812221 ken2812221 commented Oct 11, 2018

This PR removes use of the following object in checkqueue

  • boost::condition_variable
  • boost::mutex
  • boost::unique_lock

Instead of using boost::thread_group::interrupt_all to trigger boost::thread_interrupted exception, it should call InterruptScriptCheck() or scriptcheckqueue.Interrupt() to interrupt the checkqueue thread.

Also, it adds three functions:

  • StartScriptCheck()
  • InterruptScriptCheck()
  • StopScriptCheck()

for clearer logic for script check threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to PR title boost/thread/condition_variable.hpp should go as well? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only remove them in checkqueue, the another place to use it is in scheduler.

@practicalswift
Copy link
Contributor

Concept ACK

Towards a Boost free future! :-)

src/checkqueue.h Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could just call this interrupted

src/checkqueue.h Outdated
Copy link
Contributor

@Empact Empact Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an if only has a single-statement then-clause, it can appear
on the same line as the if, without braces. In every other case,
braces are required, and the then and else clauses must appear
correctly indented on a new line.

https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#coding-style-c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format doesn't catch this. Done manually.

@ken2812221 ken2812221 changed the title rafactor: Remove use of boost::condition_variable and boost::mutex in checkqueue refactor: Remove use of boost::condition_variable and boost::mutex in checkqueue Oct 11, 2018
src/validation.h Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pluralize to threads

@ch4ot1c
Copy link
Contributor

ch4ot1c commented Oct 11, 2018

utACK

@ken2812221 ken2812221 changed the title refactor: Remove use of boost::condition_variable and boost::mutex in checkqueue refactor: Remove boost in checkqueue Oct 12, 2018
@sipa
Copy link
Member

sipa commented Oct 12, 2018

utACK 9aacef638f0c2159930640762daf524942052a9d

@donaloconnor
Copy link
Contributor

utACK 9aacef6

src/checkqueue.h Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: only one master thread so maybe condMaster.notify_one() is more consistent. No difference to code though.

Copy link
Contributor

@JeremyRubin JeremyRubin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak concept Ack.

Necessary step to get rid of boost, but conflicts with some unmerged work (#9938, JeremyRubin#1) which does the same thing.

Overall, this appears to be correct.

Before merge I'd like to see some benchmarks, the checkqueue code is very performance sensitive, sometimes the boost condition variables and mutexs difference from std's can be substantial. (I'm not kidding -- the time to wake-up and condition variable sleep behavior is one of the major performance issues with this code).

src/checkqueue.h Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the condition as a callback to wait.

e.g., cond.wait(lock, [&] { return interrupted || !queue.empty(); })

@ken2812221
Copy link
Contributor Author

Addressed comments

@ken2812221
Copy link
Contributor Author

ken2812221 commented Oct 18, 2018

@JeremyRubin
benchmark result on my Ubuntu 18.04 virtual machine:
master 816fab9

# Benchmark, evals, iterations, total, min, max, median
CCheckQueueSpeedPrevectorJob, 100, 1400, 83.1845, 0.000461328, 0.000692877, 0.000668382

This PR 625c55a70c376af9a94a2bb118c39685d3b82b33

# Benchmark, evals, iterations, total, min, max, median
CCheckQueueSpeedPrevectorJob, 100, 1400, 66.268, 0.000451829, 0.000491473, 0.000473479

@JeremyRubin
Copy link
Contributor

That's great, thanks for checking that -- much bigger change than I expected it to be.

Looking at the numbers, it seems not only to be faster on the microbenchmark, but more consistent.

This merits running two additional tests, if you have energy for it:

  1. Time to reindex (with assumevalid up to last month or something so it doesn't take forever
  2. Two nodes (one with this PR one without) connected behind a third node (on latest release), with debug bench set. Collect the logs on ConnectBlock.
    • Also collect a general system profile of which processes are using what on the cpu.

The second test is important because one thing I noticed when working on this code previously is that it was trivially easy to get speedups on this code at the expense of general performance. E.g., getting rid of condition variables (and just spinning until jobs came in) was much faster, but meant that the OS scheduler would be giving our CheckQueue workers run-time when there was nothing to do, impacting other processes on the computer (most likely because of causing more cache invalidation, rather than the impact of losing a few cycles of a process switch). Hence my suspicion: the 20% speedup is because the standard condition variables are not using system sleep (which plays better with the OS). This can actually negatively impact the process too as our main thread might be getting context switched out while we are trying to add jobs or something rather than waiting for notify.

src/checkqueue.h Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could assert(! (fMaster && nTodo == 0)) here.

If fMaster were ever true here, we would deadlock.

Hence, there is no point to check that...

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 20, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

src/checkqueue.h Outdated
Copy link
Contributor Author

@ken2812221 ken2812221 Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After tested a few times, return false or somthing else? seem to make the node have to reindex next time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue could be that when you interrupt, you don't want to make fMaster return false causing the block to be marked false (which could be written into cache somehow).

Is interrupt guaranteed to wait until the current job is done? Making it guaranteed to wait probably could help fix that behavior... One way of doing that is to separate out when you wake up if there's an active job or not, and don't allow to quit when there is an active job until it's done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend adapting the quitting logic from https://github.com/JeremyRubin/bitcoin/blob/aa847431e266fb08289edbac3665ac7c93a81b8a/src/checkqueue.h if you want a reference, which should be more or less correct (albeit with a lot of extensions you can ignore).

@ken2812221 ken2812221 changed the title refactor: Remove boost in checkqueue [WIP] refactor: Remove boost in checkqueue Oct 24, 2018
@JeremyRubin
Copy link
Contributor

This version is not correct either if I'm correct about the block getting flushed with validity cached during shutdown.

Now you might incorrectly cache that a block was valid when it may not have been.

You need to finish processing the block, then shut down. Or, if a shutdown is triggered, abandon the block in the master (you could make a trivalent return: false, true, shutdown).

@JeremyRubin
Copy link
Contributor

Actually I think that's not right, I think this version will just hang because nTodo will be != 0 and queue.empty() will be true, so the master would sleep and never get woken?

@ken2812221
Copy link
Contributor Author

ken2812221 commented Oct 26, 2018

@JeremyRubin master would never wait/sleep if interrupted is true. It would keep looping until nTodo == 0. Oh, I know what you mean.

@JeremyRubin
Copy link
Contributor

Your best bet IMO is to have a CheckQueueControl mutex which is separate from the CheckQueue mutex. Make interrupting get both...

@ken2812221
Copy link
Contributor Author

ken2812221 commented Oct 26, 2018

@JeremyRubin
How about now? I think the logic here would be more clear. It would keep checking until finish.
If there are new tasks added into queue after the worker threads ended, it would be only master worker checking the tasks unfortunately. However I think this is better than interrupt the master thread. Alternatively, we can detach these worker threads.

I am going to test this to see if there is anything broken.

@JeremyRubin
Copy link
Contributor

Where's the recursive lock?

@ken2812221
Copy link
Contributor Author

In test_bitcoin, it constuct CCheckQueueControl and call Interrupt in same thread

@JeremyRubin
Copy link
Contributor

Interrupt should be called in the destructor of the test which should happen after any queuecontrols are destructed though?

@ken2812221
Copy link
Contributor Author

Interrupt should be called in the destructor of the test which should happen after any queuecontrols are destructed though?

I think it would be better to make the checkqueue to control their worker threads instead of calling Thread() in somewhere.

@DrahtBot DrahtBot mentioned this pull request Oct 30, 2018
@ken2812221 ken2812221 changed the title refactor: Remove boost in checkqueue refactor: make checkqueue manage the threads by itself (also removed some boost dependencies) Nov 23, 2018
@bitcoin bitcoin deleted a comment from minblock Jan 3, 2019
@bitcoin bitcoin deleted a comment from DrahtBot Jan 3, 2019
@maflcko
Copy link
Member

maflcko commented Mar 6, 2019

Could try a git rebase -Xignore-all-space df36ddf9ce8a4dcf0d7f324e57a13abb6cf6a57c?

This commit add a Interrupt function for CCheckQueue that it can handle interrupt by itself instead of relying on boost thread interrupt
replace boost::mutex with debuggable Mutex
replace boost::condition_variable with std::condition_variable
add const specifier to fMaster and nBatchSize
add clang thread safety attributes
move init value of member of CCheckQueue from constructor to definition
Manage CCheckQueue workder threads by itself. Do not expose the threads outside the object.
@practicalswift
Copy link
Contributor

@ken2812221 Why the close? :-)

@ken2812221
Copy link
Contributor Author

The CI was failed and I don't have time to find the bugs out recently, so I closed it, I might reopen this when I has more time working on this.

laanwj added a commit that referenced this pull request Jan 25, 2021
bb6fcc7 refactor: Drop boost::thread stuff in CCheckQueue (Hennadii Stepanov)
6784ac4 bench: Use CCheckQueue local thread pool (Hennadii Stepanov)
dba3069 test: Use CCheckQueue local thread pool (Hennadii Stepanov)
0151177 Add local thread pool to CCheckQueue (Hennadii Stepanov)
0ef9386 refactor: Use member initializers in CCheckQueue (Hennadii Stepanov)

Pull request description:

  This PR:
  - gets rid of `boost::thread_group` in the `CCheckQueue` class
  - allows thread safety annotation usage in the `CCheckQueue` class
  - is alternative to #14464 (#18710 (comment), #18710 (comment))

  Also, with this PR (I hope) it could be easier to resurrect a bunch of brilliant ideas from #9938.

  Related: #17307

ACKs for top commit:
  laanwj:
    Code review ACK bb6fcc7
  LarryRuane:
    ACK bb6fcc7
  jonatack:
    Code review ACK bb6fcc7 and verified rebase to master builds cleanly with unit/functional tests green

Tree-SHA512: fddeb720d5a391b48bb4c6fa58ed34ccc3f57862fdb8e641745c021841c8340e35c5126338271446cbd98f40bd5484f27926aa6c3e76fa478ba1efafe72e73c1
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 25, 2021
bb6fcc7 refactor: Drop boost::thread stuff in CCheckQueue (Hennadii Stepanov)
6784ac4 bench: Use CCheckQueue local thread pool (Hennadii Stepanov)
dba3069 test: Use CCheckQueue local thread pool (Hennadii Stepanov)
0151177 Add local thread pool to CCheckQueue (Hennadii Stepanov)
0ef9386 refactor: Use member initializers in CCheckQueue (Hennadii Stepanov)

Pull request description:

  This PR:
  - gets rid of `boost::thread_group` in the `CCheckQueue` class
  - allows thread safety annotation usage in the `CCheckQueue` class
  - is alternative to bitcoin#14464 (bitcoin#18710 (comment), bitcoin#18710 (comment))

  Also, with this PR (I hope) it could be easier to resurrect a bunch of brilliant ideas from bitcoin#9938.

  Related: bitcoin#17307

ACKs for top commit:
  laanwj:
    Code review ACK bb6fcc7
  LarryRuane:
    ACK bb6fcc7
  jonatack:
    Code review ACK bb6fcc7 and verified rebase to master builds cleanly with unit/functional tests green

Tree-SHA512: fddeb720d5a391b48bb4c6fa58ed34ccc3f57862fdb8e641745c021841c8340e35c5126338271446cbd98f40bd5484f27926aa6c3e76fa478ba1efafe72e73c1
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants