Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown: Stop threads before resetting ptrs #13894

Merged
merged 1 commit into from
Aug 8, 2018

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Aug 6, 2018

On shutdown some threads would continue to run after or during a pointer reset. This leads to occasional segfaults on shutdown.

Fix this by resetting the smart pointers after all threads that might read from them have been stopped.

This should fix:

  • A segfault in the txindex thread, that occurs when the txindex destructor is done, but the thread was not yet stopped (as this is done in the base index destructor)
  • A segfault in the scheduler thread, which dereferences conman. (e.g. CheckForStaleTipAndEvictPeers)

@maflcko maflcko force-pushed the Mf1808-shutdownStopThreads branch 2 times, most recently from faa202b to fa89c63 Compare August 6, 2018 16:38
@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 6, 2018

Note to reviewers: This pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/init.cpp Outdated
@@ -222,6 +218,12 @@ void Shutdown()
threadGroup.interrupt_all();
threadGroup.join_all();

// After the threads that potentially access these pointers have been stopped,
// desctruct and reset all to nullptr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: destruct

@maflcko
Copy link
Member Author

maflcko commented Aug 6, 2018

For reference on current master:

2018-08-06T21:43:45Z Shutdown: In progress...
2018-08-06T21:43:45Z dnsseed thread start
2018-08-06T21:43:45Z net thread start
2018-08-06T21:43:45Z msghand thread start
2018-08-06T21:43:45Z addcon thread start
2018-08-06T21:43:45Z opencon thread start
2018-08-06T21:43:45Z addcon thread exit
2018-08-06T21:43:45Z net thread exit
2018-08-06T21:43:45Z torcontrol thread exit
2018-08-06T21:43:45Z opencon thread exit
2018-08-06T21:43:45Z dnsseed thread exit
2018-08-06T21:43:45Z msghand thread exit
==14658== Thread 9 bitcoin-schedule:
==14658== Invalid read of size 8
==14658==    at 0x238DD4: PeerLogicValidation::CheckForStaleTipAndEvictPeers(Consensus::Params const&) (net_processing.cpp:3207)

...

2018-08-06T21:43:47Z scheduler thread interrupt
2018-08-06T21:43:47Z Dumped mempool: 0.00263s to copy, 0.017917s to dump
2018-08-06T21:43:47Z Shutdown: done

and

2018-08-06T21:49:55Z Shutdown: In progress...
2018-08-06T21:49:55Z txindex thread start
pure virtual method called
terminate called without an active exception
==14993== 
==14993== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14993==    at 0x7115FEB: raise (in /usr/lib64/libc-2.27.so)
==14993==    by 0x71005C0: abort (in /usr/lib64/libc-2.27.so)
==14993==    by 0x660EA9A: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==14993==    by 0x6614EFB: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==14993==    by 0x6614F56: std::terminate() (in /usr/lib64/libstdc++.so.6.0.25)
==14993==    by 0x6615DA4: __cxa_pure_virtual (in /usr/lib64/libstdc++.so.6.0.25)
==14993==    by 0x387AF0: BaseIndex::ThreadSync() (base.cpp:140)
==14993==    by 0x1F4DBE: operator() (std_function.h:687)
==14993==    by 0x1F4DBE: void TraceThread<std::function<void ()> >(char const*, std::function<void ()>) (util.h:329)
==14993==    by 0x388BA6: __invoke_impl<void, void (*)(char const*, std::function<void()>), char const*, std::_Bind<void (BaseIndex::*(BaseIndex*))()> > (invoke.h:60)
==14993==    by 0x388BA6: __invoke<void (*)(char const*, std::function<void()>), char const*, std::_Bind<void (BaseIndex::*(BaseIndex*))()> > (invoke.h:95)
==14993==    by 0x388BA6: _M_invoke<0, 1, 2> (thread:234)
==14993==    by 0x388BA6: operator() (thread:243)
==14993==    by 0x388BA6: std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(char const*, std::function<void ()>), char const*, std::_Bind<void (BaseIndex::*(BaseIndex*))()> > > >::_M_run() (thread:186)
==14993==    by 0x6641522: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==14993==    by 0x6EC7593: start_thread (in /usr/lib64/libpthread-2.27.so)
==14993==    by 0x71D90DE: clone (in /usr/lib64/libc-2.27.so)
==14993== 
==14993== HEAP SUMMARY:
==14993==     in use at exit: 59,700,169 bytes in 84,309 blocks
==14993==   total heap usage: 98,728 allocs, 14,419 frees, 88,664,198 bytes allocated
==14993== 
==14993== LEAK SUMMARY:
==14993==    definitely lost: 9,085,501 bytes in 81,256 blocks
==14993==    indirectly lost: 856 bytes in 8 blocks
==14993==      possibly lost: 13,139,949 bytes in 33 blocks
==14993==    still reachable: 37,473,863 bytes in 3,012 blocks
==14993==                       of which reachable via heuristic:
==14993==                         newarray           : 48 bytes in 1 blocks
==14993==         suppressed: 0 bytes in 0 blocks
==14993== Rerun with --leak-check=full to see details of leaked memory
==14993== 
==14993== For counts of detected and suppressed errors, rerun with: -v
==14993== Use --track-origins=yes to see where uninitialised values come from
==14993== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Aborted (core dumped)

Copy link
Contributor

@domob1812 domob1812 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK faab631

if (g_txindex) {
g_txindex.reset();
}
if (g_txindex) g_txindex->Stop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is newly added, was it missing completely (independent of the position of pointer resets)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base destructor was calling it, but yes, imo it was missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing then especially!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: right now it would be called twice. Once here and then another time (as belt-and-supenders) in the destructor when the pointer is reset. (Just like for conman)

Though, the belt-and-suspenders are not enough and can lead to segfaults occasionally. (See above valgrind output)

@laanwj laanwj added the Bug label Aug 7, 2018
@laanwj laanwj added this to the 0.17.0 milestone Aug 7, 2018
@achow101
Copy link
Member

achow101 commented Aug 8, 2018

utACK faab631

@laanwj laanwj merged commit faab631 into bitcoin:master Aug 8, 2018
@laanwj
Copy link
Member

laanwj commented Aug 8, 2018

utACK faab631

@maflcko maflcko deleted the Mf1808-shutdownStopThreads branch August 8, 2018 13:26
maflcko pushed a commit that referenced this pull request Aug 27, 2018
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as #13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
codablock pushed a commit to codablock/dash that referenced this pull request Mar 23, 2020
faab631 shutdown: Stop threads before resetting ptrs (MarcoFalke)

Pull request description:

  On shutdown some threads would continue to run after or during a pointer reset. This leads to occasional segfaults on shutdown.

  Fix this by resetting the smart pointers after all threads that might read from them have been stopped.

  This should fix:
  * A segfault in the txindex thread, that occurs when the txindex destructor is done, but the thread was not yet stopped (as this is done in the base index destructor)
  * A segfault in the scheduler thread, which dereferences conman. (e.g. CheckForStaleTipAndEvictPeers)

Tree-SHA512: abbcf67fadd088e10fe8c384fadfb90bb115d5317145ccb5363603583b320efc18131e46384f55a9bc574969013dfcbd08c49e0d42c004ed7212eca193858ab2
codablock added a commit to dashpay/dash that referenced this pull request Mar 24, 2020
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Feb 21, 2021
aab15d7 ReplayBlocks: use find instead of brackets operator to access to the element. (furszy)
e898353 [Refactoring] Use const CBlockIndex* where appropriate (random-zebra)
c76fa04 qa: Extract rpc_timewait as test param (furszy)
0f832e3 shutdown: Stop threads before resetting ptrs (MarcoFalke)
67aebbf http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan)
e24c710 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan)
b8f7364 http: Join worker threads before deleting work queue (Wladimir J. van der Laan)
7d68769 rpc: further constrain the libevent workaround (Cory Fields)
75af065 rpc: work-around an upstream libevent bug (Cory Fields)
50e5833 Always return true if AppInitMain got to the end (Matt Corallo)
bd70dcc [qa] Test non-atomic chainstate writes (furszy)
8f04970 Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo)
93f2b15 Random db flush crash simulator (Pieter Wuille)
72f3b17 Adapt memory usage estimation for flushing (Pieter Wuille)
8540113 Non-atomic flushing using the blockchain as replay journal (Pieter Wuille)
8d6625f [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille)

Pull request description:

  > This patch adds an extra "head blocks" to the chainstate, which gives the range of blocks for writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during flush, and we rollback/roll forward blocks inside of it to get a consistent tip on disk before proceeding.

  > If a flush completes succesfully, the resulting database is compatible with previous versions. If the node crashes in the middle of a flush, a version of the code with this patch is needed to recovery.

  An adaptation of the following PRs with further modifications to the `feature_dbcrash.py` test to be up-to-date with upstream and solve RPC related bugs.

  * bitcoin#10148.
  * Increase RPC wait time.
  * bitcoin#11831
  * bitcoin#11593
  * bitcoin#12366
  * bitcoin#13837
  * bitcoin#13894

ACKs for top commit:
  random-zebra:
    ACK aab15d7
  Fuzzbawls:
    ACK aab15d7

Tree-SHA512: 898806746f581a9eb8deb0155c558481abf4454c6f3b3c3ad505c557938d0700fe9796e98e36492286ae869378647072c3ad77ad65e9dd7075108ff96469ade1
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 27, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 28, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
Munkybooty pushed a commit to Munkybooty/dash that referenced this pull request Jun 30, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
Munkybooty pushed a commit to Munkybooty/dash that referenced this pull request Jun 30, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
Munkybooty pushed a commit to Munkybooty/dash that referenced this pull request Jul 1, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
faf4a9b qa: Stop txindex thread before calling destructor (MarcoFalke)

Pull request description:

  Same as bitcoin#13894, but for the tests.

Tree-SHA512: a21d9f8ad8dc9703217d1808cb14bd969903c364fe30bbdc0dd2df170ddc0cbaba98b0bde28bc21ff1319222aaf6cb4f1b2c45cd6b236fe3c645a92eab6bacba
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants