Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulcrum 1.9.5 terminates with uncaught exception #214

Closed
Francisco-DAnconia opened this issue Nov 21, 2023 · 13 comments
Closed

Fulcrum 1.9.5 terminates with uncaught exception #214

Francisco-DAnconia opened this issue Nov 21, 2023 · 13 comments
Labels
bug Something isn't working Fixed

Comments

@Francisco-DAnconia
Copy link

Francisco-DAnconia commented Nov 21, 2023

I just updated to 1.9.5 and was running Fulcrum on BTC Testnet. After a "Failed to Find Previous Transaction" (same as issue #141??, which I still get a lot of on an older version of Fulcrum) an exception propagates to Qt which terminates Fulcrum.

Monterey 12.7, built with ZeroMQ, no jemalloc.

Terminal Output:

[2023-11-15 10:32:04.876] <Controller> 155 mempool txs involving 95 addresses
[2023-11-15 10:33:04.875] <Controller> 175 mempool txs involving 124 addresses
[2023-11-15 10:33:18.914] <SynchMempool> Synch mempool expected to drop 173, but in fact dropped 176 -- retrying getrawmempool
[2023-11-15 10:33:18.919] <SyncMempoolPreCache> threadFunc: Unable to find prevout 016fb7d6eb6884ba4c7e536ccbdd0b9558ebc01c32342b3e8eb49a561bc73b6f:1 in DB for tx bddb0bf8513cc6cfd454638b89460f9e701625ccee097bd957f33d834b5c0014 (possibly a block arrived while synching mempool, will retry)
[2023-11-15 10:33:18.921] <SynchMempool> Caught exception when processing mempool tx's: FAILED TO FIND PREVIOUS TX 269a0a7d5662a12c7ab14000df2b0ba5b978ab546bc3d8986791d0283f1c1184:1 IN EITHER MEMPOOL OR DB for TxHash: 59d9e9472792b2da078ac85584b43b14e9d4145ef9ad39d4c708e0849f5bc06c (input 0)
[2023-11-15 10:33:18.922] <Controller> [Qt Warning] Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
If that is not possible, in Qt 5 you must at least reimplement
QCoreApplication::notify() and catch all exceptions there.
 (:0, )
libc++abi: terminating with uncaught exception of type std::__1::system_error: thread::join failed: No such process
@cculianu
Copy link
Owner

Hmm. This should never happen, yet I believe you that it did. I will investigate. Thanks so much for the bug report.

Question: Do you think it was some 1-off or does it happen with any regularity on testnet? Do you have any more of the log preceding this ? Did a block indeed arrive at that time ?

Anyway this requires more investigation from me. Thanks for the report.

@cculianu cculianu added the Requires Investigation Not clear if bug here or bug outside of Fulcrum label Nov 22, 2023
@cculianu cculianu self-assigned this Nov 22, 2023
@cculianu
Copy link
Owner

Ok, I managed to reproduce the error. Still not sure why inputs were missing on your testnet but this exposed a bug in the synch code (a race condition). I have a fix. I will push it. Are you getting this error often? Would you be able to try my fix on master?

@Francisco-DAnconia
Copy link
Author

Francisco-DAnconia commented Nov 22, 2023

Good news. I only recently updated to 1.9.5 (from 1.7.0) and was testing it out on Testnet before migrating it to Mainnet. It ran for about a 5 days before the error plus I didn't notice for a few days. Since I restarted (1 day) - no subsequent errors.

Would you be able to try my fix on master?

I'm sorry I don't understand what your asking, but am happy to help? What does master refer to?

cculianu added a commit that referenced this issue Nov 22, 2023
There was a race condition to the SynchMempoolTask. See issue #214. It
could lead to a situation where the prefetcher thread was joined by 2
threads at once (bad!).

This has now been addressed and also failures to read inputs from
mempool and/or DB are now handled much more robustly (due to block-only
txns + new block arriving as we are synching the mempool -- rare corner
case!).

Fixes issue #214.
@cculianu
Copy link
Owner

It ran for about a 5 days before the error plus I didn't notice for a few days. Since I restarted (1 day) - no subsequent errors.

Well it's good to know it's rare but it definitely was a bug and I'm super-glad you contacted me! This was def. a race condition. Darn. I may need to do a release very soon.

I'm sorry I don't understand what your asking, but am happy to help? What does master refer to?

Oh.. I meant like can you build from the current latest github code at this repo. master refers to the latest commit on this repo.

If you like I can test this here overnight on BTC mainnet (which is very active and rapes mempool synch code typically).. and do a release tomorrow.

Or.. if you know how to build (I presume you do since you are on mac??) -- maybe you can pull the latest code from this repo and build and test it too?

@cculianu cculianu removed their assignment Nov 22, 2023
@cculianu cculianu added bug Something isn't working Fixed and removed Requires Investigation Not clear if bug here or bug outside of Fulcrum labels Nov 22, 2023
@Francisco-DAnconia
Copy link
Author

IC, thanks for the explanation.

Sure, I can definitely do this on Testnet, and if I have time I'll see if I can back up the Fulcrum data on mainnet and run the "head/master" there as well.

Thanks for the quick investigation. Bugs that should never happen are always the most interesting...in the end.

@cculianu
Copy link
Owner

Sure, I can definitely do this on Testnet

Yeah this particular bug is more likely to happen on testnet -- it requires a "block only" txn to arrive in a block, and then a mempool txn to reference it, just at the precise time. On mainnet, "block-only" txns (txns that were never in mempool) are very rare...

At least I think that's what happened.

Anyway yeah testing on both would be appreciated if you have time!

I'm already testing here on mainnet... (although I never managed to see this error since it's extremely rare and unlikely to happen..). You got lucky!

@cculianu
Copy link
Owner

I think I will do a release ASAP tbh. It bugs me that this bug can happen, and the fixed commit is at least as good as or better than what is currently released. Closing this issue for now.

@hMsats
Copy link

hMsats commented Nov 23, 2023

While compiling 1.9.7 on Ubuntu 20.04.6 LTS as follows:

make clean
qmake LIBS+=-ljemalloc
make -j4

I get:

-DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SrvMgr.o src/SrvMgr.cpp
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o Storage.o src/Storage.cpp
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SSLCertMonitor.o src/SSLCertMonitor.cpp
src/Storage.cpp: In member function ‘std::optional<Storage::FirstUse> Storage::getFirstUse(const HashX&) const’:
src/Storage.cpp:3903:88: error: call of overloaded ‘arg(QByteArray, QByteArray)’ is ambiguous
 3903 |                                                      .arg(hashX.toHex(), optba->toHex()));
      |                                                                                        ^
In file included from /usr/include/x86_64-linux-gnu/qt5/QtCore/QString:1,
                 from src/ByteView.h:23,
                 from src/Util.h:21,
                 from src/Mixins.h:21,
                 from src/App.h:21,
                 from src/Storage.cpp:19:
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:275:31: note: candidate: ‘QString QString::arg(qlonglong, int, int, QChar) const’
  275 |     Q_REQUIRED_RESULT QString arg(qlonglong a, int fieldwidth=0, int base=10,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:277:31: note: candidate: ‘QString QString::arg(qulonglong, int, int, QChar) const’
  277 |     Q_REQUIRED_RESULT QString arg(qulonglong a, int fieldwidth=0, int base=10,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:979:16: note: candidate: ‘QString QString::arg(long int, int, int, QChar) const’
  979 | inline QString QString::arg(long a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:981:16: note: candidate: ‘QString QString::arg(ulong, int, int, QChar) const’
  981 | inline QString QString::arg(ulong a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:975:16: note: candidate: ‘QString QString::arg(int, int, int, QChar) const’
  975 | inline QString QString::arg(int a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:977:16: note: candidate: ‘QString QString::arg(uint, int, int, QChar) const’
  977 | inline QString QString::arg(uint a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:983:16: note: candidate: ‘QString QString::arg(short int, int, int, QChar) const’
  983 | inline QString QString::arg(short a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:985:16: note: candidate: ‘QString QString::arg(ushort, int, int, QChar) const’
  985 | inline QString QString::arg(ushort a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:291:31: note: candidate: ‘QString QString::arg(double, int, char, int, QChar) const’
  291 |     Q_REQUIRED_RESULT QString arg(double a, int fieldWidth = 0, char fmt = 'g', int prec = -1,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:293:31: note: candidate: ‘QString QString::arg(char, int, QChar) const’
  293 |     Q_REQUIRED_RESULT QString arg(char a, int fieldWidth = 0,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:298:31: note: candidate: ‘QString QString::arg(const QString&, int, QChar) const’
  298 |     Q_REQUIRED_RESULT QString arg(const QString &a, int fieldWidth = 0,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:987:16: note: candidate: ‘QString QString::arg(const QString&, const QString&) const’
  987 | inline QString QString::arg(const QString &a1, const QString &a2) const
      |                ^~~~~~~
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SubsMgr.o src/SubsMgr.cpp

cculianu added a commit that referenced this issue Nov 23, 2023
@cculianu
Copy link
Owner

Grr why do compilers hate me so. I pushed a commit to address this: 6d4026a

@hMsats
Copy link

hMsats commented Nov 23, 2023

That did it! Thanks a lot, it's running.

@Francisco-DAnconia
Copy link
Author

Update on 1.9.6+. Testnet has been running for 2 days w/o issue - no asserts.

I have not seen the asserts metntioned in #141 now that I updated Mainnet. But I do see these variants of Tx dropped out of the mempool.

  1. Single assert (yellow):
<Controller> Processed 1 new block with 4042 txs (6896 inputs, 8813 outputs, 7873 addresses), verified ok.
<Controller> Block height 818273, up-to-date
<SynchMempool> Tx dropped out of mempool (possibly due to RBF): 46fd88359f8a91441fb46a813d97e89c1c4e0196775e23af1adaf473f8d1e3c5 (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
<Controller> 103155 mempool txs involving 502600 addresses
<Controller> 103731 mempool txs involving 504035 addresses
<Controller> 104345 mempool txs involving 505852 addresses
  1. Triple assert (yellow/red):
<Controller> 106195 mempool txs involving 511583 addresses
<Controller> 106551 mempool txs involving 512882 addresses
<SynchMempool> Synch mempool expected to drop 3217, but in fact dropped 3327 -- retrying getrawmempool
<SyncMempoolPreCache> SynchMempoolTask::Precache::threadFunc: Unable to find prevout 832ac0b71d5fe00baeec6129298e0e55ee73382f327ca292187d3e8a9fa2e286:12 in DB for tx 5b73b9fac71356db34a89c26be19bd3d906c304f7625300dbb3d3b594aebdf90 (possibly a block arrived while synching mempool, will retry)
<Controller> Failed to synch blocks and/or mempool
<Controller> Block height 818274, downloading new blocks ...
<Controller> Processed 1 new block with 3218 txs (7344 inputs, 9206 outputs, 11379 addresses), verified ok.
<Controller> Block height 818274, up-to-date
  1. Quad assert (yellow/red):
<Controller> Block height 818275, up-to-date
<Controller> 105716 mempool txs involving 507956 addresses
<Controller> 106129 mempool txs involving 509332 addresses
<SynchMempool> Synch mempool expected to drop 3285, but in fact dropped 3320 -- retrying getrawmempool
<SyncMempoolPreCache> SynchMempoolTask::Precache::threadFunc: Unable to find prevout 899b2e34b038746c50b01aaaa49a00152ffb3be4d80898fc01f71b5b47ab7340:0 in DB for tx 3c4768ac8131020b1964f43621cd6d59bc5ecef788efe2dfebf3946bd249d449 (possibly a block arrived while synching mempool, will retry)
<SynchMempool> processResults: precache->thread errored out, aborting SynchMempoolTask
<Controller> Failed to synch blocks and/or mempool
<Controller> Block height 818276, downloading new blocks ...
<Controller> Processed 1 new block with 3287 txs (6655 inputs, 7832 outputs, 9150 addresses), verified ok.
<Controller> Block height 818276, up-to-date

Are these normal and expected given the state of the mempool, that is, benign. Or something else?

Thanks

@cculianu
Copy link
Owner

cculianu commented Nov 25, 2023

The first error (1) above is expected to happen occasionally. It's just because it's impossible to synch the mempool 100% correctly in an atomic fashion since it's fast changing on BTC when mempool is full. Sometimes txns drop out that were there a moment ago and sometimes live txns that you just were told were there, refer to other txns that are no longer there. So that error can happen when a txn gets expired or RBF'd and it has children you are busy examining. I see error (1) too on my setup here every once in a while and it's perfectly normal.

The other 2 errors are rarer .. i have never seen them happen .. the fact that they happened for you successively makes me curious as to WHY they happened.

In all cases the errors are recoverable and Fulcrum eventually settles. I will have to examine the other 2 errors though in more detail.. because something is fishy there.

I'm going to re-open this issue and re-assign it again. I hope to get time this weekend or next week to work on this.

@cculianu
Copy link
Owner

I'm actually going to close this issue and i opened #216 that references your comment.

@cculianu cculianu removed their assignment Nov 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Fixed
Projects
None yet
Development

No branches or pull requests

3 participants