Fulcrum crashes with sigsegv every few minutes #78

Closed
jhoenicke opened this issue Apr 11, 2021 · 3 comments
Labels
bug Something isn't working Fixed

jhoenicke commented Apr 11, 2021

Recently my Fulcrum server started to crash regularly (several times per hour). It only happens for BTC, so it is either something in the chain or a deliberate DoS attack against BTC servers. The logs show nothing unusual. The most recent crash corrupted the database, so I guess I'm now back to resyncing for the next day.

Apr 11 07:18:38 Fulcrum[3873906]: [2021-04-11 07:18:38.845] <TcpSrv 0.0.0.0:50099> New TCP Client.56026 5.121.xxx.xxx:18059, 17 clients total
Apr 11 07:18:38 Fulcrum[3873906]: [2021-04-11 07:18:38.849] <TcpSrv 0.0.0.0:50099> New TCP Client.56027 27.71.xxx.xxx:38319, 18 clients total
Apr 11 07:18:38 systemd[1]: fulcrum-btc.service: Main process exited, code=killed, status=11/SEGV
Apr 11 07:18:38 systemd[1]: fulcrum-btc.service: Failed with result 'signal'.
Apr 11 07:18:39 systemd[1]: fulcrum-btc.service: Scheduled restart job, restart counter is at 89.
Apr 11 07:18:39 systemd[1]: Stopped Fulcrum Bitcoin Daemon.
Apr 11 07:18:39 systemd[1]: Started Fulcrum Bitcoin Daemon.
...
Apr 11 07:19:22 Fulcrum[3878612]: [2021-04-11 07:19:22.471] <WsSrv 0.0.0.0:50003> WebSocket handshake failed for 116.68.xxx.xxx:44368, reason: Connection lost
Apr 11 07:19:22 Fulcrum[3878612]: [2021-04-11 07:19:22.471] <WsSrv 0.0.0.0:50003> WebSocket handshake failed for 116.68.xxx.xxx:44368, reason: Bad HTTP: {"jsonrpc":"2.0","id":0,"method":"serve…arams":[]}
Apr 11 07:19:22 fulcrum-btc.service: Main process exited, code=killed, status=11/SEGV
Apr 11 07:19:22 fulcrum-btc.service: Failed with result 'signal'.
Apr 11 07:19:22 systemd[1]: fulcrum-btc.service: Scheduled restart job, restart counter is at 90.
Apr 11 07:19:22 systemd[1]: Stopped Fulcrum Bitcoin Daemon.
Apr 11 07:19:22 systemd[1]: Started Fulcrum Bitcoin Daemon.
...
Apr 11 19:56:06 systemd[1]: fulcrum-btc.service: Main process exited, code=killed, status=11/SEGV
Apr 11 19:56:06 systemd[1]: fulcrum-btc.service: Failed with result 'signal'.
Apr 11 19:56:07 systemd[1]: fulcrum-btc.service: Scheduled restart job, restart counter is at 109.
Apr 11 19:56:07 systemd[1]: Stopped Fulcrum Bitcoin Daemon.
Apr 11 19:56:07 systemd[1]: Started Fulcrum Bitcoin Daemon.
...
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.490] DB memory: 420.00 MiB
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.490] Coin: BTC
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.490] Chain: main
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.511] FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committng a block to the db. We cannot figure out where exactly in the update process Fulcrum was killed, so we cannot undo the inconsistent state caused by the unexpected shutdown. Sorry!
Apr 11 19:56:07 Fulcrum[4047419]: The database has been corrupted. Please delete the datadir and resynch to bitcoind.
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.512] Stopping Controller ...
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.512] Closing storage ...
Apr 11 19:56:07 Fulcrum[4047419]: [2021-04-11 19:56:07.555] Shutdown complete
Apr 11 19:56:07 systemd[1]: fulcrum-btc.service: Main process exited, code=exited, status=1/FAILURE

jhoenicke commented Apr 11, 2021

dmesg shows address and code:

[750055.820563] PeerMgr[1290]: segfault at 20 ip 000055da03629b02 sp 00007fdbd27fbb60 error 4 in Fulcrum[55da034c2000+ae6000]
[750055.820573] Code: 48 8b 75 10 49 8d 7d 20 e8 a5 a4 65 00 4c 89 ef e8 73 e0 07 00 e9 ae fd ff ff 66 0f 1f 44 00 00 4c 8b b5 98 00 00 00 4d 8b 3e <45> 8b 6f 20 45 85 ed 75 65 bf 10 00 00 00 4c 8d ac 24 d0 00 00 00
[774142.623418] PeerMgr[2742997]: segfault at 20 ip 000055d3b24e9b02 sp 00007f461f3fbb60 error 4 in Fulcrum[55d3b2382000+ae6000]
[774142.623428] Code: 48 8b 75 10 49 8d 7d 20 e8 a5 a4 65 00 4c 89 ef e8 73 e0 07 00 e9 ae fd ff ff 66 0f 1f 44 00 00 4c 8b b5 98 00 00 00 4d 8b 3e <45> 8b 6f 20 45 85 ed 75 65 bf 10 00 00 00 4c 8d ac 24 d0 00 00 00
[774879.016268] PeerMgr[2829805]: segfault at 20 ip 00005555944acb02 sp 00007f9262bfbb60 error 4 in Fulcrum[555594345000+ae6000]
[774879.016277] Code: 48 8b 75 10 49 8d 7d 20 e8 a5 a4 65 00 4c 89 ef e8 73 e0 07 00 e9 ae fd ff ff 66 0f 1f 44 00 00 4c 8b b5 98 00 00 00 4d 8b 3e <45> 8b 6f 20 45 85 ed 75 65 bf 10 00 00 00 4c 8d ac 24 d0 00 00 00
[778555.002453] PeerMgr[2832287]: segfault at 20 ip 000055788f80bb02 sp 00007fc6cb3fbb60 error 4 in Fulcrum[55788f6a4000+ae6000]
[778555.002463] Code: 48 8b 75 10 49 8d 7d 20 e8 a5 a4 65 00 4c 89 ef e8 73 e0 07 00 e9 ae fd ff ff 66 0f 1f 44 00 00 4c 8b b5 98 00 00 00 4d 8b 3e <45> 8b 6f 20 45 85 ed 75 65 bf 10 00 00 00 4c 8d ac 24 d0 00 00 00

Binary is Fulcrum-1.5.0-x86_64-linux (unmodified). The faulting instruction seems to be in the function
_ZN3RPC14ConnectionBase11processJsonEO10QByteArray (demangled: RPC::ConnectionBase::processJson(QByteArray&&)), at address 167B02:

  167af8:       4c 8b b5 98 00 00 00    mov    0x98(%rbp),%r14
  167aff:       4d 8b 3e                mov    (%r14),%r15
  167b02:       45 8b 6f 20             mov    0x20(%r15),%r13d
  167b06:       45 85 ed                test   %r13d,%r13d
  167b09:       75 65                   jne    167b70 <_ZN3RPC14ConnectionBase11processJsonEO10QByteArray+0x310>
  167b0b:       bf 10 00 00 00          mov    $0x10,%edi
  167b10:       4c 8d ac 24 d0 00 00    lea    0xd0(%rsp),%r13
  167b17:       00 
  167b18:       e8 33 90 f0 ff          callq  70b50 <__cxa_allocate_exception@plt>
  167b1d:       48 8d 35 7a bd 6b 00    lea    0x6bbd7a(%rip),%rsi        # 82389e <_ZTSN10RecordFile13FileOpenErrorE+0x77e>
  167b24:       4c 89 ef                mov    %r13,%rdi
  167b27:       49 89 c6                mov    %rax,%r14
  167b2a:       e8 c1 6f f3 ff          callq  9eaf0 <_ZN7QStringC1EPKc>
  167b2f:       4c 89 ee                mov    %r13,%rsi
  167b32:       4c 89 f7                mov    %r14,%rdi
  167b35:       e8 66 7e f3 ff          callq  9f9a0 <_ZN9ExceptionC1ERK7QString>
  167b3a:       48 8d 05 a7 21 b9 00    lea    0xb921a7(%rip),%rax        # cf9ce8 <_ZTVN3RPC14ConnectionBase13UnknownMethodE>

Should be roughly: https://github.com/cculianu/Fulcrum/blob/v1.5.0/src/RPC.cpp#L430
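
For anyone retracing this, here is a minimal sketch of how the address 167B02 falls out of the dmesg lines above: subtract the mapping base (the number before the `+` in `Fulcrum[base+size]`) from the instruction pointer. The helper name `parse_segfault` and the returned field names are made up for illustration. The sketch assumes the mapping shown by the kernel starts at the image's load base, which appears to hold here because the result matches the objdump address, and it decodes the low three bits of the x86 page-fault error code.

```python
import re

# Matches kernel segfault lines in the format quoted in this report.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Hypothetical helper: extract the in-image offset and fault details."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "image": m["image"],
        "fault_addr": int(m["addr"], 16),
        # Assumption: the mapping starts at the image's load base,
        # so ip - base is the address objdump shows.
        "offset": ip - base,
        "page_present": bool(err & 1),  # bit 0: protection fault vs. page not present
        "write": bool(err & 2),         # bit 1: write access vs. read
        "user_mode": bool(err & 4),     # bit 2: fault happened in user mode
    }


line = ("[750055.820563] PeerMgr[1290]: segfault at 20 ip 000055da03629b02 "
        "sp 00007fdbd27fbb60 error 4 in Fulcrum[55da034c2000+ae6000]")
info = parse_segfault(line)
print(hex(info["offset"]), hex(info["fault_addr"]))
```

All four dmesg entries above reduce to the same offset, 0x167b02. The fault address 0x20 equals the displacement in `mov 0x20(%r15),%r13d`, and error 4 is a user-mode read of an unmapped page, which would be consistent with %r15 holding a null pointer at that instruction.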

cculianu (Owner) commented

Yeah, I know. There was a dangling reference bug in the codebase. It has been fixed in 1.5.2. Please upgrade to latest.

cculianu added the "bug" (Something isn't working) and "Fixed" labels Apr 12, 2021

cculianu commented Apr 12, 2021

Also, thanks for the detailed report, and sorry about the crash and the need to resynch the db. I have no explanation other than that this bug had been lurking in the codebase forever and some misbehaving BTC servers were somehow able to trigger it (either intentionally or accidentally). The good news is that the bug is fixed. The bad news is that you have to resynch. But please do use 1.5.2.

If it makes you feel better, my server also got hit hard by this, and I suffered as well until I fixed it. Sorry. I promise now Fulcrum is bug-free! :)
