Running out of disk space can leave bitcoin in a desynced state #26112

Closed
jb55 opened this issue Sep 16, 2022 · 6 comments · Fixed by #26331

Comments

@jb55
Contributor

jb55 commented Sep 16, 2022

I noticed my node was not syncing, and through some debugging on #bitcoin-core-dev it seems like it was caused by running out of disk space which left the node in a desynced state, with the valid chain marked as invalid.

Freeing up some space and then running reconsiderblock fixed it.

Comments from @sipa:

Ugh. That is bad. Out of disk space should not result in database corruption.

Database errors propagating up and being interpreted as (permanent) block invalidity was one of the contributing factors to the BDB/LevelDB fork in the 0.7/0.8 transition.

logs:

2022-09-05T00:28:55Z UpdateTip: new best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0 height=752167 version=0x2ce8e004 log2_work=93.707707 tx=761139817 date='2022-09-01T15:00:06Z' progress=0.998887 cache=146.8MiB(1101860txo)
2022-09-05T00:28:55Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:55Z You can use -debug=leveldb to get more complete diagnostic messages
2022-09-05T00:28:55Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:55Z Error: A fatal internal error occurred, see debug.log for details
2022-09-05T00:28:56Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:56Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 221fa678c5c9953d6cd17e584f05c12ab10ba0f2fc8e8131e266f3f0e9819848, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
2022-09-05T00:28:56Z ERROR: ConnectTip: ConnectBlock 000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
@jb55 jb55 added the Bug label Sep 16, 2022
@fanquake fanquake added this to the 24.0 milestone Sep 16, 2022
@sipa
Member

sipa commented Sep 16, 2022

So it appears that this was triggered by LevelDB failing to write to disk.

Some questions:

  • Why didn't our own disk space check detect this long before it happened? @jb55 was anything else quickly filling your disk at the same time, which could have made our own check too infrequent to catch it?
  • Did Bitcoin Core shut down after this happened?

Based on your comments on IRC, it seems that normal restarting didn't fix the problem. So that suggests that while there was some LevelDB error... Bitcoin Core still managed to (incorrectly) write to disk that the block was invalid. It shouldn't conclude that in the first place, but it's somewhat strange that even after a system error it still managed to actually commit that to disk.

@jb55
Contributor Author

jb55 commented Sep 16, 2022

Why didn't our own disk space check detect this long before it happened? @jb55 was anything else quickly filling your disk at the same time, which could have made our own check too infrequent to catch it?

yes this is very possible, I run nixos and frequently use nix-shell, etc which downloads things and fills up my disk pretty quickly.

Did Bitcoin Core shut down after this happened?

yes, here's the full log:

2022-09-05T00:28:55Z UpdateTip: new best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0 height=752167 version=0x2ce8e004 log2_work=93.707707 tx=761139817 date='2022-09-01T15:00:06Z' progress=0.998887 cache=146.8MiB(1101860txo)
2022-09-05T00:28:55Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:55Z You can use -debug=leveldb to get more complete diagnostic messages
2022-09-05T00:28:55Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:55Z Error: A fatal internal error occurred, see debug.log for details
2022-09-05T00:28:56Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:56Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 221fa678c5c9953d6cd17e584f05c12ab10ba0f2fc8e8131e266f3f0e9819848, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
2022-09-05T00:28:56Z ERROR: ConnectTip: ConnectBlock 000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
orspent, CheckTxInputs: inputs missing/spent
2022-09-05T00:28:57Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:57Z msghand thread exit
2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat started
 CheckTxInputs: inputs missing/spent
2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
2022-09-05T00:28:57Z tor: Thread interrupt
2022-09-05T00:28:57Z torcontrol thread exit
2022-09-05T00:28:57Z opencon thread exit
2022-09-05T00:28:57Z addcon thread exit
2022-09-05T00:28:57Z Shutdown: In progress...
2022-09-05T00:28:57Z net thread exit
2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
2022-09-05T00:28:57Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:57Z msghand thread exit
2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat started
2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat completed (0.00s)
2022-09-05T00:28:57Z scheduler thread exit
2022-09-05T00:28:57Z Writing 0 unbroadcast transactions to disk.
2022-09-05T00:28:57Z Dumped mempool: 0.001682s to copy, 0.015307s to dump
2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
2022-09-05T00:28:57Z ForceFlushStateToDisk: failed to flush state (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
2022-09-05T00:28:57Z ForceFlushStateToDisk: failed to flush state (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
2022-09-05T00:28:57Z [personal] Releasing wallet
2022-09-05T00:28:57Z [old-wallet] Releasing wallet
2022-09-05T00:28:58Z Shutdown: done

@sipa
Member

sipa commented Sep 16, 2022

A guess about what might be happening:

CCoinsViewErrorCatcher, the wrapper class used around CCoinsViewDB that's supposed to detect these problems and forcefully exit the application, has an override for GetCoin. But in CheckTxInputs, HaveInputs is invoked first, which in turn calls HaveCoin. HaveCoin is implemented in CCoinsViewDB, but not in CCoinsViewErrorCatcher, and thus the disk read exception escapes.

A solution may be to just add an override for HaveCoin in CCoinsViewErrorCatcher.
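
To make the guess above concrete, here is a minimal, self-contained sketch of the pattern. It is not Bitcoin Core code: `View`, `FailingDB`, `PartialCatcher`, `FullCatcher` and `FatalReadError` are hypothetical stand-ins, outpoints are reduced to plain ints, and the catcher throws a marker exception purely so the behaviour can be observed in a test. The point it tries to show is only that a wrapper guarding one virtual method but not another lets read errors from the unguarded method pass through without the error handler ever running.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified coins view interface (outpoints reduced to ints).
struct View {
    virtual ~View() = default;
    virtual bool GetCoin(int outpoint) const = 0;
    virtual bool HaveCoin(int outpoint) const = 0;
};

// Stand-in for a database-backed view whose reads fail, e.g. on a full disk.
struct FailingDB final : View {
    bool GetCoin(int) const override { throw std::runtime_error("IO error"); }
    bool HaveCoin(int) const override { throw std::runtime_error("IO error"); }
};

// Marker thrown by the catcher after its callback ran (assumption of this sketch).
struct FatalReadError : std::runtime_error {
    FatalReadError() : std::runtime_error("fatal read error") {}
};

// Wrapper that guards GetCoin only; HaveCoin is forwarded unguarded.
class PartialCatcher : public View {
public:
    PartialCatcher(const View& base, std::function<void()> on_error)
        : m_base(base), m_on_error(std::move(on_error)) {}
    bool GetCoin(int o) const override { return Guard([&] { return m_base.GetCoin(o); }); }
    bool HaveCoin(int o) const override { return m_base.HaveCoin(o); }

protected:
    // Runs a read; on failure, reports it instead of returning "not found".
    template <typename Read>
    bool Guard(Read&& read) const {
        try {
            return read();
        } catch (const std::runtime_error&) {
            m_on_error();
            throw FatalReadError{};
        }
    }
    const View& m_base;
    std::function<void()> m_on_error;
};

// Same wrapper with the missing HaveCoin override added.
class FullCatcher final : public PartialCatcher {
public:
    using PartialCatcher::PartialCatcher;
    bool HaveCoin(int o) const override { return Guard([&] { return m_base.HaveCoin(o); }); }
};

// Returns true if a failing HaveCoin read reached the catcher's callback.
template <typename Catcher>
bool HaveCoinErrorIsCaught() {
    FailingDB db;
    bool fired = false;
    Catcher catcher(db, [&] { fired = true; });
    try {
        catcher.HaveCoin(0);
    } catch (const std::runtime_error&) {
        // Swallowed here only so that `fired` can be inspected.
    }
    return fired;
}

void Demo() {
    assert(!HaveCoinErrorIsCaught<PartialCatcher>());  // error slips past the handler
    assert(HaveCoinErrorIsCaught<FullCatcher>());      // handler runs once HaveCoin is guarded
}
```

In this sketch the only difference between the two wrappers is the extra `HaveCoin` override, which mirrors the fix suggested above.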

@bitcoin bitcoin deleted a comment Oct 7, 2022
@maflcko maflcko removed this from the 24.0 milestone Oct 17, 2022
@maflcko
Member

maflcko commented Oct 17, 2022

Removed from the milestone, as this is not a regression and no fix is available right now.

@jb55
Contributor Author

jb55 commented Oct 11, 2023

awesome, thanks @aureleoules !

Frank-GER pushed a commit to syscoin/syscoin that referenced this issue Oct 13, 2023
… check disk space periodically

ed52e71 Periodically check disk space to avoid corruption (Aurèle Oulès)
7fe537f Implement CCoinsViewErrorCatcher::HaveCoin (Aurèle Oulès)

Pull request description:

  Attempt to fix bitcoin#26112.

  As suggested by sipa in bitcoin#26112 (comment):
  > CCoinsViewErrorCatcher, the wrapper class used around CCoinsViewDB that's supposed to detect these problems and forcefully exit the application, has an override for GetCoin. But in CheckTxInputs, HaveInputs is invoked first, which in turn calls HaveCoin. HaveCoin is implemented in CCoinsViewDB, but not in CCoinsViewErrorCatcher, and thus the disk read exception escapes.
  > A solution may be to just add an override for HaveCoin in CCoinsViewErrorCatcher.

  I implemented `CCoinsViewErrorCatcher::HaveCoin` and also added a periodic disk space check that shuts down the node if there is not enough space left on disk; the minimum here is 50 MB.

  For reviewers, it's possible to saturate disk space to test the PR by creating large files with `fallocate -l 50G test.bin`

ACKs for top commit:
  achow101:
    ACK ed52e71
  w0xlt:
    Code Review ACK bitcoin@ed52e71
  sipa:
    utACK ed52e71

Tree-SHA512: 456aa7b996023df42b4fbb5158ee429d9abf7374b7b1ec129b21aea1188ad19be8da4ae8e0edd90b85b7a3042b8e44e17d3742e33808a4234d5ddbe9bcef1b78
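
For readers who want a feel for the periodic disk space check mentioned in the pull request description, here is a rough standalone sketch using only the C++17 standard library. It is not the code from the pull request: the names `MIN_DISK_SPACE_BYTES`, `IsSpaceSufficient`, `HasEnoughDiskSpace` and `CheckDiskSpaceOnce` are made up for illustration, reading "50MB" as 50 MiB is an assumption, and the shutdown request is modelled as a simple atomic flag.

```cpp
#include <atomic>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Assumed threshold: the "50MB" from the description, read here as 50 MiB.
constexpr std::uintmax_t MIN_DISK_SPACE_BYTES = 50 * 1024 * 1024;

// Pure decision helper: is `available` enough to keep the minimum free
// after writing `additional` more bytes?
constexpr bool IsSpaceSufficient(std::uintmax_t available, std::uintmax_t additional = 0) {
    return available >= MIN_DISK_SPACE_BYTES &&
           available - MIN_DISK_SPACE_BYTES >= additional;
}

// Queries the filesystem holding `dir`. Returns false if there is too little
// free space or if the query itself fails (treated as "not safe to continue").
bool HasEnoughDiskSpace(const std::filesystem::path& dir, std::uintmax_t additional = 0) {
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(dir, ec);
    if (ec) return false;
    return IsSpaceSufficient(info.available, additional);
}

// One iteration of the periodic check. A scheduler (hypothetical here) would
// call this at a fixed interval and begin shutdown once the flag is set.
void CheckDiskSpaceOnce(const std::filesystem::path& dir, std::atomic<bool>& shutdown_requested) {
    if (!HasEnoughDiskSpace(dir)) shutdown_requested.store(true);
}
```

As jb55 noted earlier in the thread, another process can fill the disk between two checks, so a periodic check like this narrows the window but cannot close it; that is why the error-catcher fix is needed as well.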
@Mikey4010

61a6c3b
