Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle corrupt wallets gracefully. #1895

Merged
merged 3 commits into from
Oct 9, 2012

Conversation

gavinandresen
Copy link
Contributor

Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a
crash. This commit does three things:

  1. Runs a BDB verify early in the startup process, and if there is a
    low-level problem with the database:
  • Moves the bad wallet.dat to wallet.timestamp.bak
  • Runs a 'salvage' operation to get key/value pairs, and
    writes them to a new wallet.dat
  • Continues with startup.
  1. Much more tolerant of serialization errors. All errors in deserialization
    are tolerated EXCEPT for errors related to reading keypairs
    or master key records-- those are reported and then shut down, so the user
    can get help (or recover from a backup).

  2. Adds a new -salvagewallet option, which:

  • Moves the wallet.dat to wallet.timestamp.bak
  • extracts ONLY keypairs and master keys into a new wallet.dat
  • soft-sets -rescan, to recreate transaction history

This was tested by randomly corrupting testnet wallets using a little
python script I wrote (https://gist.github.com/3812689)

else if (strType == "tx")
// Rescan if there is a bad transaction record:
SoftSetBoolArg("-rescan", true);
// Leave other errors alone, if we try to fix them we might make things worse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result stays DB_LOAD_OK here, so in case of corruptions in non-key/tx records, it silently continues also with the upgrading below at line 458.
Is this desired behavior?

Or should we set some flag, log a message and show a popup to the user (uiInterface.ThreadSafeMessageBox) at the end of the function that recovery has taken place and that some wallet data (such as address book entries, details can be found in debug.log...) might be corrupt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An earlier version of this code extended DBErrors to have different levels of error, but the code started getting complicated and confusing (e.g. you could have a wallet that had a key error, a non-key error, AND needed upgrading... maybe DBErrors should be a bitmask, etc).

But telling the user that there is something wrong is definitely a good idea, I'll make that so.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea to have bitmasks to handle error codes btw.

@Diapolo
Copy link

Diapolo commented Oct 3, 2012

I know you will for sure dislike the following comment, but I'll try for the last time (you won't get any further comments on strings in your pulls, if you want) as the brave knight for unified string usage ^^. Can you change your Warning messages to the following format:

"Warning: First sentence! Second sentence."

  • start with Warning:
  • First sentence (if a sentence) finished with a !
  • Further sentences finished with a .

@gavinandresen
Copy link
Contributor Author

@Diapolo : good idea on the Verifying message. And ok, I'll change the first period to an exclamation mark.

I'm finding serious bugs doing more testing; writing here so I don't lose track of them:

  1. Getting a crash on my main wallet, bdb complaining about out of memory (out of mutexes).

  2. Getting this weirdness switching from newer bitcoind to older:
    10/04/12 14:16:00 nFileVersion = 70003
    10/04/12 14:16:00 Performing wallet upgrade to 60000

  3. Crash-at-shutdown due to the printf-in-global-destructor bug

@gavinandresen
Copy link
Contributor Author

Updated to not "pre-verify" blkindex.dat which fixes the 'out of mutexes' problem (looks like bdb does not clean up after a ->verify() ?), pick up some changes from @jgarzik version of DBEnv::RemoveDB (kept RemoveDB as the name, though, since it removes a database not a dbenv), and tweaked Warning! messages.

I'll investigate the downgrade weirdness separately, I'm afraid that might be another bug introduced in 0.7.0.

printf("ERROR: db salvage failed\n");
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My god... is that was is necessary to recover from BDB? Manually parse the hex dump created by a library?

I want to get rid of BDB yesterday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recovery is never a nice process, if things are broken enough you always get to a level where you have to look at hexdumps of the raw file to salvage anything. At least you still get delimited keys/values here.

Is it any prettier for leveldb, for example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no leveldb tools; I think 'recovery' is the same as 'opening', and 'crashing' is the same as 'closing' in LevelDB. There are a few flags to set the degree of checksum verification or paranoidness when opening, though.

@sipa
Copy link
Member

sipa commented Oct 7, 2012

@gavinandresen Do you consider this pull ready now?

@gavinandresen
Copy link
Contributor Author

Yes, this is pull-ready now.

I'd like some help with more thorough testing.

@gmaxwell
Copy link
Contributor

gmaxwell commented Oct 7, 2012

So, this can just cause your balance to go to zero with no notice if you're not watching the logs/console output carefully. Perhaps get getinfo error field should get something?

Here is what I tested:

Using gavin's testnet-in-a-box wallet.
zzuf -I 'wallet.dat' -s 0:1000 ./bitcoind -daemon=0

Seed 0 fails with Db::open: Invalid argument. In log I see Salvage(aggressive) found 2372 records.
Restarting without fuzzing gives me a successful start but zero balance.

Recover original wallet, then run starting with seed 1:
zzuf -I 'wallet.dat' -s 1:1000 ./bitcoind -daemon=0

Fails at seed 1 with "DbEnv::open: DB_RUNRECOVERY: Fatal error, run database recovery"
No salvage run.

Starting without fuzzing gives the correct balance.

Starting again at seed 2:
zzuf -I 'wallet.dat' -s 2:1000 ./bitcoind -daemon=0
Throws "Bitcoin: Warning: wallet.dat corrupt, data salvaged! Original wallet.dat saved as wallet.{timestamp}.bak in /home/gmaxwell/.bitcoin/testnet3; if your balance or transactions are incorrect you should restore from a backup." at the console. (first time I've seen that)

Log shows: Renamed wallet.dat to wallet.1349630310.bak
Salvage(aggressive) found 2372 records

And a bunch of nice addwallets.

But calling getinfo triggers segfault.

-- still, this pull is a massive improvement over default. Now that we've got a case where there could be backup wallet files laying around perhaps we should go all the way and keep a couple wallet rotation even when there isn't corruption?

Perhaps the fuzzing is a little too nasty to be a realistic test. Though if we ever change to our own append only format, I absolutely expect it to survive this kind of test.

@luke-jr
Copy link
Member

luke-jr commented Oct 7, 2012

How does this handle encrypted wallets?

Before, opening a -datadir that was created with a new
version of Berkeley DB would result in an un-caught DB_RUNRECOVERY
exception.

After these changes, the error is caught and the user is told
that there is a problem and is told how to try to recover from
it.
@gavinandresen
Copy link
Contributor Author

@luke-jr it handles encrypted wallets as well as might be expected. It works on the bdb level, salvaging as many key/value pairs as it can from the backed-up wallet.dat. If it encounters a database-level error reading keys (private keys, encrypted or not, or master keys) it tells the user to try to recover from a backup.

Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a
crash. This commit does three things:

1) Runs a BDB verify early in the startup process, and if there is a
low-level problem with the database:
  + Moves the bad wallet.dat to wallet.timestamp.bak
  + Runs a 'salvage' operation to get key/value pairs, and
    writes them to a new wallet.dat
  + Continues with startup.

2) Much more tolerant of serialization errors. All errors in deserialization
are reported by tolerated EXCEPT for errors related to reading keypairs
or master key records-- those are reported and then shut down, so the user
can get help (or recover from a backup).

3) Adds a new -salvagewallet option, which:
 + Moves the wallet.dat to wallet.timestamp.bak
 + extracts ONLY keypairs and master keys into a new wallet.dat
 + soft-sets -rescan, to recreate transaction history

This was tested by randomly corrupting testnet wallets using a little
python script I wrote (https://gist.github.com/3812689)
@gavinandresen
Copy link
Contributor Author

Rebased on top of #1917; changed error handling from bdb methods from exceptions to returned error codes.

@gavinandresen gavinandresen merged commit d0b3e77 into bitcoin:master Oct 9, 2012
KolbyML pushed a commit to KolbyML/bitcoin that referenced this pull request Dec 5, 2020
8b8e706 [BUG][GUI] Fix wallet crashing on faq-buttons press (random-zebra)

Pull request description:

  Bug reported today on discord.
  Wallet (GUI 4.3.0) crashes when the FAQ buttons from the masternode widget are pressed.

  **Cause:**
  The FAQs sections have been recently reduced to only 6 (old zerocoin sections have been removed).
  But the buttons in the masternode widget still reference the old indexes (9 and 10),  which are now out of bounds.
  Further, the safety check inside `setSection`, still has the old (10) max value hardcoded, so does not prevent this.

  **Fix:**
  This clearly shows the difficulty in maintaining code with fixed magic numbers (which depend on a list of objects that may be modified in future releases).
  So, let's introduce a proper enumeration inside `SettingsFaqWidget`, and use that to manage the sections, instead of just integers.

ACKs for top commit:
  Fuzzbawls:
    ACK 8b8e706
  furszy:
    code ACK 8b8e706

Tree-SHA512: 81b2fb1465e2fd0af4d66372d753ace4a2dd585a4a2e3fb10535e774b7fb4963949a8fb10ed1811e93a07eafe800578143f3307306243eea4339201cca6482e3
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants