Handle corrupt wallets gracefully. #1895

gavinandresen · 2012-10-01T19:25:49Z

Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a
crash. This commit does three things:

1) Runs a BDB verify early in the startup process, and if there is a
low-level problem with the database:

+ Moves the bad wallet.dat to wallet.timestamp.bak
+ Runs a 'salvage' operation to get key/value pairs, and
writes them to a new wallet.dat
+ Continues with startup.

2) Much more tolerant of serialization errors. All errors in deserialization
are reported but tolerated EXCEPT for errors related to reading keypairs
or master key records; those are reported and then bitcoin shuts down, so the user
can get help (or recover from a backup).

3) Adds a new -salvagewallet option, which:

+ Moves the wallet.dat to wallet.timestamp.bak
+ extracts ONLY keypairs and master keys into a new wallet.dat
+ soft-sets -rescan, to recreate transaction history
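A rough sketch of the -salvagewallet filtering step described above, written against a hypothetical in-memory record type (`SalvagedRecord`) rather than the real BDB/CWalletDB code. The set of key record type strings ("key", "wkey", "ckey", "mkey") is an assumption for illustration; the backup name follows the wallet.timestamp.bak pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one salvaged BDB record: the wallet record
// type string plus the raw serialized value bytes.
struct SalvagedRecord {
    std::string type;
    std::vector<unsigned char> value;
};

// Name for the moved-aside file: "wallet.<seconds since epoch>.bak".
std::string BackupFileName(int64_t nTime)
{
    return "wallet." + std::to_string(nTime) + ".bak";
}

// Assumed set of key-material record types: plain, wallet, encrypted and
// master keys. Everything else is dropped by -salvagewallet.
bool IsKeyRecord(const std::string& type)
{
    return type == "key" || type == "wkey" || type == "ckey" || type == "mkey";
}

// Returns only the records that would be written to the fresh wallet.dat.
// Transactions are discarded, so fNeedRescan is set (mirroring the soft-set
// of -rescan) to rebuild history from the block chain.
std::vector<SalvagedRecord> FilterForSalvageWallet(
    const std::vector<SalvagedRecord>& salvaged, bool& fNeedRescan)
{
    std::vector<SalvagedRecord> kept;
    for (const SalvagedRecord& rec : salvaged)
        if (IsKeyRecord(rec.type))
            kept.push_back(rec);
    fNeedRescan = true;
    return kept;
}
```

Keeping only key material trades address-book labels and other metadata for a wallet that is far more likely to load; the rescan recovers the balance.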

This was tested by randomly corrupting testnet wallets using a little
python script I wrote (https://gist.github.com/3812689)
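The script itself is Python and is not reproduced here. As a hedged C++ stand-in (not the gist's code), this shows the general idea of random-corruption testing: overwrite a few randomly chosen bytes of the file contents using a fixed seed, so that any crash can be replayed. `CorruptBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Overwrites nBytes randomly chosen positions of buf with random bytes.
// The same seed always produces the same corruption, so a failing case
// can be reproduced. Returns how many of the writes altered the byte they hit.
std::size_t CorruptBytes(std::vector<unsigned char>& buf, std::size_t nBytes, unsigned seed)
{
    if (buf.empty())
        return 0;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pos(0, buf.size() - 1);
    std::uniform_int_distribution<int> byte(0, 255);
    std::size_t changed = 0;
    for (std::size_t i = 0; i < nBytes; ++i) {
        std::size_t p = pos(rng);
        unsigned char b = static_cast<unsigned char>(byte(rng));
        if (buf[p] != b)
            ++changed;
        buf[p] = b;
    }
    return changed;
}
```

In practice the buffer would be read from a copy of a testnet wallet.dat, corrupted, written back, and bitcoind started against it, one seed at a time.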

laanwj · 2012-10-02T06:34:32Z

src/walletdb.cpp

+                else if (strType == "tx")
+                    // Rescan if there is a bad transaction record:
+                    SoftSetBoolArg("-rescan", true);
+                // Leave other errors alone, if we try to fix them we might make things worse.


The result stays DB_LOAD_OK here, so if non-key/tx records are corrupt, it silently continues, including with the upgrade below at line 458.
Is this desired behavior?

Or should we set a flag, log a message, and at the end of the function show a popup (uiInterface.ThreadSafeMessageBox) telling the user that recovery has taken place and that some wallet data (such as address book entries; details can be found in debug.log) might be corrupt?
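One way to read the suggestion above, as a sketch only: note which class of record failed while loading, keep going for everything except key material, and build a user-facing message at the end. `LoadReport`, `NoteBadRecord`, `LoadWarning` and the message text are hypothetical; only the "tx" record causing a rescan comes from the snippet.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of what went wrong while reading wallet records.
struct LoadReport {
    bool fKeyError = false;     // key or master key unreadable: must stop
    bool fNeedRescan = false;   // a transaction record was bad
    bool fNoncritical = false;  // anything else (e.g. address book) was bad
};

// Classify one failed record by its type string. The key type names are
// an assumption for illustration.
void NoteBadRecord(const std::string& strType, LoadReport& report)
{
    if (strType == "key" || strType == "wkey" || strType == "ckey" || strType == "mkey")
        report.fKeyError = true;
    else if (strType == "tx")
        report.fNeedRescan = true;   // history can be rebuilt with -rescan
    else
        report.fNoncritical = true;  // remember it so the user can be told
}

// Message to show once loading finishes; empty if nothing needs reporting.
std::string LoadWarning(const LoadReport& report)
{
    if (report.fKeyError)
        return "Error loading wallet.dat: keys could not be read; restore from a backup.";
    if (report.fNoncritical)
        return "Warning: error reading wallet.dat! Some wallet data may be incorrect; see debug.log.";
    return "";
}
```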

An earlier version of this code extended DBErrors to have different levels of error, but the code started getting complicated and confusing (e.g. you could have a wallet that had a key error, a non-key error, AND needed upgrading... maybe DBErrors should be a bitmask, etc).

But telling the user that there is something wrong is definitely a good idea, I'll make that so.

I like the idea of using bitmasks for error codes, btw.
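A minimal sketch of the bitmask idea, so that a single load can report a key error, a non-key error and a needed upgrade at the same time. The flag names are hypothetical and are not the DBErrors values in the tree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical load-result flags; each bit is independent.
enum WalletLoadFlags : uint32_t {
    LOAD_OK            = 0,
    LOAD_KEY_ERROR     = 1u << 0,  // keypair or master key record unreadable
    LOAD_NONCRIT_ERROR = 1u << 1,  // some other record was corrupt
    LOAD_NEED_RESCAN   = 1u << 2,  // a transaction record was bad
    LOAD_NEED_UPGRADE  = 1u << 3,  // wallet file is an older version
};

// Only a key error stops startup; every other bit is informational.
bool IsFatal(uint32_t result) { return (result & LOAD_KEY_ERROR) != 0; }

// Non-key corruption should be surfaced to the user, not just logged.
bool ShouldWarnUser(uint32_t result) { return (result & LOAD_NONCRIT_ERROR) != 0; }
```

The callers then test individual bits instead of comparing against a single enum value, which avoids needing a separate code for every combination.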

Diapolo · 2012-10-03T17:15:26Z

I know you will surely dislike the following comment, but I'll try one last time (you won't get any further comments on strings in your pulls, if you want) as the brave knight of unified string usage ^^. Can you change your Warning messages to the following format:

"Warning: First sentence! Second sentence."

Start with Warning:
End the first sentence (if it is a sentence) with a !
End further sentences with a .
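The suggested convention, expressed as a tiny hypothetical helper (`FormatWarning` is illustrative only, not a function in the codebase). Sentences are passed without their final punctuation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// "Warning: " prefix, first sentence ends with '!', later ones with '.'.
std::string FormatWarning(const std::vector<std::string>& sentences)
{
    std::string msg = "Warning:";
    for (std::size_t i = 0; i < sentences.size(); ++i)
        msg += " " + sentences[i] + (i == 0 ? "!" : ".");
    return msg;
}
```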

gavinandresen · 2012-10-04T14:36:40Z

@Diapolo : good idea on the Verifying message. And ok, I'll change the first period to an exclamation mark.

I'm finding serious bugs doing more testing; writing them here so I don't lose track of them:

Getting a crash on my main wallet, with bdb complaining about being out of memory (out of mutexes).
Getting this weirdness when switching from a newer bitcoind to an older one:
    10/04/12 14:16:00 nFileVersion = 70003
    10/04/12 14:16:00 Performing wallet upgrade to 60000
Crash at shutdown due to the printf-in-global-destructor bug.

gavinandresen · 2012-10-04T19:45:19Z

Updated to not "pre-verify" blkindex.dat, which fixes the 'out of mutexes' problem (it looks like bdb does not clean up after a ->verify()?), picked up some changes from @jgarzik's version of DBEnv::RemoveDB (kept RemoveDB as the name, though, since it removes a database, not a dbenv), and tweaked the Warning! messages.

I'll investigate the downgrade weirdness separately; I'm afraid it might be another bug introduced in 0.7.0.

sipa · 2012-10-04T21:08:59Z

src/db.cpp

+        printf("ERROR: db salvage failed\n");
+        return false;
+    }
+


My god... is that what is necessary to recover from BDB? Manually parsing the hex dump created by a library?

I want to get rid of BDB yesterday.

Recovery is never a nice process; if things are broken enough, you always get to a level where you have to look at hexdumps of the raw file to salvage anything. At least you still get delimited keys/values here.
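For context: according to the Berkeley DB documentation, a DB_SALVAGE verify writes its output in the same text format as the db_dump utility. That format consists of header lines up to HEADER=END, then one hex-encoded key line followed by one hex-encoded value line per record, ending with DATA=END. Below is a self-contained sketch of parsing such a stream. The helper names are hypothetical, and this is not the PR's code.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Decode hex, skipping whitespace (dump data lines start with a space).
// Stops at the first character that is neither hex nor whitespace.
Bytes DecodeHex(const std::string& s)
{
    Bytes out;
    int hi = -1;
    for (char c : s) {
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else break;
        if (hi < 0) hi = v;
        else { out.push_back(static_cast<unsigned char>(hi << 4 | v)); hi = -1; }
    }
    return out;
}

// Skip header lines, then read alternating key/value lines until DATA=END.
std::vector<std::pair<Bytes, Bytes>> ParseDump(std::istream& in)
{
    std::vector<std::pair<Bytes, Bytes>> result;
    std::string line;
    while (std::getline(in, line) && line != "HEADER=END") {}
    std::string keyHex, valueHex;
    while (std::getline(in, keyHex) && keyHex != "DATA=END") {
        if (!std::getline(in, valueHex))
            break;  // truncated dump: drop the dangling key
        result.emplace_back(DecodeHex(keyHex), DecodeHex(valueHex));
    }
    return result;
}
```

Each recovered pair can then be written into a fresh database. The parser never needs to understand the damaged page layout, because BDB has already done that work.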

Is it any prettier for leveldb, for example?

There are no leveldb tools; I think 'recovery' is the same as 'opening', and 'crashing' is the same as 'closing' in LevelDB. There are a few flags to set the degree of checksum verification or paranoidness when opening, though.

sipa · 2012-10-07T13:16:27Z

@gavinandresen Do you consider this pull ready now?

gavinandresen · 2012-10-07T16:27:35Z

Yes, this is pull-ready now.

I'd like some help with more thorough testing.

gmaxwell · 2012-10-07T17:23:08Z

So, this can cause your balance to drop to zero with no notice if you're not watching the logs/console output carefully. Perhaps the getinfo error field should show something?

Here is what I tested:

Using gavin's testnet-in-a-box wallet.
zzuf -I 'wallet.dat' -s 0:1000 ./bitcoind -daemon=0

Seed 0 fails with Db::open: Invalid argument. In the log I see Salvage(aggressive) found 2372 records.
Restarting without fuzzing gives me a successful start but a zero balance.

Recover original wallet, then run starting with seed 1:
zzuf -I 'wallet.dat' -s 1:1000 ./bitcoind -daemon=0

Fails at seed 1 with "DbEnv::open: DB_RUNRECOVERY: Fatal error, run database recovery"
No salvage run.

Starting without fuzzing gives the correct balance.

Starting again at seed 2:
zzuf -I 'wallet.dat' -s 2:1000 ./bitcoind -daemon=0
Throws "Bitcoin: Warning: wallet.dat corrupt, data salvaged! Original wallet.dat saved as wallet.{timestamp}.bak in /home/gmaxwell/.bitcoin/testnet3; if your balance or transactions are incorrect you should restore from a backup." at the console. (first time I've seen that)

Log shows: Renamed wallet.dat to wallet.1349630310.bak
Salvage(aggressive) found 2372 records

And a bunch of nice addwallets.

But calling getinfo triggers a segfault.

-- still, this pull is a massive improvement over the current behavior. Now that there can be backup wallet files lying around, perhaps we should go all the way and keep a rotation of a couple of wallet backups even when there isn't corruption?

Perhaps the fuzzing is a little too nasty to be a realistic test. Though if we ever change to our own append-only format, I absolutely expect it to survive this kind of test.
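The rotation idea could look roughly like this: keep the newest N wallet.&lt;timestamp&gt;.bak files and prune the rest. This is a hypothetical sketch that only decides which names to prune. It does not touch the filesystem, and the keep count is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Timestamp from "wallet.<digits>.bak", or -1 if the name doesn't match.
int64_t BackupTimestamp(const std::string& name)
{
    const std::string prefix = "wallet.", suffix = ".bak";
    if (name.size() <= prefix.size() + suffix.size()) return -1;
    if (name.compare(0, prefix.size(), prefix) != 0) return -1;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return -1;
    std::string digits = name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    if (digits.size() > 18) return -1;  // keep std::stoll in range
    for (char c : digits)
        if (c < '0' || c > '9') return -1;
    return std::stoll(digits);
}

// Keep the nKeep newest backups; return the names that would be deleted.
std::vector<std::string> BackupsToPrune(std::vector<std::string> names, std::size_t nKeep)
{
    names.erase(std::remove_if(names.begin(), names.end(),
                    [](const std::string& n) { return BackupTimestamp(n) < 0; }),
                names.end());
    std::sort(names.begin(), names.end(), [](const std::string& a, const std::string& b) {
        return BackupTimestamp(a) > BackupTimestamp(b);  // newest first
    });
    if (names.size() <= nKeep) return {};
    return std::vector<std::string>(names.begin() + static_cast<std::ptrdiff_t>(nKeep), names.end());
}
```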

luke-jr · 2012-10-07T17:26:58Z

How does this handle encrypted wallets?

Before, opening a -datadir that was created with a newer version of Berkeley DB would result in an uncaught DB_RUNRECOVERY exception. After these changes, the error is caught, and the user is told that there is a problem and how to try to recover from it.

gavinandresen · 2012-10-08T21:28:11Z

@luke-jr it handles encrypted wallets as well as might be expected. It works on the bdb level, salvaging as many key/value pairs as it can from the backed-up wallet.dat. If it encounters a database-level error reading keys (private keys, encrypted or not, or master keys) it tells the user to try to recover from a backup.


gavinandresen · 2012-10-08T23:59:36Z

Rebased on top of #1917; changed error handling from bdb methods from exceptions to returned error codes.


laanwj reviewed Oct 2, 2012

sipa reviewed Oct 4, 2012

gavinandresen added 2 commits October 8, 2012 17:46

Don't try to verify a non-existent wallet.dat

d0b3e77

gavinandresen merged commit d0b3e77 into bitcoin:master Oct 9, 2012

jnewbery mentioned this pull request Jun 5, 2017

salvagewallet fails verification #7463

Closed

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
