-
Notifications
You must be signed in to change notification settings - Fork 35.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle corrupt wallets gracefully. #1895
Conversation
else if (strType == "tx") | ||
// Rescan if there is a bad transaction record: | ||
SoftSetBoolArg("-rescan", true); | ||
// Leave other errors alone, if we try to fix them we might make things worse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result stays DB_LOAD_OK here, so in case of corruptions in non-key/tx records, it silently continues also with the upgrading below at line 458.
Is this desired behavior?
Or should we set some flag, log a message and show a popup to the user (uiInterface.ThreadSafeMessageBox
) at the end of the function that recovery has taken place and that some wallet data (such as address book entries, details can be found in debug.log...) might be corrupt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An earlier version of this code extended DBErrors to have different levels of error, but the code started getting complicated and confusing (e.g. you could have a wallet that had a key error, a non-key error, AND needed upgrading... maybe DBErrors should be a bitmask, etc).
But telling the user that there is something wrong is definitely a good idea, I'll make that so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea to have bitmasks to handle error codes btw.
I know you will for sure dislike the following comment, but I'll try for the last time (you won't get any further comments on strings in your pulls, if you want) as the brave knight for unified string usage ^^. Can you change your Warning messages to the following format: "Warning: First sentence! Second sentence."
|
@Diapolo : good idea on the Verifying message. And ok, I'll change the first period to an exclamation mark. I'm finding serious bugs doing more testing; writing here so I don't lose track of them:
|
Updated to not "pre-verify" blkindex.dat which fixes the 'out of mutexes' problem (looks like bdb does not clean up after a ->verify() ?), pick up some changes from @jgarzik version of DBEnv::RemoveDB (kept RemoveDB as the name, though, since it removes a database not a dbenv), and tweaked Warning! messages. I'll investigate the downgrade weirdness separately, I'm afraid that might be another bug introduced in 0.7.0. |
printf("ERROR: db salvage failed\n"); | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My god... is that was is necessary to recover from BDB? Manually parse the hex dump created by a library?
I want to get rid of BDB yesterday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recovery is never a nice process, if things are broken enough you always get to a level where you have to look at hexdumps of the raw file to salvage anything. At least you still get delimited keys/values here.
Is it any prettier for leveldb, for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no leveldb tools; I think 'recovery' is the same as 'opening', and 'crashing' is the same as 'closing' in LevelDB. There are a few flags to set the degree of checksum verification or paranoidness when opening, though.
@gavinandresen Do you consider this pull ready now? |
Yes, this is pull-ready now. I'd like some help with more thorough testing. |
So, this can just cause your balance to go to zero with no notice if you're not watching the logs/console output carefully. Perhaps get getinfo error field should get something? Here is what I tested: Using gavin's testnet-in-a-box wallet. Seed 0 fails with Db::open: Invalid argument. In log I see Salvage(aggressive) found 2372 records. Recover original wallet, then run starting with seed 1: Fails at seed 1 with "DbEnv::open: DB_RUNRECOVERY: Fatal error, run database recovery" Starting without fuzzing gives the correct balance. Starting again at seed 2: Log shows: Renamed wallet.dat to wallet.1349630310.bak And a bunch of nice addwallets. But calling getinfo triggers segfault. -- still, this pull is a massive improvement over default. Now that we've got a case where there could be backup wallet files laying around perhaps we should go all the way and keep a couple wallet rotation even when there isn't corruption? Perhaps the fuzzing is a little too nasty to be a realistic test. Though if we ever change to our own append only format, I absolutely expect it to survive this kind of test. |
How does this handle encrypted wallets? |
Before, opening a -datadir that was created with a new version of Berkeley DB would result in an un-caught DB_RUNRECOVERY exception. After these changes, the error is caught and the user is told that there is a problem and is told how to try to recover from it.
@luke-jr it handles encrypted wallets as well as might be expected. It works on the bdb level, salvaging as many key/value pairs as it can from the backed-up wallet.dat. If it encounters a database-level error reading keys (private keys, encrypted or not, or master keys) it tells the user to try to recover from a backup. |
Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a crash. This commit does three things: 1) Runs a BDB verify early in the startup process, and if there is a low-level problem with the database: + Moves the bad wallet.dat to wallet.timestamp.bak + Runs a 'salvage' operation to get key/value pairs, and writes them to a new wallet.dat + Continues with startup. 2) Much more tolerant of serialization errors. All errors in deserialization are reported by tolerated EXCEPT for errors related to reading keypairs or master key records-- those are reported and then shut down, so the user can get help (or recover from a backup). 3) Adds a new -salvagewallet option, which: + Moves the wallet.dat to wallet.timestamp.bak + extracts ONLY keypairs and master keys into a new wallet.dat + soft-sets -rescan, to recreate transaction history This was tested by randomly corrupting testnet wallets using a little python script I wrote (https://gist.github.com/3812689)
Rebased on top of #1917; changed error handling from bdb methods from exceptions to returned error codes. |
8b8e706 [BUG][GUI] Fix wallet crashing on faq-buttons press (random-zebra) Pull request description: Bug reported today on discord. Wallet (GUI 4.3.0) crashes when the FAQ buttons from the masternode widget are pressed. **Cause:** The FAQs sections have been recently reduced to only 6 (old zerocoin sections have been removed). But the buttons in the masternode widget still reference the old indexes (9 and 10), which are now out of bounds. Further, the safety check inside `setSection`, still has the old (10) max value hardcoded, so does not prevent this. **Fix:** This clearly shows the difficulty in maintaining code with fixed magic numbers (which depend on a list of objects that may be modified in future releases). So, let's introduce a proper enumeration inside `SettingsFaqWidget`, and use that to manage the sections, instead of just integers. ACKs for top commit: Fuzzbawls: ACK 8b8e706 furszy: code ACK 8b8e706 Tree-SHA512: 81b2fb1465e2fd0af4d66372d753ace4a2dd585a4a2e3fb10535e774b7fb4963949a8fb10ed1811e93a07eafe800578143f3307306243eea4339201cca6482e3
Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a
crash. This commit does three things:
low-level problem with the database:
writes them to a new wallet.dat
Much more tolerant of serialization errors. All errors in deserialization
are tolerated EXCEPT for errors related to reading keypairs
or master key records-- those are reported and then shut down, so the user
can get help (or recover from a backup).
Adds a new -salvagewallet option, which:
This was tested by randomly corrupting testnet wallets using a little
python script I wrote (https://gist.github.com/3812689)