Obfuscate database files #6613
Comments
+1 |
I'd like to try tackling this one, but I'm a little confused about where to start. Is the basic approach here something like
Does that sound like the right tack, or am I off? In terms of testing, it looks like we'd get some incidental coverage out of |
Where are you handling un-obfuscated reads (eg, those written with older versions) after initialising the xor_obfuscate key? |
Concept ACK |
jamesob: I would implement it at the LevelDB wrapper layer, rather than
CCoinsViewDB, as that would be more generic and IMHO easier.
The xor value could be cached inside the wrapper object - there is no need
to read it over and over again. You'd read it when opening the database,
and if it doesn't exist, generate and write it.
As Luke mentions, you also need to deal with non-obfuscated values. I
suppose we can institute a rule that a key value cannot start with a null
byte, and then store obfuscated values using (nullbyte + key) as key.
LevelDB is efficient for large amounts of keys that start with the same
sequence of bytes, so this would not hurt storage. You do need some
mechanism for erasing the old unobfuscated key when the new one is written.
This may have some performance impact...
Also, would it be overkill to use AES-CTR instead?
|
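The wrapper-layer idea above (read the XOR value once when the database is opened, generate and write it if it is missing, then keep it cached) can be sketched roughly as follows. This is only an illustration: a plain `dict` stands in for LevelDB, and `ObfuscatedStore`, `OBFUSCATE_KEY_NAME`, and `KEY_LEN` are hypothetical names, not taken from the codebase. It deliberately does not handle pre-existing unobfuscated values, which is the open question raised in the same comment.

```python
import os

# Hypothetical reserved key under which the XOR value is stored.
OBFUSCATE_KEY_NAME = b"\x00obfuscate_key"
KEY_LEN = 8  # assumed length, chosen only for this sketch


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class ObfuscatedStore:
    """Generic wrapper: callers never see the XOR key or the raw bytes."""

    def __init__(self, backend: dict):
        self._db = backend
        key = self._db.get(OBFUSCATE_KEY_NAME)
        if key is None:
            # First open: generate the key and persist it (stored in the clear).
            key = os.urandom(KEY_LEN)
            self._db[OBFUSCATE_KEY_NAME] = key
        self._key = key  # cached for the lifetime of the wrapper

    def put(self, k: bytes, v: bytes) -> None:
        self._db[k] = xor_bytes(v, self._key)

    def get(self, k: bytes):
        raw = self._db.get(k)
        return None if raw is None else xor_bytes(raw, self._key)
```

Because the key lives in the database itself, reopening the same backend yields the same key, so callers such as a coins view would not need to know obfuscation exists.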
@luke-jr @sipa good points, thanks. @sipa, I think given the discussion in #4069 AES might be more than we need right now, and definitely has different performance characteristics than XOR does. Maybe something like that will be called for down the road, but for now it seems like an XOR should do the job. It would be nice if we could somehow indicate obfuscation in the value itself, so that we won't have to do two reads (and a potential write) for correction, but I think that may be a little too tricky, and we should just keep this as simple as performance constraints will allow. Should we generalize the obfuscation mechanism to the |
I'd say: make a database either obfuscated or not obfuscated. Partially obfuscated databases are as incompatible with older versions as completely obfuscated ones (plus there is danger it would work partially and error out later), so there is no use for the middle road. A new database would be 'born' obfuscated. An old database would be kept unobfuscated unless the user gives a command line option to obfuscate, which would obfuscate the entire database in one go in an upgrade process. This loses backward compatibility. The two cases can be distinguished by the presence/non-presence of the obfuscation key, or e.g. add a MIN_VERSION like in the wallet. I think this makes implementation a lot easier and less error-prone.
As this is meant as a fast, lightweight obfuscation mechanism, using AES seems like overkill. The goal is to avoid trivial signature matching. I don't see what using a stronger crypto-system would add. |
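A rough sketch of the all-or-nothing proposal above: the presence of the obfuscation key decides how every value is read, and an old database is converted in a single pass. The names `is_obfuscated`, `obfuscate_in_one_go`, and `OBFUSCATE_KEY_NAME` are hypothetical, a `dict` stands in for the database, and building a complete copy before swapping it in is just one assumed way to get "in one go" behaviour.

```python
import os

OBFUSCATE_KEY_NAME = b"\x00obfuscate_key"  # hypothetical reserved key


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def is_obfuscated(db: dict) -> bool:
    # The two cases are told apart purely by whether the key is present.
    return OBFUSCATE_KEY_NAME in db


def obfuscate_in_one_go(db: dict, key: bytes = None) -> dict:
    """Return a fully obfuscated copy of an unobfuscated database."""
    if is_obfuscated(db):
        raise ValueError("database is already obfuscated")
    if key is None:
        key = os.urandom(8)  # assumed key length
    upgraded = {k: xor_bytes(v, key) for k, v in db.items()}
    upgraded[OBFUSCATE_KEY_NAME] = key
    return upgraded


def read(db: dict, k: bytes):
    """Read a value; either every value is obfuscated or none is."""
    raw = db.get(k)
    if raw is None:
        return None
    key = db.get(OBFUSCATE_KEY_NAME)
    return raw if key is None else xor_bytes(raw, key)
```

Since the copy is complete before it replaces the original, there is never a partially obfuscated state for a reader to stumble over.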
@laanwj any examples of "upgrade processes" in the codebase that I can reference for this implementation? |
@sipa "LevelDB is efficient for large amounts of keys that start with the same sequence of bytes, so this would not hurt storage." Does leveldb merge prefixes or something? |
@pstratem Yes. |
@sipa it looks like there are already some special keys in the chainstate database? |
@laanwj There's certainly a disadvantage to not supporting mixed obfuscation databases. Specifically if this is switched to the default then everybody is going to need to do the upgrade procedure, which is potentially slow and expensive. |
@pstratem What special keys? |
@sipa 'B' is the best block, 'l' is the last block. Everything else seems to be prefix || hash -> value |
All keys are serialize(something), whether that something is a character or something more complex. |
Transition thinking...
|
@sipa right, but the serialization of a single character is ... a single character |
@jgarzik The upgrade being done in the background seems optimal. |
Doing this in the background is pretty hard, as you need to prevent writing entries that a concurrent modification may be updating. |
@sipa Then don't do it that way. You also said -reindex was hard :) e.g. create a unified view of new + old database; old db under the hood becomes read-only, all updates go to new db. There are other approaches as well. |
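The "unified view" suggested above might look something like the sketch below: reads fall through from the new database to the old one, the old database is never written, and every update lands in the new (obfuscated) database. `UnifiedView` and the tombstone marker are hypothetical, dictionaries stand in for the two databases, and concurrency and crash safety (the hard parts mentioned in this thread) are not addressed.

```python
_DELETED = object()  # tombstone: hides an entry that still exists in the old db


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class UnifiedView:
    def __init__(self, old: dict, new: dict, key: bytes):
        self._old = old  # unobfuscated, treated as read-only
        self._new = new  # obfuscated, receives all writes
        self._key = key

    def get(self, k: bytes):
        if k in self._new:
            v = self._new[k]
            return None if v is _DELETED else xor_bytes(v, self._key)
        return self._old.get(k)  # old values are stored in the clear

    def put(self, k: bytes, v: bytes) -> None:
        self._new[k] = xor_bytes(v, self._key)

    def delete(self, k: bytes) -> None:
        # Cannot touch the old db, so record the deletion in the new one.
        self._new[k] = _DELETED
```

Note that entries which are never touched stay unobfuscated in the old database under this scheme, so a background pass to migrate them would still be needed eventually.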
@jgarzik I agree, a partial state does not fix the problem. A UTXO entry that is not touched is never overwritten, so creating a UTXO with a virus signature in it, and then never spending it would leave AV software detecting the file as problematic until reindex anyway. |
@jgarzik Didn't say it was impossible, but it's a non-trivial thing to do that requires careful testing (UTXO database errors can lead to consensus failure). I think for now I'm perfectly fine with an opt-in approach. We can add updating later. |
So: opt-in, atomic obfuscation for now, more sophisticated obfuscation-by-default scheme later? |
IMO:
|
Enabling by default on -reindex should be easy to do in this PR. |
Another issue: we need a clean failure when downgrading. I don't think the chainstate has a version number, so that's problematic. |
Presumably the failure mode is database-appears-totally-empty? |
Hmm, no, it would read random garbage and try to deserialize it. |
What's the usecase for a downgrade? When would that actually happen? |
People tend to try a new version but then find a problem, so need to revert to an earlier version of the client. |
My thought was that if there is no obf key, we set one and set it to all zeros. Tada. Everyone is "obfuscated". On reindex we should reset the key. Downgrade works fine, so long as you haven't reindexed since. If you do downgrade after one you'll fail right away, on the chainstate check ... and it'll tell you to reindex! Log the OBF key so that when troubleshooting we can see if you were still AV exposed. I do not think more complexity than this is justified. The users this helps are primarily ones where the software currently immediately fails. If you suffer a latent AV incident, the result will be to need a reindex in any case... which will helpfully fix you here too. Considering how much data corruption problems we currently have with LevelDB on windows already, this sounds pretty complete. |
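The all-zeros trick above works because XOR with zero bytes leaves data unchanged, so an existing database reads exactly as before once a zero key has been written. A minimal sketch, with a `dict` standing in for the chainstate and `open_db`, `KEY_NAME`, and `KEY_LEN` as hypothetical names:

```python
import logging
import os

KEY_NAME = b"\x00obfuscate_key"  # hypothetical reserved key
KEY_LEN = 8                      # assumed length for this sketch


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def open_db(db: dict, reindex: bool = False) -> bytes:
    """Return the obfuscation key for db, creating one if needed."""
    if reindex:
        # A reindex rebuilds everything, so a fresh random key is safe here.
        db.clear()
        db[KEY_NAME] = os.urandom(KEY_LEN)
    elif KEY_NAME not in db:
        # Existing data stays as written: XOR with zeros is the identity.
        db[KEY_NAME] = bytes(KEY_LEN)
    key = db[KEY_NAME]
    # Logged so that troubleshooting can tell whether the data is still exposed.
    logging.info("obfuscation key: %s", key.hex())
    return key
```

With a zero key, the stored bytes are identical to what an older version wrote, which is why a downgrade keeps working until the first reindex.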
@gmaxwell Ha that's a nice solution. |
@gmaxwell very slick :). I'll get to implementing. |
@gmaxwell ha, 👍 good solution. |
This was implemented some time ago in bitcoin#6650 to fix bitcoin#6613 but does not appear ever to have been enabled under any circumstances outside of test/
To avoid problems on Windows with Anti-Virus software, there needs to be an option to obfuscate the keys/values written to the database files, especially the UTXO database.
See #4069 for discussion.
It should be really simple, just enough to make it useless to put AV signatures in transactions. E.g. generate a random key on first start, store the key in the database, then XOR all subsequent data read/written to the database with that.
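As a minimal illustration of the XOR idea (not the eventual implementation), the sketch below XORs data with a repeating random key. The function name `xor_obfuscate`, the 8-byte key length, and the example byte pattern are all assumptions made for the example.

```python
import os


def xor_obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed.

    Applying the function twice with the same key restores the input.
    """
    if not key:
        return data
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical byte pattern standing in for an AV signature in a transaction.
signature = b"EXAMPLE-AV-SIGNATURE"

key = os.urandom(8)                       # generated once, on first start
stored = xor_obfuscate(signature, key)    # what would be written to disk
restored = xor_obfuscate(stored, key)     # what a read would return
```

Because the key differs per installation, the bytes on disk differ per installation too, so a fixed signature embedded in a transaction no longer appears verbatim in the database files.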
Possibly this obfuscation could include the block files as well, although I've never heard of problems with those - the most likely explanation is that AV software doesn't consider files above a certain size.