Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obfuscate database files #6613

Closed
laanwj opened this issue Sep 1, 2015 · 34 comments

Comments

Projects
None yet
8 participants
@laanwj
Copy link
Member

commented Sep 1, 2015

To avoid problems on Windows with Anti-Virus software, there needs to be an option to obfuscate the keys/values written to the database files, especially the UTXO database.
See #4069 for discussion.

It should be really simple, just enough to make it useless to put AV signatures in transactions. E.g. generate a random key on first start, store the key in the database, then XOR all subsequent data read/written to the database with that.

Possibly this obfuscation could include the block files as well, although I've never heard of problems with those - the most likely explanation is that AV software doesn't consider files above a certain size.

@jgarzik

This comment has been minimized.

Copy link
Contributor

commented Sep 1, 2015

+1

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 6, 2015

I'd like to try tackling this one, but I'm a little confused about where to start. Is the basic approach here something like

  • On construction of CCoinsViewDB, check db to ensure that some value exists under key xor_obfuscate (or similar); if not, randomly generate one and write it to db
  • On CCoinsViewDB::BatchWrite(), use db.Read('xor_obfuscate') to XOR the value of the serialized Coin objects being written
  • On CCoinsViewDB::GetCoins(), use the xor_obfuscate value to reverse the XOR in BatchWrite()
  • Add a flag to disable obfuscation

Does that sound like the right tack, or am I off?

In terms of testing, it looks like we'd get some incidental coverage out of test/coins_test.cpp, but are there any supplemental testcases that should be written?

@luke-jr

This comment has been minimized.

Copy link
Member

commented Sep 6, 2015

Where are you handling un-obfuscated reads (eg, those written with older versions) after initialising the xor_obfuscate key?

@dcousens

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2015

Concept ACK

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 6, 2015

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 7, 2015

@luke-jr @sipa good points, thanks.

@sipa, I think given the discussion in #4069 AES might be more than we need right now, and definitely has different performance characteristics than XOR does. Maybe something like that will be called for down the road, but for now it seems like an XOR should do the job.

It would be nice if we could somehow indicate obfuscation in the value itself, so that way we won't have to do two reads (and a potential write) for correction, but I think that may be a little too tricky, and we should just keep this as simple as performance constraints will allow.

Should we generalize the the obfuscation mechanism to the CLevelDBWrapper, maybe adding an obfuscate bool parameter to its constructor? or should we keep this fairly contained as a one-off solution for CCoinsViewDB? If anyone has any strong opinions, I'd like to hear them. Otherwise I'm going to move forward with a draft of the more general approach.

@laanwj

This comment has been minimized.

Copy link
Member Author

commented Sep 7, 2015

Where are you handling un-obfuscated reads (eg, those written with older versions) after initialising the xor_obfuscate key?

I'd say: make a database either obfuscated or not obfuscated. Partially obfuscated databases are as incompatible with older versions as completely obfuscated ones (plus there is danger it would work partially and error out later), so there is no use for the middle road.

A new database would be 'born' obfuscated. An old database would be kept unobfuscated unless the user gives a command line option to obfuscate, which would obfuscate the entire database in one go in an upgrade process. This loses backward compatibility.

The two cases can be distinguished by the presence/non-presence of the obfuscation key, or e.g. add a MIN_VERSION like in the wallet. I think this makes implementation a lot easier and less error-prone.

Also, would it be overkill to use AES-CTR instead?

As this is meant as a fast, lightweight obfuscation mechanism, using AES seems like overkill. The goal is to avoid trivial signature matching. I don't see what using a stronger crypto-system would add.

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 7, 2015

@laanwj any examples of "upgrade processes" in the codebase that I can reference for this implementation?

@jamesob jamesob referenced this issue Sep 7, 2015

Merged

Obfuscate chainstate #6650

2 of 2 tasks complete
@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@sipa "LevelDB is efficient for large amounts of keys that start with the same sequence of bytes, so this would not hurt storage."

Does leveldb merge prefixes or something?

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

@pstratem Yes.

@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@sipa it looks like there are already some special keys in the chainstate database?

@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@laanwj There's certainly a disadvantage to not supporting mixed obfuscation databases.

Specifically if this is switched to the default then everybody is going to need to do the upgrade procedure, which is potentially slow and expensive.

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

@pstratem What special keys?

@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@sipa 'B' is the best block 'l' is the last block

Everything else seems to be prefix || hash -> value

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

All keys are serialize(something), whether that something is a character or something more complex.

@jgarzik

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

Transition thinking...

  • Mixed obfuscation creates a mixed situation that lingers forever.
  • Mixed obfuscation does not eliminate the problem for a user, even after the user upgrades.
  • A one-time upgrade is painful only if done in the foreground
@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@sipa right but the serialization of a single charater is ... a single character

@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@jgarzik The upgrade being done in the background seems optimal.

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

Doing this in the background is pretty hard, as you need to prevent writing entries that a concurrent modification may be updating.

@jgarzik

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@sipa Then don't do it that way. You also said -reindex was hard :)

e.g. create a unified view of new + old database; old db under the hood becomes read-only, all updates go to new db.

There are other approaches as well.

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

@jgarzik I agree, a partial state does not fix the problem. A UTXO entry that is not touched is never overwritten, so creating a UTXO with a virus signature in it, and then never spending it would leave AV software detecting the file as problematic until reindex anyway.

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

@jgarzik Didn't say it was impossible, but it's a non-trivial thing to do that requires careful testing (UTXO database errors can lead to consensus failure). I think for now I'm perfectly fine with an opt-in approach. We can add updating later.

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

So: opt-in, atomic obfuscation for now, more sophisticated obfuscation-by-default scheme later?

@jgarzik

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

IMO:

  • NAK mixed obfuscation - all or none is cleaner, less complex and better reflects the future
  • if all-or-none, obfuscation could easily become encryption with a swap of layers, without touching low-level db or upper-level bitcoin.
  • turn on by default for new db or -reindex
@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

Enabling by default on -reindex should be easy to do in this PR.

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

Another issue: we need a clean failure when downgrading. I don't think the chainstate has a version number, so that's problematic.

@jgarzik

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

Presumably the failure mode is database-appears-totally-empty?

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

Hmm, no, it would read random garbage and try to deserialize it.

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

What's the usecase for a downgrade? When would that actually happen?

@sipa

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

People tend to try a new version but then find a problem, so need to revert to an earlier version of the client.

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

My thought was that if there is no obf key, we set one and set it to all zeros. Tada. Everyone is "obfuscated". On reindex we should reset the key. Downgrade works fine, so long as you haven't reindexed since. If you do downgrade after one you'll fail right away, on the chainstate check ... and it'll tell you to reindex!

Log the OBF key so that when troubleshooting we can see if you were still AV exposed.

I do not think more complexity than this is justified. The users this helps are primarily ones where the software currently immediately fails. If you suffer a latent AV incident, the result will be to need a reindex in any case... which will helpfully fix you here too. Considering how much data corruption problems we currently have with LevelDB on windows already, this sounds pretty complete.

@pstratem

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@gmaxwell Ha that's a nice solution.

@jamesob

This comment has been minimized.

Copy link
Member

commented Sep 8, 2015

@gmaxwell very slick :). I'll get to implementing.

@dcousens

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2015

@gmaxwell ha, 👍 good solution.

@laanwj laanwj closed this in #6650 Oct 6, 2015

kazcw added a commit to kazcw/bitcoin that referenced this issue Apr 22, 2016

Enable leveldb obfuscation for new databases
This was implemented some time ago in bitcoin#6650 to fix bitcoin#6613 but does not appear
ever to have been enabled under any circumstances outside of test/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.