Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Wallet] replace BDB with internal append only (logdb) backend #5686

Closed

Conversation

jonasschnelli
Copy link
Contributor

This pull request is in early stage but discussions could help driving this into the right direction.

It will basically remove bdb (currently not on compile level) and make use of a internal key/value store.
Logdb (the internal key/value store) is a append only store where every key/value-frame contains a sha256 checksum all available frames. Consistency is guaranteed.

The current implementation passes all tests but has lack of recover/rewrite possibilities.

Why this

  • IMO BerkleyDB is oversized for a "reference wallet implementation". This append-only file-store could prevent from corruptions and relieves compile stress.
  • wallet flush threads are no longer required.
  • remove another dependency and get more control over our stack

What's next / ToDos

  • add compact Rewrite() function (save whole memory map atomically to a clean file): this is required after encrypting the wallet (added 2015/01/22)
  • add thread safe / concurrency tests for logdb
  • add Recover() function: scan through a corrupt wallet file and try to recover keys even when some key/value-frames and/or checksums are corrupted
  • add more tests
  • add wallet encryption test

This is on my internal roadmap:

  • remove bdb

  • transform account to labels
  • implement bip32
  • add spv mode

@jonasschnelli
Copy link
Contributor Author

Currently there is no migration tool from bdb to logdb.
Because it would be good to avoid Bdb in our binaries, a python tool for wallet migration could do the job.

IMO the version code 0.1x should make clear that such required manual migrations are still possible.

@jgarzik
Copy link
Contributor

jgarzik commented Jan 20, 2015

@jonasschnelli Agree that a migration tool is necessary before rolling this out.

There are a few complications with python that would need double-checking:

  • Are python installs guaranteed capable of reading BDB 4.8-format data?
  • For people using --with-incompatible-bdb configure option -- like me and some of the other devs -- conversion becomes even more annoying.

I would prefer a C++ migration tool. Eventually you could split it off into a separate project or repository if the continued BDB link-time dependency remains annoying.

@jonasschnelli
Copy link
Contributor Author

@jgarzik a C++ tool would probably get better results. Agreed.
It would just help the main bitcoin repository to not have to much legacy stuff. There is also some other things that can be removed from the code like the bdb wallet migration and features-checking (example: wallet feature for compressed key, etc.).

Moving the c++ migration tool to separate repository makes sense. There we could also support the --with-incompatible-bdb option so users could make sure to use the same version as they used while creating/manipulating their wallets.

@laanwj
Copy link
Member

laanwj commented Jan 20, 2015

Are python installs guaranteed capable of reading BDB 4.8-format data?

Yes - Python is generally linked against the system's BDB, which on modern systems is always >4.8. Newer BDB versions have no problem reading old databases. The problem is the other way around. After the file is "touched" it won't be backwards compatible anymore. Making a temporary copy then reading that would be a strategy to keep the origin file as-is.

@laanwj
Copy link
Member

laanwj commented Jan 20, 2015

Apart form that: nice work! I agree with your internal roadmap.

I think it makes sense to do this work on a 'newwallet' branch so that other people can cooperate, and we can merge it into master when the work is complete, which will likely be no sooner than 0.12.

@jgarzik
Copy link
Contributor

jgarzik commented Jan 20, 2015

+1 @laanwj

Concept ACK for the wallet changes.

@jgarzik
Copy link
Contributor

jgarzik commented Jan 20, 2015

@jonasschnelli Yes. I go back and forth between "separate repo now" and "separate repo later"

For the initial conversion, it makes everything easier if the migration tool is in bitcoin/bitcoin.git and is built with the current build system. That is the path of least resistance.

Build system & install Just Works. You automatically get the migration tool, even if you didn't know you needed it until (oops, right now!)

I try to be extra-conservative with Real Money(tm), which is what we're dealing with when it comes to the "legacy wallet" we're maintaining.

@luke-jr
Copy link
Member

luke-jr commented Jan 20, 2015

It may be a good idea to have 0.11 still compatible with bdb wallets without upgrading them - note that all our wallet upgrades have been opt-in so far, and you can still use a 0.4.0 wallet with 0.10 while retaining backward compatibility.

@jonasschnelli
Copy link
Contributor Author

Supporting both (bdb and logdb) wallet formats (at runtime) would require some work.
I'm also aware of the missing backward compatibility. Supporting bdb wallets after a possible switch to logdb would require resources better spent at bip32, accounting overhaul, etc.
IMO a clean cut would be worth it. Remove all the migration, min-version feature-compatibility legacy stuff.

I agree with not letting users in the dark and support a migration tool (c++, included in bitcoind for 0.11 or 0.12, but slowly extract migration tool to different repository).
First "logdb" release could still include bdb as dependency, used for compiling the bitcoin migration tool. If you miss the library, it will build bitcoin-core without this tool (like #5680).

Migration of a wallet is a risky process and it's probably better to do this outside of bitcoind as a conscious decision from the user.

@jonasschnelli jonasschnelli force-pushed the 2015/01/wallet/logdb branch 2 times, most recently from b227792 to add1fb3 Compare January 20, 2015 20:22
@laanwj
Copy link
Member

laanwj commented Jan 21, 2015

On IRC I've explicitly argued against dual database support, and I'm going to do so here again.
Having an intermediate phase where both databases are supported means legacy concerns (such as accounts) have to ooze into the new wallet interface. It also complicates testing. Are you going to verify that the new wallet will keep working with 0.3 wallets at every step of development?

An alternative option I've thought of is to leave the old wallet untouched. Move wallet,walletdb,rpcwallet (and others as needed) to src/legacywallet.

Then build this new wallet with new database, new interface in src/wallet/. As well as a migration tool.

Using a configure option, one can then either build against the new wallet or the old wallet.
This gives the 'conservative' choice to stick with the old wallet (for however long we're willing to carry it) with all its quirks, but without intermediate complications for the new code, and having to carry any of that over.

In the beginning the legacywallet will be the default and the new wallet will be experimental-only.

This is basically the "fork the wallet and develop it separately" approach, but a bit improvised as it's all in one repository - for now. It means we can merge this soon(TM) and experiment with it, instead of delaying that to a further future 'when it's ready'.

(the only complication here I see is with regard to the UI, which has to interface with both, but a similar thing can be done there; UI that only applies to the old wallet can be moved to a legacy directory)

@luke-jr
Copy link
Member

luke-jr commented Jan 21, 2015

Sounds reasonable, but if we can't build with both supported, how will binaries go out? Probably not too much more effort to just keep the class names unique and switch between them at runtime startup, is it?

@jonasschnelli
Copy link
Contributor Author

I like laanwjs idea of supporting both formats with a compile option.

Two concerns I see:

  • the new wallet needs some changes in init.cpp. I could try to make the init.cpp changes flexible in case of the backend database. Even rpcwallet.cpp could support both. But this needs work.

The 2nd concern is more a "conceptual" one. If we support both wallet formats, why should one upgrade his wallet? Couldn't that not lead to a need of supporting both formats? And if we once drop support of BDB, how is the migration process different now from the one in future (the user has to do it once anyway)? Maybe we could force people to upgrade and save some - already rare - development resources. I mean it's not the network, it's the wallet.

@jgarzik
Copy link
Contributor

jgarzik commented Jan 21, 2015

The original idea I discussed w/ @sipa on IRC was dropping support for BDB from bitcoind entirely, plus an auto-launched migration tool. Still prefer that arrangement. From bitcoind's standpoint, BDB support is deleted. No dual mode codebase.

@laanwj
Copy link
Member

laanwj commented Jan 21, 2015

@jonasschnelli

  • Yes, initialization/deinitalization should be moved to the wallet module(s) themselves and only called from init. This is long due anyhow, and mostly low-impact move-only.
  • My idea is not about supporting two formats, but supporting two different wallets. The old wallet will stay exactly as it is now (barring critical fixes). The new wallet will see new development, and come with new features (as well as get rid of old cruft), and an interface that supports e.g. multiwallet. It just means that the transition process is more smooth.

More daring users can use the new wallet, whereas more conservative users can stay with the old wallet for say a major release or two until it is completely axed (but the new one, as well as the conversion process, much better tested).

@luke-jr Building a monster executable with both wallet implementations could be possible, with e.g. an command line option to switch. Not sure if this is a good idea in practice, but I don't want to exclude the possibility upfront. We'll see when we get there I suppose. In the beginning this won't be an issue as the new wallet will be experimental-only so requiring build-time invocations is a good thing.

@jgarzik
Copy link
Contributor

jgarzik commented Jan 21, 2015

@laanwj hmmm, that is an interesting way to transition. Let's think through the user making that old wallet / new wallet choice. That plan would indeed make for a nice, clean separation.

@jonasschnelli jonasschnelli force-pushed the 2015/01/wallet/logdb branch 9 times, most recently from 357911b to 34d5853 Compare January 22, 2015 08:53
@jtimon
Copy link
Contributor

jtimon commented Feb 6, 2015

Sounds like a good plan to me. Can you reference those PRs from here as you create them?

@jonasschnelli
Copy link
Contributor Author

@jtimon currently it's #5752, #5758, #5744
But also have a look at the "status" page: #5761.

- add wallet encryption/unlocking test
- cleanup logdb, remove of initial header with logdb-file version number
- remove fileBackend=false option (doesn't make sense anymore)
- remove wallet max/min version
@pstratem
Copy link
Contributor

A suggestion for a more robust frame format.

uint64_t Unique Magic per log file
uint64_t Counter
uint64_t Length
    0xffffffffffffffff indicates length not specified
uint64_t Unique Magic per Record
uint8_t[Length] Body
uint64_t Unique Magic per Record
SHA256 hash of Counter|Length|Unique Magic per Record|Body

@jgarzik
Copy link
Contributor

jgarzik commented May 19, 2015

  • on length-not-specified, seems like the majority of cases (100%?) would use 0xffff..
  • it can sometimes be useful to pad the body to a 64-bit etc. alignment

@jonasschnelli
Copy link
Contributor Author

@pstratem but wouldn't the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?

@pstratem
Copy link
Contributor

@jgarzik

on length-not-specified, seems like the majority of cases (100%?) would use 0xffff..

The purpose of allowing an unspecified length is extremely long records, those simply do not exist in a bitcoin wallet.

  • Keys
  • Transactions

it can sometimes be useful to pad the body to a 64-bit etc. alignment

eh... yeah I guess sometimes, but that should be defined as part of the format for the Body, note everything else is 64 bit aligned

@pstratem
Copy link
Contributor

@jonasschnelli

but wouldn't the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?

you're gonna have to explain that one

@jonasschnelli
Copy link
Contributor Author

but wouldn't the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?

you're gonna have to explain that one

Ah now i see!
I first thought the counter and length is at the beginning of the file and contains the combined length/counter of all frames within the filestore. But as i read again i saw that you propose the counter and length per frame.

#include <unistd.h>
#include <sys/stat.h>

#include "logdb.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Include group ordering (own headers before system ones).
Edit: Also obsolete file/license header...

@pstratem
Copy link
Contributor

This kind of stuff is the reason for the frame to be more robust.
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg886512.html

@jgarzik
Copy link
Contributor

jgarzik commented Sep 15, 2015

Leaning towards closing as Work In Progress. General IRC consensus has been "we want this" for a long time, so concept ACK from most developers seems implied, myself included.

However, this seems more of a longer term project resulting in a long lived PR. Recommend storing this on a branch and opening a mailing list thread to follow development.

@jonasschnelli
Copy link
Contributor Author

Closing.
This is included in my core wallet fork an will be further developed there with a chance of allowing a merge to this repository if it's once is stable and tested enough.

@dexX7 dexX7 mentioned this pull request Oct 18, 2015
@jonasschnelli
Copy link
Contributor Author

I started an append only key/value file format in pure C, based on @sipa concept, that aims to be storage format for wallets (libbtc/libbtc#41), suitable for MCUs, smartphones, desktop, etc.

Just in case someone wants to contribute.

@halvors
Copy link

halvors commented Apr 2, 2017

What's the status on this? It's been over a year now, is this idea dead?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants