Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look ma, multiple DB support (Part I of the DB saga) #2924

Merged
merged 32 commits into from Sep 5, 2019

Conversation

@cdecker
Copy link
Member

commented Aug 8, 2019

Data integrity and data safety is paramount for nodes in the Lighning Network
since loss of data, or use of outdated data can directly lead to loss of
funds. c-lightning currently stores all its information in a sqlite3
database, and offers the opportunity to synchronously replicate any changes to
other storages through the use of the db_write hook.

To further bolster our data resilience, and allow for a variety of more
enterprisey deployments it is desirable to allownode operators to switch out
the sqlite3 backend for something else. In this PR implement a DB
abstraction framework that allows to configure other DBMSs as backends, and
future PRs will add concrete implementations of those drivers.

This PR consists of a couple of distinct things, but I wanted to make sure it
actually works before submitting a partial solution.

  • Compile time SQL statement extraction, rewriting for each supported DBMS,
    and runtime lookup of the rewritten queries.
  • Creation of a minimal common API that all DBMSs that I looked at can
    support, enabling the abstraction over the specific APIs. The API is very
    much inspired by sqlite3 but I made sure that it can be easily mapped to
    MySQL and Postgresql. In addition we have number of helper functionality to
    bolt down some avenues for errors:
    • Both bindings and column access use 0-based indexing, not like sqlite3s
      1-based bindings (which I found out were inspired by printf). This
      caused a couple of misunderstandings in the past.
    • Runtime statement checking such as bind checking on execution (statement
      placeholders without a binding will result in failing the execution).
    • Compile-time instrumentation (placeholders are counted during extraction
      and allow minimizing allocations at runtime).
    • Allow numbered placeholders to avoid repetitive bindings (not implemented
      yet).
  • Implementation of the sqlite3 driver, now using the uniform API, and
    keeping all sqlite3-specific code in a separate file that is easily
    configured out if the library is missing or incompatible.

Since the "S" in SQL is a marketing lie, and SQL statements are not portable
across DBMSs I had to come up with a method to write portable code. The
solution I went for consists of marking the SQL statements that are to be
extracted with two macros (SQL and NAMED_SQL), which then allows a tool
(devtools/sql-rewrite.py) to extract them and rewrite them into variants
that work for the specific DBMS that we are rewriting for. The queries are
then stored in a large array inside of wallet/gen_db_${dbms}.c and looked up
using a unique name (autogenerated in the case of the SQL macro).

When preparing a DB statement we look up the rewritten statement in the array
and initialize a struct db_stmt, to which we can attach bindings. When
executing we perform a number checks and run the query against the selected
DBMS.

The DB-specific implementation of the common API is in wallet/db_${dbms}.c
and adheres to a common set of functions defined in
wallet/db_common.h. wallet/db.c itself only dispatches calls to the
DB-specific implementations through the use of a struct db_config instance
(yes, it is a virtual dispatch table, I implemented dynamic dispatch in a
non-OOP language...), but none of this should ever leak outside of the files
mentioned here, and it'll just look like a normal DB from outside.

Still to do for this PR:

  • Migrate SQL statements in db.c.
  • Migrate SQL statements in wallet.c.
  • Remove traces of sqlite3 in the non-sqlite3-specific files.

Once I have these missing pieces I will remove the draft status, so don't feel
forced to review this until I do, but feedback is always welcome 馃槈

Still to do for future PRs:

  • Implement MySQL driver
  • Implement Postgresql driver
  • Change hard to rewrite queries into simpler ones (e.g., INSERT OR REPLACE INTO)

@cdecker cdecker added wallet db labels Aug 8, 2019

@cdecker cdecker requested review from rustyrussell, niftynei and ZmnSCPxj Aug 8, 2019

@cdecker cdecker self-assigned this Aug 8, 2019

@cdecker cdecker force-pushed the cdecker:multi-db branch 4 times, most recently from 921da08 to 9c845d1 Aug 8, 2019

@jb55

This comment has been minimized.

Copy link
Collaborator

commented Aug 8, 2019

very cool

@ZmnSCPxj

This comment has been minimized.

Copy link
Collaborator

commented Aug 8, 2019

Since the "S" in SQL is a marketing lie, and SQL statements are not portable
across DBMSs I had to come up with a method to write portable code. The
solution I went for consists of marking the SQL statements that are to be
extracted with two macros (SQL and NAMED_SQL), which then allows a tool
(devtools/sql-rewrite.py) to extract them and rewrite them into variants
that work for the specific DBMS that we are rewriting for.

So we now have our own SQL variant?

Which is not to say it is a bad solution, mostly a gripe. I encountered this kind of issue before (GLSL shaders), and sometimes this is the only solution you can come up with without going whole hog and creating your own language.

yes, it is a virtual dispatch table, I implemented dynamic dispatch in a
non-OOP language...

C is an OOP language (for some definition of "is"). It is just that you have to implement OO yourself. See the shenanigans that gtk gets up to in its C code, for example.

@jb55

This comment has been minimized.

Copy link
Collaborator

commented Aug 8, 2019

Which is not to say it is a bad solution, mostly a gripe. I encountered this kind of issue before (GLSL shaders), and sometimes this is the only solution you can come up with without going whole hog and creating your own language.

it's an interesting approach, I don't feel that strongly about it yet. the alternative being programming to some ORM which is never much fun. As long as it doesn't require that much hacking to get postgresql working, then replication will be much easier.

@rustyrussell rustyrussell added this to the 0.7.3 milestone Aug 9, 2019

@rustyrussell
Copy link
Contributor

left a comment

I think xgettext -kSQL makes this problem much simpler, and it's a tool which already exists for this purpose.

The resulting .po file format is trivial to parse, or we could literally use dgettext here too, but this would be a little weird if we ever support real localization of strings.

wallet/db.c Outdated

/* Since these queries will be treated as read-only they need to start
* with "SELECT" and have no side-effects. */
assert(strncmp(query, "SELECT", 6) == 0);

This comment has been minimized.

Copy link
@rustyrussell

rustyrussell Aug 9, 2019

Contributor

We have 'strstarts()' for this BTW.

* This macro is used to annotate SQL queries that might need rewriting for
* different SQL dialects. It is used both as a marker for the query
* extraction logic in devtools/sql-rewrite.py to identify queries, as well as
* a way to swap out the query text with it's name so that the query execution

This comment has been minimized.

Copy link
@rustyrussell

rustyrussell Aug 9, 2019

Contributor

"with its name"

@niftynei
Copy link
Collaborator

left a comment

really nice draft

"sqlite3": Sqlite3Rewriter(),
}

template = Template("""#ifndef LIGHTNINGD_WALLET_GEN_DB_${f.upper()}

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

put template in external file?

wallet/db_sqlite3.c: wallet/gen_db_sqlite3.c

wallet/gen_db_sqlite3.c: devtools/sql-rewrite.py wallet/db.c wallet/wallet.c wallet/test/run-db.c
LD_LIBRARY_PATH=${LIBCLANG_PATH} devtools/sql-rewrite.py -f sqlite3 > wallet/gen_db_sqlite3.c

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

I believe you can use $@ as the destination/target filename here, instead of spelling it out again.

wallet/db.c Outdated
@@ -771,6 +782,20 @@ static struct db *db_open(const tal_t *ctx, char *filename)
db = tal(ctx, struct db);
db->filename = tal_strdup(db, filename);
db->sql = sql;

for (size_t i=0; i<num_configs; i++)

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

spacing here is very Go-like >.<

This comment has been minimized.

Copy link
@rustyrussell

rustyrussell Sep 4, 2019

Contributor

sometimesyoudon'twantspacesandit'sstillperfectlyreadable.

But maybe not here :)

struct db_query *queries;
size_t num_queries;
/* Is this a read-only query? If it is there's no need to tell plugins
* about it. */

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

nice 馃拝

char *errmsg;
err = sqlite3_exec(db->conn, "COMMIT;", NULL, NULL, &errmsg);
if (err != SQLITE_OK)
fatal("Failed to begin a transaction: %s", errmsg);

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

begin -> commit

}
}

if (err != SQLITE_OK) {

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

it's a bit weird semantically to be checking if the 'err' is OK. Consider renaming to result?

}

err = sqlite3_step(stmt->inner_stmt);
if (err != SQLITE_DONE) {

This comment has been minimized.

Copy link
@niftynei

niftynei Aug 20, 2019

Collaborator

same as previous comment.

This comment has been minimized.

Copy link
@cdecker

cdecker Aug 28, 2019

Author Member

I pretty much copied this from the old code, and there I had gotten it from the sqlite3 example code :-)

@cdecker cdecker force-pushed the cdecker:multi-db branch 4 times, most recently from fe79c77 to b2a5434 Aug 20, 2019

@cdecker cdecker force-pushed the cdecker:multi-db branch 3 times, most recently from 2fb3e94 to 19e9459 Aug 27, 2019

@cdecker cdecker marked this pull request as ready for review Aug 28, 2019

@cdecker

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2019

Ok, I think this is now good to go. I went ahead and migrated all the places we used the legacy implementation with the new, indirect, implementation and everything seems to be working ok.

I simplified the interface for querying a bit, removed a lot of redundancy and implemented the binding and column accessor functions to use the abstraction layer as well.

I left the migration commits unsquashed for now so they are a bit less daunting, but once the PR has the required ACKs I'll squash them into a single commit.

@cdecker

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2019

@cdecker cdecker force-pushed the cdecker:multi-db branch from 19e9459 to fd80965 Aug 28, 2019

@jb55

This comment has been minimized.

Copy link
Collaborator

commented Aug 28, 2019

@cdecker I assume you squash the fixups now that it's ready to review?

@cdecker

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2019

@cdecker I assume you squash the fixups now that it's ready to review?

Keeping the migration fixups separate for until the review is done in order to make the review less daunting, but I'll squash before merging. @bitcoin-bot should be able to infer that the squashed version and the unsquashed version are the same (same changes) and re-apply ACKs automatically :-)

cdecker added 24 commits Jul 25, 2019
wallet: Move the db_fatal definition so we can use it in drivers
Signed-off-by: Christian Decker <decker.christian@gmail.com>
wallet: Move the struct db definition to db_common.h
All drivers will have to reach into it, so put it in a place that is reachable
from the drivers, along with all other definitions.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
wallet: Add read-only flag to extracted queries
This gets rid of the two parallel execution paths of read-only and write
queries, by explicitly stating with each query whether it is a read-only
query, we only need to remember the ones marked as write queries.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Implement skaffolding for the dispatch of DB-specific functions
These functions implement the lookup of the query, and the dispatch to the
DB-specific functions that do the actual heavy lifting.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Implement the sqlite3 driver
This is the DB-specific counterpart to the previous commit.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Switch to new DB asbtraction for DB migrations
These do not require the ability to iterate over the result, hence they can be
migrated already.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
wallet: Call db_stmt_free from the db_stmt destructor automatically
This is much more in line with the rest of our memory management.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Track whether a db_stmt has been executed
For some of the query methods in the next step we need to have an idea of
whether the stmt was executed (db_step function) so let's track that
explicitly.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Implement basic query capabilities
This is the first step towards being able to extract information from query
rows. Only the most basic types are exposed, the others will be built on top
of these primitives.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Add method to count changed rows of a db_stmt
I was hoping to get rid of these by using "ON CONFLICT" upserts, however
sqlite3 only started supporting them in version 3.24.0 which is newer than
some of our deployment targets.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Add setup and teardown function to DB
These are used to do one-time initializations and wait for pending statements
before closing.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
sqlite3: Move begin transaction and commit into the driver
This has a slight side-effect of removing the actual begin and commit
statements from the `db_write` hooks, but they are mostly redundant anyway (no
harm in grouping pre-init statements into one transaction, and we know that
each post-init call is supposed to be wrapped anyway).

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Migrate to DB abstraction layer in db.c
Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Add more type-safe bindings to the interface
Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Add type-safe column access functions
These are based on top of the basic column access functions, and act as a
small type-safe wrapper, that also does a bit of validation.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Add DB-specific db_last_insert_id
This is likely the last part we need to completely encapsulate the part of the
sqlite3 API that we were using. Like the `db_count_changes` call I decided to
pass in the `struct db_stmt` since really they refer to the statement that was
executed and not the db.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Migrate invoices.c to new abstraction layer
Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Migrate wallet.c to the new abstraction layer
Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Move tracking of pending statements into the `struct db`
We now have a much stronger consistency check from the combination of
transaction wrapping, tal memory leak detection. Tramsaction wrapping ensures
that each statement is executed before the transaction is committed. The
commit is also driven by the `io_loop`, which means that it is no longer
possible for us to have statements outside of transactions and transactions
are guaranteed to commit at the round's end.

By adding the tal-awareness we can also get a much better indication as to
whether we have un-freed statements flying around, which we can test at the
end of the round as well.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Move statement expansion into the driver
It's better to let the driver decide when and how to expand. It can then
report the expanded statement back to the dispatch through the
`db_changes_add` function.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Switch to indirect `db_last_insert_id` version
We are about to delete all the `sqlite3`-specific code from `db.c` and this is
one of the last uses of the old interface.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Switch to indirect db close
Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Remove sqlite3 from db.c and db.h
Now that all the users are migrated to the abstraction layer we can remove the
legacy implementation.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
db: Extract db config lookup into its own function
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Suggested-by: Rusty Russell <@rustyrussell>

@cdecker cdecker force-pushed the cdecker:multi-db branch from aef674a to 969f469 Sep 5, 2019

@cdecker

This comment has been minimized.

Copy link
Member Author

commented Sep 5, 2019

Ok, fixed the rebase issue I was having problems with yesterday.

Ready for review @rustyrussell 馃槈

@rustyrussell

This comment has been minimized.

Copy link
Contributor

commented Sep 5, 2019

Ack 969f469

@rustyrussell rustyrussell merged commit 58f4489 into ElementsProject:master Sep 5, 2019

2 of 3 checks passed

bitcoin-bot/acks Check is pending
bitcoin-bot/fixups PR does not contain unsquashed fixups
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@cdecker cdecker referenced this pull request Sep 13, 2019
2 of 3 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can鈥檛 perform that action at this time.