
Add a `data_version` to the `db_write` hook and implement optimistic locking #3358

Merged
merged 6 commits into from Jan 2, 2020

Conversation

@cdecker
Member

cdecker commented Dec 18, 2019

Currently the db_write hook is a simple sink to which we write, but a plugin
has no way of telling whether there were any gaps or whether the lightningd
database lost some updates. This PR introduces a numeric data_version that
is incremented on any dirty transaction. A dirty transaction is a DB
transaction that performed some changes on the underlying database.

The data_version is guaranteed to differ between successive changes, but
it does not guarantee monotonicity (i.e., it can wrap around at 2**32). Its
goal is to assert to a plugin that no changes to the DB have been performed by
c-lightning between successive calls to db_write. The data_version
persists across restarts so a plugin can check whether it matches a backup
after startup.
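On the plugin side, the check described above boils down to verifying that each reported `data_version` is the direct successor of the last one seen, modulo the u32 wraparound. The following is a minimal sketch of that check; the function name and the increment-by-one assumption are illustrative, not part of the actual plugin API:

```python
def is_successor(last_seen, reported):
    """Return True iff `reported` directly follows `last_seen`.

    The counter is a u32, so it wraps around at 2**32. Anything other than
    an increment of exactly one (mod 2**32) means the plugin either missed
    some writes (take a fresh snapshot) or the DB was rolled back to an
    earlier state (the backup is ahead and we must not blindly continue).
    """
    return reported == (last_seen + 1) % 2**32

# Normal progression, including a clean wraparound at the u32 boundary.
assert is_successor(41, 42)
assert is_successor(2**32 - 1, 0)
assert not is_successor(41, 43)  # a write was missed
```

A plugin persisting the last `data_version` it saw can run this check on startup against its backup, as the paragraph above describes.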

In addition this PR implements an optimistic locking strategy. The
data_version is incremented only if it matches what we expect from
startup. If it doesn't match we abort() since there might be another
instance writing to the database. This is not really important for sqlite3,
which is already protected through the use of the pid file, but in postgres
we may not have such an alternative protection mechanism against concurrent
instances.

It is not meant as a graceful shutdown, but rather a last resort protection. I
expect that some other locking mechanism is implemented in a wrapper for
c-lightning that'd start the instance only after acquiring ownership. As a
standalone though it should be sufficient to protect against concurrent
instances by serializing all write operations, and abort()ing the loser of
the write race.
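The gated update at the heart of this scheme can be sketched with Python's built-in sqlite3 module. The table and column names (`vars`, `name`, `intval`) mirror the c-lightning schema, but the code is an illustration of the technique, not the actual implementation (which lives in C and calls `abort()` rather than raising):

```python
import sqlite3

def commit_with_lock(db, expected_version):
    # Gated update: increments the counter only if it still holds the value
    # we expect. A concurrent writer would have bumped it already, in which
    # case this UPDATE matches zero rows.
    cur = db.execute(
        "UPDATE vars SET intval = (intval + 1) % (1 << 32) "
        "WHERE name = 'data_version' AND intval = ?",
        (expected_version,),
    )
    if cur.rowcount != 1:
        # The DB changed under our feet: last-resort stop, not graceful.
        raise RuntimeError("data_version mismatch, aborting")
    db.commit()
    return expected_version + 1

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vars (name TEXT PRIMARY KEY, intval INTEGER)")
db.execute("INSERT INTO vars VALUES ('data_version', 0)")

v = commit_with_lock(db, 0)   # succeeds: the counter was 0, is now 1
try:
    commit_with_lock(db, 0)   # stale expectation: matches no row, aborts
except RuntimeError:
    pass
```

Because the check and the increment happen in a single atomic statement, the loser of a write race reliably sees `rowcount != 1` and stops, which is exactly the serialization property described above.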

As a minor detail, due to the order in which the increment and the reporting
are currently done, the plugin gets the version number as it will be after
the transaction is applied. That doesn't really matter, as long as the values
differ between calls.

The lock provides linearizability of DB modifications: if a database is
changed under the feet of a running process that process will abort(), which
from a global point of view is as if it had crashed right after the last
successful commit. Any process that also changed the DB must've started
between the last successful commit and the unsuccessful one since otherwise
its counters would not have matched (which would also have aborted that
transaction). So this reduces all the possible timelines to an equivalent
where the first process died, and the second process recovered from the DB.

@cdecker cdecker added this to the 0.8.1 milestone Dec 18, 2019
@cdecker cdecker requested a review from rustyrussell Dec 18, 2019
@cdecker cdecker self-assigned this Dec 18, 2019
@cdecker

Member Author

cdecker commented Dec 18, 2019

Still checking rebaseability, will un-draft asap :-)

@ZmnSCPxj

Collaborator

ZmnSCPxj commented Dec 19, 2019

When considering as well the possibility of chaining db_write hooks, perhaps we should consider a two-stage commit as well:

  • db_write hook, where plugins save the queries in some log, and return with "result" : "continue" if they are able to save the queries or "result" : "abort" if they fail.
  • db_commit notification, where if all the plugins hooked on db_write returned "result" : "continue", this notification is broadcast so that plugins can advance the state.

We may not have hook chaining yet, but it seems good to prepare the interface at this point as well, so that plugin authors for remote backup in particular can start adapting to the new interface, then later we can support hook chaining and thereby allow multiple backup plugins to hook the db_write hook and stop the daemon if any backup is having trouble.
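The proposed two-stage flow can be sketched as a small simulation. Everything here is hypothetical: hook chaining does not exist yet, and the function and hook names are illustrative stand-ins, not plugin API:

```python
def run_db_write_chain(hooks, queries):
    """Stage 1: every chained db_write hook must persist the queries and
    return {"result": "continue"}; any "abort" vetoes the commit."""
    results = [hook(queries) for hook in hooks]
    return all(r == {"result": "continue"} for r in results)

def two_stage_commit(hooks, subscribers, queries):
    if not run_db_write_chain(hooks, queries):
        return "abort"          # some backup plugin failed to save its log
    for notify in subscribers:  # Stage 2: broadcast a db_commit notification
        notify(queries)
    return "committed"

log = []
ok_hook = lambda q: (log.append(q) or {"result": "continue"})
bad_hook = lambda q: {"result": "abort"}
seen = []

assert two_stage_commit([ok_hook], [seen.append], ["UPDATE ..."]) == "committed"
assert two_stage_commit([ok_hook, bad_hook], [seen.append], ["..."]) == "abort"
```

The key property of the proposal is that the state-advancing notification fires only after every backup plugin in the chain has acknowledged the write.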

@cdecker

Member Author

cdecker commented Dec 19, 2019

When considering as well the possibility of chaining db_write hooks, perhaps we should consider a two-stage commit as well:

  • db_write hook, where plugins save the queries in some log, and return with "result" : "continue" if they are able to save the queries or "result" : "abort" if they fail.
  • db_commit notification, where if all the plugins hooked on db_write returned "result" : "continue", this notification is broadcast so that plugins can advance the state.

We may not have hook chaining yet, but it seems good to prepare the interface at this point as well, so that plugin authors for remote backup in particular can start adapting to the new interface, then later we can support hook chaining and thereby allow multiple backup plugins to hook the db_write hook and stop the daemon if any backup is having trouble.

Totally with you on the chainability of hooks; standardizing on {"result": "xyz"} to allow the plugin to abort or continue is important. It is possible today by having the hook return an error, but that looks a lot scarier to the user when they get a backtrace with an inscrutable error message. So cleaning that up would be great.

I'm not convinced that a db_commit notification is something we should do. Plugins have to manage cases with and without that notification anyway (we might exit in between calling the hook and committing), so adding the notification might lead to more complex logic, since two different entry points into the logic become possible. Worse, plugin developers may start relying on the notification, causing issues for committed but not notified transactions. These two arguments lead me to be cautious with this kind of apparent optimization.

@cdecker cdecker force-pushed the cdecker:backup-versioning branch from 704f99f to f4a84e2 Dec 19, 2019
@darosior

Collaborator

darosior commented Dec 19, 2019

Worse is that plugin developers may start relying on the notification, causing issues for committed but not notified transactions.

FWIW we could send a notification at begin_transaction and one at commit_transaction, so that the plugin knows what's committed or not?

@cdecker

Member Author

cdecker commented Dec 19, 2019

Worse is that plugin developers may start relying on the notification, causing issues for committed but not notified transactions.

FWIW we could send a notification at begin_transaction and one at commit_transaction, so that the plugin knows what's committed or not ?

But the plugin doesn't know as a matter of fact, since we might terminate in between calling the hook and issuing the notification, so the plugin must handle the case where it doesn't get a notification but the DB transaction is still committed. Once you already have to handle that, why add yet another possible entrypoint for the happy case, and not just always assume the bad case? No matter how many notifications we add, the plugin will have to be able to handle the case where a notification never comes :-)

I guess I'm worried about the overhead of notifications (it's small but not free) and about plugin devs relying too heavily on the helpful notification and not bothering with the bad case, which would then result in loads of subtly broken plugins.

Collaborator

niftynei left a comment

nice, seems like a good thing to have


/* The current DB version we expect to update if changes are
 * committed. */
u32 data_version;

@niftynei

niftynei Dec 19, 2019

Collaborator

u64 not u32?

@cdecker

cdecker Dec 20, 2019

Author Member

Yes, intval in the vars table is defined as an INTEGER, which gets translated into a u32 in postgres, and the size has to match in non-sqlite3 databases. So I thought I shouldn't be too picky about this value.

Alternatives are 1) creation of a new table, or 2) adding a new field bigintval to the vars table. If this is a problem I'd prefer the second. But really all we're guaranteeing is that there is enough variety to identify if we skipped writes or have lost some. In particular we can't do much if we're off by more than 2: if the plugin missed an update we must take a snapshot, if the DB moved back to an earlier point in time we must abort. We just need to have the probability of accidentally using the same version again small enough (1/2**32 seems good enough for me 😉).

@niftynei

niftynei Dec 21, 2019

Collaborator

sweet, ok. that's a good thing to know about int -> u32 in postgres.

cdecker added 6 commits Dec 18, 2019
We were passing them in separately, while we could just retrieve them from the
db instance instead.
We are about to do some more operations before committing, so moving this up
allows us to reuse the same transaction.
This counter is incremented on each dirty transaction.
This increments the `data_version` upon committing dirty transactions, reads
the last data_version upon startup, and tracks the number in memory in
parallel to the DB (see next commit for rationale).

Changelog-Changed: JSON-RPC: Added a `data_version` field to the `db_write` hook which returns a numeric transaction counter.
The optimistic lock prevents multiple instances of c-lightning making
concurrent modifications to the database. That would be unsafe as it messes up
the state in the DB. The optimistic lock is implemented by checking whether a
gated update on the previous value of the `data_version` actually results in
an update. If that's not the case the DB has been changed under our feet.

The lock provides linearizability of DB modifications: if a database is
changed under the feet of a running process that process will `abort()`, which
from a global point of view is as if it had crashed right after the last
successful commit. Any process that also changed the DB must've started
between the last successful commit and the unsuccessful one since otherwise
its counters would not have matched (which would also have aborted that
transaction). So this reduces all the possible timelines to an equivalent
where the first process died, and the second process recovered from the DB.

This is not that interesting for `sqlite3` where we are also protected via the
PID file, but when running on multiple hosts against the same DB, e.g., with
`postgres`, this protection becomes important.

Changelog-Added: DB: Optimistic locking prevents instances from running concurrently against the same database, providing linear consistency to changes.
@cdecker cdecker force-pushed the cdecker:backup-versioning branch from 73c1928 to 8076248 Dec 27, 2019
@cdecker cdecker marked this pull request as ready for review Dec 27, 2019
Collaborator

niftynei left a comment

ACK 8076248

@niftynei niftynei merged commit 3f3a48d into ElementsProject:master Jan 2, 2020
4 checks passed
bitcoin-bot/acks Acks by niftynei
bitcoin-bot/changelog This PR has at least one changelog entry
bitcoin-bot/fixups PR does not contain unsquashed fixups
continuous-integration/travis-ci/pr The Travis CI build passed
Details
4 participants