Implement database drop operation #431
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master     #431       +/-  ##
===========================================
- Coverage   73.47%   59.63%   -13.84%
===========================================
  Files          31       31
  Lines        4679     4752       +73
  Branches     1462     1401       -61
===========================================
- Hits         3438     2834      -604
- Misses        745     1077      +332
- Partials      496      841      +345
```
Signed-off-by: Cole Miller <cole.miller@canonical.com>
While debugging this locally I realized it's more subtle than I thought (as usual). We really want to sqlite3_close all open connections before calling xDelete, and even then I think we need to make sure sqlite3_close hasn't just "zombified" the connection (ref). Also, I haven't completely convinced myself that there isn't an issue with followers potentially having an open client connection to some DB and then receiving a command log entry instructing them to drop it -- my understanding is that since the sqlite3_prepare-related fix, basically all access to the database(s) goes through a leader, but I might be missing something. (If that is the case, we don't need the follower connection field anymore, right?)

Edit: Ah, I see we use the "follower" connection for checkpointing and a couple of other non-client-related things.
A snapshot, while it is being taken, holds a copy of the WAL and pointers to database pages. Deleting the database pages while raft is still following those pointers will cause issues. The snapshot mechanism also assumes that the first n databases in the linked list of all open databases remain the same throughout the snapshot's lifetime: databases can be added but cannot be removed in the meantime, see e.g. here. Checkpointing happens synchronously in the main loop after frames have been applied to the WAL, so I think that should be fine.
I don't think it should be atomic as long as all accesses to the variable happen from the same thread. I think we should be able to acquire all locks like in
I think it's okay, the follower connection is always immediately closed after usage from what I can see. I think you can assert
Not that I can think of, but let's make sure to test it.
Thanks -- so the plan is:
Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.
This begins to sound like hackery, but we could mark the database for deletion and garbage-collect it somewhere else once the snapshot finishes. Or we could fail the application of the log entry containing the deletion, but in such a way that raft will retry it at some point. It's not a disaster if a follower's FSM is not up to date, I think; when it becomes leader, it will update its FSM. The second approach is more sane imo, but it still doesn't feel nice. Maybe we could look for other ways to obtain the same desired effect for the user: "dropping a database" could mean completely emptying it, and maybe that's easier.
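The mark-and-garbage-collect idea could look something like the sketch below. All names and fields here (`pending_drop`, `gc_dropped`, the `snapshot_in_progress` parameter) are illustrative assumptions, not dqlite's actual structures; the point is only that applying the drop entry sets a flag, and the file is unlinked later, once no snapshot holds pointers into the database pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: an open database in the linked list of dbs. */
struct db {
	const char *filename;
	bool pending_drop; /* set when the drop log entry is applied */
	struct db *next;
};

/* Applying the drop entry only marks the database for deletion... */
static void apply_drop(struct db *d)
{
	d->pending_drop = true;
}

/* ...and the actual deletion is deferred until no snapshot is in
 * progress. Unlinks marked dbs from the list and returns how many
 * were collected (the real code would also delete the file and WAL). */
static int gc_dropped(struct db **head, bool snapshot_in_progress)
{
	int n = 0;
	if (snapshot_in_progress) {
		return 0; /* snapshot still holds page pointers: wait */
	}
	for (struct db **p = head; *p != NULL;) {
		if ((*p)->pending_drop) {
			*p = (*p)->next;
			n++;
		} else {
			p = &(*p)->next;
		}
	}
	return n;
}
```

This also makes the downside discussed above concrete: between `apply_drop` and `gc_dropped` the database still exists on disk, so later log entries can observe a drop that "succeeded" but has not actually happened yet.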
Failing the application of the log entry is going to be complicated IMO, because I don't think the raft library is quite prepared for that. Failing a log entry should be used only as an emergency, like when an inconsistency is detected or some other hard, unresolvable situation arises (while this one is a transient problem). Garbage collecting seems a tad better, although also not that nice.
I'd be in favor of emptying the database instead. Effectively it should be the same thing, because if you try to open it again it doesn't make a difference for the user whether it's not there at all (deleted) or it's there but empty. I believe we create databases automatically when you try to open them, so emptying the db would be completely transparent from the user's point of view.
Thanks for the feedback. With a garbage-collection approach I'd be worried that we might apply other log entries on top of the drop command, and that those log entries might be relying on the database drop to have run to completion, when it's actually still pending because of a snapshot. I'm also uneasy about returning "success" to the client that requested the drop before it's actually completed on all nodes. The SQLite docs do suggest an alternative way of resetting or emptying the database, see here:
(Fernando Apesteguía pointed this out on Mattermost.) This does seem less likely to mess with an ongoing snapshot than just deleting the file, but I'm not clear on whether sqlite3_exec("VACUUM") is something we really want to do in the middle of applying a log entry -- isn't that going to shuffle pages around in a way the FSM doesn't expect?
Hmm, I guess we could have a RESET dqlite request that works just like EXEC_SQL("VACUUM") except that we make those sqlite3_db_config calls at the appropriate points. |
Closing in favor of #435
WIP (still needs tests) but feedback appreciated.

This PR implements a new wire protocol request for dqlite that allows a client to drop or delete an entire database, so that it will appear empty when next opened. Some other SQL databases have a `DROP DATABASE my_db` statement that does this, but SQLite does not, so we implement the behavior out-of-band by just deleting the backing database file and WAL.

Design notes:

- A `dropping` flag on the db object prevents clients from opening a DB that another client has requested to drop.

Questions/places where I'd like feedback:

- Is the `dropping` flag sufficient to protect against other stuff interleaving with handle_drop and dropApplyCb? Should it be an atomic? Do we need to take some kind of lock on the database itself?

Closes #422
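On the `dropping`-flag question, a sketch of the intended guard (names like `handle_open` and `db_state` are hypothetical, chosen here for illustration). As noted in the review comments above, if every access happens on the single main-loop thread, a plain bool suffices and no atomics are needed:

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical per-database state; not dqlite's actual struct. */
struct db_state {
	bool dropping;  /* set once a client has requested a drop */
	int open_count; /* client connections currently open */
};

/* Reject new opens once a drop has been requested. */
static int handle_open(struct db_state *s)
{
	if (s->dropping) {
		return EBUSY; /* client sees "database is being dropped" */
	}
	s->open_count++;
	return 0;
}

/* Mark the database; the actual deletion would happen in the apply
 * callback (dropApplyCb in this PR) once the log entry commits. */
static int handle_drop(struct db_state *s)
{
	s->dropping = true;
	return 0;
}
```

An atomic would only become necessary if the flag were read or written from a thread other than the main loop.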
Signed-off-by: Cole Miller cole.miller@canonical.com