
Implement database drop operation #431

Closed
wants to merge 1 commit

Conversation

cole-miller
Contributor

WIP (still needs tests) but feedback appreciated.

This PR implements a new wire protocol request for dqlite that allows a client to drop or delete an entire database, so that it will appear empty when next opened. Some other SQL databases have a DROP my_db statement that does this, but SQLite does not, so we implement the behavior out-of-band by just deleting the backing database file and WAL.

Design notes:

  • The new REQUEST_DROP is parametrized by a database ID, not a database name, and so clients can only drop a database that they'd previously opened (with REQUEST_OPEN). This avoids a class of error conditions where the client tries to drop a non-existent database and generally makes the mental model cleaner (for me, at least).
  • Leaders will not drop a database if connections other than the requesting client's are open. We set a dropping flag on the db object to prevent clients from opening a DB that another client has requested to drop.
  • There is a new COMMAND_DROP log entry, and applying this log entry is the trigger for deleting the database "files" (via the VFS xDelete callback which we control). Therefore, dropping is asynchronous, just as though we were executing some SQL.

Questions/places where I'd like feedback:

  • Do we need to take additional measures in apply_drop or elsewhere to prevent xDelete from stomping on other database operations? My mental model is that we always apply entries in log order and so there's no concurrency to worry about, but I'm not so clear on the interaction with snapshotting and checkpointing.
  • Similarly, is the dropping flag sufficient to protect against other stuff interleaving with handle_drop and dropApplyCb? Should it be an atomic? Do we need to take some kind of lock on the database itself?
  • Do we need to worry about closing/clearing the db object's follower connection?
  • Is this going to have weird interactions with disk mode?

Closes #422

Signed-off-by: Cole Miller cole.miller@canonical.com

@codecov

codecov bot commented Nov 15, 2022

Codecov Report

Merging #431 (65a5f31) into master (0c38457) will decrease coverage by 13.83%.
The diff coverage is 1.29%.

❗ Current head 65a5f31 differs from pull request most recent head fd5b078. Consider uploading reports for the commit fd5b078 to get more accurate results

@@             Coverage Diff             @@
##           master     #431       +/-   ##
===========================================
- Coverage   73.47%   59.63%   -13.84%     
===========================================
  Files          31       31               
  Lines        4679     4752       +73     
  Branches     1462     1401       -61     
===========================================
- Hits         3438     2834      -604     
- Misses        745     1077      +332     
- Partials      496      841      +345     
Impacted Files Coverage Δ
src/fsm.c 50.25% <0.00%> (-18.90%) ⬇️
src/gateway.c 40.54% <0.00%> (-18.19%) ⬇️
src/db.c 46.42% <100.00%> (-16.08%) ⬇️
src/message.c 0.00% <0.00%> (-100.00%) ⬇️
src/request.c 0.00% <0.00%> (-100.00%) ⬇️
src/response.c 0.00% <0.00%> (-100.00%) ⬇️
src/conn.c 49.24% <0.00%> (-23.62%) ⬇️
src/client.c 52.84% <0.00%> (-20.21%) ⬇️
src/transport.c 50.98% <0.00%> (-18.96%) ⬇️
src/query.c 30.00% <0.00%> (-18.89%) ⬇️
... and 19 more


@cole-miller
Contributor Author

cole-miller commented Nov 15, 2022

While debugging this locally I realized it's more subtle than I thought (as usual). We really want to sqlite3_close all open connections before calling xDelete, and even then I think we need to make sure sqlite3_close hasn't just "zombified" the connection (ref). Also, I haven't completely convinced myself that there isn't an issue with followers potentially having an open client connection to some DB and then receiving a command log entry instructing them to drop it -- my understanding is that since the sqlite3_prepare-related fix, basically all access to the database(s) goes through a leader, but I might be missing something. (If that is the case, we don't need the follower connection field anymore, right?)

Edit: Ah, I see we use the "follower" connection for checkpointing and a couple of other non-client-related things.

@MathieuBordere
Contributor

MathieuBordere commented Nov 16, 2022

Do we need to take additional measures in apply_drop or elsewhere to prevent xDelete from stomping on other database operations? My mental model is that we always apply entries in log order and so there's no concurrency to worry about, but I'm not so clear on the interaction with snapshotting and checkpointing.

A snapshot, while it is being taken, holds a copy of the WAL and pointers to database pages. Deleting the database pages while raft is accessing those pointers will cause issues. Also, the snapshot mechanism assumes that the first n databases in the linked list of all open databases remain the same throughout the whole snapshot's lifetime. Databases can be added but cannot be removed in the meantime, see e.g. here

Checkpointing happens synchronously in the main loop after frames have been applied to the WAL. I think that should be fine.

Similarly, is the dropping flag sufficient to protect against other stuff interleaving with handle_drop and dropApplyCb? Should it be an atomic? Do we need to take some kind of lock on the database itself?

I don't think it needs to be atomic, as long as all accesses to the variable happen from the same thread. I think we should be able to acquire all locks, like in maybeCheckpoint, just before deleting the DB (as all connections are closed, this should succeed).

Do we need to worry about closing/clearing the db object's follower connection?

I think it's okay, the follower connection is always immediately closed after usage from what I can see. I think you can assert db->follower == NULL where applicable.

Is this going to have weird interactions with disk mode?

Not that I can think of, but let's make sure to test it.

@cole-miller
Contributor Author

Thanks -- so the plan is:

  • In handle_drop, we make sure nobody else has the db open, set the dropping flag to prevent others from opening it in the meantime, acquire the read_lock to prevent snapshots/checkpoints from running concurrently, and finally raft_apply
  • In apply_drop, we clean up any prepared statements (to prevent zombification), leader__close if there is a leader connection open, db__close, and finally xDelete
  • After that, there's nothing to do in the raft_apply callback but report success (or failure) to the client

@cole-miller
Contributor Author

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

@MathieuBordere
Contributor

MathieuBordere commented Nov 17, 2022

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

This begins to sound like hackery, but we could mark the database for deletion and garbage-collect it somewhere else once the snapshot finishes.

Or fail the application of the log entry containing the deletion, but in such a way that raft will retry it at some point. It's not a disaster if a follower's FSM is not up to date, I think; when it becomes leader, it will update its FSM.

The second approach is more sane imo, but it still doesn't feel nice.

Maybe we could look for other ways to obtain the same desired effect for the user, "dropping a database" could mean completely emptying it, and maybe that's easier.

@freeekanayaka
Contributor

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

This begins to sound like hackery, but we could mark the database for deletion and garbage-collect it somewhere else once the snapshot finishes.

Or fail the application of the log entry containing the deletion, but in such a way that raft will retry it at some point. It's not a disaster if a follower's FSM is not up to date, I think; when it becomes leader, it will update its FSM.

The second approach is more sane imo, but it still doesn't feel nice.

Failing the application of the log entry is going to be complicated IMO, because I don't think the raft library is quite prepared for that. Failing a log entry should be reserved for emergencies, like a detected inconsistency or some other hard, unresolvable situation (while this one is a transient problem).

Garbage collecting seems a tad better, although also not that nice.

Maybe we could look for other ways to obtain the same desired effect for the user, "dropping a database" could mean completely emptying it, and maybe that's easier.

I'd be in favor of emptying the database instead. Effectively it should be the same thing: if you try to open it again, it shouldn't make a difference to the user whether the database is not there at all (deleted) or is there but empty. I believe we create databases automatically when you try to open them, so emptying the db would be completely transparent from the user's point of view.

@cole-miller
Contributor Author

Thanks for the feedback. With a garbage-collection approach I'd be worried that we might apply other log entries on top of the drop command, and that those log entries might be relying on the database drop to have run to completion, when it's actually still pending because of a snapshot. I'm also uneasy about returning "success" to the client that requested the drop before it's actually completed on all nodes.

The SQLite docs do suggest an alternative way of resetting or emptying the database, see here:

Set the SQLITE_DBCONFIG_RESET_DATABASE flag and then run VACUUM in order to reset a database back to an empty database with no schema and no content. The following process works even for a badly corrupted database file:

  • If the database connection is newly opened, make sure it has read the database schema by preparing then discarding some query against the database, or calling sqlite3_table_column_metadata(), ignoring any errors. This step is only necessary if the application desires to keep the database in WAL mode after the reset if it was in WAL mode before the reset.
  • sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
  • sqlite3_exec(db, "VACUUM", 0, 0, 0);
  • sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);

(Fernando Apesteguía pointed this out on Mattermost.) This does seem less likely to mess with an ongoing snapshot than just deleting the file, but I'm not clear on whether sqlite3_exec("VACUUM") is something we really want to do in the middle of applying a log entry -- isn't that going to shuffle pages around in a way the FSM doesn't expect?

@cole-miller
Contributor Author

Hmm, I guess we could have a RESET dqlite request that works just like EXEC_SQL("VACUUM") except that we make those sqlite3_db_config calls at the appropriate points.

@cole-miller
Contributor Author

Closing in favor of #435

Successfully merging this pull request may close these issues.

Support deletion of database
3 participants