
Implement database drop operation #431

Closed
wants to merge 1 commit

Conversation

cole-miller
Contributor

WIP (still needs tests) but feedback appreciated.

This PR implements a new wire protocol request for dqlite that allows a client to drop or delete an entire database, so that it will appear empty when next opened. Some other SQL databases have a DROP my_db statement that does this, but SQLite does not, so we implement the behavior out-of-band by just deleting the backing database file and WAL.

Design notes:

  • The new REQUEST_DROP is parametrized by a database ID, not a database name, and so clients can only drop a database that they'd previously opened (with REQUEST_OPEN). This avoids a class of error conditions where the client tries to drop a non-existent database and generally makes the mental model cleaner (for me, at least).
  • Leaders will not drop a database if connections other than the requesting client's are open. We set a dropping flag on the db object to prevent clients from opening a DB that another client has requested to drop.
  • There is a new COMMAND_DROP log entry, and applying this log entry is the trigger for deleting the database "files" (via the VFS xDelete callback which we control). Therefore, dropping is asynchronous, just as though we were executing some SQL.

Questions/places where I'd like feedback:

  • Do we need to take additional measures in apply_drop or elsewhere to prevent xDelete from stomping on other database operations? My mental model is that we always apply entries in log order and so there's no concurrency to worry about, but I'm not so clear on the interaction with snapshotting and checkpointing.
  • Similarly, is the dropping flag sufficient to protect against other stuff interleaving with handle_drop and dropApplyCb? Should it be an atomic? Do we need to take some kind of lock on the database itself?
  • Do we need to worry about closing/clearing the db object's follower connection?
  • Is this going to have weird interactions with disk mode?

Closes #422

Signed-off-by: Cole Miller cole.miller@canonical.com

@codecov

codecov bot commented Nov 15, 2022

Codecov Report

Merging #431 (65a5f31) into master (0c38457) will decrease coverage by 13.83%.
The diff coverage is 1.29%.

❗ Current head 65a5f31 differs from pull request most recent head fd5b078. Consider uploading reports for the commit fd5b078 to get more accurate results

@@             Coverage Diff             @@
##           master     #431       +/-   ##
===========================================
- Coverage   73.47%   59.63%   -13.84%     
===========================================
  Files          31       31               
  Lines        4679     4752       +73     
  Branches     1462     1401       -61     
===========================================
- Hits         3438     2834      -604     
- Misses        745     1077      +332     
- Partials      496      841      +345     
Impacted Files Coverage Δ
src/fsm.c 50.25% <0.00%> (-18.90%) ⬇️
src/gateway.c 40.54% <0.00%> (-18.19%) ⬇️
src/db.c 46.42% <100.00%> (-16.08%) ⬇️
src/message.c 0.00% <0.00%> (-100.00%) ⬇️
src/request.c 0.00% <0.00%> (-100.00%) ⬇️
src/response.c 0.00% <0.00%> (-100.00%) ⬇️
src/conn.c 49.24% <0.00%> (-23.62%) ⬇️
src/client.c 52.84% <0.00%> (-20.21%) ⬇️
src/transport.c 50.98% <0.00%> (-18.96%) ⬇️
src/query.c 30.00% <0.00%> (-18.89%) ⬇️
... and 19 more


@cole-miller
Contributor Author

cole-miller commented Nov 15, 2022

While debugging this locally I realized it's more subtle than I thought (as usual). We really want to sqlite3_close all open connections before calling xDelete, and even then I think we need to make sure sqlite3_close hasn't just "zombified" the connection (ref). Also, I haven't completely convinced myself that there isn't an issue with followers potentially having an open client connection to some DB and then receiving a command log entry instructing them to drop it -- my understanding is that since the sqlite3_prepare-related fix, basically all access to the database(s) goes through a leader, but I might be missing something. (If that is the case, we don't need the follower connection field anymore, right?)

Edit: Ah, I see we use the "follower" connection for checkpointing and a couple of other non-client-related things.

@MathieuBordere
Contributor

MathieuBordere commented Nov 16, 2022

Do we need to take additional measures in apply_drop or elsewhere to prevent xDelete from stomping on other database operations? My mental model is that we always apply entries in log order and so there's no concurrency to worry about, but I'm not so clear on the interaction with snapshotting and checkpointing.

A snapshot, while it is being taken, holds a copy of the WAL and pointers to database pages. Deleting the database pages while raft is accessing those pointers will cause issues. Also, the snapshot mechanism assumes that the first n databases in the linked list of all open databases remain the same throughout the whole snapshot's lifetime. Databases can be added but cannot be removed in the meantime, see e.g. here

Checkpointing happens synchronously in the main loop after frames have been applied to the WAL. I think that should be fine.

Similarly, is the dropping flag sufficient to protect against other stuff interleaving with handle_drop and dropApplyCb? Should it be an atomic? Do we need to take some kind of lock on the database itself?

I don't think it needs to be atomic, as long as all accesses to the variable happen from the same thread. I think we should be able to acquire all locks, like in maybeCheckpoint, just before deleting the DB (as all connections are closed, this should succeed).

Do we need to worry about closing/clearing the db object's follower connection?

I think it's okay, the follower connection is always immediately closed after usage from what I can see. I think you can assert db->follower == NULL where applicable.

Is this going to have weird interactions with disk mode?

Not that I can think of, but let's make sure to test it.

@cole-miller
Contributor Author

Thanks -- so the plan is:

  • In handle_drop, we make sure nobody else has the db open, set the dropping flag to prevent others from opening it in the meantime, acquire the read_lock to prevent snapshots/checkpoints from running concurrently, and finally raft_apply
  • In apply_drop, we clean up any prepared statements (to prevent zombification), leader__close if there is a leader connection open, db__close, and finally xDelete
  • After that, there's nothing to do in the raft_apply callback but report success (or failure) to the client

@cole-miller
Contributor Author

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

@MathieuBordere
Contributor

MathieuBordere commented Nov 17, 2022

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

This begins to sound like hackery, but we could mark the database for deletion and garbage-collect it somewhere else once the snapshot finishes.

Or fail the application of the log entry containing the deletion, but in such a way that raft will retry it at some point. It's not a disaster if a follower's FSM is not up to date, I think; when it becomes leader, it will update its FSM.

The second approach is more sane imo, but it still doesn't feel nice.

Maybe we could look for other ways to obtain the same desired effect for the user, "dropping a database" could mean completely emptying it, and maybe that's easier.

@freeekanayaka
Contributor

Unfortunately, there's another issue: a follower may have a snapshot in progress when it comes time to apply the drop command, and there's no way to back out the entry at that point. I'm not sure what to do about this.

This begins to sound like hackery, but we could mark the database for deletion and garbage-collect it somewhere else once the snapshot finishes.

Or fail the application of the log entry containing the deletion, but in such a way that raft will retry it at some point. It's not a disaster if a follower's FSM is not up to date, I think; when it becomes leader, it will update its FSM.

The second approach is more sane imo, but it still doesn't feel nice.

Failing the application of the log entry is going to be complicated IMO, because I don't think the raft library is quite prepared for that. Failing a log entry should be reserved for emergencies, like a detected inconsistency or some other hard, unresolvable situation (while this one is a transient problem).

Garbage collecting seems a tad better, although also not that nice.

Maybe we could look for other ways to obtain the same desired effect for the user, "dropping a database" could mean completely emptying it, and maybe that's easier.

I'd be in favor of emptying the database instead. Effectively it should be the same thing: if you try to open it again, it shouldn't make a difference to the user whether the database is not there at all (deleted) or is there but empty. I believe we create databases automatically when you try to open them, so emptying the db would be completely transparent from the user's point of view.

@cole-miller
Contributor Author

Thanks for the feedback. With a garbage-collection approach I'd be worried that we might apply other log entries on top of the drop command, and that those log entries might be relying on the database drop to have run to completion, when it's actually still pending because of a snapshot. I'm also uneasy about returning "success" to the client that requested the drop before it's actually completed on all nodes.

The SQLite docs do suggest an alternative way of resetting or emptying the database, see here:

Set the SQLITE_DBCONFIG_RESET_DATABASE flag and then run VACUUM in order to reset a database back to an empty database with no schema and no content. The following process works even for a badly corrupted database file:

  • If the database connection is newly opened, make sure it has read the database schema by preparing then discarding some query against the database, or calling sqlite3_table_column_metadata(), ignoring any errors. This step is only necessary if the application desires to keep the database in WAL mode after the reset if it was in WAL mode before the reset.
  • sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
  • sqlite3_exec(db, "VACUUM", 0, 0, 0);
  • sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);

(Fernando Apesteguía pointed this out on Mattermost.) This does seem less likely to mess with an ongoing snapshot than just deleting the file, but I'm not clear on whether sqlite3_exec("VACUUM") is something we really want to do in the middle of applying a log entry -- isn't that going to shuffle pages around in a way the FSM doesn't expect?

@cole-miller
Contributor Author

Hmm, I guess we could have a RESET dqlite request that works just like EXEC_SQL("VACUUM") except that we make those sqlite3_db_config calls at the appropriate points.

@cole-miller
Contributor Author

Closing in favor of #435

Successfully merging this pull request may close these issues.

Support deletion of database
3 participants