This repository has been archived by the owner on May 25, 2021. It is now read-only.

background deletion for soft-deleted database #3

Draft · wants to merge 1 commit into master

Conversation


@jiangphcn jiangphcn commented Apr 21, 2020

Overview

Allow a background job to delete soft-deleted databases according to specified criteria in order to release space. Once a database is hard-deleted, the data can't be fetched back.

Testing recommendations

tbd

Related Issues or Pull Requests

apache/couchdb#2666

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • [N/A] Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • [N/A] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

allow a background job to delete soft-deleted databases according to
specified criteria in order to release space. Once a database is
hard-deleted, the data can't be fetched back.

nickva commented Apr 21, 2020

@jiangphcn looks like a good start!

I was thinking we could perhaps just add this to the fabric (fabric2_*) modules, like we did with indexing; since soft deletion happens there, the deletion logic would not be out of place there either.

For the general structure, what do you think about starting with something simpler at first: say, have only a singleton job, basically with type = <<"dbdelete">> and job id <<"dbdelete_job">>.

Then we won't even need a couch_dbdelete_server. We just call couch_jobs:set_type_timeout(?DB_DELETE_JOB_TYPE, 6) and couch_jobs:add(undefined, ?DB_DELETE_JOB_TYPE, ?DB_DELETE_JOB, #{}) in some init function called by a supervisor. That would ensure the job exists if it doesn't already.
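A minimal sketch of that init function; the macro values and the `couch_jobs` call shapes here are taken from the comment above, not from verified documentation:

```erlang
%% Sketch only: macro values and couch_jobs call shapes follow the
%% discussion above, not verified documentation.
-define(DB_DELETE_JOB_TYPE, <<"dbdelete">>).
-define(DB_DELETE_JOB, <<"dbdelete_job">>).

init_db_delete_job() ->
    %% Register an activity timeout (seconds) for this job type
    couch_jobs:set_type_timeout(?DB_DELETE_JOB_TYPE, 6),
    %% Add the singleton job; if it already exists this is a no-op
    couch_jobs:add(undefined, ?DB_DELETE_JOB_TYPE, ?DB_DELETE_JOB, #{}).
```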

Then the couch_dbdelete_worker gen_server would wait to accept that job and run it. Because of the locking, we would know only one job runs in the whole cluster.

In that worker we could run fabric2_db:list_deleted_dbs_info(...), but let's use a callback there and accumulate batches of dbs, say 50 or 100 at a time. For each db in the batch, we check the time limit and delete the old db instance. Here you could use fabric2_util:pmap, as I don't think we have a transactional interface to open and delete; or just do a simple foreach loop at first.
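A rough sketch of the batching idea; the fold-style callback for fabric2_db:list_deleted_dbs_info, as well as the expired/2 and delete_db_instance/1 helpers, are hypothetical:

```erlang
%% Hypothetical sketch: the callback shape for list_deleted_dbs_info,
%% expired/2 and delete_db_instance/1 are illustrative, not real APIs.
-define(BATCH_SIZE, 100).

process_deleted_dbs(MaxAgeSec) ->
    %% Accumulate deleted-db infos into batches of ?BATCH_SIZE
    Left = fabric2_db:list_deleted_dbs_info(fun(DbInfo, Batch) ->
        case [DbInfo | Batch] of
            Batch1 when length(Batch1) >= ?BATCH_SIZE ->
                delete_batch(Batch1, MaxAgeSec),
                [];
            Batch1 ->
                Batch1
        end
    end, [], []),
    %% Flush whatever is left over from the last partial batch
    delete_batch(Left, MaxAgeSec).

delete_batch(Batch, MaxAgeSec) ->
    %% Simple foreach loop at first; fabric2_util:pmap could come later
    lists:foreach(fun(DbInfo) ->
        case expired(DbInfo, MaxAgeSec) of
            true -> delete_db_instance(DbInfo);
            false -> ok
        end
    end, Batch).
```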

After we are done, the job can reschedule itself to run at some future point in time, say after 1 hour or so, and then finish. To make the scheduling work in the accept function, we'd use a max_sched_time of now + say 1 minute.

@jiangphcn
Author

@nickva thanks so much for your review and great suggestions, especially on how to better leverage couch_jobs.

Originally, I tried to put the logic into the fabric module. However, I got an error:badarg error when there was a call related to couch_jobs. As discussed, it is related to a circular dependency between couch_jobs and fabric.

Based on all of your comments above, I came up with a branch, apache/couchdb@ae4e9f0, where a retry is implemented to address the circular dependency.


nickva commented Apr 23, 2020

@jiangphcn we chatted on slack but I'll summarize some of that just for visibility

For error:badarg, we just need to find a better way to wait for couch_jobs to initialize there. Maybe call application:which_applications/1, or try to get the list of children from couch_jobs_sup.

For the general execution pattern it would be something like:

init(): respond back so supervisor initialization can proceed, then call wait_for_couch_jobs()

wait_for_couch_jobs() :

  • wait for couch_jobs to initialize and either get the cleanup job, or try to add it, then call run_loop()

run_loop():

  • call couch_jobs:accept() and wait for a job
  • once it gets a job, call process_expirations()
  • when it returns, call couch_jobs:resubmit(..., ScheduledTime = Now + 1 hour)
  • call run_loop() recursively

process_expirations():

  • go through all the dbs
  • periodically update the job's state (we could keep stats there like accept_time, last_update_time, scheduled_time, dbs_processed, dbs_deleted, etc...)
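Put together, the loop above might look roughly like this; the couch_jobs:accept and couch_jobs:resubmit argument shapes, and the process_expirations/2 body, are assumptions based on the outline, not verified APIs:

```erlang
%% Sketch of the run loop; call shapes follow the outline above and
%% are not taken from verified couch_jobs documentation.
run_loop() ->
    case couch_jobs:accept(?DB_DELETE_JOB_TYPE) of
        {ok, Job, JobData} ->
            %% Delete expired dbs, periodically updating job stats such
            %% as last_update_time, dbs_processed and dbs_deleted
            process_expirations(Job, JobData),
            %% Reschedule ourselves to run again in about an hour
            Now = erlang:system_time(second),
            couch_jobs:resubmit(undefined, Job, Now + 3600),
            run_loop();
        {error, not_found} ->
            %% No job available yet; keep waiting
            run_loop()
    end.
```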
