Design reliable expiration #26

Closed
djmitche opened this issue Nov 22, 2020 · 4 comments · Fixed by #341

@djmitche
Collaborator

The sync model is such that there's a point in the list of operations that is committed. That can be used to determine when to delete an expired task -- basically, when an op after its expiration time has been committed.

That might need to be recorded as an operation? Needs a bit more thought.

@djmitche
Collaborator Author

djmitche commented Dec 3, 2020

Note that this is not about marking a task as Status::Deleted. That's just a visibility thing. This is about actually removing the task from the database.

The problem with deleting tasks is that the only way to combine an Update and a Delete operation is into a Delete (since the deleted properties are all gone). And that is a kind of data loss.

So, we want to make sure that a task is only actually deleted from the DB when it's very unlikely that it will conflict with an update. One way to do that may be to only delete tasks when their modified timestamp is sufficiently far in the past -- and more than just a few days. We'll probably want configurable times, and probably different times for Status::Deleted vs. Status::Completed tasks. Users might want to do historical analyses of completed tasks, but not care about deleted ones, for example.
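
A minimal sketch of the check described above, assuming a per-status retention period and timestamps in seconds since the Unix epoch. The names `ExpirationConfig`, `is_expired`, and the `Status` variants shown here are hypothetical and chosen for illustration; they are not taken from the project's code.

```rust
use std::time::Duration;

/// Hypothetical status enum, limited to the states discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Completed,
    Deleted,
}

/// Hypothetical retention settings. `None` means "never expire".
#[derive(Debug, Clone)]
pub struct ExpirationConfig {
    pub deleted_after: Option<Duration>,
    pub completed_after: Option<Duration>,
}

/// Returns true when a task's last modification is older than the retention
/// period configured for its status. `modified` and `now` are seconds since
/// the Unix epoch.
pub fn is_expired(status: Status, modified: u64, now: u64, cfg: &ExpirationConfig) -> bool {
    let retention = match status {
        // Assumption: pending tasks are never expired.
        Status::Pending => None,
        Status::Deleted => cfg.deleted_after,
        Status::Completed => cfg.completed_after,
    };
    match retention {
        Some(period) => now.saturating_sub(modified) >= period.as_secs(),
        None => false,
    }
}
```

With `deleted_after` set to something like 180 days and `completed_after` left as `None`, deleted tasks would eventually be reclaimed while completed tasks stay available for historical analysis.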

@djmitche
Collaborator Author

djmitche commented Jan 9, 2021

I think this will be one of the "gc" operations, and should probably be done only after a sync. Scan the DB for expired tasks (using the definition of expired described above) and add Delete operations for those tasks.
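
As a rough illustration of that post-sync pass, the sketch below scans an in-memory map of tasks and emits one `Delete` operation per expired task. The `Task`, `Operation`, and `gc_expired` names, the integer task IDs, and the single `min_age_secs` threshold are all simplifying assumptions, not the project's actual types.

```rust
use std::collections::BTreeMap;

/// Hypothetical task ID; a real implementation would likely use UUIDs.
pub type TaskId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Completed,
    Deleted,
}

/// Hypothetical stored task: just the fields the expiration scan needs.
#[derive(Debug, Clone)]
pub struct Task {
    pub status: Status,
    pub modified: u64, // seconds since the Unix epoch
}

/// Only the operation this sketch produces.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Operation {
    Delete { task: TaskId },
}

/// Scan the DB and return a Delete operation for every non-pending task
/// whose last modification is at least `min_age_secs` old. Intended to be
/// called only after a successful sync.
pub fn gc_expired(db: &BTreeMap<TaskId, Task>, now: u64, min_age_secs: u64) -> Vec<Operation> {
    db.iter()
        .filter(|(_, t)| t.status != Status::Pending)
        .filter(|(_, t)| now.saturating_sub(t.modified) >= min_age_secs)
        .map(|(id, _)| Operation::Delete { task: *id })
        .collect()
}
```

The returned operations would then be appended to the local operation list like any other change, so they reach other replicas on the next sync.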

The result for users will be that any further operations on that task will be ignored, which is not terribly confusing.

@savchenko
Collaborator

Having data actually deleted and not merely hidden is an important property. Can we do something like this?

  1. Each task has the following structure: {UUID: [k, v], [k, v], ...}
  2. When a task is deleted, it becomes {UUID: nop }
  3. When any other client is sync'ed, they delete tasks where UUID == nop.

@djmitche
Collaborator Author

The problem is that we don't store tasks, we store operations, and those operations need to be transposable. An Update(taskId, "status", "deleted") operation is easily transposed with other Update operations, but Delete(taskId) and Update(taskId, "k", "v") don't transpose without losing the k/v data in the second operation. So a delete would "win" over a simultaneous update, for example.
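
To make the asymmetry concrete, here is a simplified sketch of a transform function for two concurrent operations. The `Op` type, the integer task IDs, and the tie-break for same-key updates are assumptions made for illustration; a real implementation would need a principled rule (for example, timestamps) for conflicting updates.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    Update { task: u32, key: String, value: String },
    Delete { task: u32 },
}

fn task_of(op: &Op) -> u32 {
    match op {
        Op::Update { task, .. } | Op::Delete { task } => *task,
    }
}

/// Given concurrent operations `a` and `b`, return `(a2, b2)` such that
/// applying `a` then `b2` reaches the same state as applying `b` then `a2`.
/// `None` means "nothing further to apply".
pub fn transform(a: Op, b: Op) -> (Option<Op>, Option<Op>) {
    // Operations on different tasks never interfere.
    if task_of(&a) != task_of(&b) {
        return (Some(a), Some(b));
    }
    match (&a, &b) {
        // Both sides already deleted the task.
        (Op::Delete { .. }, Op::Delete { .. }) => (None, None),
        // Delete wins: the update's key/value is dropped, which is the
        // data loss discussed above.
        (Op::Delete { .. }, Op::Update { .. }) => (Some(a), None),
        (Op::Update { .. }, Op::Delete { .. }) => (None, Some(b)),
        (Op::Update { key: k1, .. }, Op::Update { key: k2, .. }) => {
            let same_key = k1 == k2;
            if same_key {
                // Arbitrary tie-break for this sketch: `a` overwrites `b`.
                (Some(a), None)
            } else {
                // Updates to different keys commute.
                (Some(a), Some(b))
            }
        }
    }
}
```

Note how two `Update`s to different keys both survive, while an `Update` paired with a `Delete` always collapses to the `Delete`.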

This actually mirrors TW -- task 123 delete just marks the task as deleted. What TW doesn't do is eventually reclaim the space in that task. This proposal fixes the latter bit, by actually deleting the data some time after the deletion operation, when it's more certain that the user won't be making simultaneous updates to the task.

I guess the idea behind "actually deleted" is user data sovereignty? Like when I delete a Facebook post it'd be nice if Facebook actually deleted it instead of just marking it invisible? I get that, but going back to the data model, we store operations, and those are in an immutable chain. So even if I add a Delete(taskId) operation to that chain, further back in the chain there's an Update(taskId, "description", "Lonnie says PharmaCorp will cure cancer tomorrow, BUY BUY BUY") that can't be removed without invalidating all operations after that one, even if Lonnie's lawyer really wants you to. So, I think the best we can do is to be clear that "deleted" data can still be recovered if necessary -- both by making an "undelete" operation and with a big "NOTE" in the section on expiration. That note can also suggest some kind of export-and-re-import as a way to "start over" without deleted/expired tasks.

Note, again, that Google Docs (the canonical example of an OT-based document) has the same characteristic: you can't delete changes in a gdoc history, but you can make a new gdoc and copy the latest state into it, then delete the entire old document.
