
Add support for deletes #18

Closed
Shelnutt2 opened this issue Aug 9, 2017 · 1 comment
Shelnutt2 (Owner) commented Aug 9, 2017

Delete support is needed. Deletes can be done in multiple ways:

  1. Add a delete indicator to the record; this would essentially be an update to a hidden field.
  2. Zero out the entire message in the file; this means we have to handle gaps between messages when reading/scanning the file.
  3. Maintain a separate file with the list of deleted rows.

"1)" or "2) "are pretty equivalent. The advantage to 2 is during a table scan one does not have to parse the message only to find it has been deleted. It might also be that option 1 makes roll back easier. However with 1 or 2 we still need to maintain a list of the ongoing rows touched in the transactions.

"3)" Does not seem to have a large benefit. If we keep the rows separate, then we just have to read that into memory and still do a comparison. The only upside compared to 1, is we don't have to parse capnp proto message to see if it is deleted or not, we can store the file offset and skip that way.

With option 2 we can also have a daemon process that periodically reorganizes a closed table, so that zeroed-out space that is no longer a message is removed, shrinking the file.
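
A rough sketch of what the option 2 scan path implies. This assumes a hypothetical framing where each message carries a 4-byte length prefix that survives the zeroing, so a fully zeroed payload marks a deleted record that can be skipped without ever invoking the Cap'n Proto parser:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct Record {
  uint64_t offset;               // where the payload starts in the file
  std::vector<uint8_t> payload;  // raw capnp message bytes
};

// Hypothetical scan loop: returns live records only; zeroed payloads are
// skipped by their recorded length without being parsed.
std::vector<Record> scanFile(FILE* f) {
  std::vector<Record> live;
  uint32_t len = 0;
  uint64_t offset = 0;
  while (fread(&len, sizeof(len), 1, f) == 1) {
    offset += sizeof(len);
    std::vector<uint8_t> buf(len);
    if (fread(buf.data(), 1, len, f) != len) break;  // truncated tail
    bool allZero = true;
    for (uint8_t b : buf) if (b != 0) { allZero = false; break; }
    if (!allZero) live.push_back(Record{offset, std::move(buf)});
    offset += len;
  }
  return live;
}
```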

Shelnutt2 (Owner, Author) commented:

New design:

Deletes will be maintained in a separate file that references the data file and the file offset (possibly the row number). We'll need to check this list on each and every read.
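
A minimal sketch of the per-read lookup this implies, assuming the delete file has already been loaded into memory. The key packing and names here are illustrative only, not the actual on-disk format:

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical in-memory index of the delete file: each entry records which
// data file a deleted row lives in and its offset within that file.
struct DeleteIndex {
  std::unordered_set<uint64_t> deleted;

  // Pack (file id, file offset) into one 64-bit key; assumes file ids fit
  // in 16 bits and offsets in 48 bits.
  static uint64_t key(uint16_t fileId, uint64_t offset) {
    return (static_cast<uint64_t>(fileId) << 48) | (offset & 0xFFFFFFFFFFFFULL);
  }

  void markDeleted(uint16_t fileId, uint64_t offset) {
    deleted.insert(key(fileId, offset));
  }

  // Consulted on every read: skip the row if it appears in the delete file.
  bool isDeleted(uint16_t fileId, uint64_t offset) const {
    return deleted.count(key(fileId, offset)) != 0;
  }
};
```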

When a delete transaction ends, we will mark the file on disk as "done" and start a new file for all new rows. The idea is to limit the size of files once we know they contain deleted rows. A secondary daemon process will come back and compact the files. Thus we can safely remove old rows asynchronously without having to store the delete records forever.

We will keep creating files, up to n of them, until the compaction process can start. When we compact, all non-active files will be read and merged into a single larger file.
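
A sketch of what that compaction pass might look like. The reader and append helpers here are assumptions, not real project APIs; the delete lookup is abstracted as a callback:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Row { uint64_t offset; std::vector<uint8_t> payload; };

// Assumed helpers, not real project APIs:
std::vector<Row> readAllRows(const std::string& path);    // framing-aware reader
void appendRow(const std::string& path, const Row& row);  // length-prefixed append

// Merge every non-active file into outPath, dropping rows the delete file
// marks as dead. isDeleted abstracts the lookup against the delete index.
void compact(const std::vector<std::pair<uint16_t, std::string>>& nonActiveFiles,
             const std::function<bool(uint16_t, uint64_t)>& isDeleted,
             const std::string& outPath) {
  for (const auto& [fileId, path] : nonActiveFiles) {
    for (const Row& row : readAllRows(path))
      if (!isDeleted(fileId, row.offset))
        appendRow(outPath, row);  // survivors move to the single new file
  }
  // Once outPath is durable, the non-active files and their delete-file
  // entries can be removed, so delete records are not kept forever.
}
```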

We need to add support for reading from multiple files, scanning across each one.
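
One plausible shape for that read path is a cursor that walks the files in order and hides the file boundaries from the table scan. The single-file reader interface below is hypothetical:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct RowRef { uint16_t fileId; uint64_t offset; };

// Assumed single-file reader, not a real project API: advances cursorOffset
// and returns the next row in that file, or nullopt at end of file.
std::optional<RowRef> nextRowInFile(uint16_t fileId, uint64_t& cursorOffset);

// Hypothetical multi-file cursor: presents all data files as one logical
// stream of rows to the table scan.
class MultiFileScanner {
  std::vector<uint16_t> fileIds_;  // non-active + active files, in scan order
  size_t current_ = 0;
  uint64_t offset_ = 0;

public:
  explicit MultiFileScanner(std::vector<uint16_t> fileIds)
      : fileIds_(std::move(fileIds)) {}

  // Returns the next row across all files, or nullopt at end of table.
  std::optional<RowRef> next() {
    while (current_ < fileIds_.size()) {
      if (auto row = nextRowInFile(fileIds_[current_], offset_))
        return row;
      ++current_;  // exhausted this file, move on to the next one
      offset_ = 0;
    }
    return std::nullopt;
  }
};
```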

This also lays the foundation for online schema upgrades, where we could map the old schema to a new schema and convert the on-disk data asynchronously.

Shelnutt2 added a commit that referenced this issue Aug 23, 2017
Shelnutt2 added a commit that referenced this issue Aug 25, 2017
Shelnutt2 added a commit that referenced this issue Aug 25, 2017
Shelnutt2 added a commit that referenced this issue Sep 3, 2017
Shelnutt2 added a commit that referenced this issue Sep 3, 2017
Shelnutt2 added a commit that referenced this issue Sep 3, 2017
Shelnutt2 added a commit that referenced this issue Sep 3, 2017
This implements deletes in a single file. Initial groundwork is laid
for multiple delete files and multiple data files.