Split partitions into blocks/chunks #122

Open
albe opened this issue Jul 19, 2020 · 2 comments
Labels: enhancement, P: Storage (Affects the storage layer), postponed

Comments

albe commented Jul 19, 2020

For some use-cases it might be preferable to have partitions split into multiple ordered blocks.
The most important use-case is archiving (or even deleting) old data. Right now, this would be achieved by creating a new partition (=stream), writing a "consolidated event" (or snapshot) and then continuing to work on that new stream.
Another use-case is scanning a partition for the latest document without an index, e.g. during auto-repair (see #107). If the partition is stored in a single file, the whole partition needs to be scanned, because the file format only supports forward reading. If the partition is chunked though, only the last chunk needs to be scanned, which is likely orders of magnitude faster.
It would also potentially help with replication logic, since only a small file would need "hot" replication.

A simple solution would be to append the starting partition offset to the filename (e.g. partition-foo.0, partition-foo.4096, etc.) and then start a new file whenever either a configured number of documents or a configured chunk size is reached. Reads can then easily find the correct file to read from by searching for the chunk that satisfies filename-offset < position < filename-offset + filesize (- header) - the list of chunks could also be pre-scanned and indexed when opening the partition.
The drawback is that the file handle would need to be switched multiple times, so there is more file closing/opening involved, especially when scanning the whole partition (=projection rebuild).
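For illustration, a minimal sketch of such a lookup, assuming the naming scheme above (the function name and stat handling are hypothetical, not part of the current code):

const fs = require('fs');
const path = require('path');

// Find the chunk file that contains a given byte position within the partition.
// Assumes chunk files are named "<partition>.<startOffset>" (e.g. partition-foo.0,
// partition-foo.4096) and that a file without an offset suffix starts at offset 0.
function findChunkForPosition(directory, partition, position) {
    const chunks = fs.readdirSync(directory)
        .filter(name => name === partition || name.startsWith(partition + '.'))
        .map(name => {
            const suffix = name.slice(partition.length + 1);
            const offset = suffix === '' ? 0 : parseInt(suffix, 10);
            const size = fs.statSync(path.join(directory, name)).size;
            return { name, offset, size };
        })
        .sort((a, b) => a.offset - b.offset);

    // Header size is ignored here for brevity.
    return chunks.find(chunk => position >= chunk.offset && position < chunk.offset + chunk.size) || null;
}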

This could be made fully backwards-compatible if a missing filename offset is interpreted as 0. Old chunks could then potentially even be merged together to reduce the number of files per partition again, though an ever-growing partition (stream) is a bad pattern anyway.
Also, chunking should be easy to turn off via configuration.

{
  "chunkSize": 4096,
  "chunkDocuments": 100
}

=> start a new chunk every 4KB or every 100 documents, whichever happens first.

{
  "chunkSize": 0,
  "chunkDocuments": 0
}

=> disable chunking (=current behaviour)
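A rough sketch of how these two options could be checked on append (function and field names are made up for illustration):

// Decide whether the current chunk should be rotated before appending another document.
// A value of 0 disables the respective limit; 0/0 keeps the current single-file behaviour.
function shouldStartNewChunk(config, currentChunk, documentSize) {
    const sizeLimitReached = config.chunkSize > 0
        && currentChunk.size + documentSize > config.chunkSize;
    const documentLimitReached = config.chunkDocuments > 0
        && currentChunk.documentCount + 1 > config.chunkDocuments;
    return sizeLimitReached || documentLimitReached;
}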

albe commented Dec 22, 2020

The number of file handle switches could be greatly reduced by logic that keeps only a fixed number of (progressively smaller) files per partition. E.g. the first partition chunk (0) will always keep growing, the second chunk (1) is 4MB, the third (2) 2MB, and the most current one (3) at most 1MB. Once the current chunk is full, chunk (1) is appended onto (0), (2) becomes the new (1) and gets (3) appended. The new "current" chunk is now (2), until it reaches 1MB, at which point it is merged into (1).

The simplest version of that logic has exactly two chunks per partition: a "hot" and a "cold" one. The size of the "hot" chunk is configurable as above; once it is full, it is merged into the "cold" chunk and starts again at size 0. A zero-size "hot" chunk also means the last commit finished.
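A sketch of that hot/cold variant (helper names like appendFrom/truncate are purely illustrative, not existing APIs):

// Append to the "hot" chunk; once it exceeds the configured size, merge it into the
// "cold" chunk and truncate it back to zero. An empty hot chunk therefore signals
// that the last merge completed.
async function append(partition, document, config) {
    await partition.hot.append(document);
    if (config.chunkSize > 0 && partition.hot.size >= config.chunkSize) {
        await partition.cold.appendFrom(partition.hot);
        await partition.hot.truncate(0);
    }
}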

The question then becomes when and how often to actually merge chunks, because that is a disruptive operation (the chunk being merged cannot be read from while it is appended to the main chunk - there is always a small window where either an event exists in two chunks and could be read twice, or reading is blocked).

albe commented Feb 9, 2021

One of two problems needs to be solved to make this viable:

  • deal with an ever growing amount of files
  • deal with the inconsistency of reading during merge

The former could be slightly alleviated by having a folder structure for each storage/partition/, which only puts a limit on the number of files per partition. A clever sub-folder structure would make this even more scalable, at the cost of making it more expensive to watch a partition.
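For illustration, such a per-partition layout (all names purely hypothetical) might look like:

storage/
  partition-foo/
    chunk.0
    chunk.4096
    chunk.8192      <- current "hot" chunk
  partition-bar/
    chunk.0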

albe added the postponed label Feb 22, 2021