Alternatives for .skf format #17
Key-value store (database) options:
Probably faster:
Initial trial with 28 samples and ~3M k-mers
Reading all merge ska array k-mers and vars in (fill) and changing one variant (modify):
Hash fill: 284ms modify: 95ms
With sled compression
If doing this kind of approach, using a BTreeMap and reading/writing blocks of a given range (and probably async) would be better. Would also want to deserialise without buffering, see: https://serde.rs/stream-array.html. Some difficulties here are that values are not likely to be equally spaced in the BTree, and blocked reads aren't easily supported by serde.
I think the algorithm for build_and_merge would be something more like:
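On the blocked BTreeMap reads and the uneven spacing mentioned in this comment (this is not the build_and_merge algorithm itself), a minimal in-memory sketch using only the standard library is below. The `KmerMap` layout, `read_block` and `block_sizes` are hypothetical names for illustration, and nothing here touches disk, serde or async:

```rust
use std::collections::BTreeMap;

/// Hypothetical layout: a split k-mer packed into a u64 key, with one
/// middle-base byte per sample as the value.
type KmerMap = BTreeMap<u64, Vec<u8>>;

/// Return the entries whose keys fall in `[start, start + width)`.
/// `width` is a span of key space, not a number of entries.
fn read_block(map: &KmerMap, start: u64, width: u64) -> Vec<(u64, &[u8])> {
    // Saturate rather than overflow for blocks near the top of key space.
    let end = start.saturating_add(width);
    map.range(start..end)
        .map(|(k, v)| (*k, v.as_slice()))
        .collect()
}

/// Count the entries in each fixed-width key range, from key 0 up to the
/// block containing the largest key.
fn block_sizes(map: &KmerMap, width: u64) -> Vec<usize> {
    assert!(width > 0, "block width must be non-zero");
    let mut sizes = Vec::new();
    let Some((&last, _)) = map.last_key_value() else {
        return sizes; // empty map: no blocks
    };
    let mut start = 0u64;
    loop {
        sizes.push(read_block(map, start, width).len());
        match start.checked_add(width) {
            Some(next) if next <= last => start = next,
            _ => break,
        }
    }
    sizes
}
```

Because keys are not evenly spread, fixed-width key ranges give blocks of very different sizes, so a real implementation might prefer blocks defined by entry count instead.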
Two more things to try first:
The first, and simplest, change to make would be to combine the read and merge steps as they are, i.e. do the build inside the append loop as well. That keeps the current parallelism approach.
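A rough, serial sketch of what combining the two steps could look like is below. It assumes each sample arrives as a list of (k-mer, middle base) pairs and is folded into the merged table as soon as it has been read, so no per-sample dictionaries are held until the end. `read_and_merge`, the `u64` key and the `-` placeholder for a missing base are assumptions for illustration, not the project's actual types, and the existing parallelism is not modelled:

```rust
use std::collections::HashMap;

/// Placeholder for "k-mer not seen in this sample" (an assumption).
const MISSING: u8 = b'-';

/// Consume samples one at a time and merge each into the shared table
/// immediately. `samples` can be a lazy iterator, so reading sample
/// `idx` happens inside the same loop that merges it.
fn read_and_merge<I>(samples: I, n_samples: usize) -> HashMap<u64, Vec<u8>>
where
    I: IntoIterator<Item = Vec<(u64, u8)>>,
{
    let mut merged: HashMap<u64, Vec<u8>> = HashMap::new();
    // `take` guards against more samples than there are columns.
    for (idx, sample) in samples.into_iter().enumerate().take(n_samples) {
        for (kmer, base) in sample {
            // New k-mers start with every sample marked as missing.
            let column = merged
                .entry(kmer)
                .or_insert_with(|| vec![MISSING; n_samples]);
            column[idx] = base;
        }
    }
    merged
}
```

For example, merging `[(1, A), (2, C)]` and `[(2, G)]` gives `A-` for k-mer 1 and `CG` for k-mer 2.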
Using a memmap odht works and is fast:
memmap fill: 199ms modify: 234ms
But:
Basically, a memory map seems to be the wrong choice when writing the whole file. If it could be changed to a BTree of blocks it might work. Going to leave this for now.
https://github.com/wspeirs/btree looks like it might work here.
Look at blocked/distributed hash maps
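One possible reading of a "blocked" hash map, sketched with the standard library only: keys are assigned to a fixed number of blocks, each block being its own `HashMap` that could in principle be loaded, modified and written back independently. `BlockedMap` and the modulo assignment are hypothetical choices for illustration, not an existing crate's API:

```rust
use std::collections::HashMap;

/// A hash map split into independent blocks (hypothetical design).
struct BlockedMap {
    blocks: Vec<HashMap<u64, Vec<u8>>>,
}

impl BlockedMap {
    fn new(n_blocks: usize) -> Self {
        assert!(n_blocks > 0, "need at least one block");
        BlockedMap {
            blocks: (0..n_blocks).map(|_| HashMap::new()).collect(),
        }
    }

    /// Block index for a key. Plain modulo is used here; a real
    /// implementation might hash the key first to even out block sizes.
    fn block_of(&self, kmer: u64) -> usize {
        (kmer % self.blocks.len() as u64) as usize
    }

    /// Insert into the owning block, returning any previous value.
    fn insert(&mut self, kmer: u64, vars: Vec<u8>) -> Option<Vec<u8>> {
        let b = self.block_of(kmer);
        self.blocks[b].insert(kmer, vars)
    }

    /// Look up a key, touching only the block that owns it.
    fn get(&self, kmer: u64) -> Option<&Vec<u8>> {
        self.blocks[self.block_of(kmer)].get(&kmer)
    }

    /// Number of entries held by each block.
    fn block_sizes(&self) -> Vec<usize> {
        self.blocks.iter().map(|b| b.len()).collect()
    }
}
```

Changing one variant then only requires the single block that owns the k-mer, which is the property that would matter if blocks were stored separately on disk.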