Speeding up malva using more threads #8

ivargr · 2021-05-05T11:22:40Z

First, thanks for a very nice tool!

I have a couple of questions:

1. Is it possible to speed up Malva using more threads? I know that KMC can easily use more threads, meaning that the first step of Malva (getting kmers from the reads) can be parallelized, but can malva-geno (computing the signatures and performing the genotyping) also be parallelized in any way?
2. Assume I want to genotype many samples. I guess that one could potentially save a lot of time by not computing the variant signatures and reference kmers for each sample (since these should be the same, given the input variants). It seems that malva-geno currently does this every time a new sample is genotyped. Is it possible to reuse this data so that genotyping new samples becomes faster?

Thanks!

mpre · 2021-05-05T19:14:05Z

Hi @ivargr, thanks for your comments, we are glad you're testing our tool.

Both requests are sensible but unfortunately they are not part of the current version of malva.

The first point might need some major rework and it's not to be expected in the foreseeable future.

Regarding the second point, I just pushed a new branch to this repository that tries to solve it (branch split_main). Using that branch you can now use malva-geno index to create the index of your reference genome/vcf and then malva-geno call to produce the output vcf that includes the genotypes. You still need to pass all the arguments to both steps since I didn't rework the interface yet (arguments that don't affect the index/call step will simply be ignored). If you use the MALVA script it will also check if the index is available already and, if so, skip the indexing step.

This version is not published on bioconda yet, and it will probably take a while before we can clean up the code and push it there, so you'll need to compile it yourself and take care of the dependencies.

Finally, I want to stress that this version is experimental. I tested it on the example we provide in the repo and it works, but I didn't check whether there's a performance hit on big datasets or whether it breaks in some edge cases. The index might also not be portable, since it's basically a serialization of the in-memory index.
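For the many-samples case, the two-step workflow described above could be driven by a small script along these lines. This is only a sketch under assumptions: the `index` and `call` subcommands come from this thread, but the helper `plan_runs`, the placeholder arguments, and their order are hypothetical and not taken from the malva documentation. It also assumes that `call` can locate the index written by an earlier `index` run, which the thread does not spell out.

```python
import shlex
from typing import Dict, List, Sequence

MALVA_GENO = "malva-geno"  # assumed: binary compiled from the split_main branch


def plan_runs(samples: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Plan one `index` run followed by one `call` run per sample.

    `samples` maps a sample name to the full malva-geno argument list for
    that sample. Both steps receive the full list, because the branch
    ignores arguments that do not affect the current step.
    """
    if not samples:
        return []
    # Build the index once, using the first sample's arguments. The index
    # depends on the reference/VCF, so it is assumed to be reusable.
    first_args = list(next(iter(samples.values())))
    commands = [[MALVA_GENO, "index", *first_args]]
    # Genotype every sample against the already-built index.
    for args in samples.values():
        commands.append([MALVA_GENO, "call", *args])
    return commands


if __name__ == "__main__":
    # Placeholder arguments only; replace with the real argument lists.
    demo = {
        "sampleA": ["<reference>", "<variants.vcf>", "<sampleA k-mer counts>"],
        "sampleB": ["<reference>", "<variants.vcf>", "<sampleB k-mer counts>"],
    }
    for argv in plan_runs(demo):
        print(shlex.join(argv))
```

The script only prints the planned commands; running them (for example with `subprocess.run`) and redirecting the output VCF is left to the caller. Because the index is described as a serialization of the in-memory structure, it is probably safest to build and use it on the same machine and with the same build.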

ivargr · 2021-05-06T14:37:56Z

Thanks a lot for the quick response!

I've tested the new branch and it seems to work great on the data I have tried it with!

No worries about Malva not being able to multithread fully for now. I was mostly just curious whether it was possible, but it would be a cool improvement in a future version of Malva :)

mpre self-assigned this May 5, 2021

mpre added the enhancement and question labels May 5, 2021

mpre linked a pull request May 5, 2021 that will close this issue

Split main #9

Merged
