Some questions about the article #1
Comments
Hi @shelkmike, Thanks for your interest and questions.
Second, it is difficult to speculate on the best custom settings for Minimap2, because many other parameters also affect accuracy, performance, and memory usage. For example, map-pb uses -w10 while map-hifi uses -w19, and the two presets differ in many other options as well. Overlapping also seems to use half of the window length used for read mapping (e.g., ava-pb uses -w5 while map-pb uses -w10). When designing our experiments, we tried to answer the following question: is it better to use the map-hifi parameter settings, adapted for finding overlapping reads (i.e., with the additional options -X -e0 -m100)? We tried both half of the original map-hifi window length (-w10) and the original window length suggested by map-hifi (-w19). We observed that with the map-hifi settings plus -X -e0 -m100, Minimap2 finds overlapping reads 1.2x to 4x faster than with ava-pb (still much slower than BLEND), at the cost of lost information in the PAF file and reduced assembly accuracy.
Third, we use window lengths as high as -w500 (and potentially even higher), and we avoid losing accuracy by combining many neighbor k-mers (e.g., 100 neighbor k-mers as in -x map-hifi --genome human). The original implementation of Minimap2 cannot use a window length greater than 256. Perhaps we could contact Heng Li and ask his opinion on the suggested parameter settings for finding overlapping HiFi reads with Minimap2. We will update our experiments accordingly if we receive a suggestion from him. I would also appreciate any pointer to a similar discussion where Heng Li gives suggestions for overlapping HiFi reads.
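To illustrate why the window length matters so much for speed, here is a minimal Python sketch of window-based minimizer sampling. It is not the Minimap2 or BLEND implementation: the hash function (BLAKE2b), the k-mer length of 15, the random test sequence, and the function names are all assumptions made for illustration, and the sketch ignores reverse complements. It only shows that a larger window keeps fewer seeds, which is the trade-off behind the -w values discussed above.

```python
import hashlib
import random


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (stand-in for a real minimizer hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> list[int]:
    """Return the start positions of the minimizer k-mers of `seq`.

    Every window of `w` consecutive k-mers contributes the k-mer with the
    smallest hash; ties are broken by the leftmost position.
    """
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (hashes[i], i)))
    return sorted(picked)


# Hypothetical input: a random 2,000-base sequence with a fixed seed.
rng = random.Random(0)
seq = "".join(rng.choices("ACGT", k=2000))

# Count the sampled seeds for the window lengths mentioned in this thread.
counts = {w: len(minimizers(seq, 15, w)) for w in (5, 10, 19)}
for w, n in counts.items():
    print(f"w={w}: {n} minimizers")
```

With this tie-breaking rule, the minimizers selected with a larger window are always a subset of those selected with a smaller one, so increasing -w can only reduce the number of seeds to chain.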
I would also like to clarify that we use the default settings for HiFi reads when they are available (i.e., we use map-hifi for mapping HiFi reads with Minimap2).
"We use miniasm because it does not perform error correction when generating de novo assemblies, which allows us to directly assess the quality of overlaps without using additional approaches for improving the accuracy of assemblies." We definitely agree that miniasm needs assembly polishing to generate higher-quality assemblies. It may well be true that the final polished assembly is accurate enough that the accuracy of the initial draft assembly does not matter at all. However, this depends on the coverage of the read set, the assembly polishing tool, the read mapper used to generate the input for most assembly polishing tools, and probably several other factors. The question may then become: what coverage do BLEND and Minimap2 each require to achieve 99.9% accuracy, if both end up reaching that accuracy after assembly polishing? The answer to such a question may again be inferred from the initial draft assemblies without any error correction. For these reasons, we are not currently considering including assembly polishing in our experiments.
We have not thoroughly tested BLEND with ONT reads. We believe the current parameter settings should be good enough for ONT reads, but we have not yet confirmed that they work better than Minimap2. In short, based on the results we show in our paper, we believe BLEND is best suited to PacBio HiFi reads.
Best,
Can Firtina
Could you please answer some questions about the article (https://arxiv.org/pdf/2112.08687.pdf):
With best wishes,
Mikhail Schelkunov