kmer lengths #215

Closed
thr44pw00d opened this issue Nov 26, 2018 · 2 comments

Comments

@thr44pw00d

Hi,
I'm not sure if it's OK to ask this question here, but maybe you have some good suggestions.
Is there a reason why the default k-mer lengths for assembly of 2x150 bp reads are 21, 33, 55, 77, i.e. the maximum is only 77?
From what I've seen so far, larger k-mer lengths (up to 137 or so for 150 bp reads) usually improve contig statistics (max. contig length, N50, etc.). Is there a higher chance of misassemblies when using larger k-mers? Unfortunately I couldn't find anything useful on this topic in the literature. Could you perhaps comment on how k-mer length affects assembly quality?
Thank you!

@asl
Member

asl commented Dec 1, 2018

Increased contiguity usually comes at the price of correctness. It's very easy to obtain long but misassembled contigs. Since the k-mer length determines the size of the overlap needed to make a junction in the de Bruijn graph, the longer the k-mer, the longer the stretch of correct nucleotides needed to make such a junction.
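To make the overlap requirement concrete, here is a minimal sketch in Python. It is not how SPAdes builds its graph; it uses a simplified model in which two reads are considered joined when they share at least one k-mer. The toy genome, the reads, and the helper names `kmers` and `reads_joined` are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_joined(read_a, read_b, k):
    """True if the two reads share at least one k-mer.

    Simplified stand-in for "these reads end up connected in the
    de Bruijn graph"; real assemblers are considerably more involved.
    """
    return bool(kmers(read_a, k) & kmers(read_b, k))


# Hypothetical toy genome without repeated 6-mers or longer.
genome = "ACGTTGCATGGATCCGATTACAGGCTAAGT"
read_a = genome[:18]   # covers positions 0..17
read_b = genome[10:]   # covers positions 10..29, so the overlap is 8 bases

for k in (6, 8, 9, 12):
    print(k, reads_joined(read_a, read_b, k))
```

With an 8-base overlap the two reads are joined for k up to 8 and fall apart at k = 9, so the same data gives a connected graph at a small k and a gap at a larger one.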

Low coverage and sequencing errors therefore create coverage gaps (since one needs at least one correct k-mer to span each position of the genome!) and make the de Bruijn graph disconnected. Sometimes this can be "fixed", sometimes not. As a result, one can easily obtain longer contigs (due to e.g. disconnected repeats) that contain misassemblies.
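A rough back-of-the-envelope sketch of why larger k values run into coverage gaps: a read of length L contains only L - k + 1 k-mers, so the k-mer coverage is the base coverage scaled by (L - k + 1) / L, and a k-mer is error-free only if all k of its bases are correct. The 30x coverage and 1% per-base error rate below are illustrative assumptions, not values from this thread, and errors are assumed to be independent and uniform, which real data is not.

```python
def kmer_coverage(base_coverage, read_len, k):
    """Expected k-mer coverage given base coverage and read length."""
    return base_coverage * (read_len - k + 1) / read_len


def error_free_fraction(k, error_rate):
    """Probability that a k-mer has no errors (independent, uniform errors)."""
    return (1.0 - error_rate) ** k


READ_LEN = 150        # 2x150 bp reads, as in the question
BASE_COVERAGE = 30.0  # assumed for illustration
ERROR_RATE = 0.01     # assumed for illustration

for k in (21, 33, 55, 77, 99, 127, 137):
    cov = kmer_coverage(BASE_COVERAGE, READ_LEN, k)
    ok = error_free_fraction(k, ERROR_RATE)
    # cov * ok approximates the coverage by fully correct k-mers
    print(f"k={k:3d}  kmer_cov={cov:5.1f}  error_free={ok:.2f}  correct_cov={cov * ok:5.1f}")
```

Under these assumptions the coverage by correct k-mers drops steeply as k approaches the read length, which is where positions with no correct k-mer at all (and hence a disconnected graph) become likely.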

One should usually seek a balance between k-mer length, coverage, error rate, etc. Half of the read length is a reasonable default for the largest k.

@asl closed this as completed Dec 1, 2018

@thr44pw00d
Author

Thanks a lot!
