Hi,
I'm not sure if it's ok to ask that question here, but maybe you'd have some good suggestions.
Is there a reason why the default kmer lengths for assembly of 2x150bp reads are 21, 33, 55, 77, i.e. the max. is only 77?
From what I've seen so far, larger k-mer lengths (up to 137 or so for 150bp reads) usually improve contig statistics (max. contig length, N50, etc.). Is there a higher chance of misassemblies when using larger k-mers? Unfortunately I couldn't find anything useful on that topic in the literature. Could you possibly comment on how k-mer lengths may affect assembly quality?
Thank you!
The increased contiguity usually comes at the price of correctness. It is very easy to obtain long but misassembled contigs. Since the k-mer length determines the size of the overlap needed to make a junction in the de Bruijn graph, the longer the k-mer, the longer the stretch of correct nucleotides needed to make such a junction.
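To make the overlap requirement concrete, here is a minimal Python sketch. It is a simplification, not how any particular assembler is implemented: it treats two reads as "joinable" if they share at least one k-mer, and the toy genome, the read positions, and the helper names `kmers` and `reads_connect` are all made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_connect(read_a, read_b, k):
    """Simplified junction test: the reads share at least one k-mer."""
    return bool(kmers(read_a, k) & kmers(read_b, k))


# Hypothetical 32 bp genome and two error-free 20 bp reads that overlap by 8 bp.
genome = "ATGGCGTACGCTTGAGCTAAGGCTCCATTGCA"
read_a = genome[0:20]
read_b = genome[12:32]

for k in (7, 8, 9):
    print(k, reads_connect(read_a, read_b, k))
```

With an 8 bp overlap, the reads can be joined for k = 7 and k = 8, but for k = 9 they no longer share any k-mer, so the graph falls apart at that point even though both reads are error-free.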
Low coverage and sequencing errors therefore create coverage gaps (since one needs at least one correct k-mer to span each position of the genome!) and leave the de Bruijn graph disconnected. Sometimes this can be "fixed", sometimes not. As a result, one can easily obtain longer contigs (due to e.g. disconnected repeats) that contain misassemblies.
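A rough back-of-the-envelope calculation shows how quickly usable coverage drops as k grows. The sketch below uses the common approximation that a read of length L contains L - k + 1 k-mers, so k-mer coverage is about base coverage times (L - k + 1) / L, and it assumes independent, uniform per-base errors, which real data do not strictly follow. The 30x coverage and 1% error rate are illustrative assumptions, not values from this thread.

```python
def kmer_coverage(base_coverage, read_len, k):
    """Approximate k-mer coverage: each read contributes read_len - k + 1 k-mers."""
    return base_coverage * (read_len - k + 1) / read_len


def p_error_free_kmer(error_rate, k):
    """Chance that a k-mer has no errors, assuming independent per-base errors."""
    return (1 - error_rate) ** k


# Assumed values for illustration only.
BASE_COVERAGE = 30   # hypothetical 30x base coverage
READ_LEN = 150       # 150 bp reads, as in the question
ERROR_RATE = 0.01    # hypothetical 1% per-base error rate

for k in (21, 33, 55, 77, 137):
    cov = kmer_coverage(BASE_COVERAGE, READ_LEN, k)
    ok = p_error_free_kmer(ERROR_RATE, k)
    print(f"k={k:3d}  k-mer coverage={cov:5.1f}  error-free fraction={ok:.2f}  "
          f"correct k-mer coverage={cov * ok:5.1f}")
```

Under these assumptions, the coverage by correct k-mers at k = 137 is a small fraction of what it is at k = 77, which is where the coverage gaps described above come from.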
One should usually seek a balance between k-mer length, coverage, etc. Half of the read length is a reasonable default.