Why does count_kmers not return k-mers that are split between two records? #930
I ran count_kmers (as in the first example in the README.md), and the input .sam file contains two consecutive records like:
However, when I ask for the 10-mers, I don't get for instance
Is it intentional?
Hi @YPares !
Our k-mer counters will only count k-mers contained entirely in a single read or contig. We do this because two reads come from different DNA fragments, and thus the k-mers gained by merging two reads together (e.g., the
Let me know if this was unclear and I can put together a more concrete example.
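As a rough illustration of the per-read behaviour described above, here is a minimal Python sketch. It is not ADAM's implementation: the function name `count_kmers_per_read` and the two toy reads are made up for the example. It counts k-mers read by read, then shows which extra k-mers would appear only if the two reads were naively concatenated.

```python
from collections import Counter


def count_kmers_per_read(reads, k):
    """Count k-mers that lie entirely within a single read.

    Hypothetical helper for illustration; reads shorter than k
    contribute nothing.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k over this read only.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Two toy reads (assumed data, not taken from the issue).
reads = ["ACGTAC", "GGTTAA"]
k = 4

per_read = count_kmers_per_read(reads, k)

# Treating the two reads as one contiguous sequence adds k - 1
# k-mers that straddle the boundary between them.
merged = count_kmers_per_read(["".join(reads)], k)
spanning = set(merged) - set(per_read)

print(sorted(per_read))
print(sorted(spanning))
```

With these toy reads, the k-mers in `spanning` do not exist in either read on its own, which is why a per-read counter never reports them.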
Thanks for your answer, @fnothaft! A follow-up question: is this the case only for count_kmers? I mean, is that kind of treatment (processing data spread over two reads) done elsewhere in ADAM, or is it a general behaviour in the framework?