-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NNNNN reads #8
Comments
Hi Johannes, That is curious. Could you please post example SAM records for one read that shows this behaviour and one that does not? What happens if you filter secondary alignments? |
Hi Adam, I added the secondary alignment filtering step |
Cool, that makes sense. When teloclip writes out the overhang sequence it is drawing on the read sequence stored in the SAM alignment record, this is only present for primary alignments. The choice to discard secondary alignments was based on the assumption that sub-telomeric sequences are generally repetitive and therefore long-reads (containing telomeric repeats in the soft-clipped overhang) may also have secondary alignments to other chromosome/contig ends. It may be worthwhile checking where the primary alignments are for some of those overhanging secondary alignments - do those reads show up at the end of another contig? I've corrected the hex code for filtering non-primary alignments, it should be 0x100. Previous code was also to catch PE reads where the mate was not mapped - doesn't apply to long reads. |
Hi Adam, I just ran it with the |
Can you please post a few example SAM records so I can figure out what's going on and update the docs accordingly? |
Sure, if you can tell me how to do that :-) I just run the command above and get the bam file. |
For a contig where you know that there are overhanging reads with telomeric repeats on at least one end you can extract those alignments like this: Align reads to reference + sort: Index on position: Filter for reads only on "Chr1" from position 1 to some number slightly longer than your longest read: If you can post that final file + a fasta file with just the target contig (i.e. Chr1 in the example) I'll take a look. Can also email me if you don't want to post data. |
Hi Adam, looks like I grabbed the wrong bam file :-) I ran it again with 0x100 and 0x2308 and the results are almost identical. No 'NNNN' reads in either of them. It's only when I omit that step that the 'NNNN' reads creep in. False alarm :-) |
Hi Adam,
love the tool, but came across something strange.
I ran the tool like this:
When I look at the bam files I get a lot of these NNNNNNN reads as in the below image mapping to contig ends.
But when I go and look that read name up in the actual fasta reads file, they are proper reads without any Ns.
Any idea where that might come from?
Cheers
The text was updated successfully, but these errors were encountered: