NNNNN reads #8

JWDebler · 2021-04-15T01:58:12Z

Hi Adam,
love the tool, but came across something strange.

I ran the tool like this:

minimap2 -ax map-ont P9424_final.fasta ../P9424.correctedReads.fasta.gz | teloclip --ref P9424_final.fasta.fai  | samtools sort > P9424_teloclip.bam

When I look at the BAM file, I see a lot of these NNNNNNN reads mapping to contig ends, as in the image below.
But when I look those read names up in the actual FASTA reads file, they are proper reads without any Ns.
Any idea where that might come from?

Cheers


Adamtaranto · 2021-04-15T02:57:57Z

Hi Johannes,

That is curious. Could you please post example SAM records for one read that shows this behaviour and one that does not?

What happens if you filter secondary alignments?

JWDebler · 2021-04-16T00:55:18Z

Hi Adam, I added the secondary alignment filtering step samtools view -h -F 0x2308 and that seems to have taken care of the problem :-) Cheers.

Adamtaranto · 2021-04-16T02:20:32Z

Cool, that makes sense. When teloclip writes out the overhang sequence, it draws on the read sequence stored in the SAM alignment record, which is only present for primary alignments.
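One way to check this on your own data is to count how many records carry no stored sequence (SEQ field `*`), split by whether the secondary flag (0x100) is set. This is only a sketch using awk on SAM text; the function name `count_missing_seq` is made up for illustration, and it assumes the Ns trace back to records with an empty SEQ field.

```shell
# Read SAM text on stdin and report, per alignment class,
# the number of records and how many of them have SEQ == "*".
# Output columns: class, total records, records without sequence.
count_missing_seq() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    {
      # POSIX awk has no bitwise AND, so test bit 0x100 arithmetically
      kind = (int(($2 + 0) / 256) % 2) ? "secondary" : "primary_or_supplementary"
      total[kind]++
      if ($10 == "*") missing[kind]++
    }
    END {
      for (k in total) printf "%s\t%d\t%d\n", k, total[k], missing[k] + 0
    }
  ' | sort
}
```

Usage, assuming a sorted BAM like the one in this thread: `samtools view P9424_sorted.bam | count_missing_seq`.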

The choice to discard secondary alignments was based on the assumption that sub-telomeric sequences are generally repetitive, so long reads (containing telomeric repeats in the soft-clipped overhang) may also have secondary alignments to other chromosome/contig ends. It may be worthwhile checking where the primary alignments are for some of those overhanging secondary alignments. Do those reads show up at the end of another contig?
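A rough way to do that check is to list, for every read that has a secondary alignment, which contig holds its primary alignment. The sketch below works on SAM text with awk; the function name `primary_vs_secondary` is hypothetical, and it only reports contig names, not whether the alignment sits at a contig end.

```shell
# Read SAM text on stdin. For each read with at least one secondary
# alignment, print: read name, contig of the primary alignment
# ("NA" if none was seen), comma-separated contigs of the secondaries.
primary_vs_secondary() {
  awk -F'\t' '
    /^@/ { next }                        # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 4) % 2) next        # 0x4: read unmapped
      if (int(flag / 2048) % 2) next     # 0x800: supplementary, ignored here
      if (int(flag / 256) % 2) {         # 0x100: secondary alignment
        sec[$1] = (sec[$1] == "" ? $3 : sec[$1] "," $3)
      } else {
        pri[$1] = $3                     # primary alignment
      }
    }
    END {
      for (r in sec) print r "\t" ((r in pri) ? pri[r] : "NA") "\t" sec[r]
    }
  ' | sort
}
```

Usage: `samtools view P9424_sorted.bam | primary_vs_secondary`. Restricting the input to a region near a contig end would leave out primaries located elsewhere, so the whole file is the safer input.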

I've corrected the hex code for filtering non-primary alignments; it should be 0x100. The previous code was also meant to catch PE reads where the mate was not mapped, which doesn't apply to long reads.
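To see which bits each mask covers, here is a small sketch in plain bash that decodes a mask against the FLAG bits defined in the SAM specification. The function name `decode_flag` and the label `UNDEFINED` are made up for illustration; `samtools flags` does a similar job if samtools is installed.

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG mask (decimal or 0x-prefixed hex) into bit names.
decode_flag() {
  local mask=$(( $1 ))
  # Names in bit order, 0x1 through 0x800, as listed in the SAM specification
  local names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
               READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local out=() i
  for i in "${!names[@]}"; do
    if (( mask & (1 << i) )); then out+=("${names[i]}"); fi
  done
  # Any bit above 0x800 has no meaning in the SAM specification
  if (( mask >> 12 )); then out+=("UNDEFINED"); fi
  local IFS=,
  echo "${out[*]}"
}

decode_flag 0x100    # prints: SECONDARY
decode_flag 0x2308   # prints: MUNMAP,SECONDARY,QCFAIL,UNDEFINED
```

Both masks include 0x100. 0x2308 additionally sets 0x8 (mate unmapped), 0x200 (QC fail), and one bit above 0x800 that the SAM specification does not define.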

JWDebler · 2021-04-19T08:37:33Z

Hi Adam,

I just ran it with the 0x100 hex code, and the 'NNNNN' reads are back. 0x2308 removes them.

Adamtaranto · 2021-04-19T09:03:17Z

Can you please post a few example SAM records so I can figure out what's going on and update the docs accordingly?

JWDebler · 2021-04-19T23:27:16Z

Sure, if you can tell me how to do that :-) I just run the command above and get the bam file.

Adamtaranto · 2021-04-20T03:48:33Z

For a contig where you know there are overhanging reads with telomeric repeats on at least one end, you can extract those alignments like this:

Align reads to reference + sort:
minimap2 -ax map-ont P9424_final.fasta ../P9424.correctedReads.fasta.gz | samtools sort > P9424_sorted.bam

Index on position:
samtools index P9424_sorted.bam

Filter for reads only on "Chr1" from position 1 to some number slightly longer than your longest read:
samtools view -h P9424_sorted.bam "Chr1:1-500000" > Chr1_leftend.sam

If you can post that final file + a fasta file with just the target contig (i.e. Chr1 in the example) I'll take a look. Can also email me if you don't want to post data.

JWDebler · 2021-04-20T04:59:01Z

Hi Adam, looks like I grabbed the wrong bam file :-) I ran it again with 0x100 and 0x2308 and the results are almost identical. No 'NNNN' reads in either of them. It's only when I omit that step that the 'NNNN' reads creep in. False alarm :-)

Adamtaranto closed this as completed Apr 20, 2021
