# Previously reported intragenic IESs

Five genes were previously reported to contain IESs within their coding sequences (Hamilton et al., 2016, https://dx.doi.org/10.7554%2FeLife.19090). Because one gene contains two IESs, there are six IESs in total. Compared with other Tetrahymena IESs, these intragenic IESs were reported to be shorter and precisely excised, with a conserved TTAA junction.

We wish to verify that BleTIES recovered these previously reported IESs.

```python
import pybedtools as pbt
from Bio import SeqIO
```

```python
# Import BleTIES predictions and gene annotations
# IES predictions from PacBio data
bleties_pb = pbt.BedTool('tthe_pb_clr.milraa_subreads.comb.milraa_ies.gff3')
# IES predictions from Nanopore data
bleties_ont = pbt.BedTool('tthe_ont.milraa_subreads.comb.milraa_ies.gff3')
# Gene annotations from ciliate.org
genepred = pbt.BedTool('ref/2-upd-Genome-GFF3-latest-2.no_manual.gff3')
```

```python
# Genes containing intragenic IESs
# List from Figure 8 of Hamilton et al. 2016, https://dx.doi.org/10.7554/eLife.19090.024
# One of the locus IDs in the figure was missing a leading 0: TTHERM_00420400
shortlist = ['TTHERM_00420400', 'TTHERM_00464970', 'TTHERM_00142380', 'TTHERM_00569290', 'TTHERM_00348490']
```
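A locus ID with a missing leading zero would not match any `Name` attribute, so that gene would silently drop out of the filters below. A minimal standard-library sketch for catching this, assuming the annotation stores locus IDs as `Name=` in GFF3 column 9 (as the filters below expect); the helper names `parse_gff3_attributes` and `missing_locus_ids` are hypothetical, and percent-encoded GFF3 values are not decoded here.

```python
from typing import Dict, Iterable, Set


def parse_gff3_attributes(col9: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string ('key=value;key=value;') into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def missing_locus_ids(gff3_lines: Iterable[str], wanted: Iterable[str]) -> Set[str]:
    """Return the IDs in `wanted` that never occur as a Name attribute."""
    names = set()
    for line in gff3_lines:
        # Skip blank lines and GFF3 directives/comments
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        name = parse_gff3_attributes(cols[8]).get("Name")
        if name is not None:
            names.add(name)
    return set(wanted) - names
```

Used with the annotation file, `with open('ref/2-upd-Genome-GFF3-latest-2.no_manual.gff3') as fh: print(missing_locus_ids(fh, shortlist))` should print an empty set if all five IDs are present.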

```python
# Overlaps with PacBio IES predictions
print(
    bleties_pb.intersect(
        genepred.filter(lambda x: x.attrs and 'Name' in x.attrs and x.attrs['Name'] in shortlist)))
```

```text
chr_013	MILRAA	internal_eliminated_sequence_junction	275467	275467	0.9357	.	.	ID=BREAK_POINTS_SUBREADS_chr_013_275467_136;IES_length=136;IES_subread_coverage=131;IES_zmw_coverage=131;average_subread_coverage=140;average_zmw_coverage=140;pointer_seq=TTAA;ta_pointer_seq=TAA;ta_pointer_start=275468;ta_pointer_end=275468;
chr_015	MILRAA	internal_eliminated_sequence_junction	409851	409851	0.8015	.	.	ID=BREAK_POINTS_SUBREADS_chr_015_409851_455;IES_length=455;IES_subread_coverage=105;IES_zmw_coverage=105;average_subread_coverage=131;average_zmw_coverage=131;pointer_seq=TAA;ta_pointer_seq=TAA;ta_pointer_start=409851;ta_pointer_end=409851;
chr_015	MILRAA	internal_eliminated_sequence_junction	412429	412429	0.8908	.	.	ID=BREAK_POINTS_SUBREADS_chr_015_412429_385;IES_length=385;IES_subread_coverage=106;IES_zmw_coverage=106;average_subread_coverage=119;average_zmw_coverage=119;pointer_seq=TTAA;ta_pointer_seq=TAA;ta_pointer_start=412430;ta_pointer_end=412430;
chr_029	MILRAA	internal_eliminated_sequen
```

```python
# Overlaps with Nanopore IES predictions
print(
    bleties_ont.intersect(
        genepred.filter(lambda x: x.attrs and 'Name' in x.attrs and x.attrs['Name'] in shortlist)))
```

```text
chr_013	MILRAA	internal_eliminated_sequence_junction	275467	275467	0.8195	.	.	ID=BREAK_POINTS_SUBREADS_chr_013_275467_133;IES_length=133;IES_subread_coverage=109;IES_zmw_coverage=109;average_subread_coverage=133;average_zmw_coverage=133;pointer_seq=TTAA;ta_pointer_seq=TAA;ta_pointer_start=275468;ta_pointer_end=275468;
chr_015	MILRAA	internal_eliminated_sequence_junction	409850	409850	0.6954	.	.	ID=BREAK_POINTS_SUBREADS_chr_015_409850_450;IES_length=450;IES_subread_coverage=105;IES_zmw_coverage=105;average_subread_coverage=151;average_zmw_coverage=151;pointer_seq=TTAA;ta_pointer_seq=TAA;ta_pointer_start=409851;ta_pointer_end=409851;
chr_015	MILRAA	internal_eliminated_sequence_junction	412429	412429	0.7241	.	.	ID=BREAK_POINTS_SUBREADS_chr_015_412429_381;IES_length=381;IES_subread_coverage=105;IES_zmw_coverage=105;average_subread_coverage=145;average_zmw_coverage=145;pointer_seq=TTAA;ta_pointer_seq=TAA;ta_pointer_start=412430;ta_pointer_end=412430;
chr_029	MILRAA	internal_eliminated_seque
```

All six previously described intragenic IESs were recovered by BleTIES.
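The intersections above list IES records but not which gene each one falls in, so the "six IESs in five genes" expectation is not checked directly. One way to check it is to count junctions per gene. The sketch below is a hypothetical pure-Python helper (`ies_per_gene`), assuming 1-based inclusive GFF coordinates and that each MILRAA junction is a single position (start equals end, as in the output above). With pybedtools itself, `intersect(..., wa=True, wb=True)` would instead append the overlapping gene record to each IES record.

```python
from collections import Counter
from typing import Iterable, Tuple

Junction = Tuple[str, int]        # (seqid, 1-based junction position)
Gene = Tuple[str, int, int, str]  # (seqid, start, end, Name), 1-based inclusive


def ies_per_gene(junctions: Iterable[Junction], genes: Iterable[Gene]) -> Counter:
    """Count IES junctions falling inside each gene; genes with none keep a 0 count."""
    genes = list(genes)
    counts = Counter({name: 0 for _, _, _, name in genes})
    for seqid, pos in junctions:
        for g_seqid, start, end, name in genes:
            if seqid == g_seqid and start <= pos <= end:
                counts[name] += 1
    return counts
```

If the result agrees with Hamilton et al. (2016), all five shortlisted genes should have a nonzero count, one of them should have two junctions, and the counts should sum to six.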

The junctions predicted from the Nanopore data appear to be more accurate: all six have the conserved TTAA repeat (`pointer_seq=TTAA`) at the IES-MDS junctions that was previously reported, whereas at least one PacBio prediction (chr_015 at 409851) has only `TAA`.
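The TTAA check can be made explicit instead of read by eye. A minimal sketch, assuming only that MILRAA writes `pointer_seq=` and `ID=` into GFF3 column 9 as in the output above; `non_ttaa_junctions` is a hypothetical helper. The example records are excerpted from the PacBio output above (only some column-9 attributes are kept).

```python
from typing import Dict, Iterable, List


def parse_gff3_attributes(col9: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string ('key=value;key=value;') into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def non_ttaa_junctions(gff3_lines: Iterable[str], expected: str = "TTAA") -> List[str]:
    """Return IDs of complete records whose pointer_seq differs from `expected`."""
    flagged = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:  # skip truncated or malformed lines
            continue
        attrs = parse_gff3_attributes(cols[8])
        if attrs.get("pointer_seq") != expected:
            flagged.append(attrs.get("ID", "?"))
    return flagged


# Excerpt of the PacBio records shown above (column 9 shortened)
fixed_cols = ["MILRAA", "internal_eliminated_sequence_junction"]
pacbio_excerpt = [
    "\t".join(["chr_013", *fixed_cols, "275467", "275467", "0.9357", ".", ".",
               "ID=BREAK_POINTS_SUBREADS_chr_013_275467_136;IES_length=136;pointer_seq=TTAA;"]),
    "\t".join(["chr_015", *fixed_cols, "409851", "409851", "0.8015", ".", ".",
               "ID=BREAK_POINTS_SUBREADS_chr_015_409851_455;IES_length=455;pointer_seq=TAA;"]),
    "\t".join(["chr_015", *fixed_cols, "412429", "412429", "0.8908", ".", ".",
               "ID=BREAK_POINTS_SUBREADS_chr_015_412429_385;IES_length=385;pointer_seq=TTAA;"]),
]
print(non_ttaa_junctions(pacbio_excerpt))  # → ['BREAK_POINTS_SUBREADS_chr_015_409851_455']
```

On the full results, the same function could be applied to `str(bleties_pb.intersect(...)).splitlines()`, since converting a `BedTool` to a string yields its file contents, as the `print` calls above rely on.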

BleTIES probably performs better with the Nanopore data because of the settings of SPOA, the aligner/assembler that it uses.
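The two platforms also differ slightly in junction position and IES length for the same loci. A small sketch for pairing predictions across platforms, assuming a hypothetical helper `pair_predictions` and an arbitrary 5 bp matching window (`max_dist`); a window that wide could mis-pair two genuinely distinct IESs lying very close together. The tuples are the (sequence, position, `IES_length`) values from the three complete records in each output above.

```python
from typing import List, Tuple

Pred = Tuple[str, int, int]  # (seqid, junction position, IES_length)


def pair_predictions(pb: List[Pred], ont: List[Pred], max_dist: int = 5) -> List[Tuple[Pred, Pred]]:
    """Pair each PacBio junction with the nearest ONT junction on the same sequence within max_dist bp."""
    pairs = []
    for p in pb:
        candidates = [o for o in ont if o[0] == p[0] and abs(o[1] - p[1]) <= max_dist]
        if candidates:
            best = min(candidates, key=lambda o: abs(o[1] - p[1]))
            pairs.append((p, best))
    return pairs


pb = [("chr_013", 275467, 136), ("chr_015", 409851, 455), ("chr_015", 412429, 385)]
ont = [("chr_013", 275467, 133), ("chr_015", 409850, 450), ("chr_015", 412429, 381)]
for p, o in pair_predictions(pb, ont):
    # seqid, PacBio position, ONT position, PacBio length minus ONT length
    print(p[0], p[1], o[1], p[2] - o[2])
# → chr_013 275467 275467 3
# → chr_015 409851 409850 5
# → chr_015 412429 412429 4
```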