## Data access and filtering
Genes can be accessed by gene name or gene ID.
In addition, we can iterate over genes and transcripts, and filter them by their properties, genomic location, or coverage.
For this purpose, IsoTools implements a query syntax based on tags, each of which is defined by an expression.
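
To make the idea of "tags defined by expressions" concrete, here is a minimal, self-contained sketch of how such a query could be evaluated. It is not IsoTools' implementation: the tag definitions, the property names (`exons`, `coverage`, `is_annotated`), and the helpers `evaluate_tags` and `matches` are hypothetical and chosen only for illustration.

```python
# Hypothetical tag definitions: each tag maps to a Python expression that is
# evaluated against the properties of one transcript. These expressions are
# illustrative assumptions, not the definitions used by IsoTools.
TAG_EXPRESSIONS = {
    "MULTIEXON": "len(exons) > 1",
    "HIGH_COVERED": "coverage >= 10",
    "NOVEL": "not is_annotated",
}


def evaluate_tags(properties):
    """Return a {tag: bool} mapping for one transcript given as a plain dict."""
    allowed_builtins = {"__builtins__": {"len": len}}
    return {
        tag: bool(eval(expression, allowed_builtins, properties))
        for tag, expression in TAG_EXPRESSIONS.items()
    }


def matches(query, properties):
    """Evaluate a boolean query written in terms of tag names."""
    tags = evaluate_tags(properties)
    # The tag values act as variables; 'and', 'or', 'not' are Python operators.
    return bool(eval(query, {"__builtins__": {}}, tags))


# A made-up transcript with two exons, low coverage, and no reference match.
example = {"exons": [(100, 200), (300, 400)], "coverage": 4, "is_annotated": False}

print(evaluate_tags(example))
print(matches("MULTIEXON and NOVEL and not HIGH_COVERED", example))
```

The query string is ordinary boolean logic over tag names, which is the same form used in the `query` argument in the cells below.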


This tutorial depends on the transcriptome file `PacBio_isotools_substantial_isotools.pkl`, which can be obtained from this [download link](https://oc-molgen.gnz.mpg.de/owncloud/s/gjG9EPiQwpRAyg3).

In [1]:
from isotools import Transcriptome
import matplotlib.pyplot as plt

# Directory that contains the downloaded transcriptome file
path = 'demonstration_dataset'
isoseq = Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')

In [2]:
# Access a gene by its name
g = isoseq['RIPK2']
print(g)

Gene RIPK2 chr8:89757805-89791064(+), 4 reference transcripts, 52 expressed transcripts


In [3]:
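# Iterate over transcripts that match the tag query and have at least 10 reads
query = 'SUBSTANTIAL and NOVEL_EXON and not (RTTS or INTERNAL_PRIMING)'
for gene, tr_id, tr in isoseq.iter_transcripts(query=query, min_coverage=10):
    # tr["annotation"][1] is the dictionary of novelty categories printed below
    print(f'{gene.name} {tr_id}: {tr["annotation"][1]}')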

ENSG00000253853 3: {'novel exon': [[2799587, 2799883]]}
ZHX1-C8orf76 1: {'readthrough fusion': [('C8orf76', array([0, 1]))], 'novel exon': [[123231299, 123231757]]}
ENSG00000253666 9: {'novel intronic TSS': [[100445517, 100445578]], 'novel exon': [[100446307, 100446368], [100451990, 100452300], [100456176, 100456279]]}
GPAT4 10: {'novel exon': [[41596747, 41596878]]}
TPD52 10: {'novel exon': [[80109827, 80109935]]}
CHMP7 34: {'readthrough fusion': [('TNFRSF10A-DT', array([0, 1, 2]))], 'novel exon': [[23246255, 23246994]]}
SLC25A32 2: {'novel exon': [[103413422, 103413531]]}
CA8 32: {'novel exon': [[60222307, 60222357]]}
SLC39A14 3: {'novel exon': [[22390729, 22390847]]}
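
The annotation dictionaries printed above map a novelty category to a list of genomic intervals, so they can be post-processed with plain Python. The sketch below copies two of the printed entries and computes the length of each novel exon. The helper `novel_exon_lengths` is hypothetical, and taking `end - start` as the length assumes half-open coordinates, which the output alone does not confirm.

```python
# Two annotation dictionaries copied from the output above,
# keyed by (gene name, transcript number).
annotations = {
    ("GPAT4", 10): {"novel exon": [[41596747, 41596878]]},
    ("ENSG00000253666", 9): {
        "novel intronic TSS": [[100445517, 100445578]],
        "novel exon": [
            [100446307, 100446368],
            [100451990, 100452300],
            [100456176, 100456279],
        ],
    },
}


def novel_exon_lengths(annotation):
    """Return end - start for every interval listed under 'novel exon'.

    Assumption: coordinates are half-open, so end - start is the exon length.
    """
    return [end - start for start, end in annotation.get("novel exon", [])]


for (gene_name, tr_id), annotation in annotations.items():
    print(gene_name, tr_id, novel_exon_lengths(annotation))
```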


In [4]:
# These filter tags are predefined, grouped by context
for context in isoseq.filter:
    print(f'* {context}: {", ".join(isoseq.filter[context])}')

* gene: NOVEL_GENE, EXPRESSED, CHIMERIC
* transcript: INTERNAL_PRIMING, RTTS, NONCANONICAL_SPLICING, NOVEL_TRANSCRIPT, FRAGMENT, UNSPLICED, MULTIEXON, SUBSTANTIAL, ANTISENSE, INTERGENIC, GENIC_GENOMIC, NOVEL_EXONIC_PAS, NOVEL_INTRONIC_PAS, READTHROUGH_FUSION, NOVEL_EXON, NOVEL_3_SPLICE_SITE, INTRON_RETENTION, NOVEL_5_SPLICE_SITE, EXON_SKIPPING, NOVEL_COMBINATION, NOVEL_INTRONIC_TSS, NOVEL_EXONIC_TSS, MONO_EXON, NOVEL_JUNCTION, _5_FRAGMENT, _3_FRAGMENT, INTRONIC, FSM, ISM, NIC, NNC, NOVEL, COVERED, HIGH_COVERED
* reference: REF_UNSPLICED, REF_MULTIEXON, REF_INTERNAL_PRIMING
