# FICLE vignette

This vignette demonstrates how to run FICLE on the demo data in the input folder (`0_input`) to generate the files in the output folder (`0_results`). The input files are a subset of the data generated from an Alzheimer's disease (AD) mouse model (Leung et al. 2023).

### Load modules and FICLE

In [1]:
##------------- Load modules -----------
import sys
import os
from argparse import Namespace

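##------------- Load ficle -----------
# '__file__' is deliberately quoted: notebooks do not define __file__, so
# realpath('__file__') resolves against the current working directory (the vignette folder)
rootPath = os.path.join(os.path.dirname(os.path.realpath('__file__')), "..")
sys.path.insert(1, rootPath)
import ficle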

### Arguments

**Mandatory**    
`--genename` = gene to characterise  
`--reference` = reference gtf   
`--input_bed` = sorted bed12 file of final set of isoforms     
`--input_gtf` = gtf file of final set of isoforms (from SQANTI)  
`--input_class` = SQANTI classification file   
`--output_dir` = output directory  

**Optional**  
`--cpat` = `ORF_prob.best.tsv` generated from CPAT; when this is `None`, as in the examples here, ORFs are not used for classification  
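
The notebook cells in this vignette build the arguments by hand as an `argparse.Namespace`. As a rough illustration of how the flags listed above map onto such an object, here is a minimal sketch. The `build_parser` function is a hypothetical stand-in written for this vignette, not FICLE's own parser, and the relative paths are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed above (not FICLE's own code)."""
    parser = argparse.ArgumentParser(description="Sketch of the FICLE arguments")
    mandatory = [
        ("--genename", "gene to characterise"),
        ("--reference", "reference gtf"),
        ("--input_bed", "sorted bed12 file of final set of isoforms"),
        ("--input_gtf", "gtf file of final set of isoforms (from SQANTI)"),
        ("--input_class", "SQANTI classification file"),
        ("--output_dir", "output directory"),
    ]
    for flag, help_text in mandatory:
        parser.add_argument(flag, required=True, help=help_text)
    # Optional: defaults to None, matching cpat = None in the notebook cells
    parser.add_argument("--cpat", default=None, help="ORF_prob.best.tsv generated from CPAT")
    return parser


# Parse an explicit list instead of sys.argv so the sketch also runs inside a notebook
args = build_parser().parse_args([
    "--genename", "Trem2",
    "--reference", "0_input/subsetted_gencode_vM22_annotation.gtf",
    "--input_bed", "0_input/input_sorted.bed12",
    "--input_gtf", "0_input/input.gtf",
    "--input_class", "0_input/input_classification.txt",
    "--output_dir", "0_results/",
])
print(args.genename, args.cpat)
```

The resulting `args` has the same attribute names as the `Namespace` objects constructed in the cells below.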

### Characterise Trem2

In [2]:
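# Input and output folders sit alongside this notebook; the trailing "" keeps the final "/"
vignetteDir = os.path.dirname(os.path.realpath('__file__'))
inputDir = os.path.join(vignetteDir, "0_input", "")
outputDir = os.path.join(vignetteDir, "0_results", "")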
args = Namespace(
    genename = "Trem2",
    reference = inputDir + "subsetted_gencode_vM22_annotation.gtf",
    input_bed = inputDir + "input_sorted.bed12",
    input_gtf = inputDir + "input.gtf",
    input_class = inputDir + "input_classification.txt",
    cpat = None,
    output_dir = outputDir
)

ficle.annotate_gene(args)

Subsetting Trem2 from /lustre/projects/Research_Project-MRC148213/sl693/scripts/FICLE/vignette/0_input/subsetted_gencode_vM22_annotation.gtf
Working with mouse dataset
Not using ORF for classification
**** Extracting gtf
Converting gtf file from SQANTI: PB gene id to gene name
Using /lustre/projects/Research_Project-MRC148213/sl693/scripts/FICLE/vignette/0_input/input_classification.txt
Total number of isoforms: 211
**** Extracting for transcripts associated with: Trem2
Number of detected transcripts: 97
Parsing through transcript 0
Parsing through transcript 50
Number of Transcripts with all exons and exact match: 0
Number of Transcripts with exact match but not all exons: 11
Tabulating exon presence
Number of transcripts with number of exons
Counter({5: 52, 4: 24, 3: 12, 1: 4, 2: 4, 6: 1})
Processing transcripts for exon skipping
Identifying transcripts with novel exons
Number of unique novel exons: 13
Total Number of transcripts with novel exon:  16
Identifying transcripts with intr

### Characterise Rhbdf2

*Note: the only difference between the arguments below and those above is `--genename`.*
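
Because only the gene name changes between runs, the shared arguments could be bundled in a small helper. The sketch below is illustrative only: `make_args` is a hypothetical name that is not part of FICLE, and the relative folder names assume the notebook is run from the vignette folder.

```python
import os
from argparse import Namespace


def make_args(genename, input_dir, output_dir, cpat=None):
    """Hypothetical helper: bundle the vignette's arguments for one gene."""
    return Namespace(
        genename=genename,
        reference=os.path.join(input_dir, "subsetted_gencode_vM22_annotation.gtf"),
        input_bed=os.path.join(input_dir, "input_sorted.bed12"),
        input_gtf=os.path.join(input_dir, "input.gtf"),
        input_class=os.path.join(input_dir, "input_classification.txt"),
        cpat=cpat,
        output_dir=output_dir,
    )


# Folder names follow the vignette; output_dir keeps its trailing slash as in the cells
all_args = [make_args(gene, "0_input", "0_results/") for gene in ("Trem2", "Rhbdf2")]
for gene_args in all_args:
    # In the notebook, this is where ficle.annotate_gene(gene_args) would be called
    print(gene_args.genename, gene_args.input_gtf)
```

The cell that follows spells the same arguments out in full, as in the Trem2 example.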

In [3]:
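# Input and output folders sit alongside this notebook; the trailing "" keeps the final "/"
vignetteDir = os.path.dirname(os.path.realpath('__file__'))
inputDir = os.path.join(vignetteDir, "0_input", "")
outputDir = os.path.join(vignetteDir, "0_results", "")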
args = Namespace(
    genename = "Rhbdf2",
    reference = inputDir + "subsetted_gencode_vM22_annotation.gtf",
    input_bed = inputDir + "input_sorted.bed12",
    input_gtf = inputDir + "input.gtf",
    input_class = inputDir + "input_classification.txt",
    cpat = None,
    output_dir = outputDir
)

ficle.annotate_gene(args)

Subsetting Rhbdf2 from /lustre/projects/Research_Project-MRC148213/sl693/scripts/FICLE/vignette/0_input/subsetted_gencode_vM22_annotation.gtf
Working with mouse dataset
Not using ORF for classification
**** Extracting gtf
Converting gtf file from SQANTI: PB gene id to gene name
Using /lustre/projects/Research_Project-MRC148213/sl693/scripts/FICLE/vignette/0_input/input_classification.txt
Total number of isoforms: 211
**** Extracting for transcripts associated with: Rhbdf2
Number of detected transcripts: 18
Parsing through transcript 0
Number of Transcripts with all exons and exact match: 0
Number of Transcripts with exact match but not all exons: 9
Tabulating exon presence
Number of transcripts with number of exons
Counter({18: 3, 12: 1, 13: 1, 9: 1, 6: 1, 5: 1, 17: 1, 11: 1, 10: 1, 14: 1, 4: 1, 15: 1, 8: 1, 2: 1, 19: 1, 7: 1})
Processing transcripts for exon skipping
Identifying transcripts with novel exons
no novel exons
Identifying transcripts with intron retention
No transcripts wi
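
In both logs, the line after "Number of transcripts with number of exons" is a `collections.Counter` that maps an exon count to the number of transcripts with that many exons. As a quick sanity check, the sketch below copies the two counters from the logs above and confirms that each sums to the reported number of detected transcripts. The variable names are chosen for illustration and are not taken from FICLE.

```python
from collections import Counter

# Counts copied from the two logs above: {number of exons: number of transcripts}
exon_counts = {
    "Trem2": Counter({5: 52, 4: 24, 3: 12, 1: 4, 2: 4, 6: 1}),
    "Rhbdf2": Counter({18: 3, 12: 1, 13: 1, 9: 1, 6: 1, 5: 1, 17: 1, 11: 1,
                       10: 1, 14: 1, 4: 1, 15: 1, 8: 1, 2: 1, 19: 1, 7: 1}),
}

# "Number of detected transcripts" as reported in each log
detected = {"Trem2": 97, "Rhbdf2": 18}

for gene, counts in exon_counts.items():
    total = sum(counts.values())
    modal_exons, n_transcripts = counts.most_common(1)[0]
    assert total == detected[gene]
    print(f"{gene}: {total} transcripts; most common exon count is "
          f"{modal_exons} ({n_transcripts} transcripts)")
```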