# Alternative splicing from RNA-seq data

This document shows how to use the modules for quantification, quality control (QC), and normalization of splicing events, and how to convert the results to molecular phenotype data in `bed` format. The relevant notebooks are:

1. `molecular_phenotypes/calling/splicing_calling.ipynb`
2. `molecular_phenotypes/QC/splicing_normalization.ipynb`
3. `data_preprocessing/phenotype/gene_annotation.ipynb`

Two tools, leafCutter and psichomics, are used in this splicing analysis workflow; please check the corresponding modules for code documentation. Various reference data need to be prepared before using this workflow; please check [this module](https://cumc.github.io/xqtl-pipeline/code/data_preprocessing/reference_data.html) to download and prepare them.

A minimal working example with prepared leafcutter and psichomics input can be downloaded [here](https://drive.google.com/drive/folders/1lpcx3eKG2UpauntLUuJ6bMBjHyIhWW_R). The minimal working example files are publicly available data from the [1000 Genomes Project](https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/), an international research project with an extensive catalog of human genome variation. For the minimal working example, 3 of the 465 unrelated human lymphoblastoid cell lines from the 1000 Genomes Project were selected to produce leafcutter and psichomics example inputs via STAR alignment. For details of how the minimal working example was prepared, please check [this document](https://docs.google.com/document/d/1Gmk8C-zhfQRLceYE9ViGl_JcoSbkJn3jPe-E9y3L-UM/edit).
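
Before running the commands in the sections that follow, it can help to confirm that the files they reference are in place. The snippet below is a minimal sketch of such a check; `check_inputs` is a hypothetical helper, not part of the pipeline, and the paths are simply the ones used in the commands in this document, assumed to be relative to the working directory.

```shell
# Hypothetical helper (not part of the pipeline): report any missing input files.
# Prints each missing path to stderr and returns non-zero if any are absent.
check_inputs() {
    _missing=0
    for _path in "$@"; do
        if [ ! -e "$_path" ]; then
            echo "missing: $_path" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Paths taken from the commands in this document; adjust them to your own layout.
check_inputs sample_fastq_bam_list hg38_suppa.rds \
    containers/leafcutter.sif containers/psichomics.sif \
    || echo "Prepare the missing inputs before running the workflow." >&2
```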


## Intron usage ratio quantification via `leafCutter`

In [None]:
sos run pipeline/splicing_calling.ipynb leafcutter \
    --cwd leafcutter_output/ \
    --samples sample_fastq_bam_list \
    --container containers/leafcutter.sif
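
To make the term "intron usage ratio" concrete: each intron's read count is divided by the total read count of the cluster it belongs to. The sketch below converts toy entries to decimals. It assumes the `numerator/denominator` text layout that leafCutter writes to its `_perind.counts.gz` file, with the intron ID in the first column; the data and the helper names `toy_counts` and `ratios_to_decimal` are made up for illustration.

```shell
# Toy rows in the assumed "numerator/denominator" layout; not real data.
# Column 1 is the intron ID (chrom:start:end:cluster), remaining columns are samples.
toy_counts() {
    cat <<'EOF'
chrom sampleA sampleB
chr1:100:200:clu_1 5/20 0/0
chr1:100:300:clu_1 15/20 0/0
EOF
}

# Convert each "n/d" entry to a decimal ratio.
# Emits NA when the cluster has no reads in that sample (d == 0).
ratios_to_decimal() {
    awk 'NR == 1 { print; next }
         {
             out = $1
             for (i = 2; i <= NF; i++) {
                 split($i, f, "/")
                 out = out " " (f[2] > 0 ? sprintf("%.3f", f[1] / f[2]) : "NA")
             }
             print out
         }'
}

toy_counts | ratios_to_decimal
```

On real output, something like `zcat leafcutter_output/sample_list_intron_usage_perind.counts.gz | ratios_to_decimal | head` could be used for a quick look, provided the file follows the layout assumed here.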

## Percent Spliced In (PSI) quantification for alternative splicing events via `psichomics`

In [None]:
sos run pipeline/splicing_calling.ipynb psichomics \
    --cwd psichomics_output/ \
    --samples sample_fastq_bam_list \
    --splicing_annotation hg38_suppa.rds \
    --container containers/psichomics.sif
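
As a reminder of what PSI measures: for a given event it is commonly defined as the number of reads supporting inclusion divided by the sum of reads supporting inclusion and exclusion. The sketch below only illustrates that quantity with made-up counts; `psi` is a hypothetical helper, and how psichomics derives the inclusion and exclusion counts for each event type is not shown here.

```shell
# psi INCLUSION EXCLUSION
# Print inclusion / (inclusion + exclusion), or NA when there are no supporting reads.
psi() {
    awk -v inc="$1" -v exc="$2" 'BEGIN {
        total = inc + exc
        if (total > 0) printf "%.3f\n", inc / total
        else print "NA"
    }'
}

psi 30 10   # 0.750
psi 0 0     # NA
```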

## QC and Normalization of leafCutter outputs

In [None]:
sos run pipeline/splicing_normalization.ipynb leafcutter_norm \
    --cwd leafcutter_output/ \
    --ratios leafcutter_output/sample_list_intron_usage_perind.counts.gz \
    --container containers/leafcutter.sif 

## QC and Normalization of psichomics outputs

In [None]:
sos run pipeline/splicing_normalization.ipynb psichomics_norm \
    --cwd psichomics_output/ \
    --ratios psichomics_output/psi_raw_data.tsv \
    --container containers/psichomics.sif

## Preparing leafcutter and psichomics outputs for TensorQTL

In [None]:
# code still in testing, to be added
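
Until that code is available, the target layout can be sketched from TensorQTL's documented phenotype format: a single header line starting with `#`, the first four columns `chr`, `start`, `end`, `phenotype_id`, one column per sample after that, and records sorted by position before compression and indexing. The example below is not the pipeline's implementation; `sort_bed` is a hypothetical helper and the rows are made-up values.

```shell
# Hypothetical helper: keep the "#"-prefixed header first,
# then sort the records by chromosome (column 1) and start position (column 2).
sort_bed() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Toy phenotype table with made-up values, deliberately out of order.
toy_bed=$(mktemp)
printf '#chr\tstart\tend\tphenotype_id\tsampleA\tsampleB\n' >  "$toy_bed"
printf 'chr2\t500\t501\tevent_c\t0.10\t0.20\n'              >> "$toy_bed"
printf 'chr1\t900\t901\tevent_b\t0.75\t0.70\n'              >> "$toy_bed"
printf 'chr1\t100\t101\tevent_a\t0.25\t0.30\n'              >> "$toy_bed"

sort_bed "$toy_bed"

# With bgzip and tabix (htslib) installed, the sorted table would then be
# compressed and indexed, for example:
#   sort_bed "$toy_bed" | bgzip > phenotypes.bed.gz && tabix -p bed phenotypes.bed.gz
```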