# Exploring the role of ESRP1 expression in prostate cancer

In this notebook, we will explore the role of ESRP1 expression in prostate cancer, where ESRP1 is commonly amplified and its expression is correlated with worse prognosis. We will obtain splicing quantification across the TCGA-PRAD cohort using data from [TCGASpliceSeq](https://bioinformatics.mdanderson.org/TCGASpliceSeq/), and project PTMs onto the splice events identified by SpliceSeq. We will then explore the ways in which ESRP1 expression may exert its effects through changes in PTM inclusion and in the flanking sequences around PTMs. The analysis here corresponds to Figures 4 and 5 of our [manuscript](https://www.biorxiv.org/content/10.1101/2024.01.10.575062v2).

This notebook is divided into the following sections:
1. Load ESRP1 expression data from [CBioPortal](https://www.cbioportal.org/)
2. Project PTMs onto splice events and identify events that are correlated with ESRP1 expression
3. Explore the functional consequence of ESRP1-correlated PTMs
4. Identify kinases impacted by splicing

## Load ESRP1 expression data from CBioPortal

Although this step is not part of PTM-POSE, exploring the role of ESRP1 expression in prostate cancer first requires knowing which patients express high or low levels of ESRP1. We can obtain this directly through [CBioPortal's API]() (which requires the `bravado` Python package). Alternatively, you can download the data from the [CBioPortal website](https://www.cbioportal.org/) and load it here.

In [1]:
from bravado.client import SwaggerClient
import pandas as pd

#initialize swagger client
cbioportal = SwaggerClient.from_url('https://www.cbioportal.org/api/v2/api-docs',
                            config={"validate_requests":False,"validate_responses":False,"validate_swagger_spec": False})

#add lowercase, underscore-separated aliases for each API resource name
for a in dir(cbioportal):
    cbioportal.__setattr__(a.replace(' ', '_').lower(), cbioportal.__getattr__(a))

# ESRP1 Entrez Gene ID = 54845
gene_id = 54845

#download rna sequencing data for ESRP1
study_id = 'prad_tcga_pan_can_atlas_2018'
expression_data = cbioportal.Molecular_Data.getAllMolecularDataInMolecularProfileUsingGET(molecularProfileId = study_id + '_rna_seq_v2_mrna',
                                                                    sampleListId = study_id + '_all', entrezGeneId = gene_id).result()
#extract expression data and normalize by z-score
sample_id = [samp.sampleId for samp in expression_data]
rsem = [samp.value for samp in expression_data]
rsem = pd.Series(rsem, index = sample_id)
rsem_zscore = (rsem - rsem.mean())/rsem.std()

#extract high and low patients (absolute z-score > 1)
high_patients = rsem_zscore[rsem_zscore > 1].index
low_patients = rsem_zscore[rsem_zscore < -1].index
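
For the download route mentioned above, the same z-score grouping can be applied to a local file. The following is a minimal sketch, not part of the original analysis: it assumes a tab-separated table with the hypothetical column names `SAMPLE_ID` and `ESRP1`, and it uses a small in-memory table with made-up values in place of a real download. The helper name `split_by_zscore` is also hypothetical.

```python
import io
import pandas as pd

# Stand-in for a downloaded file; column names and values are hypothetical.
downloaded = io.StringIO(
    "SAMPLE_ID\tESRP1\n"
    "S1\t10.0\n"
    "S2\t20.0\n"
    "S3\t30.0\n"
    "S4\t40.0\n"
    "S5\t50.0\n"
)

def split_by_zscore(path_or_buffer, id_col="SAMPLE_ID", value_col="ESRP1", cutoff=1.0):
    """Return (high, low) sample IDs whose expression z-score exceeds +/- cutoff."""
    table = pd.read_csv(path_or_buffer, sep="\t")
    values = table.set_index(id_col)[value_col].astype(float)
    # same normalization as above: subtract the mean, divide by the sample std
    zscore = (values - values.mean()) / values.std()
    high = zscore[zscore > cutoff].index
    low = zscore[zscore < -cutoff].index
    return high, low

high_ids, low_ids = split_by_zscore(downloaded)
print(list(high_ids), list(low_ids))
```

With a real download, pass the file path instead of the in-memory buffer and adjust `id_col` and `value_col` to match the actual column headers.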

## Project PTMs onto splice events and identify events that are correlated with ESRP1 expression

In [14]:
from ptm_pose import project
import pandas as pd

#load data from TCGASpliceSeq
psi_data = pd.read_csv('../../../TCGA/Data/PRAD/TCGA_SpliceSeq/PSI_download_PRAD.txt', sep = '\t')
splicegraph = pd.read_csv('../../../TCGA/Data/TCGASpliceData.txt', sep = '\t')

#identifying TCGA columns containing patient PSI data
patient_columns = [col for col in psi_data.columns if 'TCGA' in col]

psi_data, spliced_ptms = project.project_ptms_onto_SpliceSeq(psi_data, splicegraph = splicegraph, extra_cols = patient_columns)

Removing ME events from analysis
Projecting PTMs onto SpliceSeq data
Projecting PTMs onto splice events using hg19 coordinates.: 100%|██████████| 62861/62861 [16:07:33<00:00,  1.08it/s]
PTMs projection successful (76363 identified).
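
Once PTMs are projected, one simple way to relate them to ESRP1 expression is to compare PSI between the high and low ESRP1 groups. The sketch below is illustrative only and is not necessarily the method used in the manuscript: it uses a toy table with made-up PSI values, hypothetical patient names, a hypothetical helper `delta_psi`, and a hypothetical effect-size cutoff. With real data, the sample IDs from CBioPortal may need to be harmonized with the SpliceSeq patient column names before columns can be selected this way.

```python
import pandas as pd

# Toy stand-in for a projected PTM table: one row per PTM, one column per
# patient holding PSI values. All names and values are made up.
toy_ptms = pd.DataFrame(
    {
        "PTM": ["PTM_A", "PTM_B", "PTM_C"],
        "P1": [0.90, 0.50, 0.20],
        "P2": [0.80, 0.55, 0.10],
        "P3": [0.30, 0.50, 0.70],
        "P4": [0.20, 0.45, 0.80],
    }
).set_index("PTM")

# Hypothetical patient groups (analogous to high_patients / low_patients)
toy_high = ["P1", "P2"]
toy_low = ["P3", "P4"]

def delta_psi(ptm_table, high, low):
    """Mean PSI in the high group minus mean PSI in the low group, per row."""
    # keep only patients that are actually present as columns
    high_cols = [c for c in high if c in ptm_table.columns]
    low_cols = [c for c in low if c in ptm_table.columns]
    return ptm_table[high_cols].mean(axis=1) - ptm_table[low_cols].mean(axis=1)

dpsi = delta_psi(toy_ptms, toy_high, toy_low)

# Hypothetical cutoff on the absolute PSI difference
min_effect = 0.2
changed = dpsi[dpsi.abs() >= min_effect]
print(changed)
```

A difference in group means is only an effect size; a full analysis would typically pair it with a statistical test and multiple-testing correction.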



## Functional consequence of ESRP1-correlated PTMs

### Gene Set Enrichment Analysis

### Exon Ontology Analysis

### Protein Interaction Network Analysis

### Flanking sequences that alter protein interactions

## Kinases impacted by splicing

### Known kinase substrates 

### Predicted differentially included kinase substrates

### Altered kinase interactions by changed flanking sequences