# Lung Cancer Post-Translational Modification and Gene Expression Regulation
Lung cancer is a complex disease that is known to be regulated at the post-translational modification level, e.g. by kinase-driven phosphorylation. Our collaborators at [Cell Signaling Technology (CST)](https://www.cellsignal.com/) used Tandem Mass Tag (TMT) mass spectrometry to measure differential phosphorylation, acetylation, and methylation in a panel of 42 lung cancer cell lines compared to non-cancerous lung tissue. Gene expression data from 37 of these lung cancer cell lines were also independently obtained from the publicly available Cancer Cell Line Encyclopedia [(CCLE)](https://portals.broadinstitute.org/ccle/home).

These post-translational modification (PTM) and gene expression data were pre-processed (normalized, filtered, etc.) in the [CST_Data_Processing.ipynb](http://nbviewer.jupyter.org/github/MaayanLab/CST_Lung_Cancer_Viz/blob/master/notebooks/CST_Data_Processing.ipynb) notebook. This notebook visualizes the PTM, expression, and merged PTM-expression datasets and identifies clusters of PTMs/genes for further analysis.

### Load Data and Clustergrammer-Widget
First, we will make an instance of the [Clustergrammer-PY Network](http://clustergrammer.readthedocs.io/clustergrammer_py.html#clustergrammer-py-api) class that will be used to load, analyze, and visualize our data. For more information see [Clustergrammer-PY API](http://clustergrammer.readthedocs.io/clustergrammer_py.html#clustergrammer-py-api) and [Clustergrammer-Widget](http://clustergrammer.readthedocs.io/clustergrammer_widget.html).

In [2]:
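# import the widget and the Network class used to load and cluster data
from clustergrammer_widget import *
net = Network(clustergrammer_widget)

# load the pre-processed PTM matrix
net.load_file('../lung_cellline_3_1_16/lung_cl_all_ptm/precalc_processed/CST_CCLE_ptm.txt')

# matrix dimensions: (rows, columns)
print(net.dat['mat'].shape)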

(1730, 37)


# CST Post-Translational Modification Lung Cancer Data
Here we visualize pre-processed PTM data from our collaborators at CST (see [CST_Data_Processing.ipynb](http://nbviewer.jupyter.org/github/MaayanLab/CST_Lung_Cancer_Viz/blob/master/notebooks/CST_Data_Processing.ipynb) for information on data processing). In this dataset, differential phosphorylation, methylation, and acetylation were measured in 37 lung cancer cell lines relative to non-cancerous lung tissue. PTM levels were quantile normalized in each cell line, and only PTMs with fewer than 7 missing values were included.
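The actual pre-processing lives in the CST_Data_Processing notebook and is not reproduced here. As a rough illustration of the two steps named above (quantile normalization per cell line, then a missing-value filter), here is a minimal pandas sketch on toy data. The function names, the toy values, and the NaN-tolerant handling of the reference distribution are assumptions made for this example, not the original implementation.

```python
import numpy as np
import pandas as pd


def quantile_normalize(df):
    """Give every column (cell line) the same distribution, ignoring NaNs."""
    values = df.to_numpy(dtype=float)
    probs = np.linspace(0.0, 1.0, df.shape[0])
    # Reference distribution: mean of the per-column quantiles.
    reference = np.nanquantile(values, probs, axis=0).mean(axis=1)
    # Within-column percentile of each observed value (NaNs stay NaN).
    pct = (df.rank(axis=0) - 1) / (df.count(axis=0) - 1)
    out = pct.copy()
    for col in df.columns:
        observed = pct[col].notna()
        # Map each percentile onto the shared reference distribution.
        out.loc[observed, col] = np.interp(pct.loc[observed, col], probs, reference)
    return out


def filter_missing(df, max_missing=7):
    """Keep rows (PTMs) with fewer than `max_missing` missing values."""
    return df[df.isna().sum(axis=1) < max_missing]


# Hypothetical toy matrix: PTMs as rows, cell lines as columns.
toy = pd.DataFrame(
    {
        "H1": [5.0, 2.0, 3.0, 4.0],
        "H2": [4.0, 1.0, 4.5, 2.0],
        "H3": [3.0, 4.0, 6.0, np.nan],
    },
    index=["ptm_a", "ptm_b", "ptm_c", "ptm_d"],
)

normalized = quantile_normalize(toy)
# The toy matrix is tiny, so a stricter threshold than 7 is used here.
filtered = filter_missing(normalized, max_missing=1)
print(filtered)
```

After normalization, columns without missing values share the same set of values and differ only in which PTM holds which rank, which is what makes levels comparable across cell lines.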

In [3]:
# manually color PTM type and cell line histology
net.set_cat_color('row', 1, 'Data-Type: phospho', 'red')
net.set_cat_color('row', 1, 'Data-Type: Rme1', 'purple')
net.set_cat_color('row', 1, 'Data-Type: AcK', 'blue')
net.set_cat_color('row', 1, 'Data-Type: Kme1', 'grey')
net.set_cat_color('col', 1, 'Histology: SCLC', 'red')
net.set_cat_color('col', 1, 'Histology: NSCLC', 'blue')

net.cluster(views=[])
net.widget()

### Cell Lines Cluster According to Histology and Mutation
We can see that cell lines cluster according to their histology: almost all NSCLC cell lines (blue column category) cluster together (H2106 is the exception), and almost all SCLC cell lines (red column category) cluster together. We can also see two high-level clusters of PTMs: one with high levels in SCLC cell lines and low levels in NSCLC cell lines, and one with the opposite pattern. The cluster with high levels in SCLC cell lines is mainly composed of phosphorylation, arginine methylation, and lysine acetylation, while the cluster with low levels in SCLC cell lines is almost entirely composed of phosphorylation (red row category).

We also included known mutations in lung cancer cell lines as additional column categories (e.g. mut-TP53). Cell lines appear to cluster according to common mutations in EGFR, KRAS, and RB1. Mutations in these genes may be drivers behind common PTM regulation in these cells.
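The clustering by histology is read off the heatmap visually. One way to put a number on it is to compare how similar cell lines are within a histology group versus between groups. The sketch below does this on synthetic data. Pearson correlation is an assumed similarity measure chosen for the example (not necessarily the distance Clustergrammer uses), and the function name and toy columns are hypothetical.

```python
import numpy as np
import pandas as pd


def histology_separation(df, histology):
    """Mean correlation between cell lines within and between histology groups."""
    corr = df.corr().to_numpy()  # pairwise Pearson correlation of columns
    labels = pd.Series(histology).reindex(df.columns).to_numpy()
    same_group = labels[:, None] == labels[None, :]
    # Count each pair of cell lines once and skip the diagonal.
    each_pair_once = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    within = corr[same_group & each_pair_once].mean()
    between = corr[~same_group & each_pair_once].mean()
    return within, between


# Synthetic example: two groups of cell lines with opposite profiles plus noise.
rng = np.random.default_rng(0)
signature = rng.normal(size=50)
toy = pd.DataFrame(
    {
        "sclc_1": signature + rng.normal(scale=0.3, size=50),
        "sclc_2": signature + rng.normal(scale=0.3, size=50),
        "nsclc_1": -signature + rng.normal(scale=0.3, size=50),
        "nsclc_2": -signature + rng.normal(scale=0.3, size=50),
    }
)
toy_histology = {
    "sclc_1": "SCLC",
    "sclc_2": "SCLC",
    "nsclc_1": "NSCLC",
    "nsclc_2": "NSCLC",
}

within, between = histology_separation(toy, toy_histology)
print(f"within-group: {within:.2f}, between-group: {between:.2f}")
```

A within-group mean that is clearly higher than the between-group mean is consistent with cell lines grouping by histology. The same comparison could in principle be run with a mutation label (e.g. KRAS status) in place of histology.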

### PTMs Cluster According to Type

We see that cell lines cluster according to their histology and that PTMs cluster to some degree according to their modification type. We also see two large clusters of PTMs that have high/low levels in SCLC and NSCLC cell lines, respectively. We will export these clusters using the interactive dendrogram and the ``widget_df`` method for further analysis in a different notebook.

In [3]:
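# Select a cluster in the interactive dendrogram, then capture it with
# widget_df(); repeat the selection for the second cluster before its call.
# ptm_sclc = net.widget_df()
# ptm_nsclc = net.widget_df()
# ptm_sclc.to_csv('histology_clusters/ptm_sclc.txt', sep='\t')
# ptm_nsclc.to_csv('histology_clusters/ptm_nsclc.txt', sep='\t')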
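The export above depends on selecting each cluster by hand in the dendrogram. As a rough non-interactive approximation, rows can be split by the sign of the difference between their mean level in SCLC and in NSCLC cell lines. This is a simplification assumed for illustration: it is not the dendrogram-based selection used in this notebook, and the function name and toy values are hypothetical.

```python
import pandas as pd


def split_by_histology(df, histology):
    """Split rows by whether their mean level is higher in SCLC or in NSCLC."""
    labels = pd.Series(histology).reindex(df.columns)
    sclc_mean = df.loc[:, (labels == "SCLC").to_numpy()].mean(axis=1)
    nsclc_mean = df.loc[:, (labels == "NSCLC").to_numpy()].mean(axis=1)
    diff = sclc_mean - nsclc_mean
    return df[diff > 0], df[diff < 0]


# Hypothetical toy matrix: PTMs as rows, cell lines as columns.
toy = pd.DataFrame(
    {
        "sclc_1": [2.0, -1.5, 0.5],
        "sclc_2": [1.5, -2.0, 1.0],
        "nsclc_1": [-1.0, 1.0, -0.5],
        "nsclc_2": [-2.0, 2.5, 0.0],
    },
    index=["ptm_a", "ptm_b", "ptm_c"],
)
toy_histology = {
    "sclc_1": "SCLC",
    "sclc_2": "SCLC",
    "nsclc_1": "NSCLC",
    "nsclc_2": "NSCLC",
}

high_in_sclc, high_in_nsclc = split_by_histology(toy, toy_histology)
# Without a path, to_csv returns the tab-separated text instead of writing a file.
print(high_in_sclc.to_csv(sep="\t"))
```

Unlike a dendrogram cut, this split assigns every row with a non-zero difference to one side, so weakly differential rows end up in the output as well. A threshold on the difference would be one way to tighten it.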

# Gene Expression Data


In [4]:
net.load_file('../lung_cellline_3_1_16/lung_cl_all_ptm/precalc_processed/CST_CCLE_Exp.txt')
net.set_cat_color('row', 1, 'Data-Type: Exp', 'yellow')
net.cluster(views=[])
net.widget()

We see two large clusters of genes that are up- and down-regulated in SCLC and NSCLC cell lines. This is broadly similar to what we see with the PTM data. We will export these clusters, defined at dendrogram level four, to TSVs for further analysis in another notebook.

In [5]:
# exp_sclc = net.widget_df()
# exp_nsclc = net.widget_df()
# exp_sclc.to_csv('histology_clusters/exp_sclc.txt', sep='\t')
# exp_nsclc.to_csv('histology_clusters/exp_nsclc.txt', sep='\t')

The expression data cover the same cell lines, and based on these data the cell lines again cluster according to their histology.

# Merge PTM and Gene Expression Data
We can merge the two independent data types to identify cell line clusters and measurement clusters that span data types.
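The merged file loaded in the next cell was produced during pre-processing, and its exact construction is not shown in this notebook. A minimal sketch of what such a merge could look like in pandas, assuming both matrices have cell lines as columns: keep the cell lines present in both, then stack the rows and tag each one with its data type. The function name, level names, and toy values are hypothetical.

```python
import pandas as pd


def merge_data_types(ptm, exp):
    """Stack PTM and expression rows over the cell lines present in both."""
    shared = ptm.columns.intersection(exp.columns)
    return pd.concat(
        [ptm[shared], exp[shared]],
        keys=["PTM", "Exp"],  # outer index level records the data type
        names=["data_type", "measurement"],
    )


# Hypothetical toy matrices with partially overlapping cell lines.
ptm = pd.DataFrame(
    [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]],
    index=["site_a", "site_b"],
    columns=["H1", "H2", "H3"],
)
exp = pd.DataFrame(
    [[0.5, 0.1, 0.9]],
    index=["gene_a"],
    columns=["H2", "H3", "H4"],
)

merged = merge_data_types(ptm, exp)
print(merged)
```

Because PTM and expression values are on different scales, some per-row scaling before stacking would probably be needed for the two data types to cluster together meaningfully. That step is left out of this sketch.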

In [6]:
net.load_file('../lung_cellline_3_1_16/lung_cl_all_ptm/precalc_processed/CST_CCLE_merge.txt')
net.cluster(views=[])
net.widget()

As in the two heatmaps above (PTM and expression data), the merged PTM-expression heatmap shows that cell lines cluster according to their histology and that there are two broad clusters of PTMs/genes that are up- and down-regulated in SCLC and NSCLC cell lines.

In [7]:
# merge_sclc = net.widget_df()
# merge_nsclc = net.widget_df()
# merge_sclc.to_csv('histology_clusters/merge_sclc.txt', sep='\t')
# merge_nsclc.to_csv('histology_clusters/merge_nsclc.txt', sep='\t')