# Plasma vs PMA Phosphorylation Response
White blood cells are a key component of the immune system. Our collaborators at the Icahn School of Medicine Immune Core used 


PMA (phorbol 12-myristate 13-acetate) is a tumor promoter and activator of protein kinase C (PKC) (see [Wiki](https://en.wikipedia.org/wiki/12-O-Tetradecanoylphorbol-13-acetate)). Below we will visualize the effects of PMA treatment on phosphorylation levels (10 phosphorylation markers) in distinct cell types (determined by clustering based on surface markers). 

We will begin by loading the data, selecting equal-sized subsets from each dataset, combining the Plasma and PMA datasets, and normalizing the phosphorylation marker columns (using Z-score). This will give us a matrix with 110,000 Plasma treated cells and 110,000 PMA treated cells. 
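As a rough illustration of what Z-score normalization of marker columns means, here is a minimal sketch in plain Python. It is not Clustergrammer's implementation: the function name `zscore_columns`, the toy values, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Z-score each column of a list-of-rows matrix.

    Assumption: population standard deviation is used; constant columns
    are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy matrix: 4 cells (rows) x 2 hypothetical markers (columns); values are made up.
toy = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
normalized = zscore_columns(toy)
```

After normalization each non-constant column has mean 0 and unit standard deviation, so markers measured on different scales become comparable in the heatmap.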

In [1]:
import pandas as pd
from clustergrammer_widget import *
net = Network(clustergrammer_widget)

In [2]:
# Plasma Treated
#################
net.load_file('../cytof_data/Plasma_UCT.txt')
# subsample data so that both treatments have the same number of cells
net.random_sample(axis='row', num_samples=110000, random_state=99)
df_plasma = net.export_df()

# PMA Treated
###############
net.load_file('../cytof_data/PMA_UCT.txt')
# subsample data so that both treatments have the same number of cells
net.random_sample(axis='row', num_samples=110000, random_state=99)
df_pma = net.export_df()
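The idea behind drawing the same number of cells from each treatment with a fixed seed can be sketched in plain Python as follows. This only illustrates the concept: `subsample_rows` is a hypothetical helper, the toy rows are made up, and Clustergrammer's `random_sample` may be implemented differently.

```python
import random

def subsample_rows(rows, num_samples, random_state):
    """Draw num_samples rows without replacement using a seeded generator."""
    rng = random.Random(random_state)
    return rng.sample(rows, num_samples)

# Toy datasets of unequal size; each row is a (treatment, cell index) tuple.
plasma = [("plasma", i) for i in range(50)]
pma = [("pma", i) for i in range(80)]

# Use the size of the smaller dataset so both treatments contribute equally.
n = min(len(plasma), len(pma))
plasma_sub = subsample_rows(plasma, n, random_state=99)
pma_sub = subsample_rows(pma, n, random_state=99)
```

Fixing the seed makes the subset reproducible, and equal group sizes keep one treatment from dominating the combined matrix.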

Here we will use Pandas to concatenate the Plasma and PMA datasets into one 220,000 row (cell) matrix. 

In [3]:
df_merge = pd.concat([df_plasma, df_pma])
print(df_merge.shape)

(220000, 28)


### Set Cell Type Colors
Here we will manually set the cell type colors so they will be consistent across all visualizations. We have 16 unique cell types defined. 

In [4]:
# manually set marker-type (column) colors
net.set_cat_color('col', 1, 'Marker-type: phospho marker', 'red')
net.set_cat_color('col', 1, 'Marker-type: surface marker', 'blue')

# manually set row colors: downsample
net.set_cat_color('row', 2, 'Majority-Category: B cells', '#22316C')
net.set_cat_color('row', 2, 'Majority-Category: Basophils', '#000033')
net.set_cat_color('row', 2, 'Majority-Category: CD14hi monocytes', 'yellow')
net.set_cat_color('row', 2, 'Majority-Category: CD14low monocytes', '#93b8bf')
net.set_cat_color('row', 2, 'Majority-Category: CD1c DCs', '#3636e2')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells', 'blue')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells_CD127hi', '#FF6347')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells CD161hi', '#F87531')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells_Tregs', '#8B4513')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells+CD27hi', '#330303')
net.set_cat_color('row', 2, 'Majority-Category: CD8 Tcells', '#ffb247')
net.set_cat_color('row', 2, 'Majority-Category: Neutrophils', 'purple')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi', 'red')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi_CD57hi', 'orange')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD56hi', '#e052e5')
net.set_cat_color('row', 2, 'Majority-Category: Undefined', 'gray')

# manually set row colors: subsample
net.set_cat_color('row', 2, 'B cells', '#22316C')
net.set_cat_color('row', 2, 'Basophils', '#000033')
net.set_cat_color('row', 2, 'CD14hi monocytes', 'yellow')
net.set_cat_color('row', 2, 'CD14low monocytes', '#93b8bf')
net.set_cat_color('row', 2, 'CD1c DCs', '#3636e2')
net.set_cat_color('row', 2, 'CD4 Tcells', 'blue')
net.set_cat_color('row', 2, 'CD4 Tcells_CD127hi', '#FF6347')
net.set_cat_color('row', 2, 'CD4 Tcells CD161hi', '#F87531')
net.set_cat_color('row', 2, 'CD4 Tcells_Tregs', '#8B4513')
net.set_cat_color('row', 2, 'CD4 Tcells+CD27hi', '#330303')
net.set_cat_color('row', 2, 'CD8 Tcells', '#ffb247')
net.set_cat_color('row', 2, 'Neutrophils', 'purple')
net.set_cat_color('row', 2, 'NK cells_CD16hi', 'red')
net.set_cat_color('row', 2, 'NK cells_CD16hi_CD57hi', 'orange')
net.set_cat_color('row', 2, 'NK cells_CD56hi', '#e052e5')
net.set_cat_color('row', 2, 'Undefined', 'gray')

# manually set treatment colors
net.set_cat_color('row', 1, 'Majority-Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Majority-Treatment: PMA', 'red')

net.set_cat_color('row', 1, 'Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Treatment: PMA', 'red')

# Plasma vs PMA Phosphorylation Subsample View
Since we cannot directly visualize a 220,000 row matrix using Clustergrammer, we will try two approaches to visualize the data: subsampling and downsampling. First, we will use subsampling, which randomly selects 2000 cells out of the 220,000 cells in our combined dataset. 

In [5]:
net.load_df(df_merge)
net.filter_cat('col', 1, 'Marker-type: phospho marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.random_sample(axis='row', num_samples=2000, random_state=99)
net.clip(-10, 10)
net.cluster(views=[])
net.widget()

Above we see that we have a roughly equal number of Plasma and PMA treated cells, which is expected since we randomly selected cells from our combined dataset. We see that PMA treated cells tend to cluster separately from Plasma treated cells regardless of cell type. From the bottom cluster of PMA treated cells it is clear that PMA treatment increases the following phosphorylations:
* pCREB
* pMAPKAP2
* pERK1 2
* pp38

There is some clustering based on cell type; the most obvious is the cluster of CD14hi monocytes (yellow) at the bottom of the heatmap with high levels of the above four phosphorylations. 

We can also find associations between cell types and specific phosphorylations. For instance, reordering the rows based on pCREB levels shows that CD14hi monocytes are among the cells with the highest pCREB levels.

# Plasma vs PMA Phosphorylation Downsample View
To obtain a more global view we can perform K-means downsampling of the cells. Below we have performed K-means clustering with 2000 clusters. K-means groups similar cells together, which can help us preserve rare populations while preventing large and homogeneous populations from overpowering the visualization. 
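To make the idea of K-means downsampling concrete, here is a minimal sketch in plain Python. It uses a basic Lloyd's iteration with a deterministic farthest-point initialization, which is an illustrative choice and not necessarily the variant Clustergrammer uses. The function names and the toy points are assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_downsample(points, k, n_iter=10):
    """Return (centers, sizes): one representative row and a cell count per cluster."""
    centers = init_centers(points, k)
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its member cells.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    sizes = [labels.count(c) for c in range(k)]
    return centers, sizes

# Toy data: six near-identical cells and two cells from a rare, distinct population.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0)]
centers, sizes = kmeans_downsample(points, k=2)
```

In this toy case the six homogeneous cells collapse into a single row while the two rare cells keep a row of their own, which is the behavior the downsampled view relies on.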

In [6]:
net.load_df(df_merge)
net.filter_cat('col', 1, 'Marker-type: phospho marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
ds_data = net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10, 10)
net.cluster(views=[])
net.widget()

Above we see a downsampled view of 2000 cell clusters. Each cluster is assigned categorical information based on the categories of the majority of the cells in the cluster: e.g. Majority-Treatment and Majority-Category (cell type). Clusters vary in size from 531 cells to 1 cell. 
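The majority-vote labeling described above can be sketched with the standard library as follows. This is an illustration under stated assumptions, not Clustergrammer's code: `majority_label` and `summarize_clusters` are hypothetical helpers and the cluster assignments below are made up.

```python
from collections import Counter, defaultdict

def majority_label(labels):
    """Return the most frequent label (ties are not handled specially here)."""
    return Counter(labels).most_common(1)[0][0]

def summarize_clusters(cluster_ids, treatments, cell_types):
    """Assign each cluster its majority treatment, majority cell type, and size."""
    members = defaultdict(list)
    for cid, treatment, cell_type in zip(cluster_ids, treatments, cell_types):
        members[cid].append((treatment, cell_type))
    summary = {}
    for cid, cells in members.items():
        summary[cid] = {
            "Majority-Treatment": majority_label([t for t, _ in cells]),
            "Majority-Category": majority_label([c for _, c in cells]),
            "number in clust": len(cells),
        }
    return summary

# Made-up assignments of 8 cells to 3 clusters.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2]
treatments = ["PMA", "PMA", "Plasma", "Plasma", "Plasma", "Plasma", "PMA", "PMA"]
cell_types = ["CD14hi monocytes", "CD14hi monocytes", "CD8 Tcells", "B cells",
              "B cells", "CD8 Tcells", "B cells", "CD8 Tcells"]
summary = summarize_clusters(cluster_ids, treatments, cell_types)
```

Note that a cell type spread thinly across many clusters rarely wins the vote, so it can be underrepresented among cluster labels even when it is common among cells.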

### Plasma vs PMA Cell Clusters
First, we see that we have obtained 1150 PMA cell clusters and only 850 Plasma cell clusters. This implies that there may be more homogeneous behavior in the Plasma treated cells, since homogeneous cells will be merged into fewer cell clusters. 

### Distributions of Cell Types
We see that the diversity of cell types is lower in the downsampled data than in the subsampled data. This is a result of the cell type of each cell cluster being defined by its majority cell type. It suggests that phosphorylation behavior is not strongly cell type specific for every cell type. Below are the breakdowns of cell types from subsampled and downsampled data. 

![cell_type_comparisons](img/plasma_vs_pma_cell_type_breakdowns.png)
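A breakdown like the one in the figure can be computed by counting labels in each view and comparing the fractions. The sketch below shows the calculation only: the label lists are made up for illustration and do not reproduce the values in the figure, and `type_fractions` is a hypothetical helper.

```python
from collections import Counter

def type_fractions(labels):
    """Fraction of rows carrying each cell type label."""
    counts = Counter(labels)
    total = len(labels)
    return {name: count / total for name, count in counts.items()}

# Made-up labels: one per cell (subsampled) or per cell cluster (downsampled).
subsampled = ["CD8 Tcells"] * 4 + ["CD14hi monocytes"] * 1 + ["B cells"] * 5
downsampled = ["CD8 Tcells"] * 2 + ["CD14hi monocytes"] * 3 + ["B cells"] * 5

sub_frac = type_fractions(subsampled)
down_frac = type_fractions(downsampled)

# Positive values mean the type is over-represented after downsampling.
change = {name: down_frac.get(name, 0.0) - sub_frac[name] for name in sub_frac}
```

A positive change marks a cell type that claims more cell clusters than its share of cells, and a negative change marks one that rarely reaches a majority.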

#### Underrepresentation of CD8 Tcells and CD4 Tcells+CD27hi
The two cell types that are most underrepresented in the downsampled data are CD8 Tcells (2nd most common cell type in subsampled data) and CD4 Tcells+CD27hi (3rd most common cell type in subsampled data). We can see from the subsampled heatmap above that these cell types are largely uniformly distributed in the heatmap, which might explain why they less frequently reach the majority in a cluster. 

#### CD14hi monocytes
The most over-represented cell type in the downsampled data is CD14hi monocytes (increased from ~8% in subsampled data to ~20% in downsampled data). We can also see from the downsampled heatmap that CD14hi monocytes cell clusters are relatively small (they contain a small number of cells, see the 'number in clust' category). 

This implies that CD14hi monocytes cluster together (which appears to be the case from the subsampled heatmap) but have heterogeneous phosphorylation behavior that prevents them from being merged into large homogeneous clusters.  

PMA treated CD14hi monocytes form a large cluster with high phosphorylation of:
* CREB
* MAPKAP2
* p38
* ERK1 2

To aid visualization of this cell type we will generate a heatmap with only CD14hi monocytes cell clusters below:

In [7]:
net.filter_cat('row', 2, 'Majority-Category: CD14hi monocytes')
net.cluster(views=[])
net.widget()

Above we see that Plasma and PMA treated cell-clusters form two large clusters. These clusters are largely defined based on the four phosphorylations discussed above. 

# Conclusions

This notebook demonstrates that we can use Clustergrammer to visualize CyTOF data and identify cell type specific behaviors, e.g. the distinct phosphorylation behavior of CD14hi monocytes after PMA treatment. 