# Overview of CyTOF Data
The original data were provided as two tab-separated matrices:
* ``Plasma.txt`` (original name: 160202_CGI002_Plasma_Plasma_singlets.fcs_raw_events.txt)
* ``PMA.txt`` (original name: 160202_CGI002_PMA_PMA_singlets.fcs_raw_events.txt)

These files had one row per cell and one column per measured dimension (e.g. antibodies). I kept only the dimensions of interest, the surface marker and phospho marker antibody columns, and renamed the files. I then semi-automatically assigned 'roughly-defined' cell types using hierarchical clustering together with the surface markers associated with each cell type.

The resulting files are ``Plasma_CT.txt`` and ``PMA_CT.txt``.
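The cells below rely on category information being stored in tuple-style row and column labels (the row label printed in `In [10]` has the form `('Cell-13293', 'Treatment: PMA', 'Cell Types: T cells')`). As a minimal sketch of that layout, assuming pandas `MultiIndex` labels, a matrix of this shape can be built as follows. The marker names and values are hypothetical placeholders, not the actual antibody channels.

```python
import numpy as np
import pandas as pd

# Hypothetical marker names; the real files use the CyTOF antibody channels.
surface = ["surface_1", "surface_2"]
phospho = ["phospho_1"]

# Column labels: (marker name, marker-type category).
cols = pd.MultiIndex.from_tuples(
    [(m, "Marker-type: surface marker") for m in surface]
    + [(m, "Marker-type: phospho marker") for m in phospho]
)

# Row labels: (cell name, treatment category, cell-type category).
rows = pd.MultiIndex.from_tuples([
    ("Cell-0", "Treatment: Plasma", "Cell Types: T cells"),
    ("Cell-1", "Treatment: Plasma", "Cell Types: NK cells"),
    ("Cell-2", "Treatment: Plasma", "Cell Types: T cells"),
])

# Random placeholder intensities; real values come from the CyTOF export.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(len(rows), len(cols))), index=rows, columns=cols)

print(df.index.tolist()[0])  # ('Cell-0', 'Treatment: Plasma', 'Cell Types: T cells')
```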

# Plasma

In [1]:
import pandas as pd
import numpy as np
from clustergrammer_widget import *
net = Network()

In [2]:
net.load_file('cytof_data/Plasma_CT.txt')
net.random_sample(axis='row',num_samples=110000, random_state=99)
df_plasma = net.export_df()
df_plasma.shape

(110000, 28)

In [3]:
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
ds_data_plasma = net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.dat['mat'].shape

(1000, 28)
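`net.downsample(ds_type='kmeans', ...)` reduces the 110,000 rows to 1,000. As a rough illustration of the idea, the sketch below runs plain Lloyd's k-means in NumPy and treats each centroid as one downsampled row. This is an assumption about the general technique, not clustergrammer's internal code (which may use a different k-means variant and initialisation), and `kmeans_downsample` is a hypothetical helper. The all-pairs distance array needs memory proportional to rows × clusters × markers, so this version is only suitable for small inputs.

```python
import numpy as np

def _assign(X, centroids):
    # Squared Euclidean distance from every row to every centroid: shape (n, k).
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans_downsample(X, k, n_iter=20, seed=0):
    """Reduce the rows of X to k centroids with Lloyd's k-means.

    Returns (centroids, labels), where labels[i] is the centroid index of row i.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct random rows.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    return centroids, _assign(X, centroids)

# Toy usage: 500 "cells" x 28 "markers" downsampled to 10 rows.
X = np.random.default_rng(1).normal(size=(500, 28))
centroids, labels = kmeans_downsample(X, k=10)
print(centroids.shape, labels.shape)  # (10, 28) (500,)
```

The `labels` array plays the same role as the per-cell array returned by `net.downsample` in this notebook: one entry per original row.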

In [4]:
# ds_data_plasma has one entry per original (sampled) cell and can be used to link the original cells to their downsampled rows
ds_data_plasma.shape

(110000,)
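Assuming `ds_data_plasma` holds one cluster index per original cell (its shape, one entry for each of the 110,000 sampled cells, suggests this), it can be used to summarise which original cells each downsampled row stands for. The 'Majority-Treatment' and 'Majority-Cell Types' row categories colored in the next cell look like summaries of this kind. The pandas sketch below computes a cluster size, a majority label and its share per cluster on hypothetical toy data; it is not clustergrammer's own implementation.

```python
import pandas as pd

# Hypothetical inputs: one cluster label and one treatment per original cell.
labels = [0, 0, 0, 1, 1, 2]
treatment = ["Plasma", "Plasma", "PMA", "PMA", "PMA", "Plasma"]
cells = pd.DataFrame({"cluster": labels, "treatment": treatment})

# How many original cells each downsampled row represents.
sizes = cells.groupby("cluster").size()

# Most frequent treatment in each cluster, and the fraction of cells that have it.
majority = cells.groupby("cluster")["treatment"].agg(lambda s: s.value_counts().idxmax())
share = cells.groupby("cluster")["treatment"].agg(lambda s: s.value_counts(normalize=True).max())

print(pd.DataFrame({"size": sizes, "majority": majority, "share": share}))
```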

In [5]:
net.dat['mat'].shape

(1000, 28)

In [6]:
# clip z-scores to [-10, 10] since we do not care about extreme outliers
net.clip(-10,10)
# net.write_matrix_to_tsv('cytof_data/ds_plasma.txt')

net.set_cat_color('row', 1, 'Majority-Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Majority-Treatment: PMA', 'red')
net.set_cat_color('row', 2, 'Majority-Cell Types: T cells', '#000084')
net.set_cat_color('row', 2, 'Majority-Cell Types: NK cells', 'green')
net.set_cat_color('row', 2, 'Majority-Cell Types: CD8 T cells', 'gray')
net.set_cat_color('row', 2, 'Majority-Cell Types: Monocytes and Granulocytes', 'orange')
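The column z-scoring and clipping steps can be sketched in pandas as below. This assumes a population standard deviation (`ddof=0`); clustergrammer's exact normalization details may differ, and `zscore_and_clip` is a hypothetical helper. In a column of 201 cells where 200 values are 0 and one is very large, the outlier's z-score is √200 ≈ 14.1, which the clip pulls back to 10. A constant column has zero standard deviation and would yield NaN here.

```python
import numpy as np
import pandas as pd

def zscore_and_clip(df, lower=-10.0, upper=10.0):
    """Z-score each column (population SD), then clip values to [lower, upper]."""
    z = (df - df.mean()) / df.std(ddof=0)
    return z.clip(lower=lower, upper=upper)

# Toy column: 200 zeros and one extreme value.
df = pd.DataFrame({"marker": [0.0] * 200 + [1e6]})
out = zscore_and_clip(df)
print(out["marker"].max())     # 10.0 (clipped from about 14.14)
print(out["marker"].iloc[0])   # about -0.0707, i.e. -1/sqrt(200)
```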

In [7]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

## Plasma Surface Markers Only

In [8]:
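net.load_df(df_plasma)
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.clip(-10,10)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())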

# PMA

In [9]:
net.load_file('cytof_data/PMA_CT.txt')
net.random_sample(axis='row',num_samples=110000, random_state=99)
df_pma = net.export_df()

In [10]:
df_pma.index.tolist()[0]

('Cell-13293', 'Treatment: PMA', 'Cell Types: T cells')

In [11]:
net.load_df(df_pma)
df_pma.shape

(110000, 28)

In [12]:
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.clip(-10,10)
net.dat['mat'].shape
# net.write_matrix_to_tsv('cytof_data/ds_pma.txt')

In [13]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

## PMA Surface Markers Only

In [14]:
net.load_df(df_pma)
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.clip(-10,10)
net.dat['mat'].shape

(1000, 18)
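`filter_cat('col', 1, 'Marker-type: surface marker')` keeps only the columns whose category at position 1 is the surface-marker label, which is why the matrix drops from 28 to 18 columns. An equivalent selection on a pandas frame with tuple column labels, using hypothetical marker names, looks like this (a sketch of the selection logic, not clustergrammer's code):

```python
import pandas as pd

# Hypothetical columns: (marker name, marker-type category).
cols = pd.MultiIndex.from_tuples([
    ("marker_a", "Marker-type: surface marker"),
    ("marker_b", "Marker-type: phospho marker"),
    ("marker_c", "Marker-type: surface marker"),
])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Boolean mask over columns: True where the second label level matches.
mask = df.columns.get_level_values(1) == "Marker-type: surface marker"
surface = df.loc[:, mask]

print(surface.columns.get_level_values(0).tolist())  # ['marker_a', 'marker_c']
```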

In [15]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

# Merge Plasma and PMA

In [16]:
df_merge = pd.concat([df_plasma, df_pma])

In [17]:
df_merge.shape

(220000, 28)

In [18]:
net.load_df(df_merge)

In [19]:
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
net.dat['mat'].shape

(2000, 28)

In [20]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

# Plasma vs PMA based on Surface markers only

In [21]:
df_merge = pd.concat([df_plasma, df_pma])
net.load_df(df_merge)

In [22]:
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
net.dat['mat'].shape

(2000, 18)

In [23]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

# Plasma vs PMA based on Phospho markers only

In [24]:
df_merge = pd.concat([df_plasma, df_pma])
net.load_df(df_merge)

In [25]:
net.filter_cat('col', 1, 'Marker-type: phospho marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
net.dat['mat'].shape

(2000, 10)

In [26]:
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

PMA- and plasma-treated cells separate more clearly on the phospho markers than on the surface markers. This is expected: PMA is a protein kinase C activator, so it should change phosphorylation levels more directly than surface marker expression.
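One simple way to put a number on this visual impression, offered as a suggestion rather than the analysis performed here, is to compute for each marker the absolute standardized mean difference (Cohen's d with a pooled standard deviation) between the two treatments and average it within each marker type. The sketch below runs on synthetic data in which only the 'phospho' columns are shifted; `mean_separation` is a hypothetical helper, and in practice the inputs would be the plasma and PMA rows of the marker matrix.

```python
import numpy as np

def cohens_d(a, b):
    """Absolute standardized mean difference using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return abs(a.mean() - b.mean()) / pooled

def mean_separation(plasma, pma, marker_types):
    """Average |d| per marker type; plasma and pma are (cells, markers) arrays."""
    d = np.array([cohens_d(plasma[:, j], pma[:, j]) for j in range(plasma.shape[1])])
    types = np.asarray(marker_types)
    return {str(t): float(d[types == t].mean()) for t in np.unique(types)}

# Synthetic data: 3 "surface" and 2 "phospho" markers; only phospho shifts under PMA.
rng = np.random.default_rng(0)
types = ["surface"] * 3 + ["phospho"] * 2
plasma = rng.normal(size=(1000, 5))
pma = rng.normal(size=(1000, 5)) + np.array([0.0, 0.0, 0.0, 2.0, 2.0])

result = mean_separation(plasma, pma, types)
print(result)  # phospho separation near 2, surface separation near 0
```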