# Exploratory Analysis Round 2

Having looked at the data in an unfiltered way, we now take into account what we already know about it: for example, the channels, the channel types, and the proteins each channel labels.

Reference: http://www.nature.com/articles/sdata201446

Co-localization:
"Examining the cross-correlations at small 2d shifts between images reveals that pairs of antibodies which are expected to colocalize within either pre- or postsynaptic compartments (for example, Synapsin1 and vGluT1 or PSD95 and GluR2, respectively) have sharp peaks of correlation, while pairs of antibodies which represent associated pre- and postsynaptic compartments (for example, Synapsin1 and PSD95) have broader, more diffuse cross-correlation peaks"

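To make the quoted idea concrete, here is a minimal sketch of a cross-correlation map over small 2D shifts. It is not the method used in the referenced paper; the function name `shifted_correlation`, the `max_shift` parameter, and the synthetic images are all assumptions made for illustration. A sharp peak in the resulting map would correspond to tight co-localization, a broad one to looser association.

```python
import numpy as np

def shifted_correlation(img_a, img_b, max_shift=3):
    """Pearson correlation between img_a and img_b for every (dy, dx) shift
    with |dy|, |dx| <= max_shift. Hypothetical helper, for illustration only."""
    size = 2 * max_shift + 1
    corr = np.zeros((size, size))
    # Central crop of img_a, so every shifted window of img_b stays in bounds.
    a = img_a[max_shift:-max_shift, max_shift:-max_shift]
    h, w = a.shape
    for i, dy in enumerate(range(-max_shift, max_shift + 1)):
        for j, dx in enumerate(range(-max_shift, max_shift + 1)):
            b = img_b[max_shift + dy:max_shift + dy + h,
                      max_shift + dx:max_shift + dx + w]
            corr[i, j] = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return corr

# Synthetic data (an assumption): img_b is img_a moved by 1 row and 2 columns.
rng = np.random.RandomState(0)
img_a = rng.rand(64, 64)
img_b = np.roll(img_a, shift=(1, 2), axis=(0, 1))

corr = shifted_correlation(img_a, img_b, max_shift=3)
peak = np.unravel_index(corr.argmax(), corr.shape)
# Convert the peak position back to a (dy, dx) shift relative to the centre.
print("peak shift (dy, dx): (%d, %d)" % (peak[0] - 3, peak[1] - 3))
```

On real channel images, the width of the peak around its maximum is the quantity of interest, not only its position.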


In [2]:
# Import Necessary Libraries
import numpy as np
import os, csv, json

import matplotlib
from matplotlib import pyplot as plt

from mpl_toolkits.mplot3d import Axes3D
import scipy
from sklearn.decomposition import PCA
import skimage.measure

# pretty charting
import seaborn as sns
sns.set_palette('muted')
sns.set_style('darkgrid')

%matplotlib inline

In [21]:
# channel = ['Synap','Synap','VGlut1','VGlut1','VGlut2','Vglut3',
#            'psd','glur2','nmdar1','nr2b','gad','VGAT', 'PV','Gephyr',
#            'GABAR1','GABABR','CR1','5HT1A', 'NOS','TH','VACht',
#            'Synapo','tubuli','DAPI']

channel = ['Synap_01','Synap_02','VGlut1_01','VGlut1_02','VGlut2','Vglut3',
           'psd','glur2','nmdar1','nr2b','gad','VGAT', 'PV','Gephyr',
           'GABAR1','GABABR','CR1','5HT1A', 'NOS','TH','VACht',
           'Synapo','tubuli','DAPI']

channeltype = ['ex.pre','ex.pre','ex.pre','ex.pre','ex.pre','in.pre.small', 
               'ex.post','ex.post','ex.post','ex.post','in.pre','in.pre', 
               'in.pre','in.post','in.post','in.post','in.pre.small','other',
               'ex.post','other','other','ex.post','none','none']
print channel
print channeltype

['Synap_01', 'Synap_02', 'VGlut1_01', 'VGlut1_02', 'VGlut2', 'Vglut3', 'psd', 'glur2', 'nmdar1', 'nr2b', 'gad', 'VGAT', 'PV', 'Gephyr', 'GABAR1', 'GABABR', 'CR1', '5HT1A', 'NOS', 'TH', 'VACht', 'Synapo', 'tubuli', 'DAPI']
['ex.pre', 'ex.pre', 'ex.pre', 'ex.pre', 'ex.pre', 'in.pre.small', 'ex.post', 'ex.post', 'ex.post', 'ex.post', 'in.pre', 'in.pre', 'in.pre', 'in.post', 'in.post', 'in.post', 'in.pre.small', 'other', 'ex.post', 'other', 'other', 'ex.post', 'none', 'none']
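
Since `channel` and `channeltype` are parallel lists, it can help to group the channel names by type. The sketch below copies the two lists from the cell above; the name `channels_by_type` is introduced here for illustration only.

```python
from collections import OrderedDict

channel = ['Synap_01','Synap_02','VGlut1_01','VGlut1_02','VGlut2','Vglut3',
           'psd','glur2','nmdar1','nr2b','gad','VGAT', 'PV','Gephyr',
           'GABAR1','GABABR','CR1','5HT1A', 'NOS','TH','VACht',
           'Synapo','tubuli','DAPI']

channeltype = ['ex.pre','ex.pre','ex.pre','ex.pre','ex.pre','in.pre.small',
               'ex.post','ex.post','ex.post','ex.post','in.pre','in.pre',
               'in.pre','in.post','in.post','in.post','in.pre.small','other',
               'ex.post','other','other','ex.post','none','none']

# Map each channel type to the channel names that carry it,
# keeping the order in which the types first appear.
channels_by_type = OrderedDict()
for name, ctype in zip(channel, channeltype):
    channels_by_type.setdefault(ctype, []).append(name)

for ctype, names in channels_by_type.items():
    print("%s: %s" % (ctype, ", ".join(names)))
```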


In [3]:
# load in the location (pivot) data: one comma-separated row per line
list_of_locations = []
with open('data/synapsinR_7thA.tif.Pivots.txt') as f:
    for line in f:
        inner_list = [float(elt.strip()) for elt in line.split(',')]
        
        # append this row of coordinates to the list of locations
        list_of_locations.append(inner_list)

# convert to a numpy array
list_of_locations = np.array(list_of_locations)

In [9]:
#### RUN AT BEGINNING AND TRY NOT TO RUN AGAIN - TAKES WAY TOO LONG ####
# path to the normalized feature file
csvfile = "data_normalized/shortenedFeatures_normalized.txt"

# load in the feature data
list_of_features = []
with open(csvfile) as f:
    for line in f:
        inner_list = [float(elt.strip()) for elt in line.split(',')]
        
        # append this row to the list of features
        list_of_features.append(inner_list)

# convert to a numpy array
list_of_features = np.array(list_of_features)
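
The line-by-line parsing used in the two loading cells can also be written with `np.loadtxt`. The sketch below uses a tiny in-memory stand-in (the `text` string is made up) to check that both approaches give the same array; whether this is actually faster on the full feature file has not been measured here.

```python
import io
import numpy as np

# Made-up stand-in for a comma-separated file such as the ones loaded above.
text = u"1.0,2.0,3.0\n4.5,5.5,6.5\n"

# Manual parsing, as in the cells above.
manual = np.array([[float(elt.strip()) for elt in line.split(',')]
                   for line in io.StringIO(text)])

# Same data via np.loadtxt, which accepts a filename or a file-like object.
loaded = np.loadtxt(io.StringIO(text), delimiter=',')

print(np.array_equal(manual, loaded))
```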

In [5]:
# print the min and max of each location column, then the extent (max - min)
print min(list_of_locations[:,0]), " ", max(list_of_locations[:,0])
print min(list_of_locations[:,1]), " ", max(list_of_locations[:,1])
print min(list_of_locations[:,2]), " ", max(list_of_locations[:,2])

print abs(min(list_of_locations[:,0]) - max(list_of_locations[:,0]))
print abs(min(list_of_locations[:,1]) - max(list_of_locations[:,1]))
print abs(min(list_of_locations[:,2]) - max(list_of_locations[:,2]))

28.0   1513.0
23.0   12980.0
2.0   40.0
1485.0
12957.0
38.0
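
The same per-column summary can be computed in one pass with NumPy's axis-wise reductions. The toy `locations` array below is an assumption: its rows are invented and only reuse the extreme values printed above, so the extents match the ones reported.

```python
import numpy as np

# Invented rows that reuse the reported minima and maxima of each column.
locations = np.array([[28.0, 23.0, 2.0],
                      [1513.0, 12980.0, 40.0],
                      [700.0, 6000.0, 21.0]])

lo = locations.min(axis=0)   # per-column minimum
hi = locations.max(axis=0)   # per-column maximum
extent = hi - lo             # per-column range

for col in range(locations.shape[1]):
    print("column %d: min=%s max=%s extent=%s"
          % (col, lo[col], hi[col], extent[col]))
```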


In [24]:
# Make a feature dictionary for all the different protein expressions.
# Each channel has 4 feature columns, spaced 24 columns apart (one per block of 24 channels).
features = {}
for idx, chan in enumerate(channel):
    indices = [0+idx, 24+idx, 48+idx, 72+idx]
    features[chan] = list_of_features[:,indices]
    
print "The number of protein expressions is:"
print "This number should be 24: ", len(features.keys())

The number of protein expressions is:
This number should be 24:  24
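
The index pattern `[0+idx, 24+idx, 48+idx, 72+idx]` picks one column from each of four consecutive blocks of 24 columns. As a sanity check, the sketch below shows that this is equivalent to reshaping the columns into `(blocks, channels)` and slicing one channel. It assumes exactly 96 feature columns laid out as 4 blocks of 24, which the indexing suggests but the notebook does not state; the `toy` array is made up.

```python
import numpy as np

n_channels = 24
n_blocks = 4

# Made-up feature matrix: 3 rows, 96 columns (4 blocks of 24 channels).
toy = np.arange(3 * n_blocks * n_channels, dtype=float).reshape(3, -1)

idx = 6  # position of 'psd' in the channel list

# Column picking, as in the cell above.
by_index = toy[:, [0 + idx, 24 + idx, 48 + idx, 72 + idx]]

# Equivalent view: rows x blocks x channels, then take one channel.
by_reshape = toy.reshape(-1, n_blocks, n_channels)[:, :, idx]

print(np.array_equal(by_index, by_reshape))
```

If the real matrix has extra columns beyond the first 96, they would need to be sliced off before reshaping.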


In [26]:
# count the distinct channel types
print "The number of unique channel types is: ", len(np.unique(channeltype))
print np.unique(channeltype)

The number of unique channel types is:  7
['ex.post' 'ex.pre' 'in.post' 'in.pre' 'in.pre.small' 'none' 'other']
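
Beyond the number of distinct types, it is useful to know how many channels fall under each one. A small sketch using `np.unique` with `return_counts=True`; the `channeltype` list is copied from the earlier cell and `type_counts` is a name introduced here.

```python
import numpy as np

channeltype = ['ex.pre','ex.pre','ex.pre','ex.pre','ex.pre','in.pre.small',
               'ex.post','ex.post','ex.post','ex.post','in.pre','in.pre',
               'in.pre','in.post','in.post','in.post','in.pre.small','other',
               'ex.post','other','other','ex.post','none','none']

# Sorted unique labels and how often each one occurs.
types, counts = np.unique(channeltype, return_counts=True)
type_counts = dict(zip(types.tolist(), counts.tolist()))

for ctype in types.tolist():
    print("%s: %d" % (ctype, type_counts[ctype]))
```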
