# Exploratory Data Analysis
Written by: Adam Li
Exploration of the dataset from https://github.com/Upward-Spiral-Science/data/tree/master/syn-diversity.

We examine two files: the feature matrix (one row per data point, 144 feature columns) and the location matrix (the x, y, z coordinates of each row).

In [1]:
# Import Necessary Libraries
import numpy as np
import os, csv, json

from matplotlib import *
from matplotlib import pyplot as plt

from mpl_toolkits.mplot3d import Axes3D
import scipy
from sklearn.decomposition import PCA
import skimage.measure

# pretty charting
import seaborn as sns
sns.set_palette('muted')
sns.set_style('darkgrid')

%matplotlib inline



In [2]:
#### RUN AT BEGINNING AND TRY NOT TO RUN AGAIN - TAKES WAY TOO LONG ####
# load in the feature data
list_of_features = []
with open('synapsinR_7thA.tif.Pivots.txt.2011Features.txt') as file:
    for line in file:
        inner_list = [float(elt.strip()) for elt in line.split(',')]
        
        # append this row's feature values
        list_of_features.append(inner_list)

# convert to a NumPy array
list_of_features = np.array(list_of_features)

In [3]:
# load in the location data
list_of_locations = []
with open('synapsinR_7thA.tif.Pivots.txt') as file:
    for line in file:
        inner_list = [float(elt.strip()) for elt in line.split(',')]

        # append this row's (x, y, z) coordinates
        list_of_locations.append(inner_list)

# convert to a NumPy array
list_of_locations = np.array(list_of_locations)
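
Because parsing the text files line by line is slow, one option is to parse each file once and cache the result in NumPy's binary format. The following is a minimal sketch, assuming the files contain only comma-separated numbers with no header row. The helper `load_matrix_cached` and the cache file name used below are hypothetical and not part of the original notebook.

```python
import os
import numpy as np

def load_matrix_cached(txt_path, cache_path):
    """Load a comma-separated numeric text file, caching the parsed array.

    cache_path should end in '.npy' (hypothetical name chosen by the caller);
    np.save would otherwise append that extension itself.
    """
    # Reuse the cached binary copy when it already exists.
    if os.path.exists(cache_path):
        return np.load(cache_path)

    # First run: parse the text file, then store the array for next time.
    matrix = np.loadtxt(txt_path, delimiter=',', ndmin=2)
    np.save(cache_path, matrix)
    return matrix
```

For example, `load_matrix_cached('synapsinR_7thA.tif.Pivots.txt', 'locations.npy')` would parse the text file on the first call and read `locations.npy` on later calls.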

## Missing Values?
* Every field in both files parsed as a float without raising an error, so there are no empty or non-numeric entries.
* All values are therefore numeric.
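
One caveat: Python's `float()` also accepts the strings `nan` and `inf`, so a successful parse does not by itself rule out such entries. A quick check along these lines could confirm it. This is a sketch, and the helper name `count_non_finite` is hypothetical.

```python
import numpy as np

def count_non_finite(matrix):
    """Count NaN and infinite entries in a numeric array."""
    matrix = np.asarray(matrix, dtype=float)
    return {
        'nan': int(np.isnan(matrix).sum()),
        'inf': int(np.isinf(matrix).sum()),
    }
```

Calling `count_non_finite(list_of_features)` and `count_non_finite(list_of_locations)` should return zero for both counts if the data really has no missing values.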

In [19]:
def find_first(item, vec):
    """Return the index of the first occurrence of item in vec, or -1 if absent."""
    for i in xrange(len(vec)):
        if item == vec[i]:
            return i
    return -1

x_range = list_of_locations[:, 0]
y_range = list_of_locations[:, 1]
z_range = list_of_locations[:, 2]

# compute each extreme once and reuse it below
mins = [min(x_range), min(y_range), min(z_range)]
maxs = [max(x_range), max(y_range), max(z_range)]

# map each min/max value to the index where it first occurs
min_location = [
    {mins[0]: find_first(mins[0], x_range)},
    {mins[1]: find_first(mins[1], y_range)},
    {mins[2]: find_first(mins[2], z_range)}
]
max_location = [
    {maxs[0]: find_first(maxs[0], x_range)},
    {maxs[1]: find_first(maxs[1], y_range)},
    {maxs[2]: find_first(maxs[2], z_range)}
]

# print some basic stats about the list of features and locations
print "The size of the feature matrix is: ", list_of_features.shape
print "The size of the location matrix is: ", list_of_locations.shape

print "min and max are:"
print mins
print maxs

# for i in range(0, len(min_location)):
#     print json.dumps(min_location[i], indent=4, separators=(',', ': '))
#     print json.dumps(max_location[i], indent=4, separators=(',', ': '))


The size of the feature matrix is:  (1119299, 144)
The size of the location matrix is:  (1119299, 3)
min and max are:
[28.0, 23.0, 2.0]
[1513.0, 12980.0, 40.0]
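
The index of the first minimum or maximum in each column can also be found without a Python loop, since `np.argmin` and `np.argmax` return the first occurrence when there are ties. A minimal sketch follows; the helper name `first_extreme_indices` is hypothetical.

```python
import numpy as np

def first_extreme_indices(locations):
    """Return (argmin, argmax) per column; ties resolve to the first row."""
    locations = np.asarray(locations)
    # axis=0 searches down the rows, giving one index per column (x, y, z).
    return np.argmin(locations, axis=0), np.argmax(locations, axis=0)
```

On the location matrix this returns two length-3 index arrays, which should match the values produced by `find_first` for the same columns.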


## Features 
* f0 = integrated brightness
* f1 = local brightness
* f2 = distance to center of mass
* f3 = moment of inertia around synapse

Joshua says we can throw out f4 and f5?

Synapse Data:
* Excitatory presynaptic: ‘Synap’, ‘Synap’, ‘VGlut1’, ‘VGlut1’, ‘VGlut2’
* Excitatory postsynaptic: ‘psd’, ‘glur2’, ‘nmdar1’, ‘nr2b’, ‘NOS’, ‘Synapo’ (but further away than PSD, gluR2, nmdar1 and nr2b)
* Inhibitory presynaptic: ‘gad’, ‘VGAT’, ‘PV’
* Inhibitory postsynaptic: ‘Gephyr’, ‘GABAR1’, ‘GABABR’, ‘NOS’
* At a very small number of inhibitory synapses: ‘Vglut3’ (presynaptic), ‘CR1’ (presynaptic)
* Other synapses: ‘5HT1A’, ‘TH’, ‘VACht’
* Not at synapses: ‘tubuli’, ‘DAPI’

In [18]:
# Create feature matrices for each feature f0,...,f5.
# The 144 columns are laid out as 6 consecutive blocks of 24 columns, one block per feature.
f0_features = list_of_features[:, 24*0:24*(0+1)]
f1_features = list_of_features[:, 24*1:24*(1+1)]
f2_features = list_of_features[:, 24*2:24*(2+1)]
f3_features = list_of_features[:, 24*3:24*(3+1)]
f4_features = list_of_features[:, 24*4:24*(4+1)]
f5_features = list_of_features[:, 24*5:24*(5+1)]

print "Feature matrix for this certain metric is of size: ", f0_features.shape

Feature matrix for this certain metric is of size:  (1119299, 24)
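
The six slices can also be expressed as a single reshape. The sketch below assumes the column layout implied by the slicing in this notebook (six consecutive blocks of 24 columns); the helper `split_by_feature` and the constants are hypothetical names.

```python
import numpy as np

N_FEATURES = 6    # f0 ... f5
N_BLOCK = 24      # columns per feature block

def split_by_feature(feature_matrix):
    """Reshape (n_rows, 144) into (n_rows, 6, 24) without copying values around.

    cube[:, k, :] then holds the same values as
    feature_matrix[:, 24*k:24*(k+1)].
    """
    feature_matrix = np.asarray(feature_matrix)
    n_rows = feature_matrix.shape[0]
    return feature_matrix.reshape(n_rows, N_FEATURES, N_BLOCK)
```

With this layout, `split_by_feature(list_of_features)[:, 0, :]` corresponds to `f0_features`, and dropping f4 and f5 becomes a slice along the middle axis.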


In [35]:
###### Create Volume Feature Vector ######
# 01: normalize each coordinate by its maximum so it lies in (0, 1]
x_locations = list_of_locations[:,0]/max(x_range)
y_locations = list_of_locations[:,1]/max(y_range)
z_locations = list_of_locations[:,2]/max(z_range)

# 02: combine the coordinates with weights 1, 10 and 100 for x, y and z
feature_location = x_locations + 10*y_locations + 100*z_locations

# result: one scalar per row summarizing its position in space
print feature_location.shape

(1119299,)
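
The same normalize-then-weight step can be written as one vectorized function. This is a sketch of the computation above, not a recommendation of the encoding itself; the helper name `location_feature` is hypothetical. Note that because all three normalized coordinates share the range (0, 1], distinct points can map to the same scalar (for example, normalized (1.0, 0.5, z) and (0.5, 0.55, z) both give 6 + 100z), so the value is a summary rather than a unique position code.

```python
import numpy as np

def location_feature(locations, weights=(1.0, 10.0, 100.0)):
    """Collapse (x, y, z) rows into one scalar per row.

    Each column is divided by its own maximum, then the columns are
    combined as 1*x + 10*y + 100*z (the weights used in the notebook).
    """
    locations = np.asarray(locations, dtype=float)
    normalized = locations / locations.max(axis=0)
    return normalized.dot(np.asarray(weights, dtype=float))
```

For a toy input `[[1, 2, 4], [2, 4, 8]]`, the normalized rows are `(0.5, 0.5, 0.5)` and `(1, 1, 1)`, giving `55.5` and `111.0`.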
