# Algorithms.md

-------

-------

# Describe Approach

## Algorithm

**Description:** 

We first construct a similarity graph using the input data points.  Then we compute the Laplacian and its eigenvectors to do dimensionality reduction in order to make the clusters more obvious.  Lastly we use a classical clustering algorithm (e.g. k-means) to obtain the clusters.

**Input:** 
  1. Similarity matrix
  2. number of clusters

**Output:** Clusters

### Pseudocode

**function** connectivity (graph, num_clusters)
  1. Construct similarity graph from adjacency matrix of the graph.
  2. Compute the normalized Laplacian by doing: $L = D^{-1/2}(D - A)D^{-1/2}$, where L is the normalized laplacian, D is the Degree matrix (a diagonal matrix that measures the degree at each node), and A is the aforementioned adjacency matrix.
  3. Obtain the eigenvector associated with the second smallest eigenvalue, known as the Fiedler Vector, by taking the second column in the Laplacian constructed in step 2.
  4. The Fiedler vector should have dimensions nx1, where n is the number of nodes.  Now cluster the n components of the Fiedler vector by region label.
  5. Calculate the average for each region, and iterate through the regions to find out which region's average is closest to another's
  7. return clusters $A_1, ..., A_k$ where $A_i = \{j \ | \ y_j \in C_i\}$
  
**endfunction**
  
-------

-------


## Simulation


#### Easy
- Data will be contructed from several gaussian clusters, such that these meand of these clusters are pretty far apart.

We expect spectral clustering to do well in this scenario because spectral clustering will perform dimensionality reduction and hopefully make the data more obviously separable.

# Code

## Imports and plotting functions

In [440]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
from plotly import tools
import plotly

import os
#os.chdir('C:/Users/L/Documents/Homework/BME/Neuro Data I/Data/')

import csv,gc  # garbage memory collection :)

import numpy as np
from numpy import linalg as LA
# import matplotlib.pyplot as plt
# from mpl_toolkits.mplot3d import axes3d

# from mpl_toolkits.mplot3d import axes3d
# from collections import namedtuple

import csv
import re
import matplotlib
import time
import seaborn as sns

from collections import OrderedDict

import ast


In [441]:
import networkx as nx
import math
from sklearn.manifold import spectral_embedding as se

import scipy.sparse as sp

In [442]:
# from matplotlib.pyplot import *
# import matplotlib.pyplot as plt
# %matplotlib inline

In [443]:
plotly.offline.init_notebook_mode()

In [444]:
def plot_connectivity(dictionary):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    Xe = []
    Ye = []
    data = []
    avg_dict = OrderedDict()
    i = 0
    for key, region in dictionary.iteritems():
        X = []
        Y = []
#         Z = []
        tmp_x = []
        tmp_y = []
        
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        for coord in region:    
            X.append(coord[0])
            Y.append(coord[1])
            tmp_x.append(coord[0])
            tmp_y.append(coord[1])
        avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
            
        trace_scatter = Scatter(
                x = X, 
                y = Y,
                name=key,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        avg_scatter = Scatter(
                x = [avg_dict[key][0]],
                y = [avg_dict[key][1]],
                mode='markers',
                name=key+'_avg',
                marker=dict(
                    size=10,
                    color=region_col_lit,
                    colorscale='Viridis',
                    line=dict(
                        width = 2,
                        color = 'rgb(0, 0, 0)'
                    )
                )
        )
        data.append(trace_scatter)
        data.append(avg_scatter)
        
    print('avg_dict')
    print(avg_dict)
    print(avg_dict[key][0])
        
    locations = avg_dict.keys()
#     print locations
    dist_adj = np.zeros([len(locations), len(locations)])
    
    for i1, key1 in enumerate(avg_dict):
        for i2, key2 in enumerate(avg_dict):
#             print('key1: %s' % key1)
#             print('key2: %s' % key2)
            p1 = np.asarray(avg_dict[key1])
            p2 = np.asarray(avg_dict[key2])
            dist = LA.norm(p1 - p2)
#             print('points')
#             print(p1)
#             print(p2)

#             print(dist)
            dist_adj[i1,i2] = dist
    
    print('adj mat')
    print(dist_adj)

    for i, key in enumerate(avg_dict):
#         if i + 1 == len(locations):
#             continue
#         print 'start' + str(i)
        tmp = []
        for j in range(len(locations)):
            if j == i:
                continue
            p1 = np.asarray(avg_dict[key])
            p2 = np.asarray(avg_dict[locations[j]])
            dist = LA.norm(p1 - p2)
#             dist = (math.pow(avg_dict[key][0][0]-avg_dict[locations[j]][0][0],2) + 
#                        math.pow(avg_dict[key][0][1]-avg_dict[locations[j]][0][1],2))
            tmp.append(dist)
#             print dist
        newmin = tmp.index(min(tmp))
        if newmin >= i:
            newmin += 1
#         print newmin
        print "region " + key + ": " + locations[newmin]
        tmp2 = avg_dict.keys()[newmin]
        Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
        Ye+=[avg_dict[key][1],avg_dict[tmp2][1],None]
#         Ze+=[dictionary[key][0][2],dictionary[tmp2][0][2],None]
#     print('Xe and Ye')
#     print Xe
    print Ye
    
    trace_edge = Scatter(x=Xe,
               y=Ye,
               mode='lines',
               line=Line(color='rgb(0,0,0)', width=3),
               hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [445]:
def plot_connectivity3d(dictionary):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    Xe = []
    Ye = []
    Ze = []
    data = []
    avg_dict = OrderedDict()
    i = 0
    for key, region in dictionary.iteritems():
        X = []
        Y = []
        Z = []
        tmp_x = []
        tmp_y = []
        tmp_z = []
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        for coord in region:    
            X.append(coord[0])
            Y.append(coord[1])
            Z.append(coord[2])
            tmp_x.append(coord[0])
            tmp_y.append(coord[1])
            tmp_z.append(coord[2])
        avg_dict[key] = [[np.mean(tmp_x), np.mean(tmp_y), np.mean(tmp_z)]]
            
        trace_scatter = Scatter3d(
                x = X, 
                y = Y,
                z = Z,
                name=key,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        avg_scatter = Scatter3d(
                x = [avg_dict[key][0][0]],
                y = [avg_dict[key][0][1]],
                z = [avg_dict[key][0][2]],
                mode='markers',
                name=key+'_avg',
                marker=dict(
                    size=10,
                    color=region_col_lit,
                    colorscale='Viridis',
                    line=dict(
                        width = 2,
                        color = 'rgb(0, 0, 0)'
                    )
                )
        )
        data.append(trace_scatter)
        data.append(avg_scatter)
        
    locations = avg_dict.keys()
#     print locations
    for i, key in enumerate(avg_dict):
#         if i + 1 == len(locations):
#             continue
#         print 'start' + str(i)
        tmp = []
        for j in range(len(locations)):
            if j == i:
                continue
            p1 = np.asarray(avg_dict[key][0])
            p2 = np.asarray(avg_dict[locations[j]][0])
            dist = LA.norm(p1 - p2)
#             dist = (math.pow(avg_dict[key][0][0]-avg_dict[locations[j]][0][0],2) + 
#                        math.pow(avg_dict[key][0][1]-avg_dict[locations[j]][0][1],2) +
#                        math.pow(avg_dict[key][0][2]-avg_dict[locations[j]][0][2],2))
            tmp.append(dist)
#             print dist
        newmin = tmp.index(min(tmp))
        if newmin >= i:
            newmin += 1
#         print newmin
        print "region " + key + ": " + locations[newmin]
        tmp2 = avg_dict.keys()[newmin]
        Xe+=[avg_dict[key][0][0],avg_dict[tmp2][0][0],None]
        Ye+=[avg_dict[key][0][1],avg_dict[tmp2][0][1],None]
        Ze+=[avg_dict[key][0][2],avg_dict[tmp2][0][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter3d(x=Xe,
                y=Ye,
                z=Ze,
                mode='lines',
                line=Line(color='rgb(0,0,0)', width=3),
                hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [446]:
def plot_graphml(G):
    current_palette = sns.color_palette("husl", len(G.nodes())/100)
    Xe = []
    Ye = []
    data = []
    i = 0
    
    X = []
    Y = []
    regiondict = {}
    for r, node in enumerate(G.nodes()):
        tmp = G.node[node]
#         print tmp
        pos = tmp['pos']
        region = tmp['region']
        if str(region) not in regiondict:
            regiondict[str(region)] = [pos]
        else:
            tmp = regiondict[str(region)]
            tmp.append(pos)
            regiondict[str(region)] = tmp
#         print region
#         print pos
#     print regiondict
    for region, reg in enumerate(regiondict):
        for pos in regiondict[reg]:
#             print pos
            X.append(pos[0])
            Y.append(pos[1])
                
        region_col = current_palette[region]
        region_col_lit = 'rgb' + str(region_col)
        
        trace_scatter = Scatter(
                x = X, 
                y = Y,
                name=region,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        data.append(trace_scatter)
        X = []
        Y = []
        
    for r, edge in enumerate(G.edges()):
        firstpt = G.node[edge[0]]
        secondpt = G.node[edge[1]]
#         print firstpt
        dist = LA.norm(firstpt['pos'] - secondpt['pos'])
#         tmp.append(dist)
#         print dist

        Xe+=[firstpt['pos'][0],secondpt['pos'][0],None]
        Ye+=[firstpt['pos'][1],secondpt['pos'][1],None]
#         Ze+=[dictionary[key][0][2],dictionary[tmp2][0][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter(x=Xe,
               y=Ye,
               mode='lines',
               line=Line(color='rgb(0,0,0)', width=2),
               hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [447]:
def plot_eigenvector(dictionary, eigenvector_num):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    data = []
    avg_dict = OrderedDict()
    eigenvector_index = eigenvector_num - 1
    i = 0
    for key, region in dictionary.iteritems():  
        print(key)
        X = []
        Y = []
        y_vals = []
        
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        
        for j in range(len(region) - 1):
#             Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
            Y += [region[j][eigenvector_index], region[j+1][eigenvector_index], None]
            X += [j, j+1, None]
            y_vals.append(region[j][eigenvector_index])
        y_vals.append(region[len(region) - 1][eigenvector_index])
#         print(Y)
        avg = np.mean(y_vals)
        avg_dict[key] = avg

        trace_edge = Scatter(
                    x=X,
                    y=Y,
                    name=key,
                    mode='lines',
                    line=Line(color=region_col_lit, width=3),
                    hoverinfo='none'
        )
        data.append(trace_edge)
        
        Ya, Xa = [], []
        for j in range(len(region) - 1):
#             Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
            Ya += [avg, avg, None]
            Xa += [j, j+1, None]
        
        trace_edge = Scatter(
                    x=Xa,
                    y=Ya,
                    name=key + 'avg',
                    mode='lines',
                    line=Line(color=region_col_lit, width=3),
                    hoverinfo='none'
        )
        data.append(trace_edge)
        
#         color=region_col_lit, #'purple',                # set color to an array/list of desired values
#                     colorscale='Viridis'

#     for key, value in avg_dict.iteritems():
#         trace_edge = Scatter(
#                     y=value,
#                     name=key,
#                     mode='lines',
#                     line=Line(color=region_col_lit, width=3),
#                     hoverinfo='none'
#         )
#         data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

## Function for generating easy data

In [448]:
# create 2D plot showing algorithm
# 4 'regions' indicated by 4 x 100 points: a,b,c,d
# obtain the average coordinates for each region to obtain the representative point for each region
# and connect the 4 points such that if any of the points were chosen as starting point, it would only connect to nearest 'region'
# loop through all points and set them as starting point, connectivity map will self-terminate when the last two points point to each other as mutual 'nearest regions'

# test with spherical gaussian mixture
np.random.seed(123456789)
# np.random.seed(1234567)

a_norm = 0.5 * np.random.randn(100, 1) + np.array([-0.1])
b_norm = 0.5 * np.random.randn(100, 1) + np.array([2.1])
c_norm = 0.5 * np.random.randn(100, 1) + np.array([-1.5])

z = np.zeros((len(a_norm), 1))

a_norm = np.hstack((a_norm, z))
b_norm = np.hstack((b_norm, z))
c_norm = np.hstack((c_norm, z))


norm_dict = OrderedDict([('a',a_norm),('b',b_norm),('c',c_norm)])

# connect the 400 points using epsilon ball 
radius = 0.4
allpt = np.concatenate([a_norm,b_norm,c_norm])

# print(allpt)

G=nx.Graph()

# generate networkx graph object to obtain adjacency matrix, edges have weight of 1, 
# with epsilon ball radius according to above
for i in range(len(allpt)):
    G.add_node(str(i),pos=allpt[i],region=i/100)
    for j in range(i+1,len(allpt)):
        dist = LA.norm(allpt[i] - allpt[j])
        if dist < radius:
            G.add_edge(str(i),str(j),distance=dist)

In [449]:
plot_connectivity(norm_dict)

avg_dict
OrderedDict([('a', [-0.041694897750491421, 0.0]), ('b', [2.0088111177663377, 0.0]), ('c', [-1.5634126617706636, 0.0])])
-1.56341266177
adj mat
[[ 0.          2.05050602  1.52171776]
 [ 2.05050602  0.          3.57222378]
 [ 1.52171776  3.57222378  0.        ]]
region a: c
region b: a
region c: a
[0.0, 0.0, None, 0.0, 0.0, None, 0.0, 0.0, None]


In [450]:
print(nx.adjacency_matrix(G))

  (0, 1)	1
  (0, 3)	1
  (0, 6)	1
  (0, 20)	1
  (0, 40)	1
  (0, 42)	1
  (0, 45)	1
  (0, 47)	1
  (0, 50)	1
  (0, 61)	1
  (0, 62)	1
  (0, 63)	1
  (0, 75)	1
  (0, 79)	1
  (0, 83)	1
  (0, 84)	1
  (0, 85)	1
  (0, 86)	1
  (0, 87)	1
  (0, 88)	1
  (0, 117)	1
  (0, 122)	1
  (0, 125)	1
  (0, 130)	1
  (0, 135)	1
  :	:
  (299, 135)	1
  (299, 136)	1
  (299, 137)	1
  (299, 143)	1
  (299, 148)	1
  (299, 162)	1
  (299, 164)	1
  (299, 176)	1
  (299, 182)	1
  (299, 208)	1
  (299, 229)	1
  (299, 232)	1
  (299, 243)	1
  (299, 246)	1
  (299, 247)	1
  (299, 250)	1
  (299, 251)	1
  (299, 252)	1
  (299, 253)	1
  (299, 254)	1
  (299, 294)	1
  (299, 295)	1
  (299, 296)	1
  (299, 297)	1
  (299, 298)	1


In [451]:
se_result = spec_clust(G, 3)

In [452]:
print(se_result)

[[ 0.41620447 -0.35212685  0.68056477]
 [ 0.43285265 -0.36154286  0.62571167]
 [ 0.39123221 -0.28691984 -0.04397421]
 [ 0.3163154  -0.26926612  0.54729585]
 [ 0.34961176 -0.27145098  0.16611432]
 [ 0.36625994 -0.28717812  0.21794269]
 [ 0.44950083 -0.37410819  0.62678421]
 [ 0.38290812 -0.28033695 -0.04968709]
 [ 0.48279719  0.68594212  0.06844991]
 [ 0.25804677  0.37430131  0.0444967 ]
 [ 0.36625994 -0.28215735  0.14271961]
 [ 0.39123221 -0.28801457 -0.0295113 ]
 [ 0.52441764  0.74920978  0.07849594]
 [ 0.27469495  0.35715282  0.00790247]
 [ 0.44117674  0.60861805  0.04504637]
 [ 0.23307451  0.29408395 -0.00141936]
 [ 0.45782492  0.6598222   0.07440806]
 [ 0.38290812  0.55382877  0.06432944]
 [ 0.30799131  0.44621788  0.0525405 ]
 [ 0.34961176 -0.27145098  0.16611432]
 [ 0.35793585 -0.30426621  0.6114019 ]
 [ 0.50776946  0.7268102   0.07739138]
 [ 0.50776946  0.72840533  0.07901293]
 [ 0.37458403 -0.28795555  0.13764387]
 [ 0.29966722  0.43424722  0.05121623]
 [ 0.42452856 -0.23882317

In [453]:
se_regions = get_dict_easy(G, se_result)

[38, 266, 192, 123, 39, 267, 196, 126, 41, 268, 219, 218, 221, 220, 223, 222, 225, 224, 227, 226, 29, 30, 31, 32, 25, 26, 27, 28, 34, 35, 152, 151, 150, 149, 156, 155, 154, 153, 147, 146, 264, 265, 262, 263, 260, 261, 258, 259, 256, 257, 72, 71, 74, 73, 68, 67, 70, 69, 65, 64, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 288, 287, 286, 285, 284, 283, 282, 281, 291, 290, 110, 111, 108, 109, 114, 115, 112, 48, 102, 103, 211, 210, 213, 212, 215, 214, 217, 216, 205, 204]


In [454]:
print(se_regions)

OrderedDict([('a', array([[ 0.14150952,  0.11907414, -0.05124387],
       [ 0.14983361,  0.07939455, -0.09113255],
       [ 0.14150952,  0.01683677, -0.13093889],
       [ 0.54106582, -0.33222036, -0.68014508],
       [ 0.34961176, -0.1918844 , -0.4669954 ],
       [ 0.44117674, -0.29308373, -0.41422774],
       [ 0.27469495, -0.12109916, -0.3557901 ],
       [ 0.20810224, -0.06578694, -0.25544118],
       [ 0.3163154 , -0.17145116, -0.42380849],
       [ 0.34961176, -0.1918844 , -0.4669954 ],
       [ 0.40788038, -0.23863481, -0.53891735],
       [ 0.48279719, -0.31379921, -0.5164583 ],
       [ 0.35793585, -0.19704122, -0.47768777],
       [ 0.52441764, -0.31949133, -0.6672316 ],
       [ 0.44117674, -0.29308373, -0.41422774],
       [ 0.34961176, -0.26345655,  0.05339188],
       [ 0.49944537, -0.30926191, -0.62303583],
       [ 0.44117674, -0.25983376, -0.57903581],
       [ 0.44950083, -0.29828768, -0.42472943],
       [ 0.44117674, -0.29308373, -0.41422774],
       [ 0.41620447, 

In [455]:
plot_eigenvector1(se_regions)

a
b
c


In [456]:
plot_connectivity(se_regions)

avg_dict
OrderedDict([('a', [0.41745308730413894, -0.24206126859434632]), ('b', [0.39581045466232945, 0.5559201846032501]), ('c', [0.38806905144802811, -0.31385891600902399])])
0.388069051448
adj mat
[[ 0.          0.79827489  0.07757786]
 [ 0.79827489  0.          0.86981355]
 [ 0.07757786  0.86981355  0.        ]]
region a: c
region b: a
region c: a
[-0.24206126859434632, -0.31385891600902399, None, 0.5559201846032501, -0.24206126859434632, None, -0.31385891600902399, -0.24206126859434632, None]


In [457]:
def gen_easy():
    # create 2D plot showing algorithm
    # 4 'regions' indicated by 4 x 100 points: a,b,c,d
    # obtain the average coordinates for each region to obtain the representative point for each region
    # and connect the 4 points such that if any of the points were chosen as starting point, it would only connect to nearest 'region'
    # loop through all points and set them as starting point, connectivity map will self-terminate when the last two points point to each other as mutual 'nearest regions'

    # test with spherical gaussian mixture
    np.random.seed(123456789)
    # np.random.seed(1234567)

    a_norm = 0.5 * np.random.randn(100, 1) + np.array([-0.1])
    b_norm = 0.5 * np.random.randn(100, 1) + np.array([2.1])
    c_norm = 0.5 * np.random.randn(100, 1) + np.array([-1.5])

    z = np.zeros((len(a_norm), 1))

    a_norm = np.hstack((a_norm, z))
    b_norm = np.hstack((b_norm, z))
    c_norm = np.hstack((c_norm, z))


    norm_dict = OrderedDict([('a',a_norm),('b',b_norm),('c',c_norm)])

    # connect the 400 points using epsilon ball 
    radius = 0.4
    allpt = np.concatenate([a_norm,b_norm,c_norm])

    # print(allpt)

    G=nx.Graph()

    # generate networkx graph object to obtain adjacency matrix, edges have weight of 1, 
    # with epsilon ball radius according to above
    for i in range(len(allpt)):
        G.add_node(str(i),pos=allpt[i],region=i/100)
        for j in range(i+1,len(allpt)):
            dist = LA.norm(allpt[i] - allpt[j])
            if dist < radius:
                G.add_edge(str(i),str(j),distance=dist)
                
    return G, norm_dict

In [458]:
def gen_easy2():
    # Same as gen_easy except the centers are random
    # create 2D plot showing algorithm
    # 4 'regions' indicated by 4 x 100 points: a,b,c,d
    # obtain the average coordinates for each region to obtain the representative point for each region
    # and connect the 4 points such that if any of the points were chosen as starting point, it would only connect to nearest 'region'
    # loop through all points and set them as starting point, connectivity map will self-terminate when the last two points point to each other as mutual 'nearest regions'

    # test with spherical gaussian mixture
#     np.random.seed(123456789)
    np.random.seed(1234567)
    
    a_center = np.random.rand() * 10
    b_center = np.random.rand() * 10
    c_center = np.random.rand() * 10
    
    print('centers: %f, %f, %f' % (a_center, b_center, c_center))

    a_norm = 0.5 * np.random.randn(100, 1) + np.array([a_center])
    b_norm = 0.5 * np.random.randn(100, 1) + np.array([b_center])
    c_norm = 0.5 * np.random.randn(100, 1) + np.array([c_center])

    z = np.zeros((len(a_norm), 1))

    a_norm = np.hstack((a_norm, z))
    b_norm = np.hstack((b_norm, z))
    c_norm = np.hstack((c_norm, z))


    norm_dict = OrderedDict([('a',a_norm),('b',b_norm),('c',c_norm)])

    # connect the 400 points using epsilon ball 
    radius = 0.4
    allpt = np.concatenate([a_norm,b_norm,c_norm])

    # print(allpt)

    G=nx.Graph()

    # generate networkx graph object to obtain adjacency matrix, edges have weight of 1, 
    # with epsilon ball radius according to above
    for i in range(len(allpt)):
        G.add_node(str(i),pos=allpt[i],region=i/100)
        for j in range(i+1,len(allpt)):
            dist = LA.norm(allpt[i] - allpt[j])
            if dist < radius:
                G.add_edge(str(i),str(j),distance=dist)
                
    return G, norm_dict

In [459]:
a_norm = 0.5 * np.random.randn(100, 1) + np.array([-0.1])
print(a_norm)

[[-0.34958243]
 [ 0.24122297]
 [ 0.13074795]
 [ 0.23550063]
 [-0.84807258]
 [-0.19242836]
 [-0.13374992]
 [ 0.42598973]
 [ 0.15139973]
 [ 0.52802992]
 [-0.26036861]
 [ 0.11372274]
 [ 0.15977123]
 [-0.17497218]
 [-0.02396473]
 [-0.58922548]
 [ 0.11628304]
 [ 0.56592432]
 [-1.15902272]
 [-0.0612423 ]
 [-0.15682324]
 [-0.11400999]
 [-0.03033174]
 [ 0.24089282]
 [ 0.31118568]
 [-0.20051767]
 [ 0.13431938]
 [ 0.6805202 ]
 [-0.51675037]
 [-0.06846987]
 [-0.51807763]
 [ 0.58406794]
 [ 0.65731699]
 [-0.54711979]
 [-0.13922163]
 [-0.52139375]
 [ 0.2255946 ]
 [-0.36942627]
 [-0.13236217]
 [-0.95055759]
 [ 0.00914935]
 [ 0.22579308]
 [-0.06190635]
 [-0.26526319]
 [ 0.75672401]
 [ 0.14858745]
 [ 0.00401314]
 [-0.32123246]
 [-0.1592458 ]
 [-0.42769894]
 [ 0.03787497]
 [ 0.18902649]
 [ 0.45308464]
 [-0.53054678]
 [ 0.34195678]
 [ 0.55578683]
 [-0.71714131]
 [-0.31178707]
 [-0.514352  ]
 [-0.02371956]
 [-0.52682935]
 [ 0.81477083]
 [ 0.1636268 ]
 [-0.6970885 ]
 [-0.02959405]
 [ 0.33166333]
 [-0.48614

In [460]:
def gen_easy3():
    # create 2D plot showing algorithm
    # 4 'regions' indicated by 4 x 100 points: a,b,c,d
    # obtain the average coordinates for each region to obtain the representative point for each region
    # and connect the 4 points such that if any of the points were chosen as starting point, it would only connect to nearest 'region'
    # loop through all points and set them as starting point, connectivity map will self-terminate when the last two points point to each other as mutual 'nearest regions'

    # test with spherical gaussian mixture
    np.random.seed(123456789)
    # np.random.seed(1234567)

    a_norm = 0.5 * np.random.randn(10, 1) + np.array([-0.1])
    b_norm = 0.5 * np.random.randn(10, 1) + np.array([1])
    c_norm = 0.5 * np.random.randn(10, 1) + np.array([-1.5])
    d_norm = 0.5 * np.random.randn(10, 1) + np.array([3])

    z = np.zeros((len(a_norm), 1))

    a_norm = np.hstack((a_norm, z))
    b_norm = np.hstack((b_norm, z))
    c_norm = np.hstack((c_norm, z))
    d_norm = np.hstack((d_norm, z))


    norm_dict = OrderedDict([('a',a_norm),('b',b_norm),('c',c_norm),('d',d_norm)])

    # connect the 400 points using epsilon ball 
    radius = 0.4
    allpt = np.concatenate([a_norm,b_norm,c_norm,d_norm])

    # print(allpt)

    G=nx.Graph()

    # generate networkx graph object to obtain adjacency matrix, edges have weight of 1, 
    # with epsilon ball radius according to above
    for i in range(len(allpt)):
        G.add_node(str(i),pos=allpt[i],region=i/100)
        for j in range(i+1,len(allpt)):
            dist = LA.norm(allpt[i] - allpt[j])
            if dist < radius:
                G.add_edge(str(i),str(j),distance=dist)
                
    return G, norm_dict

## Function for generating hard data

In [461]:
def gen_hard():
    # create 2D plot showing algorithm
    # 4 'regions' indicated by 4 x 100 points: a,b,c,d
    # obtain the average coordinates for each region to obtain the representative point for each region
    # and connect the 4 points such that if any of the points were chosen as starting point, it would only connect to nearest 'region'
    # loop through all points and set them as starting point, connectivity map will self-terminate when the last two points point to each other as mutual 'nearest regions'

    # test with spherical gaussian mixture
    np.random.seed(123456789)
    a_norm = 0.5 * np.random.randn(100, 2) + np.array([-0.1,0])
    b_norm = 0.5 * np.random.randn(100, 2) + np.array([0,2.1])
    c_norm = 0.5 * np.random.randn(100, 2) + np.array([2,2.1])
    d_norm = 0.5 * np.random.randn(100, 2) + np.array([2.1,0])

    norm_dict = OrderedDict([('a',a_norm),('b',b_norm),('c',c_norm),('d',d_norm)])

    # connect the 400 points using epsilon ball 
    radius = 0.4
    allpt = np.concatenate([a_norm,b_norm,c_norm,d_norm])

    G=nx.Graph()

    # generate networkx graph object to obtain adjacency matrix, edges have weight of 1, 
    # with epsilon ball radius according to above
    for i in range(len(allpt)):
        G.add_node(str(i),pos=allpt[i],region=i/100)
        for j in range(i+1,len(allpt)):
            dist = LA.norm(allpt[i] - allpt[j])
            if dist < radius:
                G.add_edge(str(i),str(j),distance=dist)
                
    return G, norm_dict

In [462]:
def get_connectivity(eig_dict):
    eigenvector_index = 1 # the second smallest eigenvector
    avg_dict = {}
    con_dict = {}
    
    # Getting the average second eigenvector component for each of the regions
    for key, region in eig_dict.iteritems():
        print(key)
        y_vals = []
        
        for j in range(len(region)):
            y_vals.append(region[j][eigenvector_index])
        avg = np.mean(y_vals)
        avg_dict[key] = avg
    
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            diff = np.sqrt(np.square(avg - avg2))
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[key] = [min_key, min_diff]
    
    con_dict = sorted(con_dict.items())
    
    return con_dict

In [463]:
def get_connectivity_tot(eig_dict):
    eigenvector_index = 1 # the second smallest eigenvector
    avg_dict = {}
    con_dict = {}
    
    # Getting the average second eigenvector component for each of the regions
    for key, region in eig_dict.iteritems():
        print(key)
        y_vals = []
        
        for j in range(len(region)):
            y_vals.append(region[j])
        avg = np.mean(y_vals)
        avg_dict[key] = avg
    
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            diff = np.sqrt(np.square(avg - avg2))
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[key] = [min_key, min_diff]
    
    con_dict = sorted(con_dict.items())
    
    return con_dict

## Function for doing the clustering

In [464]:
def spec_clust(graphx, num_components):
    adj_mat = nx.adjacency_matrix(graphx)
#     result = se(adj_mat, n_components=num_components, drop_first=True)
    result = se(adj_mat, n_components=num_components, drop_first=False)

    return result

In [465]:
def spec_clust_drop(graphx, num_components):
    adj_mat = nx.adjacency_matrix(graphx)
    result = se(adj_mat, n_components=num_components, drop_first=True)
#     result = se(adj_mat, n_components=num_components, drop_first=False)

    return result

In [466]:
def get_dict_easy(graphx, a2out):
    nodelist = graphx.nodes()
    a_arr = np.zeros(100, dtype=np.int)
    b_arr = np.zeros(100, dtype=np.int)
    c_arr = np.zeros(100, dtype=np.int)
    for ind, i in enumerate(nodelist):
#         print('i: %s' % i)
        region = int(float(i))/100
        place = int(float(i)) % 100
        if region == 0:
            a_arr[place] = int(ind)
    #         a_list.append(int(float(i)))
        elif region == 1:
            b_arr[place] = int(ind)
    #         b_list.append(int(float(i)))
        elif region == 2:
            c_arr[place] = int(ind)
    #         c_list.append(int(float(i)))

    a_region = a2out[a_arr.tolist()]  # V[0:9,:].mean(axis=0)
    print a_arr.tolist()
    b_region = a2out[b_arr.tolist()]  # V[10:19,:].mean(axis=0)
    c_region = a2out[c_arr.tolist()]  # V[20:29,:].mean(axis=0)
    se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region)])
    
#     print(a_region)
    
    return se_regions

In [467]:
def get_dict_easy3(graphx, a2out):
    nodelist = graphx.nodes()
    a_arr = np.zeros(10, dtype=np.int)
    b_arr = np.zeros(10, dtype=np.int)
    c_arr = np.zeros(10, dtype=np.int)
    d_arr = np.zeros(10, dtype=np.int)
    for ind, i in enumerate(nodelist):
#         print('i: %s' % i)
        region = int(float(i))/10
        place = int(float(i)) % 10
        if region == 0:
            a_arr[place] = int(ind)
    #         a_list.append(int(float(i)))
        elif region == 1:
            b_arr[place] = int(ind)
    #         b_list.append(int(float(i)))
        elif region == 2:
            c_arr[place] = int(ind)
    #         c_list.append(int(float(i)))
        elif region == 3:
            d_arr[place] = int(ind)
    #         d_list.append(int(float(i)))

    a_region = a2out[a_arr.tolist()]  # V[0:9,:].mean(axis=0)
    print a_arr.tolist()
    b_region = a2out[b_arr.tolist()]  # V[10:19,:].mean(axis=0)
    c_region = a2out[c_arr.tolist()]  # V[20:29,:].mean(axis=0)
    d_region = a2out[d_arr.tolist()]  # V[20:29,:].mean(axis=0)
    se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])
    
#     print(a_region)
    
    return se_regions

In [468]:
def get_dict_hard(graphx, a2out):
    nodelist = graphx.nodes()
    a_arr = np.zeros(100, dtype=np.int)
    b_arr = np.zeros(100, dtype=np.int)
    c_arr = np.zeros(100, dtype=np.int)
    d_arr = np.zeros(100, dtype=np.int)
    for ind, i in enumerate(nodelist):
#         print('i: %s' % i)
        region = int(float(i))/100
        place = int(float(i)) % 100
        if region == 0:
            a_arr[place] = int(ind)
    #         a_list.append(int(float(i)))
        elif region == 1:
            b_arr[place] = int(ind)
    #         b_list.append(int(float(i)))
        elif region == 2:
            c_arr[place] = int(ind)
    #         c_list.append(int(float(i)))
        elif region == 3:
            d_arr[place] = int(ind)
    #         d_list.append(int(float(i)))

    a_region = a2out[a_arr.tolist()]  # V[0:9,:].mean(axis=0)
    print a_arr.tolist()
    b_region = a2out[b_arr.tolist()]  # V[10:19,:].mean(axis=0)
    c_region = a2out[c_arr.tolist()]  # V[20:29,:].mean(axis=0)
    d_region = a2out[d_arr.tolist()]  # V[20:29,:].mean(axis=0)
    se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])
    
    return se_regions

# Performing simulations

## Easy Simulations

### Easy Simulation 1

In [469]:
G, norm_dict = gen_easy()

In [470]:
# connected points are regions a to b to c
plot_connectivity(norm_dict)

avg_dict
OrderedDict([('a', [-0.041694897750491421, 0.0]), ('b', [2.0088111177663377, 0.0]), ('c', [-1.5634126617706636, 0.0])])
-1.56341266177
adj mat
[[ 0.          2.05050602  1.52171776]
 [ 2.05050602  0.          3.57222378]
 [ 1.52171776  3.57222378  0.        ]]
region a: c
region b: a
region c: a
[0.0, 0.0, None, 0.0, 0.0, None, 0.0, 0.0, None]


In [471]:
# plot the points and edges of the networkx graph object with epsilon ball of radius = 0.4
plot_graphml(G)

In [472]:
se_result = spec_clust(G, 2)

In [473]:
print(se_result)

[[ 0.41620447 -0.35212685]
 [ 0.43285265 -0.36154286]
 [ 0.39123221 -0.28691984]
 [ 0.3163154  -0.26926612]
 [ 0.34961176 -0.27145098]
 [ 0.36625994 -0.28717812]
 [ 0.44950083 -0.37410819]
 [ 0.38290812 -0.28033695]
 [ 0.48279719  0.68594212]
 [ 0.25804677  0.37430131]
 [ 0.36625994 -0.28215735]
 [ 0.39123221 -0.28801457]
 [ 0.52441764  0.74920978]
 [ 0.27469495  0.35715282]
 [ 0.44117674  0.60861805]
 [ 0.23307451  0.29408395]
 [ 0.45782492  0.6598222 ]
 [ 0.38290812  0.55382877]
 [ 0.30799131  0.44621788]
 [ 0.34961176 -0.27145098]
 [ 0.35793585 -0.30426621]
 [ 0.50776946  0.7268102 ]
 [ 0.50776946  0.72840533]
 [ 0.37458403 -0.28795555]
 [ 0.29966722  0.43424722]
 [ 0.42452856 -0.23882317]
 [ 0.52441764 -0.33323079]
 [ 0.557714   -0.35280446]
 [ 0.45782492 -0.2738383 ]
 [ 0.41620447 -0.28264687]
 [ 0.53274173 -0.32491314]
 [ 0.54106582 -0.33788635]
 [ 0.29134313 -0.14438304]
 [ 0.50776946  0.72371617]
 [ 0.20810224 -0.06578694]
 [ 0.557714   -0.35089398]
 [ 0.51609355  0.74132903]
 

In [474]:
se_regions = get_dict_easy(G, se_result)

[38, 266, 192, 123, 39, 267, 196, 126, 41, 268, 219, 218, 221, 220, 223, 222, 225, 224, 227, 226, 29, 30, 31, 32, 25, 26, 27, 28, 34, 35, 152, 151, 150, 149, 156, 155, 154, 153, 147, 146, 264, 265, 262, 263, 260, 261, 258, 259, 256, 257, 72, 71, 74, 73, 68, 67, 70, 69, 65, 64, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 288, 287, 286, 285, 284, 283, 282, 281, 291, 290, 110, 111, 108, 109, 114, 115, 112, 48, 102, 103, 211, 210, 213, 212, 215, 214, 217, 216, 205, 204]


In [475]:
plot_eigenvector(se_regions, 2)

a
b
c


In [476]:
c = get_connectivity(se_regions)
print(c)

a
b
c
[('a', ['c', 0.071797647414666932]), ('b', ['a', 0.79798145319760361]), ('c', ['a', 0.071797647414666932])]


In [477]:
plot_connectivity(se_regions)

avg_dict
OrderedDict([('a', [0.41745308730408648, -0.24206126859444488]), ('b', [0.39581045466245768, 0.55592018460315873]), ('c', [0.3880690514479524, -0.31385891600911181])])
0.388069051448
adj mat
[[ 0.          0.79827489  0.07757786]
 [ 0.79827489  0.          0.86981355]
 [ 0.07757786  0.86981355  0.        ]]
region a: c
region b: a
region c: a
[-0.24206126859444488, -0.31385891600911181, None, 0.55592018460315873, -0.24206126859444488, None, -0.31385891600911181, -0.24206126859444488, None]


In [478]:
se_result = spec_clust_drop(G, 3)
se_regions = get_dict_easy(G, se_result)

[38, 266, 192, 123, 39, 267, 196, 126, 41, 268, 219, 218, 221, 220, 223, 222, 225, 224, 227, 226, 29, 30, 31, 32, 25, 26, 27, 28, 34, 35, 152, 151, 150, 149, 156, 155, 154, 153, 147, 146, 264, 265, 262, 263, 260, 261, 258, 259, 256, 257, 72, 71, 74, 73, 68, 67, 70, 69, 65, 64, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 288, 287, 286, 285, 284, 283, 282, 281, 291, 290, 110, 111, 108, 109, 114, 115, 112, 48, 102, 103, 211, 210, 213, 212, 215, 214, 217, 216, 205, 204]


In [479]:
plot_connectivity3d(se_regions)

region a: c
region b: c
region c: a


### Easy Simulation 2

In [480]:
G, norm_dict = gen_easy2()

centers: 2.370292, 0.076484, 0.198303


In [481]:
# connected points are regions a to b to c
plot_connectivity(norm_dict)

avg_dict
OrderedDict([('a', [2.3180504243863123, 0.0]), ('b', [0.12126558740378701, 0.0]), ('c', [0.22178415175057509, 0.0])])
0.221784151751
adj mat
[[ 0.          2.19678484  2.09626627]
 [ 2.19678484  0.          0.10051856]
 [ 2.09626627  0.10051856  0.        ]]
region a: c
region b: c
region c: b
[0.0, 0.0, None, 0.0, 0.0, None, 0.0, 0.0, None]


In [482]:
# plot the points and edges of the networkx graph object with epsilon ball of radius = 0.4
plot_graphml(G)

In [483]:
se_result = spec_clust(G, 3)

In [484]:
print(se_result)

[[ 0.77781529 -0.42385432 -0.20817286]
 [ 0.51391367 -0.29082731  0.68093158]
 [ 0.2361225  -0.06604618 -0.49862543]
 [ 0.64586448 -0.34231302 -0.49092353]
 [ 0.73614661 -0.39916578 -0.28606629]
 [ 0.77781529 -0.42385432 -0.20817286]
 [ 0.20834338 -0.03507481 -0.45375074]
 [ 0.74309139 -0.4107023   0.07837446]
 [ 0.75698095 -0.41639911 -0.03574254]
 [ 0.44446588 -0.25234941  0.70694212]
 [ 0.69447794 -0.38608334  0.2095763 ]
 [ 0.6389197  -0.33761209 -0.50742559]
 [ 0.59030625 -0.29961406 -0.66389501]
 [ 0.59030625 -0.29961406 -0.66389501]
 [ 0.69447794 -0.38608334  0.2095763 ]
 [ 0.69447794 -0.38716037  0.30105596]
 [ 0.21528816 -0.12433077  0.72369877]
 [ 0.72920183 -0.4038617   0.12637356]
 [ 0.65975404 -0.34151132 -0.62851737]
 [ 0.14584037  0.0681654  -0.28224178]
 [ 0.2708464  -0.15582628  0.78652201]
 [ 0.7083675  -0.392758    0.14499167]
 [ 0.25695684  0.50020159  0.09249119]
 [ 0.43057632  0.82735547  0.07152043]
 [ 0.18750904  0.01109481 -0.40145723]
 [ 0.45141066  0.86127743

In [485]:
se_regions = get_dict_easy(G, se_result)

[36, 263, 184, 120, 37, 264, 188, 123, 39, 265, 213, 212, 215, 214, 217, 216, 219, 218, 221, 220, 26, 27, 28, 29, 22, 23, 24, 25, 32, 33, 149, 148, 147, 146, 153, 152, 151, 150, 145, 144, 261, 262, 259, 260, 257, 258, 255, 256, 253, 254, 71, 70, 73, 72, 67, 66, 69, 68, 64, 63, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 287, 286, 285, 284, 283, 282, 281, 280, 289, 288, 108, 109, 106, 107, 112, 113, 110, 111, 100, 101, 205, 204, 207, 206, 209, 208, 211, 210, 199, 198]


In [486]:
plot_eigenvector(se_regions, 1)

a
b
c


### Easy Simulation 3

In [487]:
G, norm_dict = gen_easy3()

In [488]:
# connected points are regions a to b to c
plot_connectivity(norm_dict)

avg_dict
OrderedDict([('a', [0.43112128244395703, 0.0]), ('b', [0.86385596172202261, 0.0]), ('c', [-1.4194860811038104, 0.0]), ('d', [3.1897050306629793, 0.0])])
3.18970503066
adj mat
[[ 0.          0.43273468  1.85060736  2.75858375]
 [ 0.43273468  0.          2.28334204  2.32584907]
 [ 1.85060736  2.28334204  0.          4.60919111]
 [ 2.75858375  2.32584907  4.60919111  0.        ]]
region a: b
region b: a
region c: a
region d: b
[0.0, 0.0, None, 0.0, 0.0, None, 0.0, 0.0, None, 0.0, 0.0, None]


In [489]:
# plot the points and edges of the networkx graph object with epsilon ball of radius = 0.4
plot_graphml(G)

IndexError: list index out of range

In [490]:
se_result = spec_clust(G, 2)

In [491]:
print(se_result)

[[  7.97683408e-01  -2.20475038e-01]
 [  7.97683408e-01  -2.20475038e-01]
 [  7.97683408e-01  -2.20475038e-01]
 [  9.30630643e-01  -2.57220878e-01]
 [  2.65894469e-01  -7.34916793e-02]
 [  7.97683408e-01  -2.20475038e-01]
 [  7.97683408e-01  -2.20475038e-01]
 [  3.98841704e-01  -1.10237519e-01]
 [  3.98841704e-01  -1.10237519e-01]
 [  7.97683408e-01  -2.20475038e-01]
 [ -9.51654695e-02  -7.06406094e-03]
 [ -6.05598443e-02  -4.49531151e-03]
 [ -2.59542190e-02  -1.92656207e-03]
 [ -9.51654695e-02  -7.06406094e-03]
 [  1.32947235e-01  -3.67458397e-02]
 [ -8.65140632e-02  -6.42187358e-03]
 [ -8.65140632e-02  -6.42187358e-03]
 [ -8.65140632e-02  -6.42187358e-03]
 [ -8.65140632e-02  -6.42187358e-03]
 [ -7.78626569e-02  -5.77968622e-03]
 [ -1.06396135e-17  -4.12496521e-17]
 [  2.12946454e-01   7.82085886e-01]
 [ -1.03816876e-01  -7.70624830e-03]
 [ -5.19084379e-02  -3.85312415e-03]
 [ -6.05598443e-02  -4.49531151e-03]
 [ -3.46056253e-02  -2.56874943e-03]
 [ -4.32570316e-02  -3.21093679e-03]
 

In [492]:
se_regions = get_dict_easy3(G, se_result)

[11, 10, 13, 12, 15, 14, 17, 16, 19, 18]


In [493]:
plot_eigenvector(se_regions, 2)

a
b
c
d


In [494]:
d3 = get_connectivity(se_regions)
print(d3)

a
b
c
d
[('a', ['b', 0.0038030214370806427]), ('b', ['a', 0.0038030214370806427]), ('c', ['a', 0.17852748067362789]), ('d', ['b', 0.59945855314351482])]


In [495]:
se_result = spec_clust(G, 2)

In [496]:
print(se_result)

[[  3.53655329e-01  -4.34188092e-01]
 [  3.53655329e-01  -4.34188092e-01]
 [  3.53655329e-01  -4.34188092e-01]
 [  4.12597884e-01  -5.06552774e-01]
 [  1.17885110e-01  -1.44729364e-01]
 [  3.53655329e-01  -4.34188092e-01]
 [  3.53655329e-01  -4.34188092e-01]
 [  1.76827665e-01  -2.17094046e-01]
 [  1.76827665e-01  -2.17094046e-01]
 [  3.53655329e-01  -4.34188092e-01]
 [  7.13121726e-01  -1.07817506e-01]
 [  4.53804734e-01  -6.86111405e-02]
 [  1.94487743e-01  -2.94047745e-02]
 [  7.13121726e-01  -1.07817506e-01]
 [  5.89425548e-02  -7.23646820e-02]
 [  6.48292478e-01  -9.80159150e-02]
 [  6.48292478e-01  -9.80159150e-02]
 [  6.48292478e-01  -9.80159150e-02]
 [  6.48292478e-01  -9.80159150e-02]
 [  5.83463230e-01  -8.82143235e-02]
 [  6.43763597e-18   7.14104762e-18]
 [  3.10863920e-01   6.84568128e-01]
 [  7.77950973e-01  -1.17619098e-01]
 [  3.88975487e-01  -5.88095490e-02]
 [  4.53804734e-01  -6.86111405e-02]
 [  2.59316991e-01  -3.92063660e-02]
 [  3.24146239e-01  -4.90079575e-02]
 

In [497]:
se_regions = get_dict_easy4(G, se_result)

[11, 10, 13, 12, 15, 14, 17, 16, 19, 18]


In [498]:
plot_connectivity(se_regions)

avg_dict
OrderedDict([('a', [0.53101116248957481, -0.086629359337388398]), ('b', [0.51215105744988598, -0.077432572836984015]), ('c', [0.30060702968375996, -0.3690598782374353]), ('d', [0.23625657954544108, 0.52027177760979981])])
0.236256579545
adj mat
[[ 0.          0.02098296  0.36449014  0.67469197]
 [ 0.02098296  0.          0.36027401  0.65830711]
 [ 0.36449014  0.36027401  0.          0.89165676]
 [ 0.67469197  0.65830711  0.89165676  0.        ]]
region a: b
region b: a
region c: b
region d: b
[-0.086629359337388398, -0.077432572836984015, None, -0.077432572836984015, -0.086629359337388398, None, -0.3690598782374353, -0.077432572836984015, None, 0.52027177760979981, -0.077432572836984015, None]


In [499]:
se_result = spec_clust(G, 4)
se_regions = get_dict_easy4(G, se_result)

[11, 10, 13, 12, 15, 14, 17, 16, 19, 18]


In [500]:
get_connectivity_tot(se_regions)

a
b
c
d


[('a', ['d', 0.040839419634557422]),
 ('b', ['d', 0.054178840618800267]),
 ('c', ['a', 0.099895746881983549]),
 ('d', ['a', 0.040839419634557422])]

## Hard Simulations

### Hard Simulation 1

In [501]:
G, norm_dict = gen_hard()

In [502]:
# connected points are regions a to b to c
plot_connectivity(norm_dict)

avg_dict
OrderedDict([('a', [-0.13262056680304143, -0.00026321318111222053]), ('b', [-0.052014327254939687, 2.0671585607300034]), ('c', [1.9722442624361329, 2.1749830143957016]), ('d', [2.0552163210033725, 0.054473387829954811])])
2.055216321
adj mat
[[ 0.          2.06899255  3.02690471  2.1885215 ]
 [ 2.06899255  0.          2.02712825  2.91398741]
 [ 3.02690471  2.02712825  0.          2.12213229]
 [ 2.1885215   2.91398741  2.12213229  0.        ]]
region a: b
region b: c
region c: b
region d: c
[-0.00026321318111222053, 2.0671585607300034, None, 2.0671585607300034, 2.1749830143957016, None, 2.1749830143957016, 2.0671585607300034, None, 0.054473387829954811, 2.1749830143957016, None]


In [503]:
# plot the points and edges of the networkx graph object with epsilon ball of radius = 0.4
plot_graphml(G)

In [504]:
se_result = spec_clust(G, 2)

In [505]:
print(se_result)

[[  1.36733028e-01   1.39271128e-01]
 [  4.97211010e-02  -2.42972784e-02]
 [  9.94422020e-02   1.27108651e-02]
 [  2.48605505e-02   2.89191038e-02]
 [  1.24302753e-02   1.44004912e-02]
 [  4.97211010e-02   3.06855889e-02]
 [  7.45816515e-02   6.65065518e-02]
 [  2.61035780e-01   2.78182219e-01]
 [  1.98884404e-01   2.07816696e-01]
 [  1.74023854e-01   1.96092227e-01]
 [  2.11314679e-01  -2.44628208e-01]
 [  2.36175230e-01  -2.61547473e-01]
 [  1.49163303e-01  -1.39661243e-01]
 [  3.23187157e-01  -3.76554093e-01]
 [  1.74023854e-01  -1.98683572e-01]
 [  2.48605505e-02  -2.91665422e-02]
 [  1.36733028e-01  -1.59884466e-01]
 [  9.94422020e-02  -1.06515107e-01]
 [  3.35617432e-01  -3.89323114e-01]
 [  1.24302753e-01  -1.35282500e-01]
 [  7.45816515e-02  -9.53123766e-03]
 [  4.97211010e-02  -5.17548149e-02]
 [  1.11872477e-01  -1.13831063e-01]
 [  1.74023854e-01  -1.68752430e-01]
 [  2.48605505e-01  -2.50071324e-01]
 [  4.97211010e-02  -4.30246847e-03]
 [  1.74023854e-01  -1.77311560e-01]
 

In [506]:
# z = np.zeros([se_result.shape[0], 1])
# se_result = np.hstack((se_result,z))
# print(se_result)

In [507]:
se_regions = get_dict_hard(G, se_result)

[1, 92, 221, 244, 314, 399, 71, 149, 222, 291, 181, 180, 183, 182, 185, 184, 187, 186, 189, 188, 117, 118, 119, 120, 113, 114, 115, 116, 121, 122, 55, 54, 53, 52, 59, 58, 57, 56, 51, 50, 397, 398, 395, 396, 393, 394, 391, 392, 389, 390, 340, 339, 342, 341, 336, 335, 338, 337, 334, 333, 265, 266, 190, 343, 269, 270, 271, 272, 223, 351, 218, 217, 216, 215, 214, 213, 212, 211, 220, 219, 143, 144, 141, 142, 147, 148, 145, 146, 135, 136, 75, 74, 77, 76, 79, 78, 81, 80, 73, 72]


In [508]:
print(se_regions)

OrderedDict([('a', array([[ 0.0497211 , -0.02429728],
       [ 0.07458165,  0.06395582],
       [ 0.17402385,  0.13777823],
       [ 0.08701193,  0.05016243],
       [ 0.13673303,  0.0969203 ],
       [ 0.23617523,  0.18626509],
       [ 0.22374495,  0.17146254],
       [ 0.06215138,  0.04985733],
       [ 0.37290826,  0.28511085],
       [ 0.21131468,  0.1672369 ],
       [ 0.21131468,  0.16450839],
       [ 0.22374495,  0.16278938],
       [ 0.26103578,  0.20484438],
       [ 0.36047798,  0.27694194],
       [ 0.12430275,  0.0996499 ],
       [ 0.39776881,  0.30749188],
       [ 0.32318716,  0.25105911],
       [ 0.08701193,  0.06780348],
       [ 0.17402385,  0.1872814 ],
       [ 0.07458165,  0.02039207],
       [ 0.27346606,  0.21563041],
       [ 0.34804771,  0.26730126],
       [ 0.18645413,  0.13230993],
       [ 0.03729083,  0.02969706],
       [ 0.28589633,  0.21061373],
       [ 0.37290826,  0.29088387],
       [ 0.06215138,  0.04985733],
       [ 0.22374495,  0.17645761],
 

In [509]:
plot_eigenvector(se_regions, 2)

a
b
c
d


In [510]:
se_result = spec_clust(G, 2)

In [511]:
print(se_result)

[[  1.36733028e-01   1.39271128e-01]
 [  4.97211010e-02  -2.42972784e-02]
 [  9.94422020e-02   1.27108651e-02]
 [  2.48605505e-02   2.89191038e-02]
 [  1.24302753e-02   1.44004912e-02]
 [  4.97211010e-02   3.06855889e-02]
 [  7.45816515e-02   6.65065518e-02]
 [  2.61035780e-01   2.78182219e-01]
 [  1.98884404e-01   2.07816696e-01]
 [  1.74023854e-01   1.96092227e-01]
 [  2.11314679e-01  -2.44628208e-01]
 [  2.36175230e-01  -2.61547473e-01]
 [  1.49163303e-01  -1.39661243e-01]
 [  3.23187157e-01  -3.76554093e-01]
 [  1.74023854e-01  -1.98683572e-01]
 [  2.48605505e-02  -2.91665422e-02]
 [  1.36733028e-01  -1.59884466e-01]
 [  9.94422020e-02  -1.06515107e-01]
 [  3.35617432e-01  -3.89323114e-01]
 [  1.24302753e-01  -1.35282500e-01]
 [  7.45816515e-02  -9.53123766e-03]
 [  4.97211010e-02  -5.17548149e-02]
 [  1.11872477e-01  -1.13831063e-01]
 [  1.74023854e-01  -1.68752430e-01]
 [  2.48605505e-01  -2.50071324e-01]
 [  4.97211010e-02  -4.30246847e-03]
 [  1.74023854e-01  -1.77311560e-01]
 

In [512]:
# z = np.zeros([se_result.shape[0], 1])
# se_result = np.hstack((se_result,z))
# print(se_result)

In [513]:
se_regions = get_dict_hard(G, se_result)

[1, 92, 221, 244, 314, 399, 71, 149, 222, 291, 181, 180, 183, 182, 185, 184, 187, 186, 189, 188, 117, 118, 119, 120, 113, 114, 115, 116, 121, 122, 55, 54, 53, 52, 59, 58, 57, 56, 51, 50, 397, 398, 395, 396, 393, 394, 391, 392, 389, 390, 340, 339, 342, 341, 336, 335, 338, 337, 334, 333, 265, 266, 190, 343, 269, 270, 271, 272, 223, 351, 218, 217, 216, 215, 214, 213, 212, 211, 220, 219, 143, 144, 141, 142, 147, 148, 145, 146, 135, 136, 75, 74, 77, 76, 79, 78, 81, 80, 73, 72]


In [514]:
cdict = get_connectivity(se_regions)
print(cdict)

a
b
c
d
[('a', ['d', 0.072965763247196558]), ('b', ['c', 0.048179427048437806]), ('c', ['b', 0.048179427048437806]), ('d', ['a', 0.072965763247196558])]


## Testing some shit

In [182]:
se_result = spec_clust(G, 4)

In [183]:
print(se_result)

[[-0.42385432 -0.20817286  0.09264204 -0.448313  ]
 [-0.29082731  0.68093158 -0.07959814 -0.03084334]
 [-0.06604618 -0.49862543 -0.06485032  0.73072486]
 ..., 
 [-0.4191753  -0.24531589  0.09542103 -0.42170542]
 [-0.38418343  0.38596145  0.00238746 -0.43733618]
 [-0.30204519  0.63380395 -0.06579654 -0.10983295]]


In [184]:
eigen_vector = se_result[:,0]

eigen_vector = np.array(eigen_vector)[np.newaxis]
eigen_vector = eigen_vector.T
print(eigen_vector)

am = nx.adjacency_matrix(G)

[[-0.42385432]
 [-0.29082731]
 [-0.06604618]
 [-0.34231302]
 [-0.39916578]
 [-0.42385432]
 [-0.03507481]
 [-0.4107023 ]
 [-0.41639911]
 [-0.25234941]
 [-0.38608334]
 [-0.33761209]
 [-0.29961406]
 [-0.29961406]
 [-0.38608334]
 [-0.38716037]
 [-0.12433077]
 [-0.4038617 ]
 [-0.34151132]
 [ 0.0681654 ]
 [-0.15582628]
 [-0.392758  ]
 [ 0.50020159]
 [ 0.82735547]
 [ 0.01109481]
 [ 0.86127743]
 [ 0.36973699]
 [ 0.75894298]
 [ 0.00802111]
 [ 0.75678464]
 [-0.31627709]
 [-0.17943822]
 [ 0.43330366]
 [ 0.81489548]
 [-0.34618147]
 [-0.16764641]
 [ 0.72012313]
 [ 0.59228136]
 [-0.32779913]
 [ 0.40903669]
 [-0.13096909]
 [-0.13460836]
 [-0.40355669]
 [-0.42121036]
 [-0.05149444]
 [-0.31627709]
 [-0.32393575]
 [-0.23003877]
 [-0.26772898]
 [-0.39196672]
 [-0.33292304]
 [-0.39423376]
 [-0.05484561]
 [-0.12423996]
 [-0.31627709]
 [-0.43282641]
 [-0.3969859 ]
 [-0.11243585]
 [-0.19412328]
 [-0.19805345]
 [-0.3018733 ]
 [-0.34618147]
 [-0.22302741]
 [ 0.15212669]
 [ 0.67995201]
 [-0.17943822]
 [ 0.84607

In [185]:
print(am.shape)
print(eigen_vector.shape)

(300, 300)
(300, 1)


In [195]:
r = np.dot(am.todense(), eigen_vector)
print(r.shape)
print(r)

(300, 1)
[[ -4.27040016e+01]
 [ -2.34450247e+01]
 [ -4.09711036e+00]
 [ -3.32390378e+01]
 [ -4.02106977e+01]
 [ -4.27040016e+01]
 [ -2.55100839e+00]
 [ -4.09963797e+01]
 [ -4.18754312e+01]
 [ -1.90689562e+01]
 [ -3.83134087e+01]
 [ -3.26072862e+01]
 [ -2.64598472e+01]
 [ -2.64598472e+01]
 [ -3.83134087e+01]
 [ -3.76152833e+01]
 [ -5.20952546e+00]
 [ -4.02253807e+01]
 [ -3.15921673e+01]
 [  1.68827670e+00]
 [ -7.72677948e+00]
 [ -3.92095453e+01]
 [  2.33513912e+01]
 [  4.55593387e+01]
 [ -6.39979927e-01]
 [  4.85780635e+01]
 [  1.74199503e+01]
 [  4.38810693e+01]
 [ -8.31029511e-01]
 [  4.35433683e+01]
 [ -2.85056723e+01]
 [ -9.60127785e+00]
 [  1.91470012e+01]
 [  4.67300834e+01]
 [ -3.36279273e+01]
 [ -8.64362895e+00]
 [  4.03879128e+01]
 [  3.04233274e+01]
 [ -2.97197390e+01]
 [  1.96574795e+01]
 [ -8.94811471e+00]
 [ -9.27548071e+00]
 [ -4.07380431e+01]
 [ -4.20845481e+01]
 [ -3.64596786e+00]
 [ -2.85056723e+01]
 [ -2.93444281e+01]
 [ -1.47356449e+01]
 [ -2.08559260e+01]
 [ -3.88034

In [139]:
# r2 = np.einsum('ij,jk->ik', am, eigen_vector)
# print(r2.shape)
# print(r2)

ValueError: einstein sum subscripts string contains too many subscripts for operand 0

In [197]:
eig_val = r[0,0] / eigen_vector[0,0]
print(eig_val)

100.75160101


In [198]:
eig_val = r[1,0] / eigen_vector[1,0]
print(eig_val)

80.6149338568


## OLD

In [53]:
# A = nx.adjacency_matrix(G)
A = nx.to_numpy_matrix(G) 
A2 = nx.adjacency_matrix(G)
nodelist = G.nodes()

In [22]:
# apply eigen decomposition to obtain the eigenvalues (diagonal matrix D) and the eigenvectors (n x n matrix V)
D, V = LA.eig(A)
D = np.diagflat(D)

# following lines in this cell aren't necessary, just used to verify that V * D * V^-1 = A, where A is the adjacency matrix
b = np.matrix(V)*np.matrix(D)*np.matrix(LA.inv(V))
# out = np.dot(V, D, LA.inv(V))
dotm = lambda *args: reduce(np.dot, args)
out = dotm(V, D, LA.inv(V))

In [23]:
# verification of whether A = V * D * V^-1 is true
np.allclose(A,b)

True

In [24]:
# print V
np.savetxt('eigenvectorA.txt',V,delimiter="\t")

In [25]:
import os
cwd = os.getcwd()

In [26]:
print V.shape

(400, 400)


In [27]:
a_mean = np.zeros((1,2))  
b_mean = np.zeros((1,2))  
c_mean = np.zeros((1,2))  
d_mean = np.zeros((1,2))  

a_num = 0
b_num = 0
c_num = 0
d_num = 0

for ind, i in enumerate(nodelist):
    region = int(float(i))/100
    if region == 0:
        a_mean = np.add(V[ind,0:2],a_mean)
        a_num += 1
    elif region == 1:
        b_mean = np.add(V[ind,0:2],b_mean)
        b_num += 1
    elif region == 2:
        c_mean = np.add(V[ind,0:2],c_mean)
        c_num += 1
    elif region == 3:
        d_mean = np.add(V[ind,0:2],d_mean)
        d_num += 1
a_mean = a_mean/100.
b_mean = b_mean/100.
c_mean = c_mean/100.
d_mean = d_mean/100.

print a_num # check to make sure each cluster has 100 points
# print b_num
# print c_num
# print d_num

100


In [28]:
print a_mean.shape

(1, 2)


In [29]:
test = np.column_stack((np.transpose(a_mean),np.transpose(b_mean),np.transpose(c_mean),np.transpose(d_mean)))
# test = np.column_stack((a_mean, b_mean, c_mean, d_mean))
print test.shape

(2, 4)


In [30]:
# print test
import scipy

In [31]:
names = ['a','b','c','d']
for i in range(len(np.transpose(test))):
    dists = []
    dists2 = []
    print 'region ' + names[i]
    for j in range(len(np.transpose(test))):
        if i == j:
            continue
        dist = LA.norm(test[:,i] - test[:,j])
#         dist2 = scipy.spatial.distance.euclidean(test[:,i],test[:,j]) # alternative to finding euclidean distance
        dists.append(dist)
        dists2.append(dist2)
    print dists
#     print dists2
    near = dists.index(min(dists))
#     print np.exp(dists[near])
    if near >= i:
        near += 1
    print 'Nearest connection is between region ' + names[i] + ' and ' + names[near] + "\n"

region a


NameError: name 'dist2' is not defined

## Spectral Embedding

In [33]:
a2out = se(A2,n_components=2,drop_first=True)
print a2out.shape

(400, 2)



Graph is not fully connected, spectral embedding may not work as expected.



In [34]:
print nodelist.index('0')

1


In [35]:
print(A2)

  (0, 6)	1
  (0, 7)	1
  (0, 63)	1
  (0, 123)	1
  (0, 189)	1
  (0, 243)	1
  (0, 251)	1
  (0, 252)	1
  (0, 354)	1
  (0, 363)	1
  (0, 374)	1
  (1, 105)	1
  (1, 140)	1
  (1, 174)	1
  (1, 192)	1
  (2, 91)	1
  (2, 126)	1
  (2, 157)	1
  (2, 196)	1
  (2, 235)	1
  (2, 237)	1
  (2, 264)	1
  (2, 320)	1
  (3, 197)	1
  (3, 301)	1
  :	:
  (398, 185)	1
  (398, 221)	1
  (398, 269)	1
  (398, 337)	1
  (398, 351)	1
  (398, 399)	1
  (399, 50)	1
  (399, 54)	1
  (399, 55)	1
  (399, 58)	1
  (399, 75)	1
  (399, 76)	1
  (399, 117)	1
  (399, 136)	1
  (399, 144)	1
  (399, 146)	1
  (399, 148)	1
  (399, 183)	1
  (399, 184)	1
  (399, 221)	1
  (399, 266)	1
  (399, 269)	1
  (399, 337)	1
  (399, 351)	1
  (399, 398)	1


In [36]:
unique, counts = np.unique(np.asarray(A)[:,1], return_counts=True)
dict(zip(unique, counts))

{0.0: 396, 1.0: 4}

In [37]:
print a2out

[[  1.39271128e-01  -1.47073685e-01]
 [ -2.42972784e-02   8.52426383e-05]
 [  1.27108651e-02  -9.44604855e-02]
 [  2.89191038e-02  -3.13306059e-02]
 [  1.44004912e-02  -1.52538651e-02]
 [  3.06855889e-02  -2.45646579e-02]
 [  6.65065518e-02  -6.83683774e-02]
 [  2.78182219e-01  -3.08212668e-01]
 [  2.07816696e-01  -2.34361366e-01]
 [  1.96092227e-01  -1.98066416e-01]
 [ -2.44628208e-01  -1.15346274e-01]
 [ -2.61547473e-01  -1.50072037e-01]
 [ -1.39661243e-01  -1.01445355e-01]
 [ -3.76554093e-01  -1.93917218e-01]
 [ -1.98683572e-01  -8.73039451e-02]
 [ -2.91665422e-02  -1.38953227e-02]
 [ -1.59884466e-01  -8.34738060e-02]
 [ -1.06515107e-01  -2.98546880e-02]
 [ -3.89323114e-01  -2.03040224e-01]
 [ -1.35282500e-01  -8.01016587e-02]
 [ -9.53123766e-03   6.63512861e-02]
 [ -5.17548149e-02   2.38336964e-02]
 [ -1.13831063e-01   5.38326679e-02]
 [ -1.68752430e-01   5.84017586e-02]
 [ -2.50071324e-01   1.18834173e-01]
 [ -4.30246847e-03   3.95968662e-02]
 [ -1.77311560e-01   5.04399562e-02]
 

In [38]:
print len(nodelist)
print(nodelist)

400
['344', '0', '346', '347', '340', '341', '342', '343', '348', '349', '298', '299', '296', '297', '294', '295', '292', '293', '290', '291', '199', '198', '195', '194', '197', '196', '191', '190', '193', '192', '270', '271', '272', '273', '274', '275', '276', '277', '278', '279', '108', '109', '102', '103', '100', '101', '106', '107', '104', '105', '39', '38', '33', '32', '31', '30', '37', '36', '35', '34', '339', '338', '335', '334', '337', '336', '331', '330', '333', '332', '345', '6', '99', '98', '91', '90', '93', '92', '95', '94', '97', '96', '238', '239', '234', '235', '236', '237', '230', '231', '232', '233', '1', '146', '147', '144', '145', '142', '143', '140', '141', '148', '149', '133', '132', '131', '130', '137', '136', '135', '134', '139', '138', '24', '25', '26', '27', '20', '21', '22', '23', '28', '29', '379', '378', '371', '370', '373', '372', '375', '374', '377', '376', '393', '392', '88', '89', '397', '396', '395', '394', '82', '83', '80', '81', '86', '87', '84', '85'

In [39]:
a_list = []
b_list = []
c_list = []
d_list = []
for ind, i in enumerate(nodelist):
    region = int(float(i))/100
    if region == 0:
        a_list.append(ind)
#         a_list.append(int(float(i)))
    elif region == 1:
        b_list.append(ind)
#         b_list.append(int(float(i)))
    elif region == 2:
        c_list.append(ind)
#         c_list.append(int(float(i)))
    elif region == 3:
        d_list.append(ind)
#         d_list.append(int(float(i)))
        
a_region = a2out[a_list]  # V[0:9,:].mean(axis=0)
# print a_region
b_region = a2out[b_list]  # V[10:19,:].mean(axis=0)
c_region = a2out[c_list]  # V[20:29,:].mean(axis=0)
d_region = a2out[d_list]  # V[30:39,:].mean(axis=0)
se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])

In [40]:
print a_list

[1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 92, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 135, 136, 141, 142, 143, 144, 145, 146, 147, 148, 149, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 244, 265, 266, 269, 270, 271, 272, 291, 314, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 351, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399]


In [41]:
a_r_avg = a_region.mean(axis=0)
print a_r_avg
b_r_avg = b_region.mean(axis=0)
c_r_avg = c_region.mean(axis=0)
d_r_avg = d_region.mean(axis=0)

[ 0.16076066  0.28616511]


In [42]:
plot_connectivity(se_regions)

region a: b
region b: c
region c: b
region d: c


### testing on Fear199.graphml

In [429]:
def add_to_dict(d, region, index):
    if region in d:
        d[region].append(index)
    else:
        d[region] = [index]
        
    return d

In [436]:
def get_dict_real(se_result, g, regions_path):
    aut_nodes = g.nodes()
    points = np.genfromtxt(regions_path, delimiter=',')
    
    d = {}

    for index, node in enumerate(aut_nodes):
        point1 = points[index]
        region = point1[4]
        s = aut.node[node]['attr']
        point2 = ast.literal_eval(s)
        add_to_dict(d, region, index)
    
    m = {}
    for key, value in d.iteritems():
        index_list = value
        m[str(key)] = aut_se[index_list]
    
    return m

In [516]:
fear = nx.read_graphml("Fear199.graphml")

fear_se = spec_clust(fear, 3)
print aut_se.shape

(9904, 3)


In [517]:
se_regions = get_dict_real(fear_se, fear, 'Fear199_regions.csv')

In [518]:
plot_connectivity3d(se_regions)

region 155.0: 952.0
region 407.0: 974.0
region 919.0: 311.0
region 667.0: 162.0
region 601.0: 708.0
region 175.0: 412.0
region 163.0: 10686.0
region 1062.0: 298.0
region 973.0: 878.0
region 880.0: 857.0
region 401.0: 1045.0
region 296.0: 228.0
region 314.0: 188.0
region 661.0: 537.0
region 260.0: 206.0
region 169.0: 1066.0
region 431.0: 882.0
region 262.0: 240.0
region 1096.0: 448.0
region 45.0: 10723.0
region 642.0: 206.0
region 1104.0: 42.0
region 1098.0: 430.0
region 92.0: 749.0
region 171.0: 59.0
region 959.0: 501.0
region 1094.0: 581.0
region 646.0: 880.0
region 675.0: 1044.0
region 298.0: 1062.0
region 421.0: 910.0
region 966.0: 854.0
region 945.0: 156.0
region 1066.0: 169.0
region 1090.0: 771.0
region 427.0: 473.0
region 55.0: 196.0
region 276.0: 1034.0
region 1141.0: 573.0
region 587.0: 529.0
region 429.0: 250.0
region 327.0: 678.0
region 581.0: 1094.0
region 652.0: 562.0
region 310.0: 10720.0
region 195.0: 440.0
region 969.0: 28.0
region 1088.0: 1031.0
region 96.0: 636.0
regio

In [520]:
r = get_connectivity(se_regions)
print(r)

155.0
407.0
919.0
667.0
601.0
175.0
163.0
1062.0
973.0
880.0
401.0
296.0
314.0
661.0
260.0
169.0
431.0
262.0
1096.0
45.0
642.0
1104.0
1098.0
92.0
171.0
959.0
1094.0
646.0
675.0
298.0
421.0
966.0
945.0
1066.0
1090.0
427.0
55.0
276.0
1141.0
587.0
429.0
327.0
581.0
652.0
310.0
195.0
969.0
1088.0
96.0
312.0
774.0
1121.0
246.0
853.0
330.0
188.0
191.0
244.0
591.0
583.0
1139.0
228.0
268.0
698.0
1102.0
688.0
1086.0
778.0
59.0
10677.0
684.0
694.0
565.0
304.0
585.0
308.0
338.0
654.0
199.0
1114.0
321.0
656.0
873.0
573.0
943.0
860.0
837.0
325.0
334.0
1125.0
197.0
575.0
266.0
593.0
811.0
0.0
682.0
577.0
41.0
532.0
1079.0
955.0
857.0
258.0
10704.0
61.0
772.0
68.0
1118.0
750.0
10698.0
1035.0
363.0
53.0
200.0
845.0
206.0
748.0
1021.0
819.0
847.0
851.0
1110.0
248.0
1023.0
240.0
551.0
204.0
489.0
224.0
278.0
1005.0
821.0
534.0
829.0
226.0
137.0
749.0
1128.0
512.0
764.0
130.0
274.0
510.0
712.0
827.0
132.0
202.0
540.0
559.0
542.0
12.0
520.0
250.0
220.0
1047.0
869.0
10.0
544.0
222.0
218.0
10735.0
377.0
216

## Old read data

In [515]:
aut = nx.read_graphml("Fear199.graphml")

aut_am = nx.adjacency_matrix(aut)
aut_nodes = aut.nodes()

In [423]:
# aut_se = se(aut_am, n_components=3,drop_first=True)
# print aut_se.shape

aut_se = spec_clust(aut, 3)
print aut_se.shape


(9904, 3)


In [424]:
points = np.genfromtxt('Fear199_regions.csv', delimiter=',')

In [425]:
print(aut_am.shape)

(9904, 9904)


In [426]:
print(len(aut_nodes))
print(aut_nodes)

9904
['s2040', 's2041', 's2042', 's2043', 's2044', 's2045', 's2046', 's2047', 's2048', 's2049', 's2734', 's2738', 's2739', 's2735', 's2732', 's2733', 's2730', 's2731', 's6535', 's3076', 's3077', 's3074', 's3075', 's7007', 's7006', 's3036', 's7557', 's1429', 's7556', 's7003', 's7002', 's8806', 's8551', 's8552', 's8553', 's8802', 's8555', 's8556', 's8557', 's8558', 's8559', 's7000', 's4506', 's8808', 's8809', 's6299', 's6298', 's6749', 's6748', 's3034', 's1400', 's6291', 's6290', 's6741', 's6292', 's6295', 's6294', 's6297', 's6296', 's3719', 's3718', 's3713', 's3712', 's3711', 's3710', 's3717', 's3716', 's3715', 's3714', 's3035', 's912', 's1406', 's6127', 's6126', 's6125', 's6124', 's799', 's798', 's6121', 's6120', 's795', 's794', 's797', 's796', 's791', 's790', 's793', 's792', 's3032', 's3033', 's1428', 's9603', 's9602', 's249', 's248', 's9607', 's9606', 's9605', 's9604', 's243', 's242', 's241', 's240', 's247', 's246', 's245', 's244', 's4297', 's4296', 's4295', 's4294', 's4293', 's4292'

In [427]:
print(aut_se.shape)
print(aut_se)

(9904, 3)
[[ 0.00643152  0.01367666 -0.009874  ]
 [ 0.01745698  0.03712236 -0.02680085]
 [ 0.01286304  0.02735332 -0.01974799]
 ..., 
 [ 0.01194425  0.02539951 -0.01833742]
 [ 0.01653819  0.03516855 -0.02539028]
 [ 0.00735031  0.01563047 -0.01128457]]


In [428]:
print(points)

[[  36.  162.  267.   48.  755.]
 [  37.  137.  315.   47.  755.]
 [  38.  115.  303.   43.  251.]
 ..., 
 [ 424.  124.  333.   43.  251.]
 [ 424.  162.  339.   40.  836.]
 [ 429.  126.  327.   41.  959.]]


In [87]:
d = {}

for index, node in enumerate(aut_nodes):
    point1 = points[index]
    region = point1[4]
    s = aut.node[node]['attr']
    point2 = ast.literal_eval(s)
    add_to_dict(d, region, index)

In [108]:
print(points[0])
print(points[1])
print(points[200])
print(points[3000])

[  50.  525.  349.  242.  242.]
[  60.  519.  252.  242.  242.]
[ 106.  658.  958.  237.  237.]
[ 333.  637.  445.  248.  248.]


In [97]:
m = {}
for key, value in d.iteritems():
    index_list = value
    m[str(key)] = aut_se[index_list]
    

# a_region = a2out[a_list]  # V[0:9,:].mean(axis=0)
# # print a_region
# b_region = a2out[b_list]  # V[10:19,:].mean(axis=0)
# c_region = a2out[c_list]  # V[20:29,:].mean(axis=0)
# d_region = a2out[d_list]  # V[30:39,:].mean(axis=0)
# se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])

In [98]:
print(m)

{'237.0': array([[  4.04575566e-05,   4.11609843e-03,  -1.57395833e-03],
       [  6.29564205e-03,  -3.71912434e-02,   1.58604882e-02],
       [  2.79806313e-03,  -1.65294415e-02,   7.04910588e-03],
       ..., 
       [  5.39434088e-05,   5.48813125e-03,  -2.09861111e-03],
       [  4.04575566e-05,   4.11609843e-03,  -1.57395833e-03],
       [  5.70001297e-03,   3.24740919e-02,  -4.71030043e-03]]), '235.0': array([[ -5.85960614e-03,   2.13226865e-03,  -2.71306312e-03],
       [  1.56018741e-02,  -7.54720774e-03,   8.60182460e-03],
       [  5.49325026e-03,  -9.69167963e-03,   3.90280260e-03],
       [ -3.75725134e-02,   4.27642569e-02,  -8.26710398e-02],
       [ -3.00580107e-02,   3.42114055e-02,  -6.61368319e-02],
       [ -4.33936134e-03,  -5.08713690e-03,  -4.29715936e-03],
       [  3.37146305e-05,   3.43008203e-03,  -1.31163195e-03],
       [  3.16667387e-03,   1.80411622e-02,  -2.61683357e-03],
       [ -3.13677920e-02,   1.63545585e-02,  -7.56210753e-03],
       [ -1.91249963e

In [101]:
plot_connectivity3d(m)

region 237.0: 253.0
region 235.0: 232.0
region 250.0: 232.0
region 236.0: 234.0
region 247.0: 230.0
region 255.0: 241.0
region 248.0: 247.0
region 251.0: 231.0
region 249.0: 234.0
region 246.0: 244.0
region 245.0: 249.0
region 240.0: 231.0
region 241.0: 255.0
region 254.0: 244.0
region 244.0: 254.0
region 242.0: 254.0
region 243.0: 230.0
region 234.0: 249.0
region 233.0: 253.0
region 232.0: 250.0
region 231.0: 255.0
region 230.0: 247.0
region 239.0: 253.0
region 238.0: 243.0
region 253.0: 233.0


### testing on Aut1367.graphml

In [48]:
aut = nx.read_graphml("Aut1367.graphml")

aut_am = nx.adjacency_matrix(aut)
aut_nodes = aut.nodes()

In [44]:
aut_se = se(aut_am, n_components=3,drop_first=True)
print aut_se.shape

(5090, 3)


In [106]:
print(fear_se)

[[ -1.28038741e-03   6.99428950e-04  -2.04421902e-03]
 [  6.04580620e-27   9.08969517e-23  -2.99327109e-22]
 [  1.40076039e-03   5.97098700e-03  -6.80572655e-03]
 ..., 
 [ -4.48135595e-03   2.44800132e-03  -7.15476656e-03]
 [ -3.84116224e-03   2.09828685e-03  -6.13265705e-03]
 [ -6.14585959e-03   3.35725896e-03  -9.81225129e-03]]


In [46]:
points = np.genfromtxt('Aut1367.region.csv', delimiter=',')

In [53]:
print(aut_am.shape)

(5090, 5090)


In [56]:
print(len(aut_nodes))
print(aut_nodes)

5090
['s2040', 's2041', 's2042', 's2043', 's2044', 's2045', 's2046', 's2047', 's2048', 's2049', 's2734', 's2738', 's2739', 's2286', 's2287', 's2284', 's2285', 's2282', 's2283', 's2280', 's2281', 's2732', 's3296', 's2288', 's2289', 's1218', 's1838', 's1839', 's1219', 's1834', 's1835', 's1836', 's1837', 's1830', 's1831', 's1832', 's1833', 's57', 's56', 's55', 's54', 's53', 's52', 's51', 's50', 's1813', 's59', 's58', 's1748', 's691', 's1749', 's1210', 's2138', 's2139', 's1211', 's2134', 's2135', 's2136', 's2137', 's2130', 's2131', 's2132', 's2133', 's4319', 's4063', 's1214', 's181', 's1215', 's5090', 's1216', 's1217', 's1409', 's1408', 's1792', 's3845', 's1562', 's1563', 's3191', 's327', 's1403', 's3719', 's3718', 's1934', 's3713', 's3712', 's3711', 's3710', 's3717', 's3716', 's3715', 's3714', 's3190', 's2786', 's1568', 's1111', 's1569', 's1930', 's4151', 's320', 's799', 's798', 's3399', 's3398', 's795', 's794', 's797', 's796', 's791', 's790', 's793', 's792', 's1007', 's3159', 's3158', 's

In [54]:
print(aut_se.shape)
print(aut_se)

(5090, 3)
[[ -2.17845287e-02   2.99166270e-03  -8.76354739e-03]
 [  6.33334775e-04   3.60823244e-03  -5.23366715e-04]
 [  4.04575566e-05   4.11609843e-03  -1.57395833e-03]
 ..., 
 [  6.06863349e-05   6.17414765e-03  -2.36093750e-03]
 [ -2.45075948e-02   3.36562053e-03  -9.85899082e-03]
 [ -5.44613218e-03   7.47915674e-04  -2.19088685e-03]]


In [47]:
print(points)

[[  50.  525.  349.  242.  242.]
 [  60.  519.  252.  242.  242.]
 [  61.  504.  237.  237.  237.]
 ..., 
 [ 544.  515.  338.  242.  242.]
 [ 547.  478.  268.  242.  242.]
 [ 553.  458.  255.  242.  242.]]


In [86]:
def add_to_dict(d, region, index):
    if region in d:
        d[region].append(index)
    else:
        d[region] = [index]
        
    return d

In [87]:
d = {}

for index, node in enumerate(aut_nodes):
    point1 = points[index]
    region = point1[4]
    s = aut.node[node]['attr']
    point2 = ast.literal_eval(s)
    add_to_dict(d, region, index)

In [108]:
print(points[0])
print(points[1])
print(points[200])
print(points[3000])

[  50.  525.  349.  242.  242.]
[  60.  519.  252.  242.  242.]
[ 106.  658.  958.  237.  237.]
[ 333.  637.  445.  248.  248.]


In [97]:
m = {}
for key, value in d.iteritems():
    index_list = value
    m[str(key)] = aut_se[index_list]
    

# a_region = a2out[a_list]  # V[0:9,:].mean(axis=0)
# # print a_region
# b_region = a2out[b_list]  # V[10:19,:].mean(axis=0)
# c_region = a2out[c_list]  # V[20:29,:].mean(axis=0)
# d_region = a2out[d_list]  # V[30:39,:].mean(axis=0)
# se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])

In [98]:
print(m)

{'237.0': array([[  4.04575566e-05,   4.11609843e-03,  -1.57395833e-03],
       [  6.29564205e-03,  -3.71912434e-02,   1.58604882e-02],
       [  2.79806313e-03,  -1.65294415e-02,   7.04910588e-03],
       ..., 
       [  5.39434088e-05,   5.48813125e-03,  -2.09861111e-03],
       [  4.04575566e-05,   4.11609843e-03,  -1.57395833e-03],
       [  5.70001297e-03,   3.24740919e-02,  -4.71030043e-03]]), '235.0': array([[ -5.85960614e-03,   2.13226865e-03,  -2.71306312e-03],
       [  1.56018741e-02,  -7.54720774e-03,   8.60182460e-03],
       [  5.49325026e-03,  -9.69167963e-03,   3.90280260e-03],
       [ -3.75725134e-02,   4.27642569e-02,  -8.26710398e-02],
       [ -3.00580107e-02,   3.42114055e-02,  -6.61368319e-02],
       [ -4.33936134e-03,  -5.08713690e-03,  -4.29715936e-03],
       [  3.37146305e-05,   3.43008203e-03,  -1.31163195e-03],
       [  3.16667387e-03,   1.80411622e-02,  -2.61683357e-03],
       [ -3.13677920e-02,   1.63545585e-02,  -7.56210753e-03],
       [ -1.91249963e

In [101]:
plot_connectivity3d(m)

region 237.0: 253.0
region 235.0: 232.0
region 250.0: 232.0
region 236.0: 234.0
region 247.0: 230.0
region 255.0: 241.0
region 248.0: 247.0
region 251.0: 231.0
region 249.0: 234.0
region 246.0: 244.0
region 245.0: 249.0
region 240.0: 231.0
region 241.0: 255.0
region 254.0: 244.0
region 244.0: 254.0
region 242.0: 254.0
region 243.0: 230.0
region 234.0: 249.0
region 233.0: 253.0
region 232.0: 250.0
region 231.0: 255.0
region 230.0: 247.0
region 239.0: 253.0
region 238.0: 243.0
region 253.0: 233.0


In [103]:
print(m.keys())

['237.0', '235.0', '250.0', '236.0', '247.0', '255.0', '248.0', '251.0', '249.0', '246.0', '245.0', '240.0', '241.0', '254.0', '244.0', '242.0', '243.0', '234.0', '233.0', '232.0', '231.0', '230.0', '239.0', '238.0', '253.0']


In [66]:
node = aut_nodes[0]
print(node)
print(aut.node[node]['attr'])
print(aut.node[node]['attr'][0])
print(aut.node[node]['attr'][2])

s2040
[268, 426, 430]
[
6


In [69]:
import ast

node = aut_nodes[0]
print(node)
s = aut.node[node]['attr']
point = ast.literal_eval(s)
print(point)
print(point[0])


s2040
[268, 426, 430]
268


In [104]:
plot_connectivity()

AttributeError: 'numpy.ndarray' object has no attribute 'keys'

In [73]:
def plot_graphml(G):
    current_palette = sns.color_palette("husl", len(G.nodes())/10)
    Xe = []
    Ye = []
    Ze = []
    data = []
    i = 0
    
    X = []
    Y = []
    Z = []
    regiondict = {}
    for r, node in enumerate(G.nodes()):
        tmp = G.node[node]
#         print tmp
        pos = tmp['pos']
        region = tmp['region']
        if str(region) not in regiondict:
            regiondict[str(region)] = [pos]
        else:
            tmp = regiondict[str(region)]
            tmp.append(pos)
            regiondict[str(region)] = tmp
#         print region
#         print pos
#     print regiondict
    for region, reg in enumerate(regiondict):
        for pos in regiondict[reg]:
#             print pos
            X.append(pos[0])
            Y.append(pos[1])
                
        region_col = current_palette[region]
        region_col_lit = 'rgb' + str(region_col)
        
        trace_scatter = Scatter(
                x = X, 
                y = Y,
                name=region,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        data.append(trace_scatter)
        X = []
        Y = []
        
    for r, edge in enumerate(G.edges()):
        firstpt = G.node[edge[0]]
        secondpt = G.node[edge[1]]
#         print firstpt
        dist = LA.norm(firstpt['pos'] - secondpt['pos'])
#         tmp.append(dist)
#         print dist

        Xe+=[firstpt['pos'][0],secondpt['pos'][0],None]
        Ye+=[firstpt['pos'][1],secondpt['pos'][1],None]
#         Ze+=[dictionary[key][0][2],dictionary[tmp2][0][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter(x=Xe,
               y=Ye,
               mode='lines',
               line=Line(color='rgb(0,0,0)', width=2),
               hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [78]:
plot_connectivity3d(fear199)

AttributeError: 'Graph' object has no attribute 'keys'

In [79]:
print np.allclose(A,out)
print np.allclose(A,b)
print np.diagonal(V)
v = np.diagonal(V)
print V.shape
# print V

True
True
[ -2.04490597e-02   1.44010363e-07  -6.58109774e-04   5.21823762e-14
  -4.92115063e-05  -4.58819189e-06  -2.09930528e-06  -4.53644487e-05
   2.14247757e-01   1.64327199e-04   4.58404264e-04  -9.10483931e-03
   1.18521779e-02   5.35296689e-04  -7.12953582e-08  -2.43450068e-03
  -4.43236314e-04   1.20375777e-01   1.38012283e-04  -7.40450009e-04
  -3.24305092e-04  -1.84548109e-05  -3.04271101e-01   1.24388639e-02
   1.08987693e-02  -6.44306783e-04   4.59377808e-02  -3.59019326e-02
  -1.84816982e-08  -8.73362448e-03   2.69106586e-07   1.80407184e-02
  -1.68002414e-03  -9.80753286e-03   3.22621782e-02  -7.13315053e-03
   1.30020706e-04   7.97779967e-07   7.86879596e-03   4.81091041e-02
   2.20921707e-02   5.91576707e-02   1.95103194e-02   1.76573912e-03
   1.78304814e-03   2.37142317e-03   1.47525910e-02  -5.82443242e-02
   2.16433197e-02  -5.89303547e-06  -3.46887478e-03   5.16813602e-04
  -4.32729680e-03   2.88631666e-02  -5.63782725e-04  -2.66583737e-01
  -9.79418935e-03  -4.21

In [80]:
tmp = [[1,0]]
tmp.append([2,0])
tmp.append([3,0])
print tmp

[[1, 0], [2, 0], [3, 0]]


In [81]:
# from matplotlib.pyplot import *
# import matplotlib.pyplot as plt
# %matplotlib inline

# colors = cm.rainbow(np.linspace(0,1,10))
Adict = OrderedDict()
avg = []
real = 0
imag = 0
for r, i in enumerate(v):
    region = r%10
    reg = r/10
#     print reg
    if 'region' + str(reg) not in Adict:
        Adict['region'+str(reg)] = [[i.real,i.imag]]
    else:
        tmp = Adict['region'+str(reg)]
        tmp.append([i.real,i.imag])
        Adict['region'+str(reg)] = tmp
#     if region == 0:
#         real = 0
#         imag = 0
#     else:
#         real += i.real
#         imag += i.imag
#     c = colors[r/10]
#     if r == 9:
#         plt.scatter(real/10,imag/10, color=c,s=40)
#     plt.scatter(i.real,i.imag,color=c)

In [82]:
G=nx.from_numpy_matrix(A)

In [83]:
plot_connectivity(Adict)

region region0: region34
region region1: region22
region region2: region20
region region3: region4
region region4: region3
region region5: region2
region region6: region22
region region7: region10
region region8: region11
region region9: region39
region region10: region7
region region11: region8
region region12: region7
region region13: region23
region region14: region18
region region15: region0
region region16: region30
region region17: region32
region region18: region14
region region19: region14
region region20: region2
region region21: region26
region region22: region1
region region23: region13
region region24: region33
region region25: region31
region region26: region21
region region27: region36
region region28: region24
region region29: region38
region region30: region36
region region31: region35
region region32: region17
region region33: region24
region region34: region0
region region35: region31
region region36: region30
region region37: region5
region region38: region29
region 

In [99]:
print Adict

OrderedDict([('region0', [[-0.020449059665810447, 0.0], [1.4401036314877591e-07, 0.0], [-0.00065810977381890407, 0.0], [5.2182376187436285e-14, 0.0], [-4.9211506262587882e-05, 0.0], [-4.5881918895267038e-06, 0.0], [-2.0993052803228115e-06, 0.0], [-4.5364448675650738e-05, 0.0], [0.21424775692176978, 0.0], [0.00016432719924219401, 0.0]]), ('region1', [[0.00045840426398652278, 0.0], [-0.009104839314994187, 0.0], [0.011852177863579082, 0.0], [0.00053529668906559125, 0.0], [-7.1295358235266252e-08, 0.0], [-0.0024345006819777386, 0.0], [-0.00044323631435356993, 0.0], [0.12037577739386628, 0.0], [0.00013801228298159139, 0.0], [-0.00074045000901895228, 0.0]]), ('region2', [[-0.00032430509165717224, 0.0], [-1.8454810855319429e-05, 0.0], [-0.3042711014540273, 0.0], [0.012438863885174197, 0.0], [0.010898769267694078, 0.0], [-0.00064430678333571062, 0.0], [0.045937780806892335, 0.0], [-0.03590193256826809, 0.0], [-1.8481698170938521e-08, 0.0], [-0.0087336244811743648, 0.0]]), ('region3', [[2.69106

In [97]:
plot_connectivity3d(Adict)

IndexError: list index out of range

In [84]:
az_norm = np.random.normal(1, 0.1, 10)
bz_norm = np.random.normal(1, 0.1, 10)
cz_norm = np.random.normal(3, 0.1, 10)
dz_norm = np.random.normal(3.5, 0.1, 10)

a_norm3d = np.column_stack((ax_norm,ay_norm,az_norm))
b_norm3d = np.column_stack((bx_norm,by_norm,bz_norm))
c_norm3d = np.column_stack((cx_norm,cy_norm,cz_norm))
d_norm3d = np.column_stack((dx_norm,dy_norm,dz_norm))

norm_dict3d = OrderedDict([('a',a_norm3d),('b',b_norm3d),('c',c_norm3d),('d',d_norm3d)])

NameError: name 'ax_norm' is not defined

In [98]:
plot_connectivity3d(norm_dict3d)

NameError: name 'norm_dict3d' is not defined

In [None]:
token = 'Aut1367'
data_txt = 'Aut1367reorient_atlas.region.csv'
data = np.genfromtxt(data_txt, delimiter=',', dtype='int', usecols = (0,1,2,4), names=['x','y','z','region'])

In [None]:
ccf_txt = 'natureCCFOhedited.csv'
ccf = {}

with open(ccf_txt, 'rU') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        # row[0] is ccf atlas index, row[4] is string of full name
        ccf[row[0]] = row[4]