
# Algorithms.md

-------

-------

# Describe Approach

## Algorithm

**Description:** 

We first construct a similarity graph from the input graph. Then we compute the graph Laplacian and its eigenvectors, and cluster the nodes based on each node's component in the second eigenvector.

**Input:** 
  1. graphml object
  2. dictionary mapping indices to region labels

**Output:** dictionary mapping from region to closest region

### Pseudocode

**function** connectivity (graph, region_dict)
  1. Construct similarity graph from adjacency matrix of the graph.
  2. Compute the normalized Laplacian: $L = D^{-1/2}(D - A)D^{-1/2}$, where $L$ is the normalized Laplacian, $D$ is the degree matrix (a diagonal matrix holding the degree of each node), and $A$ is the aforementioned adjacency matrix.
  3. Compute the eigendecomposition of the Laplacian constructed in step 2 and take the eigenvector associated with the second smallest eigenvalue, known as the Fiedler vector.
  4. The Fiedler vector has dimensions $n \times 1$, where $n$ is the number of nodes. Group its $n$ components by region label.
  5. Calculate the average Fiedler-vector component for each region, then, for every region, find the other region whose average is closest.
  6. Return a dictionary mapping each region to its closest region.
  
**endfunction**
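
The pseudocode above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's implementation. It assumes the graph is already available as a dense, symmetric adjacency matrix with no isolated nodes, and it uses that matrix directly as the similarity graph. The names `fiedler_vector` and `connectivity` and the toy path graph are hypothetical.

```python
import numpy as np


def fiedler_vector(adj):
    """Eigenvector of the normalized Laplacian for its second-smallest eigenvalue."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Assumption: every node has degree > 0, so D^{-1/2} is well defined.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.diag(deg) - adj                                    # D - A
    norm_lap = d_inv_sqrt[:, None] * lap * d_inv_sqrt[None, :]  # D^{-1/2}(D - A)D^{-1/2}
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(norm_lap)
    return vecs[:, 1]


def connectivity(adj, region_dict):
    """Map each region to the region with the closest mean Fiedler component."""
    fiedler = fiedler_vector(adj)
    # Step 4: group the Fiedler components by region label.
    groups = {}
    for node_index, region in region_dict.items():
        groups.setdefault(region, []).append(fiedler[node_index])
    # Step 5: average per region, then find the closest other region.
    means = {region: float(np.mean(vals)) for region, vals in groups.items()}
    closest = {}
    for region, mean in means.items():
        others = [other for other in means if other != region]
        closest[region] = min(others, key=lambda other: abs(means[other] - mean))
    # Step 6: dictionary mapping region -> closest region.
    return closest


# Toy example (hypothetical): a path graph 0-1-2-3-4-5 split into three regions.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
region_dict = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c', 5: 'c'}
print(connectivity(adj, region_dict))
```

On this path graph the Fiedler vector varies monotonically from one end to the other, so the two end regions `a` and `c` both map to the middle region `b`. The sign of an eigenvector is arbitrary, but only differences between region averages are used, so the result does not depend on it.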
  
-------

-------


## Simulation


#### Easy
- Data will be constructed from 3 Gaussian clusters whose means are far enough apart that the clusters look distinct, but close enough that the graph is mostly connected.
- All data points lie on a line.

We expect the connectivity algorithm to do well in this scenario because the data is essentially one-dimensional.
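
A rough sketch of how such an easy data set could be generated is shown below. The cluster means, standard deviation, and the radius rule used to create edges are all assumptions made for illustration, since the text does not specify how the simulated graph was built. The name `simulate_easy` is hypothetical. Node `i` belongs to cluster `i // 10`, which mirrors the node-numbering convention used by `get_dict_easy`.

```python
import random


def simulate_easy(points_per_cluster=10, means=(0.0, 5.0, 10.0),
                  std=1.0, radius=2.5, seed=0):
    """Draw 1-D points from three Gaussian clusters and link nearby points."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Points are ordered cluster by cluster: node i belongs to cluster i // points_per_cluster.
    points = [rng.gauss(mean, std) for mean in means for _ in range(points_per_cluster)]
    labels = [c for c in range(len(means)) for _ in range(points_per_cluster)]
    n = len(points)
    # Assumed edge rule: connect two distinct points if they lie within `radius` of each other.
    adj = [[1 if i != j and abs(points[i] - points[j]) <= radius else 0
            for j in range(n)]
           for i in range(n)]
    return points, labels, adj


points, labels, adj = simulate_easy()
print(len(points), sum(map(sum, adj)) // 2)  # number of nodes, number of edges
```

Moving the means closer together or increasing `radius` makes the graph more connected, at the cost of less distinct clusters.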

#### Hard
- Data will be constructed from 3 Gaussian clusters whose means are far enough apart that the clusters look distinct, but close enough that the graph is mostly connected.

We expect the connectivity algorithm to perform worse here because there are more dimensions to consider. In addition, there are only 10 points, so outliers will have more impact.

# Real Data

## Fear199

In [290]:
fear = nx.read_graphml("Fear199.graphml")

# Doing the spectral embedding
fear_se = spec_clust(fear, 3)
print fear_se.shape

(9904, 3)


### Plot of the graph

In [143]:
plot_graphml3d2(fear)

In [313]:
se_regions, norm_dict = get_dict_real(fear, fear_se, 'Fear199_regions.csv')

In [316]:
c, o_c = get_connectivity_hard(se_regions, norm_dict)
# c is a dictionary mapping from the region number to the closest region number
# print('Predicted Connectivity')
# print(c)
# print('Actual Connectivity:')
# print(o_c)

num_correct = 0
num_total = len(c)
for key in c:
    if c[key][0] == o_c[key][0]:
        num_correct += 1
print('num_correct: %d' % num_correct)
print('total: %d' % num_total)
accuracy = float(num_correct) / num_total
print('ACCURACY: %f' % accuracy)

num_correct: 10
total: 578
ACCURACY: 0.017301


### Plot of the Eigenvector embedding

In [111]:
plot_connectivity3d(se_regions, detail=False)

## Analysis

* We output connectivity results as a dictionary mapping each region to its closest region.
* For a given region A, we define a closest-region prediction P as 'correct' if P is the closest region to A by distance in the actual brain.
* We define the distance between 2 regions as the Euclidean distance between the centroids of the 2 regions.
* Using Euclidean distance as the metric for ground truth, we found that:
    * num correct predictions (matchings): 10
    * total number of regions: 578
    * ACCURACY: 0.017301
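
The ground-truth definition and the accuracy measure above can be sketched in plain Python as follows. This is an illustrative sketch under assumed inputs, not the notebook's `get_connectivity_hard`: the helper names and the toy coordinates are hypothetical, and regions are assumed to be given as a dictionary from region label to a list of 3-D points.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = float(len(points))
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_by_centroid(region_points):
    """Map each region to the other region whose centroid is nearest."""
    cents = {region: centroid(pts) for region, pts in region_points.items()}
    closest = {}
    for region, cent in cents.items():
        others = [other for other in cents if other != region]
        closest[region] = min(others, key=lambda other: euclidean(cent, cents[other]))
    return closest


def accuracy(predicted, actual):
    """Fraction of regions whose predicted closest region matches the ground truth."""
    correct = sum(1 for region in predicted if predicted[region] == actual[region])
    return float(correct) / len(predicted)


# Toy example (hypothetical coordinates and predictions).
region_points = {
    1: [(0, 0, 0), (2, 0, 0)],     # centroid (1, 0, 0)
    2: [(3, 0, 0), (5, 0, 0)],     # centroid (4, 0, 0)
    3: [(20, 0, 0), (22, 0, 0)],   # centroid (21, 0, 0)
}
truth = closest_by_centroid(region_points)
predicted = {1: 2, 2: 3, 3: 2}
print(truth, accuracy(predicted, truth))
```

In the toy example two of the three predictions match the centroid-based ground truth. The reported figure for Fear199 follows the same ratio: 10 matches out of 578 regions.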
    

In [133]:
analyze_real(fear, fear_se, 'Fear199_regions.csv')

Predicted Connectivity
OrderedDict([(0.0, [982.0, 0.00013804333042255419]), (2.0, [267.0, 0.00013598484137252814]), (7.0, [133.0, 0.00013759163582362059]), (8.0, [821.0, 0.0010805638329164555]), (9.0, [310.0, 0.00013345669507677782]), (10.0, [187.0, 7.4334092768035749e-05]), (12.0, [1105.0, 0.0014504688807444848]), (15.0, [1102.0, 0.00085551384168909366]), (17.0, [797.0, 7.0202000813253154e-05]), (19.0, [10692.0, 8.082529859385415e-05]), (20.0, [718.0, 0.00026746147399495232]), (23.0, [422.0, 0.00025726644711446814]), (26.0, [391.0, 0.00015644433568203335]), (27.0, [661.0, 0.00015218296953466935]), (28.0, [733.0, 6.8033702797304352e-05]), (33.0, [243.0, 0.00018144997325240521]), (36.0, [427.0, 0.00017274225505062144]), (41.0, [92.0, 0.00012121138077119226]), (42.0, [562.0, 0.0039874332528289586]), (45.0, [115.0, 0.00011627475578657696]), (50.0, [780.0, 0.00015672378825645761]), (52.0, [251.0, 4.7923810461066336e-05]), (53.0, [377.0, 0.0033684238026254758]), (55.0, [1096.0, 0.0005607965

# Code

## Imports and plotting functions

In [1]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
from plotly import tools
import plotly

import os

import csv,gc  # garbage memory collection :)

import numpy as np
from numpy import linalg as LA

import re
import matplotlib
import time
import seaborn as sns

from collections import OrderedDict

import ast

import networkx as nx
import math
from sklearn.manifold import spectral_embedding as se

import scipy.sparse as sp

plotly.offline.init_notebook_mode()

In [2]:
def plot_connectivity(dictionary, orig_dict=None):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    Xe = []
    Ye = []
    data = []
    avg_dict = OrderedDict()
    orig_avg_dict = OrderedDict()
    con_dict = OrderedDict()
    orig_con_dict = OrderedDict()
    i = 0
    
    if orig_dict != None:
        # Getting the original averages.
        for key, region in orig_dict.iteritems():
            tmp_x = []
            tmp_y = []

            for coord in region:    
                tmp_x.append(coord[0])
                tmp_y.append(coord[1])
            orig_avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
    
        # Getting connectivity for original points.
        for key, avg in orig_avg_dict.iteritems():
            min_key = ''
            min_diff = float('inf')
            for key2, avg2 in orig_avg_dict.iteritems():
                if key2 == key:
                    continue
                avg_np = np.array(avg)
                avg2_np = np.array(avg2)
                diff = np.linalg.norm(avg_np - avg2_np)
                if diff < min_diff:
                    min_diff = diff
                    min_key = key2

            orig_con_dict[key] = [min_key, min_diff]
        
    
            
    # Getting and plotting the eigenvector embeddings and averages
    for key, region in dictionary.iteritems():
        X = []
        Y = []
#         Z = []
        tmp_x = []
        tmp_y = []
        
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        for coord in region:    
            X.append(coord[0])
            Y.append(coord[1])
            tmp_x.append(coord[0])
            tmp_y.append(coord[1])
        avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
            
        trace_scatter = Scatter(
                x = X, 
                y = Y,
                name=key,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        avg_scatter = Scatter(
                x = [avg_dict[key][0]],
                y = [avg_dict[key][1]],
                mode='markers',
                name=key+'_avg',
                marker=dict(
                    size=10,
                    color=region_col_lit,
                    colorscale='Viridis',
                    line=dict(
                        width = 2,
                        color = 'rgb(0, 0, 0)'
                    )
                )
        )
        data.append(trace_scatter)
        data.append(avg_scatter)
        
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            avg_np = np.array(avg)
            avg2_np = np.array(avg2)
            diff = np.linalg.norm(avg_np - avg2_np)
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[key] = [min_key, min_diff]

    locations = avg_dict.keys()
    dist_adj = np.zeros([len(locations), len(locations)])
    
    for i1, key1 in enumerate(avg_dict):
        for i2, key2 in enumerate(avg_dict):
            p1 = np.asarray(avg_dict[key1])
            p2 = np.asarray(avg_dict[key2])
            dist = LA.norm(p1 - p2)
            dist_adj[i1,i2] = dist


    for i, key in enumerate(avg_dict):
        tmp = []
        for j in range(len(locations)):
            if j == i:
                continue
            p1 = np.asarray(avg_dict[key])
            p2 = np.asarray(avg_dict[locations[j]])
            dist = LA.norm(p1 - p2)
#             dist = (math.pow(avg_dict[key][0][0]-avg_dict[locations[j]][0][0],2) + 
#                        math.pow(avg_dict[key][0][1]-avg_dict[locations[j]][0][1],2))
            tmp.append(dist)
#             print dist
        newmin = tmp.index(min(tmp))
        if newmin >= i:
            newmin += 1
#         print newmin
        print "region " + key + ": " + locations[newmin]
        tmp2 = avg_dict.keys()[newmin]
        Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
        Ye+=[avg_dict[key][1],avg_dict[tmp2][1],None]
#         Ze+=[dictionary[key][0][2],dictionary[tmp2][0][2],None]
    
    trace_edge = Scatter(x=Xe,
               y=Ye,
               mode='lines',
               line=Line(color='rgb(0,0,0)', width=3),
               hoverinfo='none'
    )
    
    # Printing results
    print('connectivity dict:')
    print(con_dict)
    
    if orig_dict != None:
        print('orig con dict:')
        print(orig_con_dict)

    # Uncomment the following line for the averages
#     data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [3]:
def plot_connectivity3d(dictionary, orig_dict=None, detail=True):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    Xe = []
    Ye = []
    Ze = []
    data = []
    avg_dict = OrderedDict()
    orig_avg_dict = OrderedDict()
    con_dict = OrderedDict()
    orig_con_dict = OrderedDict()
    i = 0
    
    if orig_dict != None:
        # Getting the original averages.
        for key, region in orig_dict.iteritems():
            tmp_x = []
            tmp_y = []

            for coord in region:    
                tmp_x.append(coord[0])
                tmp_y.append(coord[1])
            orig_avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
    
        # Getting connectivity for original points.
        for key, avg in orig_avg_dict.iteritems():
            min_key = ''
            min_diff = float('inf')
            for key2, avg2 in orig_avg_dict.iteritems():
                if key2 == key:
                    continue
                avg_np = np.array(avg)
                avg2_np = np.array(avg2)
                diff = np.linalg.norm(avg_np - avg2_np)
                if diff < min_diff:
                    min_diff = diff
                    min_key = key2

            orig_con_dict[key] = [min_key, min_diff]
            
            
    for key, region in dictionary.iteritems():
        X = []
        Y = []
        Z = []
        tmp_x = []
        tmp_y = []
        tmp_z = []
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        for coord in region:    
            X.append(coord[0])
            Y.append(coord[1])
            Z.append(coord[2])
            tmp_x.append(coord[0])
            tmp_y.append(coord[1])
            tmp_z.append(coord[2])
        avg_dict[key] = [[np.mean(tmp_x), np.mean(tmp_y), np.mean(tmp_z)]]
            
        trace_scatter = Scatter3d(
                x = X, 
                y = Y,
                z = Z,
                name=key,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        avg_scatter = Scatter3d(
                x = [avg_dict[key][0][0]],
                y = [avg_dict[key][0][1]],
                z = [avg_dict[key][0][2]],
                mode='markers',
                name=key+'_avg',
                marker=dict(
                    size=10,
                    color=region_col_lit,
                    colorscale='Viridis',
                    line=dict(
                        width = 2,
                        color = 'rgb(0, 0, 0)'
                    )
                )
        )
        data.append(trace_scatter)
        data.append(avg_scatter)
        
#     print('plot avgdict')
#     print(avg_dict)
        
    locations = avg_dict.keys()
#     print locations
    for i, key in enumerate(avg_dict):
#         if i + 1 == len(locations):
#             continue
#         print 'start' + str(i)
        tmp = []
        for j in range(len(locations)):
            if j == i:
                continue
            p1 = np.asarray(avg_dict[key][0])
            p2 = np.asarray(avg_dict[locations[j]][0])
            dist = LA.norm(p1 - p2)
#             dist = (math.pow(avg_dict[key][0][0]-avg_dict[locations[j]][0][0],2) + 
#                        math.pow(avg_dict[key][0][1]-avg_dict[locations[j]][0][1],2) +
#                        math.pow(avg_dict[key][0][2]-avg_dict[locations[j]][0][2],2))
            tmp.append(dist)
#             print dist
        newmin = tmp.index(min(tmp))
        if newmin >= i:
            newmin += 1
#         print newmin
        if detail:
            print "region " + key + ": " + locations[newmin]
        tmp2 = avg_dict.keys()[newmin]
        Xe+=[avg_dict[key][0][0],avg_dict[tmp2][0][0],None]
        Ye+=[avg_dict[key][0][1],avg_dict[tmp2][0][1],None]
        Ze+=[avg_dict[key][0][2],avg_dict[tmp2][0][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter3d(x=Xe,
                y=Ye,
                z=Ze,
                mode='lines',
                line=Line(color='rgb(0,0,0)', width=3),
                hoverinfo='none'
    )

    data.append(trace_edge)
    
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            avg_np = np.array(avg)
            avg2_np = np.array(avg2)
            diff = np.linalg.norm(avg_np - avg2_np)
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[key] = [min_key, min_diff]
    
    # Printing results
    if detail:
        print('connectivity dict:')
        print(con_dict)

        if orig_dict != None:
            print('orig con dict:')
            print(orig_con_dict)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [4]:
def plot_graphml(G):
    current_palette = sns.color_palette("husl", len(G.nodes())/10)
    print(len(G.nodes()))
    print(current_palette)
    Xe = []
    Ye = []
    data = []
    i = 0
    
    X = []
    Y = []
    regiondict = {}
    for r, node in enumerate(G.nodes()):
        tmp = G.node[node]
#         print tmp
        pos = tmp['pos']
        region = tmp['region']
        if str(region) not in regiondict:
            regiondict[str(region)] = [pos]
        else:
            tmp = regiondict[str(region)]
            tmp.append(pos)
            regiondict[str(region)] = tmp
#         print region
#         print pos
#     print regiondict
    for region, reg in enumerate(regiondict):
        for pos in regiondict[reg]:
#             print pos
            X.append(pos[0])
            Y.append(pos[1])
        
        print('region: %d' % region)
                
        region_col = current_palette[region]
        region_col_lit = 'rgb' + str(region_col)
        
        trace_scatter = Scatter(
                x = X, 
                y = Y,
                name=region,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        data.append(trace_scatter)
        X = []
        Y = []
        
    for r, edge in enumerate(G.edges()):
        firstpt = G.node[edge[0]]
        secondpt = G.node[edge[1]]
#         print firstpt
        dist = LA.norm(firstpt['pos'] - secondpt['pos'])
#         tmp.append(dist)
#         print dist

        Xe+=[firstpt['pos'][0],secondpt['pos'][0],None]
        Ye+=[firstpt['pos'][1],secondpt['pos'][1],None]
#         Ze+=[dictionary[key][0][2],dictionary[tmp2][0][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter(x=Xe,
               y=Ye,
               mode='lines',
               line=Line(color='rgb(0,0,0)', width=2),
               hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [5]:
def plot_graphml3d(G):
    current_palette = sns.color_palette("husl", len(G.nodes())/10)
#     print(len(G.nodes()))
#     print(current_palette)
    Xe = []
    Ye = []
    Ze = []
    data = []
    i = 0
    
    X = []
    Y = []
    Z = []
    regiondict = {}
    for r, node in enumerate(G.nodes()):
        tmp = G.node[node]
#         print tmp
        pos = tmp['pos']
        region = tmp['region']
        if str(region) not in regiondict:
            regiondict[str(region)] = [pos]
        else:
            tmp = regiondict[str(region)]
            tmp.append(pos)
            regiondict[str(region)] = tmp
#         print region
#         print pos
#     print regiondict
    for region, reg in enumerate(regiondict):
        for pos in regiondict[reg]:
#             print pos
            X.append(pos[0])
            Y.append(pos[1])
            Z.append(pos[2])
        
#         print('region: %d' % region)
                
        region_col = current_palette[region]
        region_col_lit = 'rgb' + str(region_col)
        
        trace_scatter = Scatter3d(
                x = X, 
                y = Y,
                z = Z,
                name=region,
                mode='markers',
                marker=dict(
                    size=10,
                    color=region_col_lit, #'purple',                # set color to an array/list of desired values
                    colorscale='Viridis',   # choose a colorscale
                    opacity=0.5
                )
        )
        data.append(trace_scatter)
        X = []
        Y = []
        Z = []
        
    for r, edge in enumerate(G.edges()):
        firstpt = G.node[edge[0]]
        secondpt = G.node[edge[1]]
#         print firstpt
        dist = LA.norm(firstpt['pos'] - secondpt['pos'])
#         tmp.append(dist)
#         print dist

        Xe+=[firstpt['pos'][0],secondpt['pos'][0],None]
        Ye+=[firstpt['pos'][1],secondpt['pos'][1],None]
        Ze+=[firstpt['pos'][2],secondpt['pos'][2],None]
#     print Xe
#     print Ye
    
    trace_edge = Scatter3d(x=Xe,
                y=Ye,
                z=Ze,
                mode='lines',
                line=Line(color='rgb(0,0,0)', width=2),
                hoverinfo='none'
    )

    data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [6]:
import pandas as pd

def plot_graphml3d2(G):
    g = G

    # grab the node positions from the graphML file
    V = nx.number_of_nodes(g)
    attributes = nx.get_node_attributes(g, 'attr')
    node_positions_3d = pd.DataFrame(columns=['x', 'y', 'z'], index=range(V))
    for n in g.nodes_iter():
        node_positions_3d.loc[n] = [int((re.findall('\d+', str(attributes[n])))[0]),
                                    int((re.findall('\d+', str(attributes[n])))[1]),
                                    int((re.findall('\d+', str(attributes[n])))[2])]

    # grab edge endpoints
    edge_x = []
    edge_y = []
    edge_z = []

    for e in g.edges_iter():
        # strippedSource = int(e[0].replace('s', ''))
        # strippedTarget = int(e[1].replace('s', ''))
        source_pos = node_positions_3d.loc[e[0]]
        target_pos = node_positions_3d.loc[e[1]]

        edge_x += [source_pos['x'], target_pos['x'], None]
        edge_y += [source_pos['y'], target_pos['y'], None]
        edge_z += [source_pos['z'], target_pos['z'], None]


    # node style
    node_trace = Scatter3d(x=[x for x in node_positions_3d['x']],
                           y=[x for x in node_positions_3d['y']],
                           z=[x for x in node_positions_3d['z']],
                           mode='markers',
                           # name='regions',
                           marker=Marker(symbol='dot',
                                         size=6,
                                         opacity=0.5,
                                         color='purple'),
                           # text=[str(r) for r in range(V)],
                           # text=atlas_data['nodes'],
                           hoverinfo='text')

    # edge style
    edge_trace = Scatter3d(x=edge_x,
                           y=edge_y,
                           z=edge_z,
                           mode='lines',
                           line=Line(color='cyan', width=1),
                           hoverinfo='none')

    # axis style
    axis = dict(showbackground=False,
                showline=False,
                zeroline=False,
                showgrid=False,
                showticklabels=False)

    plot_title = 'graphml plot'
    # overall layout
    layout = Layout(title=plot_title,
                    width=800,
                    height=900,
                    showlegend=False,
                    scene=Scene(xaxis=XAxis(axis),
                                yaxis=YAxis(axis),
                                zaxis=ZAxis(axis)),
                    margin=Margin(t=50),
                    hovermode='closest',
                    paper_bgcolor='rgba(0,0,0,0)',
                    plot_bgcolor='rgb(255,255,255)')

    data = Data([node_trace, edge_trace])
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [7]:
def plot_eigenvector(dictionary, eigenvector_num):
    current_palette = sns.color_palette("husl", len(dictionary.keys()))
    data = []
    avg_dict = OrderedDict()
    eigenvector_index = eigenvector_num - 1
    i = 0
    for key, region in dictionary.iteritems():  
#         print(key)
        X = []
        Y = []
        y_vals = []
        
        region_col = current_palette[i]
        region_col_lit = 'rgb' + str(region_col)
        i += 1
        
        for j in range(len(region) - 1):
#             Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
            Y += [region[j][eigenvector_index], region[j+1][eigenvector_index], None]
            X += [j, j+1, None]
            y_vals.append(region[j][eigenvector_index])
        y_vals.append(region[len(region) - 1][eigenvector_index])
#         print(Y)
        avg = np.mean(y_vals)
        avg_dict[key] = avg

        trace_edge = Scatter(
                    x=X,
                    y=Y,
                    name=key,
                    mode='lines',
                    line=Line(color=region_col_lit, width=3),
                    hoverinfo='none'
        )
        data.append(trace_edge)
        
        Ya, Xa = [], []
        for j in range(len(region) - 1):
#             Xe+=[avg_dict[key][0],avg_dict[tmp2][0],None]
            Ya += [avg, avg, None]
            Xa += [j, j+1, None]
        
        trace_edge = Scatter(
                    x=Xa,
                    y=Ya,
                    name=key + 'avg',
                    mode='lines',
                    line=Line(color=region_col_lit, width=3),
                    hoverinfo='none'
        )
        data.append(trace_edge)
        
#         color=region_col_lit, #'purple',                # set color to an array/list of desired values
#                     colorscale='Viridis'

#     for key, value in avg_dict.iteritems():
#         trace_edge = Scatter(
#                     y=value,
#                     name=key,
#                     mode='lines',
#                     line=Line(color=region_col_lit, width=3),
#                     hoverinfo='none'
#         )
#         data.append(trace_edge)
    
    layout = Layout(
#         margin=dict(
#             l=0,
#             r=0,
#             b=0,
#             t=0
#         ),
        paper_bgcolor='rgb(255,255,255)',
        plot_bgcolor='rgb(255,255,255)'
    )
        
    fig = Figure(data=data, layout=layout)
    iplot(fig, validate=False)

In [8]:
def get_connectivity(eig_dict, orig_dict=None):
    eigenvector_index = 1  # column of the eigenvector with the second-smallest eigenvalue
    avg_dict = {}
    orig_avg_dict = OrderedDict()
    
    # dict that maps from region to most connected region
    con_dict = OrderedDict()
    
    orig_con_dict = OrderedDict()
    
    if orig_dict != None:
        # Getting the original averages.
        for key, region in orig_dict.iteritems():
            tmp_x = []
            tmp_y = []

            for coord in region:
                tmp_x.append(coord[0])
                tmp_y.append(coord[1])
            orig_avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
            
#         print 'original averages'
#         print orig_avg_dict
    
        # Getting connectivity for original points.
        for key, avg in orig_avg_dict.iteritems():
            min_key = ''
            min_diff = float('inf')
            for key2, avg2 in orig_avg_dict.iteritems():
                if key2 == key:
                    continue
                avg_np = np.array(avg)
                avg2_np = np.array(avg2)
                diff = np.linalg.norm(avg_np - avg2_np)
                if diff < min_diff:
                    min_diff = diff
                    min_key = key2

            orig_con_dict[key] = [min_key, min_diff]
            
    
    # Getting the average second eigenvector component for each of the regions
    for key, region in eig_dict.iteritems():
#         print(key)
        y_vals = []
        
        for j in range(len(region)):
            y_vals.append(region[j][eigenvector_index])
        avg = np.mean(y_vals)
        avg_dict[key] = avg
    
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            diff = abs(avg - avg2)  # absolute difference between the two scalar averages
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[key] = [min_key, min_diff]
    
    con_dict = OrderedDict(sorted(con_dict.items()))
#     con_dict = sorted(con_dict.items())
    
    if orig_dict == None:
        return con_dict
    else:
        return con_dict, orig_con_dict

In [9]:
def get_connectivity_hard(eig_dict, orig_dict=None):
    eigenvector_index = 1  # column of the eigenvector with the second-smallest eigenvalue (not used below)
    avg_dict = {}
    orig_avg_dict = OrderedDict()
    
    # dict that maps from region to most connected region
    con_dict = OrderedDict()
    
    orig_con_dict = OrderedDict()
    
    if orig_dict != None:
        # Getting the original averages.
        for key, region in orig_dict.iteritems():
            tmp_x = []
            tmp_y = []
            y_vals = []
            
            for j in range(len(region)):
                y_vals.append(region[j])
            y_vals = np.array(y_vals)
#             print('y_vals')
#             print(y_vals)
            x_avg = np.mean(y_vals[:,0])
            y_avg = np.mean(y_vals[:,1])
            z_avg = np.mean(y_vals[:,2])
            orig_avg_dict[key] = [x_avg, y_avg, z_avg]
#             avg = np.mean(y_vals)
#             orig_avg_dict[key] = avg

#             for coord in region:
#                 tmp_x.append(coord[0])
#                 tmp_y.append(coord[1])
#             orig_avg_dict[key] = [np.mean(tmp_x), np.mean(tmp_y)]
            
#         print 'original averages'
#         print orig_avg_dict
    
        # Getting connectivity for original points.
        for key, avg in orig_avg_dict.iteritems():
            min_key = ''
            min_diff = float('inf')
            for key2, avg2 in orig_avg_dict.iteritems():
                if key2 == key:
                    continue
                avg_np = np.array(avg)
                avg2_np = np.array(avg2)
                diff = np.linalg.norm(avg_np - avg2_np)
                if diff < min_diff:
                    min_diff = diff
                    min_key = key2

            orig_con_dict[float(key)] = [float(min_key), min_diff]
            
    
    # Getting the average of the 3 embedding components for each of the regions
    for key, region in eig_dict.iteritems():
#         print(key)
        y_vals = []
        
        for j in range(len(region)):
            y_vals.append(region[j])
        y_vals = np.array(y_vals)
        x_avg = np.mean(y_vals[:,0])
        y_avg = np.mean(y_vals[:,1])
        z_avg = np.mean(y_vals[:,2])
        avg_dict[key] = [x_avg, y_avg, z_avg]
        
#     print('getcon avg_dict')
#     print(avg_dict)
    
    # Computing connectivity between regions using the distance between averages
    for key, avg in avg_dict.iteritems():
        min_key = ''
        min_diff = float('inf')
        for key2, avg2 in avg_dict.iteritems():
            if key2 == key:
                continue
            avg_np = np.array(avg)
            avg2_np = np.array(avg2)
            diff = np.linalg.norm(avg_np - avg2_np)
            if diff < min_diff:
                min_diff = diff
                min_key = key2
        
        con_dict[float(key)] = [float(min_key), min_diff]
    
    con_dict = OrderedDict(sorted(con_dict.items()))
    orig_con_dict = OrderedDict(sorted(orig_con_dict.items()))
    
    if orig_dict == None:
        return con_dict
    else:
        return con_dict, orig_con_dict

## Function for doing the clustering

In [10]:
def spec_clust(graphx, num_components):
    adj_mat = nx.adjacency_matrix(graphx)
#     result = se(adj_mat, n_components=num_components, drop_first=True)
    result = se(adj_mat, n_components=num_components, drop_first=False)

    return result

## Functions for doing the analysis

In [11]:
def get_dict_easy(graphx, a2out):
    nodelist = graphx.nodes()
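    # Row indices into a2out for each region, ordered by position within the region
    a_arr = np.zeros(10, dtype=int)
    b_arr = np.zeros(10, dtype=int)
    c_arr = np.zeros(10, dtype=int)
    for ind, i in enumerate(nodelist):
#         print('i: %s' % i)
        # Node labels 0-9 belong to region a, 10-19 to b, 20-29 to c
        region = int(float(i)) // 10
        place = int(float(i)) % 10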
        if region == 0:
            a_arr[place] = int(ind)
    #         a_list.append(int(float(i)))
        elif region == 1:
            b_arr[place] = int(ind)
    #         b_list.append(int(float(i)))
        elif region == 2:
            c_arr[place] = int(ind)
    #         c_list.append(int(float(i)))

    a_region = a2out[a_arr.tolist()]  # V[0:9,:].mean(axis=0)
#     print a_arr.tolist()
    b_region = a2out[b_arr.tolist()]  # V[10:19,:].mean(axis=0)
    c_region = a2out[c_arr.tolist()]  # V[20:29,:].mean(axis=0)
    se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region)])
    
#     print(a_region)
    
    return se_regions
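
The label arithmetic in `get_dict_easy` (region = label // 10, place = label % 10) generalizes to any number of regions. The sketch below is one way to write it without a fixed set of arrays. The name `group_rows_by_region` and the `region_size` parameter are assumptions introduced for this example, not part of the notebook.

```python
from collections import OrderedDict


def group_rows_by_region(node_labels, region_size=10):
    """Map region number -> row indices, ordered by place within the region."""
    placed = {}
    for row, label in enumerate(node_labels):
        node_id = int(float(label))          # labels may arrive as strings like '12.0'
        region, place = divmod(node_id, region_size)
        placed.setdefault(region, []).append((place, row))
    return OrderedDict(
        (region, [row for _, row in sorted(pairs)])
        for region, pairs in sorted(placed.items())
    )


# Node order as a graph might report it (made-up labels)
rows = group_rows_by_region(['12.0', '0.0', '11.0', '3.0', '25.0'])
```

Each list of row indices can then be used to index the embedding, as `a2out[a_arr.tolist()]` does above.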

In [12]:
def get_dict_hard(graphx, a2out):
    nodelist = graphx.nodes()
    a_arr = np.zeros(10, dtype=int)
    b_arr = np.zeros(10, dtype=int)
    c_arr = np.zeros(10, dtype=int)
#     d_arr = np.zeros(10, dtype=int)
    for ind, i in enumerate(nodelist):
#         print('i: %s' % i)
        region = int(float(i)) // 10
        place = int(float(i)) % 10
        if region == 0:
            a_arr[place] = int(ind)
    #         a_list.append(int(float(i)))
        elif region == 1:
            b_arr[place] = int(ind)
    #         b_list.append(int(float(i)))
        elif region == 2:
            c_arr[place] = int(ind)
    #         c_list.append(int(float(i)))
#         elif region == 3:
#             d_arr[place] = int(ind)
#     #         d_list.append(int(float(i)))

    a_region = a2out[a_arr.tolist()]  # V[0:9,:].mean(axis=0)
    print(a_arr.tolist())
    b_region = a2out[b_arr.tolist()]  # V[10:19,:].mean(axis=0)
    c_region = a2out[c_arr.tolist()]  # V[20:29,:].mean(axis=0)
#     d_region = a2out[d_arr.tolist()]  # V[20:29,:].mean(axis=0)
#     se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region),('d', d_region)])
    se_regions = OrderedDict([('a', a_region),('b', b_region),('c', c_region)])

    
    return se_regions

In [14]:
def analyze_hard(G, norm_dict, se_result):
    se_regions = get_dict_hard(G, se_result)
#     plot_connectivity(se_regions, norm_dict)
#     print('se regions')
#     print(se_regions)
    c, o_c = get_connectivity_hard(se_regions, norm_dict)
    print('Predicted Connectivity:')
    print(c)
    print('Actual Connectivity:')
    print(o_c)
    
    num_correct = 0
    num_total = len(c)
    for key in c:
        if c[key][0] == o_c[key][0]:
            num_correct += 1
            
    accuracy = float(num_correct) / num_total
    print('ACCURACY: %f' % accuracy)
        
    plot_connectivity3d(se_regions)
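
The accuracy computed in `analyze_hard` is the fraction of regions whose predicted nearest region agrees with the reference. A small standalone sketch follows, assuming both dictionaries map a region to `[closest region, distance]` as `get_connectivity_hard` returns them. The function name `connectivity_accuracy` and the example values are made up for illustration.

```python
def connectivity_accuracy(predicted, actual):
    """Fraction of regions whose predicted nearest region matches the reference."""
    if not predicted:
        return 0.0  # avoid dividing by zero when there are no regions
    num_correct = sum(
        1 for key in predicted if predicted[key][0] == actual[key][0]
    )
    return float(num_correct) / len(predicted)


# Made-up dictionaries in the {region: [closest region, distance]} layout
predicted = {0.0: [1.0, 0.12], 1.0: [0.0, 0.12], 2.0: [0.0, 0.40]}
actual = {0.0: [1.0, 3.0], 1.0: [0.0, 3.0], 2.0: [1.0, 5.0]}
accuracy = connectivity_accuracy(predicted, actual)
```

Only the first entry of each value is compared. The distances are on different scales (embedding space versus original space), so comparing them directly would not be meaningful.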

In [15]:
def analyze_real(graph, se_result, input_path):
    se_regions, norm_dict = get_dict_real(graph, se_result, input_path)
    c, o_c = get_connectivity_hard(se_regions, norm_dict)
    # c maps each region number to [closest region number, distance to it]
    print('Predicted Connectivity:')
    print(c)
    print('Actual Connectivity:')
    print(o_c)

    num_correct = 0
    num_total = len(c)
    for key in c:
        if c[key][0] == o_c[key][0]:
            num_correct += 1
            
    accuracy = float(num_correct) / num_total
    print('ACCURACY: %f' % accuracy)
    
#     plot_connectivity3d(se_regions)

# def analyze_real(G, norm_dict, se_result):
#     se_regions = get_dict_hard(G, se_result)
# #     plot_connectivity(se_regions, norm_dict)
# #     print('se regions')
# #     print(se_regions)
#     c, o_c = get_connectivity_hard(se_regions, norm_dict)
# #     print('Predicted Connectivity:')
# #     print(c)
# #     print('Actual Connectivity:')
# #     print(o_c)
    
#     num_correct = 0
#     num_total = len(c)
#     for key in c:
#         if c[key][0] == o_c[key][0]:
#             num_correct += 1
            
#     accuracy = float(num_correct) / num_total
#     print('ACCURACY: %f' % accuracy)
        
# #     plot_connectivity3d(se_regions)

In [16]:
class SparseMatrix:
    def __init__(self, x, y, z):
#         self._max_index = 0
        # Keep the grid extents; entries are stored sparsely, keyed by (x, y, z)
        self.x_dim = x
        self.y_dim = y
        self.z_dim = z
        self._vector = {}

    def add(self, index, value):
        # index is an (x, y, z) tuple; value is the region label at that point
        self._vector[index] = value
#         if index > self._max_index:
#             self._max_index = index

    def get(self, index):
        # Return the stored value, or -1 as a sentinel when the index is absent
        if index in self._vector:
            return self._vector[index]
        return -1

    def get_sparse_matrix(self):
        return self._vector
        # return self._vector.keys()

#     def get_full_vector(self, size=None):
#         """ Returns a full vector of features as a numpy array. """
#         size = (self._max_index + 1) if size == None else size
#         full_vector = np.zeros(size)  # 0 indexed
#         for key, value in self._vector.iteritems():
#             full_vector[key] = value

#         return full_vector

    def __str__(self):
        return str(self._vector)
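
Stripped down, `SparseMatrix` is a dictionary keyed by coordinate tuples with a sentinel for missing entries. The sketch below shows the same idea with a plain `dict`. The helper names and the five-column row layout (x, y, z, an unused column, region) are assumptions based on how `get_adj_mat` reads its file.

```python
def build_region_lookup(rows):
    """Map (x, y, z) -> region label from rows laid out as x, y, z, _, region."""
    lookup = {}
    for row in rows:
        lookup[tuple(row[0:3])] = row[4]
    return lookup


def region_of(lookup, point, missing=-1):
    """Return the region at `point`, or the sentinel `missing` if it is absent."""
    return lookup.get(tuple(point), missing)


# Made-up rows standing in for the contents of the regions file
rows = [[1.0, 2.0, 3.0, 0.0, 7.0], [4.0, 5.0, 6.0, 0.0, 9.0]]
lookup = build_region_lookup(rows)
```

Integer and float coordinates with equal values hash identically in Python, so `(1, 2, 3)` finds an entry stored under `(1.0, 2.0, 3.0)`. The lookup still requires exact equality, so coordinates that differ by rounding error will miss.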

In [17]:
def add_to_dict(d, region, index):
    if region in d:
        d[region].append(index)
    else:
        d[region] = [index]
        
    return d

def get_adj_mat(regions_path):
    points = np.genfromtxt(regions_path, delimiter=',')
    x_dim = np.max(points[:,0])
    y_dim = np.max(points[:,1])
    z_dim = np.max(points[:,2])
    am = SparseMatrix(x_dim, y_dim, z_dim)
    for point in points:
        am.add(tuple(point[0:3]), point[4])
        
    return am

def get_dict_real(g, se_result, regions_path):
    nodes = g.nodes()
    points = np.genfromtxt(regions_path, delimiter=',')
    orig_dict = OrderedDict()
    d = {}
    
    sparse_mat = get_adj_mat(regions_path)

    for index, node in enumerate(nodes):        
        s = g.node[node]['attr']
        point = ast.literal_eval(s)
        region = sparse_mat.get(tuple(point))
        if region == -1:
            print('WARNING: no region found for point %s' % (point,))
        add_to_dict(d, region, index)
    
    for point in points:
        region = point[4]
#         if region in orig_dict:
#             orig_dict[region] = np.vstack((orig_dict[region], point[0:3]))
#         else:
#             orig_dict[region] = np.array([point[0:3]])
        add_to_dict(orig_dict, region, point[0:3])
    
    se_regions = {}
    for key, index_list in d.items():
        # Select the embedding rows that belong to this region
        se_regions[key] = se_result[index_list]
    
    return se_regions, orig_dict
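
The last step of `get_dict_real`, grouping node indices by region and slicing the embedding with each group, can be sketched on its own. This version uses `collections.defaultdict` in place of `add_to_dict`. The function name `split_embedding_by_region` and the toy arrays are assumptions for the example.

```python
from collections import defaultdict

import numpy as np


def split_embedding_by_region(region_per_node, embedding):
    """Return {region: rows of `embedding` for the nodes in that region}."""
    index_lists = defaultdict(list)
    for row, region in enumerate(region_per_node):
        index_lists[region].append(row)
    embedding = np.asarray(embedding)
    # Indexing with a list of row numbers picks those rows in order
    return {region: embedding[rows] for region, rows in index_lists.items()}


# Four nodes, three embedding components each (made-up values)
emb = np.arange(12, dtype=float).reshape(4, 3)
parts = split_embedding_by_region([5.0, 8.0, 5.0, 8.0], emb)
```

Here `parts[5.0]` holds rows 0 and 2 of `emb`, in the order in which the nodes were enumerated.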

# def analyze_real(graph, se_result, input_path):
#     se_regions = get_dict_real(graph, se_result, input_path)
# #     c = get_connectivity_tot(se_regions)
# #     print('Predicted Connectivity')
# #     print(c)
#     plot_connectivity3d(se_regions)