 # Exploring Topological Data Analysis for Brain Network Connectivity
___

**Contact source**: `drewwilimitis@gmail.com`
<br>
___

 ## References and Sources:
 
 *sources on TDA*

Data Source for ADHD-200 Measurements: 
- https://fcon_1000.projects.nitrc.org/indi/adhd200/

GitHub Repositories I found helpful: 

 ## Overview of Experiment 

- We explore fMRI brain connectivity patterns of ADHD patients and healthy controls, using data collected at multiple sites

- We use TDA to ...

**Overview of Current Datasets and Directories**:<br>

- We have four folders in total, each labeled by site: KKL, NYU, Neuro, Peking 
<br>
- Each site's folder has two `.mat` files: one for the ADHD group and one for healthy controls (20 subjects in each group)
<br>
- Each individual has a (190, 190) connectivity matrix based on correlation in activity patterns between 190 brain regions
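
To make the shape of the data concrete, here is a minimal sketch of how a (190, 190) correlation-based connectivity matrix could arise from regional time series. It assumes Pearson correlation and uses simulated random signals; the number of time points (120) and the variable names are hypothetical, and the source does not state how the ADHD-200 matrices were actually computed.

```python
import numpy as np

# Simulated stand-in for one subject's regional fMRI signals (NOT ADHD-200 data).
# n_timepoints is an arbitrary assumption; n_regions matches the 190 regions above.
rng = np.random.default_rng(0)
n_timepoints, n_regions = 120, 190
timeseries = rng.standard_normal((n_timepoints, n_regions))

# np.corrcoef treats each row as one variable, so transpose to (regions, timepoints)
conn = np.corrcoef(timeseries.T)

print(conn.shape)                        # one entry per pair of regions
print(np.allclose(conn, conn.T))         # correlation matrices are symmetric
print(np.allclose(np.diag(conn), 1.0))   # each region correlates perfectly with itself
```

A matrix built this way is symmetric with ones on the diagonal, which is a useful sanity check to run on each subject's matrix after loading.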

**We attempt the following analyses:** <br>
...

 ## Mathematical Background/Overview
 ___
 
 ### header

## Import libraries and load data

In [None]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
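# the 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6; fall back on older versions
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')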
import seaborn as sns
import time
import sys
import os
import scipy.io

# import modules from the hyperbolic-learning repository
sys.path.append('/Users/drew/Desktop/hyperbolic-learning/hyperbolic_kmeans')  # path to hkmeans folder
sys.path.append('/Users/drew/Desktop/hyperbolic-learning/utils') # path to utils folder
from utils import *
from hkmeans import HyperbolicKMeans, plot_clusters

# ignore warnings
import warnings
warnings.filterwarnings('ignore');

# display multiple outputs within a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all";

In [None]:
# initialize lists for mat files, site names, and dictionary with all connectivity data
mat_files = []
conn_dict = {}
sites = []

# iterate through site folders and load data files
for site in os.listdir('../../data/ADHD-200/sites/'):
    if '.DS' in site:  # skip macOS metadata entries such as .DS_Store
        continue
    print('Loading data for site: {}'.format(str(site)))
    sites.append(site)
    subdir = '../../data/ADHD-200/sites/' + str(site) + '/'
    for file in os.listdir(subdir):
        if not file.endswith('.mat'):  # skip anything that is not a .mat data file
            continue
        fpath = subdir + file
        mat = scipy.io.loadmat(fpath)
        # assumes the group's data variable is the last key in the loaded dict
        group = list(mat.keys())[-1]
        data = mat[group]
        mat_files.append(mat)
        # standardize naming of sites and ADHD/control groups
        if 'Peking' in group:
            group = group.split('_')[0] + '_' + group.split('_')[-1]
        if 'KK_' in group:
            group = group.replace('KK', 'KKL')
        conn_dict[group] = np.array(data)

In [None]:
# explore sites and dictionary with data files
groups = conn_dict.keys()
print('Dimensions of group connectivity data matrices:')
for group in groups:
    print(group + ': ', np.shape(conn_dict[group]))

# standardized site names (replaces the folder names collected while loading)
sites = np.unique([x.split('_')[0] for x in groups])
print('\nStandardized site names:')
print(sites)