This notebook runs a first attempt at applying BOWaves to the Frolich et al. data to see whether we can match the results in their paper.
This requires between 16 and 32 GB of RAM. We recommend running on Caviness with the --mem=32gb flag set.

# Load ICs

In [None]:
import BOWaves.utilities.dataloaders as dataloaders
import os
import numpy as np

In [None]:
#frolich_ics = {'ICs': np.array([]), 'labels': np.array([])}
frolich_ics = {'ICs': [], 'labels': []}

# Directory containing the Frolich data files
data_dir = '../data/frolich'

# Filter out subdirectories such as /img. The path must be joined with data_dir,
# otherwise os.path.isdir checks relative to the current working directory.
frolich_data = [file for file in os.listdir(data_dir)
                if not os.path.isdir(os.path.join(data_dir, file))]

for file in frolich_data:
    ICs, labels = dataloaders.load_and_visualize_mat_file_frolich(os.path.join(data_dir, file), visualize=False)
    frolich_ics['ICs'].extend(ICs)
    frolich_ics['labels'].extend(labels)


Now create codebooks since we have the ICs and their labels.

First split off 20% for testing.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(frolich_ics['ICs'], frolich_ics['labels'], test_size=0.2, random_state=42)
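
If the four classes are unevenly represented, a plain random split can leave a class under-represented in the test set. One option is the `stratify` argument of `train_test_split`, which keeps class proportions roughly equal across the two splits. The sketch below uses toy stand-in data (`toy_ics` and `toy_labels` are hypothetical names, not the Frolich ICs) and is not a claim about how the original analysis was split.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for frolich_ics['ICs'] and frolich_ics['labels'] (hypothetical data).
classes = ['neural', 'blink', 'muscle', 'mixed']
toy_labels = [c for c in classes for _ in range(5)]   # 5 samples per class
toy_ics = list(range(len(toy_labels)))                # placeholder "ICs"

# stratify=toy_labels asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_ics, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels)
```

With 20 samples and `test_size=0.2`, the test split holds 4 samples, one from each class.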

Next, split the training set by class. Frolich's data has 4 classes.

In [None]:
if len(X_train) != len(y_train):
    raise ValueError('X_train and y_train are not the same length.')

# TODO: confirm the class names against the data on Caviness.
neural = {'ICs': [], 'centroids': [], 'labels': [], 'shifts': [], 'distances': [], 'inertia': []}
blink = {'ICs': [], 'centroids': [], 'labels': [], 'shifts': [], 'distances': [], 'inertia': []}
muscle = {'ICs': [], 'centroids': [], 'labels': [], 'shifts': [], 'distances': [], 'inertia': []}
mixed = {'ICs': [], 'centroids': [], 'labels': [], 'shifts': [], 'distances': [], 'inertia': []}


# Route each training IC to the dictionary for its class.
for ic, label in zip(X_train, y_train):
    if label == 'neural':
        neural['ICs'].append(ic)
    elif label == 'blink':
        blink['ICs'].append(ic)
    elif label == 'muscle':
        muscle['ICs'].append(ic)
    elif label == 'mixed':
        mixed['ICs'].append(ic)
    else:
        # Use repr formatting so non-string labels do not break the error message.
        raise ValueError(f'Unknown class label: {label!r}')
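
The per-class split above could also be written without hard-coding the class names, which may help while the exact labels are still unconfirmed. The following is a minimal sketch with toy data; `group_by_label` is a hypothetical helper, not part of BOWaves, and unlike the loop above it does not reject unknown labels.

```python
from collections import defaultdict

def group_by_label(samples, labels):
    """Group samples into a dict keyed by class label, preserving input order."""
    if len(samples) != len(labels):
        raise ValueError('samples and labels are not the same length.')
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return dict(groups)

# Toy stand-ins for X_train / y_train (hypothetical data).
toy_ics = ['ic0', 'ic1', 'ic2', 'ic3']
toy_labels = ['neural', 'blink', 'neural', 'muscle']
grouped = group_by_label(toy_ics, toy_labels)
```

Here `grouped['neural']` holds `'ic0'` and `'ic2'`, and the keys of `grouped` show which class labels are actually present in the data.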

In [None]:
from BOWaves.sikmeans.sikmeans_core import shift_invariant_k_means
metric, init = 'cosine', 'random'
num_clusters = 16
centroid_len = 256
n_runs = 3
n_jobs = 1
rng = np.random.RandomState(42)

# Learn one codebook per class. The shared rng is consumed in the same order as before.
for codebook in (neural, blink, muscle, mixed):
    (codebook['centroids'], codebook['labels'], codebook['shifts'],
     codebook['distances'], codebook['inertia'], _) = shift_invariant_k_means(
        codebook['ICs'], num_clusters, centroid_len, metric=metric, init=init,
        n_init=n_runs, rng=rng, verbose=True, n_jobs=n_jobs)

Now that we have the codebooks, let's run the BOWav classifier code.
We want to use leave-one-subject-out cross-validation to classify the labels on the held-out test set.

First, define BOWav to create the bag-of-words representations of the features learned in the codebooks.

In [None]:
from BOWaves.sikmeans.sikmeans_core import _assignment_step

def bag_of_waves(codebooks):
    """
    Creates a bag-of-words representation of the input data using the codebooks.

    Parameters
    ----------
    codebooks : list of dict
        One dictionary per codebook, each containing the centroids, labels,
        shifts, distances, and raw ICs of that codebook. Because the raw ICs
        are stored in the codebooks, nothing else needs to be passed in.

    Returns
    -------
    X : matrix of shape (n_ics, n_features)
        The bag-of-words representation of the input data.
    """
    # Not implemented yet: fail loudly rather than silently returning None.
    raise NotImplementedError('bag_of_waves is not implemented yet.')


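To make the intended output concrete, here is a rough sketch of one way a bag-of-words feature matrix could be built. It is not the BOWaves implementation and does not use `_assignment_step`, whose signature is not shown here. It assumes each IC is a 1-D array and each codebook's `'centroids'` is an array of shape `(n_clusters, window_len)`. It slides a window over each IC, assigns the window to the centroid with the highest cosine similarity, and counts assignments. The names `bag_of_waves_sketch`, `window_len`, and `stride` are hypothetical.

```python
import numpy as np

def bag_of_waves_sketch(ics, codebooks, window_len, stride):
    """Normalised histogram of nearest-centroid counts per IC, over all codebooks."""
    # Stack the centroids of every codebook: one feature per centroid.
    centroids = np.vstack([np.asarray(cb['centroids'], dtype=float) for cb in codebooks])
    # Scale centroids to unit length so dot products are cosine similarities.
    norms = np.linalg.norm(centroids, axis=1, keepdims=True)
    centroids = centroids / np.maximum(norms, 1e-12)

    X = np.zeros((len(ics), centroids.shape[0]))
    for i, ic in enumerate(ics):
        ic = np.asarray(ic, dtype=float)
        for start in range(0, ic.size - window_len + 1, stride):
            window = ic[start:start + window_len]
            window_norm = np.linalg.norm(window)
            if window_norm == 0:
                continue  # an all-zero window has no defined cosine similarity
            similarities = centroids @ (window / window_norm)
            X[i, np.argmax(similarities)] += 1
        total = X[i].sum()
        if total > 0:
            X[i] /= total  # turn counts into relative frequencies
    return X

# Toy codebooks with one centroid each (hypothetical data).
toy_codebooks = [{'centroids': np.array([[1.0, 1.0, 1.0, 1.0]])},
                 {'centroids': np.array([[1.0, -1.0, 1.0, -1.0]])}]
toy_ics = [np.ones(8), np.array([1.0, -1.0] * 4)]
X_toy = bag_of_waves_sketch(toy_ics, toy_codebooks, window_len=4, stride=4)
```

In this toy case the constant IC matches only the first centroid and the alternating IC matches only the second, so each row of `X_toy` puts all of its weight on one feature.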

In [None]:
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
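
A sketch of how these pieces might fit together for leave-one-subject-out cross-validation is given below. Note that scikit-learn's `LeaveOneOut` holds out one sample at a time, whereas `LeaveOneGroupOut` holds out every sample that shares a group label, which is closer to leaving out one subject. The sketch uses random stand-in features rather than real bag-of-words features, and `subject_ids` is a hypothetical array: the loading loop earlier in this notebook does not record which subject each IC came from, so that would need to be added. The `StandardScaler` step is likewise only an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-ins for the bag-of-words features and labels (hypothetical data).
rng = np.random.RandomState(0)
n_subjects, ics_per_subject, n_features = 4, 10, 8
X = rng.rand(n_subjects * ics_per_subject, n_features)
y = np.tile(np.arange(2), n_subjects * ics_per_subject // 2)     # alternating 0/1 labels
subject_ids = np.repeat(np.arange(n_subjects), ics_per_subject)  # one id per IC

clf = Pipeline([
    ('scale', StandardScaler()),
    ('logreg', LogisticRegression(max_iter=1000)),
])

# Each fold trains on three subjects and evaluates on the remaining one.
scores = cross_val_score(clf, X, y, groups=subject_ids, cv=LeaveOneGroupOut())
```

`scores` contains one accuracy value per held-out subject. Because the features here are random, the values themselves carry no meaning.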

