### Helper Classes

First, we import our helper modules. 'prepare_EMG' prepares the EMG data for phoneme recognition, 'prepare_outputs' prepares our target labels and aligns them with the EMG data, 'prepare_data' reads recordings from CSV into a dataframe, and 'vis' visualizes EMG data in both the time and frequency domains.

In [32]:
%load_ext autoreload
%autoreload 2

import prepare_EMG, prepare_outputs, prepare_data, vis
# autodetector = Output_Prep.detector
EMG_Prep = prepare_EMG.EMG_preparer(window_size=60.0)
# Output_Prep = prepare_outputs.output_preparer(subvocal_detector = autodetector, window_size=30.0)
Output_Prep = prepare_outputs.output_preparer(window_size=60.0,do_grid_search=False)

Data_Prep = prepare_data.data_preparer()



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


  y = column_or_1d(y, warn=True)


Training Score: 0.896547109208


### Labeling the Data

First, we visualize a few EMG voltage graphs to find sections that most likely contain no subvocalization, and then regions that almost certainly do. These two classes of EMG readouts will train a detector that automatically labels EMG windows with phonemes. The detector will most likely be an SVC, inside 'prepare_outputs'. It processes the EMG windows in order, and whenever it finds one that most likely contains subvocalization, it assigns the next phoneme as that window's label.
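The in-order labeling rule can be sketched in a few lines. This is a minimal sketch, assuming the detector has already produced one boolean decision per window; the function name `label_windows`, the `None` placeholder for silent windows, and the behavior once phonemes run out are all hypothetical choices, and the real logic inside 'prepare_outputs' may differ.

```python
from typing import List, Optional, Sequence


def label_windows(is_subvocal: Sequence[bool],
                  phonemes: Sequence[str]) -> List[Optional[str]]:
    """Assign phonemes to EMG windows in order (hypothetical helper).

    is_subvocal: one detector decision per window, in time order.
    phonemes:    the transcript's phonemes, in spoken order.
    Silent windows get None; each subvocal window consumes the next
    unused phoneme. Once the phonemes run out, later windows get None.
    """
    labels: List[Optional[str]] = []
    next_idx = 0
    for active in is_subvocal:
        if active and next_idx < len(phonemes):
            labels.append(phonemes[next_idx])
            next_idx += 1
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    decisions = [False, True, True, False, True, False]
    print(label_windows(decisions, ["HH", "AH", "L", "OW"]))
```

Because windows are consumed strictly in order, a single false positive from the detector shifts every later label by one phoneme, which is why the detector's accuracy matters so much here.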

In [33]:
data_1 = Data_Prep.load('Sat Mar  4 00:44:23 2017')
data_2 = Data_Prep.load('Sat Mar  4 00:45:02 2017')
data_3 = Data_Prep.load('Sat Mar  4 00:45:47 2017')
data_4 = Data_Prep.load('Sat Mar  4 00:47:01 2017')
data_5 = Data_Prep.load('Sat Mar  4 00:47:36 2017')
data_6 = Data_Prep.load('Sat Mar  4 00:48:09 2017')
data_7 = Data_Prep.load('Sat Mar  4 00:49:05 2017')
data_8 = Data_Prep.load('Sat Mar  4 00:49:41 2017')
data_9 = Data_Prep.load('Sat Mar  4 00:50:22 2017')
data_10 = Data_Prep.load('Sat Mar  4 00:51:17 2017')
data_11 = Data_Prep.load('Sat Mar  4 00:52:02 2017')
data_12 = Data_Prep.load('Sat Mar  4 00:52:38 2017')
data_13 = Data_Prep.load('Sat Mar  4 00:53:24 2017')
data_14 = Data_Prep.load('Sat Mar  4 00:53:51 2017')
data_15 = Data_Prep.load('Sat Mar  4 00:54:25 2017')
data_16 = Data_Prep.load('Sat Mar  4 00:54:57 2017')
data_17 = Data_Prep.load('Sat Mar  4 00:56:01 2017')
data_18 = Data_Prep.load('Sat Mar  4 00:56:35 2017')
data_19 = Data_Prep.load('Sat Mar  4 00:57:21 2017')
data_20 = Data_Prep.load('Sat Mar  4 00:57:49 2017')
data_21 = Data_Prep.load('Sat Mar  4 00:58:59 2017')
data_22 = Data_Prep.load('Sat Mar  4 00:59:53 2017')

data_list = [data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9, data_10, data_11, data_12, data_13, data_14, data_15, data_16, data_17, data_18, data_19, data_20, data_21, data_22]

In [34]:
import pandas

num_files = len(data_list)
labels_frame = pandas.read_csv('austen_subvocal.csv')

# Collect each recording's aligned pieces in lists and concatenate once at the end;
# DataFrame.append is deprecated (and removed in pandas 2.0).
aligned_frames, label_frames = [], []
for file_idx, data in enumerate(data_list):
    # Transcript for this recording -> AF label table
    trans_labels_iter = Output_Prep.transform(labels_frame.iloc[file_idx, 0])
    # Window and featurize the raw EMG
    data_proc_iter = EMG_Prep.process(data)
    # Align the EMG windows with their labels
    aligned_data_iter, trans_labels_iter = Output_Prep.zip(data_proc_iter, trans_labels_iter, repeat=3)
    aligned_frames.append(aligned_data_iter)
    label_frames.append(trans_labels_iter)

aligned_data = pandas.concat(aligned_frames)
trans_labels = pandas.concat(label_frames)

print('Aligned Data shape:',aligned_data.shape,'Trans labels shape:',trans_labels.shape)

Aligned Data shape: (5300, 60) Trans labels shape: (5300, 4)


### AF Extractor Models

These models are tuned to extract articulatory features (AFs), namely manner, place, height, and vowel, from the EMG data. The predicted AFs are then passed on to an MLPC (scikit-learn's MLPClassifier), which identifies the most likely phoneme.
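MLPClassifier only accepts numeric inputs, so the categorical AF predictions must be encoded before the phoneme model can use them. Below is a minimal pure-Python sketch of one-hot encoding, assuming each AF classifier emits one category string per window; the helper `af_one_hot` and the category names in the example are hypothetical and are not the notebook's actual label sets. In practice, scikit-learn's `OneHotEncoder` does the same job.

```python
from typing import Dict, List, Sequence


def af_one_hot(af_predictions: Dict[str, Sequence[str]],
               vocab: Dict[str, Sequence[str]]) -> List[List[int]]:
    """Turn per-window AF predictions into numeric feature rows.

    af_predictions maps each AF name (e.g. 'manner') to one predicted
    category per window; vocab lists every possible category per AF.
    Each output row concatenates one one-hot block per AF, in the
    order of vocab's keys.
    """
    names = list(vocab)
    if not names:
        return []
    n_windows = len(af_predictions[names[0]])
    rows: List[List[int]] = []
    for w in range(n_windows):
        row: List[int] = []
        for name in names:
            pred = af_predictions[name][w]
            if pred not in vocab[name]:
                raise ValueError(f"unknown {name} category: {pred!r}")
            row.extend(1 if cat == pred else 0 for cat in vocab[name])
        rows.append(row)
    return rows


if __name__ == "__main__":
    vocab = {"manner": ["stop", "fricative", "vowel"], "height": ["high", "mid", "low"]}
    preds = {"manner": ["stop", "vowel"], "height": ["low", "high"]}
    print(af_one_hot(preds, vocab))
```

Raising on an unseen category, rather than silently emitting an all-zero block, makes label-set mismatches between the AF models and the phoneme model visible early.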

In [35]:
# Prepare lists of parameters for our GridSearch
# First, our layer sizes: two hidden layers of equal width, from 30 to 150 units in steps of 30
layer_sizes = [tuple([width] * depth)
               for depth in range(2, 3)
               for width in range(30, 180, 30)]
print('number layer sizes:',len(layer_sizes),'here be layer sizes',layer_sizes)

# Next, our alpha values
# alphas = [0.0000001,1,1000]

number layer sizes: 5 here be layer sizes [(30, 30), (60, 60), (90, 90), (120, 120), (150, 150)]


In [90]:
from sklearn.neural_network import MLPClassifier as MLPC
# Model-selection and feature-extraction tools
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

import copy

X_train, X_test, y_train, y_test = train_test_split(aligned_data, trans_labels, test_size=0.15, random_state=12)

combined_features = FeatureUnion([
    ('pca',PCA(random_state=18)),
    ('kbest',SelectKBest(k=1))
])

pipeline = Pipeline([
    ('features', combined_features),
    ('model', MLPC(random_state=12))
])


param_grid = {
    'features__pca__n_components':[0.5,0.66,0.9],
    'model__solver':['adam'],
    'model__hidden_layer_sizes':layer_sizes,
    'model__activation':['relu'],
#     'model__alpha': alphas,
    'model__max_iter':[200]
}


grid_search = GridSearchCV(pipeline, param_grid, n_jobs=-1)

manner_classifier = MLPC(solver='adam',hidden_layer_sizes=(21,21),random_state=3)
manner_classifier.fit(X_train, y_train['manner'])
m_score = manner_classifier.score(X_test, y_test['manner'])

place_classifier = MLPC(solver='adam',hidden_layer_sizes=(30,30),random_state=6)
place_classifier.fit(X_train, y_train['place'])
p_score = place_classifier.score(X_test, y_test['place'])

height_classifier = MLPC(solver='adam',hidden_layer_sizes=(30,30),random_state=9)
height_classifier.fit(X_train, y_train['height'])
h_score = height_classifier.score(X_test, y_test['height'])

vowel_classifier = MLPC(solver='adam',hidden_layer_sizes=(30,30),random_state=12)
vowel_classifier.fit(X_train, y_train['vowel'])
v_score = vowel_classifier.score(X_test, y_test['vowel'])

print('manner score:',m_score,'place score:',p_score,'height score:',h_score,'vowel score:',v_score)
# print(data_1_proc.head(50), trans_labels['manner'].head(50))

manner score: 0.405031446541 place score: 0.277987421384 height score: 0.494339622642 vowel score: 0.593710691824
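These held-out accuracies are easier to interpret next to a trivial baseline. A minimal pure-Python sketch follows; `majority_baseline` is a hypothetical helper that scores a model which always predicts the most frequent training label. Calling it with `y_train['manner']` and `y_test['manner']` from the cell above would give the floor the manner classifier needs to beat. scikit-learn's `DummyClassifier(strategy='most_frequent')` computes the same quantity.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_baseline(y_train: Sequence[Hashable],
                      y_test: Sequence[Hashable]) -> float:
    """Accuracy on y_test of always predicting y_train's most common label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for y in y_test if y == most_common_label)
    return hits / len(y_test)


if __name__ == "__main__":
    print(majority_baseline(["a", "a", "b"], ["a", "b", "a", "a"]))
```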


In [41]:
manner_classifier2 = copy.deepcopy(grid_search)
manner_classifier2.fit(aligned_data, trans_labels['manner'])
# Training accuracy: the refit best estimator is scored on the data it was fit on
m_score2 = manner_classifier2.score(aligned_data, trans_labels['manner'])

print('manner score:',m_score2)



manner score: 0.389433962264
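Because the score above is computed on the fitting data, it says little about generalization. After fitting, `GridSearchCV` already exposes the mean cross-validated score of the best candidate as `best_score_`. To make that idea concrete, here is a minimal pure-Python sketch of unshuffled k-fold splitting; `k_fold_indices` is a hypothetical helper that follows the fold layout of scikit-learn's `KFold(shuffle=False)`. Note that for classifiers, `GridSearchCV` actually defaults to stratified folds, which this sketch does not implement.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold CV.

    The first n_samples % k folds get one extra sample, as in KFold.
    Every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

Averaging a model's score over each `test` fold, after fitting on the matching `train` fold, gives a held-out estimate comparable to the `train_test_split` scores from In [90].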


In [53]:
print(manner_classifier2.best_estimator_.steps)

[('features', FeatureUnion(n_jobs=1,
       transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=0.9, random_state=18,
  svd_solver='auto', tol=0.0, whiten=False)), ('kbest', SelectKBest(k=1, score_func=<function f_classif at 0x7f550ab948c8>))],
       transformer_weights=None)), ('model', MLPClassifier(activation='relu', alpha=1, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(120, 120), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=12, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False))]
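The grid search tries every combination in `param_grid`, so its cost is easy to estimate before running it. The sketch below copies the grid from In [90], with the layer sizes printed by In [35], and enumerates the combinations with `itertools.product`, the same expansion scikit-learn's `ParameterGrid` performs. The helpers `count_candidates` and `count_fits` are hypothetical. The number of CV folds depends on the scikit-learn version: the default `cv` was 3 before 0.22 and 5 from 0.22 onward.

```python
from itertools import product

# Copy of the grid from In [90]; layer sizes as printed by In [35]
param_grid = {
    'features__pca__n_components': [0.5, 0.66, 0.9],
    'model__solver': ['adam'],
    'model__hidden_layer_sizes': [(30, 30), (60, 60), (90, 90), (120, 120), (150, 150)],
    'model__activation': ['relu'],
    'model__max_iter': [200],
}


def count_candidates(grid):
    """Number of parameter combinations GridSearchCV will try."""
    n = 1
    for values in grid.values():
        n *= len(values)
    return n


def count_fits(grid, n_folds, refit=True):
    """One fit per candidate per fold, plus one final refit on all data."""
    return count_candidates(grid) * n_folds + (1 if refit else 0)


# Explicit list of candidate settings, one dict per combination
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]

if __name__ == "__main__":
    print(len(candidates), "candidates")
    print(count_fits(param_grid, n_folds=3), "fits with 3-fold CV")
```

The notebook runs four such searches, one per AF, so the total training cost is four times this count.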


In [59]:
place_classifier2 = copy.deepcopy(grid_search)
place_classifier2.fit(aligned_data, trans_labels['place'])
# Training accuracy: scored on the data the search was fit on
p_score2 = place_classifier2.score(aligned_data, trans_labels['place'])

print('place score:',p_score2)



place score: 0.280943396226


In [62]:
print(place_classifier2.best_estimator_.steps)

[('features', FeatureUnion(n_jobs=1,
       transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=0.5, random_state=18,
  svd_solver='auto', tol=0.0, whiten=False)), ('kbest', SelectKBest(k=1, score_func=<function f_classif at 0x7f550ab948c8>))],
       transformer_weights=None)), ('model', MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(120, 120), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=12, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False))]


In [60]:
height_classifier2 = copy.deepcopy(grid_search)
height_classifier2.fit(aligned_data, trans_labels['height'])
# Training accuracy: scored on the data the search was fit on
h_score2 = height_classifier2.score(aligned_data, trans_labels['height'])

print('height score:',h_score2)



height score: 0.504905660377


In [61]:
vowel_classifier2 = copy.deepcopy(grid_search)
vowel_classifier2.fit(aligned_data, trans_labels['vowel'])
# Training accuracy: scored on the data the search was fit on
v_score2 = vowel_classifier2.score(aligned_data, trans_labels['vowel'])

print('vowel score:',v_score2)



vowel score: 0.610566037736


In [126]:
# Experiment with PCA here

from sklearn.decomposition import PCA


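# n_components=0.03 keeps only enough components to explain more than 3% of the variance.
# Note: the union is fit on all rows before the split, so test rows influence the PCA/SelectKBest fit.
manner_union = FeatureUnion([('pca',PCA(n_components=0.03)),('kbest',SelectKBest(k=1))])
manner_reduced_data = manner_union.fit_transform(aligned_data, trans_labels['manner'])
X_train, X_test, y_train, y_test = train_test_split(manner_reduced_data, trans_labels, test_size=0.15, random_state=12)


# A single hidden layer with one unit, heavily regularized (alpha=1000)
manner_classifier3 = MLPC(solver='adam',alpha=1000,hidden_layer_sizes=(1,),random_state=3,max_iter=300)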
manner_classifier3.fit(X_train, y_train['manner'])
m_score3 = manner_classifier3.score(X_test, y_test['manner'])
print('manner score:',m_score3)

manner score: 0.405031446541
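When `n_components` is a float between 0 and 1, PCA keeps the smallest number of leading components whose cumulative explained variance is strictly greater than that fraction. With 0.03, this is often just the first component, since the first principal component alone typically explains more than 3% of the variance. The grid values 0.5, 0.66 and 0.9 follow the same rule. Below is a minimal pure-Python sketch of the selection step, assuming the per-component ratios are already known (a fitted `PCA` exposes them as `explained_variance_ratio_`); `n_components_for_fraction` is a hypothetical helper, and the final clamp is a safety guard added here.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def n_components_for_fraction(explained_variance_ratio: Sequence[float],
                              fraction: float) -> int:
    """How many leading components a float n_components would keep.

    Counts the smallest k whose cumulative explained variance is
    strictly greater than `fraction`, clamped to the number available.
    """
    cumulative = list(accumulate(explained_variance_ratio))
    k = bisect_right(cumulative, fraction) + 1
    return min(k, len(cumulative))


if __name__ == "__main__":
    ratios = [0.4, 0.25, 0.15, 0.1, 0.1]  # made-up example ratios
    for frac in (0.03, 0.5, 0.66):
        print(frac, "->", n_components_for_fraction(ratios, frac))
```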


In [86]:
place_union = FeatureUnion([('pca',PCA(n_components=0.9)),('kbest',SelectKBest(k=1))])
place_reduced_data = place_union.fit_transform(aligned_data, trans_labels['place'])
place_classifier3 = MLPC(solver='adam',alpha=0.00001,hidden_layer_sizes=(120,120),random_state=6,max_iter=300)
place_classifier3.fit(place_reduced_data, trans_labels['place'])
# Training accuracy: no data is held out in this cell
p_score3 = place_classifier3.score(place_reduced_data, trans_labels['place'])
print('place classifier score:',p_score3)

place classifier score: 0.624716981132


In [87]:
height_union = FeatureUnion([('pca',PCA(n_components=0.9)),('kbest',SelectKBest(k=1))])
height_reduced_data = height_union.fit_transform(aligned_data, trans_labels['height'])
height_classifier3 = MLPC(solver='adam',alpha=0.00001,hidden_layer_sizes=(180,180),random_state=12,max_iter=300)
height_classifier3.fit(height_reduced_data, trans_labels['height'])
# Training accuracy: no data is held out in this cell
h_score3 = height_classifier3.score(height_reduced_data, trans_labels['height'])
print('height score:',h_score3)

height score: 0.907169811321


In [88]:
vowel_union = FeatureUnion([('pca',PCA(n_components=0.9)),('kbest',SelectKBest(k=1))])
vowel_reduced_data = vowel_union.fit_transform(aligned_data, trans_labels['vowel'])
vowel_classifier3 = MLPC(solver='adam',alpha=0.00001,hidden_layer_sizes=(180,180),random_state=12,max_iter=300)
vowel_classifier3.fit(vowel_reduced_data, trans_labels['vowel'])
# Training accuracy: no data is held out in this cell
v_score3 = vowel_classifier3.score(vowel_reduced_data, trans_labels['vowel'])
print('vowel score:',v_score3)

vowel score: 0.948113207547


In [None]:
print(aligned_data.head(),trans_labels['manner'].head())

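# Inputs: EMG features plus the four AF columns. These are the true AF labels, not the
# AF classifiers' predictions, and any string-valued AF columns need numeric encoding for MLPC.
phoneme_inputs = pandas.concat([aligned_data,trans_labels['manner'],trans_labels['place'],trans_labels['height'],trans_labels['vowel']],axis=1,join='outer')
# FIXME: trans_labels.axes[0] is the row index, not a phoneme label column;
# a per-window phoneme target is still needed here.
phoneme_labels = trans_labels.axes[0]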
phoneme_classifier = MLPC(solver='adam',hidden_layer_sizes=(90,90),random_state=6, max_iter=300)
phoneme_classifier.fit(phoneme_inputs, phoneme_labels)