# **Transform module approach**

# General Idea
The general idea of a transform module is to prepend a number of layers to the feature extraction module of the network. These layers can be trained jointly with the rest of the network or independently of it, by freezing the weights of the layers outside the transform module. The transform module is trained on each subject's data separately, in order to find a transformation of that subject's data that minimizes the overall network loss.
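
As a minimal NumPy sketch of this idea (not the `EMG_gestures` implementation), assume the frozen feature extraction module can be stood in for by a fixed linear-softmax classifier with hypothetical weights `W_fe` and bias `b_fe`. The transform module is then a subject-specific affine map `x -> A x + c`, and only `A` and `c` are updated by gradient descent on the cross-entropy loss:

```python
import numpy as np


def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_transform_module(X, y, W_fe, b_fe, lr=0.1, n_steps=1000):
    """Fit a linear transform module x -> A @ x + c for one subject.

    W_fe (n_features x n_classes) and b_fe (n_classes,) are a hypothetical
    frozen linear-softmax stand-in for the feature extraction module; they
    are read but never updated.
    """
    n, d = X.shape
    k = W_fe.shape[1]
    A = np.eye(d)            # identity start: the module initially changes nothing
    c = np.zeros(d)
    Y = np.eye(k)[y]         # one-hot targets
    for _ in range(n_steps):
        Z = X @ A.T + c                   # transformed inputs, one row per window
        P = softmax(Z @ W_fe + b_fe)      # class probabilities from the frozen layers
        G = (P - Y) / n                   # gradient of mean cross-entropy w.r.t. logits
        dZ = G @ W_fe.T                   # back-propagate through the frozen weights
        A -= lr * (dZ.T @ X)              # update only the transform parameters
        c -= lr * dZ.sum(axis=0)
    return A, c
```

Starting from the identity means an untrained module leaves the subject's data unchanged. With a linear activation the module can only apply an affine map (for example, rescaling or re-mixing channels) before the frozen layers see the data.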

Data from a total of 6 subjects was used as a held-out test set. The data from the remaining 26 subjects was used to train the whole network.
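
The split is made at the level of subjects rather than individual windows. A sketch with a hypothetical helper `split_by_subject`, assuming one subject ID per row of the feature matrix (as in `subject_id_all` below):

```python
import numpy as np


def split_by_subject(subject_ids, test_subjects):
    """Return row indices for training subjects and for held-out test subjects.

    Splitting on subject IDs (not on individual windows) guarantees that no
    window from a test subject is used to train the feature extraction module.
    """
    subject_ids = np.asarray(subject_ids)
    is_test = np.isin(subject_ids, test_subjects)
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)
```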

# Transform module architecture and hyperparameter selection
A 4-fold cross-validation scheme across subjects was used to select the transform module architecture and activation function that provided the best cross-validation test score. In this scheme, data from subjects in the held-out fold were not used to train the feature extraction layers. Model performance for these held-out subjects was evaluated with a 2-fold cross-validation scheme within each subject: one half of each subject's data was used to train the transform module portion of the network, and model performance was evaluated on the other half.
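
The nested scheme can be sketched as an index generator. `subject_cv_splits` is a hypothetical helper; it assumes each held-out subject's rows are split into contiguous chunks, whereas the `EMG_gestures` functions may split by block or series instead:

```python
import numpy as np


def subject_cv_splits(subject_ids, n_outer=4, n_inner=2, seed=0):
    """Yield (fe_train_idx, subject, tm_train_idx, tm_eval_idx) tuples.

    Outer loop: subjects are shuffled and split into n_outer folds; the feature
    extraction module would be trained on rows from subjects outside the fold.
    Inner loop: each held-out subject's rows are split into n_inner contiguous
    chunks; the transform module is trained on all chunks but one and
    evaluated on the remaining chunk.
    """
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    for fold_subjects in np.array_split(subjects, n_outer):
        fe_train_idx = np.flatnonzero(~np.isin(subject_ids, fold_subjects))
        for subject in fold_subjects:
            rows = np.flatnonzero(subject_ids == subject)
            chunks = np.array_split(rows, n_inner)
            for i in range(n_inner):
                tm_train_idx = np.concatenate([chunks[j] for j in range(n_inner) if j != i])
                yield fe_train_idx, subject, tm_train_idx, chunks[i]
```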
*The results of the above approach for the various transform module architectures and hyperparameter settings can be found in `dev_notebooks/[dev]visualize_xsubject_robustness_transform_module`.* Cross-validation scores from this scheme showed that a transform module with a single layer and a linear activation function had the best generalization performance on held-out subjects that were not used to train the feature extraction module of the network.
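
Selecting the winning configuration amounts to averaging held-out scores per model. A sketch with a hypothetical `best_model` helper, assuming a results DataFrame with the `Model`, `Type`, `Shuffled` and `f1_score` columns produced by the cross-validation cell below, and assuming the `Val_Test` rows hold the held-out scores:

```python
import pandas as pd


def best_model(results_df: pd.DataFrame, score: str = "f1_score"):
    """Return the model ID with the highest mean held-out score on unshuffled labels."""
    held_out = results_df[(results_df["Type"] == "Val_Test") & (~results_df["Shuffled"])]
    return held_out.groupby("Model")[score].mean().idxmax()
```

Restricting to unshuffled rows matters: the permuted-label runs are a control and should not influence which architecture is chosen.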


# Training and testing model
A transform module consisting of a single layer with a linear activation function is prepended to the feature extraction module of the neural network. A separate transform module is trained for each training subject, while the feature extraction layers are trained jointly across all training subjects. After this training stage, the weights of the layers outside the transform module are frozen.
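
During joint training each window has to be routed through its own subject's transform before reaching the shared layers. A NumPy sketch using a hypothetical `SubjectTransforms` container with identity initialization:

```python
import numpy as np


class SubjectTransforms:
    """Hypothetical container holding one affine transform module per subject."""

    def __init__(self, subject_ids, n_features):
        subjects = np.unique(subject_ids)
        # identity initialization: every module starts as a no-op
        self.A = {int(s): np.eye(n_features) for s in subjects}
        self.c = {int(s): np.zeros(n_features) for s in subjects}

    def __call__(self, X, subject_ids):
        # apply each subject's own transform to that subject's rows only
        out = np.empty_like(X, dtype=float)
        for s in np.unique(subject_ids):
            rows = subject_ids == s
            out[rows] = X[rows] @ self.A[int(s)].T + self.c[int(s)]
        return out
```

In Keras, freezing the shared layers for the second stage is typically done by setting `layer.trainable = False` on those layers and recompiling the model.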

Model test performance was obtained by evaluating the model on a set of 6 held-out subjects that were not used to train the feature extraction module. For each test subject, the transform module layers were trained on half of the data and model performance was evaluated on the other half. Cross-validated test performance was obtained with a 2-fold cross-validation scheme, so each half served once for training and once for evaluation.
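
The per-subject 2-fold evaluation can be sketched as follows. `two_fold_subject_score`, `fit` and `score` are hypothetical stand-ins: `fit` would train only the transform module (feature extraction frozen) and `score` would compute a metric such as F1 or accuracy. The halves here are contiguous, which is an assumption:

```python
import numpy as np


def two_fold_subject_score(X, y, fit, score):
    """Average held-out score over two folds for one test subject.

    fit(X_train, y_train) returns a fitted predictor; score(predictor, X, y)
    returns a number. Each half is used once for fitting and once for scoring.
    """
    halves = np.array_split(np.arange(len(y)), 2)
    scores = []
    for train_half, eval_half in (halves, halves[::-1]):
        model = fit(X[train_half], y[train_half])
        scores.append(score(model, X[eval_half], y[eval_half]))
    return float(np.mean(scores))
```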


In [None]:
#Run cell to mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [2]:
# install package to have access to custom functions
%pip install /content/drive/Othercomputers/'My MacBook Pro'/EMG_gestures/ --use-feature=in-tree-build

Processing ./drive/Othercomputers/My MacBook Pro/EMG_gestures
Building wheels for collected packages: EMG-gestures
  Building wheel for EMG-gestures (setup.py) ... done
  Created wheel for EMG-gestures: filename=EMG_gestures-0.1.0-py3-none-any.whl size=31272 sha256=9fc7704d0108b73ce5837e43368cb42bcc09e9a318321891916fbeb7b27fb77b
  Stored in directory: /tmp/pip-ephem-wheel-cache-td8y6jv8/wheels/74/96/87/ceb916fceabb875209ae993e697bf574966ab592f4167a4958
Successfully built EMG-gestures
Installing collected packages: EMG-gestures
Successfully installed EMG-gestures-0.1.0


In [12]:
#import necessary packages

#our workhorses
import numpy as np
import pandas as pd
import scipy

#to visualize
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
#style params for figures
sns.set(font_scale = 2)
plt.style.use('seaborn-v0_8-white') #this style was named 'seaborn-white' before Matplotlib 3.6
plt.rc("axes", labelweight="bold")
from IPython.display import display, HTML

#to load files
import os
import sys
import h5py
import pickle
import keras

#import custom functions (the wildcard import also provides get_subject_data_for_classification, used below)
from EMG_gestures.utils import *
from EMG_gestures.analysis import nn_xsubject_transform_module_train_frac_subjects,\
 nn_xsubject_transform_module_train_all_subjects,\
 nn_xsubject_transform_module_test_subject_eval


In [None]:
#define where the data files are located
data_folder = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/'

nsubjects = 36

#randomly-selected subjects to use as held-out test data
test_subjects = [10, 12, 20, 14, 23, 34]

# User-defined parameters
lo_freq = 20 #lower bound of bandpass filter
hi_freq = 450 #upper bound of bandpass filter

win_size = 100 #define window size over which to compute time-domain features
step = win_size #keeping this parameter in case we want to re-run later with some overlap
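
The features returned by `get_subject_data_for_classification` are not shown in this notebook. As an illustration of how `win_size` and `step` interact, the hypothetical `window_features` below computes two common time-domain EMG features, mean absolute value and root mean square, per window and channel (the actual feature set may differ, and bandpass filtering is omitted):

```python
import numpy as np


def window_features(signal, win_size=100, step=100):
    """Compute mean absolute value and RMS per window and channel.

    signal has shape (n_samples, n_channels). With step == win_size the
    windows do not overlap; a smaller step produces overlapping windows.
    Assumes n_samples >= win_size.
    """
    starts = np.arange(0, signal.shape[0] - win_size + 1, step)
    windows = np.stack([signal[s:s + win_size] for s in starts])  # (n_windows, win_size, n_channels)
    mav = np.abs(windows).mean(axis=1)          # mean absolute value
    rms = np.sqrt((windows ** 2).mean(axis=1))  # root mean square
    return np.hstack([mav, rms]), starts
```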



In [None]:
#initialize empty arrays to accumulate data across subjects
feature_matrix_all = np.empty((0,0))
target_labels_all = np.empty((0,))
window_tstamps_all = np.empty((0,))
block_labels_all  = np.empty((0,))
series_labels_all  = np.empty((0,))
subject_id_all = np.empty((0,))
block_count = 0

for subject_id in range(1,nsubjects+1):
    if subject_id not in test_subjects:
        subject_folder = os.path.join(data_folder,'%02d'%(subject_id))
        print('=======================')
        print(subject_folder)

        # Process data and get features 
        #get features across segments and corresponding info
        feature_matrix, target_labels, window_tstamps, \
        block_labels, series_labels = get_subject_data_for_classification(subject_folder, lo_freq, hi_freq, \
                                                                        win_size, step)

        #prevent repeated block labels across subjects: shift this subject's labels
        #past every label already used, then advance the counter past the new maximum
        block_labels = block_labels+block_count
        block_count = np.max(block_labels)+1


        # concatenate lists
        feature_matrix_all = np.vstack((feature_matrix_all,feature_matrix)) if feature_matrix_all.size else feature_matrix
        target_labels_all = np.hstack((target_labels_all,target_labels))
        window_tstamps_all = np.hstack((window_tstamps_all,window_tstamps))
        block_labels_all = np.hstack((block_labels_all,block_labels))
        series_labels_all = np.hstack((series_labels_all,series_labels))
        subject_id_all = np.hstack((subject_id_all,np.ones((block_labels.size))*subject_id))
        

/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/01
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/02
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/03
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/04
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/05
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/06
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/07
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/08
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/09
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/11
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/13
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/15
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/16
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/17
/content/drive/Other
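
A self-contained sketch of the block-label bookkeeping, written with a hypothetical `offset_block_labels` helper so that labels stay unique across subjects whether each subject's block labels start at 0 or at 1:

```python
import numpy as np


def offset_block_labels(per_subject_blocks):
    """Shift each subject's block labels so that no label repeats across subjects.

    Each subject's smallest label is moved to one above the largest label
    assigned so far; relative spacing within a subject is preserved.
    """
    out, next_label = [], 0
    for blocks in per_subject_blocks:
        blocks = np.asarray(blocks)
        shifted = blocks - blocks.min() + next_label
        out.append(shifted)
        next_label = shifted.max() + 1
    return out
```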

In [14]:

#define hyperparameters for each candidate model
#tm_* = transform module, fe_* = feature extraction module
model_dict = {0: {'tm_layers': 0, 'tm_activation': '', 'fe_layers': 1, 'fe_activation': 'tanh'},
              1: {'tm_layers': 1, 'tm_activation': 'linear', 'fe_layers': 1, 'fe_activation': 'tanh'},
              2: {'tm_layers': 1, 'tm_activation': 'tanh', 'fe_layers': 1, 'fe_activation': 'tanh'},
              3: {'tm_layers': 1, 'tm_activation': 'relu', 'fe_layers': 1, 'fe_activation': 'tanh'},
              4: {'tm_layers': 2, 'tm_activation': 'linear', 'fe_layers': 1, 'fe_activation': 'tanh'},
              5: {'tm_layers': 2, 'tm_activation': 'tanh', 'fe_layers': 1, 'fe_activation': 'tanh'},
              6: {'tm_layers': 2, 'tm_activation': 'relu', 'fe_layers': 1, 'fe_activation': 'tanh'},
              }



In [None]:
results_folder = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/results_data/xsubject_transform_module/simple_NN/'

#network training args 
verbose = 0
epochs = 200
batch_size = 2
es_patience = 5

#validation scheme args
n_train_splits = 4
n_val_splits = 2
nreps = 10

#excluded labels
exclude = [0,7]
#performance metrics
score_list = ['f1','accuracy']



for model_id in range(4,6+1):

    results_model_df = []

    for rep in range(nreps):
        np.random.seed(rep)#to replicate results

        print('Model %i || Rep %02d'%(model_id, rep+1))
        print('----True Data----')
        rep_results_df = nn_xsubject_transform_module_train_frac_subjects(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                        series_labels_all, model_dict[model_id], exclude, score_list,\
                                                        n_train_splits = n_train_splits,n_val_splits = n_val_splits,\
                                                        verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                        es_patience = es_patience, permute = False)
        #add details and concatenate dataframe
        rep_results_df['Shuffled'] = False
        rep_results_df['Rep'] =  rep+1
        rep_results_df['Model'] = model_id
        results_model_df.append(rep_results_df)

        print('Model %i || Rep %02d'%(model_id, rep+1))
        print('----Permuted Data----')
        rep_results_df = nn_xsubject_transform_module_train_frac_subjects(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                        series_labels_all, model_dict[model_id], exclude, score_list,\
                                                        n_train_splits = n_train_splits,n_val_splits = n_val_splits,\
                                                        verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                        es_patience = es_patience, permute = True)
        # add details and concatenate dataframe
        rep_results_df['Shuffled'] = True
        rep_results_df['Rep'] =  rep+1
        rep_results_df['Model'] = model_id
        results_model_df.append(rep_results_df)

    results_model_df = pd.concat(results_model_df,axis = 0)
    #save results to file
    results_fn = 'model_%02d_results.h5'%(model_id)
    results_model_df.to_hdf(os.path.join(results_folder,results_fn), key='results_df', mode='w')
print('***Finished!***')

In [None]:

results_model_df.groupby(['Shuffled','Type']).mean()

| Shuffled | Type | Subject | Fold | Epochs | Batch_Size | Train_Loss | Val_Loss | Epochs_Trained | f1_score | accuracy_score | Rep | Model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| False | Train | 6.0 | 2.555556 | 200.0 | 2.0 | 0.176355 | 0.210548 | 34.296296 | 0.959739 | 0.959872 | 1.0 | 0.0 |
| False | Val_Test | 6.0 | 2.333333 | 200.0 | 2.0 | 1.996984 | 1.965846 | 6.0 | 0.566185 | 0.594131 | 1.0 | 0.0 |
| False | Val_Train | 6.0 | 2.333333 | 200.0 | 2.0 | 1.996984 | 1.965846 | 6.0 | 0.566185 | 0.594131 | 1.0 | 0.0 |
| True | Train | 6.0 | 2.555556 | 200.0 | 2.0 | 1.390585 | 2.240562 | 7.518519 | 0.350335 | 0.39641 | 1.0 | 0.0 |
| True | Val_Test | 6.0 | 2.333333 | 200.0 | 2.0 | 2.203371 | 2.539136 | 6.0 | 0.171627 | 0.199007 | 1.0 | 0.0 |
| True | Val_Train | 6.0 | 2.333333 | 200.0 | 2.0 | 2.203371 | 2.539136 | 6.0 | 0.171627 | 0.199007 | 1.0 | 0.0 |
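
The `Shuffled=True` rows come from runs with `permute=True` and act as a chance-level control. How `EMG_gestures` permutes the data internally is not shown here; a sketch of one common choice, shuffling labels within each subject (hypothetical `permute_labels_within_subject`), is:

```python
import numpy as np


def permute_labels_within_subject(labels, subject_ids, seed=0):
    """Shuffle labels separately within each subject.

    Per-subject label counts are preserved, but the link between features and
    labels is broken, so a model trained on these labels should score near chance.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()  # leave the caller's array untouched
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        rows = np.flatnonzero(subject_ids == s)
        labels[rows] = rng.permutation(labels[rows])
    return labels
```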


In [None]:
results_folder = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/results_data/xsubject_transform_module/simple_NN/'
model_dir = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/model_data/xsubject_transform_module/simple_NN/'
figure_dir = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/figures/training_history/xsubject_transform_module/simple_NN/'

#network training args 
verbose = 0
epochs = 200
batch_size = 2
es_patience = 5

nreps = 10

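#train on all series of the training subjects (no within-subject test split here)
#alternative split: train on the first series and test on the second
# train_idxs = np.where(series_labels_all==0)[0]
# test_idxs = np.where(series_labels_all==1)[0]

train_idxs = np.where(series_labels_all>=0)[0]
test_idxs = np.array([], dtype=int) #empty integer index array: no test windows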
exclude = [0,7]

score_list = ['f1','accuracy']#performance metrics

model_id = 1
results_model_df = []
np.random.seed(1)
for rep in range(nreps):

    print('Model %i || Rep %02d'%(model_id, rep+1))
    print('----True Data----')
    figure_folder = os.path.join(figure_dir,'rep_%i'%(rep+1))
    if not os.path.isdir(figure_folder):
        os.makedirs(figure_folder)
    model_folder = os.path.join(model_dir,'rep_%i'%(rep+1))
    if not os.path.isdir(model_folder):
        os.makedirs(model_folder)

    results_df, scaler = nn_xsubject_transform_module_train_all_subjects(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                            train_idxs, test_idxs,  model_dict[model_id], exclude, score_list,\
                                                            figure_folder = figure_folder, model_folder = model_folder,\
                                                            verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                            es_patience = es_patience, permute = False)
    #fill in details and append to list
    results_df['Shuffled'] = False
    results_df['Rep'] = rep+1
    results_model_df.append(results_df)

    print('Model %i || Rep %02d'%(model_id, rep+1))
    print('----Permuted Data----')
    results_df, scaler = nn_xsubject_transform_module_train_all_subjects(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                            train_idxs, test_idxs, model_dict[model_id], exclude, score_list,\
                                                            figure_folder = None, model_folder = None,\
                                                            verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                            es_patience = es_patience,permute = True)
    #fill in details and append to list
    results_df['Shuffled'] = True
    results_df['Rep'] = rep+1
    results_model_df.append(results_df)
results_model_df = pd.concat(results_model_df,axis = 0)

#save results to file
results_fn = 'train_model_transform_module_all_training_data_results.h5'
results_model_df.to_hdf(os.path.join(results_folder,results_fn), key='results_df', mode='w')

#save scaler
scaler_fn = 'trained_scaler_all_training_data.pkl'
with open(os.path.join(model_dir,scaler_fn), "wb") as output_file:
    pickle.dump(scaler, output_file)
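
The fitted scaler is pickled here and later reused, unchanged, for the test subjects. Its exact type is not shown in this notebook; assuming it behaves like scikit-learn's `StandardScaler`, the essential behavior is sketched by the hypothetical `TrainOnlyScaler` below, which stores statistics from the training data and applies them to any later data:

```python
import numpy as np


class TrainOnlyScaler:
    """Z-score features using statistics computed from training data only."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # avoid division by zero for constant features
        return self

    def transform(self, X):
        # the same training statistics are applied to train and test data alike
        return (X - self.mean_) / self.scale_
```

Fitting only on the training subjects keeps any information about the held-out subjects out of the preprocessing step.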

In [4]:
# LOAD TEST SUBJECT DATA
#define where the data files are located
data_folder = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/'

#randomly-selected subjects to use as hold-out test data 
test_subjects = [10, 12, 20, 14, 23, 34]

# User-defined parameters
lo_freq = 20 #lower bound of bandpass filter
hi_freq = 450 #upper bound of bandpass filter

win_size = 100 #define window size over which to compute time-domain features
step = win_size #keeping this parameter in case we want to re-run later with some overlap

#initialize empty arrays to accumulate data across subjects
feature_matrix_all = np.empty((0,0))
target_labels_all = np.empty((0,))
window_tstamps_all = np.empty((0,))
block_labels_all  = np.empty((0,))
series_labels_all  = np.empty((0,))
subject_id_all = np.empty((0,))
block_count = 0

for subject_id in test_subjects:
    subject_folder = os.path.join(data_folder,'%02d'%(subject_id))
    print('=======================')
    print(subject_folder)

    # Process data and get features 
    #get features across segments and corresponding info
    feature_matrix, target_labels, window_tstamps, \
    block_labels, series_labels = get_subject_data_for_classification(subject_folder, lo_freq, hi_freq, \
                                                                    win_size, step)

    #prevent repeated block labels across subjects: shift this subject's labels
    #past every label already used, then advance the counter past the new maximum
    block_labels = block_labels+block_count
    block_count = np.max(block_labels)+1

    # concatenate lists
    feature_matrix_all = np.vstack((feature_matrix_all,feature_matrix)) if feature_matrix_all.size else feature_matrix
    target_labels_all = np.hstack((target_labels_all,target_labels))
    window_tstamps_all = np.hstack((window_tstamps_all,window_tstamps))
    block_labels_all = np.hstack((block_labels_all,block_labels))
    series_labels_all = np.hstack((series_labels_all,series_labels))
    subject_id_all = np.hstack((subject_id_all,np.ones((block_labels.size))*subject_id))

/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/10
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/12
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/20
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/14
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/23
/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/EMG_data/34


In [31]:
# EVALUATE TEST SUBJECTS

model_dir = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/model_data/xsubject_transform_module/simple_NN/'
results_folder = '/content/drive/Othercomputers/My MacBook Pro/EMG_gestures/results_data/xsubject_transform_module/simple_NN/'
#network training and evaluation args 
verbose = 0
epochs = 200
batch_size = 2
es_patience = 5
n_splits = 2
# architecture hyperparameters of the model; used to build an untrained model with randomized weights
model_dict = {'tm_layers':1,'tm_activation':'linear',
              'fe_layers':1, 'fe_activation':'tanh'}

#excluded labels
exclude = [0,7]
#performance metrics
score_list = ['f1','accuracy']
nreps = 10

# load data scaler 
scaler_fn = 'trained_scaler_all_training_data.pkl'
with open(os.path.join(model_dir,scaler_fn), "rb") as input_file:
    scaler = pickle.load(input_file)
results_df = []
np.random.seed(1) #set seed for replicability
for rep in range(nreps):
    
    print('Rep %02d'%(rep+1))
    
    model_folder = os.path.join(model_dir,'rep_%i'%(rep+1))
    #load trained model
    model_fn = os.path.join(model_folder, 'trained_model_all_train_data_permuted_%s.h5'%(str(False)))
    trained_model = keras.models.load_model(model_fn)
    
    # evaluate with true data
    print('----True Data----')
    
    rep_results_df = nn_xsubject_transform_module_test_subject_eval(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                        series_labels_all, trained_model, scaler, model_dict, exclude, score_list,\
                                                        n_splits = n_splits, verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                        es_patience = es_patience)
    #fill in details and append to list
    rep_results_df['Shuffled'] = False
    rep_results_df['Rep'] = rep+1
    results_df.append(rep_results_df)
    # evaluate with permuted data
    print('----Permuted Data----')
    #reuse the model trained on true data; permute=True makes this evaluation use permuted data
    rep_results_df = nn_xsubject_transform_module_test_subject_eval(feature_matrix_all, target_labels_all, subject_id_all, block_labels_all,\
                                                        series_labels_all, trained_model, scaler, model_dict, exclude, score_list,\
                                                        n_splits = n_splits, verbose = verbose, epochs = epochs, batch_size = batch_size,\
                                                        es_patience = es_patience, permute = True)
    #fill in details and append to list
    rep_results_df['Shuffled'] = True
    rep_results_df['Rep'] = rep+1
    results_df.append(rep_results_df)

results_df = pd.concat(results_df,axis = 0).reset_index(drop=True)
#save results to file
results_fn = 'train_model_transform_module_all_testing_data_results.h5'
results_df.to_hdf(os.path.join(results_folder,results_fn), key='results_df', mode='w')  

Rep 01
----True Data----
Test: Subject 01 out of 06
Test: Subject 02 out of 06
Test: Subject 03 out of 06
Test: Subject 04 out of 06
Test: Subject 05 out of 06
Test: Subject 06 out of 06
----Permuted Data----
Test: Subject 01 out of 06
Test: Subject 02 out of 06
Test: Subject 03 out of 06
Test: Subject 04 out of 06
Test: Subject 05 out of 06
Test: Subject 06 out of 06
Rep 02
----True Data----
Test: Subject 01 out of 06
Test: Subject 02 out of 06
Test: Subject 03 out of 06
Test: Subject 04 out of 06
Test: Subject 05 out of 06
Test: Subject 06 out of 06
----Permuted Data----
Test: Subject 01 out of 06
Test: Subject 02 out of 06
Test: Subject 03 out of 06
Test: Subject 04 out of 06
Test: Subject 05 out of 06
Test: Subject 06 out of 06
Rep 03
----True Data----
Test: Subject 01 out of 06
Test: Subject 02 out of 06
Test: Subject 03 out of 06
Test: Subject 04 out of 06
Test: Subject 05 out of 06
Test: Subject 06 out of 06
----Permuted Data----
Test: Subject 01 out of 06
Test: Subject 02 out o

In [32]:
results_df.groupby(['Shuffled','Type']).mean()

| Shuffled | Type | Subject | Fold | Epochs | Batch_Size | Train_Loss | Val_Loss | Epochs_Trained | f1_score | accuracy_score | Rep |
|---|---|---|---|---|---|---|---|---|---|---|---|
| False | Test | 18.833333 | 1.5 | 200.0 | 2.0 | 0.374142 | 0.370099 | 37.791667 | 0.895771 | 0.902171 | 5.5 |
| False | Train | 18.833333 | 1.5 | 200.0 | 2.0 | 0.374142 | 0.370099 | 37.791667 | 0.948425 | 0.949725 | 5.5 |
| True | Test | 18.833333 | 1.5 | 200.0 | 2.0 | 2.580333 | 4.617162 | 10.141667 | 0.112291 | 0.124739 | 5.5 |
| True | Train | 18.833333 | 1.5 | 200.0 | 2.0 | 2.580333 | 4.617162 | 10.141667 | 0.400852 | 0.434168 | 5.5 |
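
The table above averages over subjects, folds and repetitions. To also report variability, one option is to average within each repetition first and then take the mean and standard deviation across repetitions. A sketch with a hypothetical `summarize_test_scores` helper, assuming the column names shown above:

```python
import pandas as pd


def summarize_test_scores(results_df: pd.DataFrame, score: str = "f1_score") -> pd.DataFrame:
    """Mean and standard deviation of a test score across repetitions.

    Scores are first averaged over subjects and folds within each repetition,
    so the spread reflects repetition-to-repetition variability.
    """
    test = results_df[results_df["Type"] == "Test"]
    per_rep = test.groupby(["Shuffled", "Rep"])[score].mean()
    return per_rep.groupby(level="Shuffled").agg(["mean", "std"])
```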
