# Summary

&emsp; This notebook contains code to evaluate the performance of a Recurrent Neural Network (RNN) model in classifying upper limb position from electromyography (EMG) data. Performance is assessed within each subject and across subjects. In the 'across subject' case, the model is trained on data from one subject, and data from all other subjects are used as the test data. Overall, gesture classification performance on held-out data is quite high when the training and test data come from the same subject, but drops substantially when the test data come from other subjects.

The following notebooks in this repo contain useful data and analysis pipeline details:
- data_exploration_and_quality_check_demo.ipynb
- single_subject_classification_demo.ipynb

RNN model performance is compared with the performance of a simpler logistic regression model in:
- compare_model_performance_within_and_across_subjects.ipynb

&emsp; The folder containing this notebook is expected to contain a utils.py script (custom functions for data wrangling and analysis) and the EMG_data folder (downloaded from: http://archive.ics.uci.edu/ml/datasets/EMG+data+for+gestures#).
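If you want to check this layout before running anything, a small helper like the one below can list what is missing. `missing_repo_items` is a hypothetical name, not part of utils.py, and it only checks that the two expected items exist, not what they contain.

```python
import os

def missing_repo_items(repo_folder):
    """Return the expected items (hypothetical check) that are absent from repo_folder."""
    expected = {
        'utils.py': os.path.isfile,   # custom data-wrangling/analysis helpers
        'EMG_data': os.path.isdir,    # folder downloaded from the UCI archive
    }
    return [name for name, exists in expected.items()
            if not exists(os.path.join(repo_folder, name))]

# Example (path taken from the cells below):
# missing = missing_repo_items('/content/drive/MyDrive/limb-position-EMG-Repo/')
# if missing:
#     raise FileNotFoundError(f'Missing from repo folder: {missing}')
```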

***NOTE***: Training an RNN model can take a while on a CPU. I recommend running this portion on Google Colab (with a GPU runtime) or a GPU-equipped workstation.

In [1]:
#Run cell to mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [3]:
#import necessary packages

#our workhorses
import numpy as np
import pandas as pd
import scipy

#to visualize
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
#style params for figures
sns.set(font_scale = 2)
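#'seaborn-white' was renamed 'seaborn-v0_8-white' in matplotlib 3.6; fall back to the old name on older versions
try:
    plt.style.use('seaborn-v0_8-white')
except OSError:
    plt.style.use('seaborn-white')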
plt.rc("axes", labelweight="bold")
from IPython.display import display, HTML

#to load files
import os
import sys
import h5py

#append repo folder to search path
sys.path.append('/content/drive/MyDrive/limb-position-EMG-Repo/')
from utils import *


### Within-subject performance

&emsp; The code blocks below train and evaluate an RNN model to classify limb position from the pattern of signals across electrodes. To put the performance in context, it's useful to also measure classifier performance after randomly shuffling the class labels (i.e., erasing the relationship between signal and class), which gives a chance-level baseline.
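To illustrate why the shuffled-label condition works as a baseline, here is a minimal sketch on synthetic data. It is a stand-in under stated assumptions: a nearest-centroid classifier replaces the RNN, the features are random vectors with class-dependent means, and `macro_f1` and `nearest_centroid_f1` are hypothetical helpers (the F1 averaging used inside `RNN_on_labeled_data` may differ from the macro average used here). Shuffling the labels leaves the features untouched but removes their relationship to class, so held-out F1 should fall to roughly chance level (about 1/6 for six classes).

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for the classes present in y_true."""
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))

def nearest_centroid_f1(X_train, y_train, X_test, y_test):
    """Fit one mean vector per class on the training set, then score held-out samples."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    y_pred = classes[np.argmin(dists, axis=1)]
    return macro_f1(y_test, y_pred)

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 6, 40, 16
y = np.repeat(np.arange(1, n_classes + 1), n_per_class)
class_means = rng.normal(0, 3, size=(n_classes, n_features))
X = class_means[y - 1] + rng.normal(size=(y.size, n_features))   # synthetic, class-dependent signal

train = rng.random(y.size) < 0.75          # random 75/25 train/test split
intact_f1 = nearest_centroid_f1(X[train], y[train], X[~train], y[~train])

y_shuffled = rng.permutation(y)            # break the signal/label relationship
shuffled_f1 = nearest_centroid_f1(X[train], y_shuffled[train], X[~train], y_shuffled[~train])
print(f'intact F1 = {intact_f1:.2f}, shuffled F1 = {shuffled_f1:.2f}')
```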

Some details on data preparation and the model:
+ Model schematic: Input Layer -> GRU layer -> Dropout -> Dense Layer -> softmax
+ Values are standardized across samples within each feature dimension.
+ Model performance is assessed on held-out data with stratified k-fold cross-validation, which keeps the class balance consistent across train/test splits.
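The schematic and the standardization step can be sketched as a plain NumPy forward pass. This is a sketch under assumptions, not the code in utils.py: the input and target shapes, (sequences, time windows, features) and (sequences, time windows, classes), are my reading of the shapes printed in the output below (e.g. `(18, 19, 16) (18, 19, 6)`); the GRU equations follow one common convention (the update gate interpolates between the previous state and a candidate state), and the framework actually used may differ in details such as where the reset gate is applied; weights are random rather than trained; and standardization statistics are computed on the training split only, which is an assumption about how leakage is avoided.

```python
import numpy as np

def standardize(train, test):
    """z-score each feature using the mean/std over all training sequences and time steps."""
    mu = train.mean(axis=(0, 1), keepdims=True)
    sd = train.std(axis=(0, 1), keepdims=True) + 1e-8
    return (train - mu) / sd, (test - mu) / sd

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_dense_softmax(x, params, n_hidden, drop_rate=0.0, rng=None):
    """GRU over time -> (inverted) dropout -> Dense -> softmax, giving class probabilities per step."""
    if rng is None:
        rng = np.random.default_rng()
    n_seq, n_steps, _ = x.shape
    h = np.zeros((n_seq, n_hidden))
    outputs = []
    for t in range(n_steps):
        xt = x[:, t, :]
        z = sigmoid(xt @ params['Wz'] + h @ params['Uz'] + params['bz'])   # update gate
        r = sigmoid(xt @ params['Wr'] + h @ params['Ur'] + params['br'])   # reset gate
        h_cand = np.tanh(xt @ params['Wh'] + (r * h) @ params['Uh'] + params['bh'])
        h = z * h + (1.0 - z) * h_cand        # interpolate old state and candidate
        outputs.append(h)
    hs = np.stack(outputs, axis=1)            # (n_seq, n_steps, n_hidden)
    if drop_rate > 0:                         # dropout only during training
        keep = rng.random(hs.shape) >= drop_rate
        hs = hs * keep / (1.0 - drop_rate)
    logits = hs @ params['Wd'] + params['bd'] # (n_seq, n_steps, n_classes)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 16, 32, 6
params = {}
for g in ('z', 'r', 'h'):
    params['W' + g] = rng.normal(0, 0.1, (n_features, n_hidden))
    params['U' + g] = rng.normal(0, 0.1, (n_hidden, n_hidden))
    params['b' + g] = np.zeros(n_hidden)
params['Wd'] = rng.normal(0, 0.1, (n_hidden, n_classes))
params['bd'] = np.zeros(n_classes)

train_x = rng.normal(size=(18, 19, n_features))   # shapes mirror one printed split
test_x = rng.normal(size=(6, 17, n_features))
train_z, test_z = standardize(train_x, test_x)
probs = gru_dense_softmax(test_z, params, n_hidden)   # inference: no dropout
print(probs.shape)   # (6, 17, 6)
```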

&emsp; Model performance is assessed for each subject individually over 10 repetitions of the cross-validation procedure (each with a different set of train/test splits), and the results are written to an HDF5 file.
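Stratified fold assignment can be sketched as below. `stratified_kfold_indices` is a hypothetical helper, not the splitter used in utils.py: within each class the items are shuffled and dealt round-robin to the folds, so every fold keeps roughly the overall class mix. The toy labels assume 24 blocks with 4 per class, which happens to reproduce the 18/6 block split visible in the printed shapes; whether the notebook stratifies over blocks in exactly this way is an assumption.

```python
import numpy as np

def stratified_kfold_indices(labels, n_splits, rng):
    """Yield (train_idx, test_idx) pairs in which every fold keeps roughly the overall class mix."""
    labels = np.asarray(labels)
    fold_of = np.empty(labels.size, dtype=int)
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))   # shuffle the items of this class
        fold_of[idx] = np.arange(idx.size) % n_splits         # deal them out round-robin
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

rng = np.random.default_rng(1)
block_classes = np.repeat(np.arange(1, 7), 4)   # hypothetical: 24 blocks, 4 per class
for k, (train_idx, test_idx) in enumerate(stratified_kfold_indices(block_classes, 4, rng), start=1):
    # prints '<fold> 18 6 [1 1 1 1 1 1]' for folds 1-4
    print(k, len(train_idx), len(test_idx), np.bincount(block_classes[test_idx], minlength=7)[1:])
```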

In [6]:
#define where the data files are located
data_folder = '/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/'

nsubjects = 36



# User-defined parameters
lo_freq = 20 #lower bound of bandpass filter
hi_freq = 450 #upper bound of bandpass filter

win_size = 100 #define window size over which to compute time-domain features
step = win_size #keeping this parameter in case we want to re-run later with some overlap

#excluded labels
exclude = [0]

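#range(nsubjects, nsubjects+1) runs only the last subject (36);
#use range(1, nsubjects+1) to process every subject
for subject_id in range(nsubjects,nsubjects+1):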
  subject_folder = os.path.join(data_folder,'%02d'%(subject_id))
  print('=======================')
  print(subject_folder)

  # Process data and get features 
  #get features across segments and corresponding info
  feature_matrix, target_labels, window_tstamps, \
  block_labels, series_labels = get_subject_data_for_classification(subject_folder, lo_freq, hi_freq, \
                                                                    win_size, step)

  #keep only labeled samples (drop the excluded classes)
  in_samples = np.where(np.isin(target_labels,exclude, invert = True))[0]
  feature_matrix_in = feature_matrix[in_samples,:]
  target_labels_in = target_labels[in_samples]
  window_tstamps_in = window_tstamps[in_samples]
  block_labels_in = block_labels[in_samples]
  #initialize empty list
  rnn_results_df = []

  # Set seed for replicability
  np.random.seed(1)

  # Repeat analysis over multiple repetitions to take into account stochasticity of experiment
  nreps = 10
  for rep in range(nreps):
      print('**Repetition %i'%(rep+1))

      #train and evaluate RNN model on labeled data
      train_f1, test_f1 = RNN_on_labeled_data(feature_matrix_in.T, target_labels_in, window_tstamps_in,\
                                                            block_labels_in, epochs = 40)

      # Put train/test results in dataframes (scalar columns are broadcast to every fold)
      for f1, split_type in [(train_f1, 'Train'), (test_f1, 'Test')]:
          rnn_results_df.append(pd.DataFrame({'F1_score': f1,
                                              'Rep': rep+1,
                                              'Fold': np.arange(f1.size)+1,
                                              'Shuffled': False,
                                              'Type': split_type}))

      #repeat with shuffled class labels (chance-level baseline)
      print('---Permuted Data---')
      train_f1, test_f1 = RNN_on_labeled_data(feature_matrix_in.T, target_labels_in, window_tstamps_in,\
                                                            block_labels_in, epochs = 40, permute = True)

      # Put train/test results in dataframes
      for f1, split_type in [(train_f1, 'Train'), (test_f1, 'Test')]:
          rnn_results_df.append(pd.DataFrame({'F1_score': f1,
                                              'Rep': rep+1,
                                              'Fold': np.arange(f1.size)+1,
                                              'Shuffled': True,
                                              'Type': split_type}))
      
  #concatenate all dataframes
  rnn_results_df = pd.concat(rnn_results_df, axis =0)

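  #save results (create the results folder if needed; to_hdf requires the PyTables package)
  results_folder =  os.path.join(data_folder,'..','results_data','RNN')
  os.makedirs(results_folder, exist_ok = True)
  results_fn = 'subject_%02d_within_subject_results.h5'%(subject_id)
  rnn_results_df.to_hdf(os.path.join(results_folder,results_fn), key='results_df', mode='w')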

/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/36
**Repetition 1
(366,)
(366,)
(24,) (24,)
Split Count: 1
(18, 19, 16) (18, 19, 6)
(6, 17, 16) (6, 17, 6)
Training Model
Evaluating Model
Split Count: 2
(18, 17, 16) (18, 17, 6)
(6, 19, 16) (6, 19, 6)
Training Model
Evaluating Model
Split Count: 3
(18, 19, 16) (18, 19, 6)
(6, 16, 16) (6, 16, 6)
Training Model
Evaluating Model
Split Count: 4
(18, 19, 16) (18, 19, 6)
(6, 17, 16) (6, 17, 6)
Training Model
Evaluating Model
---Permuted Data---
(366,)
(366,)
(24,) (24,)
Split Count: 1
(18, 19, 16) (18, 19, 6)
(6, 16, 16) (6, 16, 6)
Training Model
Evaluating Model
Split Count: 2
(18, 19, 16) (18, 19, 6)
(6, 17, 16) (6, 17, 6)
Training Model
Evaluating Model
Split Count: 3
(18, 17, 16) (18, 17, 6)
(6, 19, 16) (6, 19, 6)
Training Model
Evaluating Model
Split Count: 4
(18, 19, 16) (18, 19, 6)
(6, 17, 16) (6, 17, 6)
Training Model
Evaluating Model
**Repetition 2
(366,)
(366,)
(24,) (24,)
Split Count: 1
(18, 17, 16) (18, 17, 6)
(6, 19, 16) (6

### Across-subject performance

&emsp; The code block below assesses model performance across subjects. The classifier is trained on data from one subject and tested on data from all other subjects. I exclude unlabeled timepoints (class 0) as well as timepoints with a label that was not collected for all subjects (class 7). This avoids further complications when comparing model performance across subjects.
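The across-subject scheme (train on one source subject, test on every other subject, after dropping classes 0 and 7) can be sketched with synthetic data as below. Everything here is illustrative: a nearest-centroid classifier and plain accuracy stand in for the RNN and the F1 score, `keep_labeled`, `fit_centroids` and `accuracy` are hypothetical helpers, and the subject shift is simulated by adding a random subject-specific offset to class patterns shared by all subjects. It does not reproduce `RNN_xsubject`, whose internals live in utils.py.

```python
import numpy as np

EXCLUDE = [0, 7]   # unlabeled class and the class missing for some subjects

def keep_labeled(X, y, exclude=EXCLUDE):
    """Drop samples whose label is in the exclusion list."""
    mask = np.isin(y, exclude, invert=True)
    return X[mask], y[mask]

def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def accuracy(model, X, y):
    classes, centroids = model
    pred = classes[np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)]
    return float(np.mean(pred == y))

rng = np.random.default_rng(0)
n_subjects, n_features = 5, 16
shared = rng.normal(0, 1.5, size=(8, n_features))      # class patterns common to all subjects
subjects = {}
for s in range(1, n_subjects + 1):
    offset = rng.normal(0, 3, size=(8, n_features))    # subject-specific deviation per class
    y = rng.integers(0, 8, size=400)                   # labels 0..7, including excluded ones
    X = shared[y] + offset[y] + rng.normal(size=(y.size, n_features))
    subjects[s] = keep_labeled(X, y)

src = 1
X_src, y_src = subjects[src]
half = y_src.size // 2
model = fit_centroids(X_src[:half], y_src[:half])      # train on one (source) subject
within = accuracy(model, X_src[half:], y_src[half:])   # held-out data, same subject
across = {t: accuracy(model, *subjects[t]) for t in subjects if t != src}
print(f'within: {within:.2f}, across (mean): {np.mean(list(across.values())):.2f}')
```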

Results are written to an hdf5 file. Trained models are also saved to file for later use.
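To compare conditions once the results files exist, a short pandas summary is enough. `summarize_scores` is a hypothetical helper; its defaults match the column names written in the within-subject cell (`F1_score`, `Shuffled`, `Type`), while the columns returned by `RNN_xsubject` are not shown here, so `by` and `score_col` may need adjusting for the across-subject files (for example, grouping by `Train_Subject`).

```python
import pandas as pd

def summarize_scores(results_df, by=('Shuffled', 'Type'), score_col='F1_score'):
    """Mean, standard deviation and count of a score column for each group."""
    return (results_df.groupby(list(by))[score_col]
                      .agg(['mean', 'std', 'count'])
                      .reset_index())

# Example usage (reading .h5 files with pandas requires the PyTables package):
# df = pd.read_hdf(os.path.join(results_folder, 'subject_36_within_subject_results.h5'), key='results_df')
# print(summarize_scores(df))
```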


In [7]:


#RNN training args - all other arguments are the same as in the within-subject analysis
verbose = 0
epochs = 40
batch_size = 2
nreps = 10
#excluded labels
exclude = [0,7]

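#range(nsubjects, nsubjects+1) uses only the last subject (36) as the training (source) subject;
#use range(1, nsubjects+1) to train on every subject in turn
for src_subject_id in range(nsubjects,nsubjects+1):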


  model_folder = '/content/drive/MyDrive/limb-position-EMG-Repo/RNN_models/good_data'

  rnn_results_df = RNN_xsubject(data_folder, src_subject_id, nsubjects, nreps, lo_freq, hi_freq, win_size, step, exclude,\
                                model_folder, verbose, epochs, batch_size, permute = False)

  rnn_results_df['Shuffled'] = False

  #repeat with permuted data
  print('---Permuted Data---')
  model_folder = '/content/drive/MyDrive/limb-position-EMG-Repo/RNN_models/permuted_data'

  rnn_perm_results_df = RNN_xsubject(data_folder, src_subject_id, nsubjects, nreps, lo_freq, hi_freq, win_size, step, exclude,\
                                model_folder, verbose, epochs, batch_size, permute = True)

  rnn_perm_results_df['Shuffled'] = True

  #concatenate

  results_df = pd.concat([rnn_results_df, rnn_perm_results_df]).reset_index(drop = True)
  results_df['Train_Subject'] = src_subject_id

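  #save to file (create the results folder if needed; to_hdf requires the PyTables package)
  results_folder =  os.path.join(data_folder,'..','results_data','RNN')
  os.makedirs(results_folder, exist_ok = True)
  results_fn = 'subject_%02d_across_subject_results.h5'%(src_subject_id)
  results_df.to_hdf(os.path.join(results_folder,results_fn), key='results_df', mode='w')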

Source Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/36
Repetition: 1
(24, 19, 16) (24, 19, 6)
Training Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/01
(24, 21, 16) (24, 21, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/02
(24, 20, 16) (24, 20, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/03
(24, 19, 16) (24, 19, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/04
(24, 23, 16) (24, 23, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/05
(24, 22, 16) (24, 22, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/06
(24, 22, 16) (24, 22, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Repo/EMG_data/07
(24, 27, 16) (24, 27, 6)
Evaluating Model
Target Subject :/content/drive/MyDrive/limb-position-EMG-Rep