**EMG Pattern Classification**

For recording patterns, the author used a MYO Thalmic bracelet worn on a user’s forearm, and a PC with a Bluetooth receiver. The bracelet is equipped with **eight sensors** equally spaced around the forearm that simultaneously acquire myographic signals. The signals are sent through a Bluetooth interface to a PC. 


    Author: Debanjan Saha
    College: Northeastern University
    Group: Project Group 7
    Batch: Wednesday
    Course: IE 7300
    Professor: Ramin M.

This dataset contains raw EMG data for **36** subjects, recorded while they performed a series of static hand gestures. Each subject performed two series, each consisting of **six (seven) basic gestures**. Each gesture was **performed for 3 seconds**, with a **pause** of **3 seconds between gestures**.

Relevant Paper:
Lobov S., Krilova N., Kastalskiy I., Kazantsev V., Makarov V.A. Latent Factors Limiting the Performance of sEMG-Interfaces. Sensors. 2018;18(4):1122. doi: 10.3390/s18041122

In [None]:
#!pip install nolds
!pip install pycatch22
#!pip install torchsummary
# !pip install pyeeg

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pycatch22
  Downloading pycatch22-0.4.2.tar.gz (49 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pycatch22
  Building wheel for pycatch22 (pyproject.toml) ... done
  Created wheel for pycatch22: filename=pycatch22-0.4.2-cp39-cp39-linux_x86_64.whl size=118675 sha256=24d6d474a88d40bfaaf5ac1deb278bd9127a383933e45490ef45c327c07c7904
  Stored in directory: /root/.cache/pip/wheels/c0/84/da/f210e9de22c6265163dac19287b0674e040605dfc519d83ca5
Successfully built pycatch22
Installing collected packages: pycatch22
Successfully install

In [None]:
# import warnings
import os
from tqdm import tqdm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import pywt
# import librosa
# import nolds
# import pyeeg
# warnings.filterwarnings('ignore')
sns.set_style('darkgrid')
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_colwidth', 500)

In [None]:
# from scipy import stats
import pycatch22
# import tensorflow as tf
# import torch
# import torch.nn as nn
# import torch.nn.functional as F
# from torch.utils.data import TensorDataset, DataLoader
# from torchsummary import summary
from sklearn.model_selection import train_test_split

In [None]:
from google.colab import drive, files
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
%cd '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master'
!ls -lrt

/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master
total 146
-rw------- 1 root root 1453 Dec  7  2018 README.txt
drwx------ 2 root root 4096 Apr  2 17:32 25
drwx------ 2 root root 4096 Apr  2 17:32 24
drwx------ 2 root root 4096 Apr  2 17:32 23
drwx------ 2 root root 4096 Apr  2 17:32 22
drwx------ 2 root root 4096 Apr  2 17:32 15
drwx------ 2 root root 4096 Apr  2 17:32 14
drwx------ 2 root root 4096 Apr  2 17:32 13
drwx------ 2 root root 4096 Apr  2 17:32 12
drwx------ 2 root root 4096 Apr  2 17:32 06
drwx------ 2 root root 4096 Apr  2 17:32 01
drwx------ 2 root root 4096 Apr  2 17:32 36
drwx------ 2 root root 4096 Apr  2 17:32 31
drwx------ 2 root root 4096 Apr  2 17:32 30
drwx------ 2 root root 4096 Apr  2 17:32 26
drwx------ 2 root root 4096 Apr  2 17:32 21
drwx------ 2 root root 4096 Apr  2 17:32 19
drwx------ 2 root root 4096 Apr  2 17:32 10
drwx------ 2 root root 4096 Apr  2 17:32 09
drwx------ 2 root root 4096 Apr  2 17:32 08
drwx------ 2 

In [None]:
# List out all the available files in the project's environment
files_path = set()
for dirname, _, filenames in os.walk('/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/'):
    for filename in filenames:
        f_path = os.path.join(dirname, filename)
        files_path.add(f_path)

In [None]:
files_path.discard('/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/README.txt')
files_path.discard('/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/all_ext_features_data.csv')
files_path.discard('/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/.DS_Store')
files_path

{'/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/01/1_raw_data_13-12_22.03.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/01/2_raw_data_13-13_22.03.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/02/1_raw_data_14-19_22.03.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/02/2_raw_data_14-21_22.03.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/03/1_raw_data_09-32_11.04.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/03/2_raw_data_09-34_11.04.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/04/1_raw_data_18-02_24.04.16.txt',
 '/content/drive/MyDrive/Northeastern/Projects/IE7300 Project/EMG_data_for_gestures-master/04/2_raw_data_18-03_24.04.1

In [None]:
frames = []
for file in tqdm(sorted(files_path)):
    # The parent directory name is the zero-padded subject number, e.g. '01'
    print(file.split('/')[-2])
    sub = int(file.split('/')[-2])
    data = pd.read_csv(file, sep='\t')
    subject_id = np.ones([len(data),1], dtype=np.int16)*sub
    print('\tSubject Id: \t', sub, '\tShape: ', subject_id.shape)
    # Tag every sample with the subject it was recorded from
    data['subject_id'] = subject_id
    frames.append(data)

# Concatenate once at the end; calling pd.concat inside the loop
# re-copies the growing frame on every iteration
df_time_series = pd.concat(frames, axis=0, ignore_index=True)
print('Total Records: ', df_time_series.shape)

100%|██████████| 72/72 [00:31<00:00,  2.26it/s]

01
	Subject Id: 	 1 	Shape:  (63196, 1)
01
	Subject Id: 	 1 	Shape:  (57974, 1)
02
	Subject Id: 	 2 	Shape:  (72322, 1)
02
	Subject Id: 	 2 	Shape:  (64104, 1)
03
	Subject Id: 	 3 	Shape:  (56568, 1)
03
	Subject Id: 	 3 	Shape:  (49217, 1)
04
	Subject Id: 	 4 	Shape:  (59107, 1)
04
	Subject Id: 	 4 	Shape:  (55091, 1)
05
	Subject Id: 	 5 	Shape:  (57118, 1)
05
	Subject Id: 	 5 	Shape:  (50130, 1)
06
	Subject Id: 	 6 	Shape:  (51078, 1)
06
	Subject Id: 	 6 	Shape:  (48541, 1)
07
	Subject Id: 	 7 	Shape:  (68697, 1)
07
	Subject Id: 	 7 	Shape:  (63943, 1)
08
	Subject Id: 	 8 	Shape:  (60280, 1)
08
	Subject Id: 	 8 	Shape:  (57668, 1)
09
	Subject Id: 	 9 	Shape:  (62770, 1)
09
	Subject Id: 	 9 	Shape:  (64877, 1)
10
	Subject Id: 	 10 	Shape:  (61641, 1)
10
	Subject Id: 	 10 	Shape:  (61448, 1)
11
	Subject Id: 	 11 	Shape:  (74681, 1)
11
	Subject Id: 	 11 	Shape:  (72645, 1)
12
	Subject Id: 	 12 	Shape:  (65920, 1)
12
	Subject Id: 	 12 	Shape:  (62631, 1)
13
	Subject Id: 	 13 	Shape:  (75676, 1)
13
	Subject Id: 	 13 	Shape:  (77564, 1)
14
	Subject Id: 	 14 	Shape:  (52821, 1)
14
	Subject Id: 	 14 	Shape:  (48182, 1)
15
	Subject Id: 	 15 	Shape:  (53553, 1)
15
	Subject Id: 	 15 	Shape:  (51843, 1)
16
	Subject Id: 	 16 	Shape:  (55489, 1)
16
	Subject Id: 	 16 	Shape:  (50012, 1)
17
	Subject Id: 	 17 	Shape:  (65227, 1)
17
	Subject Id: 	 17 	Shape:  (66858, 1)
18
	Subject Id: 	 18 	Shape:  (62354, 1)
18
	Subject Id: 	 18 	Shape:  (66958, 1)
19
	Subject Id: 	 19 	Shape:  (58818, 1)
19
	Subject Id: 	 19 	Shape:  (51088, 1)
20
	Subject Id: 	 20 	Shape:  (65349, 1)
20
	Subject Id: 	 20 	Shape:  (62504, 1)
21
	Subject Id: 	 21 	Shape:  (62365, 1)
21
	Subject Id: 	 21 	Shape:  (56882, 1)
22
	Subject Id: 	 22 	Shape:  (61788, 1)
22
	Subject Id: 	 22 	Shape:  (57841, 1)
23
	Subject Id: 	 23 	Shape:  (60135, 1)
23
	Subject Id: 	 23 	Shape:  (55484, 1)
24
	Subject Id: 	 24 	Shape:  (61968, 1)
24
	Subject Id: 	 24 	Shape:  (54972, 1)
25
	Subject Id: 	 25 	Shape:  (62681, 1)
25
	Subject Id: 	 25 	Shape:  (54066, 1)
26
	Subject Id: 	 26 	Shape:  (56151, 1)
26
	Subject Id: 	 26 	Shape:  (50003, 1)
27
	Subject Id: 	 27 	Shape:  (56688, 1)
27
	Subject Id: 	 27 	Shape:  (50111, 1)
28
	Subject Id: 	 28 	Shape:  (50513, 1)
28
	Subject Id: 	 28 	Shape:  (47253, 1)
29
	Subject Id: 	 29 	Shape:  (53904, 1)
29
	Subject Id: 	 29 	Shape:  (52083, 1)
30
	Subject Id: 	 30 	Shape:  (77878, 1)
30
	Subject Id: 	 30 	Shape:  (70683, 1)
31
	Subject Id: 	 31 	Shape:  (44927, 1)
31
	Subject Id: 	 31 	Shape:  (46096, 1)
32
	Subject Id: 	 32 	Shape:  (62719, 1)
32
	Subject Id: 	 32 	Shape:  (60211, 1)
33
	Subject Id: 	 33 	Shape:  (59017, 1)
33
	Subject Id: 	 33 	Shape:  (53088, 1)
34
	Subject Id: 	 34 	Shape:  (79765, 1)
34
	Subject Id: 	 34 	Shape:  (51438, 1)
35
	Subject Id: 	 35 	Shape:  (50417, 1)
35
	Subject Id: 	 35 	Shape:  (49084, 1)
36
	Subject Id: 	 36 	Shape:  (52390, 1)
36
	Subject Id: 	 36 	Shape:  (49364, 1)
Total Records:  (4237908, 11)

In [None]:
# Drop samples labelled class 0 (unmarked data in the dataset README) before building signals
df_time_series.drop(index=list(df_time_series[df_time_series['class'] == 0].index),inplace=True)

In [None]:
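## Categorizing unique signals. Takes 5 mins. Could not figure out a way to optimize. Suggestions welcome
def enumerateSignals(df):
  # df is expected to hold exactly two columns: class (index -2) and subject_id (index -1).
  # A "signal" is a run of consecutive rows sharing the same class and subject.
  signal_number_array = np.zeros(df.shape[0])
  signal_number = 0
  for index in range(df.shape[0]-1):
    signal_number_array[index] = signal_number
    if (df.iloc[index,-2] == df.iloc[index+1,-2]) and (df.iloc[index,-1] == df.iloc[index+1,-1]):
      continue
    else:
      # The next row differs in class or subject, so it starts a new signal
      signal_number += 1
  # The loop never writes the last row; give it the label of the row before it
  signal_number_array[-1] = signal_number_array[-2]
  return signal_number_array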
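
On the optimization question raised in the comment above: the same run labelling can probably be done without a Python-level loop by comparing each row with its predecessor and taking a cumulative sum of the change points. The sketch below is one possible approach, not the notebook's method; `enumerate_signals_vectorized` and the toy frame are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def enumerate_signals_vectorized(df):
    """Label runs of consecutive identical rows with 0, 1, 2, ... (hypothetical helper)."""
    keys = df.to_numpy()
    changed = np.zeros(len(keys), dtype=bool)
    # Row i starts a new signal if it differs from row i-1 in at least one column
    changed[1:] = (keys[1:] != keys[:-1]).any(axis=1)
    # Counting the run starts seen so far yields the run label for every row
    return np.cumsum(changed).astype(float)

# Toy data: the class changes at row 2, the subject at row 3, the class again at row 4
toy = pd.DataFrame({'class': [1, 1, 2, 2, 1], 'subject_id': [1, 1, 1, 2, 2]})
labels = enumerate_signals_vectorized(toy)
print(labels)
```

One difference from `enumerateSignals`: the loop version always copies the previous label onto the last row, whereas this sketch gives the last row its own label if it starts a new run.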

In [None]:
df_time_series.reset_index(drop=True,inplace=True)

In [None]:
df_time_series['signal_number'] = enumerateSignals(df_time_series.iloc[:,-2:])

In [13]:
signalNumber_vs_class = {}
signalNumber_vs_subjectId = {}
def updateSignalNumberDicts(signalNumber,subjectId,targetClass):
  signalNumber_vs_class[signalNumber] = targetClass
  signalNumber_vs_subjectId[signalNumber] = subjectId

In [14]:
df_time_series.groupby(by='signal_number').apply(lambda group : updateSignalNumberDicts(group.iloc[0,-1],group.iloc[0,-2],group.iloc[0,-3]))
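
The two lookup dictionaries can also be built without calling `apply` for its side effects. A minimal sketch on toy data, assuming the columns are named `class`, `subject_id` and `signal_number` as in this notebook; `signal_to_class` and `signal_to_subject` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'class': [1, 1, 2, 2],
    'subject_id': [1, 1, 1, 1],
    'signal_number': [0.0, 0.0, 1.0, 1.0],
})

# first() returns the first non-null value of each column within every signal
first_rows = toy.groupby('signal_number')[['class', 'subject_id']].first()
signal_to_class = first_rows['class'].to_dict()
signal_to_subject = first_rows['subject_id'].to_dict()
print(signal_to_class, signal_to_subject)
```

Because class and subject are constant within a signal by construction, taking the first row of each group gives the same mapping as the `apply` call above.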

## Creating Train, Validation and Test sets

In [51]:
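X = df_time_series.drop('class', axis=1)
y = df_time_series['class']
# shuffle=False keeps the recording order, so the first 70% of rows become the training set
# (random_state has no effect when shuffle=False)
X_train,X_validation,y_train,y_validation = train_test_split(X,y,test_size=0.3,random_state=1024,shuffle=False)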

In [52]:
# Split the remaining 30% again: 70% of it (21% of all rows) for validation, 30% (9%) for test
X_validation,X_test,y_validation,y_test = train_test_split(X_validation,y_validation,test_size=0.3,random_state=1024,shuffle=False)

In [53]:
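# Signal 602 appears to straddle the train/validation cut; move its validation rows
# into the training set so that no signal is split across two sets
X_train = pd.concat((X_train,X_validation[X_validation.signal_number == 602]),axis=0)
X_train.shape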

(1059173, 11)

In [54]:
y_train = pd.concat((y_train,y_validation[X_validation.signal_number == 602]))
y_train.shape

(1059173,)

In [55]:
y_validation.drop(index=list(X_validation[X_validation.signal_number == 602].index),inplace=True)
X_validation.drop(index=list(X_validation[X_validation.signal_number == 602].index),inplace=True)

In [56]:
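# Likewise, signal 790 appears to straddle the validation/test cut; move its test rows into validation
X_validation = pd.concat((X_validation,X_test[X_test.signal_number == 790]),axis=0)
X_validation.shape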

(318061, 11)

In [57]:
y_validation = pd.concat((y_validation,y_test[X_test.signal_number == 790]),axis=0)
y_validation.shape

(318061,)

In [58]:
y_test.drop(index=list(y_test[X_test.signal_number == 790].index),inplace=True)
X_test.drop(index=list(X_test[X_test.signal_number == 790].index),inplace=True)
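
The cells above move two hard-coded signals (602 and 790) by hand so that no signal ends up in two sets. The same idea can be sketched generically: assign each signal to the set in which its first row falls. This is only an illustration under the assumption that rows are in recording order and each signal is contiguous; `split_on_signal_boundaries` and its parameters are hypothetical, and the cut positions are not guaranteed to match `train_test_split` row for row.

```python
import numpy as np
import pandas as pd

def split_on_signal_boundaries(df, train_frac=0.7, val_frac=0.21, key='signal_number'):
    """Sequential train/validation/test split that never cuts through a signal."""
    sizes = df.groupby(key, sort=False).size()
    starts = sizes.cumsum() - sizes          # row offset at which each signal begins
    n = len(df)
    train_end = train_frac * n
    val_end = (train_frac + val_frac) * n
    train_ids = starts[starts < train_end].index
    val_ids = starts[(starts >= train_end) & (starts < val_end)].index
    test_ids = starts[starts >= val_end].index
    return (df[df[key].isin(train_ids)],
            df[df[key].isin(val_ids)],
            df[df[key].isin(test_ids)])

# Toy data: 12 signals of 4 rows each
toy = pd.DataFrame({'signal_number': np.repeat(np.arange(12), 4)})
train, val, test = split_on_signal_boundaries(toy)
print(len(train), len(val), len(test))
```

A signal that straddles a cut goes entirely to the earlier set, which mirrors what the manual moves of signals 602 and 790 do.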

## According to the paper, data should be normalized before extracting catch22 features

In [60]:
def normalizeTrainData(df):
  mean = np.mean(df,axis=0)
  std = np.std(df,axis=0)
  return (df - mean)/std,mean,std

In [59]:
X_train.iloc[:,1:9],trainMean,trainStd = normalizeTrainData(X_train.iloc[:,1:9])

In [61]:
def normalizeTestData(df,trainingMean,trainingStd):
  return (df - trainingMean)/trainingStd

In [66]:
X_validation.iloc[:,1:9] = normalizeTestData(X_validation.iloc[:,1:9],trainMean,trainStd)
X_test.iloc[:,1:9] = normalizeTestData(X_test.iloc[:,1:9],trainMean,trainStd)
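
To make the normalization step concrete, here is a small worked example of the same scheme on toy numbers: the mean and standard deviation are estimated on the training rows only and then reused for held-out rows. The column names `channel1` and `channel2` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'channel1': [1.0, 2.0, 3.0], 'channel2': [10.0, 20.0, 30.0]})
held_out = pd.DataFrame({'channel1': [2.0, 4.0], 'channel2': [20.0, 40.0]})

# Statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=0)   # ddof=0 is the population standard deviation (NumPy's default)

train_z = (train - mean) / std
held_out_z = (held_out - mean) / std   # held-out rows reuse the training statistics

print(train_z)
print(held_out_z)
```

After this transform the training columns have zero mean and unit variance; the held-out columns generally do not, which is expected because their own statistics are never used.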

In [67]:
def createCatch22Features(df,interval):
  # Feature names, kept for reference:
  # catch_22_features = ['DN_HistogramMode_5', 'DN_HistogramMode_10', 'CO_f1ecac', 'CO_FirstMin_ac', 'CO_HistogramAMI_even_2_5', 'CO_trev_1_num',
  #                  'MD_hrv_classic_pnn40', 'SB_BinaryStats_mean_longstretch1', 'SB_TransitionMatrix_3ac_sumdiagcov', 'PD_PeriodicityWang_th0_01',
  #                  'CO_Embed2_Dist_tau_d_expfit_meandiff', 'IN_AutoMutualInfoStats_40_gaussian_fmmi', 'FC_LocalSimple_mean1_tauresrat',
  #                  'DN_OutlierInclude_p_001_mdrmd', 'DN_OutlierInclude_n_001_mdrmd', 'SP_Summaries_welch_rect_area_5_1',
  #                  'SB_BinaryStats_diff_longstretch0', 'SB_MotifThree_quantile_hh', 'SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1',
  #                  'SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1', 'SP_Summaries_welch_rect_centroid', 'FC_LocalSimple_mean3_stderr']

  df_final = pd.DataFrame()

  # Columns 1..8 hold the eight EMG channels
  for channel in range(1,9):
    channelCatch22List = []
    # Cut the channel into consecutive windows of `interval` samples.
    # iloc clips the slice at the end of the frame, so the last window may be shorter.
    for startIndex in np.arange(0,df.shape[0],interval):
      window = df.iloc[startIndex:(startIndex+interval),channel]
      channelFeatures = pycatch22.catch22_all(window)
      # One row per window, with the channel number appended to every feature name
      channelCatch22Dict = {f'{feature}_channel_{channel}':value
                            for feature,value in zip(channelFeatures['names'], channelFeatures['values'])}
      channelCatch22List.append(channelCatch22Dict)
    df_channel = pd.DataFrame(channelCatch22List)
    # Place the channels side by side so each row describes one window across all channels
    df_final = pd.concat((df_final,df_channel),axis=1)

  return df_final
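
To see which windows `createCatch22Features` hands to catch22, the slicing can be reproduced on its own. The sketch below only computes window boundaries; `window_bounds` is a hypothetical helper and the signal length of 1300 samples is made up.

```python
import numpy as np

def window_bounds(n_samples, interval):
    """Return (start, stop) pairs for consecutive windows; the last one may be short."""
    starts = np.arange(0, n_samples, interval)
    return [(int(s), int(min(s + interval, n_samples))) for s in starts]

bounds = window_bounds(1300, 512)
lengths = [stop - start for start, stop in bounds]
print(bounds)
print(lengths)
```

With an interval of 512, a 1300-sample signal yields two full windows and a 276-sample tail. Short tail windows like this one are a plausible source of the NaN feature rows that are dropped further down, although the notebook does not confirm that.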

In [68]:
def groupBySignals(signalNumber,group,intervalSize):
  df_subjectId_interval = createCatch22Features(group,intervalSize)
  ones = np.ones(df_subjectId_interval.shape[0])
  df_subjectId_interval['intervalSize'] = ones * intervalSize
  df_subjectId_interval['subjectId'] = ones * signalNumber_vs_subjectId[signalNumber]
  y_temp = pd.Series(ones * signalNumber_vs_class[signalNumber])
  return df_subjectId_interval,y_temp

In [69]:
## Some issue with signal 180. If not excluded the runtime will crash
groups = X_train[(X_train.signal_number != 180)].groupby(by='signal_number')
X_train_features = pd.DataFrame()
y_train_features = pd.Series(dtype=np.float16)
intervalSize = 512
for signalNumber,group in groups:
  dfSubjectIdInterval,yTemp = groupBySignals(signalNumber,group,intervalSize)
  X_train_features = pd.concat((X_train_features,dfSubjectIdInterval),axis=0,ignore_index=True)
  y_train_features = pd.concat((y_train_features,yTemp),ignore_index=True)

In [70]:
groups = X_validation.groupby(by='signal_number')
X_validation_features = pd.DataFrame()
y_validation_features = pd.Series(dtype=np.float16)
intervalSize = 512
for signalNumber,group in groups:
  dfSubjectIdInterval,yTemp = groupBySignals(signalNumber,group,intervalSize)
  X_validation_features = pd.concat((X_validation_features,dfSubjectIdInterval),axis=0,ignore_index=True)
  y_validation_features = pd.concat((y_validation_features,yTemp),ignore_index=True)

In [71]:
groups = X_test.groupby(by='signal_number')
X_test_features = pd.DataFrame()
y_test_features = pd.Series(dtype=np.float16)
intervalSize = 512
for signalNumber,group in groups:
  dfSubjectIdInterval,yTemp = groupBySignals(signalNumber,group,intervalSize)
  X_test_features = pd.concat((X_test_features,dfSubjectIdInterval),axis=0,ignore_index=True)
  y_test_features = pd.concat((y_test_features,yTemp),ignore_index=True)

In [72]:
y_train_features.drop(index=list(X_train_features[X_train_features.DN_HistogramMode_5_channel_1.isna()].index),inplace=True)
X_train_features.drop(index=list(X_train_features[X_train_features.DN_HistogramMode_5_channel_1.isna()].index),inplace=True)

In [73]:
y_train_features.drop(index=list(X_train_features[X_train_features.SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1_channel_8.isna()].index),inplace=True)
X_train_features.drop(index=list(X_train_features[X_train_features.SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1_channel_8.isna()].index),inplace=True)

In [74]:
y_train_features.drop(index=list(X_train_features[X_train_features.SB_TransitionMatrix_3ac_sumdiagcov_channel_1.isna()].index),inplace=True)
X_train_features.drop(index=list(X_train_features[X_train_features.SB_TransitionMatrix_3ac_sumdiagcov_channel_1.isna()].index),inplace=True)

In [75]:
y_validation_features.drop(index=list(X_validation_features[X_validation_features.DN_HistogramMode_5_channel_1.isna()].index),inplace=True)
X_validation_features.drop(index=list(X_validation_features[X_validation_features.DN_HistogramMode_5_channel_1.isna()].index),inplace=True)
y_validation_features.drop(index=list(X_validation_features[X_validation_features.SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1_channel_8.isna()].index),inplace=True)
X_validation_features.drop(index=list(X_validation_features[X_validation_features.SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1_channel_8.isna()].index),inplace=True)
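
The cells above drop rows by checking a few named feature columns one at a time. A more general variant, sketched here on toy data, builds one boolean mask over all feature columns and applies it to features and labels together. Note that this is stricter than the cells above, which only inspect specific columns; `drop_incomplete_rows` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def drop_incomplete_rows(X, y):
    """Keep only rows of X without any NaN, and the matching entries of y."""
    keep = X.notna().all(axis=1)
    return X[keep].reset_index(drop=True), y[keep].reset_index(drop=True)

X_toy = pd.DataFrame({'f1': [0.1, np.nan, 0.3, 0.4], 'f2': [1.0, 2.0, np.nan, 4.0]})
y_toy = pd.Series([1.0, 2.0, 3.0, 4.0])
X_clean, y_clean = drop_incomplete_rows(X_toy, y_toy)
print(X_clean)
print(y_clean)
```

Using a single mask keeps features and labels aligned by construction, so the order in which `X` and `y` are filtered no longer matters. The same call could be applied to the test features, which the cells above leave unfiltered.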

## MODEL BUILDING

In [76]:
from sklearn.svm import SVC
clf = SVC(kernel='linear')
svm_model = clf.fit(X_train_features.iloc[:,:-2].values,y_train_features.values)

In [77]:
def getAccuracy(pred,actual):
  correctClassifications = 0
  for i in range(len(pred)):
    if pred[i] == actual[i]:
      correctClassifications += 1
  return (correctClassifications/len(pred)) * 100
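
The accuracy computed by `getAccuracy` is the percentage of positions where prediction and label agree, which can also be written as a vectorized comparison. A minimal sketch with made-up labels; `accuracy_percent` is a hypothetical name.

```python
import numpy as np

def accuracy_percent(pred, actual):
    """Percentage of positions where pred equals actual."""
    pred = np.asarray(pred)
    actual = np.asarray(actual)
    return float(np.mean(pred == actual) * 100)

# Three of four toy predictions match their labels
acc = accuracy_percent([1, 2, 3, 4], [1, 2, 0, 4])
print(acc)
```

For arrays of equal length this returns the same value as the loop version.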

In [78]:
prediction_train = svm_model.predict(X_train_features.iloc[:,:-2].values)
prediction_validation = svm_model.predict(X_validation_features.iloc[:,:-2].values)

In [79]:
getAccuracy(prediction_train,y_train_features.values)

55.74395930479017

In [80]:
getAccuracy(prediction_validation,y_validation_features.values)

32.81907433380084