Once the data has been `wrangled`, the next step is to train an artificial neural network model on it. There are many ways to do this, but one of the quickest and easiest is to use `PyTorch`, a Python deep learning library that grew out of the earlier Torch framework.

The first step in this process is to load the wrangled data and separate it into `training` and `validation` datasets using K-Fold Cross Validation. This way every sample is used for validation exactly once, and the network is trained on a wide range of samples.

While we could write our own K-Fold algorithm over the DataFrame, the `scikit-learn` package (imported as `sklearn`) makes this process simple and fast.
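To make the shape of the K-Fold output concrete, here is a minimal sketch on toy data. The array `samples` and the names `fold_sizes` and `seen_test_rows` are hypothetical stand-ins, not part of the EEG dataset. The point it illustrates is that `KFold.split` yields one pair of positional index arrays per fold, which can be unpacked directly in the `for` statement.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for the EEG rows: 10 samples, 2 features (hypothetical data).
samples = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=2)

fold_sizes = []
seen_test_rows = []
for train_idx, test_idx in kf.split(samples):
    # Each iteration yields two arrays of positional row indices.
    fold_sizes.append((len(train_idx), len(test_idx)))
    seen_test_rows.extend(test_idx.tolist())

print(fold_sizes)
```

With 10 samples and 5 splits, each fold trains on 8 rows and validates on 2, and every row lands in a validation fold exactly once.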

PyTorch does not support string `tensors`, so we will need to convert our labels into numeric values. Since we are not doing any kind of natural language processing, converting to ASCII codes would be unnecessarily complex. Instead, we can create a `map` of values, where each key is a string label and its value is that label's index in an array of unique labels. After the model runs, we can index this array with the predicted value to retrieve the key press the model is guessing.
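The round trip from label to index and back can be sketched in plain Python. The list `keys` and the names `index_to_key`, `key_to_index` and `predicted_index` are hypothetical and chosen only for illustration; the real labels come from the `KeyPressed` column.

```python
# Hypothetical key labels, standing in for the `KeyPressed` column.
keys = ["a", "d", "a", "w", "d"]

# Unique labels in first-seen order; the list position is the class index.
index_to_key = list(dict.fromkeys(keys))
key_to_index = {key: i for i, key in enumerate(index_to_key)}

# Encode every label as its numeric index.
encoded = [key_to_index[k] for k in keys]

# After the model predicts a class index, look the key back up.
predicted_index = 2
predicted_key = index_to_key[predicted_index]

print(encoded, predicted_key)
```

Keeping the array of unique labels around is what makes decoding possible later, so it is worth storing alongside the trained model.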

In [43]:
import pandas as pd
from sklearn.model_selection import KFold

eeg_data_raw = pd.read_csv('eeg_val_to_key_press.csv')

# Normalise the key labels to lower case
eeg_data_raw['KeyPressed'] = eeg_data_raw['KeyPressed'].str.lower()

# Unique labels in first-seen order; the position of each label is its class index
unique_res = eeg_data_raw['KeyPressed'].unique()

# Map each label to its index in unique_res
res_map = {key: i for i, key in enumerate(unique_res)}

# Replace each string label with its numeric index
eeg_data_raw['KeyPressed'] = eeg_data_raw['KeyPressed'].map(res_map)

kf = KFold(n_splits=5, shuffle=True, random_state=2)
folded_data = []
# kf.split yields one (train_indices, test_indices) pair of positional index arrays per fold
for train_idx, test_idx in kf.split(eeg_data_raw):
    train = eeg_data_raw.iloc[train_idx]
    test = eeg_data_raw.iloc[test_idx]
    folded_data.append((train, test))

Once the train and test DataFrames are created, we can start creating `tensors` from them. To do this, we extract each of the `EXG` channel values as its own tensor. Then we stack the eight channel `tensors` into a single tensor and use that as the training dataset.

In [33]:
import torch

# Column names of the eight EXG channels
channel_cols = [f'EXG Channel {i}' for i in range(8)]

# One float tensor per channel, stacked into shape (channels, samples)
channel_tensors = [torch.tensor(train[col].values, dtype=torch.float) for col in channel_cols]
print(channel_tensors[7].shape)
train_dataset = torch.stack(channel_tensors)
print(train_dataset.shape)
# Numeric key labels produced by the mapping step
y_train = torch.tensor(train['KeyPressed'].values, dtype=torch.int8)

train_dataset, y_train

torch.Size([263060])
torch.Size([8, 263060])


(tensor([[  6179.9443,   6184.5264,   6185.8677,  ...,  -1269.0649,
           -1265.3993,  -1267.2098],
         [  2068.1621,   2071.1350,   2072.3643,  ...,  -3474.9587,
           -3476.8586,  -3483.2734],
         [ -9348.5498,  -9344.7725,  -9343.8564,  ...,  -9731.5469,
           -9737.0908,  -9736.0176],
         ...,
         [ -1583.4199,  -1581.5647,  -1584.0234,  ...,  -6106.0273,
           -6075.4946,  -6087.8774],
         [-12547.6660, -12546.4365, -12544.9385,  ..., -13128.3867,
          -13130.2861, -13129.5713],
         [-13145.9102, -13146.4688, -13149.6875,  ..., -13845.0283,
          -13830.8350, -13839.5518]]),
 tensor([0, 1, 1,  ..., 1, 1, 0], dtype=torch.int8))

In [32]:
# Column names of the eight EXG channels
channel_cols = [f'EXG Channel {i}' for i in range(8)]

# One float tensor per channel, stacked into shape (channels, samples)
test_channel_tensors = [torch.tensor(test[col].values, dtype=torch.float) for col in channel_cols]
test_dataset = torch.stack(test_channel_tensors)

# Numeric key labels produced by the mapping step
y_test = torch.tensor(test['KeyPressed'].values, dtype=torch.int8)

test_dataset, y_test

(tensor([[  2282.9624,   2273.8652,   2280.7498,  ...,  -1274.2283,
           -1271.5237,  -1268.1262],
         [ -1207.5753,  -1197.7406,  -1199.0593,  ...,  -3475.6069,
           -3472.6565,  -3476.0093],
         [-10062.4199, -10060.3418, -10063.2695,  ...,  -9732.3740,
           -9733.6709,  -9738.1182],
         ...,
         [ -3491.2754,  -3518.2764,  -3484.6145,  ...,  -6096.1924,
           -6094.2700,  -6073.6621],
         [-12830.2139, -12830.6836, -12831.0410,  ..., -13128.1855,
          -13130.1299, -13129.9062],
         [-13947.5557, -13954.1045, -13937.4521,  ..., -13841.0049,
          -13838.3447, -13832.8691]]),
 tensor([1, 1, 1,  ..., 1, 1, 1], dtype=torch.int8))

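Now that we have our `train` and `validation` tensors, as well as their output values, we need to feed them into a `PyTorch` training loop. PyTorch has no single built-in train method; instead, the data is typically wrapped in a `Dataset`, batched with a `DataLoader`, and iterated over in a loop.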
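As a rough sketch of that hand-off, the stacked tensors can be wrapped with `TensorDataset`. The data below is synthetic, and the sizes and the names `features`, `labels`, `dataset` and `loader` are assumptions made for illustration. Because the stacked tensors above are channels-first, with shape `(8, N)`, the sketch transposes them so that each row is one sample, and casts the `int8` labels to `long`, which is what loss functions such as `CrossEntropyLoss` expect for class indices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the real tensors (hypothetical sizes and values).
n_channels, n_samples = 8, 100
features = torch.randn(n_channels, n_samples)  # channels-first, like the stacked tensors
labels = torch.randint(0, 2, (n_samples,), dtype=torch.int8)

# TensorDataset indexes along the first dimension, so put samples first.
dataset = TensorDataset(features.T, labels.long())
loader = DataLoader(dataset, batch_size=10, shuffle=True)

# Pull one batch to check the shapes.
batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_y.shape)
```

Each batch then holds 10 samples of 8 channel values, paired with 10 integer labels.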

In [None]:
import torch.utils.data as data_utils

#1. make torch.dataset from tensors (or )
#2. make dataloader should have 50k
#3. 

In [45]:
from torch.utils.data import Dataset

class EEGDataset(Dataset):
    """EEG samples (eight EXG channels) paired with numeric key-press labels."""

    def __init__(self):
        eeg_df = pd.read_csv('eeg_val_to_key_press.csv')
        # Drop bookkeeping columns if present; errors='ignore' avoids a KeyError when they are absent
        eeg_df = eeg_df.drop(['Sample Index', 'Timestamp'], axis=1, errors='ignore')
        eeg_df['KeyPressed'] = eeg_df['KeyPressed'].str.lower()

        # Map each unique key label to its index
        unique_res = eeg_df['KeyPressed'].unique()
        res_map = {key: i for i, key in enumerate(unique_res)}
        eeg_df['KeyPressed'] = eeg_df['KeyPressed'].map(res_map)

        # Select features and labels by column name rather than by position
        channel_cols = [f'EXG Channel {i}' for i in range(8)]
        x = eeg_df[channel_cols].values
        y = eeg_df['KeyPressed'].values

        self.x_train = torch.tensor(x, dtype=torch.float32)
        # Class indices are stored as integers (torch.long)
        self.y_train = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, idx):
        return self.x_train[idx], self.y_train[idx]


In [46]:
from torch.utils.data import DataLoader

ds = EEGDataset()
train_loader = DataLoader(ds, batch_size=10, shuffle=True)

# Inspect a single batch
for i, (data, labels) in enumerate(train_loader):
    print(data.shape, labels.shape)
    print(data, labels)
    break
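With a `DataLoader` in place, the remaining piece is the training loop itself. The following is a minimal sketch under stated assumptions, not the model used in this project: the data is synthetic, the two-class setup, layer sizes, learning rate and epoch count are illustrative, and the names `model`, `loss_fn` and `epoch_losses` are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 8 channels, 2 classes (all hypothetical).
x = torch.randn(200, 8)
y = torch.randint(0, 2, (200,))
loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)

# Small fully connected classifier; layer sizes are illustrative only.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()  # expects float logits and integer (long) class labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for data, labels in loader:
        optimizer.zero_grad()  # clear gradients from the previous step
        loss = loss_fn(model(data), labels)
        loss.backward()        # backpropagate
        optimizer.step()       # update the weights
        running += loss.item()
    # Average loss per batch for this epoch
    epoch_losses.append(running / len(loader))

print(epoch_losses)
```

In practice the raw EXG values, which are in the thousands, would likely need to be normalised before training, and the loop would be repeated once per fold with a fresh model each time.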