# Part 2. DEAP Dataset + Spectral Analysis + SVM

In Part 2, we focus on spectral analysis. Here, spectral analysis refers to analyzing signal power in the theta (4 - 8 Hz), alpha (8 - 12 Hz), beta (12 - 30 Hz), and gamma (30 - 64 Hz) bands.
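To make these bands concrete, here is a minimal sketch of one way to estimate band power for a single channel, using a plain FFT periodogram in NumPy. It is not necessarily the method used later in this notebook. The helper name `band_powers`, the 128 Hz sampling rate (assumed here for the preprocessed DEAP files), and the synthetic 10 Hz test signal are illustrative assumptions.

```python
import numpy as np

# Band edges in Hz, as listed above.
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 64)}

FS = 128  # assumed sampling rate (Hz) of the preprocessed recordings


def band_powers(signal, fs=FS, bands=BANDS):
    """Return the mean periodogram value inside each band for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    # Periodogram of the mean-removed signal. The one-sided factor of 2 is
    # omitted, which does not change how the bands compare to each other.
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2 / (fs * n)
    powers = {}
    for name, (low, high) in bands.items():
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(psd[mask].mean())
    return powers


# Synthetic check (not DEAP data): a 10 Hz sine in weak noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / FS)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

powers = band_powers(x)
dominant = max(powers, key=powers.get)
print(powers)
print("Dominant band:", dominant)
```

Since the test signal oscillates at 10 Hz, the alpha band should come out as the dominant one. For real EEG, an averaged estimate such as Welch's method is usually less noisy than a raw periodogram.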

Spectral analysis is a basic, must-do analysis for emotion, cognition, and resting-state studies, since there is abundant evidence that our emotional and cognitive states change how our brain signals oscillate. For example, alpha power is relatively high when we are calm, whereas beta power is relatively high and alpha power relatively lower when we are attentive.

In this part, we shall extract these band powers and then visualize them. Lastly, let's try a simple SVM and Logistic Regression and see whether these features are useful for predicting the four valence-arousal classes that we obtained in Part 1.
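As a rough preview of the classification step, the sketch below fits an SVM and a Logistic Regression on placeholder features with scikit-learn. Everything here is an assumption for illustration: the feature matrix is random noise rather than real DEAP band powers, the feature layout (32 channels x 4 bands) is hypothetical, and the hyperparameters are library defaults, so the printed accuracies should hover around chance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder features: 200 trials x (32 channels * 4 bands). NOT real data.
X = rng.standard_normal((200, 32 * 4))
y = rng.integers(1, 5, size=200)  # class ids 1..4, mirroring the four classes

# Scaling lives inside each pipeline so it is re-fit on every training fold.
models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

scores = {name: cross_val_score(model, X, y, cv=5) for name, model in models.items()}
for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f}")
```

Putting the scaler inside the pipeline keeps test-fold statistics out of training. With four roughly balanced classes, chance level is about 25%, which is the baseline any real feature set has to beat.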

In [1]:
import torch

import os
import pickle
import numpy as np

Set the device accordingly (CUDA if available, otherwise CPU).

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Configured device: ", device)

Configured device:  cuda


## 1. Loading dataset

Let's first reuse the dataset loader we created in Part 1.

In [3]:
class Dataset(torch.utils.data.Dataset):
    
    def __init__(self, path):
        _, _, filenames = next(os.walk(path))
        filenames = sorted(filenames)
        all_data = []
        all_label = []
        for dat in filenames:
            
            #use a context manager so that the file handle is closed after loading
            with open(os.path.join(path, dat), 'rb') as f:
                temp = pickle.load(f, encoding='latin1')
            all_data.append(temp['data'])
            
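            #divide labels into four classes: LALV, HALV, LAHV, HAHV
            #labels[:, 0] is valence and labels[:, 1] is arousal
            #note: a trial lying exactly on a median satisfies more than one condition; the last assignment below wins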
            labels = temp['labels']
            labels_holder = np.zeros((40, 1))
            
            val_med = np.median(labels[:, 0])
            aro_med = np.median(labels[:, 1])

            cond_lalv = (labels[:, 0] <= val_med) & (labels[:, 1] <= aro_med)
            cond_halv = (labels[:, 0] <= val_med) & (labels[:, 1] >= aro_med)
            cond_lahv = (labels[:, 0] >= val_med) & (labels[:, 1] <= aro_med)
            cond_hahv = (labels[:, 0] >= val_med) & (labels[:, 1] >= aro_med)
            
            labels_holder[cond_lalv] = 1  #LALV
            labels_holder[cond_halv] = 2  #HALV
            labels_holder[cond_lahv] = 3  #LAHV
            labels_holder[cond_hahv] = 4  #HAHV
                                    
            #labels_holder shape: (40, 1)
            all_label.append(labels_holder)
                
        self.data = np.vstack(all_data)[:, :32, :]   #shape: (1280, 32, 8064) --> keep only the first 32 channels
        self.label = np.vstack(all_label)   #shape: (1280, 1) --> one class label per trial
        
        del temp, all_data, all_label

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        single_data  = self.data[idx]
        single_label = self.label[idx]
        
        batch = {
            'data': torch.Tensor(single_data),
            'label': torch.Tensor(single_label)
        }
        
        return batch
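
The median split above uses `<=` and `>=` on both sides, so a trial whose rating equals a median matches several conditions, and whichever assignment runs last decides its class. The toy example below (hypothetical ratings, not DEAP values) shows this behaviour in isolation.

```python
import numpy as np

# Toy ratings: column 0 = valence, column 1 = arousal (hypothetical values).
labels = np.array([
    [2.0, 3.0],  # clearly low valence, low arousal
    [5.0, 5.0],  # exactly on both medians
    [8.0, 7.0],  # clearly high valence, high arousal
    [5.0, 9.0],  # valence on the median, arousal above it
    [1.0, 5.0],  # valence below the median, arousal on it
])
val_med = np.median(labels[:, 0])
aro_med = np.median(labels[:, 1])

conds = [
    (labels[:, 0] <= val_med) & (labels[:, 1] <= aro_med),  # 1: LALV
    (labels[:, 0] <= val_med) & (labels[:, 1] >= aro_med),  # 2: HALV
    (labels[:, 0] >= val_med) & (labels[:, 1] <= aro_med),  # 3: LAHV
    (labels[:, 0] >= val_med) & (labels[:, 1] >= aro_med),  # 4: HAHV
]

classes = np.zeros(len(labels), dtype=int)
for class_id, cond in enumerate(conds, start=1):
    classes[cond] = class_id  # later assignments overwrite earlier ones

matches = np.sum(conds, axis=0)  # number of conditions each trial satisfies
print("classes:", classes)
print("matches:", matches)
```

Trials that sit on a median are pushed toward the later classes. This is worth keeping in mind when reading the class counts, and making one side of each comparison strict would remove the overlap.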

Let's try loading the dataset.

In [4]:
path = "data"  #create a folder "data" and put s01.dat, ..., s32.dat from the preprocessed folder of the DEAP dataset inside it

In [5]:
dataset = Dataset(path)

data = dataset[:]['data']
label = dataset[:]['label']

print("Data shape: ", data.shape)  #1280 = 32 participants * 40 trials, 32 EEG channels, 8064 samples
print("Label shape: ", label.shape)  #four classes of LALV, HALV, LAHV, HAHV

Data shape:  torch.Size([1280, 32, 8064])
Label shape:  torch.Size([1280, 1])


Let's look at the label distribution of the dataset.

In [6]:
cond_0 = label == 0  #just to make sure we really don't have any 0
lalv = label == 1
halv = label == 2
lahv = label == 3
hahv = label == 4

assert len(label[cond_0]) == 0  #simple unit test
assert len(label[lalv]) + len(label[halv]) + len(label[lahv]) + len(label[hahv]) == label.shape[0]  #simple unit test
print("count of LALV: ", len(label[lalv]))
print("count of HALV: ", len(label[halv]))
print("count of LAHV: ", len(label[lahv]))
print("count of HAHV: ", len(label[hahv]))

count of LALV:  344
count of HALV:  285
count of LAHV:  281
count of HAHV:  370
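
The same tally can be obtained in one call with `np.unique(..., return_counts=True)`. The sketch below uses a small hypothetical label column rather than the real `label` tensor; the `names` mapping is just the class ids used above.

```python
import numpy as np

# Hypothetical label column with the same layout as `label`: (n_trials, 1).
toy_label = np.array([[1.0], [4.0], [2.0], [4.0], [3.0], [1.0], [4.0]])

names = {1: "LALV", 2: "HALV", 3: "LAHV", 4: "HAHV"}

# np.unique flattens the input and returns the sorted values with their counts.
values, counts = np.unique(toy_label, return_counts=True)
distribution = {names[int(v)]: int(c) for v, c in zip(values, counts)}
print(distribution)
```

For a torch tensor, `torch.unique(label, return_counts=True)` should give the equivalent result without converting to NumPy first.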


Let's look at the median EEG amplitude of each group (computing the standard deviation is left as an exercise).

In [7]:
#squeeze the (1280, 1) boolean masks down to (1280,) so they can index the trial axis
lalv_mask = lalv.squeeze()
halv_mask = halv.squeeze()
lahv_mask = lahv.squeeze()
hahv_mask = hahv.squeeze()

print("Median of LALV", np.median(data[lalv_mask, :, :]))
print("Median of HALV", np.median(data[halv_mask, :, :]))
print("Median of LAHV", np.median(data[lahv_mask, :, :]))
print("Median of HAHV", np.median(data[hahv_mask, :, :]))

Median of LALV 0.008621817
Median of HALV 0.008139102
Median of LAHV 0.0065687215
Median of HAHV 0.0016776982
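
For the standard-deviation exercise, one possible approach is to loop over the class ids and reuse a boolean mask per class. The sketch below runs on small random placeholder arrays (not the DEAP recordings), so the numbers it prints mean nothing beyond showing the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder arrays with the same layout as `data` and `label`, but tiny.
toy_data = rng.standard_normal((20, 32, 256))   # (trials, channels, samples)
toy_label = rng.integers(1, 5, size=(20, 1))    # class ids 1..4

names = {1: "LALV", 2: "HALV", 3: "LAHV", 4: "HAHV"}
stats = {}
for class_id, name in names.items():
    mask = toy_label.squeeze() == class_id      # boolean mask over trials
    if mask.any():                              # skip classes with no trials
        group = toy_data[mask]                  # (n_selected, channels, samples)
        stats[name] = (float(np.median(group)), float(group.std()))

for name, (median, std) in stats.items():
    print(f"{name}: median={median:.4f}, std={std:.4f}")
```

Pooling all channels and samples into one number per class is a coarse summary. Passing `axis=(0, 2)` to `std` would instead give one value per channel.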


## 2. Spectral Analysis

## 3. Machine Learning