In [84]:
import pandas as pd
import numpy as np
import pickle
import math
import statistics as stat
import scipy.stats as scStat

### The DEAP Dataset

The dataset contains 40 experiments (trials) for each of the 32 participants. The labels array contains the valence, arousal, dominance and liking ratings given by each participant for each of the 40 experiments. The data array contains 8064 samples of physiological/EEG signal from each of 40 channels, for each of the 40 experiments and each of the 32 participants.
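
As a rough illustration of the layout described above, the sketch below builds a synthetic stand-in with the same shapes as one participant's file. The `content` dict here is a placeholder filled with zeros and random numbers, not real DEAP data; the column order of the ratings follows the description above.

```python
import numpy as np

# Synthetic stand-in for one participant's unpickled file (placeholder values).
rng = np.random.default_rng(0)
content = {
    "data": np.zeros((40, 40, 8064), dtype=np.float32),  # trials x channels x samples
    "labels": rng.random((40, 4)),                       # trials x ratings
}

n_trials, n_channels, n_samples = content["data"].shape
valence = content["labels"][:, 0]  # one rating per trial
arousal = content["labels"][:, 1]
print(n_trials, n_channels, n_samples, valence.shape)
```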

### Feature Extraction

We divide the 8064 readings per channel into 10 batches of 806 or 807 readings each. For each batch we extract the mean, median, maximum, minimum, standard deviation, variance, range, skewness and kurtosis of its readings. Extracting these 9 values for each of the 10 batches of a single channel gives 90 values. We then add the same 9 statistics computed over the entire 8064 readings, along with the experiment and participant numbers, bringing the total to 101 values per channel.
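
The arithmetic above can be checked with a small sketch. It uses a placeholder signal (`np.arange`) rather than real recordings, and only counts values; it does not compute the statistics themselves.

```python
import numpy as np

signal = np.arange(8064)              # placeholder for one channel of one trial
batches = np.array_split(signal, 10)  # uneven split: 8064 is not divisible by 10
batch_sizes = [len(b) for b in batches]

n_stats = 9                                       # statistics per window
n_values = len(batches) * n_stats + n_stats + 2   # batches + whole signal + two IDs
print(batch_sizes, n_values)
```

`np.array_split` gives the first four batches 807 readings and the remaining six 806, so the batches are only approximately equal in length.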

In [124]:
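def extract_features(data, trial, participantNumber):
    # Build the feature row for one channel of one trial.
    extData = []
    # Nine statistics for each of the 10 roughly equal batches (90 values)
    for x in np.array_split(data, 10):
        extData.extend(calc_features(x))
    # Nine statistics over the whole signal (9 values)
    extData.extend(calc_features(data))
    # Experiment (trial) number, then participant number
    extData.append(trial)
    extData.append(participantNumber)
    return extData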

In [94]:
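def calc_features(array):
    # Order: mean, median, variance, std, max, min, range, kurtosis, skewness
    return [stat.mean(array),
            stat.median(array),
            stat.variance(array),
            stat.stdev(array),
            max(array),
            min(array),
            max(array) - min(array),  # range, as listed in the feature description
            scStat.kurtosis(array),
            scStat.skew(array, axis=0, bias=True)]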

Features from each channel are extracted and collected into a DataFrame so that they can be stored in a CSV file and accessed later.

In [258]:
def process_data_file(fileName, participantNumber):
    # encoding='latin1' lets Python 3 read NumPy arrays pickled under Python 2
    with open(fileName, 'rb') as f:
        content = pickle.load(f, encoding='latin1')
    data = content['data']
    labels = content['labels']
    extracted_features = []
    # One row per (trial, channel); all channels of a trial are adjacent
    for index, trialData in enumerate(data):
        for channelData in trialData:
            extracted_features.append(extract_features(channelData, index, participantNumber))
    df = pd.DataFrame(extracted_features)
    # Repeat each trial's label once per channel so labels line up with the rows
    rowsPerTrial = len(df) // len(labels)
    df['Valance Label'] = np.repeat(labels[:, 0], rowsPerTrial)
    df['Arousal Label'] = np.repeat(labels[:, 1], rowsPerTrial)
    return df
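
The nested loop emits all channels of one trial before moving on to the next, so per-trial labels have to be repeated in consecutive runs to line up with the rows. The toy example below uses made-up sizes and label values (3 channels, 2 trials) to contrast `np.repeat`, which produces that layout, with `np.tile`, which cycles through the trials instead.

```python
import numpy as np

n_channels = 3                        # small hypothetical channel count
trial_labels = np.array([7.0, 2.5])   # hypothetical valence for 2 trials

# Rows are produced trial by trial, so row k belongs to trial k // n_channels.
row_trials = [t for t in range(len(trial_labels)) for _ in range(n_channels)]

repeated = np.repeat(trial_labels, n_channels)  # each label in a consecutive run
tiled = np.tile(trial_labels, n_channels)       # whole label vector cycled

aligned = all(repeated[k] == trial_labels[t] for k, t in enumerate(row_trials))
misaligned = any(tiled[k] != trial_labels[t] for k, t in enumerate(row_trials))
print(repeated, tiled, aligned, misaligned)
```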

### Reading Data from DEAP Dataset

The DEAP .dat files are read one by one, and the features extracted from each are appended to a CSV file.

In [261]:
files = ['s01.dat', 's02.dat', 's03.dat']
participants = [1, 2, 3]
for f, participantNumber in zip(files, participants):
    process_data_file(f, participantNumber).to_csv('ExtractedFeatures.csv', mode='a', index=False, header=False)
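
Because `mode='a'` appends, re-running the cell above adds the same rows to `ExtractedFeatures.csv` a second time. One possible alternative, sketched here with a hypothetical helper `write_features` and tiny placeholder frames standing in for the output of `process_data_file`, is to collect all frames and write the file once.

```python
import os
import tempfile
import pandas as pd

def write_features(frames, path):
    """Concatenate per-participant frames and overwrite `path` in a single write."""
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(path, index=False, header=False)
    return len(combined)

# Hypothetical stand-ins for the frames returned by process_data_file.
frames = [pd.DataFrame([[1, 2], [3, 4]]), pd.DataFrame([[5, 6]])]
path = os.path.join(tempfile.mkdtemp(), "ExtractedFeatures.csv")

write_features(frames, path)
write_features(frames, path)  # rerun: the file is replaced, not extended
rows_on_disk = len(pd.read_csv(path, header=None))
print(rows_on_disk)
```

The trade-off is that all frames are held in memory before writing, which may matter if many participants are processed at once.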

### Learning from extracted features

If you're reading from the provided CSV file, learning starts here.

In [262]:
columns = [str(i) for i in range(0,99)]
columns.extend(['experiment No', 'participant No', 'Valance Label', 'Arousal Label'])
df = pd.read_csv('ExtractedFeatures.csv', header=None, names=columns)
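
As a sketch of how the loaded frame might then be split for learning, the example below parses one synthetic row (placeholder integers, not real features) with the same 103 column names and separates the 99 statistical features from the two labels. The names `demo_df`, `X` and `y` are illustrative only.

```python
import io
import pandas as pd

columns = [str(i) for i in range(99)]
columns.extend(['experiment No', 'participant No', 'Valance Label', 'Arousal Label'])

# One synthetic row with as many comma-separated values as there are columns.
fake_csv = io.StringIO(",".join(str(v) for v in range(len(columns))) + "\n")
demo_df = pd.read_csv(fake_csv, header=None, names=columns)

X = demo_df[columns[:99]]                        # statistical features only
y = demo_df[['Valance Label', 'Arousal Label']]  # targets
print(X.shape, y.shape)
```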