# Report for the sleep data project

## Introduction


The purpose of this project is to use ensemble methods to discriminate between different sleep stages from EEG sleep data. We experiment with different ensemble methods and parameters in order to compare the results and find the most appropriate approach given our data.

We first prepare and filter the data for further use. We then perform wavelet decomposition to separate the signal into different frequency bands. Subsequently, we extract features that correspond to the power of these frequency bands. Afterwards, we train and test two different ensemble methods: Random Forest and AdaBoost. To optimize their results, we perform a hyperparameter search and compare the outcomes. Finally, we propose and implement further methods to improve classification.


## Sleep Data Description

Data was collected with the Traumschreiber, a high-tech sleep mask developed for research purposes.

The data used to train and test the classifier consists of five data sets corresponding to different nights of sleep. Each data set contains information from seven electroencephalogram (EEG) channels and one electrocardiogram (ECG) channel, recorded for about seven hours of sleep.

Data is labeled in epochs of one second, where each second contains about 200 samples measured in microvolts. The labels correspond to the sleep stages introduced by the American Academy of Sleep Medicine (AASM), which differentiates between five main sleep stages:

(1) Wakefulness: Active wakefulness with beta waves (>13 Hz) and relaxed wakefulness with mostly alpha waves (8-13 Hz).
(2) Non-Rapid Eye Movement (NREM) 1: Dominated by theta activity (4-7 Hz).
(3) NREM-2: Characterized by theta waves, sleep spindles and K-complexes.
(4) NREM-3: Dominated by delta waves (0.5-2 Hz) along with some sleep spindles.
(5) Rapid Eye Movement (REM): Characterized by low-amplitude, mixed-frequency brain waves. Theta, alpha and even beta activity can be observed.


In [None]:
import pywt
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage    
import scipy.signal    
import os
from feature_extractor import *

In [None]:
# read the data for subject a
data = pd.read_csv('../data/by_subject/a_data.csv')
labels = pd.read_csv('../data/by_subject/a_labels.csv')

### Median filter justification

The data contained large peaks that were probably the product of interference in the Bluetooth signal. To eliminate these peaks while altering the data as little as possible, we apply a median filter using SciPy's `scipy.signal.medfilt` function with a kernel size of three. This filter runs through the signal entry by entry, replacing each entry with the median of the entry and its neighbors. The number of entries in this window is referred to as the kernel size.
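
As a minimal sketch of what the filter does, the toy signal below (invented values, not taken from the recordings) contains a single interference-like spike. With a kernel size of three, the spike is replaced by the median of its neighborhood while the remaining samples stay close to their original values. Note that `medfilt` pads the signal with zeros at both ends, which affects the last sample here.

```python
import numpy as np
import scipy.signal

# Toy signal (made-up values) with one interference-like spike at index 2
toy_signal = np.array([1.0, 2.0, 100.0, 3.0, 4.0])

# Each sample is replaced by the median of itself and its two neighbours;
# medfilt zero-pads the signal at both ends
filtered = scipy.signal.medfilt(toy_signal, kernel_size=3)
print(filtered)  # [1. 2. 3. 4. 3.]
```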

In [None]:
# Example Plot from data set 1, EEG channel 0
plt.figure(figsize = (15,5))
plt.plot(data['Ch0'])
plt.show()

In [None]:
plt.figure(figsize = (15,5))
plt.plot(data['Ch0'][1034300:1034500])
plt.show()

In [None]:
# Apply the median filter to EEG channel 0
pre_processed = scipy.signal.medfilt(data['Ch0'], kernel_size=3)

In [None]:
plt.figure(figsize = (15,5))
plt.plot(pre_processed[1034300:1034500])
plt.show()

In [None]:
# group data points into bins, each corresponding to one second of recording time (maybe with preprocessing)
data['TimestampToSec'] = data['Timestamp'].astype(int)
grouped = data.groupby('TimestampToSec')

In [None]:
# plot one second of data for all channels

single_sec_data = grouped.get_group(1489016350)
single_sec_ch = single_sec_data['Ch0']

for ch in range(8):
    plt.plot(single_sec_data['Ch{}'.format(ch)])

plt.show()


## Discrete Wavelet Transform

### Discrete Wavelet Transform Overview
Wavelets are waves of irregular shape with compact support. These properties, together with the two main operations of scaling and shifting, which produce a time-scale representation of the signal, make wavelets an ideal tool for analysing non-stationary signals. Their irregular shape makes them suitable for analysing signals with discontinuities, and their compact support enables temporal localisation. Motivated by the adaptive time-frequency resolution of the wavelet transform and by the fact that some stages in sleep recordings have a well-defined time-frequency signature, we opted to use discrete wavelet decomposition to obtain five sub-bands of the original signal, on which we then performed feature extraction for the classification.

The discrete wavelet decomposition algorithm we implemented relies firstly on a dyadic scaling of the wavelet and secondly on a discrete shifting across the original signal. The first operation acts as a half-band filter that halves the highest frequency component of the signal, which lowers computation time and memory usage. In accordance with the Nyquist sampling theorem, only half of the previous sample points are then needed at each level of the decomposition for a proper reconstruction of the original signal.
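
The halving described above can be illustrated with a single decomposition step. The sketch below uses the Haar wavelet, chosen only because it can be written in a few lines of NumPy; the notebook itself uses the `db4` wavelet through PyWavelets. The helper name `haar_dwt_step` and the random toy epoch are assumptions made for this illustration, and the helper expects an even number of samples.

```python
import numpy as np

def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    signal = np.asarray(signal, dtype=float)
    even, odd = signal[0::2], signal[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-pass branch, downsampled by 2
    detail = (even - odd) / np.sqrt(2)  # high-pass branch, downsampled by 2
    return approx, detail

# 200 random samples standing in for roughly one second of data
rng = np.random.default_rng(0)
toy_epoch = rng.normal(size=200)

approx, detail = haar_dwt_step(toy_epoch)
print(len(toy_epoch), len(approx), len(detail))  # 200 100 100

# The step is orthogonal, so the signal energy is split between both branches
print(np.isclose(np.sum(approx ** 2) + np.sum(detail ** 2), np.sum(toy_epoch ** 2)))
```

Each branch holds half as many coefficients as the input, which is why every further level of the decomposition works on half of the previous sample points.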




<img src="wavelet transform EEG ERD ERS event-related potentials time frequencya.jpg">

### Feature Extraction for sleep classification 

In [None]:
mode = pywt.Modes.smooth

def signal_decomp(data):
    """Decompose a signal S into 5 levels of DWT coefficients.
    S = An + Dn + Dn-1 + ... + D1
    Returns the approximation and detail coefficients of every level.
    """
    w = pywt.Wavelet('db4')
    a = data
    ca = []
    cd = []
    for i in range(5):
        (a, d) = pywt.dwt(a, w, mode)
        ca.append(a)
        cd.append(d)  
    return ca, cd

In [None]:
def Energy(coeffs, k):
    return np.sqrt(np.sum(np.array(coeffs[-k]) ** 2)) / len(coeffs[-k])
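
To make the feature definition concrete: the `Energy` helper above returns the Euclidean norm of one coefficient array divided by the number of coefficients. The sketch below restates that formula for a single array under the hypothetical name `norm_per_coefficient` and also shows which level the index `-k` selects when `k` runs from 0 to 4, as in the feature construction loop of this notebook.

```python
import numpy as np

def norm_per_coefficient(coefficients):
    """Euclidean norm of the coefficients divided by their count."""
    coefficients = np.asarray(coefficients, dtype=float)
    return np.sqrt(np.sum(coefficients ** 2)) / len(coefficients)

value = norm_per_coefficient([3.0, 4.0])
print(value)  # sqrt(9 + 16) / 2 = 2.5

# Indexing with -k: k = 0 selects the first entry, k = 1..4 count from the end
levels = ['D1', 'D2', 'D3', 'D4', 'D5']
order = [levels[-k] for k in range(5)]
print(order)  # ['D1', 'D5', 'D4', 'D3', 'D2']
```

All five levels are therefore visited, but not in ascending order, which matters when feature columns are interpreted by level.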

### Signal Decomposition
The algorithm can be visualized as a tree of low-pass and high-pass filters, which decompose the signal into different frequency bands by successive filtering of the time-domain signal.
The low-frequency part of the signal is successively decomposed into components of lower resolution, while the high-frequency components are not analysed any further. Each decomposition step halves the time resolution, since only half the number of samples now characterizes the entire signal.
However, it doubles the frequency resolution, since the frequency band of the signal now spans only half of the previous one. The maximum depth of decomposition depends on the input size of the data to be analysed, with $2^N$ data samples enabling the breakdown of the signal into $N$ discrete levels using the discrete wavelet transform. This procedure thus offers good time resolution at high frequencies and good frequency resolution at low frequencies.

This matches the resolution of each sub-band to the patterns of certain sleep stages, for example by capturing beta activity, which shows abrupt discontinuities, at a higher time resolution.
We discarded the first two levels of the decomposition because those frequency bands are completely absent in the original signal. In conclusion, the frequencies that are most prominent in the original signal appear as high amplitudes in the region of the discrete wavelet transform that includes those particular frequencies.
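
The band covered by each level follows directly from the repeated halving. The sketch below computes the nominal bands for a five-level decomposition, assuming a sampling rate of exactly 200 Hz (the data description only states "about 200" samples per second) and ideal half-band filters; real wavelet filters overlap somewhat at the band edges. The helper name `dwt_bands` is hypothetical.

```python
def dwt_bands(sampling_rate, levels):
    """Nominal frequency band (low, high) in Hz of each DWT output."""
    bands = {}
    high = sampling_rate / 2.0  # Nyquist frequency
    for level in range(1, levels + 1):
        low = high / 2.0
        bands["D%d" % level] = (low, high)  # the detail keeps the upper half
        high = low                          # the approximation is split again
    bands["A%d" % levels] = (0.0, high)
    return bands

bands = dwt_bands(sampling_rate=200, levels=5)
for name, (low, high) in bands.items():
    print("%s: %.3f-%.3f Hz" % (name, low, high))
# D1: 50.000-100.000 Hz
# D2: 25.000-50.000 Hz
# D3: 12.500-25.000 Hz
# D4: 6.250-12.500 Hz
# D5: 3.125-6.250 Hz
# A5: 0.000-3.125 Hz
```

Under these assumptions, D1 and D2 cover 25 to 100 Hz, which lies above the alpha, theta and delta bands listed in the sleep stage description.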




<img src="Untitled Diagram (3).png">

In [None]:
def plot_signal_decomp(data, w, title):
    ca, cd = signal_decomp(data)
        
    rec_a = []
    rec_d = []

    for i, coeff in enumerate(ca):
        coeff_list = [coeff, None] + [None] * i
        rec_a.append(pywt.waverec(coeff_list, w))

    for i, coeff in enumerate(cd):
        coeff_list = [None, coeff] + [None] * i
        rec_d.append(pywt.waverec(coeff_list, w))

    fig = plt.figure(figsize=(12,10))
    ax_main = fig.add_subplot(len(rec_a) + 1, 1, 1)
    ax_main.set_title(title, fontsize=20)
    ax_main.plot(data)
    ax_main.set_xlim(data.index[0], data.index[len(data) - 1])

    for i, y in enumerate(rec_a):
        ax = fig.add_subplot(len(rec_a) + 1, 2, 3 + i * 2)
        ax.plot(y, 'r')
        ax.set_xlim(0, len(y) - 1)
        ax.set_ylabel("A%d" % (i + 1))

    for i, y in enumerate(rec_d):
        ax = fig.add_subplot(len(rec_d) + 1, 2, 4 + i * 2)
        ax.plot(y, 'g')
        ax.set_xlim(0, len(y) - 1)
        ax.set_ylabel("D%d" % (i + 1))


In [None]:
plot_signal_decomp(single_sec_ch, 'db4', "Single Sec single Channel EEG data")
plt.show()

### ignore from here

In [None]:
len(labels)

In [None]:
# CONSTRUCT FEATURES

# for every label, look up the corresponding data
features = []
for label_idx in range(len(labels)):
    timestamp = labels['Timestamp'][label_idx]
    try:
        epoch = grouped.get_group(timestamp)
    except KeyError:
        # no recorded data for this labeled second: report it and skip the label
        print(timestamp)
        continue
    # for every channel
    power_all_channels = []
    # 1-7 EEG, 8th channel is ECG data
    for ch in range(8):
        single_sec_ch = epoch['Ch{}'.format(ch)]
        
        # median filter the data
        pre_processed = scipy.signal.medfilt(single_sec_ch, kernel_size=3)  
        
        _, cd = signal_decomp(pre_processed)
        # for every decomp. level
        power = []
        for level in range(5):
            power.append(Energy(cd, level))
            
        # collect power for all channels into one vector 
        power_all_channels.append(power) 
    # the power of each frequency band in each channel is currently the only feature type
    power_vec = np.asarray(power_all_channels).flatten()
    features.append(power_vec)
features = np.asarray(features)
 

In [None]:
print(features.shape)

## Classification

### Load all the features
If no features are available, run `feature_extractor.py` to generate the feature files.

In [None]:
from sklearn import ensemble
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split
from scipy.stats import randint as sp_randint
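try:
    from sklearn.cross_validation import LabelKFold
except ImportError:
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # GroupKFold is the replacement for LabelKFold
    from sklearn.model_selection import GroupKFold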



### Feature statistics

To gain insight into the variance across subjects, we look at the mean and standard deviation of the features for each subject. Below, the mean and standard deviation of all 40 features are plotted, for the whole data and for each sleep phase. Each line corresponds to one subject. The first and the last five features, corresponding to the first and last channel, show the biggest differences in mean and standard deviation between subjects, whereas the other channels are relatively similar.

In [None]:
# get the feature and target table
features = pd.read_csv("../data/precomputed_features/features.csv")
targets = pd.read_csv("../data/precomputed_features/targets.csv")

# merge the tables in order to derive the sleep phase later
merge = pd.merge(features,targets)

# group the data by each subject
grouped = merge.groupby('40')

# get the sleep phase identifier
events = merge['Event'].unique()

means = []
means_phase = np.zeros(shape=(6,5, 40))
stds_phase =  np.zeros(shape=(6,5,40))
stds = []

for i in range(6):
    # drop the non-feature columns (returns a copy, the merged table stays untouched)
    g = grouped.get_group(i).drop(['Timestamp', '40', 'subject_id'], axis=1)
    
    for j in range(events.shape[0]):
        p = g[g['Event']==events[j]].drop('Event', axis=1)
        # .loc replaces the .ix indexer, which was removed from pandas
        means_phase[i,j, :] = p.describe().loc['mean']
        stds_phase[i,j, :] = p.describe().loc['std']
        
    g = g.drop('Event', axis=1)
    means.append(g.describe().loc['mean'])
    stds.append(g.describe().loc['std'])
        


plt.figure(figsize=(20,10))
ax1 = plt.subplot(121)
ax1.set_title('Means of the features for each subject', fontsize = 25)
ax1.plot(np.transpose(np.asarray(means)))

ax2 = plt.subplot(122)
ax2.set_title('Stds of the features for each subject', fontsize = 25)
ax2.plot(np.transpose(np.asarray(stds)))



fig=plt.figure(figsize=(20,20),facecolor='w', edgecolor='k')
fig.suptitle('Means for each sleep phase', fontsize = 25)
for i in range(5):
    temp = 320+i+1
    ax=plt.subplot(temp)
    ax.set_title(events[i], fontsize = 25)
    ax.plot(np.transpose(means_phase[:,i,:]))
    


plt.show()


fig=plt.figure(figsize=(20,20),facecolor='w', edgecolor='k')
fig.suptitle('Stds for each sleep phase', fontsize = 25)

for i in range(5):
    temp = 320+i+1
    ax=plt.subplot(temp)
    ax.set_title(events[i], fontsize = 25)
    ax.plot(np.transpose(stds_phase[:,i,:]))
    


plt.show()


### Throw out first and last channel


In [None]:
features = pd.read_csv("../data/precomputed_features/features.csv")
targets = pd.read_csv("../data/precomputed_features/targets.csv")


features = features.drop(features.columns[[1,2,3,4,5,36,37,38,39,40]], axis=1)

### Create a separate test set to test our classifiers on

In [None]:

#Test set subject a
X_test = features[features['40']==0]
y_test = targets[targets['subject_id']==0]

# Training set
X_train = features[features['40'] > 0]
y_train =  targets[targets['subject_id'] > 0]

#X_train,X_test,y_train,y_test = train_test_split(features, targets['stages'], test_size=0.99, random_state=0)


### General Set up
The general setup consists of a 4-fold cross-validation in which the folds are always split by subject.
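
What splitting by subject means can be sketched on toy data. The example below uses `GroupKFold` from `sklearn.model_selection` with five hypothetical subjects of four epochs each; all values are arbitrary and only serve to show that no subject ends up on both sides of a split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 5 hypothetical subjects with 4 epochs each (values are arbitrary)
subject_ids = np.repeat([1, 2, 3, 4, 5], 4)
X_toy = np.arange(len(subject_ids) * 2).reshape(-1, 2)
y_toy = np.tile([0, 1], 10)

splits = list(GroupKFold(n_splits=4).split(X_toy, y_toy, groups=subject_ids))

for fold, (train_idx, val_idx) in enumerate(splits):
    train_subjects = set(int(s) for s in subject_ids[train_idx])
    val_subjects = set(int(s) for s in subject_ids[val_idx])
    # The two sets never overlap: validation subjects are unseen during training
    print(fold, sorted(val_subjects), train_subjects.isdisjoint(val_subjects))
```

With five subjects and four folds, one fold necessarily holds out two subjects, while every epoch is still used for validation exactly once.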

In [None]:
# Utility function to report best scores for Random Search
def report(results, n_top=5):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")

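from sklearn.model_selection import GroupKFold

# only 5 subjects in the dataset; every fold holds out the data of whole subjects
cv_labels = y_train['subject_id']
# GroupKFold replaces the removed LabelKFold; the (train, validation) index pairs
# are materialized so that `lkf` can be passed as `cv=` in the cells below
lkf = list(GroupKFold(n_splits=4).split(X_train, y_train['Event'], groups=cv_labels))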


### Random Forest

In [None]:
# run randomized search
n_iter_search = 20

## Random Forest
clf = ensemble.RandomForestClassifier(n_estimators = 10, criterion='entropy', class_weight='balanced', n_jobs = -1)


# specify parameters and distributions to sample from
param_dist = {"n_estimators":sp_randint(1, 100),
              "max_depth": [3, None],
              "max_features": sp_randint(1, 40),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}


random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, cv = lkf)

random_search.fit(X_train, y_train['Event'])

report(random_search.cv_results_)


In [None]:
#bootstrap': True, 'criterion': 'entropy', 'max_depth': None, 'max_features': 10, 'n_estimators': 61

## Random Forest
rf_clf = ensemble.RandomForestClassifier(n_estimators = 73, criterion='entropy', class_weight='balanced', max_features=1, n_jobs = -1)


rf_predicted = cross_val_predict(rf_clf, X_train, y_train['Event'], cv=lkf)

rf_acc = metrics.accuracy_score(y_train['Event'], rf_predicted)
print("This is the Score: {}".format(rf_acc))


### AdaBoost

In [None]:
# run randomized search
n_iter_search = 20

## AdaBoost
clf = ensemble.AdaBoostClassifier()


# specify parameters and distributions to sample from
param_dist = {"n_estimators":sp_randint(50, 250),
              "algorithm": ["SAMME", "SAMME.R"],
              "base_estimator": [DecisionTreeClassifier(max_depth=1), DecisionTreeClassifier(max_depth=2), DecisionTreeClassifier(max_depth=3)]
             }


random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search,cv = lkf)

random_search.fit(X_train, y_train['Event'])

report(random_search.cv_results_)

In [None]:
## AdaBoost
# note: recent scikit-learn releases name the `base_estimator` argument `estimator`
ada_clf = ensemble.AdaBoostClassifier(base_estimator= DecisionTreeClassifier(max_depth=3),n_estimators=188, algorithm ='SAMME.R')


ada_predicted = cross_val_predict(ada_clf, X_train, y_train['Event'], cv=lkf)

ada_acc = metrics.accuracy_score(y_train['Event'], ada_predicted)
print("This is the Score: {}".format(ada_acc))


### Bayesian Smoothing

In [None]:
# visualize false predictions
def vis_clfs(targets, predicted):
    # color coding grayscale
    # grayscale value per class index; the classes are in alphabetical order
    # (N12, N23, N34, REM1, Wake0), matching the numeric targets used below
    color = [0,50,100,150,200]
    label_text = ['N12', 'N23', 'N34', 'REM', 'Wake']
    false_pred = np.where(predicted != targets)
    print(false_pred[0].shape)
    timepoints = range(0,len(predicted))

    rows = np.ceil((len(predicted) / 500)).astype(int) * 10
    cols = 500
    image = np.ones((rows,cols), dtype=np.int16) * 255

    for timepoint in timepoints:
        x = timepoint % 500 
        y = int(timepoint / 500) * 10
        if(np.any(false_pred[0]==timepoint)):
            image[y:y+10,x] = 255
        else:
            image[y:y+10,x] = color[predicted[timepoint]]

    
    import matplotlib.patches as mpatches
    plt.figure(frameon=False, figsize=(16,16))  
    plt.title('Classification Results', fontsize=18)
    plt.axis('off')   
    im = plt.imshow(image,cmap=plt.cm.bone, vmin = 0, vmax = 255)
    # get the colors of the values, according to the 
    # colormap used by imshow
    values = [0,50,100,150,200]
    colors = [im.cmap(im.norm(value)) for value in values]
    # create a patch (proxy artist) for every color 
    patches = [ mpatches.Patch(color=colors[i], label="{l}".format(l=label_text[i]) ) for i in range(len(values)) ]
    # put those patched as legend-handles into the legend
    plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize=18 )
    plt.show()

In [None]:
def smooth(probabilities):
    """Smooth the class probabilities over time and return the most likely class per timepoint."""
    res = np.zeros(probabilities.shape)
    # start from a uniform belief over the 5 sleep stages
    res[0,:] = np.squeeze(0.2 * np.ones((5,1)))
    for timep in range(probabilities.shape[0]):
        # from the second timepoint onwards
        if(timep > 0):
            # combine the belief of the previous timepoint with the current prediction
            prod = res[timep-1,:] * probabilities[timep,:]
            norm = np.sum(prod)
            # blend the normalized product (20%) with the raw prediction (80%)
            res[timep,:] = ((prod / norm + 0.00002)*0.2 + probabilities[timep,:]*0.8)
            
    # assign the class with the highest smoothed probability at each timepoint
    classes = np.argmax(res,axis=1)
    return classes
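
The effect of this update rule can be checked on a toy sequence. The sketch below restates the rule for an arbitrary number of classes under the hypothetical name `smooth_sketch`; the probabilities are invented and contain one weak, isolated vote for a different class.

```python
import numpy as np

def smooth_sketch(probabilities, mix=0.8, eps=0.00002):
    """Recursive smoothing of per-epoch class probabilities (illustrative)."""
    probabilities = np.asarray(probabilities, dtype=float)
    n_epochs, n_classes = probabilities.shape
    res = np.zeros_like(probabilities)
    res[0, :] = 1.0 / n_classes  # uniform belief for the first epoch
    for t in range(1, n_epochs):
        prod = res[t - 1, :] * probabilities[t, :]  # previous belief times current evidence
        posterior = prod / np.sum(prod)
        res[t, :] = (posterior + eps) * (1 - mix) + probabilities[t, :] * mix
    return np.argmax(res, axis=1)

toy_probabilities = np.array([[0.90, 0.10],
                              [0.90, 0.10],
                              [0.45, 0.55],   # weak, isolated vote for class 1
                              [0.90, 0.10]])
raw = np.argmax(toy_probabilities, axis=1)
smoothed = smooth_sketch(toy_probabilities)
print(raw)       # [0 0 1 0]
print(smoothed)  # [0 0 0 0]
```

Because the raw prediction keeps a weight of 0.8, only low-confidence outliers such as the third epoch are overruled; a confident prediction for another class would still win.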


In [None]:
rf_clf.fit(X_train,y_train['Event'])

probabilities = rf_clf.predict_proba(X_test)
new = smooth(probabilities)

In [None]:
# map the stage names to the class indices used by the classifier (alphabetical order)
stage_to_index = {'stage_q_N12': 0, 'stage_q_N23': 1, 'stage_q_N34': 2, 'stage_q_REM1': 3, 'stage_q_Wake0': 4}
number_targets = y_test['Event'].map(stage_to_index)



In [None]:

acc = metrics.accuracy_score(number_targets, new)
print(acc)

acc2 = metrics.accuracy_score(y_test['Event'], rf_clf.predict(X_test))
print(acc2)

In [None]:
vis_clfs(number_targets,new)

## Results

In [None]:
from sklearn.metrics import confusion_matrix
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, fontsize=18)
    #plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45, fontsize=18)
    plt.yticks(tick_marks, classes, fontsize=18)

    if normalize:
        float_formatter = lambda x: "%.2f" % x
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, float_formatter(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black", fontsize=18)
        else:
            plt.text(j, i, cm[i, j],
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black", fontsize=18)

    plt.tight_layout()
    plt.ylabel('True label', fontsize=18)
    plt.xlabel('Predicted label', fontsize=18)
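
The normalization applied by `normalize=True` divides every row of the confusion matrix by its total. A minimal sketch with made-up counts for three classes:

```python
import numpy as np

# Made-up counts for three classes: rows are true labels, columns are predictions
cm_toy = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 5, 5]])

# Divide every row by its total so that each row sums to 1
cm_normalized = cm_toy.astype(float) / cm_toy.sum(axis=1)[:, np.newaxis]
print(cm_normalized)
```

After normalization, the diagonal holds the fraction of each true class that was predicted correctly, which makes classes with very different numbers of epochs comparable.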

In [None]:
# get class names for labels of plot
class_names, counts = np.unique(y_train['Event'], return_counts=True)
    
# Compute confusion matrix
np.set_printoptions(precision=2)

rf_cnf_matrix = confusion_matrix(y_train['Event'], rf_predicted)
ada_cnf_matrix = confusion_matrix(y_train['Event'], ada_predicted)

# Plot the confusion matrix of the Random Forest
plt.figure(figsize=(20,10))

plt.subplot(121)
plot_confusion_matrix(rf_cnf_matrix, classes=class_names,
                      title='Confusion matrix for the Random Forest')

# Plot the confusion matrix of AdaBoost
plt.subplot(122)
plot_confusion_matrix(ada_cnf_matrix, classes=class_names,
                      title='Confusion matrix for the AdaBoost classifier')

plt.show()

### Test Set

In [None]:
rf_clf.fit(X_train,y_train['Event'])
ada_clf.fit(X_train,y_train['Event'])


rf_pred_test = rf_clf.predict(X_test)
ada_pred_test = ada_clf.predict(X_test)


rf_acc_test = metrics.accuracy_score(y_test['Event'], rf_pred_test)
print("This is the Score for Random Forest on the test set: {}".format(rf_acc_test))

ada_acc_test = metrics.accuracy_score(y_test['Event'], ada_pred_test)
print("This is the Score for Ada-Boost on the test set: {} \n \n".format(ada_acc_test))

rf_cnf_matrix_test = confusion_matrix(y_test['Event'], rf_pred_test)
ada_cnf_matrix_test = confusion_matrix(y_test['Event'], ada_pred_test)

# Plot the confusion matrix of the Random Forest on the test set
plt.figure(figsize=(20,10))
plt.subplot(121)
plot_confusion_matrix(rf_cnf_matrix_test, classes=class_names,
                      title='Confusion matrix for the Random Forest for the test set')

# Plot the confusion matrix of AdaBoost on the test set
plt.subplot(122)
plot_confusion_matrix(ada_cnf_matrix_test, classes=class_names, normalize=False,
                      title='Confusion matrix for AdaBoost for the test set')

plt.show()

## Conclusion

* The wavelet transform makes it possible to extract relevant information from both the time and the frequency domain. Because some sleep stages are mainly characterized in the time domain while others are characterized in the frequency domain, wavelet decomposition is most appropriate for extracting frequency bands from EEG sleep data.
* Random Forest and AdaBoost both perform over xxx.
* Random Forest outperforms AdaBoost by xxx.
* Hyperparameter search significantly improves the results of both methods.
* Smoothing the classification results within sleep stages can help to improve overall results.