## Project 4
In this task we perform sequence classification: we categorize temporally coherent, uniformly spaced short sections of a long time series. Specifically, for each 4-second epoch of a long EEG/EMG recording made during sleep, we assign one of 3 classes corresponding to the sleep stage present in that epoch.

Each row in train_{eeg1,eeg2,emg}.csv is a single epoch of the corresponding channel, indexed by an id stored in the first column. Besides the id, each sample has 512 values, i.e. 4 x 128, where 4 is the number of seconds per epoch and 128 Hz is the sampling frequency. The data contains the stacked recordings of three subjects. Each subject has 21600 epochs (24 hours), giving 3 x 21600 = 64800 epochs in total, i.e. training data points. Apart from the "breaks" between subjects, neighboring epochs are therefore temporally coherent. The file structure is as follows:

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.signal import correlate
from scipy.fft import fft
from statsmodels.tsa.stattools import adfuller

In [2]:
train_eeg1 = pd.read_csv('raw/train_eeg1.csv', index_col='Id')
train_eeg2 = pd.read_csv('raw/train_eeg2.csv', index_col='Id')
train_emg = pd.read_csv('raw/train_emg.csv', index_col='Id')
train_labels = pd.read_csv('raw/train_labels.csv', index_col='Id')

print('Shape of data')
print('Shape of train_eeg1 ', train_eeg1.shape)
print('Shape of train_eeg2 ', train_eeg2.shape)
print('Shape of train_emg ', train_emg.shape)
print()
print('Head of eeg1')
train_eeg1.head()


Shape of data
Shape of train_eeg1  (64800, 512)
Shape of train_eeg2  (64800, 512)
Shape of train_emg  (64800, 512)

Head of eeg1


Id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,...,x503,x504,x505,x506,x507,x508,x509,x510,x511,x512
0,0.0004,0.00047,6.7e-05,-0.00016,-3e-06,0.00031,0.00036,0.00019,-7.2e-05,-7e-05,...,-8.6e-05,3.3e-05,-4.6e-05,-0.00027,-0.00039,-0.00034,-0.00032,-0.00021,4.2e-05,5.3e-05
1,6.7e-05,9.5e-05,0.00027,0.00028,0.00025,0.00012,9.4e-05,-0.00034,-0.00096,-0.0012,...,4.6e-05,0.0003,0.00063,0.00071,0.00052,0.00041,0.00066,0.00088,0.00077,0.00041
2,0.00016,-0.00021,-0.00084,-0.0012,-0.0012,-0.0014,-0.0014,-0.00091,-0.0006,-0.00027,...,-0.00068,-0.00088,-0.001,-0.00077,-0.00068,-0.00073,-0.00073,-0.00062,-0.00055,-0.00054
3,-0.00014,0.00026,0.00039,0.00043,0.00028,0.00023,0.00039,0.00022,0.00015,0.00022,...,0.00072,0.00076,0.00038,5.2e-05,-0.00026,-0.00058,-0.00075,-0.0011,-0.0012,-0.0012
4,-0.0011,-0.00079,-8.1e-05,0.00014,0.0002,-0.00014,-0.00043,-0.00053,-0.00058,-0.00041,...,0.00029,0.0006,0.00067,0.00019,-5.5e-05,-0.00016,-0.00023,-0.00023,-0.00033,-0.00081
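
Because the rows stack three subjects of 21600 epochs each, a subject index can be derived from the row position alone. The following is a minimal sketch that assumes the rows are ordered subject by subject, as described above. The names `EPOCHS_PER_SUBJECT`, `N_SUBJECTS` and `subject_of_epoch` are illustrative and not part of the dataset.

```python
import numpy as np

EPOCHS_PER_SUBJECT = 21600  # 24 h of 4-second epochs
N_SUBJECTS = 3

def subject_of_epoch(n_epochs=N_SUBJECTS * EPOCHS_PER_SUBJECT):
    """Map each row position to a subject index (0, 1 or 2)."""
    return np.arange(n_epochs) // EPOCHS_PER_SUBJECT

groups = subject_of_epoch()
# Row positions where the subject changes, i.e. the "breaks" between recordings
breaks = np.flatnonzero(np.diff(groups)) + 1
```

An array like `groups` could, for instance, be passed as the `groups` argument of scikit-learn's `GroupKFold` so that each validation fold holds out a whole subject.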


In [3]:
train_emg.head()

Id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,...,x503,x504,x505,x506,x507,x508,x509,x510,x511,x512
0,-2.4e-05,-1.8e-05,-1.4e-05,-1.7e-05,-1.2e-05,-7e-06,-4e-06,-9e-06,-6e-05,-3.1e-05,...,-3e-06,-9e-06,-1e-05,-2e-06,-9e-07,-1e-05,-6e-06,-1.3e-05,-3e-05,-4e-06
1,-5e-06,-7e-06,-7e-06,-8e-06,-6e-06,-8e-06,-1.2e-05,-1.7e-05,-1.5e-05,-1e-05,...,-1.4e-05,-1.6e-05,-1e-05,-5e-06,-9.2e-06,-1.1e-05,-1e-05,-1.5e-05,6e-06,-6e-06
2,-1.3e-05,-1.3e-05,-8e-06,-1.4e-05,-5e-06,-8e-06,-1.1e-05,-7e-06,-8e-06,-8e-06,...,-5e-06,-8e-06,-1.1e-05,-1.4e-05,-1.8e-05,-1.2e-05,-1.5e-05,-1.5e-05,-1.6e-05,-1.1e-05
3,-1.1e-05,-1e-05,-2.9e-05,-4.1e-05,-1e-05,-2.1e-05,-1e-05,-1.2e-05,-5e-06,-7e-06,...,1.9e-05,-1.2e-05,-9e-06,-1.1e-05,-6.4e-06,-1.2e-05,-8e-06,-7e-06,-3e-06,-1.4e-05
4,-1.1e-05,-1.4e-05,-1e-05,-1.1e-05,-4e-06,-5.1e-05,-2.9e-05,-2e-06,-1.1e-05,-1.3e-05,...,-3.3e-05,-5e-06,-8e-06,-1.1e-05,-1.2e-05,-8e-06,-8e-06,-7e-06,-1.1e-05,-7e-06


In [4]:
# Check each channel for missing values
print(train_eeg1.isna().values.any())
print(train_eeg2.isna().values.any())
print(train_emg.isna().values.any())

#total = train_eeg1.isnull().sum().sort_values(ascending=False)
#percent = (train_eeg1.isnull().sum()/train_eeg1.isnull().count()).sort_values(ascending=False)
#missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
#missing_data.head(20)

False
False
False


In [34]:
###Install this and restart from terminal
#conda install -y nodejs
#pip install --upgrade jupyterlab
#jupyter labextension install @jupyter-widgets/jupyterlab-manager
#jupyter labextension install jupyter-matplotlib
#jupyter nbextension enable --py widgetsnbextension

%matplotlib widget

train_eeg1.iloc[1].plot()
train_eeg2.iloc[1].plot()
train_emg.iloc[1].plot()
plt.legend(['eeg1', 'eeg2', 'emg'])
plt.show()


In [45]:
# Check the stationarity of a time series with the augmented Dickey-Fuller test
def stationarity(ts):
    print('Results of Dickey-Fuller Test:')
    test = adfuller(ts, autolag='AIC')
    results = pd.Series(test[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for i, val in test[4].items():
        results['Critical Value (%s)' % i] = val
    print(results)

stationarity(train_eeg1.iloc[1])
# Reference: https://machinelearningmastery.com/time-series-data-stationary-python/

Results of Dickey-Fuller Test:
Test Statistic                -8.793585e+00
p-value                        2.206874e-14
#Lags Used                     5.000000e+00
Number of Observations Used    5.060000e+02
Critical Value (1%)           -3.443340e+00
Critical Value (5%)           -2.867269e+00
Critical Value (10%)          -2.569821e+00
dtype: float64
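
Reading the result: the null hypothesis of the Dickey-Fuller test is that the series has a unit root, i.e. is non-stationary. Here the test statistic (about -8.79) lies well below the 1% critical value (about -3.44) and the p-value is about 2.2e-14, so the null is rejected for this single epoch. The small helper below makes that comparison explicit. `rejects_unit_root` is a hypothetical name introduced for illustration, and the numbers are copied from the output above.

```python
def rejects_unit_root(test_statistic, critical_values, level="5%"):
    """Return True if the ADF statistic is below the critical value at `level`."""
    return test_statistic < critical_values[level]

# Values copied from the printed test results
stat = -8.793585
crit = {"1%": -3.443340, "5%": -2.867269, "10%": -2.569821}

is_stationary = rejects_unit_root(stat, crit, level="1%")
```

This only says something about the one epoch tested; other epochs or channels would need their own test.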


In [31]:
def autocorr(x):
    # Unnormalised autocorrelation; with mode='same' zero lag sits at the centre
    return np.correlate(x, x, mode='same')
%matplotlib widget

# The data looks roughly periodic, so try the autocorrelation
plt.plot(autocorr(train_eeg1.iloc[1]))
plt.title('Autocorrelation of EEG channel 1')
plt.show()
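
The raw output of `np.correlate` depends on the signal's amplitude and offset, which makes epochs hard to compare. One possible variant, sketched here under the assumption that a mean-removed estimate scaled to 1 at lag 0 is wanted, keeps only the non-negative lags. `normalized_autocorr` is an illustrative name, and the check uses a synthetic sine rather than real EEG data.

```python
import numpy as np

def normalized_autocorr(x):
    """Autocorrelation for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the DC offset
    full = np.correlate(x, x, mode='full')  # lags -(n-1) .. (n-1)
    acf = full[len(x) - 1:]                 # keep non-negative lags only
    return acf / acf[0]

# Synthetic check: a 10 Hz sine sampled at 128 Hz for 4 s
fs = 128
t = np.arange(4 * fs) / fs
acf = normalized_autocorr(np.sin(2 * np.pi * 10 * t))
```

For a periodic signal the first strong positive peak appears near one period, here around lag 13 (128 / 10 = 12.8 samples).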


How about the cross-correlation between the two EEG channels?

In [29]:
%matplotlib widget
plt.plot(correlate(train_eeg2.iloc[1], train_eeg1.iloc[1]))
plt.title('Cross-correlation of EEG channels')
plt.show()


In [43]:
%matplotlib widget
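eeg1f = fft(train_eeg1.iloc[1].values)
N = train_eeg1.shape[1]  # samples per epoch (512)
T = 4/N                  # sampling interval: 4 seconds / 512 samples
print('Sampling Frequency', 1/T, 'Hz')
# Bin k has frequency k/(N*T); keep the non-negative half of the spectrum
xf = np.fft.fftfreq(N, d=T)[:N//2]
plt.plot(xf, 2.0/N * np.abs(eeg1f[0:N//2]))
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.grid()
plt.show()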

Sampling Frequency 128.0 Hz
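
With N = 512 samples over 4 seconds, the frequency bins are spaced 1/(N*T) = 0.25 Hz apart and the spectrum reaches up to the Nyquist frequency of 64 Hz. As a sanity check of that axis, the sketch below runs the same kind of transform on a synthetic 10 Hz tone (an assumption made for illustration, not real EEG data) and reads off the peak frequency. It uses `rfft` and `rfftfreq` from `scipy.fft`, which return only the non-negative frequencies of a real signal.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

N = 512      # samples per epoch
T = 4 / N    # sampling interval in seconds (1/128 s)

t = np.arange(N) * T
signal = np.sin(2 * np.pi * 10 * t)   # synthetic 10 Hz tone

freqs = rfftfreq(N, T)                        # 0, 0.25, ..., 64 Hz
amplitude = 2.0 / N * np.abs(rfft(signal))    # single-sided amplitude
peak_hz = freqs[np.argmax(amplitude)]
```

The peak should land on the 10 Hz bin with an amplitude close to 1, which confirms both the frequency axis and the `2.0/N` scaling used in the plot.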



We can low-pass filter the signal with a cutoff at about 50 Hz to get a smoother signal.
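
One way to do this, sketched here as an assumption rather than a settled choice, is a zero-phase Butterworth filter from `scipy.signal`. The filter order of 4 and the helper name `lowpass` are illustrative, and the check uses a synthetic signal instead of the recordings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass(x, cutoff_hz=50.0, fs=128.0, order=4):
    """Zero-phase Butterworth low-pass filter applied along the last axis."""
    sos = butter(order, cutoff_hz, btype='low', fs=fs, output='sos')
    return sosfiltfilt(sos, x, axis=-1)

# Synthetic check: 10 Hz tone plus 60 Hz interference, 4 s at 128 Hz
fs = 128.0
t = np.arange(512) / fs
clean = np.sin(2 * np.pi * 10 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 60 * t)
filtered = lowpass(noisy)
```

Since the filter runs along the last axis, `lowpass(train_eeg1.values)` would filter every epoch independently. Filtering forward and backward with `sosfiltfilt` avoids the phase shift a single pass would introduce.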