
#Introduction
In this tutorial, we use data from the EEG Motor Movement/Imagery Dataset (EEG-MMIDB). We work with a cleaned version of the dataset, which contains 109 subjects.
- Here we take the first subject as an example to understand how things work.

#Dataset description

The original dataset can be found here: https://archive.physionet.org/pn4/eegmmidb/



> After our cleaning and sorting, each .npy file contains the data of one subject. The data shape of each .npy file is [N, 65]: the first 64 columns hold the readouts of the 64 EEG channels, and the last column is the class/intent label. Each row is one readout at a specific time point. In this tutorial, we call each row an instance. The sampling rate in EEG-MMIDB is 160 Hz, which means that the equipment generates 160 instances/rows/time-points per second.
N varies between subjects, but it should be either 259,520 or 255,680. This is an inherent difference in the original dataset. Recall that the sampling rate is 160 Hz; some trials last 4.1 seconds (656 = 4.1 × 160 instances) while others last 4.2 seconds (672 = 4.2 × 160 instances). It is suggested to segment the signals into one-second windows.
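
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `instances_in` and `duration_seconds` are hypothetical and do not come from the dataset tooling.

```python
FS = 160  # sampling rate in Hz, as stated above


def instances_in(seconds, fs=FS):
    """Number of rows (instances) recorded in `seconds` at `fs` Hz."""
    return round(seconds * fs)


def duration_seconds(n_rows, fs=FS):
    """Recording length in seconds for a file with `n_rows` rows."""
    return n_rows / fs


print(instances_in(4.1))          # 656
print(instances_in(4.2))          # 672
print(duration_seconds(259_520))  # 1622.0
print(duration_seconds(255_680))  # 1598.0
```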

Based on the experimental setting, we split all EEG signals into 11 different cognitive intentions as follows. Labels containing "image" mean that the subject only imagined the action without actually moving: these four intentions (labelled 4, 5, 8, and 9) are strictly motor-imagery EEG. The remaining intentions are the mental states recorded while the subject was carrying out a specific action.

>Labels:
- 0: open eyes
- 1: close eyes
- 2: left hand
- 3: right hand
- 4: image left hand
- 5: image right hand
- 6: open fists
- 7: open feet
- 8: image fist
- 9: image feet
- 10: rest
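
As a rough sketch, the label list can be kept in a dictionary and used to count how many instances carry each label. The names `LABEL_NAMES` and `label_counts` are hypothetical, and the small `demo` array is synthetic stand-in data, not part of EEG-MMIDB.

```python
import numpy as np

# Label names copied from the list above.
LABEL_NAMES = {
    0: "open eyes", 1: "close eyes",
    2: "left hand", 3: "right hand",
    4: "image left hand", 5: "image right hand",
    6: "open fists", 7: "open feet",
    8: "image fist", 9: "image feet",
    10: "rest",
}


def label_counts(data):
    """Count rows per label, assuming the label sits in the last column."""
    labels = data[:, -1].astype(int)
    values, counts = np.unique(labels, return_counts=True)
    return {LABEL_NAMES[int(v)]: int(c) for v, c in zip(values, counts)}


# Synthetic stand-in with the same layout: 64 channel columns + 1 label column.
demo = np.zeros((6, 65), dtype=int)
demo[:, -1] = [0, 0, 4, 4, 4, 10]
print(label_counts(demo))
# {'open eyes': 2, 'image left hand': 3, 'rest': 1}
```

The same function could be applied to a loaded subject array to see how the 11 intentions are distributed over time points.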


In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [9]:
#load dataset
import numpy as np
dataset_1 = np.load('/content/drive/MyDrive/1/1.npy')
print('The shape of Dataset_1:', dataset_1.shape)
dataset_1

The shape of Dataset_1: (259520, 65)


array([[-16, -29,   2, ..., -11,  15,   0],
       [-56, -54, -27, ...,   1,  21,   0],
       [-55, -55, -29, ...,  18,  35,   0],
       ...,
       [  0,   0,   0, ...,   0,   0,   9],
       [  0,   0,   0, ...,   0,   0,   9],
       [  0,   0,   0, ...,   0,   0,   9]])

##Data details
>For subject 1, the data consists of:
- 259,520 instances (snapshots) collected over time.
- 64 channels (sensors) capturing brain activity.
- The last column shows the class label (what the person was doing or thinking).

##Key Terms
>Instance (Time Step/Time Point):
- An instance is a snapshot taken at a single moment.
- With a 160 Hz sampling rate, 160 instances are collected every second.
- We use "instance" and "time point" to mean the same thing.

>Segment (Sample):
- A segment is a group of continuous instances representing a specific event or state of brain activity.
- The length of a segment is called the time window.
- We use "segment" and "sample" to mean the same thing.
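
To make the two terms concrete, here is a minimal sketch that cuts an array of instances into non-overlapping one-second segments (a time window of 160 instances). The function name `split_into_segments` is hypothetical, and `demo` is synthetic data with the same 65-column layout.

```python
import numpy as np

FS = 160  # instances per second


def split_into_segments(data, time_window=FS):
    """Group consecutive rows into segments of `time_window` instances.

    Rows left over at the end that do not fill a whole segment are dropped.
    """
    n_segments = data.shape[0] // time_window
    trimmed = data[: n_segments * time_window]
    return trimmed.reshape(n_segments, time_window, data.shape[1])


demo = np.arange(400 * 65).reshape(400, 65)  # 400 instances, 65 columns
segments = split_into_segments(demo)
print(segments.shape)  # (2, 160, 65): 2 segments of 160 instances each
```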

In [2]:
import io, os, sys, types
from IPython import get_ipython
from nbformat import read
from IPython.core.interactiveshell import InteractiveShell

class NotebookFinder(object):
    """Module finder that locates Jupyter Notebooks"""
    def __init__(self):
        self.loaders = {}

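    # Note: find_module()/load_module() are the legacy import hooks. Python 3.12
    # removed support for them, so newer interpreters need find_spec() instead.
    def find_module(self, fullname, path=None):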
        nb_path = find_notebook(fullname, path)
        if not nb_path:
            return

        key = path
        if path:
            # lists aren't hashable
            key = os.path.sep.join(path)

        if key not in self.loaders:
            self.loaders[key] = NotebookLoader(path)
        return self.loaders[key]

def find_notebook(fullname, path=None):
    """find a notebook, given its fully qualified name and an optional path

    This turns "foo.bar" into "foo/bar.ipynb"
    and tries turning "Foo_Bar" into "Foo Bar" if Foo_Bar
    does not exist.
    """
    name = fullname.rsplit('.', 1)[-1]
    if not path:
        path = ['']
    for d in path:
        nb_path = os.path.join(d, name + ".ipynb")
        if os.path.isfile(nb_path):
            return nb_path
        # let import Notebook_Name find "Notebook Name.ipynb"
        nb_path = nb_path.replace("_", " ")
        if os.path.isfile(nb_path):
            return nb_path

class NotebookLoader(object):
    """Module Loader for Jupyter Notebooks"""
    def __init__(self, path=None):
        self.shell = InteractiveShell.instance()
        self.path = path

    def load_module(self, fullname):
        """import a notebook as a module"""
        path = find_notebook(fullname, self.path)

        print("importing Jupyter notebook from %s" % path)

        # load the notebook object
        with io.open(path, 'r', encoding='utf-8') as f:
            nb = read(f, 4)


        # create the module and add it to sys.modules
        # if name in sys.modules:
        #    return sys.modules[name]
        mod = types.ModuleType(fullname)
        mod.__file__ = path
        mod.__loader__ = self
        mod.__dict__['get_ipython'] = get_ipython
        sys.modules[fullname] = mod

        # extra work to ensure that magics that would affect the user_ns
        # actually affect the notebook module's ns
        save_user_ns = self.shell.user_ns
        self.shell.user_ns = mod.__dict__

        try:
            for cell in nb.cells:
                if cell.cell_type == 'code':
                    # transform the input to executable Python
                    code = self.shell.input_transformer_manager.transform_cell(cell.source)
                    # run the code in the module
                    exec(code, mod.__dict__)
        finally:
            self.shell.user_ns = save_user_ns
        return mod
sys.meta_path.append(NotebookFinder())

In [5]:
from scipy.signal import butter, lfilter
import numpy as np

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    """Apply a Butterworth band-pass filter (lowcut..highcut, in Hz) to a 1-D signal."""
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = lfilter(b, a, data)
    return y

def butter_bandpass(lowcut, highcut, fs, order=5):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return b, a

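def one_hot(y_):
    """Convert a vector of integer class labels into one-hot rows."""
    y_ = y_.reshape(len(y_))
    y_ = [int(xx) for xx in y_]
    n_values = np.max(y_) + 1
    return np.eye(n_values)[np.array(y_, dtype=np.int32)]

def extract(input, n_classes, n_fea, time_window, moving):
    """Cut the signal into windows of `time_window` rows, stepping by `moving` rows."""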
    xx = input[:, :n_fea]
    yy = input[:, n_fea:n_fea + 1]
    new_x = []
    new_y = []
    number = int((xx.shape[0] / moving) - 1)  # number of windows
    for i in range(number):
        start, stop = int(i * moving), int(i * moving + time_window)
        ave_y = np.average(yy[start:stop])
        new_x.append(xx[start:stop, :])
        # A window whose mean label is not an integer class id
        # (it spans several labels) is labelled 0.
        new_y.append(ave_y if ave_y in range(n_classes + 1) else 0)

    new_x = np.array(new_x)
    new_x = new_x.reshape([-1, n_fea * time_window])
    new_y = np.array(new_y)
    new_y.shape = [new_y.shape[0], 1]
    data = np.hstack((new_x, new_y))
    data = np.vstack((data, data[-1]))  # add the last sample again, to make the sample number round
    return data
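
To see which rows each window covers, the index arithmetic used by `extract` can be reproduced on its own. This is a small sketch; `window_bounds` is a hypothetical helper, and the numbers (10 seconds of data, a 160-row window, an 80-row step) are chosen only for illustration.

```python
def window_bounds(n_rows, time_window, moving):
    """Return (start, stop) row indices using the same arithmetic as extract()."""
    number = int(n_rows / moving - 1)
    return [(i * moving, i * moving + time_window) for i in range(number)]


bounds = window_bounds(n_rows=1600, time_window=160, moving=80)
print(len(bounds))   # 19
print(bounds[:3])    # [(0, 160), (80, 240), (160, 320)]
print(bounds[-1])    # (1440, 1600)
```

With these settings, `extract` would return 20 rows, because it appends the last sample once more. Note that when the row count is a multiple of `moving`, the last window stays inside the array only if `time_window` is at most `2 * moving`; larger windows would be cut short at the end of the signal.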

In [10]:

n_fea = 64  # 64 channels
label = dataset_1[:, n_fea: n_fea+1]  # separate label from feature
feature = dataset_1[:, 0:n_fea]
feature_f = []  # feature after filtering

# EEG delta band (0.5-4 Hz) decomposition, one channel at a time
fs = 160.0      # sampling rate in Hz
lowcut = 0.5    # lower band edge in Hz
highcut = 4.0   # upper band edge in Hz
for i in range(feature.shape[1]):
    x = feature[:, i]
    y = butter_bandpass_filter(x, lowcut, highcut, fs, order=3)
    feature_f.append(y)

feature_f = np.array(feature_f).T
print('The shape of filtered feature:',feature_f.shape)

data_f=np.hstack((feature_f,label))  # stack label to filtered feature
print("The shape of dataset_1 after filtering:",data_f.shape)

The shape of filtered feature: (259520, 64)
The shape of dataset_1 after filtering: (259520, 65)
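
One way to sanity-check the delta-band filter is to run it on a synthetic signal whose content is known. The sketch below mixes a 2 Hz sine (inside the 0.5-4 Hz band) with a 30 Hz sine (outside it) and compares their amplitudes after filtering. The test signal and the helper `amplitude_at` are assumptions made for illustration; the filter design uses the same `butter`/`lfilter` calls and parameters as the cell above.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 160.0                      # sampling rate in Hz
t = np.arange(3200) / fs        # 20 seconds of synthetic time points
slow = np.sin(2 * np.pi * 2 * t)    # 2 Hz component, inside the delta band
fast = np.sin(2 * np.pi * 30 * t)   # 30 Hz component, outside the delta band

# Same design as above: order-3 Butterworth band-pass, 0.5-4 Hz.
nyq = 0.5 * fs
b, a = butter(3, [0.5 / nyq, 4.0 / nyq], btype='band')
filtered = lfilter(b, a, slow + fast)


def amplitude_at(signal, freq_hz):
    """Approximate amplitude of `signal` at `freq_hz`, read from its FFT."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


amp_2 = amplitude_at(filtered, 2.0)
amp_30 = amplitude_at(filtered, 30.0)
print("2 Hz amplitude after filtering: ", amp_2)
print("30 Hz amplitude after filtering:", amp_30)
```

The 2 Hz component should pass through largely unchanged while the 30 Hz component is strongly attenuated. Because `lfilter` is a causal filter, the output also carries a phase delay and a short start-up transient.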
