# Dataset Introduction

This notebook attempts to replicate the [GitHub repo](https://github.com/N-Nieto/Inner_Speech_Dataset) of the `Inner Speech Classification` dataset, mainly its tutorial notebook.

[Notebook Link](https://github.com/N-Nieto/Inner_Speech_Dataset/blob/main/Database_load_Tutorial.ipynb)

In [1]:
# !git clone https://github.com/N-Nieto/Inner_Speech_Dataset -q

In [2]:
import os
import mne
import pickle
import random
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from Inner_Speech_Dataset.Python_Processing.Data_extractions import  Extract_data_from_subject
from Inner_Speech_Dataset.Python_Processing.Data_processing import  Select_time_window, Transform_for_classificator, Split_trial_in_time

np.random.seed(23)

mne.set_log_level(verbose='warning') #to avoid info at terminal
warnings.filterwarnings(action = "ignore", category = DeprecationWarning )
warnings.filterwarnings(action = "ignore", category = FutureWarning )

## Load Data

In [3]:
### Hyperparameters
datatype = "EEG"     # Data Type
fs = 256             # Sampling rate
t_start = 1.5        # Select the useful part of each trial. Time in seconds
t_end = 3.5
N_S = 6              # Subject number [1 to 10]

In [4]:
#@title Data extraction and processing
# The root dir has to point to the folder that contains the database
root_dir = "dataset/inner-speech-recognition"

# Load all trials for a single subject
X, Y = Extract_data_from_subject(root_dir, N_S, datatype)

# Keep only the useful time window, i.e. the action interval
X = Select_time_window(X = X, t_start = t_start, t_end = t_end, fs = fs)


print("Data shape: [trials x channels x samples]: ",X.shape)
print("Labels shape: ",Y.shape)

Data shape: [trials x channels x samples]:  (540, 128, 512)
Labels shape:  (540, 4)
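
The output above has 512 samples per trial because the 2 s window (`t_end - t_start`) at 256 Hz spans 2 × 256 samples. The following is a minimal sketch of that slicing on synthetic data. `select_window` is a hypothetical helper written for illustration, not the repository's `Select_time_window`, whose internals may differ.

```python
import numpy as np


def select_window(X, t_start, t_end, fs):
    """Keep the samples between t_start and t_end (seconds) on the last axis.

    Hypothetical helper for illustration only.
    """
    start = int(round(t_start * fs))  # first sample to keep
    stop = int(round(t_end * fs))     # first sample to drop
    return X[:, :, start:stop]


fs = 256
# Synthetic stand-in for the EEG array: 3 trials, 4 channels, 4.5 s of signal
X_demo = np.zeros((3, 4, int(4.5 * fs)))
X_cut = select_window(X_demo, t_start=1.5, t_end=3.5, fs=fs)
print(X_cut.shape)  # (3, 4, 512)
```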


## Create the different groups for a classifier

A group is created with one condition and one class.

In [5]:
Conditions = [["Inner"],["Inner"],["Inner"],["Inner"]]    # Conditions to compare
Classes    = [  ["Up"] ,["Down"],["Right"],["Left"]]    # The class for the above condition

# Transform data and keep only the trials of interest
X_trans , Y_trans =  Transform_for_classificator(X, Y, Classes, Conditions)

print("Final data shape: ",X_trans.shape)
print("Final labels shape: ",Y_trans.shape)

Final data shape:  (216, 128, 512)
Final labels shape:  (216,)
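
One way such a selection could work is a boolean mask over the label matrix. The sketch below runs on synthetic labels and assumes a column layout (class code in column 1, condition code in column 2) and a code of `1` for the inner-speech condition. These are assumptions for illustration, and `keep_condition` is a hypothetical helper, not the implementation of `Transform_for_classificator`.

```python
import numpy as np

# Assumed label layout (not confirmed by this notebook):
# column 1 holds the class code, column 2 holds the condition code.
CLASS_COL, COND_COL = 1, 2
INNER = 1  # assumed code for the inner-speech condition


def keep_condition(X, Y, condition_code, class_codes):
    """Keep trials of one condition and relabel the classes as 0..k-1."""
    X_parts, y_parts = [], []
    for new_label, class_code in enumerate(class_codes):
        mask = (Y[:, COND_COL] == condition_code) & (Y[:, CLASS_COL] == class_code)
        X_parts.append(X[mask])
        y_parts.append(np.full(mask.sum(), new_label))
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Synthetic data: 24 trials, 4 classes, 3 conditions
rng = np.random.default_rng(0)
Y_demo = np.column_stack([
    np.arange(24),               # placeholder first column
    np.tile(np.arange(4), 6),    # class codes 0..3, repeating
    np.repeat(np.arange(3), 8),  # condition codes 0..2, in blocks of 8
    np.zeros(24, dtype=int),     # placeholder last column
])
X_demo = rng.standard_normal((24, 2, 8))

X_sel, y_sel = keep_condition(X_demo, Y_demo, INNER, [0, 1, 2, 3])
print(X_sel.shape, y_sel.shape)  # (8, 2, 8) (8,)
```

As in the real output above, the label array ends up one-dimensional because only the class label is kept for the selected trials.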


# Merging the datasets of all subjects

In [6]:
%%time
# Storage lists for each split, with their respective names
X_train =[]
Y_train =[]
X_val = []
Y_val =[]
X_test = []
Y_test = []

### Hyperparameters
datatype = "EEG"     # Data Type
fs = 256             # Sampling rate
t_start = 1.5        # Select the useful part of each trial. Time in seconds
t_end = 3.5

# Setting parameters
root_dir = "dataset/inner-speech-recognition"             # path to the main directory of the subjects
subject_numbers = [1,2,3,4,5,6,7,8,9,10]                  # Subject numbers to load
Conditions = [["Inner"],["Inner"],["Inner"],["Inner"]]    # Paradigm of the data to classify
Classes = [  ["Up"] ,["Down"],["Right"],["Left"]]         # The four classes of data


# Main for loop to extract the data and append it
for s_n in subject_numbers:
    X_sub, Y_sub = Extract_data_from_subject(root_dir, s_n, datatype)
    X_sub = Select_time_window(X = X_sub, t_start = t_start, t_end = t_end, fs = fs)
    X_transformed , Y_transformed =  Transform_for_classificator(X_sub, Y_sub, Classes, Conditions)
    
    ## Separating for testing set
    indices = np.arange(10, len(X_transformed), 10)
    X_tt = X_transformed[indices].copy()
    Y_tt = Y_transformed[indices].copy()
    # Remove the test trials from the remaining data
    X_t =  np.delete(X_transformed, indices, axis =0)
    Y_t =  np.delete(Y_transformed, indices)
    
    ## Separating for Validation set
    indices = np.arange(9, len(X_t), 9)
    X_tt_val = X_t[indices].copy()
    Y_tt_val = Y_t[indices].copy()
    # Remove the validation trials from the remaining data
    X_t =  np.delete(X_t, indices, axis =0)
    Y_t =  np.delete(Y_t, indices)
       
    # Appending data to the main sets
    X_train.append(X_t)
    Y_train.append(Y_t)
    X_val.append(X_tt_val)
    Y_val.append(Y_tt_val)
    X_test.append(X_tt)
    Y_test.append(Y_tt)

# Concatenate the per-subject arrays along the trial axis, one subject after another
X_train = np.concatenate(X_train, axis=0)
Y_train = np.concatenate(Y_train, axis=0)
X_val = np.concatenate(X_val, axis=0)
Y_val = np.concatenate(Y_val, axis=0)
X_test = np.concatenate(X_test, axis=0)
Y_test = np.concatenate(Y_test, axis=0)


# Printing the shape
print("The shape of the X_train x Y_train ", X_train.shape , " x ",Y_train.shape)
print("The shape of the X_val x Y_val ", X_val.shape , " x ",Y_val.shape)
print("The shape of the X_test x Y_test: ", X_test.shape, " x ",Y_test.shape)

The shape of the X_train x Y_train  (1799, 128, 512)  x  (1799,)
The shape of the X_val x Y_val  (223, 128, 512)  x  (223,)
The shape of the X_test x Y_test:  (214, 128, 512)  x  (214,)
CPU times: total: 18.2 s
Wall time: 22.3 s
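
The loop above splits each subject by position: every 10th trial goes to the test set, then every 9th of the remaining trials goes to the validation set. The sketch below reproduces that index arithmetic for a single subject with 216 trials (the count reported earlier for subject 6). `split_every_nth` is a hypothetical helper introduced only for illustration.

```python
import numpy as np


def split_every_nth(n_trials, test_step=10, val_step=9):
    """Return (train, val, test) index arrays using the positional scheme."""
    all_idx = np.arange(n_trials)

    # Every test_step-th trial, starting at index test_step, goes to test
    test_idx = np.arange(test_step, n_trials, test_step)
    rest = np.delete(all_idx, test_idx)

    # Every val_step-th of the remaining trials goes to validation
    val_pos = np.arange(val_step, len(rest), val_step)
    val_idx = rest[val_pos]
    train_idx = np.delete(rest, val_pos)
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_every_nth(216)
print(len(train_idx), len(val_idx), len(test_idx))  # 174 21 21
```

This gives roughly an 80/10/10 split per subject. Because the indices are chosen by position rather than by label, nothing in this scheme balances the classes within each split, and trials from every subject appear in all three sets.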


In [9]:
X_train.shape

(1799, 128, 512)

In [10]:
Y_train.shape

(1799,)

In [7]:
unique, counts = np.unique(Y_test, return_counts=True)
df=pd.DataFrame([unique, counts])
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 2.0 | 3.0 |
| 1 | 47.0 | 55.0 | 57.0 | 55.0 |
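
The first row of this output holds the class labels and the second row the number of test trials per class. A labelled table with fractions can be easier to read. The following is a minimal sketch that rebuilds a label array from the counts shown above, where `class_balance` is a hypothetical helper and the column names are illustrative choices.

```python
import numpy as np
import pandas as pd


def class_balance(y):
    """Summarise how many trials each class label has, with fractions."""
    labels, counts = np.unique(y, return_counts=True)
    return pd.DataFrame({
        "label": labels.astype(int),
        "count": counts,
        "fraction": counts / counts.sum(),
    })


# Stand-in for Y_test, rebuilt from the per-class counts printed above
y_demo = np.repeat([0, 1, 2, 3], [47, 55, 57, 55])
balance = class_balance(y_demo)
print(balance)
```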


## Processing
The processing was developed in Python, using mainly the MNE library.

With the `Inner_speech_processing.py` script, you can run your own processing by changing the variables at the top of the script.

The `TFR_representation.py` script generates the Time Frequency Representations, following the same processing used in the paper.

The `Plot_TFR_Topomap.py` script reproduces the images presented in the paper.