# Dataset Introduction

This notebook attempts to replicate the [GitHub repo](https://github.com/N-Nieto/Inner_Speech_Dataset) of the `Inner Speech Classification` dataset, mainly its tutorial notebook.

[Notebook Link](https://github.com/N-Nieto/Inner_Speech_Dataset/blob/main/Database_load_Tutorial.ipynb)

In [1]:
#  !git clone https://github.com/N-Nieto/Inner_Speech_Dataset --quiet

In [2]:
import os
import mne
import pickle
import random
import warnings
import numpy as np
import matplotlib.pyplot as plt

from Inner_Speech_Dataset.Python_Processing.Data_extractions import  Extract_data_from_subject
from Inner_Speech_Dataset.Python_Processing.Data_processing import  Select_time_window, Transform_for_classificator, Split_trial_in_time

np.random.seed(23)

mne.set_log_level(verbose='warning')  # suppress info-level messages in the terminal
warnings.filterwarnings(action = "ignore", category = DeprecationWarning )
warnings.filterwarnings(action = "ignore", category = FutureWarning )

## Load Data

In [41]:
### Hyperparameters

datatype = "EEG"     # Data Type
fs = 256             # Sampling rate
t_start = 1.5        # Start of the useful part of each trial, in seconds
t_end = 3.5          # End of the useful part of each trial, in seconds
N_S = 6              # Subject number [1 to 10]

In [42]:
#@title Data extraction and processing
# root_dir must point to the folder that contains the database
root_dir = "dataset/inner-speech-recognition"

# Load all trials for a single subject
X, Y = Extract_data_from_subject(root_dir, N_S, datatype)

# Keep only the useful time window, i.e. the action interval
X = Select_time_window(X = X, t_start = t_start, t_end = t_end, fs = fs)


print("Data shape: [trials x channels x samples]")
print(X.shape) # Trials, channels, samples

print("Labels shape")
print(Y.shape) # Columns: timestamp, class, condition, session

Data shape: [trials x channels x samples]
(540, 128, 512)
Labels shape
(540, 4)
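
`Select_time_window` keeps the samples between `t_start` and `t_end`. The sketch below shows one way such a cut can be done with plain NumPy slicing; it is an illustration, not the repository's implementation. The helper name `select_window` and the 4.5 s trial length of the dummy array are assumptions.

```python
import numpy as np

def select_window(X, t_start, t_end, fs):
    """Slice [trials x channels x samples] data between two times (seconds).

    Hypothetical helper for illustration; not the repository's function.
    """
    start = int(round(t_start * fs))  # first sample to keep
    stop = int(round(t_end * fs))     # first sample to drop
    return X[:, :, start:stop]

fs_demo = 256
# Dummy data: 3 trials, 128 channels, 4.5 s per trial (assumed length)
X_demo = np.zeros((3, 128, int(4.5 * fs_demo)))
X_cut = select_window(X_demo, t_start=1.5, t_end=3.5, fs=fs_demo)
print(X_cut.shape)  # (3, 128, 512)
```

With `fs = 256`, the 2 s window from 1.5 s to 3.5 s spans 512 samples, which matches the last dimension of `X` printed above.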


## Create the groups for a classifier

Each group is defined by one condition and one class.

In [43]:
Conditions = [["Inner"], ["Inner"], ["Inner"], ["Inner"]]    # Conditions to compare
Classes    = [["Up"],    ["Down"],  ["Left"],  ["Right"]]    # The class for each condition above

# Transform the data and keep only the trials of interest
X_trans , Y_trans =  Transform_for_classificator(X, Y, Classes, Conditions)

print("Final data shape: ",X_trans.shape)
print("Final labels shape: ",Y_trans.shape)

Final data shape:  (216, 128, 512)
Final labels shape:  (216,)
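
As a rough illustration of what grouping by class and condition involves, the sketch below filters a synthetic label array on its class and condition columns and assigns one integer label per group. The function `build_groups`, the integer codes in `CLASS_CODES` and `CONDITION_CODES`, and the flat (non-nested) lists are assumptions made for this example; the actual `Transform_for_classificator` may work differently.

```python
import numpy as np

# Hypothetical integer codes; the real dataset's codes may differ.
CLASS_CODES = {"Up": 0, "Down": 1, "Right": 2, "Left": 3}
CONDITION_CODES = {"Pron": 0, "Inner": 1, "Vis": 2}

def build_groups(X, Y, classes, conditions):
    """Keep trials matching each (class, condition) pair; label them by group index."""
    X_parts, y_parts = [], []
    for group_id, (cls, cond) in enumerate(zip(classes, conditions)):
        # Column 1 holds the class, column 2 the condition
        mask = (Y[:, 1] == CLASS_CODES[cls]) & (Y[:, 2] == CONDITION_CODES[cond])
        X_parts.append(X[mask])
        y_parts.append(np.full(mask.sum(), group_id))
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Synthetic labels: 2 trials for every class/condition combination
rows = [(0, c, k, 1) for c in range(4) for k in range(3) for _ in range(2)]
Y_demo = np.array(rows)  # columns: timestamp, class, condition, session
X_demo = np.arange(len(rows) * 2 * 5, dtype=float).reshape(len(rows), 2, 5)

X_groups, y_groups = build_groups(
    X_demo, Y_demo,
    classes=["Up", "Down", "Left", "Right"],
    conditions=["Inner", "Inner", "Inner", "Inner"],
)
print(X_groups.shape)  # (8, 2, 5)
print(y_groups)        # [0 0 1 1 2 2 3 3]
```

Labels follow the order of the groups, not the original class codes, so `"Left"` becomes label 2 here because it is the third group listed.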


## Processing
The processing was developed in Python, mainly using the MNE library.

With the `Inner_speech_processing.py` script, you can easily run your own processing by changing the variables at the top of the script.

The `TFR_representation.py` script generates the time-frequency representations (TFRs), following the same processing used in the paper.

The `Plot_TFR_Topomap.py` script reproduces the images presented in the paper.

In [47]:
unique, counts = np.unique(Y_trans, return_counts=True)
print(unique, counts)

[0. 1. 2. 3.] [54 54 54 54]
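
The four classes are balanced (54 trials each), so a classifier that always predicts the same class would be right 25% of the time. A minimal sketch of that baseline calculation, using the counts printed above (the variable names are illustrative):

```python
import numpy as np

counts_demo = np.array([54, 54, 54, 54])  # class counts printed above

# Accuracy of always predicting the most frequent class
chance_level = counts_demo.max() / counts_demo.sum()
print(f"Chance-level accuracy: {chance_level:.2f}")  # Chance-level accuracy: 0.25
```

Any classifier trained on `X_trans` and `Y_trans` needs to beat this value to show that it has learned something.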


In [40]:
X_trans[198][1]

array([-3.42861838e-06,  1.78002000e-07, -2.35591625e-06, -7.83196999e-07,
       -2.02179794e-06, -5.87102094e-07,  9.12387244e-06,  9.92898187e-06,
        1.26856873e-05,  1.20888303e-05,  5.73838184e-06,  1.00431359e-05,
        8.11830419e-06, -2.83755286e-06, -2.01818111e-06, -5.55349667e-06,
       -7.70402059e-06, -4.11437346e-06, -3.47507543e-06, -7.13949465e-06,
       -7.88644073e-06,  1.14410594e-07,  6.51471566e-06,  8.84385480e-06,
        6.86900590e-06,  6.77894888e-06,  1.87936640e-06,  1.55513518e-06,
        1.45367894e-06, -4.15376400e-06, -5.59547092e-06, -8.36670525e-06,
       -6.03766417e-06, -4.11486972e-06, -3.51469205e-06, -5.68845949e-07,
        1.93650136e-06,  8.01005297e-06,  6.66600319e-06, -9.50341665e-07,
        3.35318946e-06,  8.35871485e-06,  1.09332213e-05,  1.50182848e-05,
        6.85925801e-06,  3.38350170e-06,  2.50950780e-06, -3.36353950e-06,
       -2.30157527e-06, -3.28318160e-06,  9.65495126e-07,  6.50758747e-06,
        7.47873150e-06,  