<a   href="https://colab.research.google.com/github//N-Nieto/Inner_Speech_Dataset/blob/master/Database_load_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial for loading the Inner Speech database

## Set up - Download and import required libraries

In [10]:
#@title Install dependencies
!git clone https://github.com/N-Nieto/Inner_Speech_Dataset -q
!pip3 install mne -q



In [11]:
# @title Imports
import mne
import warnings
import numpy as np

# from google.colab import drive

from Inner_Speech_Dataset.Python_Processing.Data_extractions import (
    extract_data_from_subject,
)
from Inner_Speech_Dataset.Python_Processing.Data_processing import (
    select_time_window,
    transform_for_classificator,
)

np.random.seed(23)

mne.set_log_level(verbose="warning")  # to avoid info at terminal

# to avoid warnings from mne
warnings.filterwarnings(action="ignore", category=DeprecationWarning)
warnings.filterwarnings(action="ignore", category=FutureWarning)

## Data Loading.

In [19]:
### Hyperparameters

# The root dir has to point to the folder that contains the database
root_dir = "dataset"

# Data type (can also be "exg", which contains EMG (muscle), EOG (eye) and ECG (heart), or "baseline")
datatype = "EEG"

# Sampling rate
fs = 256

# Select the useful part of each trial. Time in seconds
t_start = 1.5
t_end = 3.5

# Subject number
N_S = 1  # [1 to 10]

In [20]:
# @title Data extraction and processing

# Load all trials for a single subject
X, Y = extract_data_from_subject(root_dir, N_S, datatype)

# Keep only the useful time window, i.e. the action interval
X = select_time_window(X=X, t_start=t_start, t_end=t_end, fs=fs)

In [22]:
print("(X) Data shape: [trials x channels x samples]")
print(X.shape)  # Trials, channels, samples

print("(Y) Labels shape")
print(Y.shape)  # Time stamp, class, condition, session

(X) Data shape: [trials x channels x samples]
(500, 128, 512)
(Y) Labels shape
(500, 4)


# Understanding the Inner Speech Dataset Dimensions

The dataset dimensions are easiest to understand through concrete examples:

## X Data Shape: [500, 128, 512]

This means:
- **500 trials**: Each trial is one instance of a person thinking of a direction
- **128 channels**: Data from 128 different EEG electrodes on the scalp
- **512 samples**: 512 time points for each trial

### Indexing examples:
- `X[0]`: The first trial's complete data from all 128 electrodes
- `X[0, 5]`: All time samples from channel 6 (zero-indexed) in the first trial
- `X[0, 5, 100]`: The voltage value at the 101st time point, from channel 6, in the first trial
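
The indexing patterns above can be tried on a synthetic array. The sketch below uses random numbers in place of the real recordings, and only 5 trials instead of 500 to keep it light, so only the shapes are meaningful. The name `X_demo` is made up for this illustration.

```python
import numpy as np

# Synthetic stand-in for X: 5 trials, 128 channels, 512 samples
# (random values, not real EEG)
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 128, 512))

first_trial = X_demo[0]        # all channels and samples of the first trial
channel_6 = X_demo[0, 5]       # every time sample of channel 6 in the first trial
one_value = X_demo[0, 5, 100]  # a single value: first trial, channel 6, 101st sample

print(first_trial.shape)  # (128, 512)
print(channel_6.shape)    # (512,)
print(one_value.shape)    # ()
```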

Since the sampling rate is 256 Hz and you're selecting from 1.5s to 3.5s (2 seconds total), you get 512 samples (256 samples/second × 2 seconds).
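
The same arithmetic can be written out in code. This is only a sketch of how a time window maps to sample indices; `select_time_window` may handle rounding or edge cases differently, and the 1024-sample trial length used for `X_full` is an assumption chosen for the demonstration.

```python
import numpy as np

fs = 256       # sampling rate in Hz
t_start = 1.5  # window start in seconds
t_end = 3.5    # window end in seconds

# Convert times in seconds to sample indices
start_idx = round(t_start * fs)
end_idx = round(t_end * fs)
n_samples = end_idx - start_idx
print(start_idx, end_idx, n_samples)  # 384 896 512

# Slice the time axis of a hypothetical array of full-length trials
X_full = np.zeros((2, 128, 1024))  # assumed trial length, for illustration only
X_window = X_full[:, :, start_idx:end_idx]
print(X_window.shape)  # (2, 128, 512)
```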

## Y Labels Shape: [500, 4]

This means 500 trials, each with 4 pieces of information:

### The 4 columns represent:
1. **Time stamp**: When the trial's event occurred in the recording
2. **Class**: Which direction was cued (0=up, 1=down, 2=right, 3=left)
3. **Condition**: The experimental condition (0=pronounced speech, 1=inner speech, 2=visualized)
4. **Session**: Which recording session the trial came from (1, 2, or 3)

### Example of Y data (illustrative values):
```
Y[0] = [1591871042, 0, 1, 1]  # Time stamp, Up direction, Inner speech condition, Session 1
Y[1] = [1591871045, 3, 1, 1]  # Time stamp, Left direction, Inner speech condition, Session 1
Y[2] = [1591871049, 1, 1, 1]  # Time stamp, Down direction, Inner speech condition, Session 1
```
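
Because integer codes are easy to misread, it can help to list which class and condition codes actually occur in the loaded labels. The sketch below does this on a small invented array with the same four-column layout; `Y_demo` and its values are hypothetical, and with real data you would pass `Y` instead.

```python
import numpy as np

# Hypothetical label array: time stamp, class, condition, session
# (values invented for illustration, not taken from the dataset)
Y_demo = np.array([
    [100, 0, 1, 1],
    [200, 3, 1, 1],
    [300, 1, 1, 1],
    [400, 0, 1, 2],
    [500, 0, 2, 2],
])

# Unique (class, condition) pairs and the number of trials carrying each pair
pairs, counts = np.unique(Y_demo[:, 1:3], axis=0, return_counts=True)
for (cls, cond), n in zip(pairs, counts):
    print(f"class={cls} condition={cond} trials={n}")
```

Running the same two lines on the real `Y` shows at a glance how many trials each class and condition contributes.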

Together, X and Y tell you both:
1. What the brain activity was (X)
2. Which direction, condition, and session each trial belongs to (Y)

These are the inputs needed to train a classifier that predicts the cued direction from EEG activity.

## Create the different groups for a classifier

A group is created with one condition and one class.
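
One way to picture a group is as a boolean mask over the class and condition columns of the labels. The sketch below is not the implementation of `transform_for_classificator`; `select_groups` is a hypothetical helper, the data are synthetic, and the integer codes passed to it are placeholders.

```python
import numpy as np


def select_groups(X, Y, groups):
    """Keep trials matching any (class_code, condition_code) pair.

    Hypothetical helper for illustration; not part of Inner_Speech_Dataset.
    Returns the selected trials and one group index per trial.
    """
    X_parts, y_parts = [], []
    for group_idx, (class_code, condition_code) in enumerate(groups):
        # Column 1 holds the class, column 2 the condition
        mask = (Y[:, 1] == class_code) & (Y[:, 2] == condition_code)
        X_parts.append(X[mask])
        y_parts.append(np.full(mask.sum(), group_idx))
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Synthetic data: 6 trials, 2 channels, 4 samples
X_demo = np.arange(6 * 2 * 4, dtype=float).reshape(6, 2, 4)
Y_demo = np.array([
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [2, 0, 2, 1],
    [3, 1, 1, 2],
    [4, 0, 1, 2],
    [5, 2, 1, 2],
])

# Placeholder codes: group 0 = class 0 under condition 1,
# group 1 = class 1 under condition 1
X_sel, y_sel = select_groups(X_demo, Y_demo, groups=[(0, 1), (1, 1)])
print(X_sel.shape)  # (4, 2, 4)
print(y_sel)        # [0 0 1 1]
```

Trials that match no group are dropped, and the returned labels are one-dimensional, one group index per kept trial.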

In [23]:
# Conditions to compare
Conditions = [["Inner"], ["Inner"]]
# The class for each condition above
Classes = [["Up"], ["Down"]]

In [24]:
# Transform data and keep only the trials of interest
X, Y = transform_for_classificator(X, Y, Classes, Conditions)

In [25]:
print("Final data shape")
print(X.shape)

print("Final labels shape")
print(Y.shape)

Final data shape
(100, 128, 512)
Final labels shape
(100,)
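
Many classifiers expect a 2-D feature matrix with one row per trial, so a common next step is to flatten the channel and sample axes and hold out some trials for testing. The sketch below is a generic preparation step under stated assumptions, not something prescribed by the dataset authors: it uses synthetic data with a smaller shape than the real `(100, 128, 512)`, and the 80/20 split is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(23)

# Synthetic stand-in for the filtered data: 100 trials, 8 channels, 16 samples
# (smaller than the real 128 x 512), with 50 trials per group
X_demo = rng.standard_normal((100, 8, 16))
y_demo = np.repeat([0, 1], 50)

# Flatten channels x samples into one feature vector per trial
X_flat = X_demo.reshape(len(X_demo), -1)

# Shuffle trial indices, then hold out one fifth of the trials for testing
order = rng.permutation(len(X_flat))
n_test = len(order) // 5
test_idx, train_idx = order[:n_test], order[n_test:]

X_train, y_train = X_flat[train_idx], y_demo[train_idx]
X_test, y_test = X_flat[test_idx], y_demo[test_idx]
print(X_train.shape, X_test.shape)  # (80, 128) (20, 128)
```

With the real arrays, replace `X_demo` and `y_demo` with `X` and `Y`; each trial then becomes a vector of 128 × 512 features.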
