# <center> **How to use MOABB and how to validate any algorithm** </center>

The Mother of All BCI Benchmarks (MOABB) is a Python library that gathers popular, freely available Brain-Computer Interface (BCI) EEG datasets so that BCI algorithms can be validated on them. Brain-Computer Interfaces are systems that let a user control a device with brain signals. To do this, algorithms are trained on brain data to classify these signals. After training, the algorithms must be validated to find out whether they can classify the brain signals accurately.

At first, training and validating these models may seem easy. However, signals recorded by electroencephalography (EEG) are non-stationary and have a low signal-to-noise ratio (SNR), which makes it difficult to build algorithms that generalize across trials, subjects and sessions. When new BCI research reports an algorithm that performs well compared to others, that algorithm often fails to generalize across subjects. And when researchers try to validate a model on multiple datasets, the task becomes very difficult, because each dataset needs its own *pipeline* to process the data.

To address this issue, the MOABB authors [1] developed a Python library that provides many free EEG datasets that can be used to validate a new brain-signal classification model. This tutorial shows how to use the library to validate your own algorithm.

## **1. Installing MOABB**

To install the library you need Python 3.8 or later and the pip package installer. With these prerequisites satisfied, run `pip install moabb` in your terminal. The command can also be run from the notebook:

In [None]:
# %pip install moabb

Installing MOABB automatically installs the libraries needed to handle the data, such as `MNE` and `numpy`. If you want to make sure these libraries are installed, run the same command for `numpy` and `mne`: `pip install <package>`, where `<package>` is the name of the library you want to install.

In [None]:
#%pip install mne numpy

## **2. Importing Libraries**

After installing the required packages, we import them with `import`. Besides `moabb` itself, we import the MNE and scikit-learn classes used later to build the classification pipeline.

In [1]:
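import copy
import numpy
import moabb
# Import the submodules used below explicitly so attribute access such as
# moabb.analysis.plotting.score_plot is guaranteed to work
import moabb.analysis.plotting
import moabb.datasets.utils
import moabb.evaluations
import moabb.paradigms

from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline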

## **3. Datasets available**

The datasets available in this library are grouped by EEG paradigm: **Motor Imagery**, **P300**, **Steady-State Visually Evoked Potentials (SSVEP)**, **code-modulated Visual Evoked Potentials (c-VEP)** and **Resting State**. You can list all available datasets by printing `moabb.datasets.utils.dataset_list`.

In [2]:
# List of all dataset classes
all_datasets = copy.copy(moabb.datasets.utils.dataset_list) # Getting a copy for data safety

In [3]:
print(f"Number of datasets: {len(all_datasets)}")
print(all_datasets)

Number of datasets: 77
[<class 'moabb.datasets.alex_mi.AlexMI'>, <class 'moabb.datasets.braininvaders.BI2012'>, <class 'moabb.datasets.braininvaders.BI2013a'>, <class 'moabb.datasets.braininvaders.BI2014a'>, <class 'moabb.datasets.braininvaders.BI2014b'>, <class 'moabb.datasets.braininvaders.BI2015a'>, <class 'moabb.datasets.braininvaders.BI2015b'>, <class 'moabb.datasets.bnci.BNCI2014001'>, <class 'moabb.datasets.bnci.BNCI2014002'>, <class 'moabb.datasets.bnci.BNCI2014004'>, <class 'moabb.datasets.bnci.BNCI2014008'>, <class 'moabb.datasets.bnci.BNCI2014009'>, <class 'moabb.datasets.bnci.BNCI2014_001'>, <class 'moabb.datasets.bnci.BNCI2014_002'>, <class 'moabb.datasets.bnci.BNCI2014_004'>, <class 'moabb.datasets.bnci.BNCI2014_008'>, <class 'moabb.datasets.bnci.BNCI2014_009'>, <class 'moabb.datasets.bnci.BNCI2015001'>, <class 'moabb.datasets.bnci.BNCI2015003'>, <class 'moabb.datasets.bnci.BNCI2015004'>, <class 'moabb.datasets.bnci.BNCI2015_001'>, <class 'moabb.datasets.bnci.BNCI2015_003

In [12]:
all_datasets[0].__name__

'AlexMI'

You can also select datasets by paradigm with the `moabb.datasets.utils.dataset_search()` function, which accepts several parameters to narrow down the search. One of them is `paradigm`, which can be `'imagery'`, `'p300'`, `'ssvep'` or `'cvep'`.

In [10]:
# Selecting SSVEP paradigm
ssvep_datasets = copy.deepcopy(moabb.datasets.utils.dataset_search(paradigm='ssvep'))

In [None]:
print(f"Number of SSVEP datasets: {len(ssvep_datasets)}")
print(ssvep_datasets)

`moabb.datasets.utils.dataset_search()` accepts other filter parameters as well:
* `multi_session` - return only datasets with more than one session per subject.
* `events` - list of event types to select.
* `has_all_events` - keep only datasets that contain all the events listed in `events`.
* `interval` - (motor imagery only) minimum length of the imagery interval, in seconds.
* `min_subjects` - minimum number of subjects in a dataset.
* `channels` - list of channels the dataset must contain.


In [None]:
moabb.datasets.utils.dataset_search(paradigm='imagery',
                                    multi_session=True,
                                    min_subjects=10)

MOABB also includes `moabb.datasets.fake.FakeDataset`, a class implemented for testing purposes that generates synthetic EEG data instead of downloading real recordings. It may also appear among the datasets returned by the functions used above.

In [None]:
fake_data = moabb.datasets.fake.FakeDataset(event_list=['fake1', 'fake2'],
                                            n_sessions=2,
                                            n_runs=2,
                                            paradigm='imagery',
                                            channels=('C3', 'Cz', 'C4'))

In [None]:
# get_data() returns a dict keyed by subject ID, so this prints the number of subjects
print(len(fake_data.get_data()))

## **4. Main Concepts**

Before using these datasets to build a pipeline, it is important to understand MOABB's four main concepts: (1) Datasets, (2) Paradigms, (3) Evaluation, and (4) Pipeline.

### **4.1 Datasets**

Since we already know how to search for and select datasets, we start by selecting the datasets for the `imagery` paradigm.

In [7]:
# Selecting all datasets with motor imagery
imagery_datasets = copy.copy(moabb.datasets.utils.dataset_search(paradigm='imagery'))

The data can be loaded with the `.get_data()` method. Because this method downloads the dataset, it is important to set the download directory first with `moabb.utils.set_download_dir(path)`.

In [None]:
# Set the download path (note: in some environments this setting may not take effect)
moabb.utils.set_download_dir(path='./datasets/')

In [None]:
# Downloading each dataset
#[dataset.get_data() for dataset in imagery_datasets]

### **4.2 Paradigms**

As mentioned before, there are five paradigms: (1) Motor Imagery, (2) SSVEP, (3) P300, (4) c-VEP, and (5) Resting State. Each paradigm defines how the raw MNE data is processed and fed to the decoding algorithm. MOABB provides the following paradigm classes:

For the Motor Imagery paradigm:
1. `MotorImagery()` - N-class motor imagery, where N is the number of classes (`n_classes`).
2. `LeftRightImagery()`
3. `FilterBankLeftRightImagery()`
4. `FilterBankMotorImagery()`

For the P300 paradigm:
1. `SinglePass()`
2. `P300()`

For the SSVEP paradigm:
1. `SSVEP()`
2. `FilterBankSSVEP()`

For the c-VEP paradigm:
1. `CVEP()`
2. `FilterBankCVEP()`

For the Resting State paradigm:
1. `RestingStateToP300Adapter()`

In [None]:
# Using LeftRight for MotorImagery
paradigm = moabb.paradigms.LeftRightImagery()

In [5]:
print(paradigm.datasets)

[<moabb.datasets.bnci.BNCI2014_001 object at 0x000001C61B01C340>, <moabb.datasets.bnci.BNCI2014_004 object at 0x000001C61B01FE80>, <moabb.datasets.gigadb.Cho2017 object at 0x000001C61B01F160>, <moabb.datasets.mpi_mi.GrosseWentrup2009 object at 0x000001C61B01C0A0>, <moabb.datasets.Lee2019.Lee2019_MI object at 0x000001C61B01C1F0>, <moabb.datasets.liu2024.Liu2024 object at 0x000001C61B01DE10>, <moabb.datasets.physionet_mi.PhysionetMI object at 0x000001C61B01CFA0>, <moabb.datasets.schirrmeister2017.Schirrmeister2017 object at 0x000001C61B01FF70>, <moabb.datasets.bbci_eeg_fnirs.Shin2017A object at 0x000001C61B01C8B0>, <moabb.datasets.stieger2021.Stieger2021 object at 0x000001C61B01CA30>, <moabb.datasets.Weibo2014.Weibo2014 object at 0x000001C61B035390>, <moabb.datasets.Zhou2016.Zhou2016 object at 0x000001C61B0367A0>]


In [11]:
imagery_datasets[6]

<moabb.datasets.gigadb.Cho2017 at 0x1c61b036950>

In [12]:
X, labels, meta = paradigm.get_data(dataset=imagery_datasets[6], subjects=[1])

Downloading data from 'https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/100001_101000/100295/mat_data/s01.mat' to file 'C:\Users\erika\Documents\Code\eeg-pyriemann-pipeline\notebooks\.5524\100001_101000\100295\mat_data\s01.mat'.



### **4.3 Pipeline**

In machine learning and data engineering, a pipeline is the end-to-end sequence of steps that data goes through, from raw input to model predictions (or insights). In this case we use it for model training: the scikit-learn function `make_pipeline` chains Common Spatial Patterns (CSP) feature extraction with a Linear Discriminant Analysis (LDA) classifier.

In [None]:
pipeline = make_pipeline(CSP(n_components=8), LDA())

### **4.4 Evaluation**

In [None]:
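# Use a dataset that is compatible with LeftRightImagery: the first entry of
# imagery_datasets (AlexMI) has no left-hand class
dataset = paradigm.datasets[0]
# Optional: restrict to a few subjects to keep the example fast
dataset.subject_list = dataset.subject_list[:2]

evaluation = moabb.evaluations.WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=[dataset],
    overwrite=True,
    hdf5_path=None,
)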

In [None]:
results = evaluation.process({"csp+lda": pipeline})

In [None]:
results
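As a rough mental model, within-session evaluation cross-validates each pipeline separately inside every (subject, session) pair, so training and test trials always come from the same recording session. The sketch below is a simplified illustration, not MOABB's implementation: `within_session_scores` is a hypothetical helper, the synthetic features stand in for the output of `paradigm.get_data()`, and the choice of stratified 5-fold CV with ROC AUC scoring is an assumption about what MOABB uses for two-class problems.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def within_session_scores(pipeline, X, y, meta, n_splits=5, scoring="roc_auc", seed=42):
    """Hypothetical helper: cross-validate inside each (subject, session) group."""
    meta = meta.reset_index(drop=True)  # row labels now match array positions
    rows = []
    for (subject, session), group in meta.groupby(["subject", "session"]):
        idx = group.index.to_numpy()
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = cross_val_score(pipeline, X[idx], y[idx], cv=cv, scoring=scoring)
        rows.append({"subject": subject, "session": session, "score": scores.mean()})
    return pd.DataFrame(rows)


# Synthetic features: 2 subjects x 2 sessions x 40 trials, 4 features per trial
rng = np.random.default_rng(0)
n_per_group = 40
meta_feat = pd.DataFrame(
    [(s, sess) for s in (1, 2) for sess in ("0", "1") for _ in range(n_per_group)],
    columns=["subject", "session"],
)
y_feat = np.tile(np.repeat([0, 1], n_per_group // 2), 4)
X_feat = rng.standard_normal((len(meta_feat), 4)) + 2.0 * y_feat[:, None]

scores_df = within_session_scores(
    make_pipeline(StandardScaler(), LDA()), X_feat, y_feat, meta_feat
)
print(scores_df)
```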

### **4.5 Statistics, visualization and utilities**

In [None]:
moabb.analysis.plotting.score_plot(results)
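Besides plots, a numeric summary is often handy. `results` is a pandas DataFrame; assuming it has `dataset`, `pipeline` and `score` columns, a small hypothetical helper can aggregate the scores per dataset and pipeline:

```python
import pandas as pd


def summarize_scores(results: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean, std and count of `score` per dataset and pipeline."""
    return (
        results.groupby(["dataset", "pipeline"])["score"]
        .agg(["mean", "std", "count"])
        .reset_index()
    )
```

In the notebook it would be called as `summarize_scores(results)`.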

## **References**

* [1] MOABB documentation - https://moabb.neurotechx.com/docs/api.html
* [2] Faker documentation - https://faker.readthedocs.io/en/master/