# Tutorial for loading leaderboard competition data

To get started with this competition, you will need to download the data for the two tasks, sleep cassette and motor imagery (MI). You can either handle the download yourself, using the following links:
- Sleep Source : https://figshare.com/articles/dataset/SleepSource/14839659
- Sleep Leaderboard test data : https://figshare.com/articles/dataset/LeaderboardSleep/14839653
- MI source 1 (Cho2017) : ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100295/mat_data/
- MI source 2 (BNCI 001-2014) : http://bnci-horizon-2020.eu/database/data-sets
- MI source 3 (PhysionetMI) : https://physionet.org/content/eegmmidb/1.0.0/
- MI Leaderboard test data : https://figshare.com/articles/dataset/leaderboardMI/14839650

Alternatively, you can use our competition [package](https://github.com/sylvchev/beetl-competition), which takes care of the download automatically, as shown in this tutorial.

Descriptions of the datasets and further details are available at https://beetl.ai/data and http://moabb.neurotechx.com/docs/datasets.html.

**Important** 
Please note that for Task 1, you are ONLY allowed to use the data in 'SleepSource' and 'LeaderboardSleep'. Using data from outside these folders is regarded as cheating.

For Task 2, you are ONLY allowed to use Cho 2017, BNCI 2014-001, PhysionetMI (the motor imagery task starter kit shows how to download them with the MOABB API) and the data provided in 'leaderboardMI'. Using data from outside these datasets and folders is regarded as cheating.

We will test-run the code of the top-ranking teams in the final stage of the competition. Please fix your random seeds to make sure your experiments are reproducible.
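One way to follow the reproducibility request above is to seed every random number generator from a single helper at the start of your script. The sketch below is a minimal example, not an official competition utility: the name `set_seed` and the value `42` are illustrative choices, and the PyTorch call is only made if PyTorch happens to be installed.

```python
import random

import numpy as np


def set_seed(seed):
    """Seed the random number generators commonly used in an ML pipeline."""
    random.seed(seed)      # Python's built-in generator
    np.random.seed(seed)   # NumPy's legacy global generator
    try:
        import torch       # optional: only seeded when PyTorch is installed
        torch.manual_seed(seed)
    except ImportError:
        pass


SEED = 42  # arbitrary illustrative value

set_seed(SEED)
first = np.random.rand(3)

set_seed(SEED)
second = np.random.rand(3)

# Re-seeding with the same value reproduces the same draws
print(np.array_equal(first, second))
```

Components that keep their own generator, such as estimators that accept a `random_state` argument, need to be seeded separately.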

## Sleep stage task

The goal is to detect sleep stages: you train on the source sleep EEG data from the 25-59 age group, which comes with training trials and labels, and apply your model to the target 60-80 age group (10 sessions). The target dataset, called Leaderboard Sleep, is divided into two groups. The first group contains 5 example subjects with labels; these are example subjects from the testing group. The second group is the testing group, which contains the leaderboard subjects 6-17 without labels. You need to predict the labels for these subjects.

**Data information**

| Type | Value |
| :- | :-: |
| Sampling rate | 100 Hz |
| Trial window | 30s |
| Nb of channels | 2 bipolar (Fpz-Cz, Pz-Oz) |
| Highpass filter | 0.5 Hz |
| Lowpass filter | 100 Hz |

The sleep stage labels to predict are:

| Sleep stage | label |
| :- | :-: |
| W | 0 |
| stage 1 | 1 |
| stage 2 | 2 |
| stage 3 | 3 |
| stage 4 | 4 |
| REM | 5 |
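
The label table above can be kept as a small lookup in code, which is convenient for checking the class balance of the labelled subjects. The sketch below is illustrative only: `SLEEP_STAGES`, `stage_counts` and the toy label vector are hypothetical names and data, not part of the competition package.

```python
import numpy as np

# Integer label -> sleep stage name, copied from the table above
SLEEP_STAGES = {0: "W", 1: "stage 1", 2: "stage 2", 3: "stage 3", 4: "stage 4", 5: "REM"}

# One trial is a 30 s window sampled at 100 Hz
SAMPLES_PER_TRIAL = 30 * 100  # 3000 time samples


def stage_counts(y):
    """Return a {stage name: number of trials} dict for a label vector."""
    counts = np.bincount(np.asarray(y, dtype=int), minlength=len(SLEEP_STAGES))
    return {SLEEP_STAGES[label]: int(n) for label, n in enumerate(counts)}


# Toy labels for illustration only (not real competition data)
y_example = [0, 0, 2, 2, 2, 5]
print(stage_counts(y_example))
```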


## Motor imagery task

The source datasets are available at the URLs listed above or from MOABB, as the `BNCI2014001`, `Cho2017` and `PhysionetMI` datasets.


For the leaderboard, there are five test subjects: S1 and S2 are from dataset A, and S3, S4 and S5 are from dataset B. We will release more details about these datasets after the competition. For each subject, the data comes in two splits, training and testing. The training split contains 100 trials (S1, S2) or 120 trials (S3, S4 and S5) with labels, which serve as target domain samples for that subject. The testing folders contain the trials whose labels you should predict.

**Data information for dataset A (subject 1 & 2)**

| Type | Value |
| :- | :-: |
| Sampling rate | 500 Hz |
| Trial window | 4s |
| Nb of channels | 63 EEG |
| Highpass filter | 1 Hz |
| Lowpass filter | 100 Hz |
| Notch filter | 50 Hz |

The names of the channels are:
'Fp1', 'Fz', 'F3', 'F7', 'FT9', 'FC5', 'FC1', 'C3', 'T7', 
'TP9', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'O1', 'Oz', 
'O2', 'P4', 'P8', 'TP10', 'CP6', 'CP2', 'C4', 'T8',
'FT10', 'FC6', 'FC2', 'F4', 'F8', 'Fp2', 'AF7', 'AF3', 
'AFz', 'F1', 'F5', 'FT7', 'FC3', 'FCz', 'C1', 'C5', 
'TP7', 'CP3', 'P1', 'P5', 'PO7', 'PO3', 'POz', 'PO4', 
'PO8', 'P6', 'P2', 'CPz', 'CP4', 'TP8', 'C6', 'C2',
'FC4', 'FT8', 'F6', 'F2', 'AF4', 'AF8'.

**Data information for dataset B (subject 3, 4, 5)**

| Type | Value |
| :- | :-: |
| Sampling rate | 200 Hz |
| Trial window | 4s |
| Nb of channels | 32 EEG |
| Highpass filter | 1 Hz |
| Lowpass filter | 100 Hz |

The names of the channels are:
'Fp1', 'Fp2', 'F3', 'Fz', 'F4', 'FC5', 'FC1', 'FC2','FC6', 'C5', 'C3',
'C1', 'Cz', 'C2', 'C4', 'C6', 'CP5', 'CP3', 'CP1',
'CPz', 'CP2', 'CP4', 'CP6', 'P7', 'P5', 'P3', 'P1', 'Pz', 
'P2', 'P4', 'P6', 'P8'
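
Datasets A and B use different montages, so a model shared across both may need to be restricted to the channels they have in common. The sketch below computes that intersection from the two channel lists above. It assumes that the listed order matches the channel axis of the data arrays, which is not stated explicitly here; the variable names are illustrative.

```python
# Channel names copied from the two lists above. Assumption: this order
# matches the channel axis of the data arrays.
CHANNELS_A = [
    'Fp1', 'Fz', 'F3', 'F7', 'FT9', 'FC5', 'FC1', 'C3', 'T7',
    'TP9', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'O1', 'Oz',
    'O2', 'P4', 'P8', 'TP10', 'CP6', 'CP2', 'C4', 'T8',
    'FT10', 'FC6', 'FC2', 'F4', 'F8', 'Fp2', 'AF7', 'AF3',
    'AFz', 'F1', 'F5', 'FT7', 'FC3', 'FCz', 'C1', 'C5',
    'TP7', 'CP3', 'P1', 'P5', 'PO7', 'PO3', 'POz', 'PO4',
    'PO8', 'P6', 'P2', 'CPz', 'CP4', 'TP8', 'C6', 'C2',
    'FC4', 'FT8', 'F6', 'F2', 'AF4', 'AF8',
]
CHANNELS_B = [
    'Fp1', 'Fp2', 'F3', 'Fz', 'F4', 'FC5', 'FC1', 'FC2', 'FC6', 'C5', 'C3',
    'C1', 'Cz', 'C2', 'C4', 'C6', 'CP5', 'CP3', 'CP1',
    'CPz', 'CP2', 'CP4', 'CP6', 'P7', 'P5', 'P3', 'P1', 'Pz',
    'P2', 'P4', 'P6', 'P8',
]

set_a = set(CHANNELS_A)

# Channels present in both datasets, kept in dataset B order
common = [ch for ch in CHANNELS_B if ch in set_a]

# Positions of the shared channels along each dataset's channel axis
idx_a = [CHANNELS_A.index(ch) for ch in common]
idx_b = [CHANNELS_B.index(ch) for ch in common]

# Channels of dataset B that have no counterpart in dataset A
only_b = [ch for ch in CHANNELS_B if ch not in set_a]

print(len(common), "shared channels; only in B:", only_b)
```

With arrays of shape (trials, channels, samples), `X_MIA[:, idx_a]` and `X_MIB[:, idx_b]` would then have the same channels in the same order (`X_MIA` and `X_MIB` being placeholder names for your own arrays).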

In dataset A, the motor imagery labels to predict are:

| MI task | label |
| :- | :-: |
| Rest | 0 |
| Lefthand | 1 |
| Righthand | 2 |
| Feet | 3 |

In dataset B, the motor imagery labels to predict are:

| MI task | label |
| :- | :-: |
| Lefthand | 0 |
| Righthand | 1 |
| Feet | 2 |
| Rest | 3 |

**However, in task 2, there are only three categories to predict as output labels: Lefthand (0), Righthand (1) and other (2).**
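
If your label arrays follow the two dataset-specific tables above, they can be mapped to the three output categories with a lookup table. The sketch below is a hypothetical helper (`to_task2_labels` and the two lookup arrays are illustrative names). Whether the provided training labels already use the three-class coding is not stated here, so inspect them before applying any remapping.

```python
import numpy as np

# Lookup tables indexed by the original label; values are task 2 labels
# (0 = Lefthand, 1 = Righthand, 2 = other).
A_TO_TASK2 = np.array([2, 0, 1, 2])  # A: Rest, Lefthand, Righthand, Feet
B_TO_TASK2 = np.array([0, 1, 2, 2])  # B: Lefthand, Righthand, Feet, Rest


def to_task2_labels(y, dataset):
    """Map dataset-specific MI labels (0-3) to the three task 2 categories."""
    lookup = {"A": A_TO_TASK2, "B": B_TO_TASK2}[dataset]
    return lookup[np.asarray(y, dtype=int)]


# One example of each original label, in table order
print(to_task2_labels([0, 1, 2, 3], "A"))
print(to_task2_labels([0, 1, 2, 3], "B"))
```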

# Downloading data automatically

You can use the helper [package](https://github.com/sylvchev/beetl-competition) to download the data automatically. Install it with `pip install git+https://github.com/sylvchev/beetl-competition` or, for an editable install, with `pip install -e git+https://github.com/sylvchev/beetl-competition#egg=beetl-competition`.

After that, it is as simple as:

```python
from beetl.task_datasets import BeetlSleepLeaderboard, BeetlMILeaderboard

# Unlabelled sleep test data for the leaderboard subjects 6-17
_, _, X_sleep_test, _ = BeetlSleepLeaderboard().get_data(subjects=range(6, 18))
print("Sleep leaderboard: There are {} trials with {} electrodes and {} time samples".format(*X_sleep_test.shape))

# Unlabelled MI test data for dataset A (S1, S2)
_, _, X_MIA_test = BeetlMILeaderboard().get_data(dataset='A')
print("MI leaderboard A: There are {} trials with {} electrodes and {} time samples".format(*X_MIA_test.shape))

# Unlabelled MI test data for dataset B (S3, S4, S5)
_, _, X_MIB_test = BeetlMILeaderboard().get_data(dataset='B')
print("MI leaderboard B: There are {} trials with {} electrodes and {} time samples".format(*X_MIB_test.shape))
```

```
Sleep leaderboard: There are 25748 trials with 2 electrodes and 3000 time samples
MI leaderboard A: There are 400 trials with 63 electrodes and 2000 time samples
MI leaderboard B: There are 600 trials with 32 electrodes and 800 time samples
```
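
The shapes printed above can be cross-checked against the data information tables, since the number of time samples per trial is the sampling rate multiplied by the trial window. The helper below is a hypothetical sanity check (`DATA_INFO`, `expected_trial_shape` and `check_trials` are illustrative names); it uses small zero-filled arrays as stand-ins for the real data.

```python
import numpy as np

# Sampling rate (Hz), trial window (s) and channel count from the tables above
DATA_INFO = {
    "sleep": {"sfreq": 100, "window_s": 30, "n_channels": 2},
    "MI_A": {"sfreq": 500, "window_s": 4, "n_channels": 63},
    "MI_B": {"sfreq": 200, "window_s": 4, "n_channels": 32},
}


def expected_trial_shape(name):
    """Return the (channels, time samples) shape expected for one trial."""
    info = DATA_INFO[name]
    return info["n_channels"], info["sfreq"] * info["window_s"]


def check_trials(X, name):
    """Raise ValueError if X is not shaped (trials, channels, samples) as expected."""
    if X.ndim != 3 or X.shape[1:] != expected_trial_shape(name):
        raise ValueError(
            "{}: expected (n_trials, {}, {}), got {}".format(
                name, *expected_trial_shape(name), X.shape
            )
        )


# Zero-filled stand-ins for the real arrays
check_trials(np.zeros((4, 2, 3000)), "sleep")
check_trials(np.zeros((4, 63, 2000)), "MI_A")
check_trials(np.zeros((4, 32, 800)), "MI_B")
print({name: expected_trial_shape(name) for name in DATA_INFO})
```

Calling `check_trials(X_sleep_test, "sleep")` right after loading would catch a wrong folder or a transposed array early.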


# Loading data manually

Alternatively, you can download the data from https://figshare.com/articles/dataset/leaderboardMI/14839650 and https://figshare.com/articles/dataset/LeaderboardSleep/14839653. Once you have downloaded the competition data, you can load it as shown below. You just need to specify the path where you stored the data.

## Sleep task

```python
import pickle

import numpy as np

# Path to the 'testing' folder of the downloaded LeaderboardSleep data.
# Other examples:
# savebase = 'C:\\Path\\to\\LeaderboardSleep\\testing\\'
# savebase = '/Users/sylchev/mne_data/MNE-beetlsleepleaderboard-data/testing/'
savebase = '/home/sylchev/mne_data/MNE-beetlsleepleaderboard-data/testing/'

X_sleep_test = []
for subj in range(6, 18):  # leaderboard subjects 6-17
    for session in range(1, 3):  # two files per subject (r1 and r2)
        # The files carry a .npy extension but are read with pickle here
        with open(savebase + "leaderboard_s{}r{}X.npy".format(subj, session), 'rb') as f:
            X_sleep_test.append(pickle.load(f))
X_sleep_test = np.concatenate(X_sleep_test)

print("There are {} trials with {} electrodes and {} time samples".format(*X_sleep_test.shape))
```

```
There are 25748 trials with 2 electrodes and 3000 time samples
```


## Motor imagery dataset A (S1, S2)

```python
import os.path as osp
import pickle

import numpy as np

# Path to the downloaded leaderboardMI data.
# Other examples:
# path = 'C:\\Path\\to\\leaderboardMI'
# path = '/Users/sylchev/mne_data/MNE-beetlmileaderboard-data/'
path = '/home/sylchev/mne_data/MNE-beetlmileaderboard-data/'

X_MIA_test = []
for subj in range(1, 3):  # subjects S1 and S2
    savebase = osp.join(path, "S{}".format(subj), "testing")

    for i in range(6, 16):  # test files race6 to race15
        with open(osp.join(savebase, "race{}_padsData.npy".format(i)), 'rb') as f:
            X_MIA_test.append(pickle.load(f))
X_MIA_test = np.concatenate(X_MIA_test)

print("There are {} trials with {} electrodes and {} time samples".format(*X_MIA_test.shape))
```

```
There are 400 trials with 63 electrodes and 2000 time samples
```


## Motor imagery dataset B (S3, S4, S5)

```python
import os.path as osp
import pickle

import numpy as np

# Path to the downloaded leaderboardMI data.
# Other examples:
# path = 'C:\\Path\\to\\leaderboardMI'
# path = '/Users/sylchev/mne_data/MNE-beetlmileaderboard-data/'
path = '/home/sylchev/mne_data/MNE-beetlmileaderboard-data/'

X_MIB_test = []
for subj in range(3, 6):  # subjects S3, S4 and S5
    savebase = osp.join(path, "S{}".format(subj), "testing")
    # One test file per subject
    with open(osp.join(savebase, "testing_s{}X.npy".format(subj)), 'rb') as f:
        X_MIB_test.append(pickle.load(f))
X_MIB_test = np.concatenate(X_MIB_test)

print("There are {} trials with {} electrodes and {} time samples".format(*X_MIB_test.shape))
```

```
There are 600 trials with 32 electrodes and 800 time samples
```
