Пример анализа с использованием эмулятора работы нейроинтерфейса

Дальнейшие задачи:
* Уточнить эвристику для интерфейса
* Разобраться как повсеместно зафиксировать seed
* Тестирование на различном числе усреднений эпох
* Тестирование с различными классификаторами
* F-beta (torchmetrics fbeta_score)
* Добавить веса в BCE

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import os

In [2]:
from eeg_dataset_utils import EEGDataset, EEGDatasetAdvanced, my_train_test_split, sampling
from offline_bci import OfflineBCI, split_by_words

## ML

In [3]:
from sklearn.svm import SVC
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import make_pipeline

In [4]:
# Sampling
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline as ImbPipeline

In [5]:
# Channel sets for testing
ch_set4 = ['Fz', 'Pz', 'Cz', 'POz']
ch_set8 = ['Fz', 'Pz', 'Cz', 'POz', 'P3', 'P4', 'P7', 'P3']
ch_set15 = ['Cz', 'Pz', 'POz', 'P1', 'P2', 'P3','P4', 'P5', 'P6', 'P7', 'P8', 'PO3', 'PO4', 'O1', 'O2']

Загрузка датасета, выбор каналов, ресемплинг

In [6]:
rd = 'd:\\Study\\MSUAI\\P300BCI_ordered_DataSet'

In [194]:
downsample = 15 # 500 -> 35 Hz

dataset = EEGDataset(rd, 'ik')
dataset.pick_channels(ch_set8)
dataset.x = dataset.x[:,:,::downsample]

dataset.x.shape

(3599, 8, 21)

Разделение на train и test  
Производить необходимо на основании группировки по словам из эксперимента, для этого разработана функция `split_by_words`

Затем в тренировочной выборке производится балансирование классов

In [198]:
sampling_mode = 'real' # real, under, over, balanced
average = 8 # n epochs to average

train_set, test_set = split_by_words(dataset, 3)
train_set.average(average)

print('train_size (before averaging):', len(train_set))
print('train_size (after averaging):', train_set.x.shape[0])
print('test_size (no averaging):', len(test_set))

data = sampling(train_set, mode=[sampling_mode])
X, y = data[sampling_mode]['x'], data[sampling_mode]['y']

print('test size after balancing:', len(y))

train_size (before averaging): 2699
train_size (after averaging): 336
test_size (no averaging): 900
test size after balancing: 336


Обучение модели

In [199]:
model = make_pipeline(RobustScaler(), SVC(kernel='linear', class_weight='balanced', random_state=42))
model.fit(X, y)

Оффлайн эмулятор нейроинтрефейса  
На вход при инициализации класса подается тестовый датасет и модель.  
Для нейронных сетей необходимо уточнить аргумент model_type='NN'. По умолчанию 'ML'  
Метод `OfflineBCI.pipeline()` принимает на вход число усреднений эпох,  
на выходе получается значение ITR и точности распознавания команд.

In [218]:
BCI = OfflineBCI(test_set, model)
ITR, P = BCI.pipeline(average, summary=True)

Total trials: 5
Correct trials: 3
ITR: 7.93
Classification accuracy: 0.60


В ходе анализа интерфейс сохраняет информацию о целевых буквах и догадках системы, все сохраняется в словаре `OfflineBCI.result`  
Ключи: 'target_letter' и 'guess'

In [212]:
bci_answers = pd.DataFrame({'target': BCI.result['target_letter'],
                            'guess': BCI.result['guess']})
bci_answers

Unnamed: 0,target,guess
0,a,a
1,c,g
2,e,Fail
3,r,r
4,b,a


## NN

In [49]:
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch import nn as nn

In [50]:
from IPython.display import clear_output

Определение моделей, датасета

In [51]:
class ConvEEGNN2(nn.Module):

    def __init__(self, n_eeg_channels=44):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            nn.Conv1d(n_eeg_channels, 16, kernel_size=10),
            nn.ReLU(),
            nn.Conv1d(16, 8, kernel_size=10),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten()
        )
        self.fc = nn.Linear(in_features=8, out_features=1)

    def forward(self, batch):
        out = self.cnn_layers(batch)
        out = self.fc(out)
        return torch.sigmoid(out)

In [52]:
import lightning.pytorch as pl
from torch import optim
import torchmetrics

class LitCNN(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.accuracy = torchmetrics.classification.Accuracy(task="binary")
        self.f1 = torchmetrics.classification.BinaryF1Score()

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        out = self.model(x)
        loss = nn.functional.binary_cross_entropy(out.squeeze(1), y)

        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss

    def on_validation_epoch_end(self):
        f1 = self.f1.compute()
        self.log("f1", self.f1.compute())
        # print("F1 =",f1)
        self.f1.reset()
        acc = self.accuracy.compute()
        self.log("accuracy", self.accuracy.compute())
        print("Accuracy =", acc)
        self.accuracy.reset()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        out = self.model(x)
        self.f1.update(out.squeeze(1), y)
        self.accuracy.update(out.squeeze(1), y)


    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=1e-4)
        return optimizer

In [155]:
from eeg_dataset_utils import EEGDatasetAdvanced, my_train_test_split

class DatasetBuilder():
  def __init__(self, path = '/content/', subjects = [], channels = ['Pz'],
               n_average=None, cache_dir_name='eeg_cache'):

    self.dataset = EEGDatasetAdvanced(path,
                             cache = True,
                             subjects = subjects,
                             load_cache=True,
                             cache_dir_name=cache_dir_name,
                             n_average=n_average)
    self.dataset.pick_channels(channels)
    self.check()

  def check(self):
    # Cheking for file errors
    for i in range(len(self.dataset)):
      try:
        x = self.dataset[i]
      except Exception as e:
        print(e,i,self.dataset.data[i])

  def calculate_mean_and_std(self):
    loader = DataLoader(self.dataset, batch_size=len(self.dataset), shuffle=False)
    data, labels = next(iter(loader))
    return data.mean(dim=(0,2)) , data.std(dim=(0,2))

  def __call__(self):
    mean, std = self.calculate_mean_and_std()
    self.dataset.transform = lambda x: (x - mean[:,None]) / std[:,None]
    return self.dataset

Создание датасета

In [158]:
dataset.available_subjects[:10]

['ak', 'as', 'az', 'dad', 'dkv', 'ds', 'dsi', 'dzg', 'eab', 'en']

In [185]:
from torch.utils.data import WeightedRandomSampler

builder = DatasetBuilder(rd, channels = ch_set8, subjects=['ik'], cache_dir_name='eeg_cache_35Hz')
dataset = builder()

# dataset = EEGDatasetAdvanced(rd, load_cache=True, subjects=dataset.available_subjects[:10], cache_dir_name='eeg_cache_35Hz')
# dataset.pick_channels(ch_set8)

print('Dataset length:',len(dataset), "Item shape", dataset[0][0].shape)

Dataset length: 3599 Item shape torch.Size([8, 21])


Разделение с использованием той же функции `split_by_words`

In [None]:
# train_set, test_set = my_train_test_split(dataset, size=[0.7, 0.3], control_subject=True)
# print('train_size:', len(train_set))
# print('test_size:', len(test_set))

In [187]:
train_set, test_set = split_by_words(dataset, 3)
print('train_size:', len(train_set))
print('test_size:', len(test_set))

train_size: 2699
test_size: 900


Усреднение эпох тренировочного датасета

In [188]:
from copy import deepcopy

average = 8
test_set_no_average = deepcopy(test_set)

train_set.average(average)
test_set.average(average)
print('train size:', len(train_set))
print('train size:', len(test_set))

train size: 336
train size: 112


Определение DataLoader'ов

In [189]:
labels = torch.tensor([y.item() for _, y in train_set], dtype=torch.long)
class_weights = [8,1]
samples_weights = torch.zeros(len(train_set))
samples_weights[labels==0] = class_weights[0]
samples_weights[labels==1] = class_weights[1]

sampler = WeightedRandomSampler(samples_weights, len(samples_weights))

train_loader = DataLoader(train_set, 64, sampler=sampler)
test_loader = DataLoader(test_set, 64, shuffle=False)

  labels = torch.tensor([y.item() for _, y in train_set], dtype=torch.long)


Обучение модели

In [191]:
from lightning.pytorch.loggers import TensorBoardLogger

pl.seed_everything(42)
model = ConvEEGNN2(n_eeg_channels=8)
lit_model = LitCNN(model)
logger = TensorBoardLogger(save_dir=os.getcwd(), version="8ch_1s", name="lightning_logs")
trainer = pl.Trainer(limit_train_batches=100, max_epochs=50, log_every_n_steps = 5, logger=logger)
trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders= test_loader)

clear_output()

Запуск нейроинтерфейса

In [192]:
BCI = OfflineBCI(test_set_no_average, model, model_type='NN')
ITR, P = BCI.pipeline(average, summary=True)

Total trials: 5
Correct trials: 3
ITR: 7.93
Classification accuracy: 0.60


In [193]:
bci_answers = pd.DataFrame({'target': BCI.result['target_letter'],
                            'guess': BCI.result['guess']})
bci_answers

Unnamed: 0,target,guess
0,a,a
1,c,c
2,e,f
3,r,r
4,b,n
