# Multi-Scale Hybrid Neural Networks for EEG Classification

**Salman Sami Hussain Ali 40161786**

### Abstract
The use of Motor Imagery (MI) recorded via Electroencephalography (EEG) as a Brain-Computer Interface (BCI) paradigm lets users communicate with external devices through their intentions as reflected in brain activity. While Convolutional Neural Networks (CNNs) have shown promising results in EEG classification tasks, many existing CNN-based approaches rely on a single convolution mode and kernel size. This limits their ability to capture multi-scale temporal and spatial features, which in turn restricts further gains in MI-EEG classification accuracy. To address this, my experiments evaluate the Multi-Scale Hybrid Convolutional Neural Network (MSHCNN), a recently proposed approach for decoding MI-EEG signals. MSHCNN combines two-dimensional convolutions, which extract temporal and spatial features, with one-dimensional convolutions, which extract advanced temporal features from EEG signals [2].

### Introduction
Brain-Computer Interface (BCI) technology lets people interact with computers directly through neural activity. By recording brain signals, for example from electrodes placed at various points on the scalp, BCI systems convert neural activity into machine-readable commands, bypassing the usual physiological pathways of nerves and muscles. In short, brain signals are translated into computer instructions. The primary applications are medical, such as assistive technologies, neurorehabilitation, and brain health monitoring.

Electroencephalogram (EEG) signals offer a noninvasive means of monitoring brain activity. They capture the electrical signals produced by neurons in the brain using electrodes positioned on the scalp. EEG provides a rich dataset of electrical potentials over time, enabling the study of brain rhythms, event-related potentials, and functional connectivity patterns.

Deep neural networks offer significant advantages for analyzing neural time series data such as EEG signals. They can learn hierarchical representations directly from raw signals, removing the need for hand-crafted features and saving considerable engineering effort, and they have performed strongly across many machine learning tasks, including neural time series analysis.

Convolutional Neural Networks (CNNs) are particularly favored in neural data analysis because they achieve strong decoding performance with relatively few parameters. In EEG decoding, CNNs efficiently extract relevant temporal and spatial features from multi-channel EEG data, striking a balance between model complexity and decoding accuracy. Their hierarchical structure and local receptive fields also make the learned filters easier to interpret.

This preference is evident in widely used state-of-the-art models for EEG signal processing and BCI applications, such as EEGNet, ShallowConvNet, and EEGConformer, all of which rely on convolutional layers to balance model complexity and performance.

Two primary training strategies exist for networks trained on neural time series data: within-subject and leave-one-subject-out (LOSO). Within-subject training uses data from the same subject for both training and testing, so the model learns that subject's specific patterns. LOSO training instead holds out one subject's data for testing and trains on the data of all remaining subjects, which measures how well the model generalizes to the brain activity of an individual it has never seen.
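As a minimal sketch, assuming each recording is identified only by an integer subject ID, the two strategies differ in which subjects end up in the training and test sets of each fold. The helper names `within_subject_splits` and `leave_one_subject_out_splits` are hypothetical and not part of SpeechBrain or MOABB.

```python
from typing import Dict, List


def within_subject_splits(subjects: List[int]) -> List[Dict[str, List[int]]]:
    """One fold per subject: training and testing both use that subject's data."""
    return [{"train": [s], "test": [s]} for s in subjects]


def leave_one_subject_out_splits(subjects: List[int]) -> List[Dict[str, List[int]]]:
    """One fold per subject: test on that subject, train on all the others."""
    return [
        {"train": [t for t in subjects if t != s], "test": [s]}
        for s in subjects
    ]


if __name__ == "__main__":
    subjects = list(range(1, 10))  # BNCI2014001 has 9 subjects
    print(within_subject_splits(subjects)[0])
    print(leave_one_subject_out_splits(subjects)[0])
```

In the within-subject case the chosen subject's trials still have to be divided into disjoint training and test sets (for example by session); the sketch only shows which subjects take part in each fold.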

### Related Work

##### **Setting up SpeechBrain**

In [None]:
%%capture
!git clone https://github.com/speechbrain/benchmarks.git

In [None]:
!pip install pip==22.3.1


In [None]:
!pip install --upgrade pip

In [None]:
%cd /notebooks/benchmarks
!git submodule update --init --recursive
%cd /notebooks/benchmarks/speechbrain
!pip install -r requirements.txt
!pip install -e .
%cd /notebooks/benchmarks/benchmarks/MOABB
!pip install -r extra-requirements.txt    # Install additional dependencies
!pip install -r ../../requirements.txt    # Install base dependencies
%cd /notebooks/benchmarks/benchmarks/MOABB
%env PYTHONPATH=/notebooks/benchmarks/

In [None]:
!pip uninstall -y torchaudio torchvision torch

In [None]:
!pip install torchvision torchaudio torch

In [None]:
!pip uninstall -y mne
!pip install mne==1.6.1

In [None]:
!pip install tensorflow

### EEGNet

EEGNet is a widely used state-of-the-art model for EEG classification and interpretation in EEG-based BCIs. It is a compact CNN that uses depthwise and separable convolutions to build an EEG-specific network that encapsulates several well-known EEG feature extraction concepts [1].

<div>
<img src="https://drive.google.com/uc?export=view&id=1bOwLdmLDMdLyQU70VJVJh2e2AcAIitTp" width="800"/>
</div>
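To illustrate why depthwise and separable convolutions keep EEGNet compact, the sketch below counts weights (biases ignored) for a standard convolution and for its depthwise or separable counterpart. The sizes (F1 = 8 temporal filters, depth multiplier D = 2, F2 = 16, 22 electrodes, a 1 x 16 separable kernel) are assumptions chosen to resemble the EEGNet-8,2 configuration, and the helper functions are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, kh: int, kw: int) -> int:
    """Weights of a regular 2D convolution: every output map sees every input map."""
    return c_in * c_out * kh * kw


def depthwise_conv_params(c_in: int, depth_multiplier: int, kh: int, kw: int) -> int:
    """Weights of a depthwise convolution: each input map gets its own filters."""
    return c_in * depth_multiplier * kh * kw


def separable_conv_params(c_in: int, c_out: int, kh: int, kw: int) -> int:
    """One kh x kw depthwise filter per input map, then a pointwise 1 x 1 mixing conv."""
    return c_in * kh * kw + c_in * c_out


if __name__ == "__main__":
    F1, D, F2, n_electrodes = 8, 2, 16, 22

    # Spatial step: combine all 22 electrodes with a (22, 1) kernel.
    print("standard spatial :", standard_conv_params(F1, F1 * D, n_electrodes, 1))
    print("depthwise spatial:", depthwise_conv_params(F1, D, n_electrodes, 1))

    # Temporal summary step: a (1, 16) kernel from F1*D maps to F2 maps.
    print("standard (1x16)  :", standard_conv_params(F1 * D, F2, 1, 16))
    print("separable (1x16) :", separable_conv_params(F1 * D, F2, 1, 16))
```

With these sizes the depthwise spatial step needs 352 weights instead of 2816, and the separable step needs 512 instead of 4096, which is where much of EEGNet's compactness comes from.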

### ShallowConvNet

ShallowConvNet, proposed by Schirrmeister et al., is another state-of-the-art model for EEG classification and interpretation in EEG-based BCIs. It applies a temporal convolution followed by a spatial convolution across the EEG channels; the result is squared, average-pooled, and log-transformed before being passed to the classification head [3].
<div>
<img src="https://drive.google.com/uc?export=view&id=13xsTMZ5lqNL0Awdq9YAmtpEjsJgp_5M0" width="800"/>
</div>
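ShallowConvNet's forward pass (temporal convolution, spatial convolution, squaring, average pooling, logarithm) can be sketched in NumPy for a single trial. This is a forward pass with fixed random weights, not a trainable model, and it omits batch normalization and the classifier. The function name and the example sizes (40 filters, temporal kernel of 25 samples, pooling window 75 with stride 15, a 22-channel trial of 500 samples) are illustrative assumptions.

```python
import numpy as np


def shallow_convnet_features(x, temporal_filters, spatial_weights,
                             pool_size, pool_stride, eps=1e-6):
    """ShallowConvNet-style feature extraction for one trial.

    x               : (n_channels, n_times) single EEG trial
    temporal_filters: (n_temporal, kernel_len) filters applied to each channel
    spatial_weights : (n_spatial, n_temporal, n_channels) channel-mixing weights
    Returns an (n_spatial, n_pooled) array of log-power features.
    """
    n_temporal, kernel_len = temporal_filters.shape
    n_channels, n_times = x.shape
    t_out = n_times - kernel_len + 1

    # 1) Temporal convolution ("valid" cross-correlation, as in CNN layers).
    temporal = np.empty((n_temporal, n_channels, t_out))
    for f in range(n_temporal):
        for c in range(n_channels):
            temporal[f, c] = np.correlate(x[c], temporal_filters[f], mode="valid")

    # 2) Spatial convolution: each output map combines all temporal maps and channels.
    spatial = np.einsum("fct,gfc->gt", temporal, spatial_weights)

    # 3) Squaring nonlinearity, 4) average pooling over time, 5) logarithm.
    squared = spatial ** 2
    n_pooled = (t_out - pool_size) // pool_stride + 1
    pooled = np.stack(
        [squared[:, i * pool_stride:i * pool_stride + pool_size].mean(axis=1)
         for i in range(n_pooled)],
        axis=1,
    )
    return np.log(pooled + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trial = rng.standard_normal((22, 500))
    feats = shallow_convnet_features(
        trial,
        temporal_filters=rng.standard_normal((40, 25)) * 0.1,
        spatial_weights=rng.standard_normal((40, 40, 22)) * 0.1,
        pool_size=75,
        pool_stride=15,
    )
    print(feats.shape)  # (40, 27)
```

Squaring, averaging, and taking the log turns each spatially filtered signal into a log band-power estimate, the classic motor imagery feature that ShallowConvNet was designed to mimic.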

### Approach

My project mainly investigates the two types of convolution used in Multi-Scale Hybrid Convolutional Neural Networks (MSHCNN). MSHCNN uses one-dimensional convolutions (1DCNN) to extract advanced temporal features of EEG signals, and two-dimensional convolutions (2DCNN) to extract temporal and spatial features (Tang et al., 2023). <br><br>In the paper, the proposed method outperformed EEGNet on "Dataset A", which consists of EEG signals collected from 9 healthy subjects during left-hand and right-hand motor imagery tasks, recorded from three channels (C3, C4, and Cz) over 5 sessions per subject. The authors also evaluated it on "Dataset B", which consists of EEG data from 9 healthy subjects performing four motor imagery tasks: left hand, right hand, feet, and tongue. Each subject participated in two sessions on different days, each consisting of 6 runs. Dataset B matches the description of the BNCI2014001 dataset used in this project. However, for their classification experiments they kept only the left-hand and right-hand motor imagery trials, and most of their reported results concern Dataset A rather than Dataset B.
<div>
<img src="https://drive.google.com/uc?export=view&id=1VlbZ7vXGZi1meEa6YsjKMs-C9mnW6xdA" width="800"/>
</div>

<br><br>I experiment with the MSHCNN approach on all four classes (left hand, right hand, feet, and tongue) and compare its performance with state-of-the-art models such as EEGNet (Lawhern et al., 2018) and ShallowConvNet (Schirrmeister et al., 2018).

**Intuition behind choosing MSHCNN**

I chose MSHCNN because it reported strong results for binary classification of left- versus right-hand motor imagery, and I wanted to test whether that level of accuracy carries over to the full four-class problem, which also includes the feet and tongue classes.

**Dataset**

http://moabb.neurotechx.com/docs/generated/moabb.datasets.BNCI2014_001.html#moabb.datasets.BNCI2014_001

The BNCI2014001 dataset contains EEG data from 9 subjects. Each participant was instructed to imagine performing specific motor tasks, one per class, while brain activity was recorded with 22 Ag/AgCl electrodes. The four motor imagery tasks are:

1. Imagine movement of the left hand (class 1)
2. Imagine movement of the right hand (class 2)
3. Imagine movement of both feet (class 3)
4. Imagine movement of the tongue (class 4)

Each subject participated in two sessions recorded on different days. Each session comprises 6 runs separated by short breaks, and each run contains 12 trials for each of the four classes, for a total of 48 trials per run. Therefore, there are 288 trials per session.

During each trial, the subject sits in front of a computer screen. At the beginning of the trial (t = 0 s), a fixation cross appears on the black screen, accompanied by a short acoustic warning. At t = 2 s, a cue in the form of an arrow indicates one of the four imagery tasks. The cue stays on the screen for 1.25 s, and the subject is expected to perform the corresponding motor imagery task. After the cue disappears, the subject keeps performing the task until the fixation cross disappears at t = 6 s, i.e., for about 4 s after cue onset. No feedback is provided during the experiment.
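The counts and timings above translate directly into array sizes. As a small worked example (the constant and function names are hypothetical), the snippet derives the number of trials per run, session, and subject, and the number of samples in a motor imagery epoch that starts at the cue (t = 2 s) and lasts 4 s, using the same ceil(sample_rate * (tmax - tmin)) rule as the `T` entry of the hyperparameter file later in this notebook.

```python
import math

TRIALS_PER_CLASS_PER_RUN = 12
N_CLASSES = 4
RUNS_PER_SESSION = 6
N_SESSIONS = 2

trials_per_run = TRIALS_PER_CLASS_PER_RUN * N_CLASSES   # 48
trials_per_session = trials_per_run * RUNS_PER_SESSION  # 288
trials_per_subject = trials_per_session * N_SESSIONS    # 576


def epoch_length(sample_rate_hz: float, tmin_s: float, tmax_s: float) -> int:
    """Number of samples in an epoch spanning [tmin, tmax] seconds relative to the cue."""
    return math.ceil(sample_rate_hz * (tmax_s - tmin_s))


if __name__ == "__main__":
    print(trials_per_run, trials_per_session, trials_per_subject)
    # The imagery period runs from the cue (t = 2 s) to t = 6 s: tmin=0, tmax=4.
    print(epoch_length(250, 0.0, 4.0))  # original 250 Hz recording -> 1000
    print(epoch_length(125, 0.0, 4.0))  # resampled to 125 Hz -> 500
```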


## Methodology

#### Normalization and Regularization

I regularize the model with batch normalization and dropout: the output of each convolution in the 1DCNN and 2DCNN blocks passes through batch normalization, dropout, and a ReLU activation, in that order.
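A minimal NumPy sketch of that post-convolution sequence, assuming a (batch, time, filters) feature map and training-mode behaviour only (no running statistics for inference). The function is hypothetical; in the model itself these steps are SpeechBrain's batch normalization layers, `torch.nn.Dropout`, and `torch.nn.ReLU`.

```python
import numpy as np


def bn_dropout_relu(x, gamma, beta, p, rng, eps=1e-5):
    """Training-mode BatchNorm -> Dropout -> ReLU on a (batch, time, filters) array."""
    # Batch normalization: standardize each filter over the batch and time axes,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta

    # Inverted dropout: zero each activation with probability p and rescale the
    # survivors by 1 / (1 - p) so the expected activation is unchanged.
    if p > 0:
        keep = rng.random(y.shape) >= p
        y = y * keep / (1.0 - p)

    # ReLU activation.
    return np.maximum(y, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((20, 50, 10)) * 3.0 + 5.0  # 20 trials, 50 steps, 10 filters
    out = bn_dropout_relu(feats, gamma=np.ones(10), beta=np.zeros(10), p=0.25, rng=rng)
    print(out.shape, bool((out >= 0).all()))
```

Because the dropout mask and its rescaling are non-negative, applying dropout before or after the ReLU gives the same result; what matters is that normalization comes first, so the dropout and ReLU act on standardized activations.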

#### Loss Function

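For the loss function, I use the same one as other models trained on this task, the negative log-likelihood (NLL) loss:

$$\mathcal{L}_{\text{NLL}} = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{N} y_{ij} \log(p_{ij})$$

where $m$ is the number of trials in the batch, $N$ is the number of classes, $y_{ij}$ is 1 if trial $i$ belongs to class $j$ and 0 otherwise, and $p_{ij}$ is the predicted probability of class $j$ for trial $i$.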
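The NLL formula can be checked numerically with a short NumPy sketch (function names are illustrative). Because $y_{ij}$ is one-hot, the inner sum simply picks out the log-probability of the correct class. SpeechBrain's `nll_loss`, like `torch.nn.functional.nll_loss`, expects log-probabilities as input, so the network's final layer should output log-probabilities (for example via a log-softmax) rather than probabilities.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis (axis 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def nll_loss(log_probs, targets):
    """Mean NLL: -(1/m) * sum_i log p_{i, target_i}, with integer class targets."""
    m = log_probs.shape[0]
    return -log_probs[np.arange(m), targets].mean()


def nll_loss_one_hot(log_probs, targets):
    """The same loss written exactly as the formula, with one-hot labels y_ij."""
    m, n_classes = log_probs.shape
    y = np.eye(n_classes)[targets]
    return -(y * log_probs).sum() / m


if __name__ == "__main__":
    logits = np.array([[2.0, 0.5, 0.1, -1.0],
                       [0.0, 0.0, 0.0, 0.0]])
    targets = np.array([0, 3])
    lp = log_softmax(logits)
    print(nll_loss(lp, targets), nll_loss_one_hot(lp, targets))
```

For a uniform prediction over the four motor imagery classes the loss equals log 4 (about 1.386), a useful sanity check for the value reported at the start of training.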

### MSHCNN: Motor Imagery EEG Decoding Based on Multi-Scale Hybrid Networks and Feature Enhancement

##### Architecture

Model Component Description:
1. **Data Input Block**: This is the initial stage where the EEG signals enter the network. In the paper's notation the input has shape (B, N, T), where B is the batch size, N is the number of EEG channels, and T is the number of time samples. The implementation below follows SpeechBrain's layout instead, (batch, time, EEG channels, 1).

2. **One-Dimensional Multi-Scale Convolutional Neural Network (M1DCNN) Feature Extraction Block**: This block is responsible for extracting temporal features from the EEG signals using a one-dimensional convolutional neural network (1DCNN) on multiple scales. It consists of three 1DCNN blocks and feature splicing layers. The shades of colors in the 1DCNN block represent different convolution kernel sizes.
<div>
<img src="https://drive.google.com/uc?export=view&id=1GTTxk3fXzW7g6YLDDv2NHFJzXeogCJXb" width="600"/>
</div>
<div>
<img src="https://drive.google.com/uc?export=view&id=19COWaohJrEOQgtcHKpKfjdzQaI3Y2QRV" width="400"/>
</div>

3. **Two-Dimensional Multi-Scale Convolutional Neural Network (M2DCNN) Feature Extraction Block**: This block operates in parallel with the M1DCNN block to extract spatio-temporal features from the EEG signals using a two-dimensional convolutional neural network (2DCNN) on multiple scales. Similar to M1DCNN, it also has multiple layers for feature extraction, with the shades of colors representing different convolution kernel sizes.
<div>
<img src="https://drive.google.com/uc?export=view&id=1RBRfqyogWSlIqQb3tcazQuoDdwy8IwOm" width="600"/>
</div>
<div>
<img src="https://drive.google.com/uc?export=view&id=1uRPNCQ5QKnAgmd3VH4E2l088Zm_8Uz34" width="400"/>
</div>

4. **Feature Splicing Block**: This block is responsible for combining the features extracted from both the M1DCNN and M2DCNN blocks. It integrates the temporal and spatial features extracted by the two networks by concatenating over the time axis. Then, the spliced features are subjected to average pooling.
<div>
<img src="https://drive.google.com/uc?export=view&id=192PwJ4esKmY8m2vuNdpxoN4eKawegJEs" width="300"/>
</div>

5. **Feature Classification**: After feature extraction and splicing, the features are flattened and passed to fully connected layers that output a score for each of the four motor imagery classes.
<div>
<img src="https://drive.google.com/uc?export=view&id=1FitQ9Mrv6Prb4yUk1TmaR3FqcBU2jMQ-" width="400"/>
</div>

**Important:** The MSHCNN paper also notes that the optimal convolution kernel size can vary from subject to subject, which is why the architecture combines several kernel sizes instead of committing to a single one.
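Combining several kernel sizes means each branch produces a different number of time steps, and the spliced length determines the size of the dense layer. The sketch below computes these lengths in closed form; it assumes the layer settings of the PyTorch implementation below (first Conv1d with stride 3, second Conv1d with kernel 3, max pooling 6/6 in the 1D branch; a (k, 1) temporal Conv2d, a (1, C) spatial Conv2d, and (6, 1) average pooling with stride (6, 1) in the 2D branch, all unpadded). The function names are hypothetical.

```python
def conv_out_len(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of an unpadded ("valid") convolution or pooling window."""
    return (length - kernel) // stride + 1


def one_d_branch_len(T: int, k: int) -> int:
    t = conv_out_len(T, k, stride=3)  # Conv1d with kernel k, stride 3
    t = conv_out_len(t, 3)            # Conv1d with kernel 3, stride 1
    return conv_out_len(t, 6, 6)      # max pooling, kernel 6, stride 6


def two_d_branch_len(T: int, k: int, pool: int = 6, stride: int = 6) -> int:
    t = conv_out_len(T, k)                # (k, 1) temporal Conv2d; the (1, C) spatial conv keeps time
    return conv_out_len(t, pool, stride)  # (6, 1) average pooling with stride (6, 1)


def spliced_time_len(T, one_d_kernels, two_d_kernels):
    """Length of the time axis after concatenating all 1DCNN and 2DCNN outputs."""
    return (sum(one_d_branch_len(T, k) for k in one_d_kernels)
            + sum(two_d_branch_len(T, k) for k in two_d_kernels))


if __name__ == "__main__":
    T = 500  # 4 s at 125 Hz
    for k in (60, 80):
        print(k, one_d_branch_len(T, k), two_d_branch_len(T, k))
    print("spliced length:", spliced_time_len(T, [60, 80], [60, 80]))
```

With the default kernel sizes [60, 80] and T = 500 the four branches produce 24, 23, 73, and 70 time steps, so the spliced feature map has 190 time steps of 10 filters each. The PyTorch model obtains its dense-layer input size with a dummy forward pass, which stays correct if the layers change; the closed form is mainly useful for spotting kernel sizes that are too large for the signal length.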

##### PyTorch Model

In [None]:
%%file /notebooks/benchmarks/benchmarks/MOABB/models/MSHCNN.py

"""MSHCNN from https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10036384.
Multi-Scale Hybrid convolutional neural networks proposed for a general decoding of single-trial EEG signals.

Authors
 * Salman Sami Hussain Ali, 2024
"""
import torch
import speechbrain as sb


class MSHCNN(torch.nn.Module):
    """MSHCNN.
        Multi-Scale Hybrid Networks
        Arguments
        ---------
        input_shape : tuple
            The shape of the input.
        one_d_kernel_sizes : list(int)
            Kernel sizes for the temporal convolutions in M1DCNN block
        two_d_kernel_sizes : list(int)
            Kernel sizes for the temporal convolutions in M2DCNN block
        postnet_poolsize : int
            Kernel size of the average pooling applied to the spliced M1DCNN and M2DCNN features.
        postnet_poolstride : int
            Stride of the average pooling applied to the spliced M1DCNN and M2DCNN features.
        temporal_pool_size: tuple
            Pool size of the M2DCNN average pooling (the M1DCNN max pooling is fixed to 6).
        temporal_pool_stride: tuple
            Pool stride of the M2DCNN average pooling (the M1DCNN max pooling stride is fixed to 6).
        dropout: float
            Dropout probability.
        dense_n_neurons: int
            Number of output neurons.

        Example
        -------
        >>> inp_tensor = torch.rand([1, 200, 32, 1])
        >>> model = MSHCNN(input_shape=inp_tensor.shape)
        >>> output = model(inp_tensor)
        >>> output.shape
        torch.Size([1, 4])
        """
    def __init__(self,
                 input_shape=None,
                 one_d_kernel_sizes=[60, 80],
                 two_d_kernel_sizes=[60, 80],
                 temporal_pool_size=(6, 1),
                 temporal_pool_stride=(6, 1),
                 postnet_poolsize=8,
                 postnet_poolstride=8,
                 dropout=0.25,
                 dense_n_neurons=4):
        super().__init__()

        C = input_shape[2]

        input_shape_squeezed = input_shape[0:3]

        # Build (k, 1) kernels in a new list so the shared default argument is never mutated
        two_d_kernel_sizes = [(k, 1) for k in two_d_kernel_sizes]

        self.m1dcnn = M1DCNN(
            input_shape=input_shape_squeezed,
            layers_kernel_sizes=one_d_kernel_sizes,
            temporal_pool_size=temporal_pool_size,
            temporal_pool_stride=temporal_pool_stride,
            dropout=dropout)

        self.m2dcnn = M2DCNN(layers_kernel_sizes=two_d_kernel_sizes,
                             temporal_pool_size=temporal_pool_size,
                             spatial_kernelsize=(1, C),
                             temporal_pool_stride=temporal_pool_stride,
                             dropout=dropout)

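        # Average pooling over the spliced (batch, time, 10) features. With pool_axis=2
        # this pools across the 10 filter maps; pool_axis=1 would pool over time instead.
        self.pool = sb.nnet.pooling.Pooling1d(
            pool_type='avg',
            kernel_size=postnet_poolsize,
            stride=postnet_poolstride,
            pool_axis=2,
        )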

        out_m2d = self.m2dcnn(
            torch.ones((1,) + tuple(input_shape[1:-1]) + (1,))
        ).squeeze(1)

        out_m1d = self.m1dcnn(torch.ones((1,) + tuple(input_shape[1:-1]) + (1,)))

        out = torch.cat((out_m1d, out_m2d), 1)

        dense_input_size = self._num_flat_features(self.pool(out.squeeze(1)))

        self.classification = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(dense_input_size, 100),
            torch.nn.Linear(100, dense_n_neurons),
            # speechbrain.nnet.losses.nll_loss expects log-probabilities, hence LogSoftmax
            torch.nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        """Returns the output of the model.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        # The training script is expected to move x to the model's device
        m1dcnn = self.m1dcnn(x)  # Batch, T_m, Channel(10)

        m2dcnn = self.m2dcnn(x)  # Batch, 1, T_n, Channel(10)
        m2dcnn = m2dcnn.squeeze(1)  # Batch, T_n, Channel(10)

        res = torch.cat([m1dcnn, m2dcnn], dim=1)  # Batch, T_m + T_n, Channel(10)
        res = self.pool(res)

        return self.classification(res)

    def _num_flat_features(self, x):
        """Returns the number of flattened features from a tensor.

        Arguments
        ---------
        x : torch.Tensor
            Input feature map.
        """
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


class M1DCNN(torch.nn.Module):
    """
            M1DCNN
            Arguments
            ---------
            input_shape : tuple
                The shape of the input.
            layers_kernel_sizes : list(int)
                Kernel sizes for the temporal convolutions in M1DCNN block
            temporal_pool_size: int
                Passed to OneDCNN but currently unused there (its max pooling is fixed to kernel size 6).
            temporal_pool_stride: int
                Passed to OneDCNN but currently unused there (its max pooling stride is fixed to 6).
            dropout: float
                Dropout probability.

            Example
            -------
            >>> inp_tensor = torch.rand([1, 200, 32, 1])
            >>> model = M1DCNN(input_shape=inp_tensor.shape[:3], layers_kernel_sizes=[5, 10, 20], temporal_pool_size=6, temporal_pool_stride=6, dropout=0.0)
            >>> output = model(inp_tensor)
            >>> output.shape
            torch.Size([1, 29, 10])
            """
    def __init__(self, layers_kernel_sizes, temporal_pool_size, dropout, temporal_pool_stride, input_shape):
        super().__init__()

        self.layers = torch.nn.ModuleList()

        for i in range(len(layers_kernel_sizes)):
            layer = OneDCNN(
                input_shape=input_shape,
                        cnn_temporal_kernelsize=layers_kernel_sizes[i],
                        dropout=dropout,
                        temporal_pool_size=temporal_pool_size,
                        temporal_pool_stride=temporal_pool_stride)
            self.layers.append(layer)

    def forward(self, x):
        """Returns the output of the M1DCNN block.
        It returns the concatenation of all the 1DCNN blocks within over the time axis

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        concatenated_output = None

        for layer in self.layers:
            layer_output = layer(x)
            if concatenated_output is None:
                concatenated_output = layer_output
            else:
                concatenated_output = torch.cat((concatenated_output, layer_output), dim=1)
        return concatenated_output # Batch, T_n, Channel(10)


class OneDCNN(torch.nn.Module):
    """OneDCNN

                Arguments
                ---------
                input_shape : tuple
                    The shape of the input.
                cnn_temporal_kernelsize : int
                    Kernel size for the temporal convolution in 1DCNN block
                temporal_pool_size: int
                    Currently unused; the max pooling is fixed to kernel size 6.
                temporal_pool_stride: int
                    Currently unused; the max pooling stride is fixed to 6.
                dropout: float
                    Dropout probability.

                Example
                -------
                >>> inp_tensor = torch.rand([1, 200, 32, 1])
                >>> model = OneDCNN(input_shape=inp_tensor.shape[:3], cnn_temporal_kernelsize=5, temporal_pool_size=6, temporal_pool_stride=6, dropout=0.0)
                >>> output = model(inp_tensor)
                >>> output.shape
                torch.Size([1, 10, 10])
                """
    def __init__(self,
                 cnn_temporal_kernelsize,
                 dropout,
                 temporal_pool_size,
                 temporal_pool_stride,
                input_shape):
        super().__init__()
        self.conv1 = sb.nnet.CNN.Conv1d(out_channels=10,
                                     kernel_size=cnn_temporal_kernelsize,
                                     stride=3,
                                     padding="valid",
                                     input_shape=input_shape)

        self.bn1 = sb.nnet.normalization.BatchNorm1d(
            input_size=10, momentum=0.01, affine=True,
        )

        self.dropout1 = torch.nn.Dropout(dropout)

        self.conv2 = torch.nn.Conv1d(10, 10, 3, stride=1, padding=0)

        self.bn2 = sb.nnet.normalization.BatchNorm1d(
            input_size=10, momentum=0.01, affine=True,
        )

        self.dropout2 = torch.nn.Dropout(dropout)

        self.pooling = sb.nnet.pooling.Pooling1d(
            pool_type='max',
            kernel_size=6,
            stride=6,
            pool_axis=2,
        )

        self.activation = torch.nn.ReLU()

    def forward(self, x):
        """Returns the output of the 1DCNN block.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        # You have a tensor with shape (B, Time, Channel, 1)
        # Squeeze the tensor to remove the dimension of size 1
        x_reshaped = x.squeeze(-1)

        out = self.activation(self.dropout1(self.bn1(self.conv1(x_reshaped)))).transpose(1,2)

        out = self.activation(self.dropout2(self.bn2(self.conv2(out).transpose(1,2)))).transpose(1,2)

        out = self.pooling(out)  # Batch, Channel(10), Time
        return out.transpose(1, 2)  # Batch, Time, Channel(10)


class M2DCNN(torch.nn.Module):
    """
                M2DCNN
                Arguments
                ---------
                layers_kernel_sizes : list(tuple)
                    (k, 1) kernel sizes for the temporal convolutions in M2DCNN block
                temporal_pool_size: tuple
                    Pool size for 2DCNN average pooling.
                temporal_pool_stride: tuple
                    Pool stride for 2DCNN average pooling.
                dropout: float
                    Dropout probability.
                spatial_kernelsize: tuple
                    Kernel size of the spatial convolution, (1, number of EEG channels).

                Example
                -------
                >>> inp_tensor = torch.rand([1, 200, 32, 1])
                >>> model = M2DCNN(layers_kernel_sizes=[(5, 1), (10, 1), (20, 1)], temporal_pool_size=(6, 1), temporal_pool_stride=(6, 1), dropout=0.0, spatial_kernelsize=(1, 32))
                >>> output = model(inp_tensor)
                >>> output.shape
                torch.Size([1, 1, 93, 10])
                """
    def __init__(self, layers_kernel_sizes,
                 temporal_pool_size,
                 temporal_pool_stride,
                 dropout,
                 spatial_kernelsize=(1, 10),
                 ):
        super().__init__()

        self.layers = torch.nn.ModuleList()

        for i in range(len(layers_kernel_sizes)):
            layer = TwoDCNN(
                cnn_temporal_kernelsize=layers_kernel_sizes[i],
                dropout=dropout,
                temporal_pool_size=temporal_pool_size,
                cnn_spatial_kernelsize=spatial_kernelsize,
                temporal_pool_stride=temporal_pool_stride)
            self.layers.append(layer)

    def forward(self, x):
        """Returns the output of the M2DCNN.
        It returns the concatenation of all the 2DCNN blocks within over the time axis

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        concatenated_output = None
        for layer in self.layers:
            layer_output = layer(x)  # Batch, T_n, 1, Channel(10)
            layer_output = layer_output.transpose(1, 2)  # Batch, 1, T_n, Channel(10)
            if concatenated_output is None:
                concatenated_output = layer_output
            else:
                concatenated_output = torch.cat((concatenated_output, layer_output), dim=2)
        return concatenated_output


class TwoDCNN(torch.nn.Module):
    """TwoDCNN

                    Arguments
                    ---------
                    cnn_temporal_kernelsize : tuple
                        (k, 1) kernel size for the temporal convolution in 2DCNN block
                    cnn_spatial_kernelsize: tuple
                        Kernel size of the 2d spatial convolution.
                    temporal_pool_size: tuple
                        Pool size for 2DCNN average pooling.
                    temporal_pool_stride: tuple
                        Pool stride for 2DCNN average pooling.
                    dropout: float
                        Dropout probability.

                    Example
                    -------
                    >>> inp_tensor = torch.rand([1, 200, 32, 1])
                    >>> model = TwoDCNN(cnn_temporal_kernelsize=(5, 1), cnn_spatial_kernelsize=(1, 32), dropout=0.0, temporal_pool_size=(6, 1), temporal_pool_stride=(6, 1))
                    >>> output = model(inp_tensor)
                    >>> output.shape
                    torch.Size([1, 32, 1, 10])
                    """
    def __init__(self,
                 cnn_temporal_kernelsize,
                 cnn_spatial_kernelsize,
                 dropout,
                 temporal_pool_size,
                 temporal_pool_stride):
        super().__init__()

        # CONVOLUTIONAL MODULE
        self.conv_module = torch.nn.Sequential()
        # Temporal convolution
        self.conv_module.add_module(
            "conv_0",
            sb.nnet.CNN.Conv2d(
                in_channels=1,
                out_channels=10,
                kernel_size=cnn_temporal_kernelsize,
                padding="valid",
                bias=True,
                swap=True,
            ),
        )

        self.conv_module.add_module(
            "bnorm_1",
            sb.nnet.normalization.BatchNorm2d(
                input_size=10, momentum=0.1, affine=True,
            ),
        )

        self.conv_module.add_module(
            "dropout_1", torch.nn.Dropout(p=dropout),
        )

        self.conv_module.add_module(
            "relu_1", torch.nn.ReLU(),
        )

        # Spatial convolution
        self.conv_module.add_module(
            "conv_1",
            sb.nnet.CNN.Conv2d(
                in_channels=10,
                out_channels=10,
                kernel_size=cnn_spatial_kernelsize,
                padding="valid",
                bias=False,
                swap=True,
            ),
        )
        self.conv_module.add_module(
            "bnorm_2",
            sb.nnet.normalization.BatchNorm2d(
                input_size=10, momentum=0.1, affine=True,
            ),
        )

        self.conv_module.add_module(
            "dropout_2", torch.nn.Dropout(p=dropout),
        )

        self.conv_module.add_module(
            "relu_2", torch.nn.ReLU(),
        )

        self.conv_module.add_module(
            "pool_1",
            sb.nnet.pooling.Pooling2d(
                pool_type='avg',
                kernel_size=temporal_pool_size,
                stride=temporal_pool_stride,
                pool_axis=[1, 2],
            ),
        )


    def forward(self, x):
        """Returns the output of the 2DCNN block.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        result = self.conv_module(x)  # Batch, T_n, 1, Channel(10)
        return result




#### Hyperparameters

##### First Experiment
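Several entries in the configuration below are computed with `!apply` and `!ref` expressions. The plain-Python sketch reproduces that arithmetic by hand for this first experiment so the resulting values are visible; it mirrors the YAML formulas and is not how hyperpyyaml evaluates them. `n_train_examples` is a placeholder that the training script replaces, so `step_size` here reflects the placeholder value only.

```python
import math

# Values copied from MSHCNN_first.yaml
sample_rate, tmin, tmax = 125, 0.0, 4.0
step_size_multiplier, n_train_examples, batch_size = 5, 100, 20
shift_delta_ = 25
amp_delta = 0.008079
snr_white_low, snr_white_delta = 15.0, 5.49
lr, base_lr = 0.001, 0.001

T = math.ceil(sample_rate * (tmax - tmin))                               # samples per epoch
step_size = round(step_size_multiplier * n_train_examples / batch_size)  # iterations per half cycle
shift_delta = 1e-2 * shift_delta_                                        # seconds
min_shift = math.floor(0 - sample_rate * shift_delta)                    # samples
max_shift = math.floor(0 + sample_rate * shift_delta)                    # samples
amp_low, amp_high = 1 - amp_delta, 1 + amp_delta                         # gain factors
snr_white_high = snr_white_low + snr_white_delta                         # dB

if __name__ == "__main__":
    print("T =", T, "| step_size =", step_size)
    print("time shift range (samples):", min_shift, max_shift)
    print("amplitude gain range:", amp_low, amp_high)
    print("white-noise SNR range (dB):", snr_white_low, snr_white_high)
    print("cyclic LR range:", base_lr, lr)
```

`T` evaluates to 500, `step_size` to 25, and the shift range to -32..31 samples (about 250 ms in either direction). Because `base_lr` and `max_lr` (which references `lr`) are both 0.001, the cyclic schedule keeps the learning rate constant at 0.001 in this experiment.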

In [None]:
%%file /notebooks/benchmarks/benchmarks/MOABB/hparams/MotorImagery/BNCI2014001/MSHCNN_first.yaml

seed: 1234
__set_torchseed: !apply:torch.manual_seed [!ref <seed>]

# DIRECTORIES
data_folder: !PLACEHOLDER  #'/path/to/dataset'. The dataset will be automatically downloaded in this folder
cached_data_folder: !PLACEHOLDER #'path/to/pickled/dataset'
output_folder: !PLACEHOLDER #'path/to/results'

# DATASET HPARS
# Defining the MOABB dataset.
dataset: !new:moabb.datasets.BNCI2014001
save_prepared_dataset: True # set to True if you want to save the prepared dataset as a pkl file to load and use afterwards
data_iterator_name: !PLACEHOLDER
target_subject_idx: !PLACEHOLDER
target_session_idx: !PLACEHOLDER
events_to_load: null # all events will be loaded
original_sample_rate: 250 # Original sampling rate provided by dataset authors
sample_rate: 125 # Target sampling rate (Hz)
# band-pass filtering cut-off frequencies
fmin: 0.11 # @orion_step1: --fmin~"uniform(0.1, 5, precision=2)"
fmax: 50.0 # @orion_step1: --fmax~"uniform(20.0, 50.0, precision=3)"
n_classes: 4
# tmin, tmax respect to stimulus onset that define the interval attribute of the dataset class
# trial begins (0 s), cue (2 s, 1.25 s long); each trial is 6 s long
# dataset interval starts from 2
# -->tmin tmax are referred to this start value (e.g., tmin=0.5 corresponds to 2.5 s)
tmin: 0.
tmax: 4.0 # @orion_step1: --tmax~"uniform(1.0, 4.0, precision=2)"
# number of steps used when selecting adjacent channels from a seed channel (default at Cz)
n_steps_channel_selection: 2 # @orion_step1: --n_steps_channel_selection~"uniform(1, 3,discrete=True)"
T: !apply:math.ceil
    - !ref <sample_rate> * (<tmax> - <tmin>)
C: 22
# We here specify how to perform test:
# - If test_with: 'last' we perform test with the latest model.
# - if test_with: 'best', we perform test with the best model (according to the metric specified in test_key)
# The variable avg_models can be used to average the parameters of the last (or best) N saved models before testing.
# This can have a regularization effect. If avg_models: 1, the last (or best) model is used directly.
test_with: 'last' # 'last' or 'best'
test_key: "acc" # Possible opts: "loss", "f1", "auc", "acc"

# METRICS
f1: !name:sklearn.metrics.f1_score
    average: 'macro'
acc: !name:sklearn.metrics.balanced_accuracy_score
cm: !name:sklearn.metrics.confusion_matrix
metrics:
    f1: !ref <f1>
    acc: !ref <acc>
    cm: !ref <cm>
# TRAINING HPARS
n_train_examples: 100  # it will be replaced in the train script
# checkpoints to average
avg_models: 1 # @orion_step1: --avg_models~"uniform(1, 15,discrete=True)"
number_of_epochs: 150 # @orion_step1: --number_of_epochs~"uniform(250, 1000, discrete=True)"
lr: 0.001 # @orion_step1: --lr~"choices([0.01, 0.005, 0.001, 0.0005, 0.0001])"
# Learning rate scheduling (cyclic learning rate is used here)
max_lr: !ref <lr> # Upper bound of the cycle (max value of the lr)
base_lr: 0.001 # Lower bound in the cycle (min value of the lr)
step_size_multiplier: 5 #from 2 to 8
step_size: !apply:round
    - !ref <step_size_multiplier> * <n_train_examples> / <batch_size>
lr_annealing: !new:speechbrain.nnet.schedulers.CyclicLRScheduler
    base_lr: !ref <base_lr>
    max_lr: !ref <max_lr>
    step_size: !ref <step_size>
label_smoothing: 0.0
loss: !name:speechbrain.nnet.losses.nll_loss
    label_smoothing: !ref <label_smoothing>
optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter  # epoch counter
    limit: !ref <number_of_epochs>
#batch_size_exponent: 6 # @orion_step1: --batch_size_exponent~"uniform(4, 6,discrete=True)"
batch_size: 20
valid_ratio: 0.2

# DATA AUGMENTATION
# cutcat (disabled when min_num_segments=max_num_segments=1)
max_num_segments: 3 # @orion_step2: --max_num_segments~"uniform(2, 6, discrete=True)"
cutcat: !new:speechbrain.augment.time_domain.CutCat
    min_num_segments: 2
    max_num_segments: !ref <max_num_segments>
# random amplitude gain in [1 - amp_delta, 1 + amp_delta] (disabled when amp_delta=0.)
amp_delta: 0.008079 # @orion_step2: --amp_delta~"uniform(0.0, 0.5)"
rand_amp: !new:speechbrain.augment.time_domain.RandAmp
    amp_low: !ref 1 - <amp_delta>
    amp_high: !ref 1 + <amp_delta>
# random shifts of up to +/- shift_delta seconds, i.e. about +/-250 ms here (disabled when shift_delta=0.)
shift_delta_: 25 # orion_step2: --shift_delta_~"uniform(0, 25, discrete=True)"
shift_delta: !ref 1e-2 * <shift_delta_> # 0.250 # 0.-0.25 with steps of 0.01
min_shift: !apply:math.floor
    - !ref 0 - <sample_rate> * <shift_delta>
max_shift: !apply:math.floor
    - !ref 0 + <sample_rate> * <shift_delta>
time_shift: !new:speechbrain.augment.freq_domain.RandomShift
    min_shift: !ref <min_shift>
    max_shift: !ref <max_shift>
    dim: 1
# injection of gaussian white noise
snr_white_low: 15.0 # @orion_step2: --snr_white_low~"uniform(0.0, 15, precision=2)"
snr_white_delta: 5.49 # @orion_step2: --snr_white_delta~"uniform(5.0, 20.0, precision=3)"
snr_white_high: !ref <snr_white_low> + <snr_white_delta>
add_noise_white: !new:speechbrain.augment.time_domain.AddNoise
    snr_low: !ref <snr_white_low>
    snr_high: !ref <snr_white_high>

repeat_augment: 1 # @orion_step1: --repeat_augment 0
augment: !new:speechbrain.augment.augmenter.Augmenter
    parallel_augment: True
    concat_original: True
    parallel_augment_fixed_bs: True
    repeat_augment: !ref <repeat_augment>
    shuffle_augmentations: True
    min_augmentations: 4
    max_augmentations: 4
    augmentations: [
        !ref <cutcat>,
        !ref <rand_amp>,
        !ref <time_shift>,
        !ref <add_noise_white>]

# DATA NORMALIZATION
dims_to_normalize: 1 # 1 (time) or 2 (EEG channels)
normalize: !name:speechbrain.processing.signal_processing.mean_std_norm
    dims: !ref <dims_to_normalize>
# MODEL
input_shape: [null, !ref <T>, !ref <C>, null]
cnn_temporal_kernels: 54 # @orion_step1: --cnn_temporal_kernels~"uniform(4, 64,discrete=True)"
cnn_spatial_kernels: !ref <cnn_temporal_kernels>
cnn_temporal_kernelsize: 6 # @orion_step1: --cnn_temporal_kernelsize~"uniform(5, 62,discrete=True)"
cnn_temporal_pool_stride: 6

# pool size / stride from 4/125 ms to 40/125 ms = circa 30 ms
#cnn_poolsize: !ref <cnn_poolsize_> * 4 # same resolution as for EEGNet research space
#cnn_poolstride: !ref <cnn_poolstride_> * 4 # same resolution as for EEGNet research space
dropout: 0.25 # @orion_step1: --dropout~"uniform(0.0, 0.5)"
one_d_cnn_temporal_kernels: [40,70,85]
two_d_cnn_temporal_kernels: [45,60,90]

postnet_poolsize: 8
postnet_poolstride: 8

model: !new:models.MSHCNN.MSHCNN
    input_shape: !ref <input_shape>
    one_d_kernel_sizes: !ref <one_d_cnn_temporal_kernels>
    two_d_kernel_sizes: !ref <two_d_cnn_temporal_kernels>
    temporal_pool_size: [!ref <cnn_temporal_kernelsize>, 1]
    temporal_pool_stride: [!ref <cnn_temporal_pool_stride>, 1]
    postnet_poolsize: !ref <postnet_poolsize>
    postnet_poolstride: !ref <postnet_poolstride>
    dropout: !ref <dropout>
    dense_n_neurons: !ref <n_classes>


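The learning-rate settings above can be checked with a few lines of arithmetic. The sketch assumes that `CyclicLRScheduler` follows the standard triangular policy of Smith (2017) in its default mode. The function below re-derives that policy for illustration and is not SpeechBrain's code. `n_train_examples=240` is a hypothetical value, because the training script overwrites it. `max_lr` refers to `lr` (0.001) and `base_lr` is also 0.001, so under this assumption the cycle has zero amplitude and the learning rate stays constant.

```python
import math


def cyclic_step_size(step_size_multiplier, n_train_examples, batch_size):
    """Half-cycle length in iterations, mirroring the YAML `step_size` expression."""
    return round(step_size_multiplier * n_train_examples / batch_size)


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate (Smith, 2017) at a given iteration."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Hypothetical training-set size; the real value is set by the training script.
step = cyclic_step_size(step_size_multiplier=5, n_train_examples=240, batch_size=20)
print("step_size:", step)  # round(5 * 240 / 20) = 60

# Config values: base_lr == max_lr == 0.001, so every iteration gets 0.001.
print([triangular_lr(i, 0.001, 0.001, step) for i in (0, step // 2, step, 2 * step)])

# With a lower base_lr the rate would actually cycle: base -> max -> base.
print([triangular_lr(i, 0.0001, 0.001, step) for i in (0, step, 2 * step)])
```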


**Training Script**

In [None]:
%cd /notebooks/benchmarks/benchmarks/MOABB
!./run_experiments.sh --hparams hparams/MotorImagery/BNCI2014001/MSHCNN_first.yaml --data_folder eeg_data --output_folder results/MotorImagery/BNCI2014001/MSHCNN-first --nsbj 9 --nsess 2 --nruns 10 --train_mode leave-one-session-out --device=cuda
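The command above trains with `--train_mode leave-one-session-out` on 9 subjects, 2 sessions and 10 runs. The sketch below shows what that protocol implies. It assumes the script trains one model per (run, subject, held-out session) combination and holds out a `valid_ratio` fraction of the training data for validation. BNCI2014001 has 288 trials per session (72 per class). Under these assumptions, each fold trains on 230 trials, validates on 58 and tests on 288. The function `leave_one_session_out` and the session names are hypothetical illustrations, not the benchmark's data iterator.

```python
import random
from itertools import product


def leave_one_session_out(sessions, valid_ratio, seed):
    """Yield (held_out, train, valid, test) trial-id lists, one fold per session.

    `sessions` maps a session name to its trial ids (hypothetical layout).
    """
    rng = random.Random(seed)
    for held_out in sessions:
        # All trials from the other sessions form the training pool.
        pool = [t for name, trials in sessions.items() if name != held_out for t in trials]
        rng.shuffle(pool)
        n_valid = round(valid_ratio * len(pool))
        yield held_out, pool[n_valid:], pool[:n_valid], list(sessions[held_out])


# One subject: two sessions of 288 trials each (72 trials x 4 classes).
sessions = {"session_1": list(range(288)), "session_2": list(range(288, 576))}
for held_out, train, valid, test in leave_one_session_out(sessions, valid_ratio=0.2, seed=1234):
    print(held_out, len(train), len(valid), len(test))

# Number of separate trainings implied by --nruns 10 --nsbj 9 --nsess 2.
n_trainings = len(list(product(range(10), range(9), range(2))))
print("trainings:", n_trainings)
```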

##### Using the Kernel Sizes Defined by Jia et al. [4]

In [None]:
%%file /notebooks/benchmarks/benchmarks/MOABB/hparams/MotorImagery/BNCI2014001/MSHCNN_first.yaml

seed: 1234
__set_torchseed: !apply:torch.manual_seed [!ref <seed>]

# DIRECTORIES
data_folder: !PLACEHOLDER  #'/path/to/dataset'. The dataset will be automatically downloaded in this folder
cached_data_folder: !PLACEHOLDER #'path/to/pickled/dataset'
output_folder: !PLACEHOLDER #'path/to/results'

# DATASET HPARS
# Defining the MOABB dataset.
dataset: !new:moabb.datasets.BNCI2014001
save_prepared_dataset: True # set to True if you want to save the prepared dataset as a pkl file to load and use afterwards
data_iterator_name: !PLACEHOLDER
target_subject_idx: !PLACEHOLDER
target_session_idx: !PLACEHOLDER
events_to_load: null # all events will be loaded
original_sample_rate: 250 # Original sampling rate provided by dataset authors
sample_rate: 125 # Target sampling rate (Hz)
# band-pass filtering cut-off frequencies
fmin: 0.11 # @orion_step1: --fmin~"uniform(0.1, 5, precision=2)"
fmax: 50.0 # @orion_step1: --fmax~"uniform(20.0, 50.0, precision=3)"
n_classes: 4
# tmin, tmax respect to stimulus onset that define the interval attribute of the dataset class
# trial begins (0 s), cue (2 s, 1.25 s long); each trial is 6 s long
# dataset interval starts from 2
# -->tmin tmax are referred to this start value (e.g., tmin=0.5 corresponds to 2.5 s)
tmin: 0.
tmax: 4.0 # @orion_step1: --tmax~"uniform(1.0, 4.0, precision=2)"
# number of steps used when selecting adjacent channels from a seed channel (default at Cz)
n_steps_channel_selection: 2 # @orion_step1: --n_steps_channel_selection~"uniform(1, 3,discrete=True)"
T: !apply:math.ceil
    - !ref <sample_rate> * (<tmax> - <tmin>)
C: 22
# We here specify how to perform test:
# - If test_with: 'last' we perform test with the latest model.
# - If test_with: 'best', we perform test with the best model (according to the metric specified in test_key)
# The variable avg_models can be used to average the parameters of the last (or best) N saved models before testing.
# This can have a regularization effect. If avg_models: 1, the last (or best) model is used directly.
test_with: 'last' # 'last' or 'best'
test_key: "acc" # Possible opts: "loss", "f1", "auc", "acc"

# METRICS
f1: !name:sklearn.metrics.f1_score
    average: 'macro'
acc: !name:sklearn.metrics.balanced_accuracy_score
cm: !name:sklearn.metrics.confusion_matrix
metrics:
    f1: !ref <f1>
    acc: !ref <acc>
    cm: !ref <cm>
# TRAINING HPARS
n_train_examples: 100  # it will be replaced in the train script
# checkpoints to average
avg_models: 1 # @orion_step1: --avg_models~"uniform(1, 15,discrete=True)"
number_of_epochs: 100 # @orion_step1: --number_of_epochs~"uniform(250, 1000, discrete=True)"
lr: 0.001 # @orion_step1: --lr~"choices([0.01, 0.005, 0.001, 0.0005, 0.0001])"
# Learning rate scheduling (cyclic learning rate is used here)
max_lr: !ref <lr> # Upper bound of the cycle (max value of the lr)
base_lr: 0.001 # Lower bound in the cycle (min value of the lr)
step_size_multiplier: 5 #from 2 to 8
step_size: !apply:round
    - !ref <step_size_multiplier> * <n_train_examples> / <batch_size>
lr_annealing: !new:speechbrain.nnet.schedulers.CyclicLRScheduler
    base_lr: !ref <base_lr>
    max_lr: !ref <max_lr>
    step_size: !ref <step_size>
label_smoothing: 0.0
loss: !name:speechbrain.nnet.losses.nll_loss
    label_smoothing: !ref <label_smoothing>
optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter  # epoch counter
    limit: !ref <number_of_epochs>
batch_size_exponent: 4 # @orion_step1: --batch_size_exponent~"uniform(4, 6,discrete=True)"
batch_size: !ref 2 ** <batch_size_exponent>
valid_ratio: 0.2

# DATA AUGMENTATION
# cutcat (disabled when min_num_segments=max_num_segments=1)
max_num_segments: 3 # @orion_step2: --max_num_segments~"uniform(2, 6, discrete=True)"
cutcat: !new:speechbrain.augment.time_domain.CutCat
    min_num_segments: 2
    max_num_segments: !ref <max_num_segments>
# random amplitude gain between 1 - amp_delta and 1 + amp_delta (disabled when amp_delta=0.)
amp_delta: 0.008079 # @orion_step2: --amp_delta~"uniform(0.0, 0.5)"
rand_amp: !new:speechbrain.augment.time_domain.RandAmp
    amp_low: !ref 1 - <amp_delta>
    amp_high: !ref 1 + <amp_delta>
# random shifts of up to +/- shift_delta seconds, i.e. about +/-250 ms here (disabled when shift_delta=0.)
shift_delta_: 25 # orion_step2: --shift_delta_~"uniform(0, 25, discrete=True)"
shift_delta: !ref 1e-2 * <shift_delta_> # 0.250 # 0.-0.25 with steps of 0.01
min_shift: !apply:math.floor
    - !ref 0 - <sample_rate> * <shift_delta>
max_shift: !apply:math.floor
    - !ref 0 + <sample_rate> * <shift_delta>
time_shift: !new:speechbrain.augment.freq_domain.RandomShift
    min_shift: !ref <min_shift>
    max_shift: !ref <max_shift>
    dim: 1
# injection of gaussian white noise
snr_white_low: 15.0 # @orion_step2: --snr_white_low~"uniform(0.0, 15, precision=2)"
snr_white_delta: 5.49 # @orion_step2: --snr_white_delta~"uniform(5.0, 20.0, precision=3)"
snr_white_high: !ref <snr_white_low> + <snr_white_delta>
add_noise_white: !new:speechbrain.augment.time_domain.AddNoise
    snr_low: !ref <snr_white_low>
    snr_high: !ref <snr_white_high>

repeat_augment: 1 # @orion_step1: --repeat_augment 0
augment: !new:speechbrain.augment.augmenter.Augmenter
    parallel_augment: True
    concat_original: True
    parallel_augment_fixed_bs: True
    repeat_augment: !ref <repeat_augment>
    shuffle_augmentations: True
    min_augmentations: 4
    max_augmentations: 4
    augmentations: [
        !ref <cutcat>,
        !ref <rand_amp>,
        !ref <time_shift>,
        !ref <add_noise_white>]

# DATA NORMALIZATION
dims_to_normalize: 1 # 1 (time) or 2 (EEG channels)
normalize: !name:speechbrain.processing.signal_processing.mean_std_norm
    dims: !ref <dims_to_normalize>
# MODEL
input_shape: [null, !ref <T>, !ref <C>, null]
cnn_temporal_kernelsize: 6 # @orion_step1: --cnn_temporal_kernelsize~"uniform(5, 62,discrete=True)"
cnn_temporal_pool_stride: 6

# pool size / stride from 4/125 ms to 40/125 ms = circa 30 ms
#cnn_poolsize: !ref <cnn_poolsize_> * 4 # same resolution as for EEGNet research space
#cnn_poolstride: !ref <cnn_poolstride_> * 4 # same resolution as for EEGNet research space
dropout: 0.25 # @orion_step1: --dropout~"uniform(0.0, 0.5)"
one_d_cnn_temporal_kernels: [15,85]
two_d_cnn_temporal_kernels: [45,105]

postnet_poolsize: 8
postnet_poolstride: 8

model: !new:models.MSHCNN.MSHCNN
    input_shape: !ref <input_shape>
    one_d_kernel_sizes: !ref <one_d_cnn_temporal_kernels>
    two_d_kernel_sizes: !ref <two_d_cnn_temporal_kernels>
    temporal_pool_size: [!ref <cnn_temporal_kernelsize>, 1]
    temporal_pool_stride: [!ref <cnn_temporal_pool_stride>, 1]
    postnet_poolsize: !ref <postnet_poolsize>
    postnet_poolstride: !ref <postnet_poolstride>
    dropout: !ref <dropout>
    dense_n_neurons: !ref <n_classes>


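Several values in these configs are derived rather than set directly. The sketch below recomputes them in plain Python from the YAML expressions. It also converts the temporal kernel sizes from samples to milliseconds at the 125 Hz target rate, which makes the time scales of the two kernel sets easier to compare. It only mirrors the arithmetic; hyperpyyaml evaluates the real expressions.

```python
import math

sample_rate = 125            # Hz (target rate after resampling)
tmin, tmax = 0.0, 4.0        # s, relative to the start of the dataset interval

T = math.ceil(sample_rate * (tmax - tmin))       # samples per trial
batch_size = 2 ** 4                              # batch_size_exponent: 4
shift_delta = 1e-2 * 25                          # shift_delta_: 25 -> 0.25 s
min_shift = math.floor(0 - sample_rate * shift_delta)
max_shift = math.floor(0 + sample_rate * shift_delta)


def kernel_ms(kernel_samples, fs=sample_rate):
    """Temporal extent of a convolution kernel in milliseconds."""
    return 1000.0 * kernel_samples / fs


kernel_sets = {
    "first config, 1D": [40, 70, 85],
    "first config, 2D": [45, 60, 90],
    "Jia et al., 1D": [15, 85],
    "Jia et al., 2D": [45, 105],
}
print("T:", T, "batch_size:", batch_size, "shift range (samples):", (min_shift, max_shift))
for name, kernels in kernel_sets.items():
    print(name, [kernel_ms(k) for k in kernels])
```

Because of `math.floor`, the shift range comes out as [-32, 31] samples, asymmetric by one sample. This is harmless but worth knowing when reading the augmentation settings.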


**Training Script**

In [None]:
%cd /notebooks/benchmarks/benchmarks/MOABB
!./run_experiments.sh --hparams hparams/MotorImagery/BNCI2014001/MSHCNN_first.yaml --data_folder eeg_data --output_folder results/MotorImagery/BNCI2014001/MSHCNN-first --nsbj 9 --nsess 2 --nruns 10 --train_mode leave-one-session-out --device=cuda

**Hyperparameter Tuning:**

In [None]:
!./run_hparam_optimization.sh --exp_name 'MSHCNN_BNCI2014001_hopt_' \
                             --output_folder results/MotorImagery/BNCI2014001/MSHCNN/hopt \
                             --data_folder eeg_data/ \
                             --hparams hparams/MotorImagery/BNCI2014001/MSHCNN_first.yaml \
                             --nsbj 9 --nsess 2 \
                             --nsbj_hpsearch 9 --nsess_hpsearch 2 \
                             --nruns 1 \
                             --nruns_eval 10 \
                             --eval_metric acc \
                             --train_mode leave-one-session-out \
                             --exp_max_trials 5

#### MSHCNN Results

Extensive experimentation with MSHCNN produced inconclusive results. Performance depended heavily on the subject and on the temporal convolution kernel sizes. In some experiments neither validation nor test accuracy exceeded 30%, while in others validation accuracy reached 80%. The model is extremely sensitive to kernel sizes and fails to converge for many individual sizes and combinations of sizes. The screenshot below shows the result of the hyperparameter tuning. This accuracy is about half of what Tang et al. [2] reported for MSHCNN. Their results were obtained on left-hand and right-hand labels only, without the foot and tongue classes. This leads me to conclude that the model as proposed is suited to binary classification rather than to the four-class task of this project.

<div>
<img src="https://drive.google.com/uc?export=view&id=1LyfJsFA-Wp7CCJT44TuXdsMemxiZXRak" width="800"/>
</div>


### M2DCNN Classification

After my experiments with MSHCNN, I wanted to see how the M1DCNN and M2DCNN blocks would perform on their own for classification. M2DCNN is structurally very similar to ShallowConvNet [3], except that it lacks the squaring and logarithm functions before classification. I therefore added them to M2DCNN and experimented with what is essentially ShallowConvNet, but with the outputs of several temporal kernel sizes concatenated together.
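In ShallowConvNet [3], squaring, average pooling and a logarithm together approximate the log-power of each filtered signal within a pooling window. This is a classic band-power feature for motor imagery. The pure-Python sketch below applies that chain to a one-dimensional signal. `log_power_features` is a hypothetical helper written for illustration. It clamps before the log in the same way as the `Log` layer in the model code.

```python
import math


def log_power_features(signal, pool_size, pool_stride, eps=1e-6):
    """Square -> average-pool -> log, applied to a 1-D signal."""
    squared = [v * v for v in signal]
    pooled = [
        sum(squared[i:i + pool_size]) / pool_size
        for i in range(0, len(squared) - pool_size + 1, pool_stride)
    ]
    return [math.log(max(p, eps)) for p in pooled]


fs = 125                                   # Hz
t = [n / fs for n in range(250)]           # 2 s of samples
weak = [1.0 * math.sin(2 * math.pi * 12.5 * s) for s in t]
strong = [2.0 * math.sin(2 * math.pi * 12.5 * s) for s in t]

# Windows of 50 samples hold exactly five periods of a 12.5 Hz sine,
# so each pooled value is the mean power A**2 / 2.
print(log_power_features(weak, 50, 50))    # about log(0.5) per window
print(log_power_features(strong, 50, 50))  # about log(2.0) per window
```

Doubling the amplitude shifts every feature by log(4), so the classifier sees power differences as additive offsets.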
<br>

#### Model

In [None]:
%%file /notebooks/benchmarks/benchmarks/MOABB/models/M2DCNNClassifier.py

"""
Authors
 * Salman Sami Hussain Ali, 2024
"""

import torch
import speechbrain as sb

class M2DCNNClassifier(torch.nn.Module):
    """M2DCNN Classifier Model.
        Experimentation of ShallowConvNet with different temporal kernel sizes concatenated together

        Arguments
        ---------
        input_shape : tuple
            The shape of the input (batch, time, EEG channel, 1).
        temporal_kernel_sizes : list(int)
            Kernel sizes (in samples) for the temporal convolutions in the M2DCNN block.
        temporal_pool_size : tuple
            Pool size of the average pooling inside each M2DCNN branch.
        temporal_pool_stride : tuple
            Pool stride of the average pooling inside each M2DCNN branch.
        postnet_poolsize : int
            Pool size of the average pooling applied after the M2DCNN block.
        postnet_poolstride : int
            Pool stride of the average pooling applied after the M2DCNN block.
        dropout : float
            Dropout probability.
        dense_n_neurons : int
            Number of output neurons (classes).

        Example
        -------
        >>> inp_tensor = torch.rand([1, 200, 32, 1])
        >>> model = M2DCNNClassifier(input_shape=inp_tensor.shape)
        >>> output = model(inp_tensor)
        >>> output.shape
        torch.Size([1, 4])
        """
    def __init__(self,
                 input_shape=None,
                 temporal_kernel_sizes=(60, 80),
                 temporal_pool_size=(6, 1),
                 temporal_pool_stride=(6, 1),
                 postnet_poolsize=8,
                 postnet_poolstride=8,
                 dropout=0.25,
                 dense_n_neurons=4):
        super().__init__()

        C = input_shape[2]

        # Temporal kernels span time only, i.e. (k, 1). Building a new list
        # avoids mutating the caller's list (or a shared default argument).
        temporal_kernel_sizes = [(k, 1) for k in temporal_kernel_sizes]

        self.m2dcnn = Shallow_M2DCNN(layers_kernel_sizes=temporal_kernel_sizes,
                             temporal_pool_size=temporal_pool_size,
                             spatial_kernelsize=(1, C),
                             temporal_pool_stride=temporal_pool_stride,
                             dropout=dropout)

        self.pool = sb.nnet.pooling.Pooling1d(
            pool_type='avg',
            kernel_size=postnet_poolsize,  # (1, kernel_avg_pool),
            stride=postnet_poolstride,  # (1, stride_avg_pool),
            pool_axis=2,
        )

        # Dummy forward pass to infer the number of features fed to the dense layers.
        out_m2d = self.m2dcnn(
            torch.ones((1,) + tuple(input_shape[1:-1]) + (1,))
        ).squeeze(1)  # 1, T_n, 10

        dense_input_size = self._num_flat_features(self.pool(out_m2d))

        self.classification = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(dense_input_size, 100),
            torch.nn.Linear(100, dense_n_neurons),
            # speechbrain's nll_loss expects log-probabilities, hence LogSoftmax.
            torch.nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        """Returns the output of the model.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        # The caller (e.g. the training script) places x on the model's device.
        m2dcnn = self.m2dcnn(x)  # Batch, 1, T_n, 10
        m2dcnn = m2dcnn.squeeze(1)  # Batch, T_n, 10

        res = self.pool(m2dcnn)  # average-pools the last axis (the 10 filters)

        return self.classification(res)

    def _num_flat_features(self, x):
        """Returns the number of flattened features from a tensor.

        Arguments
        ---------
        x : torch.Tensor
            Input feature map.
        """
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


class Shallow_M2DCNN(torch.nn.Module):
    """M2DCNN altered for ShallowConvNet.

        Arguments
        ---------
        layers_kernel_sizes : list(tuple)
            Temporal kernel sizes, one (k, 1) tuple per 2DCNN branch.
        temporal_pool_size : tuple
            Pool size for the 2DCNN average pooling.
        temporal_pool_stride : tuple
            Pool stride for the 2DCNN average pooling.
        dropout : float
            Dropout probability.
        spatial_kernelsize : tuple
            Kernel size of the spatial convolution, (1, number of EEG channels).

        Example
        -------
        >>> inp_tensor = torch.rand([1, 200, 32, 1])
        >>> model = Shallow_M2DCNN(layers_kernel_sizes=[(5, 1), (10, 1), (20, 1)],
        ...     temporal_pool_size=(6, 1), temporal_pool_stride=(6, 1),
        ...     dropout=0.0, spatial_kernelsize=(1, 32))
        >>> output = model(inp_tensor)
        >>> output.shape
        torch.Size([1, 1, 93, 10])
        """
    def __init__(self, layers_kernel_sizes,
                 temporal_pool_size,
                 temporal_pool_stride,
                 dropout,
                 spatial_kernelsize=(1, 10),
                 ):
        super().__init__()

        self.layers = torch.nn.ModuleList()

        for i in range(len(layers_kernel_sizes)):
            layer = Shallow_TwoDCNN(
                cnn_temporal_kernelsize=layers_kernel_sizes[i],
                dropout=dropout,
                temporal_pool_size=temporal_pool_size,
                cnn_spatial_kernelsize=spatial_kernelsize,
                temporal_pool_stride=temporal_pool_stride)
            self.layers.append(layer)

    def forward(self, x):
        """Returns the output of the M2DCNN: the outputs of all the 2DCNN
        branches concatenated along the time axis.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        concatenated_output = None
        for layer in self.layers:
            layer_output = layer(x)  # Batch, T_n, 1, 10
            layer_output = layer_output.transpose(1, 2)  # Batch, 1, T_n, 10
            if concatenated_output is None:
                concatenated_output = layer_output  # Batch, 1, T_n, 10
            else:
                concatenated_output = torch.cat((concatenated_output, layer_output), dim=2)
        return concatenated_output


class Shallow_TwoDCNN(torch.nn.Module):
    """TwoDCNN altered for ShallowConvNet.

        Arguments
        ---------
        cnn_temporal_kernelsize : tuple
            Kernel size (k, 1) of the temporal convolution in the 2DCNN block.
        cnn_spatial_kernelsize : tuple
            Kernel size of the 2d spatial convolution.
        dropout : float
            Dropout probability.
        temporal_pool_size : tuple
            Pool size for the 2DCNN average pooling.
        temporal_pool_stride : tuple
            Pool stride for the 2DCNN average pooling.

        Example
        -------
        >>> inp_tensor = torch.rand([1, 200, 32, 1])
        >>> model = Shallow_TwoDCNN(cnn_temporal_kernelsize=(5, 1),
        ...     cnn_spatial_kernelsize=(1, 32), dropout=0.0,
        ...     temporal_pool_size=(6, 1), temporal_pool_stride=(6, 1))
        >>> output = model(inp_tensor)
        >>> output.shape
        torch.Size([1, 32, 1, 10])
        """
    def __init__(self,
                 cnn_temporal_kernelsize,
                 cnn_spatial_kernelsize,
                 dropout,
                 temporal_pool_size,
                 temporal_pool_stride):
        super().__init__()

        # CONVOLUTIONAL MODULE
        self.conv_module = torch.nn.Sequential()
        # Temporal convolution
        self.conv_module.add_module(
            "conv_0",
            sb.nnet.CNN.Conv2d(
                in_channels=1,
                out_channels=10,
                kernel_size=cnn_temporal_kernelsize,
                padding="valid",
                bias=True,
                swap=True,
            ),
        )

        self.conv_module.add_module(
            "bnorm_1",
            sb.nnet.normalization.BatchNorm2d(
                input_size=10, momentum=0.1, affine=True,
            ),
        )

        self.conv_module.add_module(
            "dropout_1", torch.nn.Dropout(p=dropout),
        )

        self.conv_module.add_module(
            "relu_1", torch.nn.ReLU(),
        )

        # Spatial convolution
        self.conv_module.add_module(
            "conv_1",
            sb.nnet.CNN.Conv2d(
                in_channels=10,
                out_channels=10,
                kernel_size=cnn_spatial_kernelsize,
                padding="valid",
                bias=False,
                swap=True,
            ),
        )
        self.conv_module.add_module(
            "bnorm_2",
            sb.nnet.normalization.BatchNorm2d(
                input_size=10, momentum=0.1, affine=True,
            ),
        )

        self.conv_module.add_module(
            "dropout_2", torch.nn.Dropout(p=dropout),
        )

        self.conv_module.add_module(
            "relu_2", torch.nn.ReLU(),
        )

        self.conv_module.add_module(
            "square_1", Square(),
        )

        self.conv_module.add_module(
            "pool_1",
            sb.nnet.pooling.Pooling2d(
                pool_type='avg',
                kernel_size=temporal_pool_size,
                stride=temporal_pool_stride,
                pool_axis=[1, 2],
            ),
        )

        self.conv_module.add_module(
            "log_1", Log(),
        )


    def forward(self, x):
        """Returns the output of the 2DCNN block.

        Arguments
        ---------
        x : torch.Tensor (batch, time, EEG channel, channel)
            Input to convolve. 4d tensors are expected.
        """
        result = self.conv_module(x)  # Batch, T_n, 1, 10
        return result


class Square(torch.nn.Module):
    """Layer for squaring activations."""
    def forward(self, x):
        return torch.square(x)


class Log(torch.nn.Module):
    """Layer to compute log of activations."""

    def forward(self, x):
        return torch.log(torch.clamp(x, min=1e-6))



#### Hyperparameters

In [None]:
%%file /notebooks/benchmarks/benchmarks/MOABB/hparams/MotorImagery/BNCI2014001/M2DCNNClassifier_first.yaml

seed: 1234
__set_torchseed: !apply:torch.manual_seed [!ref <seed>]

# DIRECTORIES
data_folder: !PLACEHOLDER  #'/path/to/dataset'. The dataset will be automatically downloaded in this folder
cached_data_folder: !PLACEHOLDER #'path/to/pickled/dataset'
output_folder: !PLACEHOLDER #'path/to/results'

# DATASET HPARS
# Defining the MOABB dataset.
dataset: !new:moabb.datasets.BNCI2014001
save_prepared_dataset: True # set to True if you want to save the prepared dataset as a pkl file to load and use afterwards
data_iterator_name: !PLACEHOLDER
target_subject_idx: !PLACEHOLDER
target_session_idx: !PLACEHOLDER
events_to_load: null # all events will be loaded
original_sample_rate: 250 # Original sampling rate provided by dataset authors
sample_rate: 125 # Target sampling rate (Hz)
# band-pass filtering cut-off frequencies
fmin: 0.11 # @orion_step1: --fmin~"uniform(0.1, 5, precision=2)"
fmax: 50.0 # @orion_step1: --fmax~"uniform(20.0, 50.0, precision=3)"
n_classes: 4
# tmin, tmax respect to stimulus onset that define the interval attribute of the dataset class
# trial begins (0 s), cue (2 s, 1.25 s long); each trial is 6 s long
# dataset interval starts from 2
# -->tmin tmax are referred to this start value (e.g., tmin=0.5 corresponds to 2.5 s)
tmin: 0.
tmax: 4.0 # @orion_step1: --tmax~"uniform(1.0, 4.0, precision=2)"
# number of steps used when selecting adjacent channels from a seed channel (default at Cz)
n_steps_channel_selection: 2 # @orion_step1: --n_steps_channel_selection~"uniform(1, 3,discrete=True)"
T: !apply:math.ceil
    - !ref <sample_rate> * (<tmax> - <tmin>)
C: 22
# We here specify how to perform test:
# - If test_with: 'last' we perform test with the latest model.
# - If test_with: 'best', we perform test with the best model (according to the metric specified in test_key)
# The variable avg_models can be used to average the parameters of the last (or best) N saved models before testing.
# This can have a regularization effect. If avg_models: 1, the last (or best) model is used directly.
test_with: 'last' # 'last' or 'best'
test_key: "acc" # Possible opts: "loss", "f1", "auc", "acc"

# METRICS
f1: !name:sklearn.metrics.f1_score
    average: 'macro'
acc: !name:sklearn.metrics.balanced_accuracy_score
cm: !name:sklearn.metrics.confusion_matrix
metrics:
    f1: !ref <f1>
    acc: !ref <acc>
    cm: !ref <cm>
# TRAINING HPARS
n_train_examples: 100  # it will be replaced in the train script
# checkpoints to average
avg_models: 1 # @orion_step1: --avg_models~"uniform(1, 15,discrete=True)"
number_of_epochs: 100 # @orion_step1: --number_of_epochs~"uniform(250, 1000, discrete=True)"
lr: 0.001 # @orion_step1: --lr~"choices([0.01, 0.005, 0.001, 0.0005, 0.0001])"
# Learning rate scheduling (cyclic learning rate is used here)
max_lr: !ref <lr> # Upper bound of the cycle (max value of the lr)
base_lr: 0.001 # Lower bound in the cycle (min value of the lr)
step_size_multiplier: 5 #from 2 to 8
step_size: !apply:round
    - !ref <step_size_multiplier> * <n_train_examples> / <batch_size>
lr_annealing: !new:speechbrain.nnet.schedulers.CyclicLRScheduler
    base_lr: !ref <base_lr>
    max_lr: !ref <max_lr>
    step_size: !ref <step_size>
label_smoothing: 0.0
loss: !name:speechbrain.nnet.losses.nll_loss
    label_smoothing: !ref <label_smoothing>
optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter  # epoch counter
    limit: !ref <number_of_epochs>
batch_size_exponent: 4 # @orion_step1: --batch_size_exponent~"uniform(4, 6,discrete=True)"
batch_size: !ref 2 ** <batch_size_exponent>
valid_ratio: 0.2

# DATA AUGMENTATION
# cutcat (disabled when min_num_segments=max_num_segments=1)
max_num_segments: 3 # @orion_step2: --max_num_segments~"uniform(2, 6, discrete=True)"
cutcat: !new:speechbrain.augment.time_domain.CutCat
    min_num_segments: 2
    max_num_segments: !ref <max_num_segments>
# random amplitude gain between 1 - amp_delta and 1 + amp_delta (disabled when amp_delta=0.)
amp_delta: 0.008079 # @orion_step2: --amp_delta~"uniform(0.0, 0.5)"
rand_amp: !new:speechbrain.augment.time_domain.RandAmp
    amp_low: !ref 1 - <amp_delta>
    amp_high: !ref 1 + <amp_delta>
# random shifts of up to +/- shift_delta seconds, i.e. about +/-250 ms here (disabled when shift_delta=0.)
shift_delta_: 25 # orion_step2: --shift_delta_~"uniform(0, 25, discrete=True)"
shift_delta: !ref 1e-2 * <shift_delta_> # 0.250 # 0.-0.25 with steps of 0.01
min_shift: !apply:math.floor
    - !ref 0 - <sample_rate> * <shift_delta>
max_shift: !apply:math.floor
    - !ref 0 + <sample_rate> * <shift_delta>
time_shift: !new:speechbrain.augment.freq_domain.RandomShift
    min_shift: !ref <min_shift>
    max_shift: !ref <max_shift>
    dim: 1
# injection of gaussian white noise
snr_white_low: 15.0 # @orion_step2: --snr_white_low~"uniform(0.0, 15, precision=2)"
snr_white_delta: 5.49 # @orion_step2: --snr_white_delta~"uniform(5.0, 20.0, precision=3)"
snr_white_high: !ref <snr_white_low> + <snr_white_delta>
add_noise_white: !new:speechbrain.augment.time_domain.AddNoise
    snr_low: !ref <snr_white_low>
    snr_high: !ref <snr_white_high>

repeat_augment: 1 # @orion_step1: --repeat_augment 0
augment: !new:speechbrain.augment.augmenter.Augmenter
    parallel_augment: True
    concat_original: True
    parallel_augment_fixed_bs: True
    repeat_augment: !ref <repeat_augment>
    shuffle_augmentations: True
    min_augmentations: 4
    max_augmentations: 4
    augmentations: [
        !ref <cutcat>,
        !ref <rand_amp>,
        !ref <time_shift>,
        !ref <add_noise_white>]

# DATA NORMALIZATION
dims_to_normalize: 1 # 1 (time) or 2 (EEG channels)
normalize: !name:speechbrain.processing.signal_processing.mean_std_norm
    dims: !ref <dims_to_normalize>
# MODEL
input_shape: [null, !ref <T>, !ref <C>, null]
cnn_temporal_kernelsize: 6 # @orion_step1: --cnn_temporal_kernelsize~"uniform(5, 62,discrete=True)"
cnn_temporal_pool_stride: 6

# pool size / stride from 4/125 ms to 40/125 ms = circa 30 ms
#cnn_poolsize: !ref <cnn_poolsize_> * 4 # same resolution as for EEGNet research space
#cnn_poolstride: !ref <cnn_poolstride_> * 4 # same resolution as for EEGNet research space
dropout: 0.25 # @orion_step1: --dropout~"uniform(0.0, 0.5)"
temporal_kernels: [65,45,105]

postnet_poolsize: 8
postnet_poolstride: 8

model: !new:models.M2DCNNClassifier.M2DCNNClassifier
    input_shape: !ref <input_shape>
    temporal_kernel_sizes: !ref <temporal_kernels>
    temporal_pool_size: [!ref <cnn_temporal_kernelsize>, 1]
    temporal_pool_stride: [!ref <cnn_temporal_pool_stride>, 1]
    postnet_poolsize: !ref <postnet_poolsize>
    postnet_poolstride: !ref <postnet_poolstride>
    dropout: !ref <dropout>
    dense_n_neurons: !ref <n_classes>


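The size of the dense layer's input can be derived by hand for this config, which cross-checks the dummy forward pass in `M2DCNNClassifier.__init__`. The sketch below makes several assumptions, all following the model code:

- convolutions are 'valid' and average pooling uses floor mode;
- the spatial convolution collapses the 22 EEG channels to one;
- `Pooling1d` with `pool_axis=2` acts on the last axis of the squeezed (batch, time, 10) tensor, i.e. on the ten filters rather than on time.

Under those assumptions, each trial reaches the dense layers as 214 values.

```python
def conv_valid_len(length, kernel):
    """Output length of a 'valid' (unpadded, stride-1) convolution."""
    return length - kernel + 1


def pool_len(length, size, stride):
    """Output length of floor-mode pooling without padding."""
    return (length - size) // stride + 1


T = 500                                   # ceil(125 Hz * 4 s)
temporal_kernels = [65, 45, 105]          # temporal_kernels in the YAML
pool_size = pool_stride = 6               # cnn_temporal_kernelsize / cnn_temporal_pool_stride
n_filters = 10                            # out_channels of each Shallow_TwoDCNN branch
postnet_poolsize = postnet_poolstride = 8

# Per-branch time length after temporal convolution + average pooling.
branch_lengths = [
    pool_len(conv_valid_len(T, k), pool_size, pool_stride) for k in temporal_kernels
]
total_time = sum(branch_lengths)          # branches are concatenated along time
pooled_filters = pool_len(n_filters, postnet_poolsize, postnet_poolstride)
dense_input_size = total_time * pooled_filters

print(branch_lengths, total_time, pooled_filters, dense_input_size)
```

Under these assumptions, `postnet_poolsize: 8` over only ten filters averages filters 0 to 7 into a single value. The last two filter maps therefore never reach the classifier.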


#### Training Script

In [None]:
%cd /notebooks/benchmarks/benchmarks/MOABB
!./run_experiments.sh --hparams hparams/MotorImagery/BNCI2014001/M2DCNNClassifier_first.yaml --data_folder eeg_data --output_folder results/MotorImagery/BNCI2014001/M2DCNNClassifier_first --nsbj 9 --nsess 2 --nruns 10 --train_mode leave-one-session-out --device=cuda

#### M2DCNN Classifier Results

This experiment was not successful. The model did not converge for any of the many kernel sizes and combinations tried, and accuracy remained at 25%, which is chance level for four classes.
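The `acc` metric is `balanced_accuracy_score`, the mean of per-class recall. On four classes, 25% is therefore exactly what a network that always predicts the same class obtains, and it is also the expected score of random guessing. The minimal re-implementation below is written for illustration rather than taken from scikit-learn. It shows the constant-prediction case on a balanced four-class session.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


y_true = [c for c in range(4) for _ in range(72)]  # 4 classes x 72 trials
always_class_0 = [0] * len(y_true)
print(balanced_accuracy(y_true, always_class_0))  # 0.25
```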

## Conclusion

Multi-scale hybrid neural networks were not very successful on the four-class motor imagery task given here. The authors of the MSHCNN paper [2] reported strong results for binary classification between left-hand and right-hand imagery. However, this success did not seem to carry over to classifying four different labels. I also attempted to use Mamba in place of self-attention, taking inspiration from EEGConformer, but did not manage to integrate it with SpeechBrain. Thank you for reading.

## References



1. Lawhern, V.J. et al. (2018) EEGNet: A compact convolutional network for EEG-based brain-computer interfaces, arXiv.org. Available at: https://arxiv.org/abs/1611.08024 (Accessed: 06 April 2024).
2. Tang, X., Yang, C., Sun, X., Zou, M. and Wang, H. (2023) Motor imagery EEG decoding based on multi-scale hybrid networks and feature enhancement, IEEE Transactions on Neural Systems and Rehabilitation Engineering. Available at: https://pubmed.ncbi.nlm.nih.gov/37022411/ (Accessed: 06 April 2024).
3. Schirrmeister, R.T. et al. (2018) Deep learning with convolutional neural networks for EEG decoding and visualization, arXiv.org. Available at: https://arxiv.org/abs/1703.05051 (Accessed: 07 April 2024).
4. Jia, Z., Lin, Y., Wang, J., Yang, K., Liu, T. and Zhang, X. (2020) MMCNN: A multi-branch multi-scale convolutional neural network for motor imagery classification, in Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham, Switzerland: Springer, pp. 736–751. Available at: https://link.springer.com/chapter/10.1007/978-3-030-67664-3_44

