# Basics of PyTorch

PyTorch is a popular open-source machine learning library for building and training deep neural networks. Its key strength is its dynamic computational graph, which is built on the fly as the code runs, so models can be modified and debugged like ordinary Python code. PyTorch offers a Python interface on top of a core that is largely implemented in C++ and, together with companion packages such as torchvision, provides building blocks for tasks such as image and text processing, time-series analysis, and more. Its intuitive APIs and easy-to-learn syntax make it a popular choice among researchers and developers.

**Installation**
- [PyTorch.org](https://pytorch.org/)

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils, datasets

## Building a network 
To build a neural network in PyTorch, we can define a class that inherits from the `nn.Module` class. In the class, we'll define the layers of the network in the `__init__` function and define how data passes through the layers in the `forward` function. Here's an example of a neural network with two hidden layers:

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
        
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In this example, we're defining a network with three layers: a fully connected layer with 784 input neurons and 256 output neurons, another one from 256 to 128 neurons, and an output layer that projects into 10 dimensions. The `forward` function takes an input `x`, flattens it into one vector per sample, and passes it through each layer, applying the ReLU activation function to the output of the two hidden layers.
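
To make the shapes concrete, the following sketch repeats the same layer sizes and pushes a random batch through the network. The batch of four 1x28x28 tensors is a made-up stand-in for real images, chosen only because 28 * 28 = 784 matches the input size of the first layer.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Same layer sizes as the network above."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten (N, 1, 28, 28) -> (N, 784)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)  # raw scores (logits), one per class


model = Net()
dummy_batch = torch.randn(4, 1, 28, 28)  # hypothetical batch, not real data
logits = model(dummy_batch)

# Each nn.Linear(in, out) holds in * out weights plus out biases.
num_params = sum(p.numel() for p in model.parameters())

print(logits.shape)  # torch.Size([4, 10])
print(num_params)    # 235146
```

The parameter count follows from the layer sizes: (784 * 256 + 256) + (256 * 128 + 128) + (128 * 10 + 10) = 235146.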

### Sequential blocks
For convenience, PyTorch allows us to define a neural network as a sequence of layers. We can use the `nn.Sequential` class for this purpose. Here's the same network as above using `nn.Sequential`:

In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
        
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.net(x)
        return x
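
As a possible variation, the flattening step can also live inside the sequence by using `nn.Flatten`, in which case no custom class is needed at all. This is only a sketch of an alternative, not the definition used in the rest of this notebook; the random input is a placeholder.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Flatten(),  # flattens every dimension except the batch dimension
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

out = net(torch.randn(2, 1, 28, 28))  # hypothetical batch of two inputs
print(out.shape)  # torch.Size([2, 10])
```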

## Training, tuning and evaluating a network
Now that we have defined our neural network, we can train it on a dataset. To do so, we'll need to define a training loop and split the data into training, validation, and testing sets. Furthermore, we need to select a loss function and an optimizer.

By splitting a dataset into training, validation, and testing sets, we can assess the performance of a model and check that it generalizes well to unseen data. A model that merely memorizes the training data and therefore performs poorly on unseen data is said to overfit.

The training set is used to train the model's parameters, while the validation set is used to tune the model's hyperparameters, such as learning rate or regularization strength. The testing set is then used to evaluate the final performance of the model. Ideally, the testing set is evaluated only once, after the model has been trained and tuned.
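
A three-way split can be sketched with `torch.utils.data.random_split`. The toy dataset of 100 samples and the 70/15/15 proportions below are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset: 100 one-dimensional samples with alternating binary labels.
features = torch.arange(100, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(100) % 2
dataset = TensorDataset(features, labels)

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

# The three subsets are disjoint and together cover the whole dataset.
all_indices = set(train_set.indices) | set(val_set.indices) | set(test_set.indices)
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
print(len(all_indices))                             # 100
```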

In [4]:
def train(model, train_loader, criterion, optimizer):
    model.train()
    for i, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

The training loop iteratively optimizes a model to minimize the loss on the training data. At each iteration, a batch of data is passed through the model. The loss between the model's predictions and the true targets is then computed, and the gradients of the loss with respect to the model's parameters are backpropagated through the model. The optimizer uses these gradients to update the model's parameters in the direction of the negative gradient, using a specified update rule.

In this training loop, we're iterating through the training data in batches using a `train_loader`. For each batch, we first reset the gradients with `optimizer.zero_grad()`, because PyTorch accumulates gradients across backward passes by default. We then pass the inputs through the `model` to obtain the predicted outputs and compute the loss between the predicted and target values (`criterion`). Finally, `loss.backward()` computes the gradients of the loss with respect to the model parameters using backpropagation, and the `optimizer`'s `step` function updates the parameters.

This whole process of going through the batched dataset is repeated for a number of epochs or until convergence, with the goal of finding the model parameters that minimize the loss on the training data. The `train` function above covers only the inner loop, i.e. a single pass over the data (one epoch). The outer loop over epochs follows in the MNIST example below.
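
The update rule can be checked by hand on a tiny example. The sketch below uses plain SGD without momentum, a two-element weight vector, and a squared-error loss; all numbers are made up for illustration.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # parameters to optimize
x = torch.tensor([3.0, 0.5])                       # one input sample
target = torch.tensor(1.0)
lr = 0.1

optimizer = torch.optim.SGD([w], lr=lr)

optimizer.zero_grad()
loss = (torch.dot(w, x) - target) ** 2  # prediction is 2.0, so the loss is 1.0
loss.backward()                         # gradient is 2 * (2.0 - 1.0) * x = [6.0, 1.0]

# Plain SGD moves the parameters against the gradient: w <- w - lr * grad
with torch.no_grad():
    expected = w - lr * w.grad

optimizer.step()
print(w.detach())  # approximately tensor([ 0.4000, -2.1000])
```

With momentum, as used later in this notebook, the optimizer additionally keeps a running buffer of past gradients, so the update after the first step no longer equals `lr * grad`.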

For evaluation on the validation or test set, we don't need to compute gradients. We switch the model into evaluation mode with `model.eval()`, which makes layers such as dropout and batch normalization use their inference behavior, and wrap the code in a `torch.no_grad()` context manager so that PyTorch's autograd engine does not track operations. We can then pass the data through the model and compute the loss and the accuracy.

In [5]:
def infer(model, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            outputs = model(inputs)
            test_loss += criterion(outputs, targets).item()
            preds = torch.argmax(outputs, dim=1)
            correct += preds.eq(targets).sum().item()
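    # Note: criterion returns the mean loss of each batch. Dividing the sum of these
    # batch means by the number of samples therefore yields roughly the mean loss
    # divided by the batch size. Divide by len(test_loader) to get the mean loss per batch.
    test_loss /= len(test_loader.dataset)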
    accuracy = correct / len(test_loader.dataset)
    return test_loss, accuracy
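
`model.eval()` and `torch.no_grad()` do two different things, which a small sketch can illustrate. The tiny model with a dropout layer is a hypothetical example. The networks in this notebook contain no dropout, so `eval()` does not change their outputs.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

# eval() switches dropout off, so repeated forward passes agree.
layer.eval()
with torch.no_grad():
    a = layer(x)
    b = layer(x)
same_in_eval = torch.equal(a, b)

# no_grad() disables gradient tracking for everything computed inside it.
tracked_in_no_grad = a.requires_grad

# Back in training mode and outside no_grad, outputs are tracked again.
layer.train()
c = layer(x)
tracked_in_train = c.requires_grad

print(same_in_eval, tracked_in_no_grad, tracked_in_train)  # True False True
```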

## Application on MNIST

Now we can combine the model definition and training/evaluation strategy with a dataset. 

The MNIST dataset is a collection of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing), each of size 28x28 pixels. It is a popular dataset for training and testing machine learning models for image classification tasks. (cf. http://yann.lecun.com/exdb/mnist/)

![MNIST Dataset](./MnistExamples.png)


### Initializing the model
We can initialize the model that is conveniently defined above.

In [6]:
model = Net()

### Initializing the loss function and optimizer

Next, we'll define the loss function and optimizer that we'll use to train the network. For the loss function, we'll use the cross-entropy loss, which is a common loss function for classification tasks. As the optimizer, we'll use stochastic gradient descent (SGD) with momentum.

In [7]:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
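
Note that `nn.CrossEntropyLoss` expects raw scores (logits) and integer class indices, which is why the network above has no softmax at its output. The sketch below, with made-up logits for two samples and three classes, compares the loss to a manual computation via `log_softmax`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # made-up scores: 2 samples, 3 classes
targets = torch.tensor([0, 2])            # true class index of each sample

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: negative log-probability of the true class, averaged over the batch.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(torch.allclose(loss, manual))  # True
```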

### Initializing the datasets

In [8]:
BATCH_SIZE = 64

In [9]:
train_data = datasets.MNIST(root='./data_downloaded', train=True, transform=transforms.ToTensor(), download=True)
train_data, _ = torch.utils.data.random_split(train_data, [50000, 10000], generator=torch.manual_seed(42))
train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)

# Standardization might be a bit overkill for an already 0-1 normalized MNIST.
# However, pay attention that the parameters are only calculated on the training set.
mean = 0.
std = 0.
for images, _ in train_loader:
    images = torch.flatten(images, start_dim=1)  # (batch, 784)
    mean += images.mean(1).sum(0)  # sum of per-image means
    std += images.std(1).sum(0)    # sum of per-image standard deviations
# Average the per-image statistics over all training images
mean /= len(train_loader.dataset)
std /= len(train_loader.dataset)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std)
])

# Create datasets and loaders with new normalizing transforms
train_data = datasets.MNIST(root='./data_downloaded', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='./data_downloaded', train=False, download=True, transform=transform)
train_data, val_data = torch.utils.data.random_split(train_data, [50000, 10000], generator=torch.manual_seed(42))

train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)
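
The loop above averages per-image statistics. The following sketch on synthetic data suggests how these relate to statistics computed over all pixels at once. The tensor `images` and its brightness pattern are invented for illustration and are not MNIST. For equally sized images the averaged mean equals the global mean, whereas the averaged per-image standard deviation is generally smaller than the standard deviation over all pixels.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for flattened images: 8 "images" of 784 pixels each,
# every image with its own overall brightness plus a little pixel noise.
brightness = torch.linspace(0.1, 0.9, 8).unsqueeze(1)
images = (brightness + 0.05 * torch.randn(8, 784)).clamp(0, 1)

# Statistics as computed in the loop above: average of per-image values.
per_image_mean = images.mean(1).mean(0)
per_image_std = images.std(1).mean(0)

# Statistics over all pixels of all images at once.
global_mean = images.mean()
global_std = images.std()

print(torch.allclose(per_image_mean, global_mean, atol=1e-6))  # True
print(bool(per_image_std < global_std))                        # True
```

Either pair of values can serve as scaling constants for `transforms.Normalize`. The point is only that the two definitions of the standard deviation differ.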

### Running the full training loop for a number of epochs
After every full cycle through the data, the performance is calculated on the validation set and can be reported.

In [10]:
NUM_EPOCHS = 50

In [11]:
for epoch in range(NUM_EPOCHS):
    train(model, train_loader, loss_function, optimizer)

    # Evaluate the network on the validation set
    val_loss, val_accuracy = infer(model, val_loader, loss_function)
    if epoch % 5 == 0:
        print(f"Epoch {epoch+1}: Validation Loss = {val_loss:.4f}, Accuracy = {val_accuracy:.4f}")

# Evaluate the network on the test set
test_loss, test_accuracy = infer(model, test_loader, loss_function)
print(f"Test on model epoch {epoch+1}: Test Loss = {test_loss:.4f}, Accuracy = {test_accuracy:.4f}")

Epoch 1: Validation Loss = 0.0026, Accuracy = 0.9503
Epoch 6: Validation Loss = 0.0013, Accuracy = 0.9768
Epoch 11: Validation Loss = 0.0014, Accuracy = 0.9789
Epoch 16: Validation Loss = 0.0014, Accuracy = 0.9808
Epoch 21: Validation Loss = 0.0015, Accuracy = 0.9802
Epoch 26: Validation Loss = 0.0015, Accuracy = 0.9802
Epoch 31: Validation Loss = 0.0016, Accuracy = 0.9803
Epoch 36: Validation Loss = 0.0016, Accuracy = 0.9806
Epoch 41: Validation Loss = 0.0016, Accuracy = 0.9802
Epoch 46: Validation Loss = 0.0017, Accuracy = 0.9801
Test on model epoch 50: Test Loss = 0.0013, Accuracy = 0.9831


### Introducing early stopping
Early stopping is a technique to prevent overfitting. It involves monitoring a validation metric, such as the validation loss, during the training process and stopping the training early when the metric no longer improves. The number of epochs without improvement that we tolerate before stopping is called the patience, here `STOPPING_PATIENCE`.

In [12]:
STOPPING_PATIENCE = 5

In [13]:
model = Net()
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

In [14]:
best_loss = np.inf
best_epoch = 0
for epoch in range(NUM_EPOCHS):
    train(model, train_loader, loss_function, optimizer)

    # Evaluate the network on the validation set
    val_loss, val_accuracy = infer(model, val_loader, loss_function)
    print(f"Epoch {epoch+1}: Validation Loss = {val_loss:.4f}, Accuracy = {val_accuracy:.4f}")
    
    # Remainder only for early stopping
    if val_loss < best_loss:
        best_loss = val_loss
        best_epoch = epoch
        torch.save(model.state_dict(), 'best_model.pth')
    elif epoch - best_epoch > STOPPING_PATIENCE:
        print("Early stopping...")
        break

# Load the best model and evaluate it on the test set
model.load_state_dict(torch.load('best_model.pth'))
test_loss, test_accuracy = infer(model, test_loader, loss_function)
print(f"Test on best model epoch {best_epoch+1}, Test Loss = {test_loss:.4f}, Accuracy = {test_accuracy:.4f}")

Epoch 1: Validation Loss = 0.0024, Accuracy = 0.9515
Epoch 2: Validation Loss = 0.0019, Accuracy = 0.9614
Epoch 3: Validation Loss = 0.0015, Accuracy = 0.9708
Epoch 4: Validation Loss = 0.0013, Accuracy = 0.9732
Epoch 5: Validation Loss = 0.0014, Accuracy = 0.9726
Epoch 6: Validation Loss = 0.0015, Accuracy = 0.9718
Epoch 7: Validation Loss = 0.0015, Accuracy = 0.9731
Epoch 8: Validation Loss = 0.0013, Accuracy = 0.9753
Epoch 9: Validation Loss = 0.0013, Accuracy = 0.9754
Epoch 10: Validation Loss = 0.0014, Accuracy = 0.9765
Epoch 11: Validation Loss = 0.0014, Accuracy = 0.9770
Epoch 12: Validation Loss = 0.0014, Accuracy = 0.9761
Epoch 13: Validation Loss = 0.0013, Accuracy = 0.9780
Epoch 14: Validation Loss = 0.0014, Accuracy = 0.9769
Early stopping...
Test on best model epoch 8, Test Loss = 0.0011, Accuracy = 0.9796


Here, we're training the network for up to `NUM_EPOCHS` epochs, and we're saving the model whenever it reaches a new lowest loss on the validation set. If the validation loss doesn't improve for more than `STOPPING_PATIENCE` (here 5) epochs, we stop training early to prevent overfitting. After this, we load the best model and evaluate it on the test set to get an estimate of the generalization performance. The code also gives us an impression of how to do checkpointing (saving the parameters of a model every once in a while).
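
The stopping rule itself can be isolated from the training code. The sketch below applies the same `epoch - best_epoch > patience` condition to a made-up list of validation losses. The helper `early_stopping_epoch` is hypothetical and not part of PyTorch.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, last_epoch), both 0-based, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_loss:
            # New best: remember it (this is where the model would be saved).
            best_loss = val_loss
            best_epoch = epoch
        elif epoch - best_epoch > patience:
            # More than `patience` epochs without improvement: stop.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: the minimum is reached in the third epoch.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63, 0.64]
best_epoch, last_epoch = early_stopping_epoch(losses, patience=2)
print(best_epoch + 1, last_epoch + 1)  # 3 6
```

With this condition, training stops once `patience + 1` epochs have passed without a new best loss, which matches the run above: the best epoch was 8 and training stopped after epoch 14.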

## CNN example

A convolutional neural network (or CNN) is a type of neural network that typically consists of the following building blocks:
- Convolutional layer: A set of trainable kernels (filters) that are convolved with a signal (1D: audio, EEG, etc.; 2D: images; 3D: videos) to detect particular patterns in it.
- Non-linearity: ReLU, Sigmoid, Tanh, etc.
- Pooling layer: Downsamples the signal, which reduces the size of the feature maps that subsequent layers have to process. It also introduces a small degree of translation invariance.
- Fully connected layer/Linear layer: Mainly used at the end to model the actual decision process, for example as a classifier.

![Basic CNN block](cnn_architecture.svg)
Image source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks

In [15]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.out = nn.Linear(32 * 7 * 7, 10)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output
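
The input size `32 * 7 * 7` of the final linear layer follows from the shapes of the feature maps. The sketch below traces a single random 28x28 input (a placeholder for an MNIST image) through layers with the same settings as above.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)

conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(2)

# A 5x5 kernel with padding 2 and stride 1 keeps the spatial size,
# and each 2x2 max pooling halves it: 28 -> 14 -> 7.
h1 = pool(torch.relu(conv1(x)))
h2 = pool(torch.relu(conv2(h1)))
flat = h2.flatten(start_dim=1)

print(h1.shape)    # torch.Size([1, 16, 14, 14])
print(h2.shape)    # torch.Size([1, 32, 7, 7])
print(flat.shape)  # torch.Size([1, 1568])
```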

In [16]:
model = CNN()
print(model)

CNN(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (out): Linear(in_features=1568, out_features=10, bias=True)
)


In [17]:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.01)

In [18]:
# Similar code as above
best_loss = np.inf
best_epoch = 0
for epoch in range(NUM_EPOCHS):
    train(model, train_loader, loss_function, optimizer)

    # Evaluate the network on the validation set
    val_loss, val_accuracy = infer(model, val_loader, loss_function)
    if epoch % 5 == 0:
        print(f"Epoch {epoch+1}: Validation Loss = {val_loss:.4f}, Accuracy = {val_accuracy:.4f}")
    
    if val_loss < best_loss:
        best_loss = val_loss
        best_epoch = epoch
        torch.save(model.state_dict(), 'best_cnn_model.pth')
    elif epoch - best_epoch > STOPPING_PATIENCE:
        print("Early stopping...")
        break

# Load the best model and evaluate it on the test set
model.load_state_dict(torch.load('best_cnn_model.pth'))
test_loss, test_accuracy = infer(model, test_loader, loss_function)
print(f"Test on best model epoch {best_epoch+1}, Test Loss = {test_loss:.4f}, Accuracy = {test_accuracy:.4f}")

Epoch 1: Validation Loss = 0.0029, Accuracy = 0.9455
Epoch 6: Validation Loss = 0.0021, Accuracy = 0.9614
Epoch 11: Validation Loss = 0.0020, Accuracy = 0.9600
Epoch 16: Validation Loss = 0.0023, Accuracy = 0.9558
Epoch 21: Validation Loss = 0.0019, Accuracy = 0.9645
Early stopping...
Test on best model epoch 15, Test Loss = 0.0015, Accuracy = 0.9706


### Using a GPU

The method `to` can be used to transfer PyTorch tensors and models to the GPU, where computations run in parallel. In most cases this significantly speeds up training. The target device is specified by the string `cuda:` followed by the index of the GPU, e.g. `cuda:0`.
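
One detail is easy to miss: `to` on a tensor returns a new tensor and leaves the original where it was, whereas `to` on a module moves its parameters in place. The sketch below uses a small made-up linear layer and falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Module.to moves the parameters in place (and also returns the module).
model = nn.Linear(3, 2).to(device)

# Tensor.to returns a new tensor, so the result has to be assigned.
x = torch.randn(5, 3)
x = x.to(device)

# Model and input must live on the same device for the forward pass.
y = model(x)
print(y.device.type == device.type)  # True
```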

In [19]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

In [20]:
torch.cuda.device_count()

0

In [21]:
def train(model, train_loader, criterion, optimizer, device):
    model.train()
    for i, (inputs, targets) in enumerate(train_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

In [22]:
def infer(model, test_loader, criterion, device):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs = inputs.to(device)
            targets = targets.to(device)
            
            outputs = model(inputs)
            test_loss += criterion(outputs, targets).item()
            preds = torch.argmax(outputs, dim=1)
            correct += preds.eq(targets).sum().item()
    test_loss /= len(test_loader.dataset)
    accuracy = correct / len(test_loader.dataset)
    return test_loss, accuracy

In [23]:
model = CNN()

# .to() on the model sends all parameters to device:
model.to(device)
print(next(model.parameters()).device)

cpu


In [24]:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 1e-3)

In [25]:
# Same code as above
best_loss = np.inf
best_epoch = 0
for epoch in range(NUM_EPOCHS):
    train(model, train_loader, loss_function, optimizer, device)

    # Evaluate the network on the validation set
    val_loss, val_accuracy = infer(model, val_loader, loss_function, device)
    if epoch % 5 == 0:
        print(f"Epoch {epoch+1}: Validation Loss = {val_loss:.4f}, Accuracy = {val_accuracy:.4f}")
    
    if val_loss < best_loss:
        best_loss = val_loss
        best_epoch = epoch
        torch.save(model.state_dict(), 'best_cnn_model.pth')
    elif epoch - best_epoch > STOPPING_PATIENCE:
        print("Early stopping...")
        break

# Load the best model and evaluate it on the test set
model.load_state_dict(torch.load('best_cnn_model.pth'))
test_loss, test_accuracy = infer(model, test_loader, loss_function, device)
print(f"Test on best model epoch {best_epoch+1}, Test Loss = {test_loss:.4f}, Accuracy = {test_accuracy:.4f}")

Epoch 1: Validation Loss = 0.0010, Accuracy = 0.9820
Epoch 6: Validation Loss = 0.0007, Accuracy = 0.9889
Epoch 11: Validation Loss = 0.0008, Accuracy = 0.9897
Early stopping...
Test on best model epoch 8, Test Loss = 0.0005, Accuracy = 0.9905


## Using a custom dataset
Make sure to check out the other notebook `deep_learning_start.ipynb`. The class below is only a skeleton that shows which methods a custom dataset has to implement.

In [26]:
class MyDataset(Dataset):
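    def __init__(self, data_dir, stimuli_dir, fs, bandpass_band, split="train"):
        self.data_dir = data_dir
        self.stimuli_dir = stimuli_dir
        self.fs = fs
        self.bandpass_band = bandpass_band
        self.split = split

    def __len__(self):
        # Must return the number of samples in the selected split
        pass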

    def __getitem__(self, idx):
        # Load raw or preprocessed data here
        # Splitting can also be done before

        if self.split == "train":
            pass
        elif self.split == "val":
            pass
        elif self.split == "test":
            pass
        
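        # Attended signal could be 'left' or 'right'. We can achieve this in different ways.
        # Note: eeg_split, attended_split and unattended_split are placeholders that
        # have to be produced by the loading code above.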
        if np.random.random() < 0.5:
            return eeg_split, attended_split, unattended_split, torch.tensor([0], dtype=torch.double)
        else:
            return eeg_split, unattended_split, attended_split, torch.tensor([1], dtype=torch.double)
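
Since the skeleton above is not runnable as it stands, here is a minimal self-contained sketch of the same pattern. The class `ToyDataset`, its random signals, and all sizes are hypothetical and unrelated to the EEG data mentioned above. The sketch only illustrates that a dataset needs `__len__` and `__getitem__` to work with a `DataLoader`.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset of random multi-channel signals with binary labels."""

    def __init__(self, num_samples=20, num_channels=4, num_timesteps=16, seed=0):
        generator = torch.Generator().manual_seed(seed)
        self.signals = torch.randn(num_samples, num_channels, num_timesteps, generator=generator)
        self.labels = torch.randint(0, 2, (num_samples,), generator=generator)

    def __len__(self):
        # Number of samples, used by the DataLoader to build batches
        return len(self.signals)

    def __getitem__(self, idx):
        # Return one (input, target) pair; the DataLoader stacks them into batches
        return self.signals[idx], self.labels[idx]


dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=8, shuffle=False)
signals, labels = next(iter(loader))

print(len(dataset), len(loader))  # 20 3
print(signals.shape)              # torch.Size([8, 4, 16])
print(labels.shape)               # torch.Size([8])
```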
