# How to Create a Custom ResNet and Then Optimize It With Optuna
## Tutorial by Dustin Wang



I learned much of the following material from these resources, so take a look:

To learn more about how CNNs work, watch this video series by DeepLearningAI on YouTube (https://www.youtube.com/playlist?list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF)

To learn more about how SNNs work, take a look at these Python notebook tutorials (https://snntorch.readthedocs.io/en/latest/tutorials/index.html)

To learn more about optimization with Optuna, check out their page here (https://optuna.org/)

Patrick Loeber's PyTorch tutorial 14 on YouTube (https://www.youtube.com/watch?v=pDdP0TFzsoQ&ab_channel=PatrickLoeber)

Transfer Learning tutorial on the PyTorch docs
(https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)

Disclaimer: The following tutorial is intended for educational purposes. The resulting model does not actually perform well due to the structure of the model.

# Introduction


What if, for a project, we started by training a residual network to classify different species of jellyfish and then, just out of curiosity, added a spiking neural network to the end of the ResNet? Would that work at all? How bad would it be? How accurate can we make this combination? Here are the answers in order: "Not really", "Pretty bad", and "I got it to about 76% accuracy".

In this tutorial you will learn how to:
* Use transfer learning to train convolutional neural networks on image datasets
* Plug a static image into a time-varying neural network
* Optimize the hyperparameters of a neural network using Optuna

# Dependencies

You will need to install these libraries in your Python environment.


In [None]:
import torch
import snntorch as snn
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import optuna
from optuna.trial import TrialState
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
from tempfile import TemporaryDirectory

# Dataset and Dataloading

If you haven't already, go ahead and download the jellyfish image dataset from Kaggle. It has a standard file structure that we can use with PyTorch's ImageFolder class to make this part much easier.


In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}


Here, we set up some data transforms for data augmentation. The ImageFolder class from torchvision will automatically create our datasets. If you take a look at the file structure of our data, it looks like this:


```
root/
|-- test/
|   |-- class 1
|       |--img1.jpg
|       |--img2.jpg
|   |-- class 2
|       |--img1.jpg
|       |--img2.jpg
|
|-- train/
|   |-- class 1
|       |--img1.jpg
|       |--img2.jpg
|   |-- class 2
|       |--img1.jpg
|       |--img2.jpg
|
|-- val/
|   |-- class 1
|       |--img1.jpg
|       |--img2.jpg
|   |-- class 2
|       |--img1.jpg
|       |--img2.jpg
|
|-- ...
```

The ImageFolder class takes one folder whose images are split into one subfolder per class. So, with some string manipulation, we pass in the train and val folders to get our datasets, and then build a dataloader from each dataset. I set the dataloader batch size to 4.

In [None]:

data_dir = './data/jellies'
#create datasets for the train and validation folder paths
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
#create dataloaders for the train and validation datasets. Batch size set to 4
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=0)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

#run on nvidia cuda cores if we got em
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
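
To see how folder names turn into integer labels, here is a small standard-library sketch. It only imitates the idea that class folders are sorted by name and numbered from 0; it is not torchvision's code. The helper name `class_to_idx_from_folders` and the three folder names are made up for illustration.

```python
import os
from tempfile import TemporaryDirectory


def class_to_idx_from_folders(root):
    """Map each class subfolder of `root` to an integer label (hypothetical helper)."""
    # Sort the subfolder names so the label assignment is deterministic
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with TemporaryDirectory() as root:
    # Made-up class folders, created in non-alphabetical order on purpose
    for name in ["moon_jelly", "barrel_jelly", "compass_jelly"]:
        os.makedirs(os.path.join(root, "train", name))
    mapping = class_to_idx_from_folders(os.path.join(root, "train"))

print(mapping)
```

With the real dataset, the same information should be available from `image_datasets['train'].class_to_idx`, which is handy for checking which output neuron corresponds to which species.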

# Defining the Model

Let's define our model now.

In [None]:
class CustomResNet(nn.Module):
    def __init__(self, resnet, beta):
        super(CustomResNet, self).__init__()
        self.resnet = resnet
        self.lif1 = snn.Leaky(beta=beta)
        self.fc2 = nn.Linear(256, 6)
        self.lif2 = snn.Leaky(beta=beta)

    def forward(self, x):
        # Initialize hidden states at t=0
        mem1 = self.lif1.init_leaky()
        mem2 = self.lif2.init_leaky()

        # Record the final layer
        spk2_rec = []
        mem2_rec = []

        cur1 = self.resnet(x)

        # Forward pass
        # Repeatedly pass the image into SNN and record spikes and mem potential of the final layer
        # based on number of steps
        for step in range(20):
            spk1, mem1 = self.lif1(cur1, mem1)
            cur2 = self.fc2(spk1)
            spk2, mem2 = self.lif2(cur2, mem2)
            spk2_rec.append(spk2)
            mem2_rec.append(mem2)

        return torch.stack(spk2_rec, dim=0), torch.stack(mem2_rec, dim=0)

This custom network is a residual network feeding into a spiking neural network. Since the spiking neural network requires time-varying data, I just passed in the residual network's output repeatedly for 20 timesteps. For the spiking neural network, I added two layers. Each of them consists of a fully connected layer followed by a leaky integrate-and-fire (LIF) layer. You'll notice that there isn't a fully connected layer to correspond with the first LIF layer; that problem will be addressed in the next section. Beta is the decay rate for the LIF layers. We pass that value in as a variable because it is a hyperparameter that we can optimize.
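
To make the role of beta and the repeated input more concrete, here is a minimal pure-Python sketch of a single leaky integrate-and-fire neuron driven by a constant input. It is a simplified stand-in, not snnTorch's implementation: the function name `lif_spike_count`, the threshold of 1.0, and the subtract-on-spike reset are assumptions made for illustration.

```python
def lif_spike_count(current, beta, num_steps=20, threshold=1.0):
    """Count the spikes of one simplified LIF neuron fed a constant input."""
    mem = 0.0
    spikes = 0
    for _ in range(num_steps):
        # The membrane potential decays by beta, then integrates the input
        mem = beta * mem + current
        if mem > threshold:
            spikes += 1
            # Assumed reset rule: subtract the threshold after a spike
            mem -= threshold
    return spikes


for current in (0.0, 0.3, 0.6):
    print(current, lif_spike_count(current, beta=0.9))
```

A stronger constant input makes the neuron fire more often, which is how a static ResNet output can turn into a spike rate over the 20 steps.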

# Transfer Learning

Transfer learning is when you take a pre-trained model and train it further to specialize it for your needs. In our case, we are going to take a pre-trained ResNet provided by PyTorch. I chose to freeze all of the pretrained layers so that their parameters are not trained, which speeds up computation.

In [None]:
def create_Model(beta):
    # Load pre-trained
    model_ft = models.resnet18(weights='IMAGENET1K_V1')

    # Freeze all layers
    for param in model_ft.parameters():
        param.requires_grad = False

    # get number of input ftrs of last fc layer
    num_ftrs = model_ft.fc.in_features

    # Replace the final layer so it outputs 256 features for the SNN head.
    # New layers have requires_grad=True by default
    model_ft.fc = nn.Linear(num_ftrs, 256)

    # Create custom resnet with SNN layers and run it on the gpu if we have one,
    # so the SNN head ends up on the same device as the resnet
    return CustomResNet(model_ft, beta).to(device)

The ResNet contains one fully connected layer at the very end of the network. I chose to simply change the output size of that layer to 256 so that it matches the input the custom network definition expects.
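
As a quick sanity check on the freezing idea, the sketch below uses a tiny stand-in network instead of the real ResNet (the layer sizes are arbitrary assumptions) and counts how many parameters remain trainable after the last layer is replaced.

```python
import torch.nn as nn

# Tiny stand-in for a pre-trained backbone; the sizes are arbitrary
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Freeze every existing parameter
for param in backbone.parameters():
    param.requires_grad = False

# Swap in a new final layer; new layers are trainable by default
backbone[2] = nn.Linear(16, 3)

n_total = sum(p.numel() for p in backbone.parameters())
n_trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable} of {n_total}")
```

Only the new layer's weights and biases are counted as trainable, which is the same effect we rely on when replacing `model_ft.fc`.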

# Training Loop

This last function for the training loop is a bit of a monster but it works in conjunction with Optuna. So let's get into how Optuna works.

## Optuna

Simply, given a function and a set of parameters to toy with, Optuna will run many trials in order to get the parameters that best suit your needs. In our case, we have a training loop function that results in a model with an accuracy metric. We want to maximize the accuracy metric and we have a set of hyperparameters that we can give Optuna.

As for syntax, Optuna takes an __objective(trial)__ function, and inside it you tell Optuna which parameters to tune with __trial.suggest_int()__, __trial.suggest_float()__, and so on. Then, after you've defined your __objective()__ function, you create a study and call its optimize method, and Optuna will start its work.

This is much easier than plugging in hyperparameters manually. Optuna will tell you which of the parameters it tried performed best.

## The Training Loop

This code is an altered version of the training loop used in the Transfer Learning Tutorial linked above.

In [None]:
def objective(trial):

    # hyperparameters
    beta = trial.suggest_float("beta", 0, 1)
    lr = trial.suggest_float("lr", 1e-8, 1e-4, log=True)
    num_epochs = 5

    # create model
    model_ft = create_Model(beta)

    criterion = nn.CrossEntropyLoss()

    optimizer_ft = optim.SGD(model_ft.parameters(), lr=lr)

    # Decay learning rate by a factor of 0.1 every 7 epochs (with num_epochs = 5 the decay never triggers)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=.1)

    since = time.time()

    # Create a temporary directory to save training checkpoints
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')

        torch.save(model_ft.state_dict(), best_model_params_path)
        best_acc = 0.0

        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model_ft.train()  # Set model to training mode
                else:
                    model_ft.eval()   # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0
                acc_hist = []

                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)

                    # forward
                    # track history only if in train
                    with torch.set_grad_enabled(phase == 'train'):
                        spk_rec, mem_rec = model_ft(inputs)

                        # Get prediction by rate. Index of most spiked neuron is the prediction.
                        _, idx = spk_rec.sum(dim=0).max(1)

                        acc = np.mean((labels == idx).detach().cpu().numpy())

                        # Sum the loss over every time step (tensor created on the same device as the model)
                        loss = torch.zeros((1), dtype=torch.float, device=device)
                        for step in range(20):
                            loss += criterion(mem_rec[step], labels)

                        # backward + optimize only if in training phase
                        optimizer_ft.zero_grad()
                        if phase == 'train':
                            loss.backward()
                            optimizer_ft.step()

                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    acc_hist.append(acc)
                if phase == 'train':
                    exp_lr_scheduler.step()

                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = np.mean(acc_hist)

                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

                # save a checkpoint whenever validation accuracy improves
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model_ft.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:.4f}')

        return best_acc
        # load best model weights
        #model_ft.load_state_dict(torch.load(best_model_params_path))

I like this version of the training loop because it automatically saves the best version of our network, so kudos to Sasank Chilamkurthy, who wrote the Transfer Learning tutorial. Note that here the checkpoint is written to a temporary directory, so it is discarded once the function returns.

**In order to calculate accuracy**, we use prediction by spike rate. The neuron in the last layer that spikes the most is the network's prediction. Then, we compare that to the label on the image.
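
Here is a small NumPy sketch of prediction by spike rate. The spike recording is made up by hand, and the shape `[time steps, batch, classes]` is assumed to mirror `spk_rec` in the training loop.

```python
import numpy as np

# Hand-made spike recording: 5 time steps, 2 images, 3 output neurons
spk_rec = np.zeros((5, 2, 3))
spk_rec[[0, 2, 4], 0, 1] = 1  # image 0: neuron 1 fires three times
spk_rec[1, 0, 0] = 1          # image 0: neuron 0 fires once
spk_rec[[0, 1], 1, 2] = 1     # image 1: neuron 2 fires twice

# Sum spikes over time, then pick the most active neuron per image
spike_counts = spk_rec.sum(axis=0)
pred = spike_counts.argmax(axis=1)

# Made-up labels for the two images
labels = np.array([1, 0])
acc = (pred == labels).mean()
print(pred, acc)
```

The first image is classified correctly and the second is not, so the batch accuracy is one half.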

**In order to calculate loss**, we use the Cross Entropy Loss, which automatically applies softmax to the membrane potentials of the last layer. We compute it at each of the 20 time steps and sum the results to get our loss.
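
The summed loss can be sketched in NumPy as follows. This is a hand-rolled version for illustration, not PyTorch's implementation: the helper `cross_entropy` and the all-zero membrane recording are assumptions, and the batch-mean reduction is chosen to match the default of `nn.CrossEntropyLoss`.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross entropy over a batch, computed with a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


num_steps, batch_size, num_classes = 20, 4, 6
# Made-up membrane recording: all zeros, so every class is equally likely
mem_rec = np.zeros((num_steps, batch_size, num_classes))
labels = np.array([0, 1, 2, 3])

# Add up the loss from every time step, as the training loop does
total_loss = sum(cross_entropy(mem_rec[step], labels) for step in range(num_steps))
print(total_loss)
```

With uniform predictions, each step contributes log(6), so the total is 20 times that. Because the loss is a sum over steps, its magnitude scales with the number of time steps.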

Lastly, I've basically wrapped the entire training loop in the objective function and I output the best accuracy at the bottom. At the top, we set Optuna to tune the beta and the learning rate hyperparameters. All that's left now is to run a study with Optuna and wait.

In [None]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Results and Conclusion

So, after waiting 4 hours (Ryzen 9 6900HS CPU), Optuna came out with its verdict. The highest accuracy reached was 76%, with the parameters below.

In [None]:
parameters: {'beta':0.990313977538696, 'lr':6.989109944155411e-05}

In practice, now that we have parameters close enough to the maximum accuracy, we would plug them into the model to save it, but the accuracy is not good enough to warrant that effort. We have, though, answered the question at the beginning of this notebook. Combining a resnet with an SNN to train on a static image dataset does not work well.
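
For completeness, if the accuracy had been worth keeping, saving and restoring the weights could look roughly like the sketch below. It uses a tiny linear layer as a stand-in for the real model, and the file name is made up.

```python
import os
from tempfile import TemporaryDirectory

import torch
import torch.nn as nn

# Tiny stand-in for the trained model
model = nn.Linear(4, 2)

with TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model_params.pt")
    # Save only the parameters (the state dict), not the whole object
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

weights_match = torch.equal(model.weight, restored.weight)
biases_match = torch.equal(model.bias, restored.bias)
print(weights_match, biases_match)
```

For the real network, the model would be rebuilt with `create_Model` using the best beta before loading the saved state dict.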