# Tuning

## Description

In this homework, you'll tune some hyperparameters to try to improve the performance of a network. This will involve setting up a convolutional network for the CIFAR-10 dataset, setting up TensorBoard for logging, and then experimenting. I am _not_ going to be setting specific accuracy targets for different grades because there is too much randomness in the training process. Moreover, achieving high accuracy is easier if you have access to a lot of computational resources, so there are some equity issues with just grading by final model performance.

Instead, I'm going to ask you to explain your tuning _process_. That is, for each experiment you run (each set of hyperparameters you try), explain why you ran that experiment and what happened. Based on your observations, what changes did you make for the next run? As long as you have explained your reasoning and it corresponds to the principles we've talked about in class, you'll do fine. Be sure to set up your logging in a way that indicates the hyperparameter values used for each run.

**IMPORTANT: Please zip up and submit your TensorBoard log files with your homework. That will help me to see what you were looking at as you went through your tuning process.**

I'm also deliberately giving you no starter code for this homework. I understand that a lot of it will just be copy-paste from past classes/labs/homeworks, but I still think there is some value in going from a blank document to a complete program.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as transforms


import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'


import torch.utils.tensorboard as tb
import datetime
import os
from tqdm.notebook import tqdm
import time 


### Setting up the data and TensorBoard

Make directories to save the models and logs in.

In [4]:
if not os.path.exists("logs"):
    os.mkdir("logs")

if not os.path.exists("models"):
    os.mkdir("models")


In [10]:
# Setting the transform so that the images are put in the right format
transform = transforms.Compose([
    transforms.ToImage(),
    transforms.ConvertImageDtype(),
])

# Load data
cifar = torchvision.datasets.CIFAR10("../../data/torch/cifar", download=True, transform=transform)

# Set the training size to 80% of the total data
train_size = int(0.8 * len(cifar))

# Split data into training and validation sets
train_data, valid_data = torch.utils.data.random_split(cifar, [train_size, len(cifar) - train_size])

# These are the classes within the dataset
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


Files already downloaded and verified


Define the per-channel mean and standard deviation for this dataset, and the normalization transform built from them.

In [None]:
cifar_mean = (0.4914, 0.4822, 0.4465)
cifar_std = (0.2470, 0.2435, 0.2616)

normalize = transforms.Normalize(cifar_mean, cifar_std)
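
For reference, `transforms.Normalize` computes `(x - mean) / std` independently for each channel. The sketch below reproduces that arithmetic in plain Python for a single RGB pixel. `normalize_pixel` is a hypothetical helper used only for illustration; it is not part of torchvision.

```python
# Per-channel CIFAR-10 statistics, same values as in the cell above
cifar_mean = (0.4914, 0.4822, 0.4465)
cifar_std = (0.2470, 0.2435, 0.2616)

def normalize_pixel(pixel, mean=cifar_mean, std=cifar_std):
    """Apply (x - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))

# A pixel equal to the dataset mean maps to zero in every channel
print(normalize_pixel(cifar_mean))

# A pure white pixel lands roughly two standard deviations above zero
print(normalize_pixel((1.0, 1.0, 1.0)))
```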

### Setting up the CNN


I want to set up my CNN so that I can change the convolutions for each run, partly because the architecture will make a difference and partly because I don't fully understand the math yet. The network will always end with a flatten and a linear layer, but the input size passed to `nn.Linear` has to change depending on the preceding convolutions.
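
The math in question is the standard output-size formula for a convolution: with input size `n`, kernel size `k`, padding `p`, and stride `s` (and no dilation), the output size is `floor((n + 2p - k) / s) + 1`. The sketch below applies it layer by layer to the same three convolution settings used as the default in the `CNN` class. `conv_out` and `stack` are hypothetical names introduced for this example.

```python
def conv_out(n, k, p=0, s=1):
    """Spatial output size of a convolution (dilation 1) on an n x n input."""
    return (n + 2 * p - k) // s + 1

# (out_channels, kernel, padding, stride) per layer; CIFAR-10 images are 32x32
stack = [(8, 3, 3, 1), (16, 3, 1, 2), (32, 3, 0, 2)]

size = 32
for channels, k, p, s in stack:
    size = conv_out(size, k, p, s)
    print(channels, size)  # 8 36, then 16 18, then 32 8

# Features seen by the linear layer: channels * height * width
flat_features = channels * size * size
print(flat_features)  # 2048, i.e. 32*8*8
```

Changing any kernel size, padding, or stride changes `size`, which is why the first argument of `nn.Linear` cannot stay fixed across runs.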

In [9]:
class CNN(nn.Module):

    def __init__(self, convolutions=None, activation=nn.ReLU):
        super().__init__()
        # Build the default stack here rather than in the signature, so that
        # every model gets its own freshly initialized layers.
        if convolutions is None:
            convolutions = [
                nn.Conv2d(3, 8, 3, padding=3),
                nn.Conv2d(8, 16, 3, groups=2, padding=1, stride=2),
                nn.Conv2d(16, 32, 3, stride=2),
            ]
        # TODO: have some checker function to make sure the math is right

        # nn.ModuleList takes an iterable and registers each layer's parameters
        self.convolutions = nn.ModuleList(convolutions)
        self.num_convolutions = len(self.convolutions)

        self.activation = activation()

        self.flatten = nn.Flatten()

        # TODO: fix linear inputs so it's dynamic.
        # 32*8*8 only matches the default convolutions on 32x32 inputs.
        self.linear = nn.Linear(32*8*8, 10)

    def forward(self, x):
        for i in range(self.num_convolutions):
            x = self.convolutions[i](x)
            # No activation after the last convolution
            if i < self.num_convolutions - 1:
                x = self.activation(x)

        flattened = self.flatten(x)
        return self.linear(flattened)
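
One possible way to make the linear layer's input size dynamic is to push a single dummy image through the convolutions and read off the flattened size. This is a sketch rather than a final design: `infer_flat_features` is a hypothetical helper, and it assumes 3x32x32 inputs as in CIFAR-10. Activations are skipped because they do not change the shape.

```python
import torch
import torch.nn as nn

def infer_flat_features(convolutions, input_shape=(3, 32, 32)):
    """Return the number of features left after the convolutions and a flatten."""
    with torch.no_grad():
        x = torch.zeros(1, *input_shape)  # one dummy image
        for conv in convolutions:
            x = conv(x)
    return x.flatten(1).shape[1]

convs = nn.ModuleList([
    nn.Conv2d(3, 8, 3, padding=3),
    nn.Conv2d(8, 16, 3, groups=2, padding=1, stride=2),
    nn.Conv2d(16, 32, 3, stride=2),
])

n_features = infer_flat_features(convs)
linear = nn.Linear(n_features, 10)
print(n_features)  # 2048 for this stack
```

PyTorch also provides `nn.LazyLinear`, which infers its input size on the first forward pass; the explicit version above has the advantage that a mismatch shows up when the model is built.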

### Training


In [None]:
def train(run, convolutions=None, lr=1e-3, epochs=10, batch_size=64, reg=1e-5,
          activation=nn.ReLU, use_augmentation=False, aug_params=None, device='mps'):
    start = time.time()
    data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=batch_size, shuffle=False)

    # Build the default convolutions per call so runs never share weights
    if convolutions is None:
        convolutions = nn.ModuleList([
            nn.Conv2d(3, 8, 3, padding=3),
            nn.Conv2d(8, 16, 3, groups=2, padding=1, stride=2),
            nn.Conv2d(16, 32, 3, stride=2),
        ])

    # aug_params must be a dict with the keys used below when use_augmentation is True
    if use_augmentation:
        augments = transforms.Compose([
            transforms.RandomHorizontalFlip(aug_params['flip_prob']),
            transforms.RandomGrayscale(aug_params['grayscale_prob']),
            transforms.ColorJitter(
                brightness=(aug_params['bright_min'], aug_params['bright_max']),
                contrast=0,
                saturation=0,
                hue=0),
            transforms.RandomCrop(
                size=32,
                padding=aug_params['shift_size'],
                fill=cifar_mean)
        ])

    model = CNN(convolutions, activation).to(device)

    loss = nn.CrossEntropyLoss()
    opt = optim.SGD(model.parameters(), momentum=0.9, lr=lr, weight_decay=reg)

    # Naming the training runs so the hyperparameters are visible in TensorBoard
    name = str(run) + '--'  # Run number
    name += ":".join(map(str, convolutions))
    name += '-lr-' + str(lr) + '-bs-' + str(batch_size) + '-epochs-' + str(epochs) + '-reg-' + str(reg)
    logger = tb.SummaryWriter(os.path.join("logs", name))
    global_step = 0
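
Joining the full string form of every convolution makes for very long directory names. A more compact alternative is sketched below, assuming the run name only needs to carry the scalar hyperparameters. `make_run_name` is a hypothetical helper, not part of TensorBoard.

```python
import os

def make_run_name(run, **hparams):
    """Build a name like '1--lr-0.001-bs-64' from keyword hyperparameters."""
    parts = [f"{key}-{value}" for key, value in hparams.items()]
    return f"{run}--" + "-".join(parts)

name = make_run_name(1, lr=1e-3, bs=64, epochs=10, reg=1e-5)
print(name)  # 1--lr-0.001-bs-64-epochs-10-reg-1e-05

# Each run gets its own subdirectory under logs/
log_dir = os.path.join("logs", name)
```

Passing `log_dir` to `tb.SummaryWriter` makes each run appear under its own name in TensorBoard.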


## Training and tuning


Training will be a loop.

First, we want to make sure that we do this on the GPU:

In [None]:
if torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

In [None]:
# set up first training loop
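
The loop itself could look something like the sketch below: one pass over a data loader with the usual forward, zero-grad, backward, step sequence. This is a generic PyTorch pattern under stated assumptions, not the finished homework code. `run_epoch` is a hypothetical helper, and the small random dataset and linear model are stand-ins so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, loss_fn, opt=None, device="cpu"):
    """Run one epoch; train if an optimizer is given, otherwise only evaluate.

    Returns (mean loss, accuracy) over the whole loader.
    """
    training = opt is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = loss_fn(logits, labels)
            if training:
                opt.zero_grad()
                loss.backward()
                opt.step()
            # Weight by batch size so the mean is exact for a ragged last batch
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen

# Stand-in data and model (random tensors, not CIFAR-10)
torch.manual_seed(0)
data = TensorDataset(torch.rand(32, 3, 32, 32), torch.randint(0, 10, (32,)))
loader = DataLoader(data, batch_size=8, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

train_loss, train_acc = run_epoch(model, loader, nn.CrossEntropyLoss(), opt)
valid_loss, valid_acc = run_epoch(model, loader, nn.CrossEntropyLoss())
print(train_loss, train_acc, valid_loss, valid_acc)
```

Inside `train`, the returned values could be written once per epoch with calls such as `logger.add_scalar('train/loss', train_loss, global_step)`, so the curves for each run show up in TensorBoard.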