<a href="https://colab.research.google.com/github/dungwoong/CSC413Final/blob/main/training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone https://github.com/dungwoong/CSC413Final.git
%cd CSC413Final

Cloning into 'CSC413Final'...
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 34 (delta 15), reused 28 (delta 9), pack-reused 0
Unpacking objects: 100% (34/34), 11.17 KiB | 953.00 KiB/s, done.
/content/CSC413Final


In [None]:
import time
import pandas as pd
import torch
from torch import nn
import torchvision
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from shufflenetv2 import base_model, se_model, sle_model

# Todos

 - Save model checkpoints in the training loop.
 - Maybe add stats for AlexNet, VGG, or similar at the end. They were not designed for CIFAR, though, so the comparison may not be fair.
 - FLOPs is roughly 2x MMacs. Other papers report much higher FLOPs because their input size is larger, so our numbers are not comparable to theirs. They are only meaningful when comparing the models we train in this study, which is all we need.
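
To make the last point concrete, here is a rough sketch of how MACs scale with input size for a single convolution. The helper `conv2d_macs` and the layer shape (3x3 kernel, 3 to 24 channels, stride 1 with padding so the output keeps the input resolution) are assumptions for illustration, not taken from the ShuffleNetV2 code in this repo.

```python
def conv2d_macs(c_in, c_out, kernel_size, h_out, w_out, groups=1):
    """Multiply-accumulates for one 2-D convolution (bias ignored)."""
    macs_per_output = (c_in // groups) * kernel_size * kernel_size
    return macs_per_output * c_out * h_out * w_out

# Hypothetical first layer, evaluated at CIFAR and ImageNet resolutions.
cifar_macs = conv2d_macs(3, 24, 3, 32, 32)
imagenet_macs = conv2d_macs(3, 24, 3, 224, 224)

print(f"CIFAR:    {cifar_macs / 1e6:.3f} MMacs, about {2 * cifar_macs / 1e6:.3f} MFLOPs")
print(f"ImageNet: {imagenet_macs / 1e6:.3f} MMacs")
print(f"ratio:    {imagenet_macs / cifar_macs:.0f}x")  # (224 / 32) ** 2 = 49
```

The same layer costs 49x more at 224x224 than at 32x32, which is why FLOPs from ImageNet papers cannot be compared directly with ours.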

# Procedure

**Measure**

Training stats:

 - Params, MMacs (done)
 - Train loss, val loss, top-1 and top-3 error for every epoch
 - Time for every epoch, number of batches/images per epoch (can be inferred from the dataset, but recorded anyway)
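
As a quick illustration of the top-1/top-3 bookkeeping, below is a framework-free sketch that counts hits the same way the `util` helpers report them, as a (correct, total) pair. The function name `topk_hits` and the scores are made up for the example. The real helpers operate on torch tensors.

```python
def topk_hits(scores, targets, k):
    """Count samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # class indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += target in ranked[:k]
    return hits, len(targets)

# Made-up scores for 3 samples over 4 classes.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # target 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # target 2 is ranked third
    [0.50, 0.30, 0.15, 0.05],  # target 3 is ranked last
]
targets = [1, 2, 3]

for k in (1, 3):
    correct, total = topk_hits(scores, targets, k)
    print(f"top-{k} error: {1 - correct / total:.3f}")
```

Summing the `correct` counts over all batches and dividing by the dataset size gives the per-epoch accuracy. The error rate is one minus that.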

Model parameters

- Model label
- batch size
- lr
- Adam betas
- Adam weight decay (1e-4?)

**Hyperparam tuning**

We will look for clear indications of training problems in the training and validation curves for each hyperparameter configuration and steer the search based on them. Where the curves give no clear direction, we'll fall back to a grid search.

For the validation curves, we want the model to converge at a moderate pace. The grid search is mostly there to see whether we can get out of local minima; otherwise the training curve should indicate any problems.
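
A minimal sketch of how the grid could be enumerated. The grid values are placeholders: 32, 1e-4 and a weight decay of 0 match the baseline run in this notebook, 1e-4 is the weight decay considered above, and the other values are assumptions. `iter_configs` is a hypothetical helper.

```python
from itertools import product

# Candidate values per hyperparameter (placeholders, see note above).
grid = {
    "batch_size": [32, 128],
    "lr": [1e-4, 1e-3],
    "weight_decay": [0.0, 1e-4],
}

def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(grid))
print(f"{len(configs)} configurations")
for cfg in configs[:2]:
    print(cfg)
```

Each dict could then be unpacked into a training call, with a fresh model per configuration so that runs do not share weights.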

**Model Comparison**

Overall, I think the metrics we planned to collect are enough. I will check the papers again to confirm.

We want to compare a few things:

 - Params: ShuffleNet was built for mobile devices, so fewer params is better.
 - MMacs: more MMacs is worse.
 - Inference speed: we can test this later.
 - Performance:
  - Lowest val loss and signs of overfitting; check top-1 and top-3 error rates. This is the general performance metric.
  - Convergence speed: how fast does the model converge?
  - Stability: how stable is the training curve? (not sure how to measure this yet)
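
One possible way to turn the performance bullets into numbers, sketched on the per-epoch test losses from the 10-epoch baseline run logged in this notebook. The three helper functions and the 5% tolerance are assumptions, not measures we have settled on. Epoch indices are 0-based.

```python
# Test loss per epoch from the 10-epoch baseline run in this notebook.
val_losses = [1.847354, 1.678941, 1.590618, 1.511885, 1.469218,
              1.437722, 1.421808, 1.410984, 1.409521, 1.423143]

def best_epoch(losses):
    """Index of the lowest loss; a rise after it hints at overfitting."""
    return min(range(len(losses)), key=lambda i: losses[i])

def epochs_to_converge(losses, tolerance=0.05):
    """First epoch whose loss is within `tolerance` (relative) of the best."""
    target = min(losses) * (1.0 + tolerance)
    return next(i for i, loss in enumerate(losses) if loss <= target)

def max_increase(losses):
    """Largest epoch-to-epoch rise in loss, a crude stability measure."""
    return max((b - a for a, b in zip(losses, losses[1:])), default=0.0)

print("best epoch:", best_epoch(val_losses))
print("epochs to converge:", epochs_to_converge(val_losses))
print("max increase:", round(max_increase(val_losses), 6))
```

For this run the test loss bottoms out at epoch 8 and rises in the final epoch while the train loss keeps falling, which is the overfitting pattern the first bullet refers to.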


In [None]:
mean, std = (0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261)

transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean, std)
])

train_set = torchvision.datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
train_size = len(train_set)
test_set = torchvision.datasets.CIFAR10(root="data", train=False, download=True, transform=transform)
test_size = len(test_set)

print(f"Training data has {train_size} observations, test has {test_size}.")

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz


Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified
Training data has 50000 observations, test has 10000.


In [None]:
# TOP 1 and 3 ERROR
from util import top1_error, top3_error, plot_training_curve

# def top3_error(predictions, targets):  # even though it says top5, it's top3 ok...
#     if len(predictions.shape) == 1:
#       predictions = predictions.unsqueeze(1)
#     if len(targets.shape) == 1:
#       targets = targets.unsqueeze(1)
#     with torch.no_grad():
#         _, top5_pred = torch.topk(predictions, k=3, dim=1)
#         top5_correct = top5_pred.eq(targets.expand(top5_pred.size()))
#         correct = top5_correct.float().sum()
#         total = targets.size(0)
#     return correct.item(), total


# def top1_error(predictions, targets):
#     if len(predictions.shape) == 1:
#       predictions = predictions.unsqueeze(1)
#     if len(targets.shape) == 1:
#       targets = targets.unsqueeze(1)
#     with torch.no_grad():
#         _, top1_pred = torch.max(predictions, dim=1)
#         top1_correct = top1_pred.unsqueeze(1).eq(targets)
#         correct = top1_correct.float().sum()
#         total = targets.size(0)
#     return correct.item(), total

In [None]:
# code adapted from https://colab.research.google.com/github/uoft-csc413/2023/blob/master/assets/tutorials/tut04_cnn.ipynb#scrollTo=Ztj0yQO8-TtS

def get_dataloaders(batch_size):
  train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=1)
  test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=1)

  data_loaders = {"train": train_loader, "test": test_loader}
  dataset_sizes = {"train": train_size, "test": test_size}
  return data_loaders, dataset_sizes

def run_epoch(model, loss_fn, optimizer, device, data_loaders, dataset_sizes):
  epoch_loss = {"train": 0.0, "test": 0.0}
  epoch_acc_1 = {"train": 0.0, "test": 0.0}
  epoch_acc_5 = {"train": 0.0, "test": 0.0}  # 5 is actually 3 btw
  
  # running loss for train/test phase
  running_loss = {"train": 0.0, "test": 0.0}
  running_corrects_1 = {"train": 0, "test": 0}
  running_corrects_5 = {"train": 0, "test": 0}

  # time and batch data
  start_time = time.time()
  batches = {"train": 0, "test": 0}

  for phase in ["train", "test"]:
    print(f"Running phase {phase}")
    # set train/eval mode
    is_train = phase == "train"
    model.train(is_train)

    # go thru batches
    for data in data_loaders[phase]:
      batches[phase] += 1
      inputs, labels = data

      inputs = inputs.to(device)
      labels = labels.to(device)

      # only track gradients while training; the test phase needs none
      with torch.set_grad_enabled(is_train):
        outputs = model(inputs) # batch_size x num_classes
        loss = loss_fn(outputs, labels)

        if is_train:
          optimizer.zero_grad() # clear all gradients
          loss.backward()  # compute gradients
          optimizer.step() # update weights/biases

      running_loss[phase] += loss.item() * inputs.size(0)
      # softmax preserves the class ranking, so top-k is the same as on the logits
      sm_outputs = torch.softmax(outputs.detach(), dim=1)
      c, t = top1_error(sm_outputs, labels)
      running_corrects_1[phase] += c
      c2, t2 = top3_error(sm_outputs, labels)
      running_corrects_5[phase] += c2

    # epoch averages, computed once per phase after all batches
    epoch_loss[phase] = running_loss[phase] / dataset_sizes[phase]
    epoch_acc_1[phase] = running_corrects_1[phase] / dataset_sizes[phase]
    epoch_acc_5[phase] = running_corrects_5[phase] / dataset_sizes[phase]

  return {"loss": epoch_loss,
          "top1_acc": epoch_acc_1,
          "top3_acc": epoch_acc_5,
          "running_corrects_1": running_corrects_1,
          "running_corrects_3": running_corrects_5,
          "dataset_sizes": dataset_sizes,
          "time": time.time() - start_time,
          "batches": batches}

def flatten_dict(dic, sep='_'):
  ret = dict()
  for key in dic:
    if isinstance(dic[key], dict):
      flat = flatten_dict(dic[key], sep=sep)  # pass the caller's separator down
      for key2 in flat:
        ret[key + sep + key2] = flat[key2]
    else:
      ret[key] = dic[key]
  return ret

def to_df(flattened_dict):
  d = {key: [flattened_dict[key]] for key in flattened_dict}
  return pd.DataFrame(d)

In [None]:
def train(model, device, batch_size, lr, beta0, beta1, weight_decay, epochs=100, csv_path="", models_path="tmp", plot=True, print_results_every_epoch=False):
  results = None

  # save model info
  res_csv = f"{csv_path}{model.label}_results.csv"
  mod_csv = f"{csv_path}{model.label}_params.csv"
  model_info = {"keys": ["label", "batch_size", "lr", "beta0", "beta1", "weight_decay"],
                "values": [model.label, batch_size, lr, beta0, beta1, weight_decay]}
  print(f"Saving model info to {mod_csv}")
  model_info = pd.DataFrame(model_info)
  model_info.to_csv(mod_csv, index=False)

  data_loaders, dataset_sizes = get_dataloaders(batch_size)
  loss_fn = nn.CrossEntropyLoss()
  optimizer = optim.Adam(model.parameters(), lr=lr, betas=(beta0, beta1), weight_decay=weight_decay)

  # train for many epochs
  for i in range(epochs):
    print(f"Epoch {i+1} / {epochs}")
    print("-" * 30)
    epoch_res = run_epoch(model, loss_fn, optimizer, device, data_loaders, dataset_sizes)
    epoch_res["epoch"] = i
    if print_results_every_epoch:
      print(epoch_res)
    epoch_res = flatten_dict(epoch_res)
    epoch_res = to_df(epoch_res)
    # print(epoch_res.transpose()) # print the flat version?
    results = pd.concat([results, epoch_res], axis=0) if results is not None else epoch_res

    # save information
    print(f"Saving to {res_csv}")
    results.to_csv(res_csv, index=True)
    model_info.to_csv(mod_csv, index=False)
    
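    # save model checkpoint (weights and optimizer state)
    # note: models_path must already exist; torch.save does not create directories
    epoch_formatted = '{:04d}'.format(i)
    torch.save({"mod": model.state_dict(),
                "opt": optimizer.state_dict()}, f"{models_path}/{epoch_formatted}.pt")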
    
    if plot:
      plot_training_curve(results)
  return results, model_info



In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

cuda


In [None]:
model = base_model().to(device) # updated model file to not include device
train(model, device, 32, 1e-4, beta0=0.9, beta1=0.999, weight_decay=0, epochs=10)

Saving model info to ShuffleNetV2_params.csv
Epoch 1 / 10
------------------------------
Running phase train
Running phase test
{'loss': {'train': 2.0450134798049926, 'test': 1.8473541984558106}, 'top1_acc': {'train': 0.23478, 'test': 0.3178}, 'top3_acc': {'train': 0.5498, 'test': 0.665}, 'running_corrects_1': {'train': 11739.0, 'test': 3178.0}, 'running_corrects_3': {'train': 27490.0, 'test': 6650.0}, 'dataset_sizes': {'train': 50000, 'test': 10000}, 'time': 85.24949932098389, 'batches': {'train': 1563, 'test': 313}, 'epoch': 0}
                                     0
loss_train                    2.045013
loss_test                     1.847354
top1_acc_train                0.234780
top1_acc_test                 0.317800
top3_acc_train                0.549800
top3_acc_test                 0.665000
running_corrects_1_train  11739.000000
running_corrects_1_test    3178.000000
running_corrects_3_train  27490.000000
running_corrects_3_test    6650.000000
dataset_sizes_train       50000.000

(   loss_train  loss_test  top1_acc_train  top1_acc_test  top3_acc_train  \
 0    2.045013   1.847354         0.23478         0.3178         0.54980   
 0    1.766087   1.678941         0.34742         0.3846         0.69520   
 0    1.620010   1.590618         0.40514         0.4080         0.75022   
 0    1.503184   1.511885         0.45030         0.4445         0.78568   
 0    1.407286   1.469218         0.49038         0.4660         0.81234   
 0    1.310587   1.437722         0.52760         0.4823         0.83624   
 0    1.228089   1.421808         0.55936         0.4903         0.85528   
 0    1.150845   1.410984         0.58700         0.4951         0.87266   
 0    1.067388   1.409521         0.61888         0.5048         0.88774   
 0    0.998365   1.423143         0.64432         0.5106         0.90204   
 
    top3_acc_test  running_corrects_1_train  running_corrects_1_test  \
 0         0.6650                   11739.0                   3178.0   
 0         0.7303 

On CPU this run takes about an hour (approximately 7 minutes per epoch).

On GPU it's around 90s per epoch.