# Baseline Setup on VGG19

Use this notebook to recreate the results of the original paper on the VGG19 network.

**Note:** The labels used here were generated in earlier experiments with ResNet32 models.

In [2]:
import os
import time
from google.colab import drive

drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [None]:
! nvidia-smi

In [3]:
# install dependencies
! pip install torchtoolbox

Collecting torchtoolbox
  Downloading torchtoolbox-0.1.8.2-py3-none-any.whl (84 kB)
Collecting transformers
  Downloading transformers-4.18.0-py3-none-any.whl (4.0 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
Collect

In [4]:
# import dependencies
from scipy.stats import entropy
import numpy as np
import matplotlib.pyplot as plt
import glob
from scipy.special import kl_div

In [5]:
# training parameters
ROOT = '/gdrive/MyDrive/practical_deep_learning/project/cd3250_experiments'
LABELS = '/labels/label_files'
OUTPUTS = '/gdrive/MyDrive/practical_deep_learning/project/outputs'

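# Paths to the label files. The lists are named *_files so that the second one
# does not shadow scipy.stats.entropy, which is imported above.
original_files = glob.glob(ROOT + LABELS + '/cifar10_original/*.npy')
entropy_files = glob.glob(ROOT + LABELS + '/entropy/*.npy')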

## Model Training

Train the following baseline models. No noise is currently present in the labels.

- Categorical Labels
- High Dimensional Labels
    - Speech
    - Shuffled Speech
    - Uniform Distribution
    - Composite of Gaussians
    - BERT Embeddings
    - Random
- Low Dimensional Labels
- GloVe Embeddings
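
Every run in this notebook calls the same `train.py` script and varies only the `--label` value (plus `--label_dir` for the non-categorical labels). A minimal sketch of building those commands in one place, so the long paths are not retyped in each cell, might look like the following. The helper name `build_train_command` and the `LABEL_TYPES` list are hypothetical; the flags, values, and paths are the ones that appear in the cells of this notebook.

```python
import shlex

PROJECT = '/gdrive/MyDrive/practical_deep_learning/project'
TRAIN_SCRIPT = PROJECT + '/original_experiments/train.py'
BASE_DIR = PROJECT + '/outputs'
LABEL_DIR = PROJECT + '/original_experiments/labels/label_files/'

# --label values passed to train.py by the cells in this notebook
LABEL_TYPES = ['category', 'speech', 'shuffle', 'composite',
               'bert', 'random', 'lowdim', 'glove']


def build_train_command(label, model='vgg19', dataset='cifar10', seed=7):
    """Return the argument list for one training run (hypothetical helper)."""
    cmd = ['python', TRAIN_SCRIPT,
           '--model', model,
           '--dataset', dataset,
           '--seed', str(seed),
           '--label', label,
           '--base_dir', BASE_DIR]
    # The categorical run in this notebook is launched without --label_dir.
    if label != 'category':
        cmd += ['--label_dir', LABEL_DIR]
    return cmd


for label in LABEL_TYPES:
    # shlex.join gives a copy-pasteable shell line for inspection
    print(shlex.join(build_train_command(label)))
```

A command built this way could then be passed to `subprocess.run(cmd, check=True)` instead of a `!` shell line, which makes a failed run raise an error rather than pass silently.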


### Model Training - Categorical Labels

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`

In [12]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label category --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs
end = time.time()
print("Time:", end-start)

Start training 100% cifar10 category model with manual seed 7 and model vgg19.
Best model location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_category/category_seed7_vgg19_best_model.pth.
Checkpoint location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_category/category_seed7_vgg19_checkpoint.pth.
Log location: /gdrive/MyDrive/practical_deep_learning/project/outputs/log/cifar10_category_log.csv.
Snapshots location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_category/snapshots
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Number of Epochs : 100
Epoch 1 train loss 2.5305571976033123 | valid loss 2.0594279766082764 | valid acc 0.2116
Saving new best checkpoint...
Epoch 2 train loss 1.9824220030145212 | valid loss 1.9366320538520814 | valid acc 0.2592
Saving new best checkpoint...
Epoch 3 train loss 1.871306
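
The per-epoch lines printed above follow a fixed pattern, so they can be turned into numbers for plotting or comparison. The following is a rough sketch, assuming the line format stays exactly as shown in this output; the names `EPOCH_RE` and `parse_epoch_lines` are hypothetical. Incomplete lines, such as a truncated final line, are skipped.

```python
import re

# Matches only complete lines of the form printed by the training run above.
EPOCH_RE = re.compile(
    r'Epoch (\d+) train loss ([\d.]+) \| valid loss ([\d.]+) \| valid acc ([\d.]+)'
)


def parse_epoch_lines(text):
    """Extract (epoch, train_loss, valid_loss, valid_acc) tuples from log text."""
    rows = []
    for match in EPOCH_RE.finditer(text):
        epoch, train_loss, valid_loss, valid_acc = match.groups()
        rows.append((int(epoch), float(train_loss),
                     float(valid_loss), float(valid_acc)))
    return rows


sample = """Epoch 1 train loss 2.5305571976033123 | valid loss 2.0594279766082764 | valid acc 0.2116
Saving new best checkpoint...
Epoch 2 train loss 1.9824220030145212 | valid loss 1.9366320538520814 | valid acc 0.2592
Saving new best checkpoint...
Epoch 3 train loss 1.871306"""

rows = parse_epoch_lines(sample)
best = max(rows, key=lambda row: row[3])  # epoch with highest validation accuracy
print(f"parsed {len(rows)} epochs; best epoch {best[0]} with valid acc {best[3]}")
```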

### Model Training - High Dimensional Labels - Speech

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`
- Speech Label Directory : `/gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/`

In [8]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label speech --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

Start training 100% cifar10 speech model with manual seed 7 and model vgg19.
Best model location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_speech/speech_seed7_vgg19_best_model.pth.
Checkpoint location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_speech/speech_seed7_vgg19_checkpoint.pth.
Log location: /gdrive/MyDrive/practical_deep_learning/project/outputs/log/cifar10_speech_log.csv.
Snapshots location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_speech/snapshots
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Number of Epochs : 200
Epoch 1 train loss 8.924302885478193 | valid loss 7.0967805957794186 | valid acc 0.245
Saving new best checkpoint...
Epoch 2 train loss 6.620989328080958 | valid loss 6.575222864151001 | valid acc 0.3814
Saving new best checkpoint...
Epoch 3 train loss 5.775987556034869 | valid 
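
Each training cell measures wall-clock time with a `start`/`end` pair around the shell command. One possible way to avoid repeating that pattern, and to keep the timings of all runs for later comparison, is a small context manager. This is only a sketch; `timed` and `timings` are hypothetical names, not part of the original experiments.

```python
import time
from contextlib import contextmanager

# Maps a run name to its elapsed wall-clock time in seconds.
timings = {}


@contextmanager
def timed(name, store=timings):
    """Time the enclosed block and record the result under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs are still timed.
        store[name] = time.perf_counter() - start
        print(f"Time ({name}): {store[name]:.2f} s")


# Stand-in workload; in the notebook this would wrap a training run.
with timed('demo'):
    time.sleep(0.01)
```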

### Model Training - High Dimensional Labels - Shuffled Speech

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`
- Speech Label Directory : `/gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/`

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label shuffle --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

Start training 100% cifar10 shuffle model with manual seed 7 and model vgg19.
Best model location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_shuffle/shuffle_seed7_vgg19_best_model.pth.
Checkpoint location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_shuffle/shuffle_seed7_vgg19_checkpoint.pth.
Log location: /gdrive/MyDrive/practical_deep_learning/project/outputs/log/cifar10_shuffle_log.csv.
Snapshots location: /gdrive/MyDrive/practical_deep_learning/project/outputs/cifar10/seed7/vgg19/model_shuffle/snapshots
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Number of Epochs : 200
Epoch 1 train loss 12.862218049439518 | valid loss 11.480647048950196 | valid acc 0.2774
Saving new best checkpoint...
Epoch 2 train loss 9.76443181254647 | valid loss 9.031763553619385 | valid acc 0.4978
Saving new best checkpoint...
Epoch 3 train loss 8.257414342327552 

### Model Training - High Dimensional Labels - Composite Gaussian

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`
- Label Directory : `/gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/`

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label composite --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

### Model Training - High Dimensional Labels - BERT Embeddings

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`
- Label Directory : `/gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/`

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label bert --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

### Model Training - High Dimensional Labels - Random

Parameters
- Model : `VGG19`
- Dataset : CIFAR 10
- Seed : 7
- Model Save Directory : `OUTPUTS`
- Label Directory : `/gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/`

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label random --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

### Model Training - Low Dimensional Labels

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label lowdim --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)

### Model Training - GloVe Embeddings

In [None]:
start = time.time()
! python /gdrive/MyDrive/practical_deep_learning/project/original_experiments/train.py --model vgg19 --dataset cifar10 --seed 7 --label glove --base_dir /gdrive/MyDrive/practical_deep_learning/project/outputs --label_dir /gdrive/MyDrive/practical_deep_learning/project/original_experiments/labels/label_files/
end = time.time()
print("Time:", end-start)