# DataScience: Convolutional Neural Networks

**Your name: Tyler Lott** 


In this homework, we will implement and train a CNN model on CIFAR-10.

**The rule of this homework is to design an advanced CNN model to achieve good performance on CIFAR-10.** The end of this notebook has several hints to improve your results.


**Please list your modifications below**:
<br/>
Did everything in Tensorflow because I am more familiar with it
- Loaded data directly from downloaded files instead of torch dataloader
- Implemented a ResNet CNN in tensorflow

<br/>
Preprocessed images
- made black and white


<br/>
Applied transformations to the images to create new training data
- zoom applied to some
- random rotation applied to some
<br/>
Used SAM Optimizer, this estimates a sharpness-aware gradient 
<br/>
<br/>


# Tensorflow network

## Load Data
I downloaded the dataset from keras datasets. I loaded all of it into memory because I have plenty of RAM on my local machine.

In [1]:
# Imports

from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt

In [None]:
(train_data, train_labels), (test_data, test_labels) = cifar10.load_data()

Show a couple of images to make sure they are in there

In [None]:
for i in range(9):
    plt.subplot(330 + 1 + i)
    plt.imshow(train_data[i])
plt.show()

# Preprocess Data

In [None]:
# Imports

import numpy as np
# from sklearn.utils import shuffle
from tensorflow.keras.preprocessing.image import ImageDataGenerator

## Create one hot matrix of labels

In [None]:
def one_hotify(np_array):
    nb_classes = np_array.max()+1
    targets = np.array([np_array]).reshape(-1)
    return np.eye(nb_classes)[targets]
    
train_labels = one_hotify(train_labels)
test_labels = one_hotify(test_labels)

labels_catagories = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
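
As a quick illustration of what `one_hotify` does, the sketch below applies the same `np.eye` indexing trick to a tiny made-up label array (the values are hypothetical, not CIFAR-10 data). CIFAR-10 labels arrive with shape `(N, 1)`, which is why the array is flattened first.

```python
import numpy as np

# Hypothetical labels shaped (N, 1), like the arrays returned by cifar10.load_data()
toy_labels = np.array([[0], [2], [1]])

num_classes = toy_labels.max() + 1   # 3 classes in this toy example
flat = toy_labels.reshape(-1)        # shape (3,)
one_hot = np.eye(num_classes)[flat]  # row i of the identity matrix encodes class i

print(one_hot)
```

Each row has a single 1 in the column that matches the class index, so the result has shape `(N, num_classes)`.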

In [None]:
print(train_data.shape)
print(train_labels.shape)

## Shuffle datasets
I don't know if this is strictly necessary, but I figured it wouldn't hurt, other than a small amount of compute time.

In [None]:
# train_data, train_labels = shuffle(train_data, train_labels)

plt.imshow(train_data[0])
plt.show()
print(train_labels[0])

## Normalize the images

Image pixel values range from 0 to 255, so here we scale them to the range 0 to 1, which generally makes training more stable.

In [None]:
train_data = train_data.astype('float32')/255
test_data = test_data.astype('float32')/255

## Break into train and validation

In [None]:
train_data, train_valid = train_data[5000:], train_data[:5000]
train_labels, valid_labels = train_labels[5000:], train_labels[:5000]

Below is the data generator for the training images. It augments images on the fly, so the whole augmented dataset is
never held in memory, only the batch currently in use. This decreased training time per epoch and allowed the training
data to be re-augmented for every batch instead of just once, as it was before. This increased model accuracy.

In [None]:
datagen_train = ImageDataGenerator(width_shift_range=.12, height_shift_range=.12, horizontal_flip=True, zoom_range=.2)

datagen_train.fit(train_data)
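
To make the "augment per batch in a generator" idea concrete, here is a minimal NumPy-only sketch. It is not how `ImageDataGenerator` is implemented; `augmented_batches` and the toy arrays are hypothetical names used only for illustration, and it applies just a random horizontal flip.

```python
import numpy as np

def augmented_batches(images, labels, batch_size, rng):
    """Yield shuffled batches forever, flipping each image left-right with probability 0.5."""
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = images[idx]                       # fancy indexing copies only this batch
            flip = rng.random(len(idx)) < 0.5         # fresh random choice on every pass
            batch[flip] = batch[flip][:, :, ::-1, :]  # reverse the width axis
            yield batch, labels[idx]

rng = np.random.default_rng(0)
toy_images = rng.random((10, 32, 32, 3), dtype=np.float32)  # stand-in for train_data
toy_labels = np.arange(10)                                  # stand-in for train_labels
gen = augmented_batches(toy_images, toy_labels, batch_size=4, rng=rng)
batch_x, batch_y = next(gen)
print(batch_x.shape, batch_y.shape)
```

Because the random choices are redrawn every time a batch is produced, the model sees a different variant of each image in every epoch, while memory use stays at one batch.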

## Image Augmentation: Pre-ImageDataGenerator

Below was a decent idea for augmenting the images, but the tensorflow.keras image preprocessing utilities proved to be
faster, and I was able to make a generator out of them so I didn't have to put the whole dataset into memory.

In [None]:
# Create an image manipulator to adjust images zoom and rotation

import PIL
from PIL import Image
import random

random.seed(69)

def image_augment(np_image):
    # create image from array (expects a uint8 HxWx3 array, i.e. before normalization)
    im = Image.fromarray(np_image)
    og_width, og_height = im.size  # PIL reports size as (width, height)
    
    # get augmentation parameters from random
    rotation = random.randint(5,34)
    crop_pix = 4
    if rotation < 19:
        crop_pix = 3
    if rotation < 10:
        crop_pix = 2
    # crop box is (left, upper, right, lower)
    crop = (crop_pix, crop_pix, og_width-crop_pix, og_height-crop_pix)
    x_flip = bool(random.getrandbits(1))
    y_flip = bool(random.getrandbits(1))
    
    # augment image
    if x_flip and y_flip:
        return np.array(im.rotate(rotation).crop(crop).transpose(PIL.Image.FLIP_LEFT_RIGHT).transpose(PIL.Image.FLIP_TOP_BOTTOM).resize((32, 32)))
    elif x_flip:
        return np.array(im.rotate(rotation).crop(crop).transpose(PIL.Image.FLIP_LEFT_RIGHT).resize((32, 32)))
    elif y_flip:
        return np.array(im.rotate(rotation).crop(crop).transpose(PIL.Image.FLIP_TOP_BOTTOM).resize((32, 32)))
    else: 
        return np.array(im.rotate(rotation).crop(crop).resize((32, 32)))

This section was the loop to augment the images and save as a new dataset as originally planned, but as discussed above, using
a generator proved to be much more time and memory efficient.

In [None]:
# # Create augmented versions of all images
import time

start = time.time()

for i in range(len(train_data)):
    aug = np.reshape(image_augment(train_data[i]), (1, 32, 32, 3))
    train_data = np.concatenate((train_data, aug), axis=0)
    if i % 1000 == 0:
        int_time = time.time()
        print(f'Time to modify {i} images: {round(int_time - start)}s')

end = time.time()

train_labels = np.concatenate((train_labels, train_labels), axis=0)

print(f'Time to create augmented data: {round(end - start)}s')


This section saves or loads the augmented data from the pre-generator approach.

In [None]:
# # save or load data because that took hella long to process
# 
import os.path
import pickle
from os import path

path_data = 'Cifar10_train_data'
path_labels = 'Cifar10_train_labels'

# load the cached arrays if they exist, otherwise write them out;
# `with` makes sure every file handle is closed
if path.exists(path_data):
    with open(path_data, 'rb') as pickle_in:
        train_data = pickle.load(pickle_in)
else:
    with open(path_data, 'wb') as pickle_out:
        pickle.dump(train_data, pickle_out)

if path.exists(path_labels):
    with open(path_labels, 'rb') as pickle_in:
        train_labels = pickle.load(pickle_in)
else:
    with open(path_labels, 'wb') as pickle_out:
        pickle.dump(train_labels, pickle_out)

This section just compared the first 9 images before and after modification

In [None]:
# # plot original first 9 images
for i in range(9):
    plt.subplot(330 + 1 + i)
    plt.imshow(train_data[i])
plt.show()

# plot augmented first 9 images
for i in range(9):
    plt.subplot(330 + 1 + i)
    plt.imshow(image_augment(train_data[i]))
plt.show()


# Network design: ResNet 101 structure

In [None]:
# Imports

from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation, Add, MaxPooling2D, Input,
                                     ZeroPadding2D, Flatten, Dense, AveragePooling2D, Dropout)
from tensorflow.keras.activations import relu
from tensorflow.keras.regularizers import l2
from tensorflow.keras import Model

## identity residual block

In [None]:
def res_id(x, filters):
    skip = x
    f1, f2 = filters
    
    reg = .001
  
    x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    x = Activation(relu)(x)
   
    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    x = Activation(relu)(x)

    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    
    x = Add()([x, skip])
    x = Activation(relu)(x)
    
    return x

## Convolution residual block

In [None]:
def res_conv(x, s, filters):

    skip = x
    f1, f2 = filters
    
    reg = .001
    
    x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    x = Activation(relu)(x)
    
    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    x = Activation(relu)(x)
    
    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', kernel_regularizer=l2(reg))(x)
    x = BatchNormalization()(x)
    
    skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', kernel_regularizer=l2(reg))(skip)
    skip = BatchNormalization()(skip)
    
    x = Add()([x, skip])
    x = Activation(relu)(x)
    
    return x
    

## Main structure of ResNet101

In [None]:
def resnet101():

    # Part 1
    in_image = Input(shape=(train_data.shape[1], train_data.shape[2], train_data.shape[3]))
    x = ZeroPadding2D(padding=(3, 3))(in_image)
    
    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2))(x)
    x = BatchNormalization()(x)
    x = Activation(relu)(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    
    # Part 2
    filt = (64, 256)
    x = res_conv(x, s=1, filters=filt)
    x = res_id(x, filters=filt)
    x = res_id(x, filters=filt)
    
    # Part 3
    filt = (128, 512)
    x = res_conv(x, s=2, filters=filt)
    x = res_id(x, filters=filt)
    x = res_id(x, filters=filt)
    x = res_id(x, filters=filt)
    
    # Part 4
    filt = (256, 1024)
    x = res_conv(x, s=2, filters=filt)
    for i in range(22):
        x = res_id(x, filters=filt)
    
    # Part 5  
    filt = (512, 2048)
    x = res_conv(x, s=2, filters=filt)
    x = res_id(x, filters=filt)
    x = res_id(x, filters=filt)
    
    # End
    x = AveragePooling2D((2, 2), padding='same')(x)

    x = Flatten()(x)
    x = Dense(len(train_labels[0]), activation='softmax', kernel_initializer='he_normal')(x)
    
    
    model = Model(inputs=in_image, outputs=x, name='ResNet101')
    
    return model 
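
The following is a small sanity check, written as a sketch rather than taken from the notebook, of how the spatial size of a 32x32 CIFAR-10 image shrinks through `resnet101` and why the depth comes out at 101. It assumes the usual Keras output-size rules for `'valid'` and `'same'` padding, and it follows the common convention of not counting the 1x1 shortcut convolutions in `res_conv`. The helper `conv_out` is a hypothetical name.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis for a Keras-style conv/pool layer."""
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid'

size = 32                              # CIFAR-10 input
size += 2 * 3                          # ZeroPadding2D(3)         -> 38
size = conv_out(size, 7, 2, 'valid')   # 7x7 conv, stride 2       -> 16
size = conv_out(size, 3, 2, 'valid')   # 3x3 max pool, stride 2   -> 7
for stride in (1, 2, 2, 2):            # first 1x1 conv of each res_conv stage
    size = conv_out(size, 1, stride, 'valid')
size = conv_out(size, 2, 2, 'same')    # AveragePooling2D((2, 2), padding='same')
print(size)

blocks = (1 + 2) + (1 + 3) + (1 + 22) + (1 + 2)  # res_conv + res_id blocks per stage
layers = 1 + 3 * blocks + 1                      # stem conv + 3 convs per block + Dense
print(blocks, layers)
```

By the last stage the feature map is already 1x1, so the final average pooling has nothing left to reduce for inputs this small.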

# Build model

In [None]:
model = resnet101()
# # model.summary()

# Train Model

In [None]:
# Imports

import time

In [None]:
BATCH_SIZE = 128
EPOCHS = 250
SAVEPATH = f'weights/ResNet101_bs-{BATCH_SIZE}_ep-{EPOCHS}_{int(time.time())}.h5'

## Create Callbacks

In [None]:
# Imports 
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
from tensorflow.keras.callbacks import LearningRateScheduler
import numpy as np
import shutil
import os

TensorBoard callback to plot the loss and accuracy while training

In [None]:
# create the log directory if needed and clear old logs so the callback starts fresh
# (os.path.join keeps the path portable; os.makedirs also creates the parent folder)
log_dir = os.path.join('cifar_logs', f'{BATCH_SIZE}-{EPOCHS}')
os.makedirs(log_dir, exist_ok=True)

for subdir in ('train', 'validation'):
    old_logs = os.path.join(log_dir, subdir)
    if os.path.exists(old_logs):
        shutil.rmtree(old_logs)

loss_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

Model checkpoint callback to save weights while training

In [None]:
weights_callback = ModelCheckpoint(filepath=SAVEPATH, save_weights_only=True, monitor='val_accuracy', mode='max', save_best_only=True)
# weights_callback = ModelCheckpoint(filepath=SAVEPATH, save_weights_only=True, save_freq='epoch')

Learning rate decay callback to adjust learning rate while training

In [None]:
def lrdecay(epoch):
    lr = 1e-3
    if epoch > 250:
        lr *= 0.5e-3
    elif epoch > 210:
        lr *= 1e-3
    elif epoch > 160:
        lr *= 1e-2
    elif epoch > 120:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
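# Earlier exponential-decay variant, kept for reference:
# if epoch < 40:
#     return 0.01
# else:
#     return 0.01 * np.exp(0.03 * (40 - epoch))

# Wrap the schedule in a callback (separate name so the function is not shadowed).
# Note: this callback is not in the callbacks list passed to model.fit below.
lr_callback = LearningRateScheduler(lrdecay)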
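
To see what the step schedule produces, the sketch below re-implements it as a standalone function (`lrdecay_sketch` is a hypothetical name, and the `print` call is dropped) and evaluates it at a few epochs. Keras passes a 0-indexed epoch to the schedule, so with `EPOCHS = 250` the `epoch > 250` branch is never reached.

```python
def lrdecay_sketch(epoch, base_lr=1e-3):
    """Standalone copy of the step schedule above, without the print call."""
    if epoch > 250:
        return base_lr * 0.5e-3
    if epoch > 210:
        return base_lr * 1e-3
    if epoch > 160:
        return base_lr * 1e-2
    if epoch > 120:
        return base_lr * 1e-1
    return base_lr

# Epochs on either side of each threshold, plus the last epoch of a 250-epoch run
for epoch in (0, 120, 121, 161, 211, 249):
    print(f'epoch {epoch:3d}: lr = {lrdecay_sketch(epoch):.1e}')
```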

## Define the Optimizer

In [None]:
# Imports 

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.optimizers import Adam, SGD

In [None]:
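# Tried two different optimizers while training this network.
# The second assignment overrides the first, so SGD is the one actually used.

opt = Adam(learning_rate=.0001)
opt = SGD(learning_rate=.001, momentum=.8, decay=.001/100, nesterov=True)

# The model has to be compiled before fit() or evaluate() can be called
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])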

## Load weights from the model with the best performance

In [None]:
# Load weights from best performing network trained from scratch
model.load_weights('weights/ResNet101_bs-256_ep-160_1604135162.h5')

## Train the model

In [None]:
# train the model; skip this cell to evaluate the loaded weights without further training

history = model.fit(datagen_train.flow(train_data, train_labels, batch_size=BATCH_SIZE),
                    steps_per_epoch=train_data.shape[0] // BATCH_SIZE,
                    epochs=EPOCHS, verbose=2, callbacks=[loss_callback, weights_callback],
                    validation_data=(train_valid, valid_labels),
                    validation_steps=train_valid.shape[0] // BATCH_SIZE)

In [None]:
# plots the training variables of the model fit above

print(history.history)

plt.plot(history.history['val_loss'])
plt.title('Validation loss history')
plt.ylabel('Loss value')
plt.xlabel('No. epoch')
plt.show()

plt.plot(history.history['loss'])
plt.title('Loss history')
plt.ylabel('Loss value')
plt.xlabel('No. epoch')
plt.show()

plt.plot(history.history['accuracy'])
plt.title('Accuracy history')
plt.ylabel('Accuracy')
plt.xlabel('No. epoch')
plt.show()

plt.plot(history.history['val_accuracy'])
plt.title('Validation Accuracy history')
plt.ylabel('Accuracy')
plt.xlabel('No. epoch')
plt.show()

## Evaluate Model Accuracy

In [None]:
# evaluate the test data on the model
# this should give 85.8% accuracy

result = model.evaluate(test_data, test_labels)

print(f'Test Accuracy of the model: {result[1]}')

## Remove old tensorflow model from memory

In [None]:
from tensorflow.keras.backend import clear_session

clear_session()

# Doing Better

This was all well and good for a model, but I want to do better than the ResNet-101 I coded from scratch.
To do this I used Google's Big Transfer (BiT). This essentially uses a ResNet-101 pre-trained on ImageNet-21k (a massive
dataset with 21k categories; Google put a lot of training time on many GPUs into these models).

## Load Model

In [None]:
# Imports
import tensorflow_hub as hub

In [None]:
model_url = "https://tfhub.dev/google/bit/m-r101x1/1"
module = hub.KerasLayer(model_url)

## Create model 
Here we use some transfer learning trickery and add a Dense layer of size 10 as the head, so we are able to predict our 10 categories.

In [None]:
# Imports
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

In [None]:
class BiTResNet101(Model):

    def __init__(self, num_classes, module):
        super().__init__()
        
        self.num_classes = num_classes
        self.head = Dense(num_classes, kernel_initializer='zeros')
        self.bit_module = module
    
    def call(self, images):
        bit_embedding = self.bit_module(images)
        return self.head(bit_embedding)

model = BiTResNet101(num_classes=len(train_labels[0]), module=module)

## Define learning rate

In [None]:
# Imports 

from tensorflow.keras.optimizers.schedules import PiecewiseConstantDecay

In [None]:
lr = .1

lr_decay = PiecewiseConstantDecay(boundaries=[3000, 6000, 9000], values=[lr, lr*.1, lr*.001, lr*.0001])
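
As a rough check of how this schedule interacts with the training run, here is a plain-Python sketch of the piecewise lookup (`piecewise_lr` is a hypothetical helper, not the Keras class). It assumes one optimizer step per batch, the batch size of 512 and the 10 epochs used in the training cell, and the 45,000 training images left after the validation split. Under those assumptions the run ends well before the first boundary, so the learning rate would stay at its initial value throughout.

```python
from bisect import bisect_left

boundaries = [3000, 6000, 9000]
base_lr = 0.1
values = [base_lr, base_lr * 0.1, base_lr * 0.001, base_lr * 0.0001]

def piecewise_lr(step):
    """values[0] up to and including the first boundary, values[1] after it, and so on."""
    return values[bisect_left(boundaries, step)]

train_size = 50000 - 5000             # training images left after the validation split
steps_per_epoch = train_size // 512   # batch size used in the training cell
total_steps = steps_per_epoch * 10    # 10 epochs
print(steps_per_epoch, total_steps, piecewise_lr(total_steps))
```

Training for more epochs or lowering the boundaries would be needed for the later values to take effect.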

## Define optimizer

In [None]:
# Imports
from tensorflow.keras.optimizers import SGD

In [None]:
opt = SGD(learning_rate=lr_decay, momentum=.9)

## Define loss

In [None]:
# Imports

from tensorflow.keras.losses import CategoricalCrossentropy

In [None]:
loss_func = CategoricalCrossentropy(from_logits=True)
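
The head defined earlier has no activation, so it outputs raw logits, which is why `from_logits=True` is set. The sketch below is a plain-Python illustration of softmax cross-entropy for a single example (`softmax_cross_entropy` is a hypothetical helper, not the Keras implementation). It also shows what to expect at the start of training: a zero-initialized head outputs all-zero logits, so the loss starts at ln(10) for 10 classes.

```python
import math

def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot target and softmax(logits), for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, one_hot))

logits = [0.0] * 10   # what a zero-initialized head outputs for any input
target = [0.0] * 10
target[3] = 1.0       # hypothetical true class
initial_loss = softmax_cross_entropy(logits, target)
print(initial_loss, math.log(10))
```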

## Compile model

In [None]:
model.compile(optimizer=opt, loss=loss_func, metrics=['accuracy'])

## Train Model

In [None]:
BATCH_SIZE = 512

history = model.fit(datagen_train.flow(train_data, train_labels, batch_size=BATCH_SIZE),
                    steps_per_epoch=train_data.shape[0] // BATCH_SIZE,
                    epochs=10, 
                    verbose=2,
                    validation_data=(train_valid, valid_labels),
                    validation_steps=train_valid.shape[0] // BATCH_SIZE)

## Evaluate model

In [None]:
result = model.evaluate(test_data, test_labels)

print(f'Test Accuracy of the model: {result[1]}')

In [None]:
import tensorflow

device_name = tensorflow.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

# Original Pytorch Stuff Below
I dislike PyTorch, so I did everything in TensorFlow above. The only change I made to this section was commenting out the
last two cells so I didn't spend any time training the default model.

First, import the packages or modules required for the competition.

In [None]:
import os
import pandas as pd
import shutil
import time
from copy import deepcopy

device = 'cuda'

### Loading and normalizing 

Using torchvision, it’s extremely easy to load CIFAR10.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import random_split

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainset, valset = random_split(trainset, [42000,8000])
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, 
                                        shuffle=False, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

## Define the Model

(We will cover hybridize next week. It often makes your model run faster, but you can ignore what it means for this homework.)

Here, we build the residual blocks based on the HybridBlock class, which is slightly different than the implementation described in the [“Residual networks (ResNet)”](http://d2l.ai/chapter_convolutional-neural-networks/resnet.html) section. This is done to improve execution efficiency.

In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net().to(device)

In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.01)
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
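
For reference, `StepLR` with `step_size=1` and `gamma=0.1` divides the learning rate by 10 after every epoch. The arithmetic below is a sketch of the resulting values over the 10 epochs of the training loop, assuming `lr_scheduler.step()` is called once per epoch as in that loop.

```python
initial_lr = 0.01
gamma = 0.1

# Learning rate in effect during each of the 10 epochs of the training loop
lrs = [initial_lr * gamma ** epoch for epoch in range(10)]
for epoch, lr in enumerate(lrs, start=1):
    print(f'epoch {epoch:2d}: lr = {lr:.0e}')
```

By the later epochs the learning rate is so small that the weights barely change, so a larger `step_size` or a `gamma` closer to 1 is one of the hyper-parameters worth tuning.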

## Define the Training Functions

We will select the model and tune hyper-parameters according to the model's performance on the validation set. Next, we define the model training function `train`. We record the training time of each epoch, which helps us compare the time costs of different models.

In [None]:
# %%time
# best_val_acc = -1000
# best_val_model = None
# for epoch in range(10):  
#     net.train()
#     running_loss = 0.0
#     running_acc = 0
#     for i, data in enumerate(trainloader, 0):
#         inputs, labels = data
#         inputs, labels = inputs.cuda(),labels.cuda()
# 
#         optimizer.zero_grad()
#         outputs = net(inputs)
#         loss = criterion(outputs, labels)
#         loss.backward()
#         optimizer.step()
# 
#         # print statistics
#         running_loss += loss.item() * inputs.size(0)
#         out = torch.argmax(outputs.detach(),dim=1)
#         assert out.shape==labels.shape
#         running_acc += (labels==out).sum().item()
#     print(f"Train loss {epoch+1}: {running_loss/len(trainset)},Train Acc:{running_acc*100/len(trainset)}%")
#     
#     correct = 0
#     net.eval()
#     with torch.no_grad():
#         for inputs,labels in valloader:
#             out = net(inputs.cuda()).cpu()
#             out = torch.argmax(out,dim=1)
#             acc = (out==labels).sum().item()
#             correct += acc
#     print(f"Val accuracy:{correct*100/len(valset)}%")
#     if correct>best_val_acc:
#         best_val_acc = correct
#         best_val_model = deepcopy(net.state_dict())
#     lr_scheduler.step()
#     
# print('Finished Training')  

In [None]:
# %%time
# correct = 0
# net.load_state_dict(best_val_model)
# net.eval()
# with torch.no_grad():
#     for inputs,labels in testloader:
#         out = net(inputs.cuda()).cpu()
#         out = torch.argmax(out,dim=1)
#         acc = (out==labels).sum().item()
#         
#         correct += acc
# print(f"Test accuracy: {correct*100/len(testset)}%")

## Hints to Improve Your Results

* Use a GPU machine to run this; otherwise it will be quite slow.
* Revise the simple CNN model
* Revise the *transforms* function by using some image augmentation techniques
* Tune hyper-parameters, such as batch_size
* Change to another network, such as ResNet-34 or Inception
* Use pre-trained models