<a href="https://colab.research.google.com/github/btcain44/Applied_Deep_Learning/blob/main/Enhanced_Transfer_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Bi-Weekly Report #5
### Brian Cain
#### Enhanced_Transfer_Learning.ipynb

In my previous attempts, I have gotten lackluster performance classifying the CIFAR-100 dataset. My best result came in Bi-Weekly Report #3, where a Generalist-Specialist model reached 39.62% accuracy.

I will now try to extend the lessons learned in the <b>Comparitive_Transfer_Learning.ipynb</b> notebook by applying transfer learning to the CIFAR-100 classification problem. Here are a couple of lessons that notebook taught me about transfer learning:
* Add layers after the ImageNet weights to give the network more capacity to learn about Task B
* Turn on fine-tuning of Task A's weights while training Task B, so the network can adjust the ImageNet weights to improve classification

Now I will import the CIFAR-100 dataset. 

In [1]:
```python
## Import necessary packages
import numpy as np
import tensorflow as tf
from tensorflow import keras

## Load the CIFAR-100 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar100.load_data()
assert x_train.shape == (50000, 32, 32, 3)
assert x_test.shape == (10000, 32, 32, 3)
assert y_train.shape == (50000, 1)
assert y_test.shape == (10000, 1)

## Flatten the labels so they have shape (50000,) and (10000,) respectively
y_train = y_train.ravel()
y_test = y_test.ravel()
```




### Transfer Learning CIFAR-100 by Going Deeper

Let's build the enhanced architecture of the model with fine-tuning enabled. Here are the additional features of this architecture and the problems they are meant to address:
* <b>Fine-Tuning ImageNet Weights:</b> The results in Comparitive_Transfer_Learning.ipynb made it clear that transfer learning from ResNet50 with additional dense layers could only do so much. Fine-tuning the weights lets the network learn more about the data in our specific classification problem, rather than relying only on static weights designed for ImageNet, which should improve accuracy.
* <b>Add Convolutions After ResNet50 ImageNet Weights:</b> Another suspected weakness of the transfer learning architecture in Comparitive_Transfer_Learning.ipynb was that only a single dense layer followed the ImageNet weights before the prediction. This time, I made the network deeper by adding two 1x1 convolutions, followed by Global Average Pooling and an extra 306-unit dense layer, to give the CIFAR-100 network a chance to learn even more about the data. However, I am concerned that this may lead to over-fitting.
* <b>Adam Optimizer with Learning Rate 0.001:</b> In previous attempts to classify the CIFAR-100 dataset, I achieved extremely low accuracy. This time, I explicitly set a conservative learning rate of 0.001 for the Adam optimizer (this is also the Keras default for Adam), so that the model is less likely to overshoot good weights while fine-tuning the network and learning the weights of the new convolutions. A conservative learning rate can mean more training time, but I am hoping it is worth it.
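To put a rough number on the over-fitting concern, the sketch below counts the trainable parameters in the layers added after ResNet50. It assumes that ResNet50, which downsamples its input by a factor of 32, turns a 32x32 image into a 1x1x2048 feature map; on a 1x1 map a 1x1 convolution has the same parameter count as a dense layer (weights plus biases), and Global Average Pooling adds none. The helper name `head_params` is hypothetical.

```python
# Count parameters in a stack of layers that each map in_features -> out_features
# with a bias. On a 1x1 feature map, a 1x1 Conv2D has exactly this many parameters,
# the same as a Dense layer; GlobalAveragePooling2D has none.
def head_params(widths):
    """widths: feature sizes from the backbone output to the class count."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

# Deeper head: 2048 -> Conv 1024 -> Conv 612 -> (GAP) -> Dense 306 -> Dense 100
deep = head_params([2048, 1024, 612, 306, 100])
# Shallower head used later in this notebook: 2048 -> (GAP) -> Dense 100
shallow = head_params([2048, 100])

print(f"deep head:    {deep:,} parameters")     # deep head:    2,943,754 parameters
print(f"shallow head: {shallow:,} parameters")  # shallow head: 204,900 parameters
```

So the deeper head adds roughly 2.9 million new weights to learn from 37,500 training images, about 14 times as many as a single softmax layer, on top of the roughly 23.6 million ResNet50 weights being fine-tuned.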

In [2]:
```python
## Import the ResNet50 architecture
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

# Load ResNet50 with ImageNet weights, without its 1000-class classifier head
feature_extractor = ResNet50(weights='imagenet',
                             input_shape=(32, 32, 3),
                             include_top=False)

# Fine-tune the ImageNet weights during transfer learning instead of freezing them
feature_extractor.trainable = True

# Define the input layer for 32x32 RGB images
input_ = tf.keras.Input(shape=(32, 32, 3))

# Extract features; for a 32x32 input, ResNet50 outputs a 1x1x2048 feature map.
# Note: training=True keeps the BatchNormalization layers in training mode at
# evaluation time as well (per-batch statistics instead of moving averages).
x = feature_extractor(input_, training=True)

## Add two 1x1 convolutions to see if they help the network become more accurate
x = tf.keras.layers.Conv2D(1024, (1, 1), activation='relu')(x)
x = tf.keras.layers.Conv2D(612, (1, 1), activation='relu')(x)

# Global average pooling condenses the feature map into a flat vector
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# Hidden dense layer, then a softmax output over the 100 classes
x = tf.keras.layers.Dense(306, activation='relu')(x)
output_ = tf.keras.layers.Dense(100, activation='softmax')(x)

# Build the transfer learning model
cifar_100_deep = tf.keras.Model(input_, output_)

# Compile the model (Adam with learning rate 0.001, integer class labels)
cifar_100_deep.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                       loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                       metrics=['accuracy'])
```



Now let's normalize the data, split off a validation set, and train the model:

In [3]:
```python
## Normalize pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

## Hold out 25% of the training data for validation (37,500 train / 12,500 validation)
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=42)
```
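One detail worth double-checking: dividing by 255 is not the preprocessing the Keras ResNet50 ImageNet weights were trained with. `tf.keras.applications.resnet50.preprocess_input` uses "caffe"-style preprocessing, which converts RGB to BGR and subtracts the per-channel ImageNet means without rescaling. The NumPy sketch below reproduces that transform, assuming RGB input in [0, 255] (that is, applied instead of the /255 step); the name `caffe_preprocess` is hypothetical, calling `preprocess_input` directly is simpler in practice, and whether it would change the results in this notebook is untested.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by caffe-style preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(images):
    """images: array of shape (..., H, W, 3), RGB, values in [0, 255]."""
    x = np.asarray(images, dtype=np.float32)
    x = x[..., ::-1]              # reorder channels RGB -> BGR
    return x - IMAGENET_BGR_MEAN  # zero-center each channel, no rescaling

# Example: a single 1x1 RGB "image"
pixel = np.array([[[10, 20, 30]]], dtype=np.uint8)
print(caffe_preprocess(pixel))
```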

In [8]:
```python
## Fit and train the model
cifar_deep_fit = cifar_100_deep.fit(x_train, y_train, batch_size=100, epochs=50,
                                    validation_data=(x_val, y_val))
```

```
Epoch 1/50
...
Epoch 50/50
```


Now let's assess the test results:

In [9]:
```python
## Evaluate the test accuracy of the model
print('Test Accuracy of CIFAR-100 Transfer Learned:')
cifar_100_deep.evaluate(x_test, y_test, verbose=0)[1]
```

Test Accuracy of CIFAR-100 Transfer Learned:


0.44609999656677246

On the positive side, this transfer learning approach surpassed my previous best CIFAR-100 accuracy of 39.62%, reaching a test accuracy of 44.61%.

However, this CIFAR-100 model built on ResNet50 ImageNet transfer learning ended up with a large amount of over-fitting. I'm not sure exactly what accounts for it, but perhaps the added 1x1 convolutional layers and the learning rate choice had something to do with it.
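The training log above only shows epoch counters, but the `History` object returned by `fit` (stored here in `cifar_deep_fit`) records per-epoch metrics in its `.history` dict, under `'accuracy'` and `'val_accuracy'` for a model compiled with `metrics=['accuracy']`. A small helper, hypothetically named `accuracy_gap`, could quantify the over-fitting by reporting the train/validation gap at the epoch with the best validation accuracy. The toy history below is made up for illustration and is not a result from this notebook.

```python
def accuracy_gap(history):
    """Return (epoch, train_acc, val_acc, gap) for the epoch with the best
    validation accuracy, where gap = train_acc - val_acc.

    `history` is a dict shaped like keras History.history, with
    'accuracy' and 'val_accuracy' lists holding one entry per epoch.
    """
    val = history["val_accuracy"]
    best = max(range(len(val)), key=val.__getitem__)  # index of best val accuracy
    train_acc = history["accuracy"][best]
    return best + 1, train_acc, val[best], train_acc - val[best]

# Toy history for illustration only (not from this notebook)
toy = {"accuracy": [0.30, 0.60, 0.90], "val_accuracy": [0.28, 0.40, 0.38]}
epoch, tr, va, gap = accuracy_gap(toy)
print(f"best val epoch {epoch}: train {tr:.2f}, val {va:.2f}, gap {gap:.2f}")
# best val epoch 2: train 0.60, val 0.40, gap 0.20
```

On the real run, this would be called as `accuracy_gap(cifar_deep_fit.history)`.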

### Transfer Learning CIFAR-100 by Going Less Deep with More Data

One more experiment I would like to try is making the network slightly less deep and giving it more training data, to see if that improves CIFAR-100 classification. Here is how I will do so:
* <b>Get rid of the additional layers after the ImageNet weights</b> (the two 1x1 convolutions and the 306-unit dense layer)
* <b>Mixup Data Augmentation:</b> In Bi-Weekly Report #2, I tried Mixup Data Augmentation on the CIFAR-100 dataset. Although the results weren't great, it did prevent over-fitting. I will re-train the network here with mixup augmentation and hope to see less over-fitting.

Define a function for mixup data augmentation:

In [10]:
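```python
## Define a function that performs mixup on two images
def mixup(image1, image2, label1, label2, beta_params):

    ## Sample the mixing weight lambda from a Beta(a, b) distribution
    lambda_val = np.random.beta(beta_params[0], beta_params[1])

    ## Blend the two images with weights lambda and (1 - lambda)
    newImg = lambda_val*image1 + (1-lambda_val)*image2
    # Caution: this blends two integer class indices and rounds the result,
    # which usually yields a third, unrelated class rather than a mixed label
    newLabel = round(lambda_val*label1 + (1-lambda_val)*label2)

    # Caution: if the images are already scaled to [0, 1], casting to int
    # truncates almost every pixel value to 0
    return newImg.astype(int), newLabel
```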
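The `mixup` function above interpolates integer class indices and rounds the result, which usually produces a third, unrelated class (for example, mixing class 3 and class 97 with λ = 0.5 gives round(50.0) = 50). Standard mixup instead mixes one-hot label vectors into soft labels and keeps the blended images as floats. A minimal NumPy sketch of batch-level mixup under those conventions is below. The name `mixup_batch` and its arguments are hypothetical, this sketch draws one λ per example and pairs each example with a random partner from the same batch, it is not what was run in this notebook, and training on its soft labels would need `tf.keras.losses.CategoricalCrossentropy` instead of `SparseCategoricalCrossentropy`.

```python
import numpy as np

def mixup_batch(images, labels, num_classes, alpha=0.2, rng=None):
    """Mix each image with a randomly chosen partner from the same batch.

    images: float array (N, H, W, C); labels: int array (N,).
    Returns mixed float images and soft one-hot labels of shape (N, num_classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(images)
    lam = rng.beta(alpha, alpha, size=n)     # one mixing weight per example
    partner = rng.permutation(n)             # partner index for each example
    one_hot = np.eye(num_classes)[labels]    # (N, num_classes)
    lam_img = lam.reshape(n, 1, 1, 1)        # broadcast lambda over H, W, C
    # Images stay float: casting to int would zero out values in [0, 1]
    mixed_images = lam_img * images + (1 - lam_img) * images[partner]
    mixed_labels = lam[:, None] * one_hot + (1 - lam[:, None]) * one_hot[partner]
    return mixed_images, mixed_labels

rng = np.random.default_rng(0)
imgs = rng.random((4, 32, 32, 3))   # stand-in for images already scaled to [0, 1]
labs = np.array([3, 97, 12, 50])
mixed_x, mixed_y = mixup_batch(imgs, labs, num_classes=100, rng=rng)
print(mixed_x.shape, mixed_y.shape)  # (4, 32, 32, 3) (4, 100)
```

Each row of `mixed_y` sums to 1 and puts weight only on the two source classes, so the model is trained toward a blend of the two labels rather than an unrelated third class.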

Now apply mixup to a random subset of the training data and append the new examples to the training set:

In [11]:
```python
## Pick 7,500 random training indices (20% of the 37,500 training images).
## Indices start at 1 so each image i can be paired with its neighbor i-1;
## randint samples with replacement, so some indices may repeat.
np.random.seed(0)
rand_images = np.random.randint(1, 37500, size=int(37500*.2), dtype=int)

## Create extra training examples using mixup
x_train_mixup = []
y_train_mixup = []
for i in rand_images:
    # Note: y_train[i-i] is always y_train[0]; the label matching the
    # partner image x_train[i-1] would be y_train[i-1]
    mixup_result = mixup(x_train[i], x_train[i-1], y_train[i], y_train[i-i], [.2, .2])
    x_train_mixup.append(mixup_result[0])
    y_train_mixup.append(mixup_result[1])

## Convert the mixup examples into numpy arrays
x_train_mixup = np.array(x_train_mixup)
y_train_mixup = np.array(y_train_mixup)

## Append the mixup examples to the existing training data
x_train_mixup = np.concatenate((x_train, x_train_mixup), axis=0)
y_train_mixup = np.concatenate((y_train, y_train_mixup), axis=None)
```

Now we will make the alterations to the model described above:

In [20]:
```python
# Load ResNet50 with ImageNet weights, without its 1000-class classifier head
feature_extractor = ResNet50(weights='imagenet',
                             input_shape=(32, 32, 3),
                             include_top=False)

# Fine-tune the ImageNet weights during transfer learning instead of freezing them
feature_extractor.trainable = True

# Define the input layer for 32x32 RGB images
input_ = tf.keras.Input(shape=(32, 32, 3))

# Extract features (training=True keeps BatchNormalization in training mode,
# including at evaluation time)
x = feature_extractor(input_, training=True)

# Global average pooling condenses the ResNet50 output into a flat vector
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# Softmax output layer over the 100 classes
output_ = tf.keras.layers.Dense(100, activation='softmax')(x)

# Build the shallower transfer learning model
cifar_100_shallow = tf.keras.Model(input_, output_)

# Compile the model; 'Adam' uses the Keras default learning rate of 0.001
cifar_100_shallow.compile(optimizer='Adam',
                          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                          metrics=['accuracy'])
```

It is also important to note that for this model I used a larger batch size of 200 and only 25 epochs to speed up training. Google Colab had previously interrupted my session for exceeding its GPU usage limits, so I wanted to make sure this training run could finish without being interrupted.

In [23]:
```python
## Fit and train the model on the mixup-augmented training data
cifar_shallow_fit = cifar_100_shallow.fit(x_train_mixup, y_train_mixup, batch_size=200, epochs=25,
                                          validation_data=(x_val, y_val))
```

```
Epoch 1/25
...
Epoch 25/25
```


In [24]:
```python
## Evaluate the test accuracy of the model
print('Test Accuracy of CIFAR-100 Transfer Learned with Data Augmentation and Shallower Framework:')
cifar_100_shallow.evaluate(x_test, y_test, verbose=0)[1]
```

Test Accuracy of CIFAR-100 Transfer Learned with Data Augmentation and Shallower Framework:


0.38199999928474426

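It might be unfair to compare this shallower transfer learning model directly to the one above because of the larger batch size and fewer epochs, but it seems that my attempts to decrease over-fitting and increase accuracy actually made the model worse. The test accuracy for this model is 38.2%, which is lower than the 39.62% of the Generalist-Specialist model from Report #3. It seems that removing the convolutions I had added in the deeper transfer learning model cost some predictive power. Since data augmentation typically improves performance empirically, I don't expect mixup itself to be the cause.

One caveat, however: as implemented above, the mixup step probably did not produce useful training examples. By the time it runs, pixel values have already been scaled to [0, 1], so casting the mixed images to int sets nearly every pixel to 0. The mixed labels are also a rounded weighted average of two class indices (with the second index always taken from y_train[0] because of the y_train[i-i] lookup), which usually gives an unrelated class. The effect of the augmentation on this result is therefore hard to judge.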
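To make the batch-size and epoch difference concrete, the sketch below counts gradient updates per run. It assumes Keras' default behavior of ceil(samples / batch_size) steps per epoch and the sizes used in this notebook: 37,500 training images for the deeper model, and 37,500 + 7,500 = 45,000 after mixup for the shallower one. The helper name `total_updates` is hypothetical.

```python
import math

def total_updates(num_samples, batch_size, epochs):
    """Gradient updates in a Keras fit() run: ceil(N / batch_size) steps per epoch."""
    return math.ceil(num_samples / batch_size) * epochs

deep_updates = total_updates(37_500, batch_size=100, epochs=50)     # 375 steps x 50
shallow_updates = total_updates(45_000, batch_size=200, epochs=25)  # 225 steps x 25
print(deep_updates, shallow_updates)  # 18750 5625
```

Under these assumptions, the shallower model received exactly 30% as many weight updates as the deeper one, which supports treating the accuracy comparison with caution.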

## Final Thoughts

The CIFAR-100 classification problem remains a tough one to solve, at least for me. However, transfer learning with ResNet50's ImageNet weights broke my previous record for accuracy on the CIFAR-100 dataset. My previous record came from a laborious process of building Generalist and Specialist models for the coarse and fine labels in CIFAR-100; transfer learning required significantly less modeling and produced better results. In the future, it would be interesting to try combining the Generalist-Specialist and transfer learning frameworks to see if that works any better. The experiments in this notebook and in Comparitive_Transfer_Learning.ipynb make it easy to see why transfer learning has become so prevalent: it not only boosts performance but also efficiently reuses large models to create new, accurate models with less work.