<a href="https://colab.research.google.com/github/btcain44/Applied_Deep_Learning/blob/main/Densenet_Exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Bi-weekly Report #2
### Brian Cain 
#### Densenet_Exploration.ipynb

After having some success in Data_Augmentation, in particular with the RGB Intensity Alteration augmentation, I would now like to try this strategy together with the Densenet covered in class. So far in these reports I have not trained a network as deep as Densenet, but from what we learned in class, this method should work better than a shallow CNN for a dataset like CIFAR-100. It will also be interesting to see in this notebook whether Densenet overfits the data because it is so deep. Ideally, the data augmentation will offset any overfitting.

This is also my first exploration with Google Colab, which was necessary to run this larger-scale experiment and train over more epochs than I could feasibly do on my personal PC.

In [1]:
##Import necessary packages
import numpy as np
import tensorflow as tf

##Load the CIFAR-100 dataset
##Assistance from keras documentation to assert proper shapes: https://keras.io/api/datasets/cifar100/
from tensorflow import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar100.load_data() 
assert x_train.shape == (50000, 32, 32, 3)
assert x_test.shape == (10000, 32, 32, 3)
assert y_train.shape == (50000, 1)
assert y_test.shape == (10000, 1)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz


<b>Re-Define RGB Intensity Alteration from Data_Augmentation.ipynb:</b> In Data_Augmentation.ipynb we observed how much data augmentation increases a network's ability to generalize to unseen data. Since I don't want those findings to go to waste, I will use data augmentation when training the Densenet. In particular, I will use the RGB Intensity Alteration method, as it had a fair amount of success in terms of accuracy compared with the other methods used in the Data_Augmentation notebook.

In [2]:
##Define function taking image dataset x data as input and returns the covariance matrix of RGB values
def cov_rgb(x_data):
    
    r, g, b = [],[],[] ##Initiate empty lists to hold R, G, and B values from across the dataset
    
    ##Compile vectors of RGB values
    for i in x_data:
        redVals, greenVals, blueVals = (np.concatenate(i[:,:,0],0), 
                                        np.concatenate(i[:,:,1],0), 
                                        np.concatenate(i[:,:,2],0))
        r.append(redVals)
        g.append(greenVals)
        b.append(blueVals)
        
    ##Combine our arrays in an appropriate format to compute the covariance
    r, g, b = (np.concatenate(r,0),
               np.concatenate(g,0),
               np.concatenate(b,0))
    cov_input = np.stack((r,g,b), axis=0)
    
    ##Compute the covariance matrix
    cov_mat = np.cov(cov_input)
    
    ##Return the covariance matrix
    return cov_mat

##Define function that performs channel intensity data augmentation on an input image
##Takes in the covariance matrix for RGB channels in the dataset and the input image itself
def change_channel_intensity(input_image, rgb_cov_mat, guassian_noise_mu, guassian_noise_sigma):
    
    ##Compute the eigenvalues and eigenvector of the RGB covariance matrix
    eigValues, eigVector = np.linalg.eig(rgb_cov_mat)
    
    ##Loop through pixels in the image and adjust pixels intensities as defined by method above
    i_ct = 0 
    for i in input_image:
        j_ct = 0
        for j in i:
            
            ##Compute the pixel intensity calculation
            noise = np.random.normal(guassian_noise_mu, guassian_noise_sigma, 3) ##Generate random gaussian noise
            scaled_noise = np.multiply(noise, eigValues) ##Scale each noise draw by its eigenvalue
            pixel_shift = np.matmul(eigVector, np.transpose(scaled_noise)) ##Project onto the RGB principal components
            input_image[i_ct][j_ct] = np.clip(j + pixel_shift, 0, 255) ##Clip so the cast back to uint8 cannot wrap around
            
            ##Update the count iterators
            j_ct+=1
        i_ct+=1
            
    ##Return the image with altered pixel values
    return input_image
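
The two functions above loop over every image and every pixel in Python. As a rough sketch only, the same computation can be written with NumPy array operations. The function names `cov_rgb_vectorized` and `jitter_channel_intensity` are hypothetical, the images are random stand-ins rather than CIFAR-100, and this version rounds and clips the result before casting back to `uint8`, which is an assumption about the intended behavior rather than what the cell above does.

```python
import numpy as np

def cov_rgb_vectorized(x_data):
    """Return the 3x3 covariance of R, G, B values over every pixel in the dataset."""
    pixels = np.asarray(x_data, dtype=np.float64).reshape(-1, 3)  # one row per pixel
    return np.cov(pixels, rowvar=False)  # columns (R, G, B) are the variables

def jitter_channel_intensity(image, rgb_cov_mat, mu, sigma, rng):
    """Shift each pixel along the RGB principal components; returns a uint8 image."""
    # eigh is meant for symmetric matrices such as a covariance matrix
    eig_values, eig_vectors = np.linalg.eigh(rgb_cov_mat)
    # One independent noise triple per pixel, as in the loop version above
    noise = rng.normal(mu, sigma, size=image.shape[:2] + (3,))
    # Row form of eig_vectors @ (noise * eig_values) for every pixel at once
    shift = (noise * eig_values) @ eig_vectors.T
    shifted = image.astype(np.float64) + shift
    return np.clip(np.rint(shifted), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)  # stand-in data
cov_mat = cov_rgb_vectorized(images)
augmented = jitter_channel_intensity(images[0], cov_mat, 0.0, 0.001, rng)
print(cov_mat.shape, augmented.shape, augmented.dtype)
```

Because the noise is drawn independently for each component, the different eigenvalue ordering returned by `np.linalg.eigh` compared with `np.linalg.eig` does not change the distribution of the shift.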

<b>Format the Data:</b> Before creating an architecture and training it, two tasks need to be performed to format the data. 
1. Split the data into training and validation sets (the testing set is already provided by the CIFAR-100 loader)
2. Create Data Augmentation in the training set

Split the data:

In [3]:
##Reformat label data to be 1 dimensional
y_train = np.array([i[0] for i in y_train])

##Split the data into training and validation set
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

Create Data Augmentation in the Training set: (here I'm using the same strategies from Data_Augmentation.ipynb)

In [4]:
##Generate augmented data for training set

##Alter a random 20% of the training data according to the RGB intensity method above
np.random.seed(0)
rand_images = np.random.randint(0, 37500, size=int(37500*.2), dtype=int)

##Compute the covariance matrix of the dataset
x_train_intensity = []
y_train_intensity = []
cov_mat = cov_rgb(x_train)
for i in rand_images:
    intensityImg = change_channel_intensity(np.copy(x_train[i]), cov_mat, 0, .001)
    x_train_intensity.append(intensityImg)
    y_train_intensity.append(y_train[i])
x_train = np.concatenate((x_train, x_train_intensity), axis=0)
y_train = np.concatenate((y_train, y_train_intensity),axis=None)

##Free up memory by dropping the temporary storage data for looping
del x_train_intensity
del y_train_intensity
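
One property of the sampling step above: `np.random.randint` draws indices with replacement, so some images are picked more than once and fewer than 20% of the distinct training images end up with an augmented copy. The sketch below is only an illustration of that difference, using the split size of 37,500 from this notebook; the variable names are hypothetical, and drawing without replacement is shown as an alternative, not as what the cell above does.

```python
import numpy as np

n_train = 37500              # training images left after the 75/25 split
n_aug = int(n_train * 0.2)   # number of augmented copies to create

# Sampling with replacement, as randint does: duplicates are possible
legacy_rng = np.random.RandomState(0)
with_replacement = legacy_rng.randint(0, n_train, size=n_aug)

# Sampling without replacement: every selected index is distinct
rng = np.random.default_rng(0)
without_replacement = rng.choice(n_train, size=n_aug, replace=False)

print(len(np.unique(with_replacement)), "distinct images out of", n_aug)
print(len(np.unique(without_replacement)), "distinct images out of", n_aug)
```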

<b>Pull in Densenet Model and add some Layers for Regularization/Prediction</b>

Used the following as a resource: https://www.pluralsight.com/guides/introduction-to-densenet-with-tensorflow

To implement this Densenet, I followed the tutorial linked above and made a few changes to what that author did. The first major difference is that the article used a "Natural Images" dataset and re-used Densenet's pre-trained Imagenet weights to make training quicker. In my case, I am using CIFAR-100 and I elected to re-train the Densenet weights to better fit the CIFAR-100 data. The article also created its data augmentation through ImageDataGenerator, which performs augmentation during training; as stated above, I instead used the RGB Intensity Alteration defined in Data_Augmentation.ipynb. Another change is that I lowered the dropout probability in the last dropout layer from .5 to .2, in the hope that the model would converge faster, since re-training the Densenet weights naturally takes more time than the article's approach.

A few important consequences to watch out for as a result of these changes:
* Will re-training Densenet rather than using the Imagenet weights make it over-fit the CIFAR-100 training data?
* Will adjusting the dropout to .2 for faster accuracy increase over-fitting on the training data?
* Will RGB Intensity Alteration help mitigate the possibility that Densenet overfits CIFAR-100 training data?

Define the model:

In [8]:
##Import necessary packages to use in the Densenet implementation
from tensorflow.keras.layers import Dense,GlobalAveragePooling2D,Convolution2D,BatchNormalization
from tensorflow.keras.layers import Flatten,MaxPooling2D,Dropout
from tensorflow.keras.applications import DenseNet121

##DenseNet121 defaults to weights='imagenet', so the base is initialized from pre-trained ImageNet weights
model_d=DenseNet121(include_top=False, input_shape=(32,32,3)) 
x=model_d.output
x= GlobalAveragePooling2D()(x)
x= BatchNormalization()(x)
x= Dropout(0.5)(x)
x= Dense(1024,activation='relu')(x) 
x= Dense(512,activation='relu')(x) 
x= BatchNormalization()(x)
x= Dropout(0.2)(x)

preds=Dense(100,activation='softmax')(x) 

##Compile a summary of the model
model=tf.keras.Model(inputs=model_d.input,outputs=preds)
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 38, 38, 3)    0           input_2[0][0]                    
__________________________________________________________________________________________________
conv1/conv (Conv2D)             (None, 16, 16, 64)   9408        zero_padding2d_2[0][0]           
__________________________________________________________________________________________________
conv1/bn (BatchNormalization)   (None, 16, 16, 64)   256         conv1/conv[0][0]                 
____________________________________________________________________________________________

Make all the layers trainable so the Densenet weights are re-trained rather than frozen:

In [9]:
##Set all layers to be trainable since we're not using the ImageNet dataset
for layer in model.layers[:]:
    layer.trainable=True

##Verify this ensures we're still training the whole densenet
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 38, 38, 3)    0           input_2[0][0]                    
__________________________________________________________________________________________________
conv1/conv (Conv2D)             (None, 16, 16, 64)   9408        zero_padding2d_2[0][0]           
__________________________________________________________________________________________________
conv1/bn (BatchNormalization)   (None, 16, 16, 64)   256         conv1/conv[0][0]                 
____________________________________________________________________________________________

In [10]:
##Compile the model
model_d.compile(optimizer='adam',
                          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                          metrics=['accuracy'])

##Fit our created model over 50 epochs (Batch size of 32)
model_d_fit = model_d.fit(x_train, y_train, batch_size=32, epochs=50, validation_data=(x_val, y_val))

Epoch 1/50
...
Epoch 50/50


Evaluate the Testing Accuracy:

In [11]:
##Now assess the test accuracy of the final model
print('Test Accuracy of Revised Densenet of CIFAR-100:')
model_d.evaluate(x_test,  y_test, verbose=0)[1]

Test Accuracy of Revised Densenet of CIFAR-100:


0.020547688007354736

In the test results above, it can be seen that the modified Densenet model essentially failed to train on the CIFAR-100 dataset and, as a result, performed poorly on the test data. Over 50 epochs there was never any real improvement: the model started at 1.45% validation accuracy and ended at a test accuracy of 2.05%, barely above the 1% chance level for 100 classes.

Referring to the potential consequences from above we can reflect on a few points:
* Perhaps the RGB Intensity Alteration augmentation doesn't work as well with Densenet as it did with the regular CNNs in Data_Augmentation.ipynb
* It's possible that the two dropout layers, especially the one with p=.5, prevented the model from learning enough information in its fully connected dense layers
* It doesn't seem the model needed more epochs: validation accuracy did not improve at any point during training, which suggests additional epochs would not have helped
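
One more setting worth checking alongside these points: the compile call passes `from_logits=True`, while the head defined earlier ends in a softmax layer. If that loss were applied to the softmax outputs, the probabilities would be pushed through a second softmax. The sketch below is a plain NumPy illustration of the effect for 100 classes; the helper names are hypothetical and are not Keras internals.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_ce_from_probs(probs, label):
    """Cross-entropy when the model output is already a probability vector."""
    return -np.log(np.clip(probs[label], 1e-12, 1.0))

def sparse_ce_from_logits(logits, label):
    """Cross-entropy when the loss applies softmax to the model output itself."""
    return -np.log(softmax(logits)[label])

num_classes = 100
label = 7

# Best case: a softmax output that is fully confident in the correct class
probs = np.zeros(num_classes)
probs[label] = 1.0

loss_matched = sparse_ce_from_probs(probs, label)      # probabilities used as probabilities
loss_mismatched = sparse_ce_from_logits(probs, label)  # probabilities treated as logits
loss_uniform = np.log(num_classes)                     # loss of a uniform guess

print(loss_matched, loss_mismatched, loss_uniform)
```

Under the mismatch, even a perfect prediction cannot push the loss below log(1 + 99/e), roughly 3.62, which is not far from the log(100), roughly 4.61, of a uniform guess, so the loss has very little room to signal progress.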

### Interpretation:

These results are a little surprising and, to be frank, slightly disappointing. In Data_Augmentation.ipynb, we actually achieved better accuracy using a simple convolutional neural network with Random Erasing and RGB Intensity Alteration. Here, the model's accuracy never makes a significant jump during training. In the article referenced to help construct this Densenet, the author achieved a test accuracy of 98% with only slight differences in augmentation and dropout methods. It is important to note, however, that the dataset used there had only about 7,000 images and 8 classes, which is a significantly less complex challenge than predicting 100 different classes. The simplest explanation is that the complexity of the label space hindered Densenet's ability to make reliable predictions on CIFAR-100.

My secondary hypothesis for the failure in training is that the layers I added after the Densenet model might have interfered with the Densenet's ability to succeed.
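
A detail of the training cell bears on this hypothesis: it compiles and fits `model_d`, the DenseNet121 base created with `include_top=False`, rather than `model`, which is the network that contains the added layers and the 100-way softmax. A minimal sketch of compiling and fitting the full model follows. It assumes a tiny hypothetical stand-in base and random data so that it runs in seconds; it is not the notebook's network, only an illustration of which object is passed to `compile` and `fit`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for the DenseNet121 base (include_top=False): any model
# that ends in a 4-D feature map behaves the same way for this illustration.
inputs = tf.keras.Input(shape=(32, 32, 3))
features = layers.Conv2D(8, 3, strides=4, activation="relu")(inputs)
base = tf.keras.Model(inputs, features, name="stand_in_base")

# Head in the same spirit as the notebook: pooling, dropout, softmax classifier.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dropout(0.2)(x)
preds = layers.Dense(100, activation="softmax")(x)
full_model = tf.keras.Model(inputs=base.input, outputs=preds, name="base_plus_head")

# The base alone emits feature maps; only the full model emits class probabilities.
print("base output shape:", base.output_shape)
print("full model output shape:", full_model.output_shape)

# Compile and fit the model that contains the head. from_logits=False because
# the last layer already applies softmax.
full_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["accuracy"],
)

rng = np.random.default_rng(0)
x_demo = rng.random((16, 32, 32, 3)).astype("float32")  # random stand-in images
y_demo = rng.integers(0, 100, size=16)                   # random stand-in labels
full_model.fit(x_demo, y_demo, batch_size=8, epochs=1, verbose=0)
probs = full_model.predict(x_demo, verbose=0)
print("prediction shape:", probs.shape)
```

In the notebook itself the corresponding objects would be `model` for the full network and `model_d` for the base.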

As a part of my next report, I will remove these additional layers, re-train the model on this data, and see if the results are better without my interference. (I tried to do this in this report, but I kept getting notifications that no GPU resources were available at the moment, so I will save this as a quick exercise to start off the next report.)

### Final Thoughts:

Although the results were not ideal, this was still a useful exercise for this bi-weekly report. I believe it was also mentioned that we will review transfer learning in this course. It seems that, had I used the "Imagenet" weights in my training, that would have been similar to transfer learning, because it uses information from one model to help train another. This should be good experience for starting that topic later in the semester. I am looking forward to seeing whether using only the Densenet, without my additional layers, will improve the accuracy; Densenet was cited as having 82.8% accuracy on the CIFAR-100 dataset, so these results were a bit surprising. (Feel free to point out any insights or mistakes in the grading feedback, including why the performance was so bad in this notebook.)