<a href="https://colab.research.google.com/github/GaureeshAnvekar/DenseNet-Assignment/blob/master/densenet_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# https://keras.io/
!pip install -q keras
import keras

Using TensorFlow backend.


In [0]:
# All the necessary imports
import keras
from keras.datasets import cifar10
from keras.models import Model, Sequential
from keras.layers import Flatten, Input, AveragePooling2D, merge, Activation, GlobalAveragePooling2D
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.layers import Concatenate
from keras.optimizers import SGD
from keras.regularizers import l2
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.preprocessing.image import ImageDataGenerator
from google.colab import drive                                    # This is to mount google drive with colab to save checkpoints in google drive.
from keras.models import load_model
from keras.utils import Sequence
import numpy as np
import math as m

In [0]:
# Mount google drive with colab
drive.mount('/content/drive', force_remount=True)



Mounted at /content/drive


In [0]:
# This part prevents TensorFlow from allocating all the available GPU memory up front
# backend
import tensorflow as tf
from keras import backend as k

# Don't pre-allocate memory; allocate as-needed
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))

In [0]:
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 250
num_layers = 12              # No.of layers in each dense-block
num_filter = 12
compression = 0.5
l2_reg = 1e-4                #lambda for L2 regularization
learning_rate = 0.1

In [0]:
# Load CIFAR10 Data
(x_train, y_train_orig), (x_test, y_test_orig) = cifar10.load_data()
img_height, img_width, channel = x_train.shape[1],x_train.shape[2],x_train.shape[3]

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train_orig.squeeze(), num_classes)
y_test = keras.utils.to_categorical(y_test_orig.squeeze(), num_classes)

#shape of x_train is (50000, 32, 32, 3)
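
As a small illustration of the one-hot step above, the following sketch reproduces the encoding with plain NumPy instead of `keras.utils.to_categorical`. The helper name `one_hot` and the toy labels are assumptions made for this example; they are not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer class labels of shape (n,) or (n, 1) to one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels).reshape(-1)            # flatten (n, 1) label arrays to (n,)
    return np.eye(num_classes, dtype="float32")[labels]

toy_labels = np.array([[3], [0], [9]])                 # CIFAR-10 style labels with shape (n, 1)
encoded = one_hot(toy_labels, num_classes=10)
print(encoded.shape)
```

Each row contains a single 1 at the index of its class, which is the layout `categorical_crossentropy` expects.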

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [0]:
"""
Data pre-processing step: first normalize each channel using the per-channel means and standard deviations of the CIFAR-10 examples.
The means for red, green and blue are (125.3, 123, 113.9). The standard deviations for red, green and blue are (63, 62.1, 66.7).
Then apply data augmentation to the training set.

1.) First apply "Random Cropping". Pad each image with 4 zeros on all 4 sides, i.e. top and bottom, right and left.
    This changes the original CIFAR-10 images from "32 x 32 x 3" to "40 x 40 x 3". Then take a random crop of
    each image, such that we get back the original size, i.e. "32 x 32 x 3".

2.) Then horizontally flip half of the images obtained above to get mirrored images.

The original training set has 50K examples, so augmenting with a factor of 2 would give 100K images in total.
Storing every augmented copy in one array does not fit in Colab RAM, so I created separate, smaller datasets that are swapped dynamically during
training. I used a data augmentation factor of 3 and created 4 datasets in total. Each dataset has 75K images: 50K are original images and
25K are augmented images (all 25K are randomly cropped, and 12K of them are also horizontally flipped).
"""

# Normalize each channel i.e red, green, blue
def normalize_data(x_train):
  
  #first for red channel
  x_train[:,:,:,0] -= 125.3
  x_train[:,:,:,0] /= 63.0
  
  #green
  x_train[:,:,:,1] -= 123
  x_train[:,:,:,1] /= 62.1
  
  #blue
  x_train[:,:,:,2] -= 113.9
  x_train[:,:,:,2] /= 66.7
  
  return x_train


# Following function does both, "Random Cropping" and "horizontal flip" on the input "X_train". It returns 2 datasets of size 75K each. In each dataset,
# 50K are original images and 25K are randomly cropped images. In 25K, 12K are horizontally flipped images. In the second returned dataset, again 50K original
# images, but the 25K images will be the second half which were not included in dataset1. Again randomly cropped and 12K flipped horizontally.
def augment_data(x_train, y_train):
  
  # We need to first pad 4 zeros on all 4 sides for all images. We can achieve this by simply
  # making a zero_matrix1 of all zeros of size (25000, 40, 40, 3) where 25000 are randomly cropped no.of examples we want to create.
  # Then simply insert our 1st half CIFAR-10 images (25000, 32, 32, 3) in-between, such that they are at the center of 
  # our original zero_matrix1 which we created with all zeros. So by doing this, we automatically get matrix with images, but
  # also the padding on all 4 sides.
  # Similarly create zero_matrix2 with all zeros (25000, 40, 40, 3). But now insert the second half of CIFAR-10 images i.e. 25K to 50K (25000, 32, 32, 3) in
  # between zero_matrix2 to create padding.
  
  zero_matrix1 = np.zeros([25000, 40, 40, 3])  # all zeros matrix
  zero_matrix2 = np.zeros([25000, 40, 40, 3])
  
  temp1 = np.zeros([25000, 32, 32, 3]) # to store images after doing random cropping (1st half of CIFAR-10)
  temp2 = np.zeros([25000, 32, 32, 3]) # to store images after doing random cropping (2nd half of CIFAR-10)
 
  
  zero_matrix1[:, 4:36, 4:36, :] = x_train[0:25000,:,:,:]       # inserts the 1st half of images of x_train at the center of our zero_matrix1. 
                                                                # So this completes padding.
  zero_matrix2[:, 4:36, 4:36, :] = x_train[25000:50000,:,:,:]   # inserts the 2nd half of images of x_train at the center of our zero_matrix2.
                                                                # So this completes padding
  
 
  new_height = 40      # due to padding
  new_width = 40       # due to padding
  
  required_size = 32
  
  # Loop over each example and crop to 32 x 32 x 3. This will fill the "temp1" and "temp2" with 1st half and 2nd half of X_train, but
  # are randomly cropped.
  for i in range(25000):
    x_value1 = np.random.randint(0, (new_height - required_size) + 1)      # We need rand int between [ 0,9 )
    y_value1 = np.random.randint(0, (new_width - required_size) + 1)       # We need rand int between [ 0,9 )
 
    x_value2 = np.random.randint(0, (new_height - required_size) + 1)      # We need rand int between [ 0,9 )
    y_value2 = np.random.randint(0, (new_width - required_size) + 1)       # We need rand int between [ 0,9 )
    
    temp1[i,:,:,:] = zero_matrix1[i, x_value1:(x_value1 + required_size), y_value1:(y_value1 + required_size), :] 
    temp2[i,:,:,:] = zero_matrix2[i, x_value2:(x_value2 + required_size), y_value2:(y_value2 + required_size), :]
  
  
  # Once we have the randomly cropped images in temp1 and temp2, horizontally flip 12K images in each of them.
  # Sample the indices without replacement so that no image is flipped twice (a second flip would undo the first).
  flip_idx_1 = np.random.choice(25000, 12000, replace=False)
  flip_idx_2 = np.random.choice(25000, 12000, replace=False)
    
  # For a batch of shape (N, height, width, channels), axis=2 is the width axis, i.e. a horizontal flip.
  temp1[flip_idx_1] = np.flip(temp1[flip_idx_1], axis=2)
  temp2[flip_idx_2] = np.flip(temp2[flip_idx_2], axis=2)
  
  
  
  # Now concatenate original x_train with both temp1 and temp2 and return 2 augmented data sets.
  return np.concatenate((x_train,temp1), axis=0), np.concatenate((x_train,temp2), axis=0)
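
The pad-then-crop idea described above can be sketched more compactly with `np.pad`. This is only an illustration under stated assumptions: the function name `random_crop_and_flip` and its parameters are hypothetical, and it flips each crop with probability `flip_prob` rather than flipping a fixed count of 12K images as the notebook does.

```python
import numpy as np

def random_crop_and_flip(images, pad=4, flip_prob=0.5, rng=None):
    """Zero-pad each image by `pad` pixels, cut a random crop of the original size, and mirror some crops."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Pad only the height and width axes; batch and channel axes stay unchanged.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    out = np.empty_like(images)
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)     # valid offsets are 0 .. 2*pad inclusive
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w, :]
        if rng.random() < flip_prob:
            crop = crop[:, ::-1, :]            # reverse the width axis: horizontal flip
        out[i] = crop
    return out

demo = np.arange(2 * 32 * 32 * 3, dtype="float32").reshape(2, 32, 32, 3)
augmented = random_crop_and_flip(demo, rng=np.random.default_rng(0))
print(augmented.shape)
```

With `pad=0` and `flip_prob=0.0` the function returns the images unchanged, which is a convenient sanity check.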
  
  

In [0]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# First normalize both train and test set
x_train = normalize_data(x_train)
x_test = normalize_data(x_test)

# Do data augmentation twice, because I'm using 4 datasets in total, which are swapped dynamically during training.
augmented_data_1, augmented_data_2 = augment_data(x_train, y_train)
augmented_data_3, augmented_data_4 = augment_data(x_train, y_train)

# As the augmented data length is 75K for all 4 sets, the training labels must also have length 75K with matching labels.
# y_train_aug_1 matches "augmented_data_1" and "augmented_data_3" (original 50K + first half of CIFAR-10).
# y_train_aug_2 matches "augmented_data_2" and "augmented_data_4" (original 50K + second half of CIFAR-10).
y_train_aug_1 = np.concatenate((y_train[0:50000], y_train[0:25000]), axis=0)
y_train_aug_2 = np.concatenate((y_train[0:50000], y_train[25000:50000]), axis=0)
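
The per-channel normalization applied in this cell can also be written with NumPy broadcasting. This is a hedged sketch: `normalize_channels`, `CHANNEL_MEAN` and `CHANNEL_STD` are hypothetical names, the constants are the ones quoted in the notebook, and unlike `normalize_data` this version returns a new array instead of modifying its input in place.

```python
import numpy as np

# Per-channel (red, green, blue) statistics quoted in the notebook.
CHANNEL_MEAN = np.array([125.3, 123.0, 113.9], dtype="float32")
CHANNEL_STD = np.array([63.0, 62.1, 66.7], dtype="float32")

def normalize_channels(images):
    """Subtract the channel mean and divide by the channel std for images of shape (n, h, w, 3)."""
    images = np.asarray(images, dtype="float32")
    # The (3,) statistics broadcast over the trailing channel axis.
    return (images - CHANNEL_MEAN) / CHANNEL_STD

demo = np.full((2, 32, 32, 3), 128.0, dtype="float32")
normalized = normalize_channels(demo)
print(normalized.shape)
```

Returning a copy avoids accidentally normalizing the same array twice when a cell is re-run.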



(50000, 32, 32, 3)

In [0]:
print(augmented_data_1.shape)
print(augmented_data_2.shape)
print(augmented_data_3.shape)
print(augmented_data_4.shape)

(75000, 32, 32, 3)
(75000, 32, 32, 3)
(75000, 32, 32, 3)
(75000, 32, 32, 3)


In [0]:
# Dense Block
def add_denseblock(input, growth_rate):
    #global compression 
    temp = input
    for _ in range(num_layers):
        #Using concept of bottle-neck i.e. dense-net-B. Batch_norm -> Relu -> Conv(1x1) -> Batch_norm -> Relu -> Conv(3x3)
        BatchNorm1 = BatchNormalization(gamma_regularizer=l2(l2_reg), beta_regularizer=l2(l2_reg))(temp)
        relu1 = Activation('relu')(BatchNorm1)
        Conv2D_1_1 = Conv2D(4*growth_rate, (1,1), use_bias=False ,padding='same', kernel_initializer="he_normal", kernel_regularizer=l2(l2_reg))(relu1)
        
        BatchNorm2 = BatchNormalization(gamma_regularizer=l2(l2_reg), beta_regularizer=l2(l2_reg))(Conv2D_1_1)
        relu2 = Activation("relu")(BatchNorm2)
        Conv2D_3_3 = Conv2D(growth_rate, (3,3), use_bias=False, padding="same", kernel_initializer="he_normal", kernel_regularizer=l2(l2_reg))(relu2)
        
        concat = Concatenate(axis=-1)([temp,Conv2D_3_3])
        
        temp = concat
        
    return temp

In [0]:
def add_transition(input, num_channels):
    global compression
    new_num_channels = int(compression * num_channels)   #Using concept of compression i.e. dense-net-C. To reduce no.of channels
    
    BatchNorm = BatchNormalization(gamma_regularizer=l2(l2_reg), beta_regularizer=l2(l2_reg))(input)
    relu = Activation('relu')(BatchNorm)
    Conv2D_BottleNeck = Conv2D(new_num_channels, (1,1), use_bias=False, padding="same", kernel_initializer="he_normal", kernel_regularizer=l2(l2_reg))(relu)
    avg = AveragePooling2D(pool_size=(2,2))(Conv2D_BottleNeck)
    
    return avg

In [0]:
def output_layer(input):
    global compression
    BatchNorm = BatchNormalization()(input)
    relu = Activation('relu')(BatchNorm)
    # The final dense layer was also replaced with a Conv2d layer as below.
    Conv2D_layer = Conv2D(num_classes, (1,1), use_bias=False, padding="same", kernel_initializer="he_normal", kernel_regularizer=l2(l2_reg))(relu)
    GlobalAvgPooling = GlobalAveragePooling2D()(Conv2D_layer)
    output = Activation("softmax")(GlobalAvgPooling)
   
    return output

In [0]:
growth_rate = 12                    # growth_rate is the no. of new channels added by each layer inside a dense-block
initial_channels = 2 * growth_rate  # 24 channels after the first convolution



input = Input(shape=(32, 32, 3,))
First_Conv2D = Conv2D(initial_channels, (3,3), use_bias=False ,padding='same', kernel_initializer="he_normal", kernel_regularizer=l2(l2_reg))(input)

First_Block = add_denseblock(First_Conv2D, growth_rate)
num_of_channels_after_first = (growth_rate * num_layers) + initial_channels     # where 'num_layers' is 12 in each dense-block, so the block adds
                                                                                # (growth_rate * num_layers) channels to the channels that enter it.
First_Transition = add_transition(First_Block, num_of_channels_after_first)

Second_Block = add_denseblock(First_Transition, growth_rate)
# The first transition compressed its input, so the second block starts from int(compression * num_of_channels_after_first) channels.
num_of_channels_after_second = int(compression * num_of_channels_after_first) + (growth_rate * num_layers)

Second_Transition = add_transition(Second_Block, num_of_channels_after_second)

Third_Block = add_denseblock(Second_Transition, growth_rate)

output = output_layer(Third_Block)
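
To make the channel bookkeeping above easier to follow, here is a small arithmetic sketch. It assumes that every dense-block adds `growth_rate * num_layers` channels and that every transition keeps `int(compression * channels)` of the channels that actually enter it; the function name `channel_trace` is hypothetical.

```python
def channel_trace(initial_channels=24, growth_rate=12, num_layers=12,
                  compression=0.5, num_blocks=3):
    """Return (stage, index, channels) tuples for a stack of dense-blocks and transitions."""
    channels = initial_channels
    trace = []
    for block in range(1, num_blocks + 1):
        channels += growth_rate * num_layers           # each layer concatenates growth_rate channels
        trace.append(("block", block, channels))
        if block < num_blocks:                         # no transition after the last block
            channels = int(compression * channels)     # 1x1 conv reduces the channel count
            trace.append(("transition", block, channels))
    return trace

for stage, index, channels in channel_trace():
    print(stage, index, channels)
```

Under these assumptions the channel counts run 24, 168, 84, 228, 114, 258 from the first convolution to the output of the third block.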


Instructions for updating:
Colocations handled automatically by placer.


In [0]:
model = Model(inputs=[input], outputs=[output])
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 32, 32, 24)   648         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 24)   96          conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32, 32, 24)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (

In [0]:
# This class inherits from the "Sequence" class so that we can override methods like "__getitem__" and "on_epoch_end" to get batches
# of data during training. "on_epoch_end" is responsible for switching the dataset after each epoch. As we have created 4 augmented
# datasets, "on_epoch_end" cycles through them one after another. It keeps track of which dataset was used in the previous epoch
# and, based on that, moves to the next dataset for the next epoch.
# An object of this class is passed to "fit_generator" for training.

class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size, last_aug_index):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.last_aug_data = last_aug_index
        self.epoch = 0
        
    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        return batch_x, batch_y
      
    def on_epoch_end(self):
        if (self.last_aug_data == 1):
          self.x = augmented_data_2
          self.y = y_train_aug_2
          self.last_aug_data += 1
        elif (self.last_aug_data == 2):
          self.x = augmented_data_3
          self.y = y_train_aug_1
          self.last_aug_data += 1
        elif (self.last_aug_data == 3):
          self.x = augmented_data_4
          self.y = y_train_aug_2
          self.last_aug_data += 1
        elif (self.last_aug_data == 4):
          self.x = augmented_data_1
          self.y = y_train_aug_1
          self.last_aug_data = 1
          
        self.epoch += 1   
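
The rotation implemented by `on_epoch_end` can be summarized as modular arithmetic. The sketch below is an illustration only: `dataset_for_epoch` and `labels_for_dataset` are hypothetical helpers, and it assumes `on_epoch_end` is called exactly once after every epoch.

```python
def dataset_for_epoch(epoch, start_dataset=1, num_datasets=4):
    """Return the 1-based dataset number used in the 0-based `epoch` when rotating after every epoch."""
    return (start_dataset - 1 + epoch) % num_datasets + 1

def labels_for_dataset(dataset):
    """Odd-numbered sets append the first half of CIFAR-10, even-numbered sets the second half."""
    return "y_train_aug_1" if dataset % 2 == 1 else "y_train_aug_2"

for epoch in range(6):
    dataset = dataset_for_epoch(epoch)
    print(epoch, dataset, labels_for_dataset(dataset))
```

A helper like this is handy when resuming from a checkpoint, because it tells you which dataset and label array to pass to the constructor for a given starting epoch.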

In [0]:
# Callbacks to be used: "ModelCheckpoint" and "LearningRateScheduler".
# ModelCheckpoint saves the model after every epoch. This helps after a system failure during training, since training can resume from the last saved checkpoint.
# LearningRateScheduler changes the initial learning_rate, which is 0.1. After 50% of the epochs, divide it by 10; after 75% of the epochs, divide it by 10 again.

# This is the folder inside google drive where weights will be saved. Weights will be saved after each epoch.
filepath = "/content/drive/My Drive/DensenetCheckpoints4/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"                                        

callback1 = ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)         

# Schedule function which is used by "LearningRateScheduler". This takes in epoch index and current learning rate as input
def schedule(epoch, learning_rate):
  if ((epoch == 124) or (epoch == 186)):       # i.e. 50% of 250 is 125, but the epoch index starts from zero, so use 124; 75% of 250 is 187.5, so use 186
    learning_rate = learning_rate / 10
  
  return learning_rate

callback2 = LearningRateScheduler(schedule, verbose=0)
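
To check what the step schedule above produces, it can be simulated without Keras. This sketch assumes the scheduler is called once per epoch with the current learning rate, as `LearningRateScheduler` does for a two-argument schedule; `simulate_schedule` is a hypothetical helper, and the copy of `schedule` is repeated here only to keep the example self-contained.

```python
def schedule(epoch, learning_rate):
    """Divide the learning rate by 10 at epoch indices 124 and 186, otherwise keep it."""
    if epoch in (124, 186):
        learning_rate = learning_rate / 10
    return learning_rate

def simulate_schedule(initial_lr=0.1, epochs=250):
    """Return the learning rate used in each epoch when the schedule is applied epoch by epoch."""
    lr = initial_lr
    history = []
    for epoch in range(epochs):
        lr = schedule(epoch, lr)
        history.append(lr)
    return history

history = simulate_schedule()
for epoch in (0, 123, 124, 185, 186, 249):
    print(epoch, history[epoch])
```

Because the schedule depends on the current learning rate, a run resumed from a checkpoint only continues correctly if the optimizer state (including the learning rate) is restored along with the model.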



In [0]:
# Object of CIFAR10Sequence class. Start with dataset 1; "on_epoch_end" then rotates through the other datasets.
training_generator_seq = CIFAR10Sequence(augmented_data_1, y_train_aug_1, batch_size, 1)



# Determine loss-function and optimizer
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(learning_rate, momentum=0.9, nesterov=True),
              metrics=["accuracy"])


# start training using fit_generator.
model.fit_generator(generator=training_generator_seq,
                    steps_per_epoch=m.ceil(augmented_data_1.shape[0]/ batch_size),
                    epochs=250,
                    validation_data=(x_test, y_test),
                    use_multiprocessing=True,
                    workers=6,
                    verbose=1,
                    callbacks=[callback1, callback2])


model.save("/content/drive/My Drive/DensenetCheckpoints4/FinalModel/model.hdf5")

Instructions for updating:
Use tf.cast instead.
Epoch 1/250
Epoch 2/250
Epoch 3/250
Epoch 4/250
Epoch 5/250
Epoch 6/250
Epoch 7/250
Epoch 8/250
Epoch 9/250
Epoch 10/250
Epoch 11/250
Epoch 12/250
Epoch 13/250
Epoch 14/250
Epoch 15/250
Epoch 16/250
Epoch 17/250
Epoch 18/250
Epoch 19/250
Epoch 20/250
Epoch 21/250
Epoch 22/250
Epoch 23/250
Epoch 24/250
Epoch 25/250
Epoch 26/250
Epoch 27/250
Epoch 28/250
Epoch 29/250
Epoch 30/250
Epoch 31/250
Epoch 32/250
Epoch 33/250
Epoch 34/250
Epoch 35/250
Epoch 36/250
Epoch 37/250
Epoch 38/250
Epoch 39/250
Epoch 40/250
Epoch 41/250
Epoch 42/250
Epoch 43/250
Epoch 44/250
Epoch 45/250
Epoch 46/250
Epoch 47/250
Epoch 48/250
Epoch 49/250
Epoch 50/250
104/586 [====>.........................] - ETA: 4:31 - loss: 0.4490 - acc: 0.9409

In [0]:
# Restart training from epoch 50.

training_generator_seq = CIFAR10Sequence(augmented_data_1, y_train_aug_1, batch_size, 1)

model = load_model("/content/drive/My Drive/DensenetCheckpoints4/weights-improvement-49-0.84.hdf5")



model.fit_generator(training_generator_seq,
                    steps_per_epoch=m.ceil(augmented_data_1.shape[0] / batch_size), epochs=250, validation_data=(x_test, y_test), verbose=1, callbacks=[callback1, callback2], initial_epoch=49)


Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Epoch 50/250
Epoch 51/250
Epoch 52/250
Epoch 53/250
Epoch 54/250
Epoch 55/250
Epoch 56/250
Epoch 57/250
Epoch 58/250
Epoch 59/250
Epoch 60/250
Epoch 61/250
Epoch 62/250
Epoch 63/250
Epoch 64/250
Epoch 65/250
Epoch 66/250
Epoch 67/250
Epoch 68/250
Epoch 69/250
Epoch 70/250
Epoch 71/250
Epoch 72/250
Epoch 73/250
Epoch 74/250
Epoch 75/250
Epoch 76/250
Epoch 77/250
Epoch 78/250
Epoch 79/250
Epoch 80/250
Epoch 81/250
Epoch 82/250
Epoch 83/250
Epoch 84/250
Epoch 85/250
Epoch 86/250
Epoch 87/250
Epoch 88/250
Epoch 89/250
Epoch 90/250
Epoch 91/250
Epoch 92/250
Epoch 93/250
Epoch 94/250
Epoch 95/250
Epoch 96/250
Epoch 97/250
Epoch 98/250
Epoch 99/250
Epoch 100/250
Epoch 101/250
Epoch 102/250
Epoch 103/250
Epoch 104/250
Epoch 105/250
Epoch 106/250
Epoch 107/250
Epoch 108/250
Epoch 109/250
Epoch 110/250
Epoch 111/250
Epoch 112/250
Epoch 113/250
Epoch 114/250
Epoch 115/250
Epoch 

In [0]:
# Restart training from epoch 175
# dataset 2 was used because at 175 epoch dataset2 was supposed to be used according to on_epoch_end.
training_generator_seq = CIFAR10Sequence(augmented_data_2, y_train_aug_2, batch_size, 2)

model = load_model("/content/drive/My Drive/DensenetCheckpoints4/weights-improvement-174-0.94.hdf5")


model.fit_generator(training_generator_seq,
                    steps_per_epoch=m.ceil(augmented_data_2.shape[0] / batch_size), epochs=250, validation_data=(x_test, y_test), verbose=1, callbacks=[callback1, callback2], initial_epoch=174, shuffle=True)


# Save final model
model.save("/content/drive/My Drive/DensenetCheckpoints4/FinalModel/model.hdf5")

# Test the model
score = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Epoch 175/250
Epoch 176/250
Epoch 177/250
Epoch 178/250
Epoch 179/250
Epoch 180/250
Epoch 181/250
Epoch 182/250
Epoch 183/250
Epoch 184/250
Epoch 185/250
Epoch 186/250
Epoch 187/250
Epoch 188/250
Epoch 189/250
Epoch 190/250
Epoch 191/250
Epoch 192/250
Epoch 193/250
Epoch 194/250
Epoch 195/250
Epoch 196/250
Epoch 197/250
Epoch 198/250
Epoch 199/250
Epoch 200/250
Epoch 201/250
Epoch 202/250
Epoch 203/250
Epoch 204/250
Epoch 205/250
Epoch 206/250
Epoch 207/250
Epoch 208/250
Epoch 209/250
Epoch 210/250
Epoch 211/250
Epoch 212/250
Epoch 213/250
Epoch 214/250
Epoch 215/250
Epoch 216/250
Epoch 217/250
Epoch 218/250
Epoch 219/250
Epoch 220/250
Epoch 221/250
Epoch 222/250
Epoch 223/250
Epoch 224/250
Epoch 225/250
Epoch 226/250
Epoch 227/250
Epoch 228/250
Epoch 229/250
Epoch 230/250
Epoch 231/250
Epoch 232/250
Epoch 233/250
Epoch 234/250
Epoch 235/250
Epoch 236/250
Epoch 237/250
Epoch 238/250
Epoch 239/250
Epoch 240/250
Epoch 241/250
Epoch 242/250
Epoch 243/250
Epoch 244/250
Epoch 245/250
Epoch 