<div style="color: #2590c2; text-align: center;">
<span style="font-size:18pt;"><b>ST: DEEP LEARNING</b></span><br/>
<span>(CS 696-04) (SM18)</span><br/><br/>
<span><b>Homework 2</b></span><br/><br/>
<span>Submitted By</span><br/>
<span>Ashok Kumar Shrestha</span>
</div>

<h4><u>Tasks:</u></h4><br/>
<ul><li>
In this programming assignment, you are asked to implement a ConvNet for the CIFAR-10 dataset. You need to implement the conv layer, the pooling layer, and their backpropagation.
</li><li>
When submitting your program, also include the data file in the zip file, so that it would be more convenient for me to test. Please also sprinkle your report into relevant places in the ipython notebook file. 
</li><li>
In addition, submit a hard copy of your report on July 25th, 2018. Thanks. 
</li></ul>


<b>
Code:
</b><br/>
<span>
Read data sets (MNIST and CIFAR-10) from the file.
</span>

In [None]:
"""
Read Data sets: MNIST and CIFAR-10
-----------------------------------------------
Parameters:
===========
file_name: file name to read

Return:
=======
train_img, test_img, train_lbl, test_lbl values
"""
import pickle
from sklearn.model_selection import train_test_split
import numpy as np
import os

def get_CIFAR10_data(cifar10_dir, num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data; get_CIFAR10 returns
    # (train images, test images, train labels, test labels) in that order
    X_train, X_test, y_train, y_test = get_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    X_train = X_train.astype(np.float64)
    X_val = X_val.astype(np.float64)
    X_test = X_test.astype(np.float64)

    # load_CIFAR_batch already reshapes each batch to channels-first
    # (N, 3, 32, 32), so no transpose is needed here
    mean_image = np.mean(X_train, axis=0)
    std = np.std(X_train)

    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    X_train /= std
    X_val /= std
    X_test /= std
    
    return {
        'X_train': X_train, 'y_train': y_train,
        'X_val': X_val, 'y_val': y_val,
        'X_test': X_test, 'y_test': y_test,
        'mean': mean_image, 'std': std
    }

def load_CIFAR_batch(filename):
    ''' load single batch of cifar '''
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding ='bytes')
        X = datadict[b'data']
        Y = datadict[b'labels']
        X = X.reshape(10000, 3, 32, 32)
        Y = np.array(Y)
        return X, Y

def get_CIFAR10(ROOT):
    ''' load all of cifar '''
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Xte, Ytr, Yte

def load_mnist(data_file="mnist.data", test_size=0.10, random_state=0):
    # use a context manager so the file handle is closed after loading
    with open(data_file, "rb") as f:
        mnist = pickle.load(f)
    
    mnist['data'] = np.reshape(mnist['data'],(mnist['data'].shape[0],1,28,28))
    return train_test_split(mnist['data'], mnist['target'], test_size=test_size,
                            random_state=random_state)

def load(file_name):
    if file_name == "mnist":
        print("MNIST data loaded.")
        return load_mnist(data_file="mnist.data", test_size=0.2, random_state=42)
    
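    elif file_name == "cifar":
        print("CIFAR data loaded.")
        return get_CIFAR10("")

    # fail loudly instead of silently returning None
    raise ValueError("Unknown data set: %r (expected 'mnist' or 'cifar')" % file_name)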

<b>Main Program:</b>

In [None]:
"""
Code: Main program to execute. The same program can also be run from test.py.
-----------------------------------------------------------------------
Features:
* Customizable type and no. of layers
* Customizable Data sets (MNIST and CIFAR-10)
* Customizable batch size and dropout
* Visualization of Error vs Epochs
"""

import cost_functions as cost
import numpy as np
from ConvNet import ConvNet
from Pool import Pool
from Flatten import Flatten
from FCLayer import FCLayer
from model import Model

def preprocess_data(X):
    return (X-np.mean(X,axis=0))/np.std(X)

def main():
    print("Starting Network...")
    print("-------------------------------------------------------")
    print("Reading Data sets...")
    
    # MNIST Data sets
    #train_img, test_img, train_lbl, test_lbl = load(file_name="mnist")
    
    # CIFAR-10 Data sets
    train_img, test_img, train_lbl, test_lbl = load(file_name="cifar")
    
    Y = train_lbl[:].astype(int)
    X = train_img[:]/255.
    Y_test = test_lbl[:].astype(int)
    X_test = test_img[:]/255.

    #preprocess data
    X = preprocess_data(X)
    X_test = preprocess_data(X_test)
    
    #model
    model = Model()

    model.add(ConvNet(filter_size=(5,5),filter_no=6,zero_padding=0,stride=(1,1),activation="relu"))
    model.add(Pool(pool_size=(2,2),stride=(2,2),pool_type="max"))
    model.add(Flatten())
    model.add(FCLayer(activation="relu",n_neurons=32,l_rate=0.001, is_drop_out=True, drop_out=0.7))
    model.add(FCLayer(activation="softmax",n_neurons=10,l_rate=0.001))

    print("-------------------------------------------------------")
    print("CNN Layers:")
    print("-------------------------------------------------------")
    model.print_layers()

    print("-------------------------------------------------------")
    print("Begin Training...")
    
    model.train(X,Y,n_epochs=7, print_loss=True, batch_size=32)
    
    print("End Training.")
    print("-------------------------------------------------------")
    print("Begin Testing...")

    train_accuracy = model.test(X,Y)
    test_accuracy = model.test(X_test,Y_test)

    print("End Testing.")
    print("-------------------------------------------------------")

    print('Training Accuracy: {0:0.2f} %'.format(train_accuracy))
    print('Test Accuracy: {0:0.2f} %'.format(test_accuracy))
    model.show_graph()

if __name__=="__main__":
    main()

<b>
Implementation:
</b>

<span>The project consists of the following modules:</span>
<ul>
<li>
activations.py
</li><li>
cost_functions.py
</li><li>
ConvNet.py
</li><li>
decay.py
</li><li>
FCLayer.py
</li><li>
Flatten.py
</li><li>
Layers.py
</li><li>
model.py
</li><li>
Pool.py
</li><li>
read_file.py
</li><li>
test.py
</li><li>
weight_initialization.py
</li>
</ul>


<span>The project can be run by executing the test.py file. All the code is zipped into the "AshokShrestha.zip" file.</span>

<div>
<b> Task 1:</b> &nbsp;
<span>Divide dataset into training and testing</span>
</div>

<span>A separate module, <b>"read_file.py"</b>, has been created for this task. The main program imports this module to read both the MNIST and the CIFAR-10 data.</span>

<b>MNIST Data:</b>

In [None]:
# Reading MNIST Data sets
train_img, test_img, train_lbl, test_lbl = load(file_name="mnist")

<b>CIFAR-10 Data</b>

In [None]:
# Reading CIFAR-10 Data sets
train_img, test_img, train_lbl, test_lbl = load(file_name="cifar")

<b>Task 2:</b> &nbsp;
<span>Flexibility of the program</span>

<span>Creating the Convolutional Neural Network model:</span>

In [None]:
model = Model()

<span>Adding ConvNet layer to model:</span>

In [None]:
model.add(ConvNet(filter_size=(5,5),filter_no=6,zero_padding=0,stride=(1,1),activation="relu",l_rate=0.1))
model.add(ConvNet(filter_size=(5,5),filter_no=16,zero_padding=0,stride=(1,1),activation="relu"))
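
<span>As a rough illustration of what a layer such as <b>ConvNet</b> has to compute in its forward pass, the sketch below implements a naive convolution in NumPy. It is not the code from ConvNet.py: the names <code>conv_output_size</code> and <code>conv_forward_naive</code> are hypothetical, and a single stride value and symmetric zero padding are assumed for both spatial axes.</span>

```python
import numpy as np

def conv_output_size(in_size, filter_size, zero_padding=0, stride=1):
    """Output length along one spatial axis: (in - filter + 2*pad) // stride + 1."""
    return (in_size - filter_size + 2 * zero_padding) // stride + 1

def conv_forward_naive(x, w, b, zero_padding=0, stride=1):
    """Naive convolution (cross-correlation) forward pass.

    x: input of shape (N, C, H, W)
    w: filters of shape (F, C, HH, WW)
    b: biases of shape (F,)
    Returns an array of shape (N, F, H_out, W_out).
    """
    n, _, h, wd = x.shape
    f, _, hh, ww = w.shape
    h_out = conv_output_size(h, hh, zero_padding, stride)
    w_out = conv_output_size(wd, ww, zero_padding, stride)

    # pad only the two spatial axes with zeros
    p = zero_padding
    x_pad = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode="constant")

    out = np.zeros((n, f, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            hs, ws = i * stride, j * stride
            patch = x_pad[:, :, hs:hs + hh, ws:ws + ww]  # (N, C, HH, WW)
            # contract channel and window axes against every filter -> (N, F)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

# CIFAR-10 sized input with six 5x5 filters, no padding, stride 1
feature_map = conv_forward_naive(np.ones((2, 3, 32, 32)), np.ones((6, 3, 5, 5)), np.zeros(6))
print(feature_map.shape)  # (2, 6, 28, 28)
```

<span>With a 32x32 input, a 5x5 filter, stride 1 and no padding, the formula gives a 28x28 feature map; a following 2x2 pooling with stride 2 reduces it to 14x14.</span>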

<span>Adding Pooling layer to model:</span>

In [None]:
model.add(Pool(pool_size=(2,2),zero_padding=0,stride=(2,2),pool_type="max"))
model.add(Pool(pool_size=(2,2),zero_padding=0,stride=(2,2),pool_type="max"))
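
<span>The following is a minimal sketch of max pooling and its backpropagation, assuming square windows and no padding. The function names are hypothetical and this is not the code from Pool.py. In the backward pass the upstream gradient is routed only to the position that held the maximum of each window; if several positions tie, this sketch sends the gradient to all of them.</span>

```python
import numpy as np

def max_pool_forward(x, pool=2, stride=2):
    """Max pooling over the last two axes of x with shape (N, C, H, W)."""
    n, c, h, w = x.shape
    h_out = (h - pool) // stride + 1
    w_out = (w - pool) // stride + 1
    out = np.zeros((n, c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            hs, ws = i * stride, j * stride
            window = x[:, :, hs:hs + pool, ws:ws + pool]
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out

def max_pool_backward(dout, x, pool=2, stride=2):
    """Route each upstream gradient in dout back to the max position(s) of its window."""
    dx = np.zeros(x.shape, dtype=float)
    _, _, h_out, w_out = dout.shape
    for i in range(h_out):
        for j in range(w_out):
            hs, ws = i * stride, j * stride
            window = x[:, :, hs:hs + pool, ws:ws + pool]
            # boolean mask marking where the window maximum sits
            mask = window == window.max(axis=(2, 3), keepdims=True)
            dx[:, :, hs:hs + pool, ws:ws + pool] += mask * dout[:, :, i, j][:, :, None, None]
    return dx

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pooled = max_pool_forward(x)                        # maxima 5, 7, 13, 15
grad_x = max_pool_backward(np.ones_like(pooled), x)
```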

<span>Adding Flatten layer to model:</span>

In [None]:
model.add(Flatten())
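
<span>A flatten layer only changes the shape of the data, so its backward pass is the inverse reshape. The class below is a hypothetical sketch of that idea, not the code from Flatten.py.</span>

```python
import numpy as np

class FlattenSketch:
    """Reshape (N, C, H, W) feature maps to (N, C*H*W) and back."""

    def forward(self, x):
        # remember the incoming shape so backward can restore it
        self.input_shape = x.shape
        return x.reshape(x.shape[0], -1)

    def backward(self, dout):
        # gradients pass through unchanged, only the shape is restored
        return dout.reshape(self.input_shape)

layer = FlattenSketch()
flat = layer.forward(np.zeros((4, 6, 14, 14)))
restored = layer.backward(np.ones_like(flat))
```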

<span>Adding fully connected layer to model:</span>

In [None]:
model.add(FCLayer(activation="relu",n_neurons=120,l_rate=0.01, is_drop_out=True, drop_out=0.5))
model.add(FCLayer(activation="relu",n_neurons=84,l_rate=0.01, is_drop_out=True, drop_out=0.5))
model.add(FCLayer(activation="softmax",n_neurons=10,l_rate=0.01))
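
<span>The <code>is_drop_out</code> and <code>drop_out</code> arguments enable dropout on a fully connected layer. How FCLayer.py interprets <code>drop_out</code> is not shown here, so the sketch below is only one common variant ("inverted" dropout) under the assumption that the parameter is the probability of keeping a unit. The function names and the <code>keep_prob</code> argument are hypothetical.</span>

```python
import numpy as np

def dropout_forward(a, keep_prob=0.5, training=True, rng=None):
    """Inverted dropout: zero units at random and rescale the survivors.

    Returns the masked activations and the mask needed for the backward pass.
    At test time (training=False) the activations pass through unchanged.
    """
    if not training or keep_prob >= 1.0:
        return a, None
    rng = np.random.default_rng() if rng is None else rng
    # dividing by keep_prob keeps the expected activation unchanged
    mask = (rng.random(a.shape) < keep_prob) / keep_prob
    return a * mask, mask

def dropout_backward(dout, mask):
    """Apply the same mask to the upstream gradient."""
    return dout if mask is None else dout * mask

activations = np.ones((1000, 50))
dropped, mask = dropout_forward(activations, keep_prob=0.5, rng=np.random.default_rng(0))
untouched, _ = dropout_forward(activations, keep_prob=0.5, training=False)
```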

<span>Training the model:</span>

In [None]:
model.train(X,Y,n_epochs=10, print_loss=True, batch_size=512)

<span>Testing the model:</span>

In [None]:
train_accuracy = model.test(X,Y)
test_accuracy = model.test(X_test,Y_test)    

<span>
Users can choose activation functions from the following list.
</span>
<ol>
    <li>
    Sigmoid
    </li><li>
    Softmax
    </li><li>
    Tanh
    </li><li>
    Relu
    </li><li>
    Leaky Relu
    </li>
</ol>

<span>
The following code shows the activation functions and their corresponding first derivatives (prime) implemented in the
<b>"activations.py"</b> module. For this project, ReLU is used as the default.
</span>

In [None]:
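'''
Parameters:
===========
z: input (pre-activation); sigmoid_prime and tanh_prime instead
   expect the activation output, see the comments below
'''
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # correct only if z is the sigmoid output a = sigmoid(x): a * (1 - a)
    return z * (1 - z)

def softmax(z):
    # subtract the row-wise max for numerical stability, without
    # modifying the caller's array in place
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def tanh(z):
    return np.tanh(z)

def tanh_prime(z):
    # correct only if z is the tanh output a = tanh(x): 1 - a**2
    return 1 - z * z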

def relu(z):
    return np.maximum(z, 0)

def relu_prime(z):
    dz = np.ones_like(z)
    dz[z < 0] = 0
    return dz

def leaky_relu(z, alpha=0.01):
    return np.maximum(z, z * alpha)

def leaky_relu_prime(z, alpha=0.01):
    dz = np.ones_like(z)
    dz[z < 0] = alpha
    return dz
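
<span>A derivative written as <code>a * (1 - a)</code> or <code>1 - a * a</code> is only valid when it receives the activation output rather than the raw input. The sketch below checks this with a central finite difference; the helper names are hypothetical and not part of activations.py.</span>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_output(a):
    """Derivative of sigmoid expressed through its output a = sigmoid(z)."""
    return a * (1 - a)

def numerical_derivative(f, z, h=1e-5):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

z = np.linspace(-3, 3, 7)

# feed the activation output, not z itself, into the analytic derivative
sigmoid_analytic = sigmoid_prime_from_output(sigmoid(z))
sigmoid_numeric = numerical_derivative(sigmoid, z)

tanh_analytic = 1 - np.tanh(z) ** 2
tanh_numeric = numerical_derivative(np.tanh, z)

print(np.max(np.abs(sigmoid_analytic - sigmoid_numeric)))
print(np.max(np.abs(tanh_analytic - tanh_numeric)))
```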

Users can also choose among the different cost functions for the network defined in the <b>"cost_functions.py"</b> module.
The list includes:
<ol>
<li>
Cross Entropy Cost
</li><li>
Linear Cost
</li><li>
Mean Squared Cost
</li>
</ol>

In [None]:
'''
Parameters:
===========
a: actual output
p: predicted output
'''
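def cross_entropy_cost(a, p):
    m = len(a)
    # clip predictions away from 0 and 1 so that np.log never receives 0
    p = np.clip(p, 1e-12, 1 - 1e-12)
    cost = (-1 / m) * np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
    return cost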

def linear_cost(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    return np.mean(delta)

def mean_square(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    error = np.sum(np.square(delta))
    return error / len(a)
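
<span>As a small worked example of a cross-entropy cost, the sketch below evaluates the categorical form on one-hot labels and softmax-style probabilities. This is a related variant shown only for illustration, not the function from cost_functions.py; <code>one_hot</code> and <code>categorical_cross_entropy</code> are hypothetical names.</span>

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = np.zeros((len(labels), n_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    """Mean of -log(probability assigned to the true class)."""
    p = np.clip(p, eps, 1.0)  # avoid log(0)
    return -np.sum(y_onehot * np.log(p)) / y_onehot.shape[0]

labels = np.array([0, 2])
probs = np.array([[0.5, 0.25, 0.25],
                  [0.1, 0.1, 0.8]])

# only the probabilities of the true classes (0.5 and 0.8) contribute
cost_value = categorical_cross_entropy(one_hot(labels, 3), probs)
```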

<span>There is also an option to use a different weight initialization method for each hidden layer. This is implemented in the <b>"weight_initialization.py"</b> module. The available weight initialization methods are:</span>
<ol>
<li>
Xavier
</li><li>
He
</li><li>
Another variation of He
</li>
</ol>


In [None]:
'''
Parameters:
===========
p: no. of neurons in previous layer (l-1)
c: no. of neurons in current layer (l)
'''    
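def he(p, c):
    # zero-mean Gaussian scaled by sqrt(2 / fan_in); np.random.rand would
    # give only non-negative weights
    return np.random.randn(p, c) * np.sqrt(2 / p)

def xavier(p, c):
    # zero-mean Gaussian scaled by sqrt(1 / fan_in)
    return np.random.randn(p, c) * np.sqrt(1 / p)

def _he(p, c):
    # variant that scales by both fan_in and fan_out
    return np.random.randn(p, c) * np.sqrt(2 / (p + c))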
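
<span>These scaling rules are usually stated for zero-mean Gaussian weights. The sketch below draws such weights with a seeded generator and compares the empirical standard deviation with the target value. The function names and the explicit <code>rng</code> argument are assumptions made for this example and differ from the module's signatures.</span>

```python
import numpy as np

def he_normal(p, c, rng):
    """Zero-mean Gaussian weights with standard deviation sqrt(2 / p)."""
    return rng.standard_normal((p, c)) * np.sqrt(2.0 / p)

def xavier_normal(p, c, rng):
    """Zero-mean Gaussian weights with standard deviation sqrt(1 / p)."""
    return rng.standard_normal((p, c)) * np.sqrt(1.0 / p)

rng = np.random.default_rng(0)
w_he = he_normal(400, 300, rng)
w_xavier = xavier_normal(400, 300, rng)

# empirical spread next to the target value for each scheme
print(w_he.std(), np.sqrt(2.0 / 400))
print(w_xavier.std(), np.sqrt(1.0 / 400))
```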

<span>
A separate module, <b>"decay.py"</b>, has been implemented to adapt the learning rate (alpha) as the number of epochs grows.
Users can choose a decay schedule from the following list:
<ol>
<li>
Step Decay
</li><li>
Exponential Decay
</li>
</ol>
</span>

In [None]:
def step_decay(epoch, initial_lrate=0.1, drop=0.6, epochs_drop=1000):
    return initial_lrate * np.power(drop, np.floor((1 + epoch) / epochs_drop))

def exp_decay(epoch, initial_lrate=0.1, k=0.1):
    return initial_lrate * np.exp(-k * epoch)
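
<span>To make the two schedules concrete, the sketch below repeats the two functions so that it runs on its own and evaluates them at a few epochs. The argument values <code>drop=0.5</code> and <code>epochs_drop=10</code> are chosen only for this illustration.</span>

```python
import numpy as np

def step_decay(epoch, initial_lrate=0.1, drop=0.6, epochs_drop=1000):
    return initial_lrate * np.power(drop, np.floor((1 + epoch) / epochs_drop))

def exp_decay(epoch, initial_lrate=0.1, k=0.1):
    return initial_lrate * np.exp(-k * epoch)

# step decay: the rate is multiplied by `drop` once every `epochs_drop` epochs
step_rates = [step_decay(e, initial_lrate=0.1, drop=0.5, epochs_drop=10) for e in (0, 9, 19)]

# exponential decay: the rate shrinks smoothly by exp(-k) per epoch
exp_rates = [exp_decay(e) for e in (0, 10)]

print(step_rates)
print(exp_rates)
```

<span>Note that with the default <code>epochs_drop=1000</code>, step decay leaves the learning rate unchanged for runs shorter than 999 epochs.</span>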

<span>
To reduce the risk of getting stuck in poor local minima, a momentum coefficient is used when updating the weights.
</span>

<span>Adam is used for gradient descent optimization.</span>

In [None]:
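        #Adam optimization
        beta1 = 0.9
        beta2 = 0.999
        eps = 1e-8
        t = self.n_epochs  # time step; must be >= 1 for the bias correction
        m = self.m
        v = self.v
        
        # biased first and second moment estimates, then bias-corrected
        m = beta1*m + (1-beta1)*dw
        mt = m / (1-beta1**t)
        
        v = beta2*v + (1-beta2)*(dw**2)
        vt = v / (1-beta2**t)
        
        # keep the running moments for the next update
        self.m = m
        self.v = v
        
        # "+=" assumes dw already points in the descent direction;
        # use "-=" if dw is the raw gradient of the cost
        self.weights += self.l_rate * mt / (np.sqrt(vt) + eps)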
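
<span>The update above is an excerpt from a layer method. As a self-contained sketch of the same rule, the function below applies Adam to a simple quadratic, assuming that <code>grad</code> is the raw gradient of the cost (hence the minus sign) and that the step counter <code>t</code> starts at 1. The name <code>adam_step</code> and the toy objective are hypothetical.</span>

```python
import numpy as np

def adam_step(w, grad, m, v, t, l_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weights and the updated moments."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (mean of squares)
    m_hat = m / (1 - beta1 ** t)              # bias correction, requires t >= 1
    v_hat = v / (1 - beta2 ** t)
    w = w - l_rate * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# minimise f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([5.0, -3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, l_rate=0.05)

print(w)  # close to the minimum at the origin
```

<span>Returning <code>m</code> and <code>v</code> and passing them back in on the next call is what carries the running averages from one update to the next.</span>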

<span>
<b>Task 3 : </b>&nbsp;Report
</span>

<span>To visualize the model's training, the cost is plotted against the number of epochs.</span>

In [None]:
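import matplotlib.pyplot as plt

# Epochs and Cost: sequences of epoch indices and the matching training cost
plt.plot(Epochs, Cost, 'r--')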
plt.title("Cost vs Epoch")
plt.xlabel("Epochs")
plt.ylabel("Training Cost")
plt.savefig("cost_vs_epochs.png")
plt.show()

<b>Learning rate:</b>
<span>Testing with different learning rates.</span>


<span>
    <b><u>CIFAR-10 data:</u></b><br/>
No. of epochs: 150<br/>
Accuracy: 58.3%<br/>
    ConvNet: filter(5X5), stride(1X1), padding:0, "leaky_relu"<br/>
    Maxpooling: filter(5X5), stride(1X1)<br/>
    Flatten<br/>
    FCLayer: neurons:32, l_rate:0.0001, "leaky_relu"<br/>
    FCLayer: neurons:10, l_rate:0.0001, "softmax"<br/>
</span>

<img src="img/error_vs_epoch.png" style="height:250px;">