<div style="color: #2590c2; text-align: center;">
<span style="font-size:18pt;"><b>ST: DEEP LEARNING</b></span><br/>
<span>(CS 696-04) (SM18)</span><br/><br/>
<span><b>Homework 1</b></span><br/><br/>
<span>Submitted By</span><br/>
<span>Ashok Kumar Shrestha</span>
</div>

<h4><u>Tasks:</u></h4><br/>
<span> 
In this assignment, you are asked to implement a deep fully-connected MLP with three hidden layers. 
Your program should work on both the MNIST and CIFAR-10 data.  
Both are 10-class classification problems.  When implementing the program, please take the 
following into consideration: </span>
    <ul>
    <li>
Like any machine learning assignment, you need to divide your dataset into training and testing. 
</li><li>
The flexibility of the program: an ideal program should allow the following: the number of layers is not 
hard-coded, different activation functions can be used, etc. In other words, different combinations can be 
easily built on top of the modules of your program. 
</li><li>
A detailed project report containing: the design of your program; the flexibility of your program, if any; the effect of using different learning rates; the plot of loss versus epoch; and the plot of accuracy versus epoch.
</li><li>
Implement your program using Jupyter notebook. 
</li><li>
Hard copy of report submitted in class on June 27th.  
</li><li>
The program is zipped into a single YourFirstnameYourLastName.zip file and submitted online before class starts that day.
</li>
</ul>


<b>
Code:
</b><br/>
<span>
Read data sets (MNIST and CIFAR-10) from the file.
</span>

In [1]:
"""
Read Data sets: MNIST and CIFAR-10
-----------------------------------------------
Parameters:
===========
file_name: file name to read

Return:
=======
train_img, test_img, train_lbl, test_lbl values
"""

import pickle
from sklearn.model_selection import train_test_split
import numpy as np
import os


def get_CIFAR10_data(cifar10_dir, num_training=49000, num_validation=1000, num_test=1000):
    '''
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the neural net classifier.
    '''
    # Load the raw CIFAR-10 data (get_CIFAR10 returns train/test images, then train/test labels)
    X_train, X_test, y_train, y_test = get_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    X_train = X_train.astype(np.float64)
    X_val = X_val.astype(np.float64)
    X_test = X_test.astype(np.float64)

    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2)
    X_val = X_val.transpose(0, 3, 1, 2)
    X_test = X_test.transpose(0, 3, 1, 2)
    mean_image = np.mean(X_train, axis=0)
    std = np.std(X_train)

    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    X_train /= std
    X_val /= std
    X_test /= std
    
    return {
        'X_train': X_train, 'y_train': y_train,
        'X_val': X_val, 'y_val': y_val,
        'X_test': X_test, 'y_test': y_test,
        'mean': mean_image, 'std': std
    }


def load_CIFAR_batch(filename):
    ''' load single batch of cifar '''
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding ='bytes')
        X = datadict[b'data']
        Y = datadict[b'labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)
        Y = np.array(Y)
        return X, Y


def get_CIFAR10(ROOT):
    ''' load all of cifar '''
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Xte, Ytr, Yte

def load_mnist(data_file="mnist.data", test_size=0.10, random_state=0):
    # Use a context manager so the file handle is closed after loading
    with open(data_file, "rb") as f:
        mnist = pickle.load(f)
    return train_test_split(mnist['data'], mnist['target'], test_size=test_size,
                            random_state=random_state)

def load(file_name):
    if file_name == "mnist":
        print("MNIST data loaded.")
        return load_mnist(data_file="mnist.data", test_size=0.2, random_state=42)
    
    elif file_name == "cifar":
        print("CIFAR data loaded.")
        train_img, test_img, train_lbl, test_lbl = get_CIFAR10("")
    
        # Flatten N x 32 x 32 x 3 images to N x 3072
        train_img = np.reshape(train_img, (-1, 3072))
        test_img = np.reshape(test_img, (-1, 3072))
        
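        return train_img, test_img, train_lbl, test_lbl

    # Fail loudly instead of silently returning None for an unknown name
    raise ValueError("Unknown data set: {}".format(file_name))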

<b>Main Program:</b>

In [2]:
"""
Code: Main program to execute. Users can also run test.py for the same.
-----------------------------------------------------------------------
Features:
* Customizable no. of Hidden layers (activation functions and no. of neurons/units)
* Customizable Data sets (MNIST and CIFAR-10)
* Visualization of Error vs Epochs

"""

import matplotlib.pyplot as plt
from ann_classifier import Network

def main():
    print("Starting Network...")
    print("-------------------------------------------------------")
    print("Reading Data sets...")
    
    # MNIST Data sets
    train_img, test_img, train_lbl, test_lbl = load(file_name="mnist")
    
    # CIFAR-10 Data sets
    #train_img, test_img, train_lbl, test_lbl = load(file_name="cifar")
    
    y = train_lbl[:].astype(int)
    x = train_img[:]/255.
    testY = test_lbl[:].astype(int)
    testX = test_img[:]/255.
    
    n_in = x.shape[1]
    n_out = 10
    
    print("-------------------------------------------------------")
    print("Training started...")
    
    nn = Network(n_in=n_in, n_out=n_out, l_rate=0.1)
    nn.add_layer(activation_function="tanh", n_neurons=64, is_dropout=True, drop_out=0.7)
    nn.add_layer(activation_function="tanh", n_neurons=32, is_dropout=False, drop_out=0.5)
    nn.add_layer(activation_function="softmax", n_neurons=n_out)
    nn.train(x,y, n_epoch=100, print_loss=True, batch_size=64)
    
    print("Training ended.")
    print("-------------------------------------------------------")
    print("Testing started...")

    train_accuracy = nn.getAccuracy(x,y)
    test_accuracy = nn.getAccuracy(testX,testY)

    print("Testing ended.")
    print("-------------------------------------------------------")
    
    print('Training Accuracy: {0:0.2f} %'.format(train_accuracy))
    print('Test Accuracy: {0:0.2f} %'.format(test_accuracy))
    
    nn.show_graph()
    
if __name__ =="__main__":
    main()


Starting Network...
-------------------------------------------------------
Reading Data sets...
MNIST data loaded.
-------------------------------------------------------
Training started...
Iteration: 0 | Loss: 2.6790060177670245
Iteration: 10 | Loss: 0.4297826865517941
Iteration: 20 | Loss: 0.2643785269717866
Iteration: 30 | Loss: 0.09177066380685146
Iteration: 40 | Loss: 0.14856892016033088
Iteration: 50 | Loss: 0.08807304070928765
Iteration: 60 | Loss: 0.05594001047213199
Iteration: 70 | Loss: 0.10461389101180359
Iteration: 80 | Loss: 0.07189120708043555
Iteration: 90 | Loss: 0.06609034207967879
Training ended.
-------------------------------------------------------
Testing started...
Testing ended.
-------------------------------------------------------
Training Accuracy: 99.18 %
Test Accuracy: 97.16 %



<b>
Implementation:
</b>

<span>
The project consists of the following modules:
<ul>
<li>
activations.py
</li><li>
ann_classifier.py
</li><li>
cost_functions.py
</li><li>
decay.py
</li><li>
layers.py
</li><li>
read_file.py
</li><li>
test.py
</li><li>
weight_initialization.py
</li>
</ul>
</span>

<span>
The project can be run by executing the test.py file. All the code is zipped into the "AshokShrestha.zip" file. The datasets are not included in this zip file in order to reduce its size; please copy the data sets (MNIST and CIFAR-10) into the folder containing the code.
</span>

<div>
<b> Task 1:</b> &nbsp;
<span>Divide dataset into training and testing</span>
</div>

<span>
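A separate module <b>"read_file.py"</b> has been created for this task. The main program imports this module to
read both the MNIST and CIFAR-10 data. MNIST is split 80% : 20% with scikit-learn's train_test_split, while CIFAR-10 uses
its five training batches for training and its test batch for testing.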

</span>

<b>MNIST Data:</b>

In [None]:
# Reading MNIST Data sets
train_img, test_img, train_lbl, test_lbl = load(file_name="mnist")

<b>CIFAR-10 Data</b>

In [None]:
# Reading CIFAR-10 Data sets
train_img, test_img, train_lbl, test_lbl = load(file_name="cifar")
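
<span>
As a rough illustration of what an 80% : 20% split does, the following NumPy-only sketch shuffles and partitions a small
synthetic array. The helper name <b>split_data</b> and the toy data are assumptions made for this example only; the module itself
relies on scikit-learn's train_test_split for MNIST.
</span>

```python
import numpy as np


def split_data(x, y, test_size=0.2, random_state=42):
    """Hypothetical helper: shuffle, then split into train and test parts."""
    rng = np.random.RandomState(random_state)
    idx = rng.permutation(len(x))            # shuffled row indices
    n_test = int(round(len(x) * test_size))  # number of held-out rows
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Same return order as load(): images first, then labels
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Toy data standing in for flattened images and their labels
x = np.arange(50, dtype=np.float64).reshape(10, 5)
y = np.arange(10)
train_img, test_img, train_lbl, test_lbl = split_data(x, y)
print(train_img.shape, test_img.shape)
```

<span>
Every sample ends up in exactly one of the two parts, so the test set never overlaps the training set.
</span>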

<b>Task 2:</b> &nbsp;
<span>Flexibility of the program</span>

<span>
A separate module <b>"ann_classifier.py"</b> implements the MLP with three hidden layers, as the task requires. 
Users can easily add or remove hidden layers as required. Similarly, users can specify any of the activation
functions listed in the <b>"activations.py"</b> module, along with the number of units in each hidden layer.
</span>

In [None]:
nn = Network(n_in=n_in, n_out=n_out)
nn.add_layer(activation_function="tanh", n_neurons=15)
nn.add_layer(activation_function="relu", n_neurons=10)
nn.add_layer(activation_function="softmax", n_neurons=n_out)

<span>
Here,<br/> 
n_in : number of input units in the network,<br/>
n_out: number of output units in the network.<br/>
</span>


<span>
A new hidden layer can be added with the <b>"add_layer()"</b> method. It takes the name of the activation function and
the number of units in the layer; as the main program shows, it also accepts optional dropout settings (is_dropout and
drop_out). Users can choose activation functions from the following list.
</span>
<ol>
<li>
Sigmoid
</li><li>
Bipolar Sigmoid
</li><li>
Softmax
</li><li>
Tanh
</li><li>
Relu
</li><li>
Leaky Relu
</li>
</ol>
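
<span>
The source of "ann_classifier.py" is not reproduced in this report, so the following is only a minimal sketch of how a list of
layers can keep the depth and the activation functions configurable. The class name <b>TinyNetwork</b>, the <b>ACTIVATIONS</b>
lookup table and the random weights are assumptions for illustration; the sketch covers the forward pass only, not training.
</span>

```python
import numpy as np


def softmax(z):
    z = z - np.max(z, axis=1, keepdims=True)   # stabilise each row
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)


# Name-to-function lookup so layers can be configured with strings
ACTIVATIONS = {
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(z, 0),
    "softmax": softmax,
}


class TinyNetwork:
    """Hypothetical stand-in for Network: forward pass only, no training."""

    def __init__(self, n_in, seed=0):
        self.sizes = [n_in]       # units per layer, starting with the input
        self.layers = []          # (weights, bias, activation) per layer
        self.rng = np.random.RandomState(seed)

    def add_layer(self, activation_function, n_neurons):
        n_prev = self.sizes[-1]
        w = self.rng.randn(n_prev, n_neurons) / np.sqrt(n_prev)
        b = np.zeros(n_neurons)
        self.layers.append((w, b, ACTIVATIONS[activation_function]))
        self.sizes.append(n_neurons)

    def forward(self, x):
        a = x
        for w, b, act in self.layers:   # depth is just the list length
            a = act(a @ w + b)
        return a


net = TinyNetwork(n_in=784)
net.add_layer("tanh", 15)
net.add_layer("relu", 10)
net.add_layer("softmax", 10)
probs = net.forward(np.random.RandomState(1).rand(4, 784))
print(probs.shape, probs.sum(axis=1))   # each row of probabilities sums to 1
```

<span>
Because each layer reads its input size from the previous entry in the list, adding or removing a layer needs no other change.
</span>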

<span>
The following code shows the activation functions and their corresponding first derivatives (prime) implemented in the
<b>"activations.py"</b> module.
</span>

In [None]:

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # z = sigmoid(x)
    return z * (1 - z)


def bipolar_sigmoid(z):
    return -1.0 + 2.0 / (1.0 + np.exp(-z))


def bipolar_sigmoid_prime(z):
    # z = bipolar_sigmoid(x)
    return (1.0 - np.square(z)) / 2.0


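def softmax(z):
    # Shift each row by its maximum for numerical stability, without modifying the caller's array
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)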
    

def tanh(z):
    return np.tanh(z)


def tanh_prime(z):
    # z = tanh(x)
    return 1 - z * z


def relu(z):
    return np.maximum(z, 0)


def relu_prime(z):
    # Using <= keeps the result correct whether z is the layer input or the relu output
    dz = np.ones_like(z)
    dz[z <= 0] = 0
    return dz


def leaky_relu(z, alpha=0.01):
    return np.maximum(z, z * alpha)


def leaky_relu_prime(z, alpha=0.01):
    dz = np.ones_like(z)
    dz[z < 0] = alpha
    return dz
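
<span>
As the comments in the cell indicate, the prime functions for sigmoid, bipolar sigmoid and tanh expect the activation
<i>output</i> rather than the raw input. The sketch below checks this convention against a finite-difference reference;
the helper <b>numeric_derivative</b> is an assumption added for this check and is not part of the module.
</span>

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_prime(z):
    # z is the sigmoid output, as in the module above
    return z * (1 - z)


def tanh_prime(z):
    # z is the tanh output
    return 1 - z * z


def numeric_derivative(f, x, h=1e-5):
    """Central finite difference, used only as a reference value."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = np.linspace(-3, 3, 7)
sig_ok = np.allclose(sigmoid_prime(sigmoid(x)), numeric_derivative(sigmoid, x), atol=1e-6)
tanh_ok = np.allclose(tanh_prime(np.tanh(x)), numeric_derivative(np.tanh, x), atol=1e-6)
print(sig_ok, tanh_ok)
```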

Users can also choose among the cost functions defined in the <b>"cost_functions.py"</b> module. 
The list includes:
<ol>
<li>
Softmax Cost
</li><li>
Cross Entropy Cost
</li><li>
Linear Cost
</li><li>
Mean Squared Cost
</li><li>
Mean Squared Linalg Cost
</li><li>
Quadratic Cost
</li>
</ol>

In [None]:

def softmax_cost(a, p):
    correct_log_probs = -np.log(p[range(len(a)), a])
    return np.sum(correct_log_probs)


def cross_entropy_cost(a, p):
    p = np.array(p).reshape(len(a), 1)
    cost = - (a * np.log(p) + (1 - a) * np.log(1 - p))
    return np.sum(cost)


def linear_cost(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    return np.mean(delta)


def mean_square(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    error = np.sum(np.square(delta))
    return error / len(a)


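def mean_square_linalg(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    # The norm is the square root of the summed squares, so square it to get a mean squared error
    error = np.square(np.linalg.norm(delta))
    return error / len(a)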


def quadratic_cost(a, p):
    delta = a - np.array(p).reshape(len(a), 1)
    # np.square is element-wise and does not accept an axis argument
    cost = np.square(delta)
    return np.sum(cost) / 2
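
<span>
To make the softmax cost concrete, here is a small worked example. The probabilities and labels are made-up values; the
function is copied from the cell above. For each sample, only the probability assigned to the true class contributes.
</span>

```python
import numpy as np


def softmax_cost(a, p):
    # a: integer class labels, p: predicted probabilities (one row per sample)
    correct_log_probs = -np.log(p[range(len(a)), a])
    return np.sum(correct_log_probs)


p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
a = np.array([0, 1])
cost = softmax_cost(a, p)
print(cost)   # equals -(log 0.7 + log 0.8)
```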

<span>
There is also an option to use various weight initialization methods for each hidden layer. They are implemented in the
<b>"weight_initialization.py"</b> module. The available weight initialization methods are:
<ol>
<li>
Xavier
</li><li>
HE
</li><li>
Interval (-0.5, 0.5)
</li><li>
Interval One (-1, 1)
</li>
</ol>
</span>

In [None]:

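def xavier(p, c):
    # Zero-mean Gaussian scaled by 1/sqrt(fan-in), assuming p is the number of incoming units
    return np.random.randn(p, c) / np.sqrt(p)


def he(p, c):
    # Zero-mean Gaussian with standard deviation sqrt(2 / fan-in)
    return np.random.randn(p, c) * np.sqrt(2.0 / p)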


def interval(p, c):
    return np.random.rand(p, c) - 0.5


def interval_one(p, c):
    return 2 * np.random.random((p, c)) - 1
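
<span>
The choice of scale matters because tanh units saturate when their inputs are large. The sketch below compares unscaled
weights drawn from (-1, 1) with a Gaussian scaled by 1/sqrt(fan-in). The layer sizes, the random inputs and the 0.99
saturation threshold are illustrative assumptions, not values taken from the experiments in this report.
</span>

```python
import numpy as np

rng = np.random.RandomState(0)
p, c = 784, 64                      # fan-in and layer width (illustrative values)
x = rng.rand(256, p)                # inputs scaled to [0, 1), like pixels / 255

w_interval_one = 2 * rng.random_sample((p, c)) - 1     # uniform in (-1, 1)
w_scaled = rng.randn(p, c) / np.sqrt(p)                # Gaussian scaled by 1/sqrt(fan-in)


def saturated_fraction(w):
    """Share of tanh outputs with magnitude above 0.99."""
    a = np.tanh(x @ w)
    return np.mean(np.abs(a) > 0.99)


frac_unscaled = saturated_fraction(w_interval_one)
frac_scaled = saturated_fraction(w_scaled)
print(frac_unscaled, frac_scaled)
```

<span>
With the unscaled weights most units start out saturated, where the tanh derivative is close to zero and learning is slow.
</span>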


<span>
A separate module <b>"decay.py"</b> has been implemented to adapt the learning rate (alpha) with the number of epochs.
Users can choose a decay schedule from the following list:
<ol>
<li>
Step Decay
</li><li>
Exponential Decay
</li>
</ol>
</span>

In [None]:
def step_decay(epoch, initial_lrate=0.1, drop=0.6, epochs_drop=1000):
    return initial_lrate * np.power(drop, np.floor((1 + epoch) / epochs_drop))


def exp_decay(epoch, initial_lrate=0.1, k=0.1):
    return initial_lrate * np.exp(-k * epoch)
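
<span>
A few sample values show how the two schedules behave with their default arguments. The functions are copied from the cell
above; the chosen epochs are arbitrary.
</span>

```python
import numpy as np


def step_decay(epoch, initial_lrate=0.1, drop=0.6, epochs_drop=1000):
    return initial_lrate * np.power(drop, np.floor((1 + epoch) / epochs_drop))


def exp_decay(epoch, initial_lrate=0.1, k=0.1):
    return initial_lrate * np.exp(-k * epoch)


for epoch in (0, 10, 999, 1999):
    print(epoch, step_decay(epoch), exp_decay(epoch))
```

<span>
Note that with the default epochs_drop=1000, step decay leaves the learning rate unchanged until epoch 999, so a shorter run
never sees a drop unless epochs_drop is lowered.
</span>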


<span>
To reduce the chance of getting stuck in poor local minima, a momentum coefficient is used when updating the weights. 
</span>
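
<span>
The update rule itself is not shown in this report, so the following is a sketch of the classical momentum update on a
one-dimensional quadratic. The function name <b>momentum_step</b> and the values l_rate=0.1 and momentum=0.9 are assumptions
for illustration and may differ from what the network uses.
</span>

```python
import numpy as np


def momentum_step(w, grad, velocity, l_rate=0.1, momentum=0.9):
    """One classical momentum update; returns the new weights and velocity."""
    velocity = momentum * velocity - l_rate * grad
    return w + velocity, velocity


# Minimise f(w) = 0.5 * w^2, whose gradient is simply w
w = np.array([5.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = momentum_step(w, grad=w, velocity=v)
print(w)   # close to the minimum at 0
```

<span>
The velocity term accumulates past gradients, which lets an update keep moving through flat regions and shallow dips.
</span>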

<span>
<b>Task 3 : </b>&nbsp;Report
</span>

<span>
To visualize the model, the training cost is plotted against the number of epochs.
</span>

In [None]:
plt.plot(Epochs, Cost, 'r--')
plt.title("Cost vs Epoch")
plt.xlabel("Epochs")
plt.ylabel("Training Cost")
plt.savefig("cost_vs_epochs.png")
plt.show()

<b>
Learning rate:
</b>
<span>
Testing with different learning rates.
</span>


<span>
The effect of different learning rates was tested with the following parameters.<br/><br/>
Training size: 56,000<br/>
Testing size: 14,000<br/>
Split size: 80% : 20%<br/>
No. of hidden layers: 3<br/>
Hidden layer 1: Tanh (17 units)<br/>
Hidden layer 2: Tanh (10 units)<br/>
Hidden layer 3: Softmax (10 units)<br/>
</span>

<b>
MNIST Data:
</b>

<img src="img/mnist_lrate.png"/>
<img src="img/mnist_lrate_accuracy.png"/>

<b>
Accuracy vs Epoch
</b>
<br/>
<span>
Accuracy versus epoch was computed for the model with the following parameters:<br/><br/>
Training size: 56,000<br/>
Testing size: 14,000<br/>
Split size: 80% : 20%<br/>
No. of hidden layers: 3<br/>
Hidden layer 1: Tanh (17 units)<br/>
Hidden layer 2: Tanh (10 units)<br/>
Hidden layer 3: Softmax (10 units)<br/>
</span>
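
<span>
The implementation of getAccuracy is not shown in this report. A common way to compute a classification accuracy in percent,
given here as an assumed sketch with the hypothetical name <b>accuracy_percent</b> and made-up data, is to compare the
highest-probability class of each row with its label:
</span>

```python
import numpy as np


def accuracy_percent(probs, labels):
    """Percentage of rows whose highest-probability class matches the label."""
    predictions = np.argmax(probs, axis=1)
    return 100.0 * np.mean(predictions == labels)


probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])
acc = accuracy_percent(probs, labels)
print('Accuracy: {0:0.2f} %'.format(acc))
```

<span>
Evaluating this once per epoch on both the training and test sets gives the curves for an accuracy versus epoch plot.
</span>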


<b>
MNIST Data:
</b>

<img src="img/mnist_epoch.png"/>
<img src="img/mnist_accuracy_epoch.png"/>

<b>Accuracy:</b><br/><br/>
<span>
<u>MNIST Data:</u><br/>
Training size: 56,000<br/>
Testing size: 14,000<br/>
Split size: 80% : 20%<br/>
No. of epochs: 200<br/>
No. of hidden layers: 3<br/>
Hidden layer 1: Tanh (17 units)<br/>
Hidden layer 2: Tanh (10 units)<br/>
Hidden layer 3: Softmax (10 units)<br/>
Training Accuracy: 90.81 %<br/>
Test Accuracy: 90.76 %
</span>

<img src="img/mnist_cost_epochs.png"/>

<span>
<u>CIFAR Data:</u><br/>
Training size: 49,000<br/>
Testing size: 1,000<br/>
No. of epochs: 500<br/>
Accuracy: 35%<br/>
No. of hidden layers: 3<br/>
Hidden layer 1: Tanh (10 units)<br/>
Hidden layer 2: Tanh (17 units)<br/>
Hidden layer 3: Softmax (10 units)<br/>
</span>

<img src="img/cifar_cost_epochs.png"/>

<b>
Testing the Code:
</b><br/>
<span>
The main file to run is <b>"test.py"</b>. Open the file to modify the necessary details, such as adding hidden layers or
trying different activation functions, numbers of units per hidden layer, weight initialization techniques,
cost functions, learning rate decay schedules, and so on.
</span>
