# COMS 4995_002 Deep Learning Assignment 1
Due on Thursday, Feb 10, 11:59pm

This assignment can be done in groups of at most 2 students. Everyone must submit on Courseworks individually.

## Member 1: Apoorv Purwar, ap3644

## Member 2: Shreyas Mundhra, ssm2211

### Part 1) Fully Connected Neural Network

The best accuracy we obtained for Part 1 was:

__Training Accuracy: 60.35777777777778__

__Validation Accuracy: 52.94__

The training/validation split was 9:1.
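
As a minimal sketch of that split, assuming samples are stored as columns (as in the `train` method further down) and using small synthetic arrays in place of the real data: the first 90% of columns go to training and the rest to validation, with no shuffling. The function name `split_train_val` is hypothetical.

```python
import numpy as np

def split_train_val(X, y, train_percent=90):
    """Split column-major samples into train/validation parts (no shuffling)."""
    num_samples = len(y)
    num_train = int(train_percent * num_samples / 100)
    return X[:, :num_train], y[:num_train], X[:, num_train:], y[num_train:]

# Synthetic stand-in data: 20 samples with 4 features each.
X_demo = np.arange(80, dtype=float).reshape(4, 20)
y_demo = np.arange(20) % 10
X_tr, y_tr, X_va, y_va = split_train_val(X_demo, y_demo)
print(X_tr.shape, X_va.shape)  # (4, 18) (4, 2)
```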

> This accuracy was obtained using *2 hidden layers of 512 nodes each, learning rate of 0.01 and batch size of 64 in 20000 iterations*.
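
To put 20000 iterations in context, the arithmetic below estimates how many passes over the training split that corresponds to. It assumes 50000 training images (the count printed by the data-loading cell) and that only full mini-batches are used, which is how `get_batch` steps through the data.

```python
num_samples = 50000                      # assumed: size of the training folder
num_train = int(90 * num_samples / 100)  # 9:1 split -> 45000 training samples
batch_size = 64
iters = 20000

full_batches = num_train // batch_size   # full mini-batches per pass
epochs = iters / full_batches            # approximate passes over the training split
print(full_batches, round(epochs, 1))    # 703 28.4
```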

No regularization, dropout, optimization, or annealing was used to achieve these accuracies.

The detailed results of each iteration are available below in the Part-1 section of the code.


*Results of Part 2 are at the bottom of the notebook.*


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.misc
import glob
import sys
# you shouldn't need to make any more imports

In [17]:
class NeuralNetwork(object):
    """
    Abstraction of neural network.
    Stores parameters, activations, cached values. 
    Provides necessary functions for training and prediction. 
    """
    def __init__(self, layer_dimensions, drop_prob=0.0, reg_lambda=0.0):
        """
        Initializes the weights and biases for each layer
        :param layer_dimensions: (list) number of nodes in each layer
        :param drop_prob: drop probability for dropout layers. Only required in part 2 of the assignment
        :param reg_lambda: regularization parameter. Only required in part 2 of the assignment
        """
        np.random.seed(1)
        
        self.num_layers = len(layer_dimensions) #Length of tuple is the number of layers
        self.parameters = {} 
        for i in range(1, self.num_layers):
            curr_layer_dim = layer_dimensions[i]
            prev_layer_dim = layer_dimensions[i-1]
            # Glorot-style scale: sqrt(2 / (fan_in + fan_out))
            epsilon = np.sqrt(2.0 / (curr_layer_dim + prev_layer_dim))
            # Random initialisation of weights
            weight = np.random.randn(curr_layer_dim, prev_layer_dim) * epsilon
            # Constant initialisation of bias
            bias = np.zeros((curr_layer_dim, 1)) + 0.01 # small positive value so that ReLUs fire at the start
            self.parameters[i] = [weight, bias]
        self.drop_prob = drop_prob
        self.reg_lambda = reg_lambda  
        
    def affineForward(self, A, W, b):
        """
        Forward pass for the affine layer.
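        :param A: input matrix, shape (L, S), where L is the number of hidden units in the previous layer and S is
        the number of samples
        :param W: weight matrix of this layer, shape (number of units in this layer, L)
        :param b: bias vector of this layer, shape (number of units in this layer, 1)
        :returns: the affine product WA + b, along with the cache required for the backward pass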
        """
        y = np.matmul(W,A) + b
        cache = [A,y,W]
        return y, cache
        

    def activationForward(self, A, activation="relu"):
        """
        Common interface to access all activation functions.
        :param A: input to the activation function
        :param activation: activation function to apply to A. Just "relu" for this assignment.
        :returns: activation(A)
        """
        activation = self.relu(A)  # since no other activation fn exists
        return activation


    def relu(self, X):
        A = np.maximum(0, X)
        return A
            
    def dropout(self, A, prob):
        """
        :param A: activations to apply dropout to
        :param prob: drop probability
        :returns: tuple (A, M)
            WHERE
            A is matrix after applying dropout
            M is dropout mask, used in the backward pass
        """
        
        M = (np.random.rand(*A.shape) > prob)*1.0/(1-prob)
        A = A*M
        return A, M


    def forwardPropagation(self, X):
        """
        Runs an input X through the neural network to compute activations
        for all layers. Returns the output computed at the last layer along
        with the cache required for backpropagation.
        :returns: (tuple) AL, cache
            WHERE 
            AL is activation of last layer
            cache is cached values for each layer that
                     are needed in further steps
        """
        cache = {}
        prev_activation = X 
        for i in range(1, self.num_layers-1):
            # affine step followed by ReLU for each hidden layer
            curr_activations, cache[i] = \
            self.affineForward(prev_activation, self.parameters[i][0], self.parameters[i][1])
            curr_activations = self.activationForward(curr_activations)
            
            # dropout
            curr_activations, M = self.dropout(curr_activations, self.drop_prob)
            cache[i] = [cache[i][0],cache[i][1],cache[i][2],M]    
            
            prev_activation = curr_activations
            
        
        # output layer: affine only, softmax is applied in costFunction
        curr_activations, cache[self.num_layers-1] = \
        self.affineForward(prev_activation, self.parameters[self.num_layers-1][0], self.parameters[self.num_layers-1][1])
        
        AL = curr_activations

        return AL, cache

    def softmax(self, X):
        return np.exp(X) / np.sum(np.exp(X),axis=0) #changes input into probabilities

    def regularize(self, reg='l1'):
        reg_term = 0
        if reg == 'l1':
            for i in range(1,self.num_layers):
                reg_term = reg_term + self.reg_lambda*np.sum(np.absolute(self.parameters[i][0]))
        elif reg == 'l2':
            for i in range(1, self.num_layers):
                reg_term = reg_term + 0.5*self.reg_lambda*np.sum(np.square(self.parameters[i][0]))
        return reg_term
            
    def costFunction(self, AL, y):
        """
        :param AL: Activation of last layer, shape (num_classes, S)
        :param y: labels, shape (S)
        :returns cost, dAL: a scalar cost and the gradient of the cost with respect to AL
        """
        
        numberSamples = AL.shape[1]
        AL = AL - np.max(AL, axis=0, keepdims=True) #shifted values 
        AL_prob = self.softmax(AL)
        correct_label_prob = AL_prob[y, np.arange(numberSamples)]
        logProb = np.log(correct_label_prob)
        cost = -np.sum(logProb) / numberSamples
        # gradient of cost
        AL_prob[y, np.arange(numberSamples)] -= 1
        dAL = AL_prob/float(numberSamples)
        
        if self.reg_lambda > 0:
            # add regularization
            cost = cost + self.regularize(reg='l2')
            
        return cost, dAL
            

    def affineBackward(self, dA_prev, cache):
        """
        Backward pass for the affine layer.
        :param dA_prev: gradient from the next layer.
        :param cache: cache returned in affineForward
        :returns dA: gradient on the input to this layer
                 dW: gradient on the weights
                 db: gradient on the bias
        """
        A = cache[0]
        y = cache[1]
        W = cache[2]
                
        S = y.shape[1]
                
        dy = dA_prev
        dA = np.matmul(W.transpose(),dy)
        
        dW = np.matmul(dy, A.transpose())
        db = np.sum(dy, axis=1)
        db = db.reshape((len(db), 1))
        return dA, dW, db


    def activationBackward(self, dA, cache, activation="relu"):
        """
        Interface to call backward on activation functions.
        In this case, it's just relu. 
        """
        dx = self.relu_derivative(dA,cache[1])
        return dx

        
    def relu_derivative(self, dx, cached_x):
        nodes, S = cached_x.shape
        cached_x_derivative = np.zeros((nodes,S))       
        cached_x_derivative[cached_x > 0] = 1    
        dx = np.multiply(dx,cached_x_derivative)
        return dx


    def dropout_backward(self, dA, cache):
        dA = dA*cache[3]
        return dA

    
    def reg_derivative(self, gradients, reg='l1'):
        if reg == 'l1':
            for i in range(1, self.num_layers):
                dW = gradients[i][0]
                dW = dW + self.reg_lambda*np.sign(self.parameters[i][0])
                gradients[i][0] = dW
        elif reg == 'l2':
            for i in range(1, self.num_layers):
                dW = gradients[i][0]
                dW = dW + self.reg_lambda*self.parameters[i][0]
                gradients[i][0] = dW

        return gradients

    def backPropagation(self, dAL, Y, cache):
        """
        Run backpropagation to compute gradients on all parameters in the model
        :param dAL: gradient on the last layer of the network. Returned by the cost function.
        :param Y: labels
        :param cache: cached values during forwardprop
        :returns gradients: dW and db for each weight/bias
        """
        gradients = {}
        
        S = len(Y)
        
        dA_prev, dW, db = self.affineBackward(dAL, cache[self.num_layers-1])
        gradients[self.num_layers-1] = [dW,db]
        
        for l in range(self.num_layers-2,0,-1):
            # dropout
            if self.drop_prob > 0:
                dA_prev = self.dropout_backward(dA_prev, cache[l])
                
            dA_prev = self.activationBackward(dA_prev, cache[l])
            dA, dW, db = self.affineBackward(dA_prev, cache[l])            
            gradients[l] = [dW,db]
            dA_prev = dA
           
        # Regularization gradients are added in train() via reg_derivative,
        # so they are not applied here.
        
        return gradients  
    
    def momentum(self,gradients,m,iteration,beta=0.9):
        for l in range(1, self.num_layers):
            dW = gradients[l][0]
            db = gradients[l][1]
            if(iteration == 0):
                m[l] = [dW, db]
            else:
                # exponential moving average of the gradients
                m[l][0] = (1-beta)*dW + beta*m[l][0]
                m[l][1] = (1-beta)*db + beta*m[l][1]
            
            # momentum
            dW = m[l][0]
            db = m[l][1]
            gradients[l] = [dW, db]
            
        return gradients
    
    def rmsprop(self,gradients,s,iteration,beta=0.9,epsilon=1e-2):
        for l in range(1, self.num_layers):
            dW = gradients[l][0]
            db = gradients[l][1]
            if(iteration == 0):
                s[l] = [dW*dW, db*db]
            else:
                s[l][0] = (1-beta)*dW*dW + beta*s[l][0]
                s[l][1] = (1-beta)*db*db + beta*s[l][1]
            s1 = s[l][0]
            s2 = s[l][1]
            
            # rmsprop
            dW = dW/np.sqrt(s1 + epsilon)
            db = db/np.sqrt(s2 + epsilon)
            gradients[l] = [dW, db]
            
        return gradients, s

    def adam(self,gradients,m,s,iteration,beta1=0.9,beta2=0.999,epsilon=1e-2):
        for l in range(1, self.num_layers):
            dW = gradients[l][0]
            db = gradients[l][1]
            if(iteration == 0):
                m[l] = [dW, db]
                s[l] = [dW*dW, db*db]
            else:
                # first moment: exponential moving average of the gradients
                m[l][0] = (1-beta1)*dW + beta1*m[l][0]
                m[l][1] = (1-beta1)*db + beta1*m[l][1]
                s[l][0] = (1-beta2)*dW*dW + beta2*s[l][0]
                s[l][1] = (1-beta2)*db*db + beta2*s[l][1]
            s1 = s[l][0]
            s2 = s[l][1]
            m1 = m[l][0]
            m2 = m[l][1]
            
            # adam
            dW = m1/np.sqrt(s1 + epsilon)
            db = m2/np.sqrt(s2 + epsilon)
            gradients[l] = [dW, db]
            
        return gradients, m, s

    def updateParameters(self, gradients, alpha):
        """
        :param gradients: gradients for each weight/bias
        :param alpha: step size for gradient descent 
        """
        for i in range(1, self.num_layers):
            self.parameters[i][0] = self.parameters[i][0] - alpha*gradients[i][0]
            self.parameters[i][1] = self.parameters[i][1] - alpha*gradients[i][1]

            
    def train(self, X, y, iters=1000, alpha=0.0001, batch_size=100, print_every=100):

        """
        :param X: input samples, each column is a sample
        :param y: labels for input samples, y.shape[0] must equal X.shape[1]
        :param iters: number of training iterations
        :param alpha: step size for gradient descent
        :param batch_size: number of samples in a minibatch
        :param print_every: no. of iterations to print debug info after
        """
        
        # rmsprop
        s = {}
        # momentum
        m = {}
        
        S = len(y)
        train_split_percent = 90
        
        num_train = int(train_split_percent*S/100)
        num_val = S - num_train
        
        X_train = X[:, :num_train]
        y_train = y[:num_train]
        X_val = X[:, num_train:]
        y_val = y[num_train:]
        
        X_batch, y_batch = self.get_batch(X_train,y_train,batch_size)
        numBatches = len(y_batch)
        
        j = 0
        for i in range(0, iters):
            
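            # NOTE: `1%20` is evaluated first, so this tests `j+1 == 0`, which is
            # never true; the learning rate is therefore never annealed.
            if (j+1%20) == 0:
                alpha = alpha*0.75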

            # get minibatch
            X_mini = X_batch[i % numBatches]
            y_mini = y_batch[i % numBatches]

            # forward prop
            AL, cache = self.forwardPropagation(X_mini)

            # compute loss
            cost, dAL = self.costFunction(AL, y_mini)

            # compute gradients
            gradients = self.backPropagation(dAL, y_mini, cache)

            # update with ADAM             
            # gradients, m, s = self.adam(gradients,m,s,i,beta1=0.9,beta2=0.99,epsilon=1e-4)

            # update with RMSPROP             
            gradients, s = self.rmsprop(gradients,s,i,beta=0.9,epsilon=1e-3)

            # update with MOMENTUM
            # gradients = self.momentum(gradients,m,i,beta=0.8)
            # m = gradients

            # REGULARIZATION
            if self.reg_lambda > 0:
                gradients = self.reg_derivative(gradients, reg='l2')

            # update weights and biases based on gradient
            self.updateParameters(gradients, alpha)

            if i % print_every == 0:
                y_pred_train = self.predict(X_train)

                y_pred_val = self.predict(X_val)

                # count predictions whose one-hot entry matches the true label
                # (loop index k avoids shadowing the outer iteration counter i)
                train_accuracy = 0
                for k in range(0, len(y_train)):
                    if y_pred_train[y_train[k],k] == 1:
                        train_accuracy = train_accuracy + 1
                train_accuracy = train_accuracy*100.0/num_train

                val_accuracy = 0
                for k in range(0, len(y_val)):
                    if y_pred_val[y_val[k],k] == 1:
                        val_accuracy = val_accuracy + 1
                val_accuracy = val_accuracy*100.0/num_val

                # print cost, train and validation set accuracies
                j = j+1
                print("Iter -",j," : Loss, training accuracy, val accuracy: ", cost, train_accuracy, val_accuracy)
                
                
    def predict(self, X):
        """
        Make predictions for each sample
        """      
        drop_prob = self.drop_prob
        self.drop_prob = 0.0
        AL, cache = self.forwardPropagation(X)
        self.drop_prob = drop_prob
        
        S = AL.shape[1]
        y_pred = np.zeros(S, dtype=int)
        
        for j in range(0, S):
            y_pred[j] = np.argmax(AL[:,j])
        
        return one_hot(y_pred)

    def get_batch(self, X, y, batch_size):
        """
        Return minibatch of samples and labels
        
        :param X, y: samples and corresponding labels
        :param batch_size: minibatch size
        :returns: (tuple) X_batch, y_batch
        """
        n = len(y)
        
        X_batch = []
        y_batch = []
        # +1 keeps the last full batch when n is an exact multiple of batch_size
        for start in range(0,n-batch_size+1,batch_size):
            X_batch.append(X[:,start:start+batch_size])
            y_batch.append(y[start:start+batch_size])
            
        return X_batch, y_batch
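
One way to gain confidence in the gradient returned by `costFunction` is a finite-difference check. The sketch below re-implements the softmax cross-entropy as a standalone function and compares its analytic gradient with a central-difference estimate on random scores. The names `softmax_cross_entropy` and `numerical_gradient` are illustrative and not part of the assignment code.

```python
import numpy as np

def softmax_cross_entropy(scores, labels):
    """Mean cross-entropy and its gradient w.r.t. scores of shape (classes, samples)."""
    n = scores.shape[1]
    shifted = scores - np.max(scores, axis=0, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=0, keepdims=True)
    cost = -np.sum(np.log(probs[labels, np.arange(n)])) / n
    grad = probs.copy()
    grad[labels, np.arange(n)] -= 1
    return cost, grad / n

def numerical_gradient(scores, labels, h=1e-5):
    """Central-difference estimate of the same gradient."""
    grad = np.zeros_like(scores)
    for idx in np.ndindex(*scores.shape):
        plus, minus = scores.copy(), scores.copy()
        plus[idx] += h
        minus[idx] -= h
        grad[idx] = (softmax_cross_entropy(plus, labels)[0]
                     - softmax_cross_entropy(minus, labels)[0]) / (2 * h)
    return grad

rng = np.random.default_rng(0)
scores = rng.standard_normal((10, 5))   # 10 classes, 5 synthetic samples
labels = rng.integers(0, 10, size=5)
_, analytic = softmax_cross_entropy(scores, labels)
max_diff = np.max(np.abs(analytic - numerical_gradient(scores, labels)))
print(max_diff)  # should be close to zero if the analytic gradient is right
```

The same comparison could be run against the network's own `costFunction`, and a large difference would point to an error in the analytic gradient.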

In [3]:
# Helper functions, DO NOT modify this

def get_img_array(path):
    """
    Given path of image, returns it's numpy array
    """
    return scipy.misc.imread(path)

def get_files(folder):
    """
    Given path to folder, returns list of files in it
    """
    filenames = [file for file in glob.glob(folder+'*/*')]
    filenames.sort()
    return filenames

def get_label(filepath, label2id):
    """
    Files are assumed to be labeled as: /path/to/file/999_frog.png
    Returns label for a filepath
    """
    tokens = filepath.split('/')
    label = tokens[-1].split('_')[1][:-4]
    if label in label2id:
        return label2id[label]
    else:
        sys.exit("Invalid label: " + label)

In [4]:
# Functions to load data, DO NOT change these

def get_labels(folder, label2id):
    """
    Returns vector of labels extracted from filenames of all files in folder
    :param folder: path to data folder
    :param label2id: mapping of text labels to numeric ids. (Eg: automobile -> 0)
    """
    files = get_files(folder)
    y = []
    for f in files:
        y.append(get_label(f,label2id))
    return np.array(y)

def one_hot(y, num_classes=10):
    """
    Converts each label index in y to vector with one_hot encoding
    """
    y_one_hot = np.zeros((num_classes, y.shape[0]))
    y_one_hot[y, range(y.shape[0])] = 1
    return y_one_hot

def get_label_mapping(label_file):
    """
    Returns mappings of label to index and index to label
    The input file has list of labels, each on a separate line.
    """
    with open(label_file, 'r') as f:
        id2label = f.readlines()
        id2label = [l.strip() for l in id2label]
    label2id = {}
    count = 0
    for label in id2label:
        label2id[label] = count
        count += 1
    return id2label, label2id

def get_images(folder):
    """
    returns numpy array of all samples in folder
    each column is a sample resized to 30x30 and flattened
    """
    files = get_files(folder)
    images = []
    count = 0
    
    for f in files:
        count += 1
        if count % 10000 == 0:
            print("Loaded {}/{}".format(count,len(files)))
        img_arr = get_img_array(f)
        img_arr = img_arr.flatten() / 255.0
        images.append(img_arr)
    X = np.column_stack(images)

    return X

def get_train_data(data_root_path):
    """
    Return X and y
    """
    train_data_path = data_root_path + 'train'
    id2label, label2id = get_label_mapping(data_root_path+'labels.txt')
    print(label2id)
    X = get_images(train_data_path)
    y = get_labels(train_data_path, label2id)
    return X, y

def save_predictions(filename, y):
    """
    Dumps y into .npy file
    """
    np.save(filename, y)

In [5]:
# Load the data
data_root_path = './cifar10-hw1/'
X_train, y_train = get_train_data(data_root_path) # this may take a few minutes
X_test = get_images(data_root_path + 'test')
print('Data loading done')

{'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4, 'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9}
Loaded 10000/50000
Loaded 20000/50000
Loaded 30000/50000
Loaded 40000/50000
Loaded 50000/50000
Loaded 10000/10000
Data loading done


## Part 1

#### Simple fully-connected deep neural network
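
For a sense of scale, the sketch below counts the trainable parameters of the `[input, 512, 512, 10]` architecture used in the next cell. The input size of 3072 is an assumption (32x32 RGB images flattened); in the notebook it is taken from `X_train.shape[0]`. The helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_dimensions):
    """Number of weights plus biases in a fully connected network."""
    total = 0
    for prev_dim, curr_dim in zip(layer_dimensions[:-1], layer_dimensions[1:]):
        total += curr_dim * prev_dim + curr_dim   # weight matrix + bias vector
    return total

dims = [3072, 512, 512, 10]    # 3072 is an assumed input size
print(count_parameters(dims))  # 1841162
```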

In [37]:
layer_dimensions = [X_train.shape[0], 512, 512, 10]  # including the input and output layers
NN = NeuralNetwork(layer_dimensions)
NN.train(X_train, y_train, iters=20000, alpha=0.01, batch_size=64, print_every=100)

### Part - 1) Iteration Details:

Iter - 1  : Loss, training accuracy, val accuracy:  2.39856000715 10.266666666666667 10.88
Iter - 2  : Loss, training accuracy, val accuracy:  2.11956402805 27.11111111111111 26.66
Iter - 3  : Loss, training accuracy, val accuracy:  1.91374630564 30.706666666666667 30.72
Iter - 4  : Loss, training accuracy, val accuracy:  1.98365215525 33.00222222222222 33.18
Iter - 5  : Loss, training accuracy, val accuracy:  2.00216803491 34.82222222222222 33.94
Iter - 6  : Loss, training accuracy, val accuracy:  1.80199660942 36.16222222222222 35.72
Iter - 7  : Loss, training accuracy, val accuracy:  1.69810505849 36.59777777777778 36.0
Iter - 8  : Loss, training accuracy, val accuracy:  1.6874360985 37.184444444444445 36.92
Iter - 9  : Loss, training accuracy, val accuracy:  1.88276793928 37.24 36.82
Iter - 10  : Loss, training accuracy, val accuracy:  1.80350906078 38.64 37.74
Iter - 11  : Loss, training accuracy, val accuracy:  1.69643538313 37.653333333333336 37.1
Iter - 12  : Loss, training accuracy, val accuracy:  1.94802189156 37.297777777777775 36.2
Iter - 13  : Loss, training accuracy, val accuracy:  1.84079408844 39.937777777777775 39.68
Iter - 14  : Loss, training accuracy, val accuracy:  1.73210460588 38.64888888888889 37.8
Iter - 15  : Loss, training accuracy, val accuracy:  1.72525426764 39.63333333333333 39.3
Iter - 16  : Loss, training accuracy, val accuracy:  1.63998045251 40.95777777777778 40.42
Iter - 17  : Loss, training accuracy, val accuracy:  1.79438403567 40.708888888888886 39.7
Iter - 18  : Loss, training accuracy, val accuracy:  1.5785203133 40.05777777777778 39.04
Iter - 19  : Loss, training accuracy, val accuracy:  1.56952700632 42.45111111111111 42.0
Iter - 20  : Loss, training accuracy, val accuracy:  1.61729213918 42.28888888888889 41.66
Iter - 21  : Loss, training accuracy, val accuracy:  1.71708525265 41.544444444444444 41.04
Iter - 22  : Loss, training accuracy, val accuracy:  1.83684965666 43.34444444444444 42.7
Iter - 23  : Loss, training accuracy, val accuracy:  1.54903720739 42.035555555555554 40.82
Iter - 24  : Loss, training accuracy, val accuracy:  1.57992504393 43.282222222222224 42.36
Iter - 25  : Loss, training accuracy, val accuracy:  1.59005433228 43.2 41.76
Iter - 26  : Loss, training accuracy, val accuracy:  1.64580007933 43.053333333333335 42.98
Iter - 27  : Loss, training accuracy, val accuracy:  1.61654097206 43.78888888888889 42.24
Iter - 28  : Loss, training accuracy, val accuracy:  1.75348839194 43.78888888888889 41.82
Iter - 29  : Loss, training accuracy, val accuracy:  1.60189293713 42.57555555555555 41.62
Iter - 30  : Loss, training accuracy, val accuracy:  1.86743493355 44.62 44.14
Iter - 31  : Loss, training accuracy, val accuracy:  1.60659605379 44.955555555555556 44.5
Iter - 32  : Loss, training accuracy, val accuracy:  1.63288692223 43.26222222222222 42.68
Iter - 33  : Loss, training accuracy, val accuracy:  1.5779095838 45.64222222222222 44.72
Iter - 34  : Loss, training accuracy, val accuracy:  1.71767914712 45.568888888888885 44.54
Iter - 35  : Loss, training accuracy, val accuracy:  1.83030150529 46.013333333333335 45.28
Iter - 36  : Loss, training accuracy, val accuracy:  1.55727760039 45.42888888888889 43.8
Iter - 37  : Loss, training accuracy, val accuracy:  1.68374872876 43.72666666666667 42.36
Iter - 38  : Loss, training accuracy, val accuracy:  1.55268579333 45.797777777777775 44.76
Iter - 39  : Loss, training accuracy, val accuracy:  1.60594533481 44.88666666666666 43.78
Iter - 40  : Loss, training accuracy, val accuracy:  1.49101513079 45.931111111111115 44.64
Iter - 41  : Loss, training accuracy, val accuracy:  1.65541576177 46.43333333333333 44.66
Iter - 42  : Loss, training accuracy, val accuracy:  1.67456713585 44.906666666666666 43.28
Iter - 43  : Loss, training accuracy, val accuracy:  1.44699456598 46.775555555555556 45.46
Iter - 44  : Loss, training accuracy, val accuracy:  1.57881238088 47.708888888888886 46.5
Iter - 45  : Loss, training accuracy, val accuracy:  1.50816427786 44.89111111111111 43.62
Iter - 46  : Loss, training accuracy, val accuracy:  1.56684974859 46.8 45.28
Iter - 47  : Loss, training accuracy, val accuracy:  1.72574603472 46.74888888888889 44.8
Iter - 48  : Loss, training accuracy, val accuracy:  1.54288451974 48.68222222222222 47.08
Iter - 49  : Loss, training accuracy, val accuracy:  1.62576599526 46.37777777777778 44.4
Iter - 50  : Loss, training accuracy, val accuracy:  1.48826586298 48.284444444444446 46.82
Iter - 51  : Loss, training accuracy, val accuracy:  1.33820351124 47.41555555555556 45.3
Iter - 52  : Loss, training accuracy, val accuracy:  1.39663021925 48.784444444444446 47.12
Iter - 53  : Loss, training accuracy, val accuracy:  1.23000053711 48.27333333333333 46.36
Iter - 54  : Loss, training accuracy, val accuracy:  1.47773437072 48.922222222222224 46.96
Iter - 55  : Loss, training accuracy, val accuracy:  1.47658889397 49.01777777777778 47.22
Iter - 56  : Loss, training accuracy, val accuracy:  1.40174419993 49.82 47.18
Iter - 57  : Loss, training accuracy, val accuracy:  1.33383524886 49.76444444444444 47.8
Iter - 58  : Loss, training accuracy, val accuracy:  1.59253670357 48.62444444444444 46.98
Iter - 59  : Loss, training accuracy, val accuracy:  1.45811266125 50.22666666666667 47.82
Iter - 60  : Loss, training accuracy, val accuracy:  1.43557552556 48.85777777777778 46.3
Iter - 61  : Loss, training accuracy, val accuracy:  1.44804833794 49.55555555555556 47.52
Iter - 62  : Loss, training accuracy, val accuracy:  1.39057529763 49.31333333333333 46.78
Iter - 63  : Loss, training accuracy, val accuracy:  1.61138268697 47.16888888888889 45.16
Iter - 64  : Loss, training accuracy, val accuracy:  1.44361621379 50.33777777777778 47.8
Iter - 65  : Loss, training accuracy, val accuracy:  1.38057137941 50.07333333333333 47.92
Iter - 66  : Loss, training accuracy, val accuracy:  1.37017334832 49.68888888888889 46.76
Iter - 67  : Loss, training accuracy, val accuracy:  1.50868875946 49.233333333333334 46.74
Iter - 68  : Loss, training accuracy, val accuracy:  1.40819551679 49.66888888888889 47.38
Iter - 69  : Loss, training accuracy, val accuracy:  1.34740602698 50.42444444444445 47.98
Iter - 70  : Loss, training accuracy, val accuracy:  1.41118034037 48.20666666666666 46.58
Iter - 71  : Loss, training accuracy, val accuracy:  1.38286898097 49.904444444444444 47.42
Iter - 72  : Loss, training accuracy, val accuracy:  1.29557649128 49.80888888888889 47.62
Iter - 73  : Loss, training accuracy, val accuracy:  1.20629640393 49.83777777777778 47.26
Iter - 74  : Loss, training accuracy, val accuracy:  1.41008877349 49.81777777777778 47.2
Iter - 75  : Loss, training accuracy, val accuracy:  1.30414828203 48.22666666666667 46.2
Iter - 76  : Loss, training accuracy, val accuracy:  1.34365371478 49.931111111111115 47.14
Iter - 77  : Loss, training accuracy, val accuracy:  1.33510763094 50.897777777777776 48.76
Iter - 78  : Loss, training accuracy, val accuracy:  1.34245292505 51.28666666666667 49.0
Iter - 79  : Loss, training accuracy, val accuracy:  1.20875104512 51.18 48.42
Iter - 80  : Loss, training accuracy, val accuracy:  1.44352878448 50.873333333333335 48.14
Iter - 81  : Loss, training accuracy, val accuracy:  1.47102345037 52.022222222222226 49.12
Iter - 82  : Loss, training accuracy, val accuracy:  1.46719549226 50.46222222222222 48.16
Iter - 83  : Loss, training accuracy, val accuracy:  1.55573262773 51.80444444444444 48.72
Iter - 84  : Loss, training accuracy, val accuracy:  1.32828992927 50.40888888888889 47.76
Iter - 85  : Loss, training accuracy, val accuracy:  1.37359656232 49.25111111111111 45.46
Iter - 86  : Loss, training accuracy, val accuracy:  1.41122609413 51.28666666666667 48.46
Iter - 87  : Loss, training accuracy, val accuracy:  1.52214367388 52.19777777777778 49.26
Iter - 88  : Loss, training accuracy, val accuracy:  1.50695439766 50.45111111111111 47.46
Iter - 89  : Loss, training accuracy, val accuracy:  1.51163939214 51.80222222222222 48.32
Iter - 90  : Loss, training accuracy, val accuracy:  1.18297027184 51.67111111111111 48.52
Iter - 91  : Loss, training accuracy, val accuracy:  1.66774011824 53.41777777777778 49.96
Iter - 92  : Loss, training accuracy, val accuracy:  1.20660873645 52.12888888888889 48.6
Iter - 93  : Loss, training accuracy, val accuracy:  1.3685080492 52.60444444444445 49.32
Iter - 94  : Loss, training accuracy, val accuracy:  1.24960681299 53.26222222222222 49.38
Iter - 95  : Loss, training accuracy, val accuracy:  1.34110487766 52.964444444444446 49.58
Iter - 96  : Loss, training accuracy, val accuracy:  1.499345652 53.56 49.94
Iter - 97  : Loss, training accuracy, val accuracy:  1.24117189036 53.59111111111111 49.86
Iter - 98  : Loss, training accuracy, val accuracy:  1.11607803613 53.58222222222222 49.54
Iter - 99  : Loss, training accuracy, val accuracy:  1.42956022295 53.64222222222222 49.92
Iter - 100  : Loss, training accuracy, val accuracy:  1.32593363595 53.23777777777778 49.54
Iter - 101  : Loss, training accuracy, val accuracy:  1.38450834819 51.73111111111111 48.54
Iter - 102  : Loss, training accuracy, val accuracy:  1.43178103856 50.92666666666667 47.58
Iter - 103  : Loss, training accuracy, val accuracy:  1.13279423074 54.431111111111115 50.84
Iter - 104  : Loss, training accuracy, val accuracy:  1.27469418556 52.437777777777775 48.6
Iter - 105  : Loss, training accuracy, val accuracy:  1.4437867852 50.94888888888889 46.8
Iter - 106  : Loss, training accuracy, val accuracy:  1.34323949976 53.18666666666667 49.64
Iter - 107  : Loss, training accuracy, val accuracy:  1.43266172513 55.41111111111111 51.18
Iter - 108  : Loss, training accuracy, val accuracy:  1.21225805588 48.995555555555555 45.68
Iter - 109  : Loss, training accuracy, val accuracy:  1.29400355803 53.98222222222222 49.28
Iter - 110  : Loss, training accuracy, val accuracy:  1.21562945817 52.70666666666666 48.8
Iter - 111  : Loss, training accuracy, val accuracy:  1.23858605332 53.14222222222222 49.18
Iter - 112  : Loss, training accuracy, val accuracy:  1.34602219696 53.526666666666664 49.2
Iter - 113  : Loss, training accuracy, val accuracy:  1.13261189067 53.40222222222222 49.4
Iter - 114  : Loss, training accuracy, val accuracy:  1.4984337524 54.07555555555555 50.42
Iter - 115  : Loss, training accuracy, val accuracy:  1.18663346864 53.17777777777778 48.78
Iter - 116  : Loss, training accuracy, val accuracy:  1.29086079758 55.56 51.5
Iter - 117  : Loss, training accuracy, val accuracy:  1.51608262259 54.47555555555556 50.46
Iter - 118  : Loss, training accuracy, val accuracy:  1.16661029511 55.431111111111115 51.14
Iter - 119  : Loss, training accuracy, val accuracy:  1.26780806602 53.60444444444445 49.0
Iter - 120  : Loss, training accuracy, val accuracy:  1.18686567633 54.58444444444444 50.4
Iter - 121  : Loss, training accuracy, val accuracy:  1.0740039754 55.4 51.02
Iter - 122  : Loss, training accuracy, val accuracy:  1.1399696055 54.715555555555554 50.04
Iter - 123  : Loss, training accuracy, val accuracy:  1.41576132635 55.111111111111114 49.76
Iter - 124  : Loss, training accuracy, val accuracy:  1.27279641849 52.76 48.44
Iter - 125  : Loss, training accuracy, val accuracy:  1.40701413552 55.495555555555555 50.46
Iter - 126  : Loss, training accuracy, val accuracy:  1.17336511915 54.84222222222222 49.96
Iter - 127  : Loss, training accuracy, val accuracy:  1.30050510754 54.766666666666666 49.64
Iter - 128  : Loss, training accuracy, val accuracy:  1.53366217913 55.9 51.34
Iter - 129  : Loss, training accuracy, val accuracy:  1.28801648176 56.78888888888889 51.56
Iter - 130  : Loss, training accuracy, val accuracy:  1.07292170603 56.117777777777775 51.24
Iter - 131  : Loss, training accuracy, val accuracy:  1.0723979669 55.04888888888889 50.0
Iter - 132  : Loss, training accuracy, val accuracy:  1.28393162712 56.80222222222222 51.04
Iter - 133  : Loss, training accuracy, val accuracy:  1.27317884423 55.15111111111111 49.64
Iter - 134  : Loss, training accuracy, val accuracy:  1.14538216224 52.44 47.64
Iter - 135  : Loss, training accuracy, val accuracy:  1.30087169582 56.40888888888889 50.84
Iter - 136  : Loss, training accuracy, val accuracy:  1.2360479106 54.282222222222224 49.5
Iter - 137  : Loss, training accuracy, val accuracy:  1.13133926618 56.24666666666667 51.24
Iter - 138  : Loss, training accuracy, val accuracy:  1.08190547282 54.84222222222222 50.08
Iter - 139  : Loss, training accuracy, val accuracy:  1.27372759673 55.715555555555554 50.82
Iter - 140  : Loss, training accuracy, val accuracy:  1.5086079831 54.406666666666666 48.54
Iter - 141  : Loss, training accuracy, val accuracy:  1.24055087257 51.90888888888889 46.44
Iter - 142  : Loss, training accuracy, val accuracy:  1.45539068133 57.602222222222224 51.78
Iter - 143  : Loss, training accuracy, val accuracy:  1.19932170617 57.58222222222222 52.04
Iter - 144  : Loss, training accuracy, val accuracy:  0.888891290493 54.035555555555554 49.32
Iter - 145  : Loss, training accuracy, val accuracy:  1.22713118648 56.52 50.7
Iter - 146  : Loss, training accuracy, val accuracy:  1.15523199477 57.52 51.72
Iter - 147  : Loss, training accuracy, val accuracy:  1.34557957108 56.775555555555556 50.84
Iter - 148  : Loss, training accuracy, val accuracy:  1.00390027495 56.54222222222222 51.28
Iter - 149  : Loss, training accuracy, val accuracy:  1.13881648269 56.964444444444446 51.08
Iter - 150  : Loss, training accuracy, val accuracy:  1.14744395387 57.013333333333335 51.6
Iter - 151  : Loss, training accuracy, val accuracy:  0.960630895391 56.48444444444444 50.98
Iter - 152  : Loss, training accuracy, val accuracy:  1.10573396375 57.986666666666665 51.7
Iter - 153  : Loss, training accuracy, val accuracy:  1.2180094434 57.593333333333334 51.46
Iter - 154  : Loss, training accuracy, val accuracy:  1.23931068502 58.028888888888886 51.6
Iter - 155  : Loss, training accuracy, val accuracy:  1.17072194902 56.26444444444444 50.42
Iter - 156  : Loss, training accuracy, val accuracy:  1.28074549687 57.904444444444444 51.34
Iter - 157  : Loss, training accuracy, val accuracy:  1.26089171125 57.18 51.58
Iter - 158  : Loss, training accuracy, val accuracy:  1.36953248374 58.49111111111111 52.54
Iter - 159  : Loss, training accuracy, val accuracy:  1.21787839093 55.06 49.7
Iter - 160  : Loss, training accuracy, val accuracy:  1.03501755019 59.00666666666667 52.42
Iter - 161  : Loss, training accuracy, val accuracy:  1.21335067052 56.315555555555555 49.8
Iter - 162  : Loss, training accuracy, val accuracy:  1.4997879886 59.035555555555554 52.04
Iter - 163  : Loss, training accuracy, val accuracy:  0.986268678216 56.59111111111111 50.08
Iter - 164  : Loss, training accuracy, val accuracy:  1.22788641257 58.98888888888889 52.08
Iter - 165  : Loss, training accuracy, val accuracy:  0.830148229663 57.64888888888889 51.58
Iter - 166  : Loss, training accuracy, val accuracy:  1.26389578485 56.528888888888886 50.28
Iter - 167  : Loss, training accuracy, val accuracy:  1.10924343915 59.471111111111114 52.7
Iter - 168  : Loss, training accuracy, val accuracy:  1.31495998595 57.611111111111114 50.86
Iter - 169  : Loss, training accuracy, val accuracy:  1.38147047586 56.535555555555554 50.36
Iter - 170  : Loss, training accuracy, val accuracy:  1.11965173558 58.94222222222222 52.74
Iter - 171  : Loss, training accuracy, val accuracy:  1.25893326868 58.62222222222222 51.44
Iter - 172  : Loss, training accuracy, val accuracy:  1.02230144153 59.36222222222222 52.36
Iter - 173  : Loss, training accuracy, val accuracy:  1.11819007347 56.45111111111111 50.0
Iter - 174  : Loss, training accuracy, val accuracy:  1.26665993371 59.92 52.76
Iter - 175  : Loss, training accuracy, val accuracy:  1.23015355966 58.784444444444446 52.26
Iter - 176  : Loss, training accuracy, val accuracy:  1.02044682734 59.02 51.32
Iter - 177  : Loss, training accuracy, val accuracy:  1.28893623413 57.04 50.5
Iter - 178  : Loss, training accuracy, val accuracy:  0.948334685011 58.08444444444444 51.26
Iter - 179  : Loss, training accuracy, val accuracy:  1.29639841753 55.18888888888889 49.3
Iter - 180  : Loss, training accuracy, val accuracy:  1.23850846306 58.215555555555554 51.1
Iter - 181  : Loss, training accuracy, val accuracy:  1.15562501119 54.99111111111111 48.18
Iter - 182  : Loss, training accuracy, val accuracy:  1.0256207713 59.65111111111111 52.9
Iter - 183  : Loss, training accuracy, val accuracy:  1.06968177112 58.58444444444444 51.08
Iter - 184  : Loss, training accuracy, val accuracy:  1.16305908692 58.38666666666666 51.46
Iter - 185  : Loss, training accuracy, val accuracy:  1.06735814742 59.53777777777778 52.24
Iter - 186  : Loss, training accuracy, val accuracy:  1.20695989629 60.35777777777778 52.94
Iter - 187  : Loss, training accuracy, val accuracy:  1.2241071439 57.14888888888889 49.42
Iter - 188  : Loss, training accuracy, val accuracy:  1.34826954264 59.275555555555556 52.76
Iter - 189  : Loss, training accuracy, val accuracy:  1.13682262354 58.471111111111114 50.54
Iter - 190  : Loss, training accuracy, val accuracy:  1.05428679614 55.45111111111111 49.18
Iter - 191  : Loss, training accuracy, val accuracy:  1.16106494028 60.48888888888889 52.64
Iter - 192  : Loss, training accuracy, val accuracy:  1.45714625502 60.053333333333335 52.24
Iter - 193  : Loss, training accuracy, val accuracy:  1.12006554229 60.86222222222222 52.46
Iter - 194  : Loss, training accuracy, val accuracy:  0.960738706078 57.98 50.7
Iter - 195  : Loss, training accuracy, val accuracy:  1.02694455849 60.397777777777776 52.68
Iter - 196  : Loss, training accuracy, val accuracy:  1.23554024906 58.977777777777774 51.36
Iter - 197  : Loss, training accuracy, val accuracy:  1.07282257278 59.846666666666664 51.74
Iter - 198  : Loss, training accuracy, val accuracy:  1.22724046637 58.87777777777778 51.56
Iter - 199  : Loss, training accuracy, val accuracy:  1.22984278929 56.98888888888889 49.04
Iter - 200  : Loss, training accuracy, val accuracy:  1.25146930467 60.64666666666667 52.24

In [40]:
y_predicted = NN.predict(X_test)
save_predictions('ans1-ap3644', y_predicted)

In [41]:
# test if your numpy file has been saved correctly
loaded_y = np.load('ans1-ap3644.npy')
np.set_printoptions(threshold=sys.maxsize)  # newer NumPy releases reject threshold=np.nan
print(loaded_y.shape)
loaded_y[:10]

(10, 10000)


array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.

## Part 2: Improving the performance

In [None]:
# Data augmentation

def reflect_image_horizontal(image_arr):
    """Return a left-right mirrored copy of one flattened 32x32x3 image."""
    image_arr = image_arr.reshape((32, 32, 3))
    return np.fliplr(image_arr)

In [None]:
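def augment_data(X, y, step, label_file):
    """Save a mirrored copy of every `step`-th image in X to ./aug_cifar/."""
    num_samples = X.shape[1]
    # The label mapping and output directory never change, so set them up once.
    id2label, label2id = get_label_mapping(label_file)
    directory = './aug_cifar/'
    next_id = num_samples + 1  # file numbering starts just past the number of columns in X
    for i in range(0, num_samples, step):
        flipped_X = reflect_image_horizontal(X[:, i])
        file_name = str(next_id) + '_' + id2label[y[i]] + '.png'
        # Note: scipy.misc.imsave was removed in SciPy 1.2, so this needs an older SciPy.
        scipy.misc.imsave(directory + file_name, flipped_X)
        next_id += 1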

In [None]:
# augment_data(X_train, y_train, 2872, data_root_path + 'labels.txt')
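
As a possible alternative to writing the mirrored images to disk, the same horizontal reflection can be applied in memory. The sketch below is not part of the submitted solution. It assumes that `X` stores one flattened 32x32x3 image per column (as `X[:,i]` above suggests) and that `y` is a 1-D array of labels; the names `flip_batch_horizontal` and `augment_with_flips` are hypothetical.

```python
import numpy as np

def flip_batch_horizontal(X, height=32, width=32, channels=3):
    """Mirror every column of X (one flattened image per column) left to right."""
    num_features, num_samples = X.shape
    assert num_features == height * width * channels
    images = X.T.reshape(num_samples, height, width, channels)
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return flipped.reshape(num_samples, num_features).T

def augment_with_flips(X, y, step):
    """Append a mirrored copy of every `step`-th sample to X and y."""
    idx = np.arange(0, X.shape[1], step)
    X_aug = np.concatenate([X, flip_batch_horizontal(X[:, idx])], axis=1)
    y_aug = np.concatenate([y, y[idx]])
    return X_aug, y_aug

# Toy data: 4 fake "images" with 3072 features each.
X_demo = np.arange(3072 * 4, dtype=float).reshape(3072, 4)
y_demo = np.array([0, 1, 2, 3])
X_aug, y_aug = augment_with_flips(X_demo, y_demo, step=2)
print(X_aug.shape, y_aug)
```

Keeping the augmented samples in arrays avoids the round trip through PNG files, at the cost of holding the extra columns in memory.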

In [18]:
layer_dimensions = [X_train.shape[0], 512, 512, 10]  # including the input and output layers
NN2 = NeuralNetwork(layer_dimensions, drop_prob=0.1, reg_lambda=0.0)
NN2.train(X_train, y_train, iters=20000, alpha=0.005, batch_size=512, print_every=100)

Iter - 1  : Loss, training accuracy, val accuracy:  2.4010634747 9.988888888888889 10.14
Iter - 2  : Loss, training accuracy, val accuracy:  1.96624381973 32.90888888888889 32.64
Iter - 3  : Loss, training accuracy, val accuracy:  1.90114597727 28.106666666666666 28.04
Iter - 4  : Loss, training accuracy, val accuracy:  1.77981989955 37.937777777777775 38.06
Iter - 5  : Loss, training accuracy, val accuracy:  1.76818725138 39.05111111111111 39.22
Iter - 6  : Loss, training accuracy, val accuracy:  1.78488032224 36.71111111111111 35.9
Iter - 7  : Loss, training accuracy, val accuracy:  1.63747380683 40.93555555555555 40.1
Iter - 8  : Loss, training accuracy, val accuracy:  1.725935019 43.166666666666664 42.38
Iter - 9  : Loss, training accuracy, val accuracy:  1.67803499798 41.54888888888889 41.26
Iter - 10  : Loss, training accuracy, val accuracy:  1.5734044521 44.717777777777776 44.48
Iter - 11  : Loss, training accuracy, val accuracy:  1.54526908441 44.193333333333335 43.12
Iter - 12

KeyboardInterrupt: 

### Part-2 Iteration Details: 

Iter - 1  : Loss, training accuracy, val accuracy:  2.4010634747 9.988888888888889 10.14
Iter - 2  : Loss, training accuracy, val accuracy:  1.96624381973 32.90888888888889 32.64
Iter - 3  : Loss, training accuracy, val accuracy:  1.90114597727 28.106666666666666 28.04
Iter - 4  : Loss, training accuracy, val accuracy:  1.77981989955 37.937777777777775 38.06
Iter - 5  : Loss, training accuracy, val accuracy:  1.76818725138 39.05111111111111 39.22
Iter - 6  : Loss, training accuracy, val accuracy:  1.78488032224 36.71111111111111 35.9
Iter - 7  : Loss, training accuracy, val accuracy:  1.63747380683 40.93555555555555 40.1
Iter - 8  : Loss, training accuracy, val accuracy:  1.725935019 43.166666666666664 42.38
Iter - 9  : Loss, training accuracy, val accuracy:  1.67803499798 41.54888888888889 41.26
Iter - 10  : Loss, training accuracy, val accuracy:  1.5734044521 44.717777777777776 44.48
Iter - 11  : Loss, training accuracy, val accuracy:  1.54526908441 44.193333333333335 43.12
Iter - 12  : Loss, training accuracy, val accuracy:  1.53792158665 46.602222222222224 45.04
Iter - 13  : Loss, training accuracy, val accuracy:  1.54318993834 46.32666666666667 44.78
Iter - 14  : Loss, training accuracy, val accuracy:  1.60905277252 48.11333333333334 46.2
Iter - 15  : Loss, training accuracy, val accuracy:  1.47275180317 47.45333333333333 45.16
Iter - 16  : Loss, training accuracy, val accuracy:  1.46282173816 49.41777777777778 47.42
Iter - 17  : Loss, training accuracy, val accuracy:  1.36844827545 51.553333333333335 49.02
Iter - 18  : Loss, training accuracy, val accuracy:  1.40986334452 51.282222222222224 47.94
Iter - 19  : Loss, training accuracy, val accuracy:  1.38842208054 52.38444444444445 49.78
Iter - 20  : Loss, training accuracy, val accuracy:  1.39699811094 52.208888888888886 49.1
Iter - 21  : Loss, training accuracy, val accuracy:  1.46872464644 53.937777777777775 50.54
Iter - 22  : Loss, training accuracy, val accuracy:  1.42478201322 54.54666666666667 50.68
Iter - 23  : Loss, training accuracy, val accuracy:  1.29480264587 54.05555555555556 50.6
Iter - 24  : Loss, training accuracy, val accuracy:  1.34527642901 53.83555555555556 49.44
Iter - 25  : Loss, training accuracy, val accuracy:  1.48724084518 53.76 49.62
Iter - 26  : Loss, training accuracy, val accuracy:  1.41143269774 52.37555555555556 47.74
Iter - 27  : Loss, training accuracy, val accuracy:  1.22710173433 53.8 48.82
Iter - 28  : Loss, training accuracy, val accuracy:  1.27588896514 54.82888888888889 50.9
Iter - 29  : Loss, training accuracy, val accuracy:  1.33873117665 53.83111111111111 49.72
Iter - 30  : Loss, training accuracy, val accuracy:  1.19799434437 55.437777777777775 50.32
Iter - 31  : Loss, training accuracy, val accuracy:  1.33427998589 55.66 49.84
Iter - 32  : Loss, training accuracy, val accuracy:  1.26651578448 57.446666666666665 51.56
Iter - 33  : Loss, training accuracy, val accuracy:  1.25860844778 58.81777777777778 52.94
Iter - 34  : Loss, training accuracy, val accuracy:  1.267222618 56.82666666666667 51.34
Iter - 35  : Loss, training accuracy, val accuracy:  1.23194027189 58.14 51.18
Iter - 36  : Loss, training accuracy, val accuracy:  1.34926453871 57.224444444444444 50.76
Iter - 37  : Loss, training accuracy, val accuracy:  1.1970854371 60.47555555555556 53.02
Iter - 38  : Loss, training accuracy, val accuracy:  1.19102184103 60.54888888888889 52.38
Iter - 39  : Loss, training accuracy, val accuracy:  1.16339658152 61.94 53.44
Iter - 40  : Loss, training accuracy, val accuracy:  1.22325915618 59.535555555555554 51.44
Iter - 41  : Loss, training accuracy, val accuracy:  1.19928765428 55.42444444444445 48.44
Iter - 42  : Loss, training accuracy, val accuracy:  1.07728690828 61.913333333333334 52.52
Iter - 43  : Loss, training accuracy, val accuracy:  1.05992651966 63.26444444444444 52.84
Iter - 44  : Loss, training accuracy, val accuracy:  1.06404158595 62.80444444444444 52.7
Iter - 45  : Loss, training accuracy, val accuracy:  1.10203161118 61.78888888888889 52.26
Iter - 46  : Loss, training accuracy, val accuracy:  1.10211329966 60.84222222222222 51.4
Iter - 47  : Loss, training accuracy, val accuracy:  1.10983091061 65.13777777777777 54.16
Iter - 48  : Loss, training accuracy, val accuracy:  1.26644198636 57.86222222222222 48.72
Iter - 49  : Loss, training accuracy, val accuracy:  1.03331426369 64.57333333333334 52.78
Iter - 50  : Loss, training accuracy, val accuracy:  0.988047133578 63.833333333333336 52.2
Iter - 51  : Loss, training accuracy, val accuracy:  1.02265984183 63.42888888888889 52.26
Iter - 52  : Loss, training accuracy, val accuracy:  1.01030982184 66.2311111111111 53.3
Iter - 53  : Loss, training accuracy, val accuracy:  1.15682292633 63.66888888888889 51.28
Iter - 54  : Loss, training accuracy, val accuracy:  1.02940573303 63.644444444444446 51.84
Iter - 55  : Loss, training accuracy, val accuracy:  1.01214353438 66.33555555555556 53.92
Iter - 56  : Loss, training accuracy, val accuracy:  0.977536327096 65.54888888888888 51.88
Iter - 57  : Loss, training accuracy, val accuracy:  0.964783815203 66.1288888888889 52.82
Iter - 58  : Loss, training accuracy, val accuracy:  0.952326768579 68.91555555555556 53.86
Iter - 59  : Loss, training accuracy, val accuracy:  1.00386923545 66.16222222222223 53.66
Iter - 60  : Loss, training accuracy, val accuracy:  0.951417398665 68.2088888888889 53.46
Iter - 61  : Loss, training accuracy, val accuracy:  0.926957081638 67.26222222222222 53.04
Iter - 62  : Loss, training accuracy, val accuracy:  1.04291649348 64.04888888888888 49.92
Iter - 63  : Loss, training accuracy, val accuracy:  0.929101743891 67.53111111111112 52.4
Iter - 64  : Loss, training accuracy, val accuracy:  0.897021359802 67.99111111111111 52.8
Iter - 65  : Loss, training accuracy, val accuracy:  0.967501309774 67.55777777777777 51.06
Iter - 66  : Loss, training accuracy, val accuracy:  0.84582289801 70.81333333333333 53.5
Iter - 67  : Loss, training accuracy, val accuracy:  0.913090328197 69.18444444444444 52.08
Iter - 68  : Loss, training accuracy, val accuracy:  0.900139485045 67.86666666666666 52.92
Iter - 69  : Loss, training accuracy, val accuracy:  0.937365321605 70.44 52.82
Iter - 70  : Loss, training accuracy, val accuracy:  0.881041541265 71.51555555555555 53.28
Iter - 71  : Loss, training accuracy, val accuracy:  0.826503857333 70.50888888888889 52.9
Iter - 72  : Loss, training accuracy, val accuracy:  0.839495835757 73.06666666666666 54.38
Iter - 73  : Loss, training accuracy, val accuracy:  0.935851862896 70.13333333333334 52.54
Iter - 74  : Loss, training accuracy, val accuracy:  0.886123681717 69.00888888888889 51.68
Iter - 75  : Loss, training accuracy, val accuracy:  0.902116270099 70.59555555555555 52.96
Iter - 76  : Loss, training accuracy, val accuracy:  0.908959036625 70.69111111111111 52.84
Iter - 77  : Loss, training accuracy, val accuracy:  0.72300068868 73.62 54.0
Iter - 78  : Loss, training accuracy, val accuracy:  0.83143431878 72.97333333333333 53.12
Iter - 79  : Loss, training accuracy, val accuracy:  0.759323561099 73.96222222222222 53.9
Iter - 80  : Loss, training accuracy, val accuracy:  0.807740219432 73.14444444444445 52.46
Iter - 81  : Loss, training accuracy, val accuracy:  0.765861289222 74.49333333333334 53.94
Iter - 82  : Loss, training accuracy, val accuracy:  0.765093371742 75.06 54.6
Iter - 83  : Loss, training accuracy, val accuracy:  0.818376572074 75.44222222222223 54.38
Iter - 84  : Loss, training accuracy, val accuracy:  0.746724049722 76.35555555555555 54.04
Iter - 85  : Loss, training accuracy, val accuracy:  0.702439713343 77.32666666666667 54.7
Iter - 86  : Loss, training accuracy, val accuracy:  0.739779664282 76.38888888888889 53.78
Iter - 87  : Loss, training accuracy, val accuracy:  0.830511377549 75.91777777777777 52.78
Iter - 88  : Loss, training accuracy, val accuracy:  0.834496971629 73.42444444444445 51.82
Iter - 89  : Loss, training accuracy, val accuracy:  0.662150484429 77.29555555555555 54.08
Iter - 90  : Loss, training accuracy, val accuracy:  0.690792135068 76.62 53.74
Iter - 91  : Loss, training accuracy, val accuracy:  0.677473937098 76.96444444444444 53.7
Iter - 92  : Loss, training accuracy, val accuracy:  0.821714457435 73.12222222222222 50.66
Iter - 93  : Loss, training accuracy, val accuracy:  0.731123197847 75.60888888888888 53.12
Iter - 94  : Loss, training accuracy, val accuracy:  0.666537154886 77.52666666666667 53.78
Iter - 95  : Loss, training accuracy, val accuracy:  0.700005656373 78.50444444444445 53.96
Iter - 96  : Loss, training accuracy, val accuracy:  0.638910368587 77.56222222222222 53.5
Iter - 97  : Loss, training accuracy, val accuracy:  0.616314746772 79.59555555555555 54.82
Iter - 98  : Loss, training accuracy, val accuracy:  0.731387744883 77.47333333333333 52.98
Iter - 99  : Loss, training accuracy, val accuracy:  0.634996316969 78.99777777777778 53.24
Iter - 100  : Loss, training accuracy, val accuracy:  0.716069249575 79.57777777777778 53.94
Iter - 101  : Loss, training accuracy, val accuracy:  0.713326923578 79.52444444444444 54.06
Iter - 102  : Loss, training accuracy, val accuracy:  0.656676002129 79.75333333333333 53.62
Iter - 103  : Loss, training accuracy, val accuracy:  0.671945646078 81.09555555555555 53.92
Iter - 104  : Loss, training accuracy, val accuracy:  0.659834433544 79.89777777777778 53.22
Iter - 105  : Loss, training accuracy, val accuracy:  0.655054751495 78.91777777777777 52.34
Iter - 106  : Loss, training accuracy, val accuracy:  0.763903205516 76.94 52.08
Iter - 107  : Loss, training accuracy, val accuracy:  0.627189825703 78.93111111111111 52.58
Iter - 108  : Loss, training accuracy, val accuracy:  0.572933886756 81.02 54.04

In [19]:
y_predicted2 = NN2.predict(X_test)
save_predictions('ans2-ap3644', y_predicted2)

In [20]:
# test if your numpy file has been saved correctly
loaded_y = np.load('ans2-ap3644.npy')
np.set_printoptions(threshold=sys.maxsize)  # newer NumPy releases reject threshold=np.nan
print(loaded_y.shape)
loaded_y[:10]

(10, 10000)


array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  1.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,
         1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.

### Part - 2) Results:

The best accuracy we obtained for Part 2 was:

__Training Accuracy: 79.59555555555555__  

__Validation Accuracy: 54.82__  

> This accuracy was achieved using RMSProp with annealing, on a network of 2 hidden layers of 512 nodes each and a batch size of 512. The initial learning rate was 0.005, with a step decay of 25% at every 20th iteration. For RMSProp, beta was 0.9 and epsilon was 1e-3.
> In addition to the higher accuracy, the network converged in roughly half the iterations required in Part 1 (~10,000 instead of 20,000), giving us a much more efficient system.


Training vs Validation Split was in the ratio of 9:1
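
The RMSProp update with step-decay annealing described above can be sketched as follows. This is a minimal illustration rather than the code inside `NeuralNetwork`. It assumes that a "step decay of 25%" means multiplying the learning rate by 0.75 every 20 iterations and that epsilon is added outside the square root; the function names `step_decay` and `rmsprop_update` are hypothetical.

```python
import numpy as np

def step_decay(initial_lr, iteration, decay_every=20, decay_fraction=0.25):
    """Learning rate after shrinking by `decay_fraction` every `decay_every` iterations."""
    return initial_lr * (1.0 - decay_fraction) ** (iteration // decay_every)

def rmsprop_update(param, grad, cache, lr, beta=0.9, epsilon=1e-3):
    """One RMSProp step: scale the gradient by a running RMS of past gradients."""
    cache = beta * cache + (1.0 - beta) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + epsilon)
    return param, cache

# Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for iteration in range(200):
    grad = 2.0 * w
    lr = step_decay(0.005, iteration)
    w, cache = rmsprop_update(w, grad, cache, lr)
print(w)
```

Dividing by the running RMS keeps the effective step size similar across parameters, and the decaying learning rate shrinks those steps as training proceeds.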

### The other configurations we tried for Part 2, along with their accuracies, are presented in the following format - 

**layer_dimensions, alpha, batch_size, iters, drop_prob, reg_lambda, type, train_accuracy, test_accuracy, annealing (or without annealing)**
<br><br>
The format for different types is as follows:<br>
**MOMENTUM[beta]<br>
RMSPROP[beta, epsilon]<br>
ADAM[beta1, beta2, epsilon]<br>
DATA AUGMENTATION[step]**
<br><br>

* [X_train.shape[0], 512, 256, 128, 10], 0.001, 200, 10000, 0.3, 0.0; ADAM[0.9,0.999,1e-8], **Training - 44.57054423523343, Validation - 42.45939675174014**, annealing
* [X_train.shape[0], 512, 256, 128, 10], 0.001, 200, 10000, 0.3, 0.0; ADAM[0.9,0.999,1e-8], **Training - 44.57054423523343, Validation - 42.45939675174014**, without annealing
* [X_train.shape[0], 512, 512, 10], 0.01, 600, 10000, 0.0, 0.0; ADAM[0.9,0.99,1e-4], **Training - 53.391797781790046, Validation - 42.98143851508121**, annealing
* [X_train.shape[0], 512, 512, 10], 0.01, 600, 10000, 0.0, 0.0; ADAM[0.9,0.99,1e-4], **Training - 77.97265927263348, Validation - 45.53364269141532**, annealing
* [X_train.shape[0], 512, 512, 10], 0.01, 200, 10000, 0.0, 0.0; MOMENTUM[0.8], **Training - 72.5303069383544, Validation - 43.735498839907194**, annealing
* [X_train.shape[0], 512, 512, 10], 0.001, 64, 10000, 0.0, 0.0; RMSPROP[0.9,1e-3], **Training - 85.0657725045138, Validation - 50.17401392111369**, without annealing
* [X_train.shape[0], 600, 600, 10], 0.001, 64, 10000, 0.0, 0.0; RMSPROP[0.9,1e-3], **Training - 86.32318803198349, Validation - 50.29002320185615**, without annealing
* [X_train.shape[0], 600, 600, 10], 0.01, 200, 20000, 0.6, 0.0; DATA AUGMENTATION[2], 60.36800208795511, 51.35029354207436, **Training - 60.36, Validation - 53.23**, without annealing
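
To compare these runs at a glance, the listed accuracies can be put into a small table and ranked. The sketch below is only an illustration: the values are the ones listed above rounded to two decimals, the short labels are our own shorthand, and the data augmentation run is left out because its entry lists two different validation figures.

```python
# (label, training accuracy, validation accuracy), rounded to two decimals
runs = [
    ("ADAM 512-256-128, annealing", 44.57, 42.46),
    ("ADAM 512-256-128, no annealing", 44.57, 42.46),
    ("ADAM 512-512, annealing (run 1)", 53.39, 42.98),
    ("ADAM 512-512, annealing (run 2)", 77.97, 45.53),
    ("MOMENTUM 512-512, annealing", 72.53, 43.74),
    ("RMSPROP 512-512, no annealing", 85.07, 50.17),
    ("RMSPROP 600-600, no annealing", 86.32, 50.29),
]

def generalization_gap(run):
    """Training accuracy minus validation accuracy, in percentage points."""
    _, train_acc, val_acc = run
    return train_acc - val_acc

best_val = max(runs, key=lambda run: run[2])
largest_gap = max(runs, key=generalization_gap)

for label, train_acc, val_acc in runs:
    gap = train_acc - val_acc
    print(f"{label:34s} train={train_acc:6.2f} val={val_acc:6.2f} gap={gap:6.2f}")
print("best validation:", best_val[0])
print("largest gap:", largest_gap[0])
```

A large gap between training and validation accuracy is the usual sign of overfitting, which is why the gap is worth tracking alongside the validation accuracy itself.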

Beyond these configurations, we also tried data normalization, which led to overfitting: training accuracy became very high while validation accuracy stayed at ~50%.
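
For reference, one common form of data normalization is per-feature standardization using statistics computed on the training split only. The sketch below shows that form under the assumption that samples are stored as columns; it may differ from the normalization we actually ran, and the names `fit_normalizer` and `apply_normalizer` are hypothetical.

```python
import numpy as np

def fit_normalizer(X_train, eps=1e-8):
    """Per-feature mean and standard deviation, with samples stored as columns."""
    mean = X_train.mean(axis=1, keepdims=True)
    std = X_train.std(axis=1, keepdims=True) + eps  # eps guards against constant features
    return mean, std

def apply_normalizer(X, mean, std):
    """Standardize X with statistics that were fitted on the training split."""
    return (X - mean) / std

# Toy data standing in for pixel values: 12 features, 50 training and 10 validation samples.
rng = np.random.default_rng(0)
X_tr = rng.uniform(0, 255, size=(12, 50))
X_va = rng.uniform(0, 255, size=(12, 10))

mean, std = fit_normalizer(X_tr)
X_tr_n = apply_normalizer(X_tr, mean, std)
X_va_n = apply_normalizer(X_va, mean, std)
print(X_tr_n.mean(axis=1).round(6), X_va_n.shape)
```

Reusing the training statistics for the validation and test data keeps all splits on the same scale without leaking information from held-out data.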
