# COMS 4995_002 Deep Learning Assignment 1
Due on Monday, Oct 9, 11:59pm

This assignment can be done in groups of at most 3 students. Everyone must submit on Courseworks individually.

Write down the UNIs of your group (if applicable)

Member 1: Isht Dwivedi, id2303

Member 2: Abhinav Sharma, as5414

Member 3: Aayush Maini, am4810

## Parameters of our Net

### Common parameters for all parts
2 hidden layers, first with 250 nodes, second with 200 nodes

batch size is 500

train val split is 9:1. 

### For part 1:

Constant learning rate of 0.0001 for 10000 iterations

#### Results for part 1: validation accuracy 55.05% , training accuracy 66.70%
 
### For part 2:

Step decay of learning rate. Initial value of 0.0001 for 15000 iterations, after which it is divided by 10

Dropout rate of 0.2 used. 

Regularization parameter value taken as 0.1 

#### Results for part 2: validation accuracy 57.79%, training accuracy 71.95%


### Note: optional part in a seperate file named "HW1-optional-part.ipynb"


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.misc
import glob
import sys
import copy
from time import time
t1 = time()
# you shouldn't need to make any more imports

In [2]:
class NeuralNetwork(object):
    """
    Abstraction of neural network.
    Stores parameters, activations, cached values. 
    Provides necessary functions for training and prediction. 
    """
    def __init__(self, layer_dimensions, drop_prob=0.0, reg_lambda=0.0):
        """
        Initializes the weights and biases for each layer
        :param layer_dimensions: (list) number of nodes in each layer
        :param drop_prob: drop probability for dropout layers. Only required in part 2 of the assignment
        :param reg_lambda: regularization parameter. Only required in part 2 of the assignment
        """
        np.random.seed(1)
        self.num_classes = 10
        self.layer_dimensions = layer_dimensions
        self.num_layers = len(layer_dimensions)
        #self.batch_size = 
        self.parameters = {} # layer->[W,b]
        self.best_parameters = {}
        # Xavier init : w ~ N(0,1/sqrt(fan_in)); b = 0.01
        for i in range(1,self.num_layers):
            l_cur = layer_dimensions[i]
            l_prev = layer_dimensions[i-1] 
            bias = 0.01*np.ones((l_cur,1))
            std_dev = 1/np.sqrt(l_prev)
            self.parameters[i] = [np.random.normal(0,std_dev,(l_cur,l_prev)),bias]
        
        for i in self.parameters.keys():
            self.best_parameters[i] = copy.deepcopy(self.parameters[i])
        #self.drop_prob = 
        self.reg_lambda = reg_lambda
        self.drop_prob = 1.0 - drop_prob
        # init parameters
        

    def affineForward(self, A, W, b):
        """
        Forward pass for the affine layer.
        :param A: input matrix, shape (L, S), where L is the number of hidden units in the previous layer and S is
        the number of samples
        :returns: the affine product WA + b, along with the cache required for the backward pass
        """
        y = np.matmul(W,A) + b
        cache = [A,y,W]
        return (y,cache)
        

    def activationForward(self, A, activation="relu"):
        """
        Common interface to access all activation functions.
        :param A: input to the activation function
        :param prob: activation funciton to apply to A. Just "relu" for this assignment.
        :returns: activation(A)
        """ 
        Act = self.relu(A)
        return Act

    def relu(self, X):
        A = np.maximum(0,X)
        return A
        

            
    def dropout(self, A, prob):
        """
        :param A: 
        :param prob: drop prob
        :returns: tuple (A, M) 
            WHERE
            A is matrix after applying dropout
            M is dropout mask, used in the backward pass
        """
        M = (np.random.rand(*A.shape)<prob)/prob
        A = A*M
        return A, M

    def forwardPropagation(self, X):
        """
        Runs an input X through the neural network to compute activations
        for all layers. Returns the output computed at the last layer along
        with the cache required for backpropagation.
        :returns: (tuple) AL, cache
            WHERE 
            AL is activation of last layer
            cache is cached values for each layer that
                     are needed in further steps
        """
        A_prev = X
        S = X.shape[1]
        cache = {} #layer->(A,y,W,mask)
        for i in range(1,self.num_layers):
            num_nodes = self.layer_dimensions[i]
            A_cur = np.zeros((num_nodes,S))
            A_cur,cache[i] = self.affineForward(A_prev, self.parameters[i][0], self.parameters[i][1])
            A_prev = self.activationForward(A_cur)
            if self.drop_prob > 0:
                A_prev,temp = self.dropout(A_prev,self.drop_prob)
                cache[i].append(temp)
        AL = A_cur
        return AL, cache
    
    def costFunction(self, AL, y):
        """
        :param AL: Activation of last layer, shape (num_classes, S)
        :param y: labels, shape (S)
        :param alpha: regularization parameter
        :returns cost, dAL: A scalar denoting cost and the gradient of cost
        """
        # compute loss
        cost = 0
        S = len(AL[0])
        dAL = np.zeros((self.num_classes,S))
        for i in range(S):
            indx = np.argmax(y[:,i])
            exp = np.exp(AL[:,i])
            y_hat = exp/np.sum(exp)
            val = exp[indx]/np.sum(exp)
            cost += -np.log(val)
            dAL[:,i] = y_hat - y[:,i]
        cost /= S

        
        if self.reg_lambda > 0:
            for i in range(1,self.num_layers):
                cost+= 0.5 * self.reg_lambda * np.sum(self.parameters[i][0]**2)
        return cost, dAL

    def affineBackward(self, dA_prev, cache):
        """
        Backward pass for the affine layer.
        :param dA_prev: gradient from the next layer.
        :param cache: cache returned in affineForward
        :returns dA: gradient on the input to this layer
                 dW: gradient on the weights
                 db: gradient on the bias
        """

        dA = np.matmul(cache[2].transpose(),dA_prev)
        dW = np.matmul(dA_prev,cache[0].transpose())#/len(cache[0]) #divide here or 
        db = np.sum(dA_prev, axis=1)[np.newaxis,:] # to do
        return dA, dW, db

    def activationBackward(self, dA, cache, activation="relu"):
        """
        Interface to call backward on activation functions.
        In this case, it's just relu. 
        """
        dx = self.relu_derivative(dA,cache)
        return dx
    
    def relu_derivative(self, dx, cache):
        dx_relu = dx
        dx_relu[cache[1] <= 0] = 0
        return dx_relu

    def dropout_backward(self, dA, cache):
        dA = cache[3]*dA
        return dA

    def backPropagation(self, dAL, Y, cache):
        """
        Run backpropagation to compute gradients on all paramters in the model
        :param dAL: gradient on the last layer of the network. Returned by the cost function.
        :param Y: labels
        :param cache: cached values during forwardprop
        :returns gradients: dW and db for each weight/bias
        """
        gradients = {} #layer->(dw,db)
        dA_prev = dAL
        for i in range(self.num_layers-1,0,-1): # changed to 0, as we dont need to backprop the first layer
            if self.drop_prob > 0:
                if i<self.num_layers-1:
                    dA_prev = self.dropout_backward(dA_prev,cache[i])
            dA, dW, db = self.affineBackward(dA_prev, cache[i])
            if i==1:
                # No relu after input layer
                dx = dA
            else:
                dx = self.activationBackward(dA,cache[i-1])
            dA_prev = dx
            gradients[i] = [dW,db]
           
            
        if self.reg_lambda > 0:
            # add gradients from L2 regularization to each dW
            for i in range(self.num_layers-1,0,-1): 
                gradients[i][0]+=self.reg_lambda * self.parameters[i][0]
            
        return gradients


    def updateParameters(self, gradients, alpha):
        """
        :param gradients: gradients for each weight/bias
        :param alpha: step size for gradient descent 
        """
        

        for i in range(1,self.num_layers):
            self.parameters[i][0] -= alpha*gradients[i][0]
            self.parameters[i][1] -= alpha*gradients[i][1].transpose()
        return
    
    def predict(self, X,mode ='train'):
        """
        Make predictions for each sample
        """
        if mode is 'test':
            print 'predict mode is test'
            for ke in self.best_parameters.keys():
                self.parameters[ke] = copy.deepcopy(self.best_parameters[ke])
        y_pred = np.zeros(X.shape[1],dtype=np.int)
        y_pred_prob = np.zeros((10,X.shape[1]),dtype=np.float)
        temp = self.drop_prob
        self.drop_prob = 0.0
        for i in range(X.shape[1]):
            AL = self.forwardPropagation(X[:,i][np.newaxis,:].transpose())[0]
            y_pred_prob[:,i] = AL[:,0]
            y_pred[i] = np.argmax(AL[:,0])
        self.drop_prob = temp
        return y_pred,y_pred_prob

    def train(self, X, y, iters=1000, alpha=0.0001, batch_size=100, print_every=100):
        """
        :param X: input samples, each column is a sample
        :param y: labels for input samples, y.shape[0] must equal X.shape[1]
        :param iters: number of training iterations
        :param alpha: step size for gradient descent
        :param batch_size: number of samples in a minibatch
        :param print_every: no. of iterations to print debug info after
        """
        best_val_till_now = 0 
        train_idx = []
        split_ratio = 0.9
        orig_full = range(len(y_train))
        np.random.shuffle(orig_full)
        split_idx = int((len(orig_full)*split_ratio))
        orig_train = orig_full[:split_idx]
        orig_val = orig_full[split_idx+1:]
        counter = 0
        epocs = (iters*batch_size)/len(orig_train) + 1
        for i in range(epocs):
            np.random.shuffle(orig_train)
            train_idx.extend(orig_train)
        
        for i in range(0, iters):
            if (i+2)%15000==0:
                alpha = alpha/10.0
            X_batch, y_batch = get_batch(self, X, y, batch_size,train_idx,counter)
            counter += batch_size
            y_batch = one_hot(y_batch)
            AL, cache = self.forwardPropagation(X_batch)
            # forward prop
            cost, dAL = self.costFunction(AL, y_batch)
            # compute loss
            gradients = self.backPropagation( dAL, y, cache)
            # compute gradients
            self.updateParameters(gradients, alpha)
            # update weights and biases based on gradient

            if i % print_every == 0:
                # Compute training accuracy
                X_t = X[:,orig_train]
                y_pred_t = self.predict(X_t,'train')[0]
                y_t = y[orig_train]
                tx_acc = 1 - float(np.count_nonzero(y_pred_t - y_t))/len(orig_train)
                # Compute validation accuracy
                X_v = X[:,orig_val]
                y_pred_v = self.predict(X_v,'train')[0]
                y_v = y[orig_val]
                tv_acc = 1 - float(np.count_nonzero(y_pred_v - y_v))/len(orig_val)
                
                if tv_acc>best_val_till_now:
                    best_val_till_now = tv_acc
                    train_acc_till_now = tx_acc
                    for ke in self.parameters.keys():
                        self.best_parameters[ke] = copy.deepcopy(self.parameters[ke])
                print 'Iter: ',i,'loss: ',cost,'val_accuracy: ',tv_acc,'train_accuracy: ',tx_acc
        print 'best val accuracy: ',best_val_till_now,'train accuacy at this point: ',train_acc_till_now

In [3]:
    def get_batch(self, X, y, batch_size,train_idx,counter):
        """
        Return minibatch of samples and labels

        :param X, y: samples and corresponding labels
        :parma batch_size: minibatch size
        :returns: (tuple) X_batch, y_batch
        """
        idx = train_idx[counter:counter+batch_size]
        X_batch = X[:,idx]
        y_batch = y[idx]
        return X_batch, y_batch

In [4]:
# Helper functions, DO NOT modify this

def get_img_array(path):
    """
    Given path of image, returns it's numpy array
    """
    return scipy.misc.imread(path)

def get_files(folder):
    """
    Given path to folder, returns list of files in it
    """
    filenames = [file for file in glob.glob(folder+'*/*')]
    filenames.sort()
    return filenames

def get_label(filepath, label2id):
    """
    Files are assumed to be labeled as: /path/to/file/999_frog.png
    Returns label for a filepath
    """
    tokens = filepath.split('/')
    label = tokens[-1].split('_')[1][:-4]
    if label in label2id:
        return label2id[label]
    else:
        sys.exit("Invalid label: " + label)

In [5]:
# Functions to load data, DO NOT change these

def get_labels(folder, label2id):
    """
    Returns vector of labels extracted from filenames of all files in folder
    :param folder: path to data folder
    :param label2id: mapping of text labels to numeric ids. (Eg: automobile -> 0)
    """
    files = get_files(folder)
    y = []
    for f in files:
        y.append(get_label(f,label2id))
    return np.array(y)


def one_hot(y, num_classes=10):
    """
    Converts each label index in y to vector with one_hot encoding
    """
    y_one_hot = np.zeros((y.shape[0], num_classes))
    for i in range(y.shape[0]):
        y_one_hot[i,y[i]]=1
    return y_one_hot.T

def get_label_mapping(label_file):
    """
    Returns mappings of label to index and index to label
    The input file has list of labels, each on a separate line.
    """
    with open(label_file, 'r') as f:
        id2label = f.readlines()
        id2label = [l.strip() for l in id2label]
    label2id = {}
    count = 0
    for label in id2label:
        label2id[label] = count
        count += 1
    return id2label, label2id

def get_images(folder):
    """
    returns numpy array of all samples in folder
    each column is a sample resized to 30x30 and flattened
    """
    files = get_files(folder)
    images = []
    count = 0
    
    for f in files:
        count += 1
        if count % 10000 == 0:
            print("Loaded {}/{}".format(count,len(files)))
        img_arr = get_img_array(f)
        img_arr = img_arr.flatten() / 255.0
        images.append(img_arr)
    X = np.column_stack(images)

    return X

def get_train_data(data_root_path):
    """
    Return X and y
    """
    train_data_path = data_root_path + 'train'
    id2label, label2id = get_label_mapping(data_root_path+'labels.txt')
    print(label2id)
    X = get_images(train_data_path)
    y = get_labels(train_data_path, label2id)
    return X, y

def save_predictions(filename, y):
    """
    Dumps y into .npy file
    """
    np.save(filename, y)

In [6]:
# Load the data
data_root_path = 'cifar10-hw1/'
X_train, y_train = get_train_data(data_root_path) # this may take a few minutes
X_test = get_images(data_root_path + 'test')
print('Data loading done')

{'horse': 7, 'automobile': 1, 'deer': 4, 'dog': 5, 'frog': 6, 'cat': 3, 'truck': 9, 'ship': 8, 'airplane': 0, 'bird': 2}
Loaded 10000/50000
Loaded 20000/50000
Loaded 30000/50000
Loaded 40000/50000
Loaded 50000/50000
Loaded 10000/10000
Data loading done


## Part 1

#### Simple fully-connected deep neural network

In [7]:
layer_dimensions = [X_train.shape[0],250,200,10]  # including the input and output layers
NN = NeuralNetwork(layer_dimensions)
NN.train(X_train, y_train, iters=10000, alpha=0.0001, batch_size=500, print_every=50)

Iter:  0 loss:  2.32632854468 val_accuracy:  0.143228645729 train_accuracy:  0.136888888889
Iter:  50 loss:  1.99885255311 val_accuracy:  0.316063212643 train_accuracy:  0.311222222222
Iter:  100 loss:  1.98324125513 val_accuracy:  0.304460892178 train_accuracy:  0.295066666667
Iter:  150 loss:  1.90736132054 val_accuracy:  0.325665133027 train_accuracy:  0.326577777778
Iter:  200 loss:  1.98137646225 val_accuracy:  0.313062612523 train_accuracy:  0.311333333333
Iter:  250 loss:  1.81494481428 val_accuracy:  0.353270654131 train_accuracy:  0.356755555556
Iter:  300 loss:  1.76172608178 val_accuracy:  0.371474294859 train_accuracy:  0.377688888889
Iter:  350 loss:  1.74387287098 val_accuracy:  0.358471694339 train_accuracy:  0.362422222222
Iter:  400 loss:  1.74867775335 val_accuracy:  0.344068813763 train_accuracy:  0.349355555556
Iter:  450 loss:  1.66949562584 val_accuracy:  0.383076615323 train_accuracy:  0.384733333333
Iter:  500 loss:  1.69095670865 val_accuracy:  0.367673534707 t

In [8]:
y_predicted = NN.predict(X_test,'test')[1]
save_predictions('ans1-id2303', y_predicted)

testing


In [9]:
# test if your numpy file has been saved correctly
loaded_y = np.load('ans1-id2303.npy')
print(loaded_y.shape)
loaded_y[:10]

(10, 10000)


array([[ 0.45536249,  1.63894347,  3.007115  , ..., -2.5946128 ,
        -0.10439245, -0.45016118],
       [-0.1483906 ,  3.31136591, -1.73210326, ..., -4.38615414,
        -2.0569141 , -1.77918121],
       [ 1.3475619 , -0.90090121,  0.47615873, ...,  2.10323646,
        -0.03508179,  0.80957269],
       ..., 
       [-3.03068954, -1.94352655, -1.58373465, ..., -0.19282724,
        -0.01620476,  3.62385487],
       [-1.38669692,  4.47137583,  1.83075453, ...,  0.66878847,
        -1.823124  , -1.25085093],
       [-2.29790742,  4.55268448, -4.02390731, ..., -1.70494757,
        -2.47889177, -0.71704684]])

## Part 2: Regularizing the neural network
#### Add dropout and L2 regularization

In [10]:
NN2 = NeuralNetwork(layer_dimensions, drop_prob=0.2, reg_lambda=0.1)
NN2.train(X_train, y_train, iters=20000, alpha=0.0001, batch_size=500, print_every=50)

Iter:  0 loss:  25.4297315977 val_accuracy:  0.133826765353 train_accuracy:  0.128133333333
Iter:  50 loss:  25.1579672926 val_accuracy:  0.307061412282 train_accuracy:  0.299133333333
Iter:  100 loss:  25.1592905279 val_accuracy:  0.307661532306 train_accuracy:  0.307288888889
Iter:  150 loss:  25.1618287295 val_accuracy:  0.320264052811 train_accuracy:  0.312977777778
Iter:  200 loss:  25.1313610233 val_accuracy:  0.340868173635 train_accuracy:  0.336488888889
Iter:  250 loss:  25.0945239604 val_accuracy:  0.354870974195 train_accuracy:  0.351755555556
Iter:  300 loss:  25.1083129169 val_accuracy:  0.351870374075 train_accuracy:  0.353311111111
Iter:  350 loss:  25.0930743561 val_accuracy:  0.362672534507 train_accuracy:  0.359666666667
Iter:  400 loss:  25.1289008552 val_accuracy:  0.367273454691 train_accuracy:  0.364066666667
Iter:  450 loss:  25.0967990542 val_accuracy:  0.376675335067 train_accuracy:  0.374511111111
Iter:  500 loss:  25.1778161629 val_accuracy:  0.363072614523 t

In [11]:
y_predicted2 = NN2.predict(X_test,'test')[1]
save_predictions('ans2-id2303',y_predicted2)

testing


In [18]:
t2 = time()
print 'time taken: ',t2-t1

time taken:  8545.71585608
