# Part 1: CIFAR10 with fully connected layers

@Author: <b>Akshat Tyagi (at3761)</b><br>


This is the first section. 

## Goal
Here I implement a fully connected feed-forward neural network that classifies images from the CIFAR10 dataset. The aim is to demonstrate my understanding of how a neural network works, so the network itself is implemented using only numpy; torch and torchvision are used only to download and load the data.
<br><br>
The main goal is to reach <b>an accuracy above 50% on the test set</b>.


## Installations

The following cells install the required dependencies.

In [1]:
!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl 

Collecting torch==0.3.0.post4 from http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
  Downloading http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl (592.3MB)
    100% |████████████████████████████████| 592.3MB 97.1MB/s 
Installing collected packages: torch
Successfully installed torch-0.3.0.post4


In [0]:
# http://pytorch.org/
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())

accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.3.0.post4-{platform}-linux_x86_64.whl torchvision
import torch

In [3]:
!pip3 install torchvision



## Loading the Data

In [4]:
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np

transform = transforms.Compose(
  [
      transforms.ToTensor(), 
      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
  ]
)

# create dataset objects
trainset = torchvision.datasets.CIFAR10(root='/data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# load images
trainingset_loader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testingset_loader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True, num_workers=2)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /data/cifar-10-python.tar.gz
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Converting the data into numpy arrays

In [0]:
enumeratedTrainset = enumerate(trainset)
X = np.empty((50000,3,32,32),dtype=float)
Y = np.empty((50000),dtype=int)
for i,v in enumeratedTrainset:
    val = v[0].numpy()
    label = v[1]
    X[i] = val
    Y[i] = label
    
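# Center the training data by subtracting the per-pixel mean image.
# Note: this mean image is not subtracted from X_test below.
mean_image = np.mean(X, axis=0)
X -= mean_image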

enumeratedTestSet = enumerate(testset)
X_test = np.empty((10000,3,32,32),dtype=float)
Y_test = np.empty((10000),dtype=int)
for i,v in enumeratedTestSet:
    val = v[0].numpy()
    label = v[1]
    X_test[i] = val
    Y_test[i] = label
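
The cell above centers only the training array. A common convention, which is not what this notebook does, is to reuse the training mean for the test array so that both are shifted identically. A minimal sketch on toy arrays, where `center_with_train_mean`, `X_train_toy`, and `X_test_toy` are hypothetical names introduced only for illustration:

```python
import numpy as np

def center_with_train_mean(X_train, X_test):
    """Subtract the per-pixel mean of X_train from both arrays (returns copies)."""
    mean_image = np.mean(X_train, axis=0)
    return X_train - mean_image, X_test - mean_image, mean_image

# Toy stand-ins with the same per-image shape as CIFAR10 (3, 32, 32).
rng = np.random.default_rng(0)
X_train_toy = rng.normal(size=(8, 3, 32, 32))
X_test_toy = rng.normal(size=(4, 3, 32, 32))

X_train_c, X_test_c, mean_image = center_with_train_mean(X_train_toy, X_test_toy)
print(mean_image.shape)                       # one mean value per pixel and channel
print(np.abs(X_train_c.mean(axis=0)).max())   # close to zero after centering
```

Whether this changes the final accuracy here has not been measured; the sketch only shows the mechanics.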

## Main Architecture

The architecture has the following features:
*   4 fully connected hidden layers 
*   ReLU activation after every hidden layer
*   SGD with a learning rate of 3.113669e-01
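
To make the layer shapes concrete, the sketch below lists the weight and bias shapes and counts the trainable parameters. It assumes the hidden sizes passed in the Training section (`[500, 350, 100, 50]`), a flattened 3x32x32 input, and 10 output classes; `layer_sizes` and `count_parameters` are hypothetical names used only here.

```python
# Layer widths: flattened input, four hidden layers, 10 output classes.
layer_sizes = [3 * 32 * 32, 500, 350, 100, 50, 10]

def count_parameters(sizes):
    """Return the number of weights plus biases for consecutive dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Print the shape of each weight matrix and bias vector.
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    print(f"W: ({n_in}, {n_out})  b: ({n_out},)")

total_params = count_parameters(layer_sizes)
print(total_params)  # 1752510
```

Most of the parameters sit in the first layer, since the 3072-dimensional input connects to 500 units.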


In [0]:
class NeuralNetwork:
  
  def __init__(self, num_layers, layer_dimensions, drop_prob, reg_lambda = 0):
    
    self.reg = reg_lambda
    self.num_layers = 1 + len(layer_dimensions)
    self.params = {}
    self.weight_scale = 2.461858e-02 
    self.fc_cache = {}
    self.relu_cache = {}
    self.batch_size = 250
    
    input_dim = 3*32*32
    
    for i in range(self.num_layers - 1):
      self.params['W' + str(i+1)] = np.random.normal(0, self.weight_scale, [input_dim, layer_dimensions[i]])
      self.params['b' + str(i+1)] = np.zeros([layer_dimensions[i]])
      input_dim = layer_dimensions[i]  

    self.params['W' + str(self.num_layers)] = np.random.normal(0, self.weight_scale, [input_dim, 10])
    self.params['b' + str(self.num_layers)] = np.zeros([10])

  
  def relu (self, x):
    '''
    ReLU activation function
    @param x: input value or array
    '''
    # Apply max(0, x) elementwise and return the result
    return np.maximum(0, x)
  
  
  def affineForward(self, A, W, b):
    '''
    The affine forward pass
    @param A: input matrix
    @param W: weight matrix
    @param b: bias vector
    '''
    out = None
    NN = A.shape[0]
    reshaped_input = np.reshape(A, [NN, -1])
    out = np.dot(reshaped_input, W) + b
    cache = (A, W, b)
    return out, cache
  
  def activationForward(self,x):
    '''
    This function applies relu activation function
    @param x: Input
    '''
    out = None
    out = x.copy()
    out[out < 0] = 0
    cache = x
    return out, cache

  def dropout(self, A, prob):
    '''
    Inverted dropout: keeps each unit with probability prob and rescales by 1/prob
    @param A: Input matrix
    @param prob: Probability of keeping a unit
    '''
    mask = (np.random.rand(*A.shape) < prob) / prob
    out = A * mask
    return out, mask

  def forwardPropogation(self, X):
    '''
    This function ties together the forward pass
    @param X: input image
    '''
    size = X.shape[0]
    X = np.reshape(X, [size, -1])  
    
    for i in range(self.num_layers-1):
        fc_act, self.fc_cache[str(i+1)] = self.affineForward(X, self.params['W'+str(i+1)], self.params['b'+str(i+1)])
        relu_act, self.relu_cache[str(i+1)] = self.activationForward(fc_act)
        X = relu_act.copy()
        
    scores, final_cache = self.affineForward(X, self.params['W'+str(self.num_layers)], self.params['b'+str(self.num_layers)])  
    return scores, final_cache
  
  
  def affineBackward (self, dAl, cache):
    '''
    This function performs the backward pass on the fully connected layers
    @param dAl: Gradient
    @param cache: The cached parameters from the forward pass
    '''
    x, w, b = cache
    dx, dw, db = None, None, None
    NN = x.shape[0]
    reshaped_x = np.reshape(x,[NN, -1])
    dx = np.dot(dAl, w.T)
    dx = np.reshape(dx, x.shape)
    dw = np.dot(reshaped_x.T,dAl)
    db = np.sum(dAl, axis=0)
    return dx, dw, db
  

  def activationBackward(self, dout, cache):
    '''
    This function performs the backward pass through the ReLU and returns the gradient
    '''
    dx, x = None, cache
    relu_mask = (x >= 0)
    dx = dout * relu_mask
    return dx

  
  
  def backPropogation(self, scores, y, final_cache):
    '''
    This function runs the backward pass through the layers and returns the loss and the gradients
    @param scores: Result of the forward pass 
    @param y: True classes
    @param final_cache: Cache of the final affine layer (its input, weights and bias)
    '''
    loss, grads = 0.0, {}
    loss, dsoft = self.softmax_loss(scores, y)
    loss += 0.5*self.reg*(np.sum(np.square(self.params['W'+str(self.num_layers)])))
    
    dx_last, dw_last, db_last = self.affineBackward(dsoft, final_cache)
    
    grads['W'+str(self.num_layers)] = dw_last + self.reg*self.params['W'+str(self.num_layers)]
    grads['b'+str(self.num_layers)] = db_last

    for i in range(self.num_layers-1, 0, -1):
        drelu = self.activationBackward(dx_last, self.relu_cache[str(i)])
        dx_last, dw_last, db_last = self.affineBackward(drelu, self.fc_cache[str(i)])
        grads['W' + str(i)] = dw_last + self.reg * self.params['W' + str(i)]
        grads['b' + str(i)] = db_last
        loss += 0.5 * self.reg * (np.sum(np.square(self.params['W' + str(i)])))

    return loss, grads    
    
  
  def softmax_loss(self, x, y):
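    '''
    This function computes the softmax cross-entropy loss and its gradient with respect to the scores
    @param x: Scores of shape (N, number of classes)
    @param y: True class indices of shape (N,)
    '''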
    shiftedLogits = x - np.max(x, axis=1, keepdims=True)
    Z = np.sum(np.exp(shiftedLogits), axis=1, keepdims=True)
    log_probs = shiftedLogits - np.log(Z)
    probs = np.exp(log_probs)
    N = x.shape[0]
    loss = -np.sum(log_probs[np.arange(N), y]) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx

  
  def train(self, X, y, print_every=20, num_epochs=20, batch_size=250, alpha = 3.113669e-01):
    '''
    This is the main training loop for the network
    @param X: The training set
    @param y: The class labels
    @param print_every: Number of iterations between progress reports
    @param num_epochs: The total number of epochs (currently unused; the loop runs a fixed 10000 iterations)
    @param batch_size: The batch size (the loop currently samples self.batch_size examples per iteration)
    @param alpha: The learning rate
    '''
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train // batch_size , 1)
    
    num_iterations= 10000
    print("Total iterations:"+ str(num_iterations))

    for i in range(num_iterations):

      num_train = X.shape[0]
      batch_mask = np.random.choice(num_train, self.batch_size)
      X_batch = X[batch_mask]
      y_batch = y[batch_mask]
      scores, final_cache = self.forwardPropogation(X_batch)
      loss, grads = self.backPropogation(scores, y_batch, final_cache)
      self.updateParameters(grads,alpha)
      
      if i % print_every == 0:
        print(str(i)+"/"+str(num_iterations)+" iterations done. At i="+str(i)+" => | Loss: "+str(loss) + " | Accuracy:"+str(self.accuracy(np.argmax(scores,axis=1),y_batch)))
        
    print("Done")
      
  def updateParameters(self,gradients,alpha):
    '''
    This function updates the parameters by gradient descent using the computed gradients
    @param gradients: Gradients computed through backpropagation
    @param alpha: The learning rate for gradient descent
    '''
    for p in self.params:
      dw = gradients[p]
      # Plain gradient descent step: param <- param - alpha * gradient
      self.params[p] = self.params[p] - alpha * dw
           
  def predict(self,X_test):
    '''
    This function is used to test the trained classifier by performing predictions on the test set
    @param X_test: The test set
    '''
    y_pred = []
    scores, cache = self.forwardPropogation(X_test)    
    y_pred.append(np.argmax(scores, axis=1))                 
    return y_pred
  
  def accuracy(self,y_pred,y_test):
    '''
    This function tests the accuracy of the trained model
    @param y_pred: Predicted class 
    @param y_test: Actual Class
    '''
    y_pred = np.hstack(y_pred)
    acc = np.mean(y_pred == y_test)
    return acc
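
One way to gain confidence in the backward pass is a numerical gradient check. The sketch below is not part of the original notebook: it uses a standalone function written to mirror `softmax_loss` and compares its analytic gradient with centered finite differences on random scores. `numerical_gradient` and the step size `h=1e-5` are assumptions chosen for illustration.

```python
import numpy as np

def softmax_loss(x, y):
    """Standalone softmax cross-entropy loss and gradient w.r.t. the scores."""
    shifted = x - np.max(x, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    n = x.shape[0]
    loss = -np.sum(log_probs[np.arange(n), y]) / n
    dx = np.exp(log_probs)
    dx[np.arange(n), y] -= 1
    return loss, dx / n

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference gradient of the scalar function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 10))        # 5 examples, 10 classes
labels = rng.integers(0, 10, size=5)

_, analytic = softmax_loss(scores, labels)
numeric = numerical_gradient(lambda s: softmax_loss(s, labels)[0], scores)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)  # should be very small if the gradient is correct
```

The same idea can be extended to the affine and ReLU layers by perturbing one weight at a time, at the cost of one forward pass per perturbed entry.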

## Training

In [11]:
a = NeuralNetwork(4, [500, 350, 100, 50], 0.05)
a.train(X,Y)
y_pred = a.predict(X_test)


Total iterations:10000
0/10000 iterations done. At i=0 => | Loss: 2.302830998322575 | Accuracy:0.072
20/10000 iterations done. At i=20 => | Loss: 2.2994755221733363 | Accuracy:0.144
40/10000 iterations done. At i=40 => | Loss: 2.302746974420147 | Accuracy:0.092
60/10000 iterations done. At i=60 => | Loss: 2.3026644416145445 | Accuracy:0.124
80/10000 iterations done. At i=80 => | Loss: 2.2975250535943297 | Accuracy:0.092
100/10000 iterations done. At i=100 => | Loss: 2.2864587852581457 | Accuracy:0.132
120/10000 iterations done. At i=120 => | Loss: 2.1996987420876875 | Accuracy:0.168
140/10000 iterations done. At i=140 => | Loss: 2.074822681753758 | Accuracy:0.196
160/10000 iterations done. At i=160 => | Loss: 1.950701646514746 | Accuracy:0.212
180/10000 iterations done. At i=180 => | Loss: 1.9340956424996398 | Accuracy:0.244
200/10000 iterations done. At i=200 => | Loss: 1.9242730614541634 | Accuracy:0.276
220/10000 iterations done. At i=220 => | Loss: 1.8088647606763852 | Accuracy:0.2
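
The first logged loss of about 2.3028 is what a 10-class softmax gives when its predictions are close to uniform: the cross-entropy of a uniform distribution over 10 classes is ln(10), roughly 2.3026. A quick check, using all-zero scores as a stand-in for the near-zero scores produced by the small initial weights:

```python
import numpy as np

num_classes = 10
# All-zero scores give a uniform softmax, so each class has probability 1/10.
scores = np.zeros((1, num_classes))
probs = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
uniform_loss = -np.log(probs[0, 0])
print(round(float(uniform_loss), 4))  # 2.3026
```

Likewise, accuracies near 0.1 in the first iterations correspond to guessing among 10 classes.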

## Results and final comment

In [12]:
print(a.accuracy(y_pred,Y_test))

0.5095


In [0]:
def save_parameters(filename,y):
  '''
  This function saves the predicted classes to a .npy file
  @param filename: Name of the file where the predicted classes are stored
  @param y: The predicted values of y
  '''
  np.save(filename, y)

save_parameters('ans1.npy', np.hstack(y_pred))
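
To confirm that a saved file holds one integer label per test image, it can be loaded back with `np.load`. The sketch below does a round trip in a temporary directory with a stand-in array; `fake_predictions` is hypothetical and only takes the place of `np.hstack(y_pred)`.

```python
import os
import tempfile
import numpy as np

# Stand-in for np.hstack(y_pred): 10000 integer class labels in [0, 10).
fake_predictions = np.random.default_rng(0).integers(0, 10, size=10000)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ans1.npy")
    np.save(path, fake_predictions)   # same call that save_parameters makes
    reloaded = np.load(path)          # read the array back from disk

print(reloaded.shape, reloaded.dtype)
```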

In [14]:
import os
print( os.getcwd() )
print( os.listdir('/content') )

/content
['sample_data', '.config', 'data', 'ans1-at3761.npy']


In [0]:
from google.colab import files
files.download( "ans1.npy" )