#### Q1. Using backpropagation method and the computational graph learned in the class to derive gradients.

$\frac{\delta L^r}{\delta w^l_{ij}}= a^{l-1}_j \frac{\delta L^r}{\delta z^l_i}$ for $l>1$,
$\frac{\delta L^r}{\delta w^l_{ij}}= x^{r}_j \frac{\delta L^r}{\delta z^l_i}$ for $l=1$

Applying these equations to the given 3-layer neural network gives the following equations.

$\frac{\delta L^r}{\delta w^3_{ij}}= a^{2}_j \frac{\delta L^r}{\delta z^3_i}$, 

$\frac{\delta L^r}{\delta w^2_{ij}}= a^{1}_j \frac{\delta L^r}{\delta z^2_i}$, 

$\frac{\delta L^r}{\delta w^1_{ij}}= x^{r}_j \frac{\delta L^r}{\delta z^1_i}$

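For the last layer (3), $\frac{\delta L^r}{\delta z^3_i} = \frac{\delta L^r}{\delta \hat{y}_i} \frac{\delta \hat{y}_i}{\delta z^3_i} = 2(\hat{y}_i - y_i) \frac{e^{z^3_1}e^{z^3_2}}{(e^{z^3_1}+e^{z^3_2})^2}$, where $\hat{y}_i$ is the softmax output, $y_i$ is the one-hot target, and the factor 2 comes from the squared-error loss $L^r = \sum_i (\hat{y}_i - y_i)^2$ used in the code.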
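
For reference, the general chain rule through a softmax output sums over all output units, because every $\hat{y}_k$ depends on every $z^3_i$. A sketch, assuming the squared-error loss $L^r = \sum_k (\hat{y}_k - y_k)^2$ and writing $[k=i]$ for the indicator that is 1 if $k=i$ and 0 otherwise:

$\frac{\delta \hat{y}_k}{\delta z^3_i} = \hat{y}_k([k=i] - \hat{y}_i)$

$\frac{\delta L^r}{\delta z^3_i} = \sum_k 2(\hat{y}_k - y_k)\,\hat{y}_k([k=i] - \hat{y}_i)$

With two classes, $\hat{y}_1\hat{y}_2 = \frac{e^{z^3_1}e^{z^3_2}}{(e^{z^3_1}+e^{z^3_2})^2}$ is the diagonal term that appears in the last-layer expression. For a one-hot target ($y_1 + y_2 = 1$) the cross term contributes the same amount, so the full sum equals $4(\hat{y}_i - y_i)\hat{y}_1\hat{y}_2$: twice the diagonal-only expression, pointing in the same direction.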

Let's initialize the weights with random numbers and compute the $\delta$'s (errors) and the derivatives with respect to the weights using backpropagation.

In [407]:
import numpy as np
import math
import matplotlib.pyplot as plt
import pandas as pd

$W^1$ is a $2 \times 3$ matrix, $W^2$ is a $3 \times 3$ matrix and $W^3$ is a $3 \times 2$ matrix.
$X$ is a $1 \times 2$ row vector.
Let's randomly initialize the weight matrices $W^1$, $W^2$ and $W^3$ and compute $Z^1 = W^{1T}X$, $Z^2=W^{2T}a^1$ and $Z^3 = W^{3T}a^2$ in the forward pass, where
$a^1=\sigma (Z^1)$ and $a^2 = \sigma(Z^2)$.
The code uses the equivalent row-vector form, e.g. `z1 = X.dot(W1)`.

In [408]:
W1 = np.random.rand(2,3)
W2 = np.random.rand(3,3)
W3 = np.random.rand(3,2)
X = np.random.rand(1,2)
# dummy one-hot class vector
y = np.empty([1,2])
y[0,0] = 0.0
y[0,1] = 1.0

In [409]:
# Forward propagation (sigmoid is defined in the activation-functions cell under Q2)
z1 = X.dot(W1)
a1 = sigmoid(z1)
z2 = a1.dot(W2)
a2 = sigmoid(z2)
z3 = a2.dot(W3)
exp_scores = np.exp(z3)
# softmax
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Backpropagation
ones = np.ones(exp_scores.shape)
# diagonal term of the softmax derivative for two classes: p1*p2
dsoftmax = ones*(np.prod(exp_scores)/(np.sum(exp_scores)*np.sum(exp_scores)))
# error at the output layer: derivative of the squared-error loss w.r.t. z3
dy = dsoftmax*2*(probs-y)
# error in the second hidden layer
delta3 = a2*(1-a2)*dy.dot(W3.T)
# error in the first hidden layer
delta2 = a1*(1-a1)*delta3.dot(W2.T)

# gradients of Loss w.r.t model parameters             
dW3 = (a2.T).dot(dy)
dW2 = (a1.T).dot(delta3)
dW1 = np.dot(X.T, delta2)

# dW1, dW2, dW3 represent the gradients of the loss w.r.t. W1, W2 and W3 respectively
print(dW1) # 2x3
print(dW2) # 3x3
print(dW3) # 3x2

# Z1 = np.matmul(W1.transpose(), X)

[[ 0.00104847  0.00054933  0.00049761]
 [ 0.00193249  0.00101249  0.00091717]]
[[ 0.00504814  0.00498336  0.00348466]
 [ 0.00582539  0.00575064  0.00402119]
 [ 0.0049689   0.00490514  0.00342996]]
[[ 0.2145273  -0.2145273 ]
 [ 0.19132985 -0.19132985]
 [ 0.20871479 -0.20871479]]
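
One way to sanity-check hand-derived gradients like these is to compare them against finite differences. The sketch below is self-contained and is not the notebook's code: it assumes the same 2-3-3-2 architecture and squared-error loss, but backpropagates through the full softmax Jacobian (including cross terms), so its numbers need not match the output above. The helper names `forward`, `loss_fn`, `analytic_grads` and `numeric_grad` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2, W3):
    """Return the hidden activations and the softmax probabilities."""
    a1 = sigmoid(X.dot(W1))
    a2 = sigmoid(a1.dot(W2))
    z3 = a2.dot(W3)
    e = np.exp(z3 - z3.max(axis=1, keepdims=True))
    return a1, a2, e / e.sum(axis=1, keepdims=True)

def loss_fn(X, y, W1, W2, W3):
    """Squared-error loss between the softmax output and the one-hot target."""
    probs = forward(X, W1, W2, W3)[2]
    return np.sum((probs - y) ** 2)

def analytic_grads(X, y, W1, W2, W3):
    a1, a2, probs = forward(X, W1, W2, W3)
    g = 2.0 * (probs - y)  # dL/dprobs
    # Full softmax Jacobian: dL/dz_i = p_i * (g_i - sum_k g_k * p_k)
    dz3 = probs * (g - np.sum(g * probs, axis=1, keepdims=True))
    dz2 = a2 * (1 - a2) * dz3.dot(W3.T)
    dz1 = a1 * (1 - a1) * dz2.dot(W2.T)
    return X.T.dot(dz1), a1.T.dot(dz2), a2.T.dot(dz3)

def numeric_grad(f, W, eps=1e-6):
    """Central finite differences of the scalar f() w.r.t. each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old  # restore the original weight
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W1, W2, W3 = rng.random((2, 3)), rng.random((3, 3)), rng.random((3, 2))
X = rng.random((1, 2))
y = np.array([[0.0, 1.0]])

f = lambda: loss_fn(X, y, W1, W2, W3)
max_errors = [
    np.max(np.abs(analytic - numeric_grad(f, W)))
    for analytic, W in zip(analytic_grads(X, y, W1, W2, W3), (W1, W2, W3))
]
print(max_errors)  # one maximum absolute difference per weight matrix
```

If the three printed differences are tiny compared with the gradient entries themselves, the analytic derivation and the numerical estimate agree.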


#### Q2. Implement the stochastic gradient descent (SGD) algorithm learned in the class to train the above neural network.
Note: Remove the spaces around the header names #x1#, #x2# and class in train_data.txt and test_data.txt so that the code runs without errors.

In [381]:
# Activation functions
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    return (np.exp(z) / np.sum(np.exp(z)))
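
The softmax above exponentiates the raw scores, which can overflow for large inputs. A common remedy is to subtract the maximum score first, which leaves the result unchanged mathematically. A minimal sketch; `stable_softmax` is a hypothetical name and is not used elsewhere in this notebook:

```python
import numpy as np

def stable_softmax(z):
    """Softmax over the last axis, shifted by the max score to avoid overflow."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

# np.exp(1000.0) overflows to inf, but the shifted version stays finite
probs = stable_softmax([[1000.0, 1000.0]])
print(probs)
```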

In [382]:
# Helper function to predict an output (0 or 1) and compute accuracy
def predict(model, x):
    """
    model: dict with the weight matrices 'W1', 'W2' and 'W3'
    x: test inputs, one example per row
    Uses the global test_labels (one-hot vectors) as ground truth.
    Returns the confusion matrix (rows: true class, columns: predicted class)
    and the accuracy in percent.
    """
    W1, W2, W3 = model['W1'], model['W2'], model['W3']
    confusion_matrix=[[0,0],[0,0]]
    # Testing the model
    accuracy = 0
    for i, X in enumerate(x):
        X = np.reshape(X, (1, 2))
        y = np.reshape(test_labels[i, :], (1,2))
        # forward pass
        z1 = X.dot(W1)
        a1 = sigmoid(z1)
        z2 = a1.dot(W2)
        a2 = sigmoid(z2)
        z3 = a2.dot(W3)
        exp_scores = np.exp(z3)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        idx = np.argmax(probs, axis=1)
        # adds 1 when the predicted class matches the one-hot label
        accuracy+=y[0, idx]
        
        ix = idx.astype(bool)
        l = y[0].astype(bool)
        
        if l[0]:
            if not ix[0]:
                confusion_matrix[0][0] +=1
            else:
                confusion_matrix[0][1] +=1
        elif l[1]:
            if ix[0]:
                confusion_matrix[1][1] +=1
            else:
                confusion_matrix[1][0]+=1

    return confusion_matrix, (accuracy/x.shape[0]*100)
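
The nested if/else bookkeeping in `predict` amounts to counting (true class, predicted class) pairs. A minimal vectorized sketch of the same idea, assuming the classes are already available as integer ids; `confusion_from_labels` and the toy label lists are hypothetical and only for illustration:

```python
import numpy as np

def confusion_from_labels(true_ids, pred_ids, num_classes=2):
    """cm[t][p] counts examples whose true class is t and predicted class is p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(cm, (np.asarray(true_ids), np.asarray(pred_ids)), 1)
    return cm

true_ids = [0, 0, 1, 1, 1]
pred_ids = [0, 1, 1, 1, 0]
cm = confusion_from_labels(true_ids, pred_ids)
accuracy = np.trace(cm) / cm.sum() * 100  # correct predictions lie on the diagonal
print(cm, accuracy)
```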


In [383]:
# This function learns parameters for the neural network and returns the model.
def build_model(train_data, labels, input_dim, hidden_dim, output_dim, epochs=1):
    """
    train_data: training data
    labels: training data labels in one-hot vector format
    input_dim: 2 in our case
    hidden_dim: 3 in our case
    output_dim: 2 in our case
    epochs: no. of passes through the training data
    """
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(input_dim, hidden_dim) / np.sqrt(input_dim)
    W2 = np.random.randn(hidden_dim, hidden_dim) / np.sqrt(hidden_dim)
    W3 = np.random.randn(hidden_dim, output_dim) / np.sqrt(hidden_dim)

    # This is what we return at the end
    model = {}

    # Stochastic gradient descent: one parameter update per training example
    for epoch in range(0, epochs):
        print("Training for epoch: {}".format(epoch+1))
        for i, X in enumerate(train_data):

            # Forward propagation
            X = np.reshape(X, (1, 2))
            y = np.reshape(labels[i, :], (1,2))
            z1 = X.dot(W1)
            a1 = sigmoid(z1)
            z2 = a1.dot(W2)
            a2 = sigmoid(z2)
            z3 = a2.dot(W3)

            # softmax
            exp_scores = np.exp(z3)
            probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
            ones = np.ones(exp_scores.shape)

            # squared-error loss for one example (not used in the update)
            loss = np.sum((y-probs)*(y-probs))
            
            # derivative of softmax w.r.t. its inputs z3
            dsoftmax = ones*(np.prod(exp_scores)/(np.sum(exp_scores)*np.sum(exp_scores)))

            # Backpropagation
            dy = dsoftmax*2*(probs-y)
            delta3 = sigmoid(z2)*(1-sigmoid(z2))*dy.dot(W3.T)
            delta2 = sigmoid(z1)*(1-sigmoid(z1))*delta3.dot(W2.T)

            # gradients of Loss w.r.t model parameters             
            dW3 = (a2.T).dot(dy)
            dW2 = (a1.T).dot(delta3)
            dW1 = np.dot(X.T, delta2)

            # Gradient descent parameter update (epsilon is the global learning rate,
            # which must be set before build_model is called)
            W1 += -epsilon * dW1
            W2 += -epsilon * dW2
            W3 += -epsilon * dW3

    model = { 'W1': W1, 'W2': W2, 'W3': W3}
    return model

In [384]:
def process_labels(data):
    labels = np.ndarray((data.shape[0], 2), dtype=float)
    for i,label in enumerate(data.iloc[:, 2]):
        if label == 1:
            labels[i,0] = 0
            labels[i,1] = 1
        elif label == 0:
            labels[i,0] = 1
            labels[i,1] = 0
    return labels
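
The loop in `process_labels` can also be written as a single indexing operation. A minimal sketch, assuming the class column contains only the integers 0 and 1; `one_hot` is a hypothetical helper name and not part of the notebook:

```python
import numpy as np

def one_hot(class_ids, num_classes=2):
    """Row i of the result is the one-hot vector for class_ids[i]."""
    class_ids = np.asarray(class_ids, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes)[class_ids]

labels = one_hot([0, 1, 1])
print(labels)
```

Unlike the loop over an uninitialized `np.ndarray`, this raises an `IndexError` for a class id outside the expected range instead of leaving the row unset.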

In [385]:
train_data = pd.read_csv('train_data.txt') # note remove spaces around the header column in train_data.txt and test_data.txt
train_data = train_data.sample(frac=1) # shuffle the training examples
train_labels = process_labels(train_data)
train_data.drop(['class'], axis = 1, inplace = True)
train_data = train_data.values

num_examples = len(train_data) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

epsilon = 0.1 # learning rate for gradient descent


# Build a model with two 3-dimensional hidden layers
model = build_model(train_data, train_labels,  2, 3, 2, epochs=1)

# read test data
test_data = pd.read_csv('test_data.txt')
test_data = test_data.sample(frac=1)
test_labels = process_labels(test_data)
test_data.drop(['class'], axis = 1, inplace = True)
test_data = test_data.values


# Test the model
confusion_matrix_1, accuracy = predict(model, test_data)
# print the confusion matrix and the test accuracy
print("Confusion Matrix: {}".format(confusion_matrix_1))
print("Testing accuracy: {}%".format(accuracy[0]))

Training for epoch: 1
Confusion Matrix: [[500, 0], [1, 499]]
Testing accuracy: 99.9%


#### Q3. Train the above neural network on the same training data using a deep learning framework. You are free to choose any Python package, but your code and results should be presented in a Jupyter Notebook.

I will be using PyTorch to train the above network.

In [386]:
import torch
import torch.utils.data
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import pandas as pd
import numpy as np

Defining the hyperparameters

In [387]:
input_size = 2
hidden_size = 3
output_size = 2
batch_size = 1

Defining the network architecture

In [388]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=False)
        self.fc2 = nn.Linear(hidden_size, hidden_size, bias=False)
        self.fc3 = nn.Linear(hidden_size, output_size, bias=False)
        # mean squared error, averaged over the output elements (the default)
        self.criterion = nn.MSELoss()

    def forward(self, x, target):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        # note: unlike the Q2 network, a sigmoid is applied before the softmax
        x = F.sigmoid(self.fc3(x))
        x = F.softmax(x, dim=-1)  # softmax over the class dimension
        loss = self.criterion(x, target)
        return x, loss

#### Loading data. Note that the labels need to be converted into one-hot vectors. If the class is 0, the one-hot vector will be [1,0]. If the class is 1, the one-hot vector will be [0,1].

In [391]:
def process_labels(data):
    """
    Converts the third column of the input pandas dataframe into one-hot vectors
    """
    labels = np.ndarray((data.shape[0], 2), dtype=float)
    for i,label in enumerate(data.iloc[:, 2]):
        if label == 1:
            labels[i,0] = 0
            labels[i,1] = 1
        elif label == 0:
            labels[i,0] = 1
            labels[i,1] = 0
    labels = torch.from_numpy(labels)
    labels = labels.float()
    return labels

#### Reading data and converting it into appropriate data type for consumption by the network

In [392]:
train_data = pd.read_csv('train_data.txt')
train_data = train_data.sample(frac=1)
train_labels = process_labels(train_data)
train_data.drop(['class'], axis = 1, inplace = True)
train_data = train_data.values
train_data = torch.from_numpy(train_data)
train_data = train_data.float()

test_data = pd.read_csv('test_data.txt')
test_data = test_data.sample(frac=1)
test_labels = process_labels(test_data)
test_data.drop(['class'], axis = 1, inplace = True)
test_data = test_data.values
test_data = torch.from_numpy(test_data)
test_data = test_data.float()

train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                       batch_size=batch_size,
                                       shuffle=False)

In [393]:
# Function to test the model and print accuracy
def test_model(model, test_data, test_labels):
    # Testing the model
    accuracy = 0
    confusion_matrix = [[0,0], [0,0]]
    for i, data in enumerate(test_data, 0):
        labels = test_labels[i, :]
        data, labels = Variable(data), Variable(labels)
        outputs, _ = model(data, labels)
        predicted, idx = torch.max(outputs.data, 0)
        accuracy+=labels[idx]
        ix = idx.numpy().astype(bool)
        l = labels.data.numpy().astype(bool)
        if l[0]:
            if not ix:
                confusion_matrix[0][0] +=1
            else:
                confusion_matrix[0][1] +=1
        elif l[1]:
            if ix:
                confusion_matrix[1][1] +=1
            else:
                confusion_matrix[1][0]+=1
    return confusion_matrix, (accuracy/test_data.shape[0]*100)

#### Initializing the network

In [396]:
net = Net(2, 3, 2)
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
epochs = 2

#### Train the network

In [397]:
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
for epoch in range(epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs  = data
        labels = train_labels[i, :]

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs, loss = net(inputs, labels)

        loss.backward()
        optimizer.step()

    print('Finished Training: epoch {}'.format(epoch))

confusion_matrix_2, accuracy = test_model(net, test_data, test_labels)
print("Confusion_matrix: {}".format(confusion_matrix_2))
print("Test accuracy: {}".format(accuracy[0]))

Finished Training: epoch 0
Finished Training: epoch 1
Confusion_matrix: [[500, 0], [1, 499]]
Test accuracy: Variable containing:
 99.9000
[torch.FloatTensor of size 1]



#### Q4. Evaluate the models obtained in questions 2 and 3 on the testing data using the metrics precision, recall and F-score. Show the details of your work.

In [398]:
confusion_matrix_1 # confusion matrix from first model

[[500, 0], [1, 499]]

In [399]:
confusion_matrix_2 # confusion matrix from second model

[[500, 0], [1, 499]]

In [400]:
TP = 499/500

Precision, recall and F-score for class 1. The confusion matrices are indexed as [true class][predicted class], so TP = confusion_matrix[1][1], FP = confusion_matrix[0][1] and FN = confusion_matrix[1][0].

In [401]:
def precision(confusion_matrix):
    """TP / (TP + FP)"""
    return (confusion_matrix[1][1]/(confusion_matrix[0][1]+confusion_matrix[1][1]))

In [402]:
def recall(confusion_matrix):
    """TP / (TP + FN)"""
    return (confusion_matrix[1][1]/(confusion_matrix[1][0]+confusion_matrix[1][1]))

In [403]:
def fscore(precision, recall):
    """Harmonic mean of precision and recall"""
    return (2*precision*recall/(precision+recall))

In [404]:
precision1 = precision(confusion_matrix_1)
recall1 = recall(confusion_matrix_1)
fscore1 = fscore(precision1, recall1)

precision2 = precision(confusion_matrix_2)
recall2 = recall(confusion_matrix_2)
fscore2 = fscore(precision2, recall2)

In [405]:
print("For the neural network trained from scratch: precision = {}, recall = {} and fscore = {}".format(precision1, recall1, fscore1 ))

For the neural network trained from scratch: precision = 1.0, recall = 0.998 and fscore = 0.998998998998999


In [406]:
print("For the neural network trained using PyTorch: precision = {}, recall = {} and fscore = {}".format(precision2, recall2, fscore2 ))

For the neural network trained using PyTorch: precision = 1.0, recall = 0.998 and fscore = 0.998998998998999
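
The same metrics can be read off for either class directly from a confusion matrix laid out as `cm[true][predicted]`, which is how `predict` and `test_model` fill it. A minimal sketch in plain Python; `per_class_metrics` is a hypothetical helper, and the matrix is the one displayed for both models earlier in this question:

```python
def per_class_metrics(cm, cls):
    """Precision, recall and F-score for class `cls`, with cm[true][predicted]."""
    n = len(cm)
    tp = cm[cls][cls]
    fp = sum(cm[t][cls] for t in range(n) if t != cls)  # predicted cls, true class differs
    fn = sum(cm[cls][p] for p in range(n) if p != cls)  # true cls, predicted class differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

cm = [[500, 0], [1, 499]]
metrics = {cls: per_class_metrics(cm, cls) for cls in (0, 1)}
for cls, (p, r, f) in metrics.items():
    print("class {}: precision = {:.4f}, recall = {:.4f}, fscore = {:.4f}".format(cls, p, r, f))
```

Reporting both classes makes the single misclassified example visible from each side: it lowers the recall of class 1 and the precision of class 0.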
