# IN3063 Mathematics and Programming for AI
## Resit Coursework

### 1. Introduction
This program is a simple Neural Network that reports its accuracy on a training set and a testing set. It implements three activation functions (Sigmoid, ReLU, Leaky ReLU), Softmax, and Inverted Dropout.

This program is written in Python 3 and runs in Jupyter Notebook.

### 2. Instructions

The best way to run this program is to restart the kernel and run all cells in order, as the cells follow a sequential routine.

To re-run the program without restarting the kernel after the initial run, run the cells from "Loading data and Training NN" downwards.

You will need both the fashion-mnist_test.zip and fashion-mnist_train.zip files for this to run; they can be downloaded from the Google Drive linked below.

#### Google Drive
https://drive.google.com/drive/folders/1s2Ni2n0RjN2gfEagrqMHpTigm511qxLO?usp=sharing
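
Before running the notebook, it can help to confirm that both archives sit next to it. The following is a minimal sketch, assuming the files are placed in the current working directory; the helper name `missing_archives` is hypothetical and not part of the coursework code.

```python
import os

# File names taken from the instructions above
REQUIRED_ARCHIVES = ("fashion-mnist_train.zip", "fashion-mnist_test.zip")

def missing_archives(folder=".", names=REQUIRED_ARCHIVES):
    """Return the required archive names that are not present in `folder`."""
    return [name for name in names
            if not os.path.isfile(os.path.join(folder, name))]

missing = missing_archives()
if missing:
    print("Download these files first:", ", ".join(missing))
else:
    print("Both archives found.")
```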

### 3. Import Libraries

The libraries needed for the Neural Network.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import gzip
import shutil
import pandas as pd
import zipfile as zp
import pickle
import os
import random
from scipy.stats import truncnorm
import matplotlib.ticker as plticker

### 4. Extracting Data and Pickling

Extract the compressed datasets from the zip files, then package the data into a pickle (.pkl) file so that later cells can load it quickly.

###### Make sure to run this cell at least once before running the cells below

In [2]:
# with open('t10k-images-idx3-ubyte', 'rb') as f_in:
#     with gzip.open('t10k-images-idx3-ubyte.gz', 'wb') as f_out:
#         shutil.copyfileobj(f_in, f_out)

num_of_labels = 10

# Assign Zip Files to variable
zf = zp.ZipFile('fashion-mnist_train.zip')
zf1 = zp.ZipFile('fashion-mnist_test.zip')

# Load CSV files from Zip Files
train_data = np.loadtxt(zf.open('fashion-mnist_train.csv'), delimiter=',')
test_data = np.loadtxt(zf1.open('fashion-mnist_test.csv'), delimiter=',')

# Display images from 1 to 10
# for i in range(10):
#     img = train_imgs[i].reshape((28,28))
#     plt.imshow(img, cmap="Greys")
#     plt.show()

# Map image data values into the interval [0.01, 1.0]
fac = 0.99 / 255
add_fac = 0.01
train_imgs = np.asarray(train_data[:, 1:], dtype=float) * fac + add_fac
test_imgs = np.asarray(test_data[:, 1:], dtype=float) * fac + add_fac
train_labels = np.asarray(train_data[:, :1], dtype=float)
test_labels = np.asarray(test_data[:, :1], dtype=float)

lr = np.arange(num_of_labels)
# transform labels into one-hot representation
train_labels_one_hot = (lr==train_labels).astype(float)
test_labels_one_hot = (lr==test_labels).astype(float)
# we don't want zeroes and ones in the labels either:
train_labels_one_hot[train_labels_one_hot==0] = 0.01
train_labels_one_hot[train_labels_one_hot==1] = 0.99
test_labels_one_hot[test_labels_one_hot==0] = 0.01
test_labels_one_hot[test_labels_one_hot==1] = 0.99

# Create Pickle file from previous data
with open(os.path.join(".","pkl_fashionmnist.pkl"), "bw") as fh:
    data = (train_imgs, 
            test_imgs, 
            train_labels,
            test_labels,
            train_labels_one_hot,
            test_labels_one_hot)
    pickle.dump(data, fh)
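
The one-hot step in the cell above relies on NumPy broadcasting: comparing a row of class indices with a column of labels yields one row per sample. The sketch below illustrates this on three made-up labels; the values are assumptions chosen only for the example.

```python
import numpy as np

num_of_labels = 10
labels = np.array([[3.0], [0.0], [9.0]])   # made-up labels as a column vector, shape (3, 1)

lr = np.arange(num_of_labels)              # class indices, shape (10,)
one_hot = (lr == labels).astype(float)     # broadcasts to shape (3, 10)

# Soften the targets in the same way as the cell above
one_hot[one_hot == 0] = 0.01
one_hot[one_hot == 1] = 0.99

print(one_hot.shape)            # (3, 10)
print(one_hot.argmax(axis=1))   # [3 0 9]
```

Taking `argmax` along each row recovers the original label, which is how the network's output is decoded later on.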


### 5. Activation Functions, Softmax, and NN Class
#### Task 1, 2, 3, 4

This section defines the three activation functions (sigmoid, relu, and leakyrelu) with their derivatives, the Softmax function needed for Task 3, and a NeuralNetwork class with a selectable activation function. L2 Regularization is provided in the form of a Mean Squared Error (MSE) method.
The base for this code comes from Tutorial 5, part 2.
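
For Softmax, the derivative is a full Jacobian matrix rather than an element-wise factor: the entry for output i and input j is s_i * (delta_ij - s_j). The `dsoftmax` helper in the cell below builds a matrix of this form from the Softmax output. The following sketch, with hypothetical helper names and made-up logits, checks that formula against a finite-difference estimate.

```python
import numpy as np

def stable_softmax(z):
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(s):
    """Jacobian d s_i / d z_j = s_i * (delta_ij - s_j), given the softmax output s."""
    return np.diag(s) - np.outer(s, s)

z = np.array([1.0, 2.0, 0.5])   # made-up logits
s = stable_softmax(z)
J = softmax_jacobian(s)

# Central finite-difference estimate of the same Jacobian, one column per input
eps = 1e-6
J_num = np.empty((z.size, z.size))
for j in range(z.size):
    step = np.zeros_like(z)
    step[j] = eps
    J_num[:, j] = (stable_softmax(z + step) - stable_softmax(z - step)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-7))   # True
```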

In [3]:
# Activation Functions
@np.vectorize
def sigmoid(x):
    return 1 / (1 + np.e ** -x)

# The derivative helpers below expect the activation *output*, not the raw input
def dsigmoid(x):
    return x * (1 - x)
            
def softmax(x):
    # subtract the maximum before exponentiating to avoid overflow
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def dsoftmax(x, y):
    si_sj = -x * x.reshape(y, 1)
    s_der = np.diag(x) + si_sj
    return s_der
    
def relu(x):
    return np.maximum(0, x)

def drelu(x):
    return np.where(x <= 0, 0, 1)

def leaky_relu(x):
    return np.where(x >= 0, x, x * 0.01)

def dleaky_relu(x):
    out = np.ones_like(x)
    out[x < 0] *= 0.01
    return out


def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm((low - mean) / sd, 
                     (upp - mean) / sd, 
                     loc=mean, 
                     scale=sd)


class NeuralNetwork:
    
    def __init__(self, 
                 no_of_in_nodes, 
                 no_of_out_nodes, 
                 no_of_hidden_nodes,
                 activation_function,
                 learning_rate,
                 bias=None,
                ):  
        self.no_of_in_nodes = no_of_in_nodes
        self.no_of_out_nodes = no_of_out_nodes       
        self.no_of_hidden_nodes = no_of_hidden_nodes          
        self.learning_rate = learning_rate 
        self.bias = bias
        
        if activation_function == 'sigmoid':
            self.activation_ih = sigmoid
            self.activation_ho = sigmoid
            self.dactivation_oh = dsigmoid
            self.dactivation_hi = dsigmoid
            
        if activation_function == 'softmax':
            self.activation_ih = sigmoid
            self.activation_ho = softmax
            self.dactivation_oh = dsoftmax
            self.dactivation_hi = dsigmoid
        
        if activation_function == 'relu':
            self.activation_ih = relu
            self.activation_ho = relu
            self.dactivation_oh = drelu
            self.dactivation_hi = drelu
            
        if activation_function == 'leakyrelu':
            self.activation_ih = leaky_relu
            self.activation_ho = leaky_relu
            self.dactivation_oh = dleaky_relu
            self.dactivation_hi = dleaky_relu
        
        self.use_dropout = False
        self.active_input_percentage = 1.0
        self.active_hidden_percentage = 1.0
            
        self.create_weight_matrices()
        
    def create_weight_matrices(self):
        bias_node = 1 if self.bias else 0

        n = (self.no_of_in_nodes + bias_node) * self.no_of_hidden_nodes
        X = truncated_normal(mean=2, sd=1, low=-0.5, upp=0.5)
        self.wih = X.rvs(n).reshape((self.no_of_hidden_nodes, 
                                                   self.no_of_in_nodes + bias_node))

        n = (self.no_of_hidden_nodes + bias_node) * self.no_of_out_nodes
        X = truncated_normal(mean=2, sd=1, low=-0.5, upp=0.5)
        self.who = X.rvs(n).reshape((self.no_of_out_nodes, 
                                                    (self.no_of_hidden_nodes + bias_node)))

    def dropout_weight_matrices(self,
                                active_input_percentage=1.0,
                                active_hidden_percentage=1.0):
        # keep copies of the full weight matrices and node counts so they can be restored after dropout
        self.wih_orig = self.wih.copy()
        self.no_of_in_nodes_orig = self.no_of_in_nodes
        self.no_of_hidden_nodes_orig = self.no_of_hidden_nodes
        self.who_orig = self.who.copy()
        

        active_input_nodes = int(self.no_of_in_nodes * active_input_percentage)
        active_input_indices = sorted(random.sample(range(0, self.no_of_in_nodes), 
                                      active_input_nodes))
        active_hidden_nodes = int(self.no_of_hidden_nodes * active_hidden_percentage)
        active_hidden_indices = sorted(random.sample(range(0, self.no_of_hidden_nodes), 
                                       active_hidden_nodes))
        
        self.wih = self.wih[:, active_input_indices][active_hidden_indices]       
        self.who = self.who[:, active_hidden_indices]
        
        self.no_of_hidden_nodes = active_hidden_nodes
        self.no_of_in_nodes = active_input_nodes
        return active_input_indices, active_hidden_indices
    
    def weight_matrices_reset(self, 
                              active_input_indices, 
                              active_hidden_indices):
        
        # resets weight matrices back to original
        # self.wih and self.who variables are updated from the active nodes
 
        temp = self.wih_orig.copy()[:,active_input_indices]
        temp[active_hidden_indices] = self.wih
        self.wih_orig[:, active_input_indices] = temp
        self.wih = self.wih_orig.copy()

        self.who_orig[:, active_hidden_indices] = self.who
        self.who = self.who_orig.copy()
        self.no_of_in_nodes = self.no_of_in_nodes_orig
        self.no_of_hidden_nodes = self.no_of_hidden_nodes_orig
        
    
    def train_single(self, input_vector, target_vector):
        # Forward Propagation
        # Input Layer to Hidden Layer
        if self.bias:
            # adding bias node to the end of the input_vector
            input_vector = np.concatenate( (input_vector, [self.bias]) )

        input_vector = np.array(input_vector, ndmin=2).T
        target_vector = np.array(target_vector, ndmin=2).T

        output_vector1 = np.dot(self.wih, input_vector) # linear
        output_vector_hidden = self.activation_ih(output_vector1) # non-linear
        
        # Hidden Layer to Output Layer
        if self.bias:
            output_vector_hidden = np.concatenate( (output_vector_hidden, [[self.bias]]) )
        
        output_vector2 = np.dot(self.who, output_vector_hidden) # linear
        output_vector_network = self.activation_ho(output_vector2) # non-linear
        
        # Backward Propagation
        # Output Layer to Hidden Layer
        output_errors = target_vector - output_vector_network
        
        # if using softmax
        if self.dactivation_oh == dsoftmax:
            ovn = output_vector_network.reshape(output_vector_network.size,)
            dsoftmax_output = dsoftmax(ovn, self.no_of_out_nodes)
            tmp = np.dot(dsoftmax_output, output_errors)
            
        else:
            tmp = output_errors * self.dactivation_oh(output_vector_network) # derivative
        
        if self.use_dropout:
            tmp = tmp / self.active_hidden_percentage # Inverted Dropout
            
        tmp = self.learning_rate  * np.dot(tmp, output_vector_hidden.T)
        self.who += tmp # update hidden-output weights

        # Hidden Layer to Input Layer
        # calculate hidden errors:
        hidden_errors = np.dot(self.who.T, output_errors)
        # update the weights:
        tmp = hidden_errors * self.dactivation_hi(output_vector_hidden) # derivative
        
        if self.use_dropout:
            tmp = tmp / self.active_input_percentage # Inverted Dropout
            
        if self.bias:
            x = np.dot(tmp, input_vector.T)[:-1,:] 
        else:
            x = np.dot(tmp, input_vector.T)
        self.wih += self.learning_rate * x # update input-hidden weights
        
    def train(self, data_array, 
              labels_one_hot_array,
              epochs=1,
              active_input_percentage=1.0,
              active_hidden_percentage=1.0,
              no_of_dropout_tests = 1, 
              use_dropout = False):
        
        self.use_dropout = use_dropout
        self.active_input_percentage = active_input_percentage
        self.active_hidden_percentage = active_hidden_percentage

        partition_length = int(len(data_array) / no_of_dropout_tests)
        
        # if using dropout
        if self.use_dropout:
            for epoch in range(epochs):
                print("epoch: ", epoch)
                for start in range(0, len(data_array), partition_length):
                    active_in_indices, active_hidden_indices = \
                               self.dropout_weight_matrices(active_input_percentage,
                                                            active_hidden_percentage)
                    # clamp the end index so a short final partition cannot run past the data
                    for i in range(start, min(start + partition_length, len(data_array))):
                        self.train_single(data_array[i][active_in_indices], 
                                         labels_one_hot_array[i]) 

                    self.weight_matrices_reset(active_in_indices, active_hidden_indices)
        else:
            for epoch in range(epochs):
                print("epoch: ", epoch)
                for i in range(len(data_array)):
                    self.train_single(data_array[i], labels_one_hot_array[i])

    def confusion_matrix(self, data_array, labels):
        # cm[predicted, actual]: rows are predicted labels, columns are true labels
        cm = np.zeros((self.no_of_out_nodes, self.no_of_out_nodes), int)
        for i in range(len(data_array)):
            res = self.run(data_array[i])
            res_max = res.argmax()
            target = labels[i][0]
            cm[res_max, int(target)] += 1
        return cm
    
    def precision(self, label, confusion_matrix):
        # column sum = number of samples whose true label is `label`
        col = confusion_matrix[:, label]
        return confusion_matrix[label, label] / col.sum()
    
    def recall(self, label, confusion_matrix):
        # row sum = number of samples predicted as `label`
        row = confusion_matrix[label, :]
        return confusion_matrix[label, label] / row.sum()
        
    
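    def run(self, input_vector):
        """ input_vector can be tuple, list or ndarray """
        if self.bias:
            # append the bias node, mirroring train_single
            input_vector = np.concatenate((input_vector, [self.bias]))
        input_vector = np.array(input_vector, ndmin=2).T
        output_vector = np.dot(self.wih, input_vector)
        output_vector = self.activation_ih(output_vector)
        
        if self.bias:
            output_vector = np.concatenate((output_vector, [[self.bias]]))
        output_vector = np.dot(self.who, output_vector)
        output_vector = self.activation_ho(output_vector)
    
        return output_vector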
    
    def evaluate(self, data, labels):
        corrects, wrongs = 0, 0
        for i in range(len(data)):
            res = self.run(data[i])
            res_max = res.argmax()
            if res_max == labels[i]:
                corrects += 1
            else:
                wrongs += 1
        return corrects, wrongs
    
    def mean_squared_error(self, data, labels):
        actual = labels
        predicted = [None] * len(actual)
        
        for i in range(len(data)):
            res = self.run(data[i])
            res_max = res.argmax()
            predicted[i] = res_max
        
        sum_square_error = 0.0
        for i in range(len(actual)):
            sum_square_error += (actual[i] - predicted[i])**2.0
        mean_square_error = 1.0 / len(actual) * sum_square_error
        return mean_square_error
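
The class above applies dropout by removing rows and columns from the weight matrices for each partition and dividing the weight update by the active percentage. For comparison, a common formulation of inverted dropout masks the activations directly and rescales the surviving units by 1 / keep_prob, so that the expected activation is unchanged. The sketch below shows that formulation only; the function `inverted_dropout`, the `keep_prob` name, and the data are illustrative assumptions and not part of the coursework code.

```python
import numpy as np

def inverted_dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale survivors by 1 / keep_prob."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob, mask

rng = np.random.default_rng(0)      # fixed seed, only for reproducibility
hidden = np.ones(100_000)           # made-up activations
dropped, mask = inverted_dropout(hidden, keep_prob=0.7, rng=rng)

print(round(float(mask.mean()), 2))      # fraction of units kept, close to 0.7
print(round(float(dropped.mean()), 2))   # mean stays close to 1.0 because of the rescaling
```

Because the rescaling happens during training, no extra scaling is needed when the network is run on test data.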

### 6. Loading data and Training NN

Load the data from the pickle file, initialise the Neural Network with the desired arguments, and then train it.

Notable arguments:

    no_of_in_nodes - number of input nodes; equals the flattened image size (28 x 28 = 784 pixels)
    no_of_out_nodes - number of output nodes, one per label
    no_of_hidden_nodes - number of nodes in the hidden layer
    learning_rate - step size for the weight updates
    activation_function - the activation function to use;
        one of: Sigmoid 'sigmoid', ReLU 'relu', Leaky ReLU 'leakyrelu', or Softmax 'softmax'

To use a different activation function, initialise a new NeuralNetwork and set the activation_function argument to 'sigmoid' for Sigmoid, 'relu' for ReLU, or 'leakyrelu' for Leaky ReLU.

For Softmax, set the activation_function argument to 'softmax' when initialising the Neural Network; the hidden layer then uses Sigmoid and the output layer uses Softmax.

To use dropout, set the use_dropout argument of train to True and adjust active_input_percentage, active_hidden_percentage, and no_of_dropout_tests accordingly.
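
To see what the dropout arguments mean in practice, the sketch below repeats the arithmetic used by `train` and `dropout_weight_matrices` for the values in the next cell. It assumes the standard Fashion-MNIST training set size of 60000 samples.

```python
n_samples = 60000                  # assumed size of the Fashion-MNIST training set
no_of_in_nodes = 28 * 28           # 784 input pixels
no_of_hidden_nodes = 100
active_input_percentage = 0.7
active_hidden_percentage = 0.7
no_of_dropout_tests = 100

# Samples trained with one random choice of active nodes
partition_length = int(n_samples / no_of_dropout_tests)
# Nodes kept active in each partition
active_input_nodes = int(no_of_in_nodes * active_input_percentage)
active_hidden_nodes = int(no_of_hidden_nodes * active_hidden_percentage)

print(partition_length, active_input_nodes, active_hidden_nodes)   # 600 548 70
```

With these settings, a new random subset of 548 input nodes and 70 hidden nodes is drawn for every 600 training samples.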

In [4]:
# Load data from Pickle file
with open(os.path.join(".","pkl_fashionmnist.pkl"), "br") as fh:
    data = pickle.load(fh)
train_imgs = data[0]
test_imgs = data[1]
train_labels = data[2]
test_labels = data[3]
train_labels_one_hot = data[4]
test_labels_one_hot = data[5]

img_size = 28 # dimensions
num_of_labels = 10 # 0, 1, 2, ... 9
image_pixels = img_size * img_size

Re-run from this point down.

In [5]:
epochs = 10

simple_NN = NeuralNetwork(no_of_in_nodes = image_pixels, 
                    no_of_out_nodes = 10, 
                    no_of_hidden_nodes = 100,
                    learning_rate = 0.15,
                    activation_function = 'sigmoid', 
                    bias=None)
    
simple_NN.train(train_imgs, 
                train_labels_one_hot, 
                active_input_percentage=0.7, 
                active_hidden_percentage=0.7, 
                no_of_dropout_tests = 100, 
                use_dropout = False,
                epochs=epochs)

epoch:  0
epoch:  1
epoch:  2
epoch:  3
epoch:  4
epoch:  5
epoch:  6
epoch:  7
epoch:  8
epoch:  9


In [6]:
corrects, wrongs = simple_NN.evaluate(train_imgs, train_labels)
print("accuracy train: ", corrects / ( corrects + wrongs))
corrects, wrongs = simple_NN.evaluate(test_imgs, test_labels)
print("accuracy test: ", corrects / ( corrects + wrongs))
print()

train_cm = simple_NN.confusion_matrix(train_imgs, train_labels)
print("Training confusion matrix: \n", train_cm)
print()

test_cm = simple_NN.confusion_matrix(test_imgs, test_labels)
print("Testing confusion matrix: \n", test_cm)
print()

print("Training Label Accuracy")
for i in range(10):
    print("digit: ", i, "precision: ", simple_NN.precision(i, train_cm), "recall: ", simple_NN.recall(i, train_cm))

print()
print("Testing Label Accuracy")
for i in range(10):
    print("digit: ", i, "precision: ", simple_NN.precision(i, test_cm), "recall: ", simple_NN.recall(i, test_cm))

print()
train_mse = simple_NN.mean_squared_error(train_imgs, train_labels)
print("Training mean square error: ", train_mse)

test_mse = simple_NN.mean_squared_error(test_imgs, test_labels)
print("Testing mean square error: ", test_mse)

accuracy train:  0.71625
accuracy test:  0.7096

Training confusion matrix: 
 [[4226    4   24   41    4    5  940    0    8    4]
 [  36 5666   19  124  143    0   28    0    5    0]
 [ 465   76 5123  232 2260    3 3935    0  238    2]
 [1177  204  109 5534  395    3  630    0   31    1]
 [  26   46  699   47 3180    2  342    0   21    1]
 [   2    0    1    0    0 5523    1 2551   83  770]
 [  20    0    3    3    2    0   43    0    0    0]
 [   0    0    0    0    0   72    0 3007    7   97]
 [  39    4   21   18   16  103   73   30 5593   45]
 [   9    0    1    1    0  289    8  412   14 5080]]
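
As a consistency check on the output above, the overall accuracy can be recovered from the confusion matrix as the sum of the diagonal divided by the total count. The sketch below copies the training matrix printed above; the variable names are chosen for the example.

```python
import numpy as np

# Training confusion matrix copied from the output above (rows: predicted, columns: actual)
train_cm = np.array([
    [4226,    4,   24,   41,    4,    5,  940,    0,    8,    4],
    [  36, 5666,   19,  124,  143,    0,   28,    0,    5,    0],
    [ 465,   76, 5123,  232, 2260,    3, 3935,    0,  238,    2],
    [1177,  204,  109, 5534,  395,    3,  630,    0,   31,    1],
    [  26,   46,  699,   47, 3180,    2,  342,    0,   21,    1],
    [   2,    0,    1,    0,    0, 5523,    1, 2551,   83,  770],
    [  20,    0,    3,    3,    2,    0,   43,    0,    0,    0],
    [   0,    0,    0,    0,    0,   72,    0, 3007,    7,   97],
    [  39,    4,   21,   18,   16,  103,   73,   30, 5593,   45],
    [   9,    0,    1,    1,    0,  289,    8,  412,   14, 5080],
])

# Correct predictions sit on the diagonal
accuracy = np.trace(train_cm) / train_cm.sum()
print(accuracy)                  # 0.71625

# Each column sums to the number of training samples with that true label
print(train_cm.sum(axis=0))
```

The result matches the reported training accuracy, and every column sums to 6000, which confirms that columns hold the true labels and rows hold the predictions.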

Testing confusion matrix: 
 [[687   1   5  14   0   1 147   0   1   1]
 [  7 947   2  28  22   0   6   0   1   0]
 [ 84  21 855  35 352   1 644   0  46   0]
 [202  25  18 915  60   1 131   0   4   0]
 [  4   6 114   7 561   0  50   0   2   0]
 [  0   0   0   0   0 895   0 436  21 149]
 [  4   0   0   0   0   0   7   0   0   0]
 [  0   0   0   0   0  19   0 482   1  18]
 [  8   0   6   1 