# IN3063 Mathematics and Programming for AI
## Resit Coursework

### 1. Introduction
This program is a simple Neural Network that reports classification accuracy on a training set and a testing set. It implements three activation functions (Sigmoid, ReLU, Leaky ReLU), Softmax, and Inverted Dropout.

This program is written in Python 3 and runs in Jupyter Notebook.

### 2. Instructions

The best way to run this program is to restart the kernel and run every cell in order, as each cell depends on the ones before it.

To re-run the program without restarting the kernel after the initial run, run the cells from Loading Data and Training NN onwards.

### 3. Import Libraries

Necessary libraries that are used for the Neural Network

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import gzip
import shutil
import pandas as pd
import zipfile as zp
import pickle
import os
import random
from scipy.stats import truncnorm
import matplotlib.ticker as plticker

ModuleNotFoundError: No module named 'numpy'
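
The error above means NumPy is not installed in the environment the kernel runs in. As a minimal sketch, the following checks which of the third-party packages imported by the first cell are missing before the rest of the notebook is run. The helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
import importlib.util

# Third-party packages imported by the notebook's first cell
REQUIRED = ["numpy", "matplotlib", "pandas", "scipy"]

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install before running the notebook:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any package reported as missing has to be installed into the same environment that the Jupyter kernel uses.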

### 4. Extracting Data and Pickling

Extract the compressed datasets from the Zip files, then package the data into a pickle (.pkl) file so that later cells can load it quickly.

###### Make sure to run this at least once before running the below cells

In [2]:
# with open('t10k-images-idx3-ubyte', 'rb') as f_in:
#     with gzip.open('t10k-images-idx3-ubyte.gz', 'wb') as f_out:
#         shutil.copyfileobj(f_in, f_out)

num_of_labels = 10

# Assign Zip Files to variable
zf = zp.ZipFile('fashion-mnist_train.zip')
zf1 = zp.ZipFile('fashion-mnist_test.zip')

# Load CSV files from Zip Files
train_data = np.loadtxt(zf.open('fashion-mnist_train.csv'), delimiter=',')
test_data = np.loadtxt(zf1.open('fashion-mnist_test.csv'), delimiter=',')

# Display images from 1 to 10
# for i in range(10):
#     img = train_imgs[i].reshape((28,28))
#     plt.imshow(img, cmap="Greys")
#     plt.show()

# Map image data values into the interval [0.01, 1.0]
fac = 0.99 / 255
add_fac = 0.01
train_imgs = np.asarray(train_data[:, 1:], dtype=float) * fac + add_fac
test_imgs = np.asarray(test_data[:, 1:], dtype=float) * fac + add_fac
train_labels = np.asarray(train_data[:, :1], dtype=float)
test_labels = np.asarray(test_data[:, :1], dtype=float)

lr = np.arange(num_of_labels)
# transform labels into one-hot representation
train_labels_one_hot = (lr == train_labels).astype(float)
test_labels_one_hot = (lr == test_labels).astype(float)
# we don't want zeroes and ones in the labels either:
train_labels_one_hot[train_labels_one_hot == 0] = 0.01
train_labels_one_hot[train_labels_one_hot == 1] = 0.99
test_labels_one_hot[test_labels_one_hot == 0] = 0.01
test_labels_one_hot[test_labels_one_hot == 1] = 0.99

# Create Pickle file from previous data
with open(os.path.join(".","pkl_fashionmnist.pkl"), "bw") as fh:
    data = (train_imgs, 
            test_imgs, 
            train_labels,
            test_labels,
            train_labels_one_hot,
            test_labels_one_hot)
    pickle.dump(data, fh)

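The one-hot step in the cell above relies on NumPy broadcasting: comparing a row of class indices with a column of labels gives one row per sample. A minimal sketch with three made-up labels (the values 3, 0 and 9 are only for illustration):

```python
import numpy as np

num_of_labels = 10
# Column vector of labels, shaped like train_data[:, :1]
labels = np.array([[3.0], [0.0], [9.0]])

lr = np.arange(num_of_labels)            # shape (10,)
one_hot = (lr == labels).astype(float)   # broadcasts to shape (3, 10)

# Soften the targets so they stay strictly between 0 and 1
one_hot[one_hot == 0] = 0.01
one_hot[one_hot == 1] = 0.99

print(one_hot.shape)
print(one_hot.argmax(axis=1))
```

Each row holds 0.99 at the position of its class and 0.01 everywhere else, so `argmax` along axis 1 recovers the original label.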

### 5. Activation Functions, Softmax, and NN Class
#### Task 1, 2, 3, 4

This cell defines the three activation functions (sigmoid, relu, and leaky_relu), the softmax function needed for Task 3, and the NeuralNetwork class, whose activation function is selectable.
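
A derivative function is easy to get wrong, so it can help to compare it against a finite-difference estimate. The sketch below is a standalone check, not part of the original notebook: it redefines vectorised versions of sigmoid and leaky ReLU with the same 0.01 slope, and the helper `numerical_derivative` is a hypothetical name introduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def dleaky_relu(x, slope=0.01):
    return np.where(x > 0, 1.0, slope)

def numerical_derivative(f, x, eps=1e-6):
    """Central finite difference, applied element-wise."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Test points avoid 0, where leaky ReLU has a kink
x = np.array([[-2.0, -0.5], [0.5, 3.0]])

sigmoid_ok = np.allclose(dsigmoid(x), numerical_derivative(sigmoid, x), atol=1e-6)
leaky_ok = np.allclose(dleaky_relu(x), numerical_derivative(leaky_relu, x), atol=1e-6)
print("sigmoid derivative matches:", sigmoid_ok)
print("leaky ReLU derivative matches:", leaky_ok)
```

The same check can be pointed at any activation and derivative pair that accepts a NumPy array.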


In [11]:
# Activation Functions
def relu(x):
    return np.maximum(0.0, x)

def drelu(x):
    # derivative of ReLU: 1 where x > 0, otherwise 0
    return np.where(x > 0, 1.0, 0.0)
            
def softmax(x):
    assert len(x.shape) == 2
    s = np.max(x, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(x - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

def leaky_relu(x):
    _x = x.copy()
    _x[x < 0] = _x[x < 0] * 0.01
    return _x

def dleaky_relu(x):
    # derivative of Leaky ReLU: 1 where x > 0, otherwise the leak slope 0.01
    return np.where(x > 0, 1.0, 0.01)
            
def sigmoid(x):
    # element-wise logistic function; works on scalars and arrays
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    output = 1/(1+np.e ** -x)
    return output * (1 - output)

def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm((low - mean) / sd, 
                     (upp - mean) / sd, 
                     loc=mean, 
                     scale=sd)


class NeuralNetwork:
    def __init__(self, 
                 no_of_in_nodes, 
                 no_of_out_nodes, 
                 no_of_hidden_nodes,
                 activation_function,
                 learning_rate,
                 bias=None,
                 active_input_percentage=0.70,
                 active_hidden_percentage=0.70
                ):  
        self.no_of_in_nodes = no_of_in_nodes
        self.no_of_out_nodes = no_of_out_nodes       
        self.no_of_hidden_nodes = no_of_hidden_nodes          
        self.learning_rate = learning_rate 
        self.bias = bias
        self.active_input_percentage = active_input_percentage
        self.active_hidden_percentage = active_hidden_percentage
        
        if activation_function == 'sigmoid':
            self.activation = sigmoid
            self.dactivation = dsigmoid
            
        if activation_function == 'softmax':
            self.activation = sigmoid
            self.dactivation = softmax
        
        if activation_function == 'relu':
            self.activation = relu
            self.dactivation = drelu
            
        if activation_function == 'leakyrelu':
            self.activation = leaky_relu
            self.dactivation = dleaky_relu
            
        self.create_weight_matrices()
        
        
    def create_weight_matrices(self):
#         """ A method to initialize the weight matrices of the neural network"""
#         rad = 1 / np.sqrt(self.no_of_in_nodes)
#         X = truncated_normal(mean=0, 
#                              sd=1, 
#                              low=-rad, 
#                              upp=rad)
#         self.wih = X.rvs((self.no_of_hidden_nodes, 
#                                        self.no_of_in_nodes))
#         rad = 1 / np.sqrt(self.no_of_hidden_nodes)
#         X = truncated_normal(mean=0, 
#                              sd=1, 
#                              low=-rad, 
#                              upp=rad)
#         self.who = X.rvs((self.no_of_out_nodes, 
#                                         self.no_of_hidden_nodes))

        bias_node = 1 if self.bias else 0
        n = (self.no_of_in_nodes + bias_node) * self.no_of_hidden_nodes
        X = truncated_normal(mean=2, sd=1, low=-0.5, upp=0.5)
        self.wih = X.rvs(n).reshape((self.no_of_hidden_nodes, 
                                                   self.no_of_in_nodes + bias_node))
        n = (self.no_of_hidden_nodes + bias_node) * self.no_of_out_nodes
        X = truncated_normal(mean=2, sd=1, low=-0.5, upp=0.5)
        self.who = X.rvs(n).reshape((self.no_of_out_nodes, 
                                                    (self.no_of_hidden_nodes + bias_node)))
        
    def dropout_weight_matrices(self):
        # keep copies of the full weight matrices and layer sizes so they can be restored after dropout
        self.wih_orig = self.wih.copy()
        self.no_of_in_nodes_orig = self.no_of_in_nodes
        self.no_of_hidden_nodes_orig = self.no_of_hidden_nodes
        self.who_orig = self.who.copy()
        
        active_input_nodes = int(self.no_of_in_nodes * self.active_input_percentage)
        active_input_indices = sorted(random.sample(range(0, self.no_of_in_nodes), 
                                      active_input_nodes))
        active_hidden_nodes = int(self.no_of_hidden_nodes * self.active_hidden_percentage)
        active_hidden_indices = sorted(random.sample(range(0, self.no_of_hidden_nodes), 
                                       active_hidden_nodes))
        
        self.wih = self.wih[:, active_input_indices][active_hidden_indices]       
        self.who = self.who[:, active_hidden_indices]
        
        self.no_of_hidden_nodes = active_hidden_nodes
        self.no_of_in_nodes = active_input_nodes
        return active_input_indices, active_hidden_indices
    
    def weight_matrices_reset(self, 
                              active_input_indices, 
                              active_hidden_indices):
        
        """
        self.wih and self.who contain the newly adapted values from the active nodes.
        We have to reconstruct the original weight matrices by assigning the new values 
        from the active nodes
        """
 
        temp = self.wih_orig.copy()[:,active_input_indices]
        temp[active_hidden_indices] = self.wih
        self.wih_orig[:, active_input_indices] = temp
        self.wih = self.wih_orig.copy()
        self.who_orig[:, active_hidden_indices] = self.who
        self.who = self.who_orig.copy()
        self.no_of_in_nodes = self.no_of_in_nodes_orig
        self.no_of_hidden_nodes = self.no_of_hidden_nodes_orig
        
    
    def train_single(self, input_vector, target_vector):
        """ 
        input_vector and target_vector can be tuple, list or ndarray
        """
        # Forward Pass: input -> hidden
        if self.bias:
            # adding bias node to the end of the input_vector
            input_vector = np.concatenate( (input_vector, [self.bias]) )
        input_vector = np.array(input_vector, ndmin=2).T
        target_vector = np.array(target_vector, ndmin=2).T
        output_vector1 = np.dot(self.wih, input_vector)
        output_vector_hidden = self.activation(output_vector1)
        # slope of the hidden activation; the 'softmax' option stores softmax in
        # self.dactivation and keeps a sigmoid hidden layer
        use_softmax = self.dactivation is softmax
        hidden_slope = dsigmoid(output_vector1) if use_softmax else self.dactivation(output_vector1)
        
        if self.bias:
            output_vector_hidden = np.concatenate( (output_vector_hidden, [[self.bias]]) )
            # the bias node has no incoming weights, so its slope is 0
            hidden_slope = np.concatenate( (hidden_slope, [[0.0]]) )
        
        # Forward Pass: hidden -> output (softmax works on rows, hence the transposes)
        output_vector2 = np.dot(self.who, output_vector_hidden)
        if use_softmax:
            output_vector_network = softmax(output_vector2.T).T
        else:
            output_vector_network = sigmoid(output_vector2)
        
        # Backward Pass
        output_errors = target_vector - output_vector_network
        # update the weights:
        # inverted dropout
        tmp = output_errors * output_vector_network * (1.0 - output_vector_network) / self.active_input_percentage  
        tmp = self.learning_rate  * np.dot(tmp, output_vector_hidden.T)
        self.who += tmp
        # calculate hidden errors:
        hidden_errors = np.dot(self.who.T, output_errors)
        # update the weights:
        tmp = hidden_errors * hidden_slope / self.active_hidden_percentage
        if self.bias:
            x = np.dot(tmp, input_vector.T)[:-1,:] 
        else:
            x = np.dot(tmp, input_vector.T)
        self.wih += self.learning_rate * x
        
    def train(self, data_array, 
              labels_one_hot_array,
              epochs=1,
              active_input_percentage=0.70,
              active_hidden_percentage=0.70,
              no_of_dropout_tests = 10):
        # use the dropout percentages passed to this call
        self.active_input_percentage = active_input_percentage
        self.active_hidden_percentage = active_hidden_percentage
        partition_length = int(len(data_array) / no_of_dropout_tests)
        
        for epoch in range(epochs):
            print("epoch: ", epoch)
            for start in range(0, len(data_array), partition_length):
                active_in_indices, active_hidden_indices = \
                           self.dropout_weight_matrices()
                # the last partition may be shorter than partition_length
                stop = min(start + partition_length, len(data_array))
                for i in range(start, stop):
                    self.train_single(data_array[i][active_in_indices], 
                                     labels_one_hot_array[i]) 
                    
                self.weight_matrices_reset(active_in_indices, active_hidden_indices)        
            
    def confusion_matrix(self, data_array, labels):
        cm = {}
        for i in range(len(data_array)):
            res = self.run(data_array[i])
            res_max = res.argmax()
            target = labels[i][0]
            if (target, res_max) in cm:
                cm[(target, res_max)] += 1
            else:
                cm[(target, res_max)] = 1
        return cm
        
    
    def run(self, input_vector):
        # input_vector can be tuple, list or ndarray
        
        # Forward Pass: input -> hidden
        if self.bias:
            # adding bias node to the end of the input_vector
            input_vector = np.concatenate( (input_vector, [self.bias]) )
        input_vector = np.array(input_vector, ndmin=2).T
        #wih = self.wih * self.active_input_percentage
        
        output_vector = np.dot(self.wih, input_vector)
        output_vector = self.activation(output_vector)
        
        # Forward Pass: hidden -> output
        if self.bias:
            output_vector = np.concatenate( (output_vector, [[self.bias]]) )
        #who = self.who * self.active_hidden_percentage    
        
        output_vector = np.dot(self.who, output_vector)
        # the 'softmax' option stores softmax in self.dactivation; every other
        # option uses a sigmoid output layer
        if self.dactivation is softmax:
            # softmax works on rows, so transpose the column vector first
            output_vector = softmax(output_vector.T).T
        else:
            output_vector = sigmoid(output_vector)
        
        return output_vector
    
    def evaluate(self, data, labels):
        corrects, wrongs = 0, 0
        for i in range(len(data)):
            res = self.run(data[i])
            res_max = res.argmax()
            if res_max == labels[i]:
                corrects += 1
            else:
                wrongs += 1
        return corrects, wrongs
    
    def plot_graph(self):
        # Expects self.data_plots_epoch, self.data_plots_train, self.data_plots_test
        # and self.ds_name to be set by the caller before plotting
        fig = plt.figure()
        ax = fig.add_axes([0, 0, 1, 1])
        ax.use_sticky_edges = False
        ax.scatter(self.data_plots_epoch, self.data_plots_train, color = 'red', label = 'train')
        ax.scatter(self.data_plots_epoch, self.data_plots_test, color = 'blue', label = 'test')
        ax.set_xlabel('Epochs')
        ax.set_ylabel('Accuracy')

        loc = plticker.MultipleLocator(base=1.0)
        ax.xaxis.set_major_locator(loc)
        loc = plticker.MultipleLocator(base=0.1)
        ax.yaxis.set_major_locator(loc)

        ax.set_ylim(0,1)
        for i,j in zip(self.data_plots_epoch, self.data_plots_train):
            ax.annotate(str(j),xy=(i + 0.01,j + 0.04))

        for i,j in zip(self.data_plots_epoch, self.data_plots_test):
            ax.annotate(str(j),xy=(i + 0.01,j + 0.04))

        # pick the title from the hidden-layer activation chosen in __init__
        title_act = ''

        if self.activation is sigmoid:
            title_act = 'Sigmoid'

        if self.activation is relu:
            title_act = 'ReLU'

        if self.activation is leaky_relu:
            title_act = 'Leaky ReLU'

        ax.set_title(self.ds_name + ' Accuracy using ' + title_act)
        
        plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
       
        plt.show()

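The class passes activations around as column vectors, while `softmax` as defined above normalises along axis 1, that is, across each row. The sketch below repeats that definition in compact form and shows what this means for a column vector, and how transposing before and after the call normalises across the scores instead. The three scores are made-up values.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax for a 2-D array, shifted by the row maximum for stability."""
    assert len(x.shape) == 2
    s = np.max(x, axis=1)[:, np.newaxis]
    e_x = np.exp(x - s)
    return e_x / np.sum(e_x, axis=1)[:, np.newaxis]

z = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)

as_rows = softmax(z)          # every row holds a single value, so each entry becomes 1.0
as_column = softmax(z.T).T    # normalise across the three scores, keep shape (3, 1)

print(as_rows.ravel())
print(as_column.ravel(), as_column.sum())
```

Only the transposed call yields a probability distribution over the classes.
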

### 6. Loading data and Training NN

Load the data from the pickle file, initialise the Neural Network with the desired arguments, and then train it.

Notable arguments:

    no_of_in_nodes - number of input nodes; equals the number of pixels per image (28x28 = 784)
    no_of_out_nodes - desired amount of output nodes for the data
    no_of_hidden_nodes - desired number of hidden nodes to process the data
    learning_rate - the desired learning rate
    activation_function - the desired activation functions; 
        includes: Sigmoid 'sigmoid', ReLU 'relu', and Leaky ReLU 'leakyrelu'

To use a different activation function, initialise a new NeuralNetwork and set activation_function to 'sigmoid' for Sigmoid, 'relu' for ReLU, or 'leakyrelu' for Leaky ReLU.

For Softmax, set activation_function to 'softmax' when initialising the Neural Network.

To use dropout, set active_input_percentage, active_hidden_percentage, and no_of_dropout_tests accordingly when calling train.
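
As a rough illustration of what these three arguments control, the sketch below mirrors the index sampling and slicing done in `dropout_weight_matrices`, using zero-filled stand-in weight matrices. The layer sizes match the cell below; the training-set size of 60000 is an assumption taken from the notebook output.

```python
import random
import numpy as np

no_of_in_nodes, no_of_hidden_nodes, no_of_out_nodes = 784, 100, 10
active_input_percentage = 0.70
active_hidden_percentage = 0.70
no_of_dropout_tests = 10
training_samples = 60000   # assumed size of the training set

# Number of nodes that stay active in each layer
active_input_nodes = int(no_of_in_nodes * active_input_percentage)
active_hidden_nodes = int(no_of_hidden_nodes * active_hidden_percentage)

# Randomly choose which nodes stay active for one partition of the data
active_input_indices = sorted(random.sample(range(no_of_in_nodes), active_input_nodes))
active_hidden_indices = sorted(random.sample(range(no_of_hidden_nodes), active_hidden_nodes))

# Stand-in weight matrices with the same shapes as wih and who
wih = np.zeros((no_of_hidden_nodes, no_of_in_nodes))
who = np.zeros((no_of_out_nodes, no_of_hidden_nodes))

# Keep only the rows and columns that belong to active nodes
wih_active = wih[:, active_input_indices][active_hidden_indices]
who_active = who[:, active_hidden_indices]

# A new set of active nodes is drawn once per partition
partition_length = int(training_samples / no_of_dropout_tests)

print(wih_active.shape, who_active.shape, partition_length)
```

A larger no_of_dropout_tests means the set of active nodes is redrawn more often, with fewer samples trained on each reduced network.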

In [12]:
# Load data from Pickle file
with open(os.path.join(".","pkl_fashionmnist.pkl"), "br") as fh:
    data = pickle.load(fh)
train_imgs = data[0]
test_imgs = data[1]
train_labels = data[2]
test_labels = data[3]
train_labels_one_hot = data[4]
test_labels_one_hot = data[5]

img_size = 28 # dimensions
num_of_labels = 10 # 0, 1, 2, ... 9
image_pixels = img_size * img_size

epochs = 10

simple_network = NeuralNetwork(no_of_in_nodes = image_pixels, 
                               no_of_out_nodes = 10, 
                               no_of_hidden_nodes = 100,
                               activation_function = 'sigmoid',
                               learning_rate = 0.1)
    
 
simple_network.train(train_imgs, 
                     train_labels_one_hot, 
                     active_input_percentage=1,
                     active_hidden_percentage=1,
                     no_of_dropout_tests = 100,
                     epochs=epochs)

corrects, wrongs = simple_network.evaluate(train_imgs, train_labels)
print("accuracy train: ", corrects / ( corrects + wrongs))
corrects, wrongs = simple_network.evaluate(test_imgs, test_labels)
print("accuracy test: ", corrects / ( corrects + wrongs))

epoch:  0
60000
60000
0
1
2
3
4
5
6


