Success criteria:
· Must have distinct classes for fully connected and activation layers
· Must be trainable to solve simple problems, e.g. XOR gates
· Must have the capability to show users % error each training epoch

Analysis:
The system will use forward and backward propagation to adjust the weights and biases of the network. These functions will be implemented with the NumPy library, which provides the necessary operations (dot product and matrix transposition). The network will be limited in scope because it contains only fully connected (FC) and activation layers. Lacking convolutional layers, it is unsuitable for image processing, as image data would have to be heavily altered before the network could process it.

Design:
As above, the network will need FC and activation layers, which requires implementing the forward and backward propagation algorithms. The code will be primarily object-oriented so that multiple layers can be managed easily when constructing a network to solve basic problems. To make the code easier to maintain, the classes and helper functions will be split across separate files.

The code:

In [None]:
class Layer:
    def __init__(self):
        self.inp = None
        self.output = None

    # computes the output given input X
    def forward_prop(self, input_data):
        pass

    # computes dE/dX given dE/dY 
    def backward_prop(self, output_error, learningr):
        pass

First is the basic Layer class that all other layers inherit from. It has empty input/output variables and forward/backward propagation methods, which are overridden by each type of layer.

In [None]:
from layer import Layer

# inherit from basic class Layer
class ActivationLayer(Layer):
    def __init__(self, activation, activation_prime):
        super().__init__()  # initialise inp/output to None
        self.activation = activation
        self.activation_prime = activation_prime

    # passes input through activation function
    def forward_prop(self, input_data):
        self.inp = input_data
        self.output = self.activation(self.inp)
        return self.output

    # Returns dE/dX for a known dE/dY.
    def backward_prop(self, output_error, learningr):
        return self.activation_prime(self.inp) * output_error

The second class is the activation layer, which inherits from the basic Layer class. It is initialised with an activation function and its derivative. This code uses tanh as the activation function, but any other sigmoid (S-shaped) function could be used instead. Backward propagation is simple here, as dE/dX = dE/dY * f'(X), where * is the element-wise product.
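The `housekeeping` module that supplies `tanh` and `tanh_prime` to the test network is not shown in this write-up. A minimal sketch of what it might contain, assuming it simply wraps NumPy's `np.tanh` and uses the identity d/dx tanh(x) = 1 - tanh(x)^2:

```python
import numpy as np

# Assumed contents of housekeeping.py (not shown in the original write-up).
def tanh(x):
    # element-wise hyperbolic tangent
    return np.tanh(x)

def tanh_prime(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

x = np.array([[0.0, 1.0]])
print(tanh(x))
print(tanh_prime(x))
```

Both functions work element-wise, so they can be passed directly to ActivationLayer for inputs of any shape.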

In [None]:
from layer import Layer
import numpy as np

# inherit from base class Layer
class FCLayer(Layer):
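    def __init__(self, input_size, output_size):
        super().__init__()  # initialise inp/output to None
        # weights and bias start as uniform random values in [-0.5, 0.5)
        self.weights = np.random.rand(input_size, output_size) - 0.5
        self.bias = np.random.rand(1, output_size) - 0.5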

    def forward_prop(self, input_data):
        self.inp = input_data
        self.output = np.dot(self.inp, self.weights) + self.bias
        return self.output

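    # computes dE/dW and dE/dB for a given dE/dY, updates the parameters and returns dE/dX
    def backward_prop(self, output_error, learningr):
        input_error = np.dot(output_error, self.weights.T)    # dE/dX
        weights_error = np.dot(self.inp.T, output_error)      # dE/dW

        # update parameters (dE/dB equals output_error for a single sample)
        self.weights -= learningr * weights_error
        self.bias -= learningr * output_error
        return input_error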

The third class defines the fully connected layer and its propagation methods. The forward pass computes output = input · weights + bias. In the backward pass, the input error is the output error dotted with the transposed weight matrix, and the weight error is the transposed input dotted with the output error. The weights and bias are also updated in the backward propagation method.
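The gradient formulas used by the fully connected layer can be sanity-checked numerically. The sketch below is not part of the original project: it assumes a single sample, a mean squared error loss, and arbitrary layer sizes, then compares the analytic weight error with a central-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 2))            # one sample with 2 features (assumed sizes)
w = rng.random((2, 3)) - 0.5      # weights, same initialisation range as FCLayer
b = rng.random((1, 3)) - 0.5      # bias
y_true = rng.random((1, 3))       # arbitrary target for the check

def loss(weights):
    # forward pass followed by mean squared error
    y = np.dot(x, weights) + b
    return np.mean((y_true - y) ** 2)

# analytic gradients, using the same formulas as the backward pass
y = np.dot(x, w) + b
output_error = 2 * (y - y_true) / y_true.size     # dE/dY
weights_error = np.dot(x.T, output_error)         # dE/dW
input_error = np.dot(output_error, w.T)           # dE/dX

# numerical dE/dW by central differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        step = np.zeros_like(w)
        step[i, j] = eps
        numeric[i, j] = (loss(w + step) - loss(w - step)) / (2 * eps)

print(np.allclose(weights_error, numeric, atol=1e-6))
```

The shapes also confirm the formulas: the weight error has the same shape as the weights, and the input error has the same shape as the input.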

In [None]:
import numpy as np

# loss function and its derivative
def mse(y_true, y_pred):
    return np.mean(np.power(y_true-y_pred, 2))

def mse_prime(y_true, y_pred):
    return 2*(y_pred-y_true)/y_true.size

These functions calculate the error using the mean squared error method, though another loss function could be substituted in the network class.
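As an illustration of substituting a different loss, the sketch below defines mean absolute error with the same (y_true, y_pred) signature. The names `mae` and `mae_prime` are hypothetical and are not part of the original project.

```python
import numpy as np

# Hypothetical alternative loss: mean absolute error
def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mae_prime(y_true, y_pred):
    # sign of the residual, scaled by the number of elements
    # (np.sign returns 0 where the prediction is exact)
    return np.sign(y_pred - y_true) / y_true.size

y_true = np.array([[1.0]])
y_pred = np.array([[0.25]])
print(mae(y_true, y_pred))        # 0.75
print(mae_prime(y_true, y_pred))  # [[-1.]]
```

Because the signatures match those of mse and mse_prime, the pair could be passed to the network in their place, for example net.use(mae, mae_prime).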

In [None]:
class Network:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_prime = None

    # add layer 
    def add(self, layer):
        self.layers.append(layer)

    # loss select
    def use(self, loss, loss_prime):
        self.loss = loss
        self.loss_prime = loss_prime

    # predict output for given input
    def predict(self, input_data):
        
        samples = len(input_data)  # renamed from "set", which shadows a Python built-in
        result = []

        # run the network over every sample
        for i in range(samples):
            # forward propagation
            output = input_data[i]
            for eachlayer in self.layers:
                output = eachlayer.forward_prop(output)
            result.append(output)

        return result

    # train 
    def fit(self, x_train, y_train, epochs, learningr):
        
        samples = len(x_train)

        # training 
        for i in range(epochs):
            err = 0
            for j in range(samples):
                # forward propagation
                output = x_train[j]
                for layer in self.layers:
                    output = layer.forward_prop(output)

                # loss 
                err += self.loss(y_train[j], output)

                # backward propagation
                error = self.loss_prime(y_train[j], output)
                for layer in (self.layers)[::-1]:
                    error = layer.backward_prop(error, learningr)

            # calculate average error 
            err /= samples
            print('epoch %d/%d   error=%f' % (i+1, epochs, err))

The Network class builds networks from the previous classes. The number of epochs and the learning rate are passed to the fit method, where the network iterates over the layers, adjusting weights and biases through each layer's methods. It prints the epoch number and the average error to the terminal.

Testing:
During testing the following dataset (the XOR truth table) will be used:
Inputs:           (0,0), (0,1), (1,0), (1,1)
Expected outputs:   0  ,   1  ,   1  ,   0
This is because the operation is simple, allowing the logic of the code to be tested for errors quickly.
 

TEST NETWORK:

In [None]:
import numpy as np

from network import Network
from fc_layer import FCLayer
from activation_layer import ActivationLayer
from housekeeping import tanh, tanh_prime
from losses import mse, mse_prime

# training data
x_train = np.array([[[0,0]], [[0,1]], [[1,0]], [[1,1]]])
y_train = np.array([[[0]], [[1]], [[1]], [[0]]])

# network
net = Network()
net.add(FCLayer(2, 3))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh, tanh_prime))

# train
net.use(mse, mse_prime)
net.fit(x_train, y_train, epochs=1000, learningr=0.1)

# test
out = net.predict(x_train)
print(out)

Initial code outputs:
    [array([[0.52692813]]), array([[0.52822712]]), array([[0.5136926]]), array([[0.51501981]])]
This indicated an error somewhere in the code, as all four outputs were roughly 0.5 (and so would all round to 1) regardless of the input. The error was traced to the coding of the loss function.
This was resolved by changing an operation from ** to np.power().

Results after bug fix:
    [array([[0.0007457]]), array([[0.97984435]]), array([[0.97532576]]), array([[-0.0014717]])]
This is extremely close to the expected output of the XOR gate (0, 1, 1, 0).
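To express these results as a percentage error, as the success criteria ask for, the outputs can be rounded to 0 or 1 and compared with the targets. This is only a sketch: the 0.5 threshold and the variable names are assumptions, and the values are copied from the results above.

```python
import numpy as np

# outputs copied from the results after the bug fix
out = [np.array([[0.0007457]]), np.array([[0.97984435]]),
       np.array([[0.97532576]]), np.array([[-0.0014717]])]
expected = np.array([0, 1, 1, 0])

# flatten to a 1-D array and apply an assumed 0.5 threshold to get 0/1 predictions
predicted = (np.array(out).reshape(-1) > 0.5).astype(int)

# percentage of samples where the prediction differs from the target
percent_error = 100.0 * np.mean(predicted != expected)

print(predicted)       # [0 1 1 0]
print(percent_error)   # 0.0
```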

Conclusion:
The results show that the code is suitable for training on simple problems in a short time. However, as noted in the analysis, it is unsuitable for tasks such as image processing because only simple layers have been implemented. It is also unsuitable for more complex problems: training has no concurrency, so it would take too long and could not fully utilise any available processor or GPU.
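One possible step towards faster training, not implemented in this project, is to pass all samples through a layer in a single matrix product instead of one at a time. The sketch below assumes one FC layer followed by tanh, with arbitrary random weights, and shows that the batched forward pass gives the same outputs as the per-sample loop.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((2, 3)) - 0.5   # assumed 2-input, 3-output layer
bias = rng.random((1, 3)) - 0.5

x_batch = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# per-sample forward pass, as in the fit and predict loops
looped = np.vstack([
    np.tanh(np.dot(sample.reshape(1, 2), weights) + bias)
    for sample in x_batch
])

# batched forward pass: all four samples in one matrix product
batched = np.tanh(np.dot(x_batch, weights) + bias)

print(np.allclose(looped, batched))
```

A batched backward pass would need one further change: the bias gradient would have to be summed over the sample axis before the update, because output_error would then have one row per sample.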