In [31]:
import sys
import numpy as np
import matplotlib.pyplot as plt
from nnfs.datasets import spiral_data

## Dataset Creation

In [32]:
#Create dataset with 100 data points for 3 classes of spirals
#Results in 300 total data points for classification
X, y = spiral_data(100, 3)

## Hidden Layers

In [33]:
#Creating hidden layers
class Layer_Dense:
    #Initialize using the number of inputs and neurons
    def __init__(self, n_inputs, n_neurons):
        #Create an (n_inputs x n_neurons) array of weights drawn from the standard normal distribution,
        #scaled by a tenth
        self.w = 0.1 * np.random.randn(n_inputs, n_neurons)
        #Create a row vector based on the number of neurons
        self.bias = np.zeros([1, n_neurons])
    #Method to compute the output, takes in an input matrix
    def forward(self, inputs):
        self.output = inputs@self.w + self.bias
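
As a minimal sketch of the shapes involved in this forward pass (the batch size of 4, the seed, and the variable names below are assumptions made for illustration, not part of the notebook), a `(n_samples, n_inputs)` input matrix times a `(n_inputs, n_neurons)` weight matrix gives one row of neuron outputs per sample, and the `(1, n_neurons)` bias row is broadcast across all samples:

```python
import numpy as np

# Illustrative values only: 4 samples, 2 inputs, 3 neurons
rng = np.random.default_rng(0)
inputs = rng.standard_normal((4, 2))
w = 0.1 * rng.standard_normal((2, 3))
bias = np.zeros((1, 3))

# (4, 2) @ (2, 3) -> (4, 3); the (1, 3) bias is broadcast over the 4 rows
output = inputs @ w + bias
print(output.shape)  # (4, 3)
```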

## Activation function

In [34]:
#Implementation of ReLU function        
class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
        
class Activation_Softmax:
    def forward(self, inputs):
        #Subtract the row maximum before exponentiating to avoid overflow
        exp_val = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probability = exp_val/np.sum(exp_val, axis=1, keepdims=True)
        
        self.output = probability

The Rectified Linear Unit (ReLU) activation function is a piecewise function defined as follows:

\begin{equation*}
    f(x) = \begin{cases}
        x, & \text{if } x > 0 \\
        0, & \text{if } x \leq 0
    \end{cases}
\end{equation*}

This is important because it is a fast computation that lets a layer of several neurons fit a non-linear signal. This is achieved by combining two behaviors: negative inputs are clipped to zero, and positive inputs are mapped linearly (passed through unchanged).
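
The clipping behavior can be checked on a handful of values. This is a small sketch; the array `z` holds hypothetical pre-activation values chosen only for illustration:

```python
import numpy as np

# Hypothetical pre-activation values chosen for illustration
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Negative entries are clipped to zero, positive entries pass through unchanged
relu_out = np.maximum(0, z)
print(relu_out.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```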

The second activation function, Softmax, normalizes the outputs of a layer. It can be written as follows:

\begin{equation*}
    S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^{L}e^{z_{i,l}}}
\end{equation*}

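Each entry is the exponential of one element of the matrix $z$ divided by the sum of the exponentials of all $L$ elements in the same row $i$ of $z$. The code subtracts the row maximum before exponentiating; this leaves the quotient unchanged but keeps the exponentials from overflowing.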
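
To illustrate why the row maximum is subtracted in `Activation_Softmax`, here is a minimal sketch. The standalone `softmax` helper and the score values are assumptions introduced for this example. Two rows that differ only by a constant offset should yield the same probabilities, even when the raw values are too large to exponentiate directly:

```python
import numpy as np

def softmax(z):
    # Subtract each row's maximum before exponentiating to avoid overflow
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_val = np.exp(shifted)
    return exp_val / np.sum(exp_val, axis=1, keepdims=True)

# Hypothetical scores; exponentiating the second row directly would overflow
z = np.array([[1.0, 2.0, 3.0],
              [1001.0, 1002.0, 1003.0]])
probs = softmax(z)

# The rows differ only by a constant offset, so their probabilities match
print(np.allclose(probs[0], probs[1]))  # True
```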

## 1st Hidden Layer

In [35]:
#Create the first hidden layer with 2 input and 3 output features
layer1 = Layer_Dense(2,3)
activation1 = Activation_ReLU()
layer1.forward(X)
activation1.forward(layer1.output)

This creates 3 neurons that each take 2 inputs. The layer is forward propagated with inputs $x$, weights $w$, and bias $b$ for each neuron in the model. The output $o_1$ can be written as follows:

\begin{equation*}
    o_1  =  x^{T} w + b
\end{equation*}

The activation of each neuron is then obtained by applying the ReLU activation function, described above, to $o_1$.
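
A worked example with small numbers may make the two steps concrete. The sample `x`, weights `w`, and bias `b` below are hypothetical values picked so the arithmetic can be checked by hand; they are not the randomly initialized values used by `layer1`:

```python
import numpy as np

# Hypothetical single sample with 2 features, stored as a (1, 2) row
x = np.array([[1.0, 2.0]])
# Hypothetical (2, 3) weight matrix and (1, 3) bias row
w = np.array([[0.5, -1.0, 0.0],
              [0.25, 0.0, 1.0]])
b = np.array([[0.0, 0.5, -1.0]])

# Linear step: [1*0.5 + 2*0.25, 1*(-1) + 2*0 + 0.5, 1*0 + 2*1 - 1]
o1 = x @ w + b
print(o1.tolist())  # [[1.0, -0.5, 1.0]]

# ReLU step: the negative middle entry is clipped to zero
a1 = np.maximum(0, o1)
print(a1.tolist())  # [[1.0, 0.0, 1.0]]
```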

## 2nd Hidden Layer

In [36]:
#Create second hidden layer with 3 input and 3 output features
layer2 = Layer_Dense(3,3)
activation2 = Activation_Softmax()
layer2.forward(activation1.output)
activation2.forward(layer2.output)

This second hidden layer is created with 3 neurons, each taking three inputs, and produces three outputs. Its input is the output of the first hidden layer, $o_1$, after the ReLU activation has been applied. The forward propagation with weights $w$, bias $b$, and output $o_2$ can be written as follows:

\begin{equation*}
    o_2 = o_1^T w + b
\end{equation*}

This is the result of our last layer. Unlike the previous layer, it uses the softmax activation function, so each activated output $S_{i,j}$ (the softmax of $o_2$) lies between zero and one, $0 \leq S_{i,j} \leq 1$, and the outputs for each sample sum to one, $\sum_{j=1}^{n} S_{i,j} = 1$.
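
These two properties can be verified numerically, and they are what allows each row to be read as class probabilities. The sketch below uses hypothetical layer outputs (`scores`) rather than the notebook's `layer2.output`, and taking the `argmax` of each row as the predicted class is one common convention, shown here as an assumption:

```python
import numpy as np

# Hypothetical last-layer outputs for 2 samples and 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [-1.0, 0.0, 3.0]])

# Softmax over each row, with the row maximum subtracted for stability
exp_val = np.exp(scores - np.max(scores, axis=1, keepdims=True))
probs = exp_val / np.sum(exp_val, axis=1, keepdims=True)

# Every entry is in [0, 1] and every row sums to one
in_range = bool(np.all((probs >= 0) & (probs <= 1)))
row_sums = probs.sum(axis=1)

# Assumed convention: the predicted class is the most probable one
predictions = np.argmax(probs, axis=1)
print(predictions.tolist())  # [0, 2]
```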

This is important in the last layer because it lets each output be read as the probability of the corresponding class. This is beneficial compared to the ReLU activation function because it does not lose data. The ReLU function can be detrimental in this stage for a classification model because it clips the negati

In [37]:
print(activation2.output.shape)

(300, 3)
