# Dev notebook

I use this notebook to develop my implementation of a simple and minimal neural network framework.

Inspiration for this network is drawn from the book [Neural Networks from Scratch (NNFS)](https://nnfs.io/) and [PyTorch's implementation](https://pytorch.org/).

Notes:
- The work on experimenting with backpropagation has been moved to another notebook to keep this one minimal.
- I have chosen a naming convention similar to PyTorch's (why reinvent the wheel?). This has the benefit that, when we compare implementations, the architecture of the networks is the same.

In [1]:
# Import for dev and testing
import nnfs 
import numpy as np

import matplotlib.pyplot as plt

from abc import ABC, abstractmethod
from nnfs.datasets import spiral_data

nnfs.init()

## Defining Base Module

This module contains the base attributes and functions needed by every class in the framework.

In [2]:
class Module(ABC):
    """Base class for all classes in the framework, ensuring shared attributes and common method names."""

    def __init__(self) -> None:
        # Attributes to hold input and outputs
        self.input = None
        self.output = None
    
    @abstractmethod
    def forward(self):
        pass

    @abstractmethod
    def backward(self):
        pass

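## Defining layers

In this section we define and test the layers.

Notes:
- Since we intend to use ReLU as one of our activation functions, we will use the He weight initialization method as described in https://arxiv.org/abs/1502.01852. For testing, the layer below currently uses a simpler scaled-normal initialization, with the He version left commented out.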
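
The He scheme mentioned in the note can be sketched as follows. This is a minimal illustration, assuming the normal-distribution variant with standard deviation `sqrt(2 / in_features)`; the helper name `he_normal` and the use of `np.random.default_rng` are assumptions made here, not part of the framework.

```python
import numpy as np

def he_normal(in_features, out_features, rng=None):
    """Draw a weight matrix from N(0, 2 / in_features) (He-normal initialization)."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / in_features)
    return rng.normal(0.0, std, size=(in_features, out_features))

rng = np.random.default_rng(0)
w = he_normal(256, 128, rng)

print(w.shape)
# The empirical standard deviation should be close to the target value
print(w.std(), np.sqrt(2.0 / 256))
```

The scaling by the number of inputs is what distinguishes this from the fixed `0.01` factor used for testing.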

### Linear Layer

In [3]:
class LinearLayer(Module):
    """Linear transformation layer of the type o = iW + b,

    where i is the incoming batch of vectors, W is the layer's weight matrix, b is the bias vector and o is the
    dot product of i and W plus the bias.

    Args:
        in_features (int): the size of the input features
        out_features (int): the size of the output features

    Attributes:
        weights (np_array) numpy array of in_features x out_features
        bias    (np_array) numpy array of 1 x out_features
        input   (np_array) numpy array of latest batch of inputs
        output  (np_array) numpy array of latest batch of outputs
        d_w     (np_array) The current gradients with respect to the weights
        d_x     (np_array) The current gradients with respect to the inputs
        d_b     (np_array) The current gradients with respect to the biases
    """

    def __init__(self, in_features, out_features) -> None:
        super().__init__()
        # initializing weights and biases 
        #self.weights = np.random.normal(0.0, np.sqrt(2/in_features), (in_features, out_features))
        # Using a simpler initialization  for testing 
        self.weights = 0.01 * np.random.randn(in_features, out_features)
        self.bias = np.zeros((1, out_features))

    def forward(self, inputs):
        # Saving inputs for backward step
        self.input = inputs
        self.output = np.dot(inputs, self.weights) + self.bias
        return self.output

    def backward(self, d_vals):
        """Backpropagation of the linear function.

        Args:
            d_vals (np_array) gradients of the loss with respect to this layer's output,
                passed back from the layer/function that follows it in the forward pass.
        """
        self.d_w = np.dot(self.input.T, d_vals)
        self.d_x = np.dot(d_vals, self.weights.T)
        self.d_b = np.sum(d_vals, axis=0, keepdims=True)

#### Testing Linear Layer

In [4]:
# The sample data is a list of 2D coordinates, i.e. each sample has two values
# The layer therefore takes 2 inputs
# We have given it 3 out features (3 neurons), so we expect an output with the shape (n_samples*n_classes, n_neurons)
# In our case that should be (300, 3)
X, _ = spiral_data(samples=100, classes=3)
linear1 = LinearLayer(2, 3)
output = linear1.forward(X)
print(output[:10, :])

assert output.shape == (300,3)


[[ 0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [-1.0475188e-04  1.1395361e-04 -4.7983500e-05]
 [-2.7414842e-04  3.1729150e-04 -8.6921798e-05]
 [-4.2188365e-04  5.2666257e-04 -5.5912682e-05]
 [-5.7707680e-04  7.1401405e-04 -8.9430439e-05]
 [-3.5430698e-04  3.5025488e-04 -2.3363481e-04]
 [-8.9267001e-04  1.0767876e-03 -1.9453237e-04]
 [-9.3350781e-04  1.0723802e-03 -3.1227397e-04]
 [-1.1243758e-03  1.3112801e-03 -3.3629674e-04]
 [-1.3386955e-03  1.6200906e-03 -2.8101794e-04]]
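
The forward test above only checks shapes. A finite-difference comparison is one way to check the backward formulas as well. The sketch below is standalone: it re-states the three gradient expressions from `LinearLayer.backward` on small random arrays rather than importing the class, and the scalar loss `sum(output * d_vals)` is an assumption chosen so that the upstream gradient is exactly `d_vals`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))        # batch of 4 samples, 2 features
w = rng.normal(size=(2, 3))        # 2 inputs -> 3 neurons
b = np.zeros((1, 3))
d_vals = rng.normal(size=(4, 3))   # stand-in for the upstream gradient

def scalar_loss(w_):
    # Scalar whose gradient with respect to the layer output is exactly d_vals
    return np.sum((x @ w_ + b) * d_vals)

# Analytic gradients, same expressions as in LinearLayer.backward
d_w = x.T @ d_vals
d_x = d_vals @ w.T
d_b = d_vals.sum(axis=0, keepdims=True)

# Central finite differences, one weight at a time
eps = 1e-6
d_w_num = np.zeros_like(w)
for idx in np.ndindex(*w.shape):
    step = np.zeros_like(w)
    step[idx] = eps
    d_w_num[idx] = (scalar_loss(w + step) - scalar_loss(w - step)) / (2 * eps)

print(np.allclose(d_w, d_w_num, atol=1e-6))
```

Each gradient has the same shape as the quantity it belongs to: `d_w` matches `w`, `d_x` matches `x`, and `d_b` matches `b`.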


## Defining Activation Functions

### ReLU

$$y = \begin{cases}
   x & x > 0 \\
   0 & \text{otherwise}
\end{cases} $$

In [5]:
class ReLU(Module):
    """Applies the Rectified Linear Unit function to a vector."""
    def __init__(self) -> None:
        # Initializing attributes needed for backward
        super().__init__()
        self.d_relu = None
    
    def forward(self, x):
        # Storing inputs needed for backward (as an array so masking also works for list inputs)
        self.input = np.asarray(x)
        self.output = np.maximum(self.input, 0)
        return self.output
    
    def backward(self, d_vals):
        # Pass the gradient through where the input was positive, zero it elsewhere
        self.d_relu = d_vals.copy()
        self.d_relu[self.input <= 0] = 0

#### Testing ReLU

In [6]:
i = [-2, 3, 4, 0, 0.1, -44]
test_relu = ReLU()

# Checking values are as expected 
assert np.all(np.array([0., 3, 4, 0, 0.1, 0.]) == test_relu.forward(i))

test_relu.output

array([0. , 3. , 4. , 0. , 0.1, 0. ])
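
The backward rule can be illustrated on the same input. This is a standalone sketch that repeats the masking step from `ReLU.backward`; the all-ones upstream gradient is an assumption made purely for illustration.

```python
import numpy as np

x = np.array([-2.0, 3.0, 4.0, 0.0, 0.1, -44.0])
d_vals = np.ones_like(x)   # pretend upstream gradient of all ones

# Same rule as ReLU.backward: zero the gradient wherever the input was <= 0
d_relu = d_vals.copy()
d_relu[x <= 0] = 0

print(d_relu)  # [0. 1. 1. 0. 1. 0.]
```

Note that an input of exactly 0 is treated as inactive, so its gradient is zeroed.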

### Softmax

$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j}\exp(x_j)}$$

The softmax output gives a confidence score for each output class; the scores for a sample add up to 1.
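
A short sketch of two properties of this function: each row of the output sums to 1, and subtracting the row maximum before exponentiating (the usual trick to avoid overflow) does not change the result. The standalone `softmax` function here is a stand-in written for this example, not the framework class.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    # Shift by the row maximum so the largest exponent is exp(0) = 1
    shifted = x - np.max(x, axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    return exp_vals / np.sum(exp_vals, axis=1, keepdims=True)

# The second row is the first row shifted by 999; a naive exp(1002.0) would overflow
logits = np.array([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])
probs = softmax(logits)

print(probs.sum(axis=1))                # each row sums to 1
print(np.allclose(probs[0], probs[1]))  # shifting the inputs leaves the output unchanged
```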

In [7]:
class Softmax(Module):
    """Applies Softmax function to input matrix."""

    def __init__(self) -> None:
        super().__init__()
        self.confidence_scores = None

    def forward(self, x):
        # Exponents of each value, shifted by the row maximum for numerical stability
        exp_vals = np.exp(x - np.max(x, axis=1, keepdims=True))
        exp_sum = np.sum(exp_vals, axis=1, keepdims=True)
        # Normalization to get the probabilities
        self.output = exp_vals/exp_sum
        return self.output

    def _backward(self, d_vals):
        # Initialize array for gradients wrt the inputs
        self.d_soft = np.zeros_like(d_vals)
        
        _iter = enumerate(zip(self.output, d_vals))
        for i, (conf_score, d_val) in _iter:
            # Reshape confidence scores into a column vector
            cs = conf_score.reshape(-1, 1)
            # Find the Jacobian matrix of the output
            j_matrix = np.diagflat(cs) - np.dot(cs, cs.T)
            # Get the gradient
            self.d_soft[i] = np.dot(j_matrix, d_val)
    
    def backward(self, y_pred, y_true):
        """Does the combined backward pass for CCE & Softmax as a single, faster step."""
        # Number of examples in the batch
        n = len(y_pred)

        # Getting discrete class indices from the one-hot encoding
        y_true = np.argmax(y_true, axis=1)
        
        self.d_soft = y_pred.copy()
        self.d_soft[range(n), y_true] -= 1
        self.d_soft = self.d_soft / n
        return self.d_soft


#### Testing Softmax

In [8]:
softmax = Softmax()
softmax.forward([[1,2,44]])

array([[2.11513104e-19, 5.74952226e-19, 1.00000000e+00]])

## Defining Loss - Categorical Cross-Entropy

$$ L_i = -\sum_j y_{i,j}\log(\hat{y}_{i,j}) $$

Because the labels are one-hot encoded, only the term for the correct class is non-zero, so this simplifies to:

$$ L_i = -y_{i,k}\log(\hat{y}_{i,k}) = -\log(\hat{y}_{i,k}) $$

where $k$ is the index of the correct class.
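
The simplification can be checked numerically. The sketch below computes the per-sample loss both ways on two made-up predictions (the values are illustrative only) and confirms they agree.

```python
import numpy as np

y_pred = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4]])
y_true = np.array([[1, 0, 0], [0, 1, 0]])

# Full sum over all classes
full = -np.sum(y_true * np.log(y_pred), axis=1)

# Simplified form: pick out only the score of the correct class k
k = np.argmax(y_true, axis=1)
simplified = -np.log(y_pred[np.arange(len(y_pred)), k])

print(np.allclose(full, simplified))
```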

In [9]:
class CategoricalCrossEntropyLoss:
    """Calculates the CCE loss for a given set of predictions.
    This class expects a softmax output and a one-hot encoded label mask.
    
    y_pred (np_array): matrix of confidence scores of the prediction
    y_true (np_array): matrix of one-hot encoded true labels of the classes
    """
    @staticmethod
    def forward(y_pred, y_true):
        # Clip so that no confidence score is exactly 0 or exactly 1,
        # which keeps the log finite
        clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Apply the one-hot encoded labels as a mask to zero out the
        # scores corresponding to incorrect classes
        corrected = np.sum(clipped*y_true, axis=1)
        # Taking the -ve log of the remaining confidence scores
        negative_log = -np.log(corrected)
        return np.mean(negative_log)

    @staticmethod
    def backward(y_pred, y_true):
        """Backpropagation of the CCE loss.

        Args:
            y_pred (np_array) array of predictions.
            y_true (np_array) array of correct labels.
        """
        return (-y_true/y_pred)/len(y_pred)

#### Testing CCE Loss

In [10]:
y_pred = np.array([[0.7, 0.1, 0.2], [0.1,0.5,0.4],[0.02,0.9,0.08]])
y_true = np.array([[1,0,0], [0,1,0], [0,1,0]])

loss_function = CategoricalCrossEntropyLoss
loss_function.forward(y_pred, y_true)

0.38506088005216804
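
The combined Softmax and CCE backward step can be compared against the longer route of pushing the CCE gradient through the softmax Jacobian. The sketch below is standalone and uses random logits and labels as assumed test data; it re-states the formulas from the classes above rather than calling them.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
labels = rng.integers(0, 3, size=5)
y_true = np.eye(3)[labels]

# Softmax forward pass
exp_vals = np.exp(logits - logits.max(axis=1, keepdims=True))
y_pred = exp_vals / exp_vals.sum(axis=1, keepdims=True)
n = len(y_pred)

# Route 1: CCE gradient, then the softmax Jacobian, sample by sample
d_loss = (-y_true / y_pred) / n
chained = np.zeros_like(logits)
for i, (p, d) in enumerate(zip(y_pred, d_loss)):
    p = p.reshape(-1, 1)
    jacobian = np.diagflat(p) - p @ p.T
    chained[i] = jacobian @ d

# Route 2: the combined shortcut, (y_pred - y_true) / n
combined = (y_pred - y_true) / n

print(np.allclose(chained, combined))
```

Both routes give the same gradient with respect to the logits; the shortcut avoids building a Jacobian per sample.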

## Defining Optimizers

### Stochastic Gradient Descent

$$ \text{Update} = -\text{Learning Rate} \cdot \text{Gradient}$$
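
As a toy illustration of this rule, the sketch below minimizes $f(w) = w^2$ with repeated updates. The `decayed_lr` helper is hypothetical: it shows one common way a decay parameter can be used, the schedule `lr / (1 + decay * step)`, which is an assumption here and not something the optimizer in this notebook applies.

```python
import numpy as np

def sgd_step(param, grad, lr):
    """Return the parameter after one update: param + (-lr * grad)."""
    return param - lr * grad

def decayed_lr(initial_lr, decay, step):
    """Hypothetical 1/t schedule: initial_lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)

w = np.array([4.0])
for step in range(50):
    grad = 2 * w   # gradient of f(w) = w**2
    w = sgd_step(w, grad, decayed_lr(0.1, 1e-2, step))

print(w)  # close to the minimum at 0
```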

In [18]:
class SDG:
    """Stochastic Gradient Descent class used to update layer parameters.
    The update is the -ve learning rate multiplied by the gradient calculated in the backward step.

    Attr:
        lr (float) Learning rate to scale the gradients by for the update
        decay (float) Learning rate decay; stored but not yet applied in update
    """

    def __init__(self, learning_rate, decay) -> None:
        self.lr = learning_rate
        self.decay = decay
    
    def update(self, layers):
        """Update the parameters of each layer in layers."""
        # pre update step

        for layer in layers:
            if isinstance(layer, LinearLayer):
                layer.weights += -self.lr*layer.d_w
                layer.bias += -self.lr*layer.d_b
            else:
                raise NotImplementedError(f'SDG does not support {layer.__class__}')

## Defining Utility functions

### One-hot encoding function 

In [12]:
def one_hot_encode_index(y, n):
    """One-hot encodes an array of integer class indices y into n classes."""
    return np.eye(n)[y]

#### Testing one-hot encoding

In [13]:
n=3
y_test = np.array([0,1,2, 1, 2])

one_hot_encode_index(y_test, n)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)
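
The encoding works because indexing the identity matrix with an array of class indices selects one row per label. A small round-trip sketch, where `one_hot` is a local stand-in for the function above, shows that `np.argmax` recovers the original indices:

```python
import numpy as np

def one_hot(y, n):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n)[y]

labels = np.array([0, 1, 2, 1, 2])
encoded = one_hot(labels, 3)
decoded = np.argmax(encoded, axis=1)

print(np.array_equal(decoded, labels))
```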

### Define Accuracy 

In [14]:
def accuracy(y_pred, y_true):
    """Calculates the accuracy of a batch of predictions"""
    return np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))
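
A worked example of the same calculation on three made-up predictions (illustrative values only), two of which pick the correct class:

```python
import numpy as np

y_pred = np.array([[0.7, 0.1, 0.2],   # predicts class 0, correct
                   [0.1, 0.5, 0.4],   # predicts class 1, correct
                   [0.6, 0.3, 0.1]])  # predicts class 0, true class is 2
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Fraction of samples whose highest-scoring class matches the label
acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(acc)  # two of three correct
```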

## Integration Testing 

### Data initialization 

In [15]:
X, y = spiral_data(samples=100, classes=3)
y = one_hot_encode_index(y, 3)

### Network Setup

In [21]:
# Initializing Network Components 
relu = ReLU()
softmax = Softmax()
cce_loss = CategoricalCrossEntropyLoss
optimizer = SDG(0.85, 1e-2)
linear1 = LinearLayer(2, 64)
linear2 = LinearLayer(64, 3)

update_layers = [linear1, linear2]

n_epochs = 10000

for epoch in range(n_epochs + 1):
    # Forward Pass
    linear1.forward(X) 
    relu.forward(linear1.output)
    linear2.forward(relu.output)
    y_pred = softmax.forward(linear2.output)

    # Calculating loss and accuracy 
    loss = cce_loss.forward(y_pred, y)
    acc = accuracy(y_pred, y) 

    # Printing results
    if not epoch % 100:
        print(f"Epoch:{epoch}, Loss:{loss:.3f}, accuracy:{acc:.3f}")

    # Backward pass
    softmax.backward(y_pred, y)
    linear2.backward(softmax.d_soft)
    relu.backward(linear2.d_x)
    linear1.backward(relu.d_relu)

    # Optimization Step
    optimizer.update(update_layers)


    

Epoch:0, Loss:1.099, accuracy:0.300
Epoch:100, Loss:1.078, accuracy:0.447
Epoch:200, Loss:1.066, accuracy:0.457
Epoch:300, Loss:1.062, accuracy:0.463
Epoch:400, Loss:1.060, accuracy:0.463
Epoch:500, Loss:1.060, accuracy:0.453
Epoch:600, Loss:1.058, accuracy:0.460
Epoch:700, Loss:1.057, accuracy:0.450
Epoch:800, Loss:1.054, accuracy:0.460
Epoch:900, Loss:1.049, accuracy:0.450
Epoch:1000, Loss:1.040, accuracy:0.453
Epoch:1100, Loss:1.047, accuracy:0.430
Epoch:1200, Loss:1.042, accuracy:0.430
Epoch:1300, Loss:1.037, accuracy:0.443
Epoch:1400, Loss:1.030, accuracy:0.443
Epoch:1500, Loss:1.026, accuracy:0.453
Epoch:1600, Loss:1.023, accuracy:0.450
Epoch:1700, Loss:1.020, accuracy:0.453
Epoch:1800, Loss:1.018, accuracy:0.453
Epoch:1900, Loss:1.017, accuracy:0.430
Epoch:2000, Loss:1.029, accuracy:0.403
Epoch:2100, Loss:1.050, accuracy:0.463
Epoch:2200, Loss:1.019, accuracy:0.407
Epoch:2300, Loss:1.002, accuracy:0.430
Epoch:2400, Loss:1.022, accuracy:0.500
Epoch:2500, Loss:1.022, accuracy:0.47