# Backpropagation Neural Network

Backpropagation moves the weights in the direction of steepest descent of the error function (hence *gradient descent*). Each change is scaled by a learning parameter (step size) that is set by the user.

This is possible because the activation functions in the neurons are continuous and differentiable, so the error function has a gradient with respect to every weight.
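A minimal sketch of the idea, assuming a toy one-weight error function $E(w) = (w - 3)^2$ that is not from the source. Each step moves $w$ against the gradient, scaled by the learning parameter. The names `gradient_descent` and `toy_gradient` are hypothetical.

```python
def gradient_descent(gradient, w, learning_parameter, steps):
    """Repeatedly step against the gradient: w <- w - rho * dE/dw."""
    for _ in range(steps):
        w = w - learning_parameter * gradient(w)
    return w


def toy_gradient(w):
    """Derivative of the toy error function E(w) = (w - 3)^2."""
    return 2 * (w - 3)


w_final = gradient_descent(toy_gradient, w=0.0, learning_parameter=0.1, steps=100)
print(round(w_final, 4))  # converges towards the minimum at w = 3
```

The step size matters. For this toy function, each step multiplies the distance to the minimum by $1 - 2\rho$. With `learning_parameter=1.1` that factor has magnitude 1.2, so the iterates diverge instead of converging.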

## Momentum

## Bold Driver

## Annealing

## Weight Decay

## Learning Rate

The gradient of the error function:

- $\frac{\partial E}{\partial w_{i,j}}$ is the rate of change of the error $E$ with respect to the weight $w_{i,j}$. This is the gradient of the error function.

## Updating weights

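When updating weights, we **subtract** the gradient of the error function, scaled by the learning parameter $\rho$. Take the error for a single example as $E = \frac{1}{2}(C - u_O)^2$. The gradient is then $\frac{\partial E}{\partial w_{i,j}} = -\delta_j u_i$, where $u_i$ is the output of node $i$ feeding into node $j$. This means the subtraction appears as an addition:

$$
w^*_{i,j} = w_{i,j} - \rho \frac{\partial E}{\partial w_{i,j}} = w_{i,j} + \rho \delta_j u_i
$$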
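A minimal sketch of one update step for a single sigmoid output node. It assumes two made-up inputs, no bias and $E = \frac{1}{2}(C - u_O)^2$. A finite-difference estimate confirms that $\rho \delta_O u_i$ equals $-\rho \frac{\partial E}{\partial w_{i,O}}$. All values and names are illustrative assumptions.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


u = np.array([0.5, -1.0])  # outputs u_i of the previous layer (made up)
w = np.array([0.2, 0.4])   # weights w_{i,O} (made up)
C = 1.0                    # correct output (label)
rho = 0.5                  # learning parameter


def error(weights):
    """Squared error for one example, E = 1/2 (C - u_O)^2."""
    return 0.5 * (C - sigmoid(weights @ u)) ** 2


u_O = sigmoid(w @ u)
delta_O = (C - u_O) * u_O * (1 - u_O)  # delta_O = (C - u_O) f'(S_O)
w_new = w + rho * delta_O * u           # w* = w + rho * delta_O * u_i

# Central differences: dE/dw_{i,O} should equal -delta_O * u_i.
eps = 1e-6
numeric_grad = np.array(
    [(error(w + eps * e) - error(w - eps * e)) / (2 * eps) for e in np.eye(2)]
)
print(error(w), error(w_new))  # the error drops after one update
print(np.allclose(numeric_grad, -delta_O * u))
```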


![Gradient of the Error function](figures/error-gradient.png)


Too small weights can leave the network stuck in a local minimum.

- We start with random weights and biases.

- Searching a high-dimensional weight space is less likely to end in a local minimum.
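A toy one-dimensional error surface illustrates why the starting weights matter. The surface $E(w) = w^4 - 3w^2 + w$ is an assumption chosen for illustration. Plain gradient descent ends in whichever minimum lies downhill of its starting point. The helpers `toy_error`, `toy_gradient_1d` and `descend` are hypothetical.

```python
def toy_error(w):
    """Non-convex toy error surface with a local and a global minimum."""
    return w**4 - 3 * w**2 + w


def toy_gradient_1d(w):
    """Derivative of toy_error."""
    return 4 * w**3 - 6 * w + 1


def descend(w, learning_parameter=0.01, steps=2000):
    """Plain gradient descent from the starting weight w."""
    for _ in range(steps):
        w -= learning_parameter * toy_gradient_1d(w)
    return w


w_left = descend(-2.0)  # ends in the global minimum near w = -1.30
w_right = descend(2.0)  # ends in the local minimum near w = 1.13
print(w_left, toy_error(w_left))
print(w_right, toy_error(w_right))
```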

## When do we stop learning?


- We tend to stop learning when the error on an independent validation set starts to increase (early stopping).

- Every $x$ epochs, we test the network on this unseen validation set.
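The stopping rule can be sketched on its own, assuming the validation errors have already been recorded once every $x$ epochs. The function `early_stopping` and the sample errors are hypothetical. Like `NeuralNetwork.train` in this notebook, it stops at the first check whose error is higher than the previous one.

```python
def early_stopping(validation_errors):
    """Return (stop_index, best_error) for a sequence of validation errors.

    validation_errors[k] is the error at the k-th validation check.
    Training stops at the first check whose error exceeds the previous one.
    """
    best = float("inf")
    for k, error in enumerate(validation_errors):
        if error > best:
            return k, best
        best = error
    return len(validation_errors) - 1, best


print(early_stopping([0.30, 0.20, 0.15, 0.16, 0.10]))  # (3, 0.15)
```

Note that the later, lower error (0.10) is never seen. This rule trades some possible improvement for protection against overfitting.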


### Symmetry in Weight Space

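A network with $M$ hidden nodes exhibits a weight-space symmetry factor of $M! \, 2^M$. The $M$ hidden nodes can be permuted in $M!$ ways. For an odd activation function such as tanh, the signs of each hidden node's incoming and outgoing weights can also be flipped together, giving $2^M$ further combinations. None of these changes the network's output.

For example, with 3 hidden nodes there are $3! \cdot 2^3 = 6 \cdot 8 = 48$ equivalent weight vectors for every minimum of the error function, including the global minimum.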
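A quick numerical check of the $3! \cdot 2^3$ count follows. It assumes a tanh hidden layer, a linear output node, random weights and random inputs, all illustrative and not from the source. Every permutation of the hidden nodes, combined with every sign flip, gives the same outputs.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
M, n_inputs = 3, 2
W1 = rng.normal(size=(n_inputs, M))  # input -> hidden weights
b1 = rng.normal(size=M)              # hidden biases
w2 = rng.normal(size=M)              # hidden -> output weights
b2 = rng.normal()                    # output bias
x = rng.normal(size=(5, n_inputs))   # a few arbitrary inputs


def network(W1, b1, w2, b2, x):
    """One tanh hidden layer followed by a linear output node."""
    return np.tanh(x @ W1 + b1) @ w2 + b2


reference = network(W1, b1, w2, b2, x)
equivalent = 0
for perm in itertools.permutations(range(M)):           # M! orderings
    p = list(perm)
    for signs in itertools.product([1, -1], repeat=M):  # 2^M sign flips
        s = np.array(signs)
        # Negating a node's incoming weights and bias negates its tanh output;
        # negating its outgoing weight as well restores the network output.
        out = network(W1[:, p] * s, b1[p] * s, w2[p] * s, b2, x)
        if np.allclose(out, reference):
            equivalent += 1

print(equivalent)  # 3! * 2**3 = 48
```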

## Measuring Performance

### Mean Squared Error

$$
MSE = \frac{\sum(O - M)^2}{n}
$$

where:

- $O$ is the observed value
- $M$ is the modelled value
- $n$ is the number of examples


In [None]:
from typing import Callable
import time

import numpy as np
import pandas as pd

In [None]:
def mean_squared_error(observed: np.ndarray, modelled: np.ndarray):
    """Calculate the Mean Squared Error.
    
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Mean Squared Error for the given arrays.
    """
    return np.sum(np.power(observed - modelled, 2)) / len(observed)

### Root Mean Squared Error

- no upper bound
- for a perfect model, RMSE = 0
- is expressed in the same units as the observed values

In [None]:
def rmse(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Root Mean Squared Error.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Root Mean Squared Error for the given arrays.
    """
    return np.sqrt(np.sum(np.power(observed - modelled, 2)) / len(observed))
    

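### Mean Squared Relative Error

- the squared error of each prediction relative to its observed value, averaged over all examples: $MSRE = \frac{1}{n}\sum\left(\frac{M - O}{O}\right)^2$
- undefined when an observed value is 0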

In [None]:
def msre(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Mean Squared Relative Error.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Mean Squared Relative Error for the given arrays.
    """
    return np.sum(np.power((modelled - observed) / observed, 2)) / len(observed)

### Coefficient of Efficiency

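- $CE = 1 - \frac{\sum(M - O)^2}{\sum(O - \bar{O})^2}$, where $\bar{O}$ is the mean of the observed values
- ranges from $-\infty$ to +1; +1 represents a perfect model, and 0 means the model is no better than predicting $\bar{O}$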

In [None]:
def ce(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Coefficient of Efficiency.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Coefficient of Efficiency for the given arrays.
    """
    return 1 - (np.sum(np.power(modelled - observed, 2)) / np.sum(np.power(observed - np.mean(observed), 2)))

### R-Squared - Coefficient of Determination

- measures how well the shape of the modelled values matches the observed values (the square of the Pearson correlation coefficient)
- ranges from 0 to 1

In [None]:
def rsqr(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Coefficient of Determination (R-squared).
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Coefficient of Determination for the given arrays.
    """
    dividend = np.sum(np.multiply(observed - observed.mean(), modelled - modelled.mean()))
    divisor = np.sqrt(np.multiply(np.sum(np.power(observed - observed.mean(), 2)), np.sum(np.power(modelled - modelled.mean(), 2))))
    return np.power(dividend / divisor, 2)
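The measures behave differently, as this small self-contained comparison on made-up data shows (the data are an assumption for illustration). It recomputes CE and R² inline under the hypothetical names `coefficient_of_efficiency` and `r_squared` rather than calling `ce` and `rsqr`. R² is written via `np.corrcoef`, which gives the same squared Pearson correlation as `rsqr`.

```python
import numpy as np

observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = observed + 2.0                     # right shape, constant bias of +2
noisy = np.array([1.2, 1.8, 3.1, 3.9, 5.2])  # small errors, no systematic bias


def coefficient_of_efficiency(observed, modelled):
    """CE = 1 - sum((M - O)^2) / sum((O - mean(O))^2)."""
    return 1 - np.sum((modelled - observed) ** 2) / np.sum((observed - observed.mean()) ** 2)


def r_squared(observed, modelled):
    """Squared Pearson correlation between observed and modelled values."""
    return np.corrcoef(observed, modelled)[0, 1] ** 2


for name, modelled in [("shifted", shifted), ("noisy", noisy)]:
    rmse_value = np.sqrt(np.mean((observed - modelled) ** 2))
    print(f"{name}: RMSE={rmse_value:.3f}, "
          f"CE={coefficient_of_efficiency(observed, modelled):.3f}, "
          f"R^2={r_squared(observed, modelled):.3f}")
```

The shifted model matches the shape perfectly (R² = 1), yet its constant offset gives CE = -1. This is why R² alone can hide a biased model.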

## Number of Hidden Nodes

- There is no fixed number.
- A general rule of thumb: for $n$ inputs, try between $\frac{n}{2}$ and $2n$ hidden nodes.
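The rule of thumb can be turned into a candidate list for experiments. The helper `candidate_hidden_nodes` is hypothetical, and rounding $\frac{n}{2}$ up (with a minimum of one node) is an assumption.

```python
import math


def candidate_hidden_nodes(number_of_inputs):
    """Hidden-node counts from ceil(n/2) to 2n, with at least one node."""
    low = max(1, math.ceil(number_of_inputs / 2))
    return list(range(low, 2 * number_of_inputs + 1))


print(candidate_hidden_nodes(8))  # 4 to 16 hidden nodes for 8 inputs
```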

## Things to manipulate

- number of hidden nodes
- step size
- activation function (optionally)

In [None]:
class Perceptron():
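    """Object representing a perceptron with two inputs.

    Attributes:
        e: A training set; each example is (x0, x1, x2, label), where x0 is
            the bias input (normally 1) and label is +1 or -1.
        w0: Bias weight.
        w1: The weight of the first input.
        w2: The weight of the second input.
        activation_function: Function mapping the weighted sum to the output.
        epochs: Number of epochs before stabilisation.
    """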
    def __init__(self, e, activation_function = lambda s: 1 if s > 0 else -1):
        '''Initialises Perceptron object.'''
        self.w0 = 0
        self.w1 = 0
        self.w2 = 0
        self.e = e
        self.activation_function = activation_function

    def train(self):
        """Trains perceptron."""
        self.epochs = 1
        stable = False
        while not stable:
            stable = True
            for example in self.e:
                # Perceptron rule: update the weights only on a misclassification.
                if self.classify(example[1], example[2]) != example[3]:
                    self.w0 += example[3] * example[0]
                    self.w1 += example[3] * example[1]
                    self.w2 += example[3] * example[2]
                    stable = False
            if not stable:
                self.epochs += 1

    def classify(self, x1, x2):
        """Classifies an object."""
        s = (self.w1 * x1) + (self.w2 * x2) + self.w0
        return self.activation_function(s)



## Activation Functions

- Every node on hidden layers and output layer has an activation function.



- We might use a **linear activation function in the output layer** instead of sigmoid or tanh, so that the outputs are not restricted to the range of those functions.
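A linear (identity) activation could be sketched with the same interface as the `Sigmoid` and `Tanh` classes in this notebook (`func`, `der`, `vectorised_func`, `vectorised_der`). The class `Linear` is hypothetical and not part of the source.

```python
import numpy as np


class Linear:
    """Hypothetical identity activation f(x) = x with derivative f'(x) = 1."""

    def __init__(self):
        """Initialises a linear activation object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)

    def func(self, x):
        """Returns the input unchanged."""
        return x

    def der(self, x):
        """The derivative of the identity is 1 everywhere."""
        return 1.0
```

`NeuralNetwork` currently passes one `activation_function` to both layers. Using `Linear` only in the output layer would therefore need a separate constructor argument for the output layer.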


### Sigmoid Function

$$
f(x) = \frac{1}{1 + e^{-x}}
$$

#### First Order Derivative

We also need the first-order derivative of the function:

$$
f'(x) = f(x)(1 - f(x))
$$

$$
f'(S_j) = u_j (1 - u_j)
$$

where $S_j$ is the weighted sum of the inputs to node $j$ and $u_j = f(S_j)$ is its output.

#### Delta Values

- $\delta_O = (C - u_O) f'(S_O)$ for the output node $O$, where $C$ is the correct output.
- $\delta_j = w_{j,O} \delta_O f'(S_j)$ for hidden-layer nodes.
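The two delta rules can be checked numerically on a tiny 2-2-1 sigmoid network. The sketch assumes no biases, one made-up training example and random weights. It compares $-\delta_j x_i$ with a finite-difference estimate of $\frac{\partial E}{\partial w_{i,j}}$ for the input-to-hidden weights, with $E = \frac{1}{2}(C - u_O)^2$.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


rng = np.random.default_rng(1)
x = np.array([0.3, -0.7])           # one training example (made up)
C = 0.8                             # its correct output (made up)
W_h = rng.uniform(-1, 1, (2, 2))    # input -> hidden weights w_{i,j}
w_o = rng.uniform(-1, 1, 2)         # hidden -> output weights w_{j,O}


def error(W_h, w_o):
    """E = 1/2 (C - u_O)^2 for the single example."""
    return 0.5 * (C - sigmoid(sigmoid(x @ W_h) @ w_o)) ** 2


# Forward pass.
u_h = sigmoid(x @ W_h)                     # hidden outputs u_j
u_O = sigmoid(u_h @ w_o)                   # network output u_O
# Backward pass with the delta rules above.
delta_O = (C - u_O) * u_O * (1 - u_O)      # (C - u_O) f'(S_O)
delta_h = w_o * delta_O * u_h * (1 - u_h)  # w_{j,O} delta_O f'(S_j)

# Central differences for every input -> hidden weight.
eps = 1e-6
numeric = np.zeros_like(W_h)
for i in range(2):
    for j in range(2):
        step = np.zeros_like(W_h)
        step[i, j] = eps
        numeric[i, j] = (error(W_h + step, w_o) - error(W_h - step, w_o)) / (2 * eps)

# dE/dw_{i,j} = -x_i * delta_j
print(np.allclose(numeric, -np.outer(x, delta_h)))
```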



#### Notes

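The sigmoid function suffers from the vanishing gradient problem. Its derivative is at most 0.25 (at $x = 0$) and approaches 0 for large $|x|$, so the deltas shrink as they are propagated back through many layers.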
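A short numerical illustration follows. The helper `sigmoid_derivative` is hypothetical, and the five-layer figure is only an upper bound on the product of the $f'(S_j)$ factors (weights also enter the deltas).

```python
import numpy as np


def sigmoid_derivative(x):
    """f'(x) = f(x)(1 - f(x)) for the logistic sigmoid."""
    f = 1.0 / (1.0 + np.exp(-x))
    return f * (1 - f)


xs = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_derivative(xs))  # largest at x = 0, tiny by x = 10
# Each sigmoid layer contributes a factor f'(S_j) of at most 0.25.
print(0.25 ** 5)               # bound on that product after five layers
```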


In [None]:
class Sigmoid:
    """Represents Sigmoid activation function.
    """
    
    def __init__(self):
        """Initialises a sigmoid object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
        
    def func(self, x):
        """Calculates output of the Sigmoid function."""
        return 1 / (1 + np.e ** (-x))
    
    def der(self, x):
        """Calculates output of the derivative of the Sigmoid function.
        """
        return self.func(x) * (1 - self.func(x))

### Tanh

$$
\tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

#### First Order Derivative

$$
\tanh' x = 1 - \tanh^2 x
$$

$$
f'(S_j) = 1 - u^2_j
$$

In [None]:
class Tanh:
    """Represents tanh activation function.
    """
    
    def __init__(self):
        """Initialises a tanh object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
    
    def func(self, x):
        """Calculates output of the tanh function."""
        # np.tanh avoids the overflow (inf / inf = nan) of e**x for large |x|.
        return np.tanh(x)
    
    def der(self, x):
        """Calculates output of the derivative of the tanh function.
        """
        return 1 - self.func(x) ** 2

### ReLU - Rectified Linear Unit

$$
f(x) = \begin{cases}
    x & \text{if $x > 0$}\\
    0 & \text{otherwise}
\end{cases}
$$

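#### First Order Derivative

$$
f'(x) = \begin{cases}
    1 & \text{if $x > 0$}\\
    0 & \text{otherwise}
\end{cases}
$$

The derivative is undefined at $x = 0$; using 0 there is a common convention.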
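A ReLU activation could follow the same interface as the `Sigmoid` and `Tanh` classes in this notebook. The class `ReLU` is hypothetical, and it assumes the derivative is 0 at $x = 0$.

```python
import numpy as np


class ReLU:
    """Hypothetical ReLU activation mirroring the Sigmoid/Tanh interface."""

    def __init__(self):
        """Initialises a ReLU object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)

    def func(self, x):
        """f(x) = x if x > 0, otherwise 0."""
        return x if x > 0 else 0.0

    def der(self, x):
        """f'(x) = 1 if x > 0, otherwise 0 (0 is used at x = 0)."""
        return 1.0 if x > 0 else 0.0
```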

In [None]:
"""Contains definition of a layer of a neural network.
"""

np.random.seed(0)


class Layer:
    """Layer of a neural network.
    
    Attributes:
        weights: set of weights of the layer.
        biases: set of biases of the layer.
        activation_function: Activation Function of the layer.
        number_of_inputs: Number of inputs coming to the layer.
        number_of_neurons: Number of neurons in the layer.
        output: the most recent output of the layer.
        previous_weights: Previous weights.
        previous_biases: Previous biases.
    """
    def __init__(self,
                 number_of_inputs: int,
                 number_of_neurons: int,
                 activation_function
                ):
        """Initialises a Layer instance.
        """
        self.number_of_neurons = number_of_neurons
        self.activation_function = activation_function
        random_generator = np.random.default_rng(5)
        #NNFS self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        low = -2 / number_of_inputs
        high = 2 / number_of_inputs
        self.weights = random_generator.uniform(
            low=low,
            high=high,
            size=(number_of_inputs, number_of_neurons)
        )
        self.previous_weights = self.weights.copy()
        self.biases = random_generator.uniform(
            low=low,
            high=high,
            size=(1, number_of_neurons)
        )
        self.previous_biases = self.biases.copy()
        self.delta = np.nan
    
    def forward_pass(self, inputs: np.ndarray):
        """Does the forward pass through the layer.
        
        Args:
            inputs: Inputs to the layer.
        """
        self.sum = np.dot(inputs, self.weights) + self.biases
        self.output = self.activation_function.vectorised_func(self.sum)
        
    def update_weights(self, learning_parameter, inputs):
        """Updates the layer's weights.
        
        Args:
            learning_parameter: Learning parameter of the network.
            inputs: Inputs to the layer.
        """
        self.weights = self.weights + learning_parameter * np.dot(inputs.T, self.delta)
        self.biases = self.biases + learning_parameter * self.delta

In [None]:
class HiddenLayer(Layer):
    """Hidden Layer of a neural network.
    
    Attributes:
        weights: set of weights of the layer.
        biases: set of biases of the layer.
        activation_function: Activation Function of the layer.
        number_of_inputs: Number of inputs coming to the layer.
        number_of_neurons: Number of neurons in the layer.
        output: the most recent output of the layer.
        delta: Delta value (output of the backward pass) of the layer.
    """
    
    def backward_pass(self, output_weights, output_delta):
        """Does the backward pass through the layer.
        
        Args:
            output_weights: Weights of the output layer, shape (number_of_neurons, 1).
            output_delta: Delta of the output layer, shape (1, 1).
        """
        self.delta = output_delta * np.multiply(self.activation_function.vectorised_der(self.sum), output_weights.T)

In [None]:
class OutputLayer(Layer):
    """Single-node output layer of a neural network.
    
    Attributes:
        weights: set of weights of the layer.
        biases: set of biases of the layer.
        activation_function: Activation Function of the layer.
        number_of_inputs: Number of inputs coming to the layer.
        number_of_neurons: Number of neurons in the layer.
        output: the most recent output of the layer.
        delta: Delta value (output of the backward pass) of the layer.
    """
        
    def backward_pass(self, y):
        """Does the backward pass through the layer.
        
        Args:
            y: Correct output (label) of the training example.
        """
        self.delta = (y - self.output) * self.activation_function.vectorised_der(self.sum)

In [None]:
"""Contains NeuralNetwork class definition.

Run after running the Data Preprocessing notebook.
"""

class NeuralNetwork:
    """A neural network with a single hidden layer and a single node on the output layer.
    
    Attributes:
        training_set: Set that the instance is trained on.
        validation_set: Set that the instance is repeatedly tested on during training.
        test_set: Set that the instance is tested on after training.
        hidden_nodes: Number of nodes on the hidden layer.
        activation_function: Activation function used for
            the neural network.
        learning_parameter: Step size parameter.
        epoch_limit: Maximum number of training epochs.
        epochs: Number of epochs the neural network has gone through during the last training.
        validation_frequency: Frequency of testing on the validation set, expressed in epochs.
        previous_validation_error: Previous validation error.
    """
    
    def __init__(
        self,
        training_set: np.ndarray,
        validation_set: np.ndarray,
        test_set: np.ndarray,
        hidden_nodes: int,
        activation_function,
        epoch_limit,
        validation_frequency,
        learning_parameter: float = 0.1,
    ):
        """Initialises a NeuralNetwork instance.
        """
        # Input is a matrix where each row is one instance
        self.training_set = training_set
        self.validation_set = validation_set
        self.test_set = test_set
        self.learning_parameter = learning_parameter
        self.number_of_inputs = self.training_set.shape[1] - 1
        self.hidden_layer = HiddenLayer(
            self.number_of_inputs,
            hidden_nodes,
            activation_function
        )
        self.output_layer = OutputLayer(
            self.hidden_layer.number_of_neurons,
            1,
            activation_function
        )
        self.epoch_limit = epoch_limit
        self.validation_frequency = validation_frequency
        self.epochs = 0
        self.previous_validation_error = np.inf
    
    
    def train(self):
        """Trains NeuralNetwork instance.
        """
        # Loop through the training set
        for i in range(self.epoch_limit):
            self.epochs = i + 1
            for training_example in self.training_set:
                # Split individual examples into inputs (item) and label (c).
                item, c = np.hsplit(training_example, [self.training_set.shape[1] - 1])
                item = item.reshape(1, -1)
                self.forward_pass(item)
                self.backward_pass(c)
                self.update_weights(item)
            # Test on validation set.
            if (i + 1) % self.validation_frequency == 0:
                current_validation_error = self.test(self.validation_set)
                print(f"Current MSE is: {current_validation_error}")
                if self.previous_validation_error < current_validation_error:
                    print(f"The best validation error is: {self.previous_validation_error}")
                    break
                else:
                    self.previous_validation_error = current_validation_error
            
            
    def forward_pass(self, inputs):
        """Performs forward pass through the network.
        
        Args:
            inputs: Vector of values representing a training example.
        """
        self.hidden_layer.forward_pass(inputs)
        self.output_layer.forward_pass(self.hidden_layer.output)       
    
    def backward_pass(self, c):
        """Performs backward pass through the network.
        
        Args:
            c: The label for the training example.
        """
        self.output_layer.backward_pass(c)
        self.hidden_layer.backward_pass(
            self.output_layer.weights,
            self.output_layer.delta
        )
        
    def update_weights(self, inputs):
        """Update weights in the network.
        """
        self.hidden_layer.update_weights(self.learning_parameter, inputs)
        self.output_layer.update_weights(self.learning_parameter, self.hidden_layer.output)
        
        
    def test(self, test_set, error_function=mean_squared_error):
        """Tests the neural network and returns error value.
        
        Args:
            test_set: Array of test examples (including labels).
            error_function: Error function to use.
        """
        predicted_values = np.empty(shape=(test_set.shape[0], 1))
        correct_values = test_set[:, test_set.shape[1] - 1]
        correct_values = correct_values.reshape(-1, 1)
        for i in range(len(test_set)):
            # Split individual examples into inputs (item) and label (unused here).
            item, _ = np.hsplit(test_set[i], [test_set.shape[1] - 1])
            item = item.reshape(1, -1)
            self.forward_pass(item)
            predicted_values[i] = self.output_layer.output
        # Calculate error with the requested error function.
        return error_function(correct_values, predicted_values)
    
    def calculate_error(self, error_function=mean_squared_error):
        """Calculates the error of the network (not implemented yet).
        """
        pass
    
    def predict(self, inputs):
        """Predicts value for given predictor values.
        """
        self.hidden_layer.forward_pass(inputs)
        self.output_layer.forward_pass(self.hidden_layer.output)
        return self.output_layer.output

In [None]:
# Train.

training_set = pd.read_csv("data/training-set.csv")
training_set = training_set.to_numpy() # Convert to a numpy array.
training_set = training_set[:, 1:] # Get rid of the index column.

validation_set = pd.read_csv("data/validation-set.csv")
validation_set = validation_set.to_numpy() # Convert to a numpy array.
validation_set = validation_set[:, 1:] # Get rid of the index column.

test_set = pd.read_csv("data/test-set.csv")
test_set = test_set.to_numpy() # Convert to a numpy array.
test_set = test_set[:, 1:] # Get rid of the index column.

neural_network = NeuralNetwork(
    training_set=training_set,
    validation_set=validation_set,
    test_set=test_set,
    hidden_nodes=16,
    activation_function = Sigmoid(),
    epoch_limit=1000,
    validation_frequency=10,
    learning_parameter=0.2 
)
start_time = time.perf_counter()
neural_network.train()
end_time = time.perf_counter()
print(f"Training time: {end_time - start_time} seconds")
neural_network.test(test_set)

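# Grid search over the parameters listed under "Things to manipulate".
# The values below are example choices (assumptions), not tuned recommendations.
activation_functions = [Sigmoid(), Tanh()]
hidden_nodes = [8, 16, 32]
learning_parameters = [0.05, 0.1, 0.2]
for activation_function in activation_functions:
    for n_hidden_nodes in hidden_nodes:
        for learning_parameter in learning_parameters:
            network = NeuralNetwork(
                training_set=training_set,
                validation_set=validation_set,
                test_set=test_set,
                hidden_nodes=n_hidden_nodes,
                activation_function=activation_function,
                epoch_limit=1000,
                validation_frequency=10,
                learning_parameter=learning_parameter,
            )
            start_time = time.perf_counter()
            network.train()
            running_time = time.perf_counter() - start_time
            print(f"{type(activation_function).__name__}, hidden nodes: {n_hidden_nodes}, "
                  f"learning parameter: {learning_parameter}, epochs: {network.epochs}, "
                  f"time: {running_time:.2f} seconds")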

## Testing

## Batch Learning

Training on batches of examples may improve efficiency. Showing all samples at once (full-batch learning) can cause overfitting, so the network may generalise poorly.

Typical batch size: 32
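Splitting a training set into shuffled mini-batches can be sketched as below. The helper `mini_batches` and the 100-row data set are hypothetical. In mini-batch learning, each weight update would use the gradient averaged over one batch. The notebook's `NeuralNetwork.train` instead updates after every single example.

```python
import numpy as np


def mini_batches(data, batch_size=32, rng=None):
    """Yield shuffled mini-batches (rows of data); the last may be smaller."""
    rng = np.random.default_rng() if rng is None else rng
    indices = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[indices[start:start + batch_size]]


data = np.arange(100 * 3).reshape(100, 3)  # 100 hypothetical examples
batches = list(mini_batches(data, 32, np.random.default_rng(0)))
sizes = [len(batch) for batch in batches]
print(sizes)  # [32, 32, 32, 4]
```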



In [None]:
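# Three example inputs (rows), each with four features.
inputs = [[1, 2, 3, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

# Layer 1: three neurons, one row of four weights per neuron.
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2, 3, 0.5]

# Layer 2: three neurons, each taking the three outputs of layer 1.
weights2 = [[0.1, -0.14, 0.5],
            [-0.5, 0.12, -0.33],
            [-0.44, 0.73, -0.13]]
biases2 = [-1, 2, -0.5]

# Weights are stored as (neurons, inputs), so they are transposed before the
# dot product; the Layer class above stores them as (inputs, neurons) instead.
# No activation function is applied in this example.
layer1_outputs = np.dot(inputs, np.array(weights).T) + biases
layer2_outputs = np.dot(layer1_outputs, np.array(weights2).T) + biases2

print(layer2_outputs)  # one row of three outputs per input example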

## Feature Data Set

The feature data set is usually denoted with `X`.

Labels are usually denoted with `y`.
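With the layout this notebook's data sets use (predictors followed by the label in the last column), `X` and `y` can be separated by slicing. The two-row `data_set` below is made up for illustration.

```python
import numpy as np

# Hypothetical data set: each row holds the predictors followed by the label.
data_set = np.array([[0.1, 0.2, 0.3, 1.0],
                     [0.4, 0.5, 0.6, 0.0]])
X = data_set[:, :-1]  # features, shape (n_examples, n_features)
y = data_set[:, -1]   # labels, shape (n_examples,)
print(X.shape, y)
```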