# Backpropagation Neural Network

Backpropagation moves the weights in the direction of steepest descent of the error function (hence Gradient Descent), scaling each adjustment by a learning rate $\rho$ that is set by the user.

This is possible because of the continuous nature and differentiability of the activation functions in the neurons.

## Momentum

## Bold Driver

## Annealing

## Weight Decay

## Learning Rate

Gradient of the error function

- $\frac{\partial E}{\partial w_{i,j}}$ is the rate of change of the error $E$ with respect to a weight $w_{i,j}$. This is the gradient of the error function.

## Updating weights

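When updating weights, we are **subtracting** the gradient of the error function, scaled by the learning rate $\rho$. Because $\delta_j u_i$ is the *negative* of that gradient, the update is written with a plus sign: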

$$
w^*_{i,j} = w_{i,j} + \rho \delta_j u_i
$$
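The sign convention can be checked numerically. The following is a minimal sketch for a single weight, assuming a sigmoid node, the error $E = \frac{1}{2}(C - u_j)^2$, and illustrative values for $\rho$, $u_i$, $w_{i,j}$ and the target $C$ (none of these numbers come from the notes):

```python
import math


def sigmoid(s: float) -> float:
    """Sigmoid activation, assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-s))


def squared_error(w: float, u_i: float, target: float) -> float:
    """Error E = 0.5 * (C - u_j)^2 for a single weight and input."""
    return 0.5 * (target - sigmoid(w * u_i)) ** 2


# Illustrative values (assumptions, not taken from the notes).
rho = 0.1      # learning rate
u_i = 0.5      # output of the node feeding the weight
w_ij = 0.4     # current weight
target = 1.0   # correct output C

u_j = sigmoid(w_ij * u_i)
delta_j = (target - u_j) * u_j * (1.0 - u_j)   # (C - u_j) * f'(S_j)
w_new = w_ij + rho * delta_j * u_i             # the update rule

# Central-difference estimate of dE/dw, to compare against -delta_j * u_i.
h = 1e-6
numerical_gradient = (
    squared_error(w_ij + h, u_i, target) - squared_error(w_ij - h, u_i, target)
) / (2 * h)

print(numerical_gradient, -delta_j * u_i)
print(squared_error(w_ij, u_i, target), squared_error(w_new, u_i, target))
```

The two printed gradients should agree closely, and the error after the update should be lower than before it.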


![Gradient of the Error function](figures/error-gradient.png)


Too small weights - stuck in local minima

- we start with random weights and biases

- searching multidimensional spaces results in being caught in local minima less often.

## When do we stop learning?


- We tend to stop learning when the error of an independent validation set increases.

- Every $x$ number of epochs, we test it against this unseen validation set.
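The stopping rule can be sketched independently of any particular network. In the sketch below, `train_one_epoch` and `validation_error` are hypothetical callables standing in for the real training step and the validation-set evaluation; the error values are made up purely for illustration.

```python
def train_with_early_stopping(
    train_one_epoch,
    validation_error,
    epoch_limit: int,
    validation_frequency: int,
):
    """Trains until the validation error increases or the epoch limit is hit.

    Returns a tuple of (epoch at which training stopped, best validation error).
    """
    best_error = float("inf")
    for epoch in range(1, epoch_limit + 1):
        train_one_epoch()
        # Only test against the validation set every `validation_frequency` epochs.
        if epoch % validation_frequency == 0:
            error = validation_error()
            if error > best_error:
                return epoch, best_error
            best_error = error
    return epoch_limit, best_error


# Simulated validation errors (illustrative values only).
simulated_errors = iter([0.5, 0.4, 0.3, 0.35, 0.2])
result = train_with_early_stopping(
    train_one_epoch=lambda: None,
    validation_error=lambda: next(simulated_errors),
    epoch_limit=100,
    validation_frequency=10,
)
print(result)
```

In practice the weights from the best validation point would also be saved and restored when stopping, so that the final network is the one with the lowest validation error.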


### Symmetry in Weight Space

A network with $M$ hidden nodes exhibits symmetry by a factor of $M! \, 2^M$.

For example, if we have 3 hidden nodes, there are $3! \, 2^3 = 48$ equivalent global minima.
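As a quick check of the arithmetic, here is a small sketch; the function name is illustrative. The $M!$ term counts reorderings of the hidden nodes, and the $2^M$ term counts sign flips, which assumes an odd activation function such as tanh.

```python
from math import factorial


def equivalent_minima(hidden_nodes: int) -> int:
    """Number of equivalent weight configurations for M hidden nodes."""
    return factorial(hidden_nodes) * 2 ** hidden_nodes


for m in range(1, 5):
    print(m, equivalent_minima(m))
```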

## Measuring Performance

### Mean Squared Error

$$
MSE = \frac{\sum(O - M)^2}{n}
$$

where:

- $O$ is the observed value
- $M$ is the modelled value
- $n$ is the number of examples


In [None]:
from typing import Callable
import time
import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
def mse(observed: np.ndarray, modelled: np.ndarray):
    """Calculate the Mean Squared Error.
    
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Mean Squared Error for the given arrays.
    """
    return np.sum(np.power(observed - modelled, 2)) / len(observed)

### Root Mean Squared Error

- no upper bound
- for perfect model, RMSE = 0
- expressed in the same units as the observed values

In [None]:
def rmse(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Root Mean Squared Error.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Root Mean Squared Error for the given arrays.
    """
    return np.sqrt(np.sum(np.power(observed - modelled, 2)) / len(observed))
    

### Mean Squared Relative Error

- error relative to the observed value.

In [None]:
def msre(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Mean Squared Relative Error.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Mean Squared Relative Error for the given arrays.
    """
    return np.sum(np.power((modelled - observed) / observed, 2)) / len(observed)

### Coefficient of Efficiency

- +1 represents perfect model
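To make the scale concrete, the sketch below evaluates the coefficient on two illustrative cases: a perfect model, and a model that always predicts the mean of the observed values. The function name and the data are assumptions made for this example.

```python
import numpy as np


def coefficient_of_efficiency(observed: np.ndarray, modelled: np.ndarray) -> float:
    """1 minus the ratio of squared model error to squared deviation from the mean."""
    squared_error = np.sum((modelled - observed) ** 2)
    squared_deviation = np.sum((observed - observed.mean()) ** 2)
    return 1 - squared_error / squared_deviation


observed = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative data

perfect = coefficient_of_efficiency(observed, observed)
mean_only = coefficient_of_efficiency(observed, np.full_like(observed, observed.mean()))

print(perfect, mean_only)
```

A model that is no better than predicting the mean scores 0, and a model worse than that scores below 0.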

In [None]:
def ce(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Coefficient of Efficiency.
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Coefficient of Efficiency for the given arrays.
    """
    return 1 - (np.sum(np.power(modelled - observed, 2)) / np.sum(np.power(observed - np.mean(observed), 2)))

### R-Squared - Coefficient of Determination

- measures how closely the shape of the modelled values coincides with the observed values
- ranges from 0 to 1

In [None]:
def rsqr(observed: np.ndarray, modelled: np.ndarray):
    """Calculates the Coefficient of Determination (R-Squared).
     
    Args:
        observed: Array of the observed values.
        modelled: Array of the modelled values.
        
    Returns:
        The Coefficient of Determination for the given arrays.
    """
    dividend = np.sum(np.multiply(observed - observed.mean(), modelled - modelled.mean()))
    divisor = np.sqrt(np.multiply(np.sum(np.power(observed - observed.mean(), 2)), np.sum(np.power(modelled - modelled.mean(), 2))))
    return np.power(dividend / divisor, 2)

## Number of Hidden Nodes

- It is not fixed
- The general rule of thumb is, having $n$ inputs, to try $\frac{n}{2}$ to $2n$ hidden nodes.
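The rule of thumb translates into a range of candidate sizes to try. This is a small sketch with an illustrative function name; rounding $\frac{n}{2}$ down for odd $n$ is an assumption.

```python
def hidden_node_candidates(n_inputs: int) -> range:
    """Candidate hidden-layer sizes from n/2 (rounded down) to 2n inclusive."""
    return range(max(1, n_inputs // 2), 2 * n_inputs + 1)


print(list(hidden_node_candidates(4)))
```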

## Things to manipulate

- number of hidden nodes
- step size
- activation function (optionally)

In [None]:
class Perceptron():
    """Object representing perceptron with two inputs.

    Attributes:
        e: A training set.
        w0: Bias weight.
        w1: The weight of the first input.
        w2: The weight of the second input.
        epochs: Number of epochs before stabilisation.
    """
    def __init__(self, e, activation_function = lambda s: 1 if s > 0 else -1):
        '''Initialises Perceptron object.'''
        self.w0 = 0
        self.w1 = 0
        self.w2 = 0
        self.e = e
        self.activation_function = activation_function

    def train(self):
        """Trains perceptron."""
        self.epochs = 1
        stable = False
        while not stable:
            stable = True
            for example in self.e:
                print(example)
                if self.classify(example[1], example[2]) == example[3]:
                    pass
                else:
                    self.w0 += example[3] * example[0]
                    self.w1 += example[3] * example[1]
                    self.w2 += example[3] * example[2]
                    stable = False
            if not stable:
                self.epochs += 1

    def classify(self, x1, x2):
        """Classifies an object."""
        s = (self.w1 * x1) + (self.w2 * x2) + self.w0
        return self.activation_function(s)



## Activation Functions

- Every node on hidden layers and output layer has an activation function.



- We might have **linear activation function in the output layer** instead of using sigmoid or tanh.
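A linear activation could be sketched as follows. The `Linear` class is hypothetical (it is not part of this notebook); it only mirrors the `vectorised_func` / `vectorised_der` interface used by the activation classes defined here.

```python
import numpy as np


class Linear:
    """Hypothetical linear (identity) activation function."""

    def __init__(self):
        """Initialises a Linear object with vectorised versions of its methods."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)

    def func(self, x):
        """Returns the input unchanged."""
        return x

    def der(self, x):
        """The derivative of the identity function is 1 everywhere."""
        return 1.0


linear = Linear()
sums = np.array([[-2.0, 0.5, 3.0]])
print(linear.vectorised_func(sums))
print(linear.vectorised_der(sums))
```

With a linear output node the network output is not squashed into a fixed range, which can suit regression targets.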


### Sigmoid Function

$$
f(x) = \frac{1}{1 + e^{-x}}
$$

#### First Order Derivative

We also need the first-order derivative of the function.

$$
f'(x) = f(x)(1 - f(x))
$$

$$
f'(S_j) = u_j (1 - u_j)
$$
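The identity $f'(x) = f(x)(1 - f(x))$ can be checked against a finite-difference estimate. The sketch below uses standalone helper functions and an arbitrary set of sample points, both chosen for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Sigmoid function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x: np.ndarray) -> np.ndarray:
    """Analytical derivative expressed through the function's own output."""
    u = sigmoid(x)
    return u * (1.0 - u)


x = np.linspace(-4.0, 4.0, 9)  # arbitrary sample points
h = 1e-6
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
max_difference = np.max(np.abs(numerical - sigmoid_derivative(x)))

print(max_difference)
```

Expressing the derivative through the output $u_j$ means the backward pass can reuse values already computed in the forward pass.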

#### Delta Values

- $\delta_O = (C - u_O) f'(S_O)$ for the output node $O$, where $C$ is the correct (target) output.
- $\delta_j = w_{j,O} \delta_O f'(S_j)$ for a hidden layer node $j$.
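As a worked illustration of these two formulas, the sketch below computes the delta values for a network with two sigmoid hidden nodes and one sigmoid output node. All numbers are made up for the example.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


# Illustrative values (assumptions, not taken from the notes).
u_hidden = np.array([0.6, 0.3])              # outputs u_j of the hidden nodes
w_hidden_to_output = np.array([0.5, -0.4])   # weights w_{j,O}
bias_output = 0.1
target = 1.0                                 # correct output C

# Forward pass through the output node.
s_output = u_hidden @ w_hidden_to_output + bias_output
u_output = sigmoid(s_output)

# Output node: delta_O = (C - u_O) * f'(S_O), with f'(S_O) = u_O * (1 - u_O).
delta_output = (target - u_output) * u_output * (1.0 - u_output)

# Hidden nodes: delta_j = w_{j,O} * delta_O * f'(S_j), with f'(S_j) = u_j * (1 - u_j).
delta_hidden = w_hidden_to_output * delta_output * u_hidden * (1.0 - u_hidden)

print(delta_output)
print(delta_hidden)
```

Each hidden delta takes the sign of its outgoing weight here because the output delta is positive (the output is below the target).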



#### Notes

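The sigmoid function suffers from the vanishing gradient problem: its derivative is at most 0.25 and approaches 0 for inputs of large magnitude.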


In [None]:
class Sigmoid:
    """Represents Sigmoid activation function.
    """
    
    def __init__(self):
        """Initialises a sigmoid object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
        
    def func(self, x):
        """Calculates output of the Sigmoid function."""
        return 1 / (1 + np.e ** (-x))
    
    def der(self, x):
        """Calculates output of the derivative of the Sigmoid function.
        """
        return self.func(x) * (1 - self.func(x))

### Tanh - Hyperbolic tangent

$$
\tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

#### First Order Derivative

$$
\tanh' x = 1 - \tanh^2 x
$$

$$
f'(S_j) = 1 - u^2_j
$$

In [None]:
class Tanh:
    """Represents tanh activation function.
    """
    
    def __init__(self):
        """Initialises a tanh object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
    
    def func(self, x):
        """Calculates output of the tanh function."""
        return (np.e ** x - np.e ** (-x)) / (np.e ** x + np.e ** (-x))
    
    def der(self, x):
        """Calculates output of the derivative of the tanh function.
        """
        return 1 - self.func(x) ** 2

### ReLU - Rectified Linear Unit

$$
f(x) = \begin{cases}
    x & \text{if $x > 0$}\\
    0 & \text{otherwise}
\end{cases}
$$

#### First Order Derivative

$$
f'(x) = \begin{cases}
    1 & \text{if $x > 0$}\\
    0 & \text{if $x < 0$}
\end{cases}
$$

The derivative of the ReLU function is not defined at $x = 0$. However, when implementing the derivative of ReLU we tend to define $\frac{d}{dx} \text{ReLU}(0) = 0$, instead of returning undefined.
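An array-based variant makes the convention at $x = 0$ explicit. This is a minimal sketch using standalone functions with illustrative names, rather than the class interface used in this notebook.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise ReLU: x where x > 0, otherwise 0."""
    return np.maximum(x, 0.0)


def relu_derivative(x: np.ndarray) -> np.ndarray:
    """Element-wise derivative of ReLU, defined as 0 at x = 0 by convention."""
    return (x > 0).astype(float)


x = np.array([-1.0, 0.0, 2.0])
print(relu(x))
print(relu_derivative(x))
```

Operating on whole arrays avoids a Python-level call per element, which `np.vectorize` does not avoid.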

In [None]:
class Relu:
    """Represents ReLU activation function.
    """
    
    def __init__(self):
        """Initialises a Relu object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
    
    def func(self, x):
        """Calculates output of the ReLU function."""
        if x > 0:
            return x
        else:
            return 0
    
    def der(self, x):
        """Calculates output of the derivative of the ReLU function.
        """
        if x > 0:
            return 1
        else:
            return 0

### Leaky ReLU - Leaky Rectified Linear Unit

$$
f(x) = \begin{cases}
    x & \text{if $x > 0$}\\
    0.01x & \text{otherwise}
\end{cases}
$$

#### First Order Derivative

$$
f'(x) = \begin{cases}
    1 & \text{if $x > 0$}\\
    0.01 & \text{if $x < 0$}
\end{cases}
$$

The derivative of the Leaky ReLU function is not defined at $x = 0$. However, when implementing the derivative of Leaky ReLU we tend to define $\frac{d}{dx} \text{LeakyReLU}(0) = 0.01$, instead of returning undefined.

In [None]:
class LeakyRelu:
    """Represents Leaky ReLU activation function.
    """
    
    def __init__(self):
        """Initialises a LeakyRelu object."""
        self.vectorised_func = np.vectorize(self.func)
        self.vectorised_der = np.vectorize(self.der)
    
    def func(self, x):
        """Calculates output of the Leaky ReLU function."""
        if x > 0:
            return x
        else:
            return 0.01 * x
    
    def der(self, x):
        """Calculates output of the derivative of the Leaky ReLU function.
        """
        if x > 0:
            return 1
        else:
            return 0.01

In [None]:
def destandardise(x: np.ndarray, max_value: float, min_value: float):
    """Destandardises data using minimum and maximum values.
    
    Args:
    x: A numpy.ndarray instance of standardised data.
    max_value: A maximum value for the destandardisation formula.
    min_value: A minimum value for the destandardisation formula.
    
    Returns:
    numpy.ndarray.
    """
    return ((x - 0.1) * (max_value - min_value)) / 0.8 + min_value

In [None]:
class Backpropagation:
    """Backpropagation algorithm for training neural networks.
    
    Attributes:
        neural_network: NeuralNetwork instance being trained.
        epochs: Number of epochs the neural network has gone through during the training.
        previous_validation_error: Previous error (MSE) on the validation set.
        previous_training_error: Previous error (MSE) on the training set.
    """
    
    def __init__(
        self,
        neural_network
    ):
        """Initialises backpropagation object."""
        self.neural_network = neural_network
        self.epochs = 0
        self.previous_validation_error = np.inf
        self.previous_training_error = np.inf
        self.validation_errors = []
        self.training_errors = []
        self.tested_at_epochs = []
    
    def train(
        self,
        training_set: np.ndarray,
        validation_set: np.ndarray,
        validation_frequency: int,
        learning_rate: float,
        epoch_limit: int
    ):
        """Trains NeuralNetwork instance using Backpropagation.
        
        Args:
        training_set: Set that instance is trained on.
        validation_set: Set the instance is tested on during training.
        validation_frequency: Frequency of testing on validation set.
            Expressed in epochs.
        learning_rate: Learning rate.
        epoch_limit: The maximum number of epochs to train the instance for.
        """
        # Loop through the training set.
        for i in range(epoch_limit):
            # Annealing
            #learning_rate = self.simulated_annealing(
            #   start_rate=learning_rate,
            #   end_rate=0.01,
            #    epoch_limit=epoch_limit,
            #   epochs_passed=self.epochs
            #)
            self.epochs = i + 1
            for training_example in training_set:
                # Split individual examples into inputs (item) and label (c).
                item, c = np.hsplit(training_example, [training_set.shape[1] - 1])
                item = item.reshape(1, -1)
                c = c.reshape(1, -1)
                self.forward_pass(item)
                self.backward_pass(c, self.epochs, learning_rate)
                self.update_weights(item, learning_rate)
            # Bold driver.
            #if (i + 1) % 1000 == 0:
            #   # Test against the training set.
            #    observed, predicted = self.neural_network.test(training_set)
            #    current_training_error = mse(observed, predicted)
            #    learning_rate = self.bold_driver(
            #        current_training_error,
            #        learning_rate
            #    )
            #   self.previous_training_error = current_training_error
            # Test on validation set.
            if (i + 1) % validation_frequency == 0:
                observed_test, predicted_test = self.neural_network.test(training_set)
                current_test_error = mse(observed_test, predicted_test)
                self.training_errors.append(current_test_error)
                
                observed, predicted = self.neural_network.test(validation_set)
                current_validation_error = mse(observed, predicted)
                self.validation_errors.append(current_validation_error)
                self.tested_at_epochs.append(self.epochs)
                print(f"Current Validation Error: {current_validation_error}")
                if self.previous_validation_error < current_validation_error:
                    for layer in self.neural_network.layers:
                        layer.restore_weights_and_biases()
                    break
                else:
                    self.previous_validation_error = current_validation_error
                    for layer in self.neural_network.layers:
                        layer.save_weights_and_biases()
        
    
    def forward_pass(self, inputs):
        """Performs forward pass through the network.
        
        Args:
        inputs: Vector of values representing a training example.
        """   
        for i, layer in enumerate(self.neural_network.layers):
            if i == 0:
                layer.forward_pass(inputs)
            else:
                previous_layer = self.neural_network.layers[i-1]
                layer.forward_pass(previous_layer.output)
                
        
    def backward_pass(self, c, epochs_passed, learning_rate):
        """Performs backward pass through the network.

        Args:
        c: The label for the training example.
        """
        reversed_layers = list(reversed(self.neural_network.layers))
        for i, layer in enumerate(reversed_layers):
            if i == 0:
                # output layer backward pass
                layer.delta = np.multiply(
                    c - layer.output,
                    layer.activation_function.vectorised_der(layer.sum)
                )
            else:
                # hidden layer backward pass
                next_layer = reversed_layers[i-1]
                layer.delta = np.multiply(
                    np.dot(next_layer.weights, next_layer.delta.T).T,
                    layer.activation_function.vectorised_der(layer.sum)
                )
            
    def update_weights(
        self,
        inputs: np.ndarray,
        learning_rate: float
    ):
        """Updates weights in the network.
        
        Args:
        inputs: Inputs to the network.
        learning_rate: Learning rate.
        """
        for i, layer in enumerate(self.neural_network.layers):
            #previous_weights = layer.weights.copy()
            #previous_biases = layer.biases.copy()
            if i == 0:
                layer.weights = layer.weights + learning_rate * np.dot(inputs.T, layer.delta)
            else:
                previous_layer = self.neural_network.layers[i-1]
                layer.weights = layer.weights + learning_rate * np.dot(
                    previous_layer.output.T,
                    layer.delta
                )
            layer.biases = layer.biases + learning_rate * layer.delta
            #weights_delta = layer.weights - previous_weights
            #biases_delta = layer.biases - previous_biases
            #layer.weights = layer.weights + 0.9 * weights_delta
            #layer.biases = layer.biases + 0.9 * biases_delta
            
    def bold_driver(
        self,
        current_training_error: float,
        learning_rate: float
    ):
        """Implements the Bold Driver extension."""
        if ((current_training_error / self.previous_training_error) * 100) >= 104:
            # Decrease the learning rate if the error has increased by 4% or more.
            learning_rate = learning_rate * 0.7
            for layer in self.neural_network.layers:
                layer.restore_weights_and_biases()
            print("Decreased")
        elif ((current_training_error / self.previous_training_error) * 100) <= 96:
            # Increase the learning rate if the error has decreased.
            learning_rate = learning_rate * 1.05
            print("Increased")
        # Clamp the learning rate to the range [0.01, 0.5].
        if learning_rate < 0.01:
            learning_rate = 0.01
        elif learning_rate > 0.5:
            learning_rate = 0.5
        return learning_rate
                
            
    def simulated_annealing(
        self,
        start_rate: float,
        end_rate: float,
        epoch_limit: int,
        epochs_passed: int
    ) -> float:
        """Returns annealed value of the learning rate.

        Args:
            start_rate: Initial learning rate value.
            end_rate: Final learning rate value.
            epoch_limit: Limit of epochs.
            epochs_passed: Number of epochs that have elapsed.

        Returns:
            A float representing annealed learning rate.
        """
        divisor = 1 + np.e ** (10 - (20 * epochs_passed) /  epoch_limit)
        return end_rate + (start_rate - end_rate) * (1 - (1 / divisor))
    
    def weight_decay(
        self,
        epochs_passed: int,
        learning_rate: float
    ) -> float:
        """Calculates the penalty term for the error function.
        
        Args:
            epochs_passed: Number of epochs that have elapsed.
            learning_rate: Learning rate value.

        Returns:
            The penalty term for the error function.
        """
        weights_and_biases_sum = 0
        n = 0
        for layer in self.neural_network.layers:
            weights_and_biases_sum += np.sum(
                np.power(
                    np.vstack((layer.weights, layer.biases)),
                    2
                )
            )
            n += layer.weights.shape[0] * layer.weights.shape[1] + layer.biases.shape[1]
        omega = (1 / (2 * n)) * weights_and_biases_sum
        regularisation_parameter = 1 / (learning_rate * epochs_passed)
        return regularisation_parameter * omega

In [None]:
"""Contains definition of a layer of a neural network.
"""

np.random.seed(0)


class Layer:
    """Layer of a neural network.
    
    Attributes:
        weights: set of weights of the layer.
        biases: set of biases of the layer.
        activation_function: Activation Function of the layer.
        number_of_neurons: Number of neurons on the layer.
        output: the most recent output of the layer.
        saved_weights: Weights saved at previous validation point.
        saved_biases: Biases saved at previous validation point.
        delta: Delta values for neurons on the layer.
    """
    def __init__(self,
                 number_of_inputs: int,
                 number_of_neurons: int,
                 activation_function
                ):
        """Initialises a Layer instance."""
        self.number_of_neurons = number_of_neurons
        self.activation_function = activation_function
        random_generator = np.random.default_rng(5)
        low = -2 / number_of_inputs
        high = 2 / number_of_inputs
        self.weights = random_generator.uniform(
            low=low,
            high=high,
            size=(number_of_inputs, number_of_neurons)
        )
        self.saved_weights = self.weights.copy()
        self.biases = random_generator.uniform(
            low=low,
            high=high,
            size=(1, number_of_neurons)
        )
        self.saved_biases = self.biases.copy()
        self.delta = np.nan
    
    def forward_pass(self, inputs: np.ndarray):
        """Does the forward pass through the layer.
        
        Args:
            inputs: Inputs to the layer.
        """
        self.sum = np.dot(inputs, self.weights) + self.biases
        self.output = self.activation_function.vectorised_func(self.sum)
            
    def save_weights_and_biases(self):
        """Saves current weights and biases to keep them after updating.
        """
        self.saved_weights = self.weights.copy()
        self.saved_biases = self.biases.copy()
        
    def restore_weights_and_biases(self):
        """Restores saved weights and biases.
        """
        # Copy so that later in-place changes cannot alter the saved state.
        self.weights = self.saved_weights.copy()
        self.biases = self.saved_biases.copy()

In [None]:
"""Contains NeuralNetwork class definition.

Run after running the Data Preprocessing notebook.
"""

class NeuralNetwork:
    """A neural network with single hidden layer and single node on the output layer.
    
    Attributes:
        number_of_inputs: Number of inputs to the network.
        layers: List of network's hidden layers and the output layer.
    """
    
    def __init__(
        self,
        number_of_inputs: int,
        network_architecture,
    ):
        """Initialises a NeuralNetwork instance.
        """
        self.number_of_inputs = number_of_inputs
        self.layers = []
        for i, item in enumerate(network_architecture):
            if i == 0:
                number_of_layer_inputs = number_of_inputs
            else:
                number_of_layer_inputs = self.layers[-1].number_of_neurons
            layer = Layer(
                number_of_inputs=number_of_layer_inputs,
                number_of_neurons=item[0],
                activation_function=item[1]
            )
            self.layers.append(layer)
    
    def train(
        self,
        training_set: np.ndarray,
        validation_set: np.ndarray,
        validation_frequency: int,
        learning_rate: float,
        epoch_limit: int,
        training_algorithm
    ):
        """Trains NeuralNetwork instance.
        
        Args:
        training_set: Set that instance is trained on.
        validation_set: Set the instance is tested on during training.
        validation_frequency: Frequency of testing on validation set.
            Expressed in epochs.
        learning_rate: Learning rate.
        epoch_limit: The maximum number of epochs to train the instance for.
        """
        training_algorithm.train(
            training_set=training_set,
            validation_set=validation_set,
            validation_frequency=validation_frequency,
            learning_rate=learning_rate,
            epoch_limit=epoch_limit
        )   
                    
    def test(self, test_set) -> list:
        """Tests the neural network and returns both observed and modelled values.
        
        Args:
            test_set: Array of test examples (including labels).
        Returns: two-element list with observed values being the first
            element and modelled being the second.
        
        """
        predicted_values = np.empty(shape=(test_set.shape[0], 1))
        correct_values = test_set[:, test_set.shape[1] - 1]
        correct_values = correct_values.reshape(-1, 1)
        for i in range(len(test_set)):
            # Split individual examples into inputs (item) and label (c).
            item, c = np.hsplit(test_set[i], [test_set.shape[1] - 1])
            item = item.reshape(1, -1)
            predicted_values[i] = self.predict(item)
        correct_values = destandardise(correct_values, max_value=max_value, min_value=min_value)
        predicted_values = destandardise(predicted_values, max_value=max_value, min_value=min_value)
        return [correct_values, predicted_values]
       
    def save_network(self, file_path: str):
        """Saves the network to the file in JSON format.
        
        Args:
        file_path: File path to the target file.
        """
        layers_details = []
        for layer in self.layers:
            layer_details = {
                "weights": layer.weights.tolist(),
                "biases": layer.biases.tolist(),
                "activation_function": layer.activation_function.__class__.__name__
            }
            layers_details.append(layer_details)

        neural_network = {
            "layers": layers_details
        }
        try:
            with open(file_path, "w") as f:
                json.dump(neural_network, f)
        except IOError as e:
            print(e)
        except Exception as e:
            print(f"Unexpected exception: {e}")
            
    @classmethod
    def load_network(cls, file_path: str):
        """Returns the network loaded from the file in JSON format.
        
        Args:
        file_path: File path to the target file.
        
        Returns:
        NeuralNetwork instance.
        """
        try:
            with open(file_path, "r") as f:
                nn_details = json.load(f)
            layers = []
            for layer_details in nn_details['layers']:
                weights = np.array(layer_details['weights'])
                biases = np.array(layer_details['biases'])
                layer = Layer(
                    number_of_inputs=weights.shape[0],
                    number_of_neurons=weights.shape[1],
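                    # Look the class up by name instead of calling eval on file contents.
                    activation_function={
                        "Sigmoid": Sigmoid,
                        "Tanh": Tanh,
                        "Relu": Relu,
                        "LeakyRelu": LeakyRelu
                    }[layer_details['activation_function']]()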
                )
                layer.weights = weights
                layer.biases = biases
                layers.append(layer)
        except IOError as e:
            print(e)
            return None
        except Exception as e:
            print(f"Unexpected exception: {e}")
            return None
        architecture = []
        for layer in layers:
            layer_details = tuple([layer.weights.shape[1], layer.activation_function])
            architecture.append(layer_details)
        neural_network = cls(
            number_of_inputs=layers[0].weights.shape[0],
            network_architecture=architecture
        )
        neural_network.layers = layers
        return neural_network
                
    
    def predict(self, inputs):
        """Predicts value for given predictor values.
        """
        for i, layer in enumerate(self.layers):
            if i == 0:
                layer.forward_pass(inputs)
            else:
                previous_layer = self.layers[i-1]
                layer.forward_pass(previous_layer.output)
        return self.layers[-1].output

In [None]:
# Load min and max values for the destandardisation process.
with open("standardisation.json", "r") as f:
    min_max_values = json.load(f)

min_value = min_max_values["min"]
max_value = min_max_values["max"]

In [None]:
# Train.

training_set = pd.read_csv("data/training-set.csv")
training_set = training_set.to_numpy() # Convert to a numpy array.
training_set = training_set[:, 1:] # Get rid of the index column.

validation_set = pd.read_csv("data/validation-set.csv")
validation_set = validation_set.to_numpy() # Convert to a numpy array.
validation_set = validation_set[:, 1:] # Get rid of the index column.

test_set = pd.read_csv("data/test-set.csv")
test_set = test_set.to_numpy() # Convert to a numpy array.
test_set = test_set[:, 1:] # Get rid of the index column.


neural_network = NeuralNetwork(
    number_of_inputs=training_set.shape[1]-1,
    network_architecture=[[6, Sigmoid()], [1, Sigmoid()]]
)

backpropagation = Backpropagation(
    neural_network=neural_network
)
    
start_time = time.perf_counter()
neural_network.train(
    training_set=training_set,
    validation_set=validation_set,
    validation_frequency=10,
    learning_rate=0.2,
    epoch_limit=10000,
    training_algorithm=backpropagation
)
end_time = time.perf_counter()
#print(f"Training time: {end_time - start_time} seconds")
#print(f"Smallest Validation Error: {backpropagation.previous_validation_error}")
#print(f"Number of epochs: {backpropagation.epochs}")
#observed, predicted = neural_network.test(test_set)
#print(f"{mse(observed, predicted)},")
#print(f"{rmse(observed, predicted)},")
#print(f"{msre(observed, predicted)},")
#print(f"{ce(observed, predicted)},")
#print(f"{rsqr(observed, predicted)}\n")

#neural_network.save_network("sigmoid-6-0.2-sa.json")
    
#neural_network2 = NeuralNetwork.load_network("sigmoid-6-0.2-sa.json")
    
#print(f"Hidden weights are equal? {np.array_equal(neural_network.hidden_layer.weights, neural_network2.hidden_layer.weights)}")
#print(f"Hidden biases are equal? {np.array_equal(neural_network.hidden_layer.biases, neural_network2.hidden_layer.biases)}")
#print(f"Output weights are equal? {np.array_equal(neural_network.output_layer.weights, neural_network2.output_layer.weights)}")
#print(f"Output biases are equal? {np.array_equal(neural_network.output_layer.biases, neural_network2.output_layer.biases)}")

#print(f"{neural_network.activation_function} = {neural_network2.activation_function}")




In [None]:
%%script false --no-raise-error

with open("sigmoid-6-0.2-sa.csv", "w") as f:
    # Read the configuration from the objects that actually hold it.
    f.write(f"{neural_network.layers[0].activation_function.__class__.__name__},")
    f.write(f"{neural_network.layers[0].number_of_neurons},")
    f.write(f"{0.2},")
    f.write(f"{end_time - start_time},")
    f.write(f"{backpropagation.epochs},")
    f.write(f"{backpropagation.previous_validation_error},")
    observed, predicted = neural_network.test(test_set)
    f.write(f"{mse(observed, predicted)},")
    f.write(f"{rmse(observed, predicted)},")
    f.write(f"{msre(observed, predicted)},")
    f.write(f"{ce(observed, predicted)},")
    f.write(f"{rsqr(observed, predicted)}\n")


with open("neural-network-configs.csv", "a") as f:
    activation_functions = [Sigmoid(), Tanh(), Relu(), LeakyRelu()]
    hidden_nodes = range((training_set.shape[1] - 1) // 2, 2 * training_set.shape[1] - 1)
    learning_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    for activation_function in activation_functions:
        for n_hidden_nodes in hidden_nodes:
            for learning_rate in learning_rates:
                neural_network = NeuralNetwork(
                    number_of_inputs=training_set.shape[1]-1,
                    network_architecture=[[n_hidden_nodes, activation_function], [1, activation_function]]
                )
                # Each configuration needs its own training algorithm instance.
                backpropagation = Backpropagation(
                    neural_network=neural_network
                )
                start_time = time.perf_counter()
                neural_network.train(
                    training_set=training_set,
                    validation_set=validation_set,
                    validation_frequency=10,
                    learning_rate=learning_rate,
                    epoch_limit=500,
                    training_algorithm=backpropagation
                )
                end_time = time.perf_counter()
                f.write(f"{activation_function.__class__.__name__},")
                f.write(f"{n_hidden_nodes},")
                f.write(f"{learning_rate},")
                f.write(f"{end_time - start_time},")
                f.write(f"{backpropagation.epochs},")
                f.write(f"{backpropagation.previous_validation_error},")
                observed, predicted = neural_network.test(validation_set)
                f.write(f"{mse(observed, predicted)},")
                f.write(f"{rmse(observed, predicted)},")
                f.write(f"{msre(observed, predicted)},")
                f.write(f"{ce(observed, predicted)},")
                f.write(f"{rsqr(observed, predicted)}\n")

with open("neural-network-configs-selected-validation-set.csv", "w") as f:
    configs = [
        [Sigmoid(), 5, 0.2],
        [Sigmoid(), 6, 0.2],
        [LeakyRelu(), 14, 0.6],
        [LeakyRelu(), 14, 0.2],
        [Relu(), 9, 0.2],
        [Tanh(), 15, 0.1]
    ]
    for config in configs:
        neural_network = NeuralNetwork(
            number_of_inputs=training_set.shape[1]-1,
            network_architecture=[[config[1], config[0]], [1, config[0]]]
        )
        backpropagation = Backpropagation(
            neural_network=neural_network
        )
        start_time = time.perf_counter()
        neural_network.train(
            training_set=training_set,
            validation_set=validation_set,
            validation_frequency=10,
            learning_rate=config[2],
            epoch_limit=100000,
            training_algorithm=backpropagation
        )
        end_time = time.perf_counter()
        f.write(f"{config[0].__class__.__name__},")
        f.write(f"{config[1]},")
        f.write(f"{config[2]},")
        f.write(f"{end_time - start_time},")
        f.write(f"{backpropagation.epochs},")
        f.write(f"{backpropagation.previous_validation_error},")
        observed, predicted = neural_network.test(validation_set)
        f.write(f"{mse(observed, predicted)},")
        f.write(f"{rmse(observed, predicted)},")
        f.write(f"{msre(observed, predicted)},")
        f.write(f"{ce(observed, predicted)},")
        f.write(f"{rsqr(observed, predicted)}\n")

In [None]:
%%script false --no-raise-error
df = pd.DataFrame({
   'Training Error': backpropagation.training_errors,
   'Validation Error': backpropagation.validation_errors
   }, index = backpropagation.tested_at_epochs)
lines = df.plot(kind="line", xlabel="Epoch", ylabel="MSE")
plt.savefig('figures/training-validation-error.png')

## Testing

## Batch Learning

Training in batches may improve efficiency. Showing all samples at once can cause overfitting, which makes the network bad at generalising.

Typical batch size: 32
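Splitting a data set into batches can be sketched as below. The helper name, the shuffling step and the fixed seed are assumptions made for this example; the data is a dummy array of 100 rows.

```python
import numpy as np


def iterate_minibatches(data: np.ndarray, batch_size: int = 32, seed: int = 0):
    """Yields shuffled batches of rows; the last batch may be smaller."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[indices[start:start + batch_size]]


data = np.arange(100).reshape(100, 1)  # dummy data set with 100 examples
batch_sizes = [len(batch) for batch in iterate_minibatches(data)]
print(batch_sizes)
```

Every example appears in exactly one batch per pass, so one full iteration over the generator corresponds to one epoch.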



In [None]:
inputs = [[1, 2, 3, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

weights = [[0.2, 0.8, -0.5, 1.0],
          [0.5, -.91, 0.26, -0.5],
          [-0.26, -.27, 0.17, 0.87]]

biases = [2, 3, 0.5]


weights2 = [[0.1, -0.14, 0.5],
          [-0.5, 0.12, -0.33],
          [-0.44, 0.73, -0.13]]

biases2 = [-1, 2, -0.5]

layer1_outputs = np.dot(inputs, np.array(weights).T) + biases

layer2_outputs = np.dot(layer1_outputs, np.array(weights2).T) + biases2

print(layer2_outputs)

## Feature Data Set

The feature data set is usually denoted with `X`.

Labels are usually denoted with `y`.
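A minimal sketch of this convention, assuming the label is stored in the last column of each row (as in the training loop of this notebook); the numbers are illustrative only.

```python
import numpy as np

# Each row is one example: two features followed by its label (made-up values).
data = np.array([
    [0.1, 0.2, 1.0],
    [0.3, 0.4, 0.0],
    [0.5, 0.6, 1.0],
])

X = data[:, :-1]  # features: every column except the last
y = data[:, -1]   # labels: the last column

print(X.shape, y.shape)
```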