# Neural Network

> Back Propagation Implementation - December 2024
>
> NEC First Assignment - Universitat Rovira i Virgili
>
> *Andrea Pujals Bocero*

## Workflow
1. Initialization
2. Feed-forward propagation
3. Error back-propagation
4. Update of weights and thresholds
5. Training loop
6. Evaluation and visualization

In [1]:
import numpy as np
import matplotlib.pyplot as plt

### 1. Initialization
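The class constructor stores the hyperparameters and allocates the network state. Weights and thresholds are drawn from a standard normal distribution so that units in the same layer start out different (symmetry breaking). An alternative that scales the weights by 0.01 and starts the thresholds at zero is left commented out. Fields, activations, deltas, and update buffers are initialized to zero.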


In [4]:
class NeuralNet:
    def __init__(self, layers, epochs, learning_rate, momentum, fact, val_split):
        self.L = len(layers)  # Number of layers
        self.n = layers       # Number of units in each layer
        self.epochs = epochs  # Number of training epochs
        self.lr = learning_rate  # Learning rate
        self.momentum = momentum  # Momentum term
        self.fact = fact  # Name of the activation function: 'sigmoid', 'relu', 'tanh' or 'linear'
        self.val_split = val_split  # Fraction of the data held out for validation (e.g. 0.2)
        
        # Initialize activations, weights, thresholds, and other variables
        self.h = [np.zeros(n) for n in layers]  # Fields
        self.xi = [np.zeros(n) for n in layers]  # Activations
        self.w = [None] + [np.random.randn(layers[i], layers[i-1]) for i in range(1, self.L)]
        self.theta = [None] + [np.random.randn(layers[i]) for i in range(1, self.L)]
        #self.w = [None] + [np.random.randn(layers[i], layers[i-1]) * 0.01 for i in range(1, self.L)]  # Weights
        #self.theta = [np.zeros(n) for n in layers]  # Thresholds
        self.delta = [np.zeros(n) for n in layers]  # Propagated errors
        self.d_w = [None] + [np.zeros_like(w) for w in self.w[1:]]  # Weight updates
        self.d_theta = [np.zeros(n) for n in layers]  # Threshold updates
        self.d_w_prev = [None] + [np.zeros_like(w) for w in self.w[1:]]  # Previous weight updates
        self.d_theta_prev = [np.zeros(n) for n in layers]  # Previous threshold updates
        
        # Loss tracking
        self.train_losses = []
        self.val_losses = []
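To make the indexing convention concrete, here is a minimal standalone sketch of the parameter shapes the constructor produces. The architecture `[4, 9, 5, 1]` is borrowed from the usage example at the end, and the seeded generator `rng` is an assumption added for reproducibility (the class itself calls `np.random.randn`).

```python
import numpy as np

# Architecture taken from the usage example: 4 inputs, two hidden layers, 1 output
layers = [4, 9, 5, 1]
rng = np.random.default_rng(0)  # assumption: seeded generator, not used by the class

# Index 0 is a placeholder so that w[l] and theta[l] belong to layer l
w = [None] + [rng.standard_normal((layers[l], layers[l - 1])) for l in range(1, len(layers))]
theta = [None] + [rng.standard_normal(layers[l]) for l in range(1, len(layers))]

weight_shapes = [w_l.shape for w_l in w[1:]]
threshold_shapes = [t_l.shape for t_l in theta[1:]]
print(weight_shapes)     # [(9, 4), (5, 9), (1, 5)]
print(threshold_shapes)  # [(9,), (5,), (1,)]
```

Each weight matrix has one row per unit of its own layer and one column per unit of the previous layer, which is what lets the forward pass compute a field as a matrix-vector product.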

Activation function method: this method applies the chosen activation function to the field h and returns the output of the units in a layer.
Activation functions introduce non-linearity, which lets the network model relationships that a purely linear map cannot represent.

Sigmoid function:

$ g(h) = \frac{1}{1 + e^{-h}} $

ReLU function:

$ g(h) = \max(0, h) $

Tanh function:

$ g(h) = \tanh(h) $

Linear function:

$ g(h) = h $

In [5]:
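    def activation(self, h):
        # Apply the activation function selected by self.fact to the field h
        if self.fact == 'sigmoid':
            return 1 / (1 + np.exp(-h))
        elif self.fact == 'relu':
            return np.maximum(0, h)
        elif self.fact == 'tanh':
            return np.tanh(h)
        elif self.fact == 'linear':
            return h
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown activation function: {self.fact}")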
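The four functions can be compared side by side on a few sample fields. The sketch below is a standalone version (the free function `activation(h, fact)` is a hypothetical stand-in for the method, written so the block runs without the class).

```python
import numpy as np

def activation(h, fact):
    """Standalone stand-in for the class method (hypothetical helper)."""
    if fact == 'sigmoid':
        return 1 / (1 + np.exp(-h))
    if fact == 'relu':
        return np.maximum(0, h)
    if fact == 'tanh':
        return np.tanh(h)
    if fact == 'linear':
        return h
    raise ValueError(f"Unknown activation function: {fact}")

h = np.array([-2.0, 0.0, 2.0])
outputs = {fact: activation(h, fact) for fact in ('sigmoid', 'relu', 'tanh', 'linear')}
for fact, out in outputs.items():
    print(fact, out)
```

The sigmoid output stays inside (0, 1) and tanh inside (-1, 1), while ReLU and the linear function are unbounded above. This matters when choosing the activation for the output layer, because the targets have to lie in the range the output units can reach.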

The derivative of the activation function is needed during back-propagation, where it scales the error signal of each unit when computing the gradient of the loss with respect to the weights and thresholds:

Derivative of sigmoid function:

$ g'(h) = g(h) \cdot (1 - g(h)) $

Derivative of ReLU function:

$ g'(h) =
\begin{cases} 
1 & \text{if } h > 0 \\
0 & \text{otherwise}
\end{cases}
$

Derivative of tanh function:

$ g'(h) = 1 - \tanh^2(h) $

Derivative of linear function:

$ g'(h) = 1 $





In [6]:
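    def activation_derivative(self, h):
        # Compute the derivative g'(h) of the activation function selected by self.fact
        if self.fact == 'sigmoid':
            act = 1 / (1 + np.exp(-h))
            return act * (1 - act)
        elif self.fact == 'relu':
            # The derivative at h = 0 is taken to be 0
            return np.where(h > 0, 1.0, 0.0)
        elif self.fact == 'tanh':
            return 1 - np.tanh(h) ** 2
        elif self.fact == 'linear':
            return np.ones_like(h)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown activation function: {self.fact}")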
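One way to sanity-check the derivative formulas is to compare them with a central finite difference. The sketch below does this with standalone helpers `g` and `g_prime` (hypothetical names mirroring the two methods). The test points avoid h = 0, where ReLU is not differentiable.

```python
import numpy as np

def g(h, fact):
    """Activation function (standalone mirror of the method)."""
    return {'sigmoid': lambda v: 1 / (1 + np.exp(-v)),
            'relu': lambda v: np.maximum(0, v),
            'tanh': np.tanh,
            'linear': lambda v: v}[fact](h)

def g_prime(h, fact):
    """Analytic derivative (standalone mirror of the method)."""
    if fact == 'sigmoid':
        return g(h, fact) * (1 - g(h, fact))
    if fact == 'relu':
        return np.where(h > 0, 1.0, 0.0)
    if fact == 'tanh':
        return 1 - np.tanh(h) ** 2
    return np.ones_like(h)

h = np.array([-1.5, -0.3, 0.7, 2.0])  # no point at h = 0
eps = 1e-6
max_err = {}
for fact in ('sigmoid', 'relu', 'tanh', 'linear'):
    numeric = (g(h + eps, fact) - g(h - eps, fact)) / (2 * eps)
    max_err[fact] = np.max(np.abs(numeric - g_prime(h, fact)))
    print(fact, max_err[fact])
```

All four discrepancies should be at the level of floating-point noise. A large value would point to a mismatch between an activation and its derivative.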

### 2. Feed-forward propagation
The pre-activation field of each unit is the weighted sum of the previous layer's activations minus the unit's threshold:

$h^{(\ell)}_i = \sum_j w^{(\ell)}_{ij} \cdot \xi^{(\ell-1)}_j - \theta^{(\ell)}_i$

The activation of each unit is obtained by applying the activation function $g$ to its field:

$ \xi^{(\ell)}_i = g(h^{(\ell)}_i) $

In [7]:
    def forward(self, X):
        # Forward-propagate a single input pattern X (1-D array of length n[0])
        self.xi[0] = X  # Input layer activations
        for l in range(1, self.L):
            # Field: weighted sum of the previous activations minus the threshold
            self.h[l] = np.dot(self.w[l], self.xi[l - 1]) - self.theta[l]
            self.xi[l] = self.activation(self.h[l])
        return self.xi[-1]  # Output layer activations
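A forward pass on a tiny network can be checked by hand. The following sketch uses a hypothetical 2-2-1 sigmoid network whose weights and thresholds were chosen so that the arithmetic is easy to follow. None of these numbers come from the assignment.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Hypothetical parameters; index 0 is a placeholder, as in the class
w = [None, np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([[1.0, 1.0]])]
theta = [None, np.array([0.0, 0.5]), np.array([1.0])]
x = np.array([1.0, 2.0])

h1 = w[1] @ x - theta[1]    # [1 - 2 - 0, 0.5 + 1 - 0.5] = [-1, 1]
xi1 = sigmoid(h1)           # sigmoid(-1) and sigmoid(1) sum to 1
h2 = w[2] @ xi1 - theta[2]  # 1 - 1 = 0
output = sigmoid(h2)        # sigmoid(0) = 0.5
print(h1, output)
```

The threshold is subtracted from the weighted sum, so it plays the role of a negated bias in the more common `w @ x + b` convention.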

### 3. Error back-propagation
The error for the output layer $L$ compares the network output $o_i(x)$ with the target $z_i$:

$ \Delta^{(L)}_i = g'(h^{(L)}_i) \cdot (o_i(x) - z_i) $

For each hidden layer ℓ, the error is propagated backward:

$ \Delta^{(\ell-1)}_j = g'(h^{(\ell-1)}_j) \cdot \sum_i \Delta^{(\ell)}_i \cdot w^{(\ell)}_{ij} $




In [8]:
    def backward(self, y_true):
        # Back-propagate the error for the pattern last passed to forward()

        # Delta of the output layer
        self.delta[-1] = (self.xi[-1] - y_true) * self.activation_derivative(self.h[-1])
        # Propagate the deltas backward through the hidden layers
        for l in range(self.L - 2, 0, -1):
            self.delta[l] = np.dot(self.w[l + 1].T, self.delta[l + 1]) * self.activation_derivative(self.h[l])
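The delta recursion can be verified numerically with a gradient check. The sketch below rebuilds the forward and backward formulas for a hypothetical 2-3-1 sigmoid network with random parameters, and compares the resulting gradient with respect to the first-layer weights (the outer product of the deltas and the previous activations) against central finite differences of the per-pattern error $\frac{1}{2}\sum_i (o_i - z_i)^2$. All names and values here are assumptions made for illustration.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def forward(w, theta, x):
    """Return the fields h and activations xi of every layer (index 0 = input)."""
    h, xi = [None], [x]
    for l in range(1, len(w)):
        h.append(w[l] @ xi[l - 1] - theta[l])
        xi.append(sigmoid(h[l]))
    return h, xi

def error(w, theta, x, z):
    """Quadratic error for a single pattern."""
    return 0.5 * np.sum((forward(w, theta, x)[1][-1] - z) ** 2)

rng = np.random.default_rng(0)
layers = [2, 3, 1]
w = [None] + [rng.standard_normal((layers[l], layers[l - 1])) for l in range(1, 3)]
theta = [None] + [rng.standard_normal(layers[l]) for l in range(1, 3)]
x, z = np.array([0.2, -0.4]), np.array([0.7])

# Back-propagation: output delta first, then the hidden-layer delta
h, xi = forward(w, theta, x)
delta2 = xi[2] * (1 - xi[2]) * (xi[2] - z)
delta1 = xi[1] * (1 - xi[1]) * (w[2].T @ delta2)
analytic = np.outer(delta1, xi[0])  # dE/dw for the first weight layer

# Central finite differences, one weight at a time
numeric = np.zeros_like(w[1])
eps = 1e-6
for i in range(w[1].shape[0]):
    for j in range(w[1].shape[1]):
        w[1][i, j] += eps
        e_plus = error(w, theta, x, z)
        w[1][i, j] -= 2 * eps
        e_minus = error(w, theta, x, z)
        w[1][i, j] += eps  # restore the original weight
        numeric[i, j] = (e_plus - e_minus) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

A difference close to machine precision indicates that the recursion and the transpose in `w[2].T @ delta2` are consistent with the error being minimized.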

### 4. Update of weights and thresholds
The weights are updated using the delta rule with momentum:

$ \delta w^{(\ell)}_{ij} = -\eta \cdot \Delta^{(\ell)}_i \cdot \xi^{(\ell-1)}_j + \alpha \cdot \delta w^{(\ell)}_{ij, \text{prev}} $

$ w^{(\ell)}_{ij} \to w^{(\ell)}_{ij} + \delta w^{(\ell)}_{ij} $

Thresholds are updated similarly, with a positive sign on the gradient term:

$ \delta \theta^{(\ell)}_i = \eta \cdot \Delta^{(\ell)}_i + \alpha \cdot \delta \theta^{(\ell)}_{i, \text{prev}} $

$ \theta^{(\ell)}_i \to \theta^{(\ell)}_i + \delta \theta^{(\ell)}_i $
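
The opposite signs of the two gradient terms follow from the chain rule. As a short sketch, assume the per-pattern error $E = \frac{1}{2} \sum_i (o_i(x) - z_i)^2$ and read $\Delta^{(\ell)}_i$ as $\partial E / \partial h^{(\ell)}_i$, which is consistent with the definitions in step 3. Since $h^{(\ell)}_i = \sum_j w^{(\ell)}_{ij} \cdot \xi^{(\ell-1)}_j - \theta^{(\ell)}_i$:

$ \frac{\partial E}{\partial w^{(\ell)}_{ij}} = \Delta^{(\ell)}_i \cdot \frac{\partial h^{(\ell)}_i}{\partial w^{(\ell)}_{ij}} = \Delta^{(\ell)}_i \cdot \xi^{(\ell-1)}_j $

$ \frac{\partial E}{\partial \theta^{(\ell)}_i} = \Delta^{(\ell)}_i \cdot \frac{\partial h^{(\ell)}_i}{\partial \theta^{(\ell)}_i} = -\Delta^{(\ell)}_i $

Gradient descent changes each parameter by $-\eta$ times its derivative, which gives $-\eta \cdot \Delta^{(\ell)}_i \cdot \xi^{(\ell-1)}_j$ for the weights and $+\eta \cdot \Delta^{(\ell)}_i$ for the thresholds. The momentum term then adds $\alpha$ times the previous update in both cases.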



In [9]:
    def update_weights_thresholds(self):
        # Update weights and thresholds using momentum
        for l in range(1, self.L):
            self.d_w[l] = -self.lr * np.outer(self.delta[l], self.xi[l - 1]) + self.momentum * self.d_w_prev[l]
            self.w[l] += self.d_w[l]
            self.d_w_prev[l] = self.d_w[l]

            self.d_theta[l] = self.lr * self.delta[l] + self.momentum * self.d_theta_prev[l]
            self.theta[l] += self.d_theta[l]
            self.d_theta_prev[l] = self.d_theta[l]
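The effect of the momentum term is easiest to see on a one-parameter problem. The sketch below applies the same update rule to the toy error $E(w) = w^2 / 2$, whose gradient is simply $w$. The toy problem and the values of `eta` and `alpha` are assumptions chosen for illustration.

```python
eta, alpha = 0.1, 0.9    # hypothetical learning rate and momentum
w, d_w_prev = 1.0, 0.0   # start away from the minimum at w = 0
history = [w]

for step in range(200):
    grad = w                              # dE/dw for E(w) = w**2 / 2
    d_w = -eta * grad + alpha * d_w_prev  # gradient step plus a fraction of the last update
    w += d_w
    d_w_prev = d_w
    history.append(w)

print(history[:3])   # approximately [1.0, 0.9, 0.72]
print(history[-1])   # close to the minimum at 0
```

In the second step the update is twice as large as the plain gradient step would be, because 90% of the first update is carried over. With a momentum this high the iterate overshoots and oscillates around the minimum while it converges.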

### 5. Training loop
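Each epoch shuffles the training set and then applies one online update per sample (forward pass, back-propagation, parameter update). The updates minimize the quadratic error:

$ E[o] = \frac{1}{2} \sum_{\mu=1}^p \sum_{i=1}^m (o_i(x^{\mu}) - z_i^{\mu})^2 $

The loss reported after each epoch is the mean squared error, which differs from $E$ only by a constant factor.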


In [10]:
    def fit(self, X, y):
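        # Train the network with online (per-sample) gradient descent

        # Hold out the last val_split fraction of the samples for validation
        # (the data is not shuffled before this split)
        n_train = int((1 - self.val_split) * len(X))
        X_train, X_val = X[:n_train], X[n_train:]
        y_train, y_val = y[:n_train], y[n_train:]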

        for epoch in range(self.epochs):
            # Shuffle training data
            indices = np.random.permutation(len(X_train))
            X_train, y_train = X_train[indices], y_train[indices]

            # Train on each sample
            for i in range(len(X_train)):
                self.forward(X_train[i])
                self.backward(y_train[i])
                self.update_weights_thresholds()

            # Compute losses
            train_loss = np.mean((self.predict(X_train) - y_train) ** 2)
            val_loss = np.mean((self.predict(X_val) - y_val) ** 2)
            self.train_losses.append(train_loss)
            self.val_losses.append(val_loss)

            print(f"Epoch {epoch + 1}/{self.epochs}, Train Loss: {train_loss}, Val Loss: {val_loss}")
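Because `fit` takes the validation set from the end of the arrays without shuffling, a dataset that is sorted in some way would give a biased validation set. One possible remedy is to shuffle before calling `fit`. The helper below, `shuffled_split`, is a hypothetical sketch of that idea and is not part of the class.

```python
import numpy as np

def shuffled_split(X, y, val_split, seed=0):
    """Shuffle X and y with the same permutation, then split off a validation set."""
    rng = np.random.default_rng(seed)  # assumption: seeded for reproducibility
    idx = rng.permutation(len(X))
    n_train = int((1 - val_split) * len(X))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Toy data in which each target is the sum of its features
X = np.arange(20, dtype=float).reshape(10, 2)
y = X.sum(axis=1, keepdims=True)

X_train, X_val, y_train, y_val = shuffled_split(X, y, val_split=0.2)
print(X_train.shape, X_val.shape)  # (8, 2) (2, 2)
```

Indexing both arrays with the same index array keeps every sample paired with its target, which is the same trick `fit` uses for the per-epoch shuffle.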

### 6. Evaluation and visualization
The predict method evaluates the model on a set of samples, such as unseen test data. For each sample it returns the output layer's activations from forward propagation:

$ o(x) = \xi^{(L)} $


In [11]:
    def predict(self, X):
        # Generate predictions
        predictions = []
        for sample in X:
            predictions.append(self.forward(sample))
        return np.array(predictions)
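The per-sample loop keeps `predict` consistent with `forward`, which handles one pattern at a time. As a hedged alternative, the same outputs can be computed for a whole batch with matrix products. The sketch below compares both approaches on a hypothetical 4-3-1 sigmoid network with random parameters, where `forward_one` and `forward_batch` are illustrative names.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

rng = np.random.default_rng(0)
layers = [4, 3, 1]
w = [None] + [rng.standard_normal((layers[l], layers[l - 1])) for l in range(1, len(layers))]
theta = [None] + [rng.standard_normal(layers[l]) for l in range(1, len(layers))]
X = rng.random((5, 4))  # 5 samples, 4 features

def forward_one(x):
    """One pattern at a time, as in the class."""
    xi = x
    for l in range(1, len(layers)):
        xi = sigmoid(w[l] @ xi - theta[l])
    return xi

def forward_batch(X):
    """All patterns at once: rows are samples, so the weights are transposed."""
    xi = X
    for l in range(1, len(layers)):
        xi = sigmoid(xi @ w[l].T - theta[l])
    return xi

loop_out = np.array([forward_one(x) for x in X])
batch_out = forward_batch(X)
print(loop_out.shape, batch_out.shape)  # (5, 1) (5, 1)
```

Both versions return one row per sample. The batched form avoids the Python loop but does not update the per-layer state that `backward` relies on, so it is only suitable for prediction.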

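The following methods return and plot the loss history. The training loss should decrease over the epochs; a validation loss that starts to rise while the training loss keeps falling is a sign of overfitting.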

In [None]:
    def loss_epochs(self):
        # Return the evolution of the training loss and the validation loss
        return self.train_losses, self.val_losses

In [12]:
    def plot_errors(self):
        # Plot training and validation losses
        epochs = np.arange(1, len(self.train_losses) + 1)
        plt.figure(figsize=(10, 6))
        plt.plot(epochs, self.train_losses, label="Training Loss", marker='o')
        plt.plot(epochs, self.val_losses, label="Validation Loss", marker='o')
        plt.xlabel("Epochs")
        plt.ylabel("Mean Squared Error (MSE)") # Loss
        plt.legend()
        plt.title("Training and Validation Loss Over Epochs")
        plt.grid()
        plt.show()

### Example of usage

In [None]:
if __name__ == "__main__":
    # Define network architecture and parameters
    layers = [4, 9, 5, 1]
    epochs = 100
    learning_rate = 0.01
    momentum = 0.9
    activation_function = 'sigmoid'  # Options: 'sigmoid', 'relu', 'tanh', 'linear'
    validation_split = 0.2  # 20% of the data is held out for validation

    nn = NeuralNet(layers, epochs, learning_rate, momentum, activation_function, validation_split)

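    # Example synthetic dataset (features and targets are independent random values,
    # so this only exercises the code and there is no real pattern to learn)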
    X = np.random.rand(100, 4)  # 100 samples, 4 features
    y = np.random.rand(100, 1)  # 100 target values

    nn.fit(X, y)
    predictions = nn.predict(X)

    # Visualize training and validation errors
    nn.plot_errors()