# 3.1: Mean Squared Error (MSE) Function

The **error function** or **loss function** measures the difference between the network’s predictions and the actual values (labels). The objective during training is to **minimize the error** so that the network learns to improve its predictions.

### MSE Formula
Given the network’s predictions $y_{\text{pred}}$ and the corresponding actual values $y_{\text{real}}$, the **Mean Squared Error (MSE)** function is defined as:
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{pred}, i} - y_{\text{real}, i})^2
$$

where:

- $n$ is the number of examples.
- $y_{\text{pred}, i}$ is the network’s prediction for the $i$-th example.
- $y_{\text{real}, i}$ is the actual value or label corresponding to the $i$-th example.

### MSE Calculation Example
Suppose we have:

- **Network predictions:** $[0.8, 0.6, 0.2]$
- **Actual values:** $[1.0, 0.0, 0.0]$

To calculate the MSE:

1. Calculate the squared difference between each prediction and its corresponding actual value.
2. Calculate the average of these values.

The MSE in this case would be:
$$
\text{MSE} = \frac{1}{3} \left((0.8 - 1.0)^2 + (0.6 - 0.0)^2 + (0.2 - 0.0)^2\right) = \frac{0.04 + 0.36 + 0.04}{3} \approx 0.1467
$$

### MSE Implementation in Python:

In [2]:
import numpy as np

def mse(y_pred, y_real):
    return np.mean((y_pred - y_real) ** 2)

y_pred = np.array([0.8, 0.6, 0.2])
y_real = np.array([1.0, 0.0, 0.0])

error = mse(y_pred, y_real)
print("Mean Squared Error (MSE):", error)


Mean Squared Error (MSE): 0.14666666666666664


# 3.2: Cross-Entropy for Classification

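For multiclass classification problems, the **Cross-Entropy** function is the usual alternative to MSE. It compares the predicted class probabilities with the true class and penalizes the network heavily when it assigns a low probability to the correct class.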

### Cross-Entropy Formula
The cross-entropy between a predicted probability vector $y_{\text{pred}}$ and the actual label vector $y_{\text{real}}$ of a single example is defined as:
$$
\text{Cross-Entropy} = - \sum_{i=1}^{n} y_{\text{real}, i} \cdot \log(y_{\text{pred}, i})
$$

where:

- $n$ is the number of classes (not the number of examples, as it was in the MSE formula),
- $\log$ is the natural logarithm,
- $y_{\text{pred}, i}$ is the probability predicted by the network for class $i$,
- $y_{\text{real}, i}$ is the actual label of class $i$ (usually 1 for the correct class and 0 for the others).

### Cross-Entropy Calculation Example
Suppose we have:

- **Network predictions:** $[0.7, 0.2, 0.1]$
- **Actual values:** $[1, 0, 0]$ (the correct class is the first one).

The cross-entropy is calculated as:
$$
\text{Cross-Entropy} = - [1 \cdot \log(0.7) + 0 \cdot \log(0.2) + 0 \cdot \log(0.1)] = - \log(0.7) \approx 0.357
$$

### Cross-Entropy Implementation in Python:

In [3]:
import numpy as np

# Define the cross-entropy function
def cross_entropy(y_pred, y_real):
    epsilon = 1e-15  # Avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_real * np.log(y_pred))

# Example usage
y_pred = np.array([0.7, 0.2, 0.1])
y_real = np.array([1, 0, 0])

error = cross_entropy(y_pred, y_real)
print("Cross-Entropy:", error)

Cross-Entropy: 0.35667494393873245
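
To get a feel for how the loss reacts to the probability assigned to the correct class, the sketch below evaluates the same `cross_entropy` definition on a few prediction vectors. The vectors other than $[0.7, 0.2, 0.1]$ are hypothetical values chosen only for illustration.

```python
import numpy as np

def cross_entropy(y_pred, y_real, epsilon=1e-15):
    # Same definition as in the cell above; clipping avoids log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_real * np.log(y_pred))

y_real = np.array([1, 0, 0])

# Hypothetical predictions, ordered from confident-and-right to confident-and-wrong
candidates = [
    np.array([0.9, 0.05, 0.05]),
    np.array([0.7, 0.2, 0.1]),
    np.array([0.4, 0.3, 0.3]),
    np.array([0.1, 0.6, 0.3]),
]

losses = [cross_entropy(p, y_real) for p in candidates]
for p, loss in zip(candidates, losses):
    print(p, "->", round(float(loss), 4))
```

The loss grows as the probability of the correct class shrinks, from about 0.105 when that probability is 0.9 to about 2.303 when it is 0.1.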


# 3.3: Gradient Calculation for MSE and Cross-Entropy

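To optimize the network, we need to calculate the **gradient** of the error with respect to each prediction. This gradient is the starting point for propagating the error backward through the network with the chain rule, which is what allows us to adjust the weights and biases to reduce the error.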

### MSE Gradient
The derivative of MSE with respect to a prediction $y_{\text{pred}}$ is:
$$
\frac{\partial \, \text{MSE}}{\partial \, y_{\text{pred}}} = \frac{2}{n} (y_{\text{pred}} - y_{\text{real}})
$$
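
As a quick worked example, take the predictions $[0.8, 0.6, 0.2]$ and actual values $[1.0, 0.0, 0.0]$ from Section 3.1, so $n = 3$. The gradient has one component per prediction:
$$
\frac{2}{3} \left[(0.8 - 1.0), \; (0.6 - 0.0), \; (0.2 - 0.0)\right] = \frac{2}{3} \, [-0.2, \; 0.6, \; 0.2] \approx [-0.133, \; 0.4, \; 0.133]
$$

A negative component means the prediction is too low and should increase. A positive component means it is too high and should decrease.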

### Cross-Entropy Gradient
For cross-entropy, when the output layer uses the **Softmax** function, the gradient of the cross-entropy with respect to the inputs of the Softmax (the raw scores $z$ of the output layer, not the probabilities themselves) is simply the difference between the prediction and the actual value:
$$
\frac{\partial \, \text{Cross-Entropy}}{\partial \, z} = y_{\text{pred}} - y_{\text{real}}
$$

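This simplification arises because the derivative of the logarithm in the cross-entropy cancels against the derivative of the Softmax, so the Softmax Jacobian never has to be computed explicitly. This makes the gradient calculation very efficient in multiclass classification problems.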
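
A brief sketch of where this result comes from, writing $z$ for the raw scores fed into the Softmax and $\delta_{ij}$ for the Kronecker delta (both symbols are notation introduced here), and assuming $y_{\text{real}}$ is one-hot so that its entries sum to 1. The Softmax and its derivative are:
$$
y_{\text{pred}, i} = \frac{e^{z_i}}{\sum_{k} e^{z_k}}, \qquad \frac{\partial \, y_{\text{pred}, i}}{\partial \, z_j} = y_{\text{pred}, i} \, (\delta_{ij} - y_{\text{pred}, j})
$$

Applying the chain rule to the cross-entropy then gives:
$$
\frac{\partial \, \text{Cross-Entropy}}{\partial \, z_j} = - \sum_{i} \frac{y_{\text{real}, i}}{y_{\text{pred}, i}} \cdot y_{\text{pred}, i} \, (\delta_{ij} - y_{\text{pred}, j}) = - y_{\text{real}, j} + y_{\text{pred}, j} \sum_{i} y_{\text{real}, i} = y_{\text{pred}, j} - y_{\text{real}, j}
$$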

### Gradient Implementation in Python:

In [4]:
# Gradient of MSE
def gradient_mse(y_pred, y_real):
    return 2 * (y_pred - y_real) / len(y_real)

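# Gradient of Cross-Entropy with respect to the Softmax inputs
# (valid only when y_pred is the output of a Softmax layer)
def gradient_cross_entropy(y_pred, y_real):
    return y_pred - y_real

# Example usage (reuses y_pred and y_real from the Cross-Entropy cell above)
grad_mse = gradient_mse(y_pred, y_real)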
grad_cross_entropy = gradient_cross_entropy(y_pred, y_real)

print("Gradient of MSE:", grad_mse)
print("Gradient of Cross-Entropy:", grad_cross_entropy)

Gradient of MSE: [-0.2         0.13333333  0.06666667]
Gradient of Cross-Entropy: [-0.3  0.2  0.1]
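
One way to gain confidence in both formulas is to compare them with a numerical gradient obtained by central finite differences. The following is a minimal sketch of that check, not part of the original notebook. The helper `numerical_gradient`, the step size `h`, and the raw scores `z` are assumptions made for illustration. Note that the cross-entropy check differentiates with respect to the Softmax inputs `z`, which is where the simple difference applies.

```python
import numpy as np

def softmax(z):
    exp_values = np.exp(z - np.max(z))
    return exp_values / np.sum(exp_values)

def cross_entropy(y_pred, y_real, epsilon=1e-15):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_real * np.log(y_pred))

def mse(y_pred, y_real):
    return np.mean((y_pred - y_real) ** 2)

def numerical_gradient(f, x, h=1e-6):
    # Central differences: perturb one component at a time
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

y_real = np.array([1.0, 0.0, 0.0])

# MSE: differentiate with respect to the predictions themselves
y_pred = np.array([0.7, 0.2, 0.1])
analytic_mse = 2 * (y_pred - y_real) / y_real.size
numeric_mse = numerical_gradient(lambda p: mse(p, y_real), y_pred)

# Softmax + cross-entropy: differentiate with respect to the raw scores z
z = np.array([2.0, 0.5, -1.0])  # hypothetical raw scores
analytic_ce = softmax(z) - y_real
numeric_ce = numerical_gradient(lambda v: cross_entropy(softmax(v), y_real), z)

print("MSE gradients match:", np.allclose(analytic_mse, numeric_mse, atol=1e-6))
print("Cross-entropy gradients match:", np.allclose(analytic_ce, numeric_ce, atol=1e-6))
```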


# Practical Exercise: Add MSE and Cross-Entropy to the Neural Network

**Objective:** Extend the neural network class we previously built to calculate error using **MSE** and **Cross-Entropy**, as well as the gradient of each error function.

### Exercise Steps

1. **Add Error Functions:**
   - Define a function within the class to calculate **Mean Squared Error (MSE)**.
   - Define another function within the class to calculate **Cross-Entropy**.

2. **Calculate the Error Gradient:**
   - Add a method to calculate the **gradient** of MSE.
   - Add a method to calculate the **gradient** of Cross-Entropy, assuming the network uses **Softmax** in the output layer.

3. **Evaluate the Neural Network:**
   - Create a method that allows the use of both error functions and compares the results, using MSE for regression or continuous outputs, and Cross-Entropy for multiclass classification problems.
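
Step 3 could be approached in several ways. The sketch below shows one possible shape for it as a standalone function. The name `evaluate` and its `loss` parameter are hypothetical, chosen here for illustration, and the cross-entropy branch assumes that `y_pred` comes from a Softmax output layer.

```python
import numpy as np

def mse(y_pred, y_real):
    return np.mean((y_pred - y_real) ** 2)

def cross_entropy(y_pred, y_real, epsilon=1e-15):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_real * np.log(y_pred))

def evaluate(y_pred, y_real, loss="cross_entropy"):
    """Hypothetical helper: return (error, gradient) for the chosen loss."""
    if loss == "mse":
        return mse(y_pred, y_real), 2 * (y_pred - y_real) / y_real.size
    if loss == "cross_entropy":
        # Gradient with respect to the Softmax inputs (assumes a Softmax output layer)
        return cross_entropy(y_pred, y_real), y_pred - y_real
    raise ValueError(f"Unknown loss: {loss!r}")

y_pred = np.array([0.7, 0.2, 0.1])
y_real = np.array([1.0, 0.0, 0.0])

for name in ("mse", "cross_entropy"):
    error, grad = evaluate(y_pred, y_real, loss=name)
    print(name, "error:", error, "gradient:", grad)
```

Inside the class, the same logic could become a method that dispatches to `self.mse` or `self.cross_entropy`.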

### One-Hot Encoding of Labels

**One-hot encoding** is a method that converts categorical labels (for example, the classes in a classification problem) into binary vectors. It is particularly useful in neural networks, as it converts each class label into a format that the network can easily interpret.

#### How Does One-Hot Encoding Work?
Imagine we have three classes in a classification problem:

- Class 0
- Class 1
- Class 2

Using one-hot encoding, each class is converted into a binary vector where only one element is 1 (indicating the class), and the rest are 0.

For example:

- Class 0: $[1, 0, 0]$
- Class 1: $[0, 1, 0]$
- Class 2: $[0, 0, 1]$

This format allows neural networks to use **error functions** like **cross-entropy** to compare predictions (probabilities for each class) with the true class.
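
When several labels need to be encoded at once, one common NumPy idiom is to index the rows of an identity matrix. The sketch below illustrates it. The function name `one_hot_encode_batch` and the sample labels are made up for this example.

```python
import numpy as np

def one_hot_encode_batch(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing with an array of labels encodes all of them at once
    return np.eye(num_classes)[np.asarray(labels)]

labels = [0, 2, 1, 2]
encoded = one_hot_encode_batch(labels, num_classes=3)
print(encoded)

# argmax along each row recovers the original integer labels
decoded = np.argmax(encoded, axis=1)
print(decoded)
```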

In [5]:
import numpy as np

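# ReLU activation: keeps positive values and zeroes out negatives
def relu(x):
    return np.maximum(0, x)

# Softmax: turns raw scores into probabilities that sum to 1
def softmax(x):
    exp_values = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return exp_values / np.sum(exp_values)

# Convert an integer class label into a one-hot vector
def one_hot_encode(label, num_classes):
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1
    return one_hot

# Return 1 if the most probable predicted class matches the true class, else 0
def accuracy(y_pred, y_real):
    pred_class = np.argmax(y_pred)
    true_class = np.argmax(y_real)
    return 1 if pred_class == true_class else 0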

class NeuralNetworkWithErrors:
    def __init__(self):
        # Initialize weights and biases for each layer
        # Hidden layer: 3 neurons, each with 2 input weights
        self.weights_hidden = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
        self.bias_hidden = np.array([0.1, -0.1, 0.05])
        # Output layer: 3 classes, each with 3 weights (one per hidden neuron)
        self.weights_output = np.array([[0.7, 0.8, 0.9], [-0.1, -0.2, -0.3], [0.3, 0.2, 0.1]])
        self.bias_output = np.array([0.1, -0.1, 0.05])

    def forward(self, input_data):
        # Forward pass through the hidden layer
        hidden_layer_input = np.dot(self.weights_hidden, input_data) + self.bias_hidden
        hidden_layer_output = relu(hidden_layer_input)
        
        # Forward pass through the output layer with Softmax
        output_layer_input = np.dot(self.weights_output, hidden_layer_output) + self.bias_output
        output = softmax(output_layer_input)  # Final probabilities
        
        return output

    def mse(self, y_pred, y_real):
        return np.mean((y_pred - y_real) ** 2)

    def cross_entropy(self, y_pred, y_real):
        epsilon = 1e-15 
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        return -np.sum(y_real * np.log(y_pred))

    def gradient_mse(self, y_pred, y_real):
        return 2 * (y_pred - y_real) / len(y_real)

    def gradient_cross_entropy(self, y_pred, y_real):
        # Gradient with respect to the Softmax inputs; valid because forward() ends in Softmax
        return y_pred - y_real

# Quick check of the accuracy helper with a hand-written prediction
y_pred = np.array([0.7, 0.2, 0.1])
y_real = np.array([1, 0, 0])
acc = accuracy(y_pred, y_real)
print("Accuracy:", acc)

# Build the network and run a forward pass on one input example
network = NeuralNetworkWithErrors()

input_data = np.array([2.0, 3.0])

y_pred = network.forward(input_data)

label = 0  # Correct class is class 0
y_real = one_hot_encode(label, num_classes=3)

# Calculate the error using MSE and Cross-Entropy
error_mse = network.mse(y_pred, y_real)
error_cross_entropy = network.cross_entropy(y_pred, y_real)

# Calculate the gradients for each error function
grad_mse = network.gradient_mse(y_pred, y_real)
grad_cross_entropy = network.gradient_cross_entropy(y_pred, y_real)
acc = accuracy(y_pred, y_real)

print("Network prediction (probabilities):", y_pred)
print("Actual label (one-hot encoded):", y_real)
print("Mean Squared Error (MSE):", error_mse)
print("Cross-Entropy:", error_cross_entropy)
print("Gradient of MSE:", grad_mse)
print("Gradient of Cross-Entropy:", grad_cross_entropy)
print("Accuracy:", acc * 100)

Accuracy: 1
Network prediction (probabilities): [0.97384346 0.00231927 0.02383728]
Actual label (one-hot encoded): [1. 0. 0.]
Mean Squared Error (MSE): 0.0004192531262015088
Cross-Entropy: 0.026504708469792967
Gradient of MSE: [-0.01743769  0.00154618  0.01589152]
Gradient of Cross-Entropy: [-0.02615654  0.00231927  0.02383728]
Accuracy: 100
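
To illustrate how such a gradient can be used to adjust weights and biases, the sketch below applies a single gradient-descent step to the output layer only. It reuses the parameter values from the class above and is not a full training procedure. The `learning_rate` value and the update rule are assumptions made for this example.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_values = np.exp(x - np.max(x))
    return exp_values / np.sum(exp_values)

def cross_entropy(y_pred, y_real, epsilon=1e-15):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_real * np.log(y_pred))

# Same parameter values as in NeuralNetworkWithErrors
weights_hidden = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
bias_hidden = np.array([0.1, -0.1, 0.05])
weights_output = np.array([[0.7, 0.8, 0.9], [-0.1, -0.2, -0.3], [0.3, 0.2, 0.1]])
bias_output = np.array([0.1, -0.1, 0.05])

input_data = np.array([2.0, 3.0])
y_real = np.array([1.0, 0.0, 0.0])
learning_rate = 0.1  # assumed value, chosen only for illustration

# Forward pass
hidden_output = relu(weights_hidden @ input_data + bias_hidden)
y_pred = softmax(weights_output @ hidden_output + bias_output)
loss_before = cross_entropy(y_pred, y_real)

# Gradient with respect to the Softmax inputs, then chain rule to the parameters
grad_scores = y_pred - y_real
grad_weights = np.outer(grad_scores, hidden_output)
grad_bias = grad_scores

# One gradient-descent step on the output layer only
weights_output = weights_output - learning_rate * grad_weights
bias_output = bias_output - learning_rate * grad_bias

y_pred_after = softmax(weights_output @ hidden_output + bias_output)
loss_after = cross_entropy(y_pred_after, y_real)

print("Cross-Entropy before:", loss_before)
print("Cross-Entropy after: ", loss_after)
```

After the step, the cross-entropy is slightly lower than before, because the update raises the raw score of the correct class and lowers the others.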
