# Neural Network (MLP)
This notebook shows a fully vectorized implementation of a neural network (multilayer perceptron), including backpropagation and inference.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Activation Functions
Non-linear activation functions allow neural networks to learn complex patterns in data.

## Linear
No transformation applied. Used for regression output layers.

$$f(x) = x$$

## ReLU
The most common choice for hidden layers. It is cheap to compute, and its gradient is exactly 1 for positive inputs, which helps mitigate vanishing gradients.

$$f(x) = \max(0, x)$$

## Tanh
Squashes values to (-1, 1). Zero-centered, often better than sigmoid for hidden layers.

$$f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

## Sigmoid
Squashes values to (0, 1). Used for binary classification output layers.

$$f(x) = \frac{1}{1 + e^{-x}}$$

## Softmax
Used for multi-class classification output layers. Converts logits to probabilities.

$$f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

**Note:** Many other activation functions exist (e.g., Leaky ReLU, ELU, GELU, Swish), each with its own characteristics and use cases.



In [2]:
# Linear doesn't need an implementation (f(x) = x)
def relu(x):
    return np.maximum(x, 0)

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtract max along axis for numerical stability
    x_shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(x_shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)
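
As a quick sanity check, the sketch below redefines the activations so it runs on its own and evaluates them on small inputs. The sample arrays (`x`, `logits`) are made up for illustration. It also shows why `softmax` subtracts the row maximum: shifting every logit in a row by the same constant leaves the result unchanged, so very large logits no longer overflow `np.exp`.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x, axis=-1):
    # Same max-shift trick as in the cell above
    x_shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(x_shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Illustrative inputs (not from the notebook)
x = np.array([-2.0, 0.0, 3.0])
relu_out = relu(x)        # negative entries are clamped to 0
sigmoid_out = sigmoid(x)  # every value lies strictly between 0 and 1
tanh_out = np.tanh(x)     # every value lies strictly between -1 and 1

# Two rows that differ only by a constant offset of 999
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax(logits, axis=-1)
row_sums = probs.sum(axis=-1)  # each row should sum to 1
```

Both rows of `probs` come out identical because softmax depends only on the differences between logits. Without the shift, `np.exp(1000.0)` would overflow to `inf` and the second row would be `nan`.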

# Loss Functions

## Mean Squared Error (MSE)

Used for regression tasks. It measures the average squared difference between the true values $y_i$ and the predicted values $\hat{y}_i$:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

## Cross Entropy Loss

Used for classification tasks to compare predicted probabilities with true labels.

### Binary Cross Entropy

For two-class problems:
$$ L_{BCE} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
$y_i$ = true label (0 or 1), $\hat{y}_i$ = predicted probability.

### Categorical Cross Entropy

For multi-class problems where each sample belongs to one class:
$$ L_{CCE} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$

* $C$ = number of classes
* $y_{i,c}$ = 1 if sample $i$ belongs to class $c$, else 0 (one-hot encoding)
* $\hat{y}_{i,c}$ = predicted probability for class $c$
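
As a worked example with made-up numbers, take a single sample ($n = 1$) with $C = 3$ classes, a one-hot label $y = (0, 1, 0)$, and predicted probabilities $\hat{y} = (0.2, 0.7, 0.1)$. Only the term for the correct class survives the inner sum:

$$ L_{CCE} = -\left[ 0 \cdot \log(0.2) + 1 \cdot \log(0.7) + 0 \cdot \log(0.1) \right] = -\log(0.7) \approx 0.357 $$

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1.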


In [None]:
def mse(y, y_hat):
    return np.mean((y-y_hat) ** 2)

def binary_cross_entropy(y, y_hat):
    return - np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def categorical_cross_entropy(y, y_hat):
    # Multiplying y by log(y_hat) keeps only the log-probability of the correct class
    # np.sum(..., axis=1) sums across classes per sample; np.mean then averages over the batch
    return -np.mean(np.sum(y * np.log(y_hat), axis=1))
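
The cross-entropy functions above call `np.log` directly on `y_hat`, so a predicted probability of exactly 0 or 1 produces `-inf` or `nan`. One common remedy is to clip the predictions before taking the logarithm. The sketch below is one possible variant, not part of the original notebook: the `_clipped` function names and the `eps` value of `1e-12` are assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-12  # assumed clipping constant, not taken from the notebook

def binary_cross_entropy_clipped(y, y_hat, eps=EPS):
    # Keep predictions inside [eps, 1 - eps] so both logs stay finite
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def categorical_cross_entropy_clipped(y, y_hat, eps=EPS):
    # Only a lower bound is needed here, since log(1) = 0 is already finite
    y_hat = np.clip(y_hat, eps, 1.0)
    return -np.mean(np.sum(y * np.log(y_hat), axis=1))

# Binary case: predictions contain an exact 1 and an exact 0
y_bin = np.array([1.0, 0.0, 1.0])
p_bin = np.array([1.0, 0.0, 0.5])
bce = binary_cross_entropy_clipped(y_bin, p_bin)

# Multi-class case: one-hot labels, one predicted probability is exactly 0
y_cat = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
p_cat = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25]])
cce = categorical_cross_entropy_clipped(y_cat, p_cat)
```

In the multi-class example the zero probability belongs to a class that is not the true label, yet without clipping the product `0 * log(0)` still evaluates to `nan` and contaminates the whole batch loss. Clipping slightly biases the loss for extreme predictions, which is usually an acceptable trade-off.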