# Neural Network (MLP)
This notebook shows a fully vectorized implementation of a neural network (a multilayer perceptron), including backpropagation and inference.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Activation Functions
Non-linear activation functions allow neural networks to learn complex patterns in data.

## Linear
No transformation applied. Used for regression output layers.

$$f(x) = x$$

## ReLU
A common default for hidden layers. It is cheap to compute, and because its gradient is exactly 1 for positive inputs it helps mitigate vanishing gradients.

$$f(x) = \max(0, x)$$
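To make the vanishing-gradient remark concrete, here is a minimal sketch comparing the derivative of ReLU with the derivative of the sigmoid (introduced further down). The sample inputs and the names `relu_grad` and `sigmoid_grad` are illustrative assumptions, not part of the notebook's implementation.

```python
import numpy as np

# Illustrative sample inputs (assumed values, chosen to straddle zero)
x = np.array([-2.0, -0.5, 0.5, 2.0])

# ReLU derivative: 1 where x > 0, else 0 (the value at exactly x == 0 is a convention)
relu_grad = (x > 0).astype(float)

# Sigmoid derivative: s * (1 - s), which never exceeds 0.25
s = 1 / (1 + np.exp(-x))
sigmoid_grad = s * (1 - s)

print(relu_grad)
print(sigmoid_grad)
```

Multiplying many factors of at most 0.25 across layers shrinks a gradient quickly, whereas ReLU passes the gradient through unchanged wherever the unit is active.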

## Tanh
Squashes values to (-1, 1). Its output is zero-centered, which often makes it a better choice than sigmoid for hidden layers.

$$f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
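As a quick sanity check, the exponential form above can be evaluated directly and compared with NumPy's built-in `np.tanh`. This is only a sketch: the input grid `x` and the name `manual` are assumptions made for illustration.

```python
import numpy as np

# Illustrative input grid (assumed values)
x = np.linspace(-3, 3, 7)

# Evaluate the definition directly: (e^x - e^-x) / (e^x + e^-x)
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# Matches NumPy's implementation up to floating-point rounding
print(np.allclose(manual, np.tanh(x)))

# Zero-centered: tanh is an odd function, so tanh(-x) == -tanh(x)
print(np.allclose(np.tanh(-x), -np.tanh(x)))
```

For large `|x|` the direct formula overflows, which is one reason to prefer `np.tanh` in practice.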

## Sigmoid
Squashes values to (0, 1). Used for binary classification output layers.

$$f(x) = \frac{1}{1 + e^{-x}}$$

## Softmax
Used for multi-class classification output layers. Converts logits to probabilities.

$$f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
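A small worked example may help here. The sketch below applies the formula to three assumed logits and also checks that subtracting a constant from every logit leaves the result unchanged, which is the property that makes the usual max-subtraction trick safe. The names `logits`, `probs`, and `probs_shifted` are illustrative.

```python
import numpy as np

# Illustrative logits for a 3-class problem (assumed values)
logits = np.array([1.0, 2.0, 3.0])

# Direct application of the softmax formula
exps = np.exp(logits)
probs = exps / exps.sum()
print(probs)        # approximately [0.090, 0.245, 0.665]
print(probs.sum())  # 1.0 up to floating-point rounding

# Shift invariance: subtracting the max cancels in the ratio
shifted = np.exp(logits - logits.max())
probs_shifted = shifted / shifted.sum()
print(np.allclose(probs, probs_shifted))  # True
```

The largest logit receives the largest probability, and the outputs always sum to 1.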

**Note:** Many other activation functions exist (e.g., Leaky ReLU, ELU, GELU, Swish), each with its own characteristics and use cases.



In [2]:
# Linear doesn't need an implementation (f(x) = x)
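def relu(x):
    # Element-wise max(0, x)
    return np.maximum(x, 0)

def tanh(x):
    # Thin wrapper around NumPy's element-wise tanh
    return np.tanh(x)

def sigmoid(x):
    # Logistic function. For very negative x, np.exp(-x) overflows and
    # NumPy emits a RuntimeWarning (the result still evaluates to 0.0).
    return 1 / (1 + np.exp(-x))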

def softmax(x, axis=-1):
    # Subtract max along axis for numerical stability
    x_shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(x_shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)
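The `softmax` above guards against overflow by subtracting the maximum, while `sigmoid` can still trigger an overflow warning for very negative inputs. One possible way to avoid that is sketched below. `stable_sigmoid` is a hypothetical helper, not part of the notebook, and it assumes float inputs.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) lies in (0, 1], so it cannot overflow
    e = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^-x)      x < 0: e^x / (1 + e^x)  (algebraically the same function)
    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))

# Extreme inputs that would make np.exp(-x) overflow in the naive form
values = np.array([-1000.0, 0.0, 1000.0])
print(stable_sigmoid(values))  # [0.  0.5 1. ]
```

Both branches compute the same function; the split only chooses the form whose exponent is non-positive.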