# 04 - Neural Networks

#### Let's get to the real-deal ML: neural nets, aka multilayer perceptrons (MLPs).

We'll **build a minimal neural net from scratch** and use it to predict a single number from a synthetic dataset.  

**Architecture**: 2 input vars, 1 layer of 4 hidden neurons, one continuous output

## What we'll do
0. Setup & make a tiny synthetic dataset (2 features, 1 target)  
1. Define the loss function (**MSE**)  
2. Define the **forward pass** (with bias via input augmentation)  
3. **Backprop**: derive gradients and implement the training loop (coming next)

## 0. Setup & synthetic data

We'll create a small nonlinear target so the hidden layer is actually useful.

In [10]:
import numpy as np

# Reproducibility
rng = np.random.default_rng(42)

# Create synthetic data: N samples, 2 features
N = 200
x1 = rng.uniform(-2.0, 2.0, size=N)
x2 = rng.uniform(-2.0, 2.0, size=N)
X = np.stack([x1, x2], axis=1)  # shape (N, 2)

# Nonlinear target with a bit of noise
noise = rng.normal(0.0, 0.1, size=N)
y = 1.2 * np.sin(x1) + 0.7 * x2 - 0.3 * x1 * x2 + noise
y = y.reshape(y.size, 1)  # shape (N, 1)

X.shape, y.shape

# Now, let's plot it - this is a rotatable 3-D plot, drag it around to get familiar with the data
import plotly.express as px
import pandas as pd

# Wrap into a DataFrame for nicer plotting
df = pd.DataFrame({"x1": X[:,0], "x2": X[:,1], "y": y.squeeze()})

fig = px.scatter_3d(
    df, x="x1", y="x2", z="y",
    color="y", opacity=0.7,
    title="Interactive 3D view of the dataset"
)
fig.update_traces(marker=dict(size=4))
fig.show()



## 1. Let's define the loss
### Since we are predicting a single number, our good old friend MSE makes sense

For a single example with prediction \(\hat{y}\) and true value \(y\):
$$
L = \tfrac{1}{2}(y - \hat{y})^2
$$

For a batch of \(N\) examples:
$$
L = \frac{1}{2N}\sum_{i=1}^{N} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
$$

We include the factor $\tfrac{1}{2}$ so that it cancels the 2 produced by differentiating the square.
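
To make the effect of the $\tfrac{1}{2}$ concrete: differentiating the batch loss with respect to one prediction gives $(\hat{y}^{(i)} - y^{(i)})/N$, with no stray factor of 2. The sketch below checks that formula against a central finite difference. The names `half_mse` and `half_mse_grad` are illustrative only and are not part of this notebook.

```python
import numpy as np

def half_mse(y_pred, y_true):
    """Batch loss with the 1/2 factor: (1 / 2N) * sum((y_hat - y)^2)."""
    return 0.5 * np.mean((y_pred - y_true) ** 2)

def half_mse_grad(y_pred, y_true):
    """Analytic gradient w.r.t. each prediction: (y_hat - y) / N."""
    return (y_pred - y_true) / y_pred.size

y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([1.0, 2.0, 3.0])

# Central finite differences, nudging one prediction at a time
eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(y_pred.size):
    step = np.zeros_like(y_pred)
    step[i] = eps
    numeric[i] = (half_mse(y_pred + step, y_true)
                  - half_mse(y_pred - step, y_true)) / (2 * eps)

analytic = half_mse_grad(y_pred, y_true)
print("analytic:", analytic)
print("matches finite differences:", np.allclose(numeric, analytic, atol=1e-6))
```

Without the $\tfrac{1}{2}$, the same gradient would carry an extra factor of 2, which only rescales the effective learning rate.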

In [18]:
def mse_loss(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """
    Mean squared error with a 1/2 factor: 0.5 * mean((y_pred - y_true)^2).
    """
    return 0.5 * np.square(y_pred - y_true).mean()

# Let's test it real quick
y_pred_fake = np.array([1, 2, 3, 4, 5])
y_actual_fake = np.array([1, 2, 3, 4, 5])

assert 0.0 == mse_loss(y_pred_fake, y_actual_fake)


y_pred_fake = np.array([1, 2, 3, 4, 5])
y_actual_fake = np.array([2, 4, 6, 8, 10])

# (1 + 4 + 9 + 16 + 25) / (2 * 5) = 55 / 10 = 5.5

assert 5.5 == mse_loss(y_pred_fake, y_actual_fake)

print("MSE Test Success.")





MSE Test Success.


## 2. Forward pass (with biases via augmentation)
We’ll absorb biases by **augmenting** inputs with a constant 1.

- Original input **X** has 2 features, so shape is `(N, 2)`.
- $N$ is the number of data points we run through the network at the same time.
- During training, $N$ could be the size of the entire dataset (full-batch training).
- More commonly, $N$ is the number of data points in a training mini-batch.
- Augmented input **X** = `[X  1]` (append a column of ones), so shape is `(N, 3)`.  
  From here on, **X** means the augmented one.
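
As a quick sanity check on the augmentation trick, the sketch below compares an explicit affine map `X @ W + b` with the augmented product. The arrays `W` and `b` are made-up values used only for illustration: stacking `b` as the last row of the weight matrix is what lets the column of ones pick it up.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 2))   # 5 samples, 2 features
W = rng.normal(size=(2, 4))   # hypothetical weights for 4 neurons
b = rng.normal(size=(1, 4))   # hypothetical biases, one per neuron

# Explicit bias: b broadcasts across the 5 rows
explicit = X @ W + b

# Augmentation: append a column of ones to X, stack b as the last row of W
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # (5, 3)
W_aug = np.vstack([W, b])                          # (3, 4)
augmented = X_aug @ W_aug

print(X_aug.shape, W_aug.shape)
print("same result:", np.allclose(explicit, augmented))
```

This is why the last row of `W[1]` below plays the role of the hidden layer's bias vector.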

### First layer (input to hidden, 4 neurons)

- Weights `W[1]` has shape `(3, 4)`:
  3 rows = augmented inputs, 4 columns = 4 neurons in the first hidden layer

- Pre-activations:
$$
Z^{[1]} = X\, W^{[1]}
$$ 
- $Z^{[1]}$ is (N by 4), since $X$ is (N by 3) and we multiply it by $W^{[1]}$, which is (3 by 4)


- Activations:
$$
H = \sigma\!\left(Z^{[1]}\right)
$$

### Hidden layer augmentation

- The hidden activations feed the output layer's weight matrix, and the output layer needs a bias unit too, so we apply the same augmentation trick to the activations coming out of the hidden layer:
$$
H = [\,H\;\;1\,] \quad \in \mathbb{R}^{N \times 5}
$$

### Output layer (hidden to output, linear)

- Weights `W[2]` has shape `(5, 1)`.

- Predictions:
$$
\hat{y} = H\, W^{[2]} \quad \in \mathbb{R}^{N \times 1}
$$

### Sigmoid
$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
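
A short numerical look at the formula above may help build intuition. The sketch evaluates the sigmoid at a few hand-picked points and checks three properties that follow from the definition: outputs lie strictly between 0 and 1, $\sigma(0) = 0.5$, and $\sigma(-z) = 1 - \sigma(z)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
s = sigmoid(z)

# Roughly 0.0067, 0.2689, 0.5, 0.7311, 0.9933
print(np.round(s, 4))

print("all in (0, 1):", bool(np.all((s > 0) & (s < 1))))
print("sigmoid(0) == 0.5:", sigmoid(0.0) == 0.5)
print("symmetry holds:", np.allclose(sigmoid(-z), 1.0 - s))
```

Inputs far from zero land in the flat tails of the curve, which is one reason the weights in this notebook are initialized with a small scale.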

#### Shape cheat-sheet
    X: (N, 3)
    
    W[1]: (3, 4)
    
    Z[1], H: (N, 4)
    
    # Then we augment H
    
    H: (N, 5)
    
    W[2]: (5, 1)
    
    y_hat: (N, 1)
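
To trace these shapes by hand, here is a minimal sketch with weights chosen so the result is easy to verify; they are not a sensible initialization. With `W1` all zeros, every pre-activation is 0 and every hidden activation is 0.5. With `W2` all ones, each prediction is then 4 * 0.5 + 1 = 3, whatever the input. The helper `with_ones` is a stand-in name for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_ones(A):
    """Append a column of ones: (N, d) -> (N, d + 1)."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

X = np.array([[0.5, -1.0],
              [2.0, 0.3],
              [-1.5, 1.2]])      # (N=3, 2)

W1 = np.zeros((3, 4))            # hand-picked so the numbers are checkable
W2 = np.ones((5, 1))

X_aug = with_ones(X)             # (3, 3)
Z1 = X_aug @ W1                  # (3, 4), all zeros
H = sigmoid(Z1)                  # (3, 4), all 0.5
H_aug = with_ones(H)             # (3, 5)
y_hat = H_aug @ W2               # (3, 1)

for name, arr in [("X_aug", X_aug), ("Z1", Z1), ("H", H),
                  ("H_aug", H_aug), ("y_hat", y_hat)]:
    print(f"{name}: {arr.shape}")
print(y_hat.ravel())             # each entry is 4 * 0.5 + 1 = 3.0
```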

In [26]:
def augment_with_ones(X: np.ndarray) -> np.ndarray:
    '''Append a column of ones to X. If X is (N, d), returns (N, d+1).'''
    ones = np.ones((X.shape[0], 1), dtype=X.dtype)
    return np.hstack([X, ones])

X_test= np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# print("Testing augment_with_ones")
# print(f"X_test.shape: {X_test.shape}")
# X_test_augmented = augment_with_ones(X_test)
# print(f"X_test_augmented.shape: {X_test_augmented.shape}")
# print("========")
# print("X_test:")
# print(X_test)
# print("========")
# print("X_test_augmented:")
# print(X_test_augmented)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

# print("========")
# H_sigmoid_test = np.array([-10, -5, -1, 0.1, 0.25, 0.5, 1, 5])
# print("Sigmoid test for: ", H_sigmoid_test)
# np.set_printoptions(suppress=True)
# print(sigmoid(H_sigmoid_test))

def init_params(rng: np.random.Generator = None, scale: float = 0.1):
    '''Initialize weights for a 2-4-1 MLP with bias via augmentation.'''
    if rng is None:
        # Build the default generator per call. A default created in the
        # signature is built once and shared, so repeated calls would differ.
        rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, scale, size=(3, 4))  # (in+1=3) x (hidden=4)
    W2 = rng.normal(0.0, scale, size=(5, 1))  # (hidden+1=5) x (out=1)
    return {"W1": W1, "W2": W2}

# print("========")
# print("Test init_params:")
# print(init_params())

def forward(X: np.ndarray, params: dict):
    '''
    Forward pass.
    X: shape (N, 2)
    Returns: y_hat, cache
    '''
    W1, W2 = params["W1"], params["W2"]
    X_aug = augment_with_ones(X)           # (N, 3)
    Z1 = X_aug @ W1                        # (N, 4)
    H = sigmoid(Z1)                        # (N, 4)
    H_aug = augment_with_ones(H)           # (N, 5)
    y_hat = H_aug @ W2                     # (N, 1) linear output
    cache = {"X_aug": X_aug, "Z1": Z1, "H": H, "H_aug": H_aug, "y_hat": y_hat}
    return y_hat, cache

# Try a dry run
params = init_params(rng)
y_hat, cache = forward(X, params)
print("Shapes -> X:", X.shape, "| W1:", params["W1"].shape, "| W2:", params["W2"].shape, "| y_hat:", y_hat.shape)
print("Initial loss (untrained):", mse_loss(y_hat, y))  # argument order matches mse_loss(y_pred, y_true)

Shapes -> X: (200, 2) | W1: (3, 4) | W2: (5, 1) | y_hat: (200, 1)
Initial loss (untrained): 0.9068401815446941
