# Lesson 13 - Neural Networks and Backprop


## Objectives
- Implement a 1-hidden-layer neural network.
- Carry out forward and backward propagation.
- Visualize decision boundaries.


## From the notes

**Neural networks**
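- Layered composition of affine transforms and elementwise nonlinearities.
- Backprop applies the chain rule layer by layer, from the output back to the inputs, to compute the gradient of the loss with respect to every parameter.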

_TODO: Align the notation with the CS229 main notes PDF._


## Intuition
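Neural networks stack affine transforms and elementwise nonlinearities to form flexible function approximators. Backprop reuses the intermediate values cached during the forward pass, so computing the gradients for all parameters costs only a small constant multiple of one forward pass.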
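For reference, here are the forward and backward passes that `train_nn` implements, written with the same variable names as the code (this notation is an assumption and may not match the CS229 notes). The rows of $X$ are the $m$ examples, $\sigma$ is the sigmoid, $\odot$ is the elementwise product, and the loss is the mean binary cross-entropy.

$$
\begin{aligned}
z_1 &= X W_1 + b_1, & a_1 &= \tanh(z_1), \\
z_2 &= a_1 W_2 + b_2, & a_2 &= \sigma(z_2), \\
\mathcal{L} &= -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log a_{2,i} + (1-y_i)\log\big(1-a_{2,i}\big)\Big], \\
dz_2 &= a_2 - y, \\
dW_2 &= \tfrac{1}{m}\, a_1^\top dz_2, & db_2 &= \tfrac{1}{m}\textstyle\sum_i dz_{2,i}, \\
dz_1 &= \big(dz_2 W_2^\top\big) \odot \big(1 - a_1^{2}\big), \\
dW_1 &= \tfrac{1}{m}\, X^\top dz_1, & db_1 &= \tfrac{1}{m}\textstyle\sum_i dz_{1,i}.
\end{aligned}
$$

Here $dz_2 = a_2 - y$ is the per-example derivative of the cross-entropy with respect to $z_2$ (the sigmoid's derivative cancels), so it equals $m\,\partial\mathcal{L}/\partial z_2$; the $1/m$ enters when the weight and bias gradients are formed, exactly as in the code. The factor $1 - a_1^2$ (elementwise square) is $\tanh'(z_1)$.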


## Data
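We create a 2D dataset that requires nonlinear separation: 200 points drawn uniformly from the square [-1.5, 1.5] x [-1.5, 1.5], labeled 1 when x1 and x2 have the same sign (quadrants I and III) and 0 otherwise. No single line separates this XOR-style pattern.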
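To see why a linear classifier cannot fit this labeling, reduce it to the four corner points (±1, ±1), labeled the same way as the data. The sketch below brute-forces a grid of linear rules 1[w · x > t]; the grid sizes and the helper name `best_linear_accuracy` are illustrative choices, not part of the lesson. The best rule gets only 3 of the 4 corners right.

```python
import numpy as np

# Four "corner" points labeled like the dataset: y = 1 when x1 * x2 > 0.
corners = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
labels = (corners[:, 0] * corners[:, 1] > 0).astype(int)  # [1, 1, 0, 0]

def best_linear_accuracy(points, targets, n_angles=360, n_thresholds=81):
    """Best accuracy of any rule 1[w . x > t] over a grid of unit vectors w and thresholds t.
    Sweeping w around the full circle covers both orientations of every line."""
    best = 0.0
    for theta in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        scores = points @ w
        for t in np.linspace(-2.0, 2.0, n_thresholds):
            acc = ((scores > t).astype(int) == targets).mean()
            best = max(best, acc)
    return best

print(best_linear_accuracy(corners, labels))  # 0.75: no line separates the two classes
```

A grid search cannot prove the bound on its own; it only illustrates the classic fact that XOR is not linearly separable, which is why the network below needs a nonlinear hidden layer.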


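```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# 200 points drawn uniformly from the square [-1.5, 1.5]^2.
X = np.random.uniform(-1.5, 1.5, size=(200, 2))
# Label 1 when x1 and x2 share a sign (quadrants I and III): an XOR-style pattern.
y = (X[:, 0] * X[:, 1] > 0).astype(int)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_nn(X, y, hidden=8, lr=0.5, iters=2000):
    """Full-batch gradient descent on the mean binary cross-entropy of a
    1-hidden-layer network (tanh hidden layer, sigmoid output)."""
    m, n = X.shape
    W1 = np.random.randn(n, hidden) * 0.5
    b1 = np.zeros(hidden)
    W2 = np.random.randn(hidden, 1) * 0.5
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)  # column vector, so it broadcasts against a2 of shape (m, 1)
    for _ in range(iters):
        # Forward pass
        z1 = X @ W1 + b1          # (m, hidden)
        a1 = np.tanh(z1)
        z2 = a1 @ W2 + b2         # (m, 1)
        a2 = sigmoid(z2)
        # Backward pass: sigmoid + cross-entropy gives dL_i/dz2_i = a2_i - y_i
        dz2 = a2 - y
        dW2 = a1.T @ dz2 / m
        db2 = dz2.mean(axis=0)
        dz1 = (dz2 @ W2.T) * (1 - a1**2)   # tanh'(z1) = 1 - tanh(z1)^2
        dW1 = X.T @ dz1 / m
        db1 = dz1.mean(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

W1, b1, W2, b2 = train_nn(X, y)
```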
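A standard way to confirm the backward pass in `train_nn` is a finite-difference gradient check: nudge each parameter by ±eps, recompute the mean binary cross-entropy, and compare the centered difference with the analytic gradient. The sketch below is self-contained; the helper names (`forward_loss`, `analytic_grads`, `gradient_check`) and the tiny random problem are hypothetical, chosen only for the check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_loss(params, X, y):
    """Mean binary cross-entropy of the tanh -> sigmoid network; y is an (m, 1) column."""
    W1, b1, W2, b2 = params
    a1 = np.tanh(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def analytic_grads(params, X, y):
    """Backward pass using the same formulas as train_nn."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1 = np.tanh(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    dz2 = a2 - y
    dz1 = (dz2 @ W2.T) * (1 - a1**2)
    return [X.T @ dz1 / m, dz1.mean(axis=0), a1.T @ dz2 / m, dz2.mean(axis=0)]

def gradient_check(params, X, y, eps=1e-5):
    """Relative error ||numeric - analytic|| / (||numeric|| + ||analytic||) over all parameters."""
    grads = analytic_grads(params, X, y)
    numeric, analytic = [], []
    for p, g in zip(params, grads):
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps                 # perturb one entry in place
            loss_plus = forward_loss(params, X, y)
            p[idx] = old - eps
            loss_minus = forward_loss(params, X, y)
            p[idx] = old                       # restore the original value
            numeric.append((loss_plus - loss_minus) / (2 * eps))
            analytic.append(g[idx])
    numeric, analytic = np.array(numeric), np.array(analytic)
    return np.linalg.norm(numeric - analytic) / (np.linalg.norm(numeric) + np.linalg.norm(analytic))

# A tiny hypothetical problem: 10 points, 4 hidden units.
rng = np.random.default_rng(0)
X_small = rng.uniform(-1.5, 1.5, size=(10, 2))
y_small = (X_small[:, 0] * X_small[:, 1] > 0).astype(float).reshape(-1, 1)
params_small = [rng.normal(scale=0.5, size=(2, 4)), np.zeros(4),
                rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)]
print(f"relative error: {gradient_check(params_small, X_small, y_small):.1e}")
```

A very small relative error (far below 1e-4) suggests the analytic gradients are right; a mistake in the backward pass, such as dropping the `1 - a1**2` factor, typically produces a much larger error.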


## Experiments


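```python
def predict(X):
    """Hard 0/1 predictions from the trained parameters (globals W1, b1, W2, b2)."""
    a1 = np.tanh(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return (a2 > 0.5).astype(int).ravel()

preds = predict(X)
train_acc = (preds == y).mean()  # accuracy on the training points
print(f"Training accuracy: {train_acc:.3f}")
```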
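The training loop in `train_nn` never reports its loss, so it is hard to tell whether 2000 iterations at `lr=0.5` are enough. One option is to record the mean binary cross-entropy at every step. The variant below, `train_nn_with_history` (a hypothetical name), mirrors the same forward and backward formulas and also returns the loss history; the clipping inside `bce` is an added safeguard that is not in the original code. It uses its own `X_demo`/`y_demo` so it does not overwrite the notebook's `X` and `y`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(a2, y, eps=1e-12):
    """Mean binary cross-entropy; clipping keeps log() finite."""
    a2 = np.clip(a2, eps, 1 - eps)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def train_nn_with_history(X, y, hidden=8, lr=0.5, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(scale=0.5, size=(n, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    history = []
    for _ in range(iters):
        a1 = np.tanh(X @ W1 + b1)
        a2 = sigmoid(a1 @ W2 + b2)
        history.append(bce(a2, y))          # loss at the current parameters
        dz2 = a2 - y
        dz1 = (dz2 @ W2.T) * (1 - a1**2)    # computed before W2 is updated
        W2 -= lr * (a1.T @ dz2 / m)
        b2 -= lr * dz2.mean(axis=0)
        W1 -= lr * (X.T @ dz1 / m)
        b1 -= lr * dz1.mean(axis=0)
    return (W1, b1, W2, b2), history

rng = np.random.default_rng(42)
X_demo = rng.uniform(-1.5, 1.5, size=(200, 2))
y_demo = (X_demo[:, 0] * X_demo[:, 1] > 0).astype(int)
_, history = train_nn_with_history(X_demo, y_demo)
print(f"loss: start {history[0]:.3f}, end {history[-1]:.3f}")
```

Plotting `history` (for example with `plt.plot(history)`) shows whether the loss has flattened out or would benefit from more iterations.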


## Visualizations


```python
# Raw data, colored by label.
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", alpha=0.7)
plt.title("Nonlinear dataset")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

# Evaluate the trained network on a 120 x 120 grid spanning the data range.
grid_x1 = np.linspace(-1.5, 1.5, 120)
grid_x2 = np.linspace(-1.5, 1.5, 120)
xx1, xx2 = np.meshgrid(grid_x1, grid_x2)
grid = np.c_[xx1.ravel(), xx2.ravel()]
grid_preds = predict(grid).reshape(xx1.shape)

# Explicit levels give exactly one filled band per predicted class (0 and 1).
plt.figure(figsize=(6, 4))
plt.contourf(xx1, xx2, grid_preds, levels=[-0.5, 0.5, 1.5], cmap="coolwarm", alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", alpha=0.7)
plt.title("Neural network decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```


## Takeaways
- Backprop efficiently computes gradients for all weights.
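- Hidden layers with a nonlinear activation (tanh here) enable nonlinear decision boundaries; without the nonlinearity, stacked affine layers collapse to a single affine map.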


## Explain it in an interview
- Explain the forward and backward pass in a simple neural network.
- Describe why activation functions are necessary.
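One way to make the activation-function answer concrete: without a nonlinearity, two stacked affine layers are still one affine map, since (X Wa + ba) Wb + bb = X (Wa Wb) + (ba Wb + bb), so they can only produce a linear decision boundary. The check below verifies that identity numerically on random matrices; the names `Wa`, `ba`, `Wb`, `bb`, and `X_rand` are hypothetical and chosen so they do not overwrite the trained `W1`, `b1`, `W2`, `b2`.

```python
import numpy as np

rng = np.random.default_rng(0)
X_rand = rng.normal(size=(5, 2))                      # 5 inputs with 2 features
Wa, ba = rng.normal(size=(2, 8)), rng.normal(size=8)  # "hidden" layer
Wb, bb = rng.normal(size=(8, 1)), rng.normal(size=1)  # output layer

two_layers = (X_rand @ Wa + ba) @ Wb + bb             # stacked affine maps, no activation
W_eff, b_eff = Wa @ Wb, ba @ Wb + bb                  # the equivalent single affine map
one_layer = X_rand @ W_eff + b_eff
print(np.allclose(two_layers, one_layer))             # True

with_tanh = np.tanh(X_rand @ Wa + ba) @ Wb + bb       # insert a nonlinearity
print(np.allclose(with_tanh, one_layer))              # False
```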


## Exercises
- Try ReLU instead of tanh and compare performance.
- Increase the number of hidden units and look for overfitting (for example, by comparing training accuracy with accuracy on a held-out split).
- Add L2 regularization to the loss.
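For the ReLU and L2 exercises, only two pieces of the backward pass change: the activation derivative and an extra weight-decay term in the weight gradients. A minimal sketch, assuming the penalty is (lam / (2m)) * (||W1||^2 + ||W2||^2) added to the mean cross-entropy, with biases left unregularized (a common but not universal choice). `train_nn_variant`, `ACTIVATIONS`, `accuracy`, and `lam` are hypothetical names, and this is one possible starting point rather than a reference solution.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Each activation paired with its derivative, both as functions of the pre-activation z.
ACTIVATIONS = {
    "tanh": (np.tanh, lambda z: 1 - np.tanh(z) ** 2),
    "relu": (lambda z: np.maximum(z, 0), lambda z: (z > 0).astype(float)),
}

def train_nn_variant(X, y, hidden=8, lr=0.5, iters=2000, activation="tanh", lam=0.0, seed=0):
    act, act_grad = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(scale=0.5, size=(n, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(iters):
        z1 = X @ W1 + b1
        a1 = act(z1)
        a2 = sigmoid(a1 @ W2 + b2)
        dz2 = a2 - y
        dz1 = (dz2 @ W2.T) * act_grad(z1)
        # The L2 penalty (lam / (2m)) * ||W||^2 adds (lam / m) * W to each weight gradient.
        dW2 = (a1.T @ dz2 + lam * W2) / m
        dW1 = (X.T @ dz1 + lam * W1) / m
        W2 -= lr * dW2
        b2 -= lr * dz2.mean(axis=0)
        W1 -= lr * dW1
        b1 -= lr * dz1.mean(axis=0)
    return W1, b1, W2, b2

def accuracy(params, X, y, activation="tanh"):
    W1, b1, W2, b2 = params
    act, _ = ACTIVATIONS[activation]
    a2 = sigmoid(act(X @ W1 + b1) @ W2 + b2)
    return ((a2.ravel() > 0.5).astype(int) == y).mean()

rng = np.random.default_rng(1)
X_ex = rng.uniform(-1.5, 1.5, size=(200, 2))
y_ex = (X_ex[:, 0] * X_ex[:, 1] > 0).astype(int)
for name in ["tanh", "relu"]:
    fitted = train_nn_variant(X_ex, y_ex, activation=name, lam=0.1)
    print(name, accuracy(fitted, X_ex, y_ex, activation=name))
```

Passing the activation's derivative as a function of `z1` (rather than reusing `a1`, as the tanh-only code does) keeps the backward pass identical for both activations.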
