# Lesson 11: Putting It All Together - A Complete Neural Network Library

**Mathematical Foundations - IT & Artificial Intelligence**

---

## 11.0 Recap: The Journey So Far

So far we have worked through the mathematics behind deep learning:

### Part 1: Linear Algebra
- **Vectors and matrices**: data representation
- **Matrix multiplication**: the forward pass
- **Linear transformations**: what layers do

### Part 2: Calculus
- **Derivatives**: how to measure change
- **Gradient descent**: optimizing parameters
- **Backpropagation**: computing gradients efficiently

### Part 3: Statistics
- **Probability distributions**: interpreting outputs
- **Expected value and variance**: batch norm, initialization
- **Maximum likelihood**: why our loss functions work

Now it is time to bring everything together in a **complete, working neural network library**!

## 11.1 Learning Objectives

After this lesson you can:

- build a complete neural network library from scratch;
- explain how all the mathematical concepts work together;
- design a flexible network from different layer types;
- train and evaluate the network on real data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from abc import ABC, abstractmethod

np.set_printoptions(precision=4, suppress=True)
np.random.seed(42)

print("Libraries loaded!")

## 11.2 The Architecture

Our library consists of:

1. **Layer** (abstract class): the interface for all layers
2. **Linear**: linear transformation (Lessons 2-4)
3. **Activations**: ReLU, Sigmoid, Tanh, LeakyReLU (Lesson 5); softmax is built into the cross-entropy loss
4. **BatchNorm**: normalization (Lesson 9)
5. **Loss functions**: MSE, CrossEntropy, BinaryCrossEntropy (Lesson 10)
6. **Optimizers**: SGD and SGD with momentum (Lesson 6), plus Adam
7. **Sequential**: container for the network

This mirrors the way PyTorch and Keras are organized.

## 11.3 Layer Base Class

In [None]:
class Layer(ABC):
    """Abstract base class for all layers."""
    
    def __init__(self):
        self.params = {}  # Learnable parameters
        self.grads = {}   # Gradients of the parameters
        self.training = True
    
    @abstractmethod
    def forward(self, x):
        """Forward pass: compute the output for a given input."""
        pass
    
    @abstractmethod
    def backward(self, dout):
        """Backward pass: compute gradients from the upstream gradient dout."""
        pass
    
    def __call__(self, x):
        return self.forward(x)
    
    def train(self):
        self.training = True
    
    def eval(self):
        self.training = False

## 11.4 Linear Layer

The core of a neural network is the affine map z = Wx + b. In code, each sample is a row of X, so the batch version puts X on the left.

**Forward**: z = X @ W + b

**Backward**:
- dW = X.T @ dout
- db = sum(dout)
- dX = dout @ W.T
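
The three backward formulas can be checked numerically. The sketch below is not part of the library. It assumes a hypothetical scalar loss `L = sum(z * dout)`, chosen so that dL/dz equals `dout` exactly, and compares the formulas against central finite differences. The names `scalar_loss` and `numerical_grad` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))     # 4 samples, 3 features
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)
dout = rng.normal(size=(4, 2))  # hypothetical upstream gradient dL/dz

def scalar_loss():
    # L = sum(z * dout), so dL/dz is exactly dout
    return np.sum((X @ W + b) * dout)

# Analytic gradients from the formulas above
dW = X.T @ dout
db = dout.sum(axis=0)
dX = dout @ W.T

def numerical_grad(f, arr, eps=1e-6):
    """Central finite differences, perturbing one entry of arr at a time."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        f_plus = f()
        arr[idx] = old - eps
        f_minus = f()
        arr[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

dW_num = numerical_grad(scalar_loss, W)
db_num = numerical_grad(scalar_loss, b)
dX_num = numerical_grad(scalar_loss, X)

print("dW matches:", np.allclose(dW, dW_num, atol=1e-5))
print("db matches:", np.allclose(db, db_num, atol=1e-5))
print("dX matches:", np.allclose(dX, dX_num, atol=1e-5))
```

A check like this is a cheap way to catch shape or transpose mistakes before wiring a layer into a full network.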

In [None]:
class Linear(Layer):
    """Linear layer: z = x @ W + b"""
    
    def __init__(self, in_features, out_features, init='he'):
        super().__init__()
        
        # Initialization (Lesson 9)
        if init == 'he':
            std = np.sqrt(2.0 / in_features)
        elif init == 'xavier':
            std = np.sqrt(2.0 / (in_features + out_features))
        else:
            std = 0.01
        
        self.params['W'] = np.random.randn(in_features, out_features) * std
        self.params['b'] = np.zeros(out_features)
    
    def forward(self, x):
        self.x = x  # Cache for the backward pass
        return x @ self.params['W'] + self.params['b']
    
    def backward(self, dout):
        # Parameter gradients. The loss already averages over the batch
        # (its backward divides by n), so we must not divide by n again here.
        self.grads['W'] = self.x.T @ dout
        self.grads['b'] = np.sum(dout, axis=0)
        
        # Gradient with respect to the input (passed to the previous layer)
        return dout @ self.params['W'].T

# Test
linear = Linear(3, 2)
x = np.random.randn(4, 3)  # 4 samples, 3 features
out = linear.forward(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {out.shape}")
print(f"W shape: {linear.params['W'].shape}")
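
To see why the `init` choice matters, the following sketch pushes random data through a stack of Linear+ReLU layers and compares He scaling with the small constant fallback `std = 0.01`. It assumes bias-free layers of equal width and standard normal inputs. `mean_square_after_relu_stack` is a hypothetical helper, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_square_after_relu_stack(std_fn, width=512, depth=5, n_samples=1000):
    """Return mean(a**2) of the activations after `depth` Linear+ReLU layers."""
    a = rng.normal(size=(n_samples, width))  # input with mean square close to 1
    for _ in range(depth):
        W = rng.normal(size=(width, width)) * std_fn(width)
        a = np.maximum(0, a @ W)
    return np.mean(a ** 2)

he = mean_square_after_relu_stack(lambda n: np.sqrt(2.0 / n))
small = mean_square_after_relu_stack(lambda n: 0.01)

print(f"He init:      mean square after 5 layers = {he:.4f}")
print(f"std = 0.01:   mean square after 5 layers = {small:.2e}")
```

With He scaling the signal magnitude stays roughly constant from layer to layer, while the small constant shrinks it at every layer. That shrinking is what makes deep ReLU networks hard to train with naive initialization.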

## 11.5 Activation Functions

Activation functions introduce non-linearity.
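
Without an activation in between, two linear layers collapse into a single linear layer, so extra depth adds no expressive power. A minimal sketch of that argument, using arbitrary random weights (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)

# Two stacked linear layers without activation
two_linear = (X @ W1 + b1) @ W2 + b2

# The equivalent single linear layer
W_eq = W1 @ W2
b_eq = b1 @ W2 + b2
one_linear = X @ W_eq + b_eq

print("Same output without activation:", np.allclose(two_linear, one_linear))

# With a ReLU in between, the collapse to W_eq, b_eq no longer holds in general
with_relu = np.maximum(0, X @ W1 + b1) @ W2 + b2
print("Same output with ReLU in between:", np.allclose(with_relu, one_linear))
```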

In [None]:
class ReLU(Layer):
    """ReLU: f(x) = max(0, x)"""
    
    def forward(self, x):
        self.mask = (x > 0)
        return np.maximum(0, x)
    
    def backward(self, dout):
        return dout * self.mask


class Sigmoid(Layer):
    """Sigmoid: f(x) = 1 / (1 + e^-x)"""
    
    def forward(self, x):
        self.out = 1 / (1 + np.exp(-np.clip(x, -500, 500)))
        return self.out
    
    def backward(self, dout):
        return dout * self.out * (1 - self.out)


class Tanh(Layer):
    """Tanh: f(x) = tanh(x)"""
    
    def forward(self, x):
        self.out = np.tanh(x)
        return self.out
    
    def backward(self, dout):
        return dout * (1 - self.out ** 2)


class LeakyReLU(Layer):
    """Leaky ReLU: f(x) = x if x > 0 else alpha * x"""
    
    def __init__(self, alpha=0.01):
        super().__init__()
        self.alpha = alpha
    
    def forward(self, x):
        self.x = x
        return np.where(x > 0, x, self.alpha * x)
    
    def backward(self, dout):
        return dout * np.where(self.x > 0, 1, self.alpha)

In [None]:
# Visualize the activation functions
x = np.linspace(-3, 3, 100)

activations = {
    'ReLU': ReLU(),
    'Sigmoid': Sigmoid(),
    'Tanh': Tanh(),
    'LeakyReLU': LeakyReLU(0.1)
}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for ax, (name, act) in zip(axes.flatten(), activations.items()):
    y = act.forward(x)
    ax.plot(x, y, 'b-', linewidth=2, label='f(x)')
    
    # Also plot the derivative
    dy = act.backward(np.ones_like(x))
    ax.plot(x, dy, 'r--', linewidth=2, label="f'(x)")
    
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title(name)
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)

plt.tight_layout()
plt.show()

## 11.6 Batch Normalization

In [None]:
class BatchNorm(Layer):
    """Batch Normalization (Lesson 9)"""
    
    def __init__(self, n_features, momentum=0.9, epsilon=1e-5):
        super().__init__()
        self.momentum = momentum
        self.epsilon = epsilon
        
        # Learnable parameters
        self.params['gamma'] = np.ones(n_features)
        self.params['beta'] = np.zeros(n_features)
        
        # Running statistics
        self.running_mean = np.zeros(n_features)
        self.running_var = np.ones(n_features)
    
    def forward(self, x):
        if self.training:
            # Batch statistics
            self.mu = np.mean(x, axis=0)
            self.var = np.var(x, axis=0)
            
            # Update running statistics
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * self.mu
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * self.var
        else:
            self.mu = self.running_mean
            self.var = self.running_var
        
        # Normalize
        self.x_centered = x - self.mu
        self.std = np.sqrt(self.var + self.epsilon)
        self.x_norm = self.x_centered / self.std
        
        # Scale and shift
        self.x = x
        return self.params['gamma'] * self.x_norm + self.params['beta']
    
    def backward(self, dout):
        n = dout.shape[0]
        
        # Gradients for gamma and beta
        self.grads['gamma'] = np.sum(dout * self.x_norm, axis=0)
        self.grads['beta'] = np.sum(dout, axis=0)
        
        # Gradient with respect to x
        dx_norm = dout * self.params['gamma']
        dvar = np.sum(dx_norm * self.x_centered * -0.5 * (self.var + self.epsilon)**(-1.5), axis=0)
        dmu = np.sum(dx_norm * -1 / self.std, axis=0) + dvar * np.mean(-2 * self.x_centered, axis=0)
        dx = dx_norm / self.std + dvar * 2 * self.x_centered / n + dmu / n
        
        return dx
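
The effect of the normalization step can be inspected in isolation. The sketch below repeats the forward computation outside the class, on hypothetical data with mean 5 and standard deviation 3. It then shows how the exponential moving average used for `running_mean` approaches the true mean over many batches.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # hypothetical batch: 256 samples, 4 features

# Training-mode normalization, per feature
eps = 1e-5
mu = x.mean(axis=0)
var = x.var(axis=0)
x_norm = (x - mu) / np.sqrt(var + eps)

print("mean per feature:", x_norm.mean(axis=0))
print("var per feature: ", x_norm.var(axis=0))

# Running mean as an exponential moving average over 100 batches
momentum = 0.9
running_mean = np.zeros(4)
for _ in range(100):
    batch = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
    running_mean = momentum * running_mean + (1 - momentum) * batch.mean(axis=0)

print("running mean:    ", running_mean)
```

The running statistics are what the layer uses in `eval()` mode, which is why the model has to be switched to evaluation mode before it is tested on single samples.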

## 11.7 Loss Functions

In [None]:
class Loss(ABC):
    """Abstract base class for loss functions."""
    
    @abstractmethod
    def forward(self, y_pred, y_true):
        pass
    
    @abstractmethod
    def backward(self):
        pass
    
    def __call__(self, y_pred, y_true):
        return self.forward(y_pred, y_true)


class MSELoss(Loss):
    """Mean Squared Error: L = (1/n) Σ (y_pred - y_true)²"""
    
    def forward(self, y_pred, y_true):
        self.y_pred = y_pred
        self.y_true = y_true
        return np.mean((y_pred - y_true) ** 2)
    
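    def backward(self):
        # np.mean in forward averages over every element, so divide by the
        # total element count (equal to the batch size for 1-D targets).
        return 2 * (self.y_pred - self.y_true) / self.y_pred.size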


class CrossEntropyLoss(Loss):
    """Cross-entropy loss with built-in softmax."""
    
    def forward(self, logits, y_true):
        self.y_true = y_true
        n = logits.shape[0]
        
        # Softmax
        exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        self.probs = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)
        
        # Cross-entropy
        correct_logprobs = -np.log(self.probs[np.arange(n), y_true] + 1e-10)
        return np.mean(correct_logprobs)
    
    def backward(self):
        n = self.probs.shape[0]
        grad = self.probs.copy()
        grad[np.arange(n), self.y_true] -= 1
        return grad / n


class BinaryCrossEntropyLoss(Loss):
    """Binary Cross-Entropy Loss."""
    
    def forward(self, y_pred, y_true):
        self.y_pred = np.clip(y_pred, 1e-10, 1 - 1e-10)
        self.y_true = y_true
        return -np.mean(y_true * np.log(self.y_pred) + (1 - y_true) * np.log(1 - self.y_pred))
    
    def backward(self):
        # forward averages over all elements, so normalize by the element count
        n = self.y_pred.size
        return (self.y_pred - self.y_true) / (self.y_pred * (1 - self.y_pred) * n)
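
The compact gradient used in `CrossEntropyLoss.backward`, namely `(probs - one_hot) / n`, can be verified against finite differences. This is a standalone sketch with hypothetical random logits and labels. The helper names `softmax` and `cross_entropy` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))  # 6 samples, 4 classes
y = rng.integers(0, 4, size=6)    # hypothetical integer class labels
n = logits.shape[0]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(z):
    p = softmax(z)
    return -np.mean(np.log(p[np.arange(n), y]))

# Analytic gradient: (softmax - one_hot) / n
grad = softmax(logits)
grad[np.arange(n), y] -= 1
grad /= n

# Numerical gradient via central differences
eps = 1e-6
num = np.zeros_like(logits)
for idx in np.ndindex(logits.shape):
    z_plus = logits.copy()
    z_plus[idx] += eps
    z_minus = logits.copy()
    z_minus[idx] -= eps
    num[idx] = (cross_entropy(z_plus) - cross_entropy(z_minus)) / (2 * eps)

print("Gradient matches:", np.allclose(grad, num, atol=1e-6))
print("Row sums of the gradient:", grad.sum(axis=1))
```

Each row of the gradient sums to zero, because the softmax probabilities sum to one and exactly one is subtracted for the true class.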

## 11.8 Optimizers

In [None]:
class Optimizer(ABC):
    """Abstract base class for optimizers."""
    
    def __init__(self, layers, lr=0.01):
        self.layers = layers
        self.lr = lr
    
    @abstractmethod
    def step(self):
        pass
    
    def zero_grad(self):
        for layer in self.layers:
            layer.grads = {}


class SGD(Optimizer):
    """Stochastic Gradient Descent."""
    
    def __init__(self, layers, lr=0.01, momentum=0, weight_decay=0):
        super().__init__(layers, lr)
        self.momentum = momentum
        self.weight_decay = weight_decay
        self.velocity = {}
    
    def step(self):
        for i, layer in enumerate(self.layers):
            for name, param in layer.params.items():
                if name not in layer.grads:
                    continue
                    
                key = (i, name)
                grad = layer.grads[name]
                
                # Weight decay (L2 regularization)
                if self.weight_decay > 0:
                    grad = grad + self.weight_decay * param
                
                # Momentum
                if self.momentum > 0:
                    if key not in self.velocity:
                        self.velocity[key] = np.zeros_like(param)
                    self.velocity[key] = self.momentum * self.velocity[key] - self.lr * grad
                    layer.params[name] += self.velocity[key]
                else:
                    layer.params[name] -= self.lr * grad


class Adam(Optimizer):
    """Adam optimizer."""
    
    def __init__(self, layers, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        super().__init__(layers, lr)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = {}  # First moment
        self.v = {}  # Second moment
        self.t = 0   # Timestep
    
    def step(self):
        self.t += 1
        
        for i, layer in enumerate(self.layers):
            for name, param in layer.params.items():
                if name not in layer.grads:
                    continue
                
                key = (i, name)
                grad = layer.grads[name]
                
                # Initialize moments
                if key not in self.m:
                    self.m[key] = np.zeros_like(param)
                    self.v[key] = np.zeros_like(param)
                
                # Update moments
                self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grad
                self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * grad**2
                
                # Bias correction
                m_hat = self.m[key] / (1 - self.beta1**self.t)
                v_hat = self.v[key] / (1 - self.beta2**self.t)
                
                # Update parameters
                layer.params[name] -= self.lr * m_hat / (np.sqrt(v_hat) + self.epsilon)
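
A quick way to build intuition for the bias correction in Adam is to work through the very first step by hand. In the sketch below (hypothetical gradient values, same default hyperparameters as the class above) the corrected moments reduce to `grad` and `grad**2`, so every parameter moves by roughly `lr`, whatever the size of its gradient.

```python
import numpy as np

lr, beta1, beta2, epsilon = 0.001, 0.9, 0.999, 1e-8
grad = np.array([1000.0, 0.001, -5.0])  # hypothetical gradient with very different scales

# First Adam step (t = 1), starting from zero moments
t = 1
m = (1 - beta1) * grad
v = (1 - beta2) * grad**2
m_hat = m / (1 - beta1**t)  # bias correction recovers grad
v_hat = v / (1 - beta2**t)  # bias correction recovers grad**2
adam_step = lr * m_hat / (np.sqrt(v_hat) + epsilon)

# Plain SGD step for comparison
sgd_step = lr * grad

print("SGD step: ", sgd_step)
print("Adam step:", adam_step)
```

The SGD steps differ by six orders of magnitude, while the Adam steps all have magnitude close to `lr`. This per-parameter rescaling is also why Adam is far less sensitive to the overall scale of the gradients than SGD.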

## 11.9 Sequential Model

In [None]:
class Sequential:
    """Container for a sequence of layers."""
    
    def __init__(self, layers):
        self.layers = layers
    
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    
    def backward(self, dout):
        for layer in reversed(self.layers):
            dout = layer.backward(dout)
        return dout
    
    def __call__(self, x):
        return self.forward(x)
    
    def train(self):
        for layer in self.layers:
            layer.train()
    
    def eval(self):
        for layer in self.layers:
            layer.eval()
    
    def parameters(self):
        """Return all layers that have learnable parameters."""
        return [layer for layer in self.layers if layer.params]

## 11.10 Putting It All Together: MNIST Classifier

In [None]:
# Load MNIST
from sklearn.datasets import fetch_openml

print("Loading MNIST...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')
X, y = mnist.data / 255.0, mnist.target.astype(int)

# Train/test split
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

print(f"Training: {X_train.shape}, Test: {X_test.shape}")

In [None]:
# Build the network
model = Sequential([
    Linear(784, 256),
    BatchNorm(256),
    ReLU(),
    Linear(256, 128),
    BatchNorm(128),
    ReLU(),
    Linear(128, 10)
])

# Loss and optimizer
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)

print("Model architecture: 784 → 256 → 128 → 10")
print(f"Number of layers: {len(model.layers)}")

In [None]:
def accuracy(model, X, y):
    """Compute the classification accuracy in eval mode."""
    model.eval()
    logits = model(X)
    preds = np.argmax(logits, axis=1)
    model.train()
    return np.mean(preds == y)

# Training loop
batch_size = 128
n_epochs = 10
n_batches = len(X_train) // batch_size

train_losses = []
test_accs = []

print("Starting training...\n")

for epoch in range(n_epochs):
    model.train()
    
    # Shuffle
    idx = np.random.permutation(len(X_train))
    X_shuffled = X_train[idx]
    y_shuffled = y_train[idx]
    
    epoch_loss = 0
    
    for batch in range(n_batches):
        start = batch * batch_size
        end = start + batch_size
        
        X_batch = X_shuffled[start:end]
        y_batch = y_shuffled[start:end]
        
        # Forward
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        epoch_loss += loss
        
        # Backward
        optimizer.zero_grad()
        dout = criterion.backward()
        model.backward(dout)
        
        # Update
        optimizer.step()
    
    # Evaluate
    avg_loss = epoch_loss / n_batches
    test_acc = accuracy(model, X_test, y_test)
    
    train_losses.append(avg_loss)
    test_accs.append(test_acc)
    
    print(f"Epoch {epoch+1:2d}: Loss = {avg_loss:.4f}, Test Acc = {test_acc:.4f}")

print(f"\nFinal test accuracy: {test_accs[-1]*100:.2f}%")

In [None]:
# Visualize training
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(train_losses, 'b-', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training Loss')
axes[0].grid(True, alpha=0.3)

axes[1].plot(test_accs, 'r-', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Test Accuracy')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Visualize predictions
model.eval()

fig, axes = plt.subplots(2, 5, figsize=(14, 6))
indices = np.random.choice(len(X_test), 10, replace=False)

for ax, idx in zip(axes.flatten(), indices):
    img = X_test[idx].reshape(28, 28)
    logits = model(X_test[idx:idx+1])
    pred = np.argmax(logits)
    true = y_test[idx]
    
    ax.imshow(img, cmap='gray')
    color = 'green' if pred == true else 'red'
    ax.set_title(f'Pred: {pred}, True: {true}', color=color)
    ax.axis('off')

plt.suptitle('Model Predictions', fontsize=14)
plt.tight_layout()
plt.show()

## 11.11 Summary

We have built a complete neural network library from the following components:

| Component | Mathematical basis | Lesson |
|-----------|--------------------|--------|
| Linear layer | Matrix multiplication | 2-4 |
| Activations | Non-linear functions and their derivatives | 5 |
| BatchNorm | Mean and variance | 9 |
| Loss functions | Maximum likelihood | 10 |
| Optimizers | Gradient descent | 6 |
| Backprop | Chain rule | 7 |

**All the mathematics from this course comes together in a working neural network!**

---

**Mathematical Foundations** | Lesson 11 of 12 | IT & Artificial Intelligence

---