<h1>Coding Session #2 - Neural Networks</h1>

This file is a Jupyter notebook. It consists of text blocks in Markdown format and executable code cells, which you can recognize by the small arrow icon to their left.

First, please run the following cell to make sure that the required libraries are installed:

In [None]:
!pip install -r requirements.txt

## 1 Neural Network

A neural network consists of several layers. Each layer contains at least one neuron.

<img src="images/network_layer.png" width="300px"></img>

Each layer predicts an output $\hat{Y}$ for an input $X$, where $W$ is the layer's weight matrix and $b$ its bias: $\hat{Y}=W\cdot X+b$
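
As a minimal sketch of this formula, the snippet below evaluates one layer on a small batch with NumPy. It assumes the row-wise batch convention that the `DenseLayer` class further down in this notebook uses, where each row of `X` is one sample and the product is written as `X @ W + b`. All numbers are made up for illustration.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])               # 3 samples, 2 input features
W = np.array([[0.5, -1.0, 0.0, 2.0],
              [1.0,  0.5, 1.0, 0.0]])    # 2 inputs -> 4 neurons
b = np.array([[0.1, 0.2, 0.3, 0.4]])     # one bias per neuron

Y_hat = X @ W + b                        # the bias row is broadcast over all samples
print(Y_hat.shape)                       # (3, 4)
```

Each row of `Y_hat` holds the four neuron outputs for one sample.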

In a neural network, each layer processes the output of the layer before it (the first layer processes the network input $X$). The intermediate layers, denoted $H$, are called hidden layers.

<img src="images/neural_network.png" width="300px"></img>

For a network with one hidden layer, the complete network function is therefore

$H=W_h\cdot X+b_h$

$\hat{Y}=W_{out}\cdot H + b_{out}$
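
The two equations can be chained directly in NumPy. The sketch below is only an illustration under assumed sizes (1 input, 20 hidden neurons, 1 output, as in the network built later in this notebook) and random weights. Like the formulas above, it applies no activation function between the layers, and it uses the row-wise batch form `X @ W + b`.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for reproducibility only

X = rng.normal(size=(5, 1))               # 5 samples, 1 input feature
W_h = rng.normal(size=(1, 20)) * 0.1      # hidden layer weights
b_h = np.zeros((1, 20))                   # hidden layer biases
W_out = rng.normal(size=(20, 1)) * 0.1    # output layer weights
b_out = np.zeros((1, 1))                  # output layer bias

H = X @ W_h + b_h                         # hidden representation, shape (5, 20)
Y_hat = H @ W_out + b_out                 # network output, shape (5, 1)
print(H.shape, Y_hat.shape)
```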

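Backpropagation proceeds layer by layer: starting at the output layer, each layer receives the gradient of the loss with respect to its output and passes the gradient with respect to its input back to the layer before it.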
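
To make the per-layer step concrete: for a dense layer $Y = X\cdot W + b$ (row-wise batch form) that receives the gradient $G=\partial L/\partial Y$ from the layer after it, the chain rule gives $\partial L/\partial W = X^\top G$, $\partial L/\partial b$ as the column sums of $G$, and $\partial L/\partial X = G\,W^\top$. The sketch below checks the weight gradient against a finite-difference estimate. The helper name `mse` and the toy sizes are assumptions for illustration, not part of the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))               # 4 samples, 3 inputs
T = rng.normal(size=(4, 2))               # targets
W = rng.normal(size=(3, 2)) * 0.1
b = np.zeros((1, 2))

def mse(W, b):
    """Mean squared error of the layer output X @ W + b against T."""
    return np.mean((X @ W + b - T) ** 2)

# Analytic gradients via the chain rule
G = 2 * (X @ W + b - T) / T.size          # dL/dY for the MSE loss
grad_W = X.T @ G                          # dL/dW
grad_b = G.sum(axis=0, keepdims=True)     # dL/db
grad_X = G @ W.T                          # dL/dX, handed to the previous layer

# Central finite differences for dL/dW as an independent check
eps = 1e-6
num_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_grad_W[i, j] = (mse(W_plus, b) - mse(W_minus, b)) / (2 * eps)

print(np.max(np.abs(grad_W - num_grad_W)))  # should be close to zero
```

A check of this kind is a common way to catch sign or transpose mistakes before training a hand-written layer.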

In [None]:
import numpy as np

# Base Module Class -> all layers and activations will inherit from this
class Module:
    """Base class for all modules (layers, activations, etc.)"""
    def forward(self, input:np.ndarray):
        raise NotImplementedError

    def backward(self, grad_output, eta = None):
        raise NotImplementedError

# Layers  
class DenseLayer(Module):
    """Fully connected layer"""
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(input_size, output_size) * 0.1
        self.biases = np.zeros((1, output_size))
    
    def forward(self, input):
        self.input = input.copy()
        return np.dot(input, self.weights) + self.biases
    
    def backward(self, grad_output, eta):
        grad_input = np.dot(grad_output, self.weights.T)
        grad_weights = np.dot(self.input.T, grad_output)
        grad_biases = np.sum(grad_output, axis=0, keepdims=True)
        
        # Update weights and biases
        self.weights -= eta * grad_weights
        self.biases -= eta * grad_biases
        
        return grad_input

# Activation Functions 
class ReLU(Module):
    """ReLU activation function"""
    def forward(self, input):
        self.input = input
        return np.maximum(0, input)
    
    def backward(self, grad_output):
        grad_input = grad_output.copy()
        grad_input[self.input <= 0] = 0
        return grad_input
    
class Sigmoid(Module):
    """Sigmoid activation function"""
    def forward(self, input):
        self.output = 1 / (1 + np.exp(-input))
        return self.output
    
    def backward(self, grad_output):
        return grad_output * (self.output * (1 - self.output))
    
class Tanh(Module):
    """Tanh activation function"""
    def forward(self, input):
        self.output = np.tanh(input)
        return self.output
    
    def backward(self, grad_output):
        return grad_output * (1 - self.output ** 2)

# Loss Functions
class MSELoss:
    """Mean Squared Error Loss"""
    def forward(self, prediction, target):
        self.prediction = prediction
        self.target = target
        return np.mean((prediction - target) ** 2)
    
    def backward(self, prediction, target):
        return 2 * (prediction - target) / target.size

# Neural Network Class
class NeuralNetwork(Module):
    """Neural Network class to manage layers and training"""
    def __init__(
            self,
            layers:list[Module]
    ):
        self.layers = layers

    def forward(self, input):
        for layer in self.layers:
            input = layer.forward(input)
        return input
    
    def backward(self, grad_output, eta):
        for layer in reversed(self.layers):
            if isinstance(layer, DenseLayer):
                grad_output = layer.backward(grad_output, eta)
            else:
                grad_output = layer.backward(grad_output)
        return grad_output
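
The `backward` methods of the activation classes above multiply the incoming gradient by the derivative of the activation. As a small self-contained illustration (plain NumPy expressions, not the classes themselves), the snippet below evaluates these derivatives at a few hypothetical input values.

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])            # hypothetical pre-activation values

sigmoid = 1 / (1 + np.exp(-z))
d_sigmoid = sigmoid * (1 - sigmoid)       # same expression as in Sigmoid.backward
d_tanh = 1 - np.tanh(z) ** 2              # same expression as in Tanh.backward
d_relu = (z > 0).astype(float)            # ReLU.backward zeroes the gradient where input <= 0

print(d_sigmoid)
print(d_tanh)
print(d_relu)
```

At `z = 0` the sigmoid derivative is 0.25 and the tanh derivative is 1, which are their respective maxima.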

In [None]:
from utilities.data import generate_sinusoidal_data                 # Custom module for generating sinusoidal data
from utilities.visualization import plot_data_points, plot_series   # Custom module for visualizing 2D data

N               = 1000      # Number of data points  
EPOCHS          = 100       # Number of training epochs
LEARNING_RATE   = 0.01      # Learning rate
BATCH_SIZE      = 4         # Batch size for training

# Generate sinusoidal data
X, Y = generate_sinusoidal_data(
    num_samples = N,
    noise       = 0.05
)

# plot data points
plot_data_points(X, Y)

model = NeuralNetwork(
    layers = [
        DenseLayer(input_size=1, output_size=20),
        Tanh(),
        DenseLayer(input_size=20, output_size=20),
        Tanh(),
        DenseLayer(input_size=20, output_size=20),
        Tanh(),
        DenseLayer(input_size=20, output_size=1),
    ]
    # layers = [
    #     DenseLayer(input_size=1, output_size=20),
    #     Sigmoid(),
    #     DenseLayer(input_size=20, output_size=20),
    #     Sigmoid(),
    #     DenseLayer(input_size=20, output_size=20),
    #     Sigmoid(),
    #     DenseLayer(input_size=20, output_size=1),
    # ]
)
loss_fn = MSELoss()

losses = []

for epoch in range(1, EPOCHS+1):
    loss_epoch = 0.
    indices = np.random.permutation(N)  # shuffle the sample order each epoch
    Y_hat = np.zeros_like(Y)

    for batch in range(0, N, BATCH_SIZE):
        batch_indices = indices[batch:batch + BATCH_SIZE]
        X_batch = X[batch_indices]
        Y_batch = Y[batch_indices]

        # Forward pass
        Y_hat_batch = model.forward(X_batch)

        # Compute loss
        loss = loss_fn.forward(Y_hat_batch, Y_batch)
        loss_epoch += loss

        # Backward pass
        grad_loss = loss_fn.backward(Y_hat_batch, Y_batch)
        model.backward(grad_loss, LEARNING_RATE)

        # store predictions
        Y_hat[batch_indices] = Y_hat_batch
    loss_epoch /= int(np.ceil(N / BATCH_SIZE))  # average over the number of batches
    losses.append(loss_epoch)

    if epoch % 10 == 0 or epoch == 1 or epoch == EPOCHS:
        plot_data_points(input=X, target=Y, prediction=Y_hat, title=f"Sinusoidal Data Fitting - Epoch {epoch}")
    print(f"Epoch {epoch}/{EPOCHS}, MSE: {loss_epoch:0.6f}")

plot_series(data=losses, title="Training Loss over Epochs", xlabel="Epochs", ylabel="MSE Loss")
print("Training complete.")

**Influence of different factors on training**<br>
1. Remove the activation functions between the layers. What happens?
2. Train the network with `ReLU`, `Tanh`, and then `Sigmoid` activations (the cell above starts with `Tanh`). What effect does the choice have on the result (compare the `MSE` values)?
3. Set `BATCH_SIZE = N` so that training optimizes on the entire dataset at once instead of batch by batch. What happens?
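
As a starting point for question 3, it can help to count how many parameter updates one epoch contains for a given batch size, since `DenseLayer.backward` updates the weights once per batch. The helper name `updates_per_epoch` below is hypothetical and not part of the notebook's utilities.

```python
import math

def updates_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches, and therefore weight updates, in one epoch."""
    return math.ceil(num_samples / batch_size)

N = 1000
print(updates_per_epoch(N, 4))   # 250
print(updates_per_epoch(N, N))   # 1
```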