Exercises: Linear Networks
==========================



## Imports



The common imports:



In [None]:
%matplotlib inline

from collections.abc import Callable
from typing import Tuple

import matplotlib.pyplot as plt
import numpy as np
import numpy.typing as npt
import torch
from numpy.typing import NDArray
from torch import Tensor

## Linear Regression Network



In this exercise you will build a linear regression model from scratch
and test it on synthetically generated data:



In [None]:
def synthetic_data(
    w: Tensor, b: float, num_examples: int
) -> tuple[Tensor, Tensor]:
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))


true_w = torch.tensor([2, -3.4])
true_b = 4.2

n_samples = 100
X, y = synthetic_data(true_w, true_b, n_samples)
K = 2

Our goal is to fit a simple regression model with batch gradient descent.
We start with randomly chosen values for the weights and a zero bias.
First, implement the model function below.



In [None]:
torch.manual_seed(0)
w = torch.normal(0, 0.01, size=(K, 1))
b = torch.zeros(1)


def linreg(X: Tensor, w: Tensor, b: Tensor) -> Tensor:
    """The linear regression model."""
    return X @ w + b

Now we need to define the loss function:



In [None]:
def squared_loss(y_hat: Tensor, y: Tensor) -> Tensor:
    """Compute the mean of the halved squared errors."""
    return (0.5 * (y_hat - y) ** 2).sum() / y.shape[0]

Now we need to implement the training loop for gradient descent.
Do not use `autograd` to compute the gradient; instead,
build on the closed-form formula presented in the lecture.
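
For reference, here is a sketch of the gradients, assuming the loss is the mean of the halved squared errors as in `squared_loss`, that is $L = \frac{1}{2n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ with $\hat{y} = Xw + b$ and $n$ examples. The symbols $L$ and $n$ are notation introduced here, and the formula in the lecture may use a different normalization.
$$
\nabla_w L = \frac{1}{n} X^\top (\hat{y} - y), \qquad \frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i).
$$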



In [None]:
step = 0.1
n_epoch = 100

loss_arr = torch.zeros(n_epoch)  # to record current loss

for i in range(n_epoch):
    # 1.  Compute the prediction y_hat
    y_hat = linreg(X, w, b)

    # remember the loss for plotting it later
    loss_arr[i] = squared_loss(y_hat, y)

    # 2. Use y_hat and y to compute the gradients
    grad_w = ((y_hat - y) * X).mean(axis=0, keepdim=True).T
    grad_b = (y_hat - y).T.mean()

    # 3. Update the parameters
    w -= step * grad_w
    b -= step * grad_b

plt.semilogy(loss_arr)
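
A falling loss curve does not by itself show that the fit is correct. Since the data are synthetic, one possible sanity check is to compare the learned parameters with the ones used to generate the data. The sketch below is self-contained and only mirrors the setup above; the names `residual`, `w_error` and `b_error`, as well as the choice of 300 epochs, are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

# Synthetic data as in the exercise: y = Xw + b + small Gaussian noise.
true_w = torch.tensor([2.0, -3.4])
true_b = 4.2
X = torch.normal(0, 1, (100, 2))
y = (X @ true_w + true_b + torch.normal(0, 0.01, (100,))).reshape(-1, 1)

# Same initialisation as above: small random weights, zero bias.
w = torch.normal(0, 0.01, size=(2, 1))
b = torch.zeros(1)

# Batch gradient descent with the closed-form gradients (no autograd).
for _ in range(300):
    residual = X @ w + b - y              # shape (n, 1)
    w -= 0.1 * (X.T @ residual) / len(X)  # gradient w.r.t. w, shape (2, 1)
    b -= 0.1 * residual.mean()            # gradient w.r.t. b, scalar

# Largest absolute deviation from the generating parameters.
w_error = (w.flatten() - true_w).abs().max().item()
b_error = (b - true_b).abs().item()
print(f"max |w - true_w| = {w_error:.4f}, |b - true_b| = {b_error:.4f}")
```

Both deviations should be small compared to the parameter values, because the added noise has a standard deviation of only 0.01.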

## Linear networks with autograd



The goal now is to use `autograd` to compute the gradient.
You can use the same skeleton as before.



In [None]:
torch.manual_seed(0)
w = torch.normal(0, 0.01, size=(K, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

step = 0.1
n_epoch = 100

loss_arr = np.zeros(n_epoch)  # to record current loss

for i in range(n_epoch):
    # 1.  Compute the prediction y_hat
    y_hat = linreg(X, w, b)

    loss = squared_loss(y_hat, y)

    # remember the loss for plotting it later
    loss_arr[i] = loss.detach().numpy()

    # 2. Use the computed loss to compute the gradients
    loss.backward()

    # 3. Update the parameters, remember to zero the gradients
    with torch.no_grad():
        w -= step * w.grad
        b -= step * b.grad
        w.grad.zero_()
        b.grad.zero_()

plt.plot(loss_arr)
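
To see that both approaches compute the same quantity, the gradient from `autograd` can be compared with the closed-form expression for a single step. The following is a minimal sketch on random data, not part of the original exercise; the names `manual_grad_w`, `manual_grad_b` and `grads_match` are hypothetical.

```python
import torch

torch.manual_seed(0)

# Random data are enough here, since only the gradients are compared.
X = torch.normal(0, 1, (100, 2))
y = torch.normal(0, 1, (100, 1))
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Gradient via autograd, using the same loss as squared_loss.
y_hat = X @ w + b
loss = (0.5 * (y_hat - y) ** 2).sum() / y.shape[0]
loss.backward()

# Gradient via the closed-form expression for this loss.
with torch.no_grad():
    residual = y_hat - y
    manual_grad_w = (X.T @ residual) / len(X)
    manual_grad_b = residual.mean().reshape(1)

grads_match = torch.allclose(w.grad, manual_grad_w, atol=1e-5) and torch.allclose(
    b.grad, manual_grad_b, atol=1e-5
)
print(grads_match)
```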

## Linear classification



We want to implement a linear network for classification.
We use the well-known Iris data set as an example.



In [None]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

This time the network is implemented as a class.
The only thing missing is the softmax function, which turns a vector of
scores $y = (y_1, \dots, y_p)$ into probabilities. For example, its first component is
$$
\mathrm{softmax}(y)_1 = \frac{e^{y_1}}{ \sum_{i=1}^p  e^{y_i} }.
$$
You have to implement it below.

In [None]:
def softmax(y: Tensor) -> Tensor:
    y_exp = y.exp()
    return y_exp / y_exp.sum(axis=1, keepdim=True)


class SoftmaxNetwork:
    def __init__(self, num_input, num_output, dtype=torch.float64) -> None:
        """
        Args:
            num_input: dimension of input space
            num_output: number of output classes
        """
        self.w = torch.randn((num_input, num_output), dtype=dtype).requires_grad_(True)
        self.b = torch.randn(num_output, dtype=dtype).requires_grad_(True)

    def forward(self, X) -> Tensor:
        """
        Args:
            X: tensor of shape (n, d)
        """
        y = X @ self.w + self.b
        return softmax(y)
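
The `softmax` above exponentiates the raw scores directly, which can overflow when the scores are large. A common variant subtracts the row maximum before exponentiating; this leaves the result unchanged mathematically because the common factor cancels in the fraction. The sketch below is not part of the original exercise, and the name `stable_softmax` is hypothetical.

```python
import torch
from torch import Tensor


def stable_softmax(y: Tensor) -> Tensor:
    """Row-wise softmax that subtracts the row maximum before exponentiating."""
    shifted = y - y.max(dim=1, keepdim=True).values  # largest entry per row becomes 0
    y_exp = shifted.exp()
    return y_exp / y_exp.sum(dim=1, keepdim=True)


# The second row would overflow in a naive implementation, since exp(1000) is inf.
logits = torch.tensor(
    [[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]], dtype=torch.float64
)
probs = stable_softmax(logits)
print(probs)
```

Both rows differ only by a constant shift, so they should yield the same probabilities.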

Next we need the cross-entropy loss, which is already implemented:



In [None]:
def cross_entropy(y_hat: Tensor, y: Tensor) -> Tensor:
    return (-torch.log(y_hat[range(len(y_hat)), y])).mean()

Note that this implementation does not require a one-hot encoding of $y$.
There is one side effect, however: $y$ has to be of type `torch.int64`.
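
The indexing trick can be illustrated on a tiny hand-made example. The sketch below uses made-up probabilities and compares the indexed version with an explicit one-hot computation; the variable names are chosen for illustration only.

```python
import torch

# Two examples, three classes; each row holds predicted class probabilities.
y_hat = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
y = torch.tensor([0, 1])  # integer class labels (dtype torch.int64)

# Row i, column y[i]: the probability assigned to the true class.
picked = y_hat[range(len(y_hat)), y]  # tensor([0.7, 0.8])
loss_indexed = (-torch.log(picked)).mean()

# The same loss written with an explicit one-hot encoding.
one_hot = torch.nn.functional.one_hot(y, num_classes=3)
loss_one_hot = -(one_hot * torch.log(y_hat)).sum(dim=1).mean()

print(loss_indexed.item(), loss_one_hot.item())
```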

The final step is to implement a function that runs the training for us:



In [None]:
def run_training(
    net: SoftmaxNetwork,
    X: Tensor,
    y: Tensor,
    f_loss: Callable[[Tensor, Tensor], Tensor],
    n_epochs: int,
    lr: float = 0.1,
) -> NDArray:
    """Run the training.

    Args:
        net: an instance of SoftmaxNetwork
        X, y: training data
        f_loss: the loss function
        n_epochs: number of epochs
        lr: the learning rate

    Returns:
        The training loss per epoch as a NumPy array.
    """
    loss_arr = np.zeros(n_epochs)
    for i in range(n_epochs):
        y_hat = net.forward(X)
        loss = f_loss(y_hat, y)

        # record the loss for plotting it later
        loss_arr[i] = loss.item()

        loss.backward()

        # update the parameters without tracking the update in autograd,
        # then reset the gradients for the next epoch
        with torch.no_grad():
            net.w -= lr * net.w.grad
            net.b -= lr * net.b.grad
            net.w.grad.zero_()
            net.b.grad.zero_()

    return loss_arr

Now train the model. Don't forget to convert X and y to PyTorch tensors.



In [None]:
net = SoftmaxNetwork(4, 3)
train_loss = run_training(
    net, torch.from_numpy(X), torch.from_numpy(y), cross_entropy, n_epochs=100, lr=0.1
)

plt.plot(train_loss)
plt.xlabel("epochs")
plt.ylabel("loss")
plt.title("Learning curve");

1.  Run the training several times, and observe the different learning curves.
2.  Try the same with a lower learning rate, say $lr=0.05$. Do you see any differences?

Finally, check the accuracy of the model, that is, the fraction of correctly predicted examples.
Of course, this measures performance on the training data only. If you like, you can split the
data into training and test sets and evaluate your network on the test set.
A useful function for this is `train_test_split` from `sklearn.model_selection`.
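
One possible way to compute the accuracy is sketched below, under the assumption that the predicted class is the one with the highest probability. The function name `accuracy` and the toy tensors are hypothetical and only serve as an illustration.

```python
import torch
from torch import Tensor


def accuracy(y_hat: Tensor, y: Tensor) -> float:
    """Fraction of rows whose most probable class equals the true label."""
    predicted = y_hat.argmax(dim=1)
    return (predicted == y).double().mean().item()


# Toy example: four examples, three classes.
y_hat = torch.tensor(
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
)
y = torch.tensor([0, 1, 1, 0])
acc = accuracy(y_hat, y)  # the third example is misclassified
print(acc)
```

For the trained network, the same function could be applied to the output of `net.forward` on the tensor version of the Iris data, together with the matching labels.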

