 # Tutorial 08 - Machine Learning
 
 ## Dr. David C. Schedl

 Note: this tutorial is geared towards students **experienced in programming** and aims to introduce you to **Digital Imaging / Computer Vision** techniques.



## Setup
As a first step, we need to import the necessary libraries.

In [None]:
# Setup and import of libraries
import math
import random
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons, make_blobs, make_circles


# Let's set the random seed to make this reproducible (the same for everybody).
np.random.seed(1337)
random.seed(1337)

# Computing Gradients
Let's start simple with the quadratic function $f(x) = 3x^2 - 4x + 5$.

In [None]:
def f(x):
    return 3 * x**2 - 4 * x + 5

Let's plot it in the range $[-5, 5]$.

In [None]:
xs = np.arange(-5, 5, 0.25)
ys = f(xs)
plt.plot(xs, ys, label="f(x)")
plt.legend()
plt.show()

The derivative of this function is $\frac{df(x)}{dx} = 6x - 4$.

The minimum ($0 = \frac{df(x)}{dx}$) is at $x = \frac{4}{6} = \frac{2}{3}$.

Let's differentiate the function numerically and look at some values.
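
As a rough cross-check, the sketch below compares the forward difference quotient $\frac{f(x + h) - f(x)}{h}$ with the analytical derivative $6x - 4$ at the suggested test points. The helper names `analytic_derivative` and `forward_difference` are made up for this illustration and are not used elsewhere in the tutorial.

```python
def f(x):
    return 3 * x**2 - 4 * x + 5


def analytic_derivative(x):
    # derivative of f, computed by hand: 6x - 4
    return 6 * x - 4


def forward_difference(func, x, h=1e-6):
    # slope of the secant through (x, func(x)) and (x + h, func(x + h))
    return (func(x + h) - func(x)) / h


for x0 in [-4, 0, 2 / 3, 4]:
    numeric = forward_difference(f, x0)
    exact = analytic_derivative(x0)
    print(f"x = {x0:6.3f}: numeric = {numeric:9.5f}, analytic = {exact:9.5f}")
```

The two columns should agree to several decimal places. At $x = \frac{2}{3}$ both are close to zero, which matches the location of the minimum.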

In [None]:
h = 0.000001
x = -4  # test with -4, 0, 2/3 and 4
(f(x + h) - f(x)) / h

## Simple Gradient Descent

With the gradient we can now implement a simple gradient descent algorithm, which iteratively updates the value of $x$ in the direction of the negative gradient.
The parameters are the learning rate (often denoted as $\alpha$) and the number of iterations $N$.

🤔 What happens if you change the learning rate or the number of iterations?
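
One possible way to explore this question is to run the same update rule with several learning rates and compare where $x$ ends up. The sketch below uses the analytical derivative $6x - 4$ instead of the numerical one; the helper names `grad_f` and `gradient_descent` and the chosen learning rates are assumptions made for this illustration.

```python
def grad_f(x):
    # analytical derivative of f(x) = 3x^2 - 4x + 5
    return 6 * x - 4


def gradient_descent(x0, learning_rate, n_iterations):
    # repeatedly step against the gradient, starting from x0
    x = x0
    for _ in range(n_iterations):
        x = x - learning_rate * grad_f(x)
    return x


for lr in [0.01, 0.05, 0.3, 0.4]:
    x_final = gradient_descent(x0=-4.0, learning_rate=lr, n_iterations=10)
    print(f"learning rate {lr:4.2f}: x after 10 iterations = {x_final:12.3f}")
```

For this particular quadratic, each update multiplies the distance to the minimum $\frac{2}{3}$ by $(1 - 6\alpha)$, so the iterates only approach the minimum for $0 < \alpha < \frac{1}{3}$. A very small learning rate converges slowly, while a learning rate above $\frac{1}{3}$ makes the iterates overshoot and diverge.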

In [None]:
# simple gradient descent

x = -4
_xs = [x]
learning_rate = 0.05
N = 10  # number of iterations
for i in range(N):
    df = (f(x + h) - f(x)) / h
    x = x - learning_rate * df
    _xs.append(x)

print(f"x' reached: {x:.3f} after {N} iterations and should be {2/3:.3f}!")
plt.plot(xs, ys, label="f(x)")  # plot the function
plt.plot(
    _xs, f(np.array(_xs)), "r.", label="x'"
)  # plot the path our gradient descent took
plt.legend()
plt.show()

## The same with PyTorch

Let's reuse the quadratic function $f(x) = 3x^2 - 4x + 5$.

PyTorch implements backpropagation.
After calling `backward`, every leaf tensor with `requires_grad=True` that was involved in the computation holds its gradient in `.grad`.
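
One detail worth seeing in isolation: PyTorch adds each new gradient to whatever is already stored in `.grad`, so the stored value has to be reset between backward passes. The following minimal sketch illustrates this with the same quadratic; the variable names `grad_first`, `grad_accumulated` and `grad_after_reset` are chosen for this example only.

```python
import torch

x = torch.tensor([-4.0], dtype=torch.float64, requires_grad=True)

# first backward pass: df/dx = 6x - 4 = -28 at x = -4
y = 3 * x**2 - 4 * x + 5
y.backward()
grad_first = x.grad.item()

# second backward pass without resetting: the new gradient is added on top
y = 3 * x**2 - 4 * x + 5
y.backward()
grad_accumulated = x.grad.item()

# reset the stored gradient in place, then recompute
x.grad.zero_()
y = 3 * x**2 - 4 * x + 5
y.backward()
grad_after_reset = x.grad.item()

print(grad_first, grad_accumulated, grad_after_reset)
```

The second value is twice the first because the gradient was accumulated. This is the reason an optimization loop resets the gradients before every backward pass.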

In [None]:
import torch

# test with -4, 0, 2/3 and 4
x = torch.Tensor([-4]).double()
x.requires_grad = True

# the quadratic function
# (stored as fx so that the Python function f defined earlier is not overwritten)
fx = 3 * x**2 - 4 * x + 5

print("f =", fx.item())

fx.backward()  # with backward we compute the gradients

print("---")
print("gradient x =", x.grad.item())

In [None]:
x = torch.Tensor([-4]).double()
x.requires_grad = True


# Let's use an optimizer
optimizer = torch.optim.SGD(
    [x],
    lr=learning_rate,  # learning rate
)

N = 10  # number of iterations

# optimization
for k in range(N):

    # sets all gradients to zero (this is important)
    optimizer.zero_grad()

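    # compute the quadratic function
    # (stored as fx so that the Python function f defined earlier is not overwritten)
    fx = 3 * x**2 - 4 * x + 5

    fx.backward()

    # update x (sgd)
    optimizer.step()
    # equivalent to the manual update below, which has to be wrapped in
    # torch.no_grad() because x is a leaf tensor that requires grad:
    # with torch.no_grad():
    #     x -= learning_rate * x.grad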

    print(f"step {k}, gradient x = {x.grad.item()}")

print("---")
print(f"x' reached: {x.data.item():.3f} after {N} iterations and should be {2/3:.3f}!")

# Recap: Naive Line Fitting (from Tutorial 07)

Let's look at a simple example of line fitting, where we fit the line equation $y = mx + b$ to the non-zero pixels of an image.
We use the `scipy.optimize` package to fit the line to the data. <br>
Note that this only works for a single line; the fit breaks down if the image contains multiple lines or noise.
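
To get a feeling for how sensitive a least-squares fit is to noise, the sketch below fits a line to the same diagonal segment used in this section, once without and once with two outliers. Note that it uses `np.polyfit` (a closed-form least-squares fit) rather than `curve_fit`, and that the two outlier points are hand-picked, hypothetical values for illustration only.

```python
import numpy as np

# the same diagonal segment as in the image: rows 3..32, columns 10..39
xs_clean = np.arange(10, 40)
ys_clean = xs_clean - 7  # y = 1 * x - 7

# closed-form least-squares fit of a degree-1 polynomial (returns slope, intercept)
m_clean, b_clean = np.polyfit(xs_clean, ys_clean, deg=1)

# add two hand-picked outliers far away from the line
xs_noisy = np.append(xs_clean, [45, 5])
ys_noisy = np.append(ys_clean, [2, 45])
m_noisy, b_noisy = np.polyfit(xs_noisy, ys_noisy, deg=1)

print(f"clean fit: m = {m_clean:.3f}, b = {b_clean:.3f}")
print(f"noisy fit: m = {m_noisy:.3f}, b = {b_noisy:.3f}")
```

The clean fit recovers a slope of about 1 and an intercept of about -7, whereas just two outliers among 32 points already pull the slope far away from 1.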

In [None]:
from scipy.optimize import curve_fit

image = np.zeros((50, 50), dtype=np.uint8)
image[3:33, 10:40] = np.eye(30) * 255
# let's add random noise (off if N=0)
N = 0
image[np.random.randint(0, 50, N), np.random.randint(0, 50, N)] = 255


# get all the non-zero points
points = np.argwhere(image)
ys, xs = points[:, 0], points[:, 1]

# a simple line equation y = mx + b (m is the slope, which you might also know as k)
def line_eq(x, m, b):
    return m * x + b


# find m and b
(m, b), _ = curve_fit(line_eq, xs, ys)
print(m, b)


# yshat = line_eq(xs, m, b)

# show
plt.imshow(image, cmap="gray")
# plt.plot(xs, line_eq(xs, m, b))
plt.show()

## Let's solve it with PyTorch

In [None]:
import torch
import torch.nn as nn

# Convert numpy array to torch tensors
_xs = torch.from_numpy(xs).float()
_ys = torch.from_numpy(ys).float()

# define the model parameters (m and b)
_m = torch.Tensor([0.1]).float()
_m.requires_grad = True
_b = torch.Tensor([0]).float()
_b.requires_grad = True


# Define loss function and optimizer
criterion = nn.MSELoss(reduction="sum")
optimizer = torch.optim.Adam([_m, _b], lr=1e-1)

print(f"The initial model parameters are: m={_m.item():.3f}, b={_b.item():.3f}")
yhat = line_eq(_xs, _m, _b)
loss = criterion(yhat, _ys)
print(f"The initial loss is: {loss.item():.3f}")

# Train the model
for epoch in range(1000):
    optimizer.zero_grad()
    yhat = line_eq(_xs, _m, _b)
    loss = criterion(yhat, _ys)
    loss.backward()  # compute gradients
    optimizer.step()  # update parameters (m and b)

    if epoch % 100 == 0:
        print(f"step {epoch}, loss = {loss.item():.3f}")


# Show the image
plt.imshow(image, cmap="gray")
yhat = line_eq(_xs, _m, _b)
plt.plot(xs, yhat.detach().numpy())
plt.show()

print(f"The final model parameters are: m={_m.item():.3f}, b={_b.item():.3f}")
print(f"The final loss is: {loss.item():.3f}")

### Exercise 01 📝: Play with the hyperparameters
 
Change the initial parameters ($m, b$) and the hyperparameters (learning rate and number of epochs) and see how this affects training.


*Advanced:* Try to change the optimizer (see the [PyTorch docs](https://pytorch.org/docs/stable/optim.html#algorithms)) and see how it affects the training. For example, try to use the `SGD` optimizer.

# Let's train a linear classifier to classify 2D data points. 

We first generate some data using scikit-learn.

In [None]:
from sklearn.datasets import make_moons, make_blobs, make_circles

# Set the seed for the random number generator
torch.manual_seed(42)
np.random.seed(42)

num_classes = 2
N = 100  # number of points

# make up a dataset
# note: make_blobs has no `noise` argument; the spread is controlled by `cluster_std`
X, y = make_blobs(n_samples=N, centers=num_classes)
# TODO: use later -> X, y = make_moons(n_samples=N, noise=0.1)

# y = y * 2 - 1  # make y be -1 or 1
# visualize in 2D
plt.figure(figsize=(5, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, cmap=plt.cm.Spectral)
plt.show()
print(np.unique(y))

We define our Linear Classifier in PyTorch.
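
Before defining the model, it may help to see what its two building blocks compute. The sketch below assumes hand-picked weights, points and labels (all made up for this illustration) and checks that a linear layer is just $xW^T + b$ and that the cross-entropy loss is the mean negative log-softmax probability of the true class.

```python
import torch
import torch.nn.functional as F

# hand-picked parameters, shape (num_outputs, num_inputs) and (num_outputs,)
weight = torch.tensor([[1.0, -2.0], [0.5, 3.0]])
bias = torch.tensor([0.1, -0.3])

# three made-up 2D points with made-up class labels
demo_points = torch.tensor([[0.0, 1.0], [2.0, -1.0], [-1.5, 0.5]])
targets = torch.tensor([0, 1, 0])

# F.linear applies the same affine map that nn.Linear uses in its forward pass
logits = F.linear(demo_points, weight, bias)
manual_logits = demo_points @ weight.T + bias

# cross-entropy: negative log-probability of the true class, averaged over points
loss = F.cross_entropy(logits, targets)
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(3), targets].mean()

print(torch.allclose(logits, manual_logits), torch.allclose(loss, manual_loss))
```

Because the loss applies the softmax itself, the model only needs to output raw scores (logits), one per class.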

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the number of inputs and outputs
num_inputs = 2  # dimension (x, y)
num_outputs = num_classes

# Define the input and label tensors
inputs = torch.tensor(X, dtype=torch.float32)
labels = torch.tensor(y, dtype=torch.long)

# Define the model
class LinearClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)

    def forward(self, x):
        return self.linear(x)


# Instantiate the model
model = LinearClassifier()

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

### Exercise 02 📝: Train the model
 
- Train our linear classifier. Use the optimizer to update the model parameters. Try to answer the following questions:
    - How many epochs do you need to train the model? 
    - How do the loss and accuracy change over time?
- Plot the decision boundary of the model (below).

### Exercise 03 📝: Change the data generation function
- Can you still classify the points if you switch to the following data generation function?
    ```python 
    X, y = make_moons(n_samples=N, noise=0.1)
    ```

In [None]:
logits = model(inputs)
_, preds = logits.max(1)
acc = (preds == labels).float().mean()
print(f"Initial (random) Accuracy: {acc.item():.3f}")


# Todo: Train the model

In [None]:
# visualize decision boundary (similar to tensorflow playground)

import numpy as np
import matplotlib.pyplot as plt


# Define a function to plot the decision boundaries
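def plot_decision_boundary(model):
    # pad the plotted area slightly beyond the data range
    x_min, x_max = inputs[:, 0].min().item() - 0.1, inputs[:, 0].max().item() + 0.1
    y_min, y_max = inputs[:, 1].min().item() - 0.1, inputs[:, 1].max().item() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
    Xmesh = np.c_[xx.ravel(), yy.ravel()]
    # classify every grid point; no gradients are needed for plotting
    with torch.no_grad():
        logits = model(torch.from_numpy(Xmesh).float())
    # predicted class per grid point, reshaped back to the grid
    Z = logits.argmax(dim=1).numpy().reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.2)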


# Plot the points and decision boundaries
plt.scatter(inputs[:, 0], inputs[:, 1], c=labels, cmap=plt.cm.Spectral)
plot_decision_boundary(model)
plt.show()