<a href="https://colab.research.google.com/github/AleMazzeo2001/PyTorch_Tutorials/blob/main/02_Training_models_with_Gradient_Descent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training models with Gradient Descent


Thanks to automatic differentiation, PyTorch makes it easy to implement and use gradient-based training algorithms such as Gradient Descent.

We will see this in action by training a logistic regression model on the "exam" dataset.

Let's start by downloading and visualizing the data.

In [None]:
!wget https://pastebin.com/raw/KTmF6b1u -O exam.txt

import torch
import matplotlib.pyplot as plt

# The context manager closes the file even if parsing fails.
with open('exam.txt', 'r') as f:
    data = [float(s) for s in f.read().split()]

XY = torch.tensor(data).view(-1, 3)
X = XY[:, :2]
Y = XY[:, 2].long()
print(X.shape, X.dtype)
print(Y.shape, Y.dtype)

plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='rainbow')
plt.xlabel('hours studying')
plt.ylabel('hours attending lectures')

## Inference

Define the parameters and the function of the model. Remember that we need to compute:


$$ z = w^T x + b, \;\; p = \frac{1}{1 + e^{-z}}. $$

Try to implement inference for multiple feature vectors in parallel, without using any loops.


In [None]:
def logreg_inference(X, w, b):
    # ... complete

w = # ... complete
b = # ... complete
probs = logreg_inference(X, w, b)
Yhat = (probs > 0.5).long()
print(Yhat.shape)
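
One possible way to complete the cell above, shown as a minimal sketch rather than the intended solution: a matrix-vector product handles all rows of `X` at once, and `torch.sigmoid` applies the logistic function elementwise. The zero initialization and the small `X_demo` tensor are assumptions made for illustration.

```python
import torch

def logreg_inference(X, w, b):
    """Return P(y = 1) for every row of X, with no explicit loop."""
    z = X @ w + b          # (m, n) @ (n,) -> (m,), b is broadcast
    return torch.sigmoid(z)

# Hypothetical toy inputs, not the exam data.
X_demo = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
w_demo = torch.zeros(2)    # assumed initialization
b_demo = torch.zeros(1)

probs_demo = logreg_inference(X_demo, w_demo, b_demo)
print(probs_demo)          # zero parameters give probability 0.5 for every row
```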

## Training

Define the loss function, and the training loop.
Recall the Gradient Descent update rule:
$$ w \gets w - \eta \nabla_w L, $$
$$ b \gets b - \eta \nabla_b L. $$

For logistic regression we have:
$$ \nabla_w L = \frac{1}{m} X^T (\hat{p} - Y), $$
$$ \nabla_b L = \frac{1}{m} \sum_{i=0}^{m-1} (\hat{p}_i - Y_i). $$

The loss is the average cross entropy:
$$ L = \frac{1}{m}\sum_{i=0}^{m-1} -Y_i \log \hat{p}_i -(1-Y_i) \log (1 - \hat{p}_i). $$
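
As a sanity check on the formulas above, the following sketch compares the closed-form gradients with those produced by automatic differentiation. The random data, the seed, and the `_demo` names are assumptions made for illustration; the cross entropy is written directly from the definition and is not protected against `log(0)`.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic problem: m samples, 2 features, binary labels.
# float64 keeps rounding error well below the comparison tolerance.
m = 50
X_demo = torch.randn(m, 2, dtype=torch.float64)
Y_demo = (torch.rand(m) > 0.5).double()

w = torch.randn(2, dtype=torch.float64, requires_grad=True)
b = torch.zeros(1, dtype=torch.float64, requires_grad=True)

def cross_entropy(Y, p):
    """Average binary cross entropy between labels Y and probabilities p."""
    return (-Y * torch.log(p) - (1 - Y) * torch.log(1 - p)).mean()

p = torch.sigmoid(X_demo @ w + b)
loss = cross_entropy(Y_demo, p)
loss.backward()                      # autograd fills w.grad and b.grad

# Closed-form gradients from the text.
with torch.no_grad():
    w_grad = X_demo.T @ (p - Y_demo) / m
    b_grad = (p - Y_demo).sum(dim=0, keepdim=True) / m

print(torch.allclose(w.grad, w_grad, atol=1e-6))
print(torch.allclose(b.grad, b_grad, atol=1e-6))
```

If both comparisons agree, the analytical expressions can be used in the manual training loop with some confidence.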


In [None]:
STEPS = 100000
LR = 0.002
m = X.shape[0]

def cross_entropy(Y, p):
    # ... complete

w = # ... complete
b = # ... complete
steps, losses, accuracies = [], [], []  # history used by the plots below
for step in range(STEPS):
    p = # ... complete
    loss = # ... complete
    w_grad = # ... complete
    b_grad = # ... complete
    w -= LR * w_grad
    b -= LR * b_grad
    if step % 1000 == 0:
        Yhat = (p > 0.5).long()
        accuracy = (Yhat == Y).float().mean()
        print(step, loss.item(), accuracy.item())
        steps.append(step)
        losses.append(loss.item())
        accuracies.append(accuracy.item())

Modify the code above to use:
1. automatic differentiation;
2. a PyTorch optimizer.
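
A minimal sketch of what such a loop might look like, assuming `torch.optim.SGD` as the optimizer and synthetic, linearly separable data in place of the exam dataset. The step count, learning rate, and all `_demo` names are illustrative choices, and `binary_cross_entropy_with_logits` is used here for numerical stability instead of a hand-written loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical data: label is 1 when the two features sum to a positive value.
X_demo = torch.randn(200, 2)
Y_demo = (X_demo.sum(dim=1) > 0).float()

w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.1)

for step in range(500):
    logits = X_demo @ w + b
    loss = F.binary_cross_entropy_with_logits(logits, Y_demo)
    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss.backward()         # automatic differentiation
    optimizer.step()        # w <- w - lr * w.grad, same for b

with torch.no_grad():
    Yhat_demo = (torch.sigmoid(X_demo @ w + b) > 0.5).long()
    accuracy_demo = (Yhat_demo == Y_demo.long()).float().mean().item()
print(loss.item(), accuracy_demo)
```

Compared with the manual version, the gradient formulas disappear entirely: `loss.backward()` computes them and `optimizer.step()` applies the update rule.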

In [None]:
# Modify the training loop to track the loss and the accuracy over the training iterations.

plt.plot(steps, losses)
plt.figure()
plt.plot(steps, accuracies)

## Visualization

This code plots the classifier's decision boundary (the 0.5 contour) together with the 0.25 and 0.75 probability levels.

In [None]:
# indexing='ij' matches the historical default and avoids the deprecation warning.
x1, x2 = torch.meshgrid(torch.linspace(0, 160, 100), torch.linspace(0, 70, 100), indexing='ij')
Xgrid = torch.stack([x1, x2], dim=2).reshape(-1, 2)
p = logreg_inference(Xgrid, w, b).detach()

plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='rainbow')
plt.contour(x1, x2, p.view(100, 100), levels=[0.25, 0.5, 0.75])
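
The 0.5 contour is the set of points where $z = w^T x + b = 0$, which for two features is a straight line. A small sketch of computing that line directly, using made-up parameter values (`w_demo` and `b_demo` are hypothetical, not the trained ones) and assuming the second weight is nonzero:

```python
import torch

# Hypothetical parameters, chosen only for illustration.
w_demo = torch.tensor([0.5, 1.0])
b_demo = torch.tensor(-40.0)

# Solve w1 * x1 + w2 * x2 + b = 0 for x2 (requires w2 != 0).
x1_line = torch.linspace(0, 160, 5)
x2_line = -(w_demo[0] * x1_line + b_demo) / w_demo[1]

# Every point on the line should have probability 0.5.
points = torch.stack([x1_line, x2_line], dim=1)
p_line = torch.sigmoid(points @ w_demo + b_demo)
print(p_line)
```

With trained parameters, `plt.plot(x1_line, x2_line)` would overlay this line on the scatter plot.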