<a href="https://colab.research.google.com/github/BenjyLimmy/FIT3181/blob/main/FIT5215_Tute1b_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <span style="color:#0b486b">  FIT3181/5215: Deep Learning (2024)</span>
***
*CE/Lecturer (Clayton):*  **Dr Trung Le** | trunglm@monash.edu <br/>
*Lecturer (Clayton):* **Prof Dinh Phung** | dinh.phung@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Arghya Pal** | arghya.pal@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Lim Chern Hong** | lim.chernhong@monash.edu <br/>  <br/>
*Head Tutor 3181:*  **Miss Vy Vo** | tran.vo@monash.edu <br/>
*Head Tutor 5215:*  **Dr Van Nguyen** | van.nguyen1@monash.edu

<br/> <br/>
Faculty of Information Technology, Monash University, Australia
***

# Tutorial 1b: Logistic Regression with PyTorch


This tutorial introduces logistic regression, which can be regarded as a feed-forward neural network with a single layer.

## Import Necessary Libraries

In [None]:
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler  # for feature scaling
from sklearn.model_selection import train_test_split  # for train/test split

## Prepare Data

We first load the `breast cancer` dataset from `sklearn` datasets and then split it into 80% for training and 20% for testing.

In [None]:
# Prepare data
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target

n_samples, n_features = X.shape
print(f'number of samples: {n_samples}, number of features: {n_features}')

# split data to 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)


number of samples: 569, number of features: 30
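
As a quick check on the split sizes, the sketch below reproduces the arithmetic by hand. It assumes the test count is the test fraction rounded up and that the remainder goes to training, which agrees with the 455 training and 114 testing samples reported later in this tutorial.

```python
import math

n_samples = 569          # size of the breast cancer dataset, from the output above
test_size = 0.2          # fraction held out for testing

# Assumption: the test count is the fraction rounded up; the rest is training data.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f'train: {n_train}, test: {n_test}')
```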


**<span style="color:red">Exercise 1</span>:** Write the code to print out the first 10 feature vectors in `X_train` and the first 10 labels in `y_train`. Write the code to show the unique labels in `y_train`.

In [None]:
#Your answer here
print(X_train[0:10])
print(y_train[0:10])
print(np.unique(y_train))


[[1.288e+01 1.822e+01 8.445e+01 4.931e+02 1.218e-01 1.661e-01 4.825e-02
  5.303e-02 1.709e-01 7.253e-02 4.426e-01 1.169e+00 3.176e+00 3.437e+01
  5.273e-03 2.329e-02 1.405e-02 1.244e-02 1.816e-02 3.299e-03 1.505e+01
  2.437e+01 9.931e+01 6.747e+02 1.456e-01 2.961e-01 1.246e-01 1.096e-01
  2.582e-01 8.893e-02]
 [1.113e+01 2.244e+01 7.149e+01 3.784e+02 9.566e-02 8.194e-02 4.824e-02
  2.257e-02 2.030e-01 6.552e-02 2.800e-01 1.467e+00 1.994e+00 1.785e+01
  3.495e-03 3.051e-02 3.445e-02 1.024e-02 2.912e-02 4.723e-03 1.202e+01
  2.826e+01 7.780e+01 4.366e+02 1.087e-01 1.782e-01 1.564e-01 6.413e-02
  3.169e-01 8.032e-02]
 [1.263e+01 2.076e+01 8.215e+01 4.804e+02 9.933e-02 1.209e-01 1.065e-01
  6.021e-02 1.735e-01 7.070e-02 3.424e-01 1.803e+00 2.711e+00 2.048e+01
  1.291e-02 4.042e-02 5.101e-02 2.295e-02 2.144e-02 5.891e-03 1.333e+01
  2.547e+01 8.900e+01 5.274e+02 1.287e-01 2.250e-01 2.216e-01 1.105e-01
  2.226e-01 8.486e-02]
 [1.268e+01 2.384e+01 8.269e+01 4.990e+02 1.122e-01 1.262e-01 1.128

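We use `StandardScaler()` from `sklearn` to standardize the training and testing sets: the scaler is fitted on the training set only and then applied to both sets. We then convert the training/testing NumPy arrays to PyTorch tensors (`float32` for the features and `int64` for the labels).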

In [None]:
# scale data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# convert to tensors
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.int64))
y_test = torch.from_numpy(y_test.astype(np.int64))
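
To make the scaling step concrete, here is a minimal NumPy sketch of the per-feature standardization that `StandardScaler` performs. The toy arrays and all variable names are hypothetical stand-ins, not the breast cancer data; the point is only that the mean and standard deviation come from the training split and are reused unchanged for the testing split.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
# Hypothetical toy data standing in for X_train / X_test (3 features).
toy_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
toy_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

# Statistics are estimated on the training split only ...
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)   # population standard deviation (ddof=0)

# ... and reused unchanged for the testing split.
toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std

# Convert to float32 tensors, matching the default dtype of nn.Linear parameters.
train_tensor = torch.from_numpy(toy_train_scaled.astype(np.float32))
test_tensor = torch.from_numpy(toy_test_scaled.astype(np.float32))

print(train_tensor.mean(dim=0))                  # close to 0 for every feature
print(train_tensor.std(dim=0, unbiased=False))   # close to 1 for every feature
```

The scaled testing split will generally not have exactly zero mean and unit variance, because it is transformed with the training statistics.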


## Training/Testing Procedure

We now present the fundamental workflow of PyTorch: training a model on the training set and testing the trained model on the testing set. This workflow is the same for various PyTorch models.

### Prepare Model

First, we need to declare and define a model, which is a computational graph showing how to compute the model output from the input $x$. Specifically, given a single data point $x$ (i.e., [1,30]), a batch $x$ (i.e., [64,30]), or even the entire training set $x$ (i.e., [455,30]), we compute
- logits = xW + b
- pred_probs = softmax(logits)
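
The two formulas above can be sketched by hand and compared with `nn.Linear`. This is an illustration on a random stand-in batch, not the tutorial's data; the variable names are chosen here for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, n_features, n_classes = 64, 30, 2

x = torch.randn(batch_size, n_features)      # random stand-in batch, shape [64, 30]
linear = nn.Linear(n_features, n_classes)

# nn.Linear stores its weight as [n_classes, n_features], so W in "xW + b"
# corresponds to the transpose of linear.weight.
W = linear.weight.T                           # shape [30, 2]
b = linear.bias                               # shape [2]

logits = x @ W + b                            # shape [64, 2]
pred_probs = torch.softmax(logits, dim=-1)    # shape [64, 2], each row sums to 1

print(logits.shape, pred_probs.shape)
print(torch.allclose(logits, linear(x), atol=1e-5))  # manual result vs. nn.Linear
```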

In [None]:
# Create model
# logits = xW + b; softmax is applied inside the loss, so forward() returns the logits
class LogisticRegression(nn.Module):

    def __init__(self, n_input_features):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(n_input_features, 2)

    def forward(self, x):
        logits = self.linear(x)
        pred_probs = torch.nn.Softmax(dim=-1)(logits)  # not returned; computed only for Exercise 2
        return logits  # return the logits

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = LogisticRegression(n_features).to(device)  #load the model to the current device

**<span style="color:red">Exercise 2</span>:** Explain the forward function. What are the meanings and dimensions of `logits` and `pred_probs`?

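Given a mini-batch $x$ of shape [batch_size, 30], `logits` holds the logits (unnormalized discriminative scores) of the data points in the batch, while `pred_probs` holds their prediction probabilities, obtained by applying softmax to each row of `logits`. Both have shape [batch_size, 2], and each row of `pred_probs` sums to 1. Note that `forward` returns `logits`; `pred_probs` is computed only for illustration.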

### Prepare Loss and Optimizer

We declare `loss_fn` as the cross-entropy loss. To train our logistic regression model, we use the SGD optimizer with a learning rate of $0.01$.

In [None]:
# Loss and optimizer
learning_rate = 0.01
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
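
`nn.CrossEntropyLoss` takes raw logits and integer class labels and applies log-softmax internally, which is why the model returns `logits` rather than `pred_probs`. The sketch below checks this on random stand-in values by computing the same loss by hand; the tensors are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 2)               # stand-in logits for 5 data points, 2 classes
labels = torch.tensor([0, 1, 1, 0, 1])   # stand-in integer class labels

# Built-in loss: takes raw logits and integer labels.
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate, and average over the batch.
log_probs = F.log_softmax(logits, dim=-1)
loss_manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(loss_builtin.item(), loss_manual.item())  # the two values should agree
```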

In [None]:
print(y_train.shape, y_train.squeeze().shape)

torch.Size([455]) torch.Size([455])


### Train Model By Feeding the Training Set All at Once

We train the model for $200$ epochs (i.e., going through the entire training set $200$ times). In each epoch, we feed the entire training set to the model to compute the cross-entropy loss over the training set, and then use the optimizer to update the model parameters (i.e., W and b).

In [None]:
# training loop
num_epochs = 200

# load the data to the device (GPU or CPU) once, before the loop
X_train, y_train = X_train.to(device), y_train.to(device)

for epoch in range(num_epochs):
    # forward pass and loss
    logits = model(X_train)

    loss = loss_fn(logits, y_train.squeeze().long())

    # backward pass to compute the gradient
    loss.backward()

    # updates the model parameter based on the gradient
    optimizer.step()

    # zero gradients so they do not accumulate into the next epoch
    optimizer.zero_grad()

    if (epoch+1) % 10 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

epoch: 10, loss = 0.3830
epoch: 20, loss = 0.3133
epoch: 30, loss = 0.2722
epoch: 40, loss = 0.2447
epoch: 50, loss = 0.2247
epoch: 60, loss = 0.2093
epoch: 70, loss = 0.1971
epoch: 80, loss = 0.1870
epoch: 90, loss = 0.1785
epoch: 100, loss = 0.1712
epoch: 110, loss = 0.1649
epoch: 120, loss = 0.1593
epoch: 130, loss = 0.1543
epoch: 140, loss = 0.1498
epoch: 150, loss = 0.1458
epoch: 160, loss = 0.1421
epoch: 170, loss = 0.1387
epoch: 180, loss = 0.1356
epoch: 190, loss = 0.1328
epoch: 200, loss = 0.1301
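
To see what `optimizer.step()` does in the loop above, the following sketch performs one update on a random stand-in batch and compares it with the rule "parameter minus learning rate times gradient". It assumes plain SGD as configured in this tutorial (no momentum, no weight decay); the data and names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.01
x = torch.randn(8, 30)                 # stand-in batch of 8 points with 30 features
y = torch.randint(0, 2, (8,))          # stand-in binary labels

model = nn.Linear(30, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

loss = loss_fn(model(x), y)
loss.backward()                        # fills .grad for weight and bias

# What plain SGD is expected to do: parameter <- parameter - lr * gradient.
expected_weight = model.weight.detach() - lr * model.weight.grad
expected_bias = model.bias.detach() - lr * model.bias.grad

optimizer.step()                       # apply the update in place
optimizer.zero_grad()                  # clear gradients before the next iteration

print(torch.allclose(model.weight.detach(), expected_weight))
print(torch.allclose(model.bias.detach(), expected_bias))
```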


### Evaluate Trained Model on Testing Set

We compute the accuracy on the testing set (i.e., the testing accuracy).

In [None]:
# Ensure the model is in evaluation mode
model.eval()

# Disable gradient calculation
with torch.no_grad():
    # Load the data to the device (GPU or CPU)
    X_test, y_test = X_test.to(device), y_test.to(device)
    # Get the model's predictions
    logits = model(X_test.type(torch.float32))
    # Compute the predicted class
    y_predicted = torch.argmax(logits, dim=1)

    # Calculate the number of correct predictions
    corrects = (y_predicted == y_test).sum().item()
    print(f'correct = {corrects}')

    # Get the total number of samples
    totals = y_test.size(0)
    print(f'totals = {totals}')

    # Compute the accuracy
    acc = corrects / totals
    print(f'accuracy = {acc * 100:.2f}%')


correct = 103
totals = 114
accuracy = 90.35%


**<span style="color:red">Exercise 3</span>:** Explain the code above that computes the testing accuracy. What are `logits` and `y_predicted`?

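`logits` is a 2D tensor of shape $[114,2]$ in which each row holds the logits of one data point in the testing set. `y_predicted` is a 1D tensor of shape $[114]$ that contains the predicted labels of the data points in the testing set, obtained by taking the `argmax` over each row of `logits`. The accuracy is the number of positions where `y_predicted` equals `y_test`, divided by the total number of testing samples. You can print out the values of `logits` and `y_predicted` for more information.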
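
The evaluation code takes `argmax` directly over the logits instead of over probabilities. The sketch below, on random stand-in tensors with the same shapes as the testing set, illustrates why this is enough: softmax preserves the ordering within each row, so both routes pick the same class.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(114, 2)                  # stand-in for the test-set logits, shape [114, 2]
y_true = torch.randint(0, 2, (114,))          # stand-in labels, shape [114]

pred_from_logits = torch.argmax(logits, dim=1)
pred_from_probs = torch.argmax(torch.softmax(logits, dim=1), dim=1)

# Softmax preserves the ordering within each row, so both give the same labels.
print(torch.equal(pred_from_logits, pred_from_probs))

# Accuracy is the fraction of positions where prediction and label agree.
acc = (pred_from_logits == y_true).float().mean().item()
print(f'accuracy on random stand-in data = {acc * 100:.2f}%')
```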

**<span style="color:red">Exercise 4</span>:** Package the above code in a function, allowing you to try with different learning rates. Then, train the logistic regression models with different learning rates (i.e., 0.05, 0.04, 0.005, 0.001) and observe the loss tendency and testing accuracies.

In [None]:
#Your answer here
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def train(X_train, y_train, learning_rate, num_epochs=20):
    model = LogisticRegression(n_features).to(device)  # load the model to the current device
    # Loss and optimizer
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    X_train, y_train = X_train.to(device), y_train.to(device)  # load the data to device (GPU or CPU)
    # training loop
    for epoch in range(num_epochs):
        # forward pass and loss
        logits = model(X_train)
        loss = loss_fn(logits, y_train.squeeze().long())
        # backward pass to compute the gradient
        loss.backward()
        # update the model parameters based on the gradient
        optimizer.step()
        # zero gradients
        optimizer.zero_grad()
        if (epoch+1) % 10 == 0:
            print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')
    return model  # return the trained model so it can be evaluated on the testing set





In [None]:
for lr in [0.05, 0.04, 0.005, 0.001]:
  print(f'learning rate = {lr}')
  train(X_train, y_train, lr, num_epochs=50)
  print("---------------------------------------")

learning rate = 0.05
epoch: 10, loss = 0.2323
epoch: 20, loss = 0.1666
epoch: 30, loss = 0.1393
epoch: 40, loss = 0.1234
epoch: 50, loss = 0.1126
---------------------------------------
learning rate = 0.04
epoch: 10, loss = 0.2825
epoch: 20, loss = 0.1983
epoch: 30, loss = 0.1632
epoch: 40, loss = 0.1429
epoch: 50, loss = 0.1294
---------------------------------------
learning rate = 0.005
epoch: 10, loss = 0.6406
epoch: 20, loss = 0.5263
epoch: 30, loss = 0.4535
epoch: 40, loss = 0.4030
epoch: 50, loss = 0.3658
---------------------------------------
learning rate = 0.001
epoch: 10, loss = 0.4337
epoch: 20, loss = 0.4253
epoch: 30, loss = 0.4175
epoch: 40, loss = 0.4100
epoch: 50, loss = 0.4029
---------------------------------------
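
The exercise also asks for the testing accuracy under each learning rate. One possible way to report it is sketched below. `train_and_evaluate` is a hypothetical helper introduced for this example, it uses a bare `nn.Linear` in place of the `LogisticRegression` class, it runs on the CPU, and the data is a synthetic stand-in rather than the breast cancer dataset, so its numbers are not comparable with the losses above.

```python
import torch
import torch.nn as nn

def train_and_evaluate(X_train, y_train, X_test, y_test, learning_rate, num_epochs=50):
    """Hypothetical helper: full-batch training followed by a test-accuracy check."""
    model = nn.Linear(X_train.shape[1], 2)       # one linear layer with two outputs
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    for _ in range(num_epochs):
        loss = loss_fn(model(X_train), y_train)  # forward pass on the whole training set
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        y_predicted = torch.argmax(model(X_test), dim=1)
        acc = (y_predicted == y_test).float().mean().item()
    return loss.item(), acc

# Synthetic, linearly separable stand-in data (NOT the breast cancer dataset).
torch.manual_seed(0)
X = torch.randn(200, 30)
y = (X[:, 0] > 0).long()
X_tr, y_tr, X_te, y_te = X[:160], y[:160], X[160:], y[160:]

for lr in [0.05, 0.04, 0.005, 0.001]:
    final_loss, acc = train_and_evaluate(X_tr, y_tr, X_te, y_te, lr)
    print(f'learning rate = {lr}: final loss = {final_loss:.4f}, test accuracy = {acc * 100:.2f}%')
```

With the tutorial's own tensors, the same pattern would apply by passing `X_train`, `y_train`, `X_test`, and `y_test` and moving them to `device` first.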


----

**The end**