<a href="https://colab.research.google.com/github/KonradGonrad/PyTorch-deep-learning/blob/main/02_neural_network_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 02. Neural Network classification with PyTorch

Classification is the problem of predicting whether something is one thing or another (sometimes there are more than two options to choose from).

Links:
- https://www.learnpytorch.io/02_pytorch_classification/
- https://github.com/mrdbourke/pytorch-deep-learning

## 1. Make classification data and get it ready

In [None]:
import sklearn

In [None]:
from sklearn.datasets import make_circles

In [None]:
# Make 1000 samples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state = 42)

In [None]:
print(f"Samples in X: {len(X)}, samples in y: {len(y)}")

In [None]:
print(f"First 5 samples: \n{X[:5]} \n{y[:5]}")

In [None]:
# Make DataFrame of circle data
import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y})
circles

In [None]:
# Visualize, visualize, visualize
import matplotlib.pyplot as plt

plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c = y,
            cmap=plt.cm.RdYlBu)

Note: The data we're working with is often referred to as a toy dataset: a dataset that is small enough to experiment with but still sizeable enough to practise the fundamentals.

### 1.1 Check input and output shapes

In [None]:
X.shape, y.shape

In [None]:
# View the first example of features and labels
X_sample = X[0]
y_sample = y[0]

print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")

### 1.2 Turn data into tensors and create train and test splits

In [None]:
import torch
torch.__version__

In [None]:
X.dtype

In [None]:
# Turn data into tensors
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

In [None]:
X[:5], y[:5]

In [None]:
# Split into training and testing set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2, # test size 20%
                                                    train_size=0.8, # train size 80%
                                                    random_state = 42
                                                    )

In [None]:
len(X_train), len(y_train), len(X_test), len(y_test)

## 2. Building a model

Let's build a model to classify our blue and red dots.

To do so, we want to:
1. Set up device-agnostic code so our code will run faster on a GPU if there is one
2. Construct a model (by subclassing 'nn.Module')
3. Define a loss function and optimizer
4. Create a training and test loop

* For a better visualization of how the model works, this site is very useful:
https://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=20&dataset=circle&regDataset=reg-plane&learningRate=0.01&regularizationRate=0&noise=5&networkShape=8,8&seed=0.29212&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

In [None]:
# device agnostic code
import torch

DEVICE_DESTINATION = 'cuda' if torch.cuda.is_available() else 'cpu'

Now that we've set up device-agnostic code, let's create a model that:

1. Subclasses 'nn.Module' (almost all models in PyTorch subclass 'nn.Module')
2. Creates 2 'nn.Linear()' layers that are capable of handling the shapes of our data
3. Defines a 'forward()' method that outlines the forward pass (or forward computation) of the model
4. Instantiates an instance of our model class and sends it to the target 'device'

In [None]:
X_train.shape, y_train.shape # X_train: 800 samples with 2 features each; y_train: 800 labels (one per sample)

In [None]:
from torch import nn
# 1. Construct a model that subclasses nn.Module
class CircleModelV0(nn.Module):
  def __init__(self):
    super().__init__()
    # 2. Create two nn.Linear layers capable of handling the shapes of our data
    self.layer_1 = nn.Linear(in_features=2, out_features=8) # Takes in 2 features and upscales to 8 features
    self.layer_2 = nn.Linear(in_features=8, out_features=1) # Takes in 8 features and gives 1 output, it's the output layer


  # 3. Create a forward() method that outlines the forward pass
  def forward(self, x):
    return self.layer_2(self.layer_1(x)) # x -> layer_1 -> layer_2 -> output

# 4. Instantiate an instance of our model class and send it to the target device
model_0 = CircleModelV0().to(DEVICE_DESTINATION)
model_0

In [None]:
next(model_0.parameters()).device

In [None]:
model_01 = torch.nn.Sequential(
    nn.Linear(in_features = 2, out_features=5),
    nn.Linear(in_features=5, out_features = 1)
).to(DEVICE_DESTINATION)

In [None]:
print(f"Model_0: \n{model_0}")
print(f"Model_01: \n{model_01}")

In [None]:
with torch.inference_mode():
  untrained_preds = model_0(X_test.to(DEVICE_DESTINATION))

In [None]:
print(f"Length of preds: {len(untrained_preds)} and shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)} and shape: {y_test.shape}")
print(f"First ten of test samples: {y_test[:10]}")
print(f"First ten of preds (rounded raw logits): {torch.round(untrained_preds[:10])}")

### 2.1 Setup loss function and optimizer

Which loss function or optimizer should you use?

Again... this is problem specific.

For example, for regression you might want MAE (mean absolute error) or MSE (mean squared error).

For classification you might want binary cross entropy or categorical cross entropy (cross entropy)

And for optimizers, two of the most common and useful are SGD and Adam, however PyTorch has many other optimizers in 'torch.optim'.

* For the loss function we're going to use 'torch.nn.BCEWithLogitsLoss()'. For more on what binary cross entropy is, check out this article:
https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a
* For a definition of what a logit is in deep learning:
https://datascience.stackexchange.com/questions/31041/what-does-logits-in-machine-learning-mean
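To check the claim that 'nn.BCEWithLogitsLoss()' has the sigmoid built in, here is a small sketch comparing it with 'nn.BCELoss()' applied to 'torch.sigmoid()' outputs, and with the binary cross entropy formula written out by hand. The logits and labels are made-up values chosen only for illustration.

```python
import torch
from torch import nn

# Made-up raw logits and binary labels (illustrative values only)
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
labels = torch.tensor([1.0, 0.0, 1.0, 1.0])

# Option 1: raw logits straight into BCEWithLogitsLoss (sigmoid applied internally)
loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Option 2: apply the sigmoid ourselves, then use plain BCELoss
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, labels)

# Option 3: binary cross entropy by hand: -mean(y*log(p) + (1-y)*log(1-p))
loss_formula = -(labels * torch.log(probs) + (1 - labels) * torch.log(1 - probs)).mean()

print(loss_with_logits.item(), loss_bce.item(), loss_formula.item())
```

All three values agree. The PyTorch docs note that the combined 'BCEWithLogitsLoss' is more numerically stable than a separate sigmoid followed by 'BCELoss', which is why it's the usual choice.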

In [None]:
# Setup the loss function
# loss_fn = nn.BCELoss() # BCELoss = requires inputs to have gone through the sigmoid activation function prior to input to BCELoss

loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid activation built in, so the model should output raw logits

optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr = 0.1)

In [None]:
# Calculate accuracy - out of 100 examples, what percentage does our model get right
def accuracy_fn(y_true, y_pred):
  correct = torch.eq(y_true, y_pred).sum().item()
  acc = (correct/len(y_pred)) * 100
  return acc
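One subtle trap with 'accuracy_fn()': 'torch.eq()' broadcasts, so if the predictions still have shape (N, 1) (the raw output of a model with 'out_features=1') while the labels have shape (N,), the comparison silently becomes an (N, N) matrix and the accuracy is meaningless. That's why the training loops below call '.squeeze()'. A small illustration with made-up tensors:

```python
import torch

def accuracy_fn(y_true, y_pred):
  # Same definition as above: percentage of matching elements
  correct = torch.eq(y_true, y_pred).sum().item()
  return (correct / len(y_pred)) * 100

y_true = torch.tensor([1., 0., 1.])                   # shape (3,)
y_pred_unsqueezed = torch.tensor([[1.], [0.], [1.]])  # shape (3, 1), like raw model output

# Broadcasting compares every prediction with every label -> shape (3, 3)
print(torch.eq(y_true, y_pred_unsqueezed).shape)      # torch.Size([3, 3])
print(accuracy_fn(y_true, y_pred_unsqueezed))         # more than 100%, clearly wrong

# Squeezing the extra dimension gives the intended element-wise comparison
print(accuracy_fn(y_true, y_pred_unsqueezed.squeeze()))  # 100.0
```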

## 3. Train model

To train our model, we're going to need to build a training loop with the following steps:

1. Forward pass
2. Calculate the loss
3. Optimizer zero grad
4. Loss backward (backpropagation)
5. Optimizer step (gradient descent)
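The five steps can be wrapped in a reusable function. This is a minimal sketch, not part of the original notebook: 'train_step' is a hypothetical helper name, and the tiny 'nn.Linear' model and random toy data exist only to show the loss going down.

```python
import torch
from torch import nn

def train_step(model, X, y, loss_fn, optimizer):
  """Hypothetical helper: runs one training step and returns the loss value."""
  model.train()
  logits = model(X).squeeze()   # 1. Forward pass (raw logits)
  loss = loss_fn(logits, y)     # 2. Calculate the loss
  optimizer.zero_grad()         # 3. Optimizer zero grad
  loss.backward()               # 4. Loss backward (backpropagation)
  optimizer.step()              # 5. Optimizer step (gradient descent)
  return loss.item()

torch.manual_seed(42)
# Toy linearly separable data: label is 1 when x1 + x2 > 0
X = torch.randn(200, 2)
y = (X.sum(dim=1) > 0).float()

model = nn.Linear(in_features=2, out_features=1)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = [train_step(model, X, y, loss_fn, optimizer) for _ in range(100)]
print(f"first loss: {losses[0]:.4f} | last loss: {losses[-1]:.4f}")
```

Keeping the steps in a function makes it harder to forget one of them (for example 'optimizer.zero_grad()', without which gradients accumulate across steps).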

### 3.1 Going from raw logits -> predictions probabilities -> prediction labels

Our model outputs are going to be raw **logits**.

We can convert these **logits** into **prediction probabilities** by passing them to some kind of activation function (e.g. sigmoid for binary classification and softmax for multi-class classification).

Then we can convert our model's prediction probabilities to prediction labels by either rounding them (binary) or taking the argmax() (multi-class).
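The sigmoid squashes any real-valued logit into the range (0, 1): sigmoid(z) = 1 / (1 + e^(-z)). Here is a plain-Python sketch of the whole logits -> prediction probabilities -> prediction labels path, assuming a 0.5 threshold; the logit values are made up.

```python
import math

def sigmoid(z: float) -> float:
  # Maps a raw logit to a probability in (0, 1)
  return 1 / (1 + math.exp(-z))

logits = [-2.0, 0.0, 0.3, 4.0]                  # made-up raw model outputs
probs = [sigmoid(z) for z in logits]            # prediction probabilities
labels = [1 if p >= 0.5 else 0 for p in probs]  # prediction labels (0.5 threshold)

print([round(p, 3) for p in probs])  # [0.119, 0.5, 0.574, 0.982]
print(labels)                        # [0, 1, 1, 1]
```

Because sigmoid(0) = 0.5 and the function is increasing, a positive logit always maps to a probability above 0.5, so thresholding probabilities at 0.5 is equivalent to thresholding logits at 0.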

In [None]:
# View the first 5 outputs of the forward pass on the test data
print(f"Our model device: {next(model_0.parameters()).device}")

with torch.inference_mode():
  y_logits = model_0(X_test.to(DEVICE_DESTINATION))
y_logits[:5]

For our prediction probability values, we need to perform a range-style rounding on them:
* 'y_pred_probs' >= 0.5, 'y=1' (class 1)
* 'y_pred_probs' < 0.5, 'y=0' (class 0)
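One caveat about using 'torch.round()' for this: PyTorch breaks exact ties by rounding half to even, so a probability of exactly 0.5 is rounded to 0 rather than to class 1 as the rule above says. Exact ties are rare with real model outputs, but an explicit comparison follows the rule precisely. A small sketch with made-up probabilities:

```python
import torch

y_pred_probs = torch.tensor([0.2, 0.5, 0.7])  # made-up probabilities, including an exact tie

rounded = torch.round(y_pred_probs)           # round half to even: 0.5 -> 0.
thresholded = (y_pred_probs >= 0.5).float()   # explicit rule: >= 0.5 -> class 1

print(rounded)      # tensor([0., 0., 1.])
print(thresholded)  # tensor([0., 1., 1.])
```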

In [None]:
# Use the sigmoid activation function on our model logits to turn them into prediction probabilities
y_pred_probs = torch.sigmoid(y_logits)
torch.round(y_pred_probs[:5])

In [None]:
# Find the predicted labels
y_preds = torch.round(y_pred_probs)

# In full (logits -> pred probs -> pred labels)
with torch.inference_mode():
  y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(DEVICE_DESTINATION))))

# Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))

### 3.2 Building a training and testing loop

In [None]:
y_train.shape, y_preds.squeeze().shape

In [None]:
torch.manual_seed(42)

# Set the number of epochs
epochs = 100

# Put data to target device
X_train, y_train = X_train.to(DEVICE_DESTINATION), y_train.to(DEVICE_DESTINATION)
X_test, y_test = X_test.to(DEVICE_DESTINATION), y_test.to(DEVICE_DESTINATION)

# Build training and evaluation loop
for epoch in range(epochs):
  ### Training
  model_0.train()

  # 1. Forward pass
  y_logits = model_0(X_train).squeeze()
  y_pred = torch.round(torch.sigmoid(y_logits)) # Turn logits -> pred probs -> pred labels

  # 2. Calculate accuracy/loss
  # loss = loss_fn(torch.sigmoid(y_logits), # nn.BCELoss expects prediction probabilities
  #                y_train)
  train_loss = loss_fn(y_logits, # nn.BCEWithLogitsLoss expects raw logits as input
                 y_train)
  train_acc = accuracy_fn(y_true=y_train,
                          y_pred=y_pred)

  # 3. Optimizer zero grad
  optimizer.zero_grad()

  # 4. loss backward
  train_loss.backward()

  # 5. Optimizer step
  optimizer.step()

  ### Testing
  model_0.eval()
  with torch.inference_mode():
    # 1. Forward pass
    test_logits = model_0(X_test).squeeze()
    test_pred = torch.round(torch.sigmoid(test_logits))

    # 2. Calculate accuracy/loss
    test_loss = loss_fn(test_logits,
                        y_test)
    test_acc = accuracy_fn(y_true=y_test,
                           y_pred=test_pred)

  # Print out what's happening
  if epoch % 10 == 0:
    print(f"epoch: {epoch} | train_loss: {train_loss:.5f} | train_acc: {train_acc:.2f} | test_loss: {test_loss:.5f} | test_acc: {test_acc:.2f}")


## 4. Make predictions and evaluate the model

From the metrics it looks like our model isn't learning anything...

So to inspect it, let's make some predictions and visualize them.

In other words, "Visualize, visualize, visualize"

To do so, we're going to import a function called 'plot_decision_boundary()' from: https://github.com/mrdbourke/pytorch-deep-learning/blob/main/helper_functions.py
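Roughly speaking, a decision-boundary plot evaluates the model on a dense grid of points covering the data and colours each point by its predicted class. The sketch below shows that idea for a binary model; it is an assumption about the general approach, not a copy of 'plot_decision_boundary()', and 'decision_grid' is a hypothetical name.

```python
import numpy as np
import torch
from torch import nn

def decision_grid(model: nn.Module, X: torch.Tensor, steps: int = 101):
  """Hypothetical helper: predicted binary labels on a grid spanning X."""
  X = X.cpu()
  x_min, x_max = X[:, 0].min().item() - 0.1, X[:, 0].max().item() + 0.1
  y_min, y_max = X[:, 1].min().item() - 0.1, X[:, 1].max().item() + 0.1
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                       np.linspace(y_min, y_max, steps))

  # Stack grid coordinates into a (steps * steps, 2) float32 tensor
  grid = torch.from_numpy(np.column_stack((xx.ravel(), yy.ravel()))).float()

  # Run the model wherever its parameters already live (no device side effects)
  device = next(model.parameters()).device
  model.eval()
  with torch.inference_mode():
    logits = model(grid.to(device)).squeeze()
    labels = (torch.sigmoid(logits) >= 0.5).float().cpu()
    zz = labels.reshape(xx.shape).numpy()
  return xx, yy, zz
```

Plotting would then be something like 'plt.contourf(xx, yy, zz, cmap=plt.cm.RdYlBu, alpha=0.7)' followed by a scatter of the data points on top.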

In [None]:
import requests
from pathlib import Path

# Download helper functions from Learn PyTorch repo (if it's not already downloaded)

if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download")
else:
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

from helper_functions import plot_predictions, plot_decision_boundary

In [None]:
# Plot decision boundary of the model
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.title('Train')
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title('Test')
plot_decision_boundary(model_0, X_test, y_test)

## 5. Improving a model (from a model perspective)

* Add more layers - give the model more chances to learn about patterns in the data
* Add more hidden units - go from 8 hidden units to 10 hidden units
* Changing the activation functions
* Change the optimization function
* Change the learning rate
* Change the loss function

These options are all from a model's perspective because they deal directly with the model, rather than the data.

And because these options are all values we (as machine learning engineers and data scientists) can change, they are referred to as **hyperparameters**.

Let's try and improve our model by:
* Adding more hidden units: 8 -> 10
* Increasing the number of layers: 2 -> 3
* Increasing the number of epochs: 100 -> 1000
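To see how much capacity these changes add, we can count trainable parameters. An 'nn.Linear(in, out)' layer has in * out weights plus out biases, so the 2 -> 8 -> 1 'CircleModelV0' has (2*8 + 8) + (8*1 + 1) = 33 parameters, while a 2 -> 10 -> 10 -> 1 model has (2*10 + 10) + (10*10 + 10) + (10*1 + 1) = 151. A quick sketch to confirm ('count_params' is a hypothetical helper name):

```python
from torch import nn

def count_params(model: nn.Module) -> int:
  # Sum the number of elements in every trainable parameter tensor
  return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Same layer sizes as CircleModelV0 and the planned 3-layer model
v0_like = nn.Sequential(nn.Linear(2, 8), nn.Linear(8, 1))
v1_like = nn.Sequential(nn.Linear(2, 10), nn.Linear(10, 10), nn.Linear(10, 1))

print(count_params(v0_like))  # 33
print(count_params(v1_like))  # 151
```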

In [None]:
class CircleModelV1(nn.Module):
  def __init__(self):
    super().__init__()

    self.layer_1 = nn.Linear(in_features = 2, out_features = 10)
    self.layer_2 = nn.Linear(in_features = 10, out_features = 10)
    self.layer_3 = nn.Linear(in_features = 10, out_features = 1)

  def forward(self, x):
    #z = self.layer_1(x)
    #z = self.layer_2(z)
    #z = self.layer_3(z)
    return self.layer_3(self.layer_2(self.layer_1(x))) # Writing operations this way can leverage speed-ups where possible behind the scenes

model_1 = CircleModelV1().to(DEVICE_DESTINATION)
model_1

In [None]:
# Create a loss function
loss_fn_v1 = torch.nn.BCEWithLogitsLoss()

# Create an optimizer
optimizer_v1 = torch.optim.SGD(model_1.parameters(),
                             lr = 0.01)

In [None]:
# Write a training and evaluation loop for model_1
torch.manual_seed(42)
torch.cuda.manual_seed(42)

epochs = 1000

X_train, y_train = X_train.to(DEVICE_DESTINATION), y_train.to(DEVICE_DESTINATION)
X_test, y_test = X_test.to(DEVICE_DESTINATION), y_test.to(DEVICE_DESTINATION)

for epoch in range(epochs):
  model_1.train()

  y_train_logits = model_1(X_train).squeeze()
  y_train_preds = torch.round(torch.sigmoid(y_train_logits))

  train_loss = loss_fn_v1(y_train_logits,
                          y_train)
  train_acc = accuracy_fn(y_true=y_train,
                         y_pred=y_train_preds)

  optimizer_v1.zero_grad()

  train_loss.backward()

  optimizer_v1.step()

  model_1.eval()
  with torch.inference_mode():
    y_test_logits = model_1(X_test).squeeze()
    y_test_preds = torch.round(torch.sigmoid(y_test_logits))

    test_loss = loss_fn_v1(y_test_logits,
                           y_test)
    test_acc = accuracy_fn(y_true=y_test,
                           y_pred=y_test_preds)

  if epoch % 100 == 0:
    print(f"epoch: {epoch} | train loss: {train_loss:.5f} | train acc: {train_acc:.2f} | test loss: {test_loss:.5f} | test acc: {test_acc:.2f}")


In [None]:
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.title('Train')
plot_decision_boundary(model_1, X_train, y_train)
plt.subplot(1,2,2)
plt.title('Test')
plot_decision_boundary(model_1, X_test, y_test)

### 5.1 Preparing data to see if our model can fit a straight line

One way to troubleshoot a larger problem is to test out a smaller problem.

In [None]:
# Create some data (same as notebook 01)
weight = 0.7
bias = 0.3
start = 0
end = 1
step = 0.01

# Create data
X_regression = torch.arange(start, end, step).unsqueeze(dim=1)
y_regression = weight * X_regression + bias # Linear regression formula

# Check the data
print(len(X_regression))
print(X_regression[:5], y_regression[:5])

In [None]:
# Create train and test splits
train_split = int(0.8 * len(X_regression))
X_train_regression, y_train_regression = X_regression[:train_split], y_regression[:train_split]
X_test_regression, y_test_regression = X_regression[train_split:], y_regression[train_split:]

# Check the lengths of each
len(X_train_regression), len(y_train_regression), len(X_test_regression), len(y_test_regression)

In [None]:
plot_predictions(train_data = X_train_regression,
                 train_labels=y_train_regression,
                 test_data=X_test_regression,
                 test_labels=y_test_regression)

In [None]:
model_1

In [None]:
# Same architecture as model_1 (but using nn.Sequential() and 1 input feature instead of 2)
model_11 = nn.Sequential(
    nn.Linear(in_features = 1, out_features=10),
    nn.Linear(in_features = 10, out_features=10),
    nn.Linear(in_features = 10, out_features = 1)
)
model_11

In [None]:
# Loss and optimizer
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(model_11.parameters(),
                            lr=0.01)

In [None]:
# Train the model
torch.manual_seed(42)
torch.cuda.manual_seed(42)

X_train_regression, y_train_regression = X_train_regression.to(DEVICE_DESTINATION), y_train_regression.to(DEVICE_DESTINATION)
X_test_regression, y_test_regression = X_test_regression.to(DEVICE_DESTINATION), y_test_regression.to(DEVICE_DESTINATION)
model_11.to(DEVICE_DESTINATION)

epochs = 1000
for epoch in range(epochs):
  model_11.train()

  y_preds = model_11(X_train_regression)
  loss = loss_fn(y_preds, y_train_regression)

  optimizer.zero_grad()

  loss.backward()

  optimizer.step()

  model_11.eval()

  with torch.inference_mode():
    test_pred = model_11(X_test_regression)
    test_loss = loss_fn(test_pred, y_test_regression)

  if epoch % 100 == 0:
    print(f"epoch: {epoch} | train_loss: {loss:.5f} | test_loss: {test_loss:.5f}")

In [None]:
# Turn on evaluation mode
model_11.eval()

# Make predictions
with torch.inference_mode():
  y_preds = model_11(X_test_regression)

# Plot data and predictions
plot_predictions(train_data=X_train_regression.cpu(),
                 train_labels=y_train_regression.cpu(),
                 test_data=X_test_regression.cpu(),
                 test_labels=y_test_regression.cpu(),
                 predictions=y_preds.cpu())

## 6. The missing piece: non-linearity

"What patterns could you draw if you were given an infinite amount of straight and non-straight lines?"

Or in machine learning terms, an infinite (but really finite) number of linear and non-linear functions?

### 6.1 Recreating non-linear data (red and blue circles)

In [None]:
# Make and plot data
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

n_samples = 1000
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)


In [None]:
# Train and test
import torch
from sklearn.model_selection import train_test_split

# Turn data into tensors
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, train_size = 0.8, random_state = 42)

print(f"X_train: {len(X_train)} | y_train: {len(y_train)} | X_test: {len(X_test)} | y_test: {len(y_test)}")

### 6.2 Building a model with non-linearity

* Linear = straight lines
* Non-Linear = non straight lines

Artificial neural networks are a large combination of linear (straight) and non-linear (non-straight) functions which are potentially able to find patterns in data.
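This helps explain why adding more linear layers alone (as in 'CircleModelV1') can't bend the decision boundary: without a non-linearity between them, stacked 'nn.Linear' layers compose into a single linear function, because W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2). A short sketch verifying this numerically on random inputs, with layer sizes following 'CircleModelV1':

```python
import torch
from torch import nn

torch.manual_seed(0)
layer_1 = nn.Linear(2, 10)
layer_2 = nn.Linear(10, 10)
layer_3 = nn.Linear(10, 1)

with torch.no_grad():
  # Collapse the three layers into one equivalent weight matrix and bias
  W = layer_3.weight @ layer_2.weight @ layer_1.weight               # shape (1, 2)
  b = layer_3.weight @ (layer_2.weight @ layer_1.bias + layer_2.bias) + layer_3.bias  # shape (1,)

  x = torch.randn(5, 2)
  stacked = layer_3(layer_2(layer_1(x)))  # three linear layers, no activation
  single = x @ W.T + b                    # one linear layer with the collapsed weights

print(torch.allclose(stacked, single, atol=1e-6))  # True
```

Inserting a ReLU between the layers (as 'CircleModelV2' does) breaks this collapse, which is what lets the model draw curved boundaries.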

In [None]:
# Build a model with non-linear activation function
from torch import nn

class CircleModelV2(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer_1 = nn.Linear(in_features = 2, out_features=10)
    self.layer_2 = nn.Linear(in_features = 10, out_features=10)
    self.layer_3 = nn.Linear(in_features = 10, out_features = 1)
    self.relu = nn.ReLU() # non-linear activation function
    # self.sigmoid = nn.Sigmoid() # not needed: BCEWithLogitsLoss applies sigmoid internally

  def forward(self, x):
    return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))

model_2 = CircleModelV2().to(DEVICE_DESTINATION)
model_2

In [None]:
# Setup loss and optimizer

loss_fn = torch.nn.BCEWithLogitsLoss()

optimizer = torch.optim.SGD(params=model_2.parameters(),
                            lr = 0.1)

In [None]:
# Train loop

# Torch random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

X_train, y_train = X_train.to(DEVICE_DESTINATION), y_train.to(DEVICE_DESTINATION)
X_test, y_test = X_test.to(DEVICE_DESTINATION), y_test.to(DEVICE_DESTINATION)

epochs = 2000

for epoch in range(epochs):
  ## Train time
  model_2.train()

  # 1. Forward pass
  y_train_logits = model_2(X_train).squeeze()
  y_train_preds = torch.round(torch.sigmoid(y_train_logits))

  # 2. Calculate the loss

  train_loss = loss_fn(y_train_logits, y_train)
  train_acc = accuracy_fn(y_train, y_train_preds)

  # 3. Optimizer zero grad
  optimizer.zero_grad()

  # 4. Loss backward
  train_loss.backward()

  # 5. Optimizer step
  optimizer.step()

  ## Test time
  # Model evaluation
  model_2.eval()

  with torch.inference_mode():
    # Forward pass
    y_test_logits = model_2(X_test).squeeze()
    y_test_preds = torch.round(torch.sigmoid(y_test_logits))

    # Calculate the loss
    test_loss = loss_fn(y_test_logits, y_test)
    test_acc = accuracy_fn(y_test, y_test_preds)

  ## Print out what's happening
  if epoch % 200 == 0:
    print(f"epoch: {epoch} | train_loss: {train_loss:.5f} | train_acc: {train_acc:.2f}% | test_loss: {test_loss:.5f} | test_acc: {test_acc:.2f}%")

In [None]:
# Visualize, visualize, visualize
## Train and test data
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_2, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_2, X_test, y_test)

In [None]:
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Linear model")
plot_decision_boundary(model_1, X_train, y_train) # Linear
plt.subplot(1, 2, 2)
plt.title("Non-Linear model")
plot_decision_boundary(model_2, X_train, y_train) # Non-Linear

## 7. Replicating non-linear activation functions

With neural networks, rather than us telling the model what to learn, we give it the tools to discover patterns in data and it tries to figure out patterns on its own.

And these tools are linear & non-linear functions.

In [None]:
# Create a tensor
A = torch.arange(-10, 10, 1, dtype=torch.float)
A.dtype

In [None]:
# Visualize the data
plt.plot(A)

In [None]:
plt.plot(torch.relu(A))

In [None]:
def relu_fn(x: torch.Tensor) -> torch.Tensor:
  return torch.maximum(torch.tensor(0), x) # inputs must be tensors

print(f"relu_fn(-5) = {relu_fn(torch.tensor(-5))}")
print(f"relu_fn(5) = {relu_fn(torch.tensor(5))}")
print(f"relu_fn applied to our tensor A:\n{A} \nis equal to: \n{relu_fn(A)}")

In [None]:
# Plot ReLU activation function
plt.figure(figsize=(12, 6))
# First visualization
plt.subplot(1, 2, 1)
plt.title("Custom relu function")
plt.plot(relu_fn(A))
# Second visualization
plt.subplot(1, 2, 2)
plt.title("Original relu function by PyTorch")
plt.plot(torch.relu(A))

In [None]:
# Let's do the same for sigmoid
def sigmoid(x):
  return 1 / (1 + torch.exp(-x))

In [None]:
# Sigmoid comparison
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Custom sigmoid function")
plt.plot(sigmoid(A))
plt.subplot(1, 2, 2)
plt.title("Torch sigmoid function")
plt.plot(torch.sigmoid(A))

## 8. Putting it all together with a multi-class classification

* Binary classification = one thing or another (cat vs dog, spam vs not spam, fraud or not fraud)
* Multi-class classification = more than one thing or another (cat vs dog vs chicken)

### Binary vs multi-class classification
The big difference is that for multi-class classification we use softmax (torch.softmax) instead of sigmoid, and cross entropy (torch.nn.CrossEntropyLoss) instead of BCELoss (or BCEWithLogitsLoss).
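Concretely, softmax turns a vector of K logits into K probabilities that sum to 1, softmax(z)_i = exp(z_i) / sum_j exp(z_j), and cross entropy is the negative log of the probability assigned to the true class. 'nn.CrossEntropyLoss' applies log-softmax internally, so (like 'BCEWithLogitsLoss') it takes raw logits; here the targets are integer class indices of dtype 'torch.long'. A sketch with made-up logits for 2 samples and 4 classes:

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0, 0.1],
                       [0.2, 0.3, 3.0, -0.5]])  # made-up raw outputs, shape (2, 4)
targets = torch.tensor([0, 2])                  # true class index per sample (torch.long)

probs = torch.softmax(logits, dim=1)            # each row sums to (approximately) 1

# Cross entropy by hand: mean of -log(probability of the true class)
manual = -torch.log(probs[torch.arange(len(targets)), targets]).mean()
builtin = nn.CrossEntropyLoss()(logits, targets)

print(probs.sum(dim=1))
print(manual.item(), builtin.item())            # the two values match
```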

### 8.1 Creating a toy multi-class dataset


In [None]:
# Import dependencies
import torch
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs # https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html
from sklearn.model_selection import train_test_split

# Set the hyperparameters for data creation
NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# 1. Create multi-class data
X_blob, y_blob = make_blobs(n_samples = 1000,
                            n_features = NUM_FEATURES,
                            centers = NUM_CLASSES,
                            cluster_std = 1.25,
                            random_state=RANDOM_SEED)

# 2. Turn data into tensors
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)

# 3. Split into training and test
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,
                                                                        y_blob,
                                                                        test_size=0.2,
                                                                        train_size=0.8,
                                                                        random_state=RANDOM_SEED

                                                                        )

# 4. Plot data (Visualize)
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)

### 8.2 Building a multi-class classification model in PyTorch

In [None]:
# Create device agnostic code
DEVICE_DESTINATION = 'cuda' if torch.cuda.is_available() else 'cpu'
DEVICE_DESTINATION

In [None]:
# Build a multi-class classification model
class BlobModel(nn.Module):
  def __init__(self, input_features, output_features, hidden_units = 8):
    """Initializes a multi-class classification model.

    Args:
      input_features (int): Number of input features to the model.
      output_features (int): Number of output features (number of output classes).
      hidden_units (int): Number of hidden units between layers, default 8.

    Example:
      model = BlobModel(input_features=2, output_features=4, hidden_units=8)
    """
    super().__init__()
    self.linear_layer_stack = nn.Sequential(
        nn.Linear(in_features=input_features, out_features=hidden_units),
        #nn.ReLU(),
        nn.Linear(in_features=hidden_units, out_features=hidden_units),
        #nn.ReLU(),
        nn.Linear(in_features=hidden_units, out_features=output_features)
    )
  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.linear_layer_stack(x)

# Create an instance of BlobModel and send it to the target device
model_3 = BlobModel(input_features = 2,
                    output_features = 4,
                    hidden_units = 8).to(DEVICE_DESTINATION)

In [None]:
model_3

### 8.3 Create loss function and optimizer

In [None]:
# Create a loss function for multi-class classification
loss_fn = torch.nn.CrossEntropyLoss()

# Create an optimizer for multi-class classification
optimizer = torch.optim.SGD(params = model_3.parameters(),
                            lr = 0.1)

### 8.4 Getting prediction probabilities for a multi-class PyTorch model

In order to train and evaluate our model, we need to convert our model's outputs (logits) to prediction probabilities and then to prediction labels.

Logits -> Pred probs -> Pred labels
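Because softmax is monotonic (a larger logit always gets a larger probability), taking argmax() of the logits gives the same labels as taking argmax() of the probabilities. The softmax step matters when you want actual probabilities, e.g. to report how confident the model is. A quick check with random made-up logits:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8, 4)                    # made-up logits: 8 samples, 4 classes

pred_probs = torch.softmax(logits, dim=1)     # logits -> prediction probabilities
labels_from_probs = pred_probs.argmax(dim=1)  # pred probs -> pred labels
labels_from_logits = logits.argmax(dim=1)     # skipping softmax entirely

print(torch.equal(labels_from_probs, labels_from_logits))  # True
```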

In [None]:
model_3.eval()
with torch.inference_mode():
  y_logits = model_3(X_blob_test.to(DEVICE_DESTINATION))


print(y_logits[:10])

In [None]:
y_blob_test[:10]

In [None]:
# Convert our model's logit outputs to prediction probabilities
y_pred_probs = torch.softmax(y_logits, dim=1)
print(y_pred_probs[:5])

In [None]:
y_blob_test

In [None]:
# Convert our model's prediction probabilities to prediction labels
y_preds = torch.argmax(y_pred_probs, dim = 1)
y_preds

### 8.5 Create training and testing loop

In [None]:
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed(RANDOM_SEED)

# Put data to the target device
X_blob_train, y_blob_train = X_blob_train.to(DEVICE_DESTINATION), y_blob_train.to(DEVICE_DESTINATION)
X_blob_test, y_blob_test = X_blob_test.to(DEVICE_DESTINATION), y_blob_test.to(DEVICE_DESTINATION)

# Set number of epochs
epochs = 100

for epoch in range(epochs):
  ### Training
  # It's train time
  model_3.train()

  # Do the forward pass
  y_train_logits = model_3(X_blob_train)
  y_train_preds = torch.softmax(y_train_logits, dim=1).argmax(dim=1)

  # Calculate the Loss
  train_loss = loss_fn(y_train_logits, y_blob_train)
  train_acc = accuracy_fn(y_blob_train, y_train_preds)

  # Optimizer zero grad, loss backward, optimizer step
  optimizer.zero_grad()
  train_loss.backward()
  optimizer.step()

  ### Testing
  model_3.eval()
  with torch.inference_mode():
    y_test_logits = model_3(X_blob_test)
    y_test_preds = torch.softmax(y_test_logits, dim=1).argmax(dim=1)

    test_loss = loss_fn(y_test_logits, y_blob_test)
    test_acc = accuracy_fn(y_blob_test, y_test_preds)

  # Print out what's happening
  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Train loss: {train_loss:.4f}, train acc: {train_acc:.2f}% | Test loss: {test_loss:.4f}, test acc: {test_acc:.2f}%")

In [None]:
# Make predictions
model_3.eval()
with torch.inference_mode():
  y_logits = model_3(X_blob_test)

y_logits[:10]

In [None]:
# from logits -> prediction probabilities
y_pred_probs = torch.softmax(y_logits, dim = 1)
y_pred_probs[:10]

In [None]:
# from pred probabilities -> pred labels
y_preds = torch.argmax(y_pred_probs, dim = 1)
y_preds[:10]

In [None]:
# Visualize, visualize, visualize
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_3, X_blob_train, y_blob_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_3, X_blob_test, y_blob_test)

## 9. A few more classification metrics... (to evaluate our classification model)

* Accuracy - out of 100 samples, how many does our model get right
* Precision
* Recall
* F1-score
* Confusion matrix
* Classification report
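For a binary problem, most of these metrics come from four counts: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN), which together form the confusion matrix. Precision = TP / (TP + FP) (of everything predicted positive, how much was right), recall = TP / (TP + FN) (of all actual positives, how many were found), and F1 is their harmonic mean, 2PR / (P + R). A plain-Python sketch with made-up labels; 'binary_metrics' is a hypothetical helper, and in practice libraries such as 'torchmetrics' or 'sklearn.metrics' provide these.

```python
def binary_metrics(y_true, y_pred):
  """Hypothetical helper: confusion-matrix counts plus precision, recall and F1."""
  pairs = list(zip(y_true, y_pred))
  tp = sum(1 for t, p in pairs if t == 1 and p == 1)
  fp = sum(1 for t, p in pairs if t == 0 and p == 1)
  fn = sum(1 for t, p in pairs if t == 1 and p == 0)
  tn = sum(1 for t, p in pairs if t == 0 and p == 0)
  # Guard against division by zero when a class is never predicted / never present
  precision = tp / (tp + fp) if tp + fp else 0.0
  recall = tp / (tp + fn) if tp + fn else 0.0
  f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
  return {"confusion": [[tn, fp], [fn, tp]],
          "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up labels
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]  # made-up predictions
print(binary_metrics(y_true, y_pred))
```

Here accuracy would be 5/8, but precision (3/5) and recall (3/4) show separately that the model raises some false alarms and misses one positive.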

In [None]:
!pip install torchmetrics

In [None]:
from torchmetrics import Accuracy

# Setup metric
torchmetric_accuracy = Accuracy(task="multiclass",
                                num_classes=4).to(DEVICE_DESTINATION)

# Calculate accuracy
torchmetric_accuracy(y_preds, y_blob_test)