# PyTorch Workflow

In [None]:
what_were_covering = {
    1: "Data (prepare and load)",
    2: "Build model",
    3: "Fitting the model to data (training)",
    4: "Making predictions and evaluating a model (inference)",
    5: "Saving and loading the model",
    6: "Putting it all together"
    }
what_were_covering

torch.nn -> https://pytorch.org/docs/stable/nn.html

In [None]:
import torch
from torch import nn # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

# PyTorch version
torch.__version__

## 1. Data (preparing and loading)

Data can be almost anything in machine learning:

* Excel spreadsheets
* Images of any kind
* Videos
* Audio like songs and podcasts
* DNA
* Text

Machine learning is mainly divided into two parts:
1. Get data into a numerical representation.
2. Build a model to learn patterns in that numerical representation.

To showcase this, let's create some **known** data using the linear regression formula, `y = weight * x + bias`.

We'll use it to make a straight line with known `parameters` (the weight and the bias).

In [None]:
# Create parameters
weight = 0.7
bias = 0.3

# Create data
start = 0
end = 1
step = 0.02
x = torch.arange(start,end,step).unsqueeze(dim=1)
y = weight * x + bias

x[:10], y[:10],x.shape,y.shape

In [None]:
len(x),len(y)

### Splitting data into training and test sets

`Goal` -> **Generalization**: the ability of a machine learning model to perform well on data it hasn't seen before.

In [None]:
# Creating a train/test split
train_split = int(0.8*len(x))
x_train,y_train = x[:train_split],y[:train_split]
x_test,y_test = x[train_split:],y[train_split:]
len(x_train),len(y_train),len(x_test),len(y_test)

### Visualizing the data

In [None]:
def plot_predictions(train_data=x_train,
                     train_labels=y_train,
                     test_data=x_test,
                     test_labels=y_test,
                     predictions=None):
  """
  Plots training data, test data and compares predictions.
  """
  plt.figure(figsize=(10,7))

  # Plot training data in blue
  plt.scatter(train_data,train_labels,c="b",s=4,label="Training data")

  # Plot test data in green
  plt.scatter(test_data,test_labels,c="g",s=4,label="Test data")

  if predictions is not None:
    # plot predictions if they exist
    plt.scatter(test_data,predictions, c="r",s=4,label="Predictions")

  # Show legend
  plt.legend(prop={"size":14});

In [None]:
plot_predictions()

## 2. Build a model

What our model does:
* Start with random values (weights, bias)
* Look at the training data and adjust the random values to better represent (get closer to) the ideal values (the `weight` and `bias` we used to create the data)

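How does it do so?
Through two main algorithms:
1. Gradient descent - repeatedly nudges each parameter in the direction that reduces the loss
2. Backpropagation - computes the gradient of the loss with respect to each parameter, which gradient descent needs

(3Blue1Brown's videos on neural networks explain both.)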
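A minimal sketch of both ideas on the same kind of data, assuming we skip `nn.Module` and the optimizer entirely and write the update rule by hand. It uses mean squared error as the loss (this notebook later uses L1 loss) because its gradient is smooth; the learning rate and number of steps are illustrative choices, not values from the original.

```python
import torch

# Toy data from the known line y = 0.7 * x + 0.3
x = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * x + 0.3

# Parameters start random; requires_grad=True lets autograd track them
torch.manual_seed(42)
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

lr = 0.1  # illustrative learning rate
for step in range(1000):
    # Forward pass
    y_pred = w * x + b
    # Mean squared error (swapped in for L1 loss in this sketch)
    loss = ((y_pred - y) ** 2).mean()
    # Backpropagation: fills w.grad and b.grad with d(loss)/dw and d(loss)/db
    loss.backward()
    # Gradient descent: move each parameter against its gradient
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
    # Reset the gradients, because backward() accumulates them
    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())  # should end up close to 0.7 and 0.3
```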

In [None]:
# Defining parameters by hand like this is mainly for understanding; in practice (e.g. for image models), layers in torch.nn create the parameters for us.
# Create a linear regression model class
class LinearRegressionModel(nn.Module): # <- almost everything in PyTorch inherits from nn.Module
  def __init__(self):
    super().__init__()
    self.weights = nn.Parameter(torch.randn(1, # start with a random weight and try to adjust it to the ideal weight
                                            requires_grad=True, # nn.Parameter uses True by default; PyTorch tracks the gradients of this parameter for torch.autograd and gradient descent
                                            dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1, # start with a random bias and try to adjust it to the ideal bias
                                         requires_grad=True,
                                         dtype=torch.float)) # torch.float (float32), not Python's float, which would give float64

  # Forward method to define the computation in the model
  def forward(self, x: torch.Tensor) -> torch.Tensor: # <- "x" is the input data
    return self.weights * x + self.bias # linear regression formula



### PyTorch model building essentials

* torch.nn - contains all of the building blocks for computational graphs (a neural network can be considered a computational graph)
* torch.nn.Parameter - stores the parameters our model should try to learn; often a PyTorch layer from torch.nn will set these for us
* torch.nn.Module - the base class for all neural network modules; if you subclass it, you should override forward()
* torch.optim - this is where the optimizer algorithms in PyTorch live; they help with gradient descent
* def forward() - all nn.Module subclasses require you to override forward(); this method defines what happens in the forward computation
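A small sketch of why `nn.Parameter` matters: only attributes wrapped in it are registered by `nn.Module`, so only they show up in `.parameters()` and `state_dict()` (and therefore get optimized and saved). `TinyModule` is a hypothetical class made up for this illustration.

```python
import torch
from torch import nn

class TinyModule(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter -> registered automatically, so it is learnable
        self.weight = nn.Parameter(torch.randn(1))
        # Plain tensor attribute -> NOT registered as a parameter
        self.scale = torch.tensor([2.0])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * self.scale

module = TinyModule()
names = [name for name, _ in module.named_parameters()]
keys = list(module.state_dict().keys())
print(names)  # ['weight']
print(keys)   # ['weight']
```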


### Checking the contents of our PyTorch model

Now that we've created a model, let's see what's inside.

We can check our model parameters or what's inside our model using `.parameters()`.

In [None]:
# Set a random seed so the random initial parameters are reproducible
torch.manual_seed(42)

# Create an instance of the model (this is a subclass of nn.Module)
model_0 = LinearRegressionModel()

# Check out the parameters
list(model_0.parameters())

In [None]:
# List named parameters
model_0.state_dict()

### Making predictions using `torch.inference_mode()`

To check our model's predictive power, let's see how well it predicts `y_test` based on `x_test`.

When we pass data through our model, it's going to run it through the `forward()` method.
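A quick check of both points, using `Line`, a hypothetical cut-down copy of `LinearRegressionModel`: calling the model runs `forward()`, and outputs computed under `torch.inference_mode()` are not tracked by autograd.

```python
import torch
from torch import nn

class Line(nn.Module):  # hypothetical stand-in for LinearRegressionModel
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(1))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weights * x + self.bias

torch.manual_seed(42)
model = Line()
x = torch.arange(0, 1, 0.02).unsqueeze(dim=1)

# model(x) goes through nn.Module.__call__, which runs forward() (plus any hooks)
out_call = model(x)
out_forward = model.forward(x)
print(torch.equal(out_call, out_forward))  # True

# Normal mode: the output is part of the autograd graph
print(out_call.requires_grad)  # True

# inference_mode: no graph is recorded, so the output doesn't require grad
with torch.inference_mode():
    out_infer = model(x)
print(out_infer.requires_grad)  # False
```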

In [None]:
# Make prediction with model
with torch.inference_mode():
  y_preds = model_0(x_test)

# You can also do something similar with torch.no_grad(); however, torch.inference_mode() is preferred
# with torch.no_grad():
#   y_preds = model_0(x_test)

y_preds

In [None]:
y_test

In [None]:
plot_predictions(predictions=y_preds)

## 3. Train model

The whole idea of training is for a model to move from some *unknown* parameters (these may be random) to some *known* parameters.

In other words, from a poor representation of the data to a better representation of the data.

One way to measure how poor or how wrong your model predictions are is to use a loss function.

* Note: A loss function may also be called a cost function or criterion in different areas.
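As one example, mean absolute error (MAE), which this notebook later uses via `nn.L1Loss`, is just the average absolute difference between predictions and targets. A sketch with made-up numbers:

```python
import torch
from torch import nn

# Made-up predictions and targets
y_pred = torch.tensor([[0.5], [1.0], [1.5]])
y_true = torch.tensor([[0.4], [1.2], [1.5]])

# MAE by hand: average of the absolute differences
mae_manual = (y_pred - y_true).abs().mean()

# nn.L1Loss uses reduction="mean" by default, which computes the same thing
mae_builtin = nn.L1Loss()(y_pred, y_true)

print(mae_manual, mae_builtin)  # both roughly 0.1
```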

Things we need to train:

* **Loss function:** A function to measure how wrong your model's predictions are compared to the ideal outputs; lower is better.
* **Optimizer:** Takes into account the loss of a model and adjusts the model's parameters (e.g. weights & bias) to improve the loss function.
  * Inside the optimizer, you'll often set two parameters:
    * `params` - the model parameters you'd like to optimize, for example `params=model_0.parameters()`
    * `lr` (learning rate) - a hyperparameter that defines how big or small the changes the optimizer makes to the parameters with each step are (a small `lr` results in small changes, a large `lr` results in large changes)
      

And specifically for PyTorch, we need:
* A training loop
* A testing loop

In [None]:
list(model_0.parameters())

In [None]:
model_0.state_dict()

In [None]:
# Setup a loss function
loss_fn = nn.L1Loss() # L1Loss - MAE (mean absolute error)

# Setup an optimizer, SGD (stochastic gradient descent)
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.01) # the larger the learning rate, the larger each change to the parameters
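To see what `lr` does, here is a sketch of a single plain-SGD step on a made-up loss whose gradient we know exactly (`one_sgd_step` is a hypothetical helper). With no momentum or weight decay, `torch.optim.SGD` updates each parameter as `param - lr * grad`, so a ten times larger `lr` gives a ten times larger change.

```python
import torch

def one_sgd_step(lr: float) -> float:
    """Hypothetical helper: one plain-SGD step on w=1.0 with loss = 3*w (gradient is exactly 3)."""
    w = torch.tensor([1.0], requires_grad=True)
    optimizer = torch.optim.SGD(params=[w], lr=lr)
    optimizer.zero_grad()
    loss = (3 * w).sum()
    loss.backward()   # w.grad = 3
    optimizer.step()  # w <- w - lr * w.grad
    return w.item()

for lr in (0.01, 0.1):
    print(lr, one_sgd_step(lr))  # roughly 0.97, then roughly 0.7
```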

### Building a training loop and testing loop in PyTorch

A couple of things we need in a training loop:

0. Loop through the data and do...
1. Forward pass (this involves data moving through our model's `forward()` function) to make predictions on data - also called forward propagation
2. Calculate the loss (compare forward pass predictions to ground truth labels)
3. Optimizer zero grad - reset the gradients from the previous iteration, because PyTorch accumulates them by default
4. Loss backward - move backwards through the network to calculate the gradients of each of the parameters of our model with respect to the loss (***backpropagation***)
5. Optimizer step - use the optimizer to adjust the model's parameters to try to improve the loss (**gradient descent**)
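Step 3 exists because `backward()` adds to any gradient already stored in `.grad` rather than overwriting it. A minimal sketch on a single tensor:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(w**2)/dw = 2 * w = 4
(w ** 2).sum().backward()
grad_after_one = w.grad.clone()

# Second backward pass WITHOUT resetting: the new gradient (4) is added to the stored one
(w ** 2).sum().backward()
grad_after_two = w.grad.clone()

# Reset the stored gradient (optimizer.zero_grad() does this for every parameter it manages)
w.grad = None
(w ** 2).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_after_one, grad_after_two, grad_after_reset)  # tensor([4.]) tensor([8.]) tensor([4.])
```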

In [None]:
torch.manual_seed(42)

# An epoch is one loop through the data
epochs = 200

# Tracking different values
epoch_count=[]
loss_values = []
test_loss_values = []


### Training
# 0. Loop through the data
for epoch in range(epochs):
  # Set the model to training mode
  model_0.train() # sets the model (and its submodules) to training mode; this changes layers such as dropout and batch norm, not requires_grad

  # 1. Forward pass
  y_preds = model_0(x_train)

  # 2. Calculate the loss
  loss = loss_fn(y_preds,y_train)
  # print(f"Loss: {loss}")

  # 3. Optimizer zero grad
  optimizer.zero_grad()

  # 4. Perform backpropogation on the loss with respect to the parameters of the model
  loss.backward()

  # 5. Step the optimizer (perform gradient descent)
  optimizer.step()  # gradients accumulate across loop iterations by default, which is why we zero them in step 3

  # Testing
  model_0.eval()  # turns off settings not needed for evaluating/testing (e.g. dropout and batch norm behave differently)
  with torch.inference_mode(): # turns off gradient tracking & a couple more things behind the scenes

    # 1. Do the forward pass
    test_pred = model_0(x_test)

    # 2. Calculate the loss
    test_loss = loss_fn(test_pred,y_test)

  if epoch % 10 == 0:
    epoch_count.append(epoch)
    loss_values.append(loss.item()) # store a plain Python float instead of a tensor attached to the autograd graph
    test_loss_values.append(test_loss.item())
    print(f"Epoch: {epoch} | Loss: {loss} | test_loss: {test_loss}")
    # Print our model state_dict
    print(model_0.state_dict())



In [None]:
# Plot the loss curve
plt.plot(epoch_count, torch.tensor(loss_values).numpy(), label="Train loss")   # torch.tensor(...) collects the recorded values into one tensor; .numpy() converts it for matplotlib
plt.plot(epoch_count,test_loss_values,label="Test loss")
plt.title("Training and test loss curves")
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.legend()

In [None]:
with torch.inference_mode():
  y_preds_new = model_0(x_test)

In [None]:
model_0.state_dict()

In [None]:
weight,bias

In [None]:
plot_predictions(predictions=y_preds_new)

## 5. Saving a model in PyTorch

There are three main methods we should know about for saving and loading models in PyTorch.

1. `torch.save()` - allows us to save a PyTorch object in Python's pickle format.
2. `torch.load()` - allows us to load a saved PyTorch object.
3. `torch.nn.Module.load_state_dict()` - allows us to load a model's saved state dictionary.
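A minimal round-trip sketch of these three functions, assuming an in-memory `io.BytesIO` buffer instead of a file on disk (`torch.save` and `torch.load` accept file-like objects as well as paths) and a plain `nn.Linear` layer standing in for the model:

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(in_features=1, out_features=1)

# 1. torch.save: serialize the state_dict into an in-memory buffer
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)  # rewind so torch.load reads from the start

# 2. torch.load + 3. load_state_dict: restore into a fresh instance of the same architecture
restored = nn.Linear(in_features=1, out_features=1)
restored.load_state_dict(torch.load(buffer, weights_only=True))

# Every saved tensor should now match exactly
same = all(torch.equal(a, b) for a, b in zip(original.state_dict().values(),
                                             restored.state_dict().values()))
print(same)  # True
```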

In [None]:
# Saving our PyTorch model

from pathlib import Path

# 1. Create models directory (this path assumes Google Drive is mounted in Colab)
MODEL_PATH = Path("/content/drive/MyDrive/pytorch/models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create model save path
# .pth is a common PyTorch file extension
MODEL_NAME = "01_pytorch_workflow_model_0.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# To save the full model instead of only its parameters:
# torch.save(model_0, MODEL_SAVE_PATH)

# 3. Save the model state_dict()
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(),
           f=MODEL_SAVE_PATH)

In [None]:
# -l (long) to display detailed info...
!ls -l /content/drive/MyDrive/pytorch/models/


# import os
# os.listdir("/content/drive/MyDrive/pytorch/models/")

## Loading a PyTorch model

Since we've saved our model's `state_dict()` rather than the entire model, we'll create a new instance of our model class and load the saved `state_dict()` into it.

In [None]:
model_0.state_dict()

In [None]:
# To load a saved state_dict we have to instantiate a new instance of our model class
loaded_model_0 = LinearRegressionModel()

# Load the saved state_dict of model_0 (this will update the new instance with updated parameters)

loaded_model_0.load_state_dict(torch.load(MODEL_SAVE_PATH))
# torch.load("/content/drive/MyDrive/pytorch/models/01_pytorch_workflow_model_0.pth")

In [None]:
loaded_model_0.state_dict()

In [None]:
# Make some predictions with our loaded model

loaded_model_0.eval()
with torch.inference_mode():
  loaded_model_preds = loaded_model_0(x_test)

loaded_model_preds

In [None]:
# Make some predictions with the original model
model_0.eval()
with torch.inference_mode():
  y_preds = model_0(x_test)
y_preds

In [None]:
# Compare loaded model preds with original model preds
y_preds == loaded_model_preds

## 6. Putting it all together
Let's go back through the steps above and see them all in one place.

In [None]:
# Import PyTorch and matplotlib
import torch
from torch import nn
import matplotlib.pyplot as plt

torch.__version__

Create device-agnostic code.

This means that if we've got access to a GPU, our code will use it (for faster computing).

If no GPU is available, the code will default to using the CPU.
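A short sketch of the pattern, using only standard tensor methods: `.to(device)` moves a tensor to the chosen device, and `.cpu()` brings it back, which is needed before converting to NumPy or plotting with matplotlib.

```python
import torch

# Use the GPU if there is one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
x_on_device = x.to(device)  # returns a tensor on the target device (the same tensor if it is already there)
print(x_on_device.device)

# NumPy (and therefore matplotlib) only works with CPU tensors,
# so move results back with .cpu() before converting or plotting
x_back = x_on_device.cpu().numpy()
print(x_back.shape)  # (50, 1)
```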

In [None]:
# Set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

### 6.1 Data

In [None]:
# Create some data using the linear regression formula y = weight * x + bias
weight = 0.7
bias = 0.3

# Create range values
start = 0
end = 1
step = 0.02

# Create X and y (features and labels)
x = torch.arange(start,end,step).unsqueeze(dim=1) # adds a feature dimension, shape (50,) -> (50, 1); nn.Linear(in_features=1) expects the last dimension to be 1
y = weight*x+bias
x[:10], y[:10]

In [None]:
# Split data
train_split = int(0.8*len(x))
x_train,y_train = x[:train_split],y[:train_split]
x_test,y_test = x[train_split:],y[train_split:]
len(x_train),len(y_train),len(x_test),len(y_test)

In [None]:
# Plot the data
plot_predictions(x_train,y_train,x_test,y_test)

In [None]:
# Create a linear model by subclassing nn.Module
class LinearRegressionModelV2(nn.Module):
  def __init__(self):
    super().__init__()
    # Use nn.Linear() for creating the model parameters / also called: linear transform, probing layer, fully connected layer, dense layer
    self.linear_layer = nn.Linear(in_features=1,
                                  out_features=1) # one input feature (x) maps to one output feature (y)

  def forward(self,x : torch.Tensor) -> torch.Tensor:  # x should be a torch.Tensor and returns torch.Tensor
    return self.linear_layer(x)

# Set manual seed
torch.manual_seed(42)

model_1 = LinearRegressionModelV2()
model_1, model_1.state_dict()
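`nn.Linear` stores its own `weight` and `bias` and computes `x @ weight.T + bias`, so with one input and one output feature it implements the same `weight * x + bias` formula as `LinearRegressionModel`. A sketch that checks this, assuming a standalone layer rather than `model_1`:

```python
import torch
from torch import nn

torch.manual_seed(42)
layer = nn.Linear(in_features=1, out_features=1)
x = torch.arange(0, 1, 0.02).unsqueeze(dim=1)  # shape (50, 1)

with torch.inference_mode():
    out_layer = layer(x)
    # nn.Linear computes y = x @ W^T + b; with one in/out feature this is weight * x + bias
    out_manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
print(torch.allclose(out_layer, out_manual))  # True
```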

In [None]:
# Check the model's current device
next(model_1.parameters()).device

In [None]:
# Set the model to use the target device (GPU if available)
model_1.to(device)
next(model_1.parameters()).device

In [None]:
model_1.state_dict()

### Training

For training we need:
* Loss function
* Optimizer
* Training loop
* Testing loop


In [None]:
# Setup loss function
loss_fn = nn.L1Loss() # Same as MAE

# Setup our optimizer
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.01)


In [None]:
# Let's write a training loop
torch.manual_seed(42)

epochs = 200

# Put data on the target device (device agnostic code for data)
x_train = x_train.to(device)
y_train = y_train.to(device)
x_test = x_test.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
  model_1.train()

  # 1. Forward pass
  y_pred = model_1(x_train)

  # 2. Calculate loss
  loss = loss_fn(y_pred,y_train)

  # 3. Optimizer zero grad -> gradients accumulate by default, so without this each step would use the sum of all previous gradients
  optimizer.zero_grad()

  # 4. Perform backpropagation
  loss.backward()

  # 5. Step the optimizer
  optimizer.step()

  ### Testing

  model_1.eval()
  with torch.inference_mode():
    test_pred = model_1(x_test)
    test_loss = loss_fn(test_pred,y_test)

  # Print out what's happening

  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Loss: {loss} | test_loss: {test_loss}")


In [None]:
model_1.state_dict()

In [None]:
weight,bias

### Making and evaluating predictions

In [None]:
# Turn model into evaluation mode
model_1.eval()

# Make predictions on the test data
with torch.inference_mode():
  y_preds = model_1(x_test)
y_preds

In [None]:
# Check out our model predictions visually
plot_predictions(predictions=y_preds.cpu()) # matplotlib needs CPU tensors, so move predictions off the GPU first