# PyTorch Workflow

The typical DL/ML workflow in PyTorch involves:

- Preparing and loading collected data
- Building models
- Fitting/Training the model on the data
- Making predictions (a.k.a. inference)
- Saving and reloading trained models
- Putting it all together

In [None]:
import torch
from torch import nn # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt
import time

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Check for PyTorch version
torch.__version__

# Setting up device-agnostic code
if torch.cuda.is_available():
  torch.set_default_device("cuda")
else:
  torch.set_default_device("cpu")

## Data (Preparing and Loading)

Machine Learning is a game of two parts:
- Get data into numerical representation
- Build a model to learn patterns in that data

To showcase this, let's create some known data using the linear regression formula. We will make a straight line with known **parameters**.

> While creating models we come across the concept of features and labels. Features are the characteristics of the input data that the model evaluates, and labels are the names or values the model is trained to associate with those features. For example, for a CNN trained on a dataset of celebrity faces, the images of the faces are the features and the name of each celebrity is the corresponding label.
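
To make the feature/label distinction concrete, here is a minimal sketch with a made-up dataset. The house-size numbers and the variable names `features` and `labels` are assumptions for illustration only; they are not used elsewhere in this notebook.

```python
import torch

# Hypothetical toy dataset: each row is one sample, each column one feature
# (here: floor area in square metres and number of rooms).
features = torch.tensor([[50.0, 2.0],
                         [80.0, 3.0],
                         [120.0, 4.0]])

# One label per sample (here: a made-up price in thousands).
labels = torch.tensor([[150.0], [240.0], [360.0]])

print(features.shape)  # torch.Size([3, 2]): 3 samples, 2 features each
print(labels.shape)    # torch.Size([3, 1]): 3 samples, 1 label each
```

The first dimension of both tensors is the number of samples, which is why `X` and `Y` in the cells below always have the same length.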


In [None]:
# Establishing known parameters
weight = 1 # Weight is the coefficient of the weighted sum
bias = 2.3 # bias is an additional constant to be added to the weighted sum to threshold the activation
start = 0
end = 2
step = .02
X = torch.arange(start, end, step).unsqueeze(dim=1)
# We created a range of numbers from 0 up to (but not including) 2 with steps of 0.02 and then wrapped each number in another dimension, so the shape of X will be ([100, 1])
Y = weight * X + bias
# We map each element of X to Y with a function
# print(f"x:{X[:10]}\ny:{Y[:10]}")
# print(len(X), len(Y), X.shape, Y.shape)


In [None]:
# Establishing known parameters for a cubic: Y = 3x^3 + 2x^2 + 2x + 1
deg = 3
weight = torch.tensor([3.0, 2.0, 2.0]).unsqueeze(dim=1) # Coefficients of x^3, x^2 and x, shape ([3, 1])
bias = 1 # bias is the constant term added to the weighted sum
start = -1
end = 1
step = .01
X = torch.arange(start, end, step).unsqueeze(dim=1)
# X runs from -1 up to (but not including) 1 in steps of 0.01, with each number wrapped in another dimension
Y = torch.zeros_like(X)
for idx, x in enumerate(X):
  # Add each polynomial term: weight[i] multiplies x raised to the power (deg - i)
  for i in range(len(weight)):
    coeff = deg - i
    Y[idx] += pow(x.item(), coeff) * weight[i]
  Y[idx] += bias
  # print(x, Y[idx][0].item())

# print(f"x:{X[:10]}\ny:{Y[:10]}")
# print(len(X), len(Y), X.shape, Y.shape)


### Splitting data into learning and testing sets

One of the most important concepts in machine learning is dividing our data into a set that we learn the patterns from and a set on which we validate our model.

There are three types of datasets in machine learning:
- The training set
- The validation set
- The test set

This is done so that our models generalize, i.e. so that they are able to work on data they have never seen before.
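
The cells below only use a training and a test set. As a rough sketch of how all three sets could be carved out, the example here shuffles the indices and uses a 70/15/15 split. The ratios, the shuffling, and every name in it (`X_all`, `train_idx`, and so on) are assumptions for illustration, not the split this notebook uses.

```python
import torch

torch.manual_seed(0)

# Stand-in data: 100 samples on a straight line (assumed for illustration).
X_all = torch.arange(0, 100, dtype=torch.float32).unsqueeze(dim=1)
Y_all = 2 * X_all + 1

# Shuffle the indices so each subset is drawn from the whole range.
indices = torch.randperm(len(X_all))

n_train = round(0.7 * len(X_all))
n_val = round(0.15 * len(X_all))

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

X_tr, Y_tr = X_all[train_idx], Y_all[train_idx]
X_val, Y_val = X_all[val_idx], Y_all[val_idx]
X_te, Y_te = X_all[test_idx], Y_all[test_idx]

print(len(X_tr), len(X_val), len(X_te))  # 70 15 15
```

The validation set is typically used to tune hyperparameters during development, while the test set is held back for the final evaluation.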

In [None]:
# Creating a training and testing set of our data

train_split = int(.8 * len(X)) 
# Slice indices must be integers and .8 * len(X) evaluates to a float, so we convert the result with int()
train_split

X_train, Y_train = X[:train_split], Y[:train_split]
X_test, Y_test = X[train_split:], Y[train_split:]
X_test

A better way to understand our data is to visualize it.

In [None]:
def plot_predictions(train_data = X_train.to("cpu"),
                     train_labels = Y_train.to("cpu"), 
                     test_data = X_test.to("cpu"), 
                     test_labels = Y_test.to("cpu"), 
                     predictions = None):
  """
  Plots the training data, test data and compares the predictions
  """
  plt.figure(figsize=(10,7)) # Figure dimensions in inches

  # Plotting the training data in blue
  plt.scatter(train_data, train_labels, c="b", s=4, label="Training Data") # (<data-x>, <data-y>, <color>, <marker size>, <label>)

  # Plotting the test data in orange
  plt.scatter(test_data, test_labels, c="orange", s=4, label="Test Data")

  # Checking if any predictions have been made
  if predictions is not None:
    # Plot the predictions
    plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")

  # Displaying the legend
  plt.legend(prop={"size" : 14}); # (<property dictionary>)

In [None]:
plot_predictions()

Now let's build a model that learns the function from the training set and predicts it for the test set.

In [None]:
# Creating a linear regression model class

class LinearRegressionModel(nn.Module): # Almost everything in PyTorch inherits nn.Module
  def __init__(self):
    super().__init__() # We initialize the parent class by referring to it as super
    self.weight = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

  # Defining a forward function to represent the computation in model
  def forward(self, x: torch.Tensor) -> torch.Tensor:
     # We did a little bit of type hinting here, we'll dive into it better below. However, x is the input data here
    return self.weight * x + self.bias # We returned the value of the function as per the current model's state (weight, bias)
  

In [None]:
# Creating a polynomial regression model class (this redefinition replaces the linear version above)

class LinearRegressionModel(nn.Module): # Almost every PyTorch model inherits from nn.Module
  def __init__(self):
    super().__init__() # We initialize the parent class by referring to it as super
    # One random coefficient per known weight, with the same shape as the global `weight` ([3, 1])
    self.weight = nn.Parameter(torch.rand_like(input=weight, requires_grad=True, dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
    self.deg = len(self.weight)
  # Defining a forward function to represent the computation in model
  def forward(self, X: torch.Tensor) -> torch.Tensor:
    # We did a little bit of type hinting here, we'll dive into it better below. X is the input data of shape [N, 1]
    # Exponents deg, deg-1, ..., 1 so that weight[i] multiplies x^(deg - i)
    powers = torch.arange(self.deg, 0, -1, device=X.device)
    # X ** powers broadcasts to [N, deg]; the matrix product with weight ([deg, 1]) gives [N, 1]
    # Staying in tensor operations (no .item() on the parameters) lets autograd track gradients for weight and bias
    # We return the value of the function as per the current model's state (weight, bias)
    return (X ** powers) @ self.weight + self.bias

A couple of things to bring to light from the above code block:

* `super()` returns a proxy for the parent class; calling `super().__init__()` initializes the parent nn.Module superclass
* the `requires_grad` argument asks PyTorch to keep track of the gradients of the parameters, as we will employ `gradient descent` and `back propagation` to update them
* while defining the forward method we employed 'type hinting': the annotation documents that the method expects a tensor as input, and the `->` indicates that it returns a tensor as well. Python does not enforce these hints at runtime
* any subclass of nn.Module must override the forward method (overriding is done by defining the method again in the subclass)
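
As a small sketch of what `requires_grad` buys us, the example below tracks one parameter through a computation and asks PyTorch for its gradient. The tensors `w` and `x` and the squared expression are made up for illustration.

```python
import torch

# A single trainable value; requires_grad=True tells autograd to track it.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)  # plain input data, no gradient needed

# loss = (w * x) ** 2, so d(loss)/dw = 2 * w * x ** 2 = 2 * 2 * 9 = 36
loss = (w * x) ** 2
loss.backward()  # back propagation fills in w.grad

print(w.grad)  # tensor(36.)
print(x.grad)  # None, because x does not require gradients
```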

> What does the model do?

It:
* starts with random values for the parameters
* looks at the training data and adjusts the random values to better represent the ideal values

> How does it do so?

Through:
* Gradient Descent
* Back Propagation
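
To show how the two ideas fit together, here is a minimal hand-written loop that fits a straight line without any optimizer object. It is only a sketch: the data, the learning rate of 0.1, the mean squared error loss and the 2000 steps are all assumptions chosen so the toy problem converges.

```python
import torch

torch.manual_seed(0)

# Toy data from a known line y = 3x + 1 (values assumed for illustration).
x = torch.linspace(0, 1, 20).unsqueeze(dim=1)
y = 3 * x + 1

# Start from random parameter values.
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

lr = 0.1
for _ in range(2000):
    loss = ((w * x + b - y) ** 2).mean()  # mean squared error
    loss.backward()                       # back propagation: compute the gradients
    with torch.no_grad():                 # gradient descent: step against the gradients
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                        # reset the gradients for the next iteration
    b.grad.zero_()

print(w.item(), b.item())  # should end up close to 3 and 1
```

Back propagation only computes the gradients; gradient descent is the separate step that uses them to nudge the parameters.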


## PyTorch Model Building Essentials

* torch.nn - contains all of the building blocks for computational graphs (a neural network can be considered a computational graph)
* torch.nn.Parameter - the parameters our model should try to learn; often a PyTorch layer from torch.nn will set these for us
* torch.nn.Module - the base class for all neural network modules; don't forget to override the forward method, which defines what happens in the forward computation
* torch.optim - this is where all of the optimizers of PyTorch live; they update our parameters via gradient descent to reduce the loss
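
The sketch below puts the four pieces side by side, using `nn.Linear` as an example of a layer that creates its parameters for us. The class name `TinyLinearModel` is hypothetical and is not the model used in the rest of this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

class TinyLinearModel(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # nn.Linear creates and registers its own weight and bias parameters.
        self.linear = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = TinyLinearModel()
# The optimizer receives the parameters it is allowed to update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# linear.weight (1, 1)
# linear.bias (1,)
```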


Let's see what's inside our model. The model parameters can be fetched with .parameters() method.

In [None]:
# Creating a random seed
torch.manual_seed(42)

# Creating an instance of the model
model_0 = LinearRegressionModel()
list(model_0.parameters())

A better way to list our parameters is through the .state_dict() method. As the name suggests, the method returns a dictionary containing the current state of the model, mapping each parameter name to its current value.

In [None]:
model_0.state_dict()

## Making predictions with `torch.inference_mode()`

When we pass our data through a model, it is going to run it through the model's forward method. How well the model is able to guess the value of "Y_test" from the provided "X_test" determines the predictive power / accuracy of our model.
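
As a quick sketch of what `torch.inference_mode()` changes, the example below runs the same forward pass with and without it. The `nn.Linear` layer and the input of ones are stand-ins chosen for illustration.

```python
import torch
from torch import nn

layer = nn.Linear(1, 1)
inputs = torch.ones(3, 1)

tracked = layer(inputs)        # normal forward pass: autograd records it
with torch.inference_mode():
    untracked = layer(inputs)  # same computation, no autograd bookkeeping

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(torch.equal(tracked.detach(), untracked))  # True: the values are identical
```

The predictions are the same either way; inference mode only skips the bookkeeping that is needed for training.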

In [None]:
# We see an application of context managers in Python below. Context managers handle the setup and cleanup around a block of code; here the block runs with gradient tracking switched off and the previous state is restored on exit.
with torch.inference_mode():
  y_preds_rand = model_0(X_test)

# We can achieve similar results with torch.no_grad() as the context manager; however, torch.inference_mode() is the newer feature and disables additional autograd bookkeeping, which can make it faster
# with torch.no_grad():
#   ypreds = model_0(X_test)

# y_preds_rand
print("Class")
plot_predictions(predictions=y_preds_rand.to("cpu"))

## Training our Model

The entire process of training revolves around starting with some arbitrary parameters (maybe random, maybe taken from another model, as in `transfer learning`), looking at the data we have, trying to figure out the underlying trend, rule or pattern, and changing our parameters to better represent that data. The model progressively gets better at predicting the output for an input as per the data's trend.

Quantifying the performance of our model allows us to fine-tune it to be more accurate. One way of doing so is through a `loss function`. A loss function evaluates how far a model's prediction is from the actual data (the ideal prediction) for a single sample input, while a cost function measures how bad the model is, on average, at predicting the data across several sample inputs.
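
The distinction can be sketched with three made-up predictions. The numbers below are assumptions for illustration; the point is that the per-sample absolute errors play the role of the loss, and their average plays the role of the cost.

```python
import torch
from torch import nn

predictions = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])

# Loss for each individual sample: |prediction - target|
per_sample_loss = (predictions - targets).abs()

# Cost: the average of the per-sample losses over all samples.
cost = per_sample_loss.mean()

# nn.L1Loss with its default reduction="mean" computes the same average.
l1 = nn.L1Loss()(predictions, targets)

print(per_sample_loss)  # tensor([0.5000, 0.5000, 0.0000])
print(cost, l1)         # both are 1/3
```

In practice the averaged value is what PyTorch calls the loss, so the two terms are often used interchangeably.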

Things we will need for training our model:

  * **Loss Function** - Described above
  * **Optimizer** - Takes into account the loss of the prediction and adjusts the parameters such that the loss is minimized

And specifically for PyTorch we will need:
  * A training loop
  * A testing loop

In [None]:
# Setting up a loss function
loss_fn = nn.L1Loss() # Initializes the MAE (Mean Absolute Error) loss function

# Setting up an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01, nesterov=True, momentum=.1, dampening=.0)  # Initializes the SGD (Stochastic Gradient Descent) optimizer


While setting up our optimizer, there are a few things to keep in mind:
* **params** : the parameters that are to be optimized through training
* **lr** : the learning rate, one of the most important hyperparameters. Hyperparameters are values set by the engineer rather than learned by the model, and they are tuned to allow for faster convergence (completion of learning, or reaching a local minimum in the cost-parameter hyperspace). A learning rate that is too small makes convergence slow, while one that is too large can overshoot the minimum because the steps are too big.

> Overfitting is a condition where the model learns the noise and idiosyncrasies of the training data as if they genuinely contributed to the underlying pattern, like a student who meticulously memorizes notes without gaining the understanding required to solve problems not previously discussed. An overfitted model performs surprisingly well on samples from the training data but is severely compromised when faced with new data.
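
The effect of the learning rate can be sketched without PyTorch at all. The example below runs plain gradient descent on the made-up function f(w) = w², whose minimum is at w = 0; the function, the three learning rates and the helper name `descend` are assumptions for illustration.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

small = descend(lr=0.001)  # barely moves: each step shrinks w by only 0.2%
good = descend(lr=0.1)     # converges: each step shrinks w by 20%
large = descend(lr=1.1)    # diverges: each step flips the sign and grows |w| by 20%

print(small, good, large)
```

After 20 steps the small rate is still close to the starting value of 1, the moderate rate is close to the minimum at 0, and the large rate has moved further away than where it started.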

In [None]:
loss_fn
optimizer 

### Building a training loop and testing loop

Things we will need:
0. Loop through the data
1. Forward pass (passing the data through the forward method of our model) to make predictions, aka forward propagation
2. Calculate the loss for the prediction (compare the prediction with the ground-truth labels)
3. Optimizer zero grad
4. Loss backward - calculate the gradient of the loss with respect to each parameter of the model (**backpropagation**)
5. Optimizer step - adjust our model parameters to improve our loss (**gradient descent**)
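
Step 3 exists because PyTorch adds new gradients on top of the ones already stored. The sketch below shows this accumulation on a single made-up parameter `w`; the expression `3 * w` is chosen only because its gradient is always 3.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()    # 3.0

(3 * w).backward()       # no reset in between: the new gradient is added on top
second = w.grad.item()   # 6.0

w.grad.zero_()           # reset, the role optimizer.zero_grad() plays in the loop
(3 * w).backward()
third = w.grad.item()    # 3.0

print(first, second, third)  # 3.0 6.0 3.0
```

Without the reset, each optimizer step would use the sum of all gradients computed so far instead of the gradient for the current pass.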


In [None]:
"""
# -1. Initializing some empty lists that will track the model metrics for review purpose. The efficiency of a model's learning can be crucial factor in design when the data to be crunched through gets massive
epoch_count = []
loss_values = []
test_loss_values = []

epochs = 100 # An epoch is a loop through data
# 0. Loop through the data

for i in range(epochs):
  # Set the model in training mode
  model_0.train() # Puts the model in training mode (affects layers such as dropout and batch norm)

  # 1. Forward pass
  Y_preds = model_0(X_train)

  # 2. Calculate the loss 
  loss = loss_fn(Y_preds, Y_train)
  
  # 3. Optimizer zero grad (see last comment in cell)
  optimizer.zero_grad()

  # 4. Perform backpropagation on the loss with respect to the parameters of the model
  loss.backward()

  # 5. Step the optimizer (perform gradient descent)
  optimizer.step() 
  # Gradients accumulate across backward() calls by default: each call adds to the gradients already stored on the parameters. That is why we zero them in step 3 before every backward pass, so each optimizer step only uses the gradients of the current iteration

  model_0.eval() #Turns off model settings not required during testing / evaluation (batch norm, dropout etc.)
  with torch.inference_mode():  # Turns off gradient tracking && a couple more things bts.
    # 1. Do a forward pass
    y_predictions = model_0(X_test)
    # 2. Calculate loss
    loss_on_prediction = loss_fn(y_predictions, Y_test)

  # Print out the metrics and append the model progress to the tracking lists
  if i % 10 == 0:
    epoch_count.append(i)
    loss_values.append(loss)
    test_loss_values.append(loss_on_prediction)
    print(f"Epoch: {i} | Loss: {loss} | Test Loss: {loss_on_prediction}\nWeight: {model_0.weight.item()} | Bias: {model_0.bias.item()}")
    
plot_predictions(predictions=y_predictions.to("cpu"))
"""

In [None]:
# Refactoring the above cells into a function (recommended)
def model_train_test(model, labels, features, epochs, loss_function, optimizer, test_labels, test_features):
  epoch_count = []
  loss_values = []
  test_loss_values = []
  start = time.time()
  for i in range(epochs):
    model.train()
    predictions = model(features)
    loss = loss_function(predictions,labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.eval()
    with torch.inference_mode():
      y_prediction = model(test_features)
      loss_test = loss_function(y_prediction, test_labels)
    if i % 100 == 0:
      epoch_count.append(i)
      loss_values.append(loss.item())
      test_loss_values.append(loss_test.item())
      print(f"Epoch: {i} | Loss: {loss} | Test Loss: {loss_test}\nWeight: {model.weight} | Bias: {model.bias.item()}")
  end = time.time()
  print(f"Time Elapsed: {end - start}")
  return epoch_count, loss_values, test_loss_values

In [None]:
torch.manual_seed(42)
epoch_c, loss_v, test_loss_v = model_train_test(model_0, Y_train, X_train, 400, loss_fn, optimizer, Y_test, X_test)

In [None]:
with torch.inference_mode():
  plot_predictions(predictions=model_0(X_test).to("cpu"))

In [None]:
"""
plt.plot(epoch_c, loss_v, label = "Training Loss", c = 'r')
plt.plot(epoch_c, test_loss_v, label = "Test Loss", c = 'b')
plt.title("Training(Red) and Test(Blue) Loss Curves")
# The x and y labels
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend() # If you want to include the legend for the graph
"""

In [None]:
# Refactoring the loss curve plotter
def plot_loss(epoch_c, loss_v, test_loss_v, cl="Orange", ctl="Green"):
  plt.plot(epoch_c, loss_v, label = "Training Loss", c = cl)
  plt.plot(epoch_c, test_loss_v, label = "Test Loss", c = ctl)
  plt.title(f"Training({cl}) and Test({ctl}) Loss Curves")
  plt.ylabel("Loss")
  plt.xlabel("Epoch")
  plt.legend() 

In [None]:
plot_loss(epoch_c, loss_v, test_loss_v)

> A loss curve with a decreasing ordinate represents a model that is getting better at predicting values. Also observe the convergence of the two curves: it shows that the model is not only getting better at capturing the pattern but is also able to predict values for novel inputs accurately. A separation at the right of the curves, with the test loss staying above the training loss, must be fixed as it is indicative of the model overfitting to the training data.

## Saving trained models in PyTorch

There are three main methods for saving and loading models in PyTorch:

* `torch.save` - save your PyTorch object in Python's pickle format
* `torch.load` - load your saved PyTorch object
* `torch.nn.Module.load_state_dict()` - load the saved state dictionary into your model.

As the model trains, it continuously updates its state dictionary. You can take the state dictionary as the entire description of your model's learned state: the value of each parameter of your model is stored here. Creating a new instance of the model class gives you the framework (architecture) of your model with freshly initialized parameters, but the knowledge it had gained through training is not brought along.

To bring your model up to the stage where it was before, you will have to load the saved state dictionary into the new instance. One advantage of keeping models and their states separate is that you can save several state dictionaries for the same model and compare its accuracy when trained with different datasets or at different points in training.
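
The round trip can be sketched in memory before touching the file system. In the example below, two `nn.Linear` layers stand in for a trained model and a fresh instance, and an `io.BytesIO` buffer stands in for the file; all of these are assumptions for illustration.

```python
import io
import torch
from torch import nn

torch.manual_seed(0)
trained = nn.Linear(1, 1)  # stand-in for a trained model
fresh = nn.Linear(1, 1)    # same architecture, different random parameters

# Save only the state dictionary (to an in-memory buffer instead of a file).
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

same_before = torch.equal(fresh.weight, trained.weight)
fresh.load_state_dict(torch.load(buffer))
same_after = torch.equal(fresh.weight, trained.weight)

print(same_before, same_after)  # False True
```

The fresh instance only matches the saved model once the state dictionary has been loaded into it.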

In [None]:
# Saving our PyTorch model
from pathlib import Path

# 1. Create Models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create Model save path
MODEL_NAME = "01_workflow_model_0.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Saving the model state_dict
print("Saving the model state dictionary")
torch.save(obj=model_0.state_dict(), f=MODEL_SAVE_PATH)

# Note: I am using a CPU system for local work and Colab for training heavy models. So, for me, the availability of CUDA cores implies that the code is being run in Colab. To download the saved model from Colab I use the following:
if torch.cuda.is_available():
  from google.colab import files
  files.download(f"{MODEL_SAVE_PATH}")
# Comment it out if you are using a system with cuda locally.

## Loading our saved model state_dict

More often than not, we will save our model's state dict instead of the entire model. This increases the flexibility of our model's implementation, as it lets us define new model classes without being strictly tied to the previous definitions. One drawback is the need to redefine the model class if the old definition is not available, but given the advantages that saving only the state_dict has over saving the entire model, we are willing to redefine the model if needed.

In [None]:
model_load_0 = LinearRegressionModel()
model_load_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH)) # Reuse the path built above so this works on any operating system
model_load_0.state_dict()

In [None]:
# Making predictions with our loaded model
with torch.inference_mode():
  model_load_0.eval()
  y_preds = model_load_0(X_test)
  plot_predictions(predictions=y_preds.to("cpu"))