In [2]:
import torch
from torch import nn

import matplotlib.pyplot as plt

In [3]:
# Data points imported from lesson 19

weight = 0.91
bias = 0.3

# parameters for creating the tensor X
start = 0
end = 1
step = 0.02

X = torch.arange(start, end, step).unsqueeze(dim = 1)
# unsqueeze(dim=1) adds an extra dimension (shape [50] becomes [50, 1]), so each element of X and y prints on its own line for better viewing
y = weight * X + bias # is Linear Regression Formula


trainSplit = int(0.8 * len(X)) # index of the 80% mark: 0.8 times the number of samples in X gives the size of the training split

XTrain, yTrain = X[:trainSplit], y[:trainSplit] # indexing to get all samples up until the trainsplit
XTest, yTest = X[trainSplit:], y[trainSplit:] # indexing to get all the samples from the trainsplit onwards, or what is left over after the trainsplit

In [4]:
# code imported from lesson 19
# using matplotlib to visualize the data points

def plotPredictions(trainData = XTrain, 
                    trainLabels = yTrain, 
                    testData = XTest, 
                    testLabels = yTest, 
                    prediction = None):
    # Plots the training data and test data, and compares predictions against them if given

    plt.figure(figsize = (10, 7)) 
    
    plt.scatter(trainData, trainLabels, c = "b", label = "Training Data")

    plt.scatter(testData, testLabels, c = "g", label = "Testing Data")

    if prediction is not None:
        plt.scatter(testData, prediction, c = "r", label = "Prediction")

    plt.legend(prop = {"size": 14})

In [5]:
# model imported from lesson 20

class LinearRegressionModel(nn.Module): # nn.Module is the base class for models and layers in PyTorch, so it can be considered the building block for neural networks
    def __init__(self):
        super().__init__()
        
        self.weights = nn.Parameter(torch.randn(1,
                                                requires_grad=True, # True is the default for nn.Parameter; it lets autograd track gradients for this value
                                                dtype= torch.float))
        
        self.bias = nn.Parameter(torch.randn(1,
                                             requires_grad= True,
                                             dtype= torch.float))
        
    def forward(self, x: torch.Tensor) -> torch.Tensor: # '->' annotates the return type; similar to Java, except Python writes it after the parameter list
        # 'x' is the input data
        return self.weights * x + self.bias # linear regression formula

In [6]:
# code imported from lesson 22
# creating a random seed
torch.manual_seed(246)

# creating an instance of the model which is a subclass of nn.Module
model0 = LinearRegressionModel() # type: ignore

In [11]:
# Code imported from lesson 25

lossFunction = nn.L1Loss()

optimizer = torch.optim.SGD(params=model0.parameters(),
                            lr= 0.01, # learning rate - one of the most important hyperparameters the programmer sets: the smaller the learning rate, the smaller the change
                                      # in the parameters per step, and the larger the learning rate, the larger the change in the parameters
                            )
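To make the effect of the learning rate concrete, here is a minimal sketch that takes a single SGD step on a toy loss with two different learning rates. The helper `one_sgd_step` and the toy loss `p ** 2` are assumptions made for illustration and are not part of the lesson code.

```python
import torch

def one_sgd_step(lr: float) -> float:
    """Take one SGD step on loss = p**2 starting from p = 1.0 and return the new p."""
    p = torch.nn.Parameter(torch.tensor(1.0))
    optimizer = torch.optim.SGD([p], lr=lr)
    loss = p ** 2            # d(loss)/dp = 2 * p, which is 2.0 at p = 1.0
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()         # plain SGD update: p <- p - lr * grad
    return p.item()

small_step = one_sgd_step(lr=0.01)  # parameter moves by about 0.02
large_step = one_sgd_step(lr=0.1)   # parameter moves by about 0.2
print(small_step, large_step)
```

With the same gradient, the larger learning rate moves the parameter ten times further in one step.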

### Building a Training and Testing Loop in PyTorch

Things needed in a training loop:
1. Loop through the data
2. Forward pass (data moving through the model's forward() method, also called forward propagation) to make predictions on the data
3. Calculate the loss (compare the forward pass predictions to the ground truth labels)
4. Optimizer zero grad
5. Loss backward - moving backwards through the network to calculate the gradient of the loss with respect to each parameter of the model (**backpropagation**)
6. Optimizer step - using the optimizer to adjust the model's parameters to try to reduce the loss (**gradient descent**), following the gradient (slope) of the loss downhill

In [20]:
# Training Loop

torch.manual_seed(246)

# an epoch is a loop through the data, and is a hyperparameter since we set it ourselves 
epochs = 10

# 1. Loop through the data (all the training loop steps have to run inside the for loop below)
for epoch in range(epochs):
    # put the model in training mode by calling .train(); this changes the behaviour of layers such as dropout and batch norm, and does not change which parameters require gradients
    model0.train() # default mode/state of the model

    # 2. Forward Pass
    yPred = model0(XTrain) # calls the model's forward() method on the training data; the model learns patterns on the training data and is evaluated on the test data

    # 3. Calculating the loss (Mean Absolute Error or MAE): the average distance between the model's predictions and the training labels
    loss = lossFunction(yPred, yTrain) # lossFunction = nn.L1Loss()

    # 4. Optimizer zero grad 
    optimizer.zero_grad() # optimizer = torch.optim.SGD(params=model0.parameters(),lr= 0.01,)

    # 5. Backpropagation: compute the gradient of the loss with respect to each of the model's parameters
    loss.backward()

    # 6. Optimizer step (perform gradient descent): updates the parameters using the gradients computed during backpropagation
    optimizer.step() 
    # by default, gradients accumulate across iterations of the loop, so step 4 (zero_grad) is needed to reset them before each backward pass

    model0.eval() # switches to evaluation mode (affects layers such as dropout and batch norm); it does not turn off gradient tracking
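The testing half of the loop can be sketched as follows. This is a minimal, self-contained example under some assumptions: it uses `nn.Linear` as a stand-in for `LinearRegressionModel`, rebuilds the same synthetic data, and skips training, so it only illustrates the difference between `.eval()` and `torch.inference_mode()`.

```python
import torch
from torch import nn

torch.manual_seed(246)

# Same synthetic line as in the data cell (weight 0.91, bias 0.3)
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.91 * X + 0.3
split = int(0.8 * len(X))
XTest, yTest = X[split:], y[split:]

model = nn.Linear(in_features=1, out_features=1)  # stand-in model (assumption)
lossFunction = nn.L1Loss()

model.eval()  # evaluation mode: changes layer behaviour, not gradient tracking
tracked = model(XTest)
print(tracked.requires_grad)   # the output is still attached to the autograd graph

with torch.inference_mode():   # gradient tracking is disabled inside this block
    testPred = model(XTest)
    testLoss = lossFunction(testPred, yTest)
print(testPred.requires_grad)
print(testLoss.item())
```

In a full loop, the evaluation block would typically run inside the epoch loop after `optimizer.step()`, with `model.train()` called again at the start of the next epoch.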

### Pytorch Training Loop


Reason for keeping track of the gradient:
* Behind the scenes, PyTorch records the operations applied to every parameter in a computation graph (custom neural networks can have millions of parameters, which is impossible for a human to keep track of and comprehend)
* PyTorch uses that graph to compute the gradients with backpropagation, and the optimizer then uses gradient descent to move the parameters toward a low point of the loss
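As a small illustration of gradient tracking, the sketch below lets autograd compute the gradient of a toy loss and shows why gradients need to be zeroed between backward passes. The tensor `w` and the loss `(2w - 4)^2` are made up for this example.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # autograd tracks operations on w

loss = (w * 2.0 - 4.0) ** 2   # d(loss)/dw = 4 * (2w - 4)
loss.backward()
first_grad = w.grad.item()    # 4 * (6 - 4) = 8

# A second backward pass without zeroing adds to the stored gradient
loss = (w * 2.0 - 4.0) ** 2
loss.backward()
accumulated_grad = w.grad.item()   # 8 + 8 = 16

# Zeroing first gives the fresh gradient again (what optimizer.zero_grad() does for model parameters)
w.grad.zero_()
loss = (w * 2.0 - 4.0) ** 2
loss.backward()
reset_grad = w.grad.item()    # 8

print(first_grad, accumulated_grad, reset_grad)
```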

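Gradient Descent in Machine Learning:
* the gradient is the derivative of a function with more than one input variable; gradient descent uses it to minimize the loss
* starts with random initial values for the parameters
* with each learning step, the model wants to get closer and closer to the minimum (and the distance between the steps usually gets smaller with each iteration)
* heads in the opposite direction of the slope/gradient (heads down the slope) toward a point where the gradient is zero
* a gradient of zero marks a minimum (or another flat point) of the loss function; the loss there is locally as low as it can get, but it is not necessarily zero
* the model just wants to get to the bottom or the minimum of the cost function
* starts with large steps at first, and takes increasingly smaller steps as the model gets closer to the bottom, which helps prevent overshooting. Each step is the learning rate times the gradient, so for a smooth loss the steps shrink as the slope flattens. torch.autograd only computes the gradients; the learning rate itself stays fixed unless a scheduler (torch.optim.lr_scheduler) is used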
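The behaviour described in the list above can be sketched in plain Python on a one-variable function. The function `f(x) = (x - 2)**2 + 1`, the starting point, and the learning rate are all assumptions chosen for illustration; note that the minimum value of this function is 1, not 0, even though the gradient there is zero.

```python
from typing import List, Tuple

def gradient_descent(start: float, lr: float, steps: int) -> Tuple[float, List[float]]:
    """Minimise f(x) = (x - 2)**2 + 1 and return the final x and the size of each step."""
    x = start
    step_sizes = []
    for _ in range(steps):
        grad = 2 * (x - 2)       # derivative of f at the current x
        update = lr * grad       # step = learning rate * gradient
        x = x - update           # move against the gradient (downhill)
        step_sizes.append(abs(update))
    return x, step_sizes

final_x, step_sizes = gradient_descent(start=10.0, lr=0.1, steps=50)
print(final_x)           # close to the minimiser x = 2
print(step_sizes[:3])    # early steps are large
print(step_sizes[-3:])   # later steps are much smaller
```

The learning rate never changes here; the steps shrink only because the gradient shrinks as `x` approaches the minimum.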

![image.png](attachment:image.png)