<a href="https://colab.research.google.com/github/Renshui-MC/DeepLearning-ZerosToGans/blob/main/LinearRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Gradient Descent and Linear Regression with PyTorch**

+ understand what **linear regression** and **gradient descent** are
+ implement a **linear regression model** using `PyTorch`
+ train your linear regression model using the **gradient descent algorithm**
+ implement gradient descent using `PyTorch` built-ins

**Linear regression** is one of the fundamental algorithms in machine learning (ML). Most ML courses begin with linear regression.

In a linear regression model, you have

1. a **weight** for each **input variable** ($x^{(i)}_j$, where $(i)$ indexes training examples and $j$ indexes features)
2. a **target (output) variable** ($y^{(i)}$), estimated as the weighted sum of the input variables
3. a constant offset (the **bias**) added to that sum

```
y1  = w11 * feature1 + w12 * feature2 + w13 * feature3 + b1
y2  = w21 * feature1 + w22 * feature2 + w23 * feature3 + b2
```

Note that each target has its own set of weights. `b1` and `b2` are biases, which let the model produce a nonzero prediction even when all features have a value of zero.
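As a small illustration of the two equations above, the following sketch evaluates them for a single observation. The weight, bias, and feature values are made up for the example; they are not learned from any data.

```python
# Hypothetical weights and biases, chosen only for illustration
w11, w12, w13, b1 = 0.5, 0.2, 0.1, 1.0
w21, w22, w23, b2 = 0.3, 0.4, 0.2, 2.0

# One observation with three (made-up) feature values
feature1, feature2, feature3 = 10.0, 20.0, 30.0

# Each target is its own weighted sum of the same features, plus its own bias
y1 = w11 * feature1 + w12 * feature2 + w13 * feature3 + b1
y2 = w21 * feature1 + w22 * feature2 + w23 * feature3 + b2
print(y1, y2)

# With all features at zero, the prediction reduces to the bias
y1_zero = w11 * 0.0 + w12 * 0.0 + w13 * 0.0 + b1
print(y1_zero)
```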

**Learning** is the process of finding the best set of weights and biases (`w11, w12, ..., b1, b2`) from **training data**, so that the model predicts new data accurately. One of the foundational learning techniques is **gradient descent**. Here we implement it with a combination of `numpy` and `PyTorch`.

In [39]:
import numpy as np
import torch

##Training data

+ two matrices: `inputs` and `targets`
+ each row represents one **observation**
+ each column represents one variable (a **feature** in `inputs`, a **target** in `targets`)

First we store the data in `numpy` arrays, because `numpy` is a powerful Python library for working with matrices. Then we convert the arrays to `PyTorch` tensors for training. We store the values as **floating-point numbers** (`float32`), which PyTorch needs for the gradient computations that follow.

It is convenient to keep the input and output variables in separate arrays.

In [40]:
# Input variables (multiple features: temp (col1), rainfall (col2), humidity (col3))
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32') # either write the values as floats (e.g. 73.) or pass dtype='float32'

In [41]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

Next, we convert them to `PyTorch` tensors via `torch.from_numpy(variable_name)`.

In [42]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
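One detail worth knowing about `torch.from_numpy` is that the resulting tensor shares memory with the source array, so a later in-place change to the array is visible in the tensor. The sketch below shows this on a small throwaway array (`arr`, `t`, and `t_copy` are names made up for the example) and contrasts it with `torch.tensor`, which copies the data.

```python
import numpy as np
import torch

arr = np.array([[1., 2.], [3., 4.]], dtype='float32')
t = torch.from_numpy(arr)    # shares memory with arr and keeps dtype float32
print(t.dtype)

arr[0, 0] = 10.              # modify the NumPy array in place
print(t[0, 0])               # the tensor sees the change

t_copy = torch.tensor(arr)   # torch.tensor makes an independent copy
arr[0, 0] = -1.
print(t_copy[0, 0])          # the copy is unaffected by the second change
print(t[0, 0])               # the shared tensor follows the array again
```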


##Create a linear regression model

Now we have everything we need to create a linear regression model. We begin with random **weights** `w` and **biases** `b` drawn from a **normal distribution**.

In [43]:
# Weights and biases
w = torch.randn(2 , 3, requires_grad=True) #two rows and three columns
b = torch.randn(2, requires_grad=True) #a vector of two elements, one bias per target
print(w)
print(b)

tensor([[ 1.7605, -1.8451,  0.8997],
        [-0.2172, -0.8063, -0.8036]], requires_grad=True)
tensor([ 1.1185, -1.2115], requires_grad=True)


Now we have

+ inputs in pytorch tensor form
+ targets in pytorch tensor form
+ weights in pytorch tensor form
+ biases in pytorch tensor form

so the model can be created:

```
model = inputs x weights(transposed) + b 
```

Use `.t()` to get the transpose of a matrix; `@` represents **matrix multiplication**.




In [44]:
#Test output here
Output = inputs @ w.t() + b
print(Output)

tensor([[  44.6983, -105.6436],
        [  56.5331, -143.3608],
        [ -40.7817, -174.7607],
        [ 134.6371,  -87.7695],
        [   8.4395, -149.8543]], grad_fn=<AddBackward0>)
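The shapes involved in this product can be checked with a small sketch. The tensors below are random stand-ins with the same shapes as `inputs`, `w`, and `b`; the names `x_demo`, `w_demo`, and `b_demo` are made up for this example.

```python
import torch

x_demo = torch.randn(5, 3)   # 5 observations, 3 features
w_demo = torch.randn(2, 3)   # 2 targets, 3 weights each
b_demo = torch.randn(2)      # one bias per target

# (5, 3) @ (3, 2) -> (5, 2); the bias vector is broadcast across the 5 rows
out = x_demo @ w_demo.t() + b_demo
print(out.shape)

# The same result computed one target at a time, then stacked as columns
manual = torch.stack([x_demo @ w_demo[k] + b_demo[k] for k in range(2)], dim=1)
print(torch.allclose(out, manual, atol=1e-5))
```

Each column of the output uses exactly one row of the weight matrix and one bias, which matches the two equations at the start of the notebook.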


To avoid repetition, we want to create a function to pass the inputs and predict the targets.

In [45]:
def model(x):
  return x @ w.t()+b

Now let's pass the inputs to the created **linear regression model** and predict the target variables.

In [46]:
#predict target variables
preds = model(inputs)
print(preds)

tensor([[  44.6983, -105.6436],
        [  56.5331, -143.3608],
        [ -40.7817, -174.7607],
        [ 134.6371,  -87.7695],
        [   8.4395, -149.8543]], grad_fn=<AddBackward0>)


To know how accurate the model is we need to compare the predictions with actual targets.

In [47]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


The comparison shows that the predictions are far from the targets, because the model is based on **random weights and biases**. Therefore, we need to improve the model.

To measure how far off the predictions are, we evaluate the **mean squared error (MSE)**:

$$\text{MSE} = \frac{\sum (\text{preds} - \text{actual})^2}{\#\text{elements}}$$

Note that **MSE** is a **single value**.

In [48]:
diff = preds - targets
#differences can be negative, so we square them before summing

torch.sum(diff * diff) / diff.numel() #add all entry elements up and divide by the number of elements

tensor(32101.4941, grad_fn=<DivBackward0>)

##**The loss function**
To avoid repetition, we create the MSE function. 

+ `torch.sum` returns the sum of all elements in a matrix
+ `.numel()` returns the number of elements in a matrix

In [49]:
# MSE loss
def mse(P,T):
  diff = P - T
  return torch.sum(diff*diff)/diff.numel()

Now compute the MSE of the predictions from the model created above:

In [50]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(32101.4941, grad_fn=<DivBackward0>)
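To make the formula concrete, here is a sketch that evaluates the same MSE on a tiny hand-checkable example and compares it with `torch.nn.functional.mse_loss`, whose default reduction is also the mean over all elements. The function name `mse_demo` and the tensors `P_demo` and `T_demo` are made up for this example.

```python
import torch
import torch.nn.functional as F

def mse_demo(P, T):
    # Same computation as the mse function above
    diff = P - T
    return torch.sum(diff * diff) / diff.numel()

P_demo = torch.tensor([[1., 2.], [3., 4.]])
T_demo = torch.tensor([[1., 0.], [0., 4.]])

# Differences are [[0, 2], [3, 0]]; squares sum to 4 + 9 = 13; 13 / 4 elements
manual = mse_demo(P_demo, T_demo)
builtin = F.mse_loss(P_demo, T_demo)   # built-in equivalent, mean reduction by default
print(manual, builtin)
```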


The loss is large: its square root (the root mean squared error) is about 179, so a typical prediction is off by roughly that much, while the targets themselves only range from 22 to 133. The model is therefore bad at predicting the target variables. **The lower the loss, the better the model.**

##**Reduce the loss using gradient descent**

+ recall we set `requires_grad=True` for `w` and `b`
+ loss is a function of `w` and `b` 
+ compute gradients using `.backward()`
+ calculated gradients are stored in `.grad`
+ note `grad` can be implicitly created only for **scalar** outputs (loss function outputs a single scalar value)
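The points in this list can be tried out on a minimal example before applying them to the model. In the sketch below, `x_demo`, `v_demo`, and the functions being differentiated are made up for illustration.

```python
import torch

x_demo = torch.tensor(3.0, requires_grad=True)
y_demo = x_demo ** 2 + 2 * x_demo   # scalar output, dy/dx = 2x + 2
y_demo.backward()
print(x_demo.grad)                  # gradient at x = 3 is 2 * 3 + 2 = 8

# A non-scalar output has no implicit gradient; reduce it to a scalar first
v_demo = torch.tensor([1.0, 2.0], requires_grad=True)
out_demo = v_demo * 3
try:
    out_demo.backward()             # raises because out_demo has two elements
except RuntimeError as err:
    print("RuntimeError:", err)

out_demo.sum().backward()           # d(sum(3v))/dv = 3 for each element
print(v_demo.grad)
```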

In this particular example, there are not only multiple features (temp (col1), rainfall (col2), humidity (col3)) but also multiple targets (apples and oranges). This means we need two sets of weights $\vec{w}_j$, where $j$ indexes the features. Three features are required to describe one target, i.e., $j = 1, 2, 3$, so we need six weights for the two targets. A single MSE averaged over both targets serves as the cost function; because each target has its own row of weights, the gradient for each row depends only on that target's errors.

In [51]:
# Compute gradients 
loss.backward()

In [52]:
print(w.grad)
print(b.grad)

tensor([[ -2397.6421,  -5711.0923,  -2754.1711],
        [-18622.2852, -21137.3809, -12895.6318]])
tensor([ -35.4947, -224.2778])
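The values in `w.grad` and `b.grad` can be cross-checked against the closed-form derivative of the MSE. The sketch below does this on small random stand-in tensors (the names `X`, `T`, `W`, `B` and the seed are made up for the example). It assumes the same loss as `mse` above, for which $\frac{\partial J}{\partial w_{kj}} = \frac{2}{N}\sum_i (\hat{y}^{(i)}_k - y^{(i)}_k)\,x^{(i)}_j$, with $N$ the total number of elements in `diff`.

```python
import torch

torch.manual_seed(0)                 # arbitrary seed, only for repeatability
X = torch.randn(5, 3)                # stand-in inputs
T = torch.randn(5, 2)                # stand-in targets
W = torch.randn(2, 3, requires_grad=True)
B = torch.randn(2, requires_grad=True)

diff = X @ W.t() + B - T
loss_check = torch.sum(diff * diff) / diff.numel()
loss_check.backward()                # autograd fills W.grad and B.grad

with torch.no_grad():
    n = diff.numel()
    grad_W = (2.0 / n) * (diff.t() @ X)    # shape (2, 3); row k uses only column k of diff
    grad_B = (2.0 / n) * diff.sum(dim=0)   # shape (2,)

print(torch.allclose(W.grad, grad_W, atol=1e-5))
print(torch.allclose(B.grad, grad_B, atol=1e-5))
```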


If a gradient element is positive:

+ **increasing the weight** element's value slightly will **increase the loss**
+ **decreasing the weight** element's value slightly will **decrease the loss**

If a gradient element is negative:

+ **increasing the weight** element's value slightly will **decrease the loss**
+ **decreasing the weight** element's value slightly will **increase the loss**
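The sign rules above can be checked numerically on a one-weight toy problem. The function `loss_fn`, the input value 2, and the target 6 are made up for this sketch.

```python
import torch

def loss_fn(w):
    # One input x = 2 with target 6, so the loss is smallest at w = 3
    return (w * 2.0 - 6.0) ** 2

w_toy = torch.tensor(1.0, requires_grad=True)
loss_toy = loss_fn(w_toy)
loss_toy.backward()
print(w_toy.grad)                        # negative here: 2 * (2 - 6) * 2 = -16

with torch.no_grad():
    loss_up = loss_fn(w_toy + 0.01)      # nudge the weight up
    loss_down = loss_fn(w_toy - 0.01)    # nudge the weight down

# With a negative gradient, increasing the weight lowers the loss and decreasing it raises the loss
print(loss_up.item() < loss_toy.item(), loss_down.item() > loss_toy.item())
```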

We do not need to check the sign ourselves: subtracting a small multiple of the gradient moves each weight in the direction that decreases the loss. We need to choose an appropriate **learning rate ($α = 10^{-5}$)** to train the weights. `with torch.no_grad()` tells PyTorch not to track the update itself: the gradients were already obtained via `loss.backward()`, and modifying `w` and `b` is not part of the computation we want to differentiate.

Note that `w.grad` holds the derivative of the cost function with respect to each weight, i.e., $\frac{\partial J(\vec{w},b)}{\partial w_{n}}$, where $J$ is the cost function. The code below implements the following update (and the analogous one for `b`):

$$w_{n} = w_{n}-α\frac{\partial J(\vec{w},b)}{\partial w_{n}}$$

In [53]:
with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5

With the weights and biases updated by one step of gradient descent, let's test whether the model gives more accurate predictions.

Remember to reset the gradients to zero by invoking the `.zero_()` method. We need to do this because PyTorch accumulates gradients. Otherwise, the next time we invoke `.backward` on the loss, the new gradient values are added to the existing gradients, which may lead to unexpected results.

In [54]:
preds = model(inputs)
loss = mse(preds, targets)#use the new weights
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)
print(loss)
print(torch.sqrt(loss))

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
tensor(22938.9590, grad_fn=<DivBackward0>)
tensor(151.4561, grad_fn=<SqrtBackward0>)
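The accumulation behaviour described above can be seen in a small sketch. The parameter `p` and the function `p * 3` are made up for the example; the point is only that a second `.backward()` call adds to `.grad` unless it is zeroed first.

```python
import torch

p = torch.tensor(2.0, requires_grad=True)

(p * 3).backward()
first = p.grad.clone()     # d(3p)/dp = 3
print(first)

(p * 3).backward()         # second call without zeroing
second = p.grad.clone()    # 3 + 3 = 6: the new gradient was added to the old one
print(second)

p.grad.zero_()             # reset before the next backward pass
(p * 3).backward()
third = p.grad.clone()     # back to 3
print(third)
```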


##**Reduce the loss further by training for multiple epochs**

Let's train the model for $200$ epochs.

In [61]:
# Train for 200 epochs
for i in range(200):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

Let's verify if the loss gets lower:

In [62]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)
print(torch.sqrt(loss))

tensor(129.5387, grad_fn=<DivBackward0>)
tensor(11.3815, grad_fn=<SqrtBackward0>)
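To watch the loss fall during training rather than only checking it at the end, the loop can record the loss at every epoch. The sketch below is self-contained: it rebuilds the data under new names (`inputs_demo`, `targets_demo`) and starts from its own random weights with a made-up seed, so its numbers will differ from the outputs shown in this notebook.

```python
import numpy as np
import torch

inputs_demo = torch.from_numpy(np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                                         [102, 43, 37], [69, 96, 70]], dtype='float32'))
targets_demo = torch.from_numpy(np.array([[56, 70], [81, 101], [119, 133],
                                          [22, 37], [103, 119]], dtype='float32'))

torch.manual_seed(0)   # arbitrary seed, only to make the sketch repeatable
w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

history = []           # loss value at the start of each epoch
for epoch in range(200):
    preds_demo = inputs_demo @ w_demo.t() + b_demo
    loss_demo = torch.sum((preds_demo - targets_demo) ** 2) / targets_demo.numel()
    history.append(loss_demo.item())
    loss_demo.backward()
    with torch.no_grad():
        w_demo -= w_demo.grad * 1e-5
        b_demo -= b_demo.grad * 1e-5
        w_demo.grad.zero_()
        b_demo.grad.zero_()

print(history[0], history[-1])
```

Printing or plotting `history` shows how quickly the loss drops in the first epochs and how it flattens later on.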


Let's compare predictions to targets:

In [63]:
preds

tensor([[ 60.8545,  72.1634],
        [ 88.6902,  96.7180],
        [ 98.0203, 138.9619],
        [ 41.8529,  48.4389],
        [101.2956, 105.3587]], grad_fn=<AddBackward0>)

In [64]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
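One way to make this comparison more concrete is to compute the error per target column. The sketch below copies the printed predictions and targets from the two cells above into new tensors (`preds_shown` and `targets_shown` are names made up for this example) and reports the overall MSE along with a root mean squared error for each fruit.

```python
import torch

preds_shown = torch.tensor([[ 60.8545,  72.1634],
                            [ 88.6902,  96.7180],
                            [ 98.0203, 138.9619],
                            [ 41.8529,  48.4389],
                            [101.2956, 105.3587]])
targets_shown = torch.tensor([[ 56.,  70.],
                              [ 81., 101.],
                              [119., 133.],
                              [ 22.,  37.],
                              [103., 119.]])

sq_err = (preds_shown - targets_shown) ** 2
overall_mse = sq_err.mean()                        # same quantity as mse(preds, targets)
rmse_per_target = torch.sqrt(sq_err.mean(dim=0))   # one value per column: apples, oranges
print(overall_mse)
print(rmse_per_target)
```

A larger value in the first column would indicate that the apple predictions are further from their targets than the orange predictions.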