## **Linear Regression**
This notebook introduces linear regression and explains each step in detail, so that the topic can be studied in depth. It is based on aakashns's PyTorch for Deep Learning tutorial [linear-regression](https://jovian.ai/aakashns/02-linear-regression).




First we need to install the required libraries. The installation of **PyTorch** may differ based on your operating system or hardware.


In [1]:
# Uncomment and run the appropriate command for your operating system, if required

# Linux / Binder / Colab
!pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Windows
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# MacOS
# !pip install numpy torch torchvision torchaudio


The next step is to **import** the modules. In addition to ***PyTorch*** we will also need ***numpy***.

PyTorch is essential because it provides:
*   **Autograd**: The ability to automatically compute gradients for tensor operations is essential for training deep learning models.
*   **GPU support**: While working with massive datasets and large models, PyTorch tensor operations can be performed efficiently using a Graphics Processing Unit (GPU). Computations that might typically take hours can be completed within minutes using GPUs.
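
As a small illustration of the two features listed above, the following sketch differentiates a toy function with autograd and then picks a device. The scalar `x`, the function `y = x ** 2`, and the name `x_on_device` are assumptions made for this example only; they are not part of the crop-yield data used later.

```python
import torch

# Hypothetical scalar example (not part of the crop-yield data).
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2          # y = x^2
y.backward()        # autograd computes dy/dx and stores it in x.grad
print(x.grad)       # dy/dx = 2 * x

# Pick a GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_on_device = torch.ones(2, 2, device=device)
print(x_on_device.device)
```

The same code runs on either device, which is why the rest of this notebook works on a CPU-only installation.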






In [2]:
import torch
import numpy as np

## Linear Regression

*Linear regression* is one of the foundational algorithms in machine learning. In this section we will create a model that predicts crop yields for apples and oranges (***target variables***). </br>
To achieve this, the model looks at the average temperature, rainfall, and humidity (***input variables or features***).

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as a bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```
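
To make the formula concrete, here is a worked instance of the first equation in plain Python. The weights and bias are hypothetical values chosen only so the arithmetic is easy to follow; a trained model would have different ones.

```python
# Hypothetical weights and bias, chosen only for illustration.
w11, w12, w13, b1 = 0.5, 1.0, -0.5, 2.0

# One sample observation: temperature, rainfall, humidity.
temp, rainfall, humidity = 73.0, 67.0, 43.0

# Weighted sum of the inputs plus the bias.
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
print(yield_apple)  # 36.5 + 67.0 - 21.5 + 2.0 = 84.0
```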

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)

Now we need to load the data. </br> We load the average temperature, rainfall, and humidity (in this order) as ***input variables***. </br> Then we load the crop yields as ***targets***. </br>
We also specify the data type explicitly: `float32`, a floating-point number that occupies 32 bits of memory.

In [3]:
# Input variables -> in order temperature, rainfall, humidity.
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

# Targets -> crops in order apples, oranges.
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

Next we convert the arrays to tensors. </br>
This is a very common workflow, since most data arrives in formats such as CSV and is typically loaded into NumPy arrays first. </br>
The conversion is straightforward: we pass each NumPy array to the PyTorch function ```torch.from_numpy()```.

In [4]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
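
One detail worth knowing about `torch.from_numpy()` is that the resulting tensor shares memory with the source array, and the array's dtype is carried over. The sketch below demonstrates this on a small hypothetical array (`arr`, `t`, and `t_copy` are names introduced for this example); `torch.tensor()` is shown as the alternative that makes an independent copy.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype='float32')  # hypothetical data
t = torch.from_numpy(arr)

print(t.dtype)      # the float32 dtype is carried over from the array
arr[0, 0] = 10.0    # modify the NumPy array in place
print(t[0, 0])      # the tensor sees the change, since memory is shared

t_copy = torch.tensor(arr)  # torch.tensor() makes an independent copy
arr[0, 0] = -1.0
print(t_copy[0, 0])         # the copy keeps the old value
```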

The weights and biases (`w11, w12,... w23, b1 & b2`) can also be represented as matrices. </br>
We initialize them with random values (and later adjust them so that the predictions move closer to the targets). </br> The first row of `w` and the first element of `b` are used to predict the first target variable (in this case apples), and similarly, the second for oranges. </br> </br>

We will be using `torch.randn()` that creates a tensor with the given shape, with elements picked randomly from a normal distribution with ***mean = 0*** and ***standard deviation 1***.

In [14]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
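
Because `torch.randn()` draws new values on every call, the numbers printed in this notebook will differ from run to run. If repeatable results are wanted, one option is to fix the random seed first. The sketch below assumes an arbitrary seed of 42 and uses the hypothetical names `w_a`, `b_a`, and `w_b` to avoid overwriting the notebook's own `w` and `b`.

```python
import torch

torch.manual_seed(42)  # the seed value is an arbitrary choice
w_a = torch.randn(2, 3, requires_grad=True)
b_a = torch.randn(2, requires_grad=True)

torch.manual_seed(42)  # re-seeding reproduces the same draws
w_b = torch.randn(2, 3, requires_grad=True)

print(w_a.shape, b_a.shape)   # one row of weights and one bias per target
print(torch.equal(w_a, w_b))  # identical values after re-seeding
```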

Our ***model*** is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

![matrix-mult](https://i.imgur.com/WGXLFvA.png) </br> </br>
`@` represents matrix multiplication in PyTorch, and the `.t` method returns the transpose of a tensor.

In [15]:
# Model: 
def model(x):
    return x @ w.t() + b
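
To see how the shapes line up, the following sketch applies the same expression to two observations with hand-picked parameters. The names `w_demo` and `b_demo` and their values are hypothetical; they only serve to make the result checkable by hand.

```python
import torch

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])       # 2 observations, 3 features
w_demo = torch.tensor([[0.5, 1.0, -0.5],
                       [1.0, 0.0, 0.5]])  # hypothetical weights, shape (2, 3)
b_demo = torch.tensor([2.0, -1.0])        # hypothetical biases, shape (2,)

# (2, 3) @ (3, 2) -> (2, 2); b_demo is broadcast across the rows.
out = x @ w_demo.t() + b_demo
print(out.shape)

# The same number computed by hand for observation 0, target 0 (apples).
manual = 0.5 * 73 + 1.0 * 67 - 0.5 * 43 + 2.0
print(out[0, 0].item(), manual)
```

Each row of the output corresponds to one observation and each column to one target, which matches the layout of `targets`.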

Now we can predict values using our model and compare them with values from table.

In [16]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ 89.1041, 241.2144],
        [108.4099, 331.3131],
        [154.4875, 288.9799],
        [ 90.5994, 284.1300],
        [ 96.3335, 306.8941]], grad_fn=<AddBackward0>)


Now we will show the actual targets.

In [17]:
# actual targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


The predictions are far from the targets, which is expected because our model is initialized with random values. </br>
To improve it, we need a way to *evaluate how well our model is performing*. </br> We can compare the model's predictions with the actual targets using the following method:

* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).

In [18]:
# MSE loss function
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()


*   `torch.sum` returns the sum of all the elements in a tensor. 
*   `.numel` method of a tensor returns the number of elements in a tensor.
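
As a sanity check, the hand-written loss can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default. The two small tensors below (`a` and `b_`) are hypothetical values picked so the result is easy to verify by hand.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # hypothetical predictions
b_ = torch.tensor([[1.0, 0.0], [0.0, 4.0]])  # hypothetical targets

print(mse(a, b_))         # (0 + 4 + 9 + 0) / 4 = 3.25
print(F.mse_loss(a, b_))  # built-in equivalent with the default 'mean' reduction
```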



In [19]:
# Compute and show loss at this point (model with random values)
# Model is better when loss value is as small as possible
loss = mse(preds, targets)
print(loss)

tensor(21092.2539, grad_fn=<DivBackward0>)


With PyTorch, we can automatically compute the gradient, or derivative, of the loss with respect to the weights and biases because they have `requires_grad` set to `True`. The gradient points in the direction in which the loss increases fastest.

The gradients are stored in the `.grad` property of the respective tensors.

In [22]:
# Compute gradients
loss.backward()

The step above is essential: calling `.backward()` on the loss is what computes the gradients.

In [23]:
# Weights and their gradients (same shape as the weights)
print(w)
print(w.grad)

tensor([[ 0.7201,  0.9764, -0.6742],
        [ 1.8216, -0.3591,  3.0881]], requires_grad=True)
tensor([[ 2907.0925,  2339.0322,  1461.5016],
        [17039.8691, 16260.9307, 10689.0986]])
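
The values autograd produces can be cross-checked against the closed-form derivative of the mean squared error. The sketch below is a check under stated assumptions: it uses the first three rows of the data, 64-bit floats to keep rounding error small, an arbitrary seed, and the hypothetical names `w_chk`, `b_chk`, `manual_w_grad`, and `manual_b_grad`.

```python
import torch

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.]], dtype=torch.float64)
y = torch.tensor([[56., 70.],
                  [81., 101.],
                  [119., 133.]], dtype=torch.float64)

torch.manual_seed(0)  # arbitrary seed
w_chk = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b_chk = torch.randn(2, dtype=torch.float64, requires_grad=True)

# Same model and loss as in the notebook, written inline.
diff = x @ w_chk.t() + b_chk - y
loss_chk = torch.sum(diff * diff) / diff.numel()
loss_chk.backward()

# Closed-form gradients of the MSE:
#   dL/dw = (2 / N) * diff^T @ x,   dL/db = (2 / N) * column sums of diff,
# where N is the total number of elements in diff.
with torch.no_grad():
    manual_w_grad = 2.0 / diff.numel() * diff.t() @ x
    manual_b_grad = 2.0 / diff.numel() * diff.sum(dim=0)

print(torch.allclose(w_chk.grad, manual_w_grad))
print(torch.allclose(b_chk.grad, manual_b_grad))
```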


## Adjusting weights and biases to reduce the loss

The loss is a [quadratic function](https://en.wikipedia.org/wiki/Quadratic_function) of our weights and biases, and our objective is to find the set of weights where the loss is the lowest. If we plot a graph of the loss with reference to any individual weight or bias element, it will look like the figure shown below.  </br> An important insight from calculus is that the gradient indicates the rate of change of the loss, i.e., the loss function's [slope](https://en.wikipedia.org/wiki/Slope) with reference to the weights and biases.

If a gradient element is **positive**:

* **increasing** the weight element's value slightly will **increase** the loss
* **decreasing** the weight element's value slightly will **decrease** the loss

![postive-gradient](https://i.imgur.com/WLzJ4xP.png)

If a gradient element is **negative**:

* **increasing** the weight element's value slightly will **decrease** the loss
* **decreasing** the weight element's value slightly will **increase** the loss

![negative=gradient](https://i.imgur.com/dvG2fxU.png)

The increase or decrease in the loss by changing a weight element is proportional to the gradient of the loss with reference to that element. This observation forms the basis of _the gradient descent_ optimization algorithm that we'll use to improve our model (by _descending_ along the _gradient_).
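
The sign rule described above can be tried out on a toy problem. The sketch below assumes a hypothetical one-parameter loss `(w - 3)^2` with its minimum at `w = 3`; `toy_loss` and `w_toy` are names introduced for this example only.

```python
import torch

# Hypothetical one-parameter loss with its minimum at w = 3.
def toy_loss(w):
    return (w - 3.0) ** 2

w_toy = torch.tensor(5.0, requires_grad=True)
toy_loss(w_toy).backward()
print(w_toy.grad)  # positive slope: 2 * (5 - 3) = 4

with torch.no_grad():
    smaller = toy_loss(w_toy - 0.1)  # step against the gradient
    larger = toy_loss(w_toy + 0.1)   # step along the gradient

# With a positive gradient, decreasing w lowers the loss and increasing w raises it.
print(smaller.item(), toy_loss(w_toy).item(), larger.item())
```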

## Coding it
We can subtract from each weight element a small quantity proportional to the derivative of the loss with reference to that element to reduce the loss slightly.

In [24]:
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5

We multiply the gradients with a very small number (`10^-5` in this case) to ensure that we don't modify the weights by a very large amount. This number is called the ***learning rate*** of the algorithm. 

We use `torch.no_grad` to indicate to PyTorch that we shouldn't track, calculate, or modify gradients while updating the weights and biases.
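
To see why the `torch.no_grad` block is needed, the sketch below attempts the same kind of in-place update without it. PyTorch rejects in-place changes to a leaf tensor that requires gradients, so the first attempt raises a `RuntimeError`. The parameter `p` and the step size `0.1` are hypothetical.

```python
import torch

p = torch.ones(3, requires_grad=True)  # hypothetical parameter
(p * 2).sum().backward()               # p.grad is now [2., 2., 2.]

raised = False
try:
    p -= p.grad * 0.1   # in-place update on a leaf that requires grad
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

with torch.no_grad():
    p -= p.grad * 0.1   # allowed: the update is not recorded by autograd

print(p, p.requires_grad)  # values updated, gradient tracking still enabled
```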

In [25]:
# Recompute the predictions with the updated weights, then recompute the loss.
# Reusing the old `preds` would print the same loss as before the update.
preds = model(inputs)
loss = mse(preds, targets)
print(loss)


Before we proceed, we reset the gradients to zero by invoking the `.zero_()` method. </br> **This is required because PyTorch accumulates gradients.** </br>Otherwise, the next time we invoke `.backward` on the loss, the new gradient values are added to the existing gradients, which may lead to unexpected results.

In [26]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
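
The accumulation behaviour is easy to observe on a toy tensor. In the sketch below, `v` is a hypothetical parameter and `3 * v` a hypothetical loss whose derivative is always 3; calling `.backward()` twice without resetting doubles the stored gradient.

```python
import torch

v = torch.tensor(2.0, requires_grad=True)  # hypothetical parameter

(3 * v).backward()
first = v.grad.clone()    # d(3v)/dv = 3

(3 * v).backward()        # second call without resetting
second = v.grad.clone()   # 3 + 3: the new gradient was added to the old one

v.grad.zero_()            # reset in place
print(first, second, v.grad)
```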


## Training the model using gradient descent

As seen above, we reduce the loss and improve our model using the gradient descent optimization algorithm. Thus, we can _train_ the model using the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

In [None]:
# Generate predictions
preds = model(inputs)
print(preds)

In [None]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

In [None]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

In [None]:
# Now update the weights and biases using the gradients computed above.
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [None]:
# Let's take a look at the new weights and biases
print(w)
print(b)

With the new weights and biases, the model should have a lower loss.

In [None]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

## Train for multiple epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an **_epoch_**.</br> Let's train the model for 100 epochs.
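
A minimal sketch of such a loop is shown below. It repeats the five training steps from the previous section and redefines the data, `model`, and `mse` so that it can run on its own. The seed is an arbitrary assumption added for repeatability, and `initial_loss` and `final_loss` are names introduced here; the exact loss values depend on the random initialization.

```python
import numpy as np
import torch

inputs = torch.from_numpy(np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                                    [102, 43, 37], [69, 96, 70]], dtype='float32'))
targets = torch.from_numpy(np.array([[56, 70], [81, 101], [119, 133],
                                     [22, 37], [103, 119]], dtype='float32'))

torch.manual_seed(0)  # arbitrary seed so the run is repeatable
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

initial_loss = mse(model(inputs), targets).item()

# Train for 100 epochs
for epoch in range(100):
    preds = model(inputs)            # 1. generate predictions
    loss = mse(preds, targets)       # 2. calculate the loss
    loss.backward()                  # 3. compute gradients
    with torch.no_grad():
        w -= w.grad * 1e-5           # 4. adjust weights and biases
        b -= b.grad * 1e-5
        w.grad.zero_()               # 5. reset the gradients
        b.grad.zero_()

final_loss = mse(model(inputs), targets).item()
print(initial_loss, final_loss)      # the loss after training should be lower
```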