![image](https://analyticsindiamag.com/wp-content/uploads/2020/02/Pytorch.png)

###  This notebook introduces the basics of PyTorch and walks step by step through some basic tensor operations.

#### This notebook covers the following topics:
#### **1. What are tensors**
#### **2. Difference between basic arrays and tensors**
#### **3. How to create tensors**
#### **4. Basic matrix operations on tensors**
#### **5. What is AutoGrad**
#### **6. Linear Regression from scratch**
#### **7. What are DataLoaders and Datasets in PyTorch**
#### **8. Performing Linear Regression using PyTorch libraries**

     NOTE: This notebook only gives you a basic idea of PyTorch. PyTorch is very deep, and I don't want this notebook to be too lengthy, so you will have to explore the rest on your own. Don't worry, though: this notebook will give you a head start and explain some basics that will be very helpful in your future journey.

![](https://i.pinimg.com/originals/d5/85/e9/d585e948d82cbae60bb087175b9f70e1.jpg)


## What is Pytorch

PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab.

## Difference between NumPy and PyTorch
The most visible difference between the two frameworks is naming. NumPy calls its n-dimensional containers (vectors, matrices, and higher-dimensional data) arrays, while in PyTorch they're called tensors. The basic creation, indexing, and math operations are otherwise quite similar.

## Why PyTorch?

Even if you already know Numpy, there are still a couple of reasons to switch to PyTorch for tensor computation. The main reason is the GPU acceleration. As you’ll see, using a GPU with PyTorch is super easy and super fast. If you do large computations, this is beneficial because it speeds things up a lot.

![](https://tensorflownet.readthedocs.io/en/latest/_static/tensor-naming.png)

## Why NumPy?

NumPy is one of the most commonly used computing frameworks for linear algebra. A good use case for NumPy is quick experimentation and small projects, because NumPy is a lightweight framework compared to PyTorch.

## **NOTE: For this notebook you should have a little knowledge of NumPy 📚📙**

[Check out this link for getting a basic Idea of Numpy ](https://www.kaggle.com/colinmorris/working-with-external-libraries)



![](https://i.imgflip.com/1cjwpr.jpg)

In [1]:
import torch 
import numpy as np

## **To check whether torch can access a GPU**

For this you have to select an accelerator (GPU) from the right --->
![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4127694%2Ffbbf954abb41e8bbbb6f5a668ec44849%2Fgpu.PNG?generation=1582892394479926&alt=media)

In [2]:
torch.cuda.is_available()

True

### Simple list 

In [3]:
data = [[1 , 2 ] , [3 , 4]]
print(data)
type(data)

[[1, 2], [3, 4]]


list

## **Creating Tensors using Torch** 

![](https://media.makeameme.org/created/what-is-tensor.jpg)

A tensor is a multi-dimensional data structure. Vectors are one-dimensional data structures and matrices are two-dimensional data structures. Tensors generalize both: they can have any number of dimensions, from zero (a single scalar) to n.
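To make the idea of "dimensions" concrete, here is a minimal sketch that builds tensors with 0, 1, 2, and 3 dimensions and prints the number of dimensions and the shape of each. The variable names (`scalar`, `vector`, `matrix`, `cube`) are just illustrative.

```python
import torch

scalar = torch.tensor(3.0)                        # 0 dimensions: a single number
vector = torch.tensor([1.0, 2.0, 3.0])            # 1 dimension
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2 dimensions
cube = torch.zeros(2, 3, 4)                       # 3 dimensions

# ndim is the number of dimensions, shape is the size along each one
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, t.ndim, tuple(t.shape))
```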

There are different ways to create tensors, for example from a simple list or from a NumPy array, as shown below 🔽

In [4]:
np.array(data)

array([[1, 2],
       [3, 4]])

In [5]:
data = torch.tensor(data)
type(data)

torch.Tensor

In [6]:
data.dtype

torch.int64

In [7]:
array =np.random.rand(3 ,4 )
array

array([[0.21079095, 0.31025012, 0.38596169, 0.3766617 ],
       [0.2084766 , 0.30751253, 0.45611597, 0.0574945 ],
       [0.81359369, 0.28441014, 0.40976002, 0.43682   ]])

In [8]:
torch.from_numpy(array)

tensor([[0.2108, 0.3103, 0.3860, 0.3767],
        [0.2085, 0.3075, 0.4561, 0.0575],
        [0.8136, 0.2844, 0.4098, 0.4368]], dtype=torch.float64)

In [9]:
torch.tensor(array)

tensor([[0.2108, 0.3103, 0.3860, 0.3767],
        [0.2085, 0.3075, 0.4561, 0.0575],
        [0.8136, 0.2844, 0.4098, 0.4368]], dtype=torch.float64)
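The two conversions above print the same values, but they behave differently afterwards: `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` makes a copy. The small sketch below (with illustrative names `shared` and `copied`) shows the difference by changing the array after the conversion.

```python
import numpy as np
import torch

array = np.zeros(3)
shared = torch.from_numpy(array)   # shares memory with `array`
copied = torch.tensor(array)       # independent copy of the data

array[0] = 7.0                     # modify the NumPy array in place

print(shared[0].item())            # the shared tensor sees the change
print(copied[0].item())            # the copy still holds the old value
```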

In [10]:
torch.ones(3 ,4)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [11]:
torch.zeros(3 ,4)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [12]:
my_tensor = torch.rand(3 , 4)

In [13]:
my_tensor.dtype

torch.float32

### **Switching between CPU and GPU**
A torch.device is an object representing the device on which a torch.Tensor is or will be allocated.

The torch.device contains a device type ('cpu' or 'cuda') and optional 
device ordinal for the device type.

To switch a tensor from CPU to GPU, use ".to('cuda')".
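If you are not sure whether a GPU is available, a common pattern is to choose the device once and reuse it. The following is a small sketch of that idea; the names `device`, `t`, and `back_on_cpu` are just for illustration, and the code falls back to the CPU when no GPU is found.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(3, 4).to(device)    # move (or keep) the tensor on the chosen device
print(t.device)

back_on_cpu = t.cpu()              # .cpu() always returns a tensor on the CPU
print(back_on_cpu.device)
```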

![](https://naadispeaks.files.wordpress.com/2020/10/download.jpeg?w=300)

In [14]:
my_tensor.device

device(type='cpu')

In [15]:
my_tensor  = my_tensor.to("cuda")

    Found GPU%d %s which is of cuda capability %d.%d.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is %d.%d.
    


In [16]:
my_tensor

tensor([[0.9199, 0.5283, 0.3097, 0.3547],
        [0.4573, 0.5529, 0.3098, 0.0088],
        [0.2671, 0.1688, 0.8900, 0.3938]], device='cuda:0')

In [17]:
my_tensor[0]

tensor([0.9199, 0.5283, 0.3097, 0.3547], device='cuda:0')

In [18]:
my_tensor[: , 1:3]

tensor([[0.5283, 0.3097],
        [0.5529, 0.3098],
        [0.1688, 0.8900]], device='cuda:0')

## Operations on Tensors 

`.mul` performs element-by-element multiplication.

![](https://miro.medium.com/max/1400/1*54rq3_-FZaJxKLdOYN8qjA.png)

There are two ways to perform this operation: the `.mul` method and the `*` operator.

In [19]:
print(my_tensor.mul(my_tensor))

print("\n",my_tensor * my_tensor)

tensor([[8.4620e-01, 2.7914e-01, 9.5894e-02, 1.2580e-01],
        [2.0909e-01, 3.0572e-01, 9.5964e-02, 7.7048e-05],
        [7.1316e-02, 2.8480e-02, 7.9215e-01, 1.5509e-01]], device='cuda:0')

 tensor([[8.4620e-01, 2.7914e-01, 9.5894e-02, 1.2580e-01],
        [2.0909e-01, 3.0572e-01, 9.5964e-02, 7.7048e-05],
        [7.1316e-02, 2.8480e-02, 7.9215e-01, 1.5509e-01]], device='cuda:0')


## Matmul (Matrix Multiplication)
**The matmul method is used for matrix multiplication of two tensors** 
![](https://i1.faceprep.in/Companies-1/matrix-multiplication-in-python.png)

Three different ways to perform this operation are shown below 🔽


In [20]:
my_tensor.matmul(my_tensor.T)

tensor([[1.3470, 0.8118, 0.7501],
        [0.8118, 0.6108, 0.4946],
        [0.7501, 0.4946, 1.0470]], device='cuda:0')

In [21]:
torch.matmul(my_tensor , my_tensor.T)

tensor([[1.3470, 0.8118, 0.7501],
        [0.8118, 0.6108, 0.4946],
        [0.7501, 0.4946, 1.0470]], device='cuda:0')

In [22]:
my_tensor @ my_tensor.T

tensor([[1.3470, 0.8118, 0.7501],
        [0.8118, 0.6108, 0.4946],
        [0.7501, 0.4946, 1.0470]], device='cuda:0')
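Because the random values above are hard to check by hand, here is a tiny sketch with a fixed 2x2 tensor that contrasts element-wise multiplication with matrix multiplication. The tensor `A` is made up for the example.

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

elementwise = A * A          # multiplies matching entries: [[1, 4], [9, 16]]
matrix_product = A @ A.T     # rows of A times columns of A.T: [[5, 11], [11, 25]]

print(elementwise)
print(matrix_product)
```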

## **Matrix Addition**
![](https://i1.faceprep.in/Companies-1/matrix-addition-in-python.png)

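`sum()` adds up all the elements of a tensor. You can pass `axis` as a parameter to sum along one dimension instead: `axis = 0` sums down each column and `axis = 1` sums across each row.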

In [23]:
my_tensor.sum()  , my_tensor.sum(axis = 0) , my_tensor.sum(axis = 1)

(tensor(5.1610, device='cuda:0'),
 tensor([1.6442, 1.2500, 1.5095, 0.7573], device='cuda:0'),
 tensor([2.1126, 1.3287, 1.7197], device='cuda:0'))
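A small sketch with fixed values makes the direction of each sum easier to verify. It uses `dim`, which is PyTorch's own name for the parameter that the cell above passes as `axis`; the tensor `t` is made up for the example.

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

total = t.sum()             # all elements: 21
per_column = t.sum(dim=0)   # collapses the rows: [5, 7, 9]
per_row = t.sum(dim=1)      # collapses the columns: [6, 15]

print(total, per_column, per_row)
```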

#### **max and min return the largest and smallest values in your tensor**

In [24]:
my_tensor.max() , my_tensor.min()

(tensor(0.9199, device='cuda:0'), tensor(0.0088, device='cuda:0'))

### **cat concatenates two tensors; with the axis parameter you can specify the direction of concatenation** 

![](https://static.javatpoint.com/tutorial/numpy/images/numpy-concatenate.png)

In [25]:
torch.cat([my_tensor , my_tensor], axis = 1)

tensor([[0.9199, 0.5283, 0.3097, 0.3547, 0.9199, 0.5283, 0.3097, 0.3547],
        [0.4573, 0.5529, 0.3098, 0.0088, 0.4573, 0.5529, 0.3098, 0.0088],
        [0.2671, 0.1688, 0.8900, 0.3938, 0.2671, 0.1688, 0.8900, 0.3938]],
       device='cuda:0')

In [26]:
torch.cat([my_tensor , my_tensor ] ,axis = 0)

tensor([[0.9199, 0.5283, 0.3097, 0.3547],
        [0.4573, 0.5529, 0.3098, 0.0088],
        [0.2671, 0.1688, 0.8900, 0.3938],
        [0.9199, 0.5283, 0.3097, 0.3547],
        [0.4573, 0.5529, 0.3098, 0.0088],
        [0.2671, 0.1688, 0.8900, 0.3938]], device='cuda:0')
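The effect of the axis is easiest to see in the shapes. This short sketch (using a zero tensor `t` purely as a placeholder) checks the shape after concatenating along each dimension.

```python
import torch

t = torch.zeros(3, 4)

side_by_side = torch.cat([t, t], dim=1)   # columns are appended: shape (3, 8)
stacked = torch.cat([t, t], dim=0)        # rows are appended: shape (6, 4)

print(tuple(side_by_side.shape), tuple(stacked.shape))
```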

In [27]:
my_tensor.shape

torch.Size([3, 4])

In [28]:
my_tensor.size()

torch.Size([3, 4])

**clip is an alias for clamp**

Clamps all elements in input into the range [ min, max ]. Letting min_value and max_value be min and max, respectively, this returns:

$$y_i = \min(\max(x_i, \text{min\_value}_i), \text{max\_value}_i)$$


If min is None, there is no lower bound. If max is None, there is no upper bound.

**NOTE : If min is greater than max torch.clamp(..., min, max) sets all elements in input to the value of max.**

In [None]:
my_tensor.clip(0.2 ,0.8)
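The clamping rules described above can be checked on a small fixed tensor. This is only a sketch with made-up values; it covers both bounds, a single bound, and the special case where min is greater than max.

```python
import torch

x = torch.tensor([-1.0, 0.5, 2.0])

both = x.clamp(min=0.2, max=0.8)      # [0.2, 0.5, 0.8]
lower_only = x.clamp(min=0.0)         # no upper bound: [0.0, 0.5, 2.0]
upper_only = x.clamp(max=1.0)         # no lower bound: [-1.0, 0.5, 1.0]
swapped = x.clamp(min=0.8, max=0.2)   # min > max: every element becomes max

print(both, lower_only, upper_only, swapped)
```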

### Converting tensors to normal NumPy arrays

In [None]:
my_tensor.cpu().detach().numpy()

## DAY 2

### AUTO GRAD

torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training.

Training a neural network (NN) happens in two steps:

Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

Backward Propagation: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.

In [29]:
import torch 

We create two tensors a and b with requires_grad=True. This signals to autograd that every operation on them should be tracked.

In [30]:
a = torch.tensor([5.0] , requires_grad = True)
b = torch.tensor([6.0] , requires_grad = True)

In [31]:
a

tensor([5.], requires_grad=True)

In [32]:
b

tensor([6.], requires_grad=True)

## We create another tensor y from a and b.

In [33]:
y = a**3 - b**2
y

tensor([89.], grad_fn=<SubBackward0>)

At first, the grad will be None for both tensors.

In [34]:
print(a.grad) , print(b.grad)

None
None


(None, None)

## Initiate the back-propagation

In [35]:
y.backward()

# dy/da = 3*a**2  = 75
# dy/db = -2b = -12

In [36]:
a.grad , b.grad

(tensor([75.]), tensor([-12.]))
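The values 75 and -12 can be cross-checked against the hand-derived formulas. The sketch below uses `torch.autograd.grad`, which returns the gradients directly instead of storing them in `.grad`; it uses scalar tensors rather than the one-element tensors above to keep the example minimal.

```python
import torch

a = torch.tensor(5.0, requires_grad=True)
b = torch.tensor(6.0, requires_grad=True)
y = a**3 - b**2

# Ask autograd for dy/da and dy/db directly
grad_a, grad_b = torch.autograd.grad(y, (a, b))

print(grad_a.item(), 3 * 5.0**2)   # autograd result next to 3*a**2
print(grad_b.item(), -2 * 6.0)     # autograd result next to -2*b
```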

![image.png](attachment:b95df6d6-6fb7-45d9-8cb1-8912713a9c3b.png)

In [37]:
Weight = torch.randn(10 , 1, requires_grad = True)
bias = torch.randn(1 , requires_grad = True) 

In [38]:
Weight

tensor([[-0.5050],
        [ 1.0702],
        [ 0.4295],
        [-1.5039],
        [-0.0229],
        [-1.1391],
        [ 0.3478],
        [-0.7717],
        [-1.7159],
        [-0.1437]], requires_grad=True)

In [39]:
bias

tensor([-0.3074], requires_grad=True)

## Creating some random features 

In [40]:
features = torch.rand(1, 10)

In [41]:
features

tensor([[0.7826, 0.9995, 0.6872, 0.6409, 0.2069, 0.8989, 0.3573, 0.5083, 0.0463,
         0.4108]])

## Linear Regression using Back-propagation 

For this you should have some basic knowledge of the maths behind Linear Regression and Gradient Descent.

[Check out this YouTube video for understanding the Gradient Descent concept](https://www.youtube.com/watch?v=vsWrXfO3wWw)

In [42]:
output = torch.matmul(features , Weight)+bias

In [43]:
loss = 1-output

In [44]:
output

tensor([[-1.7368]], grad_fn=<AddBackward0>)

In [45]:
loss.backward()

In [46]:
Weight.grad

tensor([[-0.7826],
        [-0.9995],
        [-0.6872],
        [-0.6409],
        [-0.2069],
        [-0.8989],
        [-0.3573],
        [-0.5083],
        [-0.0463],
        [-0.4108]])

**Learning rate** controls the size of the step taken along the negative gradient during gradient descent. Setting it too high makes the path unstable, setting it too low makes convergence slow, and setting it to zero means your model isn't learning anything from the gradients.
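The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2 (gradient 2*w) with three learning rates; the function `descend` and the chosen rates are illustrative assumptions, not part of the notebook's model.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad   # the standard update: step against the gradient
    return w

# lr = 0.0 never moves, lr = 0.1 shrinks w towards the minimum at 0,
# lr = 1.1 overshoots on every step so |w| keeps growing
for lr in (0.0, 0.1, 1.1):
    print(lr, descend(lr))
```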

**torch.no_grad** :
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.

In [47]:
learning_rate = 0.001
with torch.no_grad():
    Weight = Weight - learning_rate * Weight.grad.data

Modified Weights are : 

In [48]:
Weight

tensor([[-0.5042],
        [ 1.0712],
        [ 0.4301],
        [-1.5032],
        [-0.0227],
        [-1.1382],
        [ 0.3481],
        [-0.7712],
        [-1.7159],
        [-0.1433]])
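In the cell above, `Weight` is rebound to a new tensor created inside `torch.no_grad()`, which is why the printed result no longer shows `requires_grad=True`. The sketch below, with a made-up tensor `w`, contrasts that with an in-place update, which keeps the same tensor tracked for further training steps.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()          # fills w.grad with 2 * w = [2, 4]

with torch.no_grad():
    rebound = w - 0.1 * w.grad     # a new tensor that autograd does not track
print(rebound.requires_grad)

with torch.no_grad():
    w -= 0.1 * w.grad              # in-place update of the original tensor
print(w.requires_grad)
```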

## Example 2 (Linear Regression)



w = Weights ||  b = Bias

In Linear Regression we find the values of the weights and bias, and using those values we predict the outputs for new inputs.

In [49]:
import numpy as np
import torch 

Our data is small and right now we don't have a CSV file, so I created the data in NumPy. You can also create a CSV, import it here in Kaggle, and use the iloc method of pandas to get the same array.

In [50]:
inputs = np.array([[73 , 67 , 43],
                  [91 , 88 , 64],
                   [87 , 134 , 58],
                   [102 , 43 , 37],
                   [69 ,96 , 70]] , dtype = "float32")

In [51]:
targets = np.array([[56, 70],
                  [81 ,101],
                  [119, 133],
                  [22 ,37],
                  [103 , 119]] , dtype = "float32")

## Converting NumPy arrays to tensors

In [52]:
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(targets)

In [53]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])

Again we take random values for the bias and weights using the randn method.

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTyHkcnQcqBu-GYtVbFaocaV33_bW4EyLe7Jw&usqp=CAU)

Here Y is the predicted value, the x vector is the features, beta one is the weights, and epsilon one is the bias.

**torch.randn** creates a tensor with the given shape with elements picked randomly from a normal distribution with mean 0 and standard deviation 1.

In [54]:
w = torch.randn(2 , 3 , requires_grad = True)
b = torch.randn(2 , requires_grad = True)

In [55]:
w , b 

(tensor([[-1.8036, -0.1410, -0.6206],
         [-0.1952, -0.4783, -1.2281]], requires_grad=True),
 tensor([-0.0663, -0.0739], requires_grad=True))

# Defining a Model

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTyHkcnQcqBu-GYtVbFaocaV33_bW4EyLe7Jw&usqp=CAU)

In [56]:
def model(x):
    return x @ w.T + b
# @ performs matrix multiplication

Don't worry if the matrix concept isn't clear yet.
Just understand that the model function returns the predicted values, and the predicted values are in the form of a tensor (matrix).


**.T** is used to transpose the matrix

We transpose the weight matrix because it was created with shape (2, 3), while the inputs have shape (5, 3). For matrix multiplication the inner dimensions must match, so we multiply by w.T, which has shape (3, 2).
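The shapes involved in the model can be checked with simple placeholder values. In this sketch `x`, `w`, and `b` are made-up tensors with the same shapes as the notebook's inputs, weights, and bias; the bias is added to every row through broadcasting.

```python
import torch

x = torch.ones(5, 3)            # 5 samples, 3 features
w = torch.ones(2, 3)            # 2 outputs, 3 features
b = torch.tensor([1.0, 2.0])    # one bias per output

out = x @ w.T + b               # (5, 3) @ (3, 2) -> (5, 2), then b is added to each row

print(tuple(out.shape))
print(out[0])                   # each row is [1+1+1+1, 1+1+1+2]
```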

![](https://i.imgur.com/WGXLFvA.png)

In [57]:
# Predicted values
pred = model(inputs)
print(pred)

tensor([[-167.8614,  -99.1757],
        [-216.3191, -138.5230],
        [-211.8657, -152.3734],
        [-213.0607,  -85.9900],
        [-181.4899, -145.4230]], grad_fn=<AddBackward0>)


In [58]:
# actual target 
print(targets)

[[ 56.  70.]
 [ 81. 101.]
 [119. 133.]
 [ 22.  37.]
 [103. 119.]]


We can see that there is a huge difference between our actual targets and our predicted targets.
 
## Loss function

Before we improve our model, we need a way to evaluate how well our model is performing. We can compare the model's predictions with the actual targets using the following method:

* Calculate the difference between the two matrices (`preds` and `targets`).
* Square all elements of the difference matrix to remove negative values.
* Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the **mean squared error** (MSE).

In [59]:
def mse(t1 , t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()
# numel will give the total number of elements
# * gives the element by element multiplication

In [60]:
# Compute the loss
loss = mse(pred , target)
loss

tensor(63664.8516, grad_fn=<DivBackward0>)

In [61]:
import math
math.sqrt(loss)

252.3189480845622

This tells us that our model's predictions are off by roughly this value, and we have to minimize this error (the lower the error, the better the model).

In [62]:
loss.backward()

The gradients are stored in the .grad property of the respective tensors.

**NOTE :** The derivative of the loss w.r.t. the weights is itself a matrix, with the same dimensions as the weights.

In [63]:
print(w)
print(w.grad)

tensor([[-1.8036, -0.1410, -0.6206],
        [-0.1952, -0.4783, -1.2281]], requires_grad=True)
tensor([[-23157.8477, -24583.4902, -15291.2441],
        [-17952.8145, -20265.2031, -12443.1855]])


In [64]:
print(b)
print(b.grad)

tensor([-0.0663, -0.0739], requires_grad=True)
tensor([-274.3194, -216.2970])


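## Resetting the gradient to zero 

.zero_() resets the gradient to zero in place. This is needed because PyTorch accumulates gradients: each call to .backward() adds to the values already stored in .grad.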

In [65]:
w.grad.zero_()
b.grad.zero_()
w.grad , b.grad

(tensor([[0., 0., 0.],
         [0., 0., 0.]]),
 tensor([0., 0.]))
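Why the reset matters can be seen in a minimal sketch: calling `.backward()` twice without resetting doubles the stored gradient. The tensor `x` and the names `first` and `second` are made up for the example.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()       # d(x**2)/dx = 2*x = 4

(x ** 2).sum().backward()    # a second backward pass adds to the stored gradient
second = x.grad.clone()      # 4 + 4 = 8

x.grad.zero_()               # reset before the next step
print(first, second, x.grad)
```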

![](https://storage.googleapis.com/kagglesdsdata/datasets/1628643/2676548/grad%20steps.jpg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20211004%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20211004T190914Z&X-Goog-Expires=345599&X-Goog-SignedHeaders=host&X-Goog-Signature=7655b98b732f2caafec13aa8a31ed206914d58bcf2a396698053a2507f13e0b83e395eb80ccdcd706ab8689f8dc6447abe65ebe3c480de65f3be75729273592c7488b012efdd503851425b3ec63e0c59145857157e9269bb7ce41c566ecd6e1f68d7d636e6f02d511f50fca35002e1951e88ea5d245844b264b14768135eb095d54c286d8648989c4398601723ede47da367c837d141a6eb36bb22a69f0eac7d5039995dd4efc0707500000fd8d958e16cf7a12bcc9dc47a27515e162f919b56ea99cce2dcf4a37936313d0d55b7827269c24159dd101316eb2621c0e046758a704aa3e60e650e7d325dbaeff3875a751b177b237dce38243ea1d77e6199cc44)

In [66]:
# making predictions
preds = model(inputs)
preds

tensor([[-167.8614,  -99.1757],
        [-216.3191, -138.5230],
        [-211.8657, -152.3734],
        [-213.0607,  -85.9900],
        [-181.4899, -145.4230]], grad_fn=<AddBackward0>)

In [67]:
with torch.no_grad():
    w-= w.grad * 0.0001
    b-= b.grad * 0.0001
    w.grad.zero_()
    b.grad.zero_()

In [68]:
w , b

(tensor([[-1.8036, -0.1410, -0.6206],
         [-0.1952, -0.4783, -1.2281]], requires_grad=True),
 tensor([-0.0663, -0.0739], requires_grad=True))

In [71]:
preds = model(inputs)
loss = mse(preds , target)
loss

tensor(47.8591, grad_fn=<DivBackward0>)

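### The loss is reduced a lot

Note that this cell is numbered In [71], so the value shown was recorded after the 50-epoch training loop below had already run.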

## **Train the model for multiple epochs**
 An epoch is one full pass over the training data; here, each iteration of the loop is one epoch.

In [70]:
# for 50 epochs
for i in range(50):
    preds = model(inputs)
    loss = mse(preds , target)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 0.0001
        b -= b.grad * 0.0001
        w.grad.zero_()
        b.grad.zero_()

In [72]:
preds = model(inputs)
loss = mse(preds , target)
loss

tensor(47.8591, grad_fn=<DivBackward0>)

In [73]:
preds

tensor([[ 57.3313,  71.3866],
        [ 79.8805,  94.4756],
        [123.7056, 145.3055],
        [ 21.8584,  42.6510],
        [ 97.4110, 105.0603]], grad_fn=<AddBackward0>)

In [74]:
target

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

As we can see, the loss is pretty low now.

# Linear Regression using PyTorch Built-In Library

In [78]:
import torch.nn as nn
from torch.utils.data import TensorDataset

In [79]:
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)

targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [76]:
inputs[:5]

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])


#### **Think of TensorDataset as a class to which you pass your data as arguments and then use the methods the class provides**

TensorDataset allows us to access rows from the inputs and targets as tuples and provides standard APIs for working with many different types of datasets in PyTorch.

In [80]:
train_data = TensorDataset(inputs, targets)
train_data[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

#### **DataLoader can split the data in batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data**

In [81]:
from torch.utils.data import DataLoader

In [82]:
batch_size = 5
train_loader = DataLoader(train_data, batch_size, shuffle=True)
# shuffle, as the name suggests, shuffles the data so each batch contains randomly chosen rows

When the data is too large, PyTorch lets us divide it into batches; working with one batch at a time puts less load on memory.
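How a dataset is split into batches can be checked with a small sketch. The data here (`xs`, `ys`) is made up: 15 rows with a batch size of 5 should give 3 batches of 5 rows each.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

xs = torch.arange(15, dtype=torch.float32).reshape(15, 1)   # 15 samples, 1 feature
ys = 2 * xs                                                  # matching targets

dataset = TensorDataset(xs, ys)
loader = DataLoader(dataset, batch_size=5, shuffle=True)

print(len(dataset), len(loader))   # number of samples, number of batches

# Each iteration yields one batch of inputs and the matching targets
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
print(batch_shapes)
```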

**NOTE: Here our data is not that big, but for demonstration purposes we will still work with batches** 

#### So here is one batch of data

In [83]:
for x , y in train_loader:
    print("Batche 1 : ")
    print(x)
    print(y)
    # To check all the batches comment out the break statement
    break


Batche 1 : 
tensor([[ 88., 134.,  59.],
        [ 68.,  96.,  71.],
        [ 73.,  67.,  43.],
        [101.,  44.,  37.],
        [ 87., 134.,  58.]])
tensor([[118., 132.],
        [104., 118.],
        [ 56.,  70.],
        [ 21.,  38.],
        [119., 133.]])


## nn.Linear
Now we just have to provide the number of features and the number of targets. We don't have to create the weights and bias manually; nn.Linear initializes them for us.

**Our data has 3 features and 2 outputs**

In [84]:
model = nn.Linear(3 , 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.4713,  0.0224, -0.1384],
        [ 0.2005,  0.5134,  0.4501]], requires_grad=True)
Parameter containing:
tensor([-0.0790,  0.0966], requires_grad=True)
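nn.Linear performs the same computation as the hand-written model from the previous section. The sketch below, using a made-up layer `layer` and random input `x`, compares the layer's output with `x @ weight.T + bias` computed manually.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)           # 3 input features, 2 outputs
x = torch.rand(5, 3)              # 5 made-up samples

manual = x @ layer.weight.T + layer.bias   # same formula as the hand-written model

print(tuple(layer.weight.shape), tuple(layer.bias.shape))
print(torch.allclose(layer(x), manual, atol=1e-6))
```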


#### The model.parameters() method returns an iterator over all the weight and bias matrices present in the model; wrapping it in list() lets us view them

In [85]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.4713,  0.0224, -0.1384],
         [ 0.2005,  0.5134,  0.4501]], requires_grad=True),
 Parameter containing:
 tensor([-0.0790,  0.0966], requires_grad=True)]

In [86]:
preds = model(inputs)
preds[:4]

tensor([[-38.9331,  68.4849],
        [-49.8519,  92.3275],
        [-46.1039, 112.4391],
        [-52.3098,  59.2785]], grad_fn=<SliceBackward>)

## Loss Function

In [87]:
import torch.nn.functional as F

The nn.functional package contains many useful loss functions and several other utilities.

Instead of defining a loss function manually, we can use the built-in loss function mse_loss.
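As a quick check that the built-in function matches the manual definition of MSE used earlier, here is a sketch with made-up values small enough to compute by hand: the squared differences are 0, 1, 4, and 9, so the mean is 3.5.

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

diff = preds - targets
manual = torch.sum(diff * diff) / diff.numel()   # (0 + 1 + 4 + 9) / 4
builtin = F.mse_loss(preds, targets)             # default reduction is the mean

print(manual.item(), builtin.item())
```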

In [88]:
# ?nn.Linear

In [89]:
loss_fn = F.mse_loss
loss = loss_fn(model(inputs) ,targets )
loss

tensor(8076.9463, grad_fn=<MseLossBackward>)

# Optimizer

Instead of changing the weights and biases manually, we will use PyTorch's built-in optimizers.

SGD :  stochastic gradient descent

To understand this SGD [click here](https://www.youtube.com/watch?v=IU5fuoYBTAM&t=885s)

lr is the learning rate, and we have to provide model.parameters() as an argument so that the optimizer knows which matrices should be modified during the update step.

In [90]:
opt = torch.optim.SGD(model.parameters() , lr = 0.00001)
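With default settings (no momentum or weight decay), one SGD step is the same update that was written by hand earlier: parameter minus learning rate times gradient. The sketch below checks this on a made-up parameter `w` and a toy loss; the names and the learning rate of 0.1 are assumptions for the example.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()                          # w.grad = 2 * w = [2, -4]

expected = w.detach() - 0.1 * w.grad     # manual update: [0.8, -1.6]
opt.step()                               # the optimizer applies the same update in place
opt.zero_grad()                          # clear the gradients for the next step

print(w.detach(), expected)
```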

## Train the model
![](https://storage.googleapis.com/kagglesdsdata/datasets/1628643/2676548/grad%20steps.jpg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20211004%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20211004T190914Z&X-Goog-Expires=345599&X-Goog-SignedHeaders=host&X-Goog-Signature=7655b98b732f2caafec13aa8a31ed206914d58bcf2a396698053a2507f13e0b83e395eb80ccdcd706ab8689f8dc6447abe65ebe3c480de65f3be75729273592c7488b012efdd503851425b3ec63e0c59145857157e9269bb7ce41c566ecd6e1f68d7d636e6f02d511f50fca35002e1951e88ea5d245844b264b14768135eb095d54c286d8648989c4398601723ede47da367c837d141a6eb36bb22a69f0eac7d5039995dd4efc0707500000fd8d958e16cf7a12bcc9dc47a27515e162f919b56ea99cce2dcf4a37936313d0d55b7827269c24159dd101316eb2621c0e046758a704aa3e60e650e7d325dbaeff3875a751b177b237dce38243ea1d77e6199cc44)

In [91]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_data):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for x,y in train_data:
            
            # 1. Generate predictions
            pred = model(x)
            
            # 2. Calculate loss
            loss = loss_fn(pred, y)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

In [92]:
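# train_data is the TensorDataset, so each step uses one sample; pass train_loader to train on mini-batches of 5
fit(100, model, loss_fn, opt, train_data)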

Epoch [10/100], Loss: 91.3018
Epoch [20/100], Loss: 36.6783
Epoch [30/100], Loss: 19.1244
Epoch [40/100], Loss: 11.5506
Epoch [50/100], Loss: 7.4768
Epoch [60/100], Loss: 4.9833
Epoch [70/100], Loss: 3.3621
Epoch [80/100], Loss: 2.2861
Epoch [90/100], Loss: 1.5721
Epoch [100/100], Loss: 1.1046


In [93]:
preds = model(inputs)
preds

tensor([[ 56.9782,  70.7238],
        [ 81.4025, 100.3504],
        [119.3236, 134.1488],
        [ 21.1883,  38.3768],
        [100.5527, 117.7238],
        [ 55.7221,  69.6372],
        [ 81.1510, 100.3535],
        [119.5599, 134.7027],
        [ 22.4444,  39.4634],
        [101.5573, 118.8135],
        [ 56.7268,  70.7269],
        [ 80.1464,  99.2639],
        [119.5750, 134.1456],
        [ 20.1837,  37.2872],
        [101.8088, 118.8103]], grad_fn=<AddmmBackward>)

Compare the predictions with the targets: there is only a small difference. Now try more epochs!

In [94]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])