# What is PyTorch?

<img src="download.jpg">

PyTorch is a Python-based scientific computing package that harnesses the power of graphics processing units (GPUs). It is also one of the preferred deep learning research platforms, built to provide maximum flexibility and speed. It is known for two high-level features: tensor computation with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.

Many Python libraries have the potential to change how deep learning and artificial intelligence are done, and PyTorch is one of them. A key reason for its success is that it is thoroughly Pythonic, which makes building neural network models straightforward. It is still young compared to its competitors, but it is gaining momentum fast.

## A Brief History of PyTorch

Since its public release in January 2017, researchers have adopted PyTorch at a growing rate. It has quickly become a go-to library because it makes building even very complex neural networks easy. It gives TensorFlow tough competition, especially in research work. However, it may still take some time before it is adopted by the masses, because it is still seen as “new” and “under construction”.

PyTorch's creators envisioned the library as highly imperative, so that numerical computations run immediately, line by line. This fits the Python programming style well. It lets deep learning scientists, machine learning developers, and anyone debugging a neural network run and test parts of their code in real time, so they don't have to wait for the entire program to execute to check whether it works.
You can always use your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch's functionality when required. Now you might ask: why PyTorch? What's so special about using it to build deep learning models?

The answer is quite simple: PyTorch is a dynamic library (very flexible, so you can adapt it to your requirements as they change), and it has been adopted by many researchers, students, and artificial intelligence developers. In a recent Kaggle competition, PyTorch was used by nearly all of the top 10 finishers.

Some of the key highlights of PyTorch include:

__Simple Interface:__ It offers an easy-to-use API, so it is as simple to operate and run as Python itself.

__Pythonic in nature:__ Being Pythonic, this library integrates smoothly with the Python data science stack, so it can leverage all the services and functionality offered by the Python environment.

__Computational graphs:__ In addition, PyTorch offers dynamic computational graphs, which you can change at runtime. This is highly useful when you do not know in advance how much memory will be required for creating a neural network model.

### Why do we use PyTorch in research?

Anyone working in deep learning and artificial intelligence has likely used TensorFlow, Google’s most popular open source library. However, PyTorch, a more recent deep learning framework, solves major problems for research work. Arguably PyTorch is TensorFlow’s biggest competitor to date, and it is currently a much-favored deep learning and artificial intelligence library in the research community.


You might wonder why we use PyTorch. Here are three reasons:

- The API is easy to use: it is as simple as Python.
- Pyt

### CPU vs. GPU
  - CPUs have fewer but more powerful compute cores, whereas GPUs have a large number of less powerful cores.
  - CPUs are better suited to sequential tasks, while GPUs are suited to tasks with significant parallelism.

<img src="Cpu vs Gpu.png">

### Install

<mark style="background-color: Blue">__In CPU__</mark>

__For Windows__

* Install PyTorch using conda
      conda install pytorch torchvision cpuonly -c pytorch

* Using pip
      pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
        
__For Mac__

* Using conda
      conda install pytorch torchvision -c pytorch
    
* Using pip
      pip3 install torch torchvision
    
__For Linux__

* Using conda
      conda install pytorch torchvision cpuonly -c pytorch
    
* Using pip
      pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
      
      
      
<mark style="background-color: Green">__In GPU__</mark>

__For Windows__

* Install PyTorch using conda cuda=9.2 and Python=3.6
      conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
     
* Using conda cuda=10.1 and Python=3.6 
      conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
     
* Install PyTorch using pip cuda=9.2 and Python=3.6
      pip3 install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
     
* Using pip cuda=10.1 and Python=3.6
      pip3 install torch torchvision
     
__For Linux__

* Install PyTorch using conda cuda=9.2 and Python=3.6
      conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
     
* Using conda cuda=10.1 and Python=3.6
      conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
     
* Install PyTorch using pip cuda=9.2 and Python=3.6
      pip3 install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
     
* Using pip cuda=10.1 and Python=3.6
      pip3 install torch torchvision
     
__For Mac__

* Install PyTorch using conda (the same command applies for cuda=9.2 and 10.1; Python=3.6)

      conda install pytorch torchvision -c pytorch
         # macOS binaries don't support CUDA; install from source if CUDA is needed

* Install PyTorch using pip (the same command applies for cuda=9.2 and 10.1; Python=3.6)

      pip3 install torch torchvision
         # macOS binaries don't support CUDA; install from source if CUDA is needed
      
Run all of these commands in the __Anaconda Prompt__. To install from inside a notebook, put a " ! " mark before the command, for example: !pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

For more information about installation, see the official site: https://pytorch.org/
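
Once one of the commands above has finished, a quick sanity check can confirm that the package imports and whether a GPU is visible. The snippet below is a minimal sketch; the variable names are only illustrative.

```python
import torch

# Report the installed version and whether a CUDA-capable GPU is visible.
version = torch.__version__
has_cuda = torch.cuda.is_available()
print("PyTorch version:", version)
print("CUDA available:", has_cuda)

# A tiny computation confirms that tensors work end to end.
x = torch.ones(2, 2)
total = (x + x).sum().item()
print(total)  # 8.0
```

On a CPU-only install, the second line is expected to print False.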

### Tensors

A tensor is similar to NumPy's ndarray, with the addition that tensors can also be used on a GPU to accelerate computing.

In [2]:
from __future__ import print_function
import torch

__Note:__ An uninitialized matrix is declared but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.

Construct 6x3 matrix, uninitialized:

In [3]:
a = torch.empty(6,3) # uninitialized: holds whatever was already in memory
print(a)

tensor([[-7.1503e+23,  3.0845e-41,  7.0065e-44],
        [ 7.0065e-44,  6.3058e-44,  6.7262e-44],
        [ 7.1466e-44,  6.3058e-44,  6.8664e-44],
        [ 7.2868e-44,  1.1771e-43,  6.7262e-44],
        [ 7.7071e-44,  8.1275e-44,  7.4269e-44],
        [ 7.5670e-44,  8.1275e-44,  6.8664e-44]])


In [4]:
type(a)

# a tensor is essentially an n-dimensional array

torch.Tensor

Construct a randomly initialized matrix:

In [5]:
a = torch.rand(4,3)
print(a)

tensor([[0.7929, 0.3931, 0.2612],
        [0.8033, 0.2106, 0.7100],
        [0.9123, 0.2784, 0.5751],
        [0.7811, 0.4441, 0.8235]])


Construct a matrix filled with zeros and of dtype long:

In [6]:
a = torch.zeros(4,3, dtype=torch.long)
print(a)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Construct a tensor with data:

In [7]:
type([7.8, 5])

# So basically we convert a list into a tensor so that it becomes compatible with the torch library

list

In [8]:
a = torch.tensor([7.8, 5])
type(a)

torch.Tensor

In [9]:
a
# tensor that we have created

tensor([7.8000, 5.0000])

In [10]:
# Let's check it out with string
a = torch.tensor(["svd", "sbd"])
type(a)

# This clearly throws an error, because strings are not compatible with tensors

ValueError: ignored

In [11]:
a = torch.ones(4,5)  # creating a tensor having matrix 4,5 filled with ones

a

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

Alternatively, we can create a new tensor based on an existing tensor. These methods reuse the properties of the input tensor, e.g. dtype, unless we provide new values.

In [12]:
a = a.new_ones(6,5, dtype=torch.double)    # new methods take in sizes
print(a)

a = torch.randn_like(a, dtype=torch.float)  # override dtype
print(a)                                    # result will be the same size

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)
tensor([[-0.7609, -0.1150,  1.2859, -0.0736,  1.4523],
        [-0.4819, -1.0716,  1.4172,  1.7510, -1.8690],
        [ 0.0197,  0.6880, -1.2671,  0.3874,  0.8064],
        [ 0.0250, -1.3038, -0.3132, -0.5563,  0.8167],
        [ 0.6206, -0.2883,  0.9317, -0.4791, -0.4700],
        [ 0.1473,  0.1295, -0.6647,  0.1566, -0.5587]])


Let's get the size:

In [14]:
print(a.size())

torch.Size([6, 5])


Note: <mark style="background-color: Yellow">torch.Size</mark> is actually a tuple, so it supports all tuple operations.
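
To make the tuple behaviour concrete, here is a small sketch that applies a few ordinary tuple operations to the result of size(). The tensor and variable names are illustrative only.

```python
import torch

a = torch.zeros(6, 5)
size = a.size()

# torch.Size is a subclass of tuple, so ordinary tuple operations apply.
is_tuple = isinstance(size, tuple)
rows, cols = size          # unpacking
last_dim = size[-1]        # indexing
n_dims = len(size)         # length
print(is_tuple, rows, cols, last_dim, n_dims)
```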

### Operations

There are multiple syntaxes for operations. The following examples use the addition operation.

Addition: syntax1

In [15]:
b = torch.rand(6,5)
print(a + b)

# Remember: to perform addition, the shapes must be the same (or broadcastable)

tensor([[-0.3556,  0.3536,  1.7095,  0.3673,  2.4044],
        [ 0.5101, -0.1641,  1.8436,  2.3393, -0.8879],
        [ 0.5220,  1.1758, -0.8054,  0.6265,  1.1007],
        [ 0.1410, -1.1399,  0.2879, -0.0839,  1.6992],
        [ 1.0413,  0.0253,  1.0440,  0.4524, -0.3674],
        [ 0.4779,  0.3270,  0.1174,  0.8425, -0.1501]])
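
The addition above uses two tensors of identical shape. PyTorch also follows NumPy-style broadcasting, so shapes only need to be compatible rather than identical. The sketch below uses made-up tensors to show one case that broadcasts and one that fails.

```python
import torch

a = torch.ones(6, 5)
row = torch.arange(5, dtype=torch.float)   # shape (5,)

# The (5,) tensor is broadcast across all 6 rows of the (6, 5) tensor.
summed = a + row
print(summed.shape)

# Incompatible shapes raise a RuntimeError instead of silently misaligning.
try:
    a + torch.ones(4)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)
```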


Addition: syntax2

In [16]:
print(torch.add(a,b))

tensor([[-0.3556,  0.3536,  1.7095,  0.3673,  2.4044],
        [ 0.5101, -0.1641,  1.8436,  2.3393, -0.8879],
        [ 0.5220,  1.1758, -0.8054,  0.6265,  1.1007],
        [ 0.1410, -1.1399,  0.2879, -0.0839,  1.6992],
        [ 1.0413,  0.0253,  1.0440,  0.4524, -0.3674],
        [ 0.4779,  0.3270,  0.1174,  0.8425, -0.1501]])


Addition: providing an output as an argument

In [17]:
result = torch.empty(6, 5)  # empty (uninitialized) matrix
result

tensor([[-7.1537e+23,  3.0845e-41,  7.0065e-44,  7.0065e-44,  6.3058e-44],
        [ 6.7262e-44,  7.1466e-44,  6.3058e-44,  6.8664e-44,  7.2868e-44],
        [ 1.1771e-43,  6.7262e-44,  7.7071e-44,  8.1275e-44,  7.4269e-44],
        [ 7.7071e-44,  8.1275e-44,  6.8664e-44,  6.7262e-44,  6.4460e-44],
        [ 7.4269e-44,  6.7262e-44,  7.0065e-44,  6.8664e-44,  7.9874e-44],
        [ 6.8664e-44,  1.2612e-43,  1.1736e-01,  8.4248e-01, -1.5007e-01]])

In [18]:
result = torch.empty(6, 5)
torch.add(a, b, out=result)  # the sum is written directly into result
print(result)

tensor([[-0.3556,  0.3536,  1.7095,  0.3673,  2.4044],
        [ 0.5101, -0.1641,  1.8436,  2.3393, -0.8879],
        [ 0.5220,  1.1758, -0.8054,  0.6265,  1.1007],
        [ 0.1410, -1.1399,  0.2879, -0.0839,  1.6992],
        [ 1.0413,  0.0253,  1.0440,  0.4524, -0.3674],
        [ 0.4779,  0.3270,  0.1174,  0.8425, -0.1501]])


Addition: in place

In [19]:
# adds a to b
# and stores the outcome of a + b in b itself
b.add_(a)  # the trailing underscore marks an in-place operation
print(b)

tensor([[-0.3556,  0.3536,  1.7095,  0.3673,  2.4044],
        [ 0.5101, -0.1641,  1.8436,  2.3393, -0.8879],
        [ 0.5220,  1.1758, -0.8054,  0.6265,  1.1007],
        [ 0.1410, -1.1399,  0.2879, -0.0839,  1.6992],
        [ 1.0413,  0.0253,  1.0440,  0.4524, -0.3674],
        [ 0.4779,  0.3270,  0.1174,  0.8425, -0.1501]])


__Note:__ Any operation that mutates a tensor in-place is post-fixed with an <mark style="background-color: red">_</mark>. For example: a.copy_(b), a.t_(), will change a.
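
The difference between the two flavours is easiest to see side by side. This is a minimal sketch with illustrative values: add leaves its input untouched, while add_ modifies it and returns the same object.

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)

out_of_place = x.add(y)   # returns a new tensor; x is unchanged
x_after_add = x.clone()   # snapshot of x to show it is still all ones

returned = x.add_(y)      # modifies x itself and returns it
print(out_of_place)
print(x)
print(returned is x)
```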

We can use standard NumPy-like indexing with all bells and whistles!

In [20]:
print(a)

tensor([[-0.7609, -0.1150,  1.2859, -0.0736,  1.4523],
        [-0.4819, -1.0716,  1.4172,  1.7510, -1.8690],
        [ 0.0197,  0.6880, -1.2671,  0.3874,  0.8064],
        [ 0.0250, -1.3038, -0.3132, -0.5563,  0.8167],
        [ 0.6206, -0.2883,  0.9317, -0.4791, -0.4700],
        [ 0.1473,  0.1295, -0.6647,  0.1566, -0.5587]])


In [21]:
print(a[0])
print(a[0][1])

# We can easily access elements by index as well

tensor([-0.7609, -0.1150,  1.2859, -0.0736,  1.4523])
tensor(-0.1150)


In [22]:
print(a[:,2])  # all rows of the column at index 2 (the third column)


tensor([ 1.2859,  1.4172, -1.2671, -0.3132,  0.9317, -0.6647])


In [23]:
# Accessing all rows of selected columns (here the columns at index 2 and 3)
print(a[: ,[2,3]])

tensor([[ 1.2859, -0.0736],
        [ 1.4172,  1.7510],
        [-1.2671,  0.3874],
        [-0.3132, -0.5563],
        [ 0.9317, -0.4791],
        [-0.6647,  0.1566]])


Resizing: to resize or reshape a tensor, use <mark style="background-color: Yellow">tensor.view</mark>:

In [24]:



a = torch.randn(3, 3)  # Creating 3*3 tensors
a

tensor([[ 0.2649,  0.9732,  1.1501],
        [-0.3381,  0.8602,  1.5548],
        [ 0.4144,  0.7607, -0.9832]])

In [25]:
# view presents the same data in a different shape,
# so it is used for reshaping a tensor
b = a.view(9)
print(b)
print()
print(len(b))
# gives a 1-D tensor with 9 elements

tensor([ 0.2649,  0.9732,  1.1501, -0.3381,  0.8602,  1.5548,  0.4144,  0.7607,
        -0.9832])

9


In [26]:
c = a.view(-1, 9)  # the size -1 is inferred from other dimensions
print(c)
print()
print(len(c))

# gives a 2-D tensor of shape (1, 9), like a list containing one list

tensor([[ 0.2649,  0.9732,  1.1501, -0.3381,  0.8602,  1.5548,  0.4144,  0.7607,
         -0.9832]])

1


In [27]:
a = torch.randn(3, 3)  # Creating 3*3 tensors
b = a.view(9)
c = a.view(-1, 9)  # the size -1 is inferred from other dimensions
print(a.size(), b.size(), c.size())

torch.Size([3, 3]) torch.Size([9]) torch.Size([1, 9])
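
Two properties of view are worth knowing: the result shares storage with the original tensor, and view only works when the memory layout allows it. The sketch below illustrates both with a small made-up tensor, and uses reshape as the fallback for the non-contiguous case.

```python
import torch

a = torch.arange(6).view(2, 3)
flat = a.view(6)

# A view shares storage with the original, so writes are visible in both.
flat[0] = 100
print(a[0, 0].item())

# A transposed tensor is not contiguous, so view() cannot flatten it.
t = a.t()
try:
    t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() copies when necessary and therefore works here.
flat_t = t.reshape(6)
print(view_failed, flat_t.tolist())
```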


If you have a one-element tensor, use <mark style="background-color: Yellow">.item()</mark> to get its value as a Python number.

In [28]:
a = torch.randn(1)
print(a)
print(a.item())

tensor([-1.0215])
-1.0214506387710571


### NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other.

__Converting a Torch tensor to NumPy array__

In [29]:
x = torch.ones(4)
print(x)

tensor([1., 1., 1., 1.])


In [30]:
y = x.numpy()
print(y)

[1. 1. 1. 1.]


See how the NumPy array changed in value:

In [31]:
x.add_(1)
print(x)
print(y)

tensor([2., 2., 2., 2.])
[2. 2. 2. 2.]



__Converting NumPy array to Torch tensor__

Let's see how changing the NumPy array changes the Torch Tensor automatically:

In [32]:
import numpy as np
f = np.ones(4)
g = torch.from_numpy(f)
np.add(f, 1, out=f)
print(f)
print(g)

[2. 2. 2. 2.]
tensor([2., 2., 2., 2.], dtype=torch.float64)


All the Tensors on the CPU except a CharTensor support converting to NumPy and back.
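
Memory sharing applies to torch.from_numpy, whereas torch.tensor copies its input. The following sketch, with illustrative names, contrasts the two so it is clear when a change propagates.

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr += 1                         # in-place change to the NumPy array
print(shared)
print(copied)
```

Use the copying form whenever the tensor must stay independent of later changes to the array.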

__CUDA Tensors__

Tensors can be moved onto any device using the <mark style="background-color: Yellow">.to</mark> method.

In [None]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    # Change the device from cuda to cpu

    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

# PyTorch makes this kind of device and dtype switching very convenient
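
A common way to write code that runs unchanged on machines with or without a GPU is to choose the device once and pass it around. This is a minimal sketch of that pattern, not the only way to do it.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(4, device=device)
y = torch.ones_like(x)          # created on the same device as x
z = (x + y).to("cpu")           # bring the result back to the CPU
print(device.type, z)
```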

## Autograd: Automatic Differentiation

Definition - 

  - Autograd is an engine for calculating derivatives. It records a graph of all the operations performed on a gradient-enabled tensor and creates an acyclic graph called the dynamic computational graph (DCG). The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from the root to the leaf and multiplying every gradient along the way using the chain rule.

Let us see this in some more easy terms with some examples

## Tensor

<mark style="background-color: Yellow">torch.Tensor</mark> is the central class of the package. If we set its attribute <mark style="background-color: Yellow">.requires_grad</mark> to <mark style="background-color: dark grey">True</mark>, it starts to track all operations on it. When you finish your computation, you can call <mark style="background-color: Yellow">.backward()</mark> and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its <mark style="background-color: Yellow">.grad</mark> attribute.

To stop a tensor from tracking history, you can call <mark style="background-color: Yellow">.detach()</mark> to detach it from the computation history, and to prevent future computation from being tracked.


To prevent tracking history (and using memory), you can also wrap the code block in <mark style="background-color: Yellow">with torch.no_grad():</mark>. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with <mark style="background-color: Yellow">requires_grad=True</mark> for which we don’t need the gradients.

There's one more class which is very important in autograd implementation - a <mark style="background-color: Yellow">Function</mark>

<mark style="background-color: Yellow">Tensor</mark> and <mark style="background-color: Yellow">Function</mark> are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a <mark style="background-color: Yellow">.grad_fn</mark> attribute that references a Function that has created the Tensor (except for Tensors created by the user - their <mark style="background-color: Yellow">grad_fn is None</mark>).

If you want to compute the derivatives, you can call <mark style="background-color: Yellow">.backward()</mark> on a <mark style="background-color: Yellow">Tensor</mark>. If <mark style="background-color: Yellow">Tensor</mark> is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to <mark style="background-color: Yellow">backward()</mark>, however if it has more elements, you need to specify a <mark style="background-color: Yellow">gradient</mark> argument that is a tensor of matching shape.

In [33]:
import torch

Create a tensor and set requires_grad=True to track computation with it.

In [34]:
a = torch.ones(2, 2, requires_grad=True)
print(a)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Do a tensor operation:

In [35]:
b = a + 2
print(b)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


b was created as the result of an operation, so it has a <mark style="background-color: Yellow">grad_fn</mark>.

In [36]:
print(b.grad_fn)

<AddBackward0 object at 0x7fbf19dfa950>


Do more operations on b:

In [39]:
c = b * b * 3
out = c.mean()

print(c)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)


<mark style="background-color: Yellow">.requires_grad_( ... )</mark>  changes an existing Tensor’s <mark style="background-color: Yellow">requires_grad</mark> flag in-place. A tensor created by the user has this flag set to False by default, and calling .requires_grad_() without an argument sets it to True.

In [41]:
p = torch.randn(3, 3)
p
# Creating a random tensor

tensor([[ 0.5590,  0.3693,  0.7105],
        [-0.3217, -1.0149,  1.0725],
        [ 0.9333, -0.7870,  0.3201]])

In [42]:
p = ((p * 3) / (p - 1))
p

tensor([[ -3.8023,  -1.7565,  -7.3644],
        [  0.7302,   1.5111,  44.4051],
        [-42.0046,   1.3212,  -1.4123]])

In [43]:
# Let's check whether this tensor requires gradients
print(p.requires_grad)


# By default, requires_grad is False

False


In [44]:
p.requires_grad_(True)

# Enabling gradient tracking in place


tensor([[ -3.8023,  -1.7565,  -7.3644],
        [  0.7302,   1.5111,  44.4051],
        [-42.0046,   1.3212,  -1.4123]], requires_grad=True)

In [45]:
# Checking requires_grad
print(p.requires_grad)

True


In [40]:
p = torch.randn(3, 3)
p = ((p * 3) / (p - 1))
print(p.requires_grad)
p.requires_grad_(True)
print(p.requires_grad)
q = (p * p).sum()
print(q.grad_fn)

False
True
<SumBackward0 object at 0x7fbf1a9e0a50>


## Gradients

Let's backprop now. Because <mark style="background-color: magenta">out</mark> contains a single scalar, <mark style="background-color: Yellow">out.backward()</mark> is equivalent to <mark style="background-color: Yellow">out.backward(torch.tensor(1.))</mark>.

In [46]:
out.backward()

Print gradients d(out)/da

In [48]:
print(a)

# Here a is 1,1,1,1


tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [49]:
print(a.grad)
# but its gradient is 4.5 for every element
# How?

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix of 4.5. Let’s call the out Tensor “o”. We have 

$o = \frac{1}{4} \sum_i c_i$, $c_i = 3(a_i+2)^2$ and $c_i|_{a_i=1} = 27$. 

Therefore, $\frac{\partial o}{\partial a_i} = \frac{3}{2}(a_i+2)$, 

hence
$\frac{\partial o}{\partial a_i}|_{a_i=1} = \frac{9}{2} = 4.5$
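
The derivation can be checked numerically. The sketch below rebuilds the same computation in one expression and compares the gradient from autograd with the analytic formula $\frac{3}{2}(a_i+2)$; the name analytic is only illustrative.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
out = (3 * (a + 2) ** 2).mean()
out.backward()

# Analytic gradient from the derivation: d(out)/da_i = 3/2 * (a_i + 2)
analytic = 1.5 * (a.detach() + 2)
print(a.grad)
print(torch.allclose(a.grad, analytic))
```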



### Mathematically - Jacobians and vectors

Mathematically, autograd is just a vector-Jacobian product computing engine. In simple terms, a Jacobian matrix contains all the partial derivatives of one vector with respect to another. It is the gradient of a vector with respect to another vector.

If a vector X = [x1, x2, …, xn] is used to calculate some other vector f(X) = [f1, f2, …, fn] through a function f, then the Jacobian matrix (J) simply contains all the partial derivative combinations as follows:

<img src="jacobian-vector.png">

The matrix above represents the gradient of f(X) with respect to X.

Suppose a gradient-enabled PyTorch tensor X is given as:

X = [x1, x2, …, xn] (let these be the weights of some machine learning model)

X undergoes some operations to form a vector Y:

Y = f(X) = [y1, y2, …, ym]

Y is then used to calculate a scalar loss l. Suppose a vector v happens to be the gradient of the scalar loss l with respect to the vector Y, as follows:

<img src="jacob1.png">       *The vector v is called the grad_tensor and is passed to the backward() function as an argument*

To get the gradient of the loss l with respect to the weights X, the Jacobian matrix J is vector-multiplied with the vector v.

<img src="jac1.png"> 

This method of calculating the Jacobian matrix and multiplying it with a vector v enables the possibility for PyTorch to feed external gradients with ease for even the non-scalar outputs.
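
To connect the formulas to code, the sketch below builds the full Jacobian of a simple elementwise function and checks that passing v to backward gives the same result as multiplying explicitly. It assumes the convention J[i][j] = ∂y_i/∂x_j, under which the product is J^T v, and it uses torch.autograd.functional.jacobian only for comparison. The function f and the values are illustrative.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x ** 2              # elementwise, so J = diag(2 * x)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.001])

y = f(x)
y.backward(v)                  # accumulates J^T v into x.grad

J = jacobian(f, x.detach())    # full 3x3 Jacobian, for comparison only
explicit = J.t() @ v
print(x.grad)
print(torch.allclose(x.grad, explicit))
```

In practice autograd never materialises J; it computes the product directly, which is why backward on a non-scalar needs the vector v.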

Now let's have a look at an example of vector-Jacobian product:

In [None]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

Now in this case y is no longer a scalar. <mark style="background-color: Yellow">torch.autograd</mark> cannot compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as an argument:

In [None]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

You can also stop autograd from tracking history on Tensors with <mark style="background-color: Yellow">.requires_grad=True</mark> either by wrapping the code block in <mark style="background-color: Yellow">with torch.no_grad()</mark>:

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

Or by using .detach() to get a new Tensor with the same content but that does not require gradients:

In [None]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

## Neural Networks

We can construct neural networks using the <mark style="background-color: Yellow">torch.nn</mark> package.

Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

For example, look at this network that classifies digit images:
<img src="neural.png"> 


### convnet

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: <mark style="background-color: light-blue">weight = weight - learning_rate * gradient</mark>
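
The steps above can be sketched on a deliberately tiny problem: fitting a single weight so that the prediction matches y = 2x. The data, the weight w and the learning rate are all made up for illustration, and the update is written by hand to mirror the rule in the last bullet.

```python
import torch

x = torch.linspace(-1, 1, 20).unsqueeze(1)     # inputs, shape (20, 1)
y = 2 * x                                      # targets from y = 2x
w = torch.zeros(1, 1, requires_grad=True)      # the single learnable weight
learning_rate = 0.1

losses = []
for step in range(50):                         # iterate over the (tiny) dataset
    pred = x @ w                               # process input through the "network"
    loss = ((pred - y) ** 2).mean()            # compute the loss
    loss.backward()                            # propagate gradients back to w
    with torch.no_grad():
        w -= learning_rate * w.grad            # weight = weight - learning_rate * gradient
    w.grad.zero_()                             # clear the gradient for the next step
    losses.append(loss.item())

print(losses[0], losses[-1], w.item())
```

The loss shrinks from step to step and w moves towards 2.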
        
### Define the network

Let's define the network:

In [60]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Our network inherits everything from the nn.Module base class
class network(nn.Module):

    def __init__(self):
        super(network, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = network()
print(net)

network(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


## Note - In deep learning, the architecture is what matters

You just have to define the <mark style="background-color: yellow">forward</mark>,  and the <mark style="background-color: yellow">backward</mark> function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the <mark style="background-color: yellow">forward</mark> function.

The learnable parameters of a model are returned by <mark style="background-color: yellow">net.parameters()</mark>.

In [62]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])


Let’s try a random 32x32 input. Note: expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

In [52]:
inp = torch.randn(1, 1, 32, 32)
out = net(inp)
print(out)

tensor([[-0.0266, -0.1268, -0.0702,  0.0369,  0.0086, -0.0078, -0.1302, -0.0121,
         -0.1122,  0.1233]], grad_fn=<AddmmBackward0>)
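
The 32x32 input size and the 16 * 6 * 6 in fc1 are linked. As a worked check, the sketch below traces the spatial size through the two convolution and pooling stages, assuming stride-1 convolutions without padding and 2x2 pooling as in the network above. The helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, kernel):
    # Output width of a convolution with stride 1 and no padding.
    return size - kernel + 1

def pool_out(size, window):
    # Output width of max pooling whose stride equals its window (floor division).
    return size // window

size = 32
size = pool_out(conv_out(size, 3), 2)   # conv1 + pool: 32 -> 30 -> 15
size = pool_out(conv_out(size, 3), 2)   # conv2 + pool: 15 -> 13 -> 6
flat_features = 16 * size * size        # 16 channels after conv2
print(size, flat_features)  # 6 576
```

A different input size would change this number, and fc1 would then reject the flattened tensor.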


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [53]:
net.zero_grad()
out.backward(torch.randn(1, 10))

__Note:__
<mark style="background-color: yellow">torch.nn</mark> only supports mini-batches. The entire <mark style="background-color: yellow">torch.nn</mark> package only supports inputs that are a mini-batch of samples, and not a single sample.
For example, <mark style="background-color: yellow">nn.Conv2d</mark> will take in a 4D Tensor of <mark style="background-color: yellow">nSamples x nChannels x Height x Width</mark>.
If you have a single sample, just use <mark style="background-color: yellow">input.unsqueeze(0)</mark> to add a fake batch dimension.
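
Here is a short sketch of that trick with a made-up single image. Some newer releases may also accept unbatched input for certain layers, but adding the batch dimension explicitly is the portable choice.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)
sample = torch.randn(1, 32, 32)     # one image: channels x height x width
batch = sample.unsqueeze(0)         # add a batch dimension -> 1 x 1 x 32 x 32
out = conv(batch)
print(batch.shape, out.shape)
```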

__At this point, we covered:__
- Defining a neural network
- Processing input and calling backward

### Loss Function

A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.
There are several different loss functions under the nn package. A simple loss is <mark style="background-color: yellow">nn.MSELoss</mark>, which computes the mean-squared error between the input and the target.
    
For example:

In [54]:
output = net(inp)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.5721, grad_fn=<MseLossBackward0>)
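
To see what the mean-squared error computes, the sketch below compares nn.MSELoss with the formula written out by hand on small, made-up values (with the default mean reduction).

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 0.0, 5.0]])

builtin = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of the squared differences
print(builtin.item(), manual.item())
```

The squared differences are 0, 4 and 4, so both values equal 8/3.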


Now, if you follow loss in the backward direction, using its <mark style="background-color: yellow">.grad_fn</mark> attribute, you will see a graph of computations that looks like this:

input  ->   conv2d  ->   relu  ->   maxpool2d  ->   conv2d ->   relu  ->   maxpool2d
      
      -> view -> linear -> relu -> linear -> relu -> linear
      
      -> MSELoss
      
      -> loss

So, when we call <mark style="background-color: yellow">loss.backward()</mark>, the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.


For illustration, let us follow a few steps <mark style="background-color: yellow">backward</mark>:

In [55]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear (fc3)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of that Linear op

<MseLossBackward0 object at 0x7fbf19d7a510>
<AddmmBackward0 object at 0x7fbf19d7a650>
<AccumulateGrad object at 0x7fbf19d7ab10>


### Backprop

To backpropagate the error, all we have to do is call <mark style="background-color: yellow">loss.backward()</mark>. You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.


Now we shall call <mark style="background-color: yellow">loss.backward()</mark>, and have a look at conv1’s bias gradients before and after the backward.

In [56]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0016, -0.0085,  0.0090, -0.0062,  0.0060,  0.0034])
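
The accumulation behaviour is easy to reproduce on a tiny tensor. In this sketch (values are illustrative), calling backward twice without clearing doubles the stored gradient, and resetting the buffer by hand brings it back to zero.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # the gradient of sum(3w) is 3 for each element

(w * 3).sum().backward()      # second backward without clearing
accumulated = w.grad.clone()  # the new gradient was added to the old one

w.grad.zero_()                # reset the buffer by hand
print(first, accumulated, w.grad)
```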


### Update the weights

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):
    
   <mark style="background-color: yellow"> weight = weight - learning_rate * gradient </mark>
   
We can implement this using simple Python code:

In [57]:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you will want to use various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple:

In [58]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(inp)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Observe how gradient buffers had to be manually set to zero using <mark style="background-color:yellow">optimizer.zero_grad()</mark>. This is because gradients are accumulated as explained in the <mark style="font-color:red">Backprop</mark> section.
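
Putting the pieces together, a complete loop repeats these four calls many times. The sketch below is a self-contained illustration on synthetic data: the small nn.Linear model, the data and the hyperparameters are assumptions chosen for brevity and stand in for the network defined earlier.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                        # hypothetical tiny model
inputs = torch.randn(16, 3)                    # synthetic inputs
targets = inputs.sum(dim=1, keepdim=True)      # synthetic targets
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

history = []
for epoch in range(100):
    optimizer.zero_grad()                      # clear gradients from the last step
    loss = criterion(model(inputs), targets)   # forward pass and loss
    loss.backward()                            # compute fresh gradients
    optimizer.step()                           # apply the update rule
    history.append(loss.item())

print(history[0], history[-1])
```

The recorded loss should fall over the run, which is a quick way to confirm that the loop is wired correctly.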