<a href="https://colab.research.google.com/github/NirmalMathi/DAT8/blob/master/Pytorch_DRL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deep Learning with PyTorch**

    

In the previous chapter, we became familiar with open source libraries that provide a collection of reinforcement learning (RL) environments. However, recent developments in RL, and especially its combination with deep learning (DL), now make it possible to solve much more challenging problems than ever before. This is partly due to the development of DL methods and tools. This chapter is dedicated to one such tool, **PyTorch**, which enables us to implement complex DL models with just a few lines of Python code.

In this chapter, we are going to look at:

- PyTorch library specifics and implementation details (assuming that you are already familiar with DL fundamentals)
- Higher-level libraries on top of **PyTorch**, with the aim of simplifying common DL problems
- The library **PyTorch Ignite**, which will be used in some examples

# **Tensors and their creation**

A tensor is the fundamental building block of all DL toolkits. **A tensor is a multi-dimensional array.**

A one-dimensional array is called a **vector**, a two-dimensional array a **matrix**, and an array with three or more dimensions a **tensor**.

In [0]:
from google.colab import drive
drive.mount('/content/drive')



In [0]:
import os
os.getcwd()


'/content'

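# Install PyTorch from PyPI. The package name is torch, not pytorch.
!pip install torch

# Alternative for conda environments (CPU-only build):
# !conda install pytorch torchvision cpuonly -c pytorch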

Apart from dimensions, a tensor is characterized by the type of its elements. At the time of writing, PyTorch supported eight types: **three float types** (16-bit, 32-bit, and 64-bit) and **five integer types** (8-bit signed, 8-bit unsigned, 16-bit, 32-bit, and 64-bit).

Tensors of different types are represented by different classes, with the most commonly used being **torch.FloatTensor** (corresponding to a 32-bit float), **torch.ByteTensor** (an 8-bit unsigned integer), and **torch.LongTensor** (a 64-bit signed integer).

The first way to create a tensor is to call the constructor of the required type with the desired dimensions:

In [0]:
import torch
import numpy as np
a = torch.FloatTensor(3, 5)
a

tensor([[5.3527e-36, 0.0000e+00, 3.7835e-44, 0.0000e+00,        nan],
        [0.0000e+00, 1.3733e-14, 6.4069e+02, 4.3066e+21, 1.1824e+22],
        [4.3066e+21, 6.3828e+28, 3.8016e-39, 0.0000e+00, 0.0000e+00]])

In the above example, we imported both PyTorch and NumPy and created an uninitialized tensor of size 3×5. By default, PyTorch allocates memory for the tensor, but doesn't initialize it with anything, which is why it contains arbitrary values.

If we want to set the values in the tensor to zero, we can use the code below:


In [0]:
a.zero_() #inplace

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

- **Inplace and functional**


An operation whose name ends with an underscore modifies the original tensor. This approach is called **inplace**, and it is very useful from a memory and performance perspective.

The **functional** equivalent creates a copy of the tensor with the modification applied, leaving the original tensor untouched.
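The difference between the two styles can be checked with a small sketch. It uses `add` and `add_` as one example pair; the tensor values are arbitrary and chosen only for illustration.

```python
import torch

a = torch.ones(2, 3)

# Functional version: returns a new tensor, the original is left untouched
b = a.add(1)
print(a)  # still all ones
print(b)  # all twos

# In-place version (trailing underscore): modifies the storage of a itself
c = a.add_(1)
print(a)  # now all twos

# The in-place result refers to the same memory as a, no copy was made
same_memory = c.data_ptr() == a.data_ptr()
print(same_memory)
```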


The second way to create a tensor is to pass a Python iterable (for example, a list or tuple) to the **constructor**; its items are used as the contents of the newly created tensor:

In [0]:
torch.FloatTensor([[1,2,3],[3,2,1]])

tensor([[1., 2., 3.],
        [3., 2., 1.]])

The third way is to create a tensor from a **NumPy** **array**:



In [0]:
n = np.zeros(shape=(3, 2))
n

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [0]:
b=torch.tensor(n)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)

If we want to **change the dtype**, we can pass the required dtype to **torch.tensor()** as below:

In [0]:
b=torch.tensor(n, dtype=torch.float32)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
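One detail worth knowing when converting from NumPy is whether the data is copied. The sketch below contrasts `torch.tensor()`, which copies, with `torch.from_numpy()`, which shares memory with the array. The variable names `copied` and `shared` are chosen here for illustration.

```python
import numpy as np
import torch

n = np.zeros(shape=(3, 2))

copied = torch.tensor(n)      # copies the data into a new tensor
shared = torch.from_numpy(n)  # shares memory with the NumPy array

# Modify the NumPy array after both tensors have been created
n[0, 0] = 7.0

print(copied[0, 0].item())  # the copy is unaffected
print(shared[0, 0].item())  # the shared tensor sees the change
print(shared.dtype)         # NumPy float64 maps to torch.float64
```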

Since the **0.4.0 release**, PyTorch has supported **zero-dimensional tensors** that correspond to scalar values. Such tensors can be the result of some operations, such as summing all values in a tensor.

Zero-dimensional tensors are natively supported and returned by the appropriate functions, and they can be created by the **torch.tensor()** function.


To access the actual Python value of such a tensor, there is the special **item()** method:


In [0]:
a = torch.tensor([1,2,3])
print("tensors are",a)

s = a.sum()
print("sum of tensors",s)

i=s.item()
print("value of tensors",i)

torch.tensor(1)


tensors are tensor([1, 2, 3])
sum of tensors tensor(6)
value of tensors 6


tensor(1)

To explore PyTorch further, please refer to the URL below:

**http://pytorch.org/docs/**

# **GPU** **tensors**

PyTorch transparently supports **CUDA GPUs**, which means that all operations have two versions (CPU and GPU) that are automatically selected.

Every tensor type that we saw above is for CPU and has its GPU equivalent. The only difference is that GPU tensors reside in the **torch.cuda package**, instead of just torch.

For example, **torch.FloatTensor** is a 32-bit float tensor that resides in CPU memory, but **torch.cuda.FloatTensor** is its GPU counterpart.

To convert from CPU to GPU, there is a tensor method, **to(device)**, that creates a copy of the tensor on a specified device (this could be CPU or GPU).

In [0]:
a = torch.FloatTensor([2,3])
print(a)

ca = a.to('cuda');
ca


tensor([2., 3.])


tensor([2., 3.], device='cuda:0')

**Here, we created a tensor on CPU, then copied it to GPU memory.**

Both copies can be used in computations and all GPU-specific machinery is transparent to the user:

In [0]:
b= a + 1
print("addition with 1 in cpu is",b)
c=ca + 1
print("addition with 1 in GPU is",c)
d=ca.device
d

addition with 1 in cpu is tensor([3., 4.])
addition with 1 in GPU is tensor([3., 4.], device='cuda:0')


device(type='cuda', index=0)
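The cells above assume that a CUDA device is present. A common way to keep the same code working on machines without a GPU is to pick the device at run time, as in this minimal sketch (the names `device`, `da`, and `back` are illustrative):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([2.0, 3.0])
da = a.to(device)    # copy to the selected device (no-op copy if already there)
result = da + 1      # computed on whichever device da lives on

# Move the result back to CPU memory, for example before converting to NumPy
back = result.cpu()
print(device, back)
```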

# Gradients

Computing gradients used to be extremely painful to implement and debug, even for the simplest neural network (NN). You had to calculate derivatives for all your functions, apply the chain rule, and then implement the result of the calculations, hoping that everything was correct.

This could be a very useful exercise for understanding the nuts and bolts of DL, but it wasn't something that you wanted to repeat over and over again while experimenting with different NN architectures.

Luckily, those days are gone. Now, defining an NN of hundreds of layers requires nothing more than assembling it from predefined building blocks or, in the extreme case of doing something fancy, defining the transformation expression manually.

All gradients will be carefully calculated for you, backpropagated, and applied to the network. To be able to achieve this, you need to define your network architecture in terms of the DL library used. The details differ between libraries, but the general approach is the same.



There are two approaches for calculating gradients:

- **Static graph**: In this method, you need to define your calculations in advance and it won't be possible to change them at a later stage. The graph will be processed and optimized by the DL library before any computation is made. This model is implemented in TensorFlow (<2), Theano, and many other DL toolkits.


- **Dynamic graph**: You don't need to define your graph in advance exactly as it will be executed; you just need to execute operations that you want to use for data transformation on your actual data. During this time, the library will record the order of the operations performed, and when you ask it to calculate gradients, it will unroll its history of operations, accumulating the gradients of the network parameters. This method is also called **notebook gradients** and it is implemented in PyTorch.

Both methods have their strengths and weaknesses. For example, a **static graph** is usually **faster**, as all computations can be moved to the GPU, minimizing the data transfer overhead.

On the other hand, although a **dynamic graph** has a **higher computation overhead**, it gives a developer much more freedom.

For example, they can say, "For this piece of data, I can apply this network two times, and for this piece of data, I'll use a completely different model with gradients clipped by the batch mean." In the end, it's just a Python library with a bunch of functions, so just call them and let the library do the heavy lifting.
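The freedom of a dynamic graph can be sketched with ordinary Python control flow. In the example below, the number of times a layer is applied depends on the input data itself, and gradients are still computed for whichever path was taken. The layer size and the rule for choosing the number of repeats are arbitrary assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 4)

def forward(x):
    # Data-dependent decision: apply the layer twice for "positive" inputs
    repeats = 2 if x.sum().item() > 0 else 1
    for _ in range(repeats):
        x = torch.tanh(layer(x))
    return x.sum(), repeats

out_pos, n_pos = forward(torch.ones(1, 4))
out_neg, n_neg = forward(-torch.ones(1, 4))

# The graph recorded during the first call is unrolled here
out_pos.backward()
print(n_pos, n_neg, layer.weight.grad.shape)
```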

# **Tensors** **and** **gradients**

PyTorch tensors have **built-in gradient calculation** and tracking machinery, so all you need to do is convert the data into tensors and perform computations using the tensor methods and functions provided by the torch library.

Every tensor has several gradient-related **attributes**:

- **grad**: A property that holds a tensor of the **same shape containing computed gradients**.
- **is_leaf**: **True** if this tensor was constructed by the user and **False** if the object is a result of a function transformation.
- **requires_grad**: **True** if this tensor requires gradients to be calculated. By default, the constructor sets **requires_grad=False**, so if you want gradients to be calculated for your tensor, you need to say so explicitly.

Let's consider the example below:

In [0]:
a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0])
print(a,b)

tensor([1., 1.], requires_grad=True) tensor([2., 2.])


In the above code, we created two tensors.
The **first requires gradients** to be calculated and the **second doesn't require gradients**.


In [0]:
summing = a+b
print(summing)
res = (summing*2).sum()
print(res)


tensor([3., 3.], grad_fn=<AddBackward0>)
tensor(12., grad_fn=<SumBackward0>)


If we check the attributes of the tensors that we calculated, we find that a and b are the leaf nodes and that every variable except b requires gradients to be calculated:

In [0]:
a.is_leaf, b.is_leaf




(True, True)

In [0]:
summing.is_leaf, res.is_leaf

(False, False)

In [0]:
a.requires_grad


True

In [0]:
b.requires_grad


False

To calculate the gradients of our graph, we call **backward()** on the result.

PyTorch uses a **dynamic computational graph**, and by default the intermediate buffers of the graph are freed the first time **backward()** is called, so a second **backward()** call on the same graph fails because there is no longer a path back to the leaves. Passing **retain_graph=True** keeps those buffers in memory so that the graph can be backpropagated through again; it is not required for a single backward pass. Note that gradients accumulate in **grad** across calls: running the cell below once gives 2 for each element of a, and running it a second time without zeroing the gradients gives 4.


In [0]:
res.backward(retain_graph=True)
a.grad

tensor([2., 2.])

In [0]:
b.grad

Indeed, if we check the gradients of b, we get nothing (**b.grad** is **None**), because b was created without **requires_grad=True**.

The reason for this default is efficiency in terms of computation and memory. In **real life**, a network can have millions of optimized parameters, with hundreds of intermediate operations performed on them. During **gradient descent optimization**, we are not interested in the gradients of intermediate operations; the only gradients we need in order to adjust the model are those of the loss with respect to the model parameters (weights).
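The behaviour of **grad** can be verified directly. The sketch below rebuilds the same small graph, calls `backward()` twice, and shows that gradients are summed across calls until they are reset. The names `first` and `second` are introduced here for illustration.

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0])

res = ((a + b) * 2).sum()

# First backward pass: d(res)/d(a) = 2 for every element
res.backward(retain_graph=True)
first = a.grad.clone()

# Second pass over the retained graph: gradients are added to the existing ones
res.backward()
second = a.grad.clone()

# Reset the accumulated gradients in place before the next iteration
a.grad.zero_()

print(first, second, a.grad)
print(b.grad)  # None, because b does not require gradients
```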

# **Building** **Neural** **Network**

In the **torch.nn** package, you can find tons of predefined classes providing you with the basic functionality blocks.

All classes in the **torch.nn** package inherit from the **nn.Module base class**.

Let's look at the useful methods that all **nn.Module** children provide:

**parameters()**: returns an iterator over all variables that require gradient computation (that is, the module weights).

**zero_grad()**: resets the gradients of all parameters (to zero in older releases; recent PyTorch releases set them to None by default).

**to(device)**: moves all module parameters to a given device (CPU or GPU).

**state_dict()**: returns a dictionary with all module parameters and is useful for model serialization.

**load_state_dict()**: initializes the module with the state dictionary.

All available classes can be found in the documentation at http://pytorch.org/docs.
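As a quick illustration of the methods listed above, the following sketch applies them to a single `nn.Linear` layer. The layer sizes and the variable names (`net`, `clone`, `n_params`) are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 3)

# parameters(): iterate over the weight (3x2) and the bias (3)
n_params = sum(p.numel() for p in net.parameters())
print(n_params)

# state_dict(): maps parameter names to tensors, handy for serialization
state = net.state_dict()
keys = list(state.keys())
print(keys)

# load_state_dict(): copy the parameters into another module of the same shape
clone = nn.Linear(2, 3)
clone.load_state_dict(state)

x = torch.ones(1, 2)
same_output = torch.equal(net(x), clone(x))
print(same_output)

# zero_grad(): clear gradients left over from a previous backward pass
net(x).sum().backward()
net.zero_grad()
```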

The best way to demonstrate NN blocks is through an example:

In [0]:
import torch.nn as nn

a = nn.Sequential(
nn.Linear(2, 6),
nn.ReLU(),
nn.Linear(6, 10),
nn.ReLU(),
nn.Linear(10, 20),
nn.Dropout(p=0.5),
nn.Softmax(dim=1))

a

Sequential(
  (0): Linear(in_features=2, out_features=6, bias=True)
  (1): ReLU()
  (2): Linear(in_features=6, out_features=10, bias=True)
  (3): ReLU()
  (4): Linear(in_features=10, out_features=20, bias=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Softmax(dim=1)
)

Here, we defined a three-layer NN with softmax on output, applied along dimension 1 (dimension 0 is batch samples), rectified linear unit (ReLU) nonlinearities, and dropout with probability 0.5.

In [0]:
b=a(torch.FloatTensor([[10,10]]))
b

tensor([[2.8380e-03, 5.5520e-03, 8.5291e-03, 3.3034e-01, 5.6342e-04, 5.5520e-03,
         5.5520e-03, 5.5520e-03, 5.5520e-03, 9.0620e-02, 5.5520e-03, 9.4176e-02,
         4.1645e-01, 5.5520e-03, 5.5520e-03, 7.9771e-04, 5.5520e-03, 1.3919e-04,
         2.4830e-05, 5.5520e-03]], grad_fn=<SoftmaxBackward>)

So, our mini-batch successfully traverses the network. We are not finished yet, though: a few important pieces are still missing.
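Because the network contains a Dropout layer, its output in the default training mode changes from call to call. A minimal sketch, reusing the same architecture, of how `eval()` switches dropout off so that the output becomes deterministic:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Linear(2, 6),
    nn.ReLU(),
    nn.Linear(6, 10),
    nn.ReLU(),
    nn.Linear(10, 20),
    nn.Dropout(p=0.5),
    nn.Softmax(dim=1))

x = torch.FloatTensor([[10, 10]])

# Evaluation mode disables dropout; no_grad() skips gradient tracking
net.eval()
with torch.no_grad():
    out1 = net(x)
    out2 = net(x)

print(torch.equal(out1, out2))  # identical outputs in eval mode
print(out1.sum(dim=1))          # softmax probabilities sum to 1 per sample

# Switch back to training mode before continuing to train
net.train()
```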

# **Custom** **layers**

Now we will look at how the same kind of sequential network can be built in a more generic and reusable way:

In [0]:
import torch
import torch.nn as nn

# Create a class that inherits from nn.Module. In the constructor, we pass three parameters: the input size, the output size, and the optional dropout probability.
# The first thing we need to do is call the parent's constructor to let it initialize itself.

class mymodule(nn.Module):
    def __init__(self, num_inputs, num_classes, dropout_prob=0.3):
        super(mymodule, self).__init__()
        self.pipe = nn.Sequential(
            nn.Linear(num_inputs, 8),
            nn.ReLU(),
            nn.Linear(8, 15),
            nn.ReLU(),
            nn.Linear(15, num_classes),
            nn.Dropout(p=dropout_prob),
            nn.Softmax(dim=1)
        )

# Once the constructor completes, all submodules assigned to fields are registered automatically.

    def forward(self, x):
        return self.pipe(x)

# Here, we override the forward function with our implementation of the data transformation.
# As our module is a very simple wrapper around other layers, we just need to ask them to transform the data.

if __name__ == "__main__":
    net = mymodule(num_inputs=3, num_classes=4)
    print(net)
    v = torch.FloatTensor([[6,2,6]])
    out = net(v)
    print(out)
    print("Cuda's availability is %s" % torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Data from cuda: %s" % out.to('cuda'))

mymodule(
  (pipe): Sequential(
    (0): Linear(in_features=3, out_features=8, bias=True)
    (1): ReLU()
    (2): Linear(in_features=8, out_features=15, bias=True)
    (3): ReLU()
    (4): Linear(in_features=15, out_features=4, bias=True)
    (5): Dropout(p=0.3, inplace=False)
    (6): Softmax(dim=1)
  )
)
tensor([[0.3520, 0.3520, 0.2100, 0.0859]], grad_fn=<SoftmaxBackward>)
Cuda's availability is True
Data from cuda: tensor([[0.3520, 0.3520, 0.2100, 0.0859]], device='cuda:0',
       grad_fn=<CopyBackwards>)


The **forward()** method is called for every batch of data, so if you want to do some complex transformations based on the data you need to process, with multiple required parameters and dozens of optional arguments, that is entirely possible.
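As a small sketch of a `forward()` with an extra argument, the hypothetical module below (the class name `FlexibleModule` and the `return_probs` flag are invented for this example) returns either probabilities or raw scores depending on how it is called. Arguments passed when calling the module are forwarded to `forward()`.

```python
import torch
import torch.nn as nn

class FlexibleModule(nn.Module):  # hypothetical example class
    def __init__(self, num_inputs, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(num_inputs, 8),
            nn.ReLU(),
            nn.Linear(8, num_classes),
        )

    def forward(self, x, return_probs=True):
        # Optional argument controls what the module returns
        logits = self.body(x)
        if return_probs:
            return torch.softmax(logits, dim=1)
        return logits

net = FlexibleModule(num_inputs=3, num_classes=4)
v = torch.FloatTensor([[6, 2, 6]])

probs = net(v)                        # default: probabilities
logits = net(v, return_probs=False)   # raw scores
print(probs, logits)
```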

# **Loss** **function**

At the time of writing, **PyTorch 1.3.0** contains 20 different loss functions, and you can also write your own.

Some of the most commonly used are:


- **nn.MSELoss**: The mean squared error between arguments, which is the standard loss for regression problems.

- **nn.BCELoss**: Binary cross-entropy loss. This loss expects a single probability value (usually the output of a Sigmoid layer).

- **nn.CrossEntropyLoss**: The famous "maximum likelihood" criterion used in multi-class classification problems. It expects raw scores for each class and applies LogSoftmax internally.
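The three losses above can be tried on hand-made inputs. The sketch below uses made-up scores and targets, and also checks that `nn.CrossEntropyLoss` on raw scores matches `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn

# Raw class scores for two samples and three classes (made-up values)
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# CrossEntropyLoss takes raw scores and applies LogSoftmax internally
ce = nn.CrossEntropyLoss()(logits, targets)
manual = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce.item(), manual.item())

# MSELoss: mean of squared differences, here (1 + 4) / 2
mse = nn.MSELoss()(torch.tensor([1.0, 2.0]), torch.tensor([0.0, 4.0]))
print(mse.item())

# BCELoss expects probabilities, for example the output of a sigmoid
prob = torch.sigmoid(torch.tensor([0.0]))  # 0.5
bce = nn.BCELoss()(prob, torch.tensor([1.0]))
print(bce.item())  # -ln(0.5)
```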

# **Optimizers**

The responsibility of the basic optimizer is to take the gradients of model parameters and change these parameters in order to decrease the loss value. A lower loss value on the training objective is what drives the model toward better performance.

The most widely known optimizers are as follows:

- **SGD**: A vanilla stochastic gradient descent algorithm
- **RMSprop**: An optimizer proposed by Geoffrey Hinton
- **Adagrad**: An adaptive gradients optimizer
- **Adam**: A quite successful and popular combination of both RMSprop and Adagrad
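What an optimizer does in a single step can be seen on a toy problem. The sketch below runs one step of `torch.optim.SGD` on a made-up loss, the sum of squared weights, and compares the result with the plain update rule w - lr * grad. The values and the learning rate are arbitrary.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Toy loss: sum of squares, so the gradient is 2 * w = [2, 4]
loss = (w ** 2).sum()

optimizer.zero_grad()
loss.backward()
grad = w.grad.clone()

# Expected result of vanilla SGD: w - lr * grad
expected = w.detach().clone() - 0.1 * grad

optimizer.step()
print(w)         # updated parameters
print(expected)  # same values computed by hand
```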



# **TensorBoard** **Monitoring**  

We need a generic solution to track lots of values over time for analysis purposes. The tool we are going to explore for this is **TensorBoard** (maintained by Google), which we will write to from PyTorch through the third-party **tensorboardX** library.

- **TensorBoard 101**

The only prerequisite is that the **tensorboard and tensorflow packages are installed**.

There are several third-party open source libraries that can write TensorBoard data. The one used here is **tensorboardX**

**(https://github.com/lanpa/tensorboardX)**. 

It can be installed with **pip install tensorboardX**.

Let's see this mechanism with the simple code below:

In [0]:
!pip install tensorboardX
import math
from tensorboardX import SummaryWriter


if __name__ == "__main__":
    writer = SummaryWriter()

    funcs = {"sin_math": math.sin, "cos_math": math.cos, "tan_math": math.tan}

    for angle in range(-360, 360):
        angle_rad = angle * math.pi / 180
        for name, fun in funcs.items():
            val = fun(angle_rad)
            writer.add_scalar(name, val, angle)

    writer.close()


Running this produces no output on the console, but you will see a new directory created inside the runs directory, containing a single file.

To look at the result, we need to start TensorBoard.

**Run the command below from the directory that contains the runs directory to start TensorBoard:**

C:\Users\xxx\Desktop\Data Science videos> **tensorboard --logdir runs**

**TensorBoard 2.0.1 at http://127.0.0.1:6006/**

After this, you can open **http://localhost:6006 in your browser** to see the plots of the three recorded functions.

TensorBoard allows you to analyze not only scalar values but also images, audio, text data, and embeddings, and it can even show you the structure of your network.

**Refer to the documentation of tensorboardX and tensorboard for all those features**


# **PyTorch Ignite**

PyTorch is an elegant and flexible library, which is useful for thousands of researchers, DL enthusiasts, industry developers, and others. But this flexibility means that you have to write more lines of code yourself. Typical situations where this happens are:

- When you have your own optimizer and need to plug it into an NN and check the response

- When you work directly with gradients, loss, and backpropagation

- When calculating training metrics, such as loss values, accuracy, confusion matrix, and F1-scores

- When saving a checkpoint after some number of iterations, or tracking when the model fits best during training

- When changing the learning rate at run time as part of hyperparameter tuning

Writing this kind of code again and again for every dataset where you want to check gradients, loss, optimization, iterations, and so on quickly becomes tedious.

To serve this purpose, there are high-level libraries built on top of PyTorch, such as **ptlearn, ignite, and so on**.

First we need to understand how these high-level libraries work, and then we can start using them to solve the common problems above with just a few lines of code. We will also use this type of high-level library at a later stage to implement RL problems.



# **Ignite Concepts**

At a high level, Ignite simplifies the writing of the training loop in PyTorch DL. We are all aware that the minimal training loop consists of:

- Sampling a batch of training data
- Applying an NN to this batch to calculate the loss function
- Running backpropagation of the loss to get gradients of the loss with respect to the network's parameters
- Asking the optimizer to apply the gradients to the network
- Repeating until we get a good accuracy result
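Written out in plain PyTorch, the steps listed above might look like the following sketch. The synthetic data (y = 2x + 1), the one-layer model, and the hyperparameters are assumptions chosen only to keep the example small and runnable.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data: y = 2x + 1 (illustrative only)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with torch.no_grad():
    initial_loss = loss_fn(model(x), y).item()

for epoch in range(50):
    for batch_x, batch_y in loader:              # sample a batch
        optimizer.zero_grad()                    # clear old gradients
        loss = loss_fn(model(batch_x), batch_y)  # apply the NN, compute the loss
        loss.backward()                          # backpropagate
        optimizer.step()                         # apply the gradients

with torch.no_grad():
    final_loss = loss_fn(model(x), y).item()

print(initial_loss, final_loss)
```

This is the boilerplate that a library such as Ignite wraps, so that only the body of the inner loop has to be written by hand.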


The central piece of Ignite is the **Engine** class, which loops over the data source, applying the processing function to the data batch. 

In addition to that, Ignite lets you provide functions to be called at specific points of the training loop. These points are called **Events** and can be at the:

- start/end of the whole training process
- start/end of a training epoch
- start/end of the processing of a single batch

For example, if you want to do some calculations every 100 batches or every second epoch, you can do so with the help of the Ignite library.

The sample code below illustrates these Ignite concepts:

In [0]:
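import torch
import torch.nn as nn
from ignite.engine import Engine, Events

# Placeholder model, loss, optimizer, and random data so that the sketch runs end to end
model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(4, 2), torch.randn(4, 1)) for _ in range(10)]

def training(engine, batch):
    # Processing function: the Engine calls it once for every batch
    optimizer.zero_grad()
    x, y = batch
    y_out = model(x)
    loss = loss_fn(y_out, y)
    loss.backward()
    optimizer.step()
    return loss.item()

engine = Engine(training)
engine.run(data, max_epochs=1)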

The value of **Ignite** lies in the ability it provides to extend the training loop with existing functionality.

Do you want to run model validation every 10 epochs? You can do so by writing a small handler function and attaching it to the corresponding event.

For further depth, please read the documentation on the official website:
**https://pytorch.org/ignite**.