# Basics of PyTorch

PyTorch is one of the most widely used Python libraries for deep learning.  
PyTorch can accelerate your research by making computations faster and less expensive, for example by running them on GPUs.  
This material explains how PyTorch works and how you can get started with it.



## Table of Contents
1. Getting Started with PyTorch
    + Installation of PyTorch
    + What is PyTorch?
2. Basics of PyTorch
    + Introduction to Tensors
    + Operations on Tensors
3. Autograd: Automatic Differentiation
    + Tensors with Gradients
    + Calculate Gradients through Backward
4. Reference

## Getting Started with PyTorch
### Installation of PyTorch
Visit https://pytorch.org/get-started/locally/ to select your preferences (such as operating system, package manager, and compute platform) and run the install command it generates.  
To install the PyTorch binaries, you need one of the two supported package managers: **Anaconda** or **pip**.  
PyTorch on Windows supports only **Python 3.x** (Python 2.x is not supported).
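After installing, a quick sanity check is to import the package and ask for its version and whether a CUDA GPU is visible. This is a minimal sketch, not an official verification procedure; the printed version string depends on what you installed.

```python
import torch

# Print the installed PyTorch version string
print(torch.__version__)

# True only if a CUDA-capable GPU and a working driver are available
print(torch.cuda.is_available())
```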

### What is PyTorch?
PyTorch is a **Python-based scientific computing package** with several advantages:
+ A replacement for NumPy that can use the power of GPUs
+ A deep learning research platform that provides maximum flexibility and speed
+ Distributed training, which makes it possible to use multiple GPUs. <br/> Distributed training lets you process larger batches of input data, reducing computation time.
+ Dynamic computation graphs: the graph is built on the fly as operations run, so ordinary Python control flow can change it from one run to the next
+ A simple interface: the API is easy to use and feels like ordinary Python
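To make "dynamic computation graph" concrete, here is a minimal sketch in which a plain Python if statement decides which operations get recorded. The function f and the input values are illustrative only, and the sketch uses autograd (requires_grad, .backward(), .grad), which is explained later in this material.

```python
import torch

def f(x):
    # Ordinary Python control flow: the branch taken at run time
    # decides which operations are recorded in the graph
    if x.sum() > 0:
        return (x * 2).sum()
    return (x ** 3).sum()

pos = torch.tensor([1.0, 2.0], requires_grad=True)
f(pos).backward()
print(pos.grad)  # derivative of sum(2x) is 2 for every element

neg = torch.tensor([-1.0, -2.0], requires_grad=True)
f(neg).backward()
print(neg.grad)  # derivative of sum(x^3) is 3x^2, i.e. 3 and 12
```

Because the graph is rebuilt on every call, the two calls above differentiate two different functions without any extra setup.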

## Basics of PyTorch
### Introduction to Tensors
Tensors are multidimensional arrays.  
Tensors are similar to NumPy's ndarrays, but they can also be used on a GPU to accelerate computation (NumPy arrays cannot).  
Here is how to create a tensor in PyTorch, compared with creating an ndarray in NumPy:


```python
import numpy as np
import torch

a = np.array(1)      # initializing a NumPy array
b = torch.tensor(1)  # initializing a tensor

print(a, type(a))
print(b, type(b))
```

```
1 <class 'numpy.ndarray'>
tensor(1) <class 'torch.Tensor'>
```


A NumPy ndarray is easily converted to a PyTorch tensor, and vice versa:

```python
a = np.array(1)
a_tensor = torch.from_numpy(a)  # ndarray -> tensor (shares memory with a)
b = torch.tensor(1)
b_ndarray = b.numpy()           # CPU tensor -> ndarray (shares memory with b)
print(type(a_tensor), type(b_ndarray))
```

```
<class 'torch.Tensor'> <class 'numpy.ndarray'>
```
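The difference between sharing and copying matters in practice: torch.from_numpy() returns a tensor backed by the same memory as the array, while torch.tensor() copies the data. A minimal sketch (the variable names are illustrative):

```python
import numpy as np
import torch

arr = np.zeros(3)
t_shared = torch.from_numpy(arr)  # shares memory with arr
t_copy = torch.tensor(arr)        # makes an independent copy

arr[0] = 5.0  # modify the NumPy array in place

print(t_shared)  # first element is now 5: the tensor sees the change
print(t_copy)    # still all zeros: the copy is unaffected
```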


Construct a 5x3 matrix, uninitialized (its values are whatever happened to be in the allocated memory):

```python
x = torch.empty(5, 3)
print(x)
```

```
tensor([[1.0286e-38, 1.0194e-38, 9.6429e-39],
        [9.2755e-39, 9.1837e-39, 9.3674e-39],
        [1.0745e-38, 1.0653e-38, 9.5510e-39],
        [1.0561e-38, 1.0194e-38, 1.1112e-38],
        [1.0561e-38, 9.9184e-39, 1.0653e-38]])
```


Construct a randomly initialized 5x3 matrix (values drawn uniformly from [0, 1)):

```python
x = torch.rand(5, 3)
print(x)
```

```
tensor([[0.3259, 0.2677, 0.6262],
        [0.8661, 0.9747, 0.7917],
        [0.6942, 0.3614, 0.7542],
        [0.0391, 0.5435, 0.1833],
        [0.1373, 0.3018, 0.1778]])
```


Construct a matrix filled with zeros and of dtype long:

```python
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
```

```
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
```


Construct a tensor directly from data:

```python
x = torch.tensor([5.5, 3])
print(x)
```

```
tensor([5.5000, 3.0000])
```


Create a tensor based on an existing tensor.  
These methods reuse properties of the input tensor, e.g. dtype, unless new values are provided by the user:

```python
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size
```

```
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[-0.8990,  0.1204,  0.0139],
        [-0.0826, -0.9899, -1.4254],
        [-0.5144, -1.0925, -1.4376],
        [ 2.0616, -0.6864,  0.2773],
        [-2.0332, -0.6625,  0.8781]])
```


Get its size:

```python
print(x.size())
```

```
torch.Size([5, 3])
```
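torch.Size is a subclass of Python's tuple, so it supports the usual tuple operations, and x.shape is an alias for x.size(). A minimal sketch:

```python
import torch

x = torch.randn(5, 3)

# x.shape and x.size() return the same torch.Size
print(x.shape == x.size())     # True

# torch.Size is a tuple, so it can be unpacked and indexed
rows, cols = x.size()
print(rows, cols)              # 5 3

# Passing a dimension index returns the size of that dimension only
print(x.size(0), x.shape[1])   # 5 3
```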


### Operations on Tensors

There are multiple syntaxes for operations. In the following example, we will take a look at the addition operation.  
**Addition: syntax 1**

```python
y = torch.rand(5, 3)
print(x + y)
```

```
tensor([[-0.1347,  0.7163,  0.5559],
        [ 0.5222, -0.6483, -0.9859],
        [-0.1845, -0.9577, -1.1934],
        [ 2.9317,  0.2723,  0.3253],
        [-1.4904, -0.0581,  1.3205]])
```


**Addition: syntax 2**

```python
print(torch.add(x, y))
```

```
tensor([[-0.1347,  0.7163,  0.5559],
        [ 0.5222, -0.6483, -0.9859],
        [-0.1845, -0.9577, -1.1934],
        [ 2.9317,  0.2723,  0.3253],
        [-1.4904, -0.0581,  1.3205]])
```
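Two more addition syntaxes exist: writing the result into a preallocated tensor with the out argument, and adding in place with add_() (in PyTorch, methods whose names end in an underscore modify their tensor). A minimal sketch; the names result and y_copy are illustrative:

```python
import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)

# Addition: providing an output tensor as an argument
result = torch.empty(5, 3)
torch.add(x, y, out=result)

# Addition: in-place; y_copy itself is overwritten with x + y
y_copy = y.clone()
y_copy.add_(x)

print(torch.allclose(result, x + y), torch.allclose(y_copy, x + y))  # True True
```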


Here is how basic operations are performed on tensors in PyTorch:

```python
a = torch.randn(3, 3)
b = torch.randn(3, 3)

# element-wise addition
print(torch.add(a, b), '\n')

# element-wise subtraction
print(torch.sub(a, b), '\n')

# matrix multiplication (matrix product, not element-wise)
print(torch.mm(a, b), '\n')

# element-wise division
print(torch.div(a, b))
```

```
tensor([[-0.4026, -0.0567,  2.0486],
        [ 1.6303, -0.3304,  1.2894],
        [ 0.0702, -1.5686, -1.4387]]) 

tensor([[-1.2144, -1.9134,  1.1881],
        [ 0.6202,  1.1901,  1.6174],
        [-1.8409, -1.1924, -1.6784]]) 

tensor([[ 0.7208, -0.3061,  0.0077],
        [ 2.0626,  0.4444,  0.5878],
        [-2.5459,  0.5209, -0.3412]]) 

tensor([[ -1.9920,  -1.0611,   3.7613],
        [  2.2281,  -0.5654,  -8.8598],
        [ -0.9265,   7.3388, -13.0052]])
```
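Note that torch.mm() computes a matrix product, while add, sub, and div work element by element. The element-wise counterpart of multiplication is torch.mul() (or the * operator), and the @ operator is a matrix product. A minimal sketch with small, fixed matrices so the results can be checked by hand:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

print(torch.mul(a, b))  # element-wise: [[5, 12], [21, 32]]
print(a * b)            # same as torch.mul
print(torch.mm(a, b))   # matrix product: [[19, 22], [43, 50]]
print(a @ b)            # same as torch.mm for 2-D tensors
```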


**Calculating the transpose** is also similar to NumPy:

```python
# original matrix
print(a, '\n')

# matrix transpose
print(torch.t(a))
```

```
tensor([[-0.8085, -0.9851,  1.6184],
        [ 1.1252,  0.4299,  1.4534],
        [-0.8853, -1.3805, -1.5586]]) 

tensor([[-0.8085,  1.1252, -0.8853],
        [-0.9851,  0.4299, -1.3805],
        [ 1.6184,  1.4534, -1.5586]])
```

**Concatenating Tensors**  
Let's say we have two tensors as shown below:

```python
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])
print(a, '\n')
print(b)
c0 = torch.cat((a, b), dim=0)  # concatenate tensors along dimension 0
print(c0.shape)
c1 = torch.cat((a, b), dim=1)  # concatenate tensors along dimension 1
print(c1.shape)
```

```
tensor([[1, 2],
        [3, 4]]) 

tensor([[5, 6],
        [7, 8]])
torch.Size([4, 2])
torch.Size([2, 4])
```
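Printing the concatenated tensors themselves shows what each dim means: dim=0 stacks rows, dim=1 appends columns. All dimensions other than the one being concatenated must match. A minimal sketch using the same two tensors:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

# dim=0: the rows of b go below the rows of a
print(torch.cat((a, b), dim=0))
# tensor([[1, 2],
#         [3, 4],
#         [5, 6],
#         [7, 8]])

# dim=1: the columns of b go to the right of a
print(torch.cat((a, b), dim=1))
# tensor([[1, 2, 5, 6],
#         [3, 4, 7, 8]])
```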


**Reshaping Tensors**  
Let's say we have the following tensor:

```python
a = torch.randn(2, 4)
print(a)
print(a.shape)
```

```
tensor([[-1.0335, -1.3989, -1.1457, -1.0985],
        [-1.8597,  0.8996,  0.6662, -0.0429]])
torch.Size([2, 4])
```

You can use either of two methods: **Tensor.view()** or **Tensor.reshape()**.  
**view()** returns a tensor that shares the underlying data with the original tensor, so it only works when the new shape is compatible with the original tensor's memory layout.  
**reshape()**, on the other hand, may return either a view or a copy of the original tensor.

```python
b = a.reshape(1, 8)
c = a.view(1, 8)
print(b.shape)
print(c.shape)
```

```
torch.Size([1, 8])
torch.Size([1, 8])
```
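The sketch below shows both consequences of view() sharing data: writing through a view changes the original, and view() refuses a tensor whose memory layout does not fit the new shape (here a transpose), while reshape() copies when it must. It also uses -1, which asks PyTorch to infer one dimension. The tensor values are illustrative.

```python
import torch

a = torch.arange(8).reshape(2, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# view() shares storage: writing through the view changes the original
v = a.view(1, 8)
v[0, 0] = 100
print(a[0, 0].item())  # 100

# -1 lets PyTorch infer one dimension from the total number of elements
print(a.view(-1, 2).shape)  # torch.Size([4, 2])

# A transpose is not contiguous, so view() cannot flatten it...
t = a.t()
print(t.is_contiguous())  # False
try:
    t.view(8)
except RuntimeError as err:
    print("view() failed:", type(err).__name__)

# ...but reshape() can, by copying the data when it has to
flat = t.reshape(8)
print(flat.tolist())  # [100, 4, 1, 5, 2, 6, 3, 7]
```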


**Indexing**  
You can use standard NumPy-like indexing:

```python
a = torch.randn(2, 4)
print(a[:, :2].shape)
```

```
torch.Size([2, 2])
```
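A few more NumPy-style indexing patterns, sketched on a small tensor with known values so the results are easy to follow:

```python
import torch

a = torch.arange(8).reshape(2, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

print(a[0])        # first row: values 0, 1, 2, 3
print(a[:, 1])     # second column: values 1, 5
print(a[1, -1])    # single element (still a tensor): 7
print(a[a > 4])    # boolean mask, flattened result: values 5, 6, 7
```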


If you have a one-element tensor, use .item() to get its value as a Python number:

```python
x = torch.randn(1)
print(x)
print(x.item())
```

```
tensor([-1.0103])
-1.0102500915527344
```


**CUDA Tensors**  
Tensors can be moved onto any device using the .to method.

```python
# Run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of the GPU
if torch.cuda.is_available():
    device = torch.device("cuda:0")         # a CUDA device object for GPU 0
    y = torch.ones_like(x, device=device)   # directly create a tensor on the GPU
    x = x.to(device)                        # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))        # ``.to`` can also change dtype at the same time!
```
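A common pattern is to choose the device once and write the rest of the code against it, so the same script runs with or without a GPU. A minimal sketch; note that operands of an operation must live on the same device, and .numpy() needs a CPU tensor:

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, device=device)   # created directly on the chosen device
y = torch.ones(3).to(device)        # moved there after creation
z = x + y                           # both operands are on the same device
print(z.device)

# Converting to NumPy requires a CPU tensor, so move it back first
z_np = z.cpu().numpy()
print(z_np.shape)  # (3,)
```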

## Autograd: Automatic Differentiation


Remember learning about gradients and differentiation in your calculus class?  
In machine learning, training a neural network requires the gradients of the loss function with respect to the network's parameters.  
However, deriving these gradients by hand for large composite functions in very high-dimensional spaces is tedious and error-prone.  
Fortunately, PyTorch provides a technique called automatic differentiation.  
PyTorch records all the operations performed on tensors, which enables it to compute gradients by working backward through them.  
This saves you from writing gradient code yourself for every model you train.

Let's look at an example to understand how autograd works!

### Tensors with Gradients
When you create a tensor with the option **requires_grad** set to **True**,  
PyTorch records every operation you perform on it.  
When you finish your computation, you can call .backward() to have all the gradients computed automatically.  
The gradient for this tensor is accumulated into its .grad attribute.

```python
# Create a tensor and set requires_grad=True to track computation with it
x = torch.ones(2, 2, requires_grad=True)
print(x)
```

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
```


Let's now perform some operations on the tensor we defined.  
Each tensor has a .grad_fn attribute that references the Function that created the tensor  
(except for tensors created by the user, whose grad_fn is None).

```python
y = x + 2
print(y)
print(y.grad_fn)
```

```
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000139926E2198>
```


Do more operations on y:

```python
z = y * y * 3
out = z.mean()

print(z, out)
```

```
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
```
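To finish this example, you can backpropagate from out. Since out is a scalar, out.backward() needs no arguments. With $o = \frac{1}{4}\sum_i z_i$ and $z_i = 3(x_i + 2)^2$, the chain rule gives

$$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i + 2) = \frac{3}{2}(x_i + 2) = 4.5 \quad \text{at } x_i = 1.$$

The cell below rebuilds the same graph so that it runs on its own and checks that autograd agrees with this hand calculation:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()  # out is a scalar, so no gradient argument is needed
print(x.grad)   # every entry is 4.5
```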


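.requires_grad_( ... ) changes an existing tensor's requires_grad flag in place.  
A newly created tensor has requires_grad=False unless you ask otherwise; calling .requires_grad_() without an argument sets the flag to True.

```python
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)   # False: not requested at creation
a.requires_grad_(True)   # switch tracking on in place
print(a.requires_grad)   # True
b = (a * a).sum()
print(b.grad_fn)         # b was produced by a tracked operation
```

```
False
True
<SumBackward0 object at 0x000001398FB03C88>
```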


### Calculate Gradients through Backward

If you want to compute derivatives, call **.backward()** on a tensor.  
Once you call **.backward()**, the gradient of each leaf tensor created with requires_grad=True is stored in its **.grad** attribute.  
If the tensor is a scalar (i.e. it holds a single element), you don't need to pass any arguments to backward().  
However, if it has more elements, you need to pass a gradient argument that is a tensor of matching shape.
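For the non-scalar case, a minimal sketch (the input values are illustrative): passing a tensor v to backward() computes the vector-Jacobian product, and with v filled with ones this is the same as calling y.sum().backward().

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x  # non-scalar output with 3 elements

# y.backward() with no argument would raise a RuntimeError here,
# because the gradient can only be created implicitly for scalar outputs.
# Passing v computes the vector-Jacobian product v^T (dy/dx).
v = torch.ones_like(y)
y.backward(v)
print(x.grad)  # 2 * x, i.e. 2, 4, 6
```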

For example, with a scalar output:

```python
a = torch.ones((2, 2), requires_grad=True)  # initializing a tensor
b = a + 5                                   # performing operations on the tensor
c = b.mean()                                # c is the mean of 4 elements

# back propagating
c.backward()
# check computed gradients: dc/da = 1/4 for every element
print(a.grad)
```

```
tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])
```
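Because gradients are accumulated into .grad rather than overwritten, calling backward() again adds to the stored values. Training loops therefore reset gradients between steps (for example with an optimizer's zero_grad()). A minimal sketch using a hypothetical tensor w:

```python
import torch

w = torch.ones(2, requires_grad=True)

for step in range(2):
    loss = (w * 3).sum()  # build a fresh graph each iteration
    loss.backward()       # adds d(loss)/dw = 3 into w.grad
    print(step, w.grad)   # step 0: 3 per entry; step 1: 6 per entry (accumulated)

w.grad.zero_()            # reset before the next independent computation
print(w.grad)             # all zeros
```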


You can also stop autograd from tracking history on tensors with **requires_grad=True**
by wrapping the code block in with torch.no_grad():

```python
print(a.requires_grad)
print((a ** 2).requires_grad)

with torch.no_grad():
    print((a ** 2).requires_grad)  # operations inside the block are not tracked
```

```
True
True
False
```


To prevent tracking history (for example, to save memory),  
you can use **.detach()** to get a new tensor with the same content that does not require gradients.  
Note that the **.clone()** function also returns a copy, but that copy does track gradients.

```python
print(a.requires_grad)
b = a.detach()
c = a.clone()
print(b.requires_grad)
print(c.requires_grad)
print((a == b).all())
print((a == c).all())
```

```
True
False
True
tensor(True)
tensor(True)
```
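One more difference between the two: the tensor returned by .detach() shares storage with the original, while .clone() allocates new memory. So an in-place change made through a detached tensor is visible in the original, which calls for care if the original is still needed for a backward pass. A minimal sketch:

```python
import torch

a = torch.ones(2, requires_grad=True)
b = a.detach()   # same storage as a, not tracked by autograd
c = a.clone()    # new storage, and the copy is recorded in the graph

b[0] = 5.0       # in-place write through the detached tensor
print(a)         # a now holds 5 and 1: it sees the change
print(c)         # c still holds 1 and 1: it has its own memory
```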


## Reference

- [What is PyTorch?](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py)
- [Autograd: Automatic Differentiation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py)
- [A Beginner-Friendly Guide to PyTorch and How it Works from Scratch](https://www.analyticsvidhya.com/blog/2019/09/introduction-to-pytorch-from-scratch/)
- [What is PyTorch and how does it work?](https://hub.packtpub.com/what-is-pytorch-and-how-does-it-work/)