
# **What is PyTorch?**

PyTorch is a Python-based scientific computing package that serves two purposes:

1.   A replacement for NumPy that can use the power of GPUs
2.   A deep learning research platform that provides maximum flexibility and speed



---


# **What is a Tensor?**

A tensor is similar to NumPy’s ndarray, but it can also be placed on a GPU to accelerate computation.



In [1]:
from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)

tensor([[0.9585, 0.3946, 0.5279],
        [0.3442, 0.6532, 0.6451],
        [0.9523, 0.7694, 0.1862],
        [0.1265, 0.3286, 0.4386],
        [0.0906, 0.1307, 0.6013]])


A tensor can have different data types:

In [3]:
x = torch.zeros(1, 3, dtype=torch.long)
print("\nx datatype:",x.dtype)
print("x: ", x)

y = torch.zeros(1, 3, dtype=torch.float)
print("\ny datatype:", y.dtype)
print("y: ", y)

z = torch.zeros(1, 3, dtype=torch.double)
print("\nz datatype:",z.dtype)
print("z: ", z)


x datatype: torch.int64
x:  tensor([[0, 0, 0]])

y datatype: torch.float32
y:  tensor([[0., 0., 0.]])

z datatype: torch.float64
z:  tensor([[0., 0., 0.]], dtype=torch.float64)
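
The data type of an existing tensor can also be changed after creation. The following is a minimal sketch, not part of the original notebook; the variable names `i`, `f`, `i_as_double`, and `mixed` are chosen here for illustration. It shows an explicit conversion with `.to(dtype)` and the result type when an integer tensor is combined with a floating-point tensor.

```python
import torch

# An integer tensor (int64) and a floating-point tensor (float32).
i = torch.zeros(1, 3, dtype=torch.long)
f = torch.ones(1, 3, dtype=torch.float)

# Explicit conversion: .to(dtype) returns a tensor with the requested dtype.
i_as_double = i.to(torch.double)
print(i_as_double.dtype)  # torch.float64

# Mixing an integer tensor with a floating-point tensor promotes the
# result to the floating-point dtype.
mixed = i + f
print(mixed.dtype)  # torch.float32
```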


A tensor can be constructed:

1. Directly from data:

In [4]:
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


2. Based on an existing tensor:

In [5]:
x = x.new_ones(2, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

y = torch.randn_like(x)                       # result has the same size and dtype as x
print(y)

z = torch.randn_like(y, dtype=torch.float)    # override the dtype
print(z)

# Get the sizes of the tensors
print("\nSize of the tensors:\n", x.size(), y.size(), z.size())

tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[-0.0491, -0.5335,  0.5258],
        [ 0.9456, -0.0299,  0.5641]], dtype=torch.float64)
tensor([[ 0.4084,  0.4161, -0.6045],
        [ 2.5592, -0.4316, -0.4037]])

Size of the tensors:
 torch.Size([2, 3]) torch.Size([2, 3]) torch.Size([2, 3])




---



# **Tensor Operations:**

The same operation can be written with several different syntaxes. For addition:


In [6]:
#syntax 1:
x = torch.rand(2, 3)
y = torch.randn_like(x)
print(x + y)

tensor([[0.4309, 1.4328, 0.1483],
        [1.0261, 1.1815, 1.6062]])


In [7]:
#syntax 2:
print(torch.add(x, y))

tensor([[0.4309, 1.4328, 0.1483],
        [1.0261, 1.1815, 1.6062]])


In [8]:
# syntax 3: provide an output tensor as an argument
result = torch.empty(2, 3)
torch.add(x, y, out=result)
print(result)

tensor([[0.4309, 1.4328, 0.1483],
        [1.0261, 1.1815, 1.6062]])

In [9]:
# syntax 4: in-place; methods that modify a tensor in-place are post-fixed with an _
print(y)

y.add_(x)

print(y)

tensor([[-0.2219,  1.0630,  0.1108],
        [ 0.2272,  0.6959,  0.8086]])
tensor([[0.4309, 1.4328, 0.1483],
        [1.0261, 1.1815, 1.6062]])
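
The difference between the out-of-place and in-place forms is easy to miss when reading the output alone. As a small sketch with freshly created tensors (the names `s` and `r` are illustrative, not from the notebook), `add` leaves its operand untouched while `add_` overwrites it and hands back the same storage:

```python
import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)

# Out-of-place: y is left untouched and a new tensor is returned.
s = y.add(x)
print(torch.equal(y, torch.zeros(2, 3)))  # True

# In-place: y itself is modified, and the result refers to y's memory.
r = y.add_(x)
print(r.data_ptr() == y.data_ptr())  # True
print(torch.equal(y, s))             # True
```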


NumPy-like indexing can be used on tensors:

In [13]:
print(x,"\n")
print(x[:, 1])

tensor([[0.6528, 0.3697, 0.0375],
        [0.7989, 0.4855, 0.7976]]) 

tensor([0.3697, 0.4855])


Resize or reshape a tensor with `torch.Tensor.view`:

In [14]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
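
Two properties of `view` are worth illustrating: the result shares memory with the original tensor, and `view` only works when the memory layout allows it. The sketch below assumes `reshape` as the fallback, which the notebook itself does not use; it copies the data when a view is not possible.

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)

# A view shares storage with the original: writing through y changes x.
y[0] = 1.0
print(x[0, 0].item())  # 1.0

# After a transpose the tensor is no longer contiguous, so view() raises
# a RuntimeError, while reshape() falls back to copying the data.
t = x.t()
print(t.is_contiguous())  # False
try:
    t.view(16)
except RuntimeError:
    print("view() is not possible on this non-contiguous tensor")
flat = t.reshape(16)
print(flat.size())  # torch.Size([16])
```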


To get the value of a one-element tensor as a Python number, use `.item()`:

In [15]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-0.3720])
-0.3720138370990753


For the 100+ available tensor operations, see:

https://pytorch.org/docs/stable/torch.html

---



# **Converting a Torch tensor to a NumPy array, and vice versa**

Torch tensors and NumPy arrays can be converted to each other.

In [0]:
#tensor to numpy
x = torch.ones(5)
print(x)
y = x.numpy()
print(y)

In [0]:
#numpy to tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
print(a)
print(b)

If the tensor lives on the CPU, the tensor and the NumPy array share the same underlying memory, so changing one will change the other:

In [0]:
x.add_(1)
print(x)
print(y)

In [0]:
np.add(a, 1, out=a)
print(a)
print(b)
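
When the shared memory is not wanted, the data can be copied instead. A minimal sketch, assuming `torch.tensor` as the copying constructor (the names `shared` and `copied` are illustrative):

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data into a new tensor

a += 1                         # modify the NumPy array in-place
print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```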



---

# **CUDA Tensors**

Tensors can be moved onto any device using the `.to` method:

In [0]:
# run this cell only if CUDA is available
# Use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
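
The cell above does nothing on a machine without a GPU. A common pattern, sketched here as one possible approach rather than the notebook's own, is to choose the device once and write the rest of the code against it, so the same cell runs on either CPU or GPU:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5, device=device)   # create the tensor directly on the device
y = torch.ones_like(x)             # ones_like inherits the device of x
z = (x + y).to("cpu")              # bring the result back to the CPU
print(device, z)
```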


---


# **`Autograd` Package**

 

* Provides automatic differentiation for all operations on tensors
* It is a define-by-run framework: backprop is defined by how the code is run, so every single iteration can be different
* If the `.requires_grad` attribute of a tensor is set to `True`, all operations on the tensor are tracked


In [0]:
x = torch.ones(2, 2, requires_grad=True)
print(x)
y = x + 2
print(y)

In [0]:
#y was created as a result of an operation, so it has a grad_fn.
print(y.grad_fn)

In [0]:
z = y * y * 3
out = z.mean()

print(z, out)

In [0]:
#change an existing tensor’s requires_grad flag in-place

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

* When the computation is finished, `.backward()` can be called to compute all the gradients automatically (the gradients are accumulated into the `.grad` attribute)


In [0]:
out.backward()
print(x.grad)
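
The printed gradient can be checked by hand. With `out = mean(3 * (x + 2)^2)` over four elements, the derivative with respect to each element is `(3 / 2) * (x + 2)`, which is 4.5 at `x = 1`. The sketch below rebuilds the same computation in a self-contained form and also shows what "accumulated" means: a second backward pass adds to `.grad` rather than replacing it. The names `grad_first` and `grad_second` are illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# Same computation as in the cells above: out = mean(3 * (x + 2)^2).
out = (3 * (x + 2) ** 2).mean()
out.backward()
grad_first = x.grad.clone()

# Analytic gradient: d(out)/dx_i = (3 / 2) * (x_i + 2) = 4.5 at x_i = 1.
expected = 1.5 * (x.detach() + 2)
print(torch.allclose(grad_first, expected))  # True

# A second backward pass accumulates into x.grad instead of overwriting it.
out2 = (3 * (x + 2) ** 2).mean()
out2.backward()
grad_second = x.grad.clone()
print(grad_second)  # every entry is 9.0

# Reset the accumulated gradient before the next use.
x.grad.zero_()
```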

* You can also stop autograd from tracking history by wrapping the code block in `with torch.no_grad():`

In [0]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
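
For a single tensor rather than a whole code block, `.detach()` offers a similar effect. This is a brief sketch of that alternative, which the notebook does not cover; `d` is an illustrative name.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2

# detach() returns a tensor with the same values that is cut off from
# the computation graph, so operations on it are not tracked.
d = y.detach()
print(y.requires_grad, d.requires_grad)  # True False
```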

* For more information: https://pytorch.org/docs/stable/autograd.html#function



---



# **Neural Networks (NN)**

* A neural network (NN) can be constructed with the `torch.nn` package.



In [0]:
import torch.nn as nn

* `nn` depends on `autograd` to define models and differentiate them.
* A typical training procedure for a neural network is as follows:

    **i.** Define the neural network that has some learnable parameters (or weights).

    > The learnable parameters of a model are returned by `net.parameters()`

    **ii.** Iterate over a dataset of inputs.

    **iii.** Process the input through the network.

    **iv.** Compute the loss (how far the output is from being correct).

    **v.** Propagate gradients back into the network’s parameters with `loss.backward()`.

    **vi.** Update the weights of the network.

    > This can be done with any of the update rules implemented in the `torch.optim` package


In [0]:
#CREATE INPUT, AND OUTPUT 

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
print(x.size(), y.size())

In [0]:
#DEFINE NN:

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
print(model)

In [0]:
#LEARNABLE PARAMETERS IN THE MODEL:
params = list(model.parameters())
print("Number of learnable parameter tensors: ", len(params))
print("Size of the first parameter: ", params[0].size()) 
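
To see where the learnable parameters come from, each one can be listed by name and shape. The sketch below rebuilds the same model so that it runs on its own; `total` is an illustrative name. Each `Linear` layer contributes a weight matrix and a bias vector, and `ReLU` contributes nothing.

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# List every parameter tensor with its name and shape.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

# Total number of scalar weights:
# 1000 * 100 + 100 + 100 * 10 + 10 = 101110
total = sum(p.numel() for p in model.parameters())
print(total)  # 101110
```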

In [0]:
# WE WILL NEED A LOSS FUNCTION FOR STEP iv.

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
print("Loss function:", loss_fn)
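
The `reduction='sum'` argument adds up the squared errors instead of averaging them. As a small sketch on random data (the tensors and the names `loss_sum` and `loss_mean` are made up for this illustration), the two reductions differ only by a factor equal to the number of elements:

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(64, 10)
y = torch.randn(64, 10)

loss_sum = torch.nn.MSELoss(reduction='sum')(y_pred, y)
loss_mean = torch.nn.MSELoss(reduction='mean')(y_pred, y)

# 'sum' equals 'mean' multiplied by the number of elements (64 * 10).
print(torch.allclose(loss_sum, loss_mean * y.numel(), rtol=1e-4))  # True
```

Because the summed loss is larger, its gradients are larger too, so a learning rate tuned for one reduction may not suit the other.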

In [0]:
# TO UPDATE THE WEIGHTS, LET'S USE ADAM,
# WHICH IS ALREADY IMPLEMENTED IN torch.optim
import torch.optim as optim


learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

print("Optimizer: ", optimizer)

In [0]:
# TRAINING LOOP:
for t in range(500):

    # Forward pass: feed the input to the model
    # and compute the predicted output.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # The weights could also be updated by hand with plain gradient descent.
    # Each parameter is a Tensor, so its gradient is available as param.grad:
    # with torch.no_grad():
    #     for param in model.parameters():
    #         param -= learning_rate * param.grad

    # Here the optimizer applies its own update rule (Adam) to the weights.
    optimizer.step()            
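
The commented-out manual update inside the training loop can be turned into a runnable variant. The following is a sketch under stated assumptions: it replaces Adam with plain gradient descent, uses `model.zero_grad()` in place of `optimizer.zero_grad()`, fixes a random seed, and runs only 100 iterations to keep it short. The `losses` list is added here for inspection only.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4

losses = []
for t in range(100):
    # Forward pass and loss.
    loss = loss_fn(model(x), y)
    losses.append(loss.item())

    # Clear old gradients, then backpropagate.
    model.zero_grad()
    loss.backward()

    # Manual gradient descent step. no_grad() keeps the update itself
    # from being tracked by autograd.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

print(losses[0], losses[-1])
```

Using an optimizer from `torch.optim` replaces the `with torch.no_grad()` block with a single `optimizer.step()` call and makes it easy to switch update rules.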
