<a href="https://colab.research.google.com/github/albertomanfreda/intensive_school_ml/blob/master/LessonTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch

PyTorch is a deep learning framework, mainly developed by the Facebook AI Research (FAIR) group. 

The fundamental PyTorch objects are tensors. PyTorch tensors are similar to NumPy ndarrays (in fact, they are purposely designed to behave as closely as possible to them), but with two additional capabilities:

* They can keep track of a computational graph and of gradients
* They allow operations to be performed on GPUs instead of CPUs

Let's see how to manage PyTorch tensors and how they differ from, or resemble, NumPy ndarrays.

In [None]:
# Import PyTorch
import torch

t = torch.Tensor([[3, 2],
                  [1, 3]])
# Internally, the data are stored inside a data attribute
print(t.data)
# However, you don't need to explicitly call data most of the time, as you can
# access elements and properties directly from the tensor object.

# The type of t is torch.FloatTensor (32 bit)
print(t.type())
# The shape is an object of type torch.Size (a subclass of tuple).
print(t.shape)
# This is the same as ndarray
print(len(t))
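
A side note on the constructor used above: `torch.Tensor(...)` always produces the default floating-point type, whereas the lowercase factory function `torch.tensor(...)` infers the dtype from the data, much like `np.array` does. The following is a minimal sketch of the difference; the variable names are illustrative only.

```python
import numpy as np
import torch

data = [[3, 2], [1, 3]]

t_upper = torch.Tensor(data)   # always the default float type (float32)
t_lower = torch.tensor(data)   # dtype inferred from the data (int64 here)
arr = np.array(data)           # NumPy also infers an integer dtype

print(t_upper.dtype)
print(t_lower.dtype)
print(arr.dtype)

# torch.Size is a subclass of tuple, so it compares equal to a plain tuple
shape_matches = (t_upper.shape == (2, 2))
print(shape_matches)
```

If integer input should end up as floating point, passing an explicit `dtype=torch.float32` to `torch.tensor` is one way to state the intent.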

In [None]:
# Other functionalities look just the same, or very similar
t1 = torch.Tensor([[3, 2],
                   [1, 3]])
# Operation along axis
print(t1.mean(dim=0))
# Create a tensor of a given shape initialized to 0.
t2 = torch.zeros((2, 2))
# Add a scalar to all the elements
t3 = t2 + 1.5
# Element-wise multiplication
print(t1 * t2)
# Matrix multiplication (equivalent of dot or matmul)
print(torch.mm(t1, t3))
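
To make the similarity with NumPy concrete, here is a small side-by-side sketch. It assumes nothing beyond the two libraries; the array contents are arbitrary illustration values.

```python
import numpy as np
import torch

arr = np.array([[3.0, 2.0], [1.0, 3.0]])
ten = torch.tensor([[3.0, 2.0], [1.0, 3.0]])

# NumPy names the reduction axis `axis`, PyTorch names it `dim`
print(arr.mean(axis=0), ten.mean(dim=0))

# Slicing and boolean masks work the same way
print(arr[:, 0], ten[:, 0])
print(arr[arr > 2], ten[ten > 2])

# The @ operator performs matrix multiplication in both libraries
print(arr @ arr)
print(ten @ ten)

# The results agree once the tensor is converted back to an ndarray
same = np.allclose(arr @ arr, (ten @ ten).numpy())
print(same)
```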

You can easily convert NumPy ndarrays into PyTorch tensors and vice versa using the *from_numpy()* function and the *numpy()* method. However, neither of them copies the data: the tensor and the ndarray share the same memory. If one changes, the other will too.

In [None]:
import numpy as np
import torch

# Create a PyTorch tensor
t = torch.zeros((3, 4))
# Create a NumPy array from the tensor
y = t.numpy()

# The tensor and the ndarray share the same memory space. If we change y...
y[1, 1] = 5
#...t will change as well
print(t)

# Convert an ndarray to a tensor (z shares memory with x and keeps its dtype)
x = np.full((2, 3), 1.5)
z = torch.from_numpy(x)

Note that, by default, NumPy arrays are created as np.float64, while PyTorch tensors are created as torch.FloatTensor, which is a 32-bit type chosen for GPU compatibility.

Mixing 32-bit and 64-bit objects during execution will most likely raise errors, so, in order to interoperate between tensors and ndarrays, PyTorch offers functions to cast between these types.
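
The sketch below illustrates the point with a matrix multiplication, which requires both operands to have the same dtype. The variable names and values are made up for the example.

```python
import numpy as np
import torch

x64 = np.full((2, 2), 1.5)        # NumPy default: float64
t64 = torch.from_numpy(x64)       # keeps the NumPy dtype: torch.float64
t32 = torch.ones(2, 2)            # PyTorch default: torch.float32
print(t64.dtype, t32.dtype)

# Multiplying a float32 matrix by a float64 matrix raises a RuntimeError
try:
    torch.mm(t32, t64)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print('RuntimeError:', err)

# Casting one operand explicitly resolves the mismatch
result = torch.mm(t32, t64.float())
print(result)
```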

Since a cast to a different type copies the data, these functions can also be used to disentangle the NumPy ndarray from the PyTorch tensor (a cast to the type the tensor already has returns the same tensor, without copying). Beware, however, that copying very large arrays is an expensive operation in terms of memory and CPU/GPU time.


In [None]:
import numpy as np
import torch

# Create a PyTorch tensor
t = torch.zeros((3, 4))
# Cast to double and create a NumPy array out of it
y = t.double().numpy()

# The tensor and the ndarray do not share the memory space. If we change y...
y[1, 1] = 5
#...t will stay the same
print(t)

# Convert an ndarray to a tensor, then downcast it to 32 bit
x = np.full((2, 3), 1.5)
z = torch.from_numpy(x).float()
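
When the goal is only to decouple the two objects, without changing the dtype, an explicit `clone()` is a safer choice than a cast. A minimal sketch, with illustrative variable names:

```python
import torch

src = torch.zeros(3, 4)            # float32 by default

# Casting to the dtype the tensor already has returns the same object (no copy)
same_object = src.float() is src
print(same_object)

# clone() always copies, so this ndarray is independent of the tensor
independent = src.clone().numpy()
independent[1, 1] = 5
value_after_clone_edit = src[1, 1].item()
print(value_after_clone_edit)      # the tensor is unchanged

# Without clone(), the ndarray shares memory with the tensor
shared = src.numpy()
shared[1, 1] = 5
value_after_shared_edit = src[1, 1].item()
print(value_after_shared_edit)     # the tensor has changed
```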

## CPU vs GPU operations

The CPU (central processing unit) is a general-purpose processor, which runs most of what happens inside your computer. It is mostly designed for sequential operations, though a certain degree of parallelization is present.

A GPU (graphics processing unit), on the other hand, is a specialized type of microprocessor which is extremely good at performing heavy operations, like floating-point arithmetic, in a highly parallelized way. Though GPUs are primarily designed for fast image rendering, they have also become a popular tool for a variety of data science tasks, including ML.

PyTorch supports GPU operations through CUDA, a parallel computing platform developed by NVIDIA. That means that an NVIDIA video card is required for GPU computing to work. The PyTorch module which supports CUDA operations is called (unsurprisingly) **cuda**.

In [None]:
import torch

# Print the number of GPUs available
print('Number of CUDA Devices:', torch.cuda.device_count())

# Print the name of the GPUs available
for i in range(torch.cuda.device_count()):
    print('\tCUDA device: ', torch.cuda.get_device_name(i))

# Check if the GPU is available
if torch.cuda.is_available():
    # current_device() raises an error without a GPU, so call it only here
    print('Current cuda device number: ', torch.cuda.current_device())
    # Assign the CUDA GPU with index 0 to a variable, so we can refer to it
    cuda0 = torch.device('cuda:0')
    # Perform the addition on the GPU
    a = torch.ones(3, 2, device=cuda0) # create a tensor 'a' on the GPU
    b = torch.ones(3, 2).to(cuda0) # create a tensor 'b' and move it to the GPU
    c = a + b # c is created automatically on the GPU
    print(c)
    # Copy c to the CPU
    d = c.cpu()
    print(d)
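
A common way to write code that runs with or without a GPU is to pick the device once and pass it around. The sketch below assumes only that PyTorch is installed; it falls back to the CPU when no CUDA device is found.

```python
import torch

# Fall back to the CPU when no CUDA device is present
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

a_dev = torch.ones(3, 2, device=device)   # created directly on the chosen device
b_dev = torch.ones(3, 2).to(device)       # created on the CPU, then moved
c_dev = a_dev + b_dev                     # the result lives on the same device

# A tensor must be on the CPU before it can be converted to a NumPy array
c_np = c_dev.cpu().numpy()
print(c_np)
```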


Is the GPU really faster? Let's test that with an example similar to the NumPy one.

In [None]:
import time
import torch

def expensive_operation(device, num_trials=5):
    """ Add two 1000x1000 tensors 1000 times and time it.
    Repeat the measurement a given number of times and return the best
    (lowest) execution time."""
    times = []
    for _ in range(num_trials):
        a = torch.zeros(1000, 1000, device=device)
        b = a.new_full(a.shape, 1.5)
        # CUDA operations are asynchronous: wait for pending work before and
        # after the loop, otherwise only the kernel launches are timed
        if device.type == 'cuda':
            torch.cuda.synchronize(device)
        tstart = time.perf_counter()
        for _ in range(1000):
            a = a + b
        if device.type == 'cuda':
            torch.cuda.synchronize(device)
        times.append(time.perf_counter() - tstart)
    return min(times)

num_trials = 5
best_time_cpu = expensive_operation(torch.device('cpu'), num_trials=num_trials)
print('Best of {:d} trials, cpu: {:.5f}'.format(num_trials, best_time_cpu))

# Run the GPU test only if a CUDA device is present
if torch.cuda.is_available():
    best_time_gpu = expensive_operation(torch.device('cuda:0'),
                                        num_trials=num_trials)
    print('Best of {:d} trials, gpu: {:.5f}'.format(num_trials, best_time_gpu))

## Gradient tracking and back-propagation

One of the features of PyTorch tensors is that they can keep track of all the operations performed on them and automatically compute the **gradient** of the result with respect to them. This is done by the **autograd** module.

Gradient tracking can be activated when the tensor is created or later, and stopped at any time.

In [None]:
import torch
# Create a tensor and activate gradient tracking
v = torch.ones(1, 2, requires_grad=True)
# The gradient is stored in the grad attribute (None until backward() is called)
print(v.grad)
y = 3 * (v + 2)**2
z = y.mean()
# Each operation on a tracked tensor is recorded in the computational graph.
# If you want to do something without recording it you can do:
with torch.no_grad():
    z *= 2
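
The cell above turns tracking on at creation time and suspends it with `torch.no_grad()`. As a complement, the sketch below shows how tracking can be switched on for an existing tensor and how a result can be cut off from the graph. The variable names are chosen so as not to clash with the ones used above.

```python
import torch

w = torch.ones(1, 2)
tracked_at_creation = w.requires_grad     # False: tracking is off by default
print(tracked_at_creation)

# Activate tracking after creation (in place, note the trailing underscore)
w.requires_grad_(True)
s = 3 * (w + 2) ** 2
print(s.requires_grad)                    # s was produced from a tracked tensor
print(s.grad_fn is not None)              # it remembers how it was computed

# detach() returns a tensor that shares the data but is cut off from the graph
s_detached = s.detach()
print(s_detached.requires_grad)

# Results computed inside no_grad() are not tracked either
with torch.no_grad():
    u = w * 2
print(u.requires_grad)
```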

In [None]:
!pip install torchviz

In [None]:
# Let's take a look at the gradient function graph using torchviz
import torchviz
torchviz.make_dot(z)

**autograd** is able to follow the graph and compute the gradient of z w.r.t. any previous step by repeatedly applying the chain rule:

$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial v}$

(I have omitted the vector and tensor notation for simplicity.)
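
For the example above, the chain rule can be worked out by hand. Each component is $y_i = 3(v_i + 2)^2$, and $z$ is the mean over the two elements of $v$, i.e. $z = \frac{1}{2}\sum_i y_i$. The doubling performed inside `torch.no_grad()` is not recorded in the graph, so it does not enter the derivative:

$$
\frac{\partial z}{\partial y_i} = \frac{1}{2}, \qquad
\frac{\partial y_i}{\partial v_i} = 6\,(v_i + 2), \qquad
\frac{\partial z}{\partial v_i} = \frac{1}{2} \cdot 6\,(v_i + 2) = 3 v_i + 6
$$

With $v_i = 1$, each component of the gradient is therefore 9.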

In [None]:
# Calculate the chain of all gradients (using the chain rule) backward from z
torch.autograd.backward(z)
# or simply z.backward()
# This will give us dz/dv, which in this case is equal to 3v + 6
print(v.grad)
# Zero all the gradients
v.grad.zero_()
print(v.grad)
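
The reason for zeroing the gradient is that `backward()` adds to the stored gradient rather than replacing it. The following sketch checks the value $3v + 6$ numerically and shows the accumulation; it uses a fresh tensor `p` so as not to interfere with the variables above.

```python
import torch

p = torch.ones(1, 2, requires_grad=True)

# First backward pass: the gradient of mean(3 * (p + 2)**2) is 3p + 6
loss = (3 * (p + 2) ** 2).mean()
loss.backward()
first = p.grad.clone()
print(first)                       # tensor([[9., 9.]])

# A second backward pass adds to the stored gradient instead of replacing it
loss = (3 * (p + 2) ** 2).mean()
loss.backward()
accumulated = p.grad.clone()
print(accumulated)                 # tensor([[18., 18.]])

# Zeroing in place resets the accumulator before the next pass
p.grad.zero_()
print(p.grad)                      # tensor([[0., 0.]])
```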


The most common use case for this feature is the **Gradient Descent** training algorithm, which is one of the fundamental building blocks of many ML techniques. Usually, one defines a **loss** function (the quantity the algorithm tries to minimize) and, at each training step, computes the gradient of the loss with respect to the network parameters by propagating it backwards through the graph.

By automating the gradient computation, PyTorch makes this straightforward.
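
As an illustration, here is a minimal hand-written gradient descent loop. The problem (minimizing $(w - 3)^2$), the learning rate and the number of steps are all hypothetical choices made for this sketch.

```python
import torch

# Hypothetical toy problem: find the w that minimizes (w - 3)^2
w_gd = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1   # illustrative value

for step in range(100):
    loss = ((w_gd - 3.0) ** 2).sum()   # scalar loss
    loss.backward()                    # fills w_gd.grad with d(loss)/dw
    with torch.no_grad():              # the update itself must not be tracked
        w_gd -= learning_rate * w_gd.grad
    w_gd.grad.zero_()                  # reset the accumulator for the next step

print(w_gd.item())   # close to 3.0
```

In practice the update and the zeroing are usually delegated to an optimizer object, but the steps it performs are the ones written out here.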

## The nn and optim modules

We have introduced the low-level ingredients of PyTorch: tensors and automatic gradient computation.

On top of that there is a whole library of pre-built classes and functions that are meant to be assembled to create ML models easily. You will see a few of them at work in the next lessons.

The **torch.nn** and **torch.optim** modules contain many building blocks for ML models. One of the most fundamental is the **nn.Module** class, which is meant to be a base class for neural network models.

You can define your own NN by *inheriting* from nn.Module.

In [None]:
import torch

""" In order to inherit from nn.Module you have to define two class methods: 
1) The constructor (a.k.a. __init__)
2) forward(), which defines the computation performed every time the model
   is called on an input (the forward pass) """

class MyModel(torch.nn.Module):
    """ Example dummy class inheriting from nn.Module."""
    def __init__(self, *other_args):
        # Call the constructor of nn.Module with the Python 3 'super' syntax
        super().__init__()
        # Here goes the initialization code
        pass
    
    def forward(self, input):
        # Here you define how the output is computed from the input
        pass
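
To show how the pieces fit together, here is a small but complete sketch: a one-layer model trained with an optimizer from `torch.optim`. The class name `LinearModel`, the synthetic data, the learning rate and the number of epochs are all assumptions made for illustration, not part of the lesson material.

```python
import torch

class LinearModel(torch.nn.Module):
    """Hypothetical one-layer model: y = w * x + b."""
    def __init__(self):
        super().__init__()
        # Layers assigned as attributes are registered as model parameters
        self.linear = torch.nn.Linear(in_features=1, out_features=1)

    def forward(self, x):
        # Computation performed every time the model is called
        return self.linear(x)

torch.manual_seed(0)
# Synthetic data following y = 2x + 1 (made up for illustration)
x_train = torch.linspace(-1, 1, 50).unsqueeze(1)
y_train = 2 * x_train + 1

model = LinearModel()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    optimizer.zero_grad()                      # clear the previous gradients
    loss = loss_fn(model(x_train), y_train)    # calling the model runs forward()
    loss.backward()                            # back-propagate
    optimizer.step()                           # gradient descent update

print(loss.item())
print(model.linear.weight.item(), model.linear.bias.item())
```

Note that `forward()` is never called directly: calling `model(x_train)` dispatches to it.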

