### Deep Learning Course
     
2021/2022
***
Gonçalo Faria, Rita Ramos, Marcos Treviso
***
# PyTorch Basics

PyTorch is a platform for deep learning in Python/C++. It provides tools for efficiently creating, training, testing and analyzing neural networks. 

We divided the lab into two parts, reflecting the two broad purposes PyTorch serves: 
* PART I : A replacement for NumPy to use the power of GPUs. 
* PART II : An automatic differentiation library that is useful to implement neural networks. 

In [24]:
%matplotlib inline
import torch
import numpy as np
import matplotlib.pyplot as plt

# Tensors

Here we introduce the most fundamental PyTorch concept: the [Tensor](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).

### PyTorch Tensor vs. Numpy


#### Numpy

Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. 

For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won’t be enough for modern deep learning.

#### PyTorch Tensor

A PyTorch Tensor is similar to a numpy array: both datatypes store n-dimensional arrays, and they support similar operations.    

Unlike numpy arrays, PyTorch tensors can utilize GPUs (or other hardware accelerators) to perform numeric computations faster. Behind the scenes, Tensors can keep track of a computational graph and gradients, which makes them ideal for training neural networks with gradient-based techniques.

In [25]:
#with numpy:
v1 = np.array([2, 4, 6, 8])

#with pytorch:
v2 = torch.tensor([2, 4, 6, 8]) 

print("v1:", v1)
print("v2:", v2)

print("\nshape v2:", v2.shape) #one-dimensional tensor of size 4 (vector)

v1: [2 4 6 8]
v2: tensor([2, 4, 6, 8])

shape v2: torch.Size([4])
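
As a small preview of the gradient tracking mentioned above, here is a minimal sketch (the tensor `w` and the function `y` are made up for illustration). Setting `requires_grad=True` asks PyTorch to record the operations applied to a tensor, so that `backward()` can compute derivatives.

```python
import torch

# A tensor that records the operations applied to it.
w = torch.tensor([2.0, 4.0, 6.0, 8.0], requires_grad=True)

# y = sum_i w_i^2, so dy/dw_i = 2 * w_i
y = (w ** 2).sum()
y.backward()  # fills w.grad with dy/dw

print("y:", y.item())
print("dy/dw:", w.grad)
```

A NumPy array has no equivalent of `w.grad`; this is the main feature that Part II builds on.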


# Tensor Initialization

A tensor can be initialized in several ways, such as: 
   - data/manually
   - numpy
   - randomly
   - constant values

In [26]:
#from data/manually
data = [1, 2, 3]
x_data = torch.tensor(data)
print("x_data", x_data)


#from numpy
np_array = np.array(data)
x_np_ex1 = torch.from_numpy(np_array) #same memory (the tensor shares the same data) -> changes to np_array impact the tensor
x_np_ex2 = torch.tensor(np_array) # a new tensor is created ("clone") -> changes to np_array have NO impact on the tensor

print("\nnp_array", np_array)
print("x_np_ex1", x_np_ex1)
print("x_np_ex2", x_np_ex2)

np_array[0] = 10 #Changes to np_array impact the tensor initialized "from_numpy"
print("\nmodified np_array", np_array)
print("x_np_ex1", x_np_ex1) #affected
print("x_np_ex2", x_np_ex2) 
 

# randomly, constant, etc.
x_rand = torch.rand((3)) 
x_ones_ex1 = torch.ones((1, 3)) # one row with 3 columns (like numpy)
x_ones_ex2 = torch.ones((3, 1)) # 3 rows with 1 column (like numpy)
x_zeros = torch.zeros(3)
x_ordered = torch.arange(3, 6)

print("\nx_rand", x_rand)
print("x_ones_ex1", x_ones_ex1)
print("x_ones_ex2", x_ones_ex2)
print("x_zeros", x_zeros)
print("x_ordered", x_ordered)

x_data tensor([1, 2, 3])

np_array [1 2 3]
x_np_ex1 tensor([1, 2, 3])
x_np_ex2 tensor([1, 2, 3])

modified np_array [10  2  3]
x_np_ex1 tensor([10,  2,  3])
x_np_ex2 tensor([1, 2, 3])

x_rand tensor([0.5137, 0.0750, 0.9502])
x_ones_ex1 tensor([[1., 1., 1.]])
x_ones_ex2 tensor([[1.],
        [1.],
        [1.]])
x_zeros tensor([0., 0., 0.])
x_ordered tensor([3, 4, 5])
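
The conversion also works in the other direction. The sketch below (variable names are illustrative) shows that `Tensor.numpy()` on a CPU tensor returns an array sharing the tensor's memory, mirroring `torch.from_numpy`, and that calling `clone()` first gives an independent copy.

```python
import numpy as np
import torch

t = torch.ones(3)           # float32 tensor on the CPU
n = t.numpy()               # NumPy array sharing memory with t (CPU tensors only)

t.add_(1)                   # in-place update of the tensor...
print("t:", t)
print("n:", n)              # ...is visible through the NumPy array too

n_copy = t.clone().numpy()  # clone first to get an independent copy
t.add_(1)
print("n_copy:", n_copy)    # not affected by the second update
```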


# Tensor attributes:

Tensor attributes describe their <b>shape</b>, <b>datatype</b>, and the <b>device</b> on which they are stored.

In [27]:
x = torch.rand((1, 3))
print(x)
print("\nShape of tensor:", x.shape) # you can also use tensor.size()
print("Datatype of tensor:", x.dtype)
print("Device tensor is stored on:", x.device) #we will see later on how to move to GPU, if available. 

tensor([[0.6789, 0.9084, 0.3753]])

Shape of tensor: torch.Size([1, 3])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
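
The datatype can be chosen at creation time or changed afterwards. A minimal sketch, assuming PyTorch's default settings (integers become `torch.int64`, floats become `torch.float32`); the variable names are illustrative:

```python
import torch

x_int = torch.tensor([1, 2, 3])          # Python ints -> torch.int64
x_float = torch.tensor([1.0, 2.0, 3.0])  # Python floats -> torch.float32 (default)

x_double = x_float.to(torch.float64)     # explicit conversion, returns a new tensor
x_as_float = x_int.float()               # shorthand for .to(torch.float32)
x_explicit = torch.tensor([1, 2, 3], dtype=torch.float32)  # dtype set at creation

for name, t in [("x_int", x_int), ("x_float", x_float), ("x_double", x_double),
                ("x_as_float", x_as_float), ("x_explicit", x_explicit)]:
    print(name, t.dtype)
```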


# Examples of tensors with different shapes

### Scalar

In [28]:
x = torch.tensor(7)

print("x:", x)
print("x value:", x.item())
print("shape:", x.size())

x: tensor(7)
x value: 7
shape: torch.Size([])


### Vector

In [29]:
x = torch.tensor([7])
print("x:", x)
print("shape:", x.size())

x = torch.rand(1)
print("\nx:", x)
print("shape:", x.size())

x = torch.zeros(3)
print("\nx:", x)
print("shape:", x.size())

x = torch.arange(10, 17)
print("\nx:", x)
print("shape:", x.size())

x: tensor([7])
shape: torch.Size([1])

x: tensor([0.6515])
shape: torch.Size([1])

x: tensor([0., 0., 0.])
shape: torch.Size([3])

x: tensor([10, 11, 12, 13, 14, 15, 16])
shape: torch.Size([7])


### Matrix

In [30]:
x = torch.tensor([[2, 4, 6,8], [1, 2, 3,4], [7, 8, 9,10]])

print("x:\n", x)
print("shape:", x.size())

x = torch.ones(2, 2)
print("\n\nx:", x)
print("shape:", x.size())

x = torch.zeros(3, 1)
print("\n\nx:", x)
print("shape:", x.size())


x = torch.zeros(1, 3)
print("\n\nnx:", x)
print("shape:", x.size())


x= torch.arange(1, 10).view(3, 3) #view reshapes the tensor into a 3 by 3 matrix
print("\n\nx:\n", x)
print("shape:", x.size())

x:
 tensor([[ 2,  4,  6,  8],
        [ 1,  2,  3,  4],
        [ 7,  8,  9, 10]])
shape: torch.Size([3, 4])


x: tensor([[1., 1.],
        [1., 1.]])
shape: torch.Size([2, 2])


x: tensor([[0.],
        [0.],
        [0.]])
shape: torch.Size([3, 1])


nx: tensor([[0., 0., 0.]])
shape: torch.Size([1, 3])


x:
 tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
shape: torch.Size([3, 3])
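
A note on `view`, used in the last example above: it does not copy data, it only reinterprets the same memory with a new shape. The sketch below (illustrative names) shows this sharing, and the use of `-1` to let PyTorch infer one dimension.

```python
import torch

base = torch.arange(6)   # tensor([0, 1, 2, 3, 4, 5])
mat = base.view(2, 3)    # same data, seen as 2 rows x 3 columns
auto = base.view(-1, 2)  # -1 is inferred from the number of elements: 3 rows

mat[0, 0] = 100          # writing through the view also changes base
print("base:", base)
print("mat shape:", mat.shape)
print("auto shape:", auto.shape)
```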


### N-dimensional array

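Imagine you have an image with shape (256 w, 256 h, 3 channels). It can be stored as a 3-dimensional tensor. To keep the printed output short, the example below uses a smaller tensor of shape (5, 5, 3).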

In [31]:
x=torch.rand((5, 5, 3))
print("\nx:", x)
print("shape:", x.size())


x: tensor([[[0.5909, 0.5704, 0.6766],
         [0.9771, 0.7883, 0.5064],
         [0.7817, 0.8638, 0.3586],
         [0.2290, 0.4221, 0.4919],
         [0.0623, 0.5447, 0.7263]],

        [[0.8058, 0.4750, 0.4826],
         [0.6060, 0.0751, 0.4560],
         [0.1548, 0.9278, 0.9967],
         [0.5356, 0.7814, 0.9352],
         [0.0518, 0.1771, 0.7220]],

        [[0.6040, 0.8810, 0.2602],
         [0.9277, 0.5776, 0.7868],
         [0.2128, 0.1125, 0.6718],
         [0.9720, 0.9029, 0.7324],
         [0.5129, 0.6961, 0.3349]],

        [[0.2779, 0.3309, 0.1639],
         [0.1473, 0.5443, 0.7085],
         [0.9247, 0.1969, 0.7367],
         [0.3480, 0.8934, 0.4501],
         [0.0401, 0.0370, 0.7977]],

        [[0.8333, 0.0026, 0.9844],
         [0.0261, 0.2384, 0.0117],
         [0.6167, 0.1412, 0.3586],
         [0.2427, 0.5737, 0.1998],
         [0.4529, 0.2098, 0.7251]]])
shape: torch.Size([5, 5, 3])


Here is an example of loading an actual image and storing it in a tensor:

In [32]:
from PIL import Image
import torchvision.transforms.functional as TF
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_tensor = TF.to_tensor(image)
image_tensor = torch.permute(
    image_tensor,
    [1, 2, 0]) # make the color dimension the last one.
print("shape", image_tensor.shape)
plt.imshow(image_tensor)
plt.show()


Now imagine you want to represent a batch of images, e.g. 10 training images per iteration. The example below stacks two images into a single tensor.

In [None]:
urls = ["http://images.cocodataset.org/val2017/000000039769.jpg", 
        "http://images.cocodataset.org/val2017/000000039769.jpg"]
#all_images=torch.tensor([])
batched_images = [] 
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw)
    image_tensor = torch.permute(
        TF.to_tensor(image),
        [1, 2, 0]) # make the color dimension the last one.
    batched_images.append(image_tensor)

all_images = torch.stack(batched_images, dim=0) # stack along a new first (batch) dimension

print("shape", all_images.shape)
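
The same batching idea can be tried without downloading anything. This is a sketch using random stand-in "images" (the 48 x 64 size is arbitrary) to contrast `torch.stack`, which adds a new batch dimension, with `torch.cat`, which joins tensors along an existing one.

```python
import torch

# Two stand-in "images" in (height, width, channels) layout; sizes are arbitrary.
fake_images = [torch.rand(48, 64, 3) for _ in range(2)]

stacked = torch.stack(fake_images, dim=0)     # new leading batch dimension
concatenated = torch.cat(fake_images, dim=0)  # joined along the existing height dimension

print("stacked:", stacked.shape)              # torch.Size([2, 48, 64, 3])
print("concatenated:", concatenated.shape)    # torch.Size([96, 64, 3])
```

For a batch of images, `stack` is usually what you want, since each image stays a separate entry along the first dimension.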


# Indexing

Indexing is similar to NumPy: you can access the values of specific row(s) and column(s). 

In [None]:
# = Matrix indexing = #
A= torch.arange(1, 10).view(3, 3)
print("A",A)

# Simple indexing
print("\nA[0]:", A[0]) #indexing by row 0
print("A[1]:", A[1]) #indexing by row 1
print("A[1, 2]:", A[1, 2])  # indexing row 1 with column 2 (More efficient)
print("A[0][2]:", A[0][2])  # indexing row 0 with column 2 (Less efficient)

# -- Slicing

# Rows between 1 and 2 (excluding the latter), 
# columns between 0 and 1 (excluding the latter)
print("A[1:2,0:1]:", A[1:2, 0:1])

# All rows except the last two,
# every other column
print("A[:-2,::2]:", A[:-2, ::2]) 

# -- Tensors as indices
#You can also do multi-index selection with gather
indexes = torch.tensor([[0, 2, 2], 
                        [0, 1, 1], 
                        [2, 0, 1]])

print("\n\nindexes", indexes)

indexed_dim1 = torch.gather(A, 1, indexes) # dim is 1
print("\nindexed_dim1", indexed_dim1)

indexed_dim0 = torch.gather(A, 0, indexes) # dim is 0
print("indexed_dim0", indexed_dim0)
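
Since `gather` can be hard to read at first, the sketch below spells out what it computes with explicit loops (the `manual_*` names are illustrative). With `dim=1` each index selects a column within its own row; with `dim=0` each index selects a row within its own column.

```python
import torch

A = torch.arange(1, 10).view(3, 3)
indexes = torch.tensor([[0, 2, 2],
                        [0, 1, 1],
                        [2, 0, 1]])

manual_dim1 = torch.empty_like(indexes)  # loop version of torch.gather(A, 1, indexes)
manual_dim0 = torch.empty_like(indexes)  # loop version of torch.gather(A, 0, indexes)
for i in range(3):
    for j in range(3):
        manual_dim1[i, j] = A[i, indexes[i, j]]  # the index picks the column
        manual_dim0[i, j] = A[indexes[i, j], j]  # the index picks the row

print(torch.equal(manual_dim1, torch.gather(A, 1, indexes)))
print(torch.equal(manual_dim0, torch.gather(A, 0, indexes)))
```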

# Operations

### Elementwise operations

In [None]:
v1 = torch.arange(10)
v2 = torch.arange(10, 20)
print("v1", v1)
print("v2",v2)

In [None]:
v1 + v2

In [None]:
v1 * v2

Torch supports true division between integer-valued tensors.

In [None]:
v1 / v2

Before PyTorch 1.7, dividing integer tensors performed truncated division. If you still want this behavior, you can run:

In [None]:
torch.div(v1, v2, rounding_mode="trunc")
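
The difference between rounding modes only shows up with negative values. A small sketch with made-up inputs, assuming a PyTorch version in which `torch.div` accepts `rounding_mode` (as the cell above already does):

```python
import torch

a = torch.tensor([-7, 7])
b = torch.tensor([2, 2])

true_div = a / b                                    # -3.5 and 3.5 (float result)
trunc_div = torch.div(a, b, rounding_mode="trunc")  # rounds toward zero: -3 and 3
floor_div = torch.div(a, b, rounding_mode="floor")  # rounds down: -4 and 3

print(true_div)
print(trunc_div)
print(floor_div)
```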

#### Operations with constants

In [None]:
x

In [None]:
x + 1

In [None]:
x ** 2

#### Multiplication of matrices

In [None]:
m1 = torch.rand(5, 4)
m2 = torch.rand(4, 5)

print("m1: %s\n" % m1)
print("m2: %s\n" % m2)
print(m1.matmul(m2)) #dot is for numpy -> use matmul instead

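If you are used to NumPy, note that `dot` does not work for matrices here: in PyTorch, `torch.dot` only accepts 1-D tensors. For matrix products, call `matmul` or use the equivalent `@` operator: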

In [None]:
print(m1 @ m2)
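
To make the distinction concrete, here is a minimal sketch (the vectors `u` and `w` are made up) comparing the equivalent spellings of the matrix product with the 1-D inner product and with the elementwise `*`:

```python
import torch

m1 = torch.rand(5, 4)
m2 = torch.rand(4, 5)

# Three spellings of the same matrix product.
p1 = m1.matmul(m2)
p2 = torch.matmul(m1, m2)
p3 = m1 @ m2

# torch.dot is the inner product of two 1-D tensors.
u = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])
inner = torch.dot(u, w)  # 1*4 + 2*5 + 3*6 = 32

# * multiplies element by element; it is not a matrix product.
elementwise = u * w

print(p1.shape, inner, elementwise)
```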

`matmul` and `@` also support batching, so it is possible to multiply several pairs of matrices with a single function call.

In [None]:
B = 3
M = 5
N = 7
P = 2

m1 = torch.randn(B, M, N)
m2 = torch.randn(B, N, P)

prod = m1.matmul(m2)

print(prod)
print(prod.shape)

You can have more than one batch dimension as well:

In [None]:
B1 = 2
B2 = 3
N = 5
M = 7
P = 11

m1 = torch.rand(B1, B2, N, M)
m2 = torch.rand(B1, B2, M, P)

print(m1.matmul(m2).shape) 

Another option is to use the powerful `einsum` function. Let's say our input has the following representation:
- `b` = batch size 
- `c` = channels
- `i` = `m1` timesteps
- `j` = `m2` timesteps
- `d` = hidden size

In [None]:
torch.einsum('bcid,bcdj->bcij', m1, m2)

Learn more about `einsum` here: https://pytorch.org/docs/master/generated/torch.einsum.html#torch.einsum
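
As a sanity check, the sketch below (with freshly created tensors of the same shapes as above) confirms that this `einsum` expression matches the batched `matmul`, and shows that the same notation can express other operations, such as swapping the last two dimensions.

```python
import torch

m1 = torch.rand(2, 3, 5, 7)   # (b, c, i, d)
m2 = torch.rand(2, 3, 7, 11)  # (b, c, d, j)

via_einsum = torch.einsum('bcid,bcdj->bcij', m1, m2)
via_matmul = m1 @ m2

print(via_einsum.shape)  # torch.Size([2, 3, 5, 11])
print(torch.allclose(via_einsum, via_matmul, atol=1e-5))

# Only the index order changes: this swaps the last two dimensions of m1.
swapped = torch.einsum('bcid->bcdi', m1)
print(swapped.shape)     # torch.Size([2, 3, 7, 5])
```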

### Broadcasting

Broadcasting means doing some arithmetic operation with tensors of different ranks, as if the smaller one were expanded, or broadcast, to match the larger.

Let's experiment with a matrix (rank 2 tensor) and a vector (rank 1).

In [None]:
m = torch.rand(5, 4)
v = torch.arange(4)

print("m:", m)
print("\nv:", v)

In [None]:
m_plus_v = m + v
print("m + v:\n", m_plus_v)

We can also reshape tensors (note that the function name is different from numpy)

In [None]:
v = v.view(2, 2)
v

In [None]:
v = v.view(4, 1)
v

Note that shape `[4, 1]` is not broadcastable to match `[5, 4]`!

In [None]:
m + v

... but `[1, 4]` is!

In [None]:
v = v.view(1, 4)
m + v
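
A common related case is adding one value per row rather than per column. A minimal sketch with illustrative names: a vector of length 5 must first get a trailing dimension of size 1, for example with `unsqueeze`.

```python
import torch

m = torch.zeros(5, 4)
row_offsets = torch.arange(5)   # one value per row, shape [5]

# m + row_offsets would fail: the trailing sizes 4 and 5 do not match.
col = row_offsets.unsqueeze(1)  # shape [5, 1]; row_offsets[:, None] is equivalent
result = m + col                # row i gets row_offsets[i] added to every column

print(col.shape)
print(result)
```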

Two tensors are “broadcastable” if the following rules hold:

- Each tensor has at least one dimension.

- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

In [None]:
x = torch.rand(5, 7, 3)
y = torch.rand(5, 7, 3)
z = x + y
# same shapes are always broadcastable (i.e. the above rules always hold)

In [None]:
x = torch.rand((0,))
y = torch.rand(2,2)
z = x + y
# x and y are not broadcastable: x is empty (its single dimension has size 0), and 0 is neither equal to 2 nor equal to 1

In [None]:
# can line up trailing dimensions
x = torch.empty(5,3,4,1)
y = torch.empty(  3,1,1)
z = x + y
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist

In [None]:
# but:
x = torch.empty(5,2,4,1)
y = torch.empty(  3,1,1)
z = x + y
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
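
Shapes can also be checked without allocating any tensors. The sketch below wraps `torch.broadcast_shapes` (available in recent PyTorch versions) in a hypothetical helper, `can_broadcast`, which is not part of PyTorch; it assumes that incompatible shapes raise a `RuntimeError`.

```python
import torch

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or None if incompatible."""
    try:
        return torch.broadcast_shapes(shape_a, shape_b)
    except RuntimeError:  # assumption: incompatible shapes raise RuntimeError
        return None

print(can_broadcast((5, 3, 4, 1), (3, 1, 1)))  # torch.Size([5, 3, 4, 1])
print(can_broadcast((5, 2, 4, 1), (3, 1, 1)))  # None
print(can_broadcast((5, 4), (1, 4)))           # torch.Size([5, 4])
```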

Always take care with tensor shapes! It is a good practice to verify in the interpreter how some expression is evaluated before inserting into your model code. 

In other words, **you can use pytorch's dynamic graph creation ability to debug your model by printing tensor shapes!**

And see more about broadcasting here: https://pytorch.org/docs/master/notes/broadcasting.html

# But what about GPUs?

By default, tensors are created on the CPU. If you have a GPU, you can move them to it. 

In [None]:
my_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("my_device:", my_device)

x = torch.eye(3)  # data is on the cpu 
print("By default device tensor is stored on:", x.device)

# you can move data to the GPU by doing .to(device)
x=x.to(my_device)  # data is moved to my_device
print("\nDevice tensor is now stored on:", x.device) #it will still be cpu if you don't have gpu

If you have a GPU, the last line should report `cuda:0` (`x.device` itself is `device(type='cuda', index=0)`) -> now the computation happens on the GPU.
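
Tensors can also be created directly on the target device. The sketch below is device-agnostic (it falls back to the CPU when no GPU is available) and uses illustrative variable names. Tensors combined in one operation should be on the same device, and a GPU tensor has to be moved back to the CPU before converting it to NumPy.

```python
import torch

my_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.ones(3, device=my_device)  # created directly on my_device
b = torch.arange(3).to(my_device)    # created on the CPU, then moved

c = a + b                            # a and b are on the same device
c_np = c.cpu().numpy()               # back to the CPU before converting to NumPy

print("c lives on:", c.device)
print("c_np:", c_np)
```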

# Tutorials: 

Get more familiar with PyTorch through these tutorials: 
- https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
- https://machinelearningmastery.com/pytorch-tutorial-develop-deep-learning-models/