<table>
<tr>
<td width=15%><img src="./img/UGA.png"></img></td>
<td><center><h1>Introduction to Python for Data Sciences</h1></center></td>
<td width=15%><a href="https://tung-qle.github.io/" style="font-size: 16px; font-weight: bold">Quoc-Tung Le</a> </td>
</tr>
</table>


# 1 - Pytorch: a Numpy-like library that can differentiate

Pytorch is arguably the most popular Python library for deep learning. It provides many functions similar to those of Numpy (see the [second notebook](2_Numpy_and_co.ipynb)). Be careful: many functions of the two libraries share the same names, but their behavior can be (entirely) different!

As we will see in this tutorial, Pytorch provides one important feature that Numpy lacks: Automatic Differentiation (AD, often shortened to autodiff). If a function is implemented with Pytorch operations, Pytorch can efficiently compute its gradient with respect to its parameters. This gradient is then used by optimization algorithms (see the Optimization course for more details).

The following code demonstrates the autodiff feature of Pytorch by computing the gradient of the function

$$f(x) = \frac{1}{2}x^\top \mathbf{A} x,$$

where $\mathbf{A}$ is a square matrix (not necessarily symmetric). Its gradient is given by $\nabla f(x) = \frac{1}{2}(\mathbf{A} + \mathbf{A}^\top)x$.
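For completeness, here is one way to derive this formula. Write $f$ entrywise and differentiate with respect to a single coordinate $x_k$:

$$\begin{aligned}
f(x) &= \frac{1}{2}\sum_{i,j} x_i \mathbf{A}_{ij} x_j, \\
\frac{\partial f}{\partial x_k}(x) &= \frac{1}{2}\Big(\sum_{j} \mathbf{A}_{kj} x_j + \sum_{i} \mathbf{A}_{ik} x_i\Big) = \frac{1}{2}\big((\mathbf{A}x)_k + (\mathbf{A}^\top x)_k\big).
\end{aligned}$$

Stacking these partial derivatives over $k$ yields the gradient above. When $\mathbf{A}$ is symmetric, it reduces to $\mathbf{A}x$.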

In [2]:

import torch

dim = 20
# Create a parameter x and a matrix A
x = torch.randn(dim, requires_grad=True)
A = torch.randn((dim, dim))

# Compute f(x) and store it in the variable y
y = 0.5 * torch.dot(x, torch.matmul(A, x))

# Differentiate f by calling y.backward(); the gradient is stored in x.grad
y.backward()

# Access the gradient of f with respect to x
print(x.grad)
# A was created without requires_grad=True, so no gradient is computed for it
print(A.grad)

# Check the result against the closed-form gradient formula
try:
    torch.testing.assert_close(0.5 * torch.matmul(A + A.T, x), x.grad)
    print("Two vectors are equal. All is good")
except AssertionError:
    print("Wrong calculation")


tensor([ 4.9299, -0.4740,  0.0187, -1.3583,  2.4464, -0.6302, -0.6530,  4.5564,
         3.0746,  3.0185, -1.2823,  0.3387, -2.5504, -2.9233,  5.8600, -2.8698,
        -0.3299, -0.1357,  0.0966, -1.9036])
None
Two vectors are equal. All is good
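In the cell above, `A.grad` is `None` because `A` was created without `requires_grad=True`. The sketch below is our own variation, not part of the original example. It also asks for the gradient with respect to $\mathbf{A}$. Differentiating $f$ entrywise gives $\partial f / \partial \mathbf{A}_{ij} = \frac{1}{2} x_i x_j$, i.e. the matrix $\frac{1}{2}xx^\top$, and the code checks autodiff against that expression. The size `dim = 5` is an arbitrary choice.

```python
import torch

dim = 5
x = torch.randn(dim, requires_grad=True)
# Variation on the example above: A now also requires gradients
A = torch.randn((dim, dim), requires_grad=True)

y = 0.5 * torch.dot(x, torch.matmul(A, x))
y.backward()

# Closed form of the gradient of f with respect to A: (1/2) x x^T
expected_grad_A = 0.5 * torch.outer(x, x)
torch.testing.assert_close(A.grad, expected_grad_A.detach())
print("A.grad matches 0.5 * outer(x, x)")
```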


## 1.1 - How to create Pytorch Tensors

There are several ways to create a Pytorch tensor: from a Python list, from a Numpy array, or by random sampling. They are shown below.

In [49]:
# Initialize a tensor from a Python list
x = torch.tensor([[1.0, 2.0, 3.0], [-3.0, -2.0, -1.0]])
print(x)

tensor([[ 1.,  2.,  3.],
        [-3., -2., -1.]])
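When a tensor is built from a Python list, `torch.tensor` infers the data type from the values unless `dtype` is given. The short sketch below shows this, assuming the default floating-point type has not been changed (it is `torch.float32` out of the box).

```python
import torch

# Only integers -> the default integer dtype, torch.int64
a = torch.tensor([1, 2, 3])
# At least one float -> the default floating-point dtype, torch.float32
b = torch.tensor([1, 2.0, 3])
# The dtype can also be fixed explicitly
c = torch.tensor([1, 2, 3], dtype=torch.float64)
print(a.dtype, b.dtype, c.dtype)  # torch.int64 torch.float32 torch.float64
```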


Besides the shape and the data type, which a Numpy array also has, a Pytorch tensor carries extra metadata, such as the device it lives on and its autograd status. The following function prints these fields.

In [50]:
def show_info(x):
    print("Tensor info:")
    print(f"  shape        : {tuple(x.shape)}")
    print(f"  size         : {x.size()}")
    print(f"  dtype        : {x.dtype}")
    print(f"  device       : {x.device}")         # e.g. cpu or cuda:0, depending on your hardware and where the tensor was placed
    print(f"  requires_grad: {x.requires_grad}")  # If requires_grad is False, one cannot differentiate with respect to x (see the quadratic function example)
    print(f"  is_leaf      : {x.is_leaf}")        # True if requires_grad is False (by convention), or if x was created by the user rather than produced by an operation
    print(f"  data         :\n{x}")

show_info(x)

Tensor info:
  shape        : (2, 3)
  size         : torch.Size([2, 3])
  dtype        : torch.float32
  device       : cpu
  requires_grad: False
  is_leaf      : True
  data         :
tensor([[ 1.,  2.,  3.],
        [-3., -2., -1.]])
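The `device` field shows where a tensor is stored. The sketch below moves a tensor to a GPU when one is visible to Pytorch and falls back to the CPU otherwise. The variable names are our own.

```python
import torch

x_cpu = torch.tensor([[1.0, 2.0, 3.0], [-3.0, -2.0, -1.0]])
# Pick a GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_on_device = x_cpu.to(device)
print(x_on_device.device)
# Operands of an operation must live on the same device; the result stays there
y_on_device = x_on_device * 2
print(y_on_device.device)
```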


These metadata fields can be set either when the tensor is created or afterwards.

Pay attention to the field _requires\_grad_: it determines whether Pytorch tracks the operations applied to the tensor, so that gradients with respect to it can be computed. Therefore, when using Pytorch for optimization tasks, make sure that _requires\_grad_ is __True__ for every parameter you optimize. Otherwise, the parameter will never be updated (we will see more about this in the exercise).

In [51]:
# We modify several metadata fields of the tensor x created previously

x.requires_grad = True
# .to() returns a new tensor computed from x, so the result is no longer a leaf (is_leaf = False below)
x = x.to(dtype=torch.float16)

show_info(x)
# We can also create a new tensor y directly with the desired metadata
y = torch.tensor([[1, 2, 3], [-3, -2, -1]], requires_grad=True, dtype=torch.float16)
show_info(y)

Tensor info:
  shape        : (2, 3)
  size         : torch.Size([2, 3])
  dtype        : torch.float16
  device       : cpu
  requires_grad: True
  is_leaf      : False
  data         :
tensor([[ 1.,  2.,  3.],
        [-3., -2., -1.]], dtype=torch.float16, grad_fn=<ToCopyBackward0>)
Tensor info:
  shape        : (2, 3)
  size         : torch.Size([2, 3])
  dtype        : torch.float16
  device       : cpu
  requires_grad: True
  is_leaf      : True
  data         :
tensor([[ 1.,  2.,  3.],
        [-3., -2., -1.]], dtype=torch.float16, requires_grad=True)
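To illustrate why _requires\_grad_ matters for optimization, here is a minimal hand-written gradient descent sketch. The toy objective $\|w - t\|^2$, the step size and the number of iterations are our own choices for illustration. The second part shows that `backward()` fails when nothing in the computation requires gradients.

```python
import torch

# Toy problem: minimize ||w - target||^2 by plain gradient descent
target = torch.tensor([1.0, -2.0, 3.0])
w = torch.zeros(3, requires_grad=True)  # parameter to optimize: requires_grad must be True
lr = 0.1                                # step size, chosen arbitrarily for this sketch

for step in range(100):
    loss = torch.sum((w - target) ** 2)
    loss.backward()                     # populates w.grad
    with torch.no_grad():               # the update itself must not be tracked by autograd
        w -= lr * w.grad
    w.grad.zero_()                      # gradients accumulate across backward() calls, so reset them

print(w)

# With requires_grad=False, there is nothing to differentiate
frozen = torch.zeros(3)
loss_frozen = torch.sum((frozen - target) ** 2)
backward_failed = False
try:
    loss_frozen.backward()
except RuntimeError as err:
    backward_failed = True
    print("backward() failed:", err)
```

Without `w.grad.zero_()`, each step would use the sum of all previous gradients, because Pytorch adds new gradients to `.grad` instead of overwriting it.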


A Numpy array can also be used to create a Pytorch tensor.

In [None]:
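# Create a tensor from a Numpy array

import numpy as np

x_array = np.reshape(np.arange(20), (4, 5))
# torch.tensor copies the data; dtype=torch.float32 reproduces the conversion done by the legacy torch.Tensor constructor
x = torch.tensor(x_array, dtype=torch.float32)
print(x_array)
print(x)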

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
tensor([[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.]])
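`torch.tensor` always copies the Numpy data. `torch.from_numpy` instead shares memory with the array, so changes to one are visible in the other. The sketch below contrasts the two and converts back with `.numpy()`. The array `arr` is our own example.

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.float32)
shared = torch.from_numpy(arr)  # shares memory with arr and keeps its dtype (float32)
copied = torch.tensor(arr)      # independent copy of the data
arr[0] = 100.0
print(shared[0].item(), copied[0].item())  # 100.0 0.0

# Tensor -> Numpy array (for CPU tensors, the result shares memory with the tensor)
back = copied.numpy()
print(type(back))
```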


Finally, Pytorch offers several functions to create random tensors or special tensors (all ones, all zeros, identity).

In [17]:
# Create a tensor by randomization

# Uniform distribution on [0, 1)
x = torch.rand((3,2))
print(x)
print(x.size())

# Standard Gaussian distribution N(0, 1)
x = torch.randn((3,2))
print(x)
print(x.size())

# Create all-ones, all-zeros and identity matrices
x_all_ones = torch.ones((3,2))
x_all_zeros = torch.zeros((3,2))
x_identity = torch.eye(3)

print("All ones matrix:\n {}".format(x_all_ones))
print("All zeros matrix:\n {}".format(x_all_zeros))
print("Identity matrix:\n {}".format(x_identity))

tensor([[0.6433, 0.1101],
        [0.8081, 0.0404],
        [0.8471, 0.4355]])
torch.Size([3, 2])
tensor([[-0.3327, -2.7870],
        [-2.6802,  0.7631],
        [-0.0599, -1.4182]])
torch.Size([3, 2])
All ones matrix:
 tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
All zeros matrix:
 tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
Identity matrix:
 tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
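Random tensors change every time the cell is run. Fixing the seed of Pytorch's random generator with `torch.manual_seed` makes them reproducible. The `*_like` functions create special tensors with the same shape (and dtype and device) as an existing one. A short sketch follows; the seed value 0 is arbitrary.

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 2)
torch.manual_seed(0)
b = torch.randn(3, 2)
print(torch.equal(a, b))  # True: same seed, same samples

# Same shape as a, filled with zeros or ones
z = torch.zeros_like(a)
o = torch.ones_like(a)
print(z.size(), o.size())
```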


## 1.2 - Operations on Pytorch tensors

Pytorch provides a wide range of tensor operations. For a complete list, see [the Pytorch documentation](https://docs.pytorch.org/docs/stable/torch.html). Below, we only introduce a few frequently used operations.

### 1.2.1 - Slicing, Joining and Mutating Operations

In [63]:
# Torch reshape
x = torch.arange(100)
print("Original tensor:\n {} \n".format(x))

# Reshape x to (10, 10)
x_reshaped_1 = torch.reshape(x, (10, 10))
print("Reshape to a matrix 10 x 10:\n {}\n".format(x_reshaped_1))

# Reshape x to (2, 5, 5, 2)
x_reshaped_2 = torch.reshape(x, (2, 5, 5, 2))
print("Reshape to a tensor of dimensions 2 x 5 x 5 x 2: \n {}".format(x_reshaped_2))

# The same result can be obtained by reshaping x_reshaped_1, since reshape preserves the row-major order of the elements
try:
    torch.testing.assert_close(x_reshaped_2, torch.reshape(x_reshaped_1, (2, 5, 5, 2)))
    print("Same results")
except AssertionError:
    print("Different results")

Original tensor:
 tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
        54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
        72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
        90, 91, 92, 93, 94, 95, 96, 97, 98, 99]) 

Reshape to a matrix 10 x 10:
 tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
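Two related details about reshaping. First, one target dimension may be `-1`, and Pytorch infers it from the number of elements. Second, `view` also reshapes but never copies, so it fails on tensors whose memory layout is incompatible, such as a transpose. `reshape` copies when needed. The sketch below shows both behaviors; the tensor names are our own.

```python
import torch

seq = torch.arange(100)
# -1: Pytorch infers this dimension from the total number of elements (100 / 20 = 5)
m = seq.reshape(-1, 20)
print(m.size())  # torch.Size([5, 20])

# A transposed tensor is no longer contiguous in memory
t = torch.arange(10).reshape(2, 5).T
print(t.is_contiguous())  # False
flat = t.reshape(-1)      # works: reshape makes a copy here

view_failed = False
try:
    t.view(-1)            # view cannot reinterpret this memory layout
except RuntimeError:
    view_failed = True
    print("view() failed on a non-contiguous tensor")

# Making the tensor contiguous first (which copies it) allows view
flat_view = t.contiguous().view(-1)
print(flat_view)
```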


In [75]:
# Torch dimension transpose and permutation
x = torch.arange(10).reshape((2,5))
print(x)

# For a 2D tensor, the transpose is simply
x_transpose_1 = x.T
print("First method:\n {}".format(x_transpose_1))

# For a tensor with more dimensions, use torch.transpose(your_tensor, dim0, dim1)
# dim0, dim1: the two dimensions to swap
x_transpose_2 = torch.transpose(x, 0, 1)
print("Second method:\n {}".format(x_transpose_2))

# torch.permute reorders all dimensions at once: dimension i of the result is dimension dims[i] of the input
x_3d_tensor = torch.arange(30).reshape((2,3,5))
x_transpose_3 = torch.permute(x_3d_tensor, (1,2,0))
print(x_transpose_3.size())

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
First method:
 tensor([[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]])
Second method:
 tensor([[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]])
torch.Size([3, 5, 2])
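To make the effect of `torch.permute(x, (1, 2, 0))` concrete, the sketch below checks entry by entry that element $(i, j, k)$ of the input ends up at position $(j, k, i)$ of the result. It also checks that `torch.transpose` is the special case of a permutation that swaps two dimensions. The loop is for illustration only.

```python
import torch

x_3d = torch.arange(30).reshape(2, 3, 5)
p = torch.permute(x_3d, (1, 2, 0))

# Entry (i, j, k) of x_3d moves to position (j, k, i) of p
ok = all(
    x_3d[i, j, k] == p[j, k, i]
    for i in range(2) for j in range(3) for k in range(5)
)
print(ok)

# Swapping dimensions 0 and 2 is the permutation (2, 1, 0)
same = torch.equal(torch.transpose(x_3d, 0, 2), torch.permute(x_3d, (2, 1, 0)))
print(same)
```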


In [83]:
# Torch concatenate (torch.concat is an alias of torch.cat); sizes must match in every dimension except dim
# Concatenate a 2 x 5 matrix with a 6 x 5 matrix along dim 0 to get an 8 x 5 matrix
x = torch.arange(10).reshape((2,5))
y = torch.arange(30).reshape((6,5))
z = torch.concat([x,y], dim = 0)
print("Size of the first new concatenated tensor: {}".format(z.size()))

# Concatenate a 5 x 2 matrix with a 5 x 6 matrix along dim 1 to get a 5 x 8 matrix
x = torch.arange(10).reshape((5,2))
y = torch.arange(30).reshape((5,6))
z = torch.concat([x,y], dim = 1)
print("Size of the second new concatenated tensor: {}".format(z.size()))

Size of the first new concatenated tensor: torch.Size([8, 5])
Size of the second new concatenated tensor: torch.Size([5, 8])


In [85]:
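# Torch stack
# Unlike torch.concat, which joins tensors along an existing dimension, torch.stack inserts a new dimension,
# so the result has one more dimension than the inputs (all inputs must have the same shape)

tensor_list = [torch.randn(3, 2) for _ in range(5)]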
tensor_stacked_0 = torch.stack(tensor_list, dim = 0)
tensor_stacked_1 = torch.stack(tensor_list, dim = 1)
tensor_stacked_2 = torch.stack(tensor_list, dim = 2)

print(tensor_stacked_0.size())
print(tensor_stacked_1.size())
print(tensor_stacked_2.size())

torch.Size([5, 3, 2])
torch.Size([3, 5, 2])
torch.Size([3, 2, 5])
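The two operations are closely related. Stacking along `dim=d` gives the same result as first inserting a size-1 dimension at position `d` in every tensor (with `unsqueeze`) and then concatenating along `d`. The sketch below checks this for every valid `d`.

```python
import torch

tensors = [torch.randn(3, 2) for _ in range(5)]
for d in range(3):
    stacked = torch.stack(tensors, dim=d)
    via_cat = torch.cat([t.unsqueeze(d) for t in tensors], dim=d)
    assert torch.equal(stacked, via_cat)
print("stack(dim=d) equals cat of unsqueeze(d) for d = 0, 1, 2")
```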


### 1.2.2 - Pointwise Operations

Similar to Numpy, Pytorch has many functions for pointwise (elementwise) operations, i.e., operations of the form:

$$\begin{pmatrix} x_{i_1\ldots i_n} \end{pmatrix}_{i_1,\ldots,i_n} \mapsto \begin{pmatrix} f(x_{i_1\ldots i_n}) \end{pmatrix}_{i_1,\ldots,i_n} \qquad \text{or} \qquad \begin{pmatrix} x_{i_1\ldots i_n} \end{pmatrix}_{i_1,\ldots,i_n} \times \begin{pmatrix} y_{i_1\ldots i_n} \end{pmatrix}_{i_1,\ldots,i_n} \mapsto \begin{pmatrix} g(x_{i_1\ldots i_n}, y_{i_1\ldots i_n}) \end{pmatrix}_{i_1,\ldots,i_n}$$

where $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$.
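As a sanity check of this definition, the sketch below compares `torch.sin(x)` with $f = \sin$ applied to each entry in a Python double loop. The loop is far slower and is only meant to spell out the formula.

```python
import math
import torch

vals = torch.randn(3, 5)
fx = torch.sin(vals)

# Apply f = sin entry by entry (illustration only: much slower than torch.sin)
manual = torch.empty_like(vals)
for i in range(vals.shape[0]):
    for j in range(vals.shape[1]):
        manual[i, j] = math.sin(vals[i, j].item())

torch.testing.assert_close(fx, manual)
print("torch.sin agrees with the entrywise definition")
```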

In [None]:
x = torch.randn(3,5)

# f(x) = sin(x), cos(x), ReLU(x) = max(x, 0), |x|, exp(x)

print(x)
print(torch.sin(x))
print(torch.cos(x))
print(torch.relu(x))
print(torch.abs(x))
print(torch.exp(x))

tensor([[ 0.2950, -2.2121,  0.5332, -0.5436,  0.9198],
        [-0.6403, -1.1062,  0.6193,  0.2440,  0.7453],
        [ 0.3132, -0.3966,  0.8940,  0.8641,  1.7163]])
tensor([[ 0.2907, -0.8013,  0.5083, -0.5172,  0.7955],
        [-0.5974, -0.8940,  0.5805,  0.2416,  0.6782],
        [ 0.3081, -0.3863,  0.7796,  0.7605,  0.9894]])
tensor([[ 0.9568, -0.5983,  0.8612,  0.8558,  0.6060],
        [ 0.8019,  0.4481,  0.8143,  0.9704,  0.7349],
        [ 0.9513,  0.9224,  0.6263,  0.6493, -0.1450]])
tensor([[0.2950, 0.0000, 0.5332, 0.0000, 0.9198],
        [0.0000, 0.0000, 0.6193, 0.2440, 0.7453],
        [0.3132, 0.0000, 0.8940, 0.8641, 1.7163]])
tensor([[0.2950, 2.2121, 0.5332, 0.5436, 0.9198],
        [0.6403, 1.1062, 0.6193, 0.2440, 0.7453],
        [0.3132, 0.3966, 0.8940, 0.8641, 1.7163]])
tensor([[1.3431, 0.1095, 1.7044, 0.5806, 2.5087],
        [0.5272, 0.3308, 1.8577, 1.2764, 2.1070],
        [1.3678, 0.6726, 2.4450, 2.3729, 5.5637]])


In [90]:
# The functions above do not modify x: they compute f(x) and store the result in a newly allocated tensor
# To compute f(x) in place (overwriting x), add _ at the end of the function name

x = torch.randn(3,5)
print("Original value:\n {}".format(x))
torch.sin_(x)
print("New value:\n {}".format(x))

Original value:
 tensor([[-0.7299, -1.5813,  0.1297, -0.8237,  0.5331],
        [ 0.9853,  0.1155, -1.3892,  1.4084,  0.6984],
        [ 1.4100, -0.7215,  1.0117,  1.2595,  1.7525]])
New value:
 tensor([[-0.6668, -0.9999,  0.1294, -0.7336,  0.5082],
        [ 0.8334,  0.1153, -0.9836,  0.9868,  0.6430],
        [ 0.9871, -0.6605,  0.8477,  0.9519,  0.9835]])
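In-place operations also exist as tensor methods (e.g. `x.sin_()`, `x.add_(1.0)`). They interact with autograd: Pytorch refuses to modify in place a leaf tensor that requires gradients, because that would corrupt the values needed to compute the gradient. The sketch below shows both points; the tensors are our own examples.

```python
import torch

v = torch.randn(3, 5)
v.sin_()        # method form of torch.sin_(v): v now holds sin of its old values
v.add_(1.0)     # in-place addition; values of sin are in [-1, 1], so v is now in [0, 2]

w = torch.randn(3, requires_grad=True)
inplace_failed = False
try:
    w.sin_()    # in-place update of a leaf tensor that requires grad
except RuntimeError as err:
    inplace_failed = True
    print("RuntimeError:", err)
```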


In [91]:
x = torch.randn(3,5)
y = torch.randn(3,5)

# g(x, y) = x + y, x - y, x * y, x / y
print(x + y)
print(x - y)
print(x * y)
print(x / y)

tensor([[-2.4660, -0.6406, -1.3284,  2.3999, -0.3675],
        [-0.8461, -2.2814,  0.0337,  0.9836, -0.1036],
        [ 0.7977, -1.7252,  0.6651,  1.0309, -2.0270]])
tensor([[-1.7531, -3.7535,  0.7771,  2.8292, -0.1012],
        [ 0.3928,  0.3413,  1.0025, -1.1508, -0.7393],
        [-0.6586, -0.8763, -0.4916, -2.5989,  0.0535]])
tensor([[ 0.7520, -3.4196,  0.2902, -0.5613,  0.0312],
        [ 0.1404,  1.2721, -0.2510, -0.0892, -0.1340],
        [ 0.0507,  0.5521,  0.0502, -1.4229,  1.0264]])
tensor([[  5.9176,  -1.4116,   0.2619, -12.1791,   1.7601],
        [  0.3659,   0.7397,  -1.0696,  -0.0783,  -1.3258],
        [  0.0955,   3.0645,   0.1499,  -0.4320,   0.9486]])


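In the previous example, $x$ and $y$ have the same shape. Operations between tensors of different shapes are also possible, provided the shapes are compatible; this mechanism is called broadcasting. Two shapes are compatible if, comparing them from the last dimension backwards, each pair of sizes is equal or one of them is 1 (missing leading dimensions count as 1). Broadcasting avoids explicitly copying data, which reduces the memory footprint, simplifies the code and can improve performance.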

Consider $x \in \mathbb{R}^{3 \times 5}$ and $y \in \mathbb{R}^{5}$. The value of $x + y$ will be given by $x + y^\star$ where $y^\star \in \mathbb{R}^{3 \times 5}$ is a matrix whose rows are all equal to $y$.

In [106]:
x = torch.randn(3,5)
y = torch.randn(5)

# Use broadcasting
print(x + y)

# More explicit but lengthy and less memory efficient, since y* is built explicitly
print(x + torch.stack([y for i in range(3)], dim = 0))

# To add a vector of length 3 to every column, do the following
z = torch.randn(3)

# First, reshape the vector to (3,1); Pytorch broadcasting handles the rest
z = z.reshape((3,1))
print(x + z)

tensor([[-1.8343, -1.3276, -0.1556, -2.9641,  0.3843],
        [-1.7685, -1.1395, -0.1941, -0.3524, -0.7120],
        [-0.0455, -2.1708, -0.1397, -1.9597, -0.1432]])
tensor([[-1.8343, -1.3276, -0.1556, -2.9641,  0.3843],
        [-1.7685, -1.1395, -0.1941, -0.3524, -0.7120],
        [-0.0455, -2.1708, -0.1397, -1.9597, -0.1432]])
tensor([[-1.9162, -1.5888, -0.8377, -2.8165, -0.9353],
        [ 0.1495,  0.5992,  1.1237,  1.7951, -0.0318],
        [-0.3130, -2.6175, -1.0074, -1.9976, -1.6484]])
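The broadcasting rule can be checked without creating any data using `torch.broadcast_shapes`. The sketch below also builds a small table from a column of shape (3, 1) and a row of shape (1, 5). It then shows the error raised when trailing sizes clash. The specific shapes are our own examples.

```python
import torch

print(torch.broadcast_shapes((3, 5), (5,)))    # torch.Size([3, 5])
print(torch.broadcast_shapes((3, 1), (1, 5)))  # torch.Size([3, 5])

col = torch.arange(3).reshape(3, 1)
row = torch.arange(5).reshape(1, 5)
table = col * 10 + row   # (3, 1) and (1, 5) broadcast to (3, 5): entry (i, j) is 10 * i + j
print(table)

broadcast_failed = False
try:
    torch.randn(3, 5) + torch.randn(3)   # trailing sizes 5 and 3 are incompatible
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```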


### 1.2.3 - Matrix and Tensor Operations

- Dot product between two 1D tensors: $\mathtt{torch.dot}$; tensor norm: $\mathtt{torch.norm}$
- Matrix-matrix and matrix-vector multiplication: $\mathtt{torch.matmul}$; batched matrix-matrix multiplication of 3D tensors: $\mathtt{torch.bmm}$
- Matrix inversion: $\mathtt{torch.inverse}$
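A short sketch of the operations listed above. The vectors, the batch sizes and the invertible matrix `B` are arbitrary examples of our own. `B` is symmetric and diagonally dominant, which guarantees that it is invertible.

```python
import torch

u = torch.tensor([1.0, 2.0, 2.0])
v = torch.tensor([0.0, 1.0, -1.0])
print(torch.dot(u, v))   # 1*0 + 2*1 + 2*(-1) = 0
print(torch.norm(u))     # Euclidean norm: sqrt(1 + 4 + 4) = 3

M = torch.randn(4, 3)
print(torch.matmul(M, u).size())  # matrix-vector product: torch.Size([4])

# torch.bmm: 10 independent products of a (4 x 3) matrix with a (3 x 2) matrix
batch_a = torch.randn(10, 4, 3)
batch_b = torch.randn(10, 3, 2)
batch_prod = torch.bmm(batch_a, batch_b)
print(batch_prod.size())          # torch.Size([10, 4, 2])

B = torch.tensor([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
B_inv = torch.inverse(B)
torch.testing.assert_close(B @ B_inv, torch.eye(3), atol=1e-5, rtol=1e-5)
```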

Singular Value Decomposition: any matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ can be written as:

$$\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^\top$$

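where $\mathbf{U} \in \mathbb{R}^{m \times m}$ and $\mathbf{V} \in \mathbb{R}^{n \times n}$ are orthogonal and $\mathbf{D} \in \mathbb{R}^{m \times n}$ is a (possibly rectangular) diagonal matrix with nonnegative diagonal entries, the singular values. Pytorch computes this decomposition with $\mathtt{torch.svd}$ (recent versions recommend $\mathtt{torch.linalg.svd}$ instead).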
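A minimal sketch of computing an SVD and checking the reconstruction $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^\top$. It uses the reduced ("thin") form, `full_matrices=False`, which drops the zero part of the rectangular $\mathbf{D}$. Note the difference in return values: `torch.linalg.svd` returns $\mathbf{V}^\top$ (named `Vh` here), whereas the older `torch.svd` returns $\mathbf{V}$ itself. The $5 \times 3$ matrix is an arbitrary example.

```python
import torch

A = torch.randn(5, 3)

# torch.linalg.svd returns U, the singular values S (a 1D tensor) and Vh = V^T
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
print(U.size(), S.size(), Vh.size())  # torch.Size([5, 3]) torch.Size([3]) torch.Size([3, 3])
A_rebuilt = U @ torch.diag(S) @ Vh
torch.testing.assert_close(A_rebuilt, A, atol=1e-5, rtol=1e-5)

# The older torch.svd returns V (not V^T), in reduced form by default
U2, S2, V2 = torch.svd(A)
torch.testing.assert_close(U2 @ torch.diag(S2) @ V2.T, A, atol=1e-5, rtol=1e-5)
```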

## 1.3 - Compute the gradient of a function

# 2 - Several Pytorch modules for neural network training

# Exercise