# File summary

In PyTorch, everything is built on tensor operations. Tensors are similar to NumPy arrays, which we are all familiar with. However, there are a few differences:

1) PyTorch tensors support automatic differentiation, whereas in NumPy we have to derive and code the gradients ourselves or rely on some other tool.

2) Tensors can be stored and processed on a GPU, so they are preferable for large, computationally heavy calculations.

3) Hence, tensors are the more common choice in Deep Learning. NumPy, however, is sufficient for many classical Machine Learning problems and remains a simple and well-known tool, so it is probably more popular there.
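
The first two differences can be illustrated with a minimal sketch. The function $y = \sum_i x_i^2$ and the variable names are chosen here purely for illustration, and the code assumes it should fall back to the CPU when no CUDA device is available.

```python
import torch

# Difference 1: automatic differentiation.
# For y = sum(x_i^2), the gradient with respect to x is 2 * x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()   # fills x.grad with dy/dx
print(x.grad)  # tensor([2., 4., 6.])

# Difference 2: GPU support.
# Use a CUDA device if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
z = torch.ones(2, 2, device=device)
print(z.device)
```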

# 1 Basics

In [1]:
import torch
import numpy as np

### 1.1 Tensors

In [2]:
# a torch tensor is like an np.array
torch.tensor([1.0, 2.0], requires_grad=True)
# requires_grad=True tells PyTorch to track operations on this tensor so that gradients with respect to it can be computed

tensor([1., 2.], requires_grad=True)

In [3]:
# create an uninitialized tensor of size 3 x 2 (its values are arbitrary and not guaranteed to be zero)
torch.empty(3, 2)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
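
Note that `torch.empty` only allocates memory, so the zeros printed above are not guaranteed. A short sketch of the safer alternative, `torch.zeros`, when zeros are actually required:

```python
import torch

e = torch.empty(3, 2)  # allocated but uninitialized: contents are arbitrary
z = torch.zeros(3, 2)  # explicitly filled with zeros

print(e.shape)  # only the shape of e is reliable, not its values
print(z)
```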

In [4]:
# create a tensor of ones
torch.ones(4, 4)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [5]:
# create a 2 x 3 tensor with random values drawn uniformly from [0, 1)
torch.rand(2, 3)

tensor([[0.3492, 0.3032, 0.5313],
        [0.3499, 0.4998, 0.5187]])

### 1.2 Operations with tensors

In [6]:
a = torch.tensor([[1,2], [3,4]])
print(a)
print()
b = torch.tensor([[5,6], [7,8]])
print(b)

tensor([[1, 2],
        [3, 4]])

tensor([[5, 6],
        [7, 8]])


In [7]:
a + b

tensor([[ 6,  8],
        [10, 12]])

In [8]:
# element-wise multiplication
a * b

tensor([[ 5, 12],
        [21, 32]])

In [9]:
# matrix multiplication
a @ b

tensor([[19, 22],
        [43, 50]])

### 1.3 Some additional things

In [10]:
# convert tensor to numpy array (for a CPU tensor, the array shares memory with the tensor)
b = torch.tensor([[5,6], [7,8]])
b = b.numpy()
type(b)

numpy.ndarray

In [11]:
# convert numpy array to tensor
a = np.array([[1,2], [3,4]])
b = torch.from_numpy(a)
a += 1
print(b)
# note that when we change the initial array, the tensor changes as well, because torch.from_numpy shares memory with the array

tensor([[2, 3],
        [4, 5]])
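
If this coupling is not wanted, the data can be copied instead. The sketch below contrasts `torch.from_numpy`, which shares memory, with `torch.tensor`, which copies; the names `shared` and `copied` are introduced here only for illustration.

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # makes an independent copy of the data

a += 1  # in-place update of the NumPy array

print(shared)  # reflects the update
print(copied)  # still holds the original values
```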


# 2 Gradient Descent 

Here I want to show the difference between implementing the Gradient Descent algorithm in PyTorch and in NumPy, using the simplest possible case. I will not explain what Gradient Descent is. For more information, please check one of my previous projects (https://github.com/AlgazinovAleksandr/My_projects/blob/main/Convex_optimization.ipynb).

### 2.1 Example

Suppose we have a vector x = [1,2,3,4,5,6] and a vector y, where $y_i = 3x_i + 7$. In general, $y = mx + b$, and we will try to find the parameters m and b using Gradient Descent.

In [12]:
x = np.array([1,2,3,4,5,6])
y = 3 * x + 7

### 2.2 NumPy

We will start by writing a classical Gradient Descent implementation with hand-derived gradients. Note that NumPy plays only a minor role here: it is used to create the arrays and to average the per-sample gradients.

In [13]:
def gradient_descent_numpy(x, y, num_epochs=5000, learning_rate=.01):
    
    m = 1
    b = 1
    
    for i in range(num_epochs):
        
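        y_pred = m * x + b
        # hand-derived gradients of the mean squared error with respect to m and b
        grad_m = np.mean(2 * (y_pred - y) * x)
        grad_b = np.mean(2 * (y_pred - y))
        # step against the gradient
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b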
            
    return m, b

In [14]:
m, b = gradient_descent_numpy(x, y)
m, b

(3.000000014098132, 6.999999939643098)

Everything seems simple, except that we need to know the gradients. Deriving them by hand can be complicated, or at least time-consuming. In such cases torch becomes very useful.
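
For this example the gradients are still easy to derive. Assuming the loss is the mean squared error over the $n$ data points (the loss that the update rules in `gradient_descent_numpy` correspond to), the chain rule gives:

$$L(m, b) = \frac{1}{n}\sum_{i=1}^{n} (m x_i + b - y_i)^2$$

$$\frac{\partial L}{\partial m} = \frac{2}{n}\sum_{i=1}^{n} (m x_i + b - y_i)\, x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n} (m x_i + b - y_i)$$

These are exactly the expressions computed as `grad_m` and `grad_b` in the code above.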

### 2.3 PyTorch

We already know that if we set requires_grad=True, gradients can be calculated automatically when we ask for them. This makes the process simpler. We will talk about differentiation in PyTorch in more detail in the following Jupyter notebooks; here I just want to show an example.

In [15]:
x = torch.tensor([1,2,3,4,5,6])
y = 3 * x + 7

In [16]:
def gradient_descent_torch(x, y, num_epochs=5000, learning_rate=.01):
    
    m = torch.tensor(1.0, requires_grad=True)
    b = torch.tensor(1.0, requires_grad=True)
    optimizer = torch.optim.SGD([m, b], lr=learning_rate)
    
    for i in range(num_epochs):
    
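        y_pred = m * x + b
        loss = torch.mean((y_pred - y)**2)  # mean squared error
        loss.backward()        # compute d(loss)/dm and d(loss)/db
        optimizer.step()       # update m and b using these gradients
        optimizer.zero_grad()  # reset the gradients, which would otherwise accumulate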
    
    return m, b

In [17]:
m, b = gradient_descent_torch(x, y)
float(m), float(b)

(3.0000157356262207, 6.999932765960693)

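As we can see, we did not specify the gradients ourselves and still obtained approximately the same answer as before. The small difference is likely due to PyTorch using 32-bit floats by default, while NumPy uses 64-bit floats.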
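
As a sanity check, the gradients computed by autograd can be compared with the hand-derived ones for a single step. This is a minimal sketch: it uses float data and the same starting values m = b = 1, and the names `residual`, `grad_m_manual`, and `grad_b_manual` are introduced here only for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 3 * x + 7

m = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

# Gradients from autograd
loss = torch.mean((m * x + b - y) ** 2)
loss.backward()

# Hand-derived gradients of the same mean squared error
with torch.no_grad():
    residual = m * x + b - y
    grad_m_manual = torch.mean(2 * residual * x)
    grad_b_manual = torch.mean(2 * residual)

print(m.grad, grad_m_manual)
print(b.grad, grad_b_manual)
print(torch.allclose(m.grad, grad_m_manual), torch.allclose(b.grad, grad_b_manual))
```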

# References

1) https://www.youtube.com/watch?v=c36lUUr864M&t=2181s - a great PyTorch course for beginners

2) https://github.com/AlgazinovAleksandr/My_projects/blob/main/Convex_optimization.ipynb - my project on Convex Optimization

3) https://pytorch.org/docs/stable/index.html - PyTorch documentation