# pytorch (very) fast tutorial

***pytorch*** is a package designed to provide tools for fast development of Gradient-based deep learning (GBDL).

The key data structure in pytorch is the multidimensional array, which is called a ***tensor***.

Let's play a little with tensors and their most frequent operations.

In [1]:
import torch

In [2]:
a = torch.randn(3,4)
print(a)
# Prints a 3 x 4 tensor sampled from a standard normal distribution;
# the values change on every run

tensor([[ 0.3561, -0.7537, -0.5583, -2.0276],
        [-0.8399,  0.3919, -1.6627, -0.9940],
        [ 0.4773,  0.3329, -0.5464, -1.1929]])


Tensors use 0-based indexing. Indexing a single element returns a 0-dimensional tensor:

In [3]:
a[1,1]

tensor(0.3919)

If the value is wanted as a plain Python number, use item():

In [4]:
a[1,1].item()

0.39187049865722656
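
As a small sketch of the two cells above (the names `elem` and `value` are illustrative, not part of the original cells), the element returned by indexing has zero dimensions, and `item()` converts it to a plain Python float:

```python
import torch

a = torch.randn(3, 4)
elem = a[1, 1]           # indexing one position returns a 0-dimensional tensor
print(elem.dim())        # 0
print(elem.shape)        # torch.Size([])
value = elem.item()      # plain Python float holding the same value
print(type(value))
```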

Tensors can also be accessed with slices:

In [5]:
print(a[0,:])
a[:,0]

tensor([ 0.3561, -0.7537, -0.5583, -2.0276])


tensor([ 0.3561, -0.8399,  0.4773])
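
One detail worth keeping in mind: basic slices are views that share memory with the original tensor. A minimal sketch, using a tensor of zeros (chosen here only so the change is easy to see):

```python
import torch

a = torch.zeros(3, 4)
row = a[0, :]        # first row, shape (4,)
col = a[:, 0]        # first column, shape (3,)
row[1] = 5.0         # writing through the slice also changes a
print(a[0, 1])       # tensor(5.)
print(row.shape, col.shape)
```

If an independent copy is needed, `row.clone()` can be used instead.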

Vectorized operations can be run on tensors:

In [6]:
a = torch.randn(2,2)
print('a', a)
b = torch.randn(2,2)
print('b',b)
c = a + b # elementwise sum operation
print('a + b',c)
d = a * b # elementwise product operation
print('a * b',d)
e = a.mm(b) # matrix multiplication operation
print('A * B',e)
f = a.matmul(b[:,0]) # Matrix-vector multiplication
print('A * b',f)

a tensor([[ 0.1216, -0.6360],
        [ 0.5570,  0.8360]])
b tensor([[0.4479, 0.7091],
        [0.1676, 0.2687]])
a + b tensor([[0.5695, 0.0731],
        [0.7245, 1.1047]])
a * b tensor([[ 0.0545, -0.4510],
        [ 0.0933,  0.2247]])
A * B tensor([[-0.0521, -0.0847],
        [ 0.3895,  0.6196]])
A * b tensor([-0.0521,  0.3895])
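
The same products can also be written with the `@` operator. The sketch below (variable names are illustrative) checks that it agrees with `mm` and `matmul`, and that multiplying by the first column of b gives the first column of the matrix product, as in the output above:

```python
import torch

a = torch.randn(2, 2)
b = torch.randn(2, 2)

# a @ b is the operator form of a.matmul(b); for 2-D tensors it matches a.mm(b)
same_matrix = torch.allclose(a @ b, a.mm(b))
# matrix-vector product with the first column of b
same_vector = torch.allclose(a @ b[:, 0], a.matmul(b[:, 0]))
# the matrix-vector result equals the first column of the matrix-matrix result
first_col = torch.allclose((a @ b)[:, 0], a @ b[:, 0], atol=1e-6)
print(same_matrix, same_vector, first_col)
```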


A given tensor can be transformed into another tensor with different dimensions (we can rearrange its elements into other dimensions, e.g. from 1 dimension to 2 or more dimensions).
Several examples are given below:

In [7]:
a = torch.randn(2,2) # 2 x 2 dimensions
print(a)
b = a.unsqueeze(-1) # 2 x 2 x 1 dimensions
print(b)
b = a.unsqueeze(0) # 1 x 2 x 2 dimensions
print(b)
c = a.view(4,1) # rearrange data as 4 x 1
print(c)
a = torch.randn(2) # 1 dimension of size 2
print(a)
b = a.unsqueeze(-1) # 2 x 1 dimensions
print(b)
c = a.expand(3,2) # 3 x 2 dimensions: a is repeated along a new first dimension
print(c)

tensor([[-0.0457,  1.8301],
        [-0.0730,  0.7090]])
tensor([[[-0.0457],
         [ 1.8301]],

        [[-0.0730],
         [ 0.7090]]])
tensor([[[-0.0457,  1.8301],
         [-0.0730,  0.7090]]])
tensor([[-0.0457],
        [ 1.8301],
        [-0.0730],
        [ 0.7090]])
tensor([-0.8775, -1.4500])
tensor([[-0.8775],
        [-1.4500]])
tensor([[-0.8775, -1.4500],
        [-0.8775, -1.4500],
        [-0.8775, -1.4500]])
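
To double-check the dimensions produced by each call, the shapes can be printed directly. This is a short sketch; the names `v`, `col` and `rows` are illustrative:

```python
import torch

a = torch.randn(2, 2)
print(a.unsqueeze(-1).shape)   # torch.Size([2, 2, 1])
print(a.unsqueeze(0).shape)    # torch.Size([1, 2, 2])
print(a.view(4, 1).shape)      # torch.Size([4, 1])

v = torch.randn(2)
col = v.unsqueeze(-1)          # shape (2, 1)
rows = v.expand(3, 2)          # shape (3, 2), v repeated along a new first dimension
print(col.shape, rows.shape)

# expand does not copy data: the new dimension has stride 0
print(rows.stride())           # (0, 1)
```

Because `expand` returns a view over the same memory, it is cheap, but the result should be copied with `clone()` before writing to it in place.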


How can we detect if CUDA is available in pytorch?

In [8]:
do_I_have_cuda = torch.cuda.is_available()
# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device('cuda' if do_I_have_cuda else 'cpu')
a = a.to(device)
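
Once a device has been chosen, tensors can be created on it directly or moved to it afterwards. A minimal sketch (the tensors `x`, `y` and `z` are illustrative); it runs on the CPU when CUDA is not available:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(2, 2, device=device)   # created directly on the chosen device
y = torch.ones(2, 2).to(device)        # created on the CPU, then moved
z = x + y                              # both operands live on the same device
print(z.device)
```

Operands of an operation generally need to be on the same device, which is why both tensors are placed on `device` before adding them.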

In ANNs (Artificial Neural Networks) there are three basic ways to apply the weight updates computed by backpropagation when there are n training examples:

*   ***Full Gradient Descent***: one update step using all n examples
*   ***Stochastic Gradient Descent***: one update step for each example (n steps per pass over the data)
*   ***Mini-batch Stochastic Gradient Descent***: one update step for each mini-batch of m examples (n/m steps per pass over the data)

Why is mini-batch so commonly used? Because it provides a more stable gradient estimate than single-example updates, and because processing a batch at once is computationally efficient.
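
The three variants differ only in the batch size. The sketch below illustrates this on a made-up linear regression problem with a hand-written gradient; the data, the function `run_epoch` and the learning rate are assumptions for illustration, not part of the tutorial:

```python
import torch

torch.manual_seed(0)
n = 8                                   # number of training examples
X = torch.randn(n, 2)                   # toy inputs (made up for this sketch)
y = X @ torch.tensor([2.0, -1.0])       # toy targets from a known linear rule

def run_epoch(w, batch_size, lr=0.1):
    """One pass over the n examples; returns the new weights and the step count."""
    steps = 0
    for idx in torch.split(torch.randperm(n), batch_size):
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb
        grad = 2 * (xb.t() @ err) / len(idx)   # gradient of the mean squared error
        w = w - lr * grad                      # one update step
        steps += 1
    return w, steps

w0 = torch.zeros(2)
_, full_steps = run_epoch(w0, batch_size=n)    # full gradient descent
_, sgd_steps = run_epoch(w0, batch_size=1)     # stochastic gradient descent
_, mini_steps = run_epoch(w0, batch_size=4)    # mini-batch, m = 4
print(full_steps, sgd_steps, mini_steps)       # 1 8 2
```

With n = 8 and m = 4, one pass over the data takes n/m = 2 update steps.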

***Autograd***: Automatic differentiation

pytorch contains methods for automatically computing gradients (frequently used in the backpropagation phase during training).

In [None]:
x = torch.randn(1, requires_grad=True)
# x is a tensor that will record gradients
print(x)
y = x.exp() # e^x
print(y)
y.backward() # For every tensor x_i with requires_grad=True used to compute y,
             #   dy/dx_i is computed and stored in x_i.grad
             # Here, dy/dx = e^x = y
print(x.grad, y)
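
A second small check, sketched with a different function (the choice of y = sum(x^2) is only an illustration): the gradient computed by autograd should match the analytic derivative 2x.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()      # scalar output, so backward() needs no arguments
y.backward()            # fills x.grad with dy/dx
print(x.grad)

# Compare with the analytic derivative 2x (detach() drops the graph history)
matches = torch.allclose(x.grad, 2 * x.detach())
print(matches)          # True
```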

Gradients accumulate across calls to backward: each call adds to x.grad instead of overwriting it. Separately, when gradients are not needed at all, tracking can be deactivated with torch.no_grad():

In [11]:
x = torch.randn(1, requires_grad=True)
print("Original x:", x)

# Perform computations without tracking gradients
with torch.no_grad():
    y = x.exp()  # e^x
    print("Computed y without gradient tracking:", y)

# Outside the no_grad block, gradient tracking is active again
# (x still has requires_grad=True, so no extra call is needed)
y = x.exp()
print("Computed y with gradient tracking:", y)

# Backward pass to compute gradients
y.backward()

# The gradient is now stored in x.grad
print("Gradients of x:", x.grad)


Original x: tensor([-0.3780], requires_grad=True)
Computed y without gradient tracking: tensor([0.6852])
Computed y with gradient tracking: tensor([0.6852], grad_fn=<ExpBackward0>)
Gradients of x: tensor([0.6852])
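
To make the accumulation concrete, here is a minimal sketch (the tensor values and the names `g1`, `g2`, `g3` are made up for illustration): calling backward twice adds the gradients together, `x.grad.zero_()` resets them, and `torch.no_grad()` turns off gradient tracking.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

(3 * x).sum().backward()
g1 = x.grad.clone()
print(g1)                 # tensor([3., 3.])

(3 * x).sum().backward()  # a second call adds to the stored gradient
g2 = x.grad.clone()
print(g2)                 # tensor([6., 6.])

x.grad.zero_()            # reset before the next backward pass
g3 = x.grad.clone()
print(g3)                 # tensor([0., 0.])

with torch.no_grad():
    z = 3 * x             # no graph is recorded inside this block
print(z.requires_grad)    # False
```

In a training loop, the gradients are typically reset once per update step so that each step only uses the gradient of the current batch.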
