# PyTorch

PyTorch is composed of three main parts:
1. Tensors: NumPy-like arrays that can also run on GPUs.
2. Autograd: an automatic differentiation engine that computes the gradients of tensor operations (e.g., for backpropagation).
3. A deep learning library that contains pre-trained models, loss functions, etc.
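The "NumPy-like" part can be illustrated with a short sketch, assuming NumPy is installed alongside PyTorch: `torch.from_numpy` wraps an existing array without copying it, and `.numpy()` converts a CPU tensor back.

```python
import numpy as np
import torch

# Wrap a NumPy array as a tensor; both objects share the same memory
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)

# A change made through the array is visible through the tensor
arr[0] = 10.0
print(t)

# The dtype is inherited from NumPy (float64 here, not PyTorch's float32 default)
print(t.dtype)

# Convert back to NumPy (this only works for tensors that live on the CPU)
back = t.numpy()
print(back)
```

Because the memory is shared, use `torch.tensor(arr)` instead when an independent copy is needed.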

In [1]:
import torch
torch.__version__


'2.7.0+cu126'

In [2]:
print(torch.cuda.is_available())

True
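A common pattern built on `torch.cuda.is_available()` is device-agnostic code. The sketch below picks the GPU when one is visible and falls back to the CPU otherwise, so it runs on either kind of machine.

```python
import torch

# Select the GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the selected device
x = torch.tensor([1.0, 2.0, 3.0], device=device)

# The result of an operation stays on the same device as its inputs
y = x * 2
print(y.device)

# Move the result back to the CPU, e.g. to convert it to NumPy
y_cpu = y.cpu()
print(y_cpu)
```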


In [4]:
# 0D tensor (scalar)
tensor0d = torch.tensor(1)
print(tensor0d)
# 1D tensor (vector)
tensor1d = torch.tensor([1, 2, 3])
print(tensor1d)
# 2D tensor (matrix)
tensor2d = torch.tensor([[1, 2], [3, 4]])
print(tensor2d)
# 3D tensor
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(tensor3d)

tensor(1)
tensor([1, 2, 3])
tensor([[1, 2],
        [3, 4]])
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])
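The "0D/1D/2D/3D" labels correspond to the number of dimensions, which each tensor reports through `ndim` (or the equivalent `dim()` method). A quick check on the same four tensors:

```python
import torch

tensors = {
    "0D": torch.tensor(1),
    "1D": torch.tensor([1, 2, 3]),
    "2D": torch.tensor([[1, 2], [3, 4]]),
    "3D": torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]),
}

# ndim reports the number of dimensions of each tensor
ranks = {name: t.ndim for name, t in tensors.items()}
print(ranks)  # {'0D': 0, '1D': 1, '2D': 2, '3D': 3}

# A 0D tensor holds a single value, which .item() returns as a Python number
print(tensors["0D"].item())  # 1
```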


In [5]:
# integer tensors default to 64-bit integers (torch.int64)
print(tensor1d.dtype)

torch.int64


In [6]:
# floating-point tensors default to 32-bit precision (torch.float32)
floatvec = torch.tensor([1.0, 2.0, 3.0])
print(floatvec.dtype)

torch.float32


In [7]:
# convert to another dtype
tensor1d_float = tensor1d.to(torch.float32)
print(tensor1d_float.dtype)

torch.float32
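Besides `.to(...)`, the dtype can be fixed at creation time, and PyTorch promotes types when tensors of different dtypes are combined. A small sketch of both behaviours:

```python
import torch

ints = torch.tensor([1, 2, 3])

# The dtype can also be chosen when the tensor is created
floats = torch.tensor([1, 2, 3], dtype=torch.float32)

# .float() is a shorthand for .to(torch.float32)
print(ints.float().dtype)  # torch.float32

# Adding an integer tensor and a float tensor promotes the result to float
mixed = ints + torch.tensor([0.5, 0.5, 0.5])
print(mixed.dtype)  # torch.float32
print(mixed)
```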


In [8]:
# shape of a tensor
print(tensor0d.shape)
print(tensor1d.shape)
print(tensor2d.shape)
print(tensor3d.shape)

torch.Size([])
torch.Size([3])
torch.Size([2, 2])
torch.Size([2, 2, 2])
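`shape` returns a `torch.Size`, which behaves like a Python tuple. The sketch below shows the equivalent `.size()` method, indexing a single dimension, and counting elements with `numel()`.

```python
import torch

tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# .size() returns a torch.Size equal to .shape
print(tensor3d.size() == tensor3d.shape)  # True

# torch.Size is a tuple subclass, so individual dimensions can be indexed
print(tensor3d.shape[0])  # 2

# numel() gives the total number of elements (the product of the dimensions)
print(tensor3d.numel())  # 8
```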


In [10]:
# reshape a tensor
tensor2d.reshape(4, 1)

tensor([[1],
        [2],
        [3],
        [4]])

In [11]:
# reshape a tensor with view (common method); the result shares memory with the original
tensor2d.view(4, 1)

tensor([[1],
        [2],
        [3],
        [4]])
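`reshape` and `view` give the same result here, but they are not interchangeable: `view` never copies, so it shares memory with the original and needs a compatible memory layout, while `reshape` returns a view when it can and copies otherwise. The sketch below shows this with a transposed (non-contiguous) tensor; passing `-1` lets PyTorch infer that dimension.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])

# view shares memory: writing through the view changes the original tensor
v = t.view(-1)  # -1 lets PyTorch infer the size (4 here)
v[0] = 100
print(t[0, 0].item())  # 100

# The transpose is a non-contiguous view of the same data
tt = t.T
print(tt.is_contiguous())  # False

# view cannot flatten this non-contiguous tensor...
try:
    tt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# ...but reshape can, because it copies the data when necessary
r = tt.reshape(-1)
print(r.tolist())  # [100, 3, 2, 4]
```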

In [12]:
# Transpose
tensor2d.T

tensor([[1, 3],
        [2, 4]])
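`.T` is meant for 2D tensors (recent PyTorch versions warn when it is used on tensors with a different number of dimensions). For higher-dimensional tensors, `transpose` swaps two chosen dimensions and `permute` reorders all of them, as this sketch shows:

```python
import torch

x = torch.zeros(2, 3, 4)

# transpose(dim0, dim1) swaps two specific dimensions
print(x.transpose(0, 1).shape)  # torch.Size([3, 2, 4])

# permute reorders all dimensions at once
print(x.permute(2, 0, 1).shape)  # torch.Size([4, 2, 3])

# For a 2D tensor, .T is the same as transpose(0, 1)
m = torch.tensor([[1, 2], [3, 4]])
print(torch.equal(m.T, m.transpose(0, 1)))  # True
```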

In [13]:
# matrix multiplication with the matmul method
tensor2d.matmul(tensor2d.T)

tensor([[ 5, 11],
        [11, 25]])

In [14]:
# matrix multiplication with the @ operator (equivalent to matmul)
tensor2d @ tensor2d.T

tensor([[ 5, 11],
        [11, 25]])
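To see where the numbers come from, each entry of the product is the dot product of a row of the first matrix and a column of the second (e.g., 1·1 + 2·2 = 5 and 1·3 + 2·4 = 11). The sketch below rebuilds the product with explicit loops and contrasts it with `*`, which multiplies element by element.

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])
B = A.T

# Entry (i, j) of A @ B is the dot product of row i of A and column j of B
manual = torch.zeros(2, 2, dtype=A.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = (A[i, :] * B[:, j]).sum()
print(torch.equal(manual, A @ B))  # True

# By contrast, * is element-wise multiplication
print(A * A)
```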

In [16]:
# Suppose we have a model with the weight w1 and the bias b.
# To compute the gradients, PyTorch builds a computation graph in the background,
# as shown in the following figure
import torch.nn.functional as F

y = torch.tensor([1.0]) # true label
x1 = torch.tensor([1.1]) # input
w1 = torch.tensor([2.2]) # weight
b = torch.tensor([0.0]) # bias

z = x1 * w1 + b
a = torch.sigmoid(z) # predicted probability (activation)

loss = F.binary_cross_entropy(a, y)
print("[a]", a)
print("[y]", y)
print("[loss]", loss)

[a] tensor([0.9183])
[y] tensor([1.])
[loss] tensor(0.0852)
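For a single prediction, binary cross-entropy is L = -[y·log(a) + (1 - y)·log(1 - a)], averaged over the elements. A minimal sketch that recomputes the loss above by hand and compares it with `F.binary_cross_entropy`:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])   # true label
x1 = torch.tensor([1.1])  # input
w1 = torch.tensor([2.2])  # weight
b = torch.tensor([0.0])   # bias

a = torch.sigmoid(x1 * w1 + b)

# Binary cross-entropy written out explicitly, then averaged
manual_loss = -(y * torch.log(a) + (1 - y) * torch.log(1 - a)).mean()
builtin_loss = F.binary_cross_entropy(a, y)

print(manual_loss, builtin_loss)
print(torch.allclose(manual_loss, builtin_loss))  # True
```

With y = 1 the second term vanishes, so the loss reduces to -log(a).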


The following figure illustrates the computation graph of the 'model' above.

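As long as at least one leaf node of the graph (here, a parameter such as `w1` or `b`) has the `requires_grad` attribute set to `True`, PyTorch records the graph while the operations run. Every tensor computed from such a leaf, including the final node `loss = L(a,y)`, then also has `requires_grad=True`.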
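A short sketch of this propagation: a result computed from a tensor with `requires_grad=True` inherits the flag and records the operation that created it in `grad_fn`, while the same computation on plain tensors records nothing.

```python
import torch

x1 = torch.tensor([1.1])                      # input: no gradient needed
w1 = torch.tensor([2.2], requires_grad=True)  # parameter: track gradients

z = x1 * w1

# requires_grad propagates from the leaf w1 to every result computed from it
print(z.requires_grad)  # True
# A non-leaf result stores the operation that created it
print(z.grad_fn is not None)  # True
print(w1.is_leaf, z.is_leaf)  # True False

# When no input requires gradients, no graph is recorded
z_plain = x1 * torch.tensor([2.2])
print(z_plain.requires_grad, z_plain.grad_fn)  # False None
```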

PyTorch computes the gradients from right to left through this graph, a process called backpropagation: it starts at the output node (the loss) and moves backward toward the inputs, applying the chain rule at each node.

In this way, PyTorch computes the gradient of the loss with respect to each parameter (weights and biases), which is then used to update these parameters during training.

![pytorch_automatic_differentiation.png](./images/pytorch_automatic_differentiation.png)
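For this small graph, the chain rule can also be worked out by hand. The derivation below uses the standard result that a sigmoid followed by binary cross-entropy gives a particularly simple derivative with respect to $z$:

$$
\begin{aligned}
z &= w_1 x_1 + b, \qquad a = \sigma(z), \qquad L = -\left[y \log a + (1 - y)\log(1 - a)\right] \\
\frac{\partial L}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad \frac{\partial a}{\partial z} = a(1 - a) \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial a}\,\frac{\partial a}{\partial z} = -y(1 - a) + (1 - y)a = a - y \\
\frac{\partial L}{\partial w_1} &= (a - y)\,x_1, \qquad \frac{\partial L}{\partial b} = a - y
\end{aligned}
$$

With $a \approx 0.9183$, $y = 1$ and $x_1 = 1.1$, this gives $\partial L/\partial b \approx -0.0817$ and $\partial L/\partial w_1 \approx -0.0898$, the same values autograd returns in the cells below.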


In [17]:
# In the previous cell PyTorch did not build the graph because no leaf
# tensor (terminal node) had requires_grad=True. Here w1 and b require
# gradients, so the graph is built
import torch.nn.functional as F
from torch.autograd import grad

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

z = x1 * w1 + b
a = torch.sigmoid(z)

loss = F.binary_cross_entropy(a, y)
# by default, the graph is freed once the gradients are computed;
# retain_graph=True keeps it so it can be reused later
grad_L_w1 = grad(loss, w1, retain_graph=True)
grad_L_b = grad(loss, b, retain_graph=True)

print(grad_L_w1)
print(grad_L_b)

(tensor([-0.0898]),)
(tensor([-0.0817]),)
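These values can be cross-checked against the closed-form gradients of a sigmoid followed by binary cross-entropy, where dL/dz = a - y, so dL/dw1 = (a - y)·x1 and dL/db = a - y. The sketch also shows that `grad` accepts several inputs at once, which computes both gradients from a single pass without needing `retain_graph`.

```python
import torch
import torch.nn.functional as F
from torch.autograd import grad

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)

# grad accepts a tuple of inputs and returns one gradient per input
grad_w1, grad_b = grad(loss, (w1, b))

# Closed-form gradients from the chain rule (no graph needed for these)
with torch.no_grad():
    manual_b = a - y
    manual_w1 = (a - y) * x1

print(torch.allclose(grad_w1, manual_w1), torch.allclose(grad_b, manual_b))  # True True
```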


In [18]:
# In practice, gradients are usually computed with the backward() method,
# which stores (accumulates) them in the .grad attribute of each leaf tensor
loss.backward()
print(w1.grad)
print(b.grad)

tensor([-0.0898])
tensor([-0.0817])
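Because `backward()` adds to `.grad` rather than overwriting it, gradients must be reset between training steps. The sketch below shows the accumulation, a reset, and one manual gradient descent update. `compute_loss` is a hypothetical helper introduced only for this illustration, and the learning rate of 0.1 is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

def compute_loss():
    # Hypothetical helper: rebuilds the graph on each call,
    # since backward() frees it by default
    a = torch.sigmoid(x1 * w1 + b)
    return F.binary_cross_entropy(a, y)

# Calling backward twice without resetting adds the gradients together
compute_loss().backward()
first = b.grad.clone()
compute_loss().backward()
accumulated_ok = torch.allclose(b.grad, 2 * first)
print(accumulated_ok)  # True

# Reset the gradients before the next backward pass
w1.grad = None
b.grad = None

# One step of plain gradient descent
lr = 0.1
loss_before = compute_loss()
loss_before.backward()
with torch.no_grad():  # the update itself must not be recorded in the graph
    w1 -= lr * w1.grad
    b -= lr * b.grad

loss_after = compute_loss()
print(loss_after.item() < loss_before.item())  # True
```

In real training code, an optimizer such as `torch.optim.SGD` performs this update via `optimizer.step()`, and `optimizer.zero_grad()` resets the gradients.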
