# PyTorch Fundamentals

## Importing PyTorch

In [1]:
import torch
torch.__version__

'2.1.1'

## Tensor

Tensors are multi-dimensional arrays that generalise scalars, vectors and matrices.  
PyTorch tensors are objects of the `torch.Tensor` class.

## Scalar

Scalars are zero-dimensional tensors, or in other words, tensors that hold a single element.

In [2]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [3]:
scalar.ndim

0

The `item()` method returns the single element of a scalar as a plain Python number.  
It only works for tensors that hold exactly one element.

In [4]:
scalar.item()

7
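As a minimal sketch of the one-element rule, the snippet below calls `item()` on a tensor of shape `(1, 1)` and on a two-element vector. The names `single`, `value` and `item_failed` are only illustrative, and both `RuntimeError` and `ValueError` are caught because the exception type has differed between PyTorch releases.

```python
import torch

# item() works for any tensor holding exactly one element, whatever its shape.
single = torch.tensor([[7]])      # shape (1, 1), so not a zero-dimensional scalar
value = single.item()             # plain Python int

# A tensor with more than one element cannot be converted.
try:
    torch.tensor([5, 3]).item()
    item_failed = False
except (RuntimeError, ValueError):
    item_failed = True

print(value, type(value), item_failed)
```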

## Vector

Vectors are one-dimensional tensors.

In [5]:
vector = torch.tensor([5, 3])
vector.shape

torch.Size([2])

## Matrix

Matrices are two-dimensional tensors.  
It is conventional to name matrices and tensors using uppercase, whereas scalars and vectors are named using lowercase.  
The terms matrix and tensor are often used interchangeably.

In [6]:
MATRIX = torch.tensor([[1, 2], [3, 4]])
MATRIX.shape

torch.Size([2, 2])

## Initialising tensors

In [7]:
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.1635, 0.4552, 0.0026, 0.1222],
         [0.5348, 0.8299, 0.7334, 0.9770],
         [0.1373, 0.3247, 0.7063, 0.7918]]),
 torch.float32)

In [8]:
zeros = torch.zeros(size=(3, 4))
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [9]:
ones = torch.ones(size=(3, 4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

## Creating a range

In [10]:
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

## Creating a likeness

In [11]:
zeros = torch.zeros_like(zero_to_ten)
ones = torch.ones_like(zero_to_ten)
zeros, ones

(tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]))

## Datatypes

The most common datatype, and the default for floating-point data, is `torch.float32`, also called `torch.float`.  
`torch.float16` is also called `torch.half`.  
`torch.float64` is also called `torch.double`.  
When `dtype` is not given, `torch.tensor()` infers it from the data: floating-point values become `torch.float32` and integers become `torch.int64`.  
Setting the `requires_grad` parameter to `True` makes autograd record the operations on the tensor so that gradients can be computed.

In [12]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16, device=None, requires_grad=False)
float_16_tensor.shape, float_16_tensor.dtype, float_16_tensor.device

(torch.Size([3]), torch.float16, device(type='cpu'))
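The following sketch illustrates how the datatype is inferred when `dtype` is omitted, and what `requires_grad=True` enables. It assumes the global default dtype has not been changed with `torch.set_default_dtype()`. The variable names are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])            # integer data
floats = torch.tensor([1.0, 2.0, 3.0])    # floating-point data

print(ints.dtype)                 # inferred from the data
print(floats.dtype)               # follows the global default dtype
print(torch.get_default_dtype())

# With requires_grad=True, operations are recorded so gradients can be computed.
w = torch.tensor([3.0], requires_grad=True)
y = (w * 2).sum()
y.backward()                      # d(2w)/dw = 2
print(w.grad)
```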

## Getting information from tensors

In [13]:
sample_tensor = torch.rand(3, 4)
print(sample_tensor)
print(f"Shape of tensor: {sample_tensor.shape}")
print(f"Datatype of tensor: {sample_tensor.dtype}")
print(f"Device of tensor: {sample_tensor.device}")

tensor([[0.4584, 0.6134, 0.1268, 0.4145],
        [0.5393, 0.5050, 0.2049, 0.0521],
        [0.0531, 0.6261, 0.4240, 0.4981]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device of tensor: cpu


## Operations on tensors

In [14]:
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [15]:
tensor * 10

tensor([10, 20, 30])

In [16]:
tensor * tensor

tensor([1, 4, 9])

## Matrix multiplication

There are two rules for performing matrix multiplication.  
The inner dimensions must match - `(3, 2) @ (2, 4)` works as the inner dimensions are both `2`.  
The resulting matrix has the shape of the outer dimensions - `(3, 2) @ (2, 4)` results in a matrix of dimensions `(3, 4)`.

In [17]:
A = torch.rand(3, 2)
B = torch.rand(2, 4)
A @ B

tensor([[0.0443, 0.0967, 0.0794, 0.0809],
        [0.5437, 0.9895, 0.8872, 0.5852],
        [0.2664, 0.7250, 0.5410, 0.7851]])

In [18]:
torch.matmul(A, B)

tensor([[0.0443, 0.0967, 0.0794, 0.0809],
        [0.5437, 0.9895, 0.8872, 0.5852],
        [0.2664, 0.7250, 0.5410, 0.7851]])

In [19]:
t = torch.tensor([1, 2, 3])
torch.matmul(t, t)

tensor(14)

In [20]:
A = torch.rand(3, 2)
B = torch.rand(3, 2)
torch.mm(A, B.T)

tensor([[0.8957, 0.4147, 0.3146],
        [0.6063, 0.2824, 0.2141],
        [0.3311, 0.2969, 0.2105]])
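The inner-dimension rule can also be checked from the failing side. This sketch, with illustrative names, multiplies two `(3, 2)` matrices directly, which should raise an error, and then transposes the second operand so that the shapes become `(3, 2) @ (2, 3)`.

```python
import torch

A = torch.rand(3, 2)
B = torch.rand(3, 2)

try:
    A @ B                      # inner dimensions are 2 and 3, so this fails
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

C = A @ B.T                    # (3, 2) @ (2, 3) gives the outer dimensions (3, 3)
print(mismatch_raised, C.shape)
```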

`torch.nn.Linear` implements matrix multiplication between an input `x` and a weights matrix `A`, followed by the addition of a bias `b`.  
The operation it performs is represented by the equation $y = x \cdot A^T + b$.  
The `manual_seed()` function seeds the random number generator, so that the code that follows always produces the same output even though the linear layer is initialised with random weights and biases.

In [21]:
torch.manual_seed(30)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)
linear(x)

tensor([[1.8126, 0.8936, 0.6832, 0.6857, 1.4842, 0.0325],
        [3.6429, 1.8705, 2.1661, 2.1521, 2.2470, 1.0608],
        [5.4732, 2.8474, 3.6489, 3.6184, 3.0098, 2.0891]],
       grad_fn=<AddmmBackward0>)
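One way to convince yourself of the equation is to recompute the layer's output by hand from its `weight` and `bias` parameters. The sketch below does that under the same seed. `manual` is an illustrative name, and `torch.allclose` is used rather than exact equality because the two computations may round differently.

```python
import torch

torch.manual_seed(30)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)

# weight has shape (out_features, in_features) = (6, 2); bias has shape (6,)
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(manual, linear(x), atol=1e-6))
```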

## Aggregations

Some methods such as `mean()` require tensors to be of a floating-point or complex datatype (such as `torch.float32`).  
The returned result of these methods is in the form of a zero-dimensional tensor.

In [22]:
x = torch.arange(0, 100, 10)
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean: {x.type(torch.float32).mean()}")
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


The same operations are also available as functions in the `torch` namespace.

In [23]:
torch.min(x), torch.max(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(0), tensor(90), tensor(45.), tensor(450))

In [24]:
print(f"Position where maximum element occurs: {x.argmax()}")
print(f"Position where minimum element occurs: {x.argmin()}")

Position where maximum element occurs: 9
Position where minimum element occurs: 0
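To make the datatype requirement of `mean()` concrete, this sketch calls it on an integer tensor, which should fail, and then shows two ways of getting a result: converting first, or passing `dtype` so that the conversion happens inside the call. The variable names are illustrative.

```python
import torch

x = torch.arange(0, 100, 10)     # integer data, dtype torch.int64

try:
    x.mean()                     # integer input is rejected
    mean_failed = False
except RuntimeError:
    mean_failed = True

mean_a = x.float().mean()                 # convert, then aggregate
mean_b = x.mean(dtype=torch.float32)      # convert as part of the aggregation
print(mean_failed, mean_a, mean_b)
```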


## Changing datatype

In [25]:
tensor = torch.arange(0.0, 100.0, 10.0)
print(tensor)
tensor = tensor.type(torch.int8)
print(tensor)

tensor([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)


## Reshaping versus creating a view

The `reshape()` method returns a view that shares memory with the original tensor whenever possible, and a copy otherwise.  
The `view()` method always returns a tensor that shares memory with the original, and raises an error if the requested shape is incompatible with the tensor's memory layout.  
To ensure that an entirely new tensor is created, the `clone()` method is used.

In [26]:
x = torch.tensor([1, 2, 3, 4, 5, 6])
y = x.reshape(2, 3)
y[:, 2] = 7
print(x)
print(y)

tensor([1, 2, 7, 4, 5, 7])
tensor([[1, 2, 7],
        [4, 5, 7]])


In [27]:
z = x.view(2, 3)
z[:, 2] = 9
print(x)
print(z)

tensor([1, 2, 9, 4, 5, 9])
tensor([[1, 2, 9],
        [4, 5, 9]])


In [28]:
w = x.reshape(2, 3).clone()
w[:, 2] = 8
print(x)
print(w)

tensor([1, 2, 9, 4, 5, 9])
tensor([[1, 2, 8],
        [4, 5, 8]])
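The difference between the two methods only shows up when the tensor is not contiguous in memory. As a rough sketch with illustrative names, a transposed matrix is used here as the non-contiguous input: `view()` should refuse to flatten it, while `reshape()` falls back to making a copy.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
t = x.T                          # transposed view, not contiguous in memory

try:
    t.view(6)                    # no copy allowed, so this fails
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(6)              # reshape copies when a view is impossible
flat[0] = 100                    # the copy is independent, x is unchanged
print(view_failed, t.is_contiguous())
print(x)
print(flat)
```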


## Other operations on tensors

The `torch.stack()` function joins tensors of the same shape along a new dimension, and its `dim` parameter determines where that new dimension is inserted.

In [29]:
x = torch.arange(0, 5, 1)
a = torch.stack([x, x, x, x], dim=0)
print(a)
b = torch.stack([x, x, x, x], dim=1)
print(b)

tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3],
        [4, 4, 4, 4]])


In [30]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
print(torch.stack([x, y], dim = 0))
print(torch.stack([x, y], dim = 1))

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])


The `squeeze()` method removes all dimensions of size 1, and the `unsqueeze()` method inserts a dimension of size 1 at the given position.

In [31]:
tensor = torch.rand(1, 3, 1, 2)
print(tensor.shape)
tensor = tensor.squeeze()
print(tensor.shape)
tensor = tensor.unsqueeze(dim=0)
tensor = tensor.unsqueeze(dim=2)
print(tensor.shape)

torch.Size([1, 3, 1, 2])
torch.Size([3, 2])
torch.Size([1, 3, 1, 2])
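`squeeze()` also accepts a `dim` argument when only one particular dimension should be removed. The short sketch below, with illustrative names, shows that the dimension is dropped only if its size is 1. Otherwise the shape is left unchanged.

```python
import torch

tensor = torch.rand(1, 3, 1, 2)

first = tensor.squeeze(dim=0)    # dimension 0 has size 1, so it is removed
second = tensor.squeeze(dim=1)   # dimension 1 has size 3, so nothing changes
print(first.shape)
print(second.shape)
```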


The `permute()` method returns a view of the tensor with its dimensions rearranged in the given order.

In [32]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
x = x.permute(1, 0)
print(x)
print(x.shape)

tensor([[1, 4],
        [2, 5],
        [3, 6]])
torch.Size([3, 2])


In [33]:
x = torch.rand(3, 200, 200)
x = x.permute(1, 2, 0)
print(x.shape)

torch.Size([200, 200, 3])
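Because `permute()` returns a view, writing to the permuted tensor also changes the original. A small sketch of this, using illustrative names:

```python
import torch

x = torch.zeros(2, 3)
y = x.permute(1, 0)      # shape (3, 2), same underlying memory as x
y[0, 1] = 5.0            # element (0, 1) of y is element (1, 0) of x

same_memory = x.data_ptr() == y.data_ptr()
print(x)
print(same_memory)
```

If an independent tensor is needed, `clone()` can be called on the result.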


## Indexing

In [34]:
grid = torch.arange(1, 10, 1).reshape(1, 3, 3)
print(grid)
print(grid[0])
print(grid[0][0])
print(grid[0, :, 1])

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([1, 2, 3])
tensor([2, 5, 8])


## Tensors and NumPy arrays

In [35]:
import numpy as np

In [36]:
array = np.arange(0.0, 10.0)
print(array)
tensor = torch.from_numpy(array)
print(tensor)
array = tensor.numpy()
print(array)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=torch.float64)
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
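One detail worth knowing about the conversion above is that `torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` copies the data. The sketch below illustrates this with in-place changes to the array. The names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

array = np.arange(0.0, 5.0)
shared = torch.from_numpy(array)   # shares memory with array, dtype float64

array[0] = 100.0                   # in-place change to the array
print(shared)                      # the tensor sees the change

copied = torch.tensor(array)       # torch.tensor() copies the data
array[1] = -1.0
print(copied)                      # unaffected by the later change
```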


## Seeding

`torch.manual_seed()` sets the seed for random number generation on all devices.  
`torch.random.manual_seed()` is the same function under its full module path, which is why both calls below lead to identical random tensors.

In [37]:
torch.manual_seed(seed=30)
a = torch.rand(3, 4)
print(a)
torch.random.manual_seed(seed=30)
b = torch.rand(3, 4)
print(b)
print(a == b)

tensor([[0.9007, 0.7464, 0.4716, 0.8738],
        [0.7403, 0.7840, 0.8946, 0.6238],
        [0.4276, 0.8421, 0.7454, 0.6181]])
tensor([[0.9007, 0.7464, 0.4716, 0.8738],
        [0.7403, 0.7840, 0.8946, 0.6238],
        [0.4276, 0.8421, 0.7454, 0.6181]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])


## Using GPU

In [38]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

The `to()` method returns a tensor on the given device. It does not modify the tensor in place, so the result has to be assigned.

In [39]:
tensor = torch.rand(2, 2)
tensor = tensor.to(device)
tensor

tensor([[0.8883, 0.4127],
        [0.1748, 0.3426]])

The `cpu()` method puts a tensor on the CPU.  
One reason for having to put a tensor on the CPU is to make it compatible with NumPy.

In [40]:
tensor = tensor.cpu().numpy()
tensor

array([[0.88825244, 0.41274506],
       [0.17482036, 0.3426432 ]], dtype=float32)
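Putting the pieces of this section together, a device-agnostic sketch might look like the following. It creates the tensor directly on the chosen device through the `device` argument and moves it back to the CPU before converting to NumPy. The names `moved` and `array` are illustrative.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the tensor directly on the target device instead of moving it later.
tensor = torch.rand(2, 2, device=device)

# to() and cpu() return tensors rather than changing one in place.
moved = tensor.to("cpu")
array = moved.numpy()            # NumPy only works with CPU memory
print(tensor.device, moved.device, array.shape)
```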