# PyTorch

PyTorch is a deep learning framework used for research and development in machine learning and artificial intelligence.

In PyTorch, a tensor is the fundamental data structure, similar to NumPy arrays or matrices. Tensors are the building blocks of neural networks and are used to represent data (inputs, outputs, and model parameters) as multi-dimensional arrays.

### Types of Tensors
![](./Tensor.PNG)

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [2]:
scalar = torch.tensor(42.0)  # Creates a scalar (0-D) tensor with the value 42.0
vector = torch.tensor([1, 2, 3, 4, 5])  # Creates a 1-D tensor with 5 elements
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])  # Creates a 2-D tensor with 2 rows and 3 columns
four_dim_tensor = torch.randn(32, 3, 64, 64)  # Creates a 4-D tensor of random values with shape (batch_size, channels, height, width)
four_dim_tensor[0]  # The first sample of the batch, with shape (3, 64, 64)

tensor([[[ 6.6731e-01,  4.1031e-01, -1.5540e-01,  ...,  2.0594e-01,
          -7.7255e-01, -1.1984e-01],
         [-5.6662e-01,  7.0067e-01, -3.4180e-01,  ..., -1.1127e+00,
           1.8683e+00, -1.1175e+00],
         [ 8.8551e-01, -6.9390e-01,  6.1661e-01,  ...,  9.9827e-01,
           5.2346e-01,  1.1848e-01],
         ...,
         [-1.3863e+00, -1.5750e+00, -9.7003e-01,  ..., -2.8985e-01,
           1.1005e+00, -3.1892e-01],
         [-4.7203e-01, -2.4616e-01, -1.9265e+00,  ..., -7.0816e-01,
          -5.6475e-01, -8.3122e-01],
         [-1.1669e+00, -1.0620e-03, -1.9500e-02,  ...,  7.5206e-01,
           1.8913e+00,  6.1879e-01]],

        [[ 4.6298e-01,  5.4132e-01, -6.9952e-01,  ..., -4.9100e-01,
          -7.0300e-01, -6.1826e-01],
         [-8.0461e-01, -6.8585e-01, -5.3335e-01,  ..., -5.4940e-02,
          -2.2417e-01, -9.4653e-01],
         [ 1.5007e+00, -9.9051e-01,  1.6570e-02,  ..., -3.8530e-01,
          -2.6853e-01, -4.9141e-01],
         ...,
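A quick way to confirm the rank of each tensor is to inspect .ndim and .shape. This is a minimal sketch that recreates the tensors from the cell above so it runs on its own:

```python
import torch

scalar = torch.tensor(42.0)
vector = torch.tensor([1, 2, 3, 4, 5])
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
four_dim_tensor = torch.randn(32, 3, 64, 64)

# .ndim is the number of dimensions; .shape gives the size along each one
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("four_dim_tensor", four_dim_tensor)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")

# Indexing the first dimension removes it: one sample of shape (channels, height, width)
print(four_dim_tensor[0].shape)  # torch.Size([3, 64, 64])
```

The loop reports ndim=0 with shape () for the scalar, up to ndim=4 with shape (32, 3, 64, 64) for the batch.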

Different arguments can be provided when creating a tensor with torch.tensor():
* data: the initial values, e.g. a Python list, tuple, scalar, or NumPy array.
* dtype: the data type of the elements, e.g. torch.float32. If not provided, it is inferred from data.
* device: the device (CPU or GPU) on which the tensor is allocated. If not provided, the tensor is created on the CPU by default.
* requires_grad: if set to True, autograd tracks operations on the tensor so that gradients can be computed during backpropagation, which gradient-based optimization and the training of deep learning models rely on. Defaults to False.

In [3]:
tensor = torch.tensor(data=[[1, 2, 3], [4, 5, 6]],
                      dtype=torch.float32,
                      device='cpu',
                      requires_grad=False)
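The cell above sets every argument explicitly. Below is a minimal sketch of what happens when dtype is left out, plus a common device-selection idiom. The variable names are illustrative, and the float result assumes the default floating-point dtype has not been changed with torch.set_default_dtype.

```python
import torch

# When dtype is omitted it is inferred from the data:
# Python ints -> torch.int64, Python floats -> the default float dtype (torch.float32)
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1.0, 2.0, 3.0])
print(ints.dtype, floats.dtype)  # torch.int64 torch.float32

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device, requires_grad=True)
print(weights.device.type, weights.requires_grad)

# requires_grad=True is only allowed for floating-point (and complex) dtypes
try:
    torch.tensor([1, 2, 3], requires_grad=True)
except RuntimeError as err:
    print("Integer tensors cannot require gradients:", err)
```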

In [4]:
tensor = torch.tensor(data=[[1, 2, 3], [3, 2, 3]])
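tensor.type(torch.float32)  # Returns a float32 copy; tensor itself stays int64 because the result is not assigned
tensor.numel()  # Total number of elements in the tensor (2 * 3 = 6)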

6
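Because .type() returns a new tensor instead of modifying the original, the result has to be assigned to be kept. A minimal sketch comparing it with the equivalent .to() call:

```python
import torch

t = torch.tensor([[1, 2, 3], [3, 2, 3]])

# .type() and .to() return a converted copy; t itself keeps its original dtype
as_float = t.type(torch.float32)
also_float = t.to(torch.float32)
print(t.dtype, as_float.dtype, also_float.dtype)  # torch.int64 torch.float32 torch.float32

# numel() is the product of the sizes in the shape: 2 * 3 = 6
print(t.numel(), t.shape)  # 6 torch.Size([2, 3])
```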

In [5]:
reshaped_tensor = torch.reshape(tensor, (3, 2))
reshaped_tensor

tensor([[1, 2],
        [3, 3],
        [2, 3]])

In [6]:
reshaped_tensor = torch.reshape(tensor, (-1, 2))  # -1 lets PyTorch infer that dimension from the element count (6 / 2 = 3)
reshaped_tensor

tensor([[1, 2],
        [3, 3],
        [2, 3]])

In [7]:
x = torch.randn(32, 3, 64, 64)
x_flattened = x.view(x.size(0), -1)  # Keep the batch dimension and flatten the rest: (32, 3 * 64 * 64) = (32, 12288)
x_flattened

tensor([[-2.1679, -0.4802,  0.6324,  ...,  0.7223, -0.5063, -0.7060],
        [ 1.2336,  0.3826,  0.2796,  ..., -0.2616,  0.6765, -0.0804],
        [-1.3299,  0.9156, -0.3471,  ...,  0.0298, -0.8025,  0.7965],
        ...,
        [ 0.9928, -0.0565, -0.2597,  ..., -0.2338,  0.4734,  0.0774],
        [-0.2038, -0.8475, -1.2998,  ...,  2.0775,  0.3445,  0.5881],
        [-0.6292, -2.4918,  1.1873,  ...,  0.4216,  2.0810,  1.0084]])
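view() never copies data, so it only works when the tensor's memory layout is compatible with the new shape; reshape() falls back to copying when needed. A minimal sketch of the difference, using a transposed tensor (which is not contiguous), plus torch.flatten as an alternative to the view call in the cell above:

```python
import torch

x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
y = x.t()                          # transposing returns a non-contiguous view of the same memory
print(y.is_contiguous())           # False

try:
    y.view(-1)  # fails: the strides of y cannot be expressed as a flat view
except RuntimeError as err:
    print("view failed:", err)

# reshape() copies when it has to; contiguous() makes the copy explicit before view()
print(y.reshape(-1).tolist())            # [0, 3, 1, 4, 2, 5]
print(y.contiguous().view(-1).tolist())  # [0, 3, 1, 4, 2, 5]

# Flattening everything except the batch dimension
batch = torch.randn(32, 3, 64, 64)
flat = torch.flatten(batch, start_dim=1)
print(flat.shape)  # torch.Size([32, 12288])
```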

In [8]:
expanded_tensor = torch.unsqueeze(tensor, dim=0)  # Returns a new tensor with a dimension of size one inserted at position dim
expanded_tensor.size(), tensor.size()

(torch.Size([1, 2, 3]), torch.Size([2, 3]))
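A typical use of unsqueeze() is adding a batch dimension to a single sample before passing it to a model, and squeeze() undoes it. A minimal sketch, with a random tensor standing in for an image:

```python
import torch

image = torch.randn(3, 64, 64)   # a single image: (channels, height, width)
batch = image.unsqueeze(0)       # add a batch dimension -> (1, 3, 64, 64)
also_batch = image[None]         # indexing with None inserts the same size-1 dimension
restored = batch.squeeze(0)      # remove the size-1 dimension again -> (3, 64, 64)

print(batch.shape, also_batch.shape, restored.shape)
```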

### Permute function
The permute() function rearranges the dimensions of a tensor into a specified order, giving you the flexibility to change the shape and orientation of the data. It returns a view of the original data rather than a copy.

In [9]:
permuted_tensor = tensor.permute(1, 0)  # Swap dimensions 0 and 1
permuted_tensor.shape, tensor.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

In [10]:
tensor, permuted_tensor

(tensor([[1, 2, 3],
         [3, 2, 3]]),
 tensor([[1, 3],
         [2, 2],
         [3, 3]]))
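permute() is most useful with more than two dimensions. A minimal sketch of a common case: converting an image from the (channels, height, width) layout used by PyTorch convolution layers to (height, width, channels), the layout plotting libraries such as matplotlib usually expect. The random tensor stands in for real image data.

```python
import torch

chw = torch.randn(3, 64, 32)    # (channels, height, width)
hwc = chw.permute(1, 2, 0)      # (height, width, channels)
print(hwc.shape)  # torch.Size([64, 32, 3])

# permute returns a view: element (c, h, w) of chw is element (h, w, c) of hwc
print(torch.equal(chw[2, 10, 5], hwc[10, 5, 2]))  # True

# The view is not contiguous, so use .reshape() or .contiguous() before .view()
print(hwc.is_contiguous())  # False
```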

In [11]:
# Transposing a Tensor (Swapping Rows and Columns)
transposed_tensor_1 = tensor.t()  # .t() only works for tensors with at most 2 dimensions
transposed_tensor_2 = torch.transpose(tensor, 0, 1)  # Swap axes 0 and 1

print(f'The original tensor shape is: {tensor.shape},\n'
      f'The transposed tensor shape using .t() is: {transposed_tensor_1.shape},\n'
      f'The transposed tensor shape using torch.transpose() is: {transposed_tensor_2.shape}')

The original tensor shape is: torch.Size([2, 3]),
The transposed tensor shape using .t() is: torch.Size([3, 2]),
The transposed tensor shape using torch.transpose() is: torch.Size([3, 2])
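For a 2-D tensor, .t(), transpose(0, 1), and permute(1, 0) all give the same result; they differ on higher-rank tensors. A minimal sketch:

```python
import torch

t = torch.tensor([[1, 2, 3], [3, 2, 3]])

# For 2-D tensors the three calls agree
same = torch.equal(t.t(), t.transpose(0, 1)) and torch.equal(t.t(), t.permute(1, 0))
print(same)  # True

# transpose() swaps exactly two dimensions of a tensor of any rank
x = torch.randn(2, 3, 4)
print(x.transpose(0, 2).shape)  # torch.Size([4, 3, 2])

# .t() rejects tensors with more than 2 dimensions
try:
    x.t()
except RuntimeError as err:
    print(".t() failed:", err)
```

When more than two dimensions need to move at once, permute() is the more general tool.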


Element-wise addition and subtraction between tensors of the same shape

In [12]:
tensor_a = torch.tensor([[4, 5, 7], [8, 9, 0]])
tensor_b = torch.tensor([[5, 4, 3], [9, 8, 7]])

tensor_a + tensor_b

tensor([[ 9,  9, 10],
        [17, 17,  7]])

In [13]:
tensor_a - tensor_b

tensor([[-1,  1,  4],
        [-1,  1, -7]])

Element-wise multiplication between two tensors of the same shape

In [14]:
tensor_a * tensor_b

tensor([[20, 20, 21],
        [72, 72,  0]])

Matrix multiplication between two tensors whose inner dimensions match (the number of columns in the first tensor equals the number of rows in the second). Each entry of the result is the dot product of a row of the first tensor with a column of the second.

In [15]:
tensor_c = torch.tensor([[5, 4, 3], [9, 8, 7], [1, 1, 1]])
matmu = torch.matmul(tensor_a, tensor_c)
matmu

tensor([[ 72,  63,  54],
        [121, 104,  87]])
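A minimal sketch of the shape rule behind the matmul cell above. It reuses its values, checks one entry by hand, and shows the error raised when the inner dimensions do not match:

```python
import torch

tensor_a = torch.tensor([[4, 5, 7], [8, 9, 0]])             # (2, 3)
tensor_c = torch.tensor([[5, 4, 3], [9, 8, 7], [1, 1, 1]])  # (3, 3)

# The @ operator is shorthand for torch.matmul: (2, 3) @ (3, 3) -> (2, 3)
result = tensor_a @ tensor_c
print(result.shape)  # torch.Size([2, 3])

# Entry [0, 0] is row 0 of tensor_a dotted with column 0 of tensor_c: 4*5 + 5*9 + 7*1
entry = (tensor_a[0] * tensor_c[:, 0]).sum()
print(entry.item())  # 72

# (2, 3) @ (2, 3) is undefined because the inner dimensions (3 and 2) differ
try:
    tensor_a @ tensor_a
except RuntimeError as err:
    print("matmul failed:", err)
```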

In [16]:
div = tensor_a / tensor_b
div

tensor([[0.8000, 1.2500, 2.3333],
        [0.8889, 1.1250, 0.0000]])
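The / operator performs true division, so the integer inputs above produce a floating-point result. When integer (floor) division is wanted instead, torch.div accepts a rounding_mode argument. A minimal sketch:

```python
import torch

tensor_a = torch.tensor([[4, 5, 7], [8, 9, 0]])
tensor_b = torch.tensor([[5, 4, 3], [9, 8, 7]])

# "/" is true division: integer inputs give a floating-point tensor
true_div = tensor_a / tensor_b
print(true_div.dtype)  # torch.float32

# rounding_mode="floor" (equivalent to //) keeps the integer dtype
floor_div = torch.div(tensor_a, tensor_b, rounding_mode="floor")
print(floor_div.tolist())  # [[0, 1, 2], [0, 1, 0]]
```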

In [17]:
result_exp = tensor_a ** tensor_b
result_exp

tensor([[     1024,       625,       343],
        [134217728,  43046721,         0]])

In [18]:
result_sqrt = torch.sqrt(tensor_a)
result_sqrt

tensor([[2.0000, 2.2361, 2.6458],
        [2.8284, 3.0000, 0.0000]])

In [19]:
result_log = torch.log(tensor_a)  # natural logarithm (base e)
result_log

tensor([[1.3863, 1.6094, 1.9459],
        [2.0794, 2.1972,   -inf]])

In [20]:
tensor_a = tensor_a.type(torch.float32)  # torch.mean requires a floating-point tensor
total_sum = torch.sum(tensor_a)

# Mean over dim=1 (across columns): one value per row
mean_along_rows = torch.mean(tensor_a, dim=1)

# Maximum over dim=0 (across rows): one value per column, returned with its row index
max_along_columns = torch.max(tensor_a, dim=0)

# Minimum over dim=1 (across columns): one value per row, returned with its column index
min_along_rows = torch.min(tensor_a, dim=1)

total_sum, mean_along_rows, max_along_columns, min_along_rows

(tensor(33.),
 tensor([5.3333, 5.6667]),
 torch.return_types.max(
 values=tensor([8., 9., 7.]),
 indices=tensor([1, 1, 0])),
 torch.return_types.min(
 values=tensor([4., 0.]),
 indices=tensor([0, 2])))
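The values/indices results above can be unpacked directly, and keepdim=True keeps the reduced dimension, which makes the result easy to broadcast back against the original tensor. A minimal sketch using the same values:

```python
import torch

tensor_a = torch.tensor([[4., 5., 7.], [8., 9., 0.]])

# max/min with dim= return a named tuple that unpacks into (values, indices)
values, indices = torch.max(tensor_a, dim=0)
print(values.tolist(), indices.tolist())  # [8.0, 9.0, 7.0] [1, 1, 0]

# keepdim=True keeps the reduced dimension with size 1: shape (2, 1) instead of (2,)
row_means = tensor_a.mean(dim=1, keepdim=True)
print(row_means.shape)  # torch.Size([2, 1])

# argmax returns only the indices of the maxima
print(tensor_a.argmax(dim=1).tolist())  # [2, 1]
```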

Broadcasting in PyTorch

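Broadcasting lets element-wise operations work on tensors of different shapes: the smaller tensor is broadcast, or virtually expanded, to match the shape of the larger one without copying its data. Two shapes are compatible when, comparing dimensions from the right, each pair of sizes is equal or one of them is 1 (missing dimensions count as 1).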

In [21]:
scalar = 2  # A Python scalar is broadcast to every element of tensor_a
result_broadcast = tensor_a + scalar
print(f'The broadcast result is: {result_broadcast}, with shape {result_broadcast.shape}')

The broadcast result is: tensor([[ 6.,  7.,  9.],
        [10., 11.,  2.]]), with shape torch.Size([2, 3])
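Broadcasting also works between two tensors, not just with scalars. A minimal sketch of the shape rule, using illustrative row and column tensors with tensor_a's values:

```python
import torch

tensor_a = torch.tensor([[4., 5., 7.], [8., 9., 0.]])  # shape (2, 3)
row = torch.tensor([10., 20., 30.])                    # shape (3,), treated as (1, 3)
col = torch.tensor([[1.], [2.]])                       # shape (2, 1)

print((tensor_a + row).tolist())  # [[14.0, 25.0, 37.0], [18.0, 29.0, 30.0]]
print((tensor_a + col).tolist())  # [[5.0, 6.0, 8.0], [10.0, 11.0, 2.0]]

# torch.broadcast_shapes applies the rule without creating any tensors
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# A common use: subtract each row's mean (shape (2, 1)) from that row
centered = tensor_a - tensor_a.mean(dim=1, keepdim=True)

# (2, 3) and (2,) are incompatible: the trailing sizes 3 and 2 differ and neither is 1
try:
    tensor_a + torch.tensor([1., 2.])
except RuntimeError as err:
    print("broadcast failed:", err)
```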


Given two 2x2 tensors, tensor_a and tensor_b, we concatenate them along dimension 0 to create a new tensor of shape 4x2.

In [22]:
tensor_a, tensor_b = torch.tensor([[2, 2], [2, 2]]), torch.tensor([[4, 4], [4, 4]])
concatenated_tensor = torch.cat((tensor_a, tensor_b), dim=0)
print(f'concatenated tensor is: {concatenated_tensor} and of shape {concatenated_tensor.shape}')

concatenated tensor is: tensor([[2, 2],
        [2, 2],
        [4, 4],
        [4, 4]]) and of shape torch.Size([4, 2])
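torch.cat joins tensors along an existing dimension, while torch.stack creates a new one. A minimal sketch with the same two tensors:

```python
import torch

tensor_a = torch.tensor([[2, 2], [2, 2]])
tensor_b = torch.tensor([[4, 4], [4, 4]])

# cat along dim=1 places the tensors side by side: (2, 2) + (2, 2) -> (2, 4)
side_by_side = torch.cat((tensor_a, tensor_b), dim=1)
print(side_by_side.shape)  # torch.Size([2, 4])

# stack inserts a new dimension, so all inputs must have the same shape: -> (2, 2, 2)
stacked = torch.stack((tensor_a, tensor_b), dim=0)
print(stacked.shape)  # torch.Size([2, 2, 2])
```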


### Autograd and Gradients

Autograd, short for Automatic Differentiation, is a key feature of PyTorch that allows for automatic computation of gradients (derivatives) of tensors. It is an essential component for training deep learning models through backpropagation.
1. **Gradient Calculation** - In deep learning, we often need to compute gradients of a loss function with respect to model parameters. Autograd simplifies this process. When you perform operations on tensors that require gradients, PyTorch automatically tracks these operations and constructs a computation graph.

2. **Computation Graph** - A computation graph is a directed acyclic graph (DAG) that represents the sequence of operations applied to tensors. Each operation in the graph is a node, and tensors flowing through these nodes are edges. The graph allows PyTorch to trace how input tensors influence the output tensors, which is crucial for gradient calculation.

3. **Dynamic Computational Graph** - PyTorch uses a dynamic computation graph, which means the graph is built on-the-fly as operations are executed. This dynamic nature allows flexibility and is well-suited for models with varying architectures or inputs of different shapes.

4. **Gradients** - Once you have a computation graph, you can compute gradients by backpropagating through the graph. Gradients represent how a small change in each input tensor would affect the final output. The gradients are computed using the chain rule of calculus, and they indicate the direction and magnitude of parameter updates during optimization.
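The computation graph described above can be inspected directly: every tensor produced by a tracked operation has a grad_fn attribute pointing at the graph node that created it. A minimal sketch (the exact object addresses in the printed output will differ between runs):

```python
import torch

x = torch.tensor([3.0, 2.0, 3.0], requires_grad=True)
y = x * 2
z = y.mean()

# x is a leaf created by the user, so it has no grad_fn
print(x.is_leaf, x.grad_fn)  # True None

# Each result records the operation that produced it, a node in the graph
print(y.grad_fn)  # e.g. <MulBackward0 object at ...>
print(z.grad_fn)  # e.g. <MeanBackward0 object at ...>

# next_functions links a node to the nodes that produced its inputs
print(z.grad_fn.next_functions)
```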

In [30]:
x = torch.tensor([3.0, 2.0, 3.0], requires_grad=True)  # requires_grad=True makes autograd track operations on x

In [31]:
# forward pass. PyTorch records these operations in the computation graph
y = x * 2
z = y.mean()

### Backward pass

To compute gradients, initiate the backward pass by calling the backward() method on a scalar tensor (usually a loss).

Chain rule: The backward pass uses the chain rule of calculus to calculate the gradients. It starts from the final scalar value z and works backward through the computation graph to compute the gradient of z with respect to every tensor that has requires_grad=True (x in this case).

It computes ∂z/∂y, the gradient of z with respect to y, then ∂y/∂x, the gradient of y with respect to x, and multiplies them to obtain ∂z/∂x = ∂z/∂y · ∂y/∂x.
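For y = x * 2 and z = y.mean() over the three elements of x, the chain rule can be written out explicitly:

$$
z = \frac{1}{3}\sum_{j=1}^{3} y_j, \qquad y_j = 2x_j
$$

$$
\frac{\partial z}{\partial x_i} = \sum_{j=1}^{3} \frac{\partial z}{\partial y_j}\,\frac{\partial y_j}{\partial x_i} = \frac{1}{3}\cdot 2 = \frac{2}{3} \approx 0.6667
$$

Since ∂y_j/∂x_i is 2 when j = i and 0 otherwise, only one term of the sum survives, and every element of x gets the same gradient of 2/3.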

In [32]:
z.backward()

The result of the backward pass is stored in the .grad attribute of the leaf tensors that have requires_grad=True (intermediate tensors such as y do not keep .grad by default). In this case, x.grad contains the gradient of z with respect to x.

In [39]:
x.grad

tensor([0.6667, 0.6667, 0.6667])
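For z = mean(2x), the chain rule gives a gradient of 2/3 for each element, which can be checked against autograd's result. The sketch below also shows that gradients accumulate across backward passes, which is why training loops reset them (optimizers do this with zero_grad()):

```python
import torch

x = torch.tensor([3.0, 2.0, 3.0], requires_grad=True)
z = (x * 2).mean()
z.backward()

# Autograd's result matches the chain-rule value 2/3 for every element
first_grad = x.grad.clone()
print(first_grad)  # tensor([0.6667, 0.6667, 0.6667])
print(torch.allclose(first_grad, torch.full_like(x, 2 / 3)))  # True

# A second backward pass adds to x.grad instead of replacing it
z = (x * 2).mean()
z.backward()
accumulated = x.grad.clone()
print(accumulated)  # tensor([1.3333, 1.3333, 1.3333])

# Reset the gradient in place before the next computation
x.grad.zero_()
```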

In [None]:
# Create a tensor with Autograd enabled (requires_grad=True)
x = torch.tensor([2.0], requires_grad=True)

# Perform some operations with Autograd enabled
y = x * 3
z = y ** 2
w = z.mean()

# Compute gradients while Autograd is enabled
w.backward()

# Access the gradient of x
gradient_with_autograd = x.grad

# Print the gradient
print("Gradient with Autograd:", gradient_with_autograd.item())


# Now, let's turn Autograd off for a specific tensor
x.requires_grad_(False)

# Perform operations without Autograd (Autograd is off for x)
y = x * 3
z = y ** 2
w = z.mean()
try:
    # Attempt to compute gradients; w has no grad_fn, so backward() raises a RuntimeError
    w.backward()
except RuntimeError:
    print("This tensor doesn't have requires_grad set to True")

Gradient with Autograd: 36.0
This tensor doesn't have requires_grad set to True
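Switching requires_grad off on the tensor itself is one option. PyTorch also provides the torch.no_grad() context manager, typically used for inference and evaluation, and detach(), which cuts a tensor off from the graph. A minimal sketch:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Inside torch.no_grad(), operations are not recorded, even though x requires grad
with torch.no_grad():
    y = x * 3
print(y.requires_grad)  # False

# detach() returns a tensor that shares x's data but is not part of the graph
x_detached = x.detach()
print(x_detached.requires_grad, x.requires_grad)  # False True
```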


### nn.Parameter

In PyTorch, nn.Parameter is a subclass of torch.Tensor designed to hold tensors that should be treated as parameters of an nn.Module. Parameters are tensors that are learned during training, such as the weights and biases of a neural network.

Why is nn.Parameter useful?

* Gradient tracking: An nn.Parameter has requires_grad=True by default, and when it is assigned as an attribute of an nn.Module it is automatically registered as a parameter of that module. PyTorch then computes gradients for it during backpropagation, so it can be updated during training by optimizers such as stochastic gradient descent (SGD).
* Initialization: nn.Parameter does not initialize values itself; it wraps whatever tensor you pass in (random values from torch.randn for the weight and zeros for the bias in the example below). Built-in layers such as nn.Linear initialize their own parameters randomly, and you can apply a custom initialization with the functions in torch.nn.init.
* Access: You can easily access the parameters of a PyTorch module using the parameters() method, which returns an iterable containing all the nn.Parameter objects within the module.
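A minimal sketch of registration and explicit initialization. The Demo module and its attribute names are hypothetical, chosen only for illustration: only attributes wrapped in nn.Parameter show up in named_parameters(), and the starting values come from whatever initializer you apply (Xavier uniform here, as one example).

```python
import torch
import torch.nn as nn


class Demo(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(4, 3))  # registered as a parameter
        self.scale = torch.ones(4)                     # plain tensor: NOT registered
        nn.init.xavier_uniform_(self.weight)           # initialization is an explicit choice


demo = Demo()
print([name for name, _ in demo.named_parameters()])  # ['weight']
print(demo.weight.requires_grad)  # True: nn.Parameter defaults to requires_grad=True
```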

In [42]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Register weight (10 x 5) and bias (10) as learnable parameters
        self.weight = nn.Parameter(torch.randn(10, 5))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        # x: (batch, 5) -> z: (batch, 10), the same computation as nn.Linear(5, 10)
        z = torch.matmul(x, self.weight.t()) + self.bias
        return z


# Instantiate the model
model = MyModel()

# Access and print the parameters
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.3426,  0.2997,  0.0477,  0.9563, -0.8480],
        [-2.2974, -1.8996,  0.8777,  0.4756, -1.2931],
        [-0.3829,  0.3655, -0.4606, -2.3426,  0.7322],
        [-0.0354,  0.4037, -0.9113,  1.4786,  0.5362],
        [-0.0795,  0.3935,  1.7341, -0.0545,  0.2704],
        [-1.1718,  0.6017, -0.2846, -0.2442,  1.0230],
        [-0.8148,  0.9376,  0.7216, -1.5338,  0.6573],
        [ 0.3531, -0.1765,  0.6951,  2.3895,  0.3509],
        [-0.1698, -0.3017, -0.5952,  0.6879,  0.5409],
        [-0.1112, -1.4798, -1.2071, -0.5397,  1.2772]], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
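The parameters returned by model.parameters() are what an optimizer updates. A minimal sketch of one training step, re-creating MyModel so the snippet runs on its own; the random inputs, targets, mean-squared-error loss, and learning rate are assumptions made only for illustration:

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(10, 5))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        return torch.matmul(x, self.weight.t()) + self.bias


torch.manual_seed(0)
model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 5)     # hypothetical batch: 8 samples, 5 features
targets = torch.randn(8, 10)   # hypothetical regression targets

weight_before = model.weight.detach().clone()
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()    # fills model.weight.grad and model.bias.grad
optimizer.step()   # plain SGD update: param = param - lr * param.grad

print(model.weight.grad.shape, model.bias.grad.shape)  # torch.Size([10, 5]) torch.Size([10])
print(torch.allclose(model.weight, weight_before - 0.1 * model.weight.grad))  # True
```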
