# Course 1: Introduction to PyTorch

PyTorch is a widely used open-source framework for machine learning. It is a Python package that provides two high-level features:

1. Tensor computation (like NumPy) with strong GPU acceleration
2. Deep Neural Networks built on a tape-based autograd system

You can reuse your favorite Python packages, such as NumPy and SciPy, alongside PyTorch. PyTorch also provides the `torch.nn` module, which offers a higher level of abstraction over the raw computational graphs built by `torch.autograd`.
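
As a small illustration of this interoperability, the sketch below converts a NumPy array to a tensor and back. It assumes NumPy is installed next to PyTorch; the variable names are only illustrative.

```python
import numpy as np
import torch

# Convert a NumPy array to a tensor; the two objects share the same memory
array = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(array)

# An in-place change to the tensor is therefore visible through the array
tensor.mul_(2)
print(array)        # [2. 4. 6.]

# Convert back to NumPy (this works for tensors that live on the CPU)
back = tensor.numpy()
print(back.dtype)   # float64
```

Because memory is shared, copy the data first (for example with `tensor.clone()`) if the two objects should evolve independently.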

In this course, we will cover the basics of PyTorch and build a simple neural network to classify handwritten digits from the MNIST dataset.

## Installation

You can install PyTorch using pip; the installation instructions are on the official website: [PyTorch Installation](https://pytorch.org/get-started/locally/). However, it is recommended to first create a virtual environment with `conda` and then install PyTorch and the other required packages inside it.

### Installation of Anaconda

You can download Anaconda from the following link: [Anaconda Download](https://www.anaconda.com/products/distribution). After downloading, you can install Anaconda by following the instructions provided on the official website.

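Note that installing packages directly into the `base` environment is not recommended. Instead, create a new environment and install the required packages there. These commands are best run in a terminal (without the leading `!`), because `conda activate` inside a notebook cell only affects a temporary subshell: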


In [None]:
!conda create --name training python=3.11
!conda activate training

### Installation of PyTorch

If you have an NVIDIA GPU, you can install a CUDA-enabled build of PyTorch. Otherwise, you can install the CPU build. (On Apple silicon, the default macOS package supports the MPS backend natively; CUDA support is not part of that package and has to be selected explicitly on the installation page.)
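
After installation, you can check which backend is usable from Python. The helper below is a minimal sketch; `pick_device` is a hypothetical name introduced here, and the lookup of `torch.backends.mps` is guarded because older PyTorch releases do not have it.

```python
import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is missing in older releases, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(torch.__version__, device)
```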

## PyTorch Basics

The basic building block of PyTorch is a tensor. A tensor is a number, vector, matrix, or any n-dimensional array. You can create a tensor using the `torch` module. Here is an example of creating a tensor:

In [1]:
import torch

# Create a tensor
x = torch.tensor([5.5, 3])

print(x)

# Get the size of the tensor
print(x.size())

tensor([5.5000, 3.0000])
torch.Size([2])
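
To make the "number, vector, matrix, or n-dimensional array" description concrete, here is a short sketch that builds tensors with zero to three dimensions and inspects them. The variable names are illustrative only.

```python
import torch

scalar = torch.tensor(3.14)                      # 0 dimensions
vector = torch.tensor([5.5, 3.0])                # 1 dimension
matrix = torch.zeros(2, 3)                       # 2 dimensions, filled with zeros
cube = torch.ones(2, 3, 4, dtype=torch.int64)    # 3 dimensions, integer ones

# Print the number of dimensions, the shape, and the element type of each tensor
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, t.ndim, tuple(t.shape), t.dtype)
```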


### Operations on Tensors

Tensors support element-wise arithmetic. Here is an example of adding and multiplying tensors:

In [2]:
# Create a tensor
x = torch.tensor([5.5, 3])

# Create another tensor
y = torch.tensor([2, 1])

# Add two tensors
z = x + y
print(z)

# Multiply two tensors
z = x * y
print(z)

# Multiply a tensor by a scalar
z = x * 2
print(z)

tensor([7.5000, 4.0000])
tensor([11.,  3.])
tensor([11.,  6.])
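
Note that `x` above holds floats while `y` holds integers, and the results are floating-point tensors. The sketch below makes that type promotion visible and adds two operations the cell above does not use: a dot product and an in-place update. Treat it as an illustration rather than a complete tour of tensor operations.

```python
import torch

x = torch.tensor([5.5, 3])   # float32
y = torch.tensor([2, 1])     # int64

# Mixing an integer tensor with a float tensor promotes the result to float
z = x + y
print(z.dtype)               # torch.float32

# For the dot product, cast y explicitly so both operands share a dtype
dot = torch.matmul(x, y.to(x.dtype))
print(dot)                   # tensor(14.)

# Methods ending in an underscore modify the tensor in place
x.add_(1)
print(x)                     # tensor([6.5000, 4.0000])
```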


### Broadcasting

Broadcasting is a powerful mechanism that allows PyTorch to work with tensors of different shapes. Here is an example of broadcasting:

In [3]:
# Create a tensor
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create another tensor
y = torch.tensor([1, 0, -1])

# Add two tensors
z = x + y
print(z)

tensor([[2, 2, 2],
        [5, 5, 5],
        [8, 8, 8]])


When `y` is added to `x`, `y` (shape `[3]`) is broadcast across the rows of `x` (shape `[3, 3]`), so the same vector is added to every row.
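
Broadcasting also works along other dimensions, and it fails when the shapes are incompatible. The following sketch adds a column of shape `(3, 1)` to the same matrix, checks the resulting shape with `torch.broadcast_shapes`, and shows one shape pair that cannot be broadcast. The tensors `column` and `bad` are made up for illustration.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)
column = torch.tensor([[10], [20], [30]])             # shape (3, 1)

# The column is stretched across the columns of x
z = x + column
print(z)

# The broadcast shape can be computed without doing the addition
print(torch.broadcast_shapes(x.shape, column.shape))  # torch.Size([3, 3])

# Shapes (3, 3) and (2,) do not line up, so this addition raises an error
bad = torch.tensor([1, 2])
failed = False
try:
    x + bad
except RuntimeError as error:
    failed = True
    print("cannot broadcast:", error)
```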

## Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the `autograd` package. The `autograd` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backpropagation is defined by how your code is run, and that every single iteration can be different.
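
A minimal sketch of what "define-by-run" means in practice: ordinary Python control flow decides which operations are recorded, so the gradient depends on the branch that actually ran. The function `piecewise` is a hypothetical example, not part of PyTorch.

```python
import torch

def piecewise(x: torch.Tensor) -> torch.Tensor:
    # The branch taken at run time determines the recorded graph
    if x.sum() > 0:
        return (x ** 2).sum()   # gradient is 2 * x
    return (3 * x).sum()        # gradient is 3 everywhere

a = torch.tensor([1.0, 2.0], requires_grad=True)
piecewise(a).backward()
print(a.grad)   # tensor([2., 4.])

b = torch.tensor([-1.0, -2.0], requires_grad=True)
piecewise(b).backward()
print(b.grad)   # tensor([3., 3.])
```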

### Tensor

`torch.Tensor` is the central class of the package. If you set its `.requires_grad` attribute to `True`, it starts to track all operations on it. When you finish your computation, you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its `.grad` attribute.

In [5]:
# Create a tensor
x = torch.tensor([5.5, 3], requires_grad=True)

# Perform operations on the tensor
y = x + 2

# Perform more operations on the tensor
z = y * y * 2

# Compute the mean of the tensor
out = z.mean()

print(z, out)

# Compute the gradients
out.backward()

# Print the gradients
print(x.grad)

tensor([112.5000,  50.0000], grad_fn=<MulBackward0>) tensor(81.2500, grad_fn=<MeanBackward0>)
tensor([15., 10.])
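
The printed gradient can be checked by hand. With `out = mean(2 * (x + 2)^2)` over two elements, each partial derivative is `(1/2) * 4 * (x_i + 2) = 2 * (x_i + 2)`, which gives `[15, 10]` for `x = [5.5, 3]`. The sketch below repeats the computation and compares autograd's result with this formula.

```python
import torch

x = torch.tensor([5.5, 3.0], requires_grad=True)

# Same computation as above, written in one expression
out = (2 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = 2 * (x_i + 2)
analytic = 2 * (x.detach() + 2)

print(x.grad)                             # tensor([15., 10.])
print(torch.allclose(x.grad, analytic))   # True
```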


### Vector-Jacobian Product

In the previous example, `out` was a scalar, so `out.backward()` could be called without arguments. When the output is not a scalar, `backward` needs a vector `v` of the same shape as the output, and it computes the vector-Jacobian product of `v` with the Jacobian of the output with respect to the input. Here is an example of computing the vector-Jacobian product:

In [6]:
# Create a tensor
x = torch.randn(3, requires_grad=True)

# Perform operations on the tensor
y = x * 2

# Keep doubling y until its norm reaches at least 1000
while y.detach().norm() < 1000:
    y = y * 2

# Create a vector
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)

# Compute the vector-Jacobian product
y.backward(v)

# Print the gradients
print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
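
Because the example above starts from random values, the number of doublings (and therefore the printed result) can differ between runs. The deterministic sketch below uses a hypothetical function `f(x) = x ** 2` instead, builds the full Jacobian with `torch.autograd.functional.jacobian`, and checks that `backward(v)` returns the product of `v` with that Jacobian.

```python
import torch
from torch.autograd.functional import jacobian

def f(x: torch.Tensor) -> torch.Tensor:
    # Element-wise square; its Jacobian is diag(2 * x)
    return x ** 2

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# backward(v) accumulates v^T J into x.grad
y = f(x)
y.backward(v)

# Build the full 3x3 Jacobian explicitly and multiply by v for comparison
J = jacobian(f, x.detach())
expected = v @ J

print(x.grad)
print(torch.allclose(x.grad, expected))   # True
```

Computing the full Jacobian is only practical for small inputs; `backward(v)` avoids materializing it.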


### Stop Autograd

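By default, tensors are created with `requires_grad=False`; tracking starts only when you set `requires_grad=True` or when a tensor is computed from one that requires gradients. To stop autograd from tracking operations on such tensors, wrap the code in a `torch.no_grad()` block. Here is an example of stopping autograd: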

In [7]:
# Create a tensor
x = torch.randn(3, requires_grad=True)

# Check whether the tensor tracks gradients
print(x.requires_grad)

# Stop autograd
with torch.no_grad():
    print((x ** 2).requires_grad)

True
False
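
Besides `torch.no_grad()`, a single tensor can be cut out of the graph with `.detach()`. The short sketch below shows the default value of `requires_grad`, how it propagates to results, and how `.detach()` removes it. The variable names are illustrative.

```python
import torch

# A freshly created tensor does not track gradients
plain = torch.randn(3)
print(plain.requires_grad)      # False

# Results computed from a tracked tensor are tracked as well
tracked = torch.randn(3, requires_grad=True)
result = tracked * 2
print(result.requires_grad)     # True

# detach() returns a tensor with the same values but no gradient history
detached = result.detach()
print(detached.requires_grad)   # False
```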


## Neural Networks

Neural networks can be constructed using the `torch.nn` package. An `nn.Module` contains layers and a `forward(input)` method that returns the `output`. The `torch.nn.functional` subpackage provides many useful functions, such as `relu` and `sigmoid`.

Since I am using an Apple MacBook Pro with an M2 Pro chip, I will use `mps` as the device. You can use `cuda` if you have an NVIDIA GPU.

Here is an example of constructing a neural network:

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device("mps")

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net().to(device)

print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


The `forward` function defines the computation performed at every call. You can use any tensor operation in the `forward` function.
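
The input size `16 * 6 * 6` of `fc1` is determined by the image size fed to the network. The sketch below redoes that arithmetic in plain Python for a 32x32 input (the size used in the loss example later in this section). The helper `output_size` is a hypothetical name; it applies the usual formula for a convolution or pooling window without padding or dilation.

```python
def output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a conv or pooling window (no padding, no dilation)."""
    return (size - kernel) // stride + 1

def flat_features(image_size: int, channels: int = 16) -> int:
    size = output_size(image_size, kernel=3)        # conv1, 3x3
    size = output_size(size, kernel=2, stride=2)    # 2x2 max pooling
    size = output_size(size, kernel=3)              # conv2, 3x3
    size = output_size(size, kernel=2, stride=2)    # 2x2 max pooling
    return channels * size * size

print(flat_features(32))   # 576 = 16 * 6 * 6, as in fc1
print(flat_features(28))   # 400 = 16 * 5 * 5
```

The second line suggests that feeding 28x28 images (the size of MNIST digits) to this exact network would require changing `fc1` or resizing the input.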

The learnable parameters of a model are returned by `net.parameters()`. Here is an example of getting the learnable parameters of the model:

In [13]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 3, 3])
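
The ten entries are one weight and one bias tensor for each of the five layers. To keep the sketch below self-contained, it rebuilds the same five layers in an `nn.ModuleDict` (an assumption made only for counting; it is not the `Net` class itself) and lists each parameter with its element count.

```python
import torch.nn as nn

# Same layer shapes as Net, collected only to inspect their parameters
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 3),
    "conv2": nn.Conv2d(6, 16, 3),
    "fc1": nn.Linear(16 * 6 * 6, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

# Each layer contributes a weight tensor and a bias tensor
for name, param in layers.named_parameters():
    print(f"{name:14s} {tuple(param.shape)!s:18s} {param.numel()}")

total = sum(p.numel() for p in layers.parameters())
print("total parameters:", total)   # 81194
```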


### Loss Function

A loss function takes an `(output, target)` pair and computes a value that estimates how far the output is from the target. There are several different loss functions in the `torch.nn` package. Here is an example of computing the loss:

In [14]:
input = torch.randn(1, 1, 32, 32).to(device)
output = net(input)
target = torch.randn(10).to(device)
target = target.view(1, -1)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.2022, device='mps:0', grad_fn=<MseLossBackward0>)
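
The printed value depends on the random input and target. To see what the loss actually computes, the sketch below reproduces `nn.MSELoss` by hand on fixed numbers, and then does the same for `nn.CrossEntropyLoss`, a loss commonly used for classification. All values here are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Mean squared error: the mean of the squared element-wise differences
output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.5, 2.0, 1.0]])
mse = nn.MSELoss()(output, target)
mse_manual = ((output - target) ** 2).mean()   # (0.25 + 0 + 4) / 3
print(mse, mse_manual)

# Cross-entropy: negative log-softmax of the logit at the true class index
logits = torch.tensor([[2.0, 0.5, -1.0]])
label = torch.tensor([0])
ce = nn.CrossEntropyLoss()(logits, label)
ce_manual = -F.log_softmax(logits, dim=1)[0, label[0]]
print(ce, ce_manual)
```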


### Backpropagation

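To backpropagate the error, call `loss.backward()`. Before doing so, zero the existing gradients of the model; otherwise the new gradients are accumulated into the existing ones. Here is an example of backpropagation:

In [None]:
net.zero_grad()   # clear any previously accumulated gradients
loss.backward()   # backpropagate the loss through the network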
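
The accumulation behaviour is easiest to see on a tiny layer. The sketch below uses a hypothetical `nn.Linear(3, 1)` instead of `net`. Depending on the PyTorch version, `zero_grad()` either resets `.grad` to `None` or fills it with zeros, so the last check accepts both.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
data = torch.ones(1, 3)
print(layer.weight.grad)        # None: nothing has been backpropagated yet

# First backward pass: the gradient of sum(W x + b) w.r.t. W equals x
layer(data).sum().backward()
first = layer.weight.grad.clone()
print(first)                    # tensor([[1., 1., 1.]])

# Second backward pass without zeroing: gradients add up
layer(data).sum().backward()
second = layer.weight.grad.clone()
print(second)                   # tensor([[2., 2., 2.]])

# zero_grad() clears them again (None or zeros, depending on the version)
layer.zero_grad()
cleared = layer.weight.grad
print(cleared)
```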

### Optimizing the Neural Network

The simplest update rule used in practice is Stochastic Gradient Descent (SGD), which updates each parameter as `weight = weight - learning_rate * gradient`:

In [None]:
learning_rate = 0.01
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
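
To connect the optimizer to the update rule, the sketch below applies `weight - learning_rate * gradient` by hand to a small tensor and compares the result with one `optim.SGD` step (plain SGD without momentum or weight decay). The tensor `w` and its loss are made up for illustration.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1

# Loss = sum(w^2), so the gradient is 2 * w = [2, -4]
loss = (w ** 2).sum()
loss.backward()

# Manual update, computed without touching w itself
with torch.no_grad():
    manual = w - learning_rate * w.grad

# The same update performed by the optimizer
optimizer = optim.SGD([w], lr=learning_rate)
optimizer.step()

print(manual)   # tensor([ 0.8000, -1.6000])
print(torch.allclose(w, manual))   # True
```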

Here is an example of a single optimization step:


In [None]:
optimizer.zero_grad()
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()
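
In practice these five lines are repeated many times. The following sketch runs the same sequence in a loop on a toy regression problem (fitting `y = 3x + 1` with a single `nn.Linear` layer) so that the effect of repeated steps is visible. The data, model, and hyperparameters are assumptions chosen for illustration, not part of the network above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data: 50 points on the line y = 3x + 1
inputs = torch.linspace(-1, 1, 50).unsqueeze(1)
targets = 3 * inputs + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass
    loss.backward()                            # compute gradients
    optimizer.step()                           # update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"weight: {model.weight.item():.3f}, bias: {model.bias.item():.3f}")
```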

## Training a Neural Network

Training a neural network typically consists of the following steps:

1. Load the data into a `DataLoader`, transforming it into tensors along the way.
2. Define the neural network model.
3. Define the loss function.
4. Define the optimizer.
5. Train the neural network.
6. Evaluate the neural network.
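
The six steps above can be sketched end to end as follows. To stay self-contained, the sketch uses random tensors as a stand-in for MNIST (so the accuracy it prints is not meaningful) and a small fully connected model rather than the convolutional `Net`; both choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Load the data (synthetic stand-in: 256 random 1x28x28 images, 10 classes)
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# 2. Define the model
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# 3. Define the loss function and 4. the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# 5. Train for a couple of epochs
model.train()
for epoch in range(2):
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()

# 6. Evaluate (here on the same synthetic data, without gradient tracking)
model.eval()
correct = 0
with torch.no_grad():
    for batch_images, batch_labels in loader:
        predictions = model(batch_images).argmax(dim=1)
        correct += (predictions == batch_labels).sum().item()

accuracy = correct / len(loader.dataset)
print(f"accuracy on the synthetic data: {accuracy:.2f}")
```

With a real dataset, evaluation would use a separate test split rather than the training data.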

Before training the network on MNIST, let us get started with the `torchvision` package.