
# PyTorch Basics

In [1]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms

## Basic Autograd

**`torch.autograd`** is an *automatic differentiation engine* that powers neural network training.

### Example 1

In [2]:
# Create tensors
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph
y = w * x + b  # y = 2 * x + 3

# Compute gradients - backprop
y.backward()

# Print gradients
print(x.grad)  # dy/dx
print(w.grad)  # dy/dw
print(b.grad)  # dy/db

tensor(2.)
tensor(1.)
tensor(1.)
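
One detail worth keeping in mind with `backward()` is that gradients accumulate in `.grad` rather than being overwritten. The sketch below reuses the tensors from Example 1 to illustrate this; the variable names `first_grad`, `accumulated_grad`, and `reset_grad` are only illustrative.

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# First backward pass: dy/dw = x = 1
(w * x + b).backward()
first_grad = w.grad.clone()

# Second backward pass on a freshly built graph:
# the new gradient is added to the one already stored in w.grad
(w * x + b).backward()
accumulated_grad = w.grad.clone()

# Reset the stored gradient in place before the next pass
w.grad.zero_()
reset_grad = w.grad.clone()

print(first_grad)        # tensor(1.)
print(accumulated_grad)  # tensor(2.)
print(reset_grad)        # tensor(0.)
```

This is why training loops usually clear gradients (for example with `optimizer.zero_grad()`) before each backward pass.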


### Example 2 

The code below creates a fully connected (linear) layer with 3 input features
and 2 output features.

- `linear.weight`: the weight matrix that performs the linear transformation in
  the fully connected layer. Its shape is `(out_features, in_features)`, so in
  this case it is `(2, 3)`.

The linear transformation can be written as $y = x w^T$, where `x` is the
input tensor with shape `(10, 3)` and `w` is the weight matrix with shape
`(2, 3)`. The resulting output tensor has shape `(10, 2)`.

- `linear.bias`: the bias vector added to the result of the linear
  transformation, which gives the model an additional degree of freedom. It has
  one entry per output feature. In this case the output size is 2, so the shape
  of the bias vector is `(2,)`.

The complete linear operation can be written as $y = x w^T + b$.
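
As a quick sanity check of the formula above, the following sketch computes $x w^T + b$ by hand and compares it with the layer's own output. The seed value and the names `manual` and `builtin` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

x = torch.randn(10, 3)
linear = nn.Linear(3, 2)

# Manual version of the layer: y = x w^T + b
manual = x @ linear.weight.T + linear.bias
# Same computation through the layer itself
builtin = linear(x)

print(linear.weight.shape)  # torch.Size([2, 3])
print(linear.bias.shape)    # torch.Size([2])
print(builtin.shape)        # torch.Size([10, 2])
print(torch.allclose(manual, builtin, atol=1e-6))
```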

In [3]:
# Create tensors of shape (10, 3) and (10, 2)
x = torch.randn(10, 3)
y = torch.randn(10, 2)

# Build a fully connected layer
linear = nn.Linear(3, 2)
print('w: ', linear.weight)
print('b:', linear.bias)

w:  Parameter containing:
tensor([[-0.2014, -0.1715,  0.2293],
        [ 0.0206,  0.0352, -0.1699]], requires_grad=True)
b: Parameter containing:
tensor([ 0.1769, -0.4332], requires_grad=True)


In [6]:
# Build loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass
pred = linear(x)

# Compute loss
loss = criterion(pred, y)
print('loss: ', loss.item())

# Backward pass
loss.backward()

# Print out the gradients
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

# 1-step gradient descent
optimizer.step()

loss:  1.1401374340057373
dL/dw:  tensor([[-0.7317, -0.5240, -0.5730],
        [ 1.2047, -0.8886,  0.6025]])
dL/db:  tensor([ 0.3424, -1.4678])


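You can also perform the gradient descent step manually, without the optimizer. Wrapping the update in `torch.no_grad()` keeps it from being tracked by autograd:

```python
with torch.no_grad():
    linear.weight.sub_(0.01 * linear.weight.grad)
    linear.bias.sub_(0.01 * linear.bias.grad)
```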
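
To make the equivalence concrete, here is a minimal sketch that applies one plain SGD step through the optimizer to one layer and the same step by hand to an identical copy, then compares the parameters. The names `linear_a`, `linear_b`, and `same` are hypothetical, and the comparison assumes plain SGD without momentum or weight decay.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(10, 3)
y = torch.randn(10, 2)

linear_a = nn.Linear(3, 2)
linear_b = copy.deepcopy(linear_a)  # identical starting parameters

criterion = nn.MSELoss()
lr = 0.01

# Layer A: update through the optimizer
optimizer = torch.optim.SGD(linear_a.parameters(), lr=lr)
optimizer.zero_grad()
criterion(linear_a(x), y).backward()
optimizer.step()

# Layer B: the same update written by hand
criterion(linear_b(x), y).backward()
with torch.no_grad():
    for param in linear_b.parameters():
        param -= lr * param.grad  # p <- p - lr * dL/dp
        param.grad.zero_()        # clear the gradient for the next step

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(linear_a.parameters(), linear_b.parameters())
)
print(same)
```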

In [9]:
# Print out the loss after 1-step gradient descent
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1 step optimization: ', loss.item())

loss after 1 step optimization:  1.1106849908828735


## Loading Data from NumPy

In [10]:
# Create a numpy array
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array
z = y.numpy()
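
Note that `torch.from_numpy` does not copy the data: the tensor and the array share the same memory. The sketch below contrasts it with `torch.tensor`, which does copy. The value `100` is just an arbitrary marker.

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])
y = torch.from_numpy(x)  # shares memory with x
z = torch.tensor(x)      # copies the data

x[0, 0] = 100            # modify the NumPy array in place

print(y[0, 0].item())    # 100: the change is visible through the shared tensor
print(z[0, 0].item())    # 1: the copy is unaffected
print(np.shares_memory(x, y.numpy()))
```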

## Input Pipeline

Constructing the input pipeline for PyTorch involves:

1. **Data Loading** - use built-in dataset classes such as those in `torchvision.datasets` and `torchtext.datasets` to download and construct benchmark datasets, or define your own dataset class for custom data.

2. **Data Preprocessing** - use the `transforms` module to apply preprocessing operations such as normalization, data augmentation, and resizing. Individual transforms can be composed into a single pipeline with `transforms.Compose`.

3. **Batching** - During training, it is common to process data in mini-batches to enable efficient GPU computation. The `torch.utils.data.DataLoader` class allows you to create an iterator that efficiently loads data in mini-batches.

4. **Parallel Data Loading** - the `DataLoader` can load data in parallel using multiple worker processes. The number of workers is set with the `num_workers` argument of the data loader.

5. **Custom Datasets** - If working with a custom dataset, you can create a custom dataset class by inheriting from `torch.utils.data.Dataset` and implementing the **`__len__`** and **`__getitem__`** methods. The `__len__` method should return the number of samples in the dataset and the `__getitem__` method should return a single sample from the dataset.

6. **Handling Data Imbalance** - the `torch.utils.data.sampler` module provides samplers such as `WeightedRandomSampler` and `SubsetRandomSampler`, which let you define custom sampling strategies for imbalanced data.

> The `DataLoader` loads mini-batches efficiently during training. The resulting object is an iterable that yields mini-batches of images and labels when looped over.
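
The batching and imbalance-handling steps above can be sketched without downloading anything. The example below builds a synthetic, imbalanced dataset (an assumption made purely for illustration) and draws rebalanced mini-batches with `WeightedRandomSampler`. It keeps `num_workers=0` so that data is loaded in the main process; a positive value would start worker processes instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1
features = torch.randn(100, 3)
labels = torch.cat([
    torch.zeros(90, dtype=torch.long),
    torch.ones(10, dtype=torch.long),
])
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(dataset),
    replacement=True,
)

# shuffle is left at its default (False) because a sampler is supplied
loader = DataLoader(dataset, batch_size=20, sampler=sampler, num_workers=0)

batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(batch_shapes))  # 5 batches of 20 samples each
print(batch_shapes[0])
```

With inverse-frequency weights, each class is drawn with roughly equal probability, so minority-class samples appear several times per epoch.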

In [12]:
# Download and construct CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(
    root='../data',
    train=True,
    transform=transforms.ToTensor(),
    download=True
)

# Fetch one data pair (read data from disk)
image, label = train_dataset[0]
print(image.size())
print(label)

# Data loader (handles batching and shuffling; set num_workers > 0
# to load data in background worker processes)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=64,
    shuffle=True
)

# Create an iterator over the data loader; batches are loaded as they are requested
data_iter = iter(train_loader)

# Mini-batch images and labels
images, labels = next(data_iter)

# Actual usage of the data loader is as below:
for images, labels in train_loader:
    # Training code goes here
    pass

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data\cifar-10-python.tar.gz


100.0%


Extracting ../data\cifar-10-python.tar.gz to ../data
torch.Size([3, 32, 32])
6


## Input Pipeline for Custom Dataset

In [13]:
# You should build your custom dataset as below.
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # TODO:
        #   1. Initialize file paths or a list of file names
        pass

    def __getitem__(self, index):
        # TODO:
        #   1. Read one sample from file (e.g. using numpy.fromfile, PIL.Image.open).
        #   2. Preprocess the data (e.g. with torchvision.transforms).
        #   3. Return a data pair (e.g. image and label).
        pass

    def __len__(self):
        # You should change 1 to the total size of your dataset.
        return 1

# You can then use the prebuilt data loader
custom_dataset = CustomDataset()
train_loader = torch.utils.data.DataLoader(
    dataset=custom_dataset,
    batch_size=64,
    shuffle=True
)

## Pretrained Model

In [17]:
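# Download and load the pretrained ResNet-18
# (torchvision >= 0.13 uses `weights=`; older versions use `pretrained=True`)
resnet = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
)

# To fine-tune only the top layer, freeze all existing parameters
for param in resnet.parameters():
    param.requires_grad = False

# Replace the top layer for fine-tuning
# (parameters of a newly created layer have requires_grad=True by default)
resnet.fc = nn.Linear(resnet.fc.in_features, 100)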

# Forward pass
images = torch.randn(64, 3, 224, 224)
outputs = resnet(images)
print(outputs.size())

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to C:\Users\to561778/.cache\torch\hub\checkpoints\resnet18-f37072fd.pth
100.0%


torch.Size([64, 100])


## Save and Load the Model

In [18]:
# Save and load the entire model
# (on PyTorch >= 2.6, loading a pickled model requires torch.load(..., weights_only=False))
torch.save(resnet, 'model.pth')
model = torch.load('model.pth')

# Save and load only the model parameters
# (recommended: the file does not depend on the pickled class definition)
torch.save(resnet.state_dict(), 'params.pth')
resnet.load_state_dict(torch.load('params.pth'))

<All keys matched successfully>
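
A small round trip shows what restoring from a `state_dict` involves: the architecture is rebuilt in code and then filled with the saved parameters. This sketch uses a tiny `nn.Linear` model and a temporary directory as stand-ins; the names `restored` and `same_output` are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
model = nn.Linear(3, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'params.pth')

    # Save only the parameters
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(3, 2)
    result = restored.load_state_dict(torch.load(path))

x = torch.randn(4, 3)
same_output = torch.equal(model(x), restored(x))
print(result)
print(same_output)
```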