# Deep Learning Development with PyTorch

## Data Preparation
- `Dataset` class defines how to access and preprocess data from a file or other data source
- `Sampler` class defines how to sample data from a dataset in order to create batches
- `DataLoader` class combines a dataset and a sampler and lets you iterate over a set of batches
- PyTorch libraries (e.g. Torchvision, Torchtext) also provide classes to support specialized data such as computer vision and natural language data
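
As a rough illustration of how the three classes fit together, the sketch below wraps random tensors in a `TensorDataset`, draws indices with a `RandomSampler`, and lets a `DataLoader` assemble the batches. The tensor shapes and the batch size are arbitrary choices for the example, not values used elsewhere in these notes.

```python
import torch
from torch.utils.data import TensorDataset, RandomSampler, DataLoader

# Toy data (assumed for illustration): 10 samples with 4 features each, binary labels
features = torch.randn(10, 4)
labels = torch.randint(0, 2, (10,))

dataset = TensorDataset(features, labels)   # Dataset: indexable access to (x, y) pairs
sampler = RandomSampler(dataset)            # Sampler: yields sample indices in random order
loader = DataLoader(dataset, batch_size=4, sampler=sampler)  # DataLoader: groups samples into batches

batch_shapes = [tuple(x.shape) for x, y in loader]
print(batch_shapes)  # [(4, 4), (4, 4), (2, 4)]
```

Passing `shuffle=True` to `DataLoader` creates a `RandomSampler` internally, which is why the later cells never construct a sampler explicitly.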

### Data Loading

In [45]:
from torchvision.datasets import CIFAR10

In [46]:
train_data = CIFAR10(root="./train/", train=True, download=True)

In [47]:
train_data

Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./train/
    Split: Train

In [48]:
len(train_data)

50000

In [49]:
train_data.data.shape

(50000, 32, 32, 3)

In [50]:
print(train_data.targets)

[6, 9, 9, 4, 1, 1, 2, 7, 8, 3, 4, 7, 7, 2, 9, 9, 9, 3, 2, 6, ... (output truncated)

In [51]:
print(train_data.classes)

['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


In [52]:
train_data.class_to_idx

{'airplane': 0,
 'automobile': 1,
 'bird': 2,
 'cat': 3,
 'deer': 4,
 'dog': 5,
 'frog': 6,
 'horse': 7,
 'ship': 8,
 'truck': 9}

### Data Transforms

In [53]:
from torchvision import transforms

In [54]:
train_transforms = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.4914, 0.4822, 0.4465),
        std=(0.2023, 0.1994, 0.2010)
    )
])

In [55]:
train_data = CIFAR10(
        root="./train/",
        train=True,
        download=True,
        transform=train_transforms
    )

In [56]:
train_data

Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./train/
    Split: Train
    StandardTransform
Transform: Compose(
               RandomCrop(size=(32, 32), padding=4)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
               Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.201))
           )

In [57]:
data, label = train_data[0]

In [58]:
type(data)

torch.Tensor

In [59]:
data.size()

torch.Size([3, 32, 32])

In [60]:
data

tensor([[[-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         ...,
         [ 0.1879, -0.0835,  0.0328,  ..., -2.4291, -2.4291, -2.4291],
         [ 0.8858,  0.1297,  0.0910,  ..., -2.4291, -2.4291, -2.4291],
         [ 1.5061,  0.3430,  0.2267,  ..., -2.4291, -2.4291, -2.4291]],

        [[-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         ...,
         [-0.4122, -0.6089, -0.3532,  ..., -2.4183, -2.4183, -2.4183],
         [ 0.1581, -0.6482, -0.5892,  ..., -2.4183, -2.4183, -2.4183],
         [ 0.8661, -0.3139, -0.4319,  ..., -2.4183, -2.4183, -2.4183]],

        [[-2.2214, -2.2214, -2.2214,  ..., -2.2214, -2.2214, -2.2214],
         [-2.2214, -2.2214, -2.2214,  ..., -2

### Data Batching

In [61]:
import torch

In [62]:
trainloader = torch.utils.data.DataLoader(
    train_data,
    batch_size=16,
    shuffle=True
)

In [63]:
data_batch, labels_batch = next(iter(trainloader))

In [64]:
data_batch.size()

torch.Size([16, 3, 32, 32])

In [65]:
labels_batch.size()

torch.Size([16])

In [66]:
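# The test set also needs tensors, but no random augmentation
test_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010))
])
test_data = CIFAR10(
    root="./test/",
    train=False,
    download=True,
    transform=test_transforms  # without a transform the DataLoader would receive PIL images
)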

100%|██████████| 170M/170M [00:16<00:00, 10.3MB/s] 


In [67]:
testloader = torch.utils.data.DataLoader(
    test_data,
    batch_size=16,
    shuffle=False # test set: keep a fixed order so results are repeatable
)

The simplest way to create your own dataset class is to
- subclass `torch.utils.data.Dataset`
- override
    - `__getitem__()`
    - `__len__()`
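
A minimal sketch of such a subclass is shown below. `SquaresDataset` is a hypothetical toy class invented for this example; a real dataset would read files or records in `__getitem__()` instead of computing values.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples; DataLoader and samplers rely on this
        return self.n

    def __getitem__(self, idx):
        # Return one (input, target) pair as tensors
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx ** 2)])
        return x, y

squares = SquaresDataset(8)
x_batch, y_batch = next(iter(DataLoader(squares, batch_size=4, shuffle=False)))
print(x_batch.squeeze(1))  # tensor([0., 1., 2., 3.])
print(y_batch.squeeze(1))  # tensor([0., 1., 4., 9.])
```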

## Model Development

### Model Design

#### Using existing and pretrained models
- Where to get existing and pretrained models
    - [TorchVision Model Documentation](https://docs.pytorch.org/vision/stable/models.html)
    - [PyTorch Hub](https://pytorch.org/hub/)
        - load with `torch.hub.load()`
- VGG16 (aka OxfordNet)
    - convolutional neural network by the Visual Geometry Group at Oxford
    - entered the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014 with 92.7% top-5 accuracy
    - consists of 3 parts
        - features
        - avgpool
        - classifier

In [23]:
from torchvision import models

In [24]:
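# downloads to ~/.cache/torch/hub/checkpoints/vgg16-xxx.pth
# torchvision >= 0.13 uses the weights argument; older releases use pretrained=True
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)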



In [25]:
vgg16.features

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_si

In [26]:
vgg16.avgpool

AdaptiveAvgPool2d(output_size=(7, 7))

In [27]:
vgg16.classifier

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

#### PyTorch NN module
- use `torch.nn` to create model
- typically 
    - create a new class that inherits `nn.Module`
    - define `__init__()` that creates layers as attributes
    - define `forward()` that defines how data is passed through our model

In [28]:
import torch.nn as nn 
import torch.nn.functional as F

In [29]:
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(2048, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)
    
    def forward(self, x):
        x = x.view(-1, 2048)    # reshape the input into 2048-element vector, -1: tells PyTorch to infer the batch size
        x = F.relu(self.fc1(x)) # apply ReLU activation function to output of fc1
        x = F.relu(self.fc2(x)) # apply ReLU activation function to output of fc2
        x = F.softmax(self.fc3(x),dim=1) # apply SoftMax, dim=1 specifies applied on 2nd dimension, ie. output features
        return x

In [30]:
# to use
simplenet = SimpleNet()

In [31]:
simplenet

SimpleNet(
  (fc1): Linear(in_features=2048, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=2, bias=True)
)

In [32]:
input = torch.rand(2048) # create random 2048-element tensor input
output = simplenet(input)

In [33]:
output

tensor([[0.5222, 0.4778]], grad_fn=<SoftmaxBackward0>)

##### PyTorch NN 
- containers
    - Module, Sequential, ModuleList, ...
- linear layers
    - Identity, Linear, Bilinear
- convolutional layers
    - Conv1d, Conv2d, ...
- pooling layers
    - MaxPool1d, MaxPool2d, ...
- padding layers
    - ZeroPad2d, ReflectionPad1d, ...
- dropout layers
    - Dropout, Dropout2d, ...
- normalization layers
    - BatchNorm1d, BatchNorm2d, ...
- recurrent layers
    - RNNBase, RNN, LSTM, GRU, ...
- transformer layers
    - Transformer, TransformerEncoder, TransformerDecoder, ...
- sparse layers and distance functions
    - Embedding, CosineSimilarity, PairwiseDistance, ...
- vision layers
    - PixelShuffle, Upsample, UpsamplingNearest2d, ...
- nonlinear activations
    - ReLU, LeakyReLU, Softmax, Sigmoid, ...
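
To show how several of these layer families combine, here is a small sketch that stacks them in an `nn.Sequential` container. The network (`tiny_cnn`), its channel counts, and the 3x32x32 input size are assumptions made for the example, not a model from these notes.

```python
import torch
import torch.nn as nn

# Hypothetical small CNN for 3x32x32 inputs, assembled from torch.nn building blocks
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # convolutional layer: 3 -> 8 channels, 32x32 preserved
    nn.BatchNorm2d(8),                          # normalization layer
    nn.ReLU(),                                  # nonlinear activation
    nn.MaxPool2d(2),                            # pooling layer: 32x32 -> 16x16
    nn.Flatten(),                               # flatten to (batch, 8 * 16 * 16)
    nn.Dropout(p=0.5),                          # dropout layer
    nn.Linear(8 * 16 * 16, 10),                 # linear layer producing 10 class scores
)

dummy = torch.rand(4, 3, 32, 32)                # batch of 4 random "images"
scores = tiny_cnn(dummy)
print(scores.shape)  # torch.Size([4, 10])
```

`nn.Sequential` suits plain layer stacks like this one; models with branching or custom logic in the forward pass are usually written as an `nn.Module` subclass, as in `SimpleNet` above.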

### Training

In [34]:
from torch import optim
from torch import nn
import torch.nn.functional as F

In [43]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)),2)
        x = x.view(-1, int(x.nelement()/x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

device = ('cuda' if torch.cuda.is_available() else 'cpu')
model = LeNet5().to(device=device)

In [40]:
criterion = nn.CrossEntropyLoss() # Frequently used for Classification problem

In [41]:
optimizer = optim.SGD(  # Stochastic Gradient Descent optimizer
    model.parameters(), # parameters of the current model instance; re-create the optimizer if the model is re-created
    lr=0.001,
    momentum=0.9
)

In [44]:
# Typical Training loop
N_EPOCHS = 10
for epoch in range (N_EPOCHS):              # loop over 10 epochs
    epoch_loss = 0.0
    for inputs, labels in trainloader:
        inputs = inputs.to(device)          # move to GPU if available
        labels = labels.to(device)

        optimizer.zero_grad()               # zero out gradients before each backpropagation or they'll accumulate

        outputs = model(inputs)             # perform forward pass
        loss = criterion(outputs, labels)   # compute loss
        loss.backward()                     # perform backpropagation; compute gradients
        optimizer.step()                    # adjust parameters based on gradients

        epoch_loss += loss.item()           # accumulate batch loss so we can average over the epoch
    
    print("Epoch: {} Loss: {}".format(epoch, epoch_loss/len(trainloader)))

Epoch: 0 Loss: 2.305431913909912
Epoch: 1 Loss: 2.305394929046631
Epoch: 2 Loss: 2.3054397127532957
Epoch: 3 Loss: 2.305402153778076
Epoch: 4 Loss: 2.305447946624756
Epoch: 5 Loss: 2.30545363243103


KeyboardInterrupt: 

- Loss Functions
    - nn.CrossEntropyLoss(), nn.BCELoss(), ...
- Optimizer algorithms
    - [torch.optim docs](https://docs.pytorch.org/docs/stable/optim.html)
    - optim.SGD(), optim.AdamW(), ...



> `CrossEntropyLoss()` applies log-softmax internally and expects raw logits, so do not add `Softmax()` to the output layer!
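
The short check below illustrates that note: on raw logits, `nn.CrossEntropyLoss` gives the same value as applying `log_softmax` followed by the negative log-likelihood loss. The logits and targets are random or arbitrary values chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw, unnormalized scores for 4 samples and 10 classes
targets = torch.tensor([1, 0, 4, 9])   # arbitrary class indices

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)  # log-softmax, then negative log-likelihood

print(torch.allclose(ce, manual))  # True
```

This is why `LeNet5.forward()` returns the output of `fc3` directly, without a softmax.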

### Validation

In [None]:
from torch.utils.data import random_split

In [None]:
train_set, val_set = random_split(
    train_data,
    [40000, 10000]
)

In [None]:
trainloader = torch.utils.data.DataLoader(
    train_set,
    batch_size=16,
    shuffle=True
)

In [None]:
len(trainloader) # 40,000/16 = 2500

2500

In [None]:
valloader = torch.utils.data.DataLoader(
    val_set,
    batch_size=16,
    shuffle=False # validation loss does not depend on sample order, so no shuffling is needed
)

In [None]:
len(valloader) # 10,000/16 = 625

625

In [None]:
# Typical Training loop with validation
N_EPOCHS = 10
for epoch in range (N_EPOCHS):              

    # Training
    train_loss = 0.0
    model.train()                           # configure model for training
    for inputs, labels in trainloader:
        inputs = inputs.to(device)          
        labels = labels.to(device)

        optimizer.zero_grad()               

        outputs = model(inputs)             
        loss = criterion(outputs, labels)   
        loss.backward()                     
        optimizer.step()                   

        train_loss += loss.item()           
    
    # Validation
    val_loss = 0.0
    model.eval()                            # switch to evaluation mode (affects layers such as dropout and batch norm)
    with torch.no_grad():                   # gradients are not needed for validation
        for inputs, labels in valloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)

            val_loss += loss.item()
    
    print("Epoch: {} Train Loss: {} Val Loss: {}".format(epoch, train_loss/len(trainloader), val_loss/len(valloader)))

How to interpret
- if validation loss decreases along with training loss, the model is learning patterns that generalize
- if training loss keeps decreasing but validation loss plateaus or rises, the model is likely overfitting
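
The rule of thumb above can be turned into a small check on the per-epoch loss history. The helper `diagnose`, its `patience` parameter, and the loss values are all made up for illustration; this is a rough heuristic, not a PyTorch API.

```python
def diagnose(train_losses, val_losses, patience=3):
    """Classify the recent trend from per-epoch training and validation losses."""
    if len(val_losses) <= patience:
        return "too early to tell"
    best_before = min(val_losses[:-patience])   # best validation loss before the recent window
    best_recent = min(val_losses[-patience:])   # best validation loss inside the recent window
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    if best_recent < best_before:
        return "improving"
    if train_falling:
        return "likely overfitting"
    return "stalled"

# Made-up loss histories for two hypothetical runs
healthy = diagnose([2.0, 1.5, 1.2, 1.0, 0.8, 0.6], [2.1, 1.7, 1.5, 1.4, 1.3, 1.2])
overfit = diagnose([2.0, 1.5, 1.2, 1.0, 0.8, 0.6], [2.1, 1.7, 1.5, 1.6, 1.7, 1.8])
print(healthy)  # improving
print(overfit)  # likely overfitting
```

In a training loop, the same comparison is commonly used to stop early or to keep only the checkpoint with the lowest validation loss.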

### Testing

In [None]:
num_correct = 0
model.eval()                                                    # set model to evaluation mode once, before the loop
with torch.no_grad():                                           # gradients are not needed for testing
    for x_test_batch, y_test_batch in testloader:
        x_test_batch = x_test_batch.to(device)
        y_test_batch = y_test_batch.to(device)
        y_pred_batch = model(x_test_batch)                      # predict class scores for each batch
        _, predicted = torch.max(y_pred_batch, 1)               # select class index with the highest score
        num_correct += (predicted == y_test_batch).sum().item() # count correct predictions in this batch

accuracy = num_correct / len(testloader.dataset)                # fraction of correct predictions over the whole test set
print("Test Accuracy: {}".format(accuracy))

## Model Deployment

### Saving Models
- use the `state_dict()` method to get a dictionary that maps each layer's parameters (and buffers) to their tensors
    - i.e. the model's learnt parameters
- no need to save the architecture
    - use the constructor to create a blank model, then `load_state_dict()` to set its parameters
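
To make the idea concrete, the sketch below prints the entries of a `state_dict` and restores them into a freshly constructed model. The two-layer network is a hypothetical stand-in, and an in-memory buffer replaces the file on disk so the example leaves nothing behind.

```python
import io
import torch
import torch.nn as nn

# Hypothetical two-layer model, used only to show what a state_dict contains
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

state = net.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# 0.weight (3, 4)
# 0.bias (3,)
# 2.weight (2, 3)
# 2.bias (2,)

# Round trip through an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

restored = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))  # blank model, same architecture
restored.load_state_dict(torch.load(buffer))

x = torch.rand(5, 4)
print(torch.equal(net(x), restored(x)))  # True
```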

In [None]:
# to save
torch.save(model.state_dict(), "./lenet5_model.pt")

In [None]:
# to use
model = LeNet5().to(device)
model.load_state_dict(torch.load("./lenet5_model.pt", map_location=device))  # map_location lets GPU-saved weights load on CPU

### Deploying to PyTorch Hub

In [None]:
# to load from PyTorch Hub
import torch
vgg16 = torch.hub.load('pytorch/vision', 'vgg16', weights='IMAGENET1K_V1')  # older torchvision releases use pretrained=True

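If you create a model such as VGG16 and want to publish it through PyTorch Hub
- include a `hubconf.py` file in the root of your GitHub repository
    - list required packages such as `torch` in the `dependencies` variable
    - each function defined in this file that does not start with an underscore acts as an entry point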

[torch.hub docs](https://docs.pytorch.org/docs/stable/hub.html)

In [None]:
# hubconf.py
import torch
from mymodel import MyModel  # hypothetical module that defines your model class

dependencies = ['torch', 'torchvision'] # List of required packages

def my_model(pretrained=False, **kwargs):
    """
    Loads a pre-trained MyModel.
    Args:
        pretrained (bool): If True, loads pre-trained weights.
    """
    model = MyModel(**kwargs)
    if pretrained:
        # Load pre-trained weights from a URL or local path
        state_dict = torch.hub.load_state_dict_from_url("https://example.com/mymodel_weights.pth")
        model.load_state_dict(state_dict)
    return model

### Deploying to Production
- refer to chapter 7