<a href="https://colab.research.google.com/github/beamscource/colab_notebooks/blob/main/pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Summary based on Programming PyTorch, NLP with PyTorch and https://deeplearning.cs.cmu.edu/F22/index.html.

Additional resources at https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.html

# Introduction

PyTorch is a Python library which offers an **eager approach to differentiation** instead of defining static graphs, allowing for greater flexibility in the way networks are created, trained, and operated.

Similar to DyNet and Chainer, and in contrast to static frameworks like TensorFlow/Theano/Caffe, models are not compiled before execution. 

PyTorch has two lineages. First, it derives many features and concepts from Torch, which was a Lua-based neural network library that dates back to 2002. Its other major parent is Chainer, created in Japan in 2015.

The library also comes with modules that help with manipulating text, images, and audio (*torchtext*, *torchvision*, and *torchaudio*), along with built-in variants of popular architectures such as ResNet (with weights that can be downloaded to provide assistance with *transfer learning*).

In 2022, about 85% of pre-trained models on HuggingFace are PyTorch models (https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-2022/). Despite the fact that PyTorch is used by companies like Twitter, Salesforce, Tesla, Uber, and NVIDIA, the consensus seems to be that TF still offers better native deployment capabilities and that tf.keras might be better suited for a complete beginner.

All code examples can be found at https://github.com/falloutdurham/beginners-pytorch-deep-learning. For more information and tutorials, see https://pytorch.org/hub/

# Tensors and Matrix Algebra

Tensors are multidimensional arrays that hold numerical data of a single type; they carry the data that is propagated through the network. For example, a 1st-order tensor is a vector (a one-dimensional array) and a 2nd-order tensor is a matrix. If you are coming from Matlab, this will feel very familiar.

In [None]:
import torch
import numpy as np

In [None]:
# creating a tensor from Python lists
x = torch.tensor([[0,0,1],[1,1,1],[0,0,0]])
x

tensor([[0, 0, 1],
        [1, 1, 1],
        [0, 0, 0]])

In [None]:
x.type()

'torch.LongTensor'

In [None]:
# investigating the size of a tensor
x.shape

In [None]:
# or
x.size()

torch.Size([3, 3])

In [None]:
# helper function to investigate a tensor
def describe(x):
  print(f"Type: {x.type()}")
  print(f"Shape/size: {x.shape}")
  print(f"Values: \n{x}")

In [None]:
describe(torch.Tensor(2, 3)) # uninitialized memory, the values are arbitrary
describe(torch.rand(2, 3))   # uniform random
describe(torch.randn(2, 3))  # random normal

In [None]:
# creating tensors filled with zeros or ones (factory functions, no torch.tensor() call needed)
describe(torch.zeros(2, 3))
describe(torch.ones(2, 3))

In [None]:
# creating a tensor from NumPy array
# the type of the created tensor is DoubleTensor which corresponds to NumPy
# float64 matrix
npy  =  np.random.rand(2,  3)
describe(torch.from_numpy(npy))
# or
describe(torch.as_tensor(npy))
npy

torch.tensor() always copies data. If you have a numpy array and want to avoid a copy, use torch.as_tensor().
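
The difference between copying and sharing can be checked directly. The following is a minimal sketch (the variable names are chosen for illustration only): a tensor created with torch.tensor() keeps its own copy, while a tensor created with torch.as_tensor() from a NumPy array on the CPU reuses the array's memory when the dtype does not need to change.

```python
import numpy as np
import torch

npy = np.zeros(3)

copied = torch.tensor(npy)     # always copies the data
shared = torch.as_tensor(npy)  # reuses the NumPy buffer (same dtype, CPU)

npy[0] = 1.0  # modify the NumPy array in place

print(copied)  # the copy still holds zeros
print(shared)  # the shared tensor reflects the change
```

Sharing memory saves a copy, but it also means that in-place changes on either side are visible on the other.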

*Different types*

The default tensor type when using the torch.Tensor constructor is torch.FloatTensor. You can choose a different type (float, long, double, ...) by specifying it at initialization, or convert an existing tensor with one of the typecasting methods.

For more information, see https://pytorch.org/docs/stable/tensors.html

In [None]:
# using dtype at initialization
torch.zeros([2, 4], dtype=torch.int32)

tensor([[0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.int32)

In [None]:
# calling a specific constructor at initialization
x = torch.FloatTensor([[1, 2, 3],
                   [4,5,6]])
x

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [None]:
# typecasting
x.long()

*Indexing and slicing*

In [None]:
# creating a tensor with a short-cut
x = torch.arange(6).view(2, 3)
x

tensor([[0, 1, 2],
        [3, 4, 5]])

In [None]:
# indexing into a tensor works like in hierarchical lists (standard Python)
x[0][1:3]

tensor([1, 2])

In [None]:
# but also like in NumPy
x[0,0]

tensor(0)

In [None]:
# access the first two elements in the first row (indexing is starting at zero)
# take from the row at index zero all elements until the element at index 2
x[0, :2] 

tensor([0, 1])

In [None]:
x[1,1:]

tensor([4, 5])

In [None]:
# access scalar values from a single-element tensor
torch.rand(1).item()


0.5620666146278381

In [None]:
# replace all elements of a tensor
x = torch.ones(4,8)
x.fill_(5)
x

Any PyTorch method with a trailing underscore (_) is an in-place operation; that is, it modifies the content of the tensor in place without creating a new object.
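
A small sketch of the difference, using add() and its in-place counterpart add_() as an example:

```python
import torch

x = torch.ones(2, 2)

y = x.add(1)  # out-of-place: returns a new tensor, x keeps its values
print(x)      # still all ones
print(y)      # all twos

x.add_(1)     # in-place: x itself now holds twos
print(x)
```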

In [None]:
# indexing using PyTorch functions
# indices have to be of the type LongTensor
x = torch.Tensor(2, 6) # uninitialized 2x6 tensor, the values are arbitrary
print(x)
indices = torch.LongTensor([0, 0])
# select the first row twice and join the copies into a new tensor
describe(torch.index_select(x, dim=0, index=indices))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16]])


In [None]:
# extracting non-contiguous elements by passing tensors as indices
# the index tensors are paired up: here we get the elements at (0, 0) and (1, 1)
print(x)
row_indices = torch.arange(2).long() # take from rows zero and one
col_indices = torch.LongTensor([0, 1]) # take from columns zero and one
print(row_indices)
print(col_indices)
describe(x[row_indices, col_indices])

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
tensor([0, 1])
tensor([0, 1])
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values: 
tensor([1.4068e-34, 1.5956e+25])


*Concatenating*

In [None]:
# along dim 0: the rows of the second tensor are appended below the first
print(x)
describe(torch.cat([x, x], dim=0))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([4, 6])
Values: 
tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00],
        [1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])


In [None]:
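# along dim 1: the columns of the second tensor are appended to the right
y = x.view(3, 4) # the 2x6 tensor from above, viewed as 3x4
print(y)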
describe(torch.cat([y, y], dim=1))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([3, 8])
Values: 
tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00, 1.4068e-34, 0.0000e+00,
         3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25,        nan, 4.7399e+16,
         4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00, 4.7399e+16, 2.3868e-06,
         1.4838e-41, 0.0000e+00]])


In [None]:
# to keep the tensors as separate elements, stack them along a new dimension
print(y)
describe(torch.stack([y, y], dim=0))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3, 4])
Values: 
tensor([[[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
         [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
         [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]],

        [[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
         [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
         [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]]])


In [None]:
print(y)
describe(torch.stack([y, y], dim=1))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2, 4])
Values: 
tensor([[[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
         [1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00]],

        [[       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
         [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25]],

        [[4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00],
         [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]]])


*Manipulating tensors' dimensions*

In [None]:
# change dimensions of the tensor
x = torch.Tensor(2,6)
print(x)
# view does not change the original tensor; it returns a new tensor
# that shares the same data, so assign the result to a variable
x.view(3, 4)

tensor([[1.4062e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 1.5975e-43],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 1.5695e-43]])


tensor([[1.4062e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 1.5975e-43, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 1.5695e-43]])

In [None]:
print(x)
y = x.view(3, 4)
print(y)

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 4.7399e+16],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])


In [None]:
# or use reshape, which also works on non-contiguous tensors
print(x)
y = x.reshape(3, 4)
print(y)

tensor([[1.4062e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan, 1.5975e-43],
        [4.4721e+21, 1.5956e+25, 4.7399e+16, 2.3868e-06, 1.4838e-41, 1.5695e-43]])
tensor([[1.4062e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 1.5975e-43, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 1.5695e-43]])
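
To see when view and reshape actually differ, here is a minimal sketch with a small, hypothetical example tensor. Transposing produces a non-contiguous tensor, on which view fails while reshape falls back to copying the data.

```python
import torch

a = torch.arange(6).view(2, 3)
b = a.t()  # the transpose is a non-contiguous view of the same data

print(b.is_contiguous())  # False

try:
    b.view(6)  # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", err)

flat = b.reshape(6)  # reshape copies the data when a view is not possible
print(flat)          # tensor([0, 3, 1, 4, 2, 5])
```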


In [None]:
# transposing a tensor (columns become rows)
torch.transpose(y, 0, 1)

tensor([[1.4068e-34,        nan, 4.7399e+16],
        [0.0000e+00, 4.7399e+16, 2.3868e-06],
        [3.3631e-44, 4.4721e+21, 1.4838e-41],
        [0.0000e+00, 1.5956e+25, 0.0000e+00]])

In [None]:
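# re-arrange dimensions of a tensor, e.g. to move the channel dimension
# of an image to the front (separate names keep x and y from above intact)
image = torch.rand(640, 480, 3)
channels_first = image.permute(2,0,1)
channels_first.shape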

torch.Size([3, 640, 480])

*Operations on tensors*

In [None]:
# element-wise addition with mathematical symbols
torch.ones(1,2) + torch.ones(1,2)

tensor([[2., 2.]])

In [None]:
# or with built-in methods
torch.add(torch.ones(1,2), torch.ones(1,2))

tensor([[2., 2.]])

In [None]:
# summing over dim 0 collapses the rows: one sum per column
print(y)
describe(torch.sum(y, dim=0))

tensor([[1.4068e-34, 0.0000e+00, 3.3631e-44, 0.0000e+00],
        [       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25],
        [4.7399e+16, 2.3868e-06, 1.4838e-41, 0.0000e+00]])
Type: torch.FloatTensor
Shape/size: torch.Size([4])
Values: 
tensor([       nan, 4.7399e+16, 4.4721e+21, 1.5956e+25])


In [None]:
# summing over dim 1 collapses the columns: one sum per row
describe(torch.sum(y, dim=1))

Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values: 
tensor([1.4068e-34,        nan, 4.7399e+16])


In [None]:
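# matrix multiplication: (2 x 3) times (3 x 2) gives a (2 x 2) matrix
a = torch.arange(6, dtype=torch.float32).view(2, 3)
b = torch.ones(3, 2)
describe(torch.mm(a, b))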

In [None]:
# access the max value
x = torch.ones(2, 3)
x.max().item()

1.0

*Broadcasting*

Borrowed from NumPy, broadcasting allows you to perform operations between a tensor and a smaller tensor. Two tensors can be broadcast if, going backward from their trailing dimensions, each pair of dimensions satisfies one of the following:
- the two dimensions are equal
- one of the dimensions is 1
- one of the dimensions does not exist (the tensor has fewer dimensions)
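
The rules can be tried out on a few shapes. This is a small sketch with arbitrary example tensors:

```python
import torch

a = torch.ones(2, 3)               # shape (2, 3)
b = torch.tensor([1.0, 2.0, 3.0])  # shape (3,), treated like (1, 3)
print((a + b).shape)  # torch.Size([2, 3])

c = torch.ones(4, 1)  # shape (4, 1)
d = torch.ones(1, 5)  # shape (1, 5)
print((c * d).shape)  # torch.Size([4, 5])

# shapes (2, 3) and (2,) are not compatible: the trailing dimensions 3 and 2 differ
try:
    a + torch.ones(2)
except RuntimeError as err:
    print("cannot broadcast:", err)
```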

*GPU vs CPU tensors*

By default, PyTorch tensors are created to be used by a CPU.

In [None]:
cpu_tensor = torch.rand(2)
cpu_tensor.device

device(type='cpu')

When doing linear algebra operations, it makes sense to utilize a GPU. To use a GPU, you first need to allocate the tensor in the GPU’s memory. Access to GPUs is provided via the CUDA API, which was created by NVIDIA and works only with NVIDIA GPUs.

In [None]:
# in colab you need to change runtime environment for this to work
# transfer a tensor to a GPU
gpu_tensor = cpu_tensor.to("cuda")
gpu_tensor.device

To be device agnostic and write code that works whether it’s on the GPU or the CPU:

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

x = torch.rand(3, 3).to(device)
describe(x)

cpu
Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.6116, 0.3273, 0.7642],
        [0.8197, 0.4571, 0.1784],
        [0.9317, 0.1341, 0.0010]])


Computations will fail if the tensors involved are not on the same device. Moving data back and forth is computationally expensive, so the typical procedure is to do the parallelizable operations on the GPU and to transfer only the final results to the CPU.

In case you have multiple GPUs, the best practice is to set the CUDA_VISIBLE_DEVICES environment variable when executing the Python training script.

In [None]:
# shell command; prefix it with ! to run it from a notebook cell
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py

*Tensors and computations within a network*



Apart from storing the data itself, PyTorch tensors can track the operations applied to them so that gradients can be computed. This is enabled by setting the *requires_grad* flag to True at instantiation time and is required for model training.

At the end of a forward pass through the network, a single scalar (the *loss*) is used to start the backward pass by calling its *backward()* method. During backpropagation, gradients are computed for the tensors that were involved in the forward pass and have *requires_grad* set.

The gradients of leaf tensors, such as the model parameters, can be accessed through the *.grad* attribute of a tensor. The network optimizer uses this attribute to update the values of the parameters (model weights). 



In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x.grad)

None


In [None]:
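# backward() needs a scalar, so we reduce x to its mean first
# (a new name is used so that x stays available for reading x.grad)
y = x.mean()
y.backward()
print(x.grad) # d(mean)/dx is 1/4 for each of the four elements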
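
To make the interplay of *requires_grad*, *backward()*, and *.grad* more concrete, here is a minimal sketch with a hand-picked toy loss (the tensor `w` and the learning rate are assumptions for illustration). The manual update at the end is roughly what a plain gradient-descent optimizer does in its step.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

loss = (w ** 2).sum()  # scalar: 1 + 4 + 9 = 14
loss.backward()        # fills w.grad with d(loss)/dw = 2 * w

print(loss.item())  # 14.0
print(w.grad)       # tensor([2., 4., 6.])

# one manual gradient-descent step
learning_rate = 0.1
with torch.no_grad():  # the update itself must not be tracked
    w -= learning_rate * w.grad
w.grad.zero_()  # gradients accumulate, so reset them before the next pass

print(w)  # approximately [0.8, 1.6, 2.4]
```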

# Model objects

**Data sets and data loaders**

PyTorch has developed standard conventions for interacting with training data that make it fairly consistent to work with, whether you’re working with images, text, or audio. Those conventions include *datasets* and *data loaders*.

A Dataset is a PyTorch class that lets you "pre-package" training data into the right format. Any manipulation of the data can be applied here, and the class implements methods such as *\__len\__* (the size of the data set) and *\__getitem\__* (a single feature/label pair).

In [None]:
# a skeleton that shows how to define a custom data set
import os
from torch.utils.data import DataLoader, Dataset

# define your own dataset class by inheriting from the PyTorch Dataset
class CustomDataset(Dataset):

  def __init__(self, path_to_data):
    '''Load data from disk, pre-process it, and compile it
    to feature tensors and labels'''
    self.features = []
    self.labels = []

    for file_name in sorted(os.listdir(path_to_data)):
      file_path = os.path.join(path_to_data, file_name)
      feature, label = self.load_example(file_path)
      self.features.append(feature)
      self.labels.append(label)

  def load_example(self, file_path):
    '''Placeholder (not part of the Dataset API): read one file,
    normalize the data, extract the features, and return
    (feature_tensor, label). Fill this in for your own data.'''
    raise NotImplementedError

  def __getitem__(self, index):
    '''Extract a single item (feature tensor + label) from the
    data set'''
    return self.features[index], self.labels[index]

  def __len__(self):
    '''Return the number of training examples'''
    return len(self.features)

# load your training data from disk into the dataset class
# (path_to_data is a placeholder for the directory that holds your files)
train_data = CustomDataset(path_to_data)

A data loader loads *batches* of training data into the training pipeline. To do so, the loader uses the *\__getitem\__* method of the Dataset class. The loader also controls the *number of worker* processes and whether the training data should be shuffled. By default, data loaders set the batch size to 1.
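
How the two pieces work together can be seen in a small runnable sketch. `ToyDataset` is a hypothetical in-memory data set made up for this illustration; a real data set would load its features and labels from disk.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical in-memory data set: 10 examples with 4 features each."""

    def __init__(self, n_examples=10, n_features=4):
        values = torch.arange(n_examples * n_features, dtype=torch.float32)
        self.features = values.view(n_examples, n_features)
        self.labels = torch.arange(n_examples) % 2  # alternating 0/1 labels

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

    def __len__(self):
        return len(self.features)

toy_data = ToyDataset()
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=False)

# 10 examples with a batch size of 4 give batches of 4, 4, and 2 examples
for features, labels in toy_loader:
    print(features.shape, labels.shape)
```

The loader calls `__getitem__` for each index in a batch and stacks the returned tensors, so the last batch is smaller unless `drop_last=True` is passed.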

In [None]:
# create a DataLoader for the training data (before a training loop)
batch_size = 16
train_loader = DataLoader(train_data, batch_size=batch_size, num_workers=4, shuffle=True)

The same procedure has to be performed for the *validation* and *test* data sets.

**The network**


To create a network, we inherit from a class called *torch.nn.Module* and fill out the \__init\__ and forward methods:

In [None]:
# in this example the activation functions are included in the forward method
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(12288, 84) # called Dense in Keras
    self.fc2 = nn.Linear(84, 50)
    self.fc3 = nn.Linear(50,2)
  
  def forward(self, x):
    x = x.view(-1, 12288) # flatten each input to a vector of 12288 values
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

# initialize the model
simplenet = SimpleNet()

In [None]:
# in this example the activation functions are included in the layers object and
# the forward method just calls it - seems to be more clean

class SimpleConvNet(nn.Module):
  '''
    Simple Convolutional Neural Network
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Conv2d(1, 10, kernel_size=3),
      nn.ReLU(),
      nn.Flatten(),
      # in_features must equal 10 * (H - 2) * (W - 2) for an H x W input image
      nn.Linear(1094500, 50),
      nn.ReLU(),
      nn.Linear(50, 20),
      nn.ReLU(),
      nn.Linear(20, 10)
    )

  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)

# initialize the model
simpleconvnet = SimpleConvNet()
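
The number of input features of the first linear layer depends on the size of the input images. One way to avoid computing it by hand is to pass a dummy input through the convolutional part once. The following is a sketch that assumes 28 x 28 single-channel images; the input size is an assumption for illustration and is not the size used for the network above.

```python
import torch
import torch.nn as nn

# convolutional part of the network up to the flatten layer
conv_part = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
)

# dummy batch with one image: (batch, channels, height, width)
dummy = torch.zeros(1, 1, 28, 28)
n_features = conv_part(dummy).shape[1]
print(n_features)  # 10 channels * 26 * 26 = 6760

# the first linear layer can now be sized accordingly
first_linear = nn.Linear(n_features, 50)
```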

To train the network, we need an optimizer:

In [None]:
# initialize optimizer (before a training loop)
optimizer = torch.optim.Adam(simpleconvnet.parameters(), lr=0.001)

# Training loop

Next, we need to define a training loop for the network. Here are the required pieces:

In [None]:
# configuration options for the training
epochs = 1
batch_size = 16
learning_rate = 1e-4
loss_function = nn.CrossEntropyLoss() # includes the softmax activation

# initialize data loaders for train, test data
# initialize the optimizer

# initialize the neural network

# set fixed random number seed
torch.manual_seed(42)

# outer loop for the epoch numbers
for epoch in range(epochs):
  # inner loop for the mini-batches
  for batch in train_loader:
    optimizer.zero_grad() # reset gradients to zero before new batch
    features, labels = batch
    output = simpleconvnet(features)
    loss = loss_function(output, labels)
    loss.backward() # compute gradients
    optimizer.step() # adjust the weights based on the gradients
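
To try the loop without any data on disk, here is a self-contained sketch on synthetic data. The data, the small network, and the hyperparameters are assumptions made up for this illustration; the structure of the loop is the same as above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# synthetic data: the label is 1 when the features of an example sum to > 0
features = torch.randn(256, 8)
labels = (features.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()               # reset gradients
        output = model(batch_features)      # forward pass
        loss = loss_function(output, batch_labels)
        loss.backward()                     # compute gradients
        optimizer.step()                    # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(loader))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.3f}")
```

Tracking the mean loss per epoch is a quick sanity check: on such an easy problem it should go down from epoch to epoch.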

In [None]:
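# to train on a GPU, move the model to the device
# (each batch of features and labels has to be moved with .to(device) as well)
simpleconvnet.to(device)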

In [None]:
# saving a model
torch.save(simplenet, "path")

# loading a model
simplenet = torch.load("path")

This stores both the parameters and the structure of the model in a file. This might be a problem if you change the structure of the model at a later point. For this reason, it’s more common to save a model’s state_dict instead. This is a standard Python dict that maps the name of each layer's parameters to the parameter tensors.

In [None]:
# save only the parameters of the model
torch.save(simplenet.state_dict(), "path")

# load: re-create the model first, then restore its parameters
simplenet = SimpleNet()
simplenet_state_dict = torch.load("path")
simplenet.load_state_dict(simplenet_state_dict)

The benefit here is that if you extend the model in some fashion, you can supply a strict=False parameter to load_state_dict that assigns parameters to layers in the model that do exist in the state_dict, but does not fail if the loaded state_dict has layers missing or added from the model’s current structure.
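
The effect of strict=False can be sketched with two small, hypothetical models (`SmallNet` and `ExtendedNet` are made up for this illustration). The parameters of the shared layer are restored, and the return value of load_state_dict reports which keys could not be matched.

```python
import os
import tempfile

import torch
import torch.nn as nn

class SmallNet(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc1(x)

class ExtendedNet(SmallNet):  # the same model with one additional layer
    def __init__(self):
        super().__init__()
        self.fc2 = nn.Linear(3, 2)

    def forward(self, x):
        return self.fc2(super().forward(x))

path = os.path.join(tempfile.mkdtemp(), "smallnet.pt")
small = SmallNet()
torch.save(small.state_dict(), path)

extended = ExtendedNet()
result = extended.load_state_dict(torch.load(path), strict=False)

print(result.missing_keys)     # ['fc2.weight', 'fc2.bias']
print(result.unexpected_keys)  # []
print(torch.equal(extended.fc1.weight, small.fc1.weight))  # True
```

Without strict=False, the same call raises a RuntimeError because the keys of fc2 are missing from the saved state_dict.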

Models can be saved to a disk during a training run and reloaded at another point so that training can continue where you left off.
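
A common pattern for this is to save a checkpoint dict that holds the state of the model, the state of the optimizer, and the current epoch. The following is a minimal sketch; the dict keys and the tiny model are assumptions chosen for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# save a checkpoint, e.g. at the end of epoch 3
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {
        "epoch": 3,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    },
    checkpoint_path,
)

# later: re-create model and optimizer, then restore their states
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.SGD(restored_model.parameters(), lr=0.1)

checkpoint = torch.load(checkpoint_path)
restored_model.load_state_dict(checkpoint["model_state"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1  # continue with the next epoch

print(start_epoch)  # 4
```

Saving the optimizer state matters for optimizers such as Adam or SGD with momentum, which keep running statistics between steps.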

# PyTorch Lightning

PyTorch Lightning is a wrapper library that lets you repackage the training loop inside a single module. This doesn't necessarily mean that you write less code, but the resulting code tends to be better structured. For a gentle introduction, see https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09

Although the author suggests using a LightningModule for the model and the update functions, a LightningDataModule for the DataLoaders, and Lightning's Trainer to train the model, there are examples where the DataLoaders are defined inside the LightningModule, which makes the code more compact.

- TO_DO: intro Lightning

# Experiment logging

### Tensorboard

TO_DO

# Examples

**Feed forward network for image classification**

Save the images.csv file, which contains a list of URLs, from https://github.com/falloutdurham/beginners-pytorch-deep-learning/tree/master/chapter2 and copy it into the Colab session.

Then use the following script to download the images (1394) into the Colab session (this takes over 20 minutes!):

In [None]:
import os
import sys
import urllib3
from urllib.parse import urlparse
import pandas as pd
import itertools
import shutil

from urllib3.util import Retry

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

classes = ["cat", "fish"]
set_types = ["train", "test", "val"]

def download_image(url, klass, data_type):
    basename = os.path.basename(urlparse(url).path)
    filename = "{}/{}/{}".format(data_type, klass, basename)
    if not os.path.exists(filename):
        try: 
            http = urllib3.PoolManager(retries=Retry(connect=1, read=1, redirect=2))
            with http.request("GET", url, preload_content=False) as resp, open(
                filename, "wb"
            ) as out_file:
                if resp.status == 200:
                    shutil.copyfileobj(resp, out_file)
                else:
                    print("Error downloading {}".format(url))
            resp.release_conn()
        except Exception as err:
            print("Error downloading {}: {}".format(url, err))

if __name__ == "__main__":
    if not os.path.exists("images.csv"):
        print("Error: can't find images.csv!")
        sys.exit(1)

    # get args and create output directory
    imagesDF = pd.read_csv("images.csv")

    for set_type, klass in list(itertools.product(set_types, classes)):
        path = "./{}/{}".format(set_type, klass)
        if not os.path.exists(path):
            print("Creating directory {}".format(path))
            os.makedirs(path)

    print("Downloading {} images".format(len(imagesDF)))

    result = [
        download_image(url, klass, data_type)
        for url, klass, data_type in zip(
            imagesDF["url"], imagesDF["class"], imagesDF["type"]
        )
    ]
    sys.exit(0)

In [None]:
# creating training, validation, and test sets for the image data
import torchvision
from torchvision import transforms

train_data_path = "train"

# a separate name keeps the transforms module from being shadowed
img_transforms = transforms.Compose([
    # force 64 x 64 pixels so that each image has 64 * 64 * 3 = 12288 values
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225] )
    ])

train_data = torchvision.datasets.ImageFolder(root=train_data_path,
                                              transform=img_transforms)

val_data_path = "val"
val_data = torchvision.datasets.ImageFolder(root=val_data_path,
                                            transform=img_transforms)

test_data_path = "test"
test_data = torchvision.datasets.ImageFolder(root=test_data_path,
                                            transform=img_transforms)

In [None]:
# creating DataLoaders
from torch.utils.data import DataLoader

batch_size=64
train_data_loader = DataLoader(train_data, batch_size=batch_size)
val_data_loader  = DataLoader(val_data, batch_size=batch_size)
test_data_loader  = DataLoader(test_data, batch_size=batch_size)

In [None]:
import torch.nn as nn
import torch.nn.functional as F

# network
class SimpleNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(12288, 84) # 12288 = 64 * 64 * 3 values per image
    self.fc2 = nn.Linear(84, 50)
    self.fc3 = nn.Linear(50,2) # two classes: cat and fish
  
  def forward(self, x):
    x = x.view(-1, 12288) # flatten each image to a vector
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

# initialize the model
simplenet = SimpleNet()

In [None]:
optimizer = torch.optim.Adam(simplenet.parameters(), lr=0.001)

The training loop in a single training function:

In [None]:
def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"):
  for epoch in range(epochs):
    training_loss = 0.0
    valid_loss = 0.0
    model.train()
    for batch in train_loader:
      optimizer.zero_grad()
      inputs, targets = batch
      inputs = inputs.to(device)
      targets = targets.to(device)
      output = model(inputs)
      loss = loss_fn(output, targets)
      loss.backward()
      optimizer.step()
      training_loss += loss.item()
    training_loss /= len(train_loader) # mean loss per batch
      
    model.eval()
    num_correct = 0
    num_examples = 0
    with torch.no_grad(): # no gradients are needed for validation
      for batch in val_loader:
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        output = model(inputs)
        loss = loss_fn(output, targets)
        valid_loss += loss.item()
        # the predicted class is the index of the largest logit
        correct = torch.eq(torch.argmax(output, dim=1), targets)
        num_correct += torch.sum(correct).item()
        num_examples += correct.shape[0]
    valid_loss /= len(val_loader)
    
    print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,
    valid_loss, num_correct / num_examples))

In [None]:
# validate on the validation set; the test set is kept for the final evaluation
train(simplenet, optimizer, torch.nn.CrossEntropyLoss(), train_data_loader, val_data_loader)

UnidentifiedImageError: ignored
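
The UnidentifiedImageError above is raised by PIL when it cannot decode a file. A likely cause here is a failed download that left an empty or corrupted file on disk. One way to deal with this is to filter such files out before training. The following is a minimal sketch; `check_image` is a helper name chosen for illustration.

```python
import os
import tempfile

from PIL import Image

def check_image(path):
    """Return True if PIL can open and verify the file as an image."""
    try:
        with Image.open(path) as img:
            img.verify()
        return True
    except Exception:
        return False

# quick demonstration with one valid image and one empty file
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.png")
bad_path = os.path.join(tmp_dir, "bad.jpg")
Image.new("RGB", (8, 8)).save(good_path)
open(bad_path, "wb").close()

print(check_image(good_path))  # True
print(check_image(bad_path))   # False
```

Such a function can be passed to torchvision.datasets.ImageFolder through its is_valid_file parameter, so that unreadable files are skipped when the data set is built.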

**Example with prebuilt data set**

# Hugging Face with PyTorch



Hugging Face transformers is a Python library that lets you use pre-trained large language models and fine-tune them on your own data set using its Trainer API (see https://huggingface.co/course/chapter3/1?fw=pt).

Following this approach, tuning/training a PyTorch model becomes as easy as using Keras' model.fit(). See the following example:

In [None]:
!pip install pytorch_lightning
!pip install tensorboard
!pip install nlp
!pip install transformers

In [None]:
import torch
import pytorch_lightning as pl
import nlp
import transformers

In [None]:
# note: this example uses an older PyTorch Lightning API (e.g. validation_epoch_end
# and the gpus argument); newer releases have renamed or removed these
class IMDBSentimentClassifier(pl.LightningModule):
    
    # initialize the model and the model loss
    def __init__(self):
        super().__init__()
        # load a pre-trained BERT model from HF transformers
        self.model = transformers.BertForSequenceClassification.from_pretrained('bert-base-uncased')
        # cross-entropy loss from PyTorch
        self.loss = torch.nn.CrossEntropyLoss(reduction='none')

    def prepare_data(self):
        
        # load BERT tokenizer from HF transformers
        tokenizer = transformers.BertTokenizer.from_pretrained('bert-base-uncased')

        # convert the text field to token ids and add to the data set items
        # text sequences also get normalized here
        def _tokenize(x):
            # keep only the token ids and pad every sequence to the same length
            x['token_ids'] = tokenizer.batch_encode_plus(
                    x['text'], 
                    max_length=32,
                    truncation=True, 
                    padding='max_length')['input_ids']
            return x

        # load the requested split of the IMDB data set from HF nlp
        # (only the first 10% of each split to keep the run short)
        def _prepare_ds(split):
            ds = nlp.load_dataset('imdb', split=f'{split}[:10%]')
            ds = ds.map(_tokenize, batched=True)
            ds.set_format(type='torch', columns=['token_ids', 'label'])
            return ds

        self.train_ds, self.test_ds = map(_prepare_ds, ('train', 'test'))

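    def forward(self, token_ids):
        # padding tokens have the id 0 and are masked out
        mask = (token_ids != 0).float()
        # the first element of the model output holds the logits
        logits = self.model(token_ids, attention_mask=mask)[0]
        return logits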

    def training_step(self, batch, batch_idx):
        logits = self.forward(batch['token_ids'])
        loss = self.loss(logits, batch['label']).mean()
        return {'loss': loss, 'log': {'train_loss': loss}}

    def validation_step(self, batch, batch_idx):
        logits = self.forward(batch['token_ids'])
        loss = self.loss(logits, batch['label'])
        acc = (logits.argmax(-1) == batch['label']).float()
        return {'loss': loss, 'acc': acc}

    def validation_epoch_end(self, outputs):
        loss = torch.cat([o['loss'] for o in outputs], 0).mean()
        acc = torch.cat([o['acc'] for o in outputs], 0).mean()
        out = {'val_loss': loss, 'val_acc': acc}
        return {**out, 'log': out}

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
                self.train_ds,
                batch_size=8,
                drop_last=True,
                shuffle=True,
                )

    def val_dataloader(self):
        return torch.utils.data.DataLoader(
                self.test_ds,
                batch_size=8,
                drop_last=False,
                shuffle=False, # shuffling is not needed for validation
                )

    def configure_optimizers(self):
        return torch.optim.SGD(
            self.parameters(),
            lr=1e-2,
            momentum=0.9,
        )
    
def main():
    model = IMDBSentimentClassifier()
    trainer = pl.Trainer(
        default_root_dir='root/logs',
        gpus=(1 if torch.cuda.is_available() else 0),
        max_epochs=10,
        logger=pl.loggers.TensorBoardLogger('root/logs/', name='imdb', version=0),
    )
    trainer.fit(model)


if __name__ == '__main__':
    main()

However, in case you need any additional customizations for your training you can still utilize all of the underlying PyTorch functionality and implement your own training loop (see https://huggingface.co/course/chapter3/4?fw=pt).

# Using AWS Sagemaker

TO_DO