<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Introduction-to-Pytorch" data-toc-modified-id="Introduction-to-Pytorch-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction to Pytorch</a></span><ul class="toc-item"><li><span><a href="#Quickstart" data-toc-modified-id="Quickstart-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Quickstart</a></span><ul class="toc-item"><li><span><a href="#Working-with-data" data-toc-modified-id="Working-with-data-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Working with data</a></span></li><li><span><a href="#Creating-models" data-toc-modified-id="Creating-models-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>Creating models</a></span></li><li><span><a href="#Optimizing-model-parameters" data-toc-modified-id="Optimizing-model-parameters-2.1.3"><span class="toc-item-num">2.1.3&nbsp;&nbsp;</span>Optimizing model parameters</a></span></li><li><span><a href="#Save-a-model" data-toc-modified-id="Save-a-model-2.1.4"><span class="toc-item-num">2.1.4&nbsp;&nbsp;</span>Save a model</a></span></li><li><span><a href="#Loading-a-model" data-toc-modified-id="Loading-a-model-2.1.5"><span class="toc-item-num">2.1.5&nbsp;&nbsp;</span>Loading a model</a></span></li><li><span><a href="#Make-predictions" data-toc-modified-id="Make-predictions-2.1.6"><span class="toc-item-num">2.1.6&nbsp;&nbsp;</span>Make predictions</a></span></li></ul></li><li><span><a href="#Tensors" data-toc-modified-id="Tensors-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Tensors</a></span><ul class="toc-item"><li><span><a href="#Initializing-a-tensor" data-toc-modified-id="Initializing-a-tensor-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Initializing a tensor</a></span></li><li><span><a href="#Attributes-of-a-tensor" 
data-toc-modified-id="Attributes-of-a-tensor-2.2.2"><span class="toc-item-num">2.2.2&nbsp;&nbsp;</span>Attributes of a tensor</a></span></li><li><span><a href="#Tensor-operations" data-toc-modified-id="Tensor-operations-2.2.3"><span class="toc-item-num">2.2.3&nbsp;&nbsp;</span>Tensor operations</a></span></li><li><span><a href="#Bridge-with-numpy" data-toc-modified-id="Bridge-with-numpy-2.2.4"><span class="toc-item-num">2.2.4&nbsp;&nbsp;</span>Bridge with numpy</a></span></li></ul></li><li><span><a href="#Datasets-&amp;-dataloaders" data-toc-modified-id="Datasets-&amp;-dataloaders-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Datasets &amp; dataloaders</a></span></li></ul></li></ul></div>

# Introduction

Here, we will explore the PyTorch package. To install PyTorch, follow the instructions at [this link](https://pytorch.org/get-started/locally/). Tutorials can be found [here](https://pytorch.org/tutorials). We will start with the [Introduction to PyTorch tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html).

# Introduction to Pytorch

## Quickstart

### Working with data

PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.

In [16]:
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
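
To make the division of labour between the two primitives concrete, here is a minimal sketch of a custom `Dataset` wrapped in a `DataLoader`. The class `SquaresDataset` and its toy data are hypothetical and only serve as an illustration; a map-style dataset needs just `__len__` and `__getitem__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the number i, its label is i squared."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (sample, label) pair
        return self.x[idx], self.y[idx]


toy_data = SquaresDataset(10)
toy_loader = DataLoader(toy_data, batch_size=4)

# The DataLoader groups the samples into batches; the last batch is smaller
batch_sizes = [len(xb) for xb, yb in toy_loader]
print(batch_sizes)  # [4, 4, 2]
```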

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset. The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO. In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, to modify the samples and labels respectively.
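
As an illustration of what a label transform might look like, the sketch below turns an integer class index into a one-hot vector. The helper name `one_hot_label` is our own; we assume it could be passed as `target_transform=one_hot_label` when constructing the dataset, but here it is only applied to a single label so that nothing needs to be downloaded.

```python
import torch
import torch.nn.functional as F


def one_hot_label(label, num_classes=10):
    # Turn an integer class index into a one-hot float vector
    return F.one_hot(torch.tensor(label), num_classes=num_classes).float()


encoded = one_hot_label(3)
print(encoded)
```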

In [2]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw



We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [3]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
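
The loaders above use the default settings, which keep the dataset order and the final, possibly smaller, batch. The sketch below shows two commonly used options, `shuffle` and `drop_last`, on a small made-up dataset (the tensors `features` and `labels` are hypothetical toy data).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 3 features each, labels 0..9
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# shuffle=True draws a fresh random order each epoch;
# drop_last=True discards the final, smaller batch
loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, drop_last=True)

shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(loader))   # number of batches per epoch
print(shapes[0])     # shapes of the first batch
```

Shuffling is often used for the training loader so that batches differ between epochs; for evaluation the order usually does not matter.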


### Creating models

To define a neural network in PyTorch, we create a class that inherits from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network in the `__init__` function and specify how data will pass through the network in the `forward` function. To accelerate operations in the neural network, we move it to the GPU if one is available.

In [4]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
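
To see what the network returns, we can pass a batch of random "images" through a model and convert the raw outputs (logits) into class probabilities. The sketch below uses a hypothetical stand-in `tiny_model` with the same input and output sizes as the network above, but a smaller hidden layer, so that it runs on its own.

```python
import torch
from torch import nn

# Stand-in with the same input/output sizes as the network above
# (a smaller hidden layer keeps the sketch short)
tiny_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 16),
    nn.ReLU(),
    nn.Linear(16, 10),
)

batch = torch.rand(3, 1, 28, 28)      # 3 fake grayscale images
logits = tiny_model(batch)            # calling the model runs its forward pass
probs = nn.Softmax(dim=1)(logits)     # each row becomes a probability distribution
y_pred = probs.argmax(dim=1)          # most likely class per image

print(logits.shape, y_pred.shape)
```

Note that we call the model directly rather than calling `forward` ourselves.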


### Optimizing model parameters

To train a model, we need a loss function and an optimizer.

In [5]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

In [6]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
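
The three backpropagation calls are the core of the loop. As a minimal sketch, the same pattern is applied below to a made-up regression problem (learning `y = 2x` with a single linear layer); the names `toy_model`, `toy_loss_fn` and `toy_optimizer` are hypothetical and unrelated to the FashionMNIST model.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy regression problem: learn y = 2x
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x

toy_model = nn.Linear(1, 1)
toy_loss_fn = nn.MSELoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(50):
    loss = toy_loss_fn(toy_model(x), y)
    toy_optimizer.zero_grad()   # clear gradients left over from the previous step
    loss.backward()             # compute d(loss)/d(parameter) for every parameter
    toy_optimizer.step()        # move each parameter against its gradient
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Without `zero_grad()`, gradients from successive `backward()` calls would accumulate in each parameter's `.grad`.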

We also check the model’s performance against the test dataset to ensure it is learning.

In [8]:
def test(dataloader, model, loss_fn):
    
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    
    test_loss /= num_batches
    correct /= size
    
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
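
The accuracy line in `test` is compact, so here is a small worked example of the same computation on hand-written values. The logits and targets are made up for illustration.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([
    [2.0, 0.1, 0.3],   # largest value at index 0
    [0.2, 1.5, 0.1],   # largest value at index 1
    [0.3, 0.2, 0.9],   # largest value at index 2
    [1.2, 0.4, 0.1],   # largest value at index 0
])
targets = torch.tensor([0, 1, 1, 0])

predicted = logits.argmax(1)   # index of the largest logit in each row
n_correct = (predicted == targets).type(torch.float).sum().item()
accuracy = n_correct / len(targets)

print(predicted, accuracy)     # tensor([0, 1, 2, 0]) 0.75
```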

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

In [9]:
epochs = 5

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
    
print("Done!")

Epoch 1
-------------------------------
loss: 2.295400  [    0/60000]
loss: 2.300031  [ 6400/60000]
loss: 2.278214  [12800/60000]
loss: 2.273352  [19200/60000]
loss: 2.259055  [25600/60000]
loss: 2.225635  [32000/60000]
loss: 2.227098  [38400/60000]
loss: 2.198764  [44800/60000]
loss: 2.192537  [51200/60000]
loss: 2.153492  [57600/60000]
Test Error: 
 Accuracy: 42.2%, Avg loss: 2.164065 

Epoch 2
-------------------------------
loss: 2.172277  [    0/60000]
loss: 2.177418  [ 6400/60000]
loss: 2.119885  [12800/60000]
loss: 2.128477  [19200/60000]
loss: 2.099913  [25600/60000]
loss: 2.032957  [32000/60000]
loss: 2.053221  [38400/60000]
loss: 1.986133  [44800/60000]
loss: 1.983631  [51200/60000]
loss: 1.908596  [57600/60000]
Test Error: 
 Accuracy: 57.4%, Avg loss: 1.921042 

Epoch 3
-------------------------------
loss: 1.950893  [    0/60000]
loss: 1.940117  [ 6400/60000]
loss: 1.823532  [12800/60000]
loss: 1.846664  [19200/60000]
loss: 1.773741  [25600/60000]
loss: 1.701411  [32000/600

### Save a model

A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [11]:
torch.save(model.state_dict(), "model.pth")
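
To get a feeling for what the state dictionary contains, the sketch below lists its entries for a small hypothetical model (`small_model` is not the network trained above). Each key is a parameter name and each value is the corresponding tensor.

```python
import torch
from torch import nn

small_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# state_dict() maps parameter names to tensors
shapes = {name: tuple(p.shape) for name, p in small_model.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)
```

The ReLU layer has no parameters, so it does not appear in the dictionary.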

### Loading a model

The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [12]:
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>
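
The following sketch checks that a save/load round trip reproduces the model's behaviour. To avoid touching the file system it writes to an in-memory buffer instead of a file such as "model.pth"; the helper `make_model` and the small architecture are hypothetical.

```python
import io

import torch
from torch import nn


def make_model():
    # The architecture must be rebuilt in code before loading the weights
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


original = make_model()

buffer = io.BytesIO()                        # stands in for a file on disk
torch.save(original.state_dict(), buffer)
buffer.seek(0)

restored = make_model()                      # fresh, randomly initialized copy
restored.load_state_dict(torch.load(buffer))
restored.eval()

sample = torch.rand(5, 4)
same_output = torch.equal(original(sample), restored(sample))
print(same_output)
```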

### Make predictions

In [13]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"


## Tensors

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see Bridge with NumPy). Tensors are also optimized for automatic differentiation (we’ll see more about that later in the Autograd section).
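
As a small preview of automatic differentiation, the sketch below asks PyTorch to track operations on a tensor and then computes a gradient. The function is chosen by us for illustration: for y = x1^2 + x2^2 + x3^2 the gradient is 2x.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()      # y = x1^2 + x2^2 + x3^2
y.backward()            # fills x.grad with dy/dx = 2x

print(x.grad)           # tensor([2., 4., 6.])
```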

### Initializing a tensor

Tensors can be initialized:
1. directly from data
2. from a NumPy array
3. from another tensor
4. from a `shape` tuple, filled with random or constant values

In [18]:
#1
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

#2
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

#3
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

#4
shape = (2,3,) # NOTE - #rows = 2, #columns = 3
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.9833, 0.1800],
        [0.6829, 0.3880]]) 

Random Tensor: 
 tensor([[0.2868, 0.9747, 0.6901],
        [0.1860, 0.5851, 0.8993]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
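
The way a tensor is created also determines its datatype, which is easy to overlook. The sketch below shows the default behaviour for a few made-up inputs, assuming PyTorch's default floating-point type has not been changed.

```python
import numpy as np
import torch

from_ints = torch.tensor([[1, 2], [3, 4]])   # Python ints -> torch.int64
from_floats = torch.tensor([[1.0, 2.0]])     # Python floats -> torch.float32
from_np = torch.from_numpy(np.ones(3))       # NumPy float64 is preserved
as_float32 = from_np.to(torch.float32)       # explicit conversion

print(from_ints.dtype, from_floats.dtype, from_np.dtype, as_float32.dtype)
```

This matters in practice because layers such as `nn.Linear` use float32 parameters by default, so float64 data coming from NumPy usually needs converting first.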


### Attributes of a tensor

In [24]:
tensor = torch.ones((3, 4))

print(f"Tensor: \n {tensor} \n ")

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Tensor: 
 tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]) 
 
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Tensor operations

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more are comprehensively described [here](https://pytorch.org/docs/stable/torch.html).

Each of these operations can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the `.to` method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

In [25]:
# check if cuda available
torch.cuda.is_available()

False

In [26]:
# We move our tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")

Some examples of tensor operations are provided below. Check the details of tensor operations in the tutorial [here](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#operations-on-tensors). The indexing syntax is very similar to that of NumPy arrays:
* first row: `tensor[0]`
* first column: `tensor[:,0]`
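
The indexing patterns above can be tried on a small tensor with distinct values, which makes the result easier to read than a tensor of ones. The tensor `grid` below is made up for illustration.

```python
import torch

grid = torch.arange(12, dtype=torch.float32).reshape(3, 4)

first_row = grid[0]          # tensor([0., 1., 2., 3.])
first_col = grid[:, 0]       # tensor([0., 4., 8.])
last_col = grid[..., -1]     # tensor([ 3.,  7., 11.])

# Slices can also be assigned to; we work on a copy to keep grid unchanged
modified = grid.clone()
modified[:, 1] = 0
print(modified)
```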

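The `cat` method is a bit less obvious: it concatenates a sequence of tensors along an existing dimension. **Note**: it should not be confused with the `stack` method, which joins tensors along a new dimension.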

In [36]:
print("Concatenate along dimension = 1")
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
print(t1.shape)

print("Concatenate along dimension = 0")
t2 = torch.cat([tensor, tensor, tensor], dim=0)
print(t2)
print(t2.shape)

Concatenate along dimension = 1
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
torch.Size([3, 12])
Concatenate along dimension = 0
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
torch.Size([9, 4])
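
The difference between `cat` and `stack` is easiest to see from the resulting shapes. A minimal sketch with two made-up tensors of shape (3, 4):

```python
import torch

a = torch.ones(3, 4)
b = torch.zeros(3, 4)

cat0 = torch.cat([a, b], dim=0)        # joins along existing dim 0 -> (6, 4)
cat1 = torch.cat([a, b], dim=1)        # joins along existing dim 1 -> (3, 8)
stacked = torch.stack([a, b], dim=0)   # creates a new leading dim  -> (2, 3, 4)

print(cat0.shape, cat1.shape, stacked.shape)
```

`cat` keeps the number of dimensions unchanged, whereas `stack` adds one.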


For arithmetic computations, there are several equivalent ways to write the same operation:

In [39]:
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
# note: tensors must have compatible shapes
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

tensor([[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]])

In [40]:
# This computes the element-wise product. z1, z2, z3 will have the same value
# note: for this, tensors must have the same shape or shapes that can be broadcast together
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
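
With a tensor of ones the two kinds of product are hard to tell apart, so here is a small worked example on made-up values. It also shows that the element-wise product accepts shapes that broadcast, not only identical shapes.

```python
import torch

m = torch.arange(6, dtype=torch.float32).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
v = torch.tensor([1.0, 10.0, 100.0])                     # shape (3,)

product = m @ m.T    # matrix product: (2, 3) @ (3, 2) -> (2, 2)
scaled = m * v       # element-wise: v is broadcast across both rows -> (2, 3)

print(product)       # tensor([[ 5., 14.], [14., 50.]])
print(scaled)        # tensor([[  0.,  10., 200.], [  3.,  40., 500.]])
```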

Special operations exist for **one-element tensors**. If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using `item()`.

In [42]:
agg = tensor.sum()
print(agg) # agg is a one-element TENSOR
agg_item = agg.item()
print(agg_item, type(agg_item)) # agg_item is a regular float

tensor(12.)
12.0 <class 'float'>


Also pay attention to **in-place operations**. Operations that store the result into the operand are called in-place. They are denoted by a `_` suffix. For example: `x.copy_(y)`, `x.t_()`, will change `x`. Note that some functions, like `add`, have an in-place operation equivalent - see the difference below!

In [53]:
t = torch.ones(5)
print(f'Tensor t: {t}')

print("Calling add: not stored in original tensor")
another = t.add(2)
print(f'Tensor another: {another}')
print(f'Tensor t: {t}')

print("Calling add_: stored in original tensor")
t.add_(2)
print(f'Tensor t: {t}')

Tensor t: tensor([1., 1., 1., 1., 1.])
Calling add: not stored in original tensor
Tensor another: tensor([3., 3., 3., 3., 3.])
Tensor t: tensor([1., 1., 1., 1., 1.])
Calling add_: stored in original tensor
Tensor t: tensor([3., 3., 3., 3., 3.])


### Bridge with numpy 

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other. This is done by calling `.numpy()`. Note that, when printing, a tensor object will print `tensor()` in front of the array.

To go from tensors to numpy arrays:

In [55]:
t = torch.ones(5)
n = t.numpy()

print(f't: {t} \n') # note: "tensor()" is printed
print(f'n: {n} \n') # "tensor()" is NOT printed

print(f'Type of t: {type(t)} \n')
print(f'Type of n: {type(n)}')

t: tensor([1., 1., 1., 1., 1.]) 

n: [1. 1. 1. 1. 1.] 

Type of t: <class 'torch.Tensor'> 

Type of n: <class 'numpy.ndarray'>


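Changes to one object are reflected in the other. Note that we **have** to use an in-place operation such as `add_` for this: `t = t + 1` would create a new tensor that no longer shares memory with `n`.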

In [57]:
t.add_(1) # add 1 to the tensor
print(n) # n is changed as well

[2. 2. 2. 2. 2.]
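
The sketch below contrasts the in-place update with an ordinary addition, to show why the in-place version is needed for the NumPy array to follow along. The names `after_inplace` and `after_rebind` are ours and only hold snapshots for printing.

```python
import torch

t = torch.ones(5)
n = t.numpy()          # n shares memory with t

t.add_(1)              # in-place: writes into the shared buffer
after_inplace = n.copy()

t = t + 1              # out-of-place: t now names a NEW tensor
after_rebind = n.copy()

print(after_inplace)   # [2. 2. 2. 2. 2.]
print(after_rebind)    # [2. 2. 2. 2. 2.]  (n still refers to the old buffer)
print(t)               # tensor([3., 3., 3., 3., 3.])
```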


To go from a NumPy array to a tensor (again, changes are reflected in both objects):

In [58]:
n = np.ones(5)
t = torch.from_numpy(n)
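
To check the shared memory in this direction, we can modify the NumPy array in place and look at the tensor afterwards. A minimal sketch:

```python
import numpy as np
import torch

n = np.ones(5)
t = torch.from_numpy(n)

np.add(n, 1, out=n)   # in-place update of the NumPy array
print(t)              # tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
print(n)              # [2. 2. 2. 2. 2.]
```

The tensor keeps NumPy's float64 datatype, which is why `dtype=torch.float64` appears in the printed output.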

## Datasets & dataloaders