# Introduction to deep learning using PyTorch

In this tutorial, we go over the basics of building deep learning models in Python. We will use the following Python packages.

 1. pytorch and torchvision. See [pytorch.org](http://pytorch.org) for installation. If you're installing on a computer with a compatible GPU, make sure to install the GPU version of pytorch (i.e., select a CUDA version).

 2. matplotlib. See [matplotlib.org](http://matplotlib.org) for installation.
 
 3. numpy. See [numpy.org](http://numpy.org) for installation.
 
 Acknowledgement: This tutorial was created by Alaa ElKhatib - previous TA for ECE457B

In [1]:
import numpy as np
import torch
import matplotlib.pyplot as plt

## Tensors

 Tensors are multidimensional arrays. PyTorch uses tensors to store model parameters, gradients, data, etc.

 PyTorch provides many functions to create tensors.

In [2]:
print(torch.tensor([[1, 2], [3, 4]]))

tensor([[1, 2],
        [3, 4]])


In [3]:
print(torch.ones(3, 4))

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [5]:
print(torch.rand(3, 4))

tensor([[0.6434, 0.4795, 0.4872, 0.1332],
        [0.8152, 0.3405, 0.5804, 0.3657],
        [0.2588, 0.7215, 0.1119, 0.9151]])
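
 A few other creation functions follow the same pattern. The sketch below is illustrative rather than exhaustive, and the shapes and values are arbitrary choices.

```python
import torch

zeros = torch.zeros(2, 3)            # 2x3 tensor filled with 0.0
steps = torch.arange(0, 10, 2)       # integers 0, 2, 4, 6, 8
grid = torch.linspace(0.0, 1.0, 5)   # 5 evenly spaced values from 0 to 1
like = torch.ones_like(zeros)        # same shape and dtype as `zeros`, filled with 1.0

print(zeros.shape, steps, grid, like.dtype)
```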


 PyTorch also provides functions to interface with numpy objects.

In [6]:
x = np.random.randint(0, 10, 4)
y = torch.from_numpy(x)
z = y.numpy()

print(f'x = {x}, y = {y}, z = {z}')

x = [3 2 2 7], y = tensor([3, 2, 2, 7], dtype=torch.int32), z = [3 2 2 7]
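
 One detail worth knowing is that torch.from_numpy does not copy the data: the tensor and the array share the same memory. The minimal sketch below, with arbitrary values, contrasts this with torch.tensor, which makes a copy.

```python
import numpy as np
import torch

x = np.array([3, 2, 2, 7])
y = torch.from_numpy(x)   # shares memory with x
c = torch.tensor(x)       # copies the data

x[0] = 100                # modify the NumPy array in place
print(y)  # the first element reflects the change
print(c)  # the copy is unaffected
```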


### Tensor attributes and methods

 Tensors have many attributes and methods. Some noteworthy ones are illustrated below.

In [7]:
x = torch.tensor([1, 2, 3, 4], dtype=torch.float)
print(x, x.dtype)
x = x.to(torch.int64)  # type conversion
print(x, x.dtype)

x = torch.rand(3, 4, device=torch.device('cpu'))
print(x, x.device)

tensor([1., 2., 3., 4.]) torch.float32
tensor([1, 2, 3, 4]) torch.int64
tensor([[0.5793, 0.0591, 0.2013, 0.1477],
        [0.8553, 0.9286, 0.6493, 0.8766],
        [0.4449, 0.0184, 0.1605, 0.5377]]) cpu
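
 Beyond dtype and device, a few shape-related attributes and methods come up often. The following is a small sketch using an arbitrary 3x4 tensor.

```python
import torch

x = torch.rand(3, 4)
print(x.shape)          # size along each dimension
print(x.ndim)           # number of dimensions
print(x.numel())        # total number of elements
print(x.requires_grad)  # whether autograd tracks operations on x

flat = x.reshape(-1)     # flatten into a single dimension
col_sums = x.sum(dim=0)  # sum over the rows, one value per column
print(flat.shape, col_sums.shape)
```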


 The .to() method also moves tensors between the CPU and the GPU.
 Attempting to move a tensor to the GPU raises an error when no GPU
 is available, so it is common practice to check first.

In [11]:
if torch.cuda.is_available():  # checks GPU availability
    x = x.to(torch.device('cuda'))
    print(x, x.device)

tensor([[0.0041, 0.2936, 0.0258, 0.7921],
        [0.8515, 0.8661, 0.0691, 0.1757],
        [0.7579, 0.3465, 0.7950, 0.1114]], device='cuda:0') cuda:0


 .to() can also take another tensor as its argument, in which case
 it returns a tensor with the same dtype and device as that tensor.

In [12]:
x = torch.tensor([1, 2, 3], dtype=torch.float)
y = torch.tensor([4], dtype=torch.int64)
if torch.cuda.is_available():
    y = y.to(torch.device('cuda'))

z = x.to(y)
print(x, x.dtype, x.device)
print(z, z.dtype, z.device)

tensor([1., 2., 3.]) torch.float32 cpu
tensor([1, 2, 3], device='cuda:0') torch.int64 cuda:0


## Defining a neural network model

 The simplest (and hence least flexible) way to define a network is using the Sequential class. 

 As you might expect from the name, this allows us to define networks as sequences of layers, with the output of each layer fed as input to the next. 
 
 The argument to Sequential here is an *OrderedDict* object, where each key-value pair gives a layer's name and the layer itself. 
 
 So, in the example below, we can subsequently reference the first layer by net.hidden_1. We can also access the first layer's parameters by net.hidden_1.weight and net.hidden_1.bias. Finally, we can iterate over all model parameters by using the net.parameters() method.

In [8]:
from collections import OrderedDict

net = torch.nn.Sequential(OrderedDict([
    ('hidden_1', torch.nn.Linear(10, 10)),
    ('act_1', torch.nn.ReLU()),
    ('hidden_2', torch.nn.Linear(10, 1))
]))
print(net)

Sequential(
  (hidden_1): Linear(in_features=10, out_features=10, bias=True)
  (act_1): ReLU()
  (hidden_2): Linear(in_features=10, out_features=1, bias=True)
)
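
 The ways of accessing layers and parameters described above can be sketched as follows. The network is redefined so the example runs on its own. named_parameters() is used in addition to parameters() only to make the printout easier to read.

```python
from collections import OrderedDict
import torch

net = torch.nn.Sequential(OrderedDict([
    ('hidden_1', torch.nn.Linear(10, 10)),
    ('act_1', torch.nn.ReLU()),
    ('hidden_2', torch.nn.Linear(10, 1)),
]))

print(net.hidden_1)               # access a layer by its name
print(net.hidden_1.weight.shape)  # weights are stored as (out_features, in_features)
print(net.hidden_1.bias.shape)

# iterate over all parameters together with their names
for name, p in net.named_parameters():
    print(name, tuple(p.shape))

# count the trainable parameters
n_params = sum(p.numel() for p in net.parameters())
print(n_params)
```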


We can subsequently make predictions with net by calling it on a tensor (of the shape it expects). For example:

In [9]:
print(net(torch.rand(1, 10)))

tensor([[0.3133]], grad_fn=<AddmmBackward>)


 PyTorch provides a more flexible way to define nets by subclassing the torch.nn.Module class. For instance, this allows us to define a forward pass that does not go through the layers sequentially. In the example below, the net has 2 branches (or columns) that merge before the final layer.
 
In most cases, we only need to define the layers and the forward pass; PyTorch derives the backward pass automatically.

In [10]:
class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.hidden_1a = torch.nn.Linear(10, 5)
        self.hidden_1b = torch.nn.Linear(10, 5)
        self.hidden_2 = torch.nn.Linear(10, 1)

    def forward(self, x):
        z1 = self.hidden_1a(x)
        z2 = self.hidden_1b(x)
        z = torch.cat([z1, z2], dim=1)
        f1_z = torch.nn.functional.relu(z)
        out = self.hidden_2(f1_z)
        return out


net = MyNet()
print(net)
print(net(torch.rand(1, 10)))  # try forward pass on random sample

MyNet(
  (hidden_1a): Linear(in_features=10, out_features=5, bias=True)
  (hidden_1b): Linear(in_features=10, out_features=5, bias=True)
  (hidden_2): Linear(in_features=10, out_features=1, bias=True)
)
tensor([[0.4293]], grad_fn=<AddmmBackward>)
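
 As a quick check that only the forward pass has to be written, the sketch below runs a batch through the same two-branch net and lets autograd compute gradients for every layer. The batch size of 4 and the use of out.sum() as a stand-in for a loss are arbitrary choices for illustration.

```python
import torch


class MyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden_1a = torch.nn.Linear(10, 5)
        self.hidden_1b = torch.nn.Linear(10, 5)
        self.hidden_2 = torch.nn.Linear(10, 1)

    def forward(self, x):
        z1 = self.hidden_1a(x)
        z2 = self.hidden_1b(x)
        z = torch.cat([z1, z2], dim=1)  # merge the two branches
        return self.hidden_2(torch.nn.functional.relu(z))


net = MyNet()
batch = torch.rand(4, 10)  # 4 samples with 10 features each
out = net(batch)
print(out.shape)           # one output per sample

out.sum().backward()       # autograd computes gradients for all parameters
print(net.hidden_1a.weight.grad.shape)
print(net.hidden_1b.weight.grad.shape)
```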


## Iterating over data

 Training neural nets involves iterating over datasets, often in batches. PyTorch provides the DataLoader class to facilitate this. A DataLoader wraps around a Dataset and provides an iterator for it.

In [11]:
from torch.utils.data import TensorDataset, DataLoader

data = torch.rand(100, 8)  # 100 8-dimensional samples
labels = torch.randint(0, 2, (100,), dtype=torch.int64)
dataset = TensorDataset(data, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # get first batch and break
    break


torch.Size([16, 8]) torch.Size([16])


 Of course, we can always iterate over tensors directly, but the DataLoader is usually more convenient and can be faster, for example by loading batches in parallel worker processes. The DataLoader class also provides options for shuffling, batching, and various sampling strategies.
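
 A DataLoader works with any map-style Dataset, that is, any object that defines a length and supports indexing. The sketch below defines a small custom dataset to show this. The class name RandomPairs and its random contents are hypothetical, chosen only for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomPairs(Dataset):
    """Hypothetical dataset of random samples and binary labels."""

    def __init__(self, n_samples=100, n_features=8):
        self.data = torch.rand(n_samples, n_features)
        self.labels = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i], self.labels[i]


dataset = RandomPairs()
loader = DataLoader(dataset, batch_size=16, shuffle=True)

print(len(loader))  # number of batches per pass over the data
sizes = [x.shape[0] for x, _ in loader]
print(sizes)        # the last batch holds the leftover samples
```

 Passing drop_last=True to the DataLoader would discard the smaller final batch instead.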

## Torchvision datasets

 Torchvision provides functions to download and prepare popular image datasets. 
 
 We're going to use the MNIST dataset, which contains images of handwritten digits. We showed in previous tutorials how we can create an MLP to classify this dataset with Keras and TensorFlow.
 
 With torchvision datasets, we usually have to pass the *ToTensor()* transform, which converts the returned images to tensors. (Note that this conversion happens when you use the dataset's *\_\_getitem\_\_* method, or in other words, when you use indexing, e.g. mnist[i]. Accessing the stored images directly, say by mnist.data, does not apply the transform.)
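
 The point about when the transform is applied can be illustrated without downloading anything. The sketch below uses a hypothetical stand-in dataset (TinyImages, not part of torchvision) and a hand-written to_float function that only roughly mimics what ToTensor does for grayscale images; both are assumptions made for illustration.

```python
import torch
from torch.utils.data import Dataset


class TinyImages(Dataset):
    """Hypothetical stand-in for an image dataset; not part of torchvision."""

    def __init__(self, transform=None):
        # 5 random 28x28 grayscale "images" stored as raw bytes
        self.data = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)
        self.targets = torch.randint(0, 10, (5,))
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        img = self.data[i]
        if self.transform is not None:
            img = self.transform(img)  # the transform is applied only here
        return img, int(self.targets[i])


def to_float(img):
    # rough analogue of ToTensor: scale to [0, 1] and add a channel dimension
    return img.to(torch.float32).div(255).unsqueeze(0)


ds = TinyImages(transform=to_float)
img, label = ds[0]
print(img.dtype, img.shape)  # indexing returns the transformed image
print(ds.data.dtype)         # the stored data is untouched
```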


*NOTE*: You should not encounter the error shown in the output below if you update PyTorch to the latest version. If you wish to keep the version you have, refer to the fix [here](https://github.com/pytorch/vision/issues/3500) or [here](https://stackoverflow.com/questions/66467005/torchvision-mnist-httperror-http-error-403-forbidden).

In [14]:
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

mnist = MNIST(
    root='mnist',  # choose a folder to save data in
    download=True,
    train=True,  # load the training set
    transform=ToTensor()
)
print(f'number of training samples: {len(mnist)}')
img, label = mnist[0]
print(img.shape)
print(label)
plt.imshow(img.reshape(28, 28))
plt.show()

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist\MNIST\raw\train-images-idx3-ubyte.gz


HTTPError: HTTP Error 403: Forbidden

 To download the test dataset, pass 'train=False' instead. There are 10000 test samples.

## Optimization (i.e. training the model)

 To train a neural net, we need 2 more things, in addition to the model and data. We need an optimizer and a loss function. PyTorch has the most commonly used loss functions implemented. For multiclass classification, we normally use the cross_entropy loss.

 A training iteration would go something like this:

In [19]:
# x, t = next(iter(loader))  # get batch
# y = net(x)  # forward pass
# loss = cross_entropy(y, t)
# compute gradients of loss wrt parameters
# for each parameter p, update as:
#       p <-- p - (lr * p_grad)
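
 For concreteness, here is a minimal runnable sketch of one such iteration. It uses .backward() to obtain the gradients and then applies the update rule by hand. The linear model, the random batch, and the learning rate are arbitrary assumptions chosen only for illustration.

```python
import torch
from torch.nn.functional import cross_entropy

torch.manual_seed(0)
x = torch.rand(16, 8)           # one batch of 16 samples
t = torch.randint(0, 2, (16,))  # random binary targets
net = torch.nn.Linear(8, 2)     # a single linear layer as the model
lr = 0.1

loss = cross_entropy(net(x), t)  # forward pass and loss
loss.backward()                  # gradients of loss wrt parameters

with torch.no_grad():            # the update itself must not be tracked
    for p in net.parameters():
        p -= lr * p.grad         # p <-- p - (lr * p_grad)
        p.grad.zero_()           # reset the gradient for the next iteration

new_loss = cross_entropy(net(x), t)
print(loss.item(), new_loss.item())
```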

 Fortunately, we don't need to compute the gradients or do the updates manually. We can use one of PyTorch's optimizers to update the parameters from their gradients. Moreover, the gradients can be computed by calling the .backward() method of the loss tensor, which automatically backpropagates the loss. The training step now becomes:

In [20]:
# net = ...
# loader = ...
# optimizer = SGD(net.parameters(), lr=0.1, momentum=0)
# for x, t in loader:
#   y = net(x)
#   loss = cross_entropy(y, t)
#   optimizer.zero_grad()
#   loss.backward()
#   optimizer.step()
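
The same loop can be run end to end on synthetic data. In the sketch below, the data, the rule used to generate labels, the small network, and the number of epochs are all assumptions made for illustration, not part of the tutorial's MNIST setup.

```python
import torch
from torch.optim import SGD
from torch.utils.data import TensorDataset, DataLoader
from torch.nn.functional import cross_entropy

torch.manual_seed(0)
data = torch.rand(100, 8)
labels = (data.sum(dim=1) > 4).long()  # synthetic labels that can be learned
loader = DataLoader(TensorDataset(data, labels), batch_size=16, shuffle=True)

net = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
optimizer = SGD(net.parameters(), lr=0.1, momentum=0)

with torch.no_grad():
    initial_loss = cross_entropy(net(data), labels).item()

for epoch in range(20):
    for x, t in loader:
        y = net(x)                  # forward pass
        loss = cross_entropy(y, t)
        optimizer.zero_grad()       # clear gradients from the previous step
        loss.backward()             # compute new gradients
        optimizer.step()            # update the parameters

with torch.no_grad():
    final_loss = cross_entropy(net(data), labels).item()
print(initial_loss, final_loss)
```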

Here we're using the most basic optimizer, vanilla stochastic gradient descent, but there are more sophisticated optimizers available. 

Note that we need to call zero_grad() in each iteration. Otherwise, the gradients from successive backward passes would accumulate in the parameters.
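
The accumulation is easy to see on a toy example. The sketch below uses a single hand-made parameter tensor w and the loss sum(w^2), both chosen only for illustration, and calls backward twice without zeroing in between.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()        # gradient is 2 * w

loss = (w ** 2).sum()
loss.backward()               # no zeroing: the new gradient is added to the old one
accumulated = w.grad.clone()  # twice the first gradient

w.grad.zero_()                # reset by hand; optimizer.zero_grad() resets the
                              # gradients of all parameters the optimizer manages
print(first, accumulated, w.grad)
```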

## Evaluating the model

As we're training the model, we need a way to gauge how well things are going. We can keep track of the training loss, but that's not enough. What we're really interested in is how the model would perform on future data samples not seen in the training set. To this end, we often use a separate validation (or dev) set to keep track of performance while we're tuning the model hyperparameters. Moreover, it is common practice to have a third set, the test set, that is only used to report the final performance after iterating between tuning and training (the dev set at this point would be "contaminated").

The logic of all this is that once a sample is used to modify the model's parameters in any way, the model's performance on that sample becomes biased, usually too optimistic. In other words, it would not accurately reflect the model's performance on future samples.
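
One way to carve out such sets is torch.utils.data.random_split. The sketch below splits a synthetic dataset into training, validation, and test subsets. The 70/15/15 proportions and the fixed seed are arbitrary assumptions, not a recommendation.

```python
import torch
from torch.utils.data import TensorDataset, random_split

data = torch.rand(100, 8)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(data, labels)

# a seeded generator makes the split reproducible
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [70, 15, 15], generator=generator
)
print(len(train_set), len(val_set), len(test_set))
```

Each subset can then be wrapped in its own DataLoader, with the test subset kept aside until the final evaluation.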


## Putting it all together

Now that we've seen all the building blocks, let's put it all together and train a model to classify MNIST images.

In [18]:
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import torch
from torch.optim import SGD
from torch.utils.data import DataLoader
from torch.nn.functional import cross_entropy
from collections import OrderedDict

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_set = MNIST(root='mnist', download=True, train=True, transform=ToTensor())
test_set = MNIST(root='mnist', download=True, train=False, transform=ToTensor())

train_loader = DataLoader(train_set, batch_size=60)
test_loader = DataLoader(test_set, batch_size=50)

net = torch.nn.Sequential(OrderedDict([
    ('h1', torch.nn.Linear(784, 1024)),
    ('a1', torch.nn.ReLU()),
    ('h2', torch.nn.Linear(1024, 10)),
]))

net.to(device)
optimizer = SGD(net.parameters(), lr=0.01, momentum=0.9)
for i, (x, t) in enumerate(train_loader):
    x = x.view(-1, 784)  # flatten images into vectors
    x = x.to(device)
    t = t.to(device)
    y = net(x)
    loss = cross_entropy(y, t)
    optimizer.zero_grad()
    
    loss.backward()
    optimizer.step()
    if (i + 1) % 100 == 0:
        print(f'batch {i + 1}')
        # Normally, we would track the average loss over a number
        # of batches, because the loss of a single batch fluctuates
        # from one batch to the next. The single-batch loss is
        # printed here just to illustrate.
        print(f'training batch loss = {loss.item():.4f}')
        correct = 0
        total = 0
        with torch.no_grad():  # gradients are not needed for evaluation
            for x_test, t_test in test_loader:
                x_test = x_test.view(-1, 784).to(device)
                t_test = t_test.to(device)
                c = net(x_test).argmax(dim=1)  # predicted class per sample
                correct += t_test.eq(c).sum().item()
                total += t_test.shape[0]
        print(f'accuracy = {correct / total}')
        
    
    

batch 100
training batch loss = 0.4364
accuracy = 0.8401
batch 200
training batch loss = 0.3836
accuracy = 0.8757
batch 300
training batch loss = 0.2739
accuracy = 0.8983
batch 400
training batch loss = 0.4461
accuracy = 0.9065
batch 500
training batch loss = 0.4702
accuracy = 0.9136
batch 600
training batch loss = 0.3497
accuracy = 0.9185
batch 700
training batch loss = 0.3287
accuracy = 0.9176
batch 800
training batch loss = 0.3441
accuracy = 0.9305
batch 900
training batch loss = 0.4765
accuracy = 0.9305
batch 1000
training batch loss = 0.1362
accuracy = 0.9295
