In [6]:
from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

# Assignment 1b - Introduction to PyTorch

In this assignment, you will be going over the basics of PyTorch as covered in the notes and the slides. 

Some background on PyTorch: like TensorFlow, PyTorch is an open-source machine learning library. Unlike TensorFlow, which was developed by Google, PyTorch was created mainly by Facebook's AI lab. PyTorch is efficient because its tensor computations are optimized (as in NumPy and TensorFlow), and it is specifically targeted at deep learning applications.

Before we get started, double check that your torch version is up to date!

In [7]:
torch.__version__

'1.6.0'

Also, don't forget to turn on your **GPU** through *runtime* > *change runtime type* > *GPU*

# 1 - Data Utilities

In PyTorch, you write your own custom **datasets** and **dataloaders**. Each of them is, as expected, encapsulated in its own class. A **dataset** class needs to implement the following:
  - `__init__`: an initializer that takes the data (or a path to the data) and, optionally, transforms
  - `__len__`: returns the length of the dataset
  - `__getitem__(idx)`: returns the data sample at index `idx`.

Dataloaders are easy to initialize once you have created your custom dataset: `DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)`.

You will practice more with this during your own projects, so we have written this part for you. Take a look at the class below and make sure you understand what each part does.

In [8]:
class DataSet(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, labels, images, transform=None):
        self.labels = labels
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        image = self.images[idx]
        label = self.labels[idx]  # look up the label that belongs to this image

        if self.transform:
            image = self.transform(image)

        return image, label
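
To make the interplay between a dataset and `DataLoader` concrete, here is a minimal sketch. The `ToyDataset` class and the random tensors standing in for images are illustrative assumptions, not part of the assignment data.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in dataset: random 'images' with integer labels."""

    def __init__(self, labels, images, transform=None):
        self.labels = labels
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.images[idx]
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, label


images = torch.rand(10, 3, 8, 8)   # 10 fake RGB images of 8x8 pixels
labels = torch.arange(10) % 2      # alternating 0/1 labels
dataset = ToyDataset(labels, images)

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for batch_images, batch_labels in loader:
    # The default collate function stacks samples along a new batch dimension
    batch_sizes.append(batch_images.shape[0])
    print(batch_images.shape, batch_labels.shape)
```

With 10 samples and `batch_size=4`, the last batch is smaller than the others; pass `drop_last=True` to `DataLoader` if you need equally sized batches.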


Transforms are used for data augmentation. Luckily, PyTorch (through `torchvision.transforms`) provides a range of functions that perform augmentation, so we don't have to write them ourselves!

**Question 1** Find a function in `transforms` for each of the following data augmentations and explain which arguments it takes:

- Convert an image from RGB to greyscale
- Crop the image at a random location
- Convert to a tensor

**Solutions**
- `transforms.Grayscale(num_output_channels=1)`
- `transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')`
- `transforms.ToTensor()`

---

# 2 - Defining Neural Networks

Neural networks are encapsulated in a *class*, since, as discussed earlier, PyTorch leans heavily on Python's object-oriented features. Generally, a `class Network` contains at least the following functions:
  - an initializer that takes no arguments besides `self` and defines the layers
  - a `forward(input)` function that is invoked when you call the network (through `__call__`) and calculates the output of running the network on `input`.
  - (optional) a `num_flat_features(x)` function that returns the number of features per sample that you get if you flatten a specific `x`.

**Question 2** Complete the code below such that it satisfies the following neural network architecture:
![Neural Network](https://www.researchgate.net/profile/Angel_Cruz-Roa/publication/263052166/figure/fig3/AS:614373047402509@1523489357882/3-layer-CNN-architecture-composed-by-two-layers-of-convolutional-and-pooling-layers-a.png)

In [9]:
class Net(torch.nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # BEGIN CODE
        ...
        # END CODE

    def forward(self, x):
        # BEGIN CODE
        ...
        # END CODE

    # OPTIONAL
    def num_flat_features(self, x):
        # BEGIN CODE
        ...
        # END CODE

In [10]:
# BEGIN SOLUTION
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 3 input image channels, 16 output channels, 8x8 square convolution
        # kernel
        self.conv1 = nn.Conv2d(3, 16, 8)
        # conv2 takes the 16 channels produced by conv1
        self.conv2 = nn.Conv2d(16, 32, 8)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(32 * 15 * 15, 128)  # 32 channels of 15x15 feature maps
        self.fc2 = nn.Linear(128, 2)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can also specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x)) # OPTION A
        # x = x.view(-1, 32 * 15 * 15) # OPTION B
        x = self.fc1(x)
        # dim=1 normalizes over the class dimension of each sample
        x = F.log_softmax(self.fc2(x), dim=1)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# END SOLUTION
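
The `32 * 15 * 15` in `fc1` follows from how each layer shrinks the image. The sketch below traces the spatial size through two 8x8 convolutions (stride 1, no padding), each followed by 2x2 max pooling. The 81x81 input size is an assumption chosen because it is consistent with the `15 * 15` used above; the helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output width/height of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 81                  # assumed input width/height
size = conv_out(size, 8)   # after the first 8x8 convolution
size = pool_out(size, 2)   # after the first 2x2 max pooling
size = conv_out(size, 8)   # after the second 8x8 convolution
size = pool_out(size, 2)   # after the second 2x2 max pooling

channels = 32              # output channels of the second convolution
flat_features = channels * size * size
print(size, flat_features)
```

If your input images have a different size, rerun this arithmetic and adjust the first argument of `fc1` accordingly.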

---

# 3 - Tensors, Dimensionality and Numpy

Tensors are like NumPy arrays: a data structure in which all elements share the same `dtype`. Across `TF`, `numpy` and `torch`, they serve the same purpose, namely representing vectors, matrices and higher-dimensional arrays on which we can do fast computation. Contrary to **TensorFlow**, you will not have to juggle *constants* and *placeholder variables*. PyTorch is built around object-oriented programming, and its tensors behave a lot more like the arrays you are used to from NumPy. Let's take a look.

In [11]:
x = torch.empty(5, 3)
print(x)

tensor([[0.0000e+00, 0.0000e+00, 1.8788e+31],
        [1.7220e+22, 4.7881e+22, 6.7200e-07],
        [1.0141e-11, 8.1929e-10, 3.3234e-09],
        [5.4363e+22, 8.4945e+20, 8.3564e+20],
        [2.6948e-09, 3.1369e+27, 7.0800e+31]])


**Question 3.1** What do you notice about `torch.empty`, every time we initialize an empty tensor? Are the values random? What can you find online about it?

**Solution**: As we can see in the [torch documentation](https://pytorch.org/docs/stable/generated/torch.empty.html), `empty` returns a tensor backed by uninitialized memory. That's why you see arbitrary-looking values; they are whatever happened to be in that memory, basically garbage. If we want well-defined initial values, we should use other functions, as demonstrated below.

As you noticed, the values of `empty` can change every time! Let's look at other functions, which look very similar to their NumPy counterparts, that take care of initialization for us.

**Question 3.2** Find the equivalents for the following numpy functions in torch:

- `np.ones((5,3))`
- `np.random.randn(10, 1)`


In [12]:
# BEGIN SOLUTION
torch.ones((5,3));
torch.randn((10, 1));  # randn samples from a standard normal, matching np.random.randn
# END SOLUTION

To obtain the dimensionality of your tensors, simply call `tensor.size()` (or use `tensor.shape`) and you will get a sight that is familiar from the NumPy and TensorFlow tutorials.
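
A short sketch of inspecting tensor dimensions; the variable names are chosen for this example only.

```python
import torch

x = torch.ones(5, 3)

print(x.size())    # full shape as a torch.Size object
print(x.shape)     # attribute form of the same information
print(x.size(0))   # size along a single dimension

# torch.Size behaves like a tuple, so it can be unpacked
rows, cols = x.shape
```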

Let's look at the differences between PyTorch tensors and TensorFlow tensors.

**Question 3.3** Change the values on the diagonal to be `1`, `2`, `3`

In [13]:
# DO NOT TAMPER WITH THIS CELL
myTensor = torch.from_numpy(999*np.ones((3,3)))
myTensor

tensor([[999., 999., 999.],
        [999., 999., 999.],
        [999., 999., 999.]], dtype=torch.float64)

In [14]:
# START SOLUTION
myTensor[0,0] = 1
myTensor[1,1] = 2
myTensor[2,2] = 3
myTensor
# END SOLUTION

tensor([[  1., 999., 999.],
        [999.,   2., 999.],
        [999., 999.,   3.]], dtype=torch.float64)

Way easier than TensorFlow, right? That is because PyTorch is designed around object-oriented programming and therefore allows for easy in-place mutation. We will not be going over the following trivial functions:

- `torch.from_numpy()`
- `tensor.numpy()`
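
For reference, a minimal sketch of these two conversions on the CPU. The point worth remembering is that both directions share the underlying memory, so a change on one side is visible on the other.

```python
import numpy as np
import torch

array = np.zeros(3)
tensor = torch.from_numpy(array)   # no copy: the tensor wraps the same memory

array[0] = 7.0                     # modifying the array ...
print(tensor)                      # ... is visible through the tensor

back = tensor.numpy()              # again no copy for CPU tensors
tensor[1] = 5.0                    # modifying the tensor ...
print(back)                        # ... is visible through the array
```

If you need an independent copy, call `tensor.clone()` or `array.copy()` first.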

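*Why are we using tensors though? Let's take a look at the demo below.* Think about how such timing differences add up over millions of parameters. Keep in mind that `np.random.rand` produces `float64` values while `torch.rand` defaults to `float32`, so part of the gap comes from the data type.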

In [15]:
%%time
#NUMPY
x = np.random.rand(10000, 10000)
y = np.random.rand(10000, 10000)
a = x + y
b = x - y
c = x * y

CPU times: user 2.82 s, sys: 1.22 s, total: 4.04 s
Wall time: 4.69 s


In [16]:
%%time
# TENSOR
x = torch.rand(10000, 10000)
y = torch.rand(10000, 10000)
a = x + y
b = x - y
c = x * y

CPU times: user 1.25 s, sys: 738 ms, total: 1.99 s
Wall time: 2.24 s


## GPU: Cuda

We can also use our GPUs! Observe the demo below. Basically, we use a GPU by defining a `device` and then creating our tensors on that `device` or sending them to it. To read the values back (for example, to convert to NumPy), we first move the tensor back to the CPU with `.cpu()`.

*HINT* Make sure your GPU is turned on.
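
If you are unsure whether a GPU is available, a common pattern is to fall back to the CPU. The following is a sketch of that pattern, not part of the assignment; it runs on either kind of machine.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 3)      # created on the CPU by default
x = x.to(device)          # .to() returns a tensor on the device; assign the result
y = torch.rand(3, 3, device=device)  # or create it on the device directly

z = x + y                 # computed on whichever device holds x and y

# Move the result back to the CPU before converting to NumPy
result = z.cpu().numpy()
print(device, result.shape)
```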

In [None]:
device = torch.device("cuda")          # a CUDA device object

In [None]:
%%time
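x = torch.rand(10000, 10000, device=device)  # created directly on the device
y = torch.rand(10000, 10000, device=device)
# .to(device) returns a tensor rather than moving in place, so assign the
# result; it is a no-op here because x and y already live on the device
x = x.to(device)
y = y.to(device)
a = x + y
b = x - y
c = x * y
torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before timing stops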

---

# 4 - Mathematical Operations and Gradients

In PyTorch, we can just use the arithmetic operators in Python (`+`, `-`, `*`, `**`, `/`, `//`, `%`, etc.) to perform calculations, which is really nice because it saves a lot of complexity. In this short section, we cover the calculation of gradients, because that too is optimized in the PyTorch library.

To be able to calculate a gradient in PyTorch, you need to initialize a tensor as follows:

In [None]:
x = torch.ones(2, 2, requires_grad=True)
x

Let's take a look at what happens when we perform a simple transformation on `x`. What happens to the gradient function (`grad_fn`) of the result?

In [None]:
y = x + 2
y

In [None]:
y.backward(torch.ones_like(y))  # y is not a scalar, so pass the upstream gradient explicitly
print(x.grad)

Does it make sense that this is the derivative $\dfrac{dY}{dX}$? It does: since $Y = X + 2$, every entry of the gradient is 1.
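
For a slightly less trivial case, the sketch below differentiates a scalar function of `x`. With $f(x) = \sum 3x^2$, the derivative with respect to each entry is $6x$, so for a tensor of ones every gradient entry should be 6. The names `out` and `expected` are chosen for this example.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

out = (3 * x ** 2).sum()   # scalar output, so backward() needs no argument
out.backward()

expected = 6 * x.detach()  # analytic derivative d(3x^2)/dx = 6x
print(x.grad)
```

Note that gradients accumulate in `x.grad` across calls to `backward()`, which is why training loops reset them at every step.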

**Question 4** Imagine we don't want to track gradients anymore. Look up in the PyTorch documentation how to get a version of `y` that does not track gradients.

In [17]:
# BEGIN SOLUTION
y_notrack = y.detach()
# END SOLUTION
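
A self-contained sketch of two ways to stop gradient tracking: `detach()` on an existing tensor and the `torch.no_grad()` context manager for a whole block of computation. The variable `z` is introduced for this example only.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2

# detach() returns a tensor that shares data with y but is cut off from the graph
y_notrack = y.detach()
print(y.requires_grad, y_notrack.requires_grad)

# Inside no_grad(), operations are not recorded for backpropagation at all
with torch.no_grad():
    z = x + 2
print(z.requires_grad)
```

`torch.no_grad()` is the usual choice during evaluation, where no gradients are needed for any of the computations.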

---

# 5 - Optimizers, Functions (Torch-Specific)

All functions that you will be using have been pre-defined in the [nn library](https://pytorch.org/docs/stable/nn.html). 

**Question 5.1** How do we combine multiple functions sequentially? Look up a function that allows you to combine first a `ReLU` and then a `Softmax` into one function. Assign this function to the variable `my_func`.

In [None]:
# BEGIN SOLUTION
my_func = nn.Sequential(nn.ReLU(), nn.Softmax(dim=1))  # dim=1: normalize over each row
# END SOLUTION
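
To see what such a combined function does, the sketch below applies it to a small batch. The input values are arbitrary, and `dim=1` is passed to `Softmax` so that each row is normalized.

```python
import torch
import torch.nn as nn

my_func = nn.Sequential(nn.ReLU(), nn.Softmax(dim=1))

batch = torch.tensor([[-1.0, 0.0, 1.0]])   # one sample with three scores
out = my_func(batch)

# ReLU maps -1 and 0 both to 0, so the first two probabilities are equal,
# and softmax makes each row sum to 1
print(out, out.sum(dim=1))
```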

We also have **optimizers** and **learning rate schedulers** in `torch.optim`. These determine how the parameters of your neural network are optimized. The demo below is not supposed to be run, but you will see in more depth how this works in assignment 2b!

In [None]:
# optimizer = optim.SGD(model.parameters(), lr=0.01)
# for input, target in dataset:
#     def closure():
#         optimizer.zero_grad() # Zero out gradients
#         output = model(input) # Push input through
#         loss = loss_fn(output, target) # Calculate loss
#         loss.backward() # Compute gradients via backpropagation
#         return loss
#     optimizer.step(closure) # Update the parameters
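
For a version that can actually be run, here is a minimal sketch that fits a line with SGD and a step learning rate scheduler. The toy data (`y = 2x + 1`), the model, and all hyperparameters are assumptions made for illustration, not the setup of assignment 2b.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy regression data: y = 2x + 1
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 50 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

losses = []
for epoch in range(100):
    optimizer.zero_grad()              # zero out gradients from the last step
    output = model(inputs)             # forward pass
    loss = loss_fn(output, targets)    # calculate loss
    loss.backward()                    # compute gradients
    optimizer.step()                   # update the parameters
    scheduler.step()                   # update the learning rate
    losses.append(loss.item())

print(losses[0], losses[-1], scheduler.get_last_lr())
```

Most optimizers, SGD included, do not need the closure form; calling `optimizer.step()` after `loss.backward()` is enough. The closure is only required by optimizers that re-evaluate the loss several times per step, such as `LBFGS`.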