# PyTorch Tutorial

## PyTorch Tensors

PyTorch is a lot like NumPy. Many of the operations used to manipulate NumPy arrays have counterparts in PyTorch, and NumPy arrays can be converted to and from PyTorch *tensors*. PyTorch gives its arrays the more mathematical name of tensors (as does, e.g., TensorFlow), but for all intents and purposes these are simply multi-dimensional arrays.

If you are familiar with numpy arrays then manipulating pytorch tensors should feel familiar. Below are some examples. Let us begin by importing torch and numpy.

In [3]:
import torch
import numpy as np

Generating a random array of size 2x2x2: NumPy vs PyTorch

In [32]:
# Numpy
numpy_random_arr = np.random.rand(2,2,2)
numpy_random_arr

array([[[0.26750508, 0.63507303],
        [0.72651604, 0.41289983]],

       [[0.8224605 , 0.03002775],
        [0.9270561 , 0.41149525]]])

In [33]:
# PyTorch
torch_random_arr = torch.rand(2,2,2)
torch_random_arr

tensor([[[0.2671, 0.4111],
         [0.3481, 0.4226]],

        [[0.2356, 0.2893],
         [0.7886, 0.5119]]])

They are indexed in the same way

In [23]:
torch_random_arr[0,0,0], numpy_random_arr[0,0,0]

(tensor(0.9665), 0.9300530825946917)

and can be reshaped using reshape functions ...

In [30]:
torch_random_arr = torch_random_arr.reshape(4,2)
print(torch_random_arr.size())

torch.Size([4, 2])


In [31]:
numpy_random_arr = np.reshape(numpy_random_arr, [4,2])
print(numpy_random_arr.shape)

(4, 2)


Converting to and from numpy arrays is easy:

In [38]:
a = np.array([[1,2],[3,4]]) # make a numpy array

a_torch = torch.from_numpy(a) #converting to a torch Tensor from a numpy array
print(a_torch) 

a_np = a_torch.numpy() # converting to a numpy array from a torch Tensor
print(a_np)

tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
[[1 2]
 [3 4]]
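
One detail worth knowing about this conversion: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below illustrates this and contrasts it with `torch.tensor`, which makes a copy. The variable name `a_copy` is only illustrative.

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
a_torch = torch.from_numpy(a)   # shares memory with the numpy array `a`
a_copy = torch.tensor(a)        # makes an independent copy of the data

a[0, 0] = 100                   # modify the numpy array in place

print(a_torch[0, 0].item())     # 100: the change is visible through the tensor
print(a_copy[0, 0].item())      # 1: the copy is unaffected
```

If you need a tensor that is safe from later changes to the array, copying is the simpler choice.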


Other basic functions such as torch.diag, torch.cat (concatenate) and torch.matmul work similarly to their numpy equivalents. <br>
As always, when looking for a function, **check the documentation**.
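
As a rough side-by-side illustration of the functions just mentioned, the sketch below runs each one in both libraries and checks that the results agree. The array contents and variable names are arbitrary choices for the example. Note that numpy's `axis` argument is called `dim` in PyTorch.

```python
import numpy as np
import torch

x_np = np.arange(6, dtype=np.float64).reshape(2, 3)
x_t = torch.from_numpy(x_np)

# concatenate along the first dimension: `axis` in numpy, `dim` in torch
cat_np = np.concatenate([x_np, x_np], axis=0)
cat_t = torch.cat([x_t, x_t], dim=0)

# matrix product of a (2, 3) array with its (3, 2) transpose
mm_np = np.matmul(x_np, x_np.T)
mm_t = torch.matmul(x_t, x_t.t())

# build a diagonal matrix from a vector
d_np = np.diag(np.array([1.0, 2.0, 3.0]))
d_t = torch.diag(torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64))

print(np.allclose(cat_np, cat_t.numpy()))  # True
print(np.allclose(mm_np, mm_t.numpy()))    # True
print(np.allclose(d_np, d_t.numpy()))      # True
```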

Notice that printing the torch tensor also gave a dtype. Just like in numpy, the data type of an object is important.
A PyTorch tensor's dtype records what it stores (integers, floating-point numbers or booleans) and how many bits each element uses. Many operations require their operands to have matching dtypes, so it is often important to check this before combining tensors.
<br>
See https://pytorch.org/docs/stable/tensors.html for a list of dtypes and what they are called in PyTorch.

Changing torch tensor type is simple too:


In [42]:
print(a_torch)
print(a_torch.to(torch.double)) #casts the int32 dtype tensor into a 64 bit float dtype tensor

tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
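
To see why matching dtypes matter, here is a small sketch of what can happen when they differ. It assumes a reasonably recent PyTorch version, where element-wise arithmetic promotes to a common dtype but `torch.matmul` does not; the exact error message may vary between versions. The names `ints` and `floats` are illustrative.

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])   # built from Python ints: torch.int64
floats = torch.rand(2, 2)               # torch.float32 by default

print(ints.dtype, floats.dtype)

# element-wise arithmetic promotes both operands to a common dtype
print((ints + floats).dtype)

# matmul does not promote, so mismatched dtypes are expected to raise an error
try:
    torch.matmul(ints, floats)
except RuntimeError as err:
    print("matmul failed:", err)

# casting one operand so that the dtypes match fixes the problem
print(torch.matmul(ints.to(floats.dtype), floats).dtype)
```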



## Task: Basic Tensor Operations

1. Generate two random numpy arrays, **a** and __b__ of sizes [12,5] and [3,5,20] 


2. Find the matrix product **a** $\cdot$ __b__. Result should be of shape (3,12,20) <br>
    *hint: may need to reshape a first*
    
    
3. Convert **a** and __b__ into PyTorch Tensors and repeat.



In [31]:
# Solution:
a = np.random.rand(12,5)
b = np.random.rand(3,5,20)
print(np.matmul(np.reshape(a, [1, 12,5]),b).shape) # a is reshaped to (1,12,5) so it is multiplied with each of the 3 matrices in b

a_t = torch.from_numpy(a)
b_t = torch.from_numpy(b)

print(torch.matmul(a_t, b_t).size())

(3, 12, 20)
torch.Size([3, 12, 20])
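
The explicit reshape in the numpy solution is optional in practice: both `np.matmul` and `torch.matmul` broadcast a 2D matrix against a stack of matrices. The sketch below checks this, and also that each slice of the result is an ordinary 2D product. The names `out_np` and `out_t` are illustrative.

```python
import numpy as np
import torch

a = np.random.rand(12, 5)
b = np.random.rand(3, 5, 20)

# (12, 5) is treated like (1, 12, 5) and broadcast against (3, 5, 20)
out_np = np.matmul(a, b)
out_t = torch.matmul(torch.from_numpy(a), torch.from_numpy(b))

print(out_np.shape)   # (3, 12, 20)
print(out_t.size())   # torch.Size([3, 12, 20])

# each slice out_np[i] is the ordinary 2D product a @ b[i]
print(np.allclose(out_np[0], a @ b[0]))     # True
print(np.allclose(out_np, out_t.numpy()))   # True
```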


As you might imagine, it is not the similarities between the two that we are interested in, but what makes torch Tensors relevant to machine learning. The most significant and relevant difference is that PyTorch Tensors also have an associated *gradient*. It is this that is used to perform the optimization that machine learning is based on. 

The gradient of a pytorch tensor is stored in its .grad attribute. Every pytorch tensor has this attribute, even when it is never used; in that case it is simply set to None.

All PyTorch tensors have another boolean attribute 'requires_grad' that indicates whether pytorch *needs* to track and store its gradient or whether it is simply a static tensor. By default, requires_grad is set to False.
When we later construct neural networks from the torch.nn neural network module, requires_grad will be automatically set to True for the relevant learning parameters so it is not something you should generally worry about setting manually.

In [36]:
a = torch.rand(5,5) # generate a random Tensor

print(a.grad) # check its gradient - the result is None

print(a.requires_grad) # check if its gradient is required - the result is False by default

a.requires_grad = True # we can set it to True

print(a.grad) # but there is still no gradient currently stored, because we haven't done anything with it yet 

None
False
None
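
To see a gradient actually appear, something has to be computed from the tensor and then differentiated. A minimal sketch, using a small hand-picked tensor `x` and the scalar function y = sum(x^2), whose gradient is 2x:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # scalar y = x1^2 + x2^2 + x3^2
print(x.grad)        # None: nothing has been differentiated yet

y.backward()         # compute dy/dx and store it in x.grad
print(x.grad)        # tensor([2., 4., 6.])
```

Calling `backward()` is what fills in `.grad`; setting `requires_grad` only tells PyTorch to track the operations needed to do so.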


## PyTorch NN Functions

In [58]:
import torch.nn as nn

The torch.nn module contains all the functions you will need to build a neural network. 
<br> This includes fully connected layers, convolutions, and pooling operations. 


It is well documented and easy to read: **See for Yourself** https://pytorch.org/docs/stable/nn.html

We will go over a few nn modules and then use what we have learned to build an MLP.

Let us begin with a 2D convolution: one of the most common and important operations used in image processing and deep learning in general. We will generate a random 'image' tensor (3 input channels, 100 x 100 pixels) and perform a 2D convolution with stride = 1, kernel size = 3x3 and 2 output channels.

In [68]:
input_image = torch.randint(0, 255, (1, 3,100,100)) # our random image. note that the upper bound (255) is exclusive
# the first dimension has size N where N is the number of images. here it is simply 1

operation = nn.Conv2d(in_channels = 3,out_channels = 2, kernel_size = 3) 
# building our conv operation. note that we did not need to specify the names of the parameters. nn.Conv2d(3,2,3) is sufficient

In [69]:
print(operation) #we can see our convolution operation by printing it

Conv2d(3, 2, kernel_size=(3, 3), stride=(1, 1))


In [71]:
result = operation(input_image)

RuntimeError: _thnn_conv2d_forward is not implemented for type torch.LongTensor

The operation fails because the convolution is not implemented for integer (Long) tensors. Let us convert the image into a float tensor first.


In [74]:
input_image = input_image.to(torch.float)
result = operation(input_image)
print(result)

tensor([[[[  84.3498,   63.9077,  -17.0368,  ...,  -19.8883,  146.8244,
            116.4487],
          [ 114.1581,   21.6166,   39.4248,  ...,  120.4760,   40.2672,
            124.1527],
          [  51.4259,   88.8663,   19.4072,  ...,   27.4615,   47.9384,
             31.0113],
          ...,
          [  69.9386,   80.1651,   97.4229,  ...,   84.3317,   62.7600,
             77.0753],
          [  59.2197,   80.6550,   56.2093,  ...,   59.6583,   22.5126,
             88.3077],
          [   4.3444,   -7.2992,  145.9830,  ...,   75.7843,   45.3026,
             47.7600]],

         [[ -25.0516,   20.1285, -112.5526,  ...,  -54.0992,  -79.3677,
            -69.5192],
          [ -12.9119,   16.2892,   18.5945,  ...,  -45.2710,  -59.9804,
             42.8030],
          [ -46.3792,  -84.8874,   21.3314,  ...,  -43.0982, -112.4890,
            -54.4308],
          ...,
          [ -65.6272,   23.0787,   17.7784,  ...,  -90.7418, -109.8219,
            -10.4001],
          [ -35.52

We can see that we have our result of a 2d Convolution of our image with some randomly generated kernel. What if we wanted to know what that kernel actually is? 

In [76]:
for name, param in operation.named_parameters(): # for each named parameter
    print(name, param.data)

weight tensor([[[[-0.0503, -0.0771,  0.0385],
          [ 0.0618, -0.0579,  0.1223],
          [ 0.1245,  0.0250,  0.0932]],

         [[-0.1685,  0.0984,  0.1322],
          [-0.1506, -0.1639,  0.1860],
          [-0.0761,  0.1491, -0.1911]],

         [[-0.1289,  0.0783,  0.1418],
          [-0.1170,  0.1398,  0.1340],
          [ 0.1355, -0.0257,  0.0179]]],


        [[[ 0.1175,  0.0906, -0.1641],
          [-0.1250, -0.1894,  0.1588],
          [-0.0745, -0.0957,  0.0403]],

         [[ 0.1258,  0.0379,  0.1697],
          [-0.0464,  0.0835, -0.0119],
          [-0.1185,  0.1465, -0.0537]],

         [[-0.1021, -0.1557,  0.0275],
          [-0.0224, -0.0183,  0.1036],
          [-0.1804, -0.0144, -0.0969]]]])
bias tensor([-0.1075,  0.0405])


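Above we can see that our convolution weight tensor is of shape [2,3,3,3] (output channels, input channels, kernel height, kernel width) and that the bias is of shape [2], one value per output channel.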
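
Rather than reading shapes off the printed values, they can be queried directly. The sketch below rebuilds the same kind of layer and also checks the output size: with a 3x3 kernel, stride 1 and no padding, each spatial dimension shrinks from 100 to 100 - 3 + 1 = 98. The names `conv` and `image` are illustrative.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3)
image = torch.rand(1, 3, 100, 100)   # one random float image

print(conv.weight.shape)   # torch.Size([2, 3, 3, 3])
print(conv.bias.shape)     # torch.Size([2])
print(conv(image).shape)   # torch.Size([1, 2, 98, 98])
```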

We can do the same for a fully connected Linear layer, which can be found in the torch.nn module as the class Linear. Its parameters are the number of input features and the number of output features.
We will use 3x100x100 = 30000 input features and 10 output features.

In [81]:
fc_operation = nn.Linear(30000, 10) # defining our fully connected Linear layer

reshaped_input_image = input_image.reshape(input_image.size(0), -1) #reshaping input image 

result = fc_operation(reshaped_input_image) 

print(result)

tensor([[ 136.4518,    1.0370,  -72.2169,   -6.4639,   70.8948,   27.0052,
          -59.5169,    6.4923, -153.2144,  -13.1952]], grad_fn=<AddmmBackward>)


Notice that we used a strange way to reshape input_image. We specified the dimensions as (input_image.size(0), -1).

Remember that the first dimension indicates the number of images. In this simple example there is only one, but in real examples there can be a variable number of images in each batch, so by passing size(0) we keep this dimension unchanged. The -1 is not a literal dimension size: it tells pytorch to infer the size of that dimension from the total number of elements and the other dimensions given. Here that works out to 3x100x100 = 30000.
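
The same reshape pattern can be tried on a larger batch. This sketch uses an illustrative batch of 5 random images and also shows that only one dimension can be left for PyTorch to infer:

```python
import torch

batch = torch.rand(5, 3, 100, 100)        # 5 images, 3 channels, 100 x 100

flat = batch.reshape(batch.size(0), -1)   # keep the batch dimension, flatten the rest
print(flat.shape)                         # torch.Size([5, 30000])

# only one dimension may be inferred, so two -1 entries are rejected
try:
    batch.reshape(-1, -1)
except RuntimeError as err:
    print("reshape failed:", err)
```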

## Task: MaxPooling in 2D.

Generate an array of 5 images, each with 3 channels and of size (100 x 100), and perform a 2D maxpool on the images using PyTorch. Your max pooling operation should have:

1. filter size 3x3, stride = 1 x 1

2. filter size 4 x 2, stride = 2 x 2


In [92]:
#solution

random_ims = torch.randint(0, 255, (5,3,100,100)).to(torch.float)

maxpoolop = nn.MaxPool2d(3,1) 
print(maxpoolop)
r = maxpoolop(random_ims)
print(r)


maxpoolop = nn.MaxPool2d((4,2),2) 
print(maxpoolop)
r = maxpoolop(random_ims)
print(r)

MaxPool2d(kernel_size=3, stride=1, padding=0, dilation=1, ceil_mode=False)
tensor([[[[229., 235., 235.,  ..., 245., 245., 245.],
          [242., 242., 235.,  ..., 245., 245., 245.],
          [242., 242., 222.,  ..., 245., 245., 245.],
          ...,
          [247., 241., 241.,  ..., 246., 246., 246.],
          [241., 241., 241.,  ..., 246., 246., 246.],
          [241., 241., 241.,  ..., 246., 246., 246.]],

         [[248., 170., 188.,  ..., 249., 249., 245.],
          [222., 216., 216.,  ..., 249., 249., 220.],
          [216., 216., 216.,  ..., 246., 220., 220.],
          ...,
          [248., 233., 233.,  ..., 191., 231., 231.],
          [248., 236., 220.,  ..., 245., 240., 240.],
          [247., 236., 220.,  ..., 245., 240., 240.]],

         [[246., 252., 252.,  ..., 254., 254., 250.],
          [246., 252., 252.,  ..., 254., 250., 250.],
          [246., 252., 252.,  ..., 254., 197., 150.],
          ...,
          [237., 237., 237.,  ..., 247., 247., 247.],
          [2
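
Printing the full tensors makes it hard to see what the pooling did to the image size, so it can help to print only the shapes. A small sketch with the same two pooling settings, using random float images (the names `ims`, `out1` and `out2` are illustrative). With no padding, each output dimension is floor((input - kernel) / stride) + 1.

```python
import torch
import torch.nn as nn

ims = torch.rand(5, 3, 100, 100)

out1 = nn.MaxPool2d(kernel_size=3, stride=1)(ims)
out2 = nn.MaxPool2d(kernel_size=(4, 2), stride=2)(ims)

print(out1.shape)   # torch.Size([5, 3, 98, 98]): (100 - 3) / 1 + 1 = 98
print(out2.shape)   # torch.Size([5, 3, 49, 50]): (100 - 4) // 2 + 1 = 49, (100 - 2) // 2 + 1 = 50
```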

## Data Loading

Before we can use a 

## Optimizers

### criterion and optimizer

to(device)


# MLP Example