## Demystifying PyTorch, One Bug at a Time! - Supervised Learning II - MDS Computational Linguistics

### Goal of this tutorial
- Debug some erroneous PyTorch snippets

### General
- This notebook was last tested on Python 3.6.9, PyTorch 1.2.0 and Matplotlib 3.1.2

### Debugging

In this tutorial, we will look at some PyTorch snippets that contain errors, which we will try to debug. Some of the resources useful for debugging include:
- [PyTorch Official Documentation](https://pytorch.org/docs/stable/index.html), which documents the classes and functions PyTorch supports, with many self-contained examples
- [Google search](https://www.google.com/), often the entry point to other resources
- [PyTorch Discussion Forum](https://discuss.pytorch.org/), the official place to "discuss PyTorch code, issues, install, research"
- [Stack Overflow](https://stackoverflow.com/questions/tagged/pytorch), a popular Q&A website for programming in general
- and probably more.


#### Snippet #1

Consider the following PyTorch snippet:

In [1]:
# load torch module
import torch

# create a linear layer
layer = torch.nn.Linear(10, 5)

# create a sample input of size 5x11
input = torch.randn(5, 11)

# pass the sample input to the layer to calculate the output
output = layer(input)

# print the calculated output 
print(output) 

RuntimeError: size mismatch, m1: [5 x 11], m2: [10 x 5] at /Users/distiller/project/conda/conda-bld/pytorch_1565272679438/work/aten/src/TH/generic/THTensorMath.cpp:752

#### Snippet #1 Discussion: Size mismatch

When the above snippet is executed, you should get ``RuntimeError: size mismatch, m1: [5 x 11], m2: [10 x 5]`` in the error stack trace. Observe the following in the stack trace:
- The line ``output = layer(input)`` that caused the error is highlighted, as expected. We can follow the trace further down into PyTorch's internal logic, ending at ``ret = torch.addmm(bias, input, weight.t())``. According to the PyTorch documentation for [torch.addmm](https://pytorch.org/docs/stable/generated/torch.addmm.html#torch.addmm), this function multiplies its second argument (``input``) by its third argument (``weight.t()``) and adds the result to its first argument (``bias``). The error could therefore come from either the matrix multiplication or the addition.
- The runtime error ``size mismatch, m1: [5 x 11], m2: [10 x 5]`` refers to an operation on **m1** (5x11, our sample input) and **m2** (10x5, the transposed weights of the linear layer). The bias is not mentioned in the message, and it has 5 elements (one per output feature), which is compatible with a 5x5 product, so the addition is not the problem. The problem is the matrix multiplication of ``input`` and ``weight.t()``, which violates the rule that the inner dimensions must agree, that is, (n x m) x (m x p): here 11 does not equal 10. To fix this error, we need to set the second dimension of the sample input to the number of input features the linear layer accepts (which is 10). Thus, we need to change the line ``input = torch.randn(5, 11)`` to ``input = torch.randn(5, 10)``.
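
The shapes discussed above can also be checked directly before calling the layer. The following is a minimal sketch, not part of the original notebook; the variable names (``weight_shape``, ``bad_input``, ``good_input``, and so on) are made up for illustration.

```python
import torch

# Same layer as in the snippet: 10 input features, 5 output features.
layer = torch.nn.Linear(10, 5)

# Shapes of the tensors that take part in torch.addmm(bias, input, weight.t()).
weight_shape = tuple(layer.weight.shape)        # stored as (out_features, in_features)
weight_t_shape = tuple(layer.weight.t().shape)  # transposed before the multiplication
bias_shape = tuple(layer.bias.shape)            # one value per output feature
print(weight_shape, weight_t_shape, bias_shape)  # (5, 10) (10, 5) (5,)

# The faulty input from the snippet: its last dimension must equal in_features.
bad_input = torch.randn(5, 11)
print(bad_input.shape[-1] == layer.in_features)  # False

# The corrected input passes the same check and produces a 5x5 output.
good_input = torch.randn(5, 10)
print(good_input.shape[-1] == layer.in_features)  # True
print(tuple(layer(good_input).shape))  # (5, 5)
```

Comparing ``input.shape[-1]`` with ``layer.in_features`` is only one possible sanity check; it catches this particular mismatch before the stack trace does.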

Alternatively, we can look into the documentation of [torch.nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html?highlight=nn%20linear#torch.nn.Linear), which provides an example of constructing a linear layer and feeding a sample input to it.

The debugged snippet is:

In [2]:
# load torch module
import torch

# create a linear layer
layer = torch.nn.Linear(10, 5)

# create a sample input of size 5x10
input = torch.randn(5, 10)

# pass the sample input to the layer to calculate the output
output = layer(input)

# print the calculated output 
print(output) # 5x5

tensor([[-0.4165, -0.0999,  0.0170, -0.5031, -0.2180],
        [-0.4864,  0.0954, -0.2751,  0.5822, -0.1681],
        [-0.4231,  0.1031,  1.0003, -0.4724, -1.1671],
        [ 0.1133, -0.2623, -0.4408, -0.4030,  0.6620],
        [-0.1106,  0.2542, -0.6437, -0.5557,  0.1015]],
       grad_fn=<AddmmBackward>)


#### Snippet #2

Consider the following PyTorch snippet:

In [3]:
# create the cross entropy loss 
loss = torch.nn.CrossEntropyLoss()

# create a sample prediction tensor of size 3x5 (3 inputs x 5 outputs)
prediction = torch.randn(3, 5)

# create a sample target tensor of size 3 (one target for each input)
target = torch.tensor([1.0, 2.0, 3.0])

# calculate the loss between the prediction and the target
output = loss(prediction, target)

# call backward to calculate the gradients
output.backward() 

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'

#### Snippet #2 Discussion: (A) Expected object of a specific type

When the above snippet is executed, you should get ``Expected object of scalar type Long but got scalar type Float for argument #2 'target'`` in the error stack trace at the line ``output = loss(prediction, target)``. Note that most PyTorch error messages are informative enough to let us spot the error easily. According to this message, the second argument to the loss function should be of type **Long**, not **Float**. We can confirm this by looking at the description of the second argument in the PyTorch documentation for [torch.nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?highlight=nn%20crossentropyloss#torch.nn.CrossEntropyLoss), which states that the target should contain class indices, that is, integers, which PyTorch stores as **long** by default. To fix this error, we can drop the decimal points in the target tensor, changing ``torch.tensor([1.0, 2.0, 3.0])`` to ``torch.tensor([1, 2, 3])``.
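
To see why dropping the decimal points changes the type, we can inspect the ``dtype`` of each tensor. This is a small illustrative sketch (the names ``float_target``, ``int_target`` and ``converted`` are not from the original notebook), and it assumes the default floating-point dtype has not been changed. It also shows ``.long()`` as one alternative when the float tensor already exists.

```python
import torch

# Python floats produce a float32 tensor under the default settings.
float_target = torch.tensor([1.0, 2.0, 3.0])
print(float_target.dtype)  # torch.float32

# Python ints produce an int64 ("long") tensor.
int_target = torch.tensor([1, 2, 3])
print(int_target.dtype)  # torch.int64

# An existing float tensor can be converted; .long() truncates toward zero.
converted = float_target.long()
print(converted.dtype)     # torch.int64
print(converted.tolist())  # [1, 2, 3]
```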

Let's see if this change removes the bug in the snippet completely:

In [4]:
# create the cross entropy loss 
loss = torch.nn.CrossEntropyLoss()

# create a sample prediction tensor of size 3x5 (3 inputs x 5 outputs)
prediction = torch.randn(3, 5)

# create a sample target tensor of size 3 (one target for each input)
target = torch.tensor([1, 2, 3])

# calculate the loss between the prediction and the target
output = loss(prediction, target)

# call backward to calculate the gradients
output.backward() 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

#### Snippet #2 Discussion: (B) Computational graph without gradient buffer

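After making the change, we get the error ``element 0 of tensors does not require grad and does not have a grad_fn`` at the next line of the snippet: ``output.backward()``. This error message is again informative: it indicates that ``output`` was computed only from tensors that do not require gradients, so there is nothing for ``backward()`` to differentiate. To fix this error, we should explicitly set ``requires_grad`` to ``True`` for the prediction tensor. (In a real model, the prediction would come from layers whose parameters already require gradients, so this step is needed only because our prediction is a hand-made sample tensor.)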
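
Before calling ``backward()``, it can help to inspect the ``requires_grad`` and ``grad_fn`` attributes of the tensors involved. The sketch below is an illustration only; the names ``loss_fn``, ``plain`` and ``tracked`` are hypothetical and do not appear in the original snippet.

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()
target = torch.tensor([1, 2, 3])

# A tensor created without requires_grad is not tracked by autograd,
# so the loss computed from it has no grad_fn to backpropagate through.
plain = torch.randn(3, 5)
plain_loss = loss_fn(plain, target)
print(plain.requires_grad, plain_loss.requires_grad, plain_loss.grad_fn)  # False False None

# With requires_grad=True, the loss records the operation that produced it.
tracked = torch.randn(3, 5, requires_grad=True)
tracked_loss = loss_fn(tracked, target)
print(tracked.requires_grad, tracked_loss.requires_grad)  # True True
print(tracked_loss.grad_fn is not None)  # True
```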

The debugged snippet is as follows:

In [5]:
# create the cross entropy loss 
loss = torch.nn.CrossEntropyLoss()

# create a sample prediction tensor of size 3x5 (3 inputs x 5 outputs)
prediction = torch.randn(3, 5, requires_grad=True)

# create a sample target tensor of size 3 (one target for each input)
target = torch.tensor([1, 2, 3])

# calculate the loss between the prediction and the target
output = loss(prediction, target)

# call backward to calculate the gradients
output.backward() 

#### Snippet #3

Consider the following PyTorch snippet:

In [6]:
# create a simple lookup table that stores 5 embeddings each of size 3
embedding = torch.nn.Embedding(5, 3)

# print the lookup table
print(embedding.weight)

# print the second and third embeddings
print(embedding(torch.LongTensor([[1,2]])))

# print the fifth and sixth embeddings
print(embedding(torch.LongTensor([[4,5]])))

Parameter containing:
tensor([[ 0.5152,  0.3221,  0.8535],
        [-0.5248, -1.1462, -1.0750],
        [-0.8332,  1.5889, -0.0585],
        [ 1.3809, -1.4966, -0.2269],
        [ 0.5363,  0.0987,  0.8901]], requires_grad=True)
tensor([[[-0.5248, -1.1462, -1.0750],
         [-0.8332,  1.5889, -0.0585]]], grad_fn=<EmbeddingBackward>)


RuntimeError: index out of range: Tried to access index 5 out of table with 4 rows. at /Users/distiller/project/conda/conda-bld/pytorch_1565272679438/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:237

#### Snippet #3 Discussion: Out of Range Index

When the above snippet is executed, we get the error ``index out of range: Tried to access index 5 out of table with 4 rows.`` at the line ``print(embedding(torch.LongTensor([[4,5]])))``.

Before debugging this, it's important to understand the basics of a lookup table from the documentation for [torch.nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html?highlight=nn%20embedding#torch.nn.Embedding). The input to the lookup table must consist of valid row indices. Our table stores 5 embeddings and indices start at 0, so the valid range is 0 <= index <= 4.

Accordingly, the error message states that the line ``print(embedding(torch.LongTensor([[4,5]])))`` attempts to access the invalid index 5. Caution: the "4 rows" in the message appears to correspond to the largest valid index (4) rather than to the number of rows in the table, which is 5. To fix this error, we have to omit the invalid index 5 from the problematic line.
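
When the indices come from data rather than being typed by hand, one way to catch this class of error early is to compare them against ``embedding.num_embeddings`` before the lookup. The following is a hedged sketch of such a check; the names ``indices``, ``in_range`` and ``valid_indices`` are illustrative, and simply dropping out-of-range indices is an assumption that may not suit every application.

```python
import torch

# Same lookup table as in the snippet: 5 embeddings, each of size 3.
embedding = torch.nn.Embedding(5, 3)
indices = torch.LongTensor([[4, 5]])

# Valid indices satisfy 0 <= index < num_embeddings.
in_range = (indices >= 0) & (indices < embedding.num_embeddings)
print(in_range.tolist())  # [[True, False]]

# Boolean masking keeps only the valid entries (the result is 1-D).
valid_indices = indices[in_range]
print(valid_indices.tolist())  # [4]
print(tuple(embedding(valid_indices).shape))  # (1, 3)
```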

After debugging, the snippet becomes:

In [7]:
# create a simple lookup table that stores 5 embeddings each of size 3
embedding = torch.nn.Embedding(5, 3)

# print the lookup table
print(embedding.weight)

# print the second and third embeddings
print(embedding(torch.LongTensor([[1,2]])))

# print the fifth embedding only
print(embedding(torch.LongTensor([[4]])))

Parameter containing:
tensor([[-0.0995, -0.6146, -1.8983],
        [-0.5602,  1.3285,  0.7369],
        [ 1.5553, -0.8091,  0.0447],
        [-0.6073,  1.0819, -0.1459],
        [ 0.1411,  0.9189, -0.1612]], requires_grad=True)
tensor([[[-0.5602,  1.3285,  0.7369],
         [ 1.5553, -0.8091,  0.0447]]], grad_fn=<EmbeddingBackward>)
tensor([[[ 0.1411,  0.9189, -0.1612]]], grad_fn=<EmbeddingBackward>)


That's it!