# PyTorch Basics
This notebook summarizes the PyTorch "60 Minute Blitz" tutorial, which covers the basics of the PyTorch library.

## Tensors
Tensors are similar to NumPy arrays, with extra capabilities. They are specialized arrays that perform math very quickly (which neural networks need) and can run on CPUs, GPUs, or even specialized TPUs (hardware designed from the ground up for neural network training). Tensors and NumPy arrays are so alike that PyTorch provides a bridge for converting between the two, and a CPU tensor and its NumPy counterpart can share the same underlying memory.
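As a small illustration of the memory sharing mentioned above, the sketch below converts in both directions and then modifies one side. The variable names are made up for this example, and the sharing shown here applies to CPU tensors.

```python
import numpy as np
import torch

# NumPy array -> tensor: torch.from_numpy reuses the array's memory
array = np.ones(3)
shared_tensor = torch.from_numpy(array)

# An in-place change to the array is visible through the tensor
np.add(array, 1, out=array)
print(shared_tensor)

# Tensor -> NumPy array: .numpy() also shares memory for CPU tensors
tensor = torch.zeros(3)
shared_array = tensor.numpy()
tensor.add_(5)
print(shared_array)

# torch.tensor(...) copies the data instead, so the copy is unaffected
copied_tensor = torch.tensor(array)
array[0] = 100.0
print(copied_tensor[0].item(), shared_tensor[0].item())
```

The last line prints different values for the copy and the shared view, which is a quick way to tell the two kinds of conversion apart.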

In [1]:
# Import stuff
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np

### Tensor Initialization
Tensors can be initialized in many different ways, which gives flexibility depending on how data arrives in the ML pipeline.

In [2]:
# From slower native lists
data = [[1,2], [3,4]]
data_tensor = torch.tensor(data)

# From numpy arrays
data_numpy = np.array(data)
data_tensor_2 = torch.from_numpy(data_numpy)

# Similar to numpy they have a bunch of random or fixed value initializations
shape = (2, 3,)
# Init random values
torch.rand(shape)
# Init ones
torch.ones(shape)
# Init zeros
torch.zeros(shape)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
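Two further initialization patterns can be handy alongside the ones above: building a tensor that mirrors an existing one, and seeding the random generator for reproducibility. This is a minimal sketch; `base`, `first`, and `second` are names chosen for this example.

```python
import torch

base = torch.tensor([[1, 2], [3, 4]])

# *_like constructors reuse the shape (and, unless overridden, the dtype) of an existing tensor
ones_like_base = torch.ones_like(base)
# Random values need a floating-point dtype, so the integer dtype of base is overridden
rand_like_base = torch.rand_like(base, dtype=torch.float)

# Seeding the generator makes random initializations reproducible
torch.manual_seed(0)
first = torch.rand(2, 3)
torch.manual_seed(0)
second = torch.rand(2, 3)

print(ones_like_base)
print(rand_like_base.shape, rand_like_base.dtype)
print(torch.equal(first, second))
```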

### Tensor Attributes
Each tensor carries a few pieces of information about itself.

In [3]:
test = torch.ones(shape)

print(test.shape) # dimensions of tensor
print(test.device) # device the tensor is stored on (CPU unless specified otherwise)
print(test.dtype) # Data type of the tensor

torch.Size([2, 3])
cpu
torch.float32
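These attributes can also be chosen rather than just inspected. The sketch below sets a dtype at creation time and converts it afterwards; the tensor names are illustrative only.

```python
import torch

# dtype can be chosen at creation time ...
doubles = torch.ones((2, 3), dtype=torch.float64)
# ... or changed afterwards; .to() returns a new tensor and leaves the original untouched
ints = doubles.to(torch.int32)

print(doubles.dtype, ints.dtype)
# Integer input data yields an integer tensor unless a dtype is given
print(torch.tensor([[1, 2], [3, 4]]).dtype)
```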


### Tensor Operations
Most tensor operations mirror their NumPy counterparts. One tensor-specific exception, shown below, is moving a tensor to another device.

In [4]:
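# Move the tensor to the GPU for faster calculation (falls back to CPU if no GPU is available)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
test_gpu = test.to(device)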
print(test_gpu.device)

cuda:0
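For the NumPy-like side of tensor operations, here is a short sketch of indexing, joining, multiplication, and in-place updates. It runs on the CPU, and the tensor `t` is an arbitrary example.

```python
import torch

t = torch.ones(4, 4)

# NumPy-style indexing and slicing
t[:, 1] = 0                         # zero out the second column

# Joining tensors along an existing dimension
joined = torch.cat([t, t], dim=1)   # shape (4, 8)

# Matrix product vs. element-wise product
mat = t @ t.T                       # same as torch.matmul(t, t.T)
elem = t * t                        # same as torch.mul(t, t)

# Operations with a trailing underscore modify the tensor in place
t.add_(5)

print(joined.shape)
print(mat[0, 0].item(), elem[0, 1].item(), t[0, 0].item())
```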


## AutoGrad

AutoGrad is one of the critical components that allows PyTorch to train neural networks. As you might know, training a neural network consists of two key steps.

*Forward Propagation:* The model uses the data and its current parameters to make a guess at the end objective (a number for regression, a label for classification, and so on).

*Backward Propagation:* A loss function tells us how bad the model's guesses were. Backward propagation computes the derivatives of that loss with respect to the parameters. Each parameter is then nudged in the direction opposite to the gradient, which points towards the steepest ascent (hence the name gradient descent), by an amount scaled by the learning rate.

AutoGrad computes the derivatives that make backward propagation possible, which is what lets neural networks "learn".

Let's consider the following polynomial function as a sample loss function.

$Q = 3a^3 - b^2$

We can see that the loss function $Q$ depends on two parameters, $a$ and $b$. Therefore, the partial derivatives of the function with respect to the two parameters are:

$\frac{\partial Q}{\partial a} = 9a^2$ and $\frac{\partial Q}{\partial b} = -2b$
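These analytic derivatives can be sanity-checked numerically without any autograd machinery. The sketch below uses a central finite difference in plain Python. The helper `central_difference`, the step size `h`, and the evaluation point are assumptions made for this illustration.

```python
def Q(a, b):
    """Sample loss from the text: Q = 3a^3 - b^2."""
    return 3 * a**3 - b**2

def central_difference(f, x, h=1e-5):
    """Numerical derivative of a one-argument function f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

a0, b0 = 2.0, 6.0  # arbitrary evaluation point chosen for this illustration

# Vary one parameter at a time while holding the other fixed
dQ_da_numeric = central_difference(lambda a: Q(a, b0), a0)
dQ_db_numeric = central_difference(lambda b: Q(a0, b), b0)

print(dQ_da_numeric, 9 * a0**2)   # numeric estimate next to the analytic value
print(dQ_db_numeric, -2 * b0)     # numeric estimate next to the analytic value
```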

### Quirk with PyTorch
PyTorch splits the backward propagation step into two parts. Calling `.backward()` computes the needed gradients based on the loss function and the model, while the optimizer's `.step()` function applies the parameter update according to the optimization algorithm in use (gradient descent, Adam, AdaGrad, etc.).

Let's check this in code.

In [5]:
a = torch.tensor([2. ,3.], requires_grad=True)
b = torch.tensor([6. ,4.], requires_grad=True)

Q = 3*a**3 - b**2

In [6]:
# Since Q is a vector and not a scalar, we need to pass a gradient argument to perform the backward pass
Q.backward(gradient=torch.tensor([1., 1.])) # gradient has the same shape as Q and is filled with ones, because the gradient of Q w.r.t. Q is 1

Now that `.backward()` has been invoked, the gradients are stored in the `.grad` attribute of each parameter involved in the backward pass. We can confirm this with a simple assertion. `torch.allclose()` is a function provided by PyTorch that checks whether two tensors are close enough to be considered equal.

In [7]:
assert torch.allclose(a.grad, 9*a**2) and torch.allclose(b.grad, -2*b) # check if the gradients are correct
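The cells above only exercise `.backward()`. To see the second half of the split, the optimizer's `.step()`, here is a minimal sketch that applies one plain gradient descent update to the same two tensors. The choice of `optim.SGD`, the learning rate of 0.01, and summing $Q$ into a scalar are assumptions made for this example, not something the tutorial prescribes.

```python
import torch
import torch.optim as optim

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Plain gradient descent; the learning rate 0.01 is an arbitrary choice
optimizer = optim.SGD([a, b], lr=0.01)

optimizer.zero_grad()            # clear gradients left over from earlier passes
loss = (3 * a**3 - b**2).sum()   # reduce Q to a scalar so backward() needs no argument
loss.backward()                  # part 1: compute gradients into a.grad and b.grad

a_before = a.detach().clone()
optimizer.step()                 # part 2: update in place, a <- a - lr * a.grad

print(a.grad)                    # equals 9 * a_before**2
print(a_before - a.detach())     # equals lr * a.grad
```

Summing $Q$ before calling `.backward()` gives the same gradients as passing a vector of ones as the `gradient` argument.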

### Looks like magic, how does it work?
The moment we mark the tensors $a$ and $b$ as requiring gradients, PyTorch starts recording a computation graph for them and for every tensor derived from them, right up to the final tensor ($Q$ here). This recording happens naturally during the forward pass, as we go from the inputs to the loss function.

The backward pass kicks off when we call `.backward()` on the root of the computation graph. Using the chain rule, it computes the gradient through each function in the graph and accumulates the results in the `.grad` attribute of the respective tensors, all the way down to the leaves.
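The graph and the accumulation behaviour can be observed directly. The sketch below rebuilds $Q$, inspects which tensors are leaves, and runs the backward pass twice to show that `.grad` is added to rather than overwritten. The use of `retain_graph=True` and the name `first_grad` are choices made for this demonstration.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3 * a**3 - b**2

# Leaves are the tensors we created directly; Q was produced by an operation
print(a.is_leaf, Q.is_leaf)
# Q remembers the function that created it, which links it into the graph
print(Q.grad_fn is not None)

# retain_graph=True keeps the graph alive so backward can run a second time
Q.backward(gradient=torch.ones_like(Q), retain_graph=True)
first_grad = a.grad.clone()

# A second backward pass adds to .grad instead of overwriting it
Q.backward(gradient=torch.ones_like(Q))
print(torch.allclose(a.grad, 2 * first_grad))

# Resetting the gradient before the next pass avoids the accumulation
a.grad.zero_()
```

This accumulation is why training loops usually reset gradients before each backward pass.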

### When's the fun stuff coming?
Right now, actually! We will create a simple *Convolutional Neural Network* using the PyTorch API that can be used to predict the digit present in an image. Assume each input is a grayscale image of size 32x32 (a single channel). More on how the layers work later!

In [9]:
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # We define the layers of the network here
        # 1 input image channel, 6 output channels, 5x5 square convolutions, 0 padding
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, input):
        # The forward pass of the network is defined here
        # First convolution output
        c1 = F.relu(self.conv1(input))
        # Max pooling
        s2 = F.max_pool2d(c1, (2, 2))
        # Second convolution output
        c3 = F.relu(self.conv2(s2))
        # Second max pooling
        s4 = F.max_pool2d(c3, 2)
        # Flatten the tensor for dense part of the CNN
        s4_flat = torch.flatten(s4, 1)
        # Connect the dense layers
        f5 = F.relu(self.fc1(s4_flat))
        f6 = F.relu(self.fc2(f5))
        # Final output layer
        output = self.fc3(f6)
        return output

convNet = ConvNet()
print(convNet)

ConvNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
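The `in_features=400` of `fc1` follows from how the spatial size shrinks through the convolution and pooling layers. The sketch below traces the shapes with standalone layers that use the same hyperparameters as `conv1` and `conv2`. The random input batch is a stand-in for real image data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer hyperparameters as conv1 and conv2 in the class above
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

# A dummy batch: 1 image, 1 channel, 32x32 pixels (random values, not real data)
x = torch.rand(1, 1, 32, 32)

c1 = F.relu(conv1(x))         # 5x5 kernel, no padding: 32 -> 28
s2 = F.max_pool2d(c1, 2)      # 2x2 pooling halves it:  28 -> 14
c3 = F.relu(conv2(s2))        # 5x5 kernel, no padding: 14 -> 10
s4 = F.max_pool2d(c3, 2)      # 2x2 pooling halves it:  10 -> 5
flat = torch.flatten(s4, 1)   # 16 channels * 5 * 5 = 400 features per image

for name, t in [("c1", c1), ("s2", s2), ("c3", c3), ("s4", s4), ("flat", flat)]:
    print(name, tuple(t.shape))
```

Passing the same kind of batch to `convNet` would yield one row of 10 scores per image, one score per digit class.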
