# Workshop 5: Using PyTorch

In [None]:
import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim

Last worksheet, we finally finished implementing our neural network. It worked, but the results could have been better. To improve them, we probably would have needed to test out different network architectures or implement several advanced techniques. That would take a lot of work; implementing just our basic network was difficult enough. Thankfully, there is a way to implement neural networks quickly and easily: *Machine Learning frameworks*.

Using an ML framework greatly simplifies the implementation of a neural network. Frameworks automatically derive all of the backpropagation calculations under the hood, without any help from the user. They usually also come with pre-made implementations of common techniques: special loss functions, different activation functions, different types of layers, and the list goes on. Oftentimes, the only thing a user has to do is specify the architecture of the model they want, set up the data pipeline, and train the network. No math necessary. Because of this, frameworks are super powerful; they are ubiquitous in academia and industry.

There are many great ML frameworks, but the one we will learn about here is PyTorch. It is popular and easy to use, while still being flexible enough to give the user as much control as they want. Today, we will use PyTorch to create a powerful network that will completely outclass the one we implemented from scratch.

## Torch Tensors

One of the most critical components of PyTorch is the *Torch Tensor*. For simplicity, you can think of a Torch Tensor as a fancy numpy array. A tensor wraps a multidimensional array, but it also stores extra properties, such as the mathematical operation that produced it (this is what makes automatic differentiation possible). Let's look at an example.

In [None]:
# We can instantiate a torch tensor by passing in a numpy array.
array1 = np.full((3, 4), 2)
A = torch.tensor(array1, dtype=torch.float) # torch.tensor copies the data; we ask for floats explicitly
print(A)
print(A.shape)
print(A[1, 2]) # we can index into torch tensors just like numpy arrays
print()

array2 = np.random.rand(3, 4)
B = torch.tensor(array2, dtype=torch.float)
B.requires_grad = True # we do this if we intend to calculate gradients with respect to this matrix
print(B)
print(B.shape)

Something interesting happens when we add these two tensors together:

In [None]:
C = A + B
print(C)

As we can see, torch records the operation used to create C as "grad_fn". Now, let's try something new. Suppose we want to minimize the first entry in C. It's sort of a stupid objective, but I want to illustrate a point.

In [None]:
loss = C[0, 0]
print(loss)

We can call .backward() on the scalar tensor "loss" to calculate its gradient with respect to "B". The gradient tells us which direction to move "B" in order to decrease the loss.

In [None]:
print(B.grad)

loss.backward()
print(B.grad)

Observe that the .backward() method updated B.grad. We can then use this newly calculated gradient as we wish.

In [None]:
with torch.no_grad(): # update B in place without recording the update in the autograd graph
    B -= 0.1 * B.grad
C = A + B
print(C[0, 0])

The first entry of C got smaller! That was sort of a toy example, but it shows how the autograd feature of PyTorch works. Let's tackle the MNIST problem now.
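
To make the autograd behavior easier to check by hand, here is a minimal sketch with a function whose derivative we already know. The tensor `x` and its values are hypothetical and chosen only for illustration; the point is that `.backward()` fills `x.grad` with exactly the gradient we would derive on paper.

```python
import torch

# Hypothetical toy values, chosen so the gradient is easy to verify by hand.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = x_0^2 + x_1^2 + x_2^2, so dy/dx_i = 2 * x_i.
y = (x ** 2).sum()
y.backward()

print(y.grad_fn)  # the operation that produced y
print(x.grad)     # tensor([2., 4., 6.])
grad = x.grad.clone()  # keep a copy before we clear the stored gradient

# Take one gradient step without recording it in the autograd graph.
with torch.no_grad():
    x -= 0.1 * x.grad
x.grad.zero_()  # clear the stored gradient before the next backward pass

new_y = (x ** 2).sum()
print(new_y.item() < y.item())  # the step made y smaller
```

Note that gradients accumulate across calls to `.backward()`, which is why the sketch clears `x.grad` after using it.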

## Preparing the Data
In order to train a network in PyTorch, we need to pass in our image and label data as tensors. We also have the option of "mini-batching" our data. For example, training with a mini-batch of size 20 means that we calculate the gradients for 20 image samples at a time and then step in the average direction. This is different from how we trained originally, where we took a step after calculating the gradient of a single sample.
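
As a rough illustration of "stepping in the average direction", the sketch below checks that the gradient of a mean squared error over a mini-batch equals the average of the per-sample gradients. The layer sizes, the batch size of 5, and the random data are all assumptions made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny layer and random data, just to compare gradients.
layer = nn.Linear(4, 2)
loss_function = nn.MSELoss()
x = torch.rand(5, 4)  # a mini-batch of 5 samples
y = torch.rand(5, 2)

# Gradient computed from the whole mini-batch at once.
loss_function(layer(x), y).backward()
batch_grad = layer.weight.grad.clone()

# Gradient computed one sample at a time, then averaged.
per_sample_grads = []
for i in range(x.shape[0]):
    layer.zero_grad()
    loss_function(layer(x[i:i + 1]), y[i:i + 1]).backward()
    per_sample_grads.append(layer.weight.grad.clone())
avg_grad = torch.stack(per_sample_grads).mean(dim=0)

print(torch.allclose(batch_grad, avg_grad, atol=1e-5))
```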

This time, we will use mini-batches of size 20. That means we must feed our data in as a 3-dimensional tensor. The dimensions of our training images should be (number of mini-batches, mini-batch size, number of features) = (N, 20, 784). The dimensions of our labels should be (N, 20, 10).
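
The reshaping step can be sketched on random data. The sample count of 107 is a made-up number, picked because it is not divisible by 20, so the sketch also shows one way to drop the leftover samples before calling `.view()`.

```python
import torch

batch_size = 20
num_samples = 107  # hypothetical count that is not divisible by 20

images = torch.rand(num_samples, 784)
labels = torch.zeros(num_samples, 10)

# Keep only as many samples as fill complete mini-batches.
usable = (num_samples // batch_size) * batch_size
images = images[:usable].view(-1, batch_size, 784)
labels = labels[:usable].view(-1, batch_size, 10)

print(images.shape)  # torch.Size([5, 20, 784])
print(labels.shape)  # torch.Size([5, 20, 10])
```

Indexing the first dimension, as in `images[0]`, then returns one mini-batch of shape (20, 784).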

In [None]:
def load_mnist_csv(path, batch_size=20):
    """
    Reads an MNIST CSV file (one sample per line: the label, then 784 pixel values)
    and returns mini-batched image and label tensors.
    """
    with open(path, 'r') as f: # the file is closed automatically when the block ends
        lines = f.readlines()

    images = np.zeros((len(lines), 784))
    labels = np.zeros((len(lines), 10))
    for index, line in enumerate(lines):
        line = line.strip()
        label = int(line[0])
        images[index, :] = np.fromstring(line[2:], dtype=int, sep=',')
        labels[index, label] = 1.0 # one-hot encoding: digit d sets column d

    # now, instantiate torch tensors
    images = torch.tensor(images, dtype=torch.float)
    labels = torch.tensor(labels, dtype=torch.float)

    # reshape for minibatch size 20
    # note that view() fails if the total number of samples is not divisible by the
    # minibatch size, so in that case we would have to throw out some samples first
    images = images.view(-1, batch_size, 784)
    labels = labels.view(-1, batch_size, 10)

    return images / 255, labels # scale pixel values to the range [0, 1]

def get_training_data():
    return load_mnist_csv('mnist_train.csv')

def get_validation_data():
    return load_mnist_csv('mnist_test.csv')

## Creating a Model
So how do we create a neural network in PyTorch? There are several options. We could take a very barebones approach: instantiate our own tensors and run autograd on them, sort of like we did above. However, a more convenient and widely used approach is to subclass torch.nn.Module, the base class for PyTorch models. To use it, we write the .\_\_init\_\_() method, which sets up the layers, and the .forward() method, which describes how an input flows through them. Once we do that, we have a new network class that we can instantiate and train.

Here is a basic network class as an example. Actually, this is a PyTorch implementation of the same network we made on our own.

In [None]:
class BasicNet(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        attributes.
        """
        super().__init__()
        self.linear1 = torch.nn.Linear(784, 12) # these are "linear layers", just like the ones we 
                                                # implemented from scratch.
        self.linear2 = torch.nn.Linear(12, 10)  # these layers each store a weight (and bias) tensor that 
                                                # can be updated (requires_grad=True)

        self.tanh = torch.nn.Tanh() # the tanh activation function
        self.sigmoid = torch.nn.Sigmoid() # the sigmoid activation
        
        
        
    def forward(self, x):
        """
        In the forward function, we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        
        x = self.linear1(x)
        x = self.tanh(x)
        x = self.linear2(x)
        x = self.sigmoid(x)
        
        return x
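
To see what such a model holds, the sketch below builds the same stack of layers with `nn.Sequential` (used here only as a compact stand-in for the class above, not as the workshop's own code), pushes a random mini-batch through it, and lists its trainable parameters. The name `sketch_net` and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Compact stand-in with the same layers: 784 -> 12 -> 10.
sketch_net = nn.Sequential(
    nn.Linear(784, 12),
    nn.Tanh(),
    nn.Linear(12, 10),
    nn.Sigmoid(),
)

batch = torch.rand(20, 784)  # one random mini-batch of 20 "images"
out = sketch_net(batch)      # calling the model runs its forward pass
print(out.shape)             # torch.Size([20, 10])

# Each linear layer stores a weight and a bias tensor with requires_grad=True.
for name, param in sketch_net.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

num_params = sum(p.numel() for p in sketch_net.parameters())
print(num_params)  # 784*12 + 12 + 12*10 + 10 = 9550
```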

We now write a training loop to train our network. We have the option of using different loss functions and different optimizers. The optimizer provides a scheme for updating the weights of the network. In the past, we used the standard update rule (step the weights in the negative direction of the gradient). However, more sophisticated methods, such as those that track running estimates of the gradient's moments, have been shown to be more effective.
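
As a small sanity check on the standard update rule, the sketch below compares one step of `optim.SGD` against the hand-written rule `w - lr * gradient`. The tensor `w`, its values, and the learning rate of 0.1 are hypothetical.

```python
import torch
import torch.optim as optim

# Hypothetical parameter tensor and learning rate.
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # gradient is 2 * w = [2, -4]
loss.backward()

# The update we expect from the plain rule: w - lr * gradient.
expected = w.detach() - 0.1 * w.grad

optimizer.step()       # applies the same rule to every parameter it was given
optimizer.zero_grad()  # clear gradients before the next backward pass

print(w)
print(torch.allclose(w.detach(), expected))
```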

For today, though, we will stick to the most basic optimizer (stochastic gradient descent).
Observe how the training loop works:

In [None]:
def train(model, training_images, training_labels, val_images, val_labels, epoch=5, lr=0.01):
    """
    Trains a pytorch model using the given data.
    """
    
    
    optimizer = optim.SGD(model.parameters(), lr) # stochastic gradient descent. We must pass in the model's 
                                                  # updatable parameters into our optimizer.
    loss_function = nn.MSELoss() # we use mean squared loss, same as before
    
    for e in range(epoch):
        model.train() # sets the model to training mode
        for n in range(training_images.shape[0]):
            x = training_images[n] # select a batch of images
            y = training_labels[n] # select the corresponding batch of labels
            
            out = model(x) # this implicitly calls the .forward() method of model and returns the output
            
            loss = loss_function(out, y) # calculates loss given the output and label
            
            loss.backward() # fills gradient tensors for all tensors in each of the model's layers
            
            optimizer.step() # updates all parameters of the model
            
            optimizer.zero_grad() # Clears gradient tensors. This must be done after every update
        
        print("Epoch {} complete".format(e + 1))
        print("Evaluating the model...")
        evaluate(model, val_images, val_labels)
        print()
        
        
def evaluate(model, val_images, val_labels):
    model.eval() # sets the model to evaluation mode
    num_correct = 0
    total_num = val_labels.shape[0] * val_labels.shape[1]
    with torch.no_grad():
        for n in range(val_images.shape[0]):
            x = val_images[n] # select a batch of validation images
            y = val_labels[n] # select the corresponding batch of labels
            y = y.numpy()
            truth = np.argmax(y, axis=1)
            

            out = model(x)
            out = out.numpy()
            out = np.argmax(out, axis=1)
            results = np.equal(out, truth)
            num_correct += np.sum(results)
    print("    Percent correct: {}".format(num_correct * 100 / total_num))
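
The accuracy calculation can be traced by hand on a tiny example. The outputs and labels below are made-up values for a mini-batch of 4 samples and 3 classes; the sketch uses torch's own `argmax` rather than converting to numpy, which is a stylistic alternative and not a change to the approach.

```python
import torch

# Hypothetical network outputs: 4 samples, 3 classes.
out = torch.tensor([[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.2, 0.5, 0.3]])

# Hypothetical one-hot labels for the same 4 samples.
y = torch.tensor([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

predicted = out.argmax(dim=1)  # index of the largest output per sample: [1, 0, 2, 1]
truth = y.argmax(dim=1)        # index of the 1 in each one-hot label:   [1, 0, 1, 1]

num_correct = (predicted == truth).sum().item()
accuracy = 100 * num_correct / len(truth)
print(accuracy)  # 75.0
```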

In [None]:
training_images, training_labels = get_training_data()
val_images, val_labels = get_validation_data()

In [None]:
model = BasicNet()
train(model, training_images, training_labels, val_images, val_labels, epoch=10, lr=0.05)

Okay, so these results are about the same as the network we coded on our own, as we should expect. Now, try to make a new neural network class. This time, it should have an extra hidden layer. The first hidden layer should have 400 nodes, and the second hidden layer should have 100 nodes. Inherit from nn.Module as before. Reference the [torch.nn documentation](https://pytorch.org/docs/stable/nn.html) for the layer and activation modules.

In [None]:
class NeuralNet2(torch.nn.Module):
    def __init__(self):
        super().__init__()
        ### Add your code here
        
        ###
        
    def forward(self, x):
        ### Add your code here
        
        ###
        
        return x

In [None]:
# test here


You probably noticed how easy that was. Now, try using the Adam optimizer (reference the [optim documentation](https://pytorch.org/docs/stable/optim.html)). It is a more sophisticated optimizer that keeps running estimates of the first and second moments of the gradient and uses them to scale each parameter's step. You can edit the existing train function or write another.
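
For intuition about what "moments of the gradient" means, here is a hand-written sketch of a few Adam-style updates on a toy tensor. It is an illustration of the commonly described update rule under assumed hyperparameters (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8), not the library's actual implementation, and the variable names are hypothetical.

```python
import torch

# Assumed hyperparameters (commonly used defaults).
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

w = torch.tensor([1.0, -2.0], requires_grad=True)
m = torch.zeros_like(w)  # running average of the gradient (first moment)
v = torch.zeros_like(w)  # running average of the squared gradient (second moment)

for t in range(1, 4):  # three update steps
    loss = (w ** 2).sum()
    loss.backward()
    with torch.no_grad():
        g = w.grad
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # correct the bias from starting at zero
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (v_hat.sqrt() + eps)  # step scaled per parameter
    w.grad.zero_()

print(w)
```

Notice that both entries of `w` move by roughly the same amount per step even though their gradients differ in size; that per-parameter scaling is the main difference from plain gradient descent.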

In [None]:
# work here


Finally, write another neural network, this time with 3 hidden layers. Use whatever activation functions you want (look in the torch documentation). You can decide how many nodes each layer should have. See how high you can make the validation accuracy.

In [None]:
# work here
