# PyTorch 

[PyTorch](https://pytorch.org/) is another Machine Learning Framework, similar in many ways to TensorFlow but with a few key differences.

 - PyTorch does not support `function` compilation in the same way that TensorFlow does
 - PyTorch generally uses less memory than TensorFlow
 - PyTorch preserves a more `numpy`-like interface
 
 More information about PyTorch can be found at https://pytorch.org/.
 
 In this short notebook, we'll cover the same topics as before in the [TensorFlow notebook](https://github.com/argonne-lcf/sdl_workshop/blob/learningFrameworks/learningFrameworks/TensorFlow.ipynb), but this time in PyTorch.
 
 This document is not meant to be a [PyTorch tutorial](https://pytorch.org/tutorials/). Instead, it is meant to introduce the core concepts of using PyTorch on Polaris, assuming you already have some familiarity with PyTorch.

In [None]:
import torch

Here we temporarily load TensorFlow in order to import the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

In [None]:
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
del tf

In [None]:
# Again we will work with a batch of 10% of the data
batch_size=5000
# Permute the axes from (N, H, W, C) to (N, C, H, W), the layout PyTorch convolutions expect
batch_data = x_train[0:batch_size].transpose((0,3,1,2))
batch_labels = y_train[0:batch_size]

In [None]:
print(x_train.shape)
print(batch_data.shape)
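
The axis permutation above can be checked on a much smaller array. This is a minimal sketch with made-up sizes (the array `nhwc` is illustrative, not CIFAR-10 data); it only shows that moving the channel axis changes how a value is indexed, not the value itself.

```python
import numpy as np

# A tiny stand-in for image data in (batch, height, width, channels) order.
# The sizes are illustrative; CIFAR-10 images would be (N, 32, 32, 3).
nhwc = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)

# Move the channel axis from last to second: (N, H, W, C) -> (N, C, H, W).
nchw = nhwc.transpose((0, 3, 1, 2))

print(nhwc.shape)  # (2, 4, 4, 3)
print(nchw.shape)  # (2, 3, 4, 4)

# The same pixel value is reached with permuted indices.
same_value = nhwc[1, 2, 3, 0] == nchw[1, 0, 2, 3]
print(same_value)
```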

In [None]:
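# torch.Tensor converts the NumPy image data to the default floating-point dtype (float32)
batch_data = torch.Tensor(batch_data)
# Labels are cast to 64-bit integers, the dtype cross_entropy expects for class indices
batch_labels = torch.Tensor(batch_labels).long()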

In [None]:
print(x_train.dtype)
print(batch_data.dtype)
print()

print(y_train.dtype)
print(batch_labels.dtype)
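
The dtype changes printed above come from the conversion step. As a hedged alternative sketch, `torch.from_numpy` keeps the NumPy dtype, so the casts become explicit. The arrays `pixels` and `labels` below are small illustrative stand-ins, not the CIFAR-10 arrays.

```python
import numpy as np
import torch

# Illustrative stand-ins for image pixels and integer class labels.
pixels = np.zeros((4, 3, 8, 8), dtype=np.uint8)
labels = np.array([[0], [1], [2], [3]], dtype=np.uint8)

# torch.from_numpy keeps the NumPy dtype and shares memory with the array.
shared = torch.from_numpy(pixels)
print(shared.dtype)  # torch.uint8

# An explicit cast produces the floating-point tensor that layers work with.
as_float = shared.float()
print(as_float.dtype)  # torch.float32

# Class-index labels are cast to 64-bit integers.
as_long = torch.from_numpy(labels).long()
print(as_long.dtype)  # torch.int64
```

Whether to share memory with the NumPy array or copy it is a design choice; the cast calls here return new tensors, so the original arrays are left untouched.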

## Creating Machine Learning Models

PyTorch's `nn` package allows an object-oriented way to create models, just like Keras in TensorFlow. There is also a [functional API](https://pytorch.org/docs/stable/index.html) that works similarly. For example, a few layers of a [ResNet](https://doi.org/10.48550/arXiv.1512.03385)-like model can be built like so:

In [None]:
class ResidualBlock(torch.nn.Module):

    def __init__(self):
        # Call the parent class's __init__ so that torch.nn.Module can register the layers and their parameters:
        super().__init__()
        self.conv1  = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[3,3], padding=[1,1])
        self.conv2  = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[3,3], padding=[1,1])

    def forward(self, inputs):
    
        # Apply the first weights + activation:
        outputs = torch.nn.functional.relu(self.conv1(inputs))
        
        # Apply the second weights:
        outputs = self.conv2(outputs)

        # Perform the residual step:
        outputs = outputs + inputs

        # Second activation layer:
        return torch.nn.functional.relu(outputs)



In [None]:
class MyModel(torch.nn.Module):
    
    def __init__(self):
        # Call the parent class's __init__ so that torch.nn.Module can register the layers and their parameters:
        super().__init__()
        
        self.conv_init = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=1)
        
        self.res1 = ResidualBlock()
        
        self.res2 = ResidualBlock()
        
        # 10 filters, one for each possible label (classification):
        self.conv_final = torch.nn.Conv2d(in_channels=16, out_channels=10, kernel_size=1)
        
        self.pool = torch.nn.AvgPool2d(32,32)
        
    def forward(self, inputs):
        
        x = self.conv_init(inputs)
        
        x = self.res1(x)
        
        x = self.res2(x)
        
        x = self.conv_final(x)
        
        return self.pool(x).reshape((-1,10))
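
The last line of `forward` pools each 32x32 feature map down to a single value and then reshapes the result to one row of 10 logits per image. A minimal sketch of just that step, using a fake feature map (`features` is an illustrative tensor of ones, not real model output):

```python
import torch

# A fake feature map: batch of 2, 10 channels, 32x32 spatial size.
features = torch.ones(2, 10, 32, 32)

# Kernel size 32 with stride 32 covers the whole 32x32 map in one window.
pool = torch.nn.AvgPool2d(32, 32)
pooled = pool(features)
print(pooled.shape)  # torch.Size([2, 10, 1, 1])

# Drop the trailing 1x1 spatial dimensions to get one row of logits per image.
logits = pooled.reshape((-1, 10))
print(logits.shape)  # torch.Size([2, 10])
```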

In [None]:
model = MyModel()

In [None]:
print(model)
_num_trainable_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("Number of Trainable Parameters: {:d}".format(_num_trainable_parameters))
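
The parameter count can also be worked out by hand, which is a useful sanity check on the layer definitions. The sketch below assumes the layer sizes defined in `MyModel` and `ResidualBlock` above; `conv2d_param_count` is a hypothetical helper written for this example, not a PyTorch function.

```python
import torch

def conv2d_param_count(in_channels, out_channels, kernel_h, kernel_w):
    # One kernel_h x kernel_w filter per (input, output) channel pair,
    # plus one bias value per output channel.
    return in_channels * out_channels * kernel_h * kernel_w + out_channels

conv_init = conv2d_param_count(3, 16, 1, 1)    # 1x1 convolution, 3 -> 16 channels
res_conv = conv2d_param_count(16, 16, 3, 3)    # each 3x3 convolution in a residual block
conv_final = conv2d_param_count(16, 10, 1, 1)  # 1x1 convolution, 16 -> 10 channels

# Two residual blocks with two convolutions each; average pooling has no parameters.
expected_total = conv_init + 4 * res_conv + conv_final
print(expected_total)  # 9514

# Cross-check the formula against an actual PyTorch layer.
layer = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[3, 3], padding=[1, 1])
actual = sum(p.numel() for p in layer.parameters())
print(actual == res_conv)
```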

## Automatic Differentiation

The big advantage of machine learning frameworks is automatic differentiation. PyTorch supports automatic differentiation through the `torch.autograd` package:

In [None]:
logits = model(batch_data)
print(logits.shape)

In [None]:
print(batch_labels.shape)

In [None]:
loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())

In [None]:
print(loss)
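
To make the loss call less opaque, here is a small sketch that reproduces `cross_entropy` by hand on made-up numbers. The `logits` and `labels` values are illustrative only; the labels have shape `(N, 1)` to mirror why `flatten()` is needed above.

```python
import torch

# Illustrative logits for 3 samples and 4 classes, with labels of shape (3, 1).
logits = torch.tensor([[2.0, 0.5, 0.1, 0.0],
                       [0.0, 1.0, 0.0, 3.0],
                       [1.0, 1.0, 1.0, 1.0]])
labels = torch.tensor([[0], [3], [2]])

# cross_entropy takes class indices of shape (N,), hence the flatten().
builtin = torch.nn.functional.cross_entropy(logits, labels.flatten())

# The same quantity by hand: mean negative log-softmax at the true class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), labels.flatten()].mean()

print(builtin, manual)
```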

In [None]:
gradients = torch.autograd.grad(loss, model.parameters())

In [None]:
for i, p in enumerate(model.parameters()):
    print(gradients[i].shape)
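
The same `torch.autograd.grad` call can be verified on a function small enough to differentiate by hand. This is a minimal sketch with illustrative tensors `w` and `b`, unrelated to the model above.

```python
import torch

# Two leaf tensors that track gradients.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

# A scalar "loss": the sum of (w^2 + b) over three elements.
loss = (w ** 2 + b).sum()

# torch.autograd.grad returns one gradient per input, in the same order.
grad_w, grad_b = torch.autograd.grad(loss, [w, b])

print(grad_w)  # d(loss)/dw = 2 * w
print(grad_b)  # d(loss)/db = 3, one contribution per element
```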

In [None]:
# This call is expected to fail with a RuntimeError: batch_data does not track gradients yet
input_grads = torch.autograd.grad(loss, batch_data)

In [None]:
# Mark the inputs as requiring gradients, then rerun the forward pass to rebuild the graph
logits = model(batch_data.requires_grad_())
loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())

In [None]:
input_grads = torch.autograd.grad(loss, batch_data)[0] # <-- returns tuple with input gradients as only member

In [None]:
print(batch_data.shape)
print(input_grads.shape)

print(input_grads[0,:,:,:])
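
The pattern in the cells above (a failing call, then `requires_grad_()` and a fresh forward pass) can be reproduced in isolation. A minimal sketch with illustrative tensors `weights` and `inputs`:

```python
import torch

weights = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
inputs = torch.ones(3)  # plain data: requires_grad is False by default

loss = (weights * inputs).sum()
try:
    torch.autograd.grad(loss, inputs)
    failed = False
except RuntimeError as err:
    failed = True
    print("autograd refused:", err)

# Opt the inputs in, then rebuild the graph by recomputing the loss.
inputs.requires_grad_()
loss = (weights * inputs).sum()
input_grads = torch.autograd.grad(loss, inputs)[0]
print(input_grads)  # d(loss)/d(inputs) equals the weights
```

The loss has to be recomputed because the first graph was recorded while `inputs` was not tracking gradients.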

## PyTorch Performance

Here is the same gradient step function as in the TensorFlow notebook, using an identical model:

In [None]:
def gradient_step():
    logits = model(batch_data)
    loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())
    gradients = torch.autograd.grad(loss, model.parameters())
    return gradients

In [None]:
%timeit gradient_step()

As you can see, this gradient step is significantly slower than the TensorFlow version. However, for larger input sizes and models, PyTorch is quite competitive with TensorFlow and is sometimes faster. PyTorch also has JIT functionality, but it does not yield the same improvement that function compilation does in TensorFlow:

In [None]:
traced_module = torch.jit.trace_module(model, inputs={"forward" : batch_data})

In [None]:
%timeit traced_module(batch_data)

In [None]:
%timeit model(batch_data)
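
`%timeit` is an IPython magic, so it is only available inside a notebook or IPython session. As a rough sketch of the same comparison in a plain Python script, the example below traces a deliberately tiny stand-in model (not `MyModel`) and times both versions with `time.perf_counter`. The helper `average_seconds` is hypothetical and far less careful than `%timeit`.

```python
import time
import torch

# A deliberately tiny stand-in model; not the MyModel defined above.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
example = torch.rand(4, 3, 32, 32)

# Trace the model by running it once on an example input.
traced = torch.jit.trace(model, example)

def average_seconds(fn, repeats=10):
    # Hypothetical helper: mean wall-clock time per call over `repeats` runs.
    fn()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

eager_time = average_seconds(lambda: model(example))
traced_time = average_seconds(lambda: traced(example))
print(f"eager:  {eager_time:.6f} s per call")
print(f"traced: {traced_time:.6f} s per call")
```

This sketch assumes CPU execution. On a GPU, kernels are launched asynchronously, so a call to `torch.cuda.synchronize()` before each clock reading would be needed for the timings to be meaningful.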