# PyTorch 

PyTorch is another machine learning framework, similar in many ways to TensorFlow but with a few key differences:

 - PyTorch does not support `function` compilation (as in `tf.function`) in the same way that TensorFlow does
 - PyTorch generally uses less memory than TensorFlow
 - PyTorch preserves a more `numpy`-like interface
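
As a small illustration of the `numpy`-like interface, the sketch below builds the same array in both libraries and applies the same slicing, broadcasting, and reduction to each. The array values and variable names are made up for this example.

```python
import numpy as np
import torch

# The same 2x3 array built with NumPy and with PyTorch.
a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
a_t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Slicing, broadcasting, and reductions use nearly the same syntax.
col_means_np = (a_np[:, 1:] * 2.0).mean(axis=0)
col_means_t = (a_t[:, 1:] * 2.0).mean(dim=0)  # PyTorch calls the argument `dim`

# CPU tensors convert to and from NumPy arrays directly.
round_trip = torch.from_numpy(a_np).numpy()

print(col_means_np, col_means_t)
```

The main visible difference in this snippet is the `dim` argument name where NumPy uses `axis`.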
 
More information about PyTorch can be found here: https://pytorch.org/

In this short notebook, we'll cover the same topics as in the TensorFlow notebook, but in PyTorch.

In [1]:
import torch


In [2]:
# TensorFlow is used here only to download CIFAR-10 through Keras.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
del tf



In [3]:
x_train.shape

(50000, 32, 32, 3)

In [8]:
batch_size = 5000
# Keras stores images as (N, H, W, C); PyTorch convolutions expect (N, C, H, W).
batch_data = x_train[0:batch_size].transpose((0, 3, 1, 2))
batch_labels = y_train[0:batch_size]
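
The `transpose` call above reorders the axes because the CIFAR-10 arrays loaded through Keras are channels-last, with shape `(50000, 32, 32, 3)`, while `torch.nn.Conv2d` expects channels-first input. A minimal sketch on a small random stand-in array (`fake_images` is hypothetical, not the real dataset):

```python
import numpy as np

# Hypothetical stand-in for a few CIFAR-10 images: (batch, height, width, channels).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Move the channel axis from the last position to the second: NHWC -> NCHW.
channels_first = fake_images.transpose((0, 3, 1, 2))

print(channels_first.shape)  # (4, 3, 32, 32)
```

Only the axis order changes; each pixel value stays attached to the same image, row, column, and channel.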

In [9]:
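# Convert the NumPy arrays to tensors: float32 images and int64 class labels.
batch_data = torch.tensor(batch_data, dtype=torch.float32)
batch_labels = torch.tensor(batch_labels, dtype=torch.int64)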

In [10]:
print(batch_labels.shape)
print(batch_labels.dtype)

torch.Size([5000, 1])
torch.int64


## Creating Models

PyTorch's `nn` package allows an object-oriented way to create models, just like in TensorFlow.  There is also a functional API that works similarly.
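
To make the relationship between the two styles concrete, here is a small sketch that applies the same convolution and activation once through layer objects and once through `torch.nn.functional`. The input tensor and layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)  # arbitrary stand-in input

# Object-oriented style: the layer object owns its weight and bias.
conv = torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
relu = torch.nn.ReLU()
out_module = relu(conv(x))

# Functional style: the same operations, with the parameters passed explicitly.
out_functional = F.relu(F.conv2d(x, conv.weight, conv.bias, padding=1))

print(torch.allclose(out_module, out_functional))
```

In practice the two are often mixed, with layers that hold weights defined as modules and stateless operations such as `relu` called functionally.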

In [11]:

class ResidualBlock(torch.nn.Module):

    def __init__(self):
        # Call the parent class's __init__ so that the layers below are registered as parameters:
        super().__init__()
        self.conv1 = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[3,3], padding=[1,1])
        self.conv2 = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[3,3], padding=[1,1])

    def forward(self, inputs):
        # Apply the first weights + activation:
        outputs = torch.nn.functional.relu(self.conv1(inputs))

        # Apply the second weights:
        outputs = self.conv2(outputs)

        # Perform the residual step:
        outputs = outputs + inputs

        # Second activation layer:
        return torch.nn.functional.relu(outputs)
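
The residual step `outputs + inputs` only works if the convolutions leave the tensor shape unchanged, which is why both layers use a 3x3 kernel with a padding of 1 and the same number of input and output channels. A quick check on a random stand-in tensor:

```python
import torch

x = torch.randn(2, 16, 32, 32)  # stand-in activations: (batch, channels, height, width)

# Same configuration as the block's convolutions: 3x3 kernel, padding of 1.
conv = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
y = conv(x)
print(y.shape == x.shape)  # True

# Without padding, a 3x3 kernel trims one pixel from every edge,
# so the result could no longer be added to the input.
conv_no_pad = torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3)
print(conv_no_pad(x).shape)  # torch.Size([2, 16, 30, 30])
```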



In [12]:
class MyModel(torch.nn.Module):
    
    def __init__(self):
        # Call the parent class's __init__ so that the layers below are registered as parameters:
        super().__init__()
        
        self.conv_init = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=1)
        
        self.res1 = ResidualBlock()
        
        self.res2 = ResidualBlock()
        
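        # One output channel for each of the 10 classes:
        self.conv_final = torch.nn.Conv2d(in_channels=16, out_channels=10, kernel_size=1)
        
        # Average over the full 32x32 image, leaving one value per class:
        self.pool = torch.nn.AvgPool2d(kernel_size=32, stride=32)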
        
    def forward(self, inputs):
        
        x = self.conv_init(inputs)
        
        x = self.res1(x)
        
        x = self.res2(x)
        
        x = self.conv_final(x)
        
        return self.pool(x).reshape((-1,10))
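
Because the pooling window covers the whole 32x32 image, the `AvgPool2d` layer followed by the reshape acts as a global average pool: each of the 10 channels collapses to a single score. The sketch below, on a random stand-in tensor, compares it with a plain mean over the spatial dimensions.

```python
import torch

x = torch.randn(4, 10, 32, 32)  # stand-in for the output of a 10-channel convolution

# Pool with a window as large as the image, then drop the 1x1 spatial dimensions.
pool = torch.nn.AvgPool2d(kernel_size=32, stride=32)
pooled = pool(x).reshape((-1, 10))

# Averaging directly over height and width gives the same per-class values.
mean_over_space = x.mean(dim=(2, 3))

print(pooled.shape)  # torch.Size([4, 10])
print(torch.allclose(pooled, mean_over_space, atol=1e-5))
```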

In [13]:
model = MyModel()

In [14]:
print(batch_labels.shape)

torch.Size([5000, 1])


In [15]:
logits = model(batch_data)
print(logits.shape)

torch.Size([5000, 10])


In [16]:
loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())

In [17]:
print(loss)

tensor(54.8145, grad_fn=<NllLossBackward0>)
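
`cross_entropy` takes raw logits and integer class indices. It applies a log-softmax internally and, by default, averages the negative log-likelihood over the batch, which is why the labels are flattened from shape `[5000, 1]` to `[5000]` before the call. A small sketch with made-up logits and labels that writes the same quantity out by hand:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(6, 10)            # made-up scores: 6 examples, 10 classes
labels = torch.randint(0, 10, (6, 1))  # made-up labels with shape [6, 1]

# cross_entropy expects one class index per example, so flatten to shape [6].
loss = F.cross_entropy(logits, labels.flatten())

# The same value by hand: log-softmax, pick the true class, negate, average.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(6), labels.flatten()].mean()

print(loss, manual)
```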


In [18]:
gradients = torch.autograd.grad(loss, model.parameters())
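
`torch.autograd.grad` returns a tuple with one gradient per input, in the same order as the parameters that were passed in. The sketch below uses a small hypothetical `torch.nn.Linear` model (not the notebook's `MyModel`) to pair each gradient with its parameter name and to compare the result with the more common `loss.backward()` call.

```python
import torch

torch.manual_seed(0)
tiny_model = torch.nn.Linear(3, 2)  # hypothetical stand-in model
x = torch.randn(5, 3)
loss = tiny_model(x).pow(2).mean()

# retain_graph=True keeps the graph so it can be differentiated again below.
params = list(tiny_model.parameters())
grads = torch.autograd.grad(loss, params, retain_graph=True)

# Each gradient has the same shape as the parameter it belongs to.
for (name, param), grad in zip(tiny_model.named_parameters(), grads):
    print(name, tuple(param.shape), tuple(grad.shape))

# loss.backward() computes the same values but stores them in each parameter's .grad.
loss.backward()
same = all(torch.allclose(p.grad, g) for p, g in zip(params, grads))
print(same)
```

`autograd.grad` is convenient when the gradients are needed as values, while `backward()` is the form that optimizers typically rely on.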

In [19]:
print(gradients)

(tensor([[[[ -6.1680]],

         [[ -6.4833]],

         [[ -6.0052]]],


        [[[  1.6117]],

         [[  1.4966]],

         [[  1.4272]]],


        ... (output truncated)

In [20]:
input_grads = torch.autograd.grad(loss, batch_data)

RuntimeError: One of the differentiated Tensors does not require grad
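
The error above arises because tensors created from data do not track gradients by default; only tensors flagged with `requires_grad` are recorded by autograd. A minimal sketch with hand-made tensors, where `w` plays the role of a model parameter and `x` the role of the input data (both names are made up for this example):

```python
import torch

w = torch.tensor([0.5, 1.0, 2.0], requires_grad=True)  # stands in for a model parameter
x = torch.tensor([1.0, 2.0, 3.0])                      # stands in for the input data

# Differentiating with respect to x fails while x does not require grad.
raised = False
try:
    torch.autograd.grad((w * x).sum(), x)
except RuntimeError as err:
    raised = True
    print("failed:", err)

# Flag x in place, then recompute the forward pass so x is part of the graph.
x.requires_grad_()
(grad_x,) = torch.autograd.grad((w * x).sum(), x)
print(grad_x)  # d/dx of sum(w * x) is w
```

Note that the forward pass has to be run again after calling `requires_grad_()`, since the earlier result was computed before `x` was being tracked.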

In [21]:
logits = model(batch_data.requires_grad_())
loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())

In [22]:
input_grads = torch.autograd.grad(loss, batch_data)[0]

In [23]:
input_grads

tensor([[[[-2.3552e-09,  5.9672e-10,  1.0317e-10,  ...,  5.7760e-09,
           -1.3224e-10, -1.6840e-09],
          [-6.6007e-09, -4.9613e-08,  1.0412e-08,  ...,  2.0474e-08,
            1.5374e-08,  2.4686e-08],
          [-7.4042e-09,  2.3268e-08, -9.5472e-10,  ...,  2.4082e-08,
            2.3169e-08,  1.5996e-08],
          ...,
          [ 1.6310e-09,  8.6019e-10,  5.2292e-09,  ...,  2.3163e-08,
            1.8099e-08,  1.5371e-08],
          [ 2.0745e-09,  1.2779e-09,  1.0442e-09,  ..., -2.2714e-09,
            9.0985e-09,  9.1185e-09],
          [ 9.8262e-09,  5.8726e-09,  5.4668e-09,  ...,  1.2757e-08,
            4.1780e-09,  3.6002e-09]],

         [[ 1.1858e-07,  1.2480e-07,  1.2719e-07,  ...,  1.1823e-07,
            1.1650e-07,  1.0899e-07],
          [ 1.1154e-07,  5.1998e-08,  8.7073e-08,  ...,  9.2989e-08,
            9.1718e-08,  9.4365e-08],
          [ 1.2099e-07,  6.2236e-08,  1.1860e-07,  ...,  9.1051e-08,
            8.3962e-08,  9.3126e-08],
          ... (output truncated)

## PyTorch Performance

Here's the same function, for an identical model, that was used in the TensorFlow notebook:

In [24]:
def gradient_step():
    logits = model(batch_data)
    loss = torch.nn.functional.cross_entropy(logits, batch_labels.flatten())
    gradients = torch.autograd.grad(loss, model.parameters())
    return gradients

In [25]:
%timeit gradient_step()

657 ms ± 40.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [36]:
%timeit gradient_step()

673 ms ± 61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


As you can see, this is significantly slower than the equivalent step in the TensorFlow notebook.  However, for larger input sizes and models PyTorch is quite competitive with TensorFlow, and sometimes faster.  PyTorch also has JIT functionality, but here it does not yield the same improvement that compilation does in TensorFlow:

In [32]:
traced_module = torch.jit.trace_module(model, inputs={"forward" : batch_data})

In [33]:
%timeit traced_module(batch_data)

211 ms ± 368 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [35]:
%timeit model(batch_data)

211 ms ± 646 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
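
The nearly identical timings above are consistent with tracing replaying the same sequence of operations that eager mode already runs. As a hedged sketch, the example below traces a small hypothetical module (`TinyNet`, not the notebook's `MyModel`) with `torch.jit.trace` and checks that the traced version reproduces the eager outputs.

```python
import torch


class TinyNet(torch.nn.Module):
    """Hypothetical small model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, inputs):
        # Convolution, activation, then an average over the spatial dimensions.
        return torch.relu(self.conv(inputs)).mean(dim=(2, 3))


torch.manual_seed(0)
tiny_model = TinyNet()
example = torch.randn(2, 3, 8, 8)

# Tracing runs the model once on the example and records the operations it executes.
traced = torch.jit.trace(tiny_model, example)

with torch.no_grad():
    eager_out = tiny_model(example)
    traced_out = traced(example)

print(torch.allclose(eager_out, traced_out))
```

Because tracing records a single execution path, it suits models like this one whose control flow does not depend on the input values.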
