<a href="https://colab.research.google.com/github/amitkag85/CKAD-exercises/blob/master/1_Intro_to_Pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction
This is meant to be a fast-paced introduction to machine learning using PyTorch. The goal is to get familiar with the basic aspects of building a model and training it on a supervised learning task.

In [1]:
import torch

## Tensors

[Tensors](https://pytorch.org/docs/stable/tensors.html) are (multidimensional) arrays, and are the core data type used in deep learning.

In [2]:
# create a new tensor out of existing data
t = torch.tensor([1, 3, 3, 7])
print(t)
# get the dimensions of a tensor
print(t.size())
# initialize a tensor of all ones of a particular size
t = torch.ones(2, 3, 4)
print(t)
# `torch.Size` can be unpacked
depth, height, width = t.size()
print('depth:', depth, 'height:', height, 'width:', width)
# tensors can be indexed just like numpy arrays
print(t[0].size())
# random, normally distributed tensors of a given size are also easy
t = torch.randn(2, 3)
print(t)

tensor([1, 3, 3, 7])
torch.Size([4])
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
depth: 2 height: 3 width: 4
torch.Size([3, 4])
tensor([[ 0.4478,  0.2252, -0.3283],
        [-1.5603, -1.2793,  0.6241]])
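Beyond creating and indexing tensors, most model code relies on a handful of basic operations. The sketch below (with made-up values) shows elementwise arithmetic, broadcasting, matrix multiplication, a reduction, and reshaping; it is a small sample, not an exhaustive tour.

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = torch.ones(2, 3)

# elementwise arithmetic between tensors of the same shape
c = a + b
# broadcasting: the (3,) tensor is applied to every row of the (2, 3) tensor
d = a * torch.tensor([1.0, 10.0, 100.0])
# matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
e = a @ a.T
# reductions collapse a dimension: here, sum across each row
row_sums = a.sum(dim=1)
# reshaping keeps the same 6 values but views them with a new shape
f = a.view(3, 2)

print(c, d, e, row_sums, f.size(), sep='\n')
```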


## Modules

PyTorch uses [Modules](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) as the primary building block for models. A `Module` can represent a layer or an entire model (models can be used as layers within other models).

[torch.nn](https://pytorch.org/docs/stable/nn.html) is the package that defines the most common building blocks of neural networks.

In [None]:
# all basic layers/modules are defined in torch.nn
import torch.nn as nn

# a `Linear` layer (also called fully connected, or `Dense` in Keras)
# computes y = xW^T + b. `Linear` takes the input dimension, the output
# dimension, and whether or not to include a bias term (default True)
m = nn.Linear(2, 3)
print(m)
# the weight is stored as (output dim, input dim), i.e. torch.Size([3, 2])
print('weight matrix:', m.weight.size())
print('bias vector:', m.bias.size())
# you can create one without a bias as well; `m.bias` is then None
m = nn.Linear(2, 3, bias=False)
print(m)
print('weight matrix:', m.weight.size())
print('bias vector:', m.bias)
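To make the `Linear` computation concrete, this sketch applies a layer to a small random batch and checks it against the same formula written by hand. The only assumption is the tolerance used for the float comparison; the layer itself is standard `nn.Linear`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 3)   # weight: (3, 2), bias: (3,)
x = torch.randn(5, 2)     # a batch of 5 inputs with 2 features each

y = layer(x)              # output: (5, 3), one row per input
# the weight is stored as (out_features, in_features), so the layer computes x @ W.T + b
manual = x @ layer.weight.T + layer.bias
print(y.size())                               # torch.Size([5, 3])
print(torch.allclose(y, manual, atol=1e-6))   # True
```

Note that the layer operates on every row of the batch independently, which is why the batch dimension comes first.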

## Simple Model: Multilayer Perceptron

We'll define a simple multilayer perceptron ([MLP](https://en.wikipedia.org/wiki/Multilayer_perceptron) a.k.a. feedforward network) with one hidden layer. This will be a subclass of `torch.nn.Module`, which requires:
* `__init__` with a call to `super().__init__()`
* `forward()` which defines how to run the model given an input

We will use the container [torch.nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) to glue all of our layers into a single module that passes the output from the previous step as the input to the next. Note that you must ensure that the output dimensions of the previous layer match the input dimensions of the next.

The main component that lets neural networks learn non-linear functions is the [activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) placed between the layers; without them, a stack of `Linear` layers collapses into a single linear function. Here we use [torch.nn.LeakyReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html) after the hidden layer and [torch.nn.Sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid) on the output. Sigmoid produces an "S curve" that squashes all values into the range 0.0 to 1.0, which matches targets that are either 0 or 1.

In [None]:
class MLP(nn.Module):
  def __init__(self, input_dim, hidden_dim, output_dim):
    super().__init__()
    self.input_dim = input_dim
    self.hidden_dim = hidden_dim
    self.output_dim = output_dim

    self.fc = nn.Sequential(
        nn.Linear(self.input_dim, self.hidden_dim),
        nn.LeakyReLU(),
        nn.Linear(self.hidden_dim, self.output_dim),
        nn.Sigmoid()
    )

  def forward(self, x):
    # the `Sequential` can just be called on the input
    return self.fc(x)
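Before training, it can help to sanity-check shapes. The sketch below rebuilds the same layer stack as `MLP(2, 2, 1)` directly with `nn.Sequential` (so the cell runs on its own) and passes all four XOR inputs through it as one batch. The weights are randomly initialized, so only the output shape, the parameter count, and the output range are meaningful here.

```python
import torch
import torch.nn as nn

# same layers as MLP(2, 2, 1), written inline so this cell is self-contained
net = nn.Sequential(
    nn.Linear(2, 2),
    nn.LeakyReLU(),
    nn.Linear(2, 1),
    nn.Sigmoid(),
)
x = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # a batch of 4 inputs
out = net(x)

# 2*2 weights + 2 biases in the first layer, 2*1 weights + 1 bias in the second
n_params = sum(p.numel() for p in net.parameters())
print(out.size())   # torch.Size([4, 1])
print(n_params)     # 9
print(bool(((out >= 0) & (out <= 1)).all()))  # sigmoid keeps every output in [0, 1]
```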

## Device

PyTorch supports multiple backends, most notably `'cpu'` and `'cuda'` (with support for Apple Metal via `'mps'` as well). The main thing to remember is that all `Module`s and `Tensor`s involved in a computation need to be on the same device, or else you will get an error. Also note that everything is created on the CPU by default unless you pass a `device` argument (models that were saved while on GPU are an exception, but let's ignore that for now).

In [None]:
# get the device we are using and save it for later. change the runtime
# to GPU to see it print `cuda`
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

# create a tensor directly on our device
x = torch.randn(2, 3, device=device)
print(x)
# you can also send things to a device after they are created;
# `.to()` returns a tensor on that device rather than changing `x` in place
x = torch.randn(2, 3)
print(x)
x = x.to(device)
print(x)

# some operations, such as converting to a numpy array, require tensors
# to be on the CPU specifically, so using `.cpu()` is helpful
print(x.cpu().numpy())
# there is also `.cuda()`, but it assumes that you actually have a GPU
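A common pattern is to pick the best available device once and move both the model and its inputs there. This is a minimal sketch; `pick_device` is a hypothetical helper, and the CUDA, then MPS, then CPU preference order is an assumption you can change. The `getattr` guard is there because older PyTorch versions do not have `torch.backends.mps`.

```python
import torch
import torch.nn as nn

def pick_device():
    # hypothetical helper: prefer CUDA, then Apple's Metal backend ('mps'), else CPU
    if torch.cuda.is_available():
        return torch.device('cuda')
    mps = getattr(torch.backends, 'mps', None)  # absent in older PyTorch versions
    if mps is not None and mps.is_available():
        return torch.device('mps')
    return torch.device('cpu')

device = pick_device()
model = nn.Linear(2, 1).to(device)  # Module.to() moves the parameters in place and returns the module
x = torch.randn(4, 2).to(device)    # Tensor.to() returns a tensor on the device; reassign to keep it
y = model(x)                        # works because model and input share a device
print(device, y.device)
```

The asymmetry matters: forgetting to reassign `x = x.to(device)` leaves `x` on the CPU, while `model.to(device)` works even without reassignment.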

## Dataset

The primary mechanisms for working with data live in [torch.utils.data](https://pytorch.org/docs/stable/data.html).

Now we will make a model that solves [XOR](https://en.wikipedia.org/wiki/Exclusive_or). This will require creating:
* a [map-style](https://pytorch.org/docs/stable/data.html#map-style-datasets) `Dataset` to hold our examples
* a model to train
* a training loop

Although simple, XOR is useful because it requires a model that can handle non-linear relationships between inputs and outputs: no single straight line can separate the inputs labeled 1 from the inputs labeled 0.
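One way to see this concretely is to compute the best possible *linear* fit to the four XOR examples. The sketch below uses least squares (`torch.linalg.lstsq`), a technique not used elsewhere in this notebook; the best linear model turns out to ignore the inputs entirely and predict 0.5 everywhere.

```python
import torch

x = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# append a column of ones so the fit includes a bias term b
A = torch.cat([x, torch.ones(4, 1)], dim=1)
# least-squares solution for [w1, w2, b], minimizing ||A @ w - y||^2
w = torch.linalg.lstsq(A, y).solution
pred = A @ w
mse = ((pred - y) ** 2).mean()

print(w.flatten())     # approximately [0, 0, 0.5]
print(pred.flatten())  # approximately [0.5, 0.5, 0.5, 0.5]
print(mse)             # approximately 0.25
```

A mean squared error of 0.25 is the floor for any linear model on this data, which is why the MLP needs its hidden layer and activation.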

In [None]:
'''
a `Dataset` is an interface for defining data to train on. it requires defining
* __init__
* __len__ to return the number of training examples
* __getitem__ to return an example at a specific index
'''
from torch.utils.data import Dataset

class XorDataset(Dataset):
  def __init__(self):
    # example inputs
    self.x = [
        [0, 0],
        [1, 0],
        [0, 1],
        [1, 1]
    ]
    # example outputs
    self.y = [
        0,
        1,
        1,
        0
    ]

  def __len__(self):
    # return the number of training examples
    return len(self.x)

  def __getitem__(self, idx):
    # return the input/output for a given example (by index)
    x = self.x[idx]
    y = self.y[idx]
    return {'x': torch.tensor(x, dtype=torch.float32), 'y': torch.tensor([y], dtype=torch.float32)}

xor_dataset = XorDataset()
print(len(xor_dataset))
print(xor_dataset[2])
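When the whole dataset already fits in tensors, `torch.utils.data.TensorDataset` can replace a hand-written `Dataset` class. This sketch builds the same XOR data that way; note that it returns `(x, y)` tuples rather than dicts, so a training loop would unpack `x, y = batch` instead of using `batch['x']`.

```python
import torch
from torch.utils.data import TensorDataset

x = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# indexes every tensor along its first dimension, so ds[i] == (x[i], y[i])
ds = TensorDataset(x, y)
print(len(ds))   # 4
print(ds[2])     # (tensor([0., 1.]), tensor([1.]))
```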

## DataLoader and Batching

The most common way to work with a `Dataset` is to use a [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). This handles things like:
* iterating through the `Dataset`
* optionally randomizing the order of examples
* batching examples together

Batching (a.k.a. mini-batching) means processing multiple training examples *in parallel* in a single forward and backward pass. This helps, especially on GPUs, because more computation happens in each pass, which speeds up training for feedforward models. Additionally, some objectives, such as contrastive learning, rely on batching as part of the loss function itself. Batching is less straightforward for recurrent models, which take sequences as input: shorter examples usually have to be padded, since every example in a batch must have the same length.
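For variable-length sequences, padding is commonly done with `torch.nn.utils.rnn.pad_sequence`. This is a minimal sketch with made-up integer sequences (think token ids); the boolean mask marking real versus padded positions is one common approach, not the only one.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

# stack into a (batch, max_len) tensor, filling the gaps with 0
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
# True where a position holds real data, False where it is padding
lengths = torch.tensor([len(s) for s in seqs])
mask = torch.arange(batch.size(1)) < lengths.unsqueeze(1)

print(batch)
# tensor([[1, 2, 3],
#         [4, 5, 0],
#         [6, 0, 0]])
print(mask)
```

The mask lets later code (for example, a loss computation) ignore the padded positions so they do not affect training.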

In [None]:
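from torch.utils.data import DataLoader

batch_size = 4  # XOR has only 4 examples, so one batch holds the whole dataset

# shuffle=True reorders the examples every epoch; drop_last=True discards a final
# batch smaller than batch_size (there is none here, since 4 divides evenly)
dataloader = DataLoader(xor_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

for batch in dataloader:
  # the default collate function stacks each dict key across examples
  x = batch['x']
  y = batch['y']
  print(x, 'x.size()', x.size())
  print(y, 'y.size()', y.size())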
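To see batching more clearly, here is a sketch that uses a smaller batch size so the dataset splits into two batches. It relies on the fact that a plain Python list of dicts already supports `len()` and indexing, so it can stand in for `XorDataset` here; shuffling is turned off so the batch order is predictable.

```python
import torch
from torch.utils.data import DataLoader

# a list supports len() and indexing, so it works as a map-style dataset
examples = [
    {'x': torch.tensor([0., 0.]), 'y': torch.tensor([0.])},
    {'x': torch.tensor([1., 0.]), 'y': torch.tensor([1.])},
    {'x': torch.tensor([0., 1.]), 'y': torch.tensor([1.])},
    {'x': torch.tensor([1., 1.]), 'y': torch.tensor([0.])},
]
loader = DataLoader(examples, batch_size=2, shuffle=False)
batches = list(loader)

for b in batches:
    # each dict value is stacked along a new first (batch) dimension
    print(b['x'].size(), b['y'].size())  # torch.Size([2, 2]) torch.Size([2, 1])
```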

## Training Setup

For supervised learning, we need:
* a model to train
* a dataset
* an [optimizer](https://pytorch.org/docs/stable/optim.html)
* an objective ([loss function](https://pytorch.org/docs/stable/nn.html#loss-functions))

We then loop over all of our training data multiple times (each full pass is called an epoch) and track the loss so that we can see how the model is improving. Of course you can (and usually should) also track the loss on held-out validation data, but we will omit that here.

Here we will be using the [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam) optimizer and [mean squared error loss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss).
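Mean squared error is simply the average of the squared differences between predictions and targets. This sketch checks `nn.MSELoss` (with its default `reduction='mean'`) against that formula on made-up predictions for the four XOR targets.

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([[0.1], [0.8], [0.6], [0.3]])  # made-up predictions
y = torch.tensor([[0.], [1.], [1.], [0.]])          # XOR targets

loss_fn = nn.MSELoss()                 # reduction='mean' by default
builtin = loss_fn(y_hat, y)
manual = ((y_hat - y) ** 2).mean()     # (0.01 + 0.04 + 0.16 + 0.09) / 4
print(builtin.item(), manual.item())
```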

In [2]:
from torch.optim import Adam
from tqdm import trange  # gives us a nice progress bar

epochs = 5000  # the number of times to iterate through the training data

model = MLP(2, 2, 1)  # create an instance of our model
model = model.to(device)  # send the model to the appropriate device
print(model.train())  # set the model to train mode (default) and print it for good measure
opt = Adam(model.parameters())  # initialize the optimizer with the model parameters
loss_fn = nn.MSELoss()  # create an instance of our loss function
losses = []  # create an empty list for tracking the loss every epoch

for epoch in trange(epochs):  # loop for the number of epochs
  for batch in dataloader:  # iterate through the dataset

    # get the inputs and target outputs and send them to the device
    x = batch['x'].to(device)
    y = batch['y'].to(device)

    # run the model and get its prediction
    y_hat = model(x)

    # calculate the loss
    loss = loss_fn(y_hat, y)

    # reset the gradients from the previous step (PyTorch accumulates them otherwise)
    opt.zero_grad()
    # calculate the gradient based on the loss
    loss.backward()
    # update the model weights based on the gradient
    opt.step()

    '''
    Store the loss in a list so that we can plot it later.
    `.item()` returns the loss as a plain Python float,
    detached from the computation graph and copied to the
    CPU, which matplotlib can plot directly.
    '''
    losses.append(loss.item())

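The `zero_grad()`, `backward()`, `step()` sequence is easiest to understand with the simplest optimizer. Adam's real update is more involved (it keeps running averages of each parameter's gradient and squared gradient to adapt the step size), so this sketch swaps in plain `torch.optim.SGD` and checks that one `step()` matches the textbook update `p <- p - lr * p.grad`. The tiny model and data are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
lr = 0.1
opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain SGD: no momentum, no weight decay

x = torch.tensor([[0., 1.], [1., 0.]])
y = torch.tensor([[1.], [0.]])

opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # fills p.grad for every parameter
# what one plain gradient-descent step should produce
expected = [(p - lr * p.grad).detach().clone() for p in model.parameters()]

opt.step()  # updates every parameter in place
matches = all(torch.allclose(p, e) for p, e in zip(model.parameters(), expected))
print(matches)  # True
```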

In [None]:
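import matplotlib.pyplot as plt

plt.plot(losses)
plt.xlabel('training step')  # one entry per batch; with a single batch, one per epoch
plt.ylabel('MSE loss')
plt.show()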

Since our training data is so small, we can iterate through all examples and compare each prediction to its target. For more complex datasets or tasks, you could use a held-out test set to plot the validation loss, calculate a confusion matrix, etc.

In [1]:
model = model.cpu()
model.eval()

with torch.no_grad():
  for example in xor_dataset:
    x = example['x']
    y = example['y']
    y_hat = model(x)
    loss = loss_fn(y_hat, y)
    print('x:', x, 'y:', y, 'y_hat:', y_hat, 'loss:', loss)

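Printing raw `y_hat` values works for four examples, but for a 0/1 task you usually threshold the sigmoid output to get class predictions. A minimal sketch follows, assuming a 0.5 threshold; `xor_accuracy` is a hypothetical helper, not part of PyTorch. So that the cell runs on its own, it checks the helper on a network with the same layer stack as `MLP(2, 2, 1)` whose weights were set by hand (not trained); with the trained `model` you would call `xor_accuracy(model, x, y)` instead.

```python
import torch
import torch.nn as nn

def xor_accuracy(model, x, y, threshold=0.5):
    # hypothetical helper: turn sigmoid outputs into 0/1 predictions and compare to targets
    model.eval()
    with torch.no_grad():
        preds = (model(x) > threshold).float()
    return (preds == y).float().mean().item()

# same layers as MLP(2, 2, 1); weights chosen by hand to solve XOR, not learned
net = nn.Sequential(nn.Linear(2, 2), nn.LeakyReLU(), nn.Linear(2, 1), nn.Sigmoid())
with torch.no_grad():
    net[0].weight.copy_(torch.tensor([[1., 1.], [1., 1.]]))
    net[0].bias.copy_(torch.tensor([0., -1.]))   # h1 = x1 + x2, h2 = x1 + x2 - 1 (before LeakyReLU)
    net[2].weight.copy_(torch.tensor([[10., -20.]]))
    net[2].bias.copy_(torch.tensor([-5.]))       # output = sigmoid(10*h1 - 20*h2 - 5)

x = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])
acc = xor_accuracy(net, x, y)
print(acc)  # 1.0
```

The hand-set weights also show that two hidden units are enough to represent XOR: the second unit only fires when both inputs are 1 and cancels out the first.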

## Bonus: Other Resources

* [Andrej Karpathy](https://www.youtube.com/@AndrejKarpathy)'s YouTube channel, where he builds things up from scratch, including a minimal PyTorch-like autograd engine
* [Yannic Kilcher](https://www.youtube.com/@YannicKilcher)'s YouTube channel, where he has overviews of tons of papers, streams working on open source projects, and keeps up with current events in ML
* [lucidrains](https://github.com/lucidrains) on GitHub, who publishes PyTorch implementations of a huge number of recent papers