Class Session::
https://www.youtube.com/watch?v=OMDn66kM9Qc&ab_channel=LightningAI
# MNIST Digit Classification

In [8]:
# install m1 gpu support
# pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

In [9]:
import platform
import torch
from torch import nn
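# use the Apple-silicon GPU (MPS) when available, otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# test gpu run
torch.randn(5, device=device)
print(torch.backends.mps.is_available())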
print(platform.platform())
print(device)

True
macOS-12.3.1-arm64-arm-64bit
mps


In [10]:
import torch.nn.functional as F

# Fully Connected Neural Network (no residual connections, despite the class name)
class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)

# Fully Connected Neural Network with Residual Connections
# Faster training with this network
class ResNet2(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28*28, 64)
        self.l2 = nn.Linear(64, 64)
        self.l3 = nn.Linear(64, 10)
        self.do = nn.Dropout(0.1)

    def forward(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        do = self.do(h2 + h1)
        logits = self.l3(do)
        return logits

print(ResNet())
print(ResNet2())
model = ResNet2()
#model = model.to(device)

ResNet(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)
ResNet2(
  (l1): Linear(in_features=784, out_features=64, bias=True)
  (l2): Linear(in_features=64, out_features=64, bias=True)
  (l3): Linear(in_features=64, out_features=10, bias=True)
  (do): Dropout(p=0.1, inplace=False)
)


## ResNet
model = nn.Sequential(
    nn.Linear(28*28,64),
    nn.ReLU(),
    nn.Linear(64,64),
    nn.ReLU(),
    nn.Linear(64,10)
)
In this fully connected neural network, the input layer takes a flattened 28*28-pixel image (784 values) and maps it to 64 features. This is the hidden dimension: an internal representation that the network learns on its own. It is followed by a non-linear activation (ReLU); without it, the stacked linear layers would collapse into a single linear map and the network could only learn linear functions.

The second hidden layer takes 64 inputs and produces 64 outputs, again followed by a ReLU. Finally, the output layer maps the hidden dimension to 10 values, one score for each of the 10 digit classes. The ResNet class defined above adds one more 64-to-64 hidden layer (fc3) but follows the same pattern.
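The shapes described above can be checked with a small sketch. The batch of 8 random images is a hypothetical stand-in for real MNIST data, and the names mlp, batch and flat are illustrative only.

```python
import torch
from torch import nn

# same layer sizes as the Sequential listing above
mlp = nn.Sequential(
    nn.Linear(28 * 28, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

batch = torch.rand(8, 1, 28, 28)        # hypothetical batch of 8 fake images
flat = batch.view(batch.size(0), -1)    # flatten each image to 784 values
out = mlp(flat)                         # one row of 10 class scores per image

# weights + biases: (784*64 + 64) + (64*64 + 64) + (64*10 + 10)
n_params = sum(p.numel() for p in mlp.parameters())
print(flat.shape, out.shape, n_params)
```

Most of the parameters sit in the first layer, because it is the only one connected to all 784 input pixels.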

## ResNet2
model = nn.Sequential(
    nn.Linear(28*28,64),
    nn.ReLU(),
    nn.Linear(64,64),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(64,10)
)
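In this fully connected neural network we use a residual connection to speed up training: in the ResNet2 class, the output of the first hidden layer is added to the output of the second (h2 + h1), a skip connection that the nn.Sequential listing above cannot express. As before, we start with an input layer followed by hidden layers. The other difference is the Dropout layer, which randomly zeroes 10% of the activations during training (p=0.1) and helps reduce overfitting.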
More info:: https://wandb.ai/authors/ayusht/reports/Implementing-Dropout-in-PyTorch-With-Example--VmlldzoxNTgwOTE
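To make the effect of Dropout concrete, here is a minimal sketch of how nn.Dropout(0.1) behaves in training mode versus evaluation mode. The all-ones input and the seed are arbitrary choices for illustration, not part of the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

drop = nn.Dropout(p=0.1)
x = torch.ones(1000)

drop.train()                          # training mode: dropout is active
y_train = drop(x)
zeroed = int((y_train == 0).sum())    # roughly 10% of the entries are zeroed
kept_value = float(y_train.max())     # survivors are scaled by 1 / (1 - p)

drop.eval()                           # evaluation mode: dropout does nothing
y_eval = drop(x)

print(zeroed, kept_value)
print(torch.equal(y_eval, x))
```

Because the layer behaves differently in the two modes, a model that contains it is usually switched with model.train() and model.eval() around training and validation.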


In [11]:
from torch import optim

# optimiser
params = model.parameters()
optimiser = optim.SGD(params,lr=1e-2)

# loss function
loss = nn.CrossEntropyLoss()

Here we create our optimiser and define our loss function. Optimisation is the process of adjusting the model parameters to reduce the model error at each training step. Our optimiser is stochastic gradient descent (SGD) with a learning rate of 0.01. A loss function measures the error between the predicted output and the provided target value, telling us how far the model is from the expected outcome. Our loss function is the cross-entropy loss.
## Notes highlight
CrossEntropyLoss -> a good loss function for classification problems; it expects raw logits, which is why ResNet2 returns logits without a softmax
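As a small check of what the loss does with logits, the sketch below compares nn.CrossEntropyLoss with an explicit log-softmax followed by the negative log-likelihood loss. The logits and targets are made-up values for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed

logits = torch.randn(4, 10)             # hypothetical batch: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])    # hypothetical class labels

# CrossEntropyLoss takes raw logits and applies log-softmax internally
ce = nn.CrossEntropyLoss()(logits, targets)

# the same value, computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))
```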



In [12]:
from torchvision import transforms, datasets
from torch.utils.data import random_split, DataLoader

train_data = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
train, val = random_split(train_data,[55000,5000])
train_loader = DataLoader(train, batch_size=32)
val_loader = DataLoader(val,batch_size=32)

Download the MNIST training data with train=True (meaning we want the training split).
Split the training data into a training part (55,000 images) and a validation part (5,000 images).

Make both parts iterable in batches of 32 with DataLoader.

The test data can be downloaded the same way with train=False (meaning we want the test split rather than the training split); the cell above does not load it.
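The split-and-batch steps can be tried without downloading anything. This sketch uses a synthetic TensorDataset as a stand-in for MNIST; the seeded generator and shuffle=True are optional additions that the notebook cell does not use, and all variable names are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, random_split, DataLoader

# synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0..9
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
data = TensorDataset(images, labels)

# a seeded generator makes the split reproducible (the seed value is arbitrary)
gen = torch.Generator().manual_seed(42)
train_part, val_part = random_split(data, [90, 10], generator=gen)

# shuffle=True reorders the training samples at every epoch
train_dl = DataLoader(train_part, batch_size=32, shuffle=True)
val_dl = DataLoader(val_part, batch_size=32)

x, y = next(iter(train_dl))             # one batch of images and labels
print(len(train_part), len(val_part))
print(x.shape, y.shape)
```

The split sizes passed to random_split must add up to the dataset length, just as 55000 + 5000 adds up to the 60,000 MNIST training images.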


In [13]:
# training loop

# number of passes (epochs) through the dataset
nb_epochs = 5

for epoch in range(nb_epochs):
    losses = list()
    for batch in train_loader:
        x,y = batch # separate image matrix from our class label
        #x, y = x.to(device), y.to(device)

        #x: image matrix batch_size x 1 x 28 x 28
        b = x.size(0)
        x = x.view(b,-1) # convert to matrix with b rows and 1 * 28 * 28 = 784 columns

        # step 1: forward
        l = model(x) # logits:: output of our last layer

        # step 2: compute objective function
        J = loss(l,y)

        # step 3: clean up gradients
        model.zero_grad()

        # step 4: accumulate the partial derivatives of J with respect to parameters
        J.backward()

        # step 5: learn - apply our optimiser step [ opposite direction of gradient]
        optimiser.step()

        # compute losses
        losses.append(J.item())

    # print with 2 decimal places
    print(f'Epoch {epoch + 1}, train loss: {torch.tensor(losses).mean():.2f}')

    losses = list()
    model.eval() # evaluation mode: disables dropout during validation
    for batch in val_loader:
        x,y = batch # separate image matrix from our class label
        #x, y = x.to(device), y.to(device)

        #x: image matrix batch_size x 1 x 28 x 28
        b = x.size(0)
        x = x.view(b,-1) # convert to matrix with b rows and 1 * 28 * 28 = 784 columns

        # step 1: forward, without building the autograd graph
        with torch.no_grad():
            l = model(x) # logits:: output of our last layer

        # step 2: compute objective function
        J = loss(l, y)

        losses.append(J.item())
    model.train() # back to training mode for the next epoch

    # print with 2 decimal places
    print(f'Epoch {epoch + 1}, validation loss: {torch.tensor(losses).mean():.2f}')

Epoch 1, train loss: 0.87
Epoch 1, validation loss: 0.46
Epoch 2, train loss: 0.38
Epoch 2, validation loss: 0.37
Epoch 3, train loss: 0.32
Epoch 3, validation loss: 0.32
Epoch 4, train loss: 0.28
Epoch 4, validation loss: 0.28
Epoch 5, train loss: 0.25
Epoch 5, validation loss: 0.26


We iterate through our training data 5 times (5 epochs)
In each epoch::
    In our training loop we iterate through our training data
        extract one batch from our training data
        separate our image x from our class label y in the batch
        flatten our batch of images into a matrix that fits our input layer
        forward our images through the model to get the logits (last layer output)
        compute the objective function (the loss) from the logits and the labels
        clean up our gradients before computing our derivatives (otherwise the derivatives accumulate across batches)
        compute the partial derivatives of the loss with respect to the parameters
        step our optimiser, the learning part
        append our loss to our training loss list - J.item() returns a plain number without a graph [saves memory]
    Print our mean training loss
    In our validation loop we iterate through our validation data
        extract one batch from our validation data
        separate our image x from our class label y in the batch
        flatten our batch of images into a matrix that fits our input layer
        forward our images through the model under torch.no_grad()
            our output will not have a graph attached to it [saves memory]
        compute the objective function (the loss) from the logits and the labels
        append to our validation loss list
    Print our mean validation loss
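The "learning part" of the training steps above can be made explicit. This is a minimal sketch, assuming plain SGD without momentum or weight decay, that checks one optimiser step against the update rule w_new = w_old - lr * grad. The single linear layer and the random data are hypothetical stand-ins for the model and an MNIST batch.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # arbitrary seed

layer = nn.Linear(784, 10)              # stand-in for the full model
opt = optim.SGD(layer.parameters(), lr=1e-2)

x = torch.rand(32, 1, 28, 28).view(32, -1)   # fake batch, flattened to 32 x 784
y = torch.randint(0, 10, (32,))              # fake labels

loss_value = nn.CrossEntropyLoss()(layer(x), y)   # forward + objective
opt.zero_grad()                                   # clean up gradients
loss_value.backward()                             # compute partial derivatives

# what plain SGD should produce: a step against the gradient
expected = layer.weight.detach() - 1e-2 * layer.weight.grad
opt.step()                                        # the learning step

print(torch.allclose(layer.weight.detach(), expected))
```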
