# Replicating LeNet 5

In [None]:
"""
Notes from the paper:

The LeNet paper summarizes previous work on character recognition, including SGD, convolutions, and neural networks.

Goal:
Character recognition by building a character classifier

Dataset Used:
MNIST

Method Used:
Build a convolution-based feature extractor, followed by a fully connected neural network classifier

Architecture:
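Input (1, 32, 32)
-> C1: Convolution (5x5, 6 filters) (6, 28, 28)
-> S2: Subsampling (2x2, stride 2) (6, 14, 14)
-> Sigmoid
-> C3: Convolution (5x5, 16 filters) (16, 10, 10)
-> S4: Subsampling (2x2, stride 2) (16, 5, 5)
-> Sigmoid
-> C5: Convolution (5x5, 120 filters) (120, 1, 1), i.e. a 120-unit vector
-> Sigmoid
-> F6: Fully Connected (120 -> 84)
-> Sigmoid
-> Output: RBF (84 -> 10)
(The paper's "sigmoid" squashing function is a scaled hyperbolic tangent, f(a) = A tanh(S a) with A = 1.7159 and S = 2/3.)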

Training Parameters / Hyperparameters:
- An important detail: the MNIST images are 28 x 28, but the network expects 32 x 32 inputs. The images are padded so that distinctive features near the edges, such as stroke endpoints, can fall in the center of the receptive fields of the highest-level feature detectors.
- Pixel values are normalized so that the background maps to -0.1 and the foreground to 1.175, which makes the mean input roughly 0 and the variance roughly 1.
- Subsampling means that, in each 2x2 pixel area, the four values are added, multiplied by a trainable weight, and added to a trainable bias. This IS NOT THE SAME AS MAX POOLING.
- The stride for subsampling is 2, so the output is half the size of the input and the subsampling areas do not overlap.
- S2 and C3 are not fully connected: each C3 feature map connects to only a subset of the S2 maps (Table I in the paper). I will probably ignore this and connect them fully.
- The last layer is a layer of RBF units instead of neurons. The paper explains, "In probabilistic terms, the RBF output can be interpreted as the unnormalized negative loglikelihood of a Gaussian distribution in the space of configurations of layer F6"
- The base loss function is MSE (equivalent to maximum likelihood in this setup), but they add an extra discriminative term that makes it scary. We will just use MSE loss.

- Three experiments were run, each on a different version of the dataset:
  1. Images were centered in a 28 x 28 image and then padded to 32 x 32. This was called the "regular" dataset.
  2. Images were deslanted and cropped to 20 x 20. This was called the "deslanted" dataset.
  3. Images were centered in a 16 x 16 image. The author forgot to name this dataset, like it was his middle child.

I will only be using the regular dataset.

- Trained for 20 epochs
- 60k training images, 10k test images
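- The learning rate was 0.0005 for the first 2 epochs, 0.0002 for the next 3, 0.0001 for the next 3, 0.00005 for the next 4, and 0.00001 thereafter. (These are global learning rates for the paper's stochastic diagonal Levenberg-Marquardt method, not plain SGD.)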
- The author observed no overfitting? Is he Jesus? He suggests this is because the learning rates were kept relatively large, so the weights keep oscillating instead of settling into a local minimum. LMFAO

Metrics Defined:
Error Rate
- Number of misclassified test samples / Total number of test samples

Results:

"""

In [21]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F

In [22]:
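class sub_sampler(nn.Module):
  """LeNet-style subsampling: average each size x size window, then apply a trainable scale and bias.
  Averaging is the paper's sum divided by size*size; the trainable weight absorbs that constant.
  Note: the paper learns one weight and bias per feature map; here a single pair is shared by all maps."""
  def __init__(self, size, stride):
    super().__init__()
    self.size = size
    self.stride = stride
    self.pool = nn.AvgPool2d(self.size, self.stride)
    self.weight = nn.Parameter(torch.ones(1))  # trainable coefficient
    self.bias = nn.Parameter(torch.zeros(1))   # trainable bias

  def forward(self, img):
    img = self.pool(img)
    return img * self.weight + self.bias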
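To make the difference from max pooling concrete, here is a plain-Python sketch of the paper's subsampling on a single 4 x 4 feature map: each non-overlapping 2 x 2 block is summed, multiplied by a trainable coefficient `w`, and shifted by a trainable bias `b` (the paper then applies the squashing function, omitted here). The helper names and the values of `w` and `b` are arbitrary choices for illustration. Since the sum of a 2 x 2 block is 4 times its mean, `nn.AvgPool2d` followed by a trainable weight, as in `sub_sampler`, can express the same operation.

```python
def lenet_subsample(fmap, w, b, size=2):
    """Sum each non-overlapping size x size block, multiply by w, then add b."""
    return [
        [
            w * sum(fmap[i + di][j + dj] for di in range(size) for dj in range(size)) + b
            for j in range(0, len(fmap[0]), size)
        ]
        for i in range(0, len(fmap), size)
    ]

def max_pool(fmap, size=2):
    """Max over each non-overlapping size x size block, for comparison."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]), size)
        ]
        for i in range(0, len(fmap), size)
    ]

fmap = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]

print(lenet_subsample(fmap, w=0.5, b=1.0))  # [[6.0, 5.0], [3.0, 5.0]]
print(max_pool(fmap))                       # [[4, 8], [1, 2]]
```

The top-right block shows the difference: max pooling keeps only the single strong activation (8), while subsampling blends it with its zero neighbours, so the two operations rank the blocks differently.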

In [23]:
class RBFLayer(nn.Module):
  def __init__(self, input_dim, output_dim):
    super().__init__()
    # One center per output class, shape (output_dim, input_dim). Unlike the paper, where the
    # centers are chosen by hand and kept fixed, these are randomly initialized and trainable.
    self.centers = nn.Parameter(torch.randn(output_dim, input_dim))

  def forward(self, x):
    # x: (batch_size, input_dim); centers: (output_dim, input_dim)
    distances = torch.cdist(x, self.centers)  # Euclidean distances, (batch_size, output_dim)
    # The paper outputs the squared distance itself (lower = better match); exp(-distance)
    # instead gives a similarity in (0, 1] where higher = better match.
    return torch.exp(-distances)
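For comparison, the paper's RBF unit outputs a penalty, the squared Euclidean distance y_i = sum_j (x_j - w_ij)^2 between the F6 vector and the center of class i, and the predicted class is the one with the smallest penalty. `RBFLayer` above returns exp(-distance), a decreasing function of the same distance, so its argmax picks the same class. A plain-Python sketch with made-up 3-dimensional +1/-1 centers standing in for the paper's hand-designed 84-dimensional codes:

```python
import math

def rbf_penalty(x, centers):
    """Paper-style RBF output: squared Euclidean distance to each center (lower = better)."""
    return [sum((xj - wj) ** 2 for xj, wj in zip(x, c)) for c in centers]

def rbf_similarity(x, centers):
    """What RBFLayer computes: exp(-Euclidean distance) (higher = better)."""
    return [math.exp(-math.sqrt(d)) for d in rbf_penalty(x, centers)]

# Hypothetical toy prototypes and input vector, for illustration only.
centers = [
    [1.0, 1.0, -1.0],
    [-1.0, 1.0, 1.0],
]
x = [0.9, 0.8, -0.7]

penalty = rbf_penalty(x, centers)
similarity = rbf_similarity(x, centers)
pred_paper = min(range(len(centers)), key=lambda i: penalty[i])    # argmin of penalty
pred_code = max(range(len(centers)), key=lambda i: similarity[i])  # argmax of similarity
print(penalty, pred_paper, pred_code)
```

Both rules pick class 0 here; the choice between them matters for the loss (and for whether evaluation uses argmin or argmax), not for which class wins.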


In [26]:
class LeNet5(nn.Module):
  def __init__(self):
    super().__init__()
    self.c1 = nn.Conv2d(1, 6, 5, stride=1, padding=2)     # (1, 28, 28) -> zero-padded to (1, 32, 32) -> (6, 28, 28)
    self.s2 = sub_sampler(2, 2)                           # (6, 14, 14)
    self.c3 = nn.Conv2d(6, 16, 5, stride=1, padding=0)    # (16, 10, 10); fully connected to S2, not the paper's Table I scheme
    self.s4 = sub_sampler(2, 2)                           # (16, 5, 5)
    self.c5 = nn.Conv2d(16, 120, 5, stride=1, padding=0)  # (120, 1, 1)
    self.f6 = nn.Linear(120, 84)                          # (84,)
    self.rbf = RBFLayer(84, 10)                           # (10,)

  def forward(self, x):
    x = torch.sigmoid(self.s2(self.c1(x)))  # C1 -> S2 -> sigmoid
    x = torch.sigmoid(self.s4(self.c3(x)))  # C3 -> S4 -> sigmoid
    x = torch.sigmoid(self.c5(x))           # C5 -> sigmoid
    x = torch.flatten(x, 1)                 # (batch, 120, 1, 1) -> (batch, 120)
    x = torch.sigmoid(self.f6(x))           # F6 -> sigmoid
    return self.rbf(x)                      # (batch, 10)

In [27]:
# Testing model class
model = LeNet5()
image = torch.randn(5, 1, 28, 28)
output = model(image)
output.shape

torch.Size([5, 10])

In [32]:
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor, Lambda, Compose
from torch.nn.functional import one_hot
from torch.utils.data import DataLoader, Dataset

In [33]:
train_data = MNIST('./data', download=True, train=True)
test_data = MNIST('./data', download=True, train=False)

In [34]:
class MNISTDataset(Dataset):
  """Wraps a torchvision MNIST dataset, applies `transform` to each image, and one-hot encodes the label."""
  def __init__(self, dataset, transform=None):
    super().__init__()
    self.dataset = dataset
    self.transform = transform

  def __len__(self):
    return len(self.dataset)

  def __getitem__(self, idx):
    image, label = self.dataset[idx]
    if self.transform:
      image = self.transform(image)
    return image, one_hot(torch.tensor(label), num_classes=10).float()  # (10,) float target


In [35]:
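def hard_coded_normalize(img, low=-0.1, high=1.175) -> torch.Tensor:
  # Linearly rescale pixel values from [0, 1] (ToTensor's output range) to [low, high],
  # so the background maps to -0.1 and the foreground to 1.175, as in the paper.
  img = img.float()
  return img * (high - low) + low

transforms = Compose([
    ToTensor(),  # must be instantiated: PIL image -> float tensor (1, 28, 28) in [0, 1]
    Lambda(hard_coded_normalize)
])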

mnist_train_dataset = MNISTDataset(train_data, transforms)
mnist_test_dataset = MNISTDataset(test_data, transforms)
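One subtlety: `c1` uses `padding=2`, which zero-pads after normalization, so the border pixels are 0 rather than the background value -0.1. A plain-Python sketch of the alternative order: pad the raw image with the background value (0 in MNIST) first, then normalize, so the border also lands on -0.1. The helpers `pad2d` and `normalize` are hypothetical. In torchvision the same ordering would be `Pad(2)` placed before `ToTensor()` in the `Compose`, together with `padding=0` in `c1`.

```python
def pad2d(img, pad, fill=0.0):
    """Pad a 2D list-of-lists image by `pad` pixels on every side with `fill`."""
    width = len(img[0]) + 2 * pad
    top = [[fill] * width for _ in range(pad)]
    body = [[fill] * pad + list(row) + [fill] * pad for row in img]
    bottom = [[fill] * width for _ in range(pad)]
    return top + body + bottom

def normalize(img, low=-0.1, high=1.175):
    """Map pixel intensities in [0, 1] linearly onto [low, high]."""
    return [[p * (high - low) + low for p in row] for row in img]

# A blank 28 x 28 image (0 = background in MNIST) with one fully "on" pixel.
img = [[0.0] * 28 for _ in range(28)]
img[14][14] = 1.0

out = normalize(pad2d(img, 2))
print(len(out), len(out[0]))   # 32 32
print(out[0][0], out[16][16])  # border is -0.1, the "on" pixel is about 1.175
```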

## Hyperparameters

In [36]:
batch_size = 64
train_loader = DataLoader(mnist_train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(mnist_test_dataset, batch_size=batch_size, shuffle=False)
lr = 0.01
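For reference, the paper's learning-rate schedule from the notes can be written as a function of the zero-based epoch index. Note that those values were global rates for stochastic diagonal Levenberg-Marquardt, so they won't necessarily suit the plain-SGD `lr = 0.01` set above. The helper name `paper_lr` is hypothetical.

```python
def paper_lr(epoch: int) -> float:
    """Learning rate for a zero-based epoch, following the schedule in the notes:
    2 epochs at 5e-4, 3 at 2e-4, 3 at 1e-4, 4 at 5e-5, then 1e-5."""
    schedule = [(2, 5e-4), (3, 2e-4), (3, 1e-4), (4, 5e-5)]
    start = 0
    for length, rate in schedule:
        if epoch < start + length:
            return rate
        start += length
    return 1e-5

print([paper_lr(e) for e in range(20)])
```

To drive an optimizer with it, `torch.optim.lr_scheduler.LambdaLR` multiplies the optimizer's initial learning rate by the lambda's return value, so creating the optimizer with `lr=paper_lr(0)` and passing `lambda e: paper_lr(e) / paper_lr(0)` (calling `scheduler.step()` once per epoch) would follow this schedule.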

In [37]:
from torch.optim import SGD

In [None]:
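device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model = LeNet5().to(device)
epochs = 20
optimizer = SGD(model.parameters(), lr=lr)
criterion = nn.MSELoss()  # plain MSE on the one-hot targets, as stated in the notes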
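The notebook stops before the training loop. Below is a minimal sketch of one training epoch plus the error-rate metric from the notes (misclassified test samples / total test samples), assuming one-hot float targets like the ones `MNISTDataset` returns and a model whose highest output marks the predicted class. The function names are hypothetical, and the smoke test uses random data and a stand-in linear model, not LeNet-5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, criterion, device):
    """One pass over `loader` with one-hot float targets; returns the average loss per sample."""
    model.train()
    total_loss = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        batch_loss = criterion(model(images), targets)
        batch_loss.backward()
        optimizer.step()
        total_loss += batch_loss.item() * images.size(0)
    return total_loss / len(loader.dataset)


@torch.no_grad()
def error_rate(model, loader, device):
    """Misclassified samples / total samples."""
    model.eval()
    wrong, total = 0, 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        preds = model(images).argmax(dim=1)  # highest score = predicted class
        labels = targets.argmax(dim=1)       # one-hot -> class index
        wrong += (preds != labels).sum().item()
        total += labels.size(0)
    return wrong / total


# Smoke test on random data with a stand-in linear model (not LeNet-5).
torch.manual_seed(0)
device = "cpu"
x = torch.randn(32, 1, 28, 28)
y = F.one_hot(torch.randint(0, 10, (32,)), num_classes=10).float()
loader = DataLoader(TensorDataset(x, y), batch_size=8)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.01)
avg_loss = train_one_epoch(toy_model, loader, toy_optimizer, nn.MSELoss(), device)
err = error_rate(toy_model, loader, device)
print(f"avg loss {avg_loss:.4f}, error rate {err:.3f}")
```

With the notebook's objects, an outer loop over `epochs` would call `train_one_epoch` on `train_loader` and then `error_rate` on `test_loader`. If the RBF layer were changed to return the paper's squared-distance penalty, the prediction should use `argmin` instead of `argmax`.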