# Introduction

I am starting a series of posts on Medium covering most of the CNN architectures published so far, implemented in PyTorch and TensorFlow. I believe that once we have hands-on experience with the standard architectures, we will be ready to build our own custom CNN architectures for any task.

I am starting with one of the earliest CNN architectures, LeNet (1998). It was primarily developed for recognizing handwritten and machine-printed characters.

<img src="https://miro.medium.com/max/700/1*lvvWF48t7cyRWqct13eU0w.jpeg">

The picture above summarizes LeNet's architecture. Let's break it down layer by layer.


## LeNet Architecture
S.No | Layers | Output Shape (Height, Width, Channels)
--- | --- | ---
1 | Input Layer | 32 x 32 x 1
2 | Conv2d [6 Filters of size = 5x5, stride = 1, padding = 0 ] | 28 x 28 x 6
3 | Average Pooling [2x2 window, stride = 2, padding = 0] | 14 x 14 x 6
4 | Conv2d [16 Filters of size = 5x5, stride = 1, padding = 0 ] | 10 x 10 x 16
5 | Average Pooling [2x2 window, stride = 2, padding = 0] | 5 x 5 x 16
6 | Conv2d [120 Filters of size = 5x5, stride = 1, padding = 0 ] | 1 x 1 x 120
7 | Linear1 Layer | 120
8 | Linear2 Layer | 84
9 | Final Linear Layer | 10
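
The table can be written down almost line by line as a PyTorch module. The block below is a minimal sketch, not the model trained later in this notebook (which uses max pooling, 28 x 28 inputs and no biases). The class name `LeNet5` and the `Tanh` activations are assumptions, since the table does not name an activation, and row 7 is read as the flattened 120-value output of the last convolution.

```python
import torch
import torch.nn as nn


class LeNet5(nn.Module):
    """Sketch of the table: 32x32x1 input, average pooling, biases enabled."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),          # 32x32x1  -> 28x28x6
            nn.Tanh(),                               # assumed activation
            nn.AvgPool2d(kernel_size=2, stride=2),   # 28x28x6  -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),         # 14x14x6  -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),   # 10x10x16 -> 5x5x16
            nn.Conv2d(16, 120, kernel_size=5),       # 5x5x16   -> 1x1x120
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 120, 1, 1) -> (N, 120)
        return self.classifier(x)


sketch = LeNet5()
dummy = torch.randn(2, 1, 32, 32)  # batch of two 32x32 grayscale images
print(sketch(dummy).shape)                            # torch.Size([2, 10])
print(sum(p.numel() for p in sketch.parameters()))    # 61706
```

The parameter total printed at the end is the sum of the per-layer counts worked out in the sections that follow.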



<img src="https://miro.medium.com/max/330/1*D47ER7IArwPv69k3O_1nqQ.png">

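## Number of Learnable Parameters per filter = [i x (f x f) x b] + b
i = Number of input channels in conv2d

f = Filter size (each filter is f x f)

b = Number of bias terms per filter (1 here)

Multiply the per-filter count by the number of filters to get the total for a layer.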


## Output size calculation after applying convolution
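Stride and padding are constant across the convolution layers, so S = 1 and P = 0; the pooling layers use a 2x2 window with stride 2. For input size W and filter size F, the output size is ((W + 2P - F) / S) + 1.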

1. Input Layer shape = 32 x 32 x 1
2. After applying conv2d with 6 filters of (5x5),
  * Output shape = ((32 + 0 - 5) / 1) + 1 = 28
  * No of Learning Parameters = ([ 1 x (5 * 5) x 1] + 1) * 6 filters = 156
3. After applying Average Pooling (2x2),
  * Output shape = ((28 + 0 - 2) / 2) + 1 = 14
  * No of Learning Parameters = None (0)
4. After applying conv2d with 16 filters of (5x5),
  * Output shape = ((14 + 0 - 5) / 1) + 1 = 10
  * No of Learning Parameters = ([ 6 x (5 * 5) x 1] + 1) * 16 filters = 2416
5. After applying Average Pooling (2x2),
  * Output shape = ((10 + 0 - 2) / 2) + 1 = 5
  * No of Learning Parameters = None (0)
6. After applying conv2d with 120 filters of (5x5),
  * Output shape = ((5 + 0 - 5) / 1) + 1 = 1
  * No of Learning Parameters = ([ 16 x (5 * 5) x 1] + 1) * 120 filters = 48120
7. Apply Linear Layer of 84 neurons,
  * No of Learning Parameters = (120 * 84 + 84) = 10164
8. Apply Linear Layer of 10 neurons,
  * No of Learning Parameters = (84 * 10 + 10) = 850
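
The arithmetic above can be checked with a few lines of plain Python. This is a small sketch; the helper names `conv_out`, `conv_params` and `linear_params` are made up for illustration. It also recomputes the counts for the bias-free, 28 x 28 variant that is implemented later in this notebook.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size: ((W + 2P - F) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_channels, kernel, filters, bias=True):
    """(i x f x f weights + 1 bias) per filter, times the number of filters."""
    return (in_channels * kernel * kernel + int(bias)) * filters


def linear_params(in_features, out_features, bias=True):
    """One weight per input/output pair, plus one bias per output neuron."""
    return in_features * out_features + (out_features if bias else 0)


# (kernel, stride) for conv1, pool1, conv2, pool2, conv3
layers = [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1)]

# Spatial size through the network described above (32x32 input)
sizes = [32]
for kernel, stride in layers:
    sizes.append(conv_out(sizes[-1], kernel, stride))
print(sizes)  # [32, 28, 14, 10, 5, 1]

# Parameter counts with biases, as in the walk-through
with_bias = [
    conv_params(1, 5, 6),
    conv_params(6, 5, 16),
    conv_params(16, 5, 120),
    linear_params(120, 84),
    linear_params(84, 10),
]
print(with_bias, sum(with_bias))  # [156, 2416, 48120, 10164, 850] 61706

# Bias-free variant on 28x28 inputs: two convs, then 16*4*4 = 256 features
no_bias = [
    conv_params(1, 5, 6, bias=False),
    conv_params(6, 5, 16, bias=False),
    linear_params(256, 120, bias=False),
    linear_params(120, 84, bias=False),
    linear_params(84, 10, bias=False),
]
print(no_bias, sum(no_bias))  # [150, 2400, 30720, 10080, 840] 44190
```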


In [None]:
# Importing necessary modules
import time
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.autograd import Variable


!pip install torchsummaryX torchsummary --quiet
from torchsummaryX import summary as summaryX
from torchsummary import summary

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()

In [None]:
class LeNet(nn.Module):
  def __init__(self):
    super(LeNet, self).__init__()
    self.conv1 = nn.Conv2d(1, 6, 5, bias=False)
    self.relu1 = nn.ReLU()
    self.pool1 = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 5, bias=False)
    self.relu2 = nn.ReLU()
    self.pool2 = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(256, 120, bias=False)
    self.relu3 = nn.ReLU()
    self.fc2 = nn.Linear(120, 84, bias=False)
    self.relu4 = nn.ReLU()
    self.fc3 = nn.Linear(84, 10, bias=False)
    # self.q = q
    # if q:
    #   self.quant = QuantStub()
    #   self.dequant = DeQuantStub()

  def forward(self, x):
    x = self.conv1(x)
    x = self.relu1(x)
    x = self.pool1(x)
    x = self.conv2(x)
    x = self.relu2(x)
    x = self.pool2(x)
    # Be careful to use reshape here instead of view
    x = x.reshape(x.shape[0], -1)
    x = self.fc1(x)
    x = self.relu3(x)
    x = self.fc2(x)
    x = self.relu4(x)
    x = self.fc3(x)
    return x

model = LeNet()
model

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), bias=False)
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), bias=False)
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=256, out_features=120, bias=False)
  (relu3): ReLU()
  (fc2): Linear(in_features=120, out_features=84, bias=False)
  (relu4): ReLU()
  (fc3): Linear(in_features=84, out_features=10, bias=False)
)

In [None]:
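x = torch.randn(64, 1, 28, 28)
output = model(x)
print(output.shape)
# The model is still on the CPU here, so tell torchsummary not to default to CUDA
summary(model, (1, 28, 28), device='cpu')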

torch.Size([64, 10])
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 6, 24, 24]             150
              ReLU-2            [-1, 6, 24, 24]               0
         MaxPool2d-3            [-1, 6, 12, 12]               0
            Conv2d-4             [-1, 16, 8, 8]           2,400
              ReLU-5             [-1, 16, 8, 8]               0
         MaxPool2d-6             [-1, 16, 4, 4]               0
            Linear-7                  [-1, 120]          30,720
              ReLU-8                  [-1, 120]               0
            Linear-9                   [-1, 84]          10,080
             ReLU-10                   [-1, 84]               0
           Linear-11                   [-1, 10]             840
Total params: 44,190
Trainable params: 44,190
Non-trainable params: 0
----------------------------------------------------------------
Input size 

# Loading MNIST

In [None]:
# Hyperparameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
learning_rate = 0.01
num_epochs = 10


transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

trainset = datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=16, pin_memory=True)

testset = datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=16, pin_memory=True)
dataset_sizes = {'train':len(trainset), 'test':len(testset)}

model = LeNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
from IPython.display import HTML, display
class ProgressMonitor(object):
    """
    Custom IPython progress bar for training
    """

    tmpl = """
        <p>Loss: {loss:0.4f}   {value} / {length}</p>
        <progress value='{value}' max='{length}' style='width: 100%'>{value}</progress>
    """

    def __init__(self, length):
        self.length = length
        self.count = 0
        self.display = display(self.html(0, 0), display_id=True)

    def html(self, count, loss):
        return HTML(self.tmpl.format(length=self.length, value=count, loss=loss))

    def update(self, count, loss):
        self.count += count
        self.display.update(self.html(self.count, loss))

def train_new(model,criterion,optimizer,num_epochs,dataloaders,dataset_sizes,first_epoch=1):
  since = time.time()
  best_loss = 999999
  best_epoch = -1
  last_train_loss = -1
  plot_train_loss = []
  plot_valid_loss = []


  for epoch in range(first_epoch, first_epoch + num_epochs):
      print()
      print('Epoch', epoch)
      running_loss = 0.0
      valid_loss = 0.0

      # train phase
      model.train()

      # create a progress bar
      progress = ProgressMonitor(length=dataset_sizes["train"])

      for inputs, labels in dataloaders[0]:
          batch_size = inputs.shape[0]
          # Move the batch to the selected device (GPU if available)
          inputs = inputs.to(device)
          labels = labels.to(device)

          # clear gradients left over from the previous step
          optimizer.zero_grad()
          outputs = model(inputs)
          loss = criterion(outputs, labels)

          loss.backward()
          optimizer.step()

          # accumulate the sample-weighted loss as a plain Python float
          running_loss += loss.item() * batch_size
          # update progress bar with the current batch loss
          progress.update(batch_size, loss.item())

      epoch_loss = running_loss / dataset_sizes["train"]
      print('Training loss:', epoch_loss)
      writer.add_scalar('Training Loss', epoch_loss, epoch)
      plot_train_loss.append(epoch_loss)

      # validation phase
      model.eval()
      # We don't need gradients for validation, so wrap in
      # no_grad to save memory
      with torch.no_grad():
        for inputs, labels in dataloaders[-1]:
            batch_size = inputs.shape[0]

            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)

            # calculate the loss (no optimizer call is needed during evaluation)
            loss = criterion(outputs, labels)

            # update running loss value
            valid_loss += loss.item() * batch_size

      epoch_valid_loss = valid_loss / dataset_sizes["test"]
      print('Validation loss:', epoch_valid_loss)
      plot_valid_loss.append(epoch_valid_loss)
      writer.add_scalar('Validation Loss', epoch_valid_loss, epoch)

  time_elapsed = time.time() - since
  print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

  return plot_train_loss, plot_valid_loss, model

if __name__=="__main__":
  train_losses, valid_losses, model = train_new(model = model ,criterion = criterion,optimizer = optimizer,
                                              num_epochs=10,dataloaders = [train_loader, test_loader],dataset_sizes = dataset_sizes)


Epoch 1




torch.Size([64, 1, 28, 28])
Training complete in 0m 1s


In [None]:
def accuracy(loader, model, train=True):
    """Print how many samples in `loader` the model classifies correctly."""
    num_correct = num_samples = 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            # index of the largest logit is the predicted class
            _, preds = outputs.max(1)
            num_correct += (preds == labels).sum().item()
            num_samples += preds.size(0)
    acc = (num_correct / num_samples) * 100
    split = "training" if train else "testing"
    print("Model Predicted {} correctly out of {} from {} dataset, Accuracy : {:.2f}".format(num_correct, num_samples, split, acc))
    model.train()

accuracy(train_loader, model)
accuracy(test_loader, model, train=False)

Model Predicted 59161 correctly out of 60000 from training dataset, Accuracy : 98.60
Model Predicted 9827 correctly out of 10000 from testing dataset, Accuracy : 98.27


In [None]:
import torch.quantization
from torch.quantization import QuantStub, DeQuantStub

class QuantLeNet(nn.Module):
  def __init__(self):
    super(QuantLeNet, self).__init__()
    self.conv1 = nn.Conv2d(1, 6, 5, bias=False)
    self.relu1 = nn.ReLU()
    self.pool1 = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 5, bias=False)
    self.relu2 = nn.ReLU()
    self.pool2 = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(256, 120, bias=False)
    self.relu3 = nn.ReLU()
    self.fc2 = nn.Linear(120, 84, bias=False)
    self.relu4 = nn.ReLU()
    self.fc3 = nn.Linear(84, 10, bias=False)

    self.quant = QuantStub()
    self.dequant = DeQuantStub()

  def forward(self, x):
    # Convert the float input to a quantized tensor
    x = self.quant(x)

    x = self.conv1(x)
    x = self.relu1(x)
    x = self.pool1(x)
    x = self.conv2(x)
    x = self.relu2(x)
    x = self.pool2(x)

    # Be careful to use reshape here instead of view
    x = x.reshape(x.shape[0], -1)
    x = self.fc1(x)
    x = self.relu3(x)
    x = self.fc2(x)
    x = self.relu4(x)
    x = self.fc3(x)

    # Convert the quantized output back to float
    x = self.dequant(x)
    return x


In [None]:
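# Eager-mode static quantization targets CPU backends, so this assumes `device` is the CPU
net_quantized = QuantLeNet().to(device)


# Copy weights from unquantized model
net_quantized.load_state_dict(model.state_dict())

# Switch to eval mode before fusing, since post-training fusion is meant for eval-mode models
net_quantized.eval()

# Fuse each conv/linear layer with the ReLU that follows it
net_quantized = torch.ao.quantization.fuse_modules(net_quantized, [['conv1', 'relu1'],
                                            ['conv2', 'relu2'],
                                            ['fc1', 'relu3'],
                                            ['fc2', 'relu4']], inplace=True)

net_quantized.qconfig = torch.ao.quantization.default_qconfig
net_quantized = torch.ao.quantization.prepare(net_quantized) # Insert observers
net_quantized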

QuantLeNet(
  (conv1): ConvReLU2d(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu1): Identity()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): ConvReLU2d(
    (0): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu2): Identity()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): LinearReLU(
    (0): Linear(in_features=256, out_features=120, bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu3): Identity()
  (fc2): LinearReLU(
    (0): Linear(in_features=120, out_features=84, bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu4): Identity()
  (f

In [None]:
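# Calibration pass: running data through the prepared model lets the observers record activation ranges
accuracy(test_loader, net_quantized, train=False)

Model Predicted 9827 correctly out of 10000 from testing dataset, Accuracy : 98.27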


In [None]:
net_quantized

QuantLeNet(
  (conv1): ConvReLU2d(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=0.0, max_val=6.905801773071289)
  )
  (relu1): Identity()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): ConvReLU2d(
    (0): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=0.0, max_val=36.02035140991211)
  )
  (relu2): Identity()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): LinearReLU(
    (0): Linear(in_features=256, out_features=120, bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=0.0, max_val=208.48196411132812)
  )
  (relu3): Identity()
  (fc2): LinearReLU(
    (0): Linear(in_features=120, out_features=84, bias=False)
    (1): ReLU()
    (activation_post_process): MinMaxObserver(min_val=0.0, max_

In [None]:
net_quantized = torch.ao.quantization.convert(net_quantized)
net_quantized

QuantLeNet(
  (conv1): QuantizedConvReLU2d(1, 6, kernel_size=(5, 5), stride=(1, 1), scale=0.05437639355659485, zero_point=0, bias=False)
  (relu1): Identity()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): QuantizedConvReLU2d(6, 16, kernel_size=(5, 5), stride=(1, 1), scale=0.2836248278617859, zero_point=0, bias=False)
  (relu2): Identity()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): QuantizedLinearReLU(in_features=256, out_features=120, scale=1.6415902376174927, zero_point=0, qscheme=torch.per_tensor_affine)
  (relu3): Identity()
  (fc2): QuantizedLinearReLU(in_features=120, out_features=84, scale=2.435471773147583, zero_point=0, qscheme=torch.per_tensor_affine)
  (relu4): Identity()
  (fc3): QuantizedLinear(in_features=84, out_features=10, scale=2.011951208114624, zero_point=90, qscheme=torch.per_tensor_affine)
  (quant): Quantize(scale=tensor([0.0157]), zero_point=tensor([64]), dtype=
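
The scales in the converted model appear to follow directly from the observer ranges printed earlier. The sketch below redoes that arithmetic in plain Python, assuming an affine mapping onto the integer range 0 to 127 (the reduced range that `default_qconfig`'s activation observer is understood to use). The helper name `affine_qparams` is made up for illustration and is not a PyTorch function.

```python
def affine_qparams(min_val, max_val, qmin=0, qmax=127):
    """Scale and zero point for an affine mapping of [min_val, max_val] onto [qmin, qmax]."""
    # The representable float range must include zero
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - round(min_val / scale)
    return scale, zero_point


# max_val recorded by the observers after the calibration pass (min_val was 0.0)
observed = {
    "conv1": 6.905801773071289,
    "conv2": 36.02035140991211,
    "fc1": 208.48196411132812,
}

qparams = {name: affine_qparams(0.0, max_val) for name, max_val in observed.items()}
for name, (scale, zero_point) in qparams.items():
    print(name, scale, zero_point)
```

Under that assumption the computed scales agree with the `scale=` values shown for `conv1`, `conv2` and `fc1` to about six significant digits, and the zero point is 0 because the fused ReLU outputs are never negative.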

In [None]:
accuracy(test_loader, net_quantized, train=False)

Model Predicted 9801 correctly out of 10000 from testing dataset, Accuracy : 98.01


In [None]:
torch.save(net_quantized.state_dict(), "./model_weights.pt")