# Implementation of Modern Computer Vision Architectures
In this section we discuss popular CNN architectures that are used in production applications today. These architectures are the result of rigorous experimentation, so their hyperparameters and layer layouts are well-tested starting points for the kind of problem they were designed for, although they are not guaranteed to be the best choice for every task.
We will look at:
- AlexNet
- VGG
- NiN
- GoogLeNet
- ResNet
- ResNeXt
- DenseNet



In [2]:
import keras
import tensorflow as tf
from d2l import tensorflow as d2l

### AlexNet
AlexNet is a much deeper and wider successor to LeNet. It was among the first architectures to rely on GPUs for training, and it won the ImageNet competition in 2012. Because the model was too large for the memory of a single GPU at the time, it was split across two GPUs. The architecture also made several improvements over LeNet: it used ReLU instead of sigmoid as the activation function, max pooling instead of the average pooling used in LeNet, and dropout as a regularization technique in the fully connected layers. The accompanying convnet code (cuda-convnet) served as a standard for implementing deep neural networks for some time, before deep learning libraries such as TensorFlow and PyTorch became popular. These advances also depended on the availability of a large dataset such as ImageNet, which deep learning algorithms need in order to perform well.
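
To make two of these design choices concrete, here is a minimal pure-Python sketch. It is a toy illustration, not AlexNet's actual code: the helper names (`sigmoid_grad`, `relu_grad`, `pool2x2`) and the 4x4 input are made up for this example, and the pooling is a non-overlapping 2x2 window rather than the overlapping 3x3, stride-2 pooling used in the network. It shows that the sigmoid gradient shrinks toward zero for large inputs while the ReLU gradient stays at 1 for positive inputs, and that max pooling keeps the strongest activation in each window whereas average pooling dilutes it.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid function at x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU at x (taken as 0 at x == 0)."""
    return 1.0 if x > 0 else 0.0


def pool2x2(grid, reduce_fn):
    """Apply non-overlapping 2x2 pooling to a 2D list using reduce_fn."""
    out = []
    for i in range(0, len(grid) - 1, 2):
        row = []
        for j in range(0, len(grid[0]) - 1, 2):
            window = [grid[i][j], grid[i][j + 1],
                      grid[i + 1][j], grid[i + 1][j + 1]]
            row.append(reduce_fn(window))
        out.append(row)
    return out


# Hypothetical 4x4 feature map with one strong activation (8)
feature_map = [
    [0, 0, 1, 0],
    [0, 8, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 4],
]

max_pooled = pool2x2(feature_map, max)
avg_pooled = pool2x2(feature_map, lambda w: sum(w) / len(w))

print(sigmoid_grad(0.0), sigmoid_grad(10.0))  # sigmoid gradient saturates
print(relu_grad(10.0))                        # ReLU gradient does not
print(max_pooled)   # [[8, 1], [2, 4]]
print(avg_pooled)   # [[2.0, 0.25], [2.0, 1.0]]
```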

In [3]:
# implementation of AlexNet using the d2l classifier class
class AlexNet(d2l.Classifier):
    def __init__(self, lr=0.1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()
        self.net = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(filters=96, kernel_size=11,
                                   strides=4, activation="relu"),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2), 
            
            tf.keras.layers.Conv2D(filters=256, kernel_size=5, 
                                   padding='same', activation='relu'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
            
            tf.keras.layers.Conv2D(filters=384, kernel_size=3,
                                   padding='same', activation='relu'),
            tf.keras.layers.Conv2D(filters=384, kernel_size=3, 
                                   padding='same', activation='relu'),
            tf.keras.layers.Conv2D(filters=256, kernel_size=3,
                                   padding='same', activation='relu'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(4096, activation='relu'),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(4096, activation='relu'),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(num_classes)
        ])

In [4]:
# show layer summary
AlexNet().layer_summary((1, 224, 224, 1))

Conv2D output shape:	 (1, 54, 54, 96)
MaxPooling2D output shape:	 (1, 26, 26, 96)
Conv2D output shape:	 (1, 26, 26, 256)
MaxPooling2D output shape:	 (1, 12, 12, 256)
Conv2D output shape:	 (1, 12, 12, 384)
Conv2D output shape:	 (1, 12, 12, 384)
Conv2D output shape:	 (1, 12, 12, 256)
MaxPooling2D output shape:	 (1, 5, 5, 256)
Flatten output shape:	 (1, 6400)
Dense output shape:	 (1, 4096)
Dropout output shape:	 (1, 4096)
Dense output shape:	 (1, 4096)
Dropout output shape:	 (1, 4096)
Dense output shape:	 (1, 10)
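
The spatial sizes in the summary above can be checked by hand. The sketch below assumes the usual Keras conventions: with `'valid'` padding the output size is `floor((n - k) / s) + 1`, and with `'same'` padding it is `ceil(n / s)`. The helper `conv_out` is a hypothetical name introduced for this check, not part of `d2l` or Keras.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a Keras-style convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


# (kernel, stride, padding) for each conv/pool layer of the AlexNet above
layers = [
    (11, 4, "valid"),  # Conv2D, 96 filters
    (3, 2, "valid"),   # MaxPool2D
    (5, 1, "same"),    # Conv2D, 256 filters
    (3, 2, "valid"),   # MaxPool2D
    (3, 1, "same"),    # Conv2D, 384 filters
    (3, 1, "same"),    # Conv2D, 384 filters
    (3, 1, "same"),    # Conv2D, 256 filters
    (3, 2, "valid"),   # MaxPool2D
]

size = 224
shapes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    shapes.append(size)

# The last convolution has 256 channels, so Flatten sees size * size * 256
flattened = size * size * 256

print(shapes)     # [54, 26, 26, 12, 12, 12, 12, 5]
print(flattened)  # 6400
```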


In [5]:
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
with d2l.try_gpu():
    model = AlexNet(lr=0.01)
    trainer.fit(model, data)


### VGG (Visual Geometry Group at Oxford)
VGG introduced the concept of convolutional blocks: each block consists of several convolutional layers followed by a pooling layer that reduces the spatial dimensions of the feature maps. AlexNet proved that feature learning with deep neural networks was possible, but it did not provide a template for designing new CNN architectures. VGG blocks fill this gap, since they can be stacked to create deeper and therefore more powerful neural networks.

In [6]:
# Implementation of a single VGG block
def vgg_block(num_convs, num_channels):
    blk = tf.keras.models.Sequential()
    
    for _ in range(num_convs):
        blk.add(tf.keras.layers.Conv2D(num_channels, kernel_size=3, 
                                       padding='same', activation='relu'))
    
    blk.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
    return blk

In [21]:
# VGG blocks can be stacked and combined in different ways, so VGG defines
# a family of architectures rather than a single architecture

class VGG(d2l.Classifier):
    def __init__(self, arch, lr=0.1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()
        self.net = tf.keras.models.Sequential()
        for (num_convs, num_channels) in arch:
            self.net.add(vgg_block(num_convs, num_channels))
        self.net.add(tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(4096, activation='relu'),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(4096, activation='relu'),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(num_classes)
        ]))

In [22]:
vgg11 = VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512)))


In [23]:
vgg11.layer_summary((1, 224, 224, 1))

Sequential output shape:	 (1, 112, 112, 64)
Sequential output shape:	 (1, 56, 56, 128)
Sequential output shape:	 (1, 28, 28, 256)
Sequential output shape:	 (1, 14, 14, 512)
Sequential output shape:	 (1, 7, 7, 512)
Sequential output shape:	 (1, 10)
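
The summary above follows directly from the block structure: every block ends in a 2x2 max pooling layer with stride 2, so five blocks shrink a 224x224 input to 7x7. The rough sketch below reproduces that and also counts parameters by hand, assuming a single input channel (as in the `layer_summary` call) and the usual "weights plus biases" count for convolutional and dense layers. The variable names are illustrative only.

```python
arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

size, in_channels = 224, 1
sizes = []
conv_params = 0
for num_convs, num_channels in arch:
    for _ in range(num_convs):
        # 3x3 kernel weights plus one bias per output channel
        conv_params += 3 * 3 * in_channels * num_channels + num_channels
        in_channels = num_channels
    size //= 2  # 2x2 max pooling with stride 2 halves height and width
    sizes.append(size)

flattened = size * size * in_channels         # input to the first Dense layer
first_dense_params = flattened * 4096 + 4096  # weights plus biases

print(sizes)               # [112, 56, 28, 14, 7]
print(flattened)           # 25088
print(conv_params)         # 9219328
print(first_dense_params)  # 102764544
```

Under these assumptions the first fully connected layer alone holds more than ten times as many parameters as all the convolutional layers combined.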


In [None]:
# Use fewer channels per block when training VGG-11 on Fashion-MNIST
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
with d2l.try_gpu():
    model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01)
    trainer.fit(model, data)

### Network in Network (NiN)
This architecture proposes using 1x1 convolutional layers to add non-linearity after each convolution, acting across the channels at every pixel location. It then uses global average pooling to aggregate over all spatial locations, removing the need for large fully connected layers at the end of the architecture.
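
The two ideas can be illustrated without any framework. The sketch below is a toy pure-Python version, not the Keras implementation: `conv1x1` and `global_avg_pool` are hypothetical helpers, and the 2x2 feature map and weights are made-up values. A 1x1 convolution applies the same small dense layer to the channel vector at every pixel, and global average pooling then reduces each channel to a single number, giving one value per channel.

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution followed by ReLU.

    feature_map: H x W x C_in nested lists
    weights:     C_in x C_out nested lists
    bias:        list of length C_out
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            # The same linear map (a tiny dense layer) is used at every pixel
            out_pixel = []
            for o in range(len(bias)):
                z = sum(pixel[i] * weights[i][o] for i in range(len(pixel)))
                out_pixel.append(max(0.0, z + bias[o]))
            out_row.append(out_pixel)
        out.append(out_row)
    return out


def global_avg_pool(feature_map):
    """Average each channel over all spatial positions: H x W x C -> C."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[y][x][k] for y in range(h) for x in range(w)) / (h * w)
        for k in range(c)
    ]


# Hypothetical 2x2 feature map with 2 input channels
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# Map 2 input channels to 3 output channels (think: 3 target classes)
weights = [[1.0, 0.0, -1.0],
           [0.0, 1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

mixed = conv1x1(fmap, weights, bias)  # still 2x2 spatially, now 3 channels
logits = global_avg_pool(mixed)       # one value per channel

print(logits)  # [4.0, 5.0, 0.0]
```

Neither step changes its parameter count with the spatial size of the input, which is why the large `Flatten` plus `Dense` head is no longer needed.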

In [24]:
# Implementation of the NiN block
def nin_block(out_channels, kernel_size, strides, padding):
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(out_channels, kernel_size,
                               strides=strides, padding=padding),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(out_channels, 1),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(out_channels, 1),
        tf.keras.layers.Activation('relu')
    ])

In [25]:
# NiN does not use fully connected layers. Instead, the last NiN block has
# as many output channels as there are target classes, and global average
# pooling is applied to its output to produce the logits

class NiN(d2l.Classifier):
    def __init__(self, lr=0.1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()
        self.net = tf.keras.models.Sequential([
            nin_block(96, kernel_size=11, strides=4, padding='valid'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
            nin_block(256, kernel_size=5, strides=1, padding='same'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
            nin_block(384, kernel_size=3, strides=1, padding='same'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
            tf.keras.layers.Dropout(0.5),
            nin_block(num_classes, kernel_size=3, strides=1, padding='same'),
            tf.keras.layers.GlobalAvgPool2D(),
            tf.keras.layers.Flatten()
        ])