## Basic Neural Network Architecture 

Macro components (`slt`: stem, learner, task):
- __stem__ takes the input and does initial coarse-level feature extraction
- __learner__ is composed of any number of convolutional groups. It does detailed feature extraction and convolutional learning from the extracted coarse features.
- __task__ learns the task (e.g. classification) from the representation of the input in the latent space

In [6]:
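from tensorflow.keras import Input


def stem(input_shape):
    """stem layers
    input_shape: the shape of the input tensor
    """
    inputs = Input(shape=input_shape)
    outputs = inputs  # placeholder: the stem layers go here
    return outputs


def learner(inputs):
    """learner layers
    inputs: the input tensors (feature maps)
    """
    outputs = inputs  # placeholder: the learner layers go here
    return outputs


def task(inputs, n_classes):
    """classifier layers
    inputs: the input tensors (feature maps)
    n_classes: the number of output classes
    """
    outputs = inputs  # placeholder: the classifier layers go here
    return outputs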

## Stem component

- the entry point of the neural network
- performs the first (coarse-level) feature extraction while reducing the size of the feature maps

### VGG

In [9]:
from tensorflow.keras.layers import Conv2D


def stem(inputs):
    """Constructs the Stem Convolution Group
    inputs: the input tensor"""
    outputs = Conv2D(64, (3, 3), strides=(1, 1), padding="same", activation="relu")(
        inputs
    )
    return outputs

### ResNet

In [12]:
from tensorflow.keras.layers import (
    BatchNormalization,
    Conv2D,
    MaxPooling2D,
    ReLU,
    ZeroPadding2D,
)


def stem(inputs):
    """Constructs the Stem Convolution Group
    inputs: the input tensor"""
    outputs = ZeroPadding2D(padding=(3, 3))(inputs)
    outputs = Conv2D(64, (7, 7), strides=(2, 2), padding="valid")(outputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)
    outputs = ZeroPadding2D(padding=(1, 1))(outputs)
    outputs = MaxPooling2D((3, 3), strides=(2, 2))(outputs)
    return outputs

### ResNeXt

In [15]:
def stem(inputs):
    """Construct the Stem Convolution Group
    inputs: input tensor
    """

    # Use padding='same' (as in the VGG stem) instead of the explicit
    # ZeroPadding2D layers used in the ResNet stem.
    outputs = Conv2D(64, (7, 7), strides=(2, 2), padding="same")(inputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)
    outputs = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(outputs)
    return outputs
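
The ResNet and ResNeXt stems pad differently but shrink the feature maps by the same amount. As a rough check, the sketch below applies the usual output-size arithmetic for convolution and pooling to a hypothetical 224x224 input. The input size is an assumption (nothing in these notes fixes it), and `conv_out` and `same_out` are illustrative helper names, not Keras functions.

```python
import math


def conv_out(size, kernel, stride, pad=0):
    """Output size of a 'valid' convolution/pooling after explicit zero padding."""
    return (size + 2 * pad - kernel) // stride + 1


def same_out(size, stride):
    """Output size of a convolution/pooling with padding='same'."""
    return math.ceil(size / stride)


size = 224  # assumed input height/width

# ResNet stem: ZeroPadding2D(3) + 7x7/2 valid conv, then ZeroPadding2D(1) + 3x3/2 max pool.
resnet = conv_out(size, kernel=7, stride=2, pad=3)
resnet = conv_out(resnet, kernel=3, stride=2, pad=1)

# ResNeXt stem: the same kernels and strides, but with padding='same'.
resnext = same_out(size, stride=2)
resnext = same_out(resnext, stride=2)

print(resnet, resnext)  # 56 56
```

Under these assumptions both stems reduce each spatial dimension by a factor of four before the learner sees the feature maps.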

### Xception

In [18]:
def stem(inputs):
    """Create the stem entry into the neural network
    inputs: input tensor to neural network
    """

    # A 5x5 convolution refactored as two 3x3 convolutions.
    outputs = Conv2D(32, (3, 3), strides=(2, 2))(inputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)

    outputs = Conv2D(64, (3, 3), strides=(1, 1))(outputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)
    return outputs
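
To see why two 3x3 convolutions can stand in for one 5x5 convolution, the sketch below computes the receptive field of a stack of layers and compares weight counts. It is a simplified illustration: `receptive_field` is a hypothetical helper, and the weight counts are per input/output channel pair with biases ignored.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of (kernel, stride) layers."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride  # spacing between neighbouring outputs, in input pixels
    return field


# Two stacked 3x3 convolutions at stride 1 see the same window as one 5x5 convolution.
stacked = receptive_field([(3, 1), (3, 1)])
single = receptive_field([(5, 1)])

# Weights per input/output channel pair (biases ignored).
weights_stacked = 2 * 3 * 3
weights_single = 5 * 5

# With the strides used in the stem above (2, then 1), the window is wider.
stem_field = receptive_field([(3, 2), (3, 1)])

print(stacked, single, weights_stacked, weights_single, stem_field)  # 5 5 18 25 7
```

The stacked form covers the same 5x5 window with 18 weights instead of 25 and adds an extra non-linearity between the two convolutions. Because the first convolution in this stem is strided, each output of the second convolution actually covers a 7x7 input window.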

## Pre-stem

The purpose of the pre-stem is to move into the graph (model) some or all of the data preprocessing that was previously performed upstream.

Some of the functions typically performed by a pre-stem group are:
- preprocessing
    - adapting a model to different input size
    - normalization
- augmentation
    - resizing and cropping
    - translational and scale invariance

In [21]:
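from tensorflow.keras import Input
from tensorflow.keras.layers import Normalization  # TensorFlow 2.6+


def prestem(input_shape):
    """prestem layers
    input_shape: the shape of the input tensor
    """
    inputs = Input(shape=input_shape)
    # Call adapt() on the layer (or pass mean/variance) before relying on its output.
    outputs = Normalization()(inputs)
    return inputs, outputs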
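
To make concrete what a normalization pre-stem computes, here is a pure-Python sketch of the arithmetic: subtract the mean and divide by the standard deviation of the data the layer was fitted on. In Keras the `Normalization` layer obtains these statistics from `adapt()`; the pixel values below are made up for illustration.

```python
import math
import statistics

# Hypothetical pixel intensities the layer would be fitted on.
pixels = [0.0, 50.0, 100.0, 150.0, 200.0]

# Statistics gathered from the data (population variance).
mean = statistics.mean(pixels)
variance = statistics.pvariance(pixels, mu=mean)

# Standardize each value: zero mean, unit variance.
normalized = [(p - mean) / math.sqrt(variance) for p in pixels]

print([round(v, 3) for v in normalized])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Moving this step into the graph means the deployed model accepts raw pixel values and applies the same statistics that were used during training.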

## Learner Component

The _learner component_ is where we generally perform feature learning through more detailed feature extraction. 

This process is also referred to as _representational_ or _transformational learning_.

The learner component consists of one or more convolutional groups, and each group consists of one or more convolutional blocks.

In [24]:
def learner(inputs, groups):
    """learner layers
    inputs: the input tensors (feature maps)
    groups: the block parameters for each group
    """
    outputs = inputs
    for group_params in groups:
        outputs = group(outputs, **group_params)
    return outputs


def group(inputs, blocks):
    """group layers
    inputs: the input tensors (feature maps)
    blocks: the block parameters for each block
    """
    outputs = inputs
    for block_params in blocks:
        # Feed the output of the previous block into the next one.
        outputs = block(outputs, **block_params)
    return outputs


def block(inputs, **params):
    """block layers
    inputs: the input tensor (feature maps)
    params: the block parameters for the block
    """
    outputs = inputs  # placeholder: the block layers go here
    return outputs
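
One way to lay out the `groups` argument is a list with one dict per group, each holding a list with one dict per block. The sketch below walks such a structure with a stand-in `block` that records its call instead of adding layers. The `blocks` key and the `n_filters` parameter are hypothetical names chosen for illustration.

```python
def block(inputs, n_filters):
    """Stand-in block: records the call instead of adding layers."""
    return inputs + [f"block({n_filters})"]


def group(inputs, blocks):
    """Apply each block in turn, chaining outputs to inputs."""
    outputs = inputs
    for block_params in blocks:
        outputs = block(outputs, **block_params)
    return outputs


def learner(inputs, groups):
    """Apply each group in turn, chaining outputs to inputs."""
    outputs = inputs
    for group_params in groups:
        outputs = group(outputs, **group_params)
    return outputs


# One dict per group; each holds a list with one dict per block.
groups = [
    {"blocks": [{"n_filters": 64}, {"n_filters": 64}]},
    {"blocks": [{"n_filters": 128}]},
]

trace = learner(["stem"], groups)
print(trace)  # ['stem', 'block(64)', 'block(64)', 'block(128)']
```

The trace shows the order in which blocks would be appended to the graph: all blocks of the first group, then all blocks of the second.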

### ResNet

In [25]:
def learner(inputs, groups):
    """Construct the Learner
    inputs: input to the learner
    groups: group parameters per group
    """
    # The first group is not strided; the remaining groups are.
    # Unpack rather than pop() so the caller's list is left untouched.
    first, *rest = groups
    outputs = group(inputs, **first, strides=(1, 1))

    for group_params in rest:
        outputs = group(outputs, **group_params, strides=(2, 2))
    return outputs

In [26]:
def group(inputs, blocks, strides=(2, 2)):
    """Construct a Residual Group
    inputs: input to the group
    blocks: block parameter for each block
    strides: strides of the projection block; (1, 1) keeps the feature map size
    """
    # The first block is a projection block; the remaining blocks are identity blocks.
    # Unpack rather than pop() so the caller's list is left untouched.
    first, *rest = blocks
    outputs = projection_block(inputs, strides=strides, **first)

    for block_params in rest:
        outputs = identity_block(outputs, **block_params)
    return outputs
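
The stride pattern above (first group unstrided, every later group strided) determines how the feature map size shrinks through the learner. A small worked example, assuming four groups and a 56x56 feature map coming out of the stem (both numbers are assumptions for illustration):

```python
import math

size = 56  # assumed height/width of the stem output
group_strides = [(1, 1), (2, 2), (2, 2), (2, 2)]  # first group unstrided, the rest strided

sizes = []
for stride, _ in group_strides:
    # A strided projection block halves the feature map; identity blocks keep it.
    size = math.ceil(size / stride)
    sizes.append(size)

print(sizes)  # [56, 28, 14, 7]
```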

### DenseNet

The _learner component_ in __DenseNet__ consists of four convolutional groups. 

Each group, with the exception of the last group, delays pooling to the end of the group (in a so-called _transition block_).

The last group has no transition block: its feature maps will be pooled and flattened by the task component, so pooling at the end of that group would be redundant.

In [28]:
def learner(inputs, groups, reduction):
    """Construct the Learner
    inputs: input to the learner
    groups: group parameters per group
    reduction: the amount to reduce (compress) feature maps by
    """
    outputs = inputs

    # Split off the last dense group's parameters and save them for the end.
    *groups, last = groups

    for group_params in groups:
        outputs = group(outputs, reduction=reduction, **group_params)

    # Add the last group without a transition block.
    outputs = group(outputs, reduction=None, **last)
    return outputs

In [29]:
def group(inputs, blocks, reduction=None):
    """Construct a Dense Group
    inputs: input tensor to the group
    blocks: parameters for each dense block in the group
    reduction: amount to reduce feature map by
    """
    outputs = inputs

    # dense_block and transition_block are assumed to be defined elsewhere.
    for block_params in blocks:
        outputs = dense_block(outputs, **block_params)

    if reduction is not None:
        outputs = transition_block(outputs, reduction)

    return outputs
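
To illustrate how the number of feature maps evolves through such a learner, the arithmetic below tracks it group by group. All numbers are assumptions for illustration: 64 feature maps from the stem, 32 new maps concatenated per dense block, group sizes of 6, 12, 24 and 16 blocks, and `reduction` read as the fraction of feature maps a transition block keeps.

```python
n_maps = 64  # assumed number of feature maps leaving the stem
growth_rate = 32  # assumed feature maps added by each dense block
blocks_per_group = [6, 12, 24, 16]  # assumed number of dense blocks per group
reduction = 0.5  # assumed fraction of feature maps kept by a transition block

history = []
last = len(blocks_per_group) - 1
for index, n_blocks in enumerate(blocks_per_group):
    n_maps += n_blocks * growth_rate  # each dense block concatenates new maps
    if index != last:  # the last group has no transition block
        n_maps = int(n_maps * reduction)
    history.append(n_maps)

print(history)  # [128, 256, 512, 1024]
```

Without the transition blocks the concatenations would keep growing the channel count; the compression step keeps it in check between groups.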

## Task Component

The _task component_ is where we generally perform task learning. In large conventional CNNs for image classification, this component typically consists of two layers:

- _bottleneck layer_ - performs dimensionality reduction of final feature maps into latent space
- _classifier layer_ - performs the task the model is learning


### ResNet

In [31]:
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D


def classifier(inputs, n_classes):
    """The output classifier
    inputs: input tensor to the classifier
    n_classes: number of output classes
    """
    # Use global average pooling to reduce and flatten the feature maps (latent space)
    # into a 1D feature vector (bottleneck layer).
    outputs = GlobalAveragePooling2D()(inputs)

    # The fully connected dense layer for the final classification of the input.
    outputs = Dense(n_classes, activation="softmax")(outputs)
    return outputs
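
The effect of the bottleneck layer is easiest to see by counting the parameters of the dense layer that follows it. The figures below are assumptions for illustration (7x7 feature maps with 2048 channels and 1000 output classes); the comparison is against flattening the feature maps without pooling.

```python
height, width, channels = 7, 7, 2048  # assumed shape of the final feature maps
n_classes = 1000  # assumed number of output classes

# Global average pooling leaves one value per feature map.
pooled_features = channels
dense_after_pooling = pooled_features * n_classes + n_classes  # weights + biases

# Flattening keeps every spatial position as a separate feature.
flattened_features = height * width * channels
dense_after_flatten = flattened_features * n_classes + n_classes

print(dense_after_pooling, dense_after_flatten)  # 2049000 100353000
```

Under these assumptions, pooling first makes the classifier layer roughly 49 times smaller.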

### Multilayer output

Today, we no longer build standalone models, but applications that are an amalgamation, or composition, of models. As a result, we no longer treat the task component as having a single output.

Depending on how the model is connected to other models in the application, there can be up to four outputs:

- Feature extraction
    - high dimensionality (encoding)
    - low dimensionality (embedding) - feature vector
- Prediction
    - prediction pre-activation (probabilities) - soft targets
    - post-activation (outputs) - hard targets

In [33]:
from tensorflow.keras.layers import Activation, Dense, GlobalAveragePooling2D


def classifier(inputs, n_classes):
    """The output classifier
    inputs: input tensor to the classifier
    n_classes: number of output classes
    """

    # High-dimensionality feature extraction (encoding)
    encodings = inputs

    # Low-dimensionality feature extraction (embedding).
    embeddings = GlobalAveragePooling2D()(inputs)

    # Pre-activation probabilities (soft labels)
    probabilities = Dense(n_classes)(embeddings)

    # Post-activation probabilities (hard labels)
    outputs = Activation("softmax")(probabilities)

    # Returns a tuple of all four outputs.
    return encodings, embeddings, probabilities, outputs
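
The difference between the pre-activation and post-activation outputs can be sketched in plain Python. The three values below are hypothetical outputs of the `Dense(n_classes)` layer, and `softmax` is a hand-written stand-in for what `Activation("softmax")` computes.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


pre_activation = [2.0, 1.0, 0.1]  # hypothetical output of Dense(n_classes)
post_activation = softmax(pre_activation)  # what the softmax activation returns

print([round(p, 3) for p in post_activation])  # [0.659, 0.242, 0.099]
```

The pre-activation values are not constrained to sum to 1; the softmax rescales them so that they do, while preserving their ranking. A downstream model can consume either form, which is why both are returned.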

### SqueezeNet

In [34]:
from tensorflow.keras.layers import Activation, Conv2D, GlobalAveragePooling2D


def classifier(inputs, n_classes):
    """Construct the Classifier
    inputs: input tensor to the classifier
    n_classes: number of output classes
    """

    # Sets the number of filters equal to the number of output classes.
    encoding = Conv2D(n_classes, (1, 1), strides=1, activation="relu", padding="same")(
        inputs
    )

    # Reduce each feature map (class) to a single value (soft label).
    embedding = GlobalAveragePooling2D()(encoding)

    # Use softmax to squash all the class probabilities to add up to 100%.
    outputs = Activation("softmax")(embedding)
    return outputs
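
A tiny worked example of this classifier's arithmetic, assuming two classes and 2x2 feature maps (the class names and values are made up): the 1x1 convolution yields one feature map per class, global average pooling collapses each map to a single value, and softmax turns those values into probabilities.

```python
import math

# Hypothetical 2x2 feature maps, one per class, as produced by the 1x1 convolution.
class_maps = {
    "cat": [[1.0, 3.0], [2.0, 2.0]],
    "dog": [[0.0, 1.0], [1.0, 2.0]],
}

# Global average pooling: one value per class map.
pooled = {
    name: sum(map(sum, fmap)) / sum(map(len, fmap))
    for name, fmap in class_maps.items()
}

# Softmax turns the pooled values into probabilities that sum to 1.
total = sum(math.exp(v) for v in pooled.values())
probabilities = {name: math.exp(v) / total for name, v in pooled.items()}

print(pooled)  # {'cat': 2.0, 'dog': 1.0}
print({k: round(v, 3) for k, v in probabilities.items()})  # {'cat': 0.731, 'dog': 0.269}
```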