# Creating a Custom Model: ResNet

In this notebook, we'll go deeper into Model subclassing by building a more intricate architecture: a Residual Network (ResNet). ResNets introduced skip connections, which give gradients a direct path through the network and make very deep networks significantly easier to train.

A skip connection (or shortcut) adds a block's input directly to its output, jumping over one or more layers. A typical ResNet stacks several such blocks, each with its own skip connection.
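
One way to make the gradient claim concrete is to write a block as a residual function $F$ applied to its input $x$. The symbols $F$, $x$, $y$, and the loss $\mathcal{L}$ are introduced here for illustration, and the sketch ignores the activation that usually follows the addition:

$$
y = F(x) + x, \qquad \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F}{\partial x} + I\right)
$$

The identity term $I$ gives the gradient a direct route back to $x$, so the signal reaching earlier layers does not vanish even when $\partial F / \partial x$ is small.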

##### **Steps for Implementing ResNet as a Subclassed Model**
1. **Model Subclassing:** To manage the complexity inherent in Residual Networks and to enhance code reusability, we'll define our ResNet architecture by subclassing the Model class from Keras. This approach provides a structured way to encapsulate the network's behavior, keeping the code organized and modular.
2. **Building Blocks:** Within our ResNet class, we can define separate methods or sub-classes for different blocks of layers. Each block can have its own skip connections and sequence of layers. This modular block design makes it easier to experiment with different layer configurations and to scale the architecture up or down.
3. **Utilizing Keras Functionalities:** By inheriting from the Model class, our custom ResNet gains the standard functionality of Keras models, including the `compile()`, `fit()`, `evaluate()`, and `predict()` methods. This lets us train, evaluate, and apply the model with the same API as any built-in Keras model.
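
The subclassing pattern described in the steps above can be sketched on a much smaller model before applying it to ResNet. `TinyClassifier`, its layer sizes, and the random data are hypothetical and exist only for illustration:

```python
import tensorflow as tf

class TinyClassifier(tf.keras.Model):
    """Hypothetical toy model, used only to show the subclassing pattern."""

    def __init__(self, num_classes):
        super().__init__()
        # layers are created once, in __init__
        self.hidden = tf.keras.layers.Dense(16, activation='relu')
        self.out = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs):
        # layers are connected here, in the forward pass
        return self.out(self.hidden(inputs))

model = TinyClassifier(num_classes=3)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# random stand-in data: 32 samples with 8 features and labels in {0, 1, 2}
features = tf.random.normal((32, 8))
labels = tf.random.uniform((32,), maxval=3, dtype=tf.int32)

# compile(), fit(), and predict() are all inherited from tf.keras.Model
history = model.fit(features, labels, epochs=1, batch_size=8, verbose=0)
predictions = model.predict(features, verbose=0)
print(predictions.shape)
```

Nothing in `TinyClassifier` defines training or inference logic; only the layers and the forward pass are written by hand.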


##### **Advantages of Model Subclassing for Complex Architectures**
- **Flexibility**: Subclassing provides the flexibility to implement models that have complex, non-linear topologies with varying layer connections, such as loops, branches, and multi-output configurations.
- **Clarity and Modularity**: Organizing the ResNet into a class with methods for each block of layers keeps the code clean and understandable. It also simplifies the process of modifying the architecture or adapting it to different tasks.

Building a Residual Network with model subclassing is an excellent way to practice designing advanced neural network architectures. It helps in understanding how gradients flow through deep networks and prepares you to implement and manage other sophisticated models in future projects.

## Imports

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Layer

## Implement Model subclasses

One of the foundational elements we'll focus on is the Identity Block. This block is crucial for understanding how skip connections make deep networks easier to train. Because the block's output is its input plus a learned correction, the block can easily represent the identity function (by driving the correction toward zero) and otherwise only adjusts the incoming activations incrementally. The direct path through the addition also helps mitigate the vanishing gradient problem in deep networks.


- **Class Structure:** The Identity Block will be implemented as a subclass of the Model class from Keras. This structure will allow the Identity Block to integrate seamlessly with other Keras functionalities and to be reusable across different parts of the network or in different models.
- **Initialization Method (`__init__()`):** Within the initialization method, we will define all the necessary components of the Identity Block. This typically includes convolutional layers, batch normalization layers, and activation layers. Each component is initialized and configured here, ready to be connected in the `call()` method.
- **Forward Pass (`call()`):** The `call()` method defines how data flows through the block. Inputs pass through the convolutional and normalization layers (the main path), and the result is added back to the block's original input through a skip connection, using the `Add` layer, before a final ReLU activation. For the addition to work, the main path's output must have the same shape as the block's input.

In [None]:
class IdentityBlock(tf.keras.Model):
    def __init__(self, filters, kernel_size):
        super(IdentityBlock, self).__init__(name='')

        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()

        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()

        self.act = tf.keras.layers.Activation('relu')
        self.add = tf.keras.layers.Add()

    def call(self, input_tensor):
        x = self.conv1(input_tensor)
        x = self.bn1(x)
        x = self.act(x)

        x = self.conv2(x)
        x = self.bn2(x)

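        # skip connection: x and input_tensor must have the same shape,
        # so `filters` has to equal the number of input channels
        x = self.add([x, input_tensor])
        x = self.act(x)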
        return x
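
One consequence of the skip connection is that the main path must preserve the input's shape, which is why the block uses `padding='same'` and why `filters` has to match the number of input channels. A minimal sketch of that constraint, using standalone layers and arbitrary tensor sizes chosen only for illustration:

```python
import tensorflow as tf

# 64 input channels and 64 filters: main path and shortcut have the same shape
x_ok = tf.zeros((1, 8, 8, 64))
conv_ok = tf.keras.layers.Conv2D(64, 3, padding='same')
y_ok = tf.keras.layers.Add()([conv_ok(x_ok), x_ok])
ok_shape = tuple(y_ok.shape)
print(ok_shape)

# 32 input channels but 64 filters: the two tensors can no longer be added
x_bad = tf.zeros((1, 8, 8, 32))
conv_bad = tf.keras.layers.Conv2D(64, 3, padding='same')
try:
    tf.keras.layers.Add()([conv_bad(x_bad), x_bad])
    mismatch_raised = False
except (ValueError, tf.errors.InvalidArgumentError):
    mismatch_raised = True
print(mismatch_raised)
```

This is why an identity block is normally placed after a layer that already produces the desired number of channels.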

With `IdentityBlock` implemented, the next step is to build the full Residual Network (ResNet) model by placing our custom Identity Blocks inside a larger network. The model is built in three steps:

1. **Setting Up the Initial Layers:** The ResNet model begins with standard layers that prepare the input for deeper processing. This includes a convolutional layer followed by batch normalization and a ReLU activation to introduce non-linearity. A max pooling layer is then used to reduce the spatial dimensions of the output, condensing the feature maps and reducing computational overhead in deeper layers.
2. **Incorporating Identity Blocks:** The core of the ResNet model is a sequence of Identity Blocks. In this model, we instantiate `IdentityBlock` twice (`id1a` and `id1b`), each with 64 filters to match the 64 channels produced by the initial convolution. The data passes through the two blocks in order, and each block applies its internal layers and its own skip connection.
3. **Completing the Network:** After the data passes through the Identity Blocks, it is pooled globally to reduce each feature map to a single number, effectively summarizing the features extracted by the network. Finally, a dense layer with a softmax activation function classifies the input into the desired number of classes.

In [None]:
class ResNet(tf.keras.Model):
    def __init__(self, num_classes):
        super(ResNet, self).__init__()
        self.conv = tf.keras.layers.Conv2D(64, 7, padding='same')
        self.bn = tf.keras.layers.BatchNormalization()
        self.act = tf.keras.layers.Activation('relu')
        self.max_pool = tf.keras.layers.MaxPool2D((3, 3))

        # Use the Identity blocks that we just defined
        self.id1a = IdentityBlock(64, 3)
        self.id1b = IdentityBlock(64, 3)

        self.global_pool = tf.keras.layers.GlobalAveragePooling2D()
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs):
        x = self.conv(inputs)
        x = self.bn(x)
        x = self.act(x)
        x = self.max_pool(x)

        # insert the identity blocks in the middle of the network
        x = self.id1a(x)
        x = self.id1b(x)

        x = self.global_pool(x)
        return self.classifier(x)
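
To follow how tensor shapes change through this model, the arithmetic can be sketched in plain Python for a 28×28×1 MNIST image. The helper names are hypothetical. The formulas assume stride-1 `'same'` convolutions and the Keras pooling defaults (`'valid'` padding, stride equal to the pool size):

```python
def conv_same(size, stride=1):
    # 'same' padding: output size is ceil(size / stride)
    return -(-size // stride)

def pool_valid(size, pool, stride=None):
    # 'valid' padding: floor((size - pool) / stride) + 1
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

size, channels = 28, 1
stages = [("input", (size, size, channels))]

size, channels = conv_same(size), 64          # Conv2D(64, 7, padding='same')
stages.append(("conv", (size, size, channels)))

size = pool_valid(size, 3)                    # MaxPool2D((3, 3))
stages.append(("max_pool", (size, size, channels)))

for name in ("id1a", "id1b"):                 # identity blocks preserve the shape
    size = conv_same(size)
    stages.append((name, (size, size, channels)))

stages.append(("global_pool", (channels,)))   # one number per feature map
stages.append(("classifier", (10,)))          # one probability per class

for name, shape in stages:
    print(f"{name:12s} {shape}")
```

The pooling step shrinks each 28×28 feature map to 9×9, and both identity blocks keep that 9×9×64 shape, as the skip connection requires.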

## Training the Model

With our ResNet model defined as a subclass of the Keras Model class, we can use the standard Keras workflow for training, saving, and evaluating it.

#### **Key Features Leveraged from Keras:**
1. **Training:** We can train the model using the standard `.fit()` method, which handles everything from forward propagation to backpropagation and updating the model weights.
2. **Serialization:** Keras provides methods for saving and loading a model's state, such as `save_weights()` and `load_weights()`, which is crucial for deployment or for continuing training later.
3. **Evaluation:** After training, we can use `.evaluate()` to test the model’s performance on new, unseen data, providing metrics such as loss and accuracy.


Once the ResNet model class is defined, we can instantiate and train it just like any other Keras model.

In [None]:
# utility function to normalize the images and return (image, label) pairs.
def preprocess(features):
    return tf.cast(features['image'], tf.float32) / 255., features['label']

# create a ResNet instance with 10 output units for MNIST
resnet = ResNet(10)
resnet.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# load and preprocess the dataset
dataset = tfds.load('mnist', split=tfds.Split.TRAIN)
dataset = dataset.map(preprocess).batch(32)

# train the model
resnet.fit(dataset, epochs=1)
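
Evaluation and serialization, the other two features listed earlier, follow the same pattern. The sketch below uses a small stand-in model and synthetic MNIST-shaped data so that it runs on its own. The stand-in model, the random data, and the file name `resnet.weights.h5` are assumptions made for illustration:

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical stand-in for the trained `resnet`; any compiled Keras model
# exposes the same evaluate/save_weights/load_weights methods.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# synthetic stand-in for a held-out split: 64 MNIST-shaped images with random labels
images = tf.random.uniform((64, 28, 28, 1))
labels = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
test_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

# evaluate() returns the loss and every metric passed to compile()
results = model.evaluate(test_ds, return_dict=True, verbose=0)
print(sorted(results))

# save the weights to disk and restore them into the same architecture
weights_path = os.path.join(tempfile.mkdtemp(), 'resnet.weights.h5')
model.save_weights(weights_path)
model.load_weights(weights_path)
```

With the objects from this notebook, the same calls would be made on `resnet`, with a held-out split such as `tfds.load('mnist', split=tfds.Split.TEST).map(preprocess).batch(32)` in place of the synthetic dataset.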


