### Practical Examples

In this notebook we work through some practical examples of the Keras subclassing API, cementing the concept by building larger models such as `ResNet`.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

Let's say we have the following architecture, which is used repeatedly when building our model.

```
Conv2D -> BatchNormalization -> ReLU (the common structure for our model.)
```
Let's subclass this architecture block.


In [3]:
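class ConvBlock(keras.layers.Layer):
  """Conv2D -> BatchNormalization -> ReLU."""

  def __init__(self, output_features, kernel_size=3, padding="same"):
    super().__init__()
    # "same" padding keeps the spatial size, so the output can later be
    # added to a skip connection without a shape mismatch.
    self.conv = keras.layers.Conv2D(output_features, kernel_size, padding=padding)
    self.bn = keras.layers.BatchNormalization()

  def call(self, x, training=False):
    x = self.conv(x)
    # BatchNormalization behaves differently in training and inference.
    x = self.bn(x, training=training)
    return keras.activations.relu(x)

Let's now build a sequential model with three of these layers, just as we did in the previous notebook, and add an output head to it.


In [7]:
model_1 = keras.Sequential([
    # padding="valid" shrinks each feature map by 2 pixels per block.
    ConvBlock(64, padding="valid"),
    ConvBlock(128, padding="valid"),
    ConvBlock(256, padding="valid"),

    # Output blocks
    keras.layers.GlobalMaxPool2D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])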
model_1.build((None, 32, 32, 3))
model_1.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv_block_6 (ConvBlock)     (None, 30, 30, 64)        2048      
_________________________________________________________________
conv_block_7 (ConvBlock)     (None, 28, 28, 128)       74368     
_________________________________________________________________
conv_block_8 (ConvBlock)     (None, 26, 26, 256)       296192    
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 64)                16448     
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650       
Total params: 389,706
Trainable params: 388,810
Non-trainable params: 896
______________________________________________
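
The parameter counts in the summary can be reproduced by hand. The sketch below is plain Python arithmetic rather than notebook code; the helper names `conv_block_params` and `dense_params` are made up for illustration. It assumes 3x3 kernels with bias, and four per-channel BatchNormalization vectors (gamma, beta, moving mean, moving variance), of which the two moving statistics are non-trainable.

```python
def conv_block_params(in_channels, out_channels, kernel_size=3):
    """Parameters of Conv2D (with bias) followed by BatchNormalization."""
    conv = kernel_size * kernel_size * in_channels * out_channels + out_channels
    batch_norm = 4 * out_channels  # gamma, beta, moving mean, moving variance
    return conv + batch_norm


def dense_params(in_features, out_features):
    """Parameters of a Dense layer with bias."""
    return in_features * out_features + out_features


channels = [3, 64, 128, 256]  # RGB input followed by the three ConvBlocks
block_counts = [conv_block_params(c_in, c_out)
                for c_in, c_out in zip(channels, channels[1:])]
head_counts = [dense_params(256, 64), dense_params(64, 10)]

total = sum(block_counts) + sum(head_counts)
# Only the moving mean and variance of each BatchNormalization are frozen.
non_trainable = sum(2 * c for c in channels[1:])

print(block_counts)  # [2048, 74368, 296192]
print(head_counts)   # [16448, 650]
print(total, total - non_trainable, non_trainable)  # 389706 388810 896
```

These numbers match the `Param #` column and the totals printed by `model_1.summary()`.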

### Residual Block (`ResBlock`)
Once we have covered the theory, we will build the `ResBlock` using the subclassing API.

#### Theory
In deep learning, deeper networks generally have more representational power and tend to perform better. However, depth also brings problems such as **vanishing gradients** and exploding gradients. Many techniques have been proposed to mitigate them, such as **batch normalization** layers and **ReLU** activations, but their effect remained limited until **skip connections** came into widespread use.

<p align="center"><img src="https://miro.medium.com/max/472/1*Cc3o7Hq7aMb0JPb9UuuxzA.png" /></p>

Skip connections are the hallmark of ResNet. They are a way to avoid vanishing gradients: because the block outputs `x + F(x)`, its derivative with respect to `x` is `1 + F'(x)`, so even if `F'(x)` is small the error can still be backpropagated.

**ResBlock in code:**

````python
Output = x + Conv2(Conv1(x))
````
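
The argument that a skip connection adds 1 to the derivative can be made concrete with a toy calculation. The sketch below assumes a chain of 20 layers whose local derivative is 0.1 each; both numbers are made up for illustration and real networks are far less uniform. Without skip connections the chain rule multiplies the local derivatives, while with them each factor becomes 1 + 0.1.

```python
import math

depth = 20              # hypothetical number of stacked layers
local_derivative = 0.1  # hypothetical derivative F'(x) of each layer

# Plain stack: y = F(x) per layer, so the gradient is a product of F'(x).
plain_gradient = math.prod([local_derivative] * depth)

# Residual stack: y = x + F(x) per layer, so each factor is 1 + F'(x).
residual_gradient = math.prod([1.0 + local_derivative] * depth)

print(f"plain:    {plain_gradient:.3e}")
print(f"residual: {residual_gradient:.3e}")
```

In this toy setting the plain gradient shrinks to roughly 1e-20, while the residual gradient stays above 1.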

### The `ResBlock` layer

In [8]:
class ResBlock(keras.layers.Layer):
  def __init__(self, channels):
    super().__init__()
    self.conv1 = ConvBlock(channels[0])
    self.conv2 = ConvBlock(channels[1])
    self.conv3 = ConvBlock(channels[2])
    self.pooling = keras.layers.MaxPool2D()
    # Projects the input to channels[1] so it can be added to the conv2 output.
    self.identity_mapping = keras.layers.Conv2D(channels[1], 3, padding="same")

  def call(self, x, training=False):
    input_tensor = x
    x = self.conv1(x, training=training)
    x = self.conv2(x, training=training)
    # Skip connection: add the projected input before the third block.
    x = self.conv3(x + self.identity_mapping(input_tensor), training=training)
    return self.pooling(x)
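
The element-wise addition in `call` only works if the main path and the skip path have the same spatial size. The following sketch traces shapes through a `ResBlock` in plain Python, assuming 3x3 kernels with stride 1 and the default 2x2 `MaxPool2D`; the helper names `conv_output_size` and `res_block_output_shape` are hypothetical and exist only for this illustration.

```python
def conv_output_size(size, kernel_size=3, padding="same"):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size
    return size - kernel_size + 1  # "valid" padding


def res_block_output_shape(height, width, channels, padding="same"):
    """Trace the (height, width, channels) shape through one ResBlock."""
    main_h, main_w = height, width
    for _ in range(2):  # conv1 and conv2 on the main path
        main_h = conv_output_size(main_h, padding=padding)
        main_w = conv_output_size(main_w, padding=padding)
    # identity_mapping uses padding="same", so the skip path keeps its size.
    if (main_h, main_w) != (height, width):
        raise ValueError(
            f"cannot add {main_h}x{main_w} main path to {height}x{width} skip path")
    out_h = conv_output_size(main_h, padding=padding)  # conv3
    out_w = conv_output_size(main_w, padding=padding)
    return out_h // 2, out_w // 2, channels[2]  # MaxPool2D halves H and W


shape = (32, 32, 3)
for channels in ([32, 64, 128], [128, 128, 256], [128, 256, 512]):
    shape = res_block_output_shape(shape[0], shape[1], channels)
    print(shape)  # (16, 16, 128), then (8, 8, 256), then (4, 4, 512)
```

Calling `res_block_output_shape(32, 32, [32, 64, 128], padding="valid")` raises the `ValueError`, because two unpadded 3x3 convolutions shrink the main path to 28x28 while the skip path stays at 32x32.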

### Create a `ResNet` model

In [41]:
class ResNetModel(keras.Model):
  def __init__(self):
    super().__init__()
    self.block1 = ResBlock([32, 64, 128])
    self.block2 = ResBlock([128, 128, 256])
    self.block3 = ResBlock([128, 256, 512])
    self.pool = keras.layers.GlobalAveragePooling2D()
    self.classifier = keras.layers.Dense(10)

  def call(self, x, training=False):
    x = self.block1(x, training=training)
    x = self.block2(x, training=training)
    x = self.block3(x, training=training)
    x = self.pool(x)  # pooling has no training-specific behavior
    return self.classifier(x)

  def model(self):
    x = keras.layers.Input(shape=(32, 32, 3))
    return keras.Model(inputs=[x], outputs=self.call(x))

### Creating the dataset to test our `ResNetModel`.

In [39]:
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_test_tensors = tf.convert_to_tensor(X_test.reshape(-1, 32, 32, 3)/255.0, dtype=tf.float32)
X_train_tensors = tf.convert_to_tensor(X_train.reshape(-1, 32, 32, 3)/255.0, dtype=tf.float32)
y_test_tensors = tf.one_hot(tf.squeeze(y_test), depth=10)
y_train_tensors = tf.one_hot(tf.squeeze(y_train), depth=10)

In [42]:
model_2 = ResNetModel()
model_2.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model_2.fit(X_train_tensors, y_train_tensors, epochs=2, verbose=1, validation_data=(X_test_tensors, y_test_tensors))
model_2.model().summary()

Epoch 1/2
Epoch 2/2
Model: "model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
res_block_36 (ResBlock)      (None, 16, 16, 128)       95936     
_________________________________________________________________
res_block_37 (ResBlock)      (None, 8, 8, 256)         739968    
_________________________________________________________________
res_block_38 (ResBlock)      (None, 4, 4, 512)         2364032   
_________________________________________________________________
global_average_pooling2d_11  (None, 512)               0         
_________________________________________________________________
dense_19 (Dense)             (None, 10)                5130      
Total params: 3,205,066
Trainable params: 3,201,802
Non-trainable params: 3,264
_________________________
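
`ResNetModel` ends in `Dense(10)` with no activation, so it outputs raw logits, which is why the loss is created with `from_logits=True`. The plain-Python sketch below is not the Keras implementation, and the logits in it are made up. It shows two equivalent ways of computing categorical cross-entropy for a single example: applying softmax first, or working directly from the logits via log-sum-exp.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(one_hot, probs):
    """Cross-entropy when the model already outputs probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def cross_entropy_from_logits(one_hot, logits):
    """Cross-entropy computed directly from logits via log-sum-exp."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return sum(t * (log_sum_exp - z) for t, z in zip(one_hot, logits))


logits = [2.0, -1.0, 0.5]  # hypothetical raw outputs of the last Dense layer
one_hot = [1.0, 0.0, 0.0]  # true class is index 0

loss_a = cross_entropy_from_probs(one_hot, softmax(logits))
loss_b = cross_entropy_from_logits(one_hot, logits)
print(loss_a, loss_b)  # the two values agree up to rounding
```

If the last layer already applied softmax, as in `model_1`, a loss configured with `from_logits=True` would effectively apply softmax twice, so the flag has to match the output layer.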

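**Conclusion:** The subclassing API is a more flexible way of building models: layers and models are ordinary Python classes that can be composed, as `ConvBlock`, `ResBlock` and `ResNetModel` show.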

✔ **Other resources:**

[towardsdatascience](https://towardsdatascience.com/model-sub-classing-and-custom-training-loop-from-scratch-in-tensorflow-2-cc1d4f10fb4e)