## Setup and Imports

This cell imports the required modules from TensorFlow (Keras) and NumPy, and suppresses Python warnings to keep the output clean.

The CIFAR-10 dataset is a collection of 60,000 32x32 color images in 10 classes, with 6,000 images per class. It is commonly used for image classification tasks, and TensorFlow can load it directly with `cifar10.load_data()`.

The dataset is split into a training set of 50,000 images and a test set of 10,000 images. The training set is used to fit the model, and the test set is used to evaluate its performance.

In [1]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, BatchNormalization, Activation
from tensorflow.keras.layers import AveragePooling2D, Input, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
import numpy as np
import os
import warnings
warnings.filterwarnings('ignore')


## Configuration

This cell defines the configuration for training the ResNet model. `model_type` is set to 'ResNet' and is used later in the checkpoint file name. `batch_size` is 32, the number of samples per gradient update. Training runs for 120 epochs, and `data_augmentation` is set to True so that the training images are augmented on the fly to improve generalization. `num_classes` is 10, the number of classes in CIFAR-10.

`subtract_pixel_mean` is enabled, so the per-pixel mean of the training set is subtracted from the inputs; centering the data this way can help convergence. The model's depth is controlled by `n`, the number of residual blocks in each of the three stages, and is computed as `depth = 6n + 2`. With `n = 3` this gives a depth of 20 (ResNet20). The `version` parameter records which ResNet variant is intended (version 1 here).

In [2]:
model_type = 'ResNet'

# Training parameters
batch_size = 32
epochs = 120
data_augmentation = True
num_classes = 10

# Subtracting pixel mean improves accuracy
subtract_pixel_mean = True

# Model parameter
n = 3

# Model version
version = 1

# Computed depth from supplied model parameter n
depth = n * 6 + 2
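
The depth formula can be checked with a few lines of plain Python. This is a minimal sketch that assumes only the `6n + 2` rule from the configuration cell; the helper name `depth_from_n` is hypothetical and not part of the notebook.

```python
def depth_from_n(n):
    """Return the ResNet v1 depth for n residual blocks per stage (6n + 2)."""
    return 6 * n + 2

# n values that correspond to the depths listed in the resnet_v1 docstring
for n in (3, 5, 7, 9, 18):
    print(f"n={n:2d} -> ResNet{depth_from_n(n)}")
```

With `n = 3`, as configured above, the result is the 20-layer variant.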


## Data Loading and Preprocessing

The CIFAR-10 dataset is loaded and preprocessed for training the ResNet model. It is loaded with the `cifar10.load_data()` function, and the training and test sets are unpacked into `(x_train, y_train)` and `(x_test, y_test)`, respectively.

The input_shape is determined based on the shape of the input images in the training data. Next, the data is normalized by dividing the pixel values by 255, converting them to floating-point numbers between 0 and 1.

If `subtract_pixel_mean` is enabled, the mean image of the training set is computed (an average over all training images, giving one value per pixel position and channel) and subtracted from both the training and test sets. The test set is centered with the training mean rather than its own, so both sets go through the same transformation.

Finally, the class labels are converted to binary class matrices using one-hot encoding, where each class label is represented as a binary vector with a 1 at the index corresponding to the class and 0s elsewhere. This transformation is performed using the tf.keras.utils.to_categorical() function, enabling categorical classification during model training.

In [3]:
# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

# Convert class vectors to binary class matrices.
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
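
The two preprocessing steps above can be illustrated on a toy array without downloading CIFAR-10. The sketch below is an assumption-laden stand-in: the random arrays, the `_toy` and `_centered` names, and the use of `np.eye` for one-hot encoding are illustrative choices, not the notebook's code (which uses `tf.keras.utils.to_categorical`).

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for CIFAR-10: 8 training and 4 test images of 32x32x3 in [0, 1)
x_train_toy = rng.random((8, 32, 32, 3), dtype=np.float32)
x_test_toy = rng.random((4, 32, 32, 3), dtype=np.float32)

# axis=0 averages over images, giving one mean value per pixel and channel
pixel_mean = np.mean(x_train_toy, axis=0)
x_train_centered = x_train_toy - pixel_mean
x_test_centered = x_test_toy - pixel_mean  # reuse the training mean

# One-hot encoding: row i of the identity matrix is the vector for class i
labels = np.array([[3], [0], [9]])  # shape (N, 1), like the CIFAR-10 labels
one_hot = np.eye(10, dtype=np.float32)[labels.ravel()]

print(pixel_mean.shape)  # (32, 32, 3)
print(one_hot.shape)     # (3, 10)
```

Note that the mean has the shape of a single image, so subtraction broadcasts across the batch dimension.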


## Learning Rate Schedule

The `lr_schedule` function defines a learning rate schedule for adjusting the learning rate during training epochs. This schedule is designed to reduce the learning rate at specific epochs to improve model convergence and performance. The function takes the current epoch number as input and returns the corresponding learning rate.

The base learning rate `lr` is `1e-3` (0.001). The function then checks the current epoch and multiplies the base rate by a factor that depends on the highest threshold passed:

- If the epoch is greater than 180, the rate is multiplied by `0.5e-3`, giving `5e-7`.
- If the epoch is greater than 160, the rate is multiplied by `1e-3`, giving `1e-6`.
- If the epoch is greater than 120, the rate is multiplied by `1e-2`, giving `1e-5`.
- If the epoch is greater than 80, the rate is multiplied by `1e-1`, giving `1e-4`.
- Otherwise, the base rate of `1e-3` is used unchanged.

After calculating the new learning rate based on the epoch, the function prints the updated learning rate for monitoring purposes. Finally, the computed learning rate is returned to be used in the training process.

This schedule lowers the learning rate in steps as training progresses, which can help stabilize the later stages of training. Note that with `epochs = 120` as configured above, Keras passes epoch indices 0 through 119, so only the first reduction (after epoch 80) takes effect in this run.

In [4]:
def lr_schedule(epoch):
    """Learning Rate Schedule
    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.
    # Arguments
        epoch (int): index of the current epoch (0-based)
    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
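
To see which rates a run actually encounters, the thresholds can be evaluated outside of training. The sketch below re-implements the same branches in a print-free helper; the name `lr_at` is hypothetical, and the 0-based epoch range reflects how Keras calls the scheduler.

```python
def lr_at(epoch, base_lr=1e-3):
    """Same thresholds as lr_schedule, without the print statement."""
    if epoch > 180:
        return base_lr * 0.5e-3
    if epoch > 160:
        return base_lr * 1e-3
    if epoch > 120:
        return base_lr * 1e-2
    if epoch > 80:
        return base_lr * 1e-1
    return base_lr

# Keras passes 0-based epoch indices, so a 120-epoch run sees epochs 0..119
rates_seen = sorted({lr_at(e) for e in range(120)})
print(rates_seen)

# Rates on either side of each threshold
for e in (0, 80, 81, 121, 161, 181):
    print(e, lr_at(e))
```

For a 120-epoch run, only two distinct rates appear: the base rate and the first reduced rate.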


## ResNet Utilities

The `resnet_layer` function constructs a 2D Convolution-Batch Normalization-Activation stack for building the ResNet model. This function is essential for creating the residual blocks within the network architecture. It provides flexibility in configuring convolutional layers with or without batch normalization and activation functions.

- **Inputs**: 
  - `inputs`: Input tensor from the image or the previous layer.
  - `num_filters`: Number of filters for the Conv2D layer.
  - `kernel_size`: Size of the convolutional kernel.
  - `strides`: Strides for the convolution operation.
  - `activation`: Activation function to be applied.
  - `batch_normalization`: Boolean indicating whether to include batch normalization.
  - `conv_first`: Boolean indicating the order of operations.

- **Returns**: 
  - `x`: Tensor as input to the next layer.

This function starts by defining a Conv2D layer with specified parameters such as the number of filters, kernel size, strides, padding, and kernel initializer. Then, it applies batch normalization and activation functions according to the specified order (`conv_first`). If `conv_first` is True, convolution is applied first followed by batch normalization and activation. Otherwise, batch normalization and activation are applied before convolution.

### ResNet Version 1 Model

The `resnet_v1` function constructs the ResNet Version 1 model, which stacks residual blocks of two convolutional layers each. Skip connections add each block's input to its output, which makes very deep networks easier to train and mitigates the vanishing gradient problem.

- **Inputs**: 
  - `input_shape`: Shape of the input image tensor.
  - `depth`: Number of core convolutional layers.
  - `num_classes`: Number of classes (CIFAR10 has 10).

- **Returns**: 
  - `model`: Keras model instance.

This function first validates the depth parameter to ensure it follows the ResNet architecture guidelines. Then, it proceeds to define the model by stacking residual units with convolutional layers. The number of residual blocks and filters are determined based on the provided depth. Finally, the model concludes with an average pooling layer, a flatten layer, and a dense output layer with softmax activation for classification.
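
The way feature map size and filter count change across the three stages can be traced with simple arithmetic. This is a rough sketch under the assumptions stated in the function's docstring (32x32 input, 16 initial filters, stride-2 downsampling at the start of stages 1 and 2); the helper `trace_resnet_v1_shapes` is hypothetical and builds no Keras layers.

```python
def trace_resnet_v1_shapes(input_size=32, depth=20):
    """Return (stage, feature_map_size, num_filters) for the three stages."""
    if (depth - 2) % 6 != 0:
        raise ValueError("depth should be 6n+2")
    size, filters = input_size, 16
    stages = []
    for stage in range(3):
        if stage > 0:
            size //= 2  # first block of the stage uses strides=2
        stages.append((stage, size, filters))
        filters *= 2    # the next stage doubles the filter count
    return stages

stages = trace_resnet_v1_shapes()
for stage, size, filters in stages:
    print(f"stage {stage}: {size}x{size}, {filters} filters")

# 8x8 average pooling on the final 8x8 map leaves one value per filter
final_size, final_filters = stages[-1][1], stages[-1][2]
flattened_features = (final_size // 8) ** 2 * final_filters
print("features into the dense layer:", flattened_features)
```

The pool size of 8 therefore acts as a global average pool for 32x32 inputs.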

In [5]:
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder
    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)
    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]
    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map size.
    Feature map sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx the same as Table 6 of [a]:
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M
    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)
    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = tf.keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
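
The parameter counts quoted in the docstring can be approximated by hand. The sketch below is an independent tally that mirrors the layer structure of `resnet_v1` as written above (biased convolutions, four values per batch-normalization channel including the moving statistics, and 1x1 projection shortcuts without batch normalization); the function names are hypothetical, and the authoritative figure is whatever `model.summary()` reports.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out

def bn_params(channels):
    """gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels

def resnet_v1_param_count(n, num_classes=10, in_channels=3):
    """Tally parameters for a ResNet v1 with n residual blocks per stage."""
    total = conv_params(3, in_channels, 16) + bn_params(16)  # stem layer
    c_in, num_filters = 16, 16
    for stack in range(3):
        for block in range(n):
            # two 3x3 conv + BN layers in every residual block
            total += conv_params(3, c_in, num_filters) + bn_params(num_filters)
            total += conv_params(3, num_filters, num_filters) + bn_params(num_filters)
            if stack > 0 and block == 0:
                # 1x1 projection shortcut (no BN) where dimensions change
                total += conv_params(1, c_in, num_filters)
            c_in = num_filters
        num_filters *= 2
    total += c_in * num_classes + num_classes  # dense classifier
    return total

print(resnet_v1_param_count(3))
```

For `n = 3` the tally comes to roughly 0.27M, in line with the ResNet20 entry in the docstring.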


## Model Creation and Compilation

After constructing the ResNet Version 1 model using the `resnet_v1` function with the specified input shape and depth, the code compiles the model for training. The compilation involves configuring the model with a loss function, an optimizer, and evaluation metrics.

- **Loss Function**: Categorical cross-entropy is chosen as the loss function, suitable for multi-class classification tasks like CIFAR10.

- **Optimizer**: The Adam optimizer updates the model parameters during training. Its initial learning rate is obtained by calling `lr_schedule(0)`, which returns `1e-3` and prints the `Learning rate:  0.001` line shown in the output.

- **Metrics**: Accuracy is selected as the evaluation metric to monitor the model's performance during training. It measures the proportion of correctly classified images among the total number of images.

In [6]:
model = resnet_v1(input_shape=input_shape, depth=depth)

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])


Learning rate:  0.001
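
As a small worked example of the loss and metric chosen above, the following NumPy sketch computes categorical cross-entropy and accuracy by hand. The predictions and labels are made up for illustration, and Keras's own implementation additionally guards against `log(0)`, which this sketch omits.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 4 classes (each row sums to 1)
y_pred = np.array([[0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.5, 0.2, 0.1],
                   [0.1, 0.2, 0.3, 0.4]])
# One-hot ground truth: classes 0, 1, 2
y_true = np.eye(4)[[0, 1, 2]]

# Categorical cross-entropy: -sum(true * log(pred)) per sample, then the mean
per_sample_loss = -np.sum(y_true * np.log(y_pred), axis=1)
loss = per_sample_loss.mean()

# Accuracy: fraction of samples whose arg-max prediction matches the label
accuracy = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print("loss:", loss)
print("accuracy:", accuracy)
```

The third sample is misclassified (predicted class 3, true class 2), so the accuracy is two out of three even though every sample contributes to the loss.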


## Callbacks and Model Saving


This code prepares callbacks for model saving and learning rate adjustment during training. 

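- **Model Saving Directory**: Checkpoints are written to a `saved_models` directory under the current working directory, which is created if it does not exist. The file name template embeds the epoch number; because `.keras` is appended to a name that already ends in `.h5`, the saved files end in `.h5.keras`, as the training log shows.

- **Model Checkpoint**: This callback monitors the validation accuracy and, with `save_best_only=True`, writes a checkpoint only when the validation accuracy improves. Since the file name includes the epoch, each improvement produces a new file, and the most recent one holds the best-performing model.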

- **Learning Rate Scheduler**: The learning rate scheduler adjusts the learning rate during training according to a predefined schedule. It is based on the `lr_schedule` function defined earlier.

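- **Learning Rate Reducer**: This callback multiplies the learning rate by `sqrt(0.1)` when the validation loss has not improved for 5 epochs, down to a minimum of `0.5e-6`. Because the scheduler above also sets the learning rate at the start of every epoch, reductions made by this callback may be overwritten on the following epoch.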

- **Callbacks List**: Finally, all the defined callbacks are added to a list, which will be passed to the `fit` method during model training. These callbacks will be executed at various stages of training to control the training process and save the model's progress.

In [7]:
# Prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name + ".keras")

# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_accuracy',
                             verbose=1,
                             save_best_only=True)

lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]
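
Two details of this configuration are easy to check in isolation: how the checkpoint file name template expands, and what the `sqrt(0.1)` reduction factor amounts to. The sketch below expands the `{epoch:03d}` placeholder by hand with `str.format`, which is an illustration of the template rather than a call into Keras.

```python
import math

model_type = 'ResNet'
# Same template as the notebook: %-formatting fills in the model type first,
# leaving the {epoch:03d} placeholder for the checkpoint callback to fill in.
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
template = model_name + ".keras"

# Expanding the placeholder by hand for the first three epochs
for epoch in (1, 2, 3):
    print(template.format(epoch=epoch))

# ReduceLROnPlateau multiplies the learning rate by `factor` on each plateau;
# with factor = sqrt(0.1), two reductions amount to one factor-of-ten drop.
factor = math.sqrt(0.1)
print(factor ** 2)
```

The expanded names match the `.h5.keras` file names that appear in the training log.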


## Training

This cell trains the model, either with or without data augmentation, and then scores it on the test set.

- If `data_augmentation` is set to `False`, the model trains on the original dataset. It calls the `fit` method on the model with the training data (`x_train`, `y_train`), validation data (`x_test`, `y_test`), batch size, number of epochs, and the defined callbacks.

- If `data_augmentation` is set to `True`, real-time data augmentation is applied to the input data using the `ImageDataGenerator` provided by TensorFlow. With the settings used here, only random horizontal and vertical shifts (up to 10% of the image size) and random horizontal flips are active; rotation, shear, zoom, and the normalization options are left disabled. These transformations increase the diversity of the training data and improve the model's generalization ability. The `fit` method is then called on the model using the batches generated by `datagen.flow()`, along with the validation data and callbacks.
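
To make the augmentation concrete, the following NumPy sketch applies a random horizontal flip to a toy batch and converts the shift fraction into pixels. It is a simplified illustration, not `ImageDataGenerator`'s implementation; the function `random_horizontal_flip` and the random batch are assumptions made for the example.

```python
import numpy as np

def random_horizontal_flip(batch, rng, p=0.5):
    """Flip each image in an (N, H, W, C) batch left-right with probability p."""
    flip_mask = rng.random(batch.shape[0]) < p
    out = batch.copy()
    # Reversing axis 2 (width) mirrors the selected images horizontally
    out[flip_mask] = batch[flip_mask][:, :, ::-1, :]
    return out, flip_mask

rng = np.random.default_rng(0)
batch = rng.random((4, 32, 32, 3))
augmented, flipped = random_horizontal_flip(batch, rng)

# width_shift_range=0.1 on a 32-pixel-wide image means shifts of up to 3.2 px
max_shift_px = 0.1 * batch.shape[2]
print(flipped, max_shift_px)
```

Each image is either left unchanged or mirrored, so the labels remain valid while the network sees more varied inputs.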

In [8]:
# Run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True,
              callbacks=callbacks)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
              validation_data=(x_test, y_test),
              epochs=epochs, verbose=1, workers=4,
              callbacks=callbacks)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])


Using real-time data augmentation.
Learning rate:  0.001
Epoch 1/120
Epoch 1: val_accuracy improved from -inf to 0.50290, saving model to /home/studio-lab-user/sagemaker-studiolab-notebooks/saved_models/cifar10_ResNet_model.001.h5.keras
Learning rate:  0.001
Epoch 2/120
Epoch 2: val_accuracy improved from 0.50290 to 0.64830, saving model to /home/studio-lab-user/sagemaker-studiolab-notebooks/saved_models/cifar10_ResNet_model.002.h5.keras
Learning rate:  0.001
Epoch 3/120
Epoch 3: val_accuracy improved from 0.64830 to 0.69540, saving model to /home/studio-lab-user/sagemaker-studiolab-notebooks/saved_models/cifar10_ResNet_model.003.h5.keras
Learning rate:  0.001
Epoch 4/120
Epoch 4: val_accuracy improved from 0.69540 to 0.71390, saving model to /home/studio-lab-user/sagemaker-studiolab-notebooks/saved_models/cifar10_ResNet_model.004.h5.keras
Learning rate:  0.001
Epoch 5/120
Epoch 5: val_accuracy did not improve from 0.71390
Learning rate:  0.001
Epoch 6/120
Epoch 6: val_accuracy did not

In [10]:
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Test loss: 0.48086869716644287
Test accuracy: 0.906000018119812
