# Introduction to Computer Vision Project using Pre-trained ResNet101 on CIFAR-10 Dataset


*Introduction:*

Welcome to my computer vision project! In this project, we'll be leveraging the power of deep learning to tackle the task of image classification using the CIFAR-10 dataset. Our goal is to build an accurate image classifier capable of identifying objects in images across ten different classes.

To achieve this, we'll be employing transfer learning, a technique that allows us to leverage pre-trained neural network architectures and adapt them to our specific task. Specifically, we'll be using the ResNet101 model, which has been pre-trained on the ImageNet dataset, a large-scale dataset with millions of labeled images across thousands of classes.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
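
The integer labels returned with the dataset index into the class list above. A minimal sketch of that mapping follows; the names `CIFAR10_CLASSES` and `label_to_name` are illustrative helpers introduced here, not part of Keras.

```python
# CIFAR-10 labels are integers 0-9; this list follows the dataset's standard class order.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def label_to_name(label: int) -> str:
    """Map an integer CIFAR-10 label to its class name (hypothetical helper)."""
    return CIFAR10_CLASSES[int(label)]


print(label_to_name(3))  # cat
```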

Our approach involves fine-tuning the pre-trained ResNet101 model on the CIFAR-10 dataset. This allows us to benefit from the generalization capabilities learned by the model on ImageNet while adapting it to the specific characteristics of our target dataset.

In this project, we'll walk through the entire pipeline, from data preparation and preprocessing to model training, evaluation, and inference. By the end, we aim to have a robust image classifier capable of accurately predicting the classes of images from the CIFAR-10 dataset.

Let's dive into the code and start building our image classification pipeline!


Let's load the CIFAR-10 dataset and split it into training and validation sets. The first 5,000 training images are held out for validation, which leaves 45,000 for training.

In [12]:
import tensorflow as tf

tf.random.set_seed(42)

(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = X_train_full[5000:]
y_train = y_train_full[5000:]
X_valid = X_train_full[:5000]
y_valid = y_train_full[:5000]

In [13]:
X_train = tf.data.Dataset.from_tensor_slices(X_train)
y_train = tf.data.Dataset.from_tensor_slices(y_train)
X_valid = tf.data.Dataset.from_tensor_slices(X_valid)
y_valid = tf.data.Dataset.from_tensor_slices(y_valid)
X_test = tf.data.Dataset.from_tensor_slices(X_test)
y_test = tf.data.Dataset.from_tensor_slices(y_test)
train_dataset_raw = tf.data.Dataset.zip((X_train, y_train))
valid_dataset_raw = tf.data.Dataset.zip((X_valid, y_valid))
test_dataset_raw = tf.data.Dataset.zip((X_test, y_test))

In [15]:
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6
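
To see what the `mixed_float16` policy means in practice, the short sketch below inspects a policy object without changing the global setting. It assumes a TensorFlow version that exposes `tf.keras.mixed_precision.Policy`. Computations run in `float16` while weights are stored in `float32`, which is presumably why the output layer later in this notebook is created with `dtype="float32"`.

```python
import tensorflow as tf

# Create a policy object for inspection only; the global policy is left untouched.
policy = tf.keras.mixed_precision.Policy("mixed_float16")

print(policy.compute_dtype)   # dtype used for layer computations
print(policy.variable_dtype)  # dtype used to store the weights
```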


# Data preprocessing

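The ResNet-101 model was pre-trained on 224x224 pixel images, so we upscale the 32x32 CIFAR-10 images to that size. It also expects its inputs to be preprocessed in the same way as during pre-training. For the Keras ResNet models this means converting the images from RGB to BGR and zero-centering each color channel with respect to the ImageNet dataset, without rescaling the pixel values to the 0 to 1 range. Fortunately, each model family provides a `preprocess_input()` function that can be used to preprocess input images. Keras provides many ways to resize and rescale images, such as the `UpSampling2D` layer, the `Resizing` layer, or a `Lambda` layer. Placing these layers inside the model means the data is preprocessed on the fly, which can slow down training. Another approach is to preprocess the images in the input pipeline, before they are fed into the model. I will go for the second approach.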

In [16]:
preprocess = tf.keras.Sequential([
    tf.keras.layers.Resizing(height=224, width=224, crop_to_aspect_ratio=True),
    tf.keras.layers.Lambda(tf.keras.applications.resnet.preprocess_input)
])
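
As far as the Keras documentation describes it, `tf.keras.applications.resnet.preprocess_input` converts images from RGB to BGR and zero-centers each channel using ImageNet statistics, without scaling. The NumPy sketch below reproduces that arithmetic for illustration only. The function name `resnet_preprocess_sketch` is hypothetical, and the real function should be used in the actual pipeline.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras' "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def resnet_preprocess_sketch(images_rgb: np.ndarray) -> np.ndarray:
    """Illustrative re-implementation: RGB -> BGR, then subtract the channel means."""
    x = images_rgb.astype(np.float32)
    x = x[..., ::-1]  # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_BGR_MEAN


red_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # a single pure-red RGB pixel
out = resnet_preprocess_sketch(red_pixel)
print(out)
```

Note that the result is not in the 0 to 1 range: channel values end up roughly between -124 and 152.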

# Training data shuffling and prefetching

It is important to shuffle the training set, since gradient descent works best when the training instances are independent and identically distributed, and shuffling helps ensure this. The `shuffle()` method does the work for us: it draws elements at random from a buffer of `buffer_size` items, so a larger buffer gives a more thorough shuffle. After that, I want to prefetch the dataset. Prefetching ensures that the dataset is always `n` batches ahead: while the training algorithm works on one batch, the dataset prepares the next one so that it is ready when the algorithm finishes the current one. This is done with `prefetch(1)`, where 1 means one batch ahead.

# Speeding up training

Preprocessing can be sped up by passing the `num_parallel_calls` argument to `map()`, so that several elements are processed in parallel. We can also `cache()` the dataset contents in RAM, but only if the dataset is small enough to fit.

In [17]:
batch_size = 32
train_dataset = train_dataset_raw.map(lambda X, y:(preprocess(X), y))
train_dataset = train_dataset.shuffle(buffer_size=3000, seed=42, reshuffle_each_iteration=False).batch(batch_size).prefetch(1)
valid_dataset = valid_dataset_raw.map(lambda X, y:(preprocess(X), y)).batch(batch_size)
test_dataset = test_dataset_raw.map(lambda X, y:(preprocess(X), y)).batch(batch_size)

`reshuffle_each_iteration=False` is set so that the shuffled dataset has the same order on every pass, which is convenient for testing. It is generally recommended to leave it at its default (`True`), so that `shuffle()` generates a new order on each iteration over the dataset, including each repetition when `repeat()` is used.
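
The speed-up options described in this section (`num_parallel_calls`, `cache()`, `prefetch()`) can be combined in one pipeline. The sketch below does so on a tiny synthetic stand-in for CIFAR-10. The random data, the `resize_and_cast` function, and the 64x64 target size are assumptions made to keep the example small; they are not the settings used in this notebook.

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic stand-in for CIFAR-10 (assumption: 64 random 32x32 RGB images).
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(64, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=(64, 1), dtype=np.int64)


def resize_and_cast(image, label):
    # Placeholder preprocessing; the notebook's `preprocess` model would go here.
    image = tf.image.resize(image, (64, 64))  # bilinear resize, returns float32
    return image, label


dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(resize_and_cast, num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .cache()                     # keep preprocessed items in memory after the first pass
    .shuffle(buffer_size=64)     # reshuffles on every pass by default
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)  # prepare upcoming batches while the model trains
)

for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape, batch_labels.shape)
```

A caveat on `cache()`: 45,000 images at 224x224x3 in `float32` take roughly 27 GB, so caching after the resize step is unlikely to fit in RAM on most machines. Caching before resizing, or not at all, may be the more realistic choice here.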

# Data augmentation

Data augmentation is a good way of addressing a shortage of training instances, but I believe that 45,000 training instances (50,000 minus the 5,000 held out for validation) are enough for my goals. If you are interested in it, there are several layers you can read about in the Keras documentation: `RandomFlip()`, `RandomRotation()` and `RandomContrast()`.
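
For reference, Keras augmentation layers can be chained in a small `Sequential` model, as in this minimal sketch. The factors (0.05 and 0.2) and the random input batch are arbitrary illustrative values, not tuned settings, and this step was not used in the training runs of this notebook.

```python
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),  # up to +/- 5% of a full turn
    tf.keras.layers.RandomContrast(0.2),
])

images = tf.random.uniform((4, 32, 32, 3), maxval=255.0)  # fake batch of 4 images
augmented = augment(images, training=True)  # augmentation is only active in training mode
print(augmented.shape)
```

Such a model could be applied to the training set only, for example inside the `map()` call, leaving the validation and test sets unchanged.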

# Loading model

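Now we are ready to load the ResNet101 model. We set `include_top=False` to load the model without its global average pooling layer and fully connected top layer, so that we can add our own. We then need to add back a global average pooling layer, followed by a fully connected `Dense` layer with 10 output units and a `softmax` activation function. The code below also inserts a `Dropout` layer before the output layer, and keeps the output layer in `float32` for numerical stability under the mixed precision policy.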

In [21]:
base_model = tf.keras.applications.resnet.ResNet101(include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
drop = tf.keras.layers.Dropout(0.5)(avg)
output = tf.keras.layers.Dense(10, activation="softmax", dtype="float32")(drop)
model = tf.keras.Model(inputs=base_model.input, outputs=output)

# Freezing low layers

Since the weights of the new layers are initialized randomly, the model will make large errors at first, and the resulting large error gradients may wreck the reused weights. To avoid this scenario, we can freeze the reused layers during the first few epochs, giving the new layers time to learn reasonable weights.

In [22]:
for layer in base_model.layers:
    layer.trainable = False
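
To check that freezing has the intended effect, one can count the trainable and non-trainable weights. The sketch below does this on a tiny stand-in model rather than the real ResNet101, to keep it fast. The layer names `base_conv` and `head` are made up for the example.

```python
import tensorflow as tf

# Tiny stand-in for a pre-trained base plus a new head (assumption: not the real ResNet101).
inputs = tf.keras.Input(shape=(8, 8, 3))
x = tf.keras.layers.Conv2D(4, 3, name="base_conv")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax", name="head")(x)
toy_model = tf.keras.Model(inputs, outputs)

toy_model.get_layer("base_conv").trainable = False  # freeze the "reused" layer

# Only the kernel and bias of the new head remain trainable.
print(len(toy_model.trainable_weights))
print(len(toy_model.non_trainable_weights))
```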

In [24]:
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, 
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [25]:
model.evaluate(test_dataset)



[1.0423580408096313, 0.9156000018119812]

After the first 2 epochs, this model managed to reach 92% validation accuracy and 90% training set accuracy, which is a great result (the test accuracy above is 91.6%). However, a loss of about 1 suggests that the model is not very confident when making predictions. I suggest unfreezing some layers of ResNet101 to gain more insight into what is going on.
I'm unfreezing the layers from index 72 up to the top. Also, adding some callbacks allows me to set as many epochs as I want, since `ModelCheckpoint` will save the best model for us and `EarlyStopping` will stop training once the validation loss stops improving. Consequently, we need not worry as much about overfitting.
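
To illustrate how a high accuracy can coexist with a loss around 1, recall that the sparse categorical cross-entropy of one sample is the negative log of the probability assigned to the true class. The worked example below uses made-up probabilities; it is not derived from this model's actual outputs.

```python
import math


def cross_entropy(prob_true_class: float) -> float:
    """Sparse categorical cross-entropy for one sample: -ln(p of the true class)."""
    return -math.log(prob_true_class)


# A prediction is correct whenever the true class has the highest probability,
# which with ten classes can happen at a probability as low as 0.35.
print(round(cross_entropy(0.35), 2))  # 1.05 (correct but unsure)
print(round(cross_entropy(0.95), 2))  # 0.05 (correct and confident)
print(round(cross_entropy(0.01), 2))  # 4.61 (confidently wrong)
```

So an average loss near 1 may come from many correct but low-confidence predictions, from a few confidently wrong ones, or from a mix of both.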

In [26]:
for layer in base_model.layers[72:]:
    layer.trainable = True

In [30]:
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001)

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=6, monitor="val_loss", mode='min')
lr_scheduler_cb = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_v1.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]
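
The interplay between the two patience values above (3 for `ReduceLROnPlateau`, 6 for `EarlyStopping`) can be sketched in plain Python. This is a simplified model of the patience logic only. It ignores options such as `min_delta`, `cooldown` and `min_lr`, and the validation losses are invented for the example.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5, lr_patience=3, stop_patience=6):
    """Simplified patience logic (ignores min_delta, cooldown and min_lr).

    Returns (index of the last epoch that ran, learning rate at that point).
    """
    best = float("inf")
    lr_wait = stop_wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset both counters.
            best, lr_wait, stop_wait = loss, 0, 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:      # plateau: reduce the learning rate
                lr *= factor
                lr_wait = 0
            if stop_wait >= stop_patience:  # no improvement for too long: stop
                return epoch, lr
    return len(val_losses) - 1, lr


# Hypothetical validation losses: the best value (0.7) is reached at epoch index 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.73, 0.72, 0.76]
last_epoch, final_lr = simulate_callbacks(losses)
print(last_epoch, final_lr)
```

With these settings the learning rate is halved twice before training stops, so the optimizer gets two chances at a smaller step size before early stopping gives up.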

In [31]:
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, 
                    epochs=100, callbacks=callbacks)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
 158/1407 [==>...........................] - ETA: 5:41 - loss: 0.0401 - accuracy: 0.9866

KeyboardInterrupt: 

In [34]:
model = tf.keras.models.load_model("my_cifar10_model_v1.keras")
model.evaluate(test_dataset)



[0.515618085861206, 0.8349999785423279]

After the 5th epoch, the model started to overfit a lot, so my first guess is that I've unfrozen too many layers. The test loss is about half of what it was (0.52 versus 1.04), so this model is more confident about its predictions, but at the same time it makes more mistakes (83.5% versus 91.6% test accuracy). Also, I want to try experimenting with L2 regularization instead of dropout; it penalizes large parameter values, which may help with overfitting. One more thing: I want to give more epochs to the first model, and afterwards unfreeze just a small number of layers with a lower learning rate.
Here we can see one of the main disadvantages of neural networks in general: there are too many things to tune. Well, I now have an outline of what I want to work on. I'll get back to you as soon as I find the best solution that works for me.
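
As a reminder of what L2 regularization does, the penalty added to the loss is the regularization strength times the sum of the squared weights. A minimal pure-Python sketch with made-up weights follows; `l2_penalty` is an illustrative helper, not a Keras function.

```python
def l2_penalty(weights, strength):
    """L2 regularization term: strength * sum of squared weights."""
    return strength * sum(w * w for w in weights)


small = [0.1, -0.2, 0.05]
large = [1.0, -2.0, 0.5]

print(l2_penalty(small, 0.1))  # small weights are barely penalized
print(l2_penalty(large, 0.1))  # weights 10x larger cost 100x more
```

Because the penalty grows quadratically, the optimizer is pushed toward smaller weights, which tends to reduce overfitting.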

In [35]:
base_model = tf.keras.applications.resnet.ResNet101(include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
drop = tf.keras.layers.Dropout(0.5)(avg)
output = tf.keras.layers.Dense(10, activation="softmax", dtype="float32")(drop)
model = tf.keras.Model(inputs=base_model.input, outputs=output)
for layer in base_model.layers:
    layer.trainable = False
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, 
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
...
Epoch 59/100


In [52]:
model = tf.keras.models.load_model("my_cifar10_model_v1.keras")
#model.evaluate(test_dataset)

In [53]:
for layer in model.layers:
    if layer.name.startswith("conv5_block3"):
        layer.trainable = True
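
The prefix-based unfreezing used here can be wrapped in a small helper that also reports which layers were matched, which makes it easier to verify that the prefix selects what was intended. The sketch below runs on a toy model with ResNet-style layer names. Both the helper `unfreeze_by_prefix` and the toy layers are hypothetical.

```python
import tensorflow as tf


def unfreeze_by_prefix(model, prefix):
    """Set trainable=True on layers whose name starts with `prefix`; return their names."""
    matched = []
    for layer in model.layers:
        if layer.name.startswith(prefix):
            layer.trainable = True
            matched.append(layer.name)
    return matched


# Toy model with ResNet-style layer names (hypothetical, for illustration only).
inputs = tf.keras.Input(shape=(8, 8, 3))
x = tf.keras.layers.Conv2D(4, 3, padding="same", name="conv4_block1_conv")(inputs)
x = tf.keras.layers.Conv2D(4, 3, padding="same", name="conv5_block3_conv")(x)
x = tf.keras.layers.BatchNormalization(name="conv5_block3_bn")(x)
toy = tf.keras.Model(inputs, x)

for layer in toy.layers:
    layer.trainable = False  # start from a fully frozen model

names = unfreeze_by_prefix(toy, "conv5_block3")
print(names)
```

After changing `trainable`, the model has to be compiled again for the change to take effect, as is done in the cells of this notebook.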

In [54]:
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_conv5_block3_unfreeze.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]

In [55]:
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.0001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100

KeyboardInterrupt: 

In [64]:
from tensorflow.keras.regularizers import l2

base_model = tf.keras.applications.resnet.ResNet101(include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
drop = tf.keras.layers.Dropout(0.6)(avg)
output = tf.keras.layers.Dense(10, activation="softmax", dtype="float32", kernel_regularizer=l2(0.1))(drop)
model = tf.keras.Model(inputs=base_model.input, outputs=output)

for layer in base_model.layers:
    layer.trainable = False

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_l2_freeze_all.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
...
Epoch 35/100
 124/1407 [=>............................] - ETA: 1:36 - loss: 0.6291 - accuracy: 0.8594

KeyboardInterrupt: 

In [66]:
model = tf.keras.models.load_model("my_cifar10_model_l2_freeze_all.keras")

for layer in model.layers:
    if layer.name.startswith("conv5_block3"):
        layer.trainable = True

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_l2_conv5_unfreeze.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
...
Epoch 26/100

KeyboardInterrupt: 

In [67]:
model = tf.keras.models.load_model("my_cifar10_model_l2_conv5_unfreeze.keras")

for layer in model.layers:
    if layer.name.startswith("conv5"):
        layer.trainable = True

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_l2_conv5_unfreeze_all.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.00001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
 122/1407 [=>............................] - ETA: 2:18 - loss: 0.0076 - accuracy: 1.0000

KeyboardInterrupt: 

In [70]:
for layer in model.layers:
    if layer.name.startswith("conv4"):
        layer.trainable = True

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_cifar10_model_l2_conv4_unfreeze.keras", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb, lr_scheduler_cb]

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.00001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_dataset, validation_data=valid_dataset, epochs=100, callbacks=callbacks)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100

KeyboardInterrupt: 

In [72]:
model = tf.keras.models.load_model("my_cifar10_model_l2_conv4_unfreeze.keras")
model.evaluate(test_dataset)



[0.15199005603790283, 0.9613000154495239]

## Conclusion

After a day of training and tweaking parameters, I've concluded that selecting a complex model for CIFAR-10 was a mistake. I primarily struggled with overfitting and regret skipping the data augmentation step, which definitely could have mitigated this issue. Additionally, I encountered several out-of-memory (OOM) errors. To address these, I tried various methods; reducing the batch size helped, but it came at the cost of lower accuracy. A more effective solution was setting the mixed precision policy, which did not impact the model's performance.

Ultimately, I documented the steps for the solution that worked best. To combat the overfitting problems, I used L2 and Dropout regularization methods, which address the issue from different angles. Achieving a test loss of 0.152 and an accuracy of 96.1% is a satisfactory outcome, considering the lack of data augmentation and the initially unsuitable architecture choice. Moving forward, I plan to design my own Residual block and train a model on CIFAR-10 using it.