
In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import ResNet50, MobileNet
from tensorflow.keras.optimizers import Adam

# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the basic CNN model
# Basic CNN Model:
# - This model is a straightforward example of a Convolutional Neural Network.
# - It stacks three convolutional layers with ReLU activation; the first two are each followed by batch normalization and max pooling.
# - Batch normalization helps stabilize learning and can reduce the number of training epochs required for convergence. The third convolutional layer has no batch normalization or pooling.
# - The network ends with fully connected layers that classify the images into one of the ten categories based on the learned features.
# - This model is good for understanding the basic building blocks of CNNs and is quite efficient on small to medium-sized datasets.
def basic_cnn():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')
    ])
    return model

# Define the ResNet model
# ResNet50 Model using Transfer Learning:
# - ResNet, or Residual Network, introduces a novel architecture with "skip connections" or "residual blocks" allowing the network to skip one or more layers.
# - These connections help combat the vanishing gradient problem in deep neural networks, enabling training of very deep networks without performance degradation.
# - Here, we use ResNet50, a variant with 50 layers, pre-trained on ImageNet, employing it for transfer learning.
# - The model is customized by adding a new top layer (a fully connected layer) to classify images into the CIFAR-10 classes.
# - The convolution base is frozen during training, meaning only the top layer's weights are updated. This is efficient when the new dataset is similar to the dataset on which the model was originally trained.

def resnet_model():
    base_model = ResNet50(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
    base_model.trainable = False
    model = Sequential([
        base_model,
        Flatten(),
        Dense(10, activation='softmax')
    ])
    return model

# Define the MobileNet model
# MobileNet Model using Transfer Learning:
# - MobileNet is designed specifically for mobile and embedded devices with constraints on memory and computational power.
# - It uses depthwise separable convolutions, which need far fewer parameters and multiply-adds than standard convolutions at a modest cost in accuracy.
# - Like ResNet50, this model uses a pre-trained MobileNet model with a customized top layer for CIFAR-10 classification.
# - The lightweight nature of MobileNet makes it ideal for applications requiring high efficiency and speed, such as real-time image recognition on mobile devices.
# - As with ResNet50, transfer learning is applied here, updating only the top layer weights during training to adapt to the CIFAR-10 dataset.

def mobilenet_model():
    base_model = MobileNet(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
    base_model.trainable = False
    model = Sequential([
        base_model,
        Flatten(),
        Dense(10, activation='softmax')
    ])
    return model

# Compile and train models
models = {'Basic CNN': basic_cnn(), 'ResNet50': resnet_model(), 'MobileNet': mobilenet_model()}
results = {}

for name, model in models.items():
    print(f"Training {name}...")
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=2)
    results[name] = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name} Accuracy: {results[name][1] * 100:.2f}%")

# Compare the performance
for name, result in results.items():
    print(f"{name} Test Accuracy: {result[1] * 100:.2f}%")


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf_no_top.h5
Training Basic CNN...
Epoch 1/10
625/625 - 88s - loss: 1.3461 - accuracy: 0.5204 - val_loss: 1.9470 - val_accuracy: 0.3804 - 88s/epoch - 140ms/step
Epoch 2/10
625/625 - 71s - loss: 0.9540 - accuracy: 0.6637 - val_loss: 1.2137 - val_accuracy: 0.5801 - 71s/epoch - 113ms/step
Epoch 3/10
625/625 - 71s - loss: 0.7591 - accuracy: 0.7348 - val_loss: 1.0938 - val_accuracy: 0.6312 - 71s/epoch - 113ms/step
Epoch 4/10
625/625 - 70s - loss: 0.6209 - accuracy: 0.7818 - val_loss: 0.9336 - val_accuracy: 0.6962 - 70s/epoch - 112ms/step
Epoch 5/10
625/625 - 72s - loss: 0.4998 - accuracy: 0.8234 - val_loss: 0.9737 - val_accuracy: 0.6891 - 72s/epoch - 115ms/step
Epoch 6/10
625/625 - 71s - loss: 0.3973 - accuracy: 0.8621 - val_loss: 1.0369 - val_accuracy: 0.6767 - 71s/epoch - 113ms/step
Epoch 7/10
625/625 - 74s - loss: 0.3031 - accuracy: 0.8943 - val_loss: 1.2752 - val_accuracy: 0.66

Basic CNN Test Accuracy: 65.41%
This model was trained from scratch on CIFAR-10, so all of its filters were learned directly on 32x32 images for this task.
Overfitting: There is evidence of overfitting. Training loss keeps decreasing (0.3031 by epoch 7), while validation loss bottoms out at 0.9336 in epoch 4 and then rises to 1.2752 by epoch 7. This pattern is typical of models that fit the training data too closely at the expense of generalizing to new data.
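The overfitting point can be located directly from the logged losses. The snippet below is a minimal sketch that uses only the seven epochs visible in the log above; the helper names `best_epoch` and `epochs_since_best` are hypothetical and not part of the notebook.

```python
# Per-epoch losses copied from the Basic CNN training log (epochs 1-7).
train_loss = [1.3461, 0.9540, 0.7591, 0.6209, 0.4998, 0.3973, 0.3031]
val_loss = [1.9470, 1.2137, 1.0938, 0.9336, 0.9737, 1.0369, 1.2752]


def best_epoch(losses):
    """Return the 1-based epoch with the lowest loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1


def epochs_since_best(losses):
    """Number of epochs trained after the lowest loss was reached."""
    return len(losses) - best_epoch(losses)


best = best_epoch(val_loss)
print(f"Lowest validation loss at epoch {best}: {val_loss[best - 1]:.4f}")
print(f"Epochs without improvement since then: {epochs_since_best(val_loss)}")
```

In a real training run the same rule is usually applied automatically, for example with the `tf.keras.callbacks.EarlyStopping` callback (`monitor="val_loss"`, a `patience` value, and `restore_best_weights=True`) passed to `model.fit`.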


ResNet50 Test Accuracy: 37.94%
The warning (input_shape is undefined or non-square, or rows is not in [128, 160, 192, 224]) printed while the models were built comes from the Keras MobileNet loader rather than from ResNet50, but the same input-size mismatch applies here. ResNet50 is typically used with larger images (like 224x224), and using it directly on 32x32 CIFAR-10 images without resizing or other pre-processing can lead to poor performance.
Applying the ImageNet-pre-trained model directly to CIFAR-10 without fine-tuning the deeper layers (which contain more abstract representations) likely contributed to its underperformance. In addition, the inputs were scaled to [0, 1] rather than passed through the model-specific preprocess_input function that matches the ImageNet weights, which may also have hurt accuracy.
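One possible adjustment is to upsample the images inside the model and to use ResNet50's own preprocessing instead of dividing by 255. The following is a sketch only, assuming a recent TensorFlow release that provides `tf.keras.layers.Resizing`; the helpers `build_resized_resnet` and `prepare_images` are hypothetical names, and no accuracy figure is claimed for this variant.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import preprocess_input


def build_resized_resnet(weights="imagenet", target_size=224, num_classes=10):
    """Frozen ResNet50 base fed with upsampled CIFAR-10 images."""
    inputs = tf.keras.Input(shape=(32, 32, 3))
    # Upsample inside the model so the dataset can stay at 32x32 in memory.
    x = tf.keras.layers.Resizing(target_size, target_size)(inputs)
    base = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=(target_size, target_size, 3),
    )
    base.trainable = False
    x = base(x, training=False)  # keep BatchNormalization in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


def prepare_images(raw_uint8):
    """Apply ResNet50 preprocessing to raw 0-255 RGB pixels (no /255 scaling)."""
    return preprocess_input(raw_uint8.astype("float32"))
```

With this sketch, training would use the unscaled arrays returned by `cifar10.load_data()`, passed through `prepare_images`, together with the one-hot labels from the notebook. Upsampling to 224x224 makes each epoch considerably more expensive, so a smaller `target_size` may be a reasonable compromise.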


MobileNet Test Accuracy: 22.13%
Like ResNet50, MobileNet is designed for input images of size 224x224. With CIFAR-10 images of only 32x32, the network's repeated downsampling leaves very little spatial information for the classifier, which likely contributes to the poor performance.
Freezing all MobileNet parameters except the top layer also limits its ability to learn CIFAR-10-specific features.
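The effect of the small input size can be estimated with simple arithmetic. Both ResNet50 and MobileNet reduce spatial resolution by a factor of 32 overall; the sketch below models this as five stride-2 stages that each halve the side length (rounding up), which is a simplification of the actual layer-by-layer padding. The function name `final_feature_map_size` is hypothetical.

```python
import math


def final_feature_map_size(input_size, num_stride2_stages=5):
    """Side length of the last feature map after repeated stride-2 downsampling."""
    size = input_size
    for _ in range(num_stride2_stages):
        size = math.ceil(size / 2)
    return size


for side in (32, 224):
    out = final_feature_map_size(side)
    print(f"{side}x{side} input -> {out}x{out} feature map")
```

Under this assumption a 224x224 image ends as a 7x7 grid of feature vectors, while a 32x32 image collapses to a single 1x1 position, so the frozen base passes only one feature vector per image to the new Dense layer.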

Ideas:
In models built from scratch, like the Basic CNN, no layers are frozen because the model needs to learn all the features from the beginning specifically for the given task.

Transfer Learning Models: In pre-trained models like ResNet50 and MobileNet used in our experiment, the convolutional layers are typically frozen during the initial phase of training. This approach is taken because these layers contain useful features learned on ImageNet, which are likely beneficial for the new task.

The idea behind freezing the layers is to preserve the features that the model has already learned from a large and diverse dataset like ImageNet. These features are generally quite versatile, ranging from simple edge detectors in the early layers to complex object detectors in the deeper layers.
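Freezing does not have to be all or nothing. A common follow-up is to train the new top layer first and then unfreeze only the last few layers of the base. The sketch below illustrates the mechanics on a small stand-in model so it runs quickly; `set_trainable_tail` and `toy_base` are hypothetical names, and keeping BatchNormalization layers frozen is a common convention rather than something the notebook does.

```python
import tensorflow as tf


def set_trainable_tail(base_model, n_trainable):
    """Freeze every layer of base_model except the last n_trainable ones."""
    base_model.trainable = True  # reset so the per-layer flags below take effect
    cutoff = len(base_model.layers) - n_trainable
    for index, layer in enumerate(base_model.layers):
        is_batch_norm = isinstance(layer, tf.keras.layers.BatchNormalization)
        # Keep BatchNormalization frozen so its moving statistics stay intact.
        layer.trainable = index >= cutoff and not is_batch_norm


# Small stand-in for a pre-trained base such as ResNet50 or MobileNet.
toy_base = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])

set_trainable_tail(toy_base, n_trainable=2)
for layer in toy_base.layers:
    print(f"{layer.name}: trainable={layer.trainable}")
```

After changing the `trainable` flags, the full model has to be compiled again before training, typically with a much smaller learning rate (for example `Adam(learning_rate=1e-5)`) so the pre-trained features are adjusted gently rather than overwritten.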
