[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AMLA-UBC/100-Exploring-the-World-of-Modern-Machine-Learning/blob/main/Modern_CNN_on_Image_Classification_Tutorial.ipynb)

# Install and load the required modules

In [None]:
!pip install -q tensorflow tensorflow_datasets
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

# Create the train and test sets

In [None]:
# Loading the dataset
dataset, info = tfds.load('horses_or_humans', with_info=True, as_supervised=True)

# Splitting the dataset
train_dataset, test_dataset = dataset['train'], dataset['test']

# Preprocessing the data
def preprocess(image, label):
    image = tf.cast(image, tf.float32)
    image = image/255.0
    image = tf.image.resize(image, (299, 299))
    return image, label

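# Applying the preprocess function to each item in the dataset.
# The training set is shuffled before preprocessing so the shuffle buffer holds small raw images,
# and both sets are prefetched so data loading overlaps with training.
train_dataset = train_dataset.shuffle(1000).map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)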

# Define the training function

In [None]:
def train_model(model):
  # Compiling the model
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

  # Training the model
  model.fit(train_dataset, epochs=10)

  # Evaluating the model
  test_loss, test_accuracy = model.evaluate(test_dataset)

  print('Test Loss: {}, Test Accuracy: {}'.format(test_loss, test_accuracy))

  return model

# ImageNet Dataset
- Moving forward, you will see or hear about the ImageNet dataset, one of the most widely used datasets in machine learning. It is a large collection of over 14 million labeled images organized into over 20,000 categories, and it is a standard resource for training and testing computer vision models that identify objects in photographs.

- Many of the models used below ship with weights pretrained on ImageNet (specifically the 1,000-class subset used in the ImageNet Large Scale Visual Recognition Challenge). Starting from these weights instead of random ones, a technique called transfer learning, lets a model reuse the visual features it has already learned, so it usually reaches higher accuracy on a small dataset such as Horses or Humans. The dataset can be downloaded from the ImageNet website: http://image-net.org/download-images

# Convolution Operation
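- A convolution operation is a mathematical operation that is used to extract features from an image.

- It works by sliding a small matrix (called a “kernel”) over the image. At each position, it computes the dot product between the kernel and the patch of the image the kernel currently covers, that is, the sum of their element-wise products. Repeating this at every position produces the output of the convolution, called a feature map. In a CNN the kernel values are learned during training, so different kernels come to respond to different patterns such as edges or textures, and the resulting feature maps are used to detect objects or patterns in the image.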
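The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how TensorFlow implements `Conv2D`: it assumes a single-channel image, stride 1 and no padding, and the helper name `conv2d` and the hand-made edge kernel are hypothetical choices for the example. Like deep-learning libraries, it does not flip the kernel (strictly speaking, it computes a cross-correlation).

```python
import numpy as np

def conv2d(image, kernel):
    """Hypothetical helper: 'valid' 2D convolution, stride 1, single channel.

    The kernel is not flipped, matching what CNN layers actually compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # region under the kernel
            feature_map[i, j] = np.sum(patch * kernel)  # dot product with the kernel
    return feature_map

# A 4x6 image: dark on the left half, bright on the right half
image = np.array([[0, 0, 0, 1, 1, 1]] * 4, dtype=float)

# A hand-made vertical-edge kernel (in a CNN these values are learned)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d(image, kernel))
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The feature map is non-zero only where the kernel overlaps the dark-to-bright boundary, which is the sense in which a kernel "detects" a pattern.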

# Xception
- The Xception ("Extreme Inception") architecture is a deep convolutional neural network (CNN) that was introduced in 2016 by François Chollet at Google. In the original paper it slightly outperformed Inception V3 on ImageNet classification while using a similar number of parameters.

- Compared with models such as DenseNet121 and ResNet50, which are built from regular convolutions, Xception is built almost entirely from depthwise separable convolutions, a type of convolutional layer that is more efficient than a regular one. (MobileNetV2 relies on them as well.)

- A depthwise separable convolution reduces the number of parameters and computations by splitting a regular convolution into two cheaper steps. First, a depthwise convolution applies a separate k×k filter to each input channel on its own, without mixing channels. Then a pointwise (1×1) convolution combines the channels at every pixel. For a k×k kernel with C_in input channels and C_out output channels, this needs k·k·C_in + C_in·C_out weights instead of the k·k·C_in·C_out weights of a regular convolution.
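To make the two steps and the parameter savings concrete, here is a minimal NumPy sketch, assuming 'valid' padding, stride 1 and no bias terms. The helper names (`depthwise_conv`, `pointwise_conv`, `separable_conv`, `conv_params`, `separable_params`) are hypothetical. Keras's `SeparableConv2D` and `Conv2D` layers report slightly higher counts in `summary()` because they also include bias terms.

```python
import numpy as np

def depthwise_conv(x, depthwise_kernels):
    """Step 1: one k x k filter per input channel; channels are never mixed.

    x: (H, W, C), depthwise_kernels: (k, k, C) -> output (H-k+1, W-k+1, C)
    """
    k = depthwise_kernels.shape[0]
    h, w, c = x.shape
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]  # (k, k, C)
            out[i, j, :] = np.sum(patch * depthwise_kernels, axis=(0, 1))
    return out

def pointwise_conv(x, pointwise_weights):
    """Step 2: 1x1 convolution that mixes channels at each pixel.

    pointwise_weights: (C_in, C_out); matmul applies it at every (i, j).
    """
    return x @ pointwise_weights

def separable_conv(x, depthwise_kernels, pointwise_weights):
    return pointwise_conv(depthwise_conv(x, depthwise_kernels), pointwise_weights)

def conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise weights (k*k*c_in) plus pointwise weights (c_in*c_out), bias ignored."""
    return k * k * c_in + c_in * c_out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
y = separable_conv(x, rng.standard_normal((3, 3, 3)), rng.standard_normal((3, 16)))
print(y.shape)  # (6, 6, 16)

# A 3x3 layer mapping 128 channels to 128 channels, as in the Xception cell that follows
print(conv_params(3, 128, 128))       # 147456
print(separable_params(3, 128, 128))  # 17536
```

For this layer the separable version needs roughly 8 times fewer weights, which is where Xception's efficiency comes from.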

In [None]:
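# Simplified Xception-style block: a strided regular convolution,
# then a depthwise separable convolution and max pooling
inputs = tf.keras.layers.Input(shape=(299, 299, 3))
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2), padding='same')(inputs)
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.SeparableConv2D(128, (3, 3), padding='same')(x)  # depthwise 3x3 per channel, then pointwise 1x1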
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
tf.keras.Model(inputs, x, name="Xception").summary()

In [None]:
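# Load Xception with ImageNet weights but without its 1000-class ImageNet classifier
base_model = tf.keras.applications.Xception(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Add a new 2-class head (horse vs. human) on top of the pretrained features
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(base_model.input, outputs)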

# Train the model
model = train_model(model)

# Save the model
model.save('horses_or_humans_xception.h5')


# InceptionResNetV2

In [None]:
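# Simplified Inception-ResNet block: parallel convolution branches plus a residual connection
inputs = tf.keras.layers.Input(shape=(299, 299, 3))
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(2, 2), padding='same', activation='relu')(inputs)
# Inception part: three branches with different receptive fields, computed in parallel
b0 = tf.keras.layers.Conv2D(32, (1, 1), padding='same', activation='relu')(x)
b1 = tf.keras.layers.Conv2D(32, (1, 1), padding='same', activation='relu')(x)
b1 = tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu')(b1)
b2 = tf.keras.layers.Conv2D(32, (1, 1), padding='same', activation='relu')(x)
b2 = tf.keras.layers.Conv2D(48, (3, 3), padding='same', activation='relu')(b2)
b2 = tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu')(b2)
mixed = tf.keras.layers.Concatenate()([b0, b1, b2])  # 32 + 32 + 64 = 128 channels
# ResNet part: project back to 128 channels with a linear 1x1 conv and add the block's input
up = tf.keras.layers.Conv2D(128, (1, 1), padding='same')(mixed)
x = tf.keras.layers.Add()([x, up])
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
tf.keras.Model(inputs, x, name="InceptionResNetV2").summary()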

In [None]:
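# Load InceptionResNetV2 with ImageNet weights but without its 1000-class ImageNet classifier
base_model = tf.keras.applications.InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Add a new 2-class head (horse vs. human) on top of the pretrained features
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(base_model.input, outputs)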

# Train the model
model = train_model(model)

# Save the model
model.save('horses_or_humans_inceptionresnetv2.h5')


# MobileNetV2
- Designed to be lightweight and efficient for use on mobile devices
- Has far fewer parameters than the other models in this tutorial (about 3.5 million, versus about 25.6 million for ResNet50)
- Trades some accuracy for that efficiency: its ImageNet accuracy is lower than that of the larger models here
- Its smaller capacity can limit how well it fits large, complex datasets compared with larger models


In [None]:
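# Simplified MobileNetV2-style layers: a strided stem, then a depthwise/pointwise bottleneck
inputs = tf.keras.layers.Input(shape=(299, 299, 3))
x = tf.keras.layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU(max_value=6.0)(x)  # MobileNetV2 uses ReLU6: min(max(x, 0), 6)
x = tf.keras.layers.DepthwiseConv2D((3, 3), padding='same')(x)  # one 3x3 filter per channel
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU(max_value=6.0)(x)
x = tf.keras.layers.Conv2D(16, (1, 1), padding='same')(x)  # linear 1x1 projection, no activation
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
tf.keras.Model(inputs, x, name="MobileNetV2").summary()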

In [None]:
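# Load MobileNetV2 with ImageNet weights. With include_top=True the ImageNet weights
# do not accept 299x299 inputs, so drop the 1000-class classifier and add a new one.
base_model = tf.keras.applications.MobileNetV2(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Add a new 2-class head (horse vs. human) on top of the pretrained features
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(base_model.input, outputs)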

# Train the model
model = train_model(model)

# Save the model
model.save('horses_or_humans_mobilenetv2.h5')


# DenseNet121



In [None]:
# DenseNet121 stem: 7x7 strided convolution, batch norm, ReLU and max pooling (the dense blocks follow it in the full model)
inputs = tf.keras.layers.Input(shape=(299, 299, 3))
x = tf.keras.layers.Conv2D(64, (7, 7), strides=(2, 2), padding='same')(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
tf.keras.Model(inputs, x, name="DenseNet121").summary()

In [None]:
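# Load DenseNet121 with ImageNet weights. With include_top=True the ImageNet weights
# only accept 224x224 inputs, so drop the 1000-class classifier and add a new one.
base_model = tf.keras.applications.DenseNet121(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Add a new 2-class head (horse vs. human) on top of the pretrained features
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(base_model.input, outputs)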

# Train the model
model = train_model(model)

# Save the model
model.save('horses_or_humans_densenet121.h5')


# ResNet50
- ResNet50 is a 50-layer image classification model developed by Microsoft Research. It is built on a technique called deep residual learning, which makes very deep networks easier to train and lets them reach better accuracy.

- The main idea behind ResNet50 is the "shortcut" or "skip connection" around each block of layers. Without shortcuts, simply stacking more layers can make a network's training error go up rather than down. A shortcut adds a block's input x directly to the output F(x) of that block's layers, so the block computes F(x) + x: its layers only need to learn a residual correction to their input, and both activations and gradients can skip over layers instead of passing through every one of them.

- This makes ResNet50 a strong and widely used choice for image classification. It is not the most lightweight option, though: with roughly 25.6 million parameters it is considerably larger than MobileNetV2 or DenseNet121.
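A minimal NumPy sketch of the skip-connection idea, assuming a feature vector and two dense layers as the residual branch F. This is a hypothetical simplification (ResNet50's branches are 1x1, 3x3 and 1x1 convolutions with batch normalization), and the helper names `relu` and `residual_block` are illustrative. It shows why the shortcut helps: with all-zero branch weights, a residual block passes a non-negative input (as produced by a previous ReLU) through unchanged, whereas a plain block without the shortcut outputs zeros.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Simplified residual block: out = relu(F(x) + x).

    F is two dense layers here (hypothetical stand-in for ResNet50's conv stack).
    """
    f = w2 @ relu(w1 @ x)  # residual branch F(x)
    return relu(f + x)     # skip connection: add the block's input back

x = np.array([0.5, 1.0, 2.0])
zeros = np.zeros((3, 3))

# With zero weights the branch contributes nothing, so the input passes straight through
print(residual_block(x, zeros, zeros))  # [0.5 1.  2. ]

# A plain block with the same zero weights (no shortcut) loses the signal entirely
print(relu(zeros @ relu(zeros @ x)))    # [0. 0. 0.]
```

Because the identity mapping is the "default" of a residual block, adding more blocks should not make the network worse at fitting the training data, which is the intuition behind ResNet's ability to go deep.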

In [None]:
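# ResNet50 stem: 7x7 strided convolution, batch norm, ReLU and max pooling (the residual blocks follow it in the full model)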
inputs = tf.keras.layers.Input(shape=(299, 299, 3))
x = tf.keras.layers.Conv2D(64, (7, 7), strides=(2, 2), padding='same')(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
tf.keras.Model(inputs, x, name="ResNet50").summary()

In [None]:
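# Load ResNet50 with ImageNet weights. With include_top=True the ImageNet weights
# only accept 224x224 inputs, so drop the 1000-class classifier and add a new one.
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Add a new 2-class head (horse vs. human) on top of the pretrained features
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(base_model.input, outputs)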

# Train the model
model = train_model(model)

# Save the model
model.save('horses_or_humans_resnet50.h5')


Overall, among the five models in this tutorial, InceptionResNetV2 and Xception report the highest ImageNet top-1 accuracy in the Keras Applications documentation, with DenseNet121 and ResNet50 somewhat behind. The most efficient model is MobileNetV2: it is by far the smallest, so it needs the least disk space and RAM. The model that is best suited for a specific task depends on that task's requirements, such as the size of the dataset, the memory and computational resources available, and how much accuracy you are willing to trade for speed.

# Where can I find datasets outside of Kaggle?

1. Google Cloud Datasets - https://cloud.google.com/ai-platform/data/docs/datasets
2. TensorFlow Datasets - https://www.tensorflow.org/datasets
3. Open Images - https://storage.googleapis.com/openimages/web/index.html
4. ImageNet - http://www.image-net.org/
5. MNIST Database - http://yann.lecun.com/exdb/mnist/
6. Fashion MNIST - https://github.com/zalandoresearch/fashion-mnist
7. CIFAR-10 - https://www.cs.toronto.edu/~kriz/cifar.html
8. UCI Machine Learning Repository - https://archive.ics.uci.edu/ml/index.php
9. Google Trends Data - https://trends.google.com/trends/