<a href="https://colab.research.google.com/github/ManuelGehl/IMPRS-Introduction-to-Neural-Networks-2023/blob/main/Chapter_3_Convolutional_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 3 - Convolutional Neural Network

In the previous chapter, we saw how an ANN can learn the patterns of a grayscale image. The image was flattened into a 1D array and this array was fed into the ANN. Although this approach worked very well on the MNIST dataset, it is not very suitable for real-world problems because it ignores the internal (2D) structure of an image.
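
To make this concrete, here is a minimal sketch of what flattening does to neighboring pixels. It uses a made-up 4x4 array instead of a real image, and all variable names are chosen only for this illustration.

```python
import numpy as np

# A tiny 4x4 "grayscale image"; the values are simply pixel indices (illustration only)
image = np.arange(16).reshape(4, 4)

# Flattening turns the 2D grid into a 1D array, as we did in the previous chapter
flat = image.flatten()

# Pixel (1, 1) and the pixel directly below it, (2, 1), touch each other in the image
row, col = 1, 1
index_pixel = row * image.shape[1] + col
index_below = (row + 1) * image.shape[1] + col

# In the flattened array they are a full image row apart
distance_in_flat = index_below - index_pixel
print(f"Flattened shape: {flat.shape}")
print(f"Distance between vertical neighbors after flattening: {distance_in_flat}")
```

For a 224x224 image, vertical neighbors would end up 224 positions apart, and a dense layer has no built-in notion that these two inputs belong together.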


In this chapter, we will cover the following topics:
* How are color images represented in the computer?
* How are color images used as input to an ANN?
* How can we use convolution to preserve the internal structure of images?
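
As a small preview of the first two topics, the sketch below builds a hypothetical 2x2 color image by hand. It assumes the common convention of 8-bit RGB images stored as an array of shape (height, width, channels); the pixel values are invented for illustration.

```python
import numpy as np

# Hypothetical 2x2 RGB image: shape (height, width, channels), values from 0 to 255
rgb_image = np.array([[[255, 0, 0], [0, 255, 0]],
                      [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(f"Image shape: {rgb_image.shape}")

# Each channel is itself a 2D grid of intensities
red_channel = rgb_image[..., 0]
print(f"Red channel:\n{red_channel}")

# Rescaling to the range [0, 1] keeps the shape and only changes the value range
scaled_image = rgb_image.astype(np.float32) / 255.0
print(f"Value range after rescaling: {scaled_image.min()} to {scaled_image.max()}")
```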

The workflow in this chapter is also a typical workflow you would use for real-world problems:

1. Inspect data
2. Preprocess data
3. Use a small portion of the dataset to screen different models
4. Compare different models on the test dataset
5. Optimize the best model and train it on the entire dataset


# Load and inspect the Malaria dataset

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow_hub as hub
import seaborn as sns
sns.set_style()

In [None]:
# Prepare the dataset
BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
tf.random.set_seed(42)

# Load the full dataset and split it into 10% validation, 60% train, and 30% test data
(validation_dataset, train_dataset, test_dataset), info = tfds.load(
    'malaria',
    split=["train[:10%]","train[10%:70%]", "train[70%:100%]"],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
    )

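# Take 10% of the train dataset for baseline models
# (fixed seed and no reshuffling, so the same subset is used in every epoch)
shuffled_dataset = train_dataset.shuffle(buffer_size=len(train_dataset),
                                         seed=42,
                                         reshuffle_each_iteration=False)
train_dataset_10 = shuffled_dataset.take(round(len(train_dataset) / 10))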

# Check distribution of different datasets
num_train = len(train_dataset)
num_train_10 = len(train_dataset_10)
num_test = len(test_dataset)
num_validation = len(validation_dataset)
print(f"Samples in train_dataset: {num_train}")
print(f"Samples in train_dataset_10: {num_train_10}")
print(f"Samples in test_dataset: {num_test}")
print(f"Samples in validation_dataset: {num_validation}")
print(f"Samples overall: {num_train+num_test+num_validation}")
info

**Now we check if the classes in train_dataset and train_dataset_10 are balanced.**

In [None]:
# Check if classes are balanced in train_dataset
counter = []
for image, label in train_dataset:
  counter.append(label.numpy())

print(f"Number of parasitized images: {counter.count(0)} \nNumber of uninfected images: {counter.count(1)}")

In [None]:
# Check if classes are balanced in train_dataset_10
counter = []

for image, label in train_dataset_10:
  counter.append(label.numpy())

print(f"Number of parasitized images: {counter.count(0)} \nNumber of uninfected images: {counter.count(1)}")
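
The same kind of check can be written more compactly once the labels are collected in an array. The sketch below is one possible way to do it with `np.bincount`; the helper names `class_counts` and `class_fractions` and the example labels are hypothetical and are not taken from the Malaria dataset.

```python
import numpy as np

def class_counts(labels, num_classes=2):
    """Count how often each integer class label occurs."""
    return np.bincount(np.asarray(labels, dtype=np.int64), minlength=num_classes)

def class_fractions(labels, num_classes=2):
    """Return the share of each class among all labels."""
    counts = class_counts(labels, num_classes)
    return counts / counts.sum()

# Hypothetical labels (0 = parasitized, 1 = uninfected), not real dataset values
example_labels = [0, 1, 1, 0, 1, 0, 0, 1]

counts = class_counts(example_labels)
fractions = class_fractions(example_labels)
print(f"Counts per class: {counts}")
print(f"Fractions per class: {fractions}")
```

Fractions close to 0.5 for both classes would indicate a balanced binary dataset.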

**Let's take a look at some of the images in the dataset.**

In [None]:
# Visualize some examples of the dataset
tfds.visualization.show_examples(train_dataset, info)

❓**Question**: Looking at the images, what do you think we need to do to preprocess the dataset before feeding it to an ANN?

In [None]:
# Check image shape of some images
sample_image = next(iter(train_dataset))[0]
print(f"Sample image shape: {sample_image.shape}")

In [None]:
# Resize images and rescale pixel values to the range [0, 1]
def transform_image(image, label):
  resized_image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])
  transformed_image = tf.divide(resized_image, 255.0)
  return transformed_image, label

# Apply the resize function to the dataset
train_dataset = train_dataset.map(transform_image)
train_dataset_10 = train_dataset_10.map(transform_image)
test_dataset = test_dataset.map(transform_image)
validation_dataset = validation_dataset.map(transform_image)

tfds.visualization.show_examples(train_dataset, info)

In [None]:
# Check image shape of some images
sample_image = next(iter(train_dataset))[0]
print(f"Sample image shape: {sample_image.shape}")

**Now we will perform some additional preprocessing steps to speed up data handling, i.e. batching and prefetching our dataset.**

In [None]:
# Batch and prefetch dataset
tf.random.set_seed(42)
train_dataset = train_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
train_dataset_10 = train_dataset_10.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
validation_dataset = validation_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

In [None]:
# Check image tensor dimensions
sample_image = next(iter(train_dataset))[0]
print(f"Sample image shape: {sample_image.shape}")
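
After batching, the image tensor has an additional leading dimension for the batch. The following sketch works out how many batches a dataset is split into and how large the last batch is. The sample count of 1000 is a made-up number, and the helper names are only for illustration.

```python
import math

def num_batches(num_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

def last_batch_size(num_samples, batch_size):
    """Size of the final batch, which may be smaller than batch_size."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size

# Hypothetical dataset size; the batch size matches BATCH_SIZE from this chapter
example_samples = 1000
example_batch_size = 32

batches = num_batches(example_samples, example_batch_size)
final_batch = last_batch_size(example_samples, example_batch_size)
print(f"Number of batches: {batches}")
print(f"Shape of a full batch: {(example_batch_size, 224, 224, 3)}")
print(f"Shape of the last batch: {(final_batch, 224, 224, 3)}")
```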

**Before we start modeling, let's create a lab book to track the different experiments.**

We will use 10% of the train dataset to save time. Eventually, we will take our best model and train it on the entire train dataset.
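
A lab book can be as simple as a dictionary that maps an experiment name to its recorded metrics. The sketch below uses plain dictionaries with invented numbers in place of real training histories; `best_experiment` is a hypothetical helper that picks the entry with the highest validation accuracy in the last epoch.

```python
# Hypothetical lab book with made-up metrics (not results from this chapter)
example_lab_book = {
    "Experiment A": {"val_accuracy": [0.60, 0.65]},
    "Experiment B": {"val_accuracy": [0.70, 0.72]},
}

def best_experiment(book, metric="val_accuracy"):
    """Return the name of the experiment with the best final value of the metric."""
    return max(book, key=lambda name: book[name][metric][-1])

best = best_experiment(example_lab_book)
print(f"Best experiment: {best}")
```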

In [None]:
# Create a lab-book to track the different experiments
lab_book = {}

# Model 1 (Baseline):

- Best model from Chapter 2 - Classification
- Flatten layer to convert image tensor (224, 224, 3) into 1D tensor
- 2 Hidden layers of 16 neurons each (activation = ReLU)
- Output layer with 1 neuron (activation = sigmoid)

In [None]:
# Lab Book Name Entry
name = "Model 1"

# Build model
tf.random.set_seed(42)
model_1 = keras.Sequential([layers.Input(shape=(IMG_HEIGHT,IMG_WIDTH,3)),
                            layers.Flatten(),
                            layers.Dense(16, activation="relu"),
                            layers.Dense(16, activation="relu"),
                            layers.Dense(1, activation="sigmoid")
                           ])

model_1.summary()

#  Compile the model
model_1.compile(optimizer=keras.optimizers.Adam(),
                loss=keras.losses.BinaryCrossentropy(),
                metrics=["accuracy"]
                )

# Fit the model
history_1 = model_1.fit(train_dataset_10, validation_data=validation_dataset, epochs=5)

# Write lab-book
lab_book[name] = history_1

# Model 2:
- Conv2D layer with 8 filters (kernel size 3x3, activation = ReLU)
- Flatten layer
- Output layer with 1 neuron (activation = sigmoid)
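
To get a feeling for what a single filter of a convolutional layer computes, here is a minimal NumPy sketch. It slides one 3x3 kernel over a small single-channel image without padding and with a stride of 1, and without flipping the kernel, which is what Keras `Conv2D` does as well. The function name `conv2d_valid`, the toy image, and the hand-picked edge kernel are assumptions for this illustration; in a real layer, the kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    output = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kernel_h, j:j + kernel_w]
            output[i, j] = np.sum(patch * kernel)
    return output

# Toy 5x5 image with a vertical edge: dark on the left, bright on the right
toy_image = np.zeros((5, 5))
toy_image[:, 2:] = 1.0

# Hand-picked kernel that responds to dark-to-bright transitions from left to right
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(toy_image, edge_kernel)
print(feature_map)
```

The feature map is still a 2D grid, so the position of the edge is preserved. This is the key difference from flattening the image first.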

In [None]:
# Lab Book Name Entry
name = "CNN model 1"

# Build CNN model
tf.random.set_seed(42)
model_2 = keras.Sequential([layers.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
                            layers.Conv2D(filters=8,
                                          kernel_size=(3,3),
                                          strides=(1,1),
                                          activation="relu"),
                            layers.Flatten(),
                            layers.Dense(1, activation="sigmoid")
                            ])

model_2.summary()

# Compile CNN model
model_2.compile(loss=keras.losses.BinaryCrossentropy(),
                optimizer=keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit CNN model
history_2 = model_2.fit(train_dataset_10, validation_data=validation_dataset, epochs=5)

# Write lab-book
lab_book[name] = history_2

**Before we compare the performance of the baseline model (Model 1) and the CNN model (Model 2), take a look at the number of trainable parameters for these two models.**
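
The parameter counts reported by `model.summary()` can also be worked out by hand. The sketch below does this for the two architectures defined above, assuming the default settings used there (a bias in every layer, no padding, and a stride of 1). The helper names are chosen only for this illustration.

```python
def dense_params(num_inputs, num_units):
    """Weights plus one bias per unit."""
    return num_inputs * num_units + num_units

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel per filter across all input channels, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

# Model 1: Flatten -> Dense(16) -> Dense(16) -> Dense(1)
flat_inputs = 224 * 224 * 3
model_1_params = (dense_params(flat_inputs, 16)
                  + dense_params(16, 16)
                  + dense_params(16, 1))

# Model 2: Conv2D(8 filters, 3x3) -> Flatten -> Dense(1)
conv_output_size = 224 - 3 + 1  # without padding, a 3x3 kernel shrinks each side by 2
model_2_params = (conv2d_params(3, 3, 3, 8)
                  + dense_params(conv_output_size * conv_output_size * 8, 1))

print(f"Model 1 parameters: {model_1_params}")
print(f"Model 2 parameters: {model_2_params}")
```

Note how few parameters the convolutional layer itself needs (224), because the same small kernels are reused at every image position.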

In [None]:
# Define function to compare the histories of different experiments from the lab-book
def plot_history(histories:dict):
  num_histories = len(histories)
  fig, ax = plt.subplots(2,2, figsize=(10,7), layout="constrained")
  ax[0,0].set_title("Losses")
  ax[0,1].set_title("Validation losses")
  ax[1,0].set_title("Accuracies")
  ax[1,1].set_title("Validation accuracies")
  ax[1,0].set_xlabel("Epochs")
  ax[1,1].set_xlabel("Epochs")

  # Plot the curves of every experiment in the given dictionary
  for name, history in histories.items():
    val_loss = history.history["val_loss"]
    ax[0,0].plot(history.history["loss"], label=name)
    # Validation loss is shown from the second epoch on; x values keep the epochs aligned
    ax[0,1].plot(range(1, len(val_loss)), val_loss[1:], label=name)
    ax[1,0].plot(history.history["accuracy"], label=name)
    ax[1,1].plot(history.history["val_accuracy"], label=name)

  ax[0,1].legend(bbox_to_anchor=(1.8, 1))

# Visualize learning performance of our first two models
plot_history(lab_book)

# Model 3:

In general, the larger and more complex a dataset is, the larger and more complex a model must be to learn the underlying patterns.

Therefore, Model 3 consists of:

* 4 Conv2D layers, each with 8 filters
* 2 MaxPool2D layers
* 1 Dense layer of 128 neurons
* 1 output dense layer of 1 neuron
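
The sketch below illustrates what a 2x2 max pooling layer does and how the image size changes along the layers listed above. It assumes 3x3 kernels without padding and pooling with a stride of 2, as in the Keras defaults. The function names and the layer labels `"conv3x3"` and `"pool2x2"` are made up for this illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 block."""
    height, width = feature_map.shape
    out_h, out_w = height // 2, width // 2
    trimmed = feature_map[:out_h * 2, :out_w * 2]
    return trimmed.reshape(out_h, 2, out_w, 2).max(axis=(1, 3))

def trace_spatial_size(size, layer_names):
    """Follow the height/width of a square input through a stack of layers."""
    sizes = [size]
    for layer in layer_names:
        if layer == "conv3x3":
            size = size - 2   # no padding: a 3x3 kernel removes one pixel per border
        elif layer == "pool2x2":
            size = size // 2  # pooling halves the size (rounded down)
        else:
            raise ValueError(f"Unknown layer: {layer}")
        sizes.append(size)
    return sizes

pooled = max_pool_2x2(np.arange(16).reshape(4, 4))
print(pooled)

sizes = trace_spatial_size(224, ["conv3x3", "conv3x3", "pool2x2",
                                 "conv3x3", "conv3x3", "pool2x2"])
flattened_length = sizes[-1] ** 2 * 8  # 8 filters in the last Conv2D layer
print(f"Spatial sizes: {sizes}")
print(f"Length after Flatten: {flattened_length}")
```

Pooling shrinks the feature maps, which keeps the number of inputs to the dense layer manageable.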

In [None]:
# Name the model
name = "CNN model 2"

# Build CNN model
tf.random.set_seed(42)
model_3 = keras.Sequential([layers.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
                            layers.Conv2D(filters=8,
                                           kernel_size=(3,3),
                                           strides=(1,1),
                                           activation="relu"),
                            layers.Conv2D(filters=8,
                                           kernel_size=(3,3),
                                           activation="relu"),
                            layers.MaxPool2D(pool_size=(2,2)),
                            layers.Conv2D(filters=8,
                                           kernel_size=(3,3),
                                           activation="relu"),
                            layers.Conv2D(filters=8,
                                           kernel_size=(3,3),
                                           activation="relu"),
                            layers.MaxPool2D(pool_size=(2,2)),
                            layers.Flatten(),
                            layers.Dense(128, activation="relu"),
                            layers.Dense(1, activation="sigmoid")
                            ])

model_3.summary()

# Compile CNN model
model_3.compile(loss=keras.losses.BinaryCrossentropy(),
                optimizer=keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit CNN model
history_3 = model_3.fit(train_dataset_10, validation_data=validation_dataset, epochs=5)

# Write lab-book
lab_book[name] = history_3

In [None]:
# Visualize learning performance of all models
plot_history(lab_book)

# Model 4:

Until now, we have relied entirely on our own models. A very powerful technique is called **transfer learning**. The concept here is to take a model that has already been trained on a similar dataset and adapt the top layer(s) to your needs.

We will use EfficientNet/B0 (1), a model that has been trained on millions of different images in the [ImageNet database](https://www.image-net.org/). The output layer of EfficientNet/B0 is removed and replaced with our previously used output layer.
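
The idea of a frozen feature extractor with a new trainable output layer can be sketched in plain NumPy. In this sketch, a fixed random projection is only a stand-in for the pretrained network (it is not EfficientNet), and the input size of 12 values per "image" is invented. The feature length of 1280 matches the size of the EfficientNet-B0 feature vector.

```python
import numpy as np

rng = np.random.default_rng(42)

FEATURE_DIM = 1280  # length of the EfficientNet-B0 feature vector
INPUT_DIM = 12      # made-up input size for this sketch

# Stand-in for the frozen, pretrained part: these weights are never updated
frozen_weights = rng.normal(size=(INPUT_DIM, FEATURE_DIM))

def extract_features(flat_images):
    """Frozen feature extractor (hypothetical stand-in with a ReLU)."""
    return np.maximum(flat_images @ frozen_weights, 0.0)

# New output layer: the only part that would be trained
head_weights = np.zeros(FEATURE_DIM)
head_bias = 0.0

def predict(flat_images):
    """Sigmoid output on top of the frozen features."""
    logits = extract_features(flat_images) @ head_weights + head_bias
    return 1.0 / (1.0 + np.exp(-logits))

trainable_parameters = head_weights.size + 1  # weights plus one bias
example_batch = rng.random(size=(4, INPUT_DIM))
probabilities = predict(example_batch)

print(f"Trainable parameters in the new output layer: {trainable_parameters}")
print(f"Predictions before any training: {probabilities}")
```

Because only the small output layer is trained, each epoch updates very few parameters compared to training the whole network.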

---
(1) Tan, M. & Le, Q. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. <i>Proceedings of the 36th International Conference on Machine Learning</i>, in <i>Proceedings of Machine Learning Research</i> 97:6105-6114. Available from https://proceedings.mlr.press/v97/tan19a.html.

In [None]:
# Download the pretrained model and save it as a Keras layer
efficientnet_url = "https://tfhub.dev/tensorflow/efficientnet/b0/feature-vector/1"
feature_extractor_layer = hub.KerasLayer(efficientnet_url,
                                         trainable=False,
                                         name='feature_extraction_layer',
                                         input_shape=(IMG_HEIGHT, IMG_WIDTH, 3))

In [None]:
# Name the model
name = "Transfer learning model 1"

# Build model
tf.random.set_seed(42)
model_4 = keras.Sequential([feature_extractor_layer,
                            layers.Dense(1, activation="sigmoid")
                            ])

model_4.summary()

# Compile CNN model
model_4.compile(loss=keras.losses.BinaryCrossentropy(),
                optimizer=keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit CNN model
history_4 = model_4.fit(train_dataset_10, validation_data=validation_dataset, epochs=5)

# Write lab-book
lab_book[name] = history_4

In [None]:
# Visualize learning performance of all models
plot_history(lab_book)

# Time to train our best model on 100% of the training data

In [None]:
# Let's train the transfer learning model (Model 4) on 100% of the training data

# Name your model
name = "Transfer learning model full"

# Build CNN model
tf.random.set_seed(42)
model_5 = keras.models.clone_model(model_4)

model_5.summary()

# Compile CNN model
model_5.compile(loss=keras.losses.BinaryCrossentropy(),
                optimizer=keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit CNN model
history_5 = model_5.fit(train_dataset, validation_data=validation_dataset, epochs=5)

# Write lab-book
lab_book[name] = history_5

In [None]:
# Visualize learning performance of all models
plot_history(lab_book)

# Final evaluation of all models

**Now we will evaluate all models on the test dataset. This can be considered the final test.**

Depending on the results, the best model can then be deployed for use or for engineering the dataset (e.g., detecting mislabeled images).
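
For models compiled as in this chapter, `evaluate` reports two numbers: the loss (binary cross-entropy) and the accuracy. The sketch below shows how these two values can be computed from predicted probabilities. The labels and probabilities are invented, the function names are only for illustration, and the 0.5 threshold follows the Keras default for binary accuracy.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Hypothetical labels and sigmoid outputs (not results from this chapter)
example_labels = [0, 1, 1, 0]
example_probabilities = [0.1, 0.8, 0.4, 0.3]

accuracy = binary_accuracy(example_labels, example_probabilities)
loss = binary_crossentropy(example_labels, example_probabilities)
print(f"Accuracy: {accuracy}")
print(f"Loss: {loss:.4f}")
```

The third sample is predicted with a probability of 0.4 although its label is 1, so it counts as an error for the accuracy and contributes the largest share to the loss.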

In [None]:
# Evaluate all models on test data
models = [model_1, model_2, model_3, model_4, model_5]
evaluations = [model.evaluate(test_dataset) for model in models]

# Transform into a dataframe (evaluate returns [loss, accuracy] for each model)
results = pd.DataFrame(evaluations, columns=["loss", "accuracy"]).round(decimals=2)

# Plot a bar plot with accuracy scores
model_names = list(lab_book.keys())
model_accuracies = results["accuracy"]

fig, ax = plt.subplots(layout="constrained")
p = ax.bar(x=model_names, height=model_accuracies)
ax.set_ylabel("Accuracy")
plt.xticks(rotation=30, ha="right")
ax.bar_label(p);

# Wrapping Up: Your Neural Network Journey

Congratulations on completing our intensive neural network journey! 🎉 You've absorbed essential concepts that form the foundation of machine learning and neural networks.

**Data's Power**: You've grasped how data's type and format influence neural network performance. Your data shapes the path toward accurate insights.

**Power & Responsibility**: Neural networks are remarkable tools, but they must be wielded responsibly. Your choices impact outcomes.

**Human vs. Machine Learning**: While neural networks excel at patterns, human learning is enriched by intuition and context. Embrace both, as they complement the evolution of AI.

**You've Done It**: Kudos for tackling this condensed exploration of neural networks! You've equipped yourself with vital knowledge in a short time.

**Future Ventures**: As you step forward, remember the strides you've taken here. Your understanding is a stepping stone to innovation in the world of AI.

---

Best wishes,
Manuel