# AI Workshop - Lab 2-1: Computer Vision

In this lab, we will use pre-trained models to classify machine part defects. Like yesterday, we will be working with Keras and TensorFlow, but this time we will use the `tf.keras.applications` module to load pre-trained models.

### Data Overview

The dataset provided for this lab contains images of an automotive part, the **Fender Apron**, captured under varying conditions such as different angles and scales. The dataset has already been labeled as either **defective** or **healthy**, making it ideal for supervised learning tasks.

- **Total Images**: 250
  - **Healthy Parts**: 139 images
  - **Defective Parts**: 111 images
- **Train/Test Split**:
  - **Training Set**: 90% of the data
  - **Test Set**: 10% of the data (25 randomly selected images)
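
As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The majority-class baseline is not part of the lab materials; it is included as a reference point and assumes the test split has roughly the same class balance as the full dataset.

```python
# Dataset figures quoted in the overview above
total_images = 250
healthy = 139
defective = 111

# 10% held out for testing, the rest used for training
test_size = round(total_images * 0.10)
train_size = total_images - test_size

# Share of each class in the full dataset
healthy_share = healthy / total_images
defective_share = defective / total_images

print(f"train: {train_size}, test: {test_size}")
print(f"healthy: {healthy_share:.1%}, defective: {defective_share:.1%}")

# A classifier that always answers "healthy" would reach about this accuracy,
# so a useful model has to do clearly better than this
majority_baseline = max(healthy_share, defective_share)
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Keeping this baseline in mind helps when reading the accuracy values reported during training.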

### Key Steps in Lab
1. **Exploration**: We'll start by visualizing the dataset, inspecting some sample images to understand variations and potential challenges.
2. **Preprocessing**: Learn to preprocess images by resizing, normalizing pixel values, and augmenting the dataset to simulate real-world scenarios.
3. **Model Selection**:
   - We'll use the MobileNetV2 architecture, a lightweight and efficient model available in `tf.keras.applications`.
   - The pre-trained model will be fine-tuned to classify images into **defective** and **healthy** categories.
4. **Evaluation**:
   - Evaluate model performance using metrics such as **F1-score**, **precision**, and **recall**.
   - Analyze confusion matrices and visualize predictions for better interpretability.

### Goals
By the end of this lab, you will:
- Understand the concept of transfer learning and how to adapt a pre-trained model to solve specific problems.
- Gain hands-on experience with training and deploying a defect detection system.
- Learn to create APIs for model deployment using Flask.

### Getting Started
- Open the `Machine defect detection.ipynb` notebook for a guided walkthrough.
- Pre-process the dataset using the `functions.py` script.
- Explore the live demo at [Demo Link](https://kapilve.pythonanywhere.com/) to see the final application in action.

Now, let's dive into the **dataset preparation** and explore how these images can be preprocessed for model training!

### Loading the Dataset

We will start by loading the dataset and visualizing a few sample images to understand the data distribution and characteristics. The dataset is already organized for you into training and testing sets, with separate folders for **defective** and **healthy** parts. Keras makes it easy for us to load images like this using the `image_dataset_from_directory` function.
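
`image_dataset_from_directory` infers integer labels from the subfolder names, sorted alphanumerically. The sketch below reproduces that mapping in plain Python. The folder names `defective` and `healthy` are an assumption about how the archive is laid out, so check `train_dataset.class_names` on the loaded dataset to confirm.

```python
# Assumed subfolder names under parts_dataset/train (hypothetical; verify locally)
subfolders = ["healthy", "defective"]

# Keras sorts the class names alphanumerically and assigns labels 0, 1, ...
class_names = sorted(subfolders)
label_of = {name: index for index, name in enumerate(class_names)}

print(class_names)
print(label_of)
```

With this ordering, label 0 means defective and label 1 means healthy, which is the convention the plotting code in this lab relies on.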

In [None]:
!wget https://github.com/alexwolson/mdlw_materials/raw/refs/heads/main/data/parts_dataset.tar.gz

In [None]:
!tar -xf parts_dataset.tar.gz

In [None]:
import tensorflow as tf
from keras.preprocessing import image_dataset_from_directory
from pathlib import Path

# Define the dataset directory
dataset_dir = Path('parts_dataset')

# Load the dataset
train_dataset = image_dataset_from_directory(
    dataset_dir / 'train',
    image_size=(224, 224),
    batch_size=32,
)

test_dataset = image_dataset_from_directory(
    dataset_dir / 'test',
    image_size=(224, 224),
    batch_size=32,
)

In [None]:
# Visualize the first 9 images from the training set
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title("defective" if labels[i] == 0 else "healthy")
        plt.axis("off")

Defects are easier to spot in some images than in others. See if you can tell the two classes apart!

### Preprocessing the Images

As you might have noticed, we have already done some of the preprocessing for this dataset for you. Namely, we've done the following:

- Resizing: The original images are 4160 × 3120 pixels, which is too large for most models to handle. We've downscaled the files to 480 × 480 pixels, and the loading code above resizes them again to 224 × 224 pixels (so there is room to experiment with a larger input size). We also normalized the aspect ratio to 1:1 by adding black bars to the sides of the images where necessary.
- Train-Test Split: We've split the dataset into training and testing sets, with 90% of the data used for training and 10% for testing.
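
The aspect-ratio step can be illustrated with NumPy. This is a minimal sketch of padding an image to a square with black bars, not the script that was used to prepare the dataset; the function name `pad_to_square` and the small stand-in image are illustrative.

```python
import numpy as np

def pad_to_square(image: np.ndarray) -> np.ndarray:
    """Pad an H x W x C image with black (zero) bars so that H == W."""
    height, width = image.shape[:2]
    side = max(height, width)
    # Split the missing rows/columns as evenly as possible between both sides
    pad_h = side - height
    pad_w = side - width
    padding = (
        (pad_h // 2, pad_h - pad_h // 2),  # top, bottom
        (pad_w // 2, pad_w - pad_w // 2),  # left, right
        (0, 0),                            # leave the colour channels alone
    )
    return np.pad(image, padding, mode="constant", constant_values=0)

# Small all-white stand-in with a 4:3 aspect ratio (orientation chosen arbitrarily)
portrait = np.full((40, 30, 3), 255, dtype=np.uint8)
square = pad_to_square(portrait)
print(portrait.shape, "->", square.shape)
```

The padded result can then be downscaled to the target resolution without distorting the part.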

Now it's time to complete the preprocessing. We'll do the following:

- Normalization: Rescale pixel values to the range [0, 1]. By default, the pixel values are in the range [0, 255] (8-bit integers).
- Data Augmentation: Apply random transformations to the training images to simulate real-world scenarios. This allows us to artificially increase the size of the training set and improve model generalization.
- Configure the Dataset: Optimize the dataset for performance by prefetching, caching, and shuffling the images.

Let's start by rescaling the pixel values to the range [0, 1]. Above we used `matplotlib` to visualize the images, but to the model they are simply arrays of numbers. Run the cell below to see the pixel values of the first image in the training set.

In [None]:
for images, labels in train_dataset.take(1):
    print(images[0].shape)
    print(images[0][100]) # Slice in the middle of the image
    break

As you can see, the values are all between 0 and 255. You'll also notice that they come in sets of 3. This is because the images are in RGB format, so each pixel has 3 values (red, green, and blue).

In [None]:
# Normalize pixel values
from keras.layers import Rescaling
normalization_layer = Rescaling(1./255)
# Apply normalization to the dataset
normalized_train_dataset = train_dataset.map(lambda x, y: (normalization_layer(x), y))

In [None]:
for images, labels in normalized_train_dataset.take(1):
    print(images[0].shape)
    print(images[0][100]) # Slice in the middle of the image
    break

Now that the pixel values are normalized, we can move on to data augmentation. This step is crucial for improving the model's generalization and robustness. By applying random transformations to the training images, we can simulate real-world scenarios and prevent overfitting.

We will take advantage of the convenient image augmentation layers provided by Keras. These layers can be added directly to the model architecture, making it easy to experiment with different augmentation strategies. We will use the layers in that way, but first let's get a sense of what they do to our images by visualizing a few examples.

Run the cell below to see some augmented images.

In [None]:
from keras.layers import RandomCrop, RandomFlip, RandomRotation, RandomZoom, RandomContrast, RandomBrightness

# Define the augmentation layers
augmentation_layers = tf.keras.Sequential([
    RandomCrop(224, 224),
    RandomFlip("horizontal_and_vertical"),
    RandomRotation(0.2),
    RandomZoom(0.2, 0.2),
    RandomContrast(0.2),
    RandomBrightness(0.2),
])

# Apply augmentation to the dataset (training=True makes sure the random layers are active)
augmented_train_dataset = train_dataset.map(lambda x, y: (augmentation_layers(x, training=True), y))

# Visualize the first 9 augmented images
plt.figure(figsize=(10, 10))
for images, labels in augmented_train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title("defective" if labels[i] == 0 else "healthy")
        plt.axis("off")

By applying these transformations, we can see that the images are now slightly different from the original ones. This variety will help the model learn to generalize better and make more accurate predictions.
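
To get a feel for how strong these settings are, the factors can be converted into more familiar units. The sketch below assumes the documented Keras semantics: `RandomRotation` treats its factor as a fraction of a full turn (2π), and `RandomBrightness` treats its factor as a fraction of the value range, which defaults to [0, 255]. Check the documentation of your installed Keras version if the results look off.

```python
import math

rotation_factor = 0.2    # as passed to RandomRotation above
brightness_factor = 0.2  # as passed to RandomBrightness above

# RandomRotation: rotate by a random angle in [-factor * 2*pi, +factor * 2*pi]
max_rotation_deg = math.degrees(rotation_factor * 2 * math.pi)

# RandomBrightness: shift intensities by up to +/- factor * (max - min)
value_range = (0, 255)
max_brightness_shift = brightness_factor * (value_range[1] - value_range[0])

print(f"rotation: up to +/-{max_rotation_deg:.0f} degrees")
print(f"brightness: up to +/-{max_brightness_shift:.0f} intensity levels")
```

Rotations of up to about 72 degrees in either direction are fairly aggressive, which is worth keeping in mind when you tune the augmentation later in the lab.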

### Configure the Dataset

Finally, we will configure the dataset for performance by optimizing it for training. We will apply the following optimizations:

- **Prefetching**: Overlapping the preprocessing and model execution to improve training speed.
- **Caching**: Caching the data to memory to avoid loading the images from disk every time.
- **Shuffling**: Randomizing the order of the images to prevent the model from learning the sequence of images.
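
The `buffer_size` argument of `shuffle` controls how thoroughly the data is mixed. The following is a simplified pure-Python model of a shuffle buffer, written for illustration only (it is not TensorFlow's implementation, and `buffered_shuffle` is a made-up name): elements are drawn at random from a fixed-size buffer that is refilled from the input stream.

```python
import itertools
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    stream = iter(items)
    # Fill the buffer with the first `buffer_size` elements
    buffer = list(itertools.islice(stream, buffer_size))
    # Emit a random buffered element and replace it with the next input element
    for item in stream:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

data = list(range(10))
print(list(buffered_shuffle(data, buffer_size=1)))   # buffer of 1: order unchanged
print(list(buffered_shuffle(data, buffer_size=3)))   # only local mixing
print(list(buffered_shuffle(data, buffer_size=10)))  # buffer covers the whole input
```

With around 225 training images, a `buffer_size` of 1000 exceeds the dataset size, so the whole training set can be mixed rather than only a sliding window of it.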

Let's apply these optimizations to the training and testing datasets.

In [None]:
# Configure the training dataset: cache first so the decoded images are kept in
# memory, then shuffle, and prefetch last so loading overlaps with training
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_dataset.cache()
train_dataset = train_dataset.shuffle(buffer_size=1000)
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)

# Configure the testing dataset (no shuffling is needed for evaluation)
test_dataset = test_dataset.cache()
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

### Model Selection

In contemporary deep learning tasks, it's common to take advantage of _pre-trained_ models. These models have been trained on large-scale datasets and have learned to extract useful features from images. By leveraging pre-trained models, we can significantly reduce the training time and computational resources required to build a new model.

In this lab, we will use the **MobileNetV2** architecture, a lightweight and efficient model available in `tf.keras.applications`. MobileNetV2 is designed for mobile and embedded vision applications, making it a suitable choice for our defect detection task.

To adapt the pre-trained MobileNetV2 model to our defect detection task, we will load it with weights pre-trained on ImageNet and without its original classification head. We then freeze the base layers so that their weights are not updated, and train only the new top layers that we add, which are randomly initialized.

Let's load the base model, add the top layers, and compile the model for training.

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, RandomFlip, RandomRotation, RandomZoom, RandomTranslation, Rescaling, Input
from tensorflow.keras.optimizers import Adam

# Batch size and image size (these match the values used when loading the datasets above)
batch_size = 32
image_size = (224, 224)

# Define data augmentation and rescaling layers
data_augmentation = Sequential([
    RandomFlip("horizontal"),
    RandomRotation(0.1),
    RandomZoom(0.1),
])

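# Rescale pixel values from [0, 255] to [-1, 1], the input range that the
# ImageNet weights of MobileNetV2 were trained with
rescaling = Rescaling(1.0 / 127.5, offset=-1)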

# Load the MobileNetV2 model with pre-trained weights
base_model = MobileNetV2(
    input_shape=image_size + (3,),
    include_top=False,
    weights='imagenet'
)

# Freeze the base layers
base_model.trainable = False

# Build the model
model = Sequential([
    Input(shape=image_size + (3,)),
    data_augmentation,    # Apply data augmentation
    rescaling,            # Rescale pixel values
    base_model,           # Pre-trained base model
    GlobalAveragePooling2D(),  # Global average pooling
    Dense(128, activation='relu'),  # Fully connected layer
    Dense(1, activation='sigmoid')  # Output layer
])

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Display the model summary
model.summary()

# Prefetch the datasets for optimized performance
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
val_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

# Train the model
model.fit(
    train_dataset,
    epochs=15,
    validation_data=val_dataset
)


Once training completes, check how the training and validation accuracy evolve over the epochs. Note that the validation data here is the 25-image test set, so a single image changes the validation accuracy by four percentage points. The final accuracy will depend on the dataset size, model architecture, and training duration. In practice, it's common to experiment with different models, hyperparameters, and training strategies to achieve the best performance.

### Evaluation Metrics

To evaluate the model's performance, we need to compute various metrics such as accuracy, precision, recall, and F1-score. These metrics provide insights into the model's ability to classify defective and healthy parts correctly. We'll also look at saliency maps to visualize the regions of the image that the model focuses on during prediction.

Let's start by computing the evaluation metrics for the model.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

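# Collect predictions and true labels in a single pass so they stay aligned
# (the dataset may yield batches in a different order on each iteration)
y_prob, y_true = [], []
for images, labels in test_dataset:
    y_prob.append(model.predict(images, verbose=0).flatten())
    y_true.append(labels.numpy())
y_pred = np.concatenate(y_prob)
y_true = np.concatenate(y_true)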

# Convert probabilities to binary predictions
y_pred = np.where(y_pred > 0.5, 1, 0)

# Compute evaluation metrics
report = classification_report(y_true, y_pred)
conf_matrix = confusion_matrix(y_true, y_pred)

print(report)
print(conf_matrix)

We can see that the model performs reasonably well at identifying healthy parts (class 1), but recall on defective parts (class 0) is lower. This suggests that the model may be biased towards predicting healthy parts, which could be due to class imbalance or insufficient training data.
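
The numbers in the classification report follow directly from the confusion matrix. The sketch below recomputes precision, recall, and F1 for the defective class by hand. The counts are made up for illustration and are not results from this lab; the layout matches scikit-learn's convention of rows for true labels and columns for predicted labels, with class 0 (defective) first.

```python
# Hypothetical confusion matrix for 25 test images (illustrative values only)
#   columns: predicted defective, predicted healthy
conf_matrix = [
    [8, 3],   # true defective
    [1, 13],  # true healthy
]

caught_defects = conf_matrix[0][0]  # defective parts correctly flagged
missed_defects = conf_matrix[0][1]  # defective parts predicted healthy
false_alarms = conf_matrix[1][0]    # healthy parts predicted defective

# Precision: of the parts flagged as defective, how many really are?
precision = caught_defects / (caught_defects + false_alarms)
# Recall: of the truly defective parts, how many did we catch?
recall = caught_defects / (caught_defects + missed_defects)
# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this made-up example, three of the eleven defective parts slip through, which shows up as a recall that is noticeably lower than the precision.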

### Visualizing Predictions

To gain a better understanding of the model's predictions, we can visualize the images along with their predicted labels. This will help us identify any patterns or inconsistencies in the model's predictions.

Let's visualize a few sample images along with their predicted labels.

In [None]:
import matplotlib.pyplot as plt

# Get the first batch of images and labels
for images, labels in test_dataset.take(1):
    y_pred = model.predict(images).flatten()
    y_pred = np.where(y_pred > 0.5, 1, 0)

    plt.figure(figsize=(10, 10))
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(f'Predicted: {"defective" if y_pred[i] == 0 else "healthy"}\nTrue: {"defective" if labels[i] == 0 else "healthy"}')
        plt.axis("off")

### Improving Performance

It's your turn! Let's see if you can improve the model's performance by experimenting with different strategies. Here are some ideas to get you started:

- **Data Augmentation**: Try different combinations of data augmentation techniques to increase the diversity of the training set. Keep in mind that overly aggressive augmentation can distort the images so much that the model struggles to learn from them (underfitting); maybe we started too high?
- **Hyperparameter Tuning**: Experiment with different learning rates, batch sizes, and optimizers to find the optimal configuration for training.
- **Model Architecture**: Try using a different pre-trained model or building a custom architecture to see if it improves performance.
- **Fine-Tuning**: We _froze_ the base model to only train the top layers. We can unfreeze some of the base layers and train them along with the top layers to learn more specific features from the dataset.

#### Data Augmentation

You can make changes to the data augmentation layers in the cell below to experiment with different combinations of transformations. Try adding or removing layers, changing the parameters, or using different types of augmentations to see how they affect the model's performance.

In [None]:
from keras.layers import RandomCrop, RandomFlip, RandomRotation, RandomZoom, RandomContrast, RandomBrightness

# Define the augmentation layers
data_augmentation = Sequential([
    RandomCrop(224, 224),
    RandomFlip("horizontal_and_vertical"),
    RandomRotation(0.2),
    RandomZoom(0.2, 0.2),
    RandomContrast(0.2),
    RandomBrightness(0.2),
])

#### Hyperparameter Tuning

You can experiment with different hyperparameters such as learning rates, batch sizes, and optimizers to improve the model's performance. Try changing the values in the cell below and observe how they affect the training process and final accuracy.

In [None]:
from keras.optimizers import Adam, SGD, RMSprop, Adagrad

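# Note: new values for batch_size or image_size only take effect once the
# datasets are reloaded with image_dataset_from_directory using these values
learning_rate = 0.001
batch_size = 32
image_size = (224, 224) # Can be increased up to 480x480
optimizer = Adam(learning_rate=learning_rate)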

#### Model Architecture

We used MobileNetV2 in this lab, but there are many other pre-trained models available in `tf.keras.applications` that you can experiment with. Using the cell below you can load and fine-tune models like `ResNet50`, `InceptionV3`, and more.

In [None]:
from keras.applications import MobileNetV2, ResNet50, InceptionV3

# base_model = MobileNetV2(
#     input_shape=image_size + (3,),
#     include_top=False,
#     weights='imagenet'
# )
#
# base_model = ResNet50(
#     input_shape=image_size + (3,),
#     include_top=False,
#     weights='imagenet'
# )

base_model = InceptionV3(
    input_shape=image_size + (3,),
    include_top=False,
    weights='imagenet'
)

#### Fine-Tuning

You can experiment with fine-tuning the pre-trained model by unfreezing some of the base layers and training them along with the top layers. Try unfreezing different blocks of layers and observe how it affects the model's performance.

In [None]:
# Unfreeze the base layers
base_model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

In [None]:
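# Rescale pixel values to [-1, 1], as expected by MobileNetV2 and InceptionV3
# (ResNet50 uses a different scheme; see its `preprocess_input` function)
rescaling = Rescaling(1.0 / 127.5, offset=-1)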

# Build the model
model = Sequential([
    Input(shape=image_size + (3,)),
    data_augmentation,    # Apply data augmentation
    rescaling,            # Rescale pixel values
    base_model,           # Pre-trained base model
    GlobalAveragePooling2D(),  # Global average pooling
    Dense(128, activation='relu'),  # Fully connected layer
    Dense(1, activation='sigmoid')  # Output layer
])

# Compile the model
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Display the model summary
model.summary()

# Prefetch the datasets for optimized performance
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
val_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

# Train the model
model.fit(
    train_dataset,
    epochs=15,
    validation_data=val_dataset
)
