<a href="https://colab.research.google.com/github/Sitraka17/Python/blob/main/cnn_training_progression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Oxford AI Summit: Kaggle dataset training notebook

In this notebook we cover the model development progression when training on the Kaggle fashion product image dataset with various CNN architectures. Initially we used TensorFlow to train a basic and an advanced CNN from scratch. While we observed a performance improvement when including advanced features like batch normalisation and global average pooling, we were still not able to reach our target accuracy. To get around this, we chose to fine-tune a pre-trained CNN called YOLOv8. YOLOv8 comes in classification, image segmentation and object detection variants, and the classification models are pre-trained on the ImageNet dataset. We found that the fine-tuned model outperformed our TensorFlow models.

## Installation and execution

We ran the training for these models in Google Colab, where most of the packages are pre-installed. If you wish to run on a local device, you need to ensure that libhdf5-dev (or the equivalent for your OS) is installed (e.g. `sudo apt install libhdf5-dev`), along with the following Python dependencies:

- tensorflow
- pandas
- kaggle
- ultralytics
- scikit-learn

Below we only install ultralytics manually, as the rest are included in the Colab environment by default.

In [None]:
%pip install ultralytics


Note: the import cell below sometimes needs to be run twice before it succeeds, due to an issue with the kaggle package.

In [None]:
import os
import shutil
import pathlib

import pandas as pd

from kaggle.api.kaggle_api_extended import KaggleApi

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import schedules, AdamW

from sklearn.model_selection import train_test_split

from ultralytics import YOLO

from IPython.display import display

After importing the required packages, we check what compute is available for training. If a TPU is available we set it up; otherwise we fall back to the default strategy and list the available GPUs (or run on the CPU alone). Note: this is only required for the TensorFlow training; the ultralytics package handles device selection separately.

In [None]:
# Enable TPU if available
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
    print('Running on TPU')
except ValueError:
    strategy = tf.distribute.get_strategy()
    print('Running on GPU or CPU')
    print(tf.config.experimental.list_physical_devices())
    if (gpus := tf.config.experimental.list_physical_devices('GPU')):
        for gpu in gpus:
            print(tf.config.experimental.get_device_details(gpu))

Running on GPU or CPU
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
{'compute_capability': (7, 5), 'device_name': 'Tesla T4'}


## General Data Preprocessing

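We then download the Fashion Product Images (Small) dataset. This cell only needs to be run once if you are doing multiple trainings or experiments. It requires Kaggle API credentials, either a kaggle.json file in ~/.kaggle or the KAGGLE_USERNAME and KAGGLE_KEY environment variables.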

In [None]:
api = KaggleApi()
api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json or KAGGLE_USERNAME/KAGGLE_KEY

dataset = 'paramaggarwal/fashion-product-images-small'
destination_folder = 'fashion_product_images'

api.dataset_download_files(dataset, path=destination_folder, unzip=True, quiet=False)

Dataset URL: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small
Downloading fashion-product-images-small.zip to fashion_product_images


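After downloading the dataset, we process it to use the article type (articleType) as the training label. Classes with fewer than 2 samples are dropped, since a stratified split needs at least two samples per class, and we keep only the classes present in both the training and validation sets, since a class mismatch (e.g. socks appearing in training but not in validation) could break later code. We see that 30000+ images are available for training.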

In [None]:
# Load the metadata
metadata_path = 'fashion_product_images/myntradataset/styles.csv'
metadata = pd.read_csv(metadata_path, on_bad_lines='skip')

# Filter the dataset to include only rows with valid images
image_folder = 'fashion_product_images/myntradataset/images'
metadata['image_path'] = metadata.apply(lambda row: os.path.join(image_folder, str(row['id']) + '.jpg'), axis=1)
metadata = metadata[metadata['image_path'].apply(os.path.exists)]

# Select relevant columns and convert 'articleType' to category
metadata = metadata[['image_path', 'articleType']].copy()
metadata['articleType'] = metadata['articleType'].astype('category')
metadata['label'] = metadata['articleType'].cat.codes

# Ensure each class has at least 2 samples
min_samples_per_class = 2
class_counts = metadata['label'].value_counts()
valid_classes = class_counts[class_counts >= min_samples_per_class].index
metadata = metadata[metadata['label'].isin(valid_classes)]

# Split into training and validation sets
train_df, val_df = train_test_split(metadata, test_size=0.2, stratify=metadata['label'], random_state=42)

# Convert the labels to strings
train_df['label'] = train_df['label'].astype(str)
val_df['label'] = val_df['label'].astype(str)

# Find common classes
train_classes = set(train_df['label'].unique())
val_classes = set(val_df['label'].unique())
common_classes = train_classes.intersection(val_classes)

# Filter dataframes to only include common classes
train_df = train_df[train_df['label'].isin(common_classes)]
val_df = val_df[val_df['label'].isin(common_classes)]

# Print the number of unique labels
num_classes = len(common_classes)
print(f'Number of unique labels: {num_classes}')
print(f'Training set size: {len(train_df)}')
print(f'Validation set size: {len(val_df)}')
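Why both filtering steps are needed: with `stratify`, scikit-learn's `train_test_split` raises a ValueError if any class has only one member, and a class with very few samples can still end up with none of them in the validation set. The sketch below illustrates both effects with hypothetical toy labels and a simplified per-class split (it floors test_size times the class size; scikit-learn allocates samples slightly differently), so it is an illustration rather than a reimplementation.

```python
import math
import random
from collections import defaultdict


def filter_rare_classes(labels, min_samples=2):
    """Keep only labels that occur at least `min_samples` times."""
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return [label for label in labels if counts[label] >= min_samples]


def simple_stratified_split(labels, test_size=0.2, seed=42):
    """Split indices per class, sending floor(test_size * class size) to validation.

    Simplified stand-in for scikit-learn's stratified split, for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = math.floor(test_size * len(indices))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical toy labels: 'Socks' has two samples, 'Ties' has only one.
labels = ['Tshirts'] * 10 + ['Shirts'] * 5 + ['Socks'] * 2 + ['Ties']
labels = filter_rare_classes(labels)  # drops 'Ties'
train_idx, val_idx = simple_stratified_split(labels)

train_classes = {labels[i] for i in train_idx}
val_classes = {labels[i] for i in val_idx}
common = train_classes & val_classes
print(sorted(common))  # 'Socks' only appears in the training split
```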

## TensorFlow Training

Having preprocessed our data, we set up training and validation data generators, which load images on the fly during training (rather than loading them all into memory beforehand) and, for the training data, apply augmentation (random rotations, shifts, shears, zooms and horizontal flips).

In [None]:
# Image data generator with augmentation for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Image data generator for validation (without augmentation)
val_datagen = ImageDataGenerator(rescale=1./255)

# Data generators
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    x_col='image_path',
    y_col='label',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical',
    shuffle=True,
)

val_generator = val_datagen.flow_from_dataframe(
    val_df,
    x_col='image_path',
    y_col='label',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical',
    shuffle=False,
)
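One subtlety of the generators above: after `.astype(str)` the labels are category codes stored as strings, and `flow_from_dataframe` sorts the class names as strings when building `train_generator.class_indices`. The model's output index therefore does not equal the original category code. The sketch below reproduces that ordering in plain Python, using hypothetical labels and category names (not the real dataset's), and shows how to map a predicted index back to a name.

```python
# Hypothetical category codes after `.astype(str)`; not the real dataset's labels.
string_labels = ['0', '2', '10', '1', '10', '2']

# Class names are sorted as strings, so '10' comes before '2'.
class_indices = {name: idx for idx, name in enumerate(sorted(set(string_labels)))}
print(class_indices)  # {'0': 0, '1': 1, '10': 2, '2': 3}

# Invert the mapping: predicted output index -> original integer category code.
index_to_code = {idx: int(name) for name, idx in class_indices.items()}

# Hypothetical category names indexed by integer code (as in `cat.categories`).
categories = ['Bags', 'Shirts', 'Socks', 'Ties', 'Tops', 'Tshirts',
              'Watches', 'Belts', 'Caps', 'Heels', 'Jeans']

predicted_index = 2  # e.g. the argmax of the model's softmax output
print(categories[index_to_code[predicted_index]])  # code 10 -> 'Jeans'
# Indexing `categories` with the raw output index would wrongly give 'Socks'.
```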


First, we train a small, basic CNN with 3 convolutional layers, using the Adam optimiser and the categorical cross-entropy loss function.

In [None]:
# Define a basic CNN model within the strategy scope
with strategy.scope():
    basic_model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')  # Adjusted number of output units
    ])

    basic_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    basic_model.summary()
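As a worked check on the basic model's summary, the plain-Python sketch below recomputes the feature-map sizes and parameter counts, assuming Keras's defaults for these layers (stride-1 'valid' convolutions, and 2x2 max pooling with stride 2). It leaves out the final Dense layer, whose size depends on num_classes. Almost all of the parameters sit in the first Dense(256) layer after Flatten, which is the cost that GlobalAveragePooling2D avoids in the advanced model below.

```python
def conv2d_valid(size, kernel=3):
    """Spatial size after a stride-1 'valid' convolution (Keras Conv2D default)."""
    return size - kernel + 1


def maxpool(size, pool=2):
    """Spatial size after MaxPooling2D with strides equal to the pool size, 'valid' padding."""
    return (size - pool) // pool + 1


def conv2d_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights per filter, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size, channels, conv_total = 128, 3, 0
for filters in (32, 64, 128):
    conv_total += conv2d_params(channels, filters)
    size = maxpool(conv2d_valid(size))
    channels = filters
    print(f"after Conv2D({filters}) + MaxPooling2D: {size}x{size}x{channels}")

flattened = size * size * channels
dense_params = flattened * 256 + 256  # weights + biases of Dense(256)
print(f"flattened features: {flattened}")
print(f"conv parameters: {conv_total}, first Dense(256) parameters: {dense_params}")
```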

We also configure an early stopping callback that halts training if the validation accuracy does not improve for 5 consecutive epochs. Because restore_best_weights=False, the model keeps the weights from the final epoch it trained for.

In [None]:
# Define training callback for early stopping
early_stopping = EarlyStopping(
    monitor="val_accuracy",
    min_delta=0,
    patience=5,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=False,
    start_from_epoch=0,
)
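To make the early stopping rule concrete, here is a simplified plain-Python sketch of the patience logic for a maximised metric such as val_accuracy, using hypothetical per-epoch values. With min_delta=0, an epoch has to strictly beat the best accuracy so far to reset the counter. Since restore_best_weights=False, the model would keep the weights from the stopping epoch (7 in this example) rather than the best epoch (2).

```python
def early_stopping_epoch(val_accuracy, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a maximised metric.

    Simplified sketch: an epoch is an improvement only if it beats the best value
    so far by more than `min_delta`; training stops once `patience` consecutive
    epochs pass without one. stop_epoch is None if training runs to the end.
    """
    best, best_epoch, wait = float("-inf"), None, 0
    for epoch, value in enumerate(val_accuracy):
        if value - min_delta > best:
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation accuracies, one per epoch (0-indexed).
history = [0.50, 0.60, 0.62, 0.61, 0.60, 0.62, 0.615, 0.61, 0.63]
print(early_stopping_epoch(history))  # (7, 2)
```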

We then train and save the basic model. The ModelCheckpoint callback also saves the best model so far (by validation loss, its default monitor) in case the process is interrupted.

In [None]:
# Train the basic model
basic_checkpoint = ModelCheckpoint('basic_tf_cp.keras', save_best_only=True)

history = basic_model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator,
    callbacks=[basic_checkpoint, early_stopping],
)
basic_model.save('basic_tf.keras')

Next, we train a larger CNN with 8 convolutional layers, using the AdamW optimiser and categorical cross-entropy loss. This model also incorporates batch normalisation, multiple dropout layers, L2 regularisation on the dense layer and global average pooling. We also include an exponential learning rate decay schedule and extra metrics to monitor during training (precision and recall).

In [None]:
# Define an advanced CNN model within the strategy scope
with strategy.scope():
    advanced_model = Sequential([
        Conv2D(64, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        BatchNormalization(),
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.3),

        Conv2D(128, (3, 3), activation='relu'),
        BatchNormalization(),
        Conv2D(128, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.4),

        Conv2D(256, (3, 3), activation='relu'),
        BatchNormalization(),
        Conv2D(256, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.4),

        Conv2D(512, (3, 3), activation='relu'),
        BatchNormalization(),
        Conv2D(512, (3, 3), activation='relu'),
        BatchNormalization(),
        GlobalAveragePooling2D(),
        Dropout(0.5),

        Dense(1024, activation='relu', kernel_regularizer=l2(0.01)),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])

    initial_learning_rate = 0.001
    lr_schedule = schedules.ExponentialDecay(
        initial_learning_rate,
        decay_steps=100000,
        decay_rate=0.96,
        staircase=True
    )
    advanced_model.compile(
        optimizer=AdamW(learning_rate=lr_schedule, weight_decay=0.0001),
        loss='categorical_crossentropy',
        metrics=['accuracy', 'Precision', 'Recall']
    )
    advanced_model.summary()
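The schedule above uses decay_steps=100000 with staircase=True, so the learning rate is multiplied by 0.96 only once every 100,000 optimiser steps (TensorFlow documents the rate as initial_learning_rate * decay_rate ^ (step / decay_steps), with the exponent floored when staircase=True). A quick plain-Python check, assuming about 30,000 training images and the batch size of 32 used above, suggests the rate never decays during the 10-epoch run.

```python
import math


def exponential_decay(step, initial_lr=0.001, decay_steps=100_000, decay_rate=0.96, staircase=True):
    """Learning rate of an exponential decay schedule at a given optimiser step."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # only decay at whole multiples of decay_steps
    return initial_lr * decay_rate ** exponent


# Assumption: ~30,000 training images (the notebook reports 30000+) and batch_size=32.
train_images, batch_size, epochs = 30_000, 32, 10
steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)    # 938 9380
print(exponential_decay(total_steps))  # 0.001: still the initial rate
print(exponential_decay(100_000))      # first decay happens at step 100,000
```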

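We also add a callback that halves the learning rate (down to a minimum of 1e-5) when the validation loss has not improved for 3 epochs, to help training converge. Note that this callback works by overwriting the optimiser's learning rate; depending on your Keras version, this may raise a TypeError when the optimiser was created with a LearningRateSchedule, as it is above. If so, pass a float learning rate to AdamW instead.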

In [None]:
# Define training callback for learning rate reduction
lr_reduce = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=3,
    min_lr=0.00001,
    verbose=1
)
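The ReduceLROnPlateau settings above can be traced with a simplified plain-Python sketch (no cooldown; min_delta=1e-4 is Keras's default, which the notebook does not override). The validation losses are hypothetical, and this is an illustration of the rule rather than Keras's own code.

```python
def reduce_lr_on_plateau(val_losses, lr=0.001, factor=0.5, patience=3, min_lr=1e-5, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified, no cooldown).

    The rate is multiplied by `factor` (floored at `min_lr`) once `patience`
    consecutive epochs fail to lower the best loss by more than `min_delta`.
    """
    best, wait, lrs = float("inf"), 0, []
    for loss in val_losses:
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs


# Hypothetical validation losses: no improvement after epoch 1.
losses = [1.00, 0.90, 0.91, 0.92, 0.93, 0.95, 0.96, 0.97]
print(reduce_lr_on_plateau(losses))  # halved at epochs 4 and 7
```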

We then train and save the advanced model, again checkpointing the best model so far (by validation loss) in case the process is interrupted.

In [None]:
# Train the advanced model
advanced_checkpoint = ModelCheckpoint('advanced_tf_cp.keras', save_best_only=True)

advanced_history = advanced_model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator,
    callbacks=[advanced_checkpoint, early_stopping, lr_reduce],
    verbose=1,
)
advanced_model.save('advanced_tf.keras')

## YOLOv8 Fine-tuning

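Having trained models from scratch with TensorFlow, we now turn to fine-tuning a pre-trained model to improve performance. We chose the nano version of the YOLOv8 classification model (yolov8n-cls). Ultralytics expects classification data in a different format from our TensorFlow pipeline: one folder per class inside train/ and val/ directories. The script below copies the images into this structure and also writes a YAML file listing the class names, which we use later at inference time to map predictions back to article types (the classification trainer itself reads the classes from the folder names).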

In [None]:
# Function to create directory structure
def prepare_dataset(df, base_path):
    for _, row in df.iterrows():
        class_dir = os.path.join(base_path, row['label'])
        os.makedirs(class_dir, exist_ok=True)
        shutil.copy(row['image_path'], class_dir)


data_dir = pathlib.Path(os.getcwd(), 'datasets')
train_path = pathlib.Path(data_dir, 'train')
val_path = pathlib.Path(data_dir, 'val')

# Create directory structure
prepare_dataset(train_df, train_path)
prepare_dataset(val_df, val_path)

# Create the YAML file for the dataset
yaml_content = f"""
train: {train_path}
val: {val_path}

# Number of classes
nc: {num_classes}

# Class names
names:
"""

# Add class names to the YAML content
class_names = train_df['articleType'].cat.categories
for i, class_name in enumerate(class_names):
    yaml_content += f"  {i}: {class_name}\n"

config_path = pathlib.Path(data_dir, 'dataset.yaml')
with open(config_path, 'w') as f:
    f.write(yaml_content)
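The folder layout produced by the helper above can be tried out on a throwaway directory. This sketch re-implements prepare_dataset with (image_path, label) pairs instead of a DataFrame, to avoid a pandas dependency, and uses empty placeholder files with hypothetical image IDs in place of real images. Each class ends up as its own sub-folder named after its category code, which is where the Ultralytics classification trainer takes the class names from.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_dataset(rows, base_path):
    """Copy each image into base_path/<label>/, mirroring the notebook's helper.

    `rows` is a list of (image_path, label) pairs (an assumption for this sketch).
    """
    for image_path, label in rows:
        class_dir = Path(base_path, label)
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(image_path, class_dir)


with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    # Hypothetical empty placeholder files standing in for the real .jpg images.
    src = root / "images"
    src.mkdir()
    for image_id in (15970, 39386, 59263):
        (src / f"{image_id}.jpg").write_bytes(b"")

    rows = [(src / "15970.jpg", "10"), (src / "39386.jpg", "2"), (src / "59263.jpg", "10")]
    prepare_dataset(rows, root / "datasets" / "train")

    # List the copied files relative to the dataset root, e.g. train/<code>/<id>.jpg
    layout = sorted(p.relative_to(root / "datasets").as_posix()
                    for p in (root / "datasets").rglob("*.jpg"))
    print(layout)
```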


After setting up the dataset in the required format, we load the pre-trained YOLOv8 weights. We chose yolov8n-cls, but you can replace the n with s, m, l or x if a larger model is required; be aware that this increases both training and inference time.

In [None]:
# Load a pre-trained YOLOv8 classification model
yolo_model = YOLO('yolov8n-cls.pt')

# Train the model
yolo_model.train(data='datasets', epochs=10, imgsz=128, batch=32)

# Evaluate the model
metrics = yolo_model.val(data='datasets')

# Save the model
yolo_model.save('yolo.pt')

## Inference

Below we show how to load the weights created during training and run them on a sample image from the dataset to check the predictions visually.

In [None]:
test_image_path = "fashion_product_images/myntradataset/images/18008.jpg"
img = tf.keras.utils.load_img(
    test_image_path, target_size=(128, 128)
)
display(img)

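# NOTE: this block is commented out because, in our setup, the .keras weights
# trained on Colab could not be loaded on a local laptop.
# # TensorFlow
# loaded_tf = tf.keras.models.load_model('../weights/basic_tf.keras')

# # Apply the same 1/255 rescaling used by the training generators
# img_array = tf.keras.utils.img_to_array(img) / 255.0
# img_array_batch = tf.expand_dims(img_array, 0)

# # The model already ends in a softmax layer, so its outputs are probabilities
# score = loaded_tf.predict(img_array_batch)[0]
# top1_val = int(tf.argmax(score))

# # Output indices follow the generator's string-sorted class order, not the
# # category codes, so map index -> label string -> code -> article type
# index_to_code = {idx: int(label) for label, idx in train_generator.class_indices.items()}
# top1_str = metadata['articleType'].cat.categories[index_to_code[top1_val]]
# print("Class: ")
# print(top1_str)
# print("Confidence: ")
# print(float(score[top1_val]))
# print("")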

# YOLOv8
import yaml  # PyYAML is installed as an ultralytics dependency

# Class names indexed by category code, as written to dataset.yaml above
with open("datasets/dataset.yaml", "r") as f:
    class_names = yaml.safe_load(f)["names"]

loaded_yolo = YOLO(model='../weights/yolo.pt', task="classify")

results = loaded_yolo.predict(test_image_path, verbose=False)
result, = results

# result.names maps the output index to the class folder name (the category code as a string)
top1_val = result.probs.top1
top1_str = class_names[int(result.names[top1_val])]
print("Class: ")
print(top1_str)
print("Confidence: ")
print(float(result.probs.top1conf))
