# MSA 2024 Phase 2 - Part 3


This notebook builds a Convolutional Neural Network (CNN) model for the CIFAR-10 dataset, with the use of `pytorch` to define the model structure.

**Before start working on the competition, please ensure all required libraries are installed and properly set up on your system**:

- `python >= 3.8`,
- `pytorch >= 2.4.0`,
- `CUDA 11.8`

The following shows all the libraries that need to be referenced

In [1]:
import os
import pandas as pd
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset, random_split
from torch.cuda.amp import autocast, GradScaler

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, roc_curve, auc
import matplotlib.pyplot as plt
import seaborn as sns



### 1. Data loading & preprocessing

The CIFAR-10 dataset contains 60,000 images(32x32x3) in 10 different classes, with 6,000 images in each class. You can download the dataset directly from the competition webpage.

Define data preprocessing and enhancement including on-the-fly horizontal inversion, random clipping padding, converting Tensor types and normalisation

Define a dataset class to handle the reading of the data including finding the correct mapping of images to labels from train.csv

Split the dataset, 80% for training, 10% for testing and %10 for validation, and set the random number generator seed to 101 to ensure consistency and reproducibility of the experimental results

In [2]:
# 数据预处理和增强
print("Data augmentation and preprocessing")
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # 随机水平翻转
    transforms.RandomCrop(32, padding=4),  # 随机裁剪并填充
    transforms.ToTensor(),  # 转换为Tensor
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # 标准化
])

# 自定义数据集类
class CIFAR10Dataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.data_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.data_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, f"image_{self.data_frame.iloc[idx, 0]}.png")
        image = Image.open(img_name).convert('RGB')  # 确保图像是RGB格式
        label = int(self.data_frame.iloc[idx, 1])

        if self.transform:
            image = self.transform(image)

        return image, label
    
# 加载训练数据
print("Loading training data")
dataset = CIFAR10Dataset(csv_file='nzmsa-2024/train.csv', root_dir='nzmsa-2024/cifar10_images/train', transform=transform)

# 拆分数据集为训练集、验证集和测试集
train_size = int(0.8 * len(dataset))
val_size = int(0.1 * len(dataset))
test_size = len(dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size], generator=torch.Generator().manual_seed(101))

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False, num_workers=2)

print("Data loaded and split into train, validation, and test sets")


Data augmentation and preprocessing
Loading training data
Data loaded and split into train, validation, and test sets


### 2. Build & train the model

The model includes six convolutional layers with different numbers of channels to gradually extract different levels of features from the input image. Each convolution is followed by a normalization layer to stabilize and accelerate training, improving model stability. Additionally, the spatial dimensions of the feature map are gradually reduced using a maximum pooling layer to focus on the most significant features. 

1. **Convolutional Layers**:
   - **Purpose**: Extract hierarchical features from the input image.
   - **Details**: Six convolutional layers with increasing channels to capture different levels of details.

2. **Batch Normalization Layers**:
   - **Purpose**: Stabilize and accelerate training.
   - **Details**: Placed after each convolutional layer to normalize the output, reducing internal covariate shift.

3. **Max Pooling Layers**:
   - **Purpose**: Reduce the spatial dimensions of feature maps.
   - **Details**: Used to focus on the most significant features and reduce computational complexity.

4. **Fully Connected Layers**:
   - **Purpose**: Map high-dimensional features to the final output.
   - **Details**: Connect the flattened feature map from convolutional and pooling layers to the output layer for classification.

5. **Dropout Layers**:
   - **Purpose**: Reduce overfitting and enhance generalization.
   - **Details**: Randomly discard neurons during training to prevent overfitting.

6. **ReLU Activation Function**:
   - **Purpose**: Introduce nonlinearity.
   - **Details**: Applied after convolutional and fully connected layers to enable the network to learn and represent complex nonlinear features.


In [3]:
class CNN(nn.Module):
    def __init__(self, num_classes=10, conv1_channels=64, conv2_channels=128, conv3_channels=256, conv4_channels=512, fc1_size=1024, fc2_size=512, dropout_p=0.5):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=conv1_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(conv1_channels)
        self.conv2 = nn.Conv2d(in_channels=conv1_channels, out_channels=conv2_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(conv2_channels)
        self.conv3 = nn.Conv2d(in_channels=conv2_channels, out_channels=conv3_channels, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(conv3_channels)
        self.conv4 = nn.Conv2d(in_channels=conv3_channels, out_channels=conv4_channels, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(conv4_channels)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        self.conv5 = nn.Conv2d(in_channels=conv4_channels, out_channels=conv4_channels, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm2d(conv4_channels)
        self.conv6 = nn.Conv2d(in_channels=conv4_channels, out_channels=conv4_channels, kernel_size=3, padding=1)
        self.bn6 = nn.BatchNorm2d(conv4_channels)
        
        self.dropout = nn.Dropout(p=dropout_p)
        self.fc1 = nn.Linear(conv4_channels * 2 * 2, fc1_size)
        self.bn7 = nn.BatchNorm1d(fc1_size)
        self.fc2 = nn.Linear(fc1_size, fc2_size)
        self.bn8 = nn.BatchNorm1d(fc2_size)
        self.fc3 = nn.Linear(fc2_size, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(F.relu(self.bn4(self.conv4(x))))
        x = F.relu(self.bn5(self.conv5(x)))
        x = self.pool(F.relu(self.bn6(self.conv6(x))))
        
        x = x.view(-1, self.conv4.out_channels * 2 * 2)
        x = F.relu(self.bn7(self.fc1(x)))
        x = self.dropout(x)
        x = F.relu(self.bn8(self.fc2(x)))
        x = self.fc3(x)
        return x

Create functions to save and load breakpoints for breakpoint training of the model

In [4]:
def save_checkpoint(state, filename='models/checkpoint.pth.tar'):
    os.makedirs(os.path.dirname(filename), exist_ok=True) 
    torch.save(state, filename)
    print(f"Checkpoint saved at {filename}")

def load_checkpoint(filename, model, optimizer):
    checkpoint = torch.load(filename)
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    epoch = checkpoint['epoch']
    print(f"Checkpoint loaded from {filename}, starting from epoch {epoch}")
    return epoch

Define training and validation functions and plot learning curves for initial evaluation of the model.

In [5]:
# 定义训练和验证过程
def train_and_validate(net, train_loader, val_loader, criterion, optimizer, scheduler, num_epochs, device, start_epoch=0, early_stopping_patience=5):
    scaler = GradScaler()
    best_val_accuracy = 0
    train_losses = []
    val_losses = []
    train_accuracies = []
    val_accuracies = []

    epochs_no_improve = 0

    for epoch in range(start_epoch, num_epochs):
        net.train()
        running_loss = 0.0
        correct = 0
        total = 0
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            with autocast():
                outputs = net(inputs)
                loss = criterion(outputs, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            if i % 100 == 99:
                print(f'[Epoch {epoch + 1}, Iter {i + 1}] loss: {running_loss / 100:.3f}')
                running_loss = 0.0

        train_accuracy = 100 * correct / total
        train_losses.append(running_loss / len(train_loader))
        train_accuracies.append(train_accuracy)

        # 验证模型
        net.eval()
        val_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for data in val_loader:
                images, labels = data
                images, labels = images.to(device), labels.to(device)
                with autocast():
                    outputs = net(images)
                    loss = criterion(outputs, labels)
                val_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        val_accuracy = 100 * correct / total
        val_losses.append(val_loss / len(val_loader))
        val_accuracies.append(val_accuracy)
        scheduler.step(val_loss / len(val_loader))
        print(f'[Epoch {epoch + 1}] validation loss: {val_loss / len(val_loader):.3f}, accuracy: {val_accuracy:.2f}%')

        # 保存模型断点
        is_best = val_accuracy > best_val_accuracy
        if is_best:
            best_val_accuracy = val_accuracy
            save_checkpoint({
                'epoch': epoch + 1,
                'state_dict': net.state_dict(),
                'optimizer': optimizer.state_dict(),
            }, filename='models/best_model.pth.tar')
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1

        if epochs_no_improve >= early_stopping_patience:
            print("Early stopping")
            break

    print('Finished Training')
    
    # 绘制学习曲线
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(train_accuracies, label='Training Accuracy')
    plt.plot(val_accuracies, label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.show()

Apply the model class and set up the loss function and optimiser. 
The AdamW optimiser is used, which features decoupling of weight decay and gradient updating to obtain better generalisation while adaptively adjusting the learning rate based on the first-order moment estimate and second-order moment estimate of the gradient. 
The learning rate is also dynamically adjusted using the ReduceLROnPlateau learning rate scheduler.

In [6]:
model = CNN(
    num_classes=10,
    conv1_channels=64,
    conv2_channels=128,
    conv3_channels=256,
    conv4_channels=512,
    fc1_size=1024,
    fc2_size=512,
    dropout_p=0.5
).to('cuda')
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=2, factor=0.5)

Perform model training and validation
Perform 30 epoches of training

In [8]:
if __name__ == '__main__':
    # 检查设备
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # 加载模型断点（如果有）
    start_epoch = 0
    checkpoint_path = 'models/best_model.pth.tar'
    if os.path.exists(checkpoint_path):
        start_epoch = load_checkpoint(checkpoint_path, model, optimizer)

    # 训练和验证模型
    print("Starting training and validation")
    train_and_validate(model, train_loader, val_loader, criterion, optimizer, scheduler, num_epochs=30, device=device, start_epoch=start_epoch, early_stopping_patience=5)


Using device: cuda
Starting training and validation


## 3. Evaluate Models

Setting the confusion matrix and ROC plotting function

In [None]:
def plot_confusion_matrix(cm, classes, title='Confusion matrix'):
    plt.figure(figsize=(10, 7))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes, yticklabels=classes)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title(title)
    plt.show()

def plot_roc_curve(labels, preds, num_classes):
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    labels = np.array(labels)
    preds = np.array(preds)
    for i in range(num_classes):
        fpr[i], tpr[i], _ = roc_curve(labels == i, preds == i)
        roc_auc[i] = auc(fpr[i], tpr[i])

    plt.figure(figsize=(10, 7))
    for i in range(num_classes):
        plt.plot(fpr[i], tpr[i], label=f'Class {i} (area = {roc_auc[i]:.2f})')
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend(loc='lower right')
    plt.show()

Evaluation results are saved in the results folder including confusion matrix, ROC, and other evaluation values are saved in evaluate.csv

#### Evaluation criteria used

- **Accuracy**: indicates the proportion of the test set that the model correctly classifies.
- **precision**: indicates the proportion of all samples that were predicted by the model to be positively classified that were actually positively classified.
- **recall**: indicates the proportion of all samples that were actually positively classified by the model that were correctly predicted to be positively classified by the model.
- **f1-score**: the reconciled mean of precision and recall.
- **support**: the number of times each category appears in the dataset.

#### Detailed Analysis

1. **Training set vs. test set accuracy**:
   - **Training Set Accuracy**: 0.960775 shows that the model has high accuracy on the training set, indicating that the model learnt the data patterns of the training set well.
   - **Test Set Accuracy**: 0.8784 decreased relative to the training set accuracy, but still maintained at a high level, indicating that the model has some generalisation ability.

2. **Precision, recall and f1-score**:
   - **Training set**: precision, recall and f1-score for all categories are close to or above 0.90, indicating that the model has almost no misclassification on the training set.
   - **Test set**: Precision, recall and f1-score of most categories are above 0.80, indicating that the model also performs more consistently on the test set. However, there is a slight decrease in some of the categories' metrics relative to the training set, especially for category 0 and category 5, showing that the model's performance on these categories needs to be improved.

In [None]:
# 评估最佳模型在训练集和测试集上的表现
print("Evaluating the best model on the training set")
train_labels, train_preds = evaluate_model(model, train_loader, device, num_classes=10)
print("Training Accuracy: ", accuracy_score(train_labels, train_preds))
print("Classification Report for Training Set:\n", classification_report(train_labels, train_preds))

print("Evaluating the best model on the test set")
test_labels, test_preds = evaluate_model(model, test_loader, device, num_classes=10)
print("Test Accuracy: ", accuracy_score(test_labels, test_preds))
print("Classification Report for Test Set:\n", classification_report(test_labels, test_preds))

# 混淆矩阵
cm = confusion_matrix(test_labels, test_preds)
plot_confusion_matrix(cm, classes=[f'Class {i}' for i in range(10)], title='Confusion Matrix for Test Set')

# ROC 曲线
plot_roc_curve(test_labels, test_preds, num_classes=10)

**ROC Curve**

- **ROC Curve**: Shows the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) of the model at various thresholds.
- **AUC (Area Under Curve)**: The area under the ROC curve, ranging from 0 to 1, with larger values indicating better model performance.
  - The AUC values are all around 0.90, indicating that the model has good prediction performance on most classes, especially the AUC value of 0.97 for Class 3, which is the best performance.

**Confusion Matrix**

- **True Positives (TP)**: Values on the diagonal indicating the number of samples correctly classified.
- **False Positives (FP)**: Values of columns not on the diagonal, indicating the number of samples incorrectly categorised as that class.
- **False Negatives (FN)**: Non-diagonal row values indicating the number of samples that were incorrectly classified as other classes for that class.

**Analysis**:
- The number of correctly classified samples is high for most classes.
- **Class 0** and **Class 5** have significant misclassification problems and need further optimisation.

## 4. Predict

Creating predictive functions, including using trained models

In [None]:
def predict(model, data_loader, device):
    model.eval()
    results = []
    with torch.no_grad():
        for images, image_ids in data_loader:
            images = images.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            results.extend(zip(image_ids.cpu().numpy(), predicted.cpu().numpy()))
    return results

Load dataset and test, save predictions to predictions.csv

In [None]:
# 加载预测数据
predict_dataset = CIFAR10Dataset(root_dir='nzmsa-2024/cifar10_images/test', transform=transform)
predict_loader = DataLoader(predict_dataset, batch_size=128, shuffle=False, num_workers=2)

# 检查预测数据加载是否正确
print("Checking loaded prediction data...")
for images, image_ids in predict_loader:
    print(f"Image IDs: {image_ids[:5]}")
    print(f"Image tensor shape: {images.shape}")
    break  # 只检查一个batch

# 进行预测
results = predict(model, predict_loader, device)

# 保存结果到CSV文件
results_df = pd.DataFrame(results, columns=['image_id', 'label'])
results_df.sort_values(by='image_id', inplace=True)  # 按image_id排序
results_df.to_csv('predictions.csv', index=False)
print("Predictions saved to predictions.csv")

## 5. Summary
In this challenge, we built a convolutional neural network with a convolutional layer, a normalisation layer, a pooling layer, a fully connected layer, a dropout layer, and an activation function, and performed a comprehensive evaluation of its training results.

#### Model Evaluation

- **Accuracy**:
  - The accuracy of the model remained high on both the training and test sets. The test set accuracy was lower than the training set, but also reached 87%.

- **Precision, Recall and f1 Score**:
  - Precision, recall and f1 scores remain high on both the training and test sets. The evaluation results were close to 90% for the training set and greater than 80% for the test set, but the model's metrics dropped slightly on Classification 0 and Classification 5 on the test set.

- **Confusion matrix and ROC curve**:
  - By analysing the results, the AUC values in the ROC curves are all around 90%, which means that the model has good prediction ability for most of the categories.
  - In the confusion matrix, we find that the true positives of most categories are high, which means that the model has a good ability to correctly classify, but misclassification occurs in category 0 and category 5, which needs further optimisation.

#### Final Results
- We made predictions on the test dataset for the challenge and saved the results as ID and Label in the `predication.csv` file.

### Model Optimisation Recommendations

Based on the above evaluation, we can optimise the model as follows:

1. **Data Enhancement**:
   - Increase data diversity and reduce the risk of overfitting.

2. **Adjustment of model structure**:
   - Increase the number of convolutional layers or increase the number of neurons to improve the model's ability to recognise complex patterns.

3. **Tuning hyperparameters**:
   - Find the optimal model parameters to improve model performance.