## Overview

Hello! Welcome to the architecture setup notebook, where we will install all requirements and outline the basic architecture of our AlexNet model (whose performance will be compared to our custom model, EfficientNet, and ConvNeXt).


The cell below handles our initial requirements installation:

In [3]:
!pip3 install -r ../../requirements.txt

Collecting opencv-python
  Downloading opencv_python-4.9.0.80-cp37-abi3-macosx_10_16_x86_64.whl (55.7 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.9.0.80


## Data Preprocessing

As part of our data preprocessing, we will down-scale the lung images from the original dataset and divide them into a train/test split.

Note that we will use five-fold cross-validation for evaluation later, so we will not partition an additional validation set.

After splitting our data, we will then feed the training set into our models. Here, we will specifically feed it into the AlexNet model. 
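To make the five-fold idea concrete, here is a minimal sketch of how `KFold` partitions sample indices. The ten-element `indices` array is a hypothetical stand-in for the image dataset; the `random_state` value matches the one used later in this notebook.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 sample indices standing in for the image dataset
indices = np.arange(10)

kfold = KFold(n_splits=5, shuffle=True, random_state=231)

seen_in_validation = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(indices), 1):
    # Within a fold, training and validation indices never overlap
    assert set(train_idx).isdisjoint(val_idx)
    seen_in_validation.extend(val_idx.tolist())
    print(f"Fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")

# Across the five folds, each sample is used for validation exactly once
assert sorted(seen_in_validation) == list(range(10))
```

Because every sample serves as validation data in exactly one fold, the averaged fold accuracy plays the role a separate validation set would otherwise play.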

In [9]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets, models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, Subset, SubsetRandomSampler
from sklearn.model_selection import KFold

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
import os

The code below reads images from our dataset, resizes each to a quarter of its original side length (768 -> 192 pixels), and converts them into Torch tensors. The ImageFolder class loads images lazily, reading each file only when it is indexed, which keeps memory usage low.

In [5]:
# Path to our lung_image_sets
data_dir = "../../lung_colon_image_set/lung_image_sets"

# Define resized size of images
resized_size = 192

# Convert images into Tensors
tensor_data = transforms.Compose([
  transforms.Resize((resized_size, resized_size)),   # Resize each side to a quarter of the original length
  transforms.ToTensor()
])

# Load the dataset using ImageFolder
data = ImageFolder(root=data_dir, transform=tensor_data)

# Split the dataset into train and test sets
train_size = int(0.8 * len(data))
test_size = len(data) - train_size
train, test = torch.utils.data.random_split(data, [train_size, test_size])

# Create data loaders for training and testing
load_train = DataLoader(train, batch_size=32, shuffle=True)
load_test = DataLoader(test, batch_size=32, shuffle=False)
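The split above depends on PyTorch's global random state, so it can differ between runs. A minimal sketch of one way to make it repeatable is to pass a seeded generator to `random_split`. The toy `TensorDataset` and the seed value are assumptions chosen for illustration, not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 100 samples standing in for the ImageFolder data
toy = TensorDataset(torch.arange(100).float().unsqueeze(1),
                    torch.zeros(100, dtype=torch.long))

train_size = int(0.8 * len(toy))
test_size = len(toy) - train_size

def split(seed):
    # A seeded generator fixes which samples land in each subset
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy, [train_size, test_size], generator=generator)

train_a, test_a = split(231)
train_b, test_b = split(231)

# Same seed, same partition; the two subsets never share a sample
assert train_a.indices == train_b.indices
assert set(train_a.indices).isdisjoint(test_a.indices)
print(len(train_a), len(test_a))
```

With the real data, the same `generator=` argument could be added to the `random_split` call so that the held-out test set stays fixed across notebook restarts.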

## Model Initialization
We will initialize the AlexNet model from PyTorch's pretrained AlexNet weights and remove its final classifier block, so that the remaining layers act as a feature extractor for our data.

In [6]:
# Initialize AlexNet Model 
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
alexnet = models.alexnet(pretrained=True)

# Modify AlexNet to Extract Features
# Note: children() yields (features, avgpool, classifier), so [:-1] drops the whole classifier block
model = torch.nn.Sequential(*list(alexnet.children())[:-1])
model.eval()
model = model.to(device)

# Define hyperparameters
learning_rate = 5e-4
momentum = 0.9

# Define our loss function and optimizer (the optimizer is re-created before training in the Softmax section)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)



## AlexNet + SVM Classifier Training and Testing
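We will perform k-fold cross-validation on the SVM classifier, which is trained on the features extracted by our AlexNet model. Note that the folds below are drawn from the full dataset `data` rather than from the `train` split created earlier.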

In [13]:
# Set up k-fold cross-validation
num_folds = 5
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=231)

# Store the results of each fold
results = {}

# K-Fold Cross Validation
for fold, (train_indices, val_indices) in enumerate(kfold.split(data), 1):
    print(f'Fold {fold}')

    # Create data samplers for train and validation sets
    train_sampler = SubsetRandomSampler(train_indices)
    val_sampler = SubsetRandomSampler(val_indices)

    # Create data loaders for train and validation sets
    train_loader = DataLoader(data, batch_size=32, sampler=train_sampler)
    val_loader = DataLoader(data, batch_size=32, sampler=val_sampler)
    
    # Extract features and labels for the training set
    train_features = []
    train_labels = []
    with torch.no_grad():
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            outputs = model(inputs)
            outputs = outputs.view(outputs.size(0), -1)
            train_features.append(outputs.cpu().numpy())
            train_labels.append(labels.cpu().numpy())
    train_features = np.concatenate(train_features)
    train_labels = np.concatenate(train_labels)
    
    # Extract features and labels for the validation set
    val_features = []
    val_labels = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs = inputs.to(device)
            outputs = model(inputs)
            outputs = outputs.view(outputs.size(0), -1)
            val_features.append(outputs.cpu().numpy())
            val_labels.append(labels.cpu().numpy())
    val_features = np.concatenate(val_features)
    val_labels = np.concatenate(val_labels)

    # Create and train the SVM classifier
    svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
    svm_model.fit(train_features, train_labels)

    # Evaluate the classifier on the validation set
    val_predictions = svm_model.predict(val_features)
    accuracy = accuracy_score(val_labels, val_predictions)
    results[fold] = accuracy
    print(f'Fold {fold} Accuracy: {accuracy:.4f}')

# Print the average accuracy across all folds
average_accuracy = np.mean(list(results.values()))
print(f'K-FOLD CROSS VALIDATION RESULTS FOR {num_folds} FOLDS')
print('--------------------------------')
for fold in results:
    print(f'Fold {fold}: {results[fold]:.4f}')
print(f'Average: {average_accuracy:.4f}')

Fold 1
Fold 1 Accuracy: 0.9623
Fold 2
Fold 2 Accuracy: 0.9563
Fold 3
Fold 3 Accuracy: 0.9650
Fold 4
Fold 4 Accuracy: 0.9657
Fold 5
Fold 5 Accuracy: 0.9647
K-FOLD CROSS VALIDATION RESULTS FOR 5 FOLDS
--------------------------------
Fold 1: 0.9623
Fold 2: 0.9563
Fold 3: 0.9650
Fold 4: 0.9657
Fold 5: 0.9647
Average: 0.9628
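The training and validation feature-extraction loops in the cell above are identical apart from the loader they read from. One possible refactor, sketched here with a hypothetical helper named `extract_features` and toy stand-ins for the model and data, is to factor the loop into a single function.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def extract_features(model, loader, device):
    """Run `model` over `loader`; return flattened features and labels as NumPy arrays."""
    model.eval()
    features, labels = [], []
    with torch.no_grad():
        for inputs, targets in loader:
            outputs = model(inputs.to(device))
            # Flatten everything except the batch dimension
            features.append(outputs.flatten(start_dim=1).cpu().numpy())
            labels.append(targets.numpy())
    return np.concatenate(features), np.concatenate(labels)

# Hypothetical stand-ins so the sketch runs without the image data
device = torch.device("cpu")
toy_model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, kernel_size=3, padding=1),
    torch.nn.AdaptiveAvgPool2d((2, 2)),
)
toy_data = TensorDataset(torch.rand(10, 3, 8, 8), torch.arange(10) % 3)
toy_loader = DataLoader(toy_data, batch_size=4)

toy_features, toy_labels = extract_features(toy_model, toy_loader, device)
print(toy_features.shape, toy_labels.shape)
```

In the notebook, the same function could be called once with `train_loader` and once with `val_loader` inside each fold.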


## AlexNet + Softmax Classifier Training and Testing
We will perform k-fold cross-validation on the Softmax classifier. Here the final AlexNet layer is replaced with a linear layer sized to our classes, and the network is trained end to end with cross-entropy loss.

In [20]:
# Define hyperparameters
learning_rate = 5e-4
momentum = 0.9
num_epochs = 5  # Number of epochs for training
num_folds = 5

# Store the results of each fold
results = {}

# K-Fold Cross Validation
for fold, (train_indices, val_indices) in enumerate(kfold.split(data), 1):
    print(f'Fold {fold}')

    # Load a fresh pre-trained AlexNet for every fold and resize its final layer to our classes.
    # Reusing one model across folds would carry over weights trained on images that
    # belong to a later fold's validation set.
    alexnet = models.alexnet(pretrained=True)
    num_features = alexnet.classifier[6].in_features
    alexnet.classifier[6] = nn.Linear(num_features, len(data.classes))
    alexnet = alexnet.to(device)

    # Create data samplers for train and validation sets
    train_sampler = SubsetRandomSampler(train_indices)
    val_sampler = SubsetRandomSampler(val_indices)

    # Create data loaders for train and validation sets
    train_loader = DataLoader(data, batch_size=32, sampler=train_sampler)
    val_loader = DataLoader(data, batch_size=32, sampler=val_sampler)
    
    # Define the optimizer
    optimizer = optim.SGD(alexnet.parameters(), lr=learning_rate, momentum=momentum)
    
    # Train the model
    alexnet.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            optimizer.zero_grad()
            outputs = alexnet(inputs)
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
        
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')
    
    # Set the model to evaluation mode
    alexnet.eval()
    
    # Evaluate the model on the validation set
    val_predictions = []
    val_labels_list = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = alexnet(inputs)
            _, preds = torch.max(outputs, 1)
            val_predictions.extend(preds.cpu().numpy())
            val_labels_list.extend(labels.cpu().numpy())
    
    accuracy = accuracy_score(val_labels_list, val_predictions)
    results[fold] = accuracy
    print(f'Fold {fold} Accuracy: {accuracy:.4f}')

# Print the average accuracy across all folds
average_accuracy = np.mean(list(results.values()))
print(f'K-FOLD CROSS VALIDATION RESULTS FOR {num_folds} FOLDS')
print('--------------------------------')
for fold in results:
    print(f'Fold {fold}: {results[fold]:.4f}')
print(f'Average: {average_accuracy:.4f}')

Fold 1


KeyboardInterrupt: 