## Alzheimer Detection using Google Colaboratory

### Step 0: Import Libraries and Clone Repository

In [None]:
%cd /content/
!git clone https://github.com/Verbosi7y/ai-alzheimer-detection.git

%pip install --upgrade pip
%pip install torch
%pip install numpy
%pip install matplotlib
%pip install seaborn
%pip install scikit-image
%pip install scikit-learn
%pip install imbalanced-learn
%pip install albumentations
%pip install opencv-python
%pip install pillow

# Uncomment if you are running Google Colab on a CUDA GPU (NVIDIA)
# %pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In [None]:
from torch.utils.data import DataLoader

import os
import sys

Setting Paths

In [None]:
parent_path = r"/content/ai-alzheimer-detection"

kaggle_dir = r"assets/Kaggle"
kaggle_path = os.path.join(parent_path, kaggle_dir)

kaggle_dataset_dir = r"alzheimer_mri_preprocessed_dataset"
kaggle_raw_dir = r"alzheimer_mri_preprocessed_dataset/raw"

kaggle_dataset_path = os.path.join(kaggle_path, kaggle_dataset_dir)
kaggle_raw_path = os.path.join(kaggle_path, kaggle_raw_dir)

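# Directory where trained model checkpoints are stored
model_dir = r"models"
model_path = os.path.join(parent_path, model_dir)

# Create the directory if it does not already exist
os.makedirs(model_path, exist_ok=True)

# From here on, model_path points to the checkpoint file itself
model_dir = r"models/best_ad_model.pth"
model_path = os.path.join(parent_path, model_dir)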

In [None]:
# add parent to path
sys.path.append(parent_path)

### Step 1: Load the Dataset

In [None]:
from alzheimersdetection import Dataset

X, y = Dataset.step1_load_data(path=kaggle_raw_path) # np.array, np.array

### Step 2: Split the dataset

Split the data into 80% training and 20% testing data, passing stratify=y so that each subset keeps the same class (label) distribution as the full dataset.

Then split the 80% training portion again into 75% training and 25% validation. Since 25% of 80% is 20% of the full dataset, the final split is:

Ratio: 60% Training : 20% Validation : 20% Testing

In [None]:
test_size = 0.20
validation_size = 0.25

split_dataset = Dataset.step2_split_data(X, y, test_size=test_size, validation_size=validation_size)
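
The two-stage split described above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration on synthetic data, not the body of `Dataset.step2_split_data`; the array shapes, class counts, and `random_state` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 tiny "images" with imbalanced labels (assumed values).
rng = np.random.default_rng(0)
X_demo = rng.random((1000, 8, 8))
y_demo = np.repeat([0, 1, 2, 3], [500, 350, 140, 10])

# Stage 1: hold out 20% for testing, preserving class proportions.
X_rest, X_test, y_rest, y_test = train_test_split(
    X_demo, y_demo, test_size=0.20, stratify=y_demo, random_state=42
)

# Stage 2: take 25% of the remaining 80% for validation (0.25 * 0.80 = 0.20 overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying both stages keeps even the rarest class present in every subset, which matters when one class has only a handful of samples.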

Visualize the class distribution of the training dataset.

The classes should be heavily imbalanced.

In [None]:
import stats.statistics as Statistics

title_before_aug = "AD Classification Distribution of Training Dataset"

sample_dist = Dataset.distribution(split_dataset["train"]["y"])

Statistics.pieChartClassificationPlot(sample_dist, title_before_aug)
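
To check the imbalance numerically rather than only in the pie chart, a class distribution can be computed with the standard library. This is a sketch with a hypothetical helper, `class_percentages`, and made-up labels; it is not the implementation of `Dataset.distribution`, and the label order (0 = Non_Demented through 3 = Moderate_Demented) is an assumption.

```python
from collections import Counter

def class_percentages(labels):
    """Return {label: percentage of samples} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in sorted(counts.items())}

# Hypothetical labels, assumed order: 0 = Non_Demented ... 3 = Moderate_Demented.
demo_labels = [0] * 50 + [1] * 35 + [2] * 14 + [3] * 1
print(class_percentages(demo_labels))
```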

### Step 3: Balance and Oversample the Dataset

#### 3a. Balance

The training set is imbalanced, so we balance it in two stages. The first is data augmentation.
To keep augmentation from simply reproducing the imbalance, we define class-specific augmentation rates so that rarer classes are augmented more heavily.

In [None]:
'''
    Rates:
    - Non_Demented: 1
    - Very_Mild_Demented: 1
    - Mild_Demented: 2
    - Moderate_Demented: 5
'''
rates = [1, 1, 2, 5]

split_dataset["train"] = Dataset.step3a_augmentation(split_dataset["train"], rates=rates)

Dataset.display_split(split_dataset=split_dataset)
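
One way to read class-specific rates is sketched below. It assumes that a rate of r means each image of that class ends up represented r times (the original plus r - 1 augmented copies); that interpretation, the helper name `augment_by_rate`, and the flip and brightness transforms are all assumptions for illustration, and `Dataset.step3a_augmentation` may work differently.

```python
import numpy as np

def augment_by_rate(X, y, rates, seed=0):
    """Keep every image and add (rates[label] - 1) randomly augmented copies of it."""
    rng = np.random.default_rng(seed)
    out_X, out_y = [], []
    for image, label in zip(X, y):
        out_X.append(image)
        out_y.append(label)
        for _ in range(rates[label] - 1):
            # Random horizontal flip (reverse the column order).
            augmented = image[:, ::-1] if rng.random() < 0.5 else image
            # Mild brightness jitter, clipped back into [0, 1].
            augmented = np.clip(augmented * rng.uniform(0.9, 1.1), 0.0, 1.0)
            out_X.append(augmented)
            out_y.append(label)
    return np.stack(out_X), np.array(out_y)

# Hypothetical toy data: images scaled to [0, 1], labels 0..3.
X_toy = np.random.default_rng(1).random((10, 4, 4))
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
X_aug, y_aug = augment_by_rate(X_toy, y_toy, rates=[1, 1, 2, 5])
print(np.bincount(y_aug))
```

Under this reading, a rate of 1 leaves a class untouched, while the rarest class grows fivefold.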

Visualize the class distribution after data augmentation.

In [None]:
title_after_aug = "AD Classification Distribution after Data Augmentation"

aug_dist = Dataset.distribution(split_dataset["train"]["y"])

Statistics.pieChartClassificationPlot(aug_dist, title_after_aug)

#### 3b. ADASYN Oversampling

The dataset is still imbalanced after augmentation. To fix this, we oversample the minority classes, increasing their representation until the classes are roughly balanced.

We will be using Adaptive Synthetic Sampling (ADASYN) to oversample the minority classes.

In [None]:
# Visualize class imbalance before ADASYN
title_before_ADASYN = "Class Distribution before ADASYN"

Dataset.display_split(split_dataset=split_dataset);

Statistics.ad_plot_bar(sample=split_dataset["train"], title=title_before_ADASYN)

Apply Adaptive Synthetic Sampling (ADASYN).

Ideal result: each of the four AD classes makes up roughly 25% of the training data.

In [None]:
k = 5 # This is the k-neighbors which will be used for ADASYN

split_dataset["train"] = Dataset.step3b_ADASYN(sample=split_dataset["train"], k=k)

Visualize the results after applying ADASYN as a bar plot.

In [None]:
# Visualize class imbalance after ADASYN
title_after_ADASYN = "Class Distribution after ADASYN"

Dataset.display_split(split_dataset=split_dataset);

Statistics.ad_plot_bar(sample=split_dataset["train"], title=title_after_ADASYN)

### Step 4: Save the Dataset as .npz and Images

In [None]:
# Uncomment to save the split dataset to disk
# Dataset.step4_save_npz(split_dataset, path=kaggle_dataset_path)

### Step 5: Define Hyperparameters

In [None]:
param = {
        "epoches"       : 25, # implement early stopping
        "learning_rate" : 0.001,
        "batch_size"    : 8,
        "early_stop"    : 5
        }
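
The `early_stop` value is presumably a patience: the number of epochs without validation improvement that training tolerates before stopping. That reading is an assumption, and the `EarlyStopper` class below is a hypothetical sketch of the idea rather than the logic inside `step9_train_model`.

```python
class EarlyStopper:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            # New best loss: remember it and reset the counter.
            self.best_loss = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

# Hypothetical validation losses: improvement stalls after the third epoch.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.65, 0.64]
stopper = EarlyStopper(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.should_stop(loss):
        stopped_at = epoch
        print(f"stopping at epoch {epoch}")
        break
```

With a cap of 25 epochs and a patience of 5, training ends as soon as five consecutive epochs fail to beat the best validation loss seen so far.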

### Step 6: Load Dataset as Dataloader

In [None]:
from alzheimersdetection.AlzheimerDataset import AlzheimerDataset

train_dataset = AlzheimerDataset(samples=split_dataset["train"])
val_dataset = AlzheimerDataset(samples=split_dataset["validation"])
test_dataset = AlzheimerDataset(samples=split_dataset["test"])

loaders =  {
           "train"  : DataLoader(train_dataset, batch_size=param["batch_size"], shuffle=True),
           "val"    : DataLoader(val_dataset, batch_size=param["batch_size"], shuffle=False),
           "test"   : DataLoader(test_dataset, batch_size=param["batch_size"], shuffle=False)
           }

### Step 7: Device Setup

If the runtime has an NVIDIA GPU, the model runs on it through CUDA. On Colab, CUDA is already available once a GPU runtime is selected.

Otherwise, the CPU is used.

In [None]:
from alzheimersdetection import AlzheimerModel

device = AlzheimerModel.set_device()
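
The usual PyTorch idiom for this choice is shown below. It is a sketch of the common pattern, and `AlzheimerModel.set_device` may do more (or something different); the variable name `demo_device` is used only to avoid overwriting `device`.

```python
import torch

# Prefer CUDA when PyTorch can see a GPU; otherwise fall back to the CPU.
demo_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(demo_device)
```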

### Step 8: Creating CNN Model

Create our convolutional neural network (CNN) model, implemented in PyTorch.

In [None]:
from alzheimersdetection.AlzheimerModel import AlzheimerCNN

model = AlzheimerCNN().to(device)

### Step 9: Train the Model

Criterion: Cross Entropy Loss

Optimizer: Adam

In [None]:
AlzheimerModel.step9_train_model(model, param, loaders, device, model_path)
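
To show how cross-entropy loss and Adam fit together, here is a minimal training loop on a synthetic batch. The tiny linear model, input size, and step count are assumptions for illustration; this is not `AlzheimerCNN` and not the contents of `step9_train_model`. Only the learning rate and batch size mirror the hyperparameters above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in classifier: 8x8 single-channel inputs, 4 output classes.
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(64, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)

# One synthetic batch of 8 images, matching the batch size used in this notebook.
inputs = torch.randn(8, 1, 8, 8)
targets = torch.randint(0, 4, (8,))

losses_seen = []
for _ in range(20):
    optimizer.zero_grad()                          # clear gradients from the previous step
    loss = criterion(demo_model(inputs), targets)  # forward pass and loss
    loss.backward()                                # backpropagate
    optimizer.step()                               # update the weights
    losses_seen.append(loss.item())

print(f"first loss {losses_seen[0]:.3f}, last loss {losses_seen[-1]:.3f}")
```

`nn.CrossEntropyLoss` expects raw logits and integer class indices, so the model needs no softmax layer at its output.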

### Step 10: Verify using Test Data

In [None]:
from alzheimersdetection import AlzheimerMetrics

AlzheimerMetrics.run_metrics(model, loaders["test"], device)
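
The kind of summary a metrics step typically reports can be sketched with scikit-learn. The labels and predictions here are made up, and which metrics `AlzheimerMetrics.run_metrics` actually computes is not stated in this notebook, so treat this only as an example of accuracy and a confusion matrix for four classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions for the four AD classes (0..3).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

demo_accuracy = accuracy_score(y_true, y_pred)
demo_cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])

print(demo_accuracy)
print(demo_cm)  # rows are true classes, columns are predicted classes
```

On imbalanced test data, the per-class rows of the confusion matrix are more informative than overall accuracy, since a model can score well while missing the rarest class.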

------
<p style="text-align: center;"> Made with ❤️ </p>
<p style="text-align: center;"> Darwin Xue </p>