# **CS 598 Project: Multimodal Attention for Alzheimer's Disease Classification - Replication of MADDi Framework**


### Team Members
| Name | NetId |
|----------|----------|
| Mersim Rizmani | mrizma2 |
| Abhilash Raghuram | araghu9 |
| Jacob Men | men5 |



### Project Presentation Video Link: https://youtu.be/3H2nd8KSeDU


### Project GitHub Repository: https://github.com/MersimRizmani/MADDi-Replication.git

### Statement on the Use of Existing Code
For our project, we will be utilizing the existing code in an effort to replicate the original study. Specifically, we will attempt to replicate the MADDi framework that was developed in the original study. The original code can be found here: https://github.com/rsinghlab/MADDi/tree/main

---

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Introduction

## Citation of Original Paper

Below are the references to the original research paper published to JAMIA, and the code repository:

Michal Golovanevsky, Carsten Eickhoff, Ritambhara Singh, Multimodal attention-based deep learning for Alzheimer’s disease diagnosis, Journal of the American Medical Informatics Association, Volume 29, Issue 12, December 2022, Pages 2014–2022, https://doi.org/10.1093/jamia/ocac168

GitHub Repository: https://github.com/rsinghlab/MADDi/tree/main

\

## Background

The general problem that this project focuses on is accurately diagnosing Alzheimer’s disease in susceptible patients. The objective of the original study was to develop a multimodal deep learning framework that could aid medical professionals in accomplishing this task and making Alzheimer’s diagnosis easier and more accurate.
\
\
Why is it important to improve the process of Alzheimer’s disease diagnosis with deep learning models? According to statistics from Alzheimer’s News Today, Alzheimer’s disease is the most common neurodegenerative disorder, affecting approximately 5.5 million people in the United States and around 44 million people worldwide. Research has also shown that fewer than half of Alzheimer’s patients are diagnosed accurately for pathology and disease progression based on clinical symptoms alone. This highlights the urgent need for advances in Alzheimer’s disease diagnosis, and it is why deep learning researchers are working on frameworks to address the problem.
\
\
The difficulty of this problem, as mentioned in the original research, lies in cross-modal interactions. Before this research, several deep learning-based studies did not model cross-modal interactions and simply combined features extracted from separate modalities. The model that we replicate in this project, MADDi, seeks to fill these gaps in previous multimodal studies.

\

## Original Paper Explanation

The approach taken by this research paper was centered around a multimodal deep learning framework called MADDi, short for Multimodal Alzheimer’s Disease Diagnosis framework. It utilizes a cross-modal attention scheme to integrate imaging data (i.e. MRI data), genetic data (i.e. SNPs), and structured clinical data to classify patients and attempt to label (diagnose) them. The entire pipeline involves clinical, image, and genetic data preprocessing, building the multimodal framework, neural network attention, unified hyperparameter tuning, and model evaluation.
\
\
In our project, we seek to replicate this study to the best of our ability by attempting to rebuild MADDi in our own notebook. From this process, our goal will be to fully understand and learn how real medical data, such as that from ADNI, can be paired with a functioning deep learning framework to greatly improve the accuracy of Alzheimer’s disease diagnosis.
\
\
The researchers found that MADDi outperformed existing multimodal studies and was consistently high in accuracy, achieving figures as high as 97% for Alzheimer's classification. This study contributes heavily to the research on Alzheimer's disease, and to the medical community as a whole, as it provides a window into what is possible with automated and accurate deep learning models for disease diagnosis.


# Scope of Reproducibility

### Hypothesis

The hypothesis we want to test according to this research paper is whether a novel multimodal deep learning framework, in this case MADDi, can accurately aid medical professionals in diagnosing Alzheimer’s disease. Specifically, we hypothesize that integrating multiple modalities (i.e. imaging, genetic, and clinical data) using a cross-modal attention scheme will lead to improved accuracy in Alzheimer’s diagnosis, compared to unimodal approaches.

\
### Ablations
This project will also involve the replication of two ablations, or experimental variations, presented in the original research paper:
1. Attention Mechanisms: Ablations will be conducted by toggling the presence of attention across four configurations: self-attention followed by cross-modal attention, cross-modal attention only, self-attention only, and no attention.
2. Unified Hyperparameter Tuning Scheme: Ablations will vary by toggling the methods for optimizing hyperparameters in the model. The scheme will allow for generalizing the model and tuning it for specific scenarios without manual intervention.


![maddi.jpeg](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jamia/29/12/10.1093_jamia_ocac168/1/ocac168f2.jpeg?Expires=1715219500&Signature=JlUSynqR~d5I6VW7O9wXDescNbaAjm02F3mL5TbpHcm0-lfxXQaw2F4sk24sLg0c2K6AqGrYMblT~VI1VZ5TNyf3HVpqeUakuz~EoQpCWrmhKhoXq277Zj8gdhZ5yP5kXyzewf9BKMakGRBTJoW7Trs9jayoq4CrofKZGd03~Vd4tF1kZu27FF~N8kiidoHH5zW6~fQsdokE672zPsdfdERefEhpRmVIBvIVFPOaKGJlqFzVxKkY8vtYwf-pmWXyyS5d7X4Os~3X3Hq~X6JiajD-86QMXDtsdPQ-OVj4hmvDKRnJ6gpoAzXrVASwnnXx54tS21~D9552D5JBxZdRSw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)

# Methodology

## Environment

### **Python version**
For this project, we used Python version 3.10.12. This is the default in Google Colab, where we ran most of our code, and was deemed suitable by the original authors. We used the same version on a GCP VM that we set up for running certain scripts.


### **Code environment**
The original paper provided no details on the environment required to run the code, so we had to determine a suitable environment ourselves through trial and error.

As noted above, we used Google Colab to run our Jupyter notebooks, since its RAM and overall performance were sufficient for most of our notebook work and it offered an easy, no-cost way to interact with our data in Google Drive.

In certain cases, namely preprocessing the genetic data from ADNI, we had to use a customized GCP VM instance, since Colab did not provide enough RAM for the preprocessing, especially given its runtime limits. This VM had 6 vCPU cores (with an average usage of about 1 vCPU) and 250 GB of RAM for our use case. Processing more data would require more RAM, around 350 GB to be safe, since exhausting memory can cause the process to be killed.



### **Required packages**
In addition to Python, the experiment requires several packages, including tensorflow, numpy, pandas, and pickle. The required versions are specified in the requirements.txt file in the linked GitHub repository.

##  Data


### **Data Source**

The data that will be processed and analyzed in this project and study replication has to be requested directly from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), https://adni.loni.usc.edu/data-samples/access-data/.

All ADNI data are shared without embargo through the LONI Image and Data Archive (IDA), https://ida.loni.usc.edu/login.jsp?project=ADNI, a secure research data repository. Access is contingent on adherence to the ADNI Data Use Agreement and the publications’ policies.

Once access is granted from the LONI Image and Data Archive, which from our experience has a timeline of approximately two weeks, then we look for the necessary files in the data archive to download. In our case, we’d be looking for clinical, genetic, and imaging data from three studies: ADNI1, ADNI2, and ADNI GO.

### **Data Description**

**Clinical**: Neurological exams, cognitive assessments, and patient demographics from 2384 patients.

- Type: Quantitative, categorical, binary
- Features: 29

**Genetic**: Genome sequencing data from 805 ADNI participants. Each subject had about 3 million SNPs in the raw VCF generated.

- Type: VCF (variant call files)

**Imaging**: Cross-sectional MRI data corresponding to first baseline screenings from ADNI1 (551 patients).

### **Statistics**

#### Data Storage and Size

A key challenge for this project was the sheer amount of data we were handling. It took several days to transfer data from ADNI to our shared Google Drive, where we housed all the data. The sizes of our datasets were roughly:

- Genetic Data (VCF files): ~45 GB
- Imaging Data (images and metadata): ~10 GB
- Clinical Data: ~1 GB

#### Basic Statistics Obtained from Data Preprocessing

The following statistics were obtained from clinical data preprocessing in the decision_making.ipynb file:

Number of diagnosed patients: 3025

- Number of Normal patients: 1122
- Number of Mild Cognitive Impairment patients: 1007
- Number of Alzheimer's Disease patients: 896
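
As a quick check on class balance, the proportions implied by the counts above can be computed directly. This is a small illustrative sketch; the names `counts` and `proportions` are our own and do not appear in the original code.

```python
# Diagnosis counts reported above from decision_making.ipynb.
counts = {
    "Normal": 1122,
    "Mild Cognitive Impairment": 1007,
    "Alzheimer's Disease": 896,
}

total = sum(counts.values())  # number of diagnosed patients
proportions = {label: n / total for label, n in counts.items()}

for label, share in proportions.items():
    print(f"{label}: {share:.1%}")
```

The three classes come out at roughly 37%, 33%, and 30%, so the clinical labels are only mildly imbalanced.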

The following statistics are from the original research paper, showing the number of participants in each modality, broken down by diagnosis. The overlap section refers to patients who had all 3 modalities recorded.
\
\
![stats.jpeg](./stats.png)

### **Data Preprocessing**

The CSV files and the VCF files utilized in the data processing must be obtained directly from ADNI (see Data Source section above) and thus cannot be included in this project's repository, or within the notebook.
\
\
As a result of having such a large collection of multimodal data, we must implement a sufficient amount of preprocessing tasks. Preprocessing tasks differ based off which data type we are working with, therefore there will be separate preprocessing goals for the clinical, imaging, and genetic data. The end goal of the data preprocessing is to have a combined diagnosis dataset by patient ID, where we take diagnoses from images, clinical, and genetic data to create one ground truth diagnosis file.
\
\
For imaging data, the main goal of preprocessing is to split the data into training and testing pools. We used 240 images as opposed to 551, since we did not have the computational capability at the time to train on all 551 images. For genetic data, the VCF files are first obtained from ADNI. Then, filter_vcfs.py is used to filter the files based on chosen criteria. Finally, we concatenate the filtered files.

The VCF preprocessing involved several gzip files of 8-10 GB each. Due to constraints on VM cost and time, and because only a subset of genes is used, as specified by genes_list.csv, we preprocessed only a subset of the files. Each file took around 12-30 hours to process with filter_vcfs.py.
\
\
**The code for preprocessing the data is NOT included in this notebook. Instead, each stage of preprocessing is implemented in its own notebook, stored in our project's GitHub repo for reference.** We replicated the code from the original repository, modified it to point to the data we downloaded, and fixed the errors that came up.
\
\
Please refer to the folders with the prefix "preprocess_x/" in our GitHub for all of our data preprocessing implementation code:
https://github.com/MersimRizmani/MADDi-Replication
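
To illustrate the filtering step, the following is a minimal sketch of reading a gzipped VCF line by line and keeping only the records that fall inside chosen regions. It is not the logic of filter_vcfs.py, whose exact criteria live in the original repository; the function name `filter_vcf_lines`, the `regions` structure, and the sample records and coordinates are all hypothetical.

```python
import gzip
import io

def filter_vcf_lines(lines, regions):
    """Yield header lines plus records whose (chrom, pos) lies in a region.

    `regions` maps chromosome name -> list of inclusive (start, end) pairs.
    Hypothetical helper, for illustration only.
    """
    for line in lines:
        if line.startswith("#"):          # keep VCF meta and header lines
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if any(start <= pos <= end for start, end in regions.get(chrom, [])):
            yield line

# Tiny in-memory gzipped VCF standing in for a multi-gigabyte file.
raw = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "19\t100\trsA\tA\tG\n"
    "19\t500\trsB\tC\tT\n"
    "1\t100\trsC\tG\tA\n"
)
buffer = io.BytesIO(gzip.compress(raw.encode()))

regions = {"19": [(50, 200)]}             # made-up coordinates
with gzip.open(buffer, "rt") as handle:   # text mode, read one line at a time
    kept = list(filter_vcf_lines(handle, regions))

print(len(kept))
```

Processing records one at a time keeps the memory used by this step independent of file size, which is one possible way to ease the RAM pressure described in the Environment section.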

##   Model

The original authors experimented with a variety of model configurations. In particular, they built and trained both **unimodal** models and **multi-modal** models. The purpose was to demonstrate the superior performance of their multimodal framework (MADDi) compared to single-modality models trained on only clinical, genetic, or imaging data.
\
\
Below we show the code that the original authors used to build both the unimodal and multimodal models. Training code for these models has been omitted due to resource constraints. At the current stage of our project, we are still in the process of finalizing the replication of the original authors' model and training it.

The training code is available in our project's GitHub repository under the "/training" directory.

https://github.com/MersimRizmani/MADDi-Replication/tree/main/training

(see Next Steps and Plans in the last section)

In [None]:
!pip install "pandas<2.0.0"
import os
import pickle  # standard library; supports protocol 5 on Python 3.8+, so the pickle5 backport is not needed
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns  # used by plot_classification_report in the Evaluation section
import tensorflow as tf
from sklearn.metrics import classification_report, precision_recall_curve, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Dense, Dropout, MaxPooling2D, Flatten, Conv2D, BatchNormalization, MultiHeadAttention, concatenate
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

In [None]:
def reset_random_seeds(seed):
    os.environ['PYTHONHASHSEED']=str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

seeds = random.sample(range(1, 200), 5)

### **Clinical Unimodal Model**

The clinical unimodal model architecture consists of a neural network with 3 fully connected layers.

In terms of tuning the hyperparameters, they found that the ones that gave the best accuracy for the clinical unimodal model are:

- **Learning rate**: 0.0001
- **Batch size**: 32
- **Number of layers**: 3
- **Dropout value**: {0.2, 0.3, 0.5}
- **Number of epochs:** 100

The best performing clinical unimodal model has an accuracy of 80.59%. More evaluation metrics are described in the **Results and Analysis** section.

The below code demonstrates the design and architecture of the clinical unimodal neural network, and the output that follows highlights the details of the layers, activation functions, output shapes, and layer types:

In [None]:
for seed in seeds:
    reset_random_seeds(seed)
    model = Sequential()

    model.add(Dense(128, input_shape = (185,), activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(Dense(64, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))

    model.add(Dense(50, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))

    model.add(Dense(3, activation = "softmax"))

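    # Compile inside the loop so each seeded model is ready to train.
    model.compile(Adam(learning_rate = 0.0001), "sparse_categorical_crossentropy", metrics = ["sparse_categorical_accuracy"])

# Summarize the architecture (identical across seeds).
model.summary()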

### **Genetic Unimodal Model**

The genetic unimodal model architecture consists of a neural network with 3 fully connected layers.

In terms of tuning the hyperparameters, they found that the ones that gave the best accuracy for the genetic unimodal model are:

- **Learning rate**: 0.001
- **Batch size**: 32
- **Number of layers**: 3
- **Dropout value**: {0.3, 0.5}
- **Number of epochs:** 50

The best performing genetic unimodal model has an accuracy of 77.78%. More evaluation metrics are described in the **Results and Analysis** section.

The code below demonstrates the design and architecture of the genetic unimodal neural network, and the output that follows highlights the details of the layers, activation functions, output shapes, and layer types. Note that this code stacks four hidden dense layers before the softmax output, one more than the three layers listed above.

In [None]:
for seed in seeds:
    reset_random_seeds(seed)
    model = Sequential()
    model.add(Dense(128, input_shape = (15965,), activation = "relu"))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation = "relu"))
    model.add(Dropout(0.5))

    model.add(Dense(32, activation = "relu"))
    model.add(Dropout(0.3))

    model.add(Dense(32, activation = "relu"))
    model.add(Dropout(0.3))


    model.add(Dense(3, activation = "softmax"))

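    # Compile inside the loop so each seeded model is ready to train.
    model.compile(Adam(learning_rate = 0.001), "sparse_categorical_crossentropy", metrics = ["sparse_categorical_accuracy"])

# Summarize the architecture (identical across seeds).
model.summary()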

### **Imaging Unimodal Model**

The imaging unimodal model architecture consists of a convolutional neural network with 3 convolutional layers.

In terms of tuning the hyperparameters, they found that the ones that gave the best accuracy for the imaging unimodal model are:

- **Learning rate**: 0.001
- **Batch size**: 32
- **Number of layers**: 3
- **Dropout value**: {0.3, 0.5}
- **Number of epochs:** 50

The best performing imaging unimodal model has an accuracy of 92.23%. More evaluation metrics are described in the **Results and Analysis** section.

The code below demonstrates the design and architecture of the imaging unimodal neural network, and the output that follows highlights the details of the layers, activation functions, output shapes, and layer types. Note that this code defines two convolutional layers rather than the three listed above.

In [None]:
for seed in seeds:
    reset_random_seeds(seed)
    model = Sequential()
    model.add(Conv2D(100, (3, 3),  activation='relu', input_shape=(72, 72, 3)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.5))
    model.add(Conv2D(50, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(3, activation = "softmax"))

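    # Compile inside the loop so each seeded model is ready to train.
    model.compile(Adam(learning_rate = 0.001), "sparse_categorical_crossentropy", metrics = ["sparse_categorical_accuracy"])

# Summarize the architecture (identical across seeds).
model.summary()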

### **Multimodal Alzheimer's Disease Diagnosis Framework (MADDi)**

The multimodal model architecture consists of components derived from each of the unimodal models.

In terms of tuning the hyperparameters, they found that the ones that gave the best accuracy for the multimodal model are:

- **Learning rate**: 0.001
- **Batch size**: 32
- **Number of layers**: {3, 3, 3}
- **Dropout value**: {0.2, 0.3, 0.5}
- **Number of epochs:** 50

The best performing multimodal model has an average accuracy of 96.88% +/- 3.33%. More evaluation metrics are described in the **Results and Analysis** section.

The below code demonstrates the design and architecture of the multimodal model, and the output that follows highlights the details of the layers, activation functions, output shapes, and layer types:

In [None]:
def make_img(t_img):
    img = pd.read_pickle(t_img)
    img_l = []
    for i in range(len(img)):
        img_l.append(img.values[i][1])

    return np.array(img_l)

In the following code block, there are modality-specific neural network architecture backbones developed in a single modality setting.

As mentioned in the unimodal model discussions, those backbones consist of:
- 3-layer fully connected neural network for clinical data
- 3-layer fully connected neural network for genetic data
- 3-layer convolutional neural network for imaging data

In [None]:
# 3-layer fully connected neural network for genetic data
def create_model_snp():

    model = Sequential()
    model.add(Dense(200,  activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(100, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))

    model.add(Dense(50, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    return model

# 3-layer fully connected neural network for clinical data
def create_model_clinical():

    model = Sequential()
    model.add(Dense(200,  activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(100, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))

    model.add(Dense(50, activation = "relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    return model

# 3-layer convolutional neural network for imaging data
def create_model_img():

    model = Sequential()
    model.add(Conv2D(72, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(50, activation='relu'))
    return model

The output of the above layers then enters a multi-headed self-attention layer, which allows the inputs to interact with each other and find what features should be paid most attention to within each modality. Then, that is followed by a cross-modal bidirectional layer, which does something similar, but across different pairs of modalities. The purpose of this is to identify and analyze interactions between different modalities, which in turn is the general purpose of building this multimodal framework and the purpose of this paper in the first place.
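
To make the mechanism concrete, here is a NumPy sketch of single-head scaled dot-product attention applied in both directions between two modality embeddings. It is a simplification under our own assumptions: it omits the learned query, key, value, and output projections and the multiple heads that Keras `MultiHeadAttention` uses, and the feature arrays are random placeholders rather than real backbone outputs.

```python
import numpy as np

def scaled_dot_product_attention(query, key, value):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    d = query.shape[-1]
    scores = query @ key.swapaxes(-1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights

rng = np.random.default_rng(0)
img_feat = rng.normal(size=(2, 1, 50))   # (batch, sequence length 1, features)
clin_feat = rng.normal(size=(2, 1, 50))

# Image features query the clinical features, and vice versa.
img_to_clin, w1 = scaled_dot_product_attention(img_feat, clin_feat, clin_feat)
clin_to_img, w2 = scaled_dot_product_attention(clin_feat, img_feat, img_feat)

# Drop the length-1 sequence axis and concatenate both directions.
fused = np.concatenate([img_to_clin[:, 0, :], clin_to_img[:, 0, :]], axis=-1)
print(fused.shape)
```

Each embedding is treated as a sequence of length 1, mirroring the `tf.expand_dims(..., axis=1)` calls in the notebook's attention layers. With a single key, the softmax has only one entry, so every attention weight equals 1 and the output is the value vector itself (in Keras, a learned projection of it).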

The below code details the self-attention and cross-modal attention layers:

In [None]:
# Cross-modal attention layer
def cross_modal_attention(x, y):

    x = tf.expand_dims(x, axis=1)
    y = tf.expand_dims(y, axis=1)
    a1 = MultiHeadAttention(num_heads = 4,key_dim=50)(x, y)
    a2 = MultiHeadAttention(num_heads = 4,key_dim=50)(y, x)
    a1 = a1[:,0,:]
    a2 = a2[:,0,:]
    return concatenate([a1, a2])

# Self-attention layer
def self_attention(x):

    x = tf.expand_dims(x, axis=1)
    attention = MultiHeadAttention(num_heads = 4, key_dim=50)(x, x)
    attention = attention[:,0,:]
    return attention

The code below is where we rebuilt the original authors' MADDi framework. It uses the backbones from the single-modality models shown above via the create_model_clinical(), create_model_snp(), and create_model_img() functions, and demonstrates how the attention mechanisms can be configured.
\
\
The "mode" variable allows us to experiment with the aforementioned attention mechanism ablation. It selects one of four configurations: cross-modal attention only ('MM_BA'), self-attention only ('MM_SA'), self-attention followed by cross-modal attention ('MM_SA_BA'), and no attention ('None').
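
One way to see how the four modes differ is the width of the merged feature vector passed to the output layer. The sketch below computes that width, assuming each backbone emits 50 features (the final `Dense(50)` in each backbone) and that each attention call returns as many features as its query, which is the Keras `MultiHeadAttention` default. The function `merged_width` is a hypothetical helper, not part of the original code.

```python
FEATURES = 50  # output width of each modality backbone

def merged_width(mode, features=FEATURES):
    """Width of the concatenated vector fed to the softmax layer."""
    backbones = 3 * features              # image, SNP, and clinical outputs
    if mode == "None":
        return backbones
    if mode == "MM_SA":
        return 3 * features + backbones   # three self-attention outputs
    if mode in ("MM_BA", "MM_SA_BA"):
        # Each cross-modal block concatenates two attention directions.
        return 3 * (2 * features) + backbones
    raise ValueError(f"Unknown mode: {mode!r}")

for mode in ("None", "MM_SA", "MM_BA", "MM_SA_BA"):
    print(mode, merged_width(mode))
```

Under these assumptions, 'MM_BA' and 'MM_SA_BA' produce merged vectors of the same width; they differ only in whether self-attention is applied before the cross-modal step.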

In [None]:
def multi_modal_model(mode, train_clinical, train_snp, train_img):

    # Pass shapes as tuples: (n,) with a trailing comma is a 1-tuple, whereas (n) is just an int.
    in_clinical = Input(shape=(train_clinical.shape[1],))

    in_snp = Input(shape=(train_snp.shape[1],))

    in_img = Input(shape=(train_img.shape[1], train_img.shape[2], train_img.shape[3]))

    # Single-modality model backbones
    dense_clinical = create_model_clinical()(in_clinical) # clinical unimodal model
    dense_snp = create_model_snp()(in_snp) # genetic unimodal model
    dense_img = create_model_img()(in_img) # imaging unimodal model

    ########### Attention Layer ############

    ## Cross Modal Bi-directional Attention ##

    if mode == 'MM_BA':

        vt_att = cross_modal_attention(dense_img, dense_clinical)
        av_att = cross_modal_attention(dense_snp, dense_img)
        ta_att = cross_modal_attention(dense_clinical, dense_snp)

        merged = concatenate([vt_att, av_att, ta_att, dense_img, dense_snp, dense_clinical])

    ## Self Attention ##
    elif mode == 'MM_SA':

        vv_att = self_attention(dense_img)
        tt_att = self_attention(dense_clinical)
        aa_att = self_attention(dense_snp)

        merged = concatenate([aa_att, vv_att, tt_att, dense_img, dense_snp, dense_clinical])

    ## Self Attention and Cross Modal Bi-directional Attention##
    elif mode == 'MM_SA_BA':

        vv_att = self_attention(dense_img)
        tt_att = self_attention(dense_clinical)
        aa_att = self_attention(dense_snp)

        vt_att = cross_modal_attention(vv_att, tt_att)
        av_att = cross_modal_attention(aa_att, vv_att)
        ta_att = cross_modal_attention(tt_att, aa_att)

        merged = concatenate([vt_att, av_att, ta_att, dense_img, dense_snp, dense_clinical])

    ## No Attention ##
    elif mode == 'None':

        merged = concatenate([dense_img, dense_snp, dense_clinical])

    else:
        raise ValueError("Mode must be one of 'MM_SA', 'MM_BA', 'MM_SA_BA' or 'None'.")

    ########### Output Layer ############

    output = Dense(3, activation='softmax')(merged)
    model = Model([in_clinical, in_snp, in_img], output)

    return model

In [None]:
### WE JUST NEED THE SHAPES OF THESE TRAINING AND TEST SETS ##################
train_clinical = pd.read_pickle("X_train_clinical.pkl").values
test_clinical= pd.read_pickle("X_test_clinical.pkl").values

train_snp = pd.read_pickle("X_train_snp.pkl").values
test_snp = pd.read_pickle("X_test_snp.pkl").values

train_img= make_img("X_train_img.pkl")
test_img= make_img("X_test_img.pkl")
##############################################################################

model = multi_modal_model('MM_SA_BA', train_clinical, train_snp, train_img)
model.compile(optimizer=Adam(learning_rate = 0.001), loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
model.summary()

##   Training

For this project, we were only able to train the imaging modality model. The other models were not trainable due to issues with the provided code and a lack of documentation to fix them, which is detailed further below.

For the image training, we used 50 epochs, a learning rate of 0.001, and a dropout rate of 0.5. This process was repeated 5 times with random seeds.

The computational requirements below reflect our Google Colab setup. We used a CPU runtime with 12 GB of RAM, of which training used about 4 GB at most. Each epoch took 4-6 seconds. Since everything ran on CPU, no GPU hours were used.
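
The figures in this section imply a rough total training cost, sketched here as simple arithmetic. It assumes the 4-6 seconds per epoch holds for every epoch and ignores data loading and evaluation overhead.

```python
epochs = 50
runs = 5                      # one training run per random seed
sec_per_epoch = (4, 6)        # observed range on the Colab CPU runtime

# Total seconds at the low and high end of the per-epoch range.
low, high = (epochs * runs * s for s in sec_per_epoch)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions, the five image-model runs together take roughly 17 to 25 minutes of CPU time.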

For reference, training code is available in our project's GitHub repository under the "/training" directory.

https://github.com/MersimRizmani/MADDi-Replication/tree/main/training


The following is how the unimodal models get trained:

In [None]:
X_train = pd.read_pickle("X_train_c.pkl")
y_train = pd.read_pickle("y_train_c.pkl")

model.fit(X_train, y_train,  epochs=50, validation_split=0.1, batch_size=32,verbose=1)

The following is how the multimodal model gets trained:

In [None]:
from sklearn.utils import compute_class_weight

train_clinical = pd.read_csv("X_train_clinical.csv").drop("Unnamed: 0", axis=1).values
train_snp = pd.read_csv("X_train_snp.csv").drop("Unnamed: 0", axis=1).values
train_img= make_img("X_train_img.pkl")
train_label= pd.read_csv("y_train.csv").drop("Unnamed: 0", axis=1).values.astype("int").flatten()

class_weights = compute_class_weight(class_weight = 'balanced',classes = np.unique(train_label),y = train_label)
d_class_weights = dict(enumerate(class_weights))

model.fit([train_clinical,
            train_snp,
            train_img],
            train_label,
            epochs=50,
            batch_size=32,
            class_weight=d_class_weights,
            validation_split=0.1,
            verbose=1)
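
The `'balanced'` option passed to `compute_class_weight` above follows scikit-learn's documented formula, `n_samples / (n_classes * count_per_class)`. A dependency-free sketch of the same calculation on made-up labels is shown below; `balanced_class_weights` is our own illustrative name.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}

toy_labels = [0, 0, 0, 0, 1, 1, 2, 2]    # made-up labels, not ADNI data
weights = balanced_class_weights(toy_labels)
print(weights)
```

Rarer classes receive larger weights, so their errors count more in the loss. Note that `dict(enumerate(class_weights))` in the training cell lines up with the labels only when the classes are the integers 0, 1, 2 in sorted order.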

##   Evaluation

Due to training issues, we can only cover the image modality for this project.

We collected the following evaluation metrics from the results:
- Accuracy
- Precision
- Recall
- F1-Score

According to the original authors, the F1-Score was the primary performance metric for evaluating the baselines, and accuracy was used to evaluate their best model against previous papers.
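
The precision, recall, and F1 values we report are macro averages, so every class counts equally regardless of its size, and a class the model never predicts pulls the average down sharply. The NumPy sketch below computes these metrics from a confusion matrix; the function name `macro_metrics` is our own, and the matrix values are invented for illustration and are not our results.

```python
import numpy as np

def macro_metrics(confusion):
    """Macro precision, recall, and F1 from a confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    predicted = confusion.sum(axis=0)
    actual = confusion.sum(axis=1)
    # A metric is defined as 0 when its denominator is 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision.mean(), recall.mean(), f1.mean()

# Invented example: the third class is never predicted.
confusion = [[8, 2, 0],
             [3, 7, 0],
             [4, 6, 0]]
accuracy = np.trace(np.asarray(confusion)) / np.sum(confusion)
macro_p, macro_r, macro_f1 = macro_metrics(confusion)
print(f"accuracy={accuracy:.3f} precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

In this toy case the macro F1 falls below the accuracy because the unpredicted class contributes an F1 of 0 to the average.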

The following code computes the evaluation metrics for the unimodal models:

In [None]:
with open("img_model.pkl", "rb") as fh:  # load the trained image modality model
    model = pickle.load(fh)

In [None]:
with open("img_train.pkl", "rb") as fh:
  data = pickle.load(fh)
X_train_ = pd.DataFrame(data)["img_array"]

with open("img_test.pkl", "rb") as fh:
  data = pickle.load(fh)
X_test_ = pd.DataFrame(data)["img_array"]

with open("img_y_train.pkl", "rb") as fh:
  data = pickle.load(fh)
y_train = np.array(pd.DataFrame(data)["label"].values.astype(np.float32)).flatten()

with open("img_y_test.pkl", "rb") as fh:
  data = pickle.load(fh)
y_test = np.array(pd.DataFrame(data)["label"].values.astype(np.float32)).flatten()

# Swap class labels 1 and 2, using -1 as a temporary placeholder.
y_test[y_test == 2] = -1
y_test[y_test == 1] = 2
y_test[y_test == -1] = 1

y_train[y_train == 2] = -1
y_train[y_train == 1] = 2
y_train[y_train == -1] = 1


X_train = []
X_test = []

for i in range(len(X_train_)):
  X_train.append(X_train_.values[i])

for i in range(len(X_test_)):
  X_test.append(X_test_.values[i])

X_train = np.array(X_train)
X_test = np.array(X_test)

In [None]:
acc = []
f1 = []
precision = []
recall = []
seeds = random.sample(range(1, 200), 5)
for seed in seeds:
  reset_random_seeds(seed)

  score = model.evaluate(X_test, y_test, verbose=0)
  print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
  acc.append(score[1])

  test_predictions = model.predict(X_test)
  test_label = to_categorical(y_test,3)

  true_label= np.argmax(test_label, axis =1)

  predicted_label= np.argmax(test_predictions, axis =1)

  cr = classification_report(true_label, predicted_label, output_dict=True)
  precision.append(cr["macro avg"]["precision"])
  recall.append(cr["macro avg"]["recall"])
  f1.append(cr["macro avg"]["f1-score"])

print("Avg accuracy: " + str(np.array(acc).mean()))
print("Avg precision: " + str(np.array(precision).mean()))
print("Avg recall: " + str(np.array(recall).mean()))
print("Avg f1: " + str(np.array(f1).mean()))
print("Std accuracy: " + str(np.array(acc).std()))
print("Std precision: " + str(np.array(precision).std()))
print("Std recall: " + str(np.array(recall).std()))
print("Std f1: " + str(np.array(f1).std()))
print(acc)
print(precision)
print(recall)
print(f1)

The following code would be used to compute the evaluation metrics for the multimodal model:

In [None]:
def plot_classification_report(y_tru, y_prd, mode, learning_rate, batch_size,epochs, figsize=(7, 7), ax=None):

    plt.figure(figsize=figsize)

    xticks = ['precision', 'recall', 'f1-score', 'support']
    yticks = ["Control", "Moderate", "Alzheimer's" ]
    yticks += ['avg']

    rep = np.array(precision_recall_fscore_support(y_tru, y_prd)).T
    avg = np.mean(rep, axis=0)
    avg[-1] = np.sum(rep[:, -1])
    rep = np.insert(rep, rep.shape[0], avg, axis=0)

    sns.heatmap(rep,
                annot=True,
                cbar=False,
                xticklabels=xticks,
                yticklabels=yticks,
                ax=ax, cmap = "Blues")

    plt.savefig('report_' + str(mode) + '_' + str(learning_rate) +'_' + str(batch_size)+'_' + str(epochs)+'.png')



def calc_confusion_matrix(result, test_label,mode, learning_rate, batch_size, epochs):
    test_label = to_categorical(test_label,3)

    true_label= np.argmax(test_label, axis =1)

    predicted_label= np.argmax(result, axis =1)

    n_classes = 3
    precision = dict()
    recall = dict()
    thres = dict()
    for i in range(n_classes):
        precision[i], recall[i], thres[i] = precision_recall_curve(test_label[:, i],
                                                            result[:, i])


    print ("Classification Report :")
    print (classification_report(true_label, predicted_label))
    cr = classification_report(true_label, predicted_label, output_dict=True)
    return cr, precision, recall, thres

In [None]:
score = model.evaluate([test_clinical, test_snp, test_img], test_label)

acc = score[1]
test_predictions = model.predict([test_clinical, test_snp, test_img])
cr, precision_d, recall_d, thres = calc_confusion_matrix(test_predictions, test_label, mode, learning_rate, batch_size, epochs)

# Results and Analysis

The following are the results on the test data for the image modality:

Avg accuracy: 0.4615384638309479\
Avg precision: 0.3194444444444444\
Avg recall: 0.37222222222222223\
Avg f1: 0.2989417989417989

The following is the table from the paper for the metrics for the individual modalities:
![table.jpeg](./table.png)

In the original paper, the authors report an accuracy of 92.23% and an F1 score of 91.83% for the imaging modality using 551 images, compared to our 46.15% accuracy and 29.89% F1 score using 240 images. Our accuracy and F1 score are lower, most likely because we trained the model on far less data. We could not test the ablations because of issues with the training code. For the same reason, we could not test the hypothesis; to support it, the multimodal model would have had to beat our 46.15% imaging accuracy.
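
The gap between our accuracy and our F1 score is the pattern one would expect when a classifier favors one class. The sketch below illustrates this on made-up labels (not our ADNI predictions), assuming macro averaging across the three classes; `macro_metrics` is a hypothetical helper written for this illustration and is not part of the MADDi code.

```python
def macro_metrics(y_true, y_pred, n_classes=3):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Treat undefined ratios (no predictions or no samples) as 0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Toy labels (assumption): 5 class-0, 3 class-1, and 2 class-2 samples,
# with a classifier that always predicts class 0.
y_true = [0] * 5 + [1] * 3 + [2] * 2
y_pred = [0] * 10

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the accuracy is 0.5 while the macro F1 is about 0.22, because two of the three classes contribute an F1 of zero to the average.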

As noted above and detailed in the Discussion below, issues with the training code prevented us from training the multimodal model or the clinical and genetic single-modality models.


# Discussion

### **Paper Reproducibility**

Based on our work so far, which includes replicating the data preprocessing stage and attempting to rebuild the MADDi model, the paper appears difficult to reproduce.
\
\
Access to the ADNI data takes time to obtain, and once it is granted, storing and processing the data is very difficult. Taken in its entirety, the data exceeds 100 GB. Furthermore, the preprocessing requires a lot of RAM, more than 300 GB in our experience, which is very expensive on a VM. Finally, a lot of time is needed to understand how the code works, and how to fix it, given the contents of the data files. Overall, especially for a project of similar scope to ours, the paper is very difficult to reproduce.

### **Reproduction Eases**

There were several aspects of the reproduction of the original paper that we found to be easy:

- **Code access**: the code for the original paper was made publicly available in a GitHub repository, and we could use it without restriction.
- **Research paper clarity**: the original research paper was very concise in its explanations of the background of the problem, what the authors built, the hypothesis they were testing, and their results, which made it easier for us to attempt to replicate their implementation.

### **Reproduction Challenges**

There were several challenges that we encountered while attempting to reproduce the results of this paper:

- **Getting access to the data**: the process for requesting access to the ADNI data had a turnaround of around 2-3 weeks, which delayed our ability to begin work on this project.
- **Volume of data**: once we were granted access to the ADNI data archive, we were overwhelmed by the volume of data available, which spans several years. The original paper did not include any detail about where in the archive to obtain the data.
- **Size of the data**: once we identified the correct datasets within the archive, transferring them to our local machines and cloud storage took several days because of the amount of data and the large file sizes. With VCF files at 8-10 GB compressed and the imaging data adding a few more GB, we had to subset the data because of the compute cost and the project's time constraints. Realistically, we could not have used all the data within the timeframe of this project without incurring large costs.
- **Data preprocessing**: the authors gave no clear indication of the environment they used or its computational capabilities, so we had no guidance on how to set up an appropriate environment for preprocessing given the RAM and storage required. We used Google Colab and its different processing features, but were constrained by its usage time limits. To work around this, we used a GCP VM for some of the preprocessing, which was fairly expensive and limited how much data we could preprocess for training.
- **Original code modifications**: the original code needed modifications to point to current data, because the code in the original repository was outdated and pointed to old data. The files in the repository were also inconsistent with one another: certain files that the training code expected the preprocessing stage to produce were actually of a different type than described. In addition, the code for saving files was incorrect in several places, such as image preprocessing and clinical preprocessing, where it attempted to save numpy arrays as pickle files; we had to convert them to pandas dataframes first.
- **Training code that was inoperable and could not be reliably modified**: for the genetic, clinical, and all-modality training, we were unable to run the training code as is, even after changing file paths and setting up the appropriate environment. The code seemed to be incompatible with the data passed in. This could have been due to our truncated genetic data, although truncation was not an issue for images, or due to changes in the ADNI data between the paper's publication and our project. The code raised errors on some data because it found strings where it expected numerical values, even though the files came directly from the preprocessing stage. Furthermore, we could not properly modify this code because the repository was poorly documented.
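
Two of the issues above, saving numpy arrays as pickle files and strings appearing where numbers were expected, can be sketched as follows. This is a minimal illustration under our own assumptions rather than the repository's code: the helper names `save_array_as_pickle` and `non_numeric_columns`, the file name, and the toy data are all hypothetical, and the array is assumed to be 2-D.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_array_as_pickle(array, path, columns=None):
    """Wrap a 2-D numpy array in a DataFrame and pickle it with pandas."""
    frame = pd.DataFrame(array, columns=columns)
    frame.to_pickle(path)
    return frame


def non_numeric_columns(frame):
    """List columns with at least one present value that is not a number."""
    bad = []
    for col in frame.columns:
        converted = pd.to_numeric(frame[col], errors="coerce")
        # A value that was present but became NaN could not be parsed.
        if (converted.isna() & frame[col].notna()).any():
            bad.append(col)
    return bad


# Toy feature matrix (assumption), standing in for preprocessed data.
features = np.array([[1.0, 2.0], [3.0, 4.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.pkl")  # hypothetical file name
    save_array_as_pickle(features, path, columns=["a", "b"])
    restored = pd.read_pickle(path)

# Toy clinical-style table (assumption) with one string-valued column.
mixed = pd.DataFrame({"age": [70, 65], "sex": ["F", "M"]})
bad_columns = non_numeric_columns(mixed)
print(restored.shape, bad_columns)
```

Running a check like `non_numeric_columns` on each preprocessed file before training would at least show which columns need encoding or removal, although it would not by itself explain why the original pipeline produced them.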

### **Suggestions to the Authors**

To make this research easier to reproduce, the original authors should document the specific steps needed to obtain and preprocess the data, for example by:
- Pinpointing which specific files to download from ADNI: the pool of data available on the ADNI website is overwhelmingly large and spans several years, so telling the public where to navigate once they get access to the archive would expedite reproducing this paper.
- Including more data visualizations in the data preprocessing stage to help readers further understand the large datasets they are working with.
- Keeping the code up to date: we found ourselves having to tweak the original code because it was out of date or relied on old data.
- Providing the computing requirements, such as the RAM and storage needed for preprocessing.
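
As one possible way to document part of the computing environment, the sketch below collects the Python version, platform, and installed package versions using only the standard library. The function name `environment_report` and the package list are our assumptions, not something taken from the original repository. It does not capture RAM or storage needs, which would still have to be stated by hand.

```python
import json
import platform
import sys
from importlib import metadata


def environment_report(packages=("numpy", "pandas", "tensorflow")):
    """Collect interpreter, platform, and package versions as a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            report["packages"][name] = None
    return report


report = environment_report()
print(json.dumps(report, indent=2))
```

Committing the resulting JSON alongside the code would give later readers a concrete starting point for rebuilding the environment.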

# References

Naqvi E. Alzheimer’s Disease Statistics. 2017. https://alzheimersnewstoday.com/alzheimers-disease-statistics/. Accessed June 20, 2022.

Thies W, Bleiler L. 2013 Alzheimer’s Disease Facts and Figures. Wiley Online Library; 2013. https://alz-journals.onlinelibrary.wiley.com/doi/10.1016/j.jalz.2013.02.003. Accessed June 20, 2022.
