# Neural Network Architecture for the Classification of Alzheimer’s Disease from Brain MRI

*by Khondaker Masfiq Reza, Md Ashraful Alam, December 2024*


## Introduction


Alzheimer’s Disease (AD) is a neurodegenerative disorder that progressively affects memory and cognitive functions. It is among the leading causes of dementia in adults aged 65 and older, with a prevalence that continues to grow alarmingly each year. By 2050, estimates predict a new case of AD every 33 seconds, translating to almost 1 million new cases annually. The societal and economic burden of AD is immense, costing the United States government approximately $100 billion annually. Unfortunately, many affected individuals reside in underdeveloped regions, where proper diagnosis and treatment remain inaccessible. As a result, AD is often diagnosed at advanced stages, highlighting the critical importance of early detection.

My personal connection to this issue stems from a family experience several years ago when one of my uncles was diagnosed with Alzheimer’s. Observing his journey with the disease inspired me to go deeper into understanding AD and explore ways to contribute to its early detection and treatment.

Magnetic Resonance Imaging (MRI) has been widely recognized as a reliable biomarker for detecting AD due to its ability to provide detailed representations of the brain's structure. Over recent years, advancements in machine learning have paved the way for automated AD diagnosis using MRI data. These techniques can be broadly categorized into linear statistical methods and deep nonlinear learning approaches. Among these, convolutional neural networks (CNNs) have emerged as the gold standard for feature extraction, outperforming traditional classifiers like Support Vector Machines (SVMs) and Random Forests. The availability of large, well-curated datasets, such as ADNI, MIRIAD, and OASIS, has further accelerated research in this domain.

This report aims to contribute to this growing body of work by utilizing CNNs for the classification and early diagnosis of Alzheimer’s Disease, with a focus on optimizing accuracy and leveraging advanced preprocessing techniques. Through this work, I hope to bridge the gap between research and practical applications, enabling more efficient diagnostic methods for AD.

We utilized the ADNI dataset, which provides 3D MRI image files. To simplify and enhance processing, we applied preprocessing techniques to extract 2D slices from the 3D data. We focused on slices with high entropy for classification.

The study involved deep learning approaches, primarily using Convolutional Neural Networks (CNNs), due to their robust feature extraction capabilities. Additionally, ensemble learning methods were employed to combine predictions from various models, including pre-trained architectures like VGG16, ResNet50, and InceptionV3. This approach aimed to enhance classification accuracy by leveraging diverse model strengths.

My interest in these techniques stems from their potential to automate and improve the diagnosis of complex diseases like Alzheimer's, a topic close to me due to personal experiences. The use of advanced algorithms and extensive datasets like ADNI provides a pathway to better understand and detect such conditions and contributes to impactful healthcare solutions.



###  Background Information: Human Brain and Neurodegeneration
The human brain, the central organ of the nervous system, comprises the cerebrum, brainstem, and cerebellum, and is continuous with the spinal cord. The cerebrum is divided into two hemispheres, each with a white matter core and a grey matter surface known as the cerebral cortex. The hemispheres are further organized into four lobes (frontal, temporal, parietal, and occipital), each associated with specific functions. For instance, the frontal lobe governs abstract thought and self-control, while the temporal lobe contains the amygdala and hippocampus, which are critical for memory, emotion regulation, and learning.

At the cellular level, neurons form the foundation of brain functionality. Each neuron consists of a cell body, an axon, and dendrites, which connect to form neural circuits and execute specialized brain functions. However, neural degeneration, caused by factors affecting these structures, leads to diseases like multiple sclerosis, Parkinson’s disease, and notably, Alzheimer’s disease.

### Disease Mechanism and Stages
Alzheimer’s disease is characterized as a proteopathy: a condition caused by certain proteins deforming from their normal structure. These misfolded proteins lose their ability to function properly and disrupt the cells they inhabit. More critically, they often behave like toxins, damaging surrounding tissue. Two key misfolded proteins associated with Alzheimer’s are amyloid-beta and tau, which are considered major biomarkers for the disease. While both biomarkers are closely linked to aging, their exact role in causing the disease remains unclear.

The accumulation of these proteins progresses gradually, making early detection of Alzheimer’s challenging. This is due to the subtle differences between normal aging symptoms and early signs of Alzheimer’s. However, the transition from normal aging to Alzheimer’s is often marked by a condition known as Mild Cognitive Impairment (MCI). MCI is further divided into two subtypes: Amnestic MCI (aMCI) and Nonamnestic MCI (naMCI). In aMCI, early symptoms primarily involve memory loss, particularly short-term memory. On the other hand, naMCI, which appears at a later stage, involves significant cognitive declines beyond memory, such as difficulties with language and visuospatial skills, including depth perception.

As the disease advances, patients experience a gradual decline in their ability to communicate, often being reduced to using single words. In its most severe stage, Alzheimer’s impairs cognitive abilities so badly that individuals become entirely dependent on caregivers.

### Past works
Researchers have employed various approaches for improving Alzheimer's disease (AD) diagnosis using machine learning. Liu et al. developed a deep learning framework based on landmarks, leveraging both the ADNI and MIRIAD datasets. This model demonstrated its effectiveness in diagnosing AD. Lin et al. designed a CNN model aimed at identifying Mild Cognitive Impairment (MCI) for early AD detection, achieving an accuracy of 79.9% and an AUC of 86.1%, which indicated a balanced trade-off between sensitivity and specificity. Khan et al. addressed the challenges of limited training data by applying transfer learning with the well-known VGG architecture. Their work, tested on the ADNI dataset, achieved a notable improvement, boosting accuracy by 4% for AD vs. MCI and by 7% for MCI vs. NC classification.[1][2][3]

Korolev et al. (2017) introduced two advanced models for binary classification of Alzheimer's disease stages: a 21-layer Residual Neural Network and a 17-layer 3D Convolutional Neural Network (3D-CNN). Both models were tested on the ADNI dataset, achieving classification accuracies of 80% (AUC = 0.88) and 79% (AUC = 0.87) after 50 training epochs. Similarly, Li et al. developed a "Y-shaped" residual network that utilized two identical sub-networks with residual blocks to extract features from the left and right hippocampus separately. These features were combined in a fully connected layer for binary classification. This model, trained on ADNI 1 and validated on ADNI Go and ADNI 2 datasets, achieved a remarkable AUC of 0.939.[4][5]

In 2018, Khvostikova et al. developed a 3D-CNN model that shared similarities with previously proposed networks by utilizing separate identical sub-networks and combining their outputs in a fully connected (FC) layer. However, instead of focusing solely on the left and right hippocampus as primary brain components, they introduced multiple Regions of Interest (ROI) throughout the brain. These ROIs formed several sub-networks for feature extraction. The researchers conducted experiments on the ADNI dataset using varying numbers of ROIs, ranging from 28 to 48. Their approach achieved a peak accuracy of 96.7% when considering 48 ROIs.[6]


## Methods

This section gives an overview of the dataset we selected and the model we developed. It is divided into different sections to explain each part clearly.

First, we cover the methods used for collecting and processing the data. The next section describes the structure of the proposed model. After that, we outline the techniques we used to improve the model's performance during training. Finally, the last section focuses on the ensemble learning approach we applied to detect Alzheimer's disease at an early stage.

The following figure shows the complete workflow and methodology used to develop our solution. The performance of the solution is evaluated using various metrics, including accuracy, precision, and F1 score.

### Data Collection

For our dataset, we used MRI images provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The ADNI dataset includes a variety of MRI images taken at different locations and at different times. For this study, we chose images from the baseline collections (3T and 1.5T), which contain MRI scans of many different individuals. The images follow a three-way classification: subjects are labeled as Alzheimer’s Disease (AD), Mild Cognitive Impairment (MCI), or Control Normal (CN).

![title](img/Process_diagram.png)

Obtaining the dataset from ADNI was time-consuming. After submitting our application for access, we waited a full week for it to be approved. Downloading the data from the ADNI server also took a significant amount of time: every file is a three-dimensional MRI image (.nii file), each subject has images of this type, and the resulting dataset was very large. We stored the dataset on a drive and then started our preprocessing steps.

The following image shows a sample .nii file viewed with the FreeSurfer software on Linux.

![title](img/sample_nii.png)


### Data Preprocessing
We want our model to learn from clean images. Irrelevant detail in the images can lead to poor training and classification, so we performed several preprocessing steps and converted the data from the 3D format (.nii files) to the 2D format (.png files).

#### Motion correction and conform:
This step helps correct minor movements between different scans by averaging them together. 

#### Non Uniform intensity normalization
Also known as N3, this process corrects non-uniform intensity levels in the MRI data, that is, smooth variations in brightness and contrast that do not reflect the underlying tissue. It is based on the following equation:<br>
![title](img/equation_img1.png)<br>
where I represents the given image, U denotes the uncorrupted image, f describes the bias field, and n is the noise.

#### Talairach transform computation:
We convert the image into a standard coordinate system, so that we can compare different images directly, even if they were taken with different scanners or angles.

#### Intensity normalization:
We adjust the brightness levels again, this time to make sure the white matter (a specific type of brain tissue) has a consistent brightness value.

#### Skull Stripping:
We remove the skull and any other tissues that aren't the brain itself, so that we can focus on the brain structures that are important for our analysis.

The following figure shows the different steps of data preprocessing:
![title](img/data_prepro.png)


The following image shows a sample processed .nii file after skull stripping, viewed with the FreeSurfer software.

![title](img/sample_pro.png)

### Data selection

We first cleaned up the MRI images of the patients. Each patient's scan has 256 image slices. However, we only need a few of these slices to train our model; our assumption was that using too many slices could actually make the model less accurate.

So, we picked about 40 slices with the highest "entropy" values for each patient. Entropy is a measure of how much information an image contains. Higher entropy means more information, which is generally better for training.
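
The selection step can be sketched as follows. This is a minimal illustration, not the exact code used for the report: it assumes entropy means the Shannon entropy of each slice's intensity histogram, that slices are taken along the third axis, and it uses a synthetic volume in place of a real scan. The function names `slice_entropy` and `top_entropy_slices` are hypothetical.

```python
import numpy as np

def slice_entropy(slice_2d, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of one 2D slice."""
    hist, _ = np.histogram(slice_2d, bins=bins)
    p = hist[hist > 0] / hist.sum()          # drop empty bins, normalize to probabilities
    return float(-(p * np.log2(p)).sum())

def top_entropy_slices(volume, k=40, axis=2):
    """Return the indices (ascending) of the k highest-entropy slices along `axis`."""
    n = volume.shape[axis]
    entropies = np.array(
        [slice_entropy(np.take(volume, i, axis=axis)) for i in range(n)]
    )
    top = np.argsort(entropies)[::-1][:k]    # indices of the k largest entropies
    return np.sort(top), entropies

# Synthetic stand-in for a preprocessed scan: only slices 100..149 carry signal.
rng = np.random.default_rng(0)
volume = np.zeros((64, 64, 256), dtype=np.float32)
volume[:, :, 100:150] = rng.random((64, 64, 50))

selected, entropies = top_entropy_slices(volume, k=40)
```

Empty (constant) slices get an entropy of zero, so they are never selected ahead of slices that contain tissue.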

After selecting the best slices, we divided the dataset into three parts:

**Training set:** We use this set to train the model.<br>
**Test set:** We use this set to evaluate how well the model performs on new, unseen data.<br>
**Validation set:** We use this set to tune the model during training.

Dividing the data in this way lets us check how accurate and reliable our model is on data it has not seen.
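
A three-way split could look like the sketch below. The report states neither the split ratios nor whether the split was made per subject or per slice. The 70/15/15 ratios and the subject-level grouping here are therefore assumptions, chosen so that slices from one patient never appear in more than one set. `split_subjects` and the subject IDs are hypothetical.

```python
import numpy as np

def split_subjects(subject_ids, train=0.7, val=0.15, seed=0):
    """Shuffle the unique subject IDs and cut them into train/validation/test groups."""
    ids = np.array(sorted(set(subject_ids)))
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)                          # in-place, reproducible shuffle
    n_train = int(round(train * len(ids)))
    n_val = int(round(val * len(ids)))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical subject identifiers; real ADNI IDs would be used in practice.
subjects = [f"S{i:03d}" for i in range(100)]
train_ids, val_ids, test_ids = split_subjects(subjects)
```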

### Model Architecture
We built our proposed CNN model to help detect Alzheimer's disease from MRI brain scans. Its building blocks are described below.

#### Convolutional Layer: 
The convolutional layer functions as a feature extractor, identifying patterns and hierarchies within input data. The layer applies a set of filters, or kernels, to the input image. Each filter is a small matrix of weights that slides across the image, performing element-wise multiplication and summation with the underlying pixels. This process, known as convolution, generates a feature map that highlights specific features, such as edges, textures, or corners. By employing multiple filters with varying sizes and weights, the convolutional layer can capture a wide range of features. Stacking such layers enables the network to learn increasingly complex representations, from low-level features like edges to high-level features like objects and scenes.

A key advantage of convolutional layers is their ability to preserve spatial relationships between pixels. This is achieved by the use of small receptive fields, which limit the influence of distant pixels on the output. This property is crucial for tasks like image classification and object detection, where the spatial arrangement of features is essential.
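
The sliding-window operation can be illustrated with a few lines of NumPy. This is a sketch for intuition only: the toy image and the hand-written edge filter are made up, whereas in a CNN the filter weights are learned. As in most deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 6x6 image with a vertical edge: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (illustrative; a CNN learns its own weights).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map[0])  # [0. 3. 3. 0.]
```

The feature map is non-zero only where the window straddles the edge, which is what "highlighting a feature" means in practice.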

#### Pooling Layer: 
The pooling layer reduces the size of the feature maps, making them easier to process. With smaller feature maps, the model can process information faster, uses less memory, and is less prone to overfitting. Overfitting occurs when a model becomes too complex and starts to memorize the training data instead of learning general patterns. In our model, we used a Max Pooling layer, which selects the maximum value from each small region of its input. This is particularly useful for MRI images, as it helps to focus on the brighter areas, which are often the most informative parts of the image.
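
As a small illustration of max pooling, the sketch below applies non-overlapping 2x2 windows to a made-up 4x4 array. The window size of 2 is an assumption for the example; the report does not state the pool size used.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))            # maximum inside each size x size window

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled)  # [[4 5]
               #  [6 7]]
```

Each output value is the brightest pixel of its window, and the array shrinks from 16 values to 4.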

#### Flatten Layer: 
This layer takes the pooled feature maps and turns them into a single column vector. The pooled output has to be flattened so that it is ready for the fully connected layer.

#### Fully Connected Layer: 
The fully connected layer is the final stage of our model. It takes the flattened data and uses it to make a decision. Every input to this layer is connected to every neuron through a learned weight, and these weights help the model capture the importance of different features in the image.

#### Activation Functions:
We used two activation functions in our CNN.
##### Softmax 
The softmax function is employed to transform the output of a neural network into a probability distribution over the specified output classes. This allows the model to assign probabilities to each possible class, making it suitable for multi-class classification tasks.<br>
![title](img/equation_img2.png)
##### ReLU
The ReLU function is a simple yet effective activation function that outputs the maximum of its input and zero. Unlike sigmoid or tanh functions, ReLU does not saturate for positive inputs, which makes it far less prone to the vanishing gradient problem and a popular choice in deep learning models. It helps accelerate the training process and improves the overall performance of the network.<br>
![title](img/equation_img3.png)
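
Both activation functions fit in a few lines of NumPy. This is a minimal sketch: the input scores are hypothetical, and the class order (AD, MCI, CN) is assumed only for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for AD, MCI, CN
probs = softmax(logits)              # largest score gets the largest probability
hidden = relu(np.array([-1.5, 0.0, 2.3]))
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores.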

### Learning Enhancements
Here we will describe the learning enhancement techniques that we have used during the training of our model



### Ensemble Learning

Neural networks, including Convolutional Neural Networks (CNNs), often suffer from the issue of overfitting. This occurs when the model becomes too complex and starts to memorize the training data instead of learning general patterns. As a result, the model performs poorly on new, unseen data.   
To address this issue, techniques like early stopping and regularization are employed. However, a more effective approach is to combine multiple models, each trained on different subsets of the data or with different hyperparameters. This ensemble learning technique helps to reduce the variance and bias of individual models, leading to improved generalization performance.

Ensemble learning is a powerful technique that involves combining multiple models to improve overall prediction accuracy. By training multiple models with different architectures or hyperparameters, we can create a diverse set of predictions. This diversity helps to mitigate the limitations of individual models and reduces the risk of overfitting.

In our proposed solution, we employ an ensemble learning approach that combines our custom CNN model with several pre-trained state-of-the-art models, including VGG16, ResNet50 and InceptionV3. These pre-trained models have been trained on massive datasets and have learned to extract highly discriminative features from images.

To combine the predictions from these diverse models, we use a majority voting strategy. Each model independently classifies an input image, and the final prediction is determined by the class that receives the majority of votes from the ensemble members. This approach leverages the strengths of each individual model and produces a more robust and accurate prediction.

By utilizing ensemble learning, we aim to achieve higher accuracy and better generalization performance on our Alzheimer's disease classification task.<br>
![title](img/proposed_ensemble_learning_diagram.png)
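
The voting rule described above can be sketched as follows. The predicted labels are made up, and the label encoding (0 = AD, 1 = MCI, 2 = CN) is an assumption. With four voters a 2-2 tie is possible. The report does not say how ties were resolved, so this sketch simply picks the lowest class index.

```python
import numpy as np

def majority_vote(predictions, n_classes):
    """predictions: (n_models, n_samples) integer labels. Ties go to the lowest class index."""
    predictions = np.asarray(predictions)
    # counts[c, s] = number of models that voted class c for sample s
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# Hypothetical predictions; rows: custom CNN, VGG16, ResNet50, InceptionV3.
preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 2],
    [0, 2, 2, 0],
    [1, 1, 2, 0],
])
final = majority_vote(preds, n_classes=3)
print(final)  # [0 1 2 0]
```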

### Implementation
In the previous sections, we covered the steps of data collection and preprocessing, along with a summary of our proposed model and ensemble learning-based approach. This section focuses on the detailed design of our network and the implementation of ensemble learning, integrating it with other pre-trained networks. The figure provides a concise overview of the complete model architecture.

#### Implementation of the baseline model
This section outlines the implementation of our proposed model in Python, utilizing libraries like Keras and TensorFlow. After training, the model is saved for integration into our ensemble learning framework. The specifics of this implementation are detailed in the following sections.

##### Convolutional Layer Selection: 
In our CNN model, we incorporated the Conv2D layer from Keras, utilizing four Conv2D layers as part of the design.

##### Pooling Layer Selection: 
For the pooling operation, we implemented the MaxPooling2D layer in Keras. Since we apply pooling after each Conv2D layer, we included four MaxPooling layers in total.

##### Flatten Layer: 
After the final pooling layer, we used the Flatten layer in Keras to convert the outputs from the pooling layers into a one-dimensional format.

##### Dense Layer: 
Once the data was flattened, we added six Dense layers from Keras to serve as the hidden layers in our CNN. These Dense layers function as fully connected networks within the CNN architecture.

##### Dropout Layer: 
The Dropout layer randomly deactivates a portion of its inputs at a specified rate during training. This helps to reduce overfitting and improve the model's generalization.
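
What the layer does during training can be sketched in NumPy as "inverted" dropout, where the surviving activations are scaled up so their expected value is unchanged. The rate of 0.5 is an assumption for the example; the report does not state the rate used.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero each element with probability `rate` and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x                              # dropout is a no-op at inference time
    keep = rng.random(x.shape) >= rate        # True for the units that survive
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000,))
dropped = dropout(activations, rate=0.5, rng=rng)
```

Roughly half of the activations become zero and the rest become 2.0, so the mean stays close to 1.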

![title](img/model_plot.png)

#### Implementation of Ensemble Learning
This section outlines how we utilized an ensemble learning strategy by integrating various pre-trained models from Keras with our own proposed model. These models were executed simultaneously, and the final outcome was determined using a majority voting approach. Below are the detailed steps of the process:

##### Loading the Proposed Model: 
We begin by loading our custom-developed model that was previously trained. This is achieved using Keras's load_model function.

##### Integrating Pre-trained Models: 
Next, we incorporate several pre-trained models available in the Keras application module. Specifically, we use VGG16, Xception, ResNet50, and InceptionV3 models, importing their pre-trained weights from the ImageNet dataset. To adapt these models for our experiment, we remove their fully connected layers at the top and modify them to align with our dataset.

##### Max Voting: 
After integrating the pre-trained models, we execute them alongside our proposed model on a validation subset of the dataset. The ensemble decision is then finalized by applying a majority voting mechanism, where the output from all models determines the collective prediction.




## Results

### Individual Model performance
The custom CNN model served as the base of our experimentation, but its performance did not meet expectations. Achieving an accuracy of only 47.98%, it struggled to effectively learn complex patterns from the 2D MRI slices. The precision, recall, and F1 score of 0.5400, 0.4798, and 0.5058, respectively, reflected its limited ability to distinguish between classes, highlighting areas for improvement in its architecture and training strategy.

The VGG16 model, known for its deep convolutional layers, achieved an accuracy of 54.52%. It demonstrated strong feature extraction capabilities, with precision, recall, and F1 score values of 0.6272, 0.5452, and 0.5735, respectively. Despite its higher computational cost, VGG16 outperformed the custom CNN model in identifying key features related to Alzheimer’s detection.

The ResNet50 model showed an accuracy of 45.89%, slightly lower than the custom CNN and VGG16. However, its precision of 0.5842 was higher than that of the custom CNN, with recall and F1 score values of 0.4589 and 0.4950, respectively. ResNet50’s residual connections allowed it to handle deeper networks without vanishing gradient issues, though it fell short in overall classification performance.

The InceptionV3 model achieved an accuracy of 53.45%. Its unique multi-scale feature extraction capability was evident in its precision of 0.6330, recall of 0.5345, and F1 score of 0.5651. While slightly less accurate than VGG16, InceptionV3 showed robust performance, demonstrating its ability to capture complex patterns in the dataset effectively.

These results highlight that, while pre-trained models like VGG16 and InceptionV3 performed better overall, the custom CNN model needs significant refinement.

| Model           | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|------------------|--------------|---------------|------------|--------------|
| Custom CNN       | 47.98        | 54.00         | 47.98      | 50.58        |
| VGG16            | 54.52        | 62.72         | 54.52      | 57.35        |
| ResNet50         | 45.89        | 58.42         | 45.89      | 49.50        |
| InceptionV3      | 53.45        | 63.30         | 53.45      | 56.51        |
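
In the table above, the recall column equals the accuracy column for every model. That is exactly what support-weighted averaging over classes produces, so the sketch below assumes weighted averaging was used; the report does not state the averaging mode. The confusion matrix is made up and is not the report's data, and `weighted_metrics` is a hypothetical helper.

```python
import numpy as np

def weighted_metrics(cm):
    """cm[i, j] = number of samples with true class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)                  # true samples per class
    predicted = cm.sum(axis=0)                # predicted samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    w = support / support.sum()               # class weights proportional to support
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(w @ precision),
        "recall": float(w @ recall),
        "f1": float(w @ f1),
    }

# Made-up 3-class confusion matrix (rows: true AD, MCI, CN).
cm = [[30, 15, 5],
      [20, 50, 30],
      [10, 20, 20]]
m = weighted_metrics(cm)
```

Weighted recall is the sum of correctly classified samples divided by the total, which is the definition of accuracy. Weighted F1, on the other hand, is generally not the harmonic mean of the weighted precision and weighted recall.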






In our approach, we fine-tuned pre-trained deep learning models to enhance the performance of Alzheimer’s disease classification. Fine-tuning is a transfer learning technique where a pre-trained deep learning model is adapted to a specific task by making slight modifications and training it further on a new dataset. Pre-trained models such as VGG16, ResNet50, and InceptionV3 are trained on large datasets such as ImageNet and have learned a rich set of features that transfer to other tasks. We used these three ImageNet-pre-trained models as a starting point for Alzheimer’s disease detection from MRI images, and we also fine-tuned our custom CNN model.

The VGG16 model achieved an accuracy of 51.48%, indicating slightly better performance than random guessing. Its recall was relatively higher at 60.51%, demonstrating the model's ability to identify true positives, but the precision was lower at 48.21%, suggesting a notable presence of false positives. The F1 score of 53.67% reflects a moderate balance between precision and recall, while the ROC AUC score of 52.90% indicates limited capability in distinguishing between classes. Overall, the model's performance highlights the need for further optimization, such as better fine-tuning or enhanced data preprocessing, to achieve more robust results.

The custom CNN model achieved an accuracy of 46.79%, indicating its performance in correctly classifying the test data. The precision of the model was 45.69%, which measures the ability of the model to correctly identify positive cases among the predicted positives. The recall was relatively higher at 77.56%, showing the model's effectiveness in identifying actual positives. The F1 score, which balances precision and recall, stood at 57.51%, reflecting moderate overall performance. However, the ROC AUC score was 52.26%, showing that the model performed only slightly better than random guessing in distinguishing between classes. Despite some positive recall, the overall performance of the custom CNN model did not meet expectations, particularly in terms of accuracy and precision.

The ResNet model achieved an accuracy of 42.68%, indicating relatively low effectiveness in correctly classifying the test data. The precision of 37.34% shows that the model struggled to correctly identify positive cases among its predictions, and the recall of 34.62% shows that it missed most of the actual positives. The F1 score, which balances precision and recall, was 35.93%, pointing to weak overall performance. Additionally, the ROC AUC score of 44.66% is below the 50% expected from random guessing. Overall, the ResNet model underperformed in this experiment, highlighting the need for further optimization or adjustments.

The InceptionV3 model achieved an accuracy of 43.04%. With a precision of 42.38%, fewer than half of its positive predictions were correct. The recall was 63.08%, indicating that the model was relatively better at identifying actual positives. The F1 score, the harmonic mean of precision and recall, was 50.70%, showing modest performance overall. However, the ROC AUC score of 43.72% is also below the 50% level of random guessing, indicating a weak ability to differentiate between classes and the need for further tuning.

| Model        | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | ROC AUC (%) |
|--------------|--------------|----------------|------------|--------------|-------------|
| VGG16        | 51.49        | 48.21         | 60.51      | 53.67        | 52.91       |
| Custom CNN   | 46.79        | 45.69         | 77.56      | 57.51        | 52.26       |
| ResNet       | 42.68        | 37.34         | 34.62      | 35.93        | 44.66       |
| InceptionV3  | 43.04        | 42.38         | 63.08      | 50.70        | 43.72       |




### Ensembling Results

This section showcases the performance metrics for the Ensemble Model, which combines predictions from multiple pre-trained models (VGG16, ResNet50, and InceptionV3) along with the Custom CNN model using a majority voting strategy.
The ensemble model achieved an accuracy of 57.50%, which is higher than the accuracies reported above for the individual models, the best of which was VGG16 at 54.52%. This indicates that combining predictions helped improve overall performance, although the margin over the top individual model is small.
The precision of 54.09% is the share of the ensemble's positive predictions that were actually positive, so a little under half of them were false positives.
With a recall of 55.90%, the ensemble model demonstrates moderate sensitivity in identifying true positive cases. This value is an improvement over some individual models.
The F1 Score, which is the harmonic mean of precision and recall, is 54.98%. This balanced metric indicates that the ensemble model maintains a reasonable trade-off between precision and recall.
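
As a quick arithmetic check, the reported F1 score follows directly from the reported precision and recall. The helper name `f1_score` below is only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall of the ensemble as reported above, in percent.
ensemble_f1 = f1_score(54.09, 55.90)
print(round(ensemble_f1, 2))  # 54.98
```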

The ensemble model leverages the strengths of multiple models, resulting in balanced performance across all metrics. However, it still falls short of significantly outperforming the best individual model (VGG16). 

![title](img/ensemble_accu_loss.png)

As we can see, for the custom CNN model the training accuracy shows a steady improvement initially but plateaus early, while the validation accuracy fluctuates significantly, indicating potential overfitting or instability in the model's performance.
The pre-trained VGG16 model demonstrates consistent growth in both training and validation accuracy, stabilizing at higher accuracy values. This reflects its ability to generalize better compared to the Custom CNN.
The ResNet50 model achieves modest accuracy, but both training and validation accuracies remain low, suggesting difficulty in learning meaningful features from the dataset.
The InceptionV3 model shows slightly better validation accuracy compared to ResNet50 but still lags behind VGG16. Its performance is relatively stable across epochs.<br>

![title](img/ensemble_roc.png)

The Receiver Operating Characteristic (ROC) curve shown above represents the performance of the Ensemble Model.

The ROC curve plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity) for different threshold values. It evaluates the trade-off between correctly identifying positive cases and incorrectly classifying negative cases as positive.
The diagonal gray line indicates a random classifier with no discriminative power, which corresponds to an Area Under the Curve (AUC) of 0.5.
The blue curve illustrates the ensemble model's classification performance. The model achieves an AUC (Area Under the Curve) value of 0.5384, which is only slightly better than random guessing. This suggests that the ensemble model struggles to differentiate effectively between the two classes.
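
One way to read an AUC value is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC from that definition. The labels and scores are made up, and this is not necessarily how the figure above was produced.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]        # every positive compared with every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

y_true = [1, 1, 1, 0, 0, 0]
perfect = roc_auc(y_true, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])        # 1.0
uninformative = roc_auc(y_true, [0.5] * 6)                       # 0.5
inverted = roc_auc(y_true, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])       # 0.0
```

Under this reading, an AUC of 0.5384 means the model ranks a positive case above a negative one only slightly more often than a coin flip would.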

The AUC value of 0.5384 indicates limited effectiveness of the ensemble model for this task. While the model slightly surpasses random classification, its predictive performance does not meet expected standards. <br>
![title](img/ensemble_perf.png)

The bar chart illustrates the performance metrics of the Ensemble Model, focusing on three key measures: Accuracy, F1 Score, and Loss.
The model achieved an accuracy of approximately 57.5%, indicating that the ensemble correctly classified about 57.5% of the samples. 
The F1 Score, a balance between precision and recall, is around 54.9%, showing that the model struggles to consistently identify true positives and minimize false positives and negatives.
The loss bar is much taller than the accuracy and F1 Score bars. Loss is not measured on the same scale as those two metrics, so the bars are not directly comparable, but a high loss still reflects the model's difficulty in minimizing error during training and testing, suggesting potential challenges in optimizing weights or overfitting/underfitting issues.

While the accuracy and F1 Score are moderate, the high loss highlights inefficiencies in the model's predictions. This may indicate challenges in convergence during training or limitations in the dataset or model architecture.




### Discussion
In this report, we explored the performance of various models, including a custom CNN model and several pre-trained architectures (VGG16, ResNet50, and InceptionV3), fine-tuned for Alzheimer’s disease classification. While the ensemble model provided a moderate performance with an accuracy of 57.5% and an F1 Score of 54.9%, the results highlight key areas for improvement.

Our custom CNN model, despite being specifically designed for this task, did not perform as well as anticipated. It achieved a recall of 77.5%, which shows that it identifies most positive cases, but it scored lower on other metrics such as precision and overall accuracy. High recall paired with lower precision typically means a model over-predicts the positive class. This suggests that the model might have been overfitting to certain patterns or lacked the complexity to capture critical features from the data.

The pre-trained models, although leveraging transfer learning, also showed varied performance, with VGG16 achieving the best overall results among them. However, the ensemble approach, combining predictions from all models, did not significantly outperform individual models, indicating limitations in the ensemble strategy used.

Moving forward, we aim to address these challenges by:

**Enhancing the custom CNN architecture:** Introducing additional layers, advanced regularization techniques, and hyperparameter optimization could improve its performance.<br>
**Improving data augmentation:** Employing more diverse augmentation techniques may help the model generalize better.<br>
**Exploring new architectures:** Testing state-of-the-art models or hybrid approaches to leverage both custom and pre-trained features.<br>
**Optimizing ensemble methods:** Refining the ensemble approach, such as weighted voting or stacking, to make better use of the strengths of individual models.<br>
This work provides a foundation for Alzheimer’s classification using MRI images and demonstrates the potential for improvement in both custom and ensemble modeling approaches. Future iterations of this research will focus on addressing these limitations to develop a more accurate and reliable diagnostic model.
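
The weighted voting mentioned among the planned improvements could be sketched as follows. This is a minimal illustration, not the ensemble strategy used in this report: the function `weighted_soft_vote`, the per-model probabilities, and the weights are all hypothetical. In practice the weights might be derived from each model's validation performance.

```python
def weighted_soft_vote(model_probs, weights):
    """Combine per-model positive-class probabilities by weighted average.

    model_probs: dict mapping model name -> list of P(positive) per sample
    weights:     dict mapping model name -> non-negative weight
    """
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(next(iter(model_probs.values())))
    combined = []
    for i in range(n_samples):
        weighted_sum = sum(weights[name] * probs[i]
                           for name, probs in model_probs.items())
        combined.append(weighted_sum / total)
    return combined


# Hypothetical positive-class probabilities for three samples.
probs = {
    "custom_cnn": [0.9, 0.4, 0.2],
    "vgg16":      [0.6, 0.7, 0.1],
    "resnet50":   [0.3, 0.8, 0.6],
}
# Hypothetical weights: the model assumed to validate best gets more say.
weights = {"custom_cnn": 1.0, "vgg16": 2.0, "resnet50": 1.0}

combined = weighted_soft_vote(probs, weights)
predictions = [1 if p >= 0.5 else 0 for p in combined]
print(combined, predictions)
```

Setting all weights equal reduces this to plain probability averaging, so the same function can serve as a baseline when comparing weighting schemes.
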








## Conclusions

Alzheimer's Disease (AD) is a prevalent neurodegenerative condition among elderly populations, significantly impacting their physical, mental, and financial well-being. Accurate and timely diagnosis of AD is crucial to ensuring appropriate treatment and improving patients' quality of life. Advances in medical imaging technologies have greatly improved diagnostic accuracy, making early detection more achievable than ever. However, many state-of-the-art automated solutions rely heavily on pre-defined regions of interest for feature extraction, requiring extensive domain knowledge of the human brain. This reliance not only adds complexity to the system design but also makes the solutions less accessible for widespread use.

In this research, we focused on automating the detection of AD by developing a custom 15-layer CNN model. Our model, combined with pre-trained CNN networks, was intended to provide a fast and reliable solution for early detection of AD without requiring specialized knowledge, thereby making it more accessible and user-friendly. However, the achieved accuracy was lower than expected. We anticipated higher performance from our custom CNN model, but several challenges emerged throughout the project.

One of the most challenging aspects was the data preprocessing step, particularly skull stripping, which required meticulous adjustments to ensure proper preparation of MRI data. Another significant difficulty arose when using the lab’s GPU resources, which frequently ran out of memory, leading to resource exhaustion errors. These issues highlighted the need for efficient data handling and optimization of computational resources.

Despite these challenges, this project has been an invaluable learning experience. I gained a deeper understanding of the complexities involved in medical image preprocessing and the nuances of designing deep learning models for neurodegenerative disease detection. Additionally, I learned how to manage computational limitations effectively and gained insights into ensemble learning techniques and fine-tuning pre-trained networks. These lessons have prepared me to approach future projects with more confidence and improved strategies, aiming for enhanced performance and efficiency.

### References

[1] M. Liu, J. Zhang, E. Adeli, and D. Shen, “Landmark-based deep multi-instance learning for brain disease diagnosis,” Medical Image Analysis, vol. 43, pp. 157–168, 2018.<br>
[2] W. Lin, T. Tong, Q. Gao, D. Guo, X. Du, Y. Yang, G. Guo, M. Xiao, M. Du, and X. Qu, “Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment,” Frontiers in Neuroscience, vol. 12, Nov. 2018.<br>
[3] N. M. Khan, N. Abraham, and M. Hon, “Transfer learning with intelligent training data selection for prediction of Alzheimer’s disease,” IEEE Access, vol. 7, pp. 72726–72735, 2019.<br>
[4] S. Korolev, A. Safiullin, M. Belyaev, and Y. Dodonova, “Residual and plain convolutional neural networks for 3D brain MRI classification,” in Proceedings - International Symposium on Biomedical Imaging, 2017, pp. 835–838.<br>
[5] H. Li, M. Habes, and Y. Fan, “Deep ordinal ranking for multicategory diagnosis of Alzheimer’s disease using hippocampal MRI data,” arXiv preprint, 2017.<br>
[6] A. Khvostikov, K. Aderghal, J. Benois-Pineau, A. Krylov, and G. Catheline, “CNN-based classification using sMRI and MD-DTI images for Alzheimer disease studies,” arXiv preprint, 2018.<br>
[7] Raw dataset link: https://drive.google.com/drive/folders/1t7muZxP-sxtJPlXQLehMqY5f2fJNcGl4?usp=sharing <br>
[8] Processed dataset link: https://drive.google.com/drive/folders/1N5M6ZgJSbR7i9IDF1qeaCk3xrMNES-8h?usp=sharing <br>
[9] ADNI dataset link: https://adni.loni.usc.edu/data-samples/adni-data/neuroimaging/mri/mri-image-data-sets/

In [2]:
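import glob
import nbformat

# Locate the report notebook in the current directory.
nbfile = glob.glob('Final_Report.ipynb')
if not nbfile:
    raise FileNotFoundError('Final_Report.ipynb not found in the current directory.')
if len(nbfile) > 1:
    print('More than one ipynb file. Using the first one.  nbfile=', nbfile)
# Read the notebook without converting its format version.
with open(nbfile[0], 'r', encoding='utf-8') as f:
    nb = nbformat.read(f, nbformat.NO_CONVERT)
# Count space-separated tokens in every markdown cell, ignoring '#' heading markers.
word_count = 0
for cell in nb.cells:
    if cell.cell_type == "markdown":
        word_count += len(cell['source'].replace('#', '').lstrip().split(' '))
print('Word count for file', nbfile[0], 'is', word_count)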

Word count for file Final_Report.ipynb is 5622
