Predictive modeling for Alzheimer classification:
Comparative analysis of machine learning algorithms and ensemble techniques.
Alzheimer's disease is a neurodegenerative disorder that affects memory and
behavior. Since there is currently no cure available, early-stage prevention is crucial.
In this project, I will use correlation heatmaps to identify the features that most
significantly affect the development of Alzheimer's disease, and logistic regression,
decision trees, k-nearest neighbors (k-NN), and random forests to predict it. I will also
compare the accuracy and confusion matrices of these models to determine the best
classification method for this task.
o Age
o Years of education (EDUC)
o Socioeconomic status (SES)
o Clinical Dementia Rating (CDR)
    0: none
    0.5: possible
    1: positive
o Mini-Mental State Examination (MMSE)
o Labels:
    Group: nondemented (0: negative), converted (1: positive), demented (1: positive)
- Methodology
- Load data:
Load the data from the CSV file (alzheimer.csv).
- Data preprocessing:
Combine the group labels into binary categories.
Three-level coding: Nondemented = 0, Converted = 0.5, Demented = 1
Binary coding: Nondemented = 0, Converted and Demented = 1
Split the data randomly 50/50 into training and testing sets to assess generalizability and guard against overfitting.
- Visualization
Supervised classification using logistic regression, decision trees, k-NN, and random forests.
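The preprocessing and model comparison described in the methodology might be sketched as follows. This is a minimal illustration, not the project's actual script: the column names (Age, EDUC, SES, CDR, MMSE, Group), the helper names prepare and compare_models, the handling of missing values, the feature scaling, and the fixed random seed are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed column names, taken from the feature list; adjust to the real CSV header.
FEATURES = ["Age", "EDUC", "SES", "CDR", "MMSE"]
# Binary coding from the preprocessing step: Converted and Demented both map to 1.
LABEL_MAP = {"Nondemented": 0, "Converted": 1, "Demented": 1}


def prepare(df):
    """Return (X, y) with binary labels. Dropping incomplete rows is an assumption."""
    df = df.assign(label=df["Group"].map(LABEL_MAP))
    df = df.dropna(subset=FEATURES + ["label"])
    return df[FEATURES], df["label"].astype(int)


def compare_models(X, y, seed=0):
    """Fit four classifiers on a random 50/50 split and collect test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )
    models = {
        # Scaling is an added assumption for the two scale-sensitive models.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # labels=[0, 1] keeps the matrix 2x2 even if a class is never predicted.
            "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
        }
    return results
```

With the real data, one would call prepare(pd.read_csv("alzheimer.csv")) and pass the resulting X and y to compare_models, then read the accuracy and confusion matrix for each model from the returned dictionary.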
[Figures: Heatmap, Logistic Regression, Decision Trees]
Heatmap: correlation matrix of features
0 (Nondemented) correlation matrix: 0.02 Age & EDUC
0.5 (Converted) correlation matrix: 0.00 MMSE & EDUC
1 (Demented) correlation matrix: 0.03 Age & CDR, SES & MMSE
Overall, Age, EDUC, and MMSE are each mentioned twice.
Based on these matrices, age, years of education, and mental state are the main factors contributing to Alzheimer's disease.
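The per-group correlation matrices behind the heatmaps could be computed along these lines. This is a sketch under assumptions: the column names, the Group column holding the three group names, and the helper names group_correlations and plot_group_heatmaps are illustrative and not taken from the project file.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumed column names, taken from the feature list.
FEATURES = ["Age", "EDUC", "SES", "CDR", "MMSE"]


def group_correlations(df, group_col="Group"):
    """Return {group name: Pearson correlation matrix of FEATURES}.

    A feature that is constant within a group (for example CDR) yields NaN
    entries, because its variance is zero.
    """
    return {name: part[FEATURES].corr() for name, part in df.groupby(group_col)}


def plot_group_heatmaps(correlations):
    """Draw one annotated heatmap per group, side by side, and return the figure."""
    count = len(correlations)
    fig, axes = plt.subplots(1, count, figsize=(6 * count, 5), squeeze=False)
    for ax, (name, corr) in zip(axes[0], correlations.items()):
        sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1,
                    cmap="coolwarm", ax=ax)
        ax.set_title(f"{name} correlation matrix")
    fig.tight_layout()
    return fig
```

Calling plot_group_heatmaps(group_correlations(df)) followed by plt.show() would display the three matrices; the annotated cell values are what the summary lines above quote.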
Error rate (Random Forest): the best hyperparameters are n = 10 and d = 5, with an error rate of 6.78%.
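A hyperparameter search of this kind might look like the sketch below. It assumes that n is the number of trees (n_estimators) and d is the maximum tree depth (max_depth), which the text does not state explicitly; the grid ranges, the seed, and the function name sweep_random_forest are hypothetical.

```python
from itertools import product

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def sweep_random_forest(X_train, y_train, X_test, y_test,
                        n_values=range(1, 11), d_values=range(1, 6), seed=0):
    """Return (best_n, best_d, best_error) over the given grid.

    Assumption: n is the number of trees and d is the maximum tree depth.
    The error rate is 1 - accuracy on the held-out half.
    """
    best = None
    for n, d in product(n_values, d_values):
        model = RandomForestClassifier(
            n_estimators=n, max_depth=d, random_state=seed
        )
        model.fit(X_train, y_train)
        error = 1.0 - accuracy_score(y_test, model.predict(X_test))
        # Keep the first combination that reaches the lowest error.
        if best is None or error < best[2]:
            best = (n, d, error)
    return best
```

Choosing n and d on the same half that is used for the final error rate tends to give an optimistic estimate; holding out a separate validation split is one way to avoid that.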
Random Forest is the most effective model for predicting Alzheimer's disease in the dataset. Heatmap analysis revealed correlations between
features, which were analyzed separately for each label. The heatmaps showed that age, years of education, and mental state were consistently mentioned
across the correlation matrices. For Random Forest, the best combination of hyperparameters was n = 10, d = 5, yielding an error rate of 6.78%.
Early-stage detection is critical in Alzheimer's disease, and in this project I found that the converted stage correlates with mental state
and years of education. I believe that prolonged exposure to intense educational environments may lead to higher levels of mental stress,
potentially contributing to the disease. For the demented status, age, Clinical Dementia Rating, and socioeconomic status were also found to be significant
factors. It is noteworthy that Alzheimer's patients are mostly found in older age groups. Moreover, the combination of socioeconomic status and mental
state is intriguing: it suggests that lower socioeconomic status may exacerbate mental stress levels, potentially contributing to dementia.
Prerequisite libraries
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install seaborn
pip install numpy
Open a terminal or PowerShell.
Navigate to the location where the project is saved.
Run the commands.
Lwon_final_project.py is the main file; it presents the charts and graphs.