ProjectHopper/Alzeimer__MachineLearning

Project Overview

Predictive modeling for Alzheimer's classification:
a comparative analysis of machine learning algorithms and ensemble techniques.
Alzheimer's disease is a neurodegenerative disorder that affects memory and
behavior. Because no cure is currently available, early detection and prevention are crucial.
In this project, I use correlation heatmaps to explore the features, and four supervised
classifiers (logistic regression, decision trees, k-nearest neighbors (k-NN), and random
forests) to predict Alzheimer's disease and identify the features most strongly
associated with its development. I also compare the accuracy and confusion matrices
of these models to determine the best classification method for this task.

Features:

o Age
o Years of education (EDUC)
o Socioeconomic status (SES)
o Clinical Dementia Rating (CDR)
    0: none
    0.5: possible
    1: positive
o Mini-Mental State Examination (MMSE)

Labels:

o Group: nondemented (0: negative), converted (1: positive), demented (1: positive)
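
The binary labeling described above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the sample rows are made up, and the exact spelling of the Group values in alzheimer.csv is an assumption.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from alzheimer.csv.
df = pd.DataFrame({
    "Group": ["Nondemented", "Converted", "Demented", "Nondemented"],
    "Age": [75, 80, 71, 88],
})

# Binary target: Converted and Demented both count as positive (1).
# The key spellings are assumed; adjust them to match the CSV.
binary_map = {"Nondemented": 0, "Converted": 1, "Demented": 1}
df["label"] = df["Group"].map(binary_map)

# Fail early if a Group value is not covered by the mapping.
assert df["label"].notna().all(), "unmapped Group value"
df["label"] = df["label"].astype(int)

print(df)
```

Checking for unmapped values before casting to int catches typos or unexpected categories in the Group column instead of silently producing missing labels.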

Methodology

  1. Load data:
    Load the data from the CSV file (alzheimer.csv).
  2. Data preprocessing:
    Encode the group labels in two ways:
    Three-level, matching the per-group correlation matrices: Nondemented = 0, Converted = 0.5, Demented = 1
    Binary, for classification: Nondemented = 0, Converted and Demented = 1
    Split the data randomly 50/50 into training and test sets, so that each model is evaluated on data it was not trained on and overfitting shows up as test error.
  3. Visualization and modeling:
    Heatmap of the feature correlations
    Supervised classification using:
    Logistic Regression
    Decision Trees
    k-NN
    Random Forest
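
The steps above can be sketched end to end with scikit-learn. This is a rough outline under stated assumptions rather than the contents of the project's script: synthetic data from make_classification stands in for the five features in alzheimer.csv, and the model settings (for example n_neighbors=5) are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the five features and the binary label (assumption).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Random 50/50 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Illustrative settings; the project's actual hyperparameters may differ.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(
        n_estimators=10, max_depth=5, random_state=0
    ),
}

# Fit each model on the training half and score it on the test half.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}")
    print(results[name]["confusion_matrix"])
```

Keeping every model on the same split makes the accuracies and confusion matrices directly comparable.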

Results

Heatmap: correlation matrix of the features, computed separately for each group
0 (Nondemented) correlation matrix: 0.02, Age & EDUC
0.5 (Converted) correlation matrix: 0.00, MMSE & EDUC
1 (Demented) correlation matrix: 0.03, Age & CDR and SES & MMSE
Overall, Age, EDUC, and MMSE each appear twice among these highlighted pairs.
I therefore take age, years of education, and mental state (MMSE) to be the main factors associated with Alzheimer's disease in this dataset.
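
A per-group correlation heatmap of this kind might be produced as follows. This is a sketch under assumptions: the data is randomly generated (so the correlations it prints are meaningless), the column names follow the feature list above, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random placeholder data; the real values come from alzheimer.csv.
rng = np.random.default_rng(0)
n = 90
df = pd.DataFrame({
    "Group": np.repeat(["Nondemented", "Converted", "Demented"], n // 3),
    "Age": rng.integers(60, 96, size=n),
    "EDUC": rng.integers(6, 24, size=n),
    "SES": rng.integers(1, 6, size=n),
    "CDR": rng.choice([0.0, 0.5, 1.0], size=n),
    "MMSE": rng.integers(15, 31, size=n),
})

features = ["Age", "EDUC", "SES", "CDR", "MMSE"]

# One Pearson correlation matrix per group.
corr_by_group = {
    group: part[features].corr() for group, part in df.groupby("Group")
}

# Draw the matrices side by side on a shared color scale.
fig, axes = plt.subplots(1, len(corr_by_group), figsize=(15, 4))
for ax, (group, corr) in zip(axes, corr_by_group.items()):
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title(group)
fig.tight_layout()
fig.savefig("correlation_heatmaps.png")  # hypothetical output file name
```

Fixing vmin and vmax to -1 and 1 keeps the colors comparable across the three panels.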
Error rate (Random Forest): the best combination is n = 10 and d = 5, with an error rate of 6.78%.
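
A search over n and d could look like the sketch below. It assumes that n is the number of trees (n_estimators) and d is the maximum tree depth (max_depth), which the README does not state explicitly. The search ranges and the synthetic data are also assumptions, so the printed result will not reproduce the 6.78% figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the five features and the binary label (assumption).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Try every (n, d) pair and record the test error rate (1 - accuracy).
# The ranges 1..10 and 1..5 are hypothetical.
error_rates = {}
for n in range(1, 11):
    for d in range(1, 6):
        forest = RandomForestClassifier(
            n_estimators=n, max_depth=d, random_state=0
        )
        forest.fit(X_train, y_train)
        error_rates[(n, d)] = 1.0 - forest.score(X_test, y_test)

# Pick the pair with the lowest error rate.
best_n, best_d = min(error_rates, key=error_rates.get)
print(f"best n={best_n}, d={best_d}, "
      f"error rate={error_rates[(best_n, best_d)]:.2%}")
```

Choosing n and d on the same split that is used to report the error tends to make the reported error optimistic; a separate validation split or cross-validation would give a more conservative estimate.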

Conclusion

Random Forest is the most effective model for predicting Alzheimer's disease in this dataset. The best combination of hyperparameters was n = 10 and d = 5, yielding an error rate of 6.78%.

Heatmap analysis revealed correlations between features, which were analyzed separately for each label. Age, years of education, and mental state appeared repeatedly across the correlation matrices.

Early-stage detection is critical in Alzheimer's disease, and in this project I found that the converted stage correlates with mental state and years of education. I believe that prolonged exposure to intense educational environments may lead to higher levels of mental stress, potentially contributing to the disease. For the demented group, age, Clinical Dementia Rating, and socioeconomic status were also found to be significant factors. It is noteworthy that Alzheimer's patients are mostly found in older age groups. Moreover, the pairing of socioeconomic status and mental state is intriguing: it suggests that lower socioeconomic status may exacerbate mental stress, potentially contributing to dementia.

Pre-requisite libraries

  • pip install pandas
  • pip install scikit-learn
  • pip install matplotlib
  • pip install seaborn
  • pip install numpy

How to install the programs

Open a terminal or PowerShell.
Navigate to the location where the project is saved.
Run the pip install commands listed under Pre-requisite libraries.

Expected output

Lwon_final_project.py is the main file; running it presents the charts and graphs.
