**Demented vs. Nondemented Classification**
---



**Project Overview**

This project develops a Machine Learning model to classify patients as demented or non-demented. It is an independent project that uses data from the [MRI and Alzheimer's Kaggle Dataset](https://www.kaggle.com/datasets/jboysen/mri-and-alzheimers?select=oasis_longitudinal.csv).

  **Objectives**: The primary goal of this project is to understand each step of the machine learning development process and to create a structured guide for developing and deploying the demented vs. non-demented classification model. The key steps covered in this project are:
  - `Data Collection`: Gathering and organizing relevant data for model development and training
  - `Data Preprocessing`: Cleaning and preparing the data before the analysis stage.
  - `Exploratory Data Analysis (EDA)`: Analyzing the prepared data through visualization and identifying correlations or patterns among variables.
  - `Feature Engineering`: Feature selection to improve efficiency and model accuracy.
  - `Model Selection`: Testing and choosing the most suitable model for classification.
  - `Model Training`: Training the selected model to learn patterns.
  - `Model Evaluation`: Assessing model accuracy and effectiveness using several evaluation metrics.
  - `Model Deployment`: Deploying the model to make predictions in a production setting.

## **A. Importing Libraries**

The libraries used in this inference notebook are:

```python
# Import libraries
import pandas as pd
import pickle
```

Their roles are:

- **Pandas**: Data manipulation
- **Pickle**: Loading the trained machine learning model

## **B. Data Loading**

First, we will load the model using `pickle`:

```python
# Load the trained model (only unpickle files from a trusted source)
with open('dementia_model.pkl', 'rb') as file:
    model = pickle.load(file)
```
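The loading step assumes `dementia_model.pkl` was written earlier with `pickle.dump`. A minimal sketch of that save/load round trip, using a hypothetical stand-in class `ThresholdModel` (with an arbitrary MMSE cutoff chosen only for illustration, not a clinical threshold) instead of the real trained model, and an in-memory buffer instead of a file:

```python
import io
import pickle


class ThresholdModel:
    """Hypothetical stand-in for the trained model; not the project's real pipeline."""

    def __init__(self, mmse_cutoff):
        self.mmse_cutoff = mmse_cutoff

    def predict(self, mmse_scores):
        # 0 = Demented, 1 = Non-Demented (same encoding as the prediction loop)
        return [0 if score < self.mmse_cutoff else 1 for score in mmse_scores]


# Save the model, as the development notebook presumably did with pickle.dump
buffer = io.BytesIO()
pickle.dump(ThresholdModel(mmse_cutoff=26), buffer)

# Load it back, mirroring the pickle.load call above
buffer.seek(0)
loaded_model = pickle.load(buffer)
print(loaded_model.predict([22, 24, 27, 30, 30, 24]))
```

Unpickling only works if the classes the model is built from are importable at load time, so the inference environment needs the same libraries (ideally the same versions) used during training. Loading a pickle can execute arbitrary code, so never unpickle files from untrusted sources.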

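For testing, we convert the following new records into a `pandas` DataFrame. The `Group` column holds the ground-truth label; it is left out of the model input so it can be compared with the predictions afterwards: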

| Subject ID | Visit | MR Delay | M/F | Hand | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV  | ASF   | Group    |
|------------|-------|----------|-----|------|-----|------|-----|------|-----|------|-------|-------|----------|
| OAS2_0002  | 3     | 1895     | M   | R    | 80  | 12   |     | 22   | 0.5 | 1698 | 0.701 | 1.034 | Demented |
| OAS2_0009  | 2     | 576      | M   | R    | 69  | 12   | 2   | 24   | 0.5 | 1480 | 0.791 | 1.186 | Demented |
| OAS2_0017  | 5     | 2400     | M   | R    | 86  | 12   | 3   | 27   | 0   | 1813 | 0.761 | 0.968 | Nondemented |
| OAS2_0045  | 1     | 0        | F   | R    | 75  | 18   | 1   | 30   | 0   | 1317 | 0.737 | 1.332 | Nondemented |
| OAS2_0056  | 2     | 622      | F   | R    | 73  | 14   | 2   | 30   | 0   | 1456 | 0.739 | 1.205 | Nondemented |
| OAS2_0054  | 2     | 846      | F   | R    | 87  | 18   | 1   | 24   | 0.5 | 1275 | 0.683 | 1.376 | Demented |



```python
# Build the new test records (the Group label is deliberately left out)
new_data_test = {
    'Subject ID': ['OAS2_0002', 'OAS2_0009', 'OAS2_0017', 'OAS2_0045', 'OAS2_0056', 'OAS2_0054'],
    'Visit': [3, 2, 5, 1, 2, 2],
    'MR Delay': [1895, 576, 2400, 0, 622, 846],
    'M/F': ['M', 'M', 'M', 'F', 'F', 'F'],
    'Hand': ['R', 'R', 'R', 'R', 'R', 'R'],
    'Age': [80, 69, 86, 75, 73, 87],
    'EDUC': [12, 12, 12, 18, 14, 18],
    'SES': [None, 2, 3, 1, 2, 1],   # SES is missing for OAS2_0002
    'MMSE': [22, 24, 27, 30, 30, 24],
    'CDR': [0.5, 0.5, 0, 0, 0, 0.5],
    'eTIV': [1698, 1480, 1813, 1317, 1456, 1275],
    'nWBV': [0.701, 0.791, 0.761, 0.737, 0.739, 0.683],
    'ASF': [1.034, 1.186, 0.968, 1.332, 1.205, 1.376]
}

test_df = pd.DataFrame(new_data_test)
test_df
```

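|   | Subject ID | Visit | MR Delay | M/F | Hand | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV  | ASF   |
|---|------------|-------|----------|-----|------|-----|------|-----|------|-----|------|-------|-------|
| 0 | OAS2_0002  | 3     | 1895     | M   | R    | 80  | 12   |     | 22   | 0.5 | 1698 | 0.701 | 1.034 |
| 1 | OAS2_0009  | 2     | 576      | M   | R    | 69  | 12   | 2.0 | 24   | 0.5 | 1480 | 0.791 | 1.186 |
| 2 | OAS2_0017  | 5     | 2400     | M   | R    | 86  | 12   | 3.0 | 27   | 0.0 | 1813 | 0.761 | 0.968 |
| 3 | OAS2_0045  | 1     | 0        | F   | R    | 75  | 18   | 1.0 | 30   | 0.0 | 1317 | 0.737 | 1.332 |
| 4 | OAS2_0056  | 2     | 622      | F   | R    | 73  | 14   | 2.0 | 30   | 0.0 | 1456 | 0.739 | 1.205 |
| 5 | OAS2_0054  | 2     | 846      | F   | R    | 87  | 18   | 1.0 | 24   | 0.5 | 1275 | 0.683 | 1.376 |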
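The `SES` value for OAS2_0002 is missing, which is why pandas stores the whole `SES` column as floats (`2.0`, `3.0`, ...). A small sketch, using two rows copied from the table above, of how the label column can be kept apart from the features and how missing values can be checked before prediction:

```python
import pandas as pd

# Two rows copied from the test table, including the Group label
records = pd.DataFrame({
    'Subject ID': ['OAS2_0002', 'OAS2_0009'],
    'SES': [None, 2],
    'MMSE': [22, 24],
    'Group': ['Demented', 'Demented'],
})

# Keep the label out of the model input, as test_df does
features = records.drop(columns='Group')
labels = records['Group']

# None in a numeric column becomes NaN, so SES is stored as float64
print(features['SES'].dtype)
print(features.isna().sum().to_dict())
```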


## **C. Model Prediction**

With the model loaded, we can proceed directly to prediction. New data would normally have to be preprocessed first, but during model development the preprocessing steps were bundled with the classifier into a single pipeline. The saved model therefore preprocesses its input itself, so the raw DataFrame can be passed to it as-is.
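The idea behind the pipeline can be illustrated without any machine learning library. The sketch below is a hypothetical toy, not the project's actual pipeline (whose steps are not shown in this notebook): the imputation, encoding, and classification rules are invented for illustration. Each preprocessing function runs inside `predict`, so raw rows can be passed in directly, just as `test_df` is passed to `model.predict`.

```python
import math


class ToyPipeline:
    """Hypothetical minimal pipeline: transforms run in order, then the final model predicts."""

    def __init__(self, steps, final_model):
        self.steps = steps
        self.final_model = final_model

    def predict(self, rows):
        # Apply every preprocessing step to every row before classifying
        for transform in self.steps:
            rows = [transform(row) for row in rows]
        return [self.final_model(row) for row in rows]


def impute_ses(row, fill_value=2.0):
    # Hypothetical preprocessing step: fill a missing SES with a fixed value
    ses = row['SES']
    if ses is None or (isinstance(ses, float) and math.isnan(ses)):
        return {**row, 'SES': fill_value}
    return row


def encode_sex(row):
    # Hypothetical preprocessing step: encode M/F as 1/0
    return {**row, 'M/F': 1 if row['M/F'] == 'M' else 0}


def toy_classifier(row):
    # Hypothetical rule standing in for the trained estimator (0 = Demented, 1 = Non-Demented)
    return 0 if row['MMSE'] < 26 else 1


pipeline = ToyPipeline([impute_ses, encode_sex], toy_classifier)

# Raw rows go in unchanged; preprocessing happens inside predict()
raw_rows = [
    {'M/F': 'M', 'SES': None, 'MMSE': 22},
    {'M/F': 'F', 'SES': 1, 'MMSE': 30},
]
print(pipeline.predict(raw_rows))
```

Bundling preprocessing this way means the exact transformations fitted during training are reused at inference time, instead of being re-implemented by hand in every notebook that uses the model.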

```python
# Predict the class of each record in the new test set
y_new_test = model.predict(test_df)

# Obtain the class probabilities and convert them to percentages
y_proba = model.predict_proba(test_df)
probability_percentage = y_proba * 100
```

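The predicted classes and the model's estimated probability for each class have now been obtained. These probabilities express the model's confidence in each class; they are not the probability that a prediction is actually correct. Next, we display the prediction results.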
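Each row of `predict_proba` holds one probability per class, with columns in the order of `model.classes_`; for most classifiers the predicted class is the column with the highest probability. A small sketch with made-up probability rows (not the model's actual output), assuming the same encoding as this notebook (column 0 = Demented, column 1 = Non-Demented), showing how the reported percentage is read from a row:

```python
def describe_predictions(proba_rows, label_names):
    """Return (label, percentage) for the most probable class in each row."""
    results = []
    for row in proba_rows:
        # Predicted class = index of the largest probability in the row
        predicted = max(range(len(row)), key=lambda k: row[k])
        results.append((label_names[predicted], round(row[predicted] * 100, 2)))
    return results


# Hypothetical probability rows: [P(Demented), P(Non-Demented)]
example_proba = [
    [0.80, 0.20],
    [0.35, 0.65],
]
label_names = {0: 'Demented', 1: 'Non-Demented'}

for label, percentage in describe_predictions(example_proba, label_names):
    print(f"{label}: {percentage:.2f}%")
```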

```python
# Map the encoded classes to readable names (0 = Demented, 1 = Non-Demented)
label_names = {0: 'Demented', 1: 'Non-Demented'}

# Iterate through each data point
for i, score in enumerate(y_new_test):
    prediction = label_names[score]
    # Column `score` of predict_proba holds the predicted class's probability (assumes classes_ is [0, 1])
    print(f"Prediction for subject {test_df.loc[i, 'Subject ID']}: {prediction} "
          f"with a probability of {probability_percentage[i, score]:.2f}%")
```

```
Prediction for subject OAS2_0002: Demented with a probability of 91.96%
Prediction for subject OAS2_0009: Demented with a probability of 91.94%
Prediction for subject OAS2_0017: Non-Demented with a probability of 97.37%
Prediction for subject OAS2_0045: Non-Demented with a probability of 97.37%
Prediction for subject OAS2_0056: Non-Demented with a probability of 97.37%
Prediction for subject OAS2_0054: Demented with a probability of 91.96%
```


## **D. Conclusion**

To conclude, the model correctly classified all six test records, three in the Demented class and three in the Non-Demented class, matching their `Group` labels. Six records make a useful sanity check, but they are too few to estimate the model's overall performance.
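The check behind this conclusion can be reproduced by comparing the `Group` column of the test table with the printed predictions. A minimal sketch, with both label lists copied from this notebook; the only extra step is normalizing the spelling, since the table writes `Nondemented` while the printout writes `Non-Demented`:

```python
# Ground-truth Group labels from the test table, in row order
actual = ['Demented', 'Demented', 'Nondemented', 'Nondemented', 'Nondemented', 'Demented']
# Labels from the printed predictions, in the same order
predicted = ['Demented', 'Demented', 'Non-Demented', 'Non-Demented', 'Non-Demented', 'Demented']


def normalize(label):
    # Remove hyphens and case differences so both spellings compare equal
    return label.replace('-', '').lower()


matches = [normalize(a) == normalize(p) for a, p in zip(actual, predicted)]
accuracy = sum(matches) / len(matches)
print(f"{sum(matches)}/{len(matches)} correct, accuracy = {accuracy:.2f}")
```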