**Introduction**

Alzheimer's disease is a neurodegenerative disease that progressively impairs a person's cognitive abilities. Early diagnosis can help improve patients' quality of life. In this project, we explore the dataset and train several machine learning models to predict the diagnosis.

# 1. Importing the necessary libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

from sklearn.metrics import ConfusionMatrixDisplay, classification_report, accuracy_score

# 2. Loading and analyzing data

In [None]:
df = pd.read_csv('/Users/riteshkumar/Downloads/ML projects/Predicting Alzheimer Disease /alzheimers_prediction_dataset.csv')

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.isna().sum()

In [None]:
df.head()

In [None]:
df['Alzheimer’s Diagnosis'].value_counts()

# 3. Data visualization

In [None]:
sns.countplot(data=df, x='Alzheimer’s Diagnosis');

In [None]:
sns.scatterplot(data=df, x='Age', y='BMI', hue='Alzheimer’s Diagnosis'); 

The scatter plot suggests that the diagnosis is not strongly related to BMI, but that it is strongly associated with age.
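
A scatter plot only gives a visual impression, so the relationship can also be checked numerically. The following is a minimal sketch, assuming the diagnosis column contains the labels `'Yes'` and `'No'` (an assumption; the actual labels are shown by `value_counts()` above). The helper `diagnosis_correlations` and the small `demo` frame are hypothetical and only stand in for `df`.

```python
import pandas as pd

TARGET = 'Alzheimer’s Diagnosis'


def diagnosis_correlations(frame, features, target=TARGET, positive='Yes'):
    """Correlate each numeric feature with a 0/1 encoding of the diagnosis.

    Pearson correlation against a binary variable is the point-biserial
    correlation, which is a simple measure of linear association.
    """
    encoded = (frame[target] == positive).astype(int)
    return frame[features].corrwith(encoded)


# Made-up rows for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({
    'Age': [55, 60, 65, 70, 75, 80, 85, 90],
    'BMI': [24.0, 31.0, 22.5, 28.0, 24.0, 31.0, 22.5, 28.0],
    TARGET: ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes'],
})

corr = diagnosis_correlations(demo, ['Age', 'BMI'])
print(corr)
```

On the real data, `diagnosis_correlations(df, ['Age', 'BMI'])` would put a number on the pattern seen in the plot. A value near zero for BMI and a clearly larger value for age would support the reading above.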

# 4. Data preparation

In [None]:
X = df.drop('Alzheimer’s Diagnosis', axis=1)

In [None]:
X = pd.get_dummies(X)

In [None]:
y = df['Alzheimer’s Diagnosis']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)

In [None]:
scaler = StandardScaler()

In [None]:
scaled_X_train = scaler.fit_transform(X_train)

In [None]:
scaled_X_test = scaler.transform(X_test)

# 5. Model training

All hyperparameters for the models below were selected using GridSearchCV.
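
The search itself is not included in the notebook. The following is a minimal sketch of how such a search could look, not the grid that was actually used: the parameter grid, the synthetic data from `make_classification`, and the names `X_demo`, `y_demo`, `pipe`, and `search` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features X and target y.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=101)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=101
)

# Scaling lives inside the pipeline so that each cross-validation fold
# fits the scaler on its own training part only.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Hypothetical grid; the values tried in the original search are not known.
param_grid = {
    'clf__C': [0.01, 0.1, 1, 10],
    'clf__solver': ['lbfgs', 'newton-cg'],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_tr, y_tr)

print('Best parameters:', search.best_params_)
print(f'Best CV accuracy: {search.best_score_ * 100:.2f}%')
print(f'Test accuracy: {search.score(X_te, y_te) * 100:.2f}%')
```

After fitting, `search.best_params_` holds the selected combination, which can then be passed to the model constructor as in the cells below.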

## 5.1. LogisticRegression

In [None]:
model = LogisticRegression(C=0.1, max_iter=100, solver='newton-cg')
model.fit(scaled_X_train, y_train)
y_pred = model.predict(scaled_X_test)
print(classification_report(y_test, y_pred))
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%')
ConfusionMatrixDisplay.from_estimator(model, scaled_X_test, y_test);

## 5.2. RandomForestClassifier

In [None]:
model = RandomForestClassifier(max_depth=20, min_samples_leaf=1, min_samples_split=5, n_estimators=200, random_state=101)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%')
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test);

## 5.3. GradientBoostingClassifier

In [None]:
model = GradientBoostingClassifier(learning_rate=0.1, max_depth=4, min_samples_leaf=3, min_samples_split=5, n_estimators=100, subsample=0.8, random_state=101)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%')
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test);

**Conclusion**

Based on test-set accuracy, GradientBoostingClassifier (72.63%) is the best of the three models. Better hyperparameters may exist, but searching a larger grid would significantly increase the search time.
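
One common way to limit the search time is to replace the exhaustive grid with a randomized search, which samples a fixed number of combinations. This technique was not used in the notebook. The sketch below uses scikit-learn's `RandomizedSearchCV` on synthetic data; the parameter ranges, `n_iter`, and the names `X_demo`, `y_demo`, and `random_search` are assumptions for illustration.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the prepared features X and target y.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=101)

# Hypothetical search space; distributions are sampled rather than enumerated.
param_distributions = {
    'n_estimators': randint(20, 60),        # integers in [20, 60)
    'max_depth': randint(2, 5),             # integers in [2, 5)
    'learning_rate': uniform(0.01, 0.19),   # floats in [0.01, 0.20]
    'subsample': uniform(0.6, 0.4),         # floats in [0.6, 1.0]
}

random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=101),
    param_distributions,
    n_iter=5,            # number of sampled combinations: the time budget
    cv=3,
    scoring='accuracy',
    random_state=101,
)
random_search.fit(X_demo, y_demo)

print('Best parameters:', random_search.best_params_)
print(f'Best CV accuracy: {random_search.best_score_ * 100:.2f}%')
```

The cost is set directly by `n_iter` times the number of folds, so the search can cover wide ranges without the run time growing with the size of the grid.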