## 🫀 Project Overview
This notebook compares how well three supervised machine learning models predict heart disease: **Logistic Regression**, **Support Vector Machine (SVM)**, and **Decision Tree Classifier**.

The process includes:
- Uploading and preprocessing the dataset
- Training each model on the data
- Evaluating performance using metrics like **accuracy**, **precision**, **recall**, **F1 score**, and **ROC AUC score**
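
The four threshold-based metrics in the list above all derive from confusion-matrix counts. The sketch below shows the arithmetic. The helper name `classification_metrics` is hypothetical, and the counts are illustrative assumptions: the notebook never prints a confusion matrix, so these were chosen only to be consistent with the Logistic Regression figures reported further down, assuming a 60-row test set.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # share of all predictions that are correct
    precision = tp / (tp + fp)        # of predicted positives, how many are real
    recall = tp / (tp + fn)           # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts (assumed, not printed anywhere in the notebook)
metrics = classification_metrics(tp=21, fp=9, fn=7, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ROC AUC is not included here because it depends on the ranking of predicted probabilities rather than on a single set of thresholded counts.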

This comparison helps identify which of the three models is best suited to heart disease risk prediction on this dataset.

# Heart Disease Risk Prediction: Logistic Regression, SVM, Decision Tree

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report, confusion_matrix

In [None]:
# Upload the dataset
from google.colab import files
uploaded = files.upload()

Saving archive (3).zip to archive (3).zip


In [None]:
import zipfile

with zipfile.ZipFile("archive (3).zip", 'r') as zip_ref:
    zip_ref.extractall()

In [None]:
import os
os.listdir()

['.config', 'archive (3).zip', 'heart_cleveland_upload.csv', 'sample_data']

In [None]:
df = pd.read_csv('heart_cleveland_upload.csv')
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,condition
0,69,1,0,160,234,1,2,131,0,0.1,1,1,0,0
1,69,0,0,140,239,0,0,151,0,1.8,0,2,0,0
2,66,0,0,150,226,0,0,114,0,2.6,2,0,0,0
3,65,1,0,138,282,1,2,174,0,1.4,1,1,0,1
4,64,1,0,110,211,0,2,144,1,1.8,1,0,0,0


In [None]:
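# Preprocess the data
# Separate the features from the target column
X = df.drop('condition', axis=1)
y = df['condition']
# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the scaler on the training split only, then reuse its statistics on the
# test split so no test-set information leaks into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)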
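
The scaling step above can be illustrated in plain Python. This is a rough sketch of the idea behind `StandardScaler`, not its implementation: the helper names `fit_standardizer` and `standardize` are hypothetical, and treating the first four `trestbps` values from `df.head()` as "train" and the fifth as "test" is an assumption made purely for illustration (the notebook's real split is random).

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Learn the mean and population standard deviation (ddof=0)."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Apply previously learned statistics to any list of values."""
    return [(v - mean) / std for v in values]


# Assumed toy split of the trestbps column shown by df.head()
train_trestbps = [160, 140, 150, 138]
test_trestbps = [110]

# Statistics come from the training values only...
mean, std = fit_standardizer(train_trestbps)
train_scaled = standardize(train_trestbps, mean, std)
# ...and are reused unchanged for the held-out value
test_scaled = standardize(test_trestbps, mean, std)

print("train mean/std:", mean, round(std, 3))
print("scaled train:", [round(v, 3) for v in train_scaled])
print("scaled test:", [round(v, 3) for v in test_scaled])
```

The scaled training values have mean 0 and unit variance by construction. The test value does not have to, which is expected: it is measured against the training distribution rather than its own.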

In [None]:
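# Train the models
# Logistic Regression and SVM are scale-sensitive, so they use the scaled features
lr = LogisticRegression().fit(X_train_scaled, y_train)
# probability=True enables predict_proba, which the evaluation below uses for ROC AUC
svm = SVC(probability=True).fit(X_train_scaled, y_train)
# Tree splits do not depend on feature scale, so the raw features are used
dt = DecisionTreeClassifier().fit(X_train, y_train)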

In [None]:
# Evaluate models (each model is scored on the same feature representation it was trained on)
models = {'Logistic Regression': lr, 'SVM': svm, 'Decision Tree': dt}
for name, model in models.items():
    if name == 'Decision Tree':
        preds = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]
    else:
        preds = model.predict(X_test_scaled)
        proba = model.predict_proba(X_test_scaled)[:, 1]
    print(f"\n{name} Metrics:")
    print("Accuracy:", accuracy_score(y_test, preds))
    print("Precision:", precision_score(y_test, preds))
    print("Recall:", recall_score(y_test, preds))
    print("F1 Score:", f1_score(y_test, preds))
    print("ROC AUC Score:", roc_auc_score(y_test, proba))


Logistic Regression Metrics:
Accuracy: 0.7333333333333333
Precision: 0.7
Recall: 0.75
F1 Score: 0.7241379310344828
ROC AUC Score: 0.8415178571428571

SVM Metrics:
Accuracy: 0.7333333333333333
Precision: 0.6875
Recall: 0.7857142857142857
F1 Score: 0.7333333333333333
ROC AUC Score: 0.8337053571428572

Decision Tree Metrics:
Accuracy: 0.7666666666666667
Precision: 0.7333333333333333
Recall: 0.7857142857142857
F1 Score: 0.7586206896551724
ROC AUC Score: 0.7678571428571428
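
In the results above, the Decision Tree has the highest accuracy but the lowest ROC AUC. One plausible explanation (an assumption, not something the notebook checks) is that a fully grown tree outputs probabilities that are mostly 0 or 1, which produces many ties when ranking patients. The sketch below computes ROC AUC as the share of positive/negative pairs ranked correctly, counting ties as half. The function name `roc_auc_pairwise` and the toy labels and scores are hypothetical.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only)
y_true = [0, 0, 0, 1, 1, 1]
graded_scores = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]

# The same scores collapsed to hard 0/1 outputs at a 0.5 threshold
hard_scores = [1.0 if s >= 0.5 else 0.0 for s in graded_scores]

auc_graded = roc_auc_pairwise(y_true, graded_scores)
auc_hard = roc_auc_pairwise(y_true, hard_scores)
print(f"graded scores AUC: {auc_graded:.3f}")
print(f"hard 0/1 scores AUC: {auc_hard:.3f}")
```

Both score lists give identical predictions at a 0.5 threshold, so accuracy is the same, yet the hard scores lose ranking information and score a lower AUC. Limiting tree depth or leaf size is one common way to obtain more graded probabilities from a tree, though whether that helps here would need to be tested.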
