# 2.6 Hyperparameter Tuning – Supervised Models (Heart Disease, Selected Features)

We tune **Random Forest** with `GridSearchCV` and **SVM** (`SVC`) with `RandomizedSearchCV`,
then compare the tuned models against their untuned baselines and on a held-out test split.


In [1]:
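# --- 1. Setup ---
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score
from google.colab import files  # Colab-only helper for uploading local files

# Upload the selected-features dataset (works in Google Colab only).
# Outside Colab, read the file directly instead, e.g.:
#   df = pd.read_csv("03_selected_features_dataset.csv")
uploaded = files.upload()
df = pd.read_csv(next(iter(uploaded)))  # read the first (and only) uploaded file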


Saving 03_selected_features_dataset.csv to 03_selected_features_dataset.csv


In [2]:
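# --- 2. Train/Test Split ---
X = df.drop(columns=["target"])  # selected feature columns
y = df["target"]                 # binary class label

# Hold out 20% for final testing; stratify=y keeps the class ratio equal in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

# Fit the scaler on the training split only, then apply the same transform to the test split.
# Note: the cross-validation below reuses this already-scaled X_train, so every validation
# fold has contributed to the scaling statistics (a mild leak); a Pipeline avoids this.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)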


In [3]:
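# --- 3. Baseline Models (default hyperparameters) ---
rf = RandomForestClassifier(random_state=0)
svc = SVC(probability=True, random_state=0)  # probability=True enables predict_proba for the AUC in step 6

# Mean 3-fold cross-validated accuracy on the training split
print("Baseline RF CV Accuracy:", cross_val_score(rf, X_train, y_train, cv=3).mean())
print("Baseline SVC CV Accuracy:", cross_val_score(svc, X_train, y_train, cv=3).mean())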


Baseline RF CV Accuracy: 0.7974279835390946
Baseline SVC CV Accuracy: 0.8221707818930041
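Because `StandardScaler` is fit on all of `X_train` before `cross_val_score` and the searches split it into folds, each validation fold has already influenced the scaling. A minimal sketch of the leak-free alternative wraps the scaler and the model in a scikit-learn `Pipeline`, so scaling is refit inside every fold. It assumes synthetic stand-in data from `make_classification` (the notebook's CSV is not available here); the sample and feature counts are arbitrary.

```python
# Sketch: keep StandardScaler inside each CV fold by wrapping it with SVC in a Pipeline.
# Assumption: make_classification produces synthetic stand-in data; replace X, y with
# the heart-disease features and target to apply this to the notebook's dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Unscaled X_train goes in: for each split the pipeline fits the scaler on the
# training folds only and merely transforms the validation fold.
svc_pipe = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
baseline_scores = cross_val_score(svc_pipe, X_train, y_train, cv=3)
print("Pipeline SVC CV accuracy:", baseline_scores.mean())

# Searches over a pipeline address step parameters as "<step>__<param>";
# make_pipeline names the SVC step "svc".
pipe_search = GridSearchCV(
    svc_pipe,
    {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    cv=3,
    scoring="accuracy",
)
pipe_search.fit(X_train, y_train)
print("Best pipeline params:", pipe_search.best_params_)
```

The test split would then be passed to `pipe_search.best_estimator_` unscaled, since the fitted pipeline applies the training-split scaler itself.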


In [4]:
# --- 4. GridSearchCV for RF ---
# Exhaustive search: 2 x 2 x 2 = 8 combinations, each scored with 3-fold CV
rf_params = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}
gs_rf = GridSearchCV(rf, rf_params, cv=3, scoring="accuracy", n_jobs=-1)
gs_rf.fit(X_train, y_train)  # also refits the best combination on all of X_train
print("Best RF Params:", gs_rf.best_params_)
print("Best RF CV Score:", gs_rf.best_score_)


Best RF Params: {'max_depth': None, 'min_samples_split': 5, 'n_estimators': 100}
Best RF CV Score: 0.8097736625514403
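Grid search cost grows multiplicatively with each listed value. A small sketch using scikit-learn's `ParameterGrid` counts the candidates in the grid from step 4 and the resulting number of model fits; the extra refit assumes `refit=True`, which is the `GridSearchCV` default and is not overridden above.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in step 4
rf_params = {"n_estimators": [100, 200], "max_depth": [None, 10], "min_samples_split": [2, 5]}

n_candidates = len(ParameterGrid(rf_params))  # 2 * 2 * 2 combinations
n_folds = 3
n_cv_fits = n_candidates * n_folds
# With refit=True (the default), one more fit on the full training split follows.
n_total_fits = n_cv_fits + 1
print(n_candidates, n_cv_fits, n_total_fits)  # 8 24 25
```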


In [5]:
# --- 5. RandomizedSearchCV for SVC ---
# The space has 3 x 2 = 6 combinations; n_iter=4 evaluates only 4 of them.
# gamma is not searched, so it stays at the SVC default ("scale").
svc_params = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
rs_svc = RandomizedSearchCV(svc, svc_params, n_iter=4, cv=3, scoring="accuracy", random_state=0, n_jobs=-1)
rs_svc.fit(X_train, y_train)
print("Best SVC Params:", rs_svc.best_params_)
print("Best SVC CV Score:", rs_svc.best_score_)


Best SVC Params: {'kernel': 'rbf', 'C': 0.1}
Best SVC CV Score: 0.8264403292181068
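When every entry in the search space is a list, `RandomizedSearchCV` draws candidates without replacement from the full grid via `ParameterSampler`, so step 5 tried 4 of the 6 combinations and never varied `gamma`. The sketch below lists the sampled candidates (with the same `random_state=0` they should match the ones the search evaluated, though this is worth confirming against `rs_svc.cv_results_["params"]`). Its second half shows a hypothetical extended space that draws `C` and `gamma` from `scipy.stats.loguniform`; those ranges are illustrative assumptions, not values used in this notebook.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same space as in step 5
svc_params = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# All-list spaces are sampled without replacement from the full grid.
grid_size = len(ParameterGrid(svc_params))
sampled = list(ParameterSampler(svc_params, n_iter=4, random_state=0))
print(f"{len(sampled)} of {grid_size} combinations evaluated:")
for params in sampled:
    print("  ", params)

# Hypothetical extended space: also search gamma, drawing C and gamma on a log scale.
# The ranges are assumptions for illustration only.
extended_params = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
    "kernel": ["rbf"],
}
extended = list(ParameterSampler(extended_params, n_iter=5, random_state=0))
print("Extended-space candidates:")
for params in extended:
    print("  ", params)
```

Passing `extended_params` as the second argument of `RandomizedSearchCV` would search `gamma` as well; with continuous distributions, `n_iter` is no longer capped by a grid size.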


In [6]:
# --- 6. Test Performance ---
# Evaluate each refit best estimator once on the held-out test split
best_rf = gs_rf.best_estimator_
best_svc = rs_svc.best_estimator_

# AUC is computed from the predicted probability of the positive class (column 1)
print("RF Test Accuracy:", accuracy_score(y_test, best_rf.predict(X_test)))
print("RF Test AUC:", roc_auc_score(y_test, best_rf.predict_proba(X_test)[:,1]))
print("SVC Test Accuracy:", accuracy_score(y_test, best_svc.predict(X_test)))
print("SVC Test AUC:", roc_auc_score(y_test, best_svc.predict_proba(X_test)[:,1]))


RF Test Accuracy: 0.819672131147541
RF Test AUC: 0.9404761904761905
SVC Test Accuracy: 0.8688524590163934
SVC Test AUC: 0.941017316017316
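Test accuracy is a ratio of correct predictions to test samples, so the printed values can be turned back into counts to judge how large the gap really is. This sketch uses Python's standard `fractions` module; `limit_denominator(1000)` is an arbitrary bound chosen for illustration. Because the fraction comes back in lowest terms, it gives the smallest test-set size consistent with the printed accuracy, not a confirmed one.

```python
from fractions import Fraction

# Test accuracies printed in step 6
reported_accuracy = {"RF": 0.819672131147541, "SVC": 0.8688524590163934}

correct_counts = {}
for model, acc in reported_accuracy.items():
    # Closest fraction with denominator <= 1000 recovers correct/total in lowest terms
    frac = Fraction(acc).limit_denominator(1000)
    correct_counts[model] = (frac.numerator, frac.denominator)
    print(f"{model}: {frac.numerator}/{frac.denominator}")
```

Both values reduce to fractions over 61, so the accuracy gap is consistent with 3 more correct predictions out of 61 test samples.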


## Best Model Selected for Deployment

After tuning Random Forest and Support Vector Classifier:

- RF Test Accuracy: **0.82**, AUC: **0.94**  
- SVC Test Accuracy: **0.87**, AUC: **0.94**

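Because the tuned **SVC** reached higher test accuracy (0.87 vs 0.82) with essentially the same AUC, and also had the higher cross-validation score (0.826 vs 0.810),
we selected **SVC with an RBF kernel and C=0.1** as the final model for deployment. `gamma` was not part of the search, so it keeps scikit-learn's default (`"scale"`).
Both comparisons rest on a single 20% hold-out split, so the accuracy gap should be read with some caution.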
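In this notebook the `StandardScaler` is fit separately from the model, so a deployed SVC also needs that exact fitted scaler at prediction time. A minimal sketch, assuming `joblib` for persistence (it ships as a scikit-learn dependency) and synthetic stand-in data from `make_classification`, bundles both steps with the selected hyperparameters into one `Pipeline`, saves it, reloads it, and checks that predictions are unchanged. The file name `heart_svc.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the unscaled heart-disease training data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Selected configuration: RBF kernel, C=0.1, gamma left at its default ("scale")
final_model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=0.1, gamma="scale", probability=True, random_state=0),
)
final_model.fit(X, y)

# Persist scaler and model together as one artifact (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "heart_svc.joblib")
joblib.dump(final_model, path)
loaded = joblib.load(path)

same = np.array_equal(final_model.predict(X), loaded.predict(X))
print("Predictions identical after reload:", same)
```

Loading the file elsewhere requires a compatible scikit-learn version, since joblib artifacts are pickles.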
