# Heart Disease ML Pipeline (UCI) — Notebooks

These notebooks implement a full pipeline on the **UCI Heart Disease** dataset using the `ucimlrepo` loader:

```python
from ucimlrepo import fetch_ucirepo
heart_disease = fetch_ucirepo(id=45)
X = heart_disease.data.features
y = heart_disease.data.targets
```
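In the UCI data, the target is a severity code from 0 (no disease) to 4. The clustering notebook below compares k=2 clusters against presence or absence of disease. If the processed label arrays were not already binarized, the 0 to 4 codes could be collapsed as in the minimal sketch below. The helper name `to_binary_target` is hypothetical. The column name `"num"` is an assumption, so check `heart_disease.data.targets.columns` on your copy. The toy frame uses made-up values in place of the real targets.

```python
import numpy as np
import pandas as pd


def to_binary_target(targets: pd.DataFrame, column: str = "num") -> np.ndarray:
    """Collapse the 0-4 severity code into 0 = no disease, 1 = disease present.

    `column="num"` is an assumed column name; verify it against targets.columns.
    """
    return (targets[column].to_numpy() > 0).astype(int)


# Toy frame standing in for heart_disease.data.targets (made-up values)
toy_targets = pd.DataFrame({"num": [0, 1, 2, 0, 4]})
y_binary = to_binary_target(toy_targets)
print(y_binary)  # [0 1 1 0 1]
```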

> Bonus items (Streamlit/Ngrok) are intentionally **omitted**, as requested.

## 05 — Unsupervised Learning (Clustering)

- K-Means (elbow method)
- Agglomerative (hierarchical) clustering + dendrogram
- Optional: compare cluster labels with the true labels (adjusted Rand index, ARI; normalized mutual information, NMI)
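Reading the elbow by eye is subjective. One simple heuristic rescales both axes to [0, 1]. It then picks the k whose point lies farthest below the straight line joining the first and last points of the inertia curve. The sketch below applies it to synthetic blobs. The helper `elbow_k` is hypothetical, and this is not what the notebook does: the notebook fixes k=2 for interpretability.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_k(ks, inertias):
    """Hypothetical helper: return the k farthest below the first-to-last chord."""
    ks = np.asarray(ks, dtype=float)
    inertias = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so the units of k and inertia don't matter
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (inertias - inertias[-1]) / (inertias[0] - inertias[-1])
    # The rescaled chord runs from (0, 1) to (1, 0), i.e. x + y = 1;
    # 1 - x - y is proportional to each point's distance below that line
    gap = 1.0 - x - y
    return int(ks[np.argmax(gap)])


# Three well-separated synthetic blobs (stand-in data, not the heart-disease arrays)
X_demo, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                       cluster_std=0.5, random_state=0)
K = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_demo).inertia_
            for k in K]
best_k = elbow_k(list(K), inertias)
print(best_k)
```

On real data the curve is often smoother, so treat the result as a starting point alongside domain reasoning, not a verdict.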

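```python
import os

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from scipy.cluster.hierarchy import dendrogram, linkage

# Make sure the output folder exists before saving figures and the report
os.makedirs("../results", exist_ok=True)

# Load processed arrays
X_train = np.load("../data/X_train.npy")
X_test  = np.load("../data/X_test.npy")
y_train = np.load("../data/y_train.npy")
y_test  = np.load("../data/y_test.npy")

# Clustering is unsupervised, so use all rows; ravel() keeps y 1-D,
# which the ARI/NMI metrics require
X = np.concatenate([X_train, X_test], axis=0)
y = np.concatenate([y_train, y_test], axis=0).ravel()

# K-Means elbow: inertia (within-cluster sum of squared distances) for k = 1..10
inertias = []
K = range(1, 11)
for k in K:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

plt.figure()
plt.plot(list(K), inertias, marker='o')
plt.title("K-Means Elbow Method")
plt.xlabel("k")
plt.ylabel("Inertia")
plt.savefig("../results/kmeans_elbow.png", dpi=150)
plt.close()

# Choose k=2 (presence/absence of disease); exploratory, not read off the elbow
k = 2
km2 = KMeans(n_clusters=k, n_init=10, random_state=42)
clusters_km = km2.fit_predict(X)

# Agglomerative clustering with Ward linkage and the same k, for comparison
agg = AgglomerativeClustering(n_clusters=k, linkage="ward")
clusters_agg = agg.fit_predict(X)

# Dendrogram from SciPy's Ward linkage on the first 500 rows
# (all rows when the dataset is smaller; this is a slice, not a random sample)
Z = linkage(X[:500], method='ward')
plt.figure(figsize=(10, 5))
dendrogram(Z, truncate_mode="level", p=5)
plt.title("Hierarchical Clustering Dendrogram (truncated)")
plt.xlabel("Sample index (or cluster size)")
plt.ylabel("Distance")
plt.tight_layout()
plt.savefig("../results/hierarchical_dendrogram.png", dpi=150)
plt.close()

# Compare with ground truth (optional, since clustering is unsupervised).
# Collapse labels to absence (0) / presence (1) to match k=2; this is a
# no-op if y is already 0/1
y_bin = (y > 0).astype(int)
ari = adjusted_rand_score(y_bin, clusters_km)
nmi = normalized_mutual_info_score(y_bin, clusters_km)
ari_agg = adjusted_rand_score(y_bin, clusters_agg)
nmi_agg = normalized_mutual_info_score(y_bin, clusters_agg)

with open("../results/clustering_report.txt", "w") as f:
    f.write(f"KMeans k={k} -> ARI={ari:.3f}, NMI={nmi:.3f}\n")
    f.write(f"Agglomerative (ward) k={k} -> ARI={ari_agg:.3f}, NMI={nmi_agg:.3f}\n")

print("Saved elbow plot, dendrogram, and clustering report.")
```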

```
Saved elbow plot, dendrogram, and clustering report.
```
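The dendrogram is drawn from SciPy's `linkage`, while the section's bullet list also names agglomerative clustering. The sketch below uses synthetic blobs, not the heart-disease arrays. It cuts the Ward tree into two flat clusters with `fcluster` and also fits scikit-learn's `AgglomerativeClustering` with Ward linkage. On well-separated data both should yield the same partition. ARI and NMI confirm this even though SciPy numbers clusters from 1 and scikit-learn from 0, because both metrics ignore how clusters are labeled.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Two well-separated synthetic blobs (stand-in data, not the heart-disease arrays)
X_demo, y_demo = make_blobs(n_samples=200, centers=[[0, 0], [10, 10]],
                            cluster_std=0.5, random_state=0)

# SciPy: build the Ward tree, then cut it into at most 2 flat clusters (ids start at 1)
Z_demo = linkage(X_demo, method="ward")
labels_scipy = fcluster(Z_demo, t=2, criterion="maxclust")

# scikit-learn: same Ward criterion, fitted directly for 2 clusters (ids start at 0)
labels_sklearn = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X_demo)

# ARI/NMI are invariant to cluster numbering, so identical partitions score 1.0
ari_between = adjusted_rand_score(labels_scipy, labels_sklearn)
ari_truth = adjusted_rand_score(y_demo, labels_sklearn)
nmi_truth = normalized_mutual_info_score(y_demo, labels_sklearn)
print(f"scipy vs sklearn ARI={ari_between:.3f}; "
      f"vs truth ARI={ari_truth:.3f}, NMI={nmi_truth:.3f}")
```

`criterion="maxclust"` cuts the tree by cluster count. `fcluster` can instead cut at a fixed height on the dendrogram's distance axis with `criterion="distance"`.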
