# Heart Disease ML Pipeline (UCI) — Notebooks

These notebooks implement an end-to-end pipeline on the **UCI Heart Disease** dataset, loaded with `ucimlrepo`:

```python
from ucimlrepo import fetch_ucirepo
heart_disease = fetch_ucirepo(id=45)
X = heart_disease.data.features
y = heart_disease.data.targets
```

> The bonus items (Streamlit/Ngrok) are intentionally **omitted**.

## 02 — PCA (Dimensionality Reduction)

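- Load the processed train/test arrays from `data/`
- Fit PCA on the training set; plot the per-component and cumulative explained variance
- Save the fitted PCA model to `models/`, the plots and variance table to `results/`, and the PCA-transformed arrays to `data/`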
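The explained-variance ratio plotted below is the eigenvalues of the training-set covariance matrix, each divided by their sum. The following minimal sketch checks that relationship on synthetic data; the random matrix `X` is a hypothetical stand-in for `X_train.npy`, and `svd_solver="full"` is chosen only to make the comparison deterministic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical synthetic data with correlated features (stand-in for X_train)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Eigenvalues of the sample covariance matrix (ddof=1, matching sklearn's convention)
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; flip to descending
ratio_manual = eigvals / eigvals.sum()

# sklearn's PCA should report the same variances and ratios
pca = PCA(n_components=None, svd_solver="full").fit(X)
print(np.allclose(pca.explained_variance_, eigvals))
print(np.allclose(pca.explained_variance_ratio_, ratio_manual))
```

Because each ratio is a share of the total variance, the cumulative curve always ends at 1.0 when every component is kept.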

```python
import os

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Make sure the output directories exist before saving anything
for d in ("../results", "../models"):
    os.makedirs(d, exist_ok=True)

# Load the processed train/test arrays
X_train = np.load("../data/X_train.npy")
X_test  = np.load("../data/X_test.npy")

# Fit PCA on the training set only (n_components=None keeps every component)
pca = PCA(n_components=None, random_state=42)
pca.fit(X_train)

explained = pca.explained_variance_ratio_
cum = explained.cumsum()
components = np.arange(1, len(explained) + 1)  # 1-based indices, shared by plots and table

print("Total components:", len(explained))
# Smallest number of leading components whose cumulative variance reaches 95%
print("Components to keep ~95% variance:", (cum < 0.95).sum() + 1)

# Plot: explained variance ratio per component
plt.figure()
plt.plot(components, explained, marker='o')
plt.title("Explained Variance Ratio per Component")
plt.xlabel("Component")
plt.ylabel("Explained Variance Ratio")
plt.savefig("../results/pca_explained_variance.png", dpi=150)
plt.close()

# Plot: cumulative explained variance
plt.figure()
plt.plot(components, cum, marker='o')
plt.title("Cumulative Explained Variance")
plt.xlabel("Component")
plt.ylabel("Cumulative Explained Variance")
plt.ylim(0, 1.01)
plt.savefig("../results/pca_cumulative_variance.png", dpi=150)
plt.close()

# Save the fitted PCA model
joblib.dump(pca, "../models/pca.pkl")

# Transform both splits with the train-fitted PCA and save them for optional downstream experiments
X_train_pca = pca.transform(X_train)
X_test_pca  = pca.transform(X_test)
np.save("../data/X_train_pca.npy", X_train_pca)
np.save("../data/X_test_pca.npy", X_test_pca)

# Save the variance table
pd.DataFrame({
    "component": components,
    "explained_variance_ratio": explained,
    "cumulative_variance": cum
}).to_csv("../results/pca_variance_table.csv", index=False)

print("PCA saved to ../models, plots & table saved to ../results, transformed arrays to ../data.")
```

```text
Total components: 13
Components to keep ~95% variance: 12
PCA saved to ../models, plots & table saved to ../results, transformed arrays to ../data.
```
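With `n_components=None`, `transform()` only rotates the data, so the saved `X_train_pca.npy` still has all 13 columns; the "12 components" figure is a recommendation, not something the notebook applies. The sketch below, on hypothetical synthetic data standing in for `X_train.npy`, compares the notebook's counting rule with scikit-learn's float `n_components` option and shows that keeping the first `k` columns of the full transform matches a PCA fitted with that threshold. The two rules can differ only if a cumulative value lands exactly on 0.95.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for X_train.npy: 300 samples, 13 features with decaying variances
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 13)) * np.linspace(3.0, 0.2, 13)

# Full-rank PCA, as in the notebook (all 13 components kept)
pca_full = PCA(n_components=None, svd_solver="full").fit(X)
cum = np.cumsum(pca_full.explained_variance_ratio_)

# Notebook rule: leading components needed for cumulative variance >= 95%
k = int((cum < 0.95).sum() + 1)

# scikit-learn applies a variance threshold directly when n_components is a float in (0, 1)
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(k, pca_95.n_components_)

# The full transform keeps the original number of columns
X_pca = pca_full.transform(X)
print(X_pca.shape)  # (300, 13)

# Reducing dimensionality afterwards means keeping the first k columns
X_reduced = X_pca[:, :k]
print(np.allclose(X_reduced, pca_95.transform(X)))
```

Applied to the saved arrays, the same slicing (for example `X_train_pca[:, :12]`) would give the reduced representation without refitting.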
