{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Indonesia Heart Attack Prediction\n",
    "## Notebook 7: Data Visualization\n",
    "\n",
    "---\n",
    "### Stage 7 of the Data Science Life Cycle\n",
    "Visualizations for further EDA and for model interpretation: feature distributions, correlations, confusion matrix, ROC curve, and feature importance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import libraries and load the data and model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import joblib\n",
    "from sklearn.metrics import roc_curve, auc, confusion_matrix, ConfusionMatrixDisplay\n",
    "\n",
    "df = pd.read_csv('../data/heart.csv')\n",
    "target_col = 'target'\n",
    "X = df.drop(columns=[target_col])\n",
    "y = df[target_col]\n",
    "\n",
    "# Load the trained model if available (saved by the modeling notebook)\n",
    "try:\n",
    "    model = joblib.load('../models/best_model.pkl')\n",
    "    print('Loaded model ../models/best_model.pkl')\n",
    "except Exception as e:\n",
    "    # A missing file raises FileNotFoundError; other loading errors are reported the same way\n",
    "    model = None\n",
    "    print('Could not load model:', e)\n"
   ]
  },
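  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Knowing the share of each target class makes the confusion matrix in section 4 easier to read and explains why the split uses `stratify=y`. The cell below is a minimal sketch that is not part of the original pipeline. `plot_class_balance` is a hypothetical helper, and the cell runs on synthetic labels so that it works on its own. In this notebook, call `plot_class_balance(y)` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_class_balance(labels, title='Target class distribution'):\n",
    "    # Count and share of each class, ordered by class value\n",
    "    counts = labels.value_counts().sort_index()\n",
    "    shares = counts / counts.sum()\n",
    "    x = np.arange(len(counts))\n",
    "    fig, ax = plt.subplots(figsize=(5, 4))\n",
    "    ax.bar(x, counts.values)\n",
    "    ax.set_xticks(x)\n",
    "    ax.set_xticklabels([str(c) for c in counts.index])\n",
    "    # Annotate each bar with its count and percentage\n",
    "    for xi, n, p in zip(x, counts.values, shares.values):\n",
    "        ax.text(xi, n, f'{n} ({p:.0%})', ha='center', va='bottom')\n",
    "    ax.set_xlabel('class')\n",
    "    ax.set_ylabel('count')\n",
    "    ax.set_title(title)\n",
    "    return fig, counts\n",
    "\n",
    "# Hypothetical synthetic labels so the cell runs standalone; use plot_class_balance(y) on the real data\n",
    "demo_y = pd.Series(np.random.default_rng(0).integers(0, 2, size=200), name='target')\n",
    "fig, counts = plot_class_balance(demo_y)\n",
    "print(counts)\n",
    "plt.show()"
   ]
  },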
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Distribution of numeric features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cols = ['age','trestbps','chol','thalach','oldpeak']\n",
    "df[num_cols].hist(bins=20, figsize=(12,8))\n",
    "plt.suptitle('Distribution of numeric features')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
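  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The histograms above mix both target classes. Drawing each class separately shows which numeric features shift between the two classes. This is a minimal sketch that assumes a numeric target column. `plot_numeric_by_target` is a hypothetical helper, and `density=True` keeps classes of different sizes comparable. The synthetic `demo` frame exists only so the cell runs on its own. In this notebook, call `plot_numeric_by_target(df, num_cols, target_col)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_numeric_by_target(data, cols, target, bins=20, ncols=3):\n",
    "    # One panel per numeric column; each target class is a semi-transparent, density-normalized histogram\n",
    "    n = len(cols)\n",
    "    nrows = int(np.ceil(n / ncols))\n",
    "    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)\n",
    "    axes = axes.ravel()\n",
    "    for ax, col in zip(axes, cols):\n",
    "        for cls, grp in data.groupby(target):\n",
    "            ax.hist(grp[col].dropna(), bins=bins, alpha=0.5, density=True, label=f'{target}={cls}')\n",
    "        ax.set_title(col)\n",
    "        ax.legend(fontsize=8)\n",
    "    # Hide leftover panels when len(cols) is not a multiple of ncols\n",
    "    for ax in axes[n:]:\n",
    "        ax.set_visible(False)\n",
    "    fig.tight_layout()\n",
    "    return fig\n",
    "\n",
    "# Hypothetical synthetic data so the cell runs standalone\n",
    "rng = np.random.default_rng(0)\n",
    "demo = pd.DataFrame({'age': rng.normal(55, 9, 300), 'chol': rng.normal(240, 45, 300), 'target': rng.integers(0, 2, 300)})\n",
    "fig = plot_numeric_by_target(demo, ['age', 'chol'], 'target')\n",
    "plt.show()"
   ]
  },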
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Correlation heatmap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "plt.figure(figsize=(10,8))\n",
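    "# numeric_only=True skips non-numeric columns, which raise an error in pandas >= 2.0\n",
    "corr = df.corr(numeric_only=True)\n",
    "sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')\n",
    "plt.title('Correlation between features')\n",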
    "plt.show()"
   ]
  },
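  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A full heatmap is dense. For modeling, the most useful part is usually each feature's correlation with the target. For a 0/1 target, the Pearson coefficient equals the point-biserial correlation, and `method='spearman'` is an option for skewed features. This is a minimal sketch. `plot_target_correlation` is a hypothetical helper, `numeric_only` requires pandas >= 1.5, and the synthetic `demo` frame exists only so the cell runs standalone. In this notebook, call `plot_target_correlation(df, target_col)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_target_correlation(data, target, method='pearson'):\n",
    "    # Correlation of every numeric column with the target, excluding the target itself\n",
    "    corr = data.corr(method=method, numeric_only=True)[target].drop(target).sort_values()\n",
    "    fig, ax = plt.subplots(figsize=(6, 0.35 * len(corr) + 1))\n",
    "    # Negative correlations in red, positive in blue; ascending order puts the largest bar on top\n",
    "    ax.barh(corr.index, corr.values, color=['tab:red' if v < 0 else 'tab:blue' for v in corr.values])\n",
    "    ax.axvline(0, color='black', linewidth=0.8)\n",
    "    ax.set_xlabel(f'{method} correlation with {target}')\n",
    "    fig.tight_layout()\n",
    "    return fig, corr\n",
    "\n",
    "# Hypothetical synthetic data with a built-in dependence on the target\n",
    "rng = np.random.default_rng(1)\n",
    "t = rng.integers(0, 2, 300)\n",
    "demo = pd.DataFrame({'age': rng.normal(55, 9, 300) + 4 * t, 'chol': rng.normal(240, 45, 300), 'thalach': rng.normal(150, 20, 300) - 10 * t, 'target': t})\n",
    "fig, corr_t = plot_target_correlation(demo, 'target')\n",
    "print(corr_t.round(3))\n",
    "plt.show()"
   ]
  },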
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
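    "## 4. Confusion Matrix and ROC Curve\n",
    "If the model has been saved, display the confusion matrix and ROC curve on the test data. The split below must use the same `test_size`, `stratify` and `random_state` as the modeling notebook; otherwise some test rows may have been seen during training and the scores will be optimistic."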
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "# Recreate the test split; it must match the split used in the modeling notebook\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)\n",
    "\n",
    "if model is not None:\n",
    "    y_pred = model.predict(X_test)\n",
    "    if hasattr(model, 'predict_proba'):\n",
    "        y_proba = model.predict_proba(X_test)[:,1]\n",
    "    else:\n",
    "        # Fallback for classifiers without predict_proba: decision scores also work with roc_curve\n",
    "        y_proba = model.decision_function(X_test)\n",
    "\n",
    "    # Confusion matrix\n",
    "    cm = confusion_matrix(y_test, y_pred)\n",
    "    disp = ConfusionMatrixDisplay(confusion_matrix=cm)\n",
    "    disp.plot()\n",
    "    plt.title('Confusion Matrix')\n",
    "    plt.show()\n",
    "\n",
    "    # ROC Curve\n",
    "    fpr, tpr, _ = roc_curve(y_test, y_proba)\n",
    "    roc_auc = auc(fpr, tpr)\n",
    "    plt.figure()\n",
    "    plt.plot(fpr, tpr, label=f'ROC curve (area = {roc_auc:.3f})')\n",
    "    plt.plot([0,1],[0,1],'--')\n",
    "    plt.xlabel('False Positive Rate')\n",
    "    plt.ylabel('True Positive Rate')\n",
    "    plt.title('Receiver Operating Characteristic')\n",
    "    plt.legend(loc='lower right')\n",
    "    plt.show()\n",
    "else:\n",
    "    print('Model not available yet. Run the modeling notebook to train and save the best model.')"
   ]
  },
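  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since scikit-learn 1.0, `ConfusionMatrixDisplay.from_predictions` and `RocCurveDisplay.from_predictions` build the same plots directly from labels and scores. A row-normalized matrix (`normalize='true'`) shows per-class recall on the diagonal, which makes imbalanced classes easier to compare. The sketch below uses a hypothetical logistic-regression model on synthetic data so it runs standalone. In this notebook, pass `y_test`, `y_pred` and `y_proba` from the previous cell instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay, roc_auc_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Hypothetical stand-in data (roughly 70/30 classes) and model so the cell runs standalone\n",
    "Xd, yd = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=0)\n",
    "Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(Xd, yd, test_size=0.2, stratify=yd, random_state=42)\n",
    "demo_model = LogisticRegression(max_iter=1000).fit(Xd_tr, yd_tr)\n",
    "yd_pred = demo_model.predict(Xd_te)\n",
    "yd_score = demo_model.predict_proba(Xd_te)[:, 1]\n",
    "\n",
    "fig, (ax_cm, ax_roc) = plt.subplots(1, 2, figsize=(11, 4.5))\n",
    "# normalize='true' divides each row by the number of true samples in that class\n",
    "ConfusionMatrixDisplay.from_predictions(yd_te, yd_pred, normalize='true', values_format='.2f', ax=ax_cm)\n",
    "ax_cm.set_title('Confusion matrix (row-normalized)')\n",
    "roc = RocCurveDisplay.from_predictions(yd_te, yd_score, ax=ax_roc)\n",
    "ax_roc.plot([0, 1], [0, 1], linestyle='--', color='grey')\n",
    "ax_roc.set_title('ROC curve')\n",
    "fig.tight_layout()\n",
    "# The display's AUC should agree with roc_auc_score on the same scores\n",
    "print(round(roc.roc_auc, 3), round(roc_auc_score(yd_te, yd_score), 3))\n",
    "plt.show()"
   ]
  },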
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Feature importance (for tree-based models)\n",
    "If the model is tree-based (RandomForest, XGBoost), plot its feature importances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "if model is not None:\n",
    "    try:\n",
    "        if isinstance(model, Pipeline):\n",
    "            # Pipeline [preprocessor, clf]: the estimator is the last step\n",
    "            pre = model.named_steps.get('preprocessor', None)\n",
    "            clf = model.steps[-1][1]\n",
    "        else:\n",
    "            # Model trained directly, without a pipeline\n",
    "            pre = None\n",
    "            clf = model\n",
    "\n",
    "        if pre is not None:\n",
    "            try:\n",
    "                # scikit-learn >= 1.0: output feature names, including one-hot columns\n",
    "                feature_names = list(pre.get_feature_names_out())\n",
    "            except Exception:\n",
    "                # Fallback: assumes transformers_[0] is numeric and transformers_[1] is categorical\n",
    "                # with a OneHotEncoder step named 'encoder'; this may need adjusting to your pipeline\n",
    "                num_features = pre.transformers_[0][2]\n",
    "                cat_features = pre.transformers_[1][2]\n",
    "                try:\n",
    "                    ohe = pre.transformers_[1][1].named_steps['encoder']\n",
    "                    feature_names = list(num_features) + list(ohe.get_feature_names_out(cat_features))\n",
    "                except Exception:\n",
    "                    feature_names = list(num_features) + list(cat_features)\n",
    "        else:\n",
    "            feature_names = list(X.columns)\n",
    "\n",
    "        if hasattr(clf, 'feature_importances_'):\n",
    "            importances = clf.feature_importances_\n",
    "            if len(feature_names) != len(importances):\n",
    "                raise ValueError(f'{len(feature_names)} feature names but {len(importances)} importances')\n",
    "            fi = pd.DataFrame({'feature': feature_names, 'importance': importances}).sort_values(by='importance', ascending=False)\n",
    "            plt.figure(figsize=(8,6))\n",
    "            sns.barplot(x='importance', y='feature', data=fi.head(15))\n",
    "            plt.title('Top 15 Feature Importance')\n",
    "            plt.tight_layout()\n",
    "            plt.show()\n",
    "        else:\n",
    "            print('Model has no feature_importances_ attribute')\n",
    "    except Exception as e:\n",
    "        print('Could not extract feature importance automatically. Error:', e)\n",
    "else:\n",
    "    print('Model not available yet.')"
   ]
  },
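  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For models without `feature_importances_` (for example logistic regression), permutation importance is a model-agnostic alternative. It shuffles one input column at a time on held-out data and measures how much the score drops. When applied to a whole `Pipeline`, it works on the raw input columns, so no one-hot name mapping is needed. This is a different technique from the impurity-based importances above. It is shown here as a sketch on a hypothetical model and synthetic data; in this notebook, pass `model`, `X_test` and `y_test` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.inspection import permutation_importance\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Hypothetical stand-in data with named columns and a model without feature_importances_\n",
    "Xd, yd = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)\n",
    "Xd = pd.DataFrame(Xd, columns=[f'feat_{i}' for i in range(Xd.shape[1])])\n",
    "Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(Xd, yd, test_size=0.2, stratify=yd, random_state=42)\n",
    "demo_model = LogisticRegression(max_iter=1000).fit(Xd_tr, yd_tr)\n",
    "\n",
    "# Shuffle each column n_repeats times on the test set and record the drop in ROC AUC\n",
    "result = permutation_importance(demo_model, Xd_te, yd_te, scoring='roc_auc', n_repeats=10, random_state=0)\n",
    "perm_df = pd.DataFrame({'feature': Xd_te.columns, 'mean': result.importances_mean, 'std': result.importances_std}).sort_values('mean')\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(7, 4))\n",
    "ax.barh(perm_df['feature'], perm_df['mean'], xerr=perm_df['std'])\n",
    "ax.set_xlabel('Mean decrease in ROC AUC')\n",
    "ax.set_title('Permutation importance (test set)')\n",
    "fig.tight_layout()\n",
    "plt.show()"
   ]
  },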
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "- Feature distribution and correlation plots help reveal the relationships between variables.\n",
    "- The confusion matrix and ROC curve are used to evaluate model performance on the test data.\n",
    "- Feature importance aids model interpretability, particularly for tree-based models.\n",
    "### Next Steps:\n",
    "- Document the insights and feedback for the model (e.g. highly important features), and prepare visuals for reports and presentations."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
