{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Indonesia Heart Attack Prediction\n",
    "## Notebook 1: Business Understanding\n",
    "\n",
    "---\n",
    "\n",
    "### Stage 1 of the Data Science Life Cycle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Problem Definition\n",
    "\n",
    "### Background\n",
    "Cardiovascular disease, particularly heart attack, is one of the leading causes of death in Indonesia and worldwide. According to WHO data, cardiovascular disease causes an estimated 17.9 million deaths each year.\n",
    "\n",
    "### Problem Statement\n",
    "**How can heart attack risk in the Indonesian population be predicted from health, lifestyle, and environmental risk factors?**\n",
    "\n",
    "### Objectives\n",
    "1. Identify the main risk factors that contribute to heart attacks\n",
    "2. Build an accurate predictive model that classifies heart attack risk\n",
    "3. Provide insights for prevention and early detection\n",
    "4. Help the Indonesian health system prioritize screening and intervention"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Business Questions\n",
    "\n",
    "The business questions this project aims to answer:\n",
    "\n",
    "1. **Risk Factors:**\n",
    "   - Which risk factors contribute most to heart attacks?\n",
    "   - How do age, hypertension, diabetes, and cholesterol relate to heart attack risk?\n",
    "   - Do lifestyle factors (smoking, alcohol consumption, physical activity) affect risk?\n",
    "\n",
    "2. **Demographics:**\n",
    "   - Does risk differ between urban and rural areas?\n",
    "   - How are heart attacks distributed by gender and age group?\n",
    "   - Does income level affect heart attack risk?\n",
    "\n",
    "3. **Prediction:**\n",
    "   - How accurately can a model predict heart attack risk?\n",
    "   - Which machine learning model is most effective for this case?\n",
    "   - What false negative and false positive rates are acceptable?\n",
    "\n",
    "4. **Impact:**\n",
    "   - How can the predictions support health screening?\n",
    "   - Which populations should be prioritized for screening?\n",
    "   - How can the allocation of health resources be optimized?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Success Criteria\n",
    "\n",
    "### Model Performance Metrics:\n",
    "- **Accuracy**: Target minimum 80%\n",
    "- **Recall (Sensitivity)**: Target minimum 85% - important for detecting as many positive cases as possible\n",
    "- **Precision**: Target minimum 75% - limits false positives\n",
    "- **F1-Score**: Target minimum 80% - balances precision and recall\n",
    "- **ROC-AUC**: Target minimum 0.85\n",
    "\n",
    "### Business Impact:\n",
    "- The model can accurately identify high-risk individuals\n",
    "- Helps reduce false negatives (missed diagnoses)\n",
    "- Provides actionable insights for prevention programs\n",
    "- Can be integrated into health screening systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Stakeholders\n",
    "\n",
    "### Primary Stakeholders:\n",
    "1. **Medical Personnel**: Doctors, nurses, and other healthcare providers\n",
    "2. **Patients**: Individuals at risk of a heart attack\n",
    "3. **Hospitals & Clinics**: Healthcare institutions\n",
    "4. **Ministry of Health**: Policy makers and health administrators\n",
    "\n",
    "### Secondary Stakeholders:\n",
    "1. **Health Insurers**: BPJS and private insurers\n",
    "2. **Health Researchers**: Academic and research institutions\n",
    "3. **Health NGOs**: Organizations focused on cardiovascular health\n",
    "4. **General Public**: Public awareness and education"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Data Requirements\n",
    "\n",
    "### Required Data:\n",
    "\n",
    "#### 1. Demographics\n",
    "- Age\n",
    "- Gender\n",
    "- Region (urban/rural)\n",
    "- Income level\n",
    "\n",
    "#### 2. Clinical Risk Factors\n",
    "- Hypertension\n",
    "- Diabetes\n",
    "- Cholesterol levels\n",
    "- Obesity\n",
    "- Waist circumference\n",
    "- Family history of heart disease\n",
    "\n",
    "#### 3. Lifestyle Factors\n",
    "- Smoking status\n",
    "- Alcohol consumption\n",
    "- Physical activity level\n",
    "- Dietary habits\n",
    "\n",
    "#### 4. Environmental Factors\n",
    "- Air pollution exposure\n",
    "- Stress level\n",
    "- Sleep hours\n",
    "\n",
    "#### 5. Medical Screening Data\n",
    "- Blood pressure (systolic & diastolic)\n",
    "- Fasting blood sugar\n",
    "- HDL & LDL cholesterol\n",
    "- Triglycerides\n",
    "- EKG results\n",
    "- Previous heart disease\n",
    "- Medication usage\n",
    "- Participation in free screening programs\n",
    "\n",
    "#### 6. Target Variable\n",
    "- Heart attack occurrence (Yes/No)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Constraints & Assumptions\n",
    "\n",
    "### Constraints:\n",
    "1. **Data Availability**: The dataset is limited to 500 samples\n",
    "2. **Time**: The project must be completed within the academic timeframe\n",
    "3. **Resources**: Computational resources are limited\n",
    "4. **Privacy**: Data confidentiality must be maintained\n",
    "\n",
    "### Assumptions:\n",
    "1. The collected data is representative of the Indonesian population\n",
    "2. The measured risk factors influence heart attack occurrence\n",
    "3. The resulting model generalizes to a wider population\n",
    "4. Self-reported data (such as lifestyle) is accurate and honest\n",
    "5. Medical screening follows consistent standards"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Expected Outcomes\n",
    "\n",
    "### Technical Outcomes:\n",
    "1. A machine learning model that predicts heart attack risk and meets the performance targets in Section 3\n",
    "2. Identification of the most significant risk factors\n",
    "3. Informative visualizations for communicating results\n",
    "4. A web application for interactive prediction\n",
    "\n",
    "### Business Outcomes:\n",
    "1. **Early Detection**: The system helps detect high-risk individuals early\n",
    "2. **Prevention**: Insights for targeted prevention programs\n",
    "3. **Resource Optimization**: More efficient screening prioritization and resource allocation\n",
    "4. **Cost Reduction**: Lower treatment costs through prevention\n",
    "5. **Public Health Impact**: Contribution to reducing heart attack mortality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Project Scope\n",
    "\n",
    "### In Scope:\n",
    "- Data cleaning and preprocessing\n",
    "- Exploratory data analysis\n",
    "- Feature engineering\n",
    "- Model development (Logistic Regression, Decision Tree, KNN)\n",
    "- Model evaluation and comparison\n",
    "- Visualization and reporting\n",
    "- Web application development\n",
    "\n",
    "### Out of Scope:\n",
    "- Real-time data collection\n",
    "- Integration with hospital systems\n",
    "- Clinical validation\n",
    "- Deep learning models (because of the limited data)\n",
    "- Mobile application development"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Risk Analysis\n",
    "\n",
    "### Technical Risks:\n",
    "1. **Data Quality**: Missing values, outliers, or inaccurate data\n",
    "   - *Mitigation*: Comprehensive data cleaning and validation\n",
    "\n",
    "2. **Model Overfitting**: The model fits the training data too closely\n",
    "   - *Mitigation*: Cross-validation, regularization\n",
    "\n",
    "3. **Class Imbalance**: Unequal numbers of positive and negative cases\n",
    "   - *Mitigation*: Resampling techniques, class weights\n",
    "\n",
    "### Business Risks:\n",
    "1. **False Negatives**: Dangerous missed diagnoses\n",
    "   - *Mitigation*: Optimize for high recall\n",
    "\n",
    "2. **False Positives**: Unnecessary anxiety and costs\n",
    "   - *Mitigation*: Balance precision against recall\n",
    "\n",
    "3. **Misinterpretation**: Model results are misused\n",
    "   - *Mitigation*: Clear documentation and disclaimers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Timeline & Milestones\n",
    "\n",
    "### Week 12-13: Data Science Life Cycle Implementation\n",
    "- Week 12 (Day 1-2): Business Understanding (✓)\n",
    "- Week 12 (Day 3-4): Data Mining & Cleaning\n",
    "- Week 12 (Day 5-7): Data Exploration & Feature Engineering\n",
    "- Week 13 (Day 1-3): Predictive Modeling\n",
    "- Week 13 (Day 4-5): Data Visualization & Insights\n",
    "\n",
    "### Week 14: Web Application Development\n",
    "- Frontend development\n",
    "- Backend integration\n",
    "- Testing\n",
    "\n",
    "### Week 15: Presentation Preparation\n",
    "- Documentation\n",
    "- Presentation slides\n",
    "- Demo preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this Business Understanding stage, we have:\n",
    "\n",
    "1. ✅ Defined a clear problem statement\n",
    "2. ✅ Set specific, measurable objectives\n",
    "3. ✅ Identified the business questions to answer\n",
    "4. ✅ Determined success criteria for the model\n",
    "5. ✅ Identified the key stakeholders\n",
    "6. ✅ Defined the data requirements\n",
    "7. ✅ Identified constraints, assumptions, and risks\n",
    "8. ✅ Set expected outcomes and project scope\n",
    "9. ✅ Created a timeline and milestones\n",
    "\n",
    "### Next Steps:\n",
    "Continue to **Notebook 2: Data Mining** to collect and extract the required data.\n",
    "\n",
    "---\n",
    "\n",
    "**Note**: This notebook is part of the Indonesia Heart Attack Prediction project for the Artificial Intelligence course."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
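
The performance targets in Section 3 (Success Criteria) could be checked with a small helper along the lines of the sketch below. The notebook itself does not define any code, so all function names here (`confusion_counts`, `roc_auc`, `evaluate`, `check_targets`) are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made up purely to exercise the functions. Only the threshold values in `TARGETS` come from the notebook.

```python
# Targets copied from Section 3 (Success Criteria); everything else is illustrative.
TARGETS = {
    "accuracy": 0.80,
    "recall": 0.85,
    "precision": 0.75,
    "f1": 0.80,
    "roc_auc": 0.85,
}


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = heart attack."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_score, threshold=0.5):
    """Compute the five Section 3 metrics from true labels and predicted scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "roc_auc": roc_auc(y_true, y_score),
    }


def check_targets(metrics, targets=TARGETS):
    """Map each metric name to True if it meets its minimum target."""
    return {name: metrics[name] >= minimum for name, minimum in targets.items()}


# Made-up labels and scores, only to exercise the functions above.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
metrics = evaluate(y_true, y_score)
report = check_targets(metrics)
```

Because recall carries the strictest target (85%), a model that misses `recall` in `report` would not satisfy the criteria even if accuracy looks acceptable, which matches the emphasis on false negatives in Section 9.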
