# Heart Failure Survival Prediction Project

## Project Overview
This project focuses on predicting the survival outcomes of heart failure patients using clinical and demographic attributes. Heart disease is one of the leading causes of death worldwide, and understanding the factors that influence survival rates is critical for early intervention and improved patient outcomes.

The goal of this project is to apply **machine learning techniques** to build and compare predictive models that classify whether a heart failure patient survives or dies during the follow-up period. Beyond predictive accuracy, the project aims to identify the clinical and demographic features that contribute most to survival predictions.

## Dataset Information
The dataset used for this project comes from the **[UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records)**. It contains 299 records of patients who experienced heart failure, stored in 13 columns: 12 clinical and demographic features plus the target variable `DEATH_EVENT`, which indicates whether the patient survived (0) or died (1).

### Key Features
- **Age**: Age of the patient
- **Anaemia**: Decrease of red blood cells or hemoglobin (Boolean)
- **Creatinine Phosphokinase (CPK)**: Level of CPK enzyme in blood (mcg/L)
- **Diabetes**: Whether the patient has diabetes (Boolean)
- **Ejection Fraction**: Percentage of blood leaving the heart at each contraction
- **High Blood Pressure**: Whether the patient has hypertension (Boolean)
- **Platelets**: Platelet count (kiloplatelets/mL)
- **Serum Creatinine**: Level of serum creatinine (mg/dL)
- **Serum Sodium**: Level of serum sodium (mEq/L)
- **Sex**: Male (1) or Female (0)
- **Smoking**: Whether the patient is a smoker (Boolean)
- **Time**: Follow-up period (days)

### Target Variable
- **DEATH_EVENT**: Whether the patient died during the follow-up period
    - 1 = Patient died
    - 0 = Patient survived
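
As a minimal sketch of how this schema might be checked after loading, the snippet below assumes the CSV uses the column names listed in the Feature Dictionary. The helper name `validate_schema` is hypothetical, and the two inline rows are made-up placeholder values (not real patient records) used only so the example runs without the actual file.

```python
import io

import pandas as pd

# Column names as listed in the Feature Dictionary of this README.
EXPECTED_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
    "DEATH_EVENT",
]
BINARY_COLUMNS = [
    "anaemia", "diabetes", "high_blood_pressure", "sex", "smoking",
    "DEATH_EVENT",
]


def validate_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if a column is missing or a binary column is not 0/1."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    for col in BINARY_COLUMNS:
        if not df[col].isin([0, 1]).all():
            raise ValueError(f"Column {col!r} contains values other than 0 and 1")


# Placeholder rows standing in for the real CSV file (illustrative values only).
SAMPLE_CSV = ",".join(EXPECTED_COLUMNS) + "\n" + "\n".join([
    "60,0,250,1,38,0,260000,1.1,137,1,0,120,0",
    "72,1,580,0,25,1,210000,1.9,130,0,1,15,1",
]) + "\n"

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
validate_schema(df)

# Count of survivors (0) and deaths (1) in the loaded frame.
survival_counts = df["DEATH_EVENT"].value_counts().sort_index()
print(survival_counts)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by the path to the downloaded CSV.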

---

## Project Objectives
This project will cover:
- **Exploratory Data Analysis (EDA)** to understand data distributions, trends, and relationships between features and patient outcomes.
- **Data Preprocessing & Feature Engineering** to clean, transform, and prepare the data for modeling.
- **Machine Learning Model Development** where multiple models will be trained and evaluated, including:
    - Logistic Regression
    - Decision Tree
    - Random Forest
    - Support Vector Machine (SVM)
    - Artificial Neural Network (ANN)
- **Model Comparison & Performance Evaluation** to select the best-performing model.
- **Feature Importance Analysis** to identify which features contribute most to survival predictions.
- **Conclusion & Key Insights** to summarize findings and offer actionable insights for healthcare professionals.
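
The model comparison and feature importance steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's final pipeline: the data is a synthetic stand-in from `make_classification` (299 rows, 12 features), and the 5-fold stratified cross-validation, the F1 metric, and all hyperparameters are choices made for the example. The ANN is left out here because it would require TensorFlow/Keras.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the dataset (NOT the real records).
X, y = make_classification(
    n_samples=299, n_features=12, n_informative=5, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")

# Impurity-based importances from a random forest fitted on all rows.
forest = models["Random Forest"].fit(X, y)
importances = forest.feature_importances_
print(importances.round(3))
```

Wrapping the scaler inside each pipeline means it is refit on the training folds only, which avoids leaking statistics from the validation fold into preprocessing.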

---

## Tools & Libraries
The following tools and libraries will be used in this project:
- Python
- Pandas
- NumPy
- Matplotlib & Seaborn
- Scikit-learn
- TensorFlow/Keras (for ANN)

---


## Feature Dictionary

| Feature | Description |
|---|---|
| **age** | Patient's age in years. Older patients are typically at higher risk. |
| **anaemia** | Whether the patient has a reduced red blood cell count or hemoglobin level (1 = Yes, 0 = No). Anaemia can worsen heart failure outcomes. |
| **creatinine_phosphokinase** | Level of CPK enzyme in the blood (mcg/L). High levels may indicate heart muscle damage. |
| **diabetes** | Whether the patient has diabetes (1 = Yes, 0 = No). Diabetes increases heart failure risk. |
| **ejection_fraction** | Percentage of blood leaving the heart at each contraction. Low values indicate poor heart function. |
| **high_blood_pressure** | Whether the patient has hypertension (1 = Yes, 0 = No). High BP adds stress to the heart. |
| **platelets** | Platelet count (kiloplatelets/mL). May indicate blood clotting ability or potential abnormalities. |
| **serum_creatinine** | Level of creatinine in the blood (mg/dL). High levels indicate potential kidney problems. |
| **serum_sodium** | Level of sodium in the blood (mEq/L). Low levels may indicate fluid retention and more severe heart failure. |
| **sex** | Patient's sex (1 = Male, 0 = Female). Sex may influence survival rates. |
| **smoking** | Whether the patient smokes (1 = Yes, 0 = No). Smoking worsens heart and blood vessel health. |
| **time** | Length of the follow-up period in days. A short follow-up may reflect an early death. |
| **DEATH_EVENT** | Target variable (1 = Died, 0 = Survived). |
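
Because the table mixes continuous measurements with 0/1 flags, one possible preprocessing step is to standardize only the continuous columns and pass the binary ones through unchanged. The sketch below assumes that split and uses scikit-learn's `ColumnTransformer`. The demo frame is filled with random placeholder values (not patient data), and the grouping into `NUMERIC_FEATURES` and `BINARY_FEATURES` is an assumption based on the descriptions above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Assumed grouping of the 12 input features from the Feature Dictionary.
NUMERIC_FEATURES = [
    "age", "creatinine_phosphokinase", "ejection_fraction", "platelets",
    "serum_creatinine", "serum_sodium", "time",
]
BINARY_FEATURES = ["anaemia", "diabetes", "high_blood_pressure", "sex", "smoking"]

# Standardize continuous columns; leave 0/1 flags as they are.
preprocessor = ColumnTransformer(
    transformers=[
        ("scale_numeric", StandardScaler(), NUMERIC_FEATURES),
        ("keep_binary", "passthrough", BINARY_FEATURES),
    ]
)

# Random placeholder frame so the example runs without the real CSV.
rng = np.random.default_rng(0)
n_rows = 20
data = {col: rng.normal(50.0, 10.0, n_rows) for col in NUMERIC_FEATURES}
data.update({col: rng.integers(0, 2, n_rows) for col in BINARY_FEATURES})
demo = pd.DataFrame(data)

# Output columns follow the transformer order: numeric first, then binary.
X_prepared = preprocessor.fit_transform(demo)
print(X_prepared.shape)
```

Tree-based models do not need scaling, so this step mainly matters for Logistic Regression, SVM, and the ANN.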
