# Heart Disease Prediction Using Machine Learning

## 1. Introduction
Heart disease is one of the leading causes of death worldwide. 
Early prediction and diagnosis can significantly reduce risk and improve survival rates.

This project builds a Machine Learning model to predict whether a patient has heart disease 
based on medical attributes.

---

## 2. Objective
The objectives of this project are:

- Perform data preprocessing
- Conduct exploratory data analysis (EDA)
- Train multiple machine learning models
- Evaluate models using classification metrics
- Generate insights from feature importance

---

## 3. Dataset Description
The dataset contains patient medical attributes including:

- Age
- Gender
- Chest pain type
- Resting blood pressure
- Cholesterol level
- Fasting blood sugar
- Resting ECG results
- Maximum heart rate
- Exercise-induced angina
- ST depression
- Slope of ST segment
- Number of major vessels
- Thalassemia

### Target Variable:
- 0 → No Heart Disease
- 1 → Heart Disease Present

---

## 4. Data Preprocessing
The following preprocessing steps were performed:

- Checked for missing values
- Imputed missing values where needed, using the mean for numerical features and the mode for categorical features
- Scaled numerical features using StandardScaler
- Split dataset into training and testing sets (80:20 ratio)

No severe class imbalance was observed.
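
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (`age`, `chol`, `thalach`, `cp`, `target`) and the `random_state` values are assumptions. The sketch also splits the data before imputing and scaling so that the fill values and the `StandardScaler` statistics come from the training rows only; the project may have applied the steps in the order listed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (assumed column names and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(29, 78, n).astype(float),
    "chol": rng.normal(245, 50, n),
    "thalach": rng.normal(150, 22, n),
    "cp": rng.integers(0, 4, n).astype(float),
    "target": rng.integers(0, 2, n),
})
df.loc[[3, 17], "chol"] = np.nan  # inject missing values for illustration
df.loc[[5], "cp"] = np.nan

numeric_cols = ["age", "chol", "thalach"]
categorical_cols = ["cp"]

# 1. Check for missing values per column.
missing_before = df.isna().sum()

# 2. 80:20 split, stratified so both sets keep the same class ratio.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# 3. Impute: mean for numerical columns, mode for categorical columns.
#    Fill values are computed on the training set and reused for the test set.
for col in numeric_cols:
    fill = X_train[col].mean()
    X_train[col] = X_train[col].fillna(fill)
    X_test[col] = X_test[col].fillna(fill)
for col in categorical_cols:
    fill = X_train[col].mode().iloc[0]
    X_train[col] = X_train[col].fillna(fill)
    X_test[col] = X_test[col].fillna(fill)

# 4. Scale numerical features: fit on train, apply the same transform to test.
scaler = StandardScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
```

After scaling, each numerical column of `X_train` has zero mean and unit variance, while `X_test` is transformed with the training statistics and will only be approximately standardized.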

---

## 5. Exploratory Data Analysis (EDA)

The following visualizations were created:

- Class distribution countplot
- Correlation heatmap
- Age distribution histogram
- Cholesterol vs target boxplot
- Maximum heart rate vs target boxplot

### Observations:
- Chest pain type is strongly associated with the presence of heart disease.
- Higher cholesterol levels are associated with higher risk.
- Maximum heart rate differs noticeably between the two classes.
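
The numbers behind these plots can be computed directly with pandas, which is a quick way to check an observation before reading it off a chart. The sketch below is an illustration only: the data is randomly generated, so it will not reproduce the observations above, and the column names (`age`, `chol`, `thalach`, `target`) are assumptions.

```python
import numpy as np
import pandas as pd

# Random stand-in data (assumed column names; no real relationships).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.integers(29, 78, n),
    "chol": rng.normal(245, 50, n).round(),
    "thalach": rng.normal(150, 22, n).round(),
    "target": rng.integers(0, 2, n),
})

# Class distribution: the counts a countplot would display.
class_counts = df["target"].value_counts().sort_index()

# Pairwise Pearson correlations: the matrix a heatmap would display.
corr = df.corr()

# Age distribution: bin counts and edges of a 10-bin histogram.
age_hist, age_edges = np.histogram(df["age"], bins=10)

# Per-class quartiles: the statistics a boxplot would display.
chol_by_class = df.groupby("target")["chol"].describe()
thalach_by_class = df.groupby("target")["thalach"].describe()

print(class_counts)
print(chol_by_class[["25%", "50%", "75%"]])
```

Comparing the `50%` (median) rows of `chol_by_class` and `thalach_by_class` gives a numeric counterpart to the two boxplots.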

---

## 6. Model Building

Two models were trained:

### Logistic Regression
Used as a baseline linear classification model.

### Random Forest Classifier
Used as an ensemble model to improve on the baseline's predictive performance.
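
Training both models with scikit-learn might look like the sketch below. The data comes from `make_classification` as a stand-in for the 13 patient attributes, and the hyperparameters (`max_iter`, `n_estimators`, `random_state`) are assumptions rather than the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data with 13 features (stand-in only).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: linear decision boundary; max_iter raised to help convergence.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Ensemble: averages the votes of many decision trees.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

print("Logistic Regression test accuracy:", log_reg.score(X_test, y_test))
print("Random Forest test accuracy:", rf.score(X_test, y_test))
```

Both estimators share the same `fit`/`predict`/`score` interface, so they can be evaluated with identical code.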

---

## 7. Model Evaluation

Evaluation metrics used:

- Accuracy Score
- F1 Score
- Confusion Matrix
- Cross-validation
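
As a rough sketch, these metrics can be computed with scikit-learn as shown below. The data is synthetic, and the 5-fold setting and model hyperparameters are assumptions, so the printed scores say nothing about the results reported in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of test samples classified correctly.
acc = accuracy_score(y_test, y_pred)

# Harmonic mean of precision and recall for the positive class (1).
f1 = f1_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred)

# 5-fold cross-validated accuracy on the full data, using a fresh model per fold.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X, y, cv=5, scoring="accuracy",
)

print(f"Accuracy: {acc:.3f}  F1: {f1:.3f}")
print("Confusion matrix:\n", cm)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

In a screening context the false-negative cell of the confusion matrix (`cm[1, 0]`) deserves particular attention, since it counts patients with heart disease that the model missed.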

### Results:

| Model | Accuracy |
|--------|----------|
| Logistic Regression | ~82-85% |
| Random Forest | ~85-90% |

Random Forest achieved higher accuracy than Logistic Regression.

---

## 8. Feature Importance

Feature importance analysis identified the following as the most influential features:

- Chest pain type (cp)
- Cholesterol (chol)
- Maximum heart rate (thalach)
- Number of vessels (ca)
- Thalassemia (thal)
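
One common way to obtain such a ranking is the impurity-based `feature_importances_` attribute of a fitted Random Forest; whether this project used that attribute or another method is an assumption. In the sketch below the data is synthetic, so the resulting order is arbitrary and does not correspond to the list above. Feature names other than `cp`, `chol`, `thalach`, `ca`, and `thal` are also assumed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed short names for the 13 attributes.
feature_names = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Synthetic stand-in data with one column per feature name.
X, y = make_classification(
    n_samples=300, n_features=len(feature_names), n_informative=5, random_state=0
)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1.
importances = rf.feature_importances_

# Sort features from most to least important.
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

for name, score in ranked[:5]:
    print(f"{name:10s} {score:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a ranking like this is best read as a guide rather than a measure of causal effect.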

---

## 9. Prediction

The trained model was used to predict heart disease for new unseen patient data.

The model provides:
- Class prediction (0 or 1)
- Probability score

This shows that the model can score previously unseen patient records, which is a prerequisite for real-world deployment.
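
A minimal sketch of scoring a single new record is shown below. The model is trained on synthetic data and `new_patient` is a hypothetical, randomly generated feature vector, so the output has no medical meaning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train on synthetic stand-in data with 13 features.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Hypothetical new patient: one row with the same 13 features, in the same order.
rng = np.random.default_rng(7)
new_patient = rng.normal(size=(1, 13))

# Class prediction: 0 (no heart disease) or 1 (heart disease present).
predicted_class = int(model.predict(new_patient)[0])

# Probability score: one value per class, ordered as in model.classes_.
proba = model.predict_proba(new_patient)[0]
disease_probability = float(proba[1])

print(f"Predicted class: {predicted_class}")
print(f"Estimated probability of heart disease: {disease_probability:.2f}")
```

If the training features were scaled, a new record needs to pass through the same fitted scaler before `predict` is called.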

---

## 10. Insights & Recommendations

- Patients with high cholesterol and abnormal chest pain are at higher predicted risk.
- Early screening based on the most important features may help reduce fatality.
- The ensemble model (Random Forest) outperformed the linear baseline on this dataset.

---

## 11. Conclusion

An end-to-end Machine Learning pipeline was successfully implemented.

Random Forest achieved higher accuracy than Logistic Regression and provided a clear ranking of feature importance.

This model could support preliminary screening and clinical decision-making.